Commit Graph

62 Commits

Author SHA1 Message Date
05ac7fcd4a [Function] Add BE udf bitmap_xor (#5098)
This function returns the XOR of two input bitmaps.
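As a rough illustration of the semantics only (not the BE implementation), the XOR of two bitmaps keeps the values present in exactly one of them. The sketch below models a bitmap as a Python set of integers; the function name mirrors the UDF, but the set-based representation is an assumption made for the example.

```python
def bitmap_xor(left: set, right: set) -> set:
    """Return the values that appear in exactly one of the two bitmaps.

    Bitmaps are modeled as Python sets of integers (an assumption for
    illustration; the real UDF works on Doris bitmap objects).
    """
    return left ^ right  # symmetric difference


if __name__ == "__main__":
    print(sorted(bitmap_xor({1, 2, 3}, {3, 4})))  # [1, 2, 4]
```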
2021-01-04 09:27:46 +08:00
a8b8c4760c [Doc] Fix some spelling mistakes and default value mistakes in document (#5180) 2021-01-03 15:45:56 +08:00
279ae1cb75 Add fuzzy_parse option to speed up json import (#5114)
Add a fuzzy_parse flag. If all objects in the JSON file have the same keys in the same order, only the keys of the first row need to be parsed; values in later rows are then read by index instead of by key.
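The idea can be sketched as follows. This is a conceptual model, not the BE code: the helper name `parse_rows_fuzzy` is hypothetical, and Python's `json` module still tokenizes every key, so the sketch only shows the "keys from the first row, values by position" mapping rather than the speedup itself. It assumes flat objects, one per line.

```python
import json


def parse_rows_fuzzy(lines):
    """Map values to columns by position, taking key names from the first row only."""
    keys = None
    rows = []
    for line in lines:
        # object_pairs_hook=list yields [(key, value), ...] in file order.
        pairs = json.loads(line, object_pairs_hook=list)
        if keys is None:
            keys = [key for key, _ in pairs]  # parsed once, from the first row
        values = [value for _, value in pairs]
        rows.append(dict(zip(keys, values)))  # later keys are ignored
    return rows


if __name__ == "__main__":
    data = ['{"k1": 1, "k2": "a"}', '{"k1": 2, "k2": "b"}']
    print(parse_rows_fuzzy(data))
```

Because later rows are read by position, a file whose rows do not share the same key order would be mapped to the wrong columns, which is why the option only suits uniformly structured input.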
2020-12-25 09:19:42 +08:00
6673306fda [DOC] fix toSql of ShowPartitionsStmt (#5070) 2020-12-19 11:18:00 +08:00
74bfd69595 [Bug] Forbidden creating table with dynamic partition when FE.config dynamic_partition_enable=false (#5043)
- There is an FE configuration called dynamic_partition_enable
    that turns the dynamic partition feature on or off.
  When this configuration is false, no table supports dynamic partitioning.

- However, when a user tried to create a dynamic partition table, Doris did not check this parameter.
  As a result, the user could create a dynamic partition table normally,
    but Doris could not actually create partitions for this table.

- This PR checks this config when building the table.
  A dynamic partition table can be created only when the dynamic_partition_enable configuration is true.
  If the configuration is false, the command to create a dynamic partition table reports an error directly.
2020-12-16 23:44:20 +08:00
650536d53e [Feature] Add Topn udaf (#4803)
For #4674 
This is a UDAF for approximate top-N using the Space-Saving algorithm. At present, it can only calculate
the frequent items and their frequencies in a given column; based on this, we can implement topN functions
similar to those supported by Kylin in the future.
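For readers unfamiliar with Space-Saving, a minimal Python sketch of the general algorithm follows. It is not the UDAF's code: the function names are hypothetical, and deriving the counter budget as `n * space_expand_rate` is an assumption made for the example.

```python
def space_saving(stream, capacity):
    """Track approximate frequencies with at most `capacity` counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # Evict the item with the smallest count; the newcomer inherits
            # that count plus one, so counts may be overestimated.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


def top_n(stream, n, space_expand_rate=20):
    """Return the n most frequent items (assumed budget: n * space_expand_rate)."""
    counters = space_saving(stream, n * space_expand_rate)
    ranked = sorted(counters.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


if __name__ == "__main__":
    print(top_n(["a"] * 5 + ["b"] * 3 + ["c"], n=2, space_expand_rate=2))
```

A larger counter budget lowers the chance that a truly frequent item is evicted, which is the trade-off the accuracy test explores.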

I have also added a test to measure the accuracy of this algorithm. The following is a rough result.
The data set has 1 million rows and follows a Zipfian distribution. "Element cardinality" is the number of
distinct values; the columns 20X, 50X and 100X are the values of space_expand_rate (20, 50, 100), which
sets the number of counters used by the Space-Saving algorithm.

```
zf exponent = 0.5
Element cardinality    20X     50X     100X
               1000    100%    100%    100%
              10000    100%    100%    100%
             100000    100%    100%    100%
             500000     94%     98%     99%

zf exponent = 0.6,1
Element cardinality    20X     50X     100X
               1000    100%    100%    100%
              10000    100%    100%    100%
             100000    100%    100%    100%
             500000    100%    100%    100%
```
2020-12-16 21:58:34 +08:00
c5f780305e [Repair] Add an option whether to allow the partition column to be NULL (#5013) 2020-12-05 14:58:32 +08:00
55ce88da34 [Schema change] Support More column type in schema change (#4938)
1. Support changing a column type from CHAR to TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE/DATE,
and converting TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE to a wider numeric type (#4937)

2. Use templates to refactor types.h and schema_change.cpp and remove redundant code.
2020-11-28 09:52:28 +08:00
234e9b532f [Doc] Fiexed example content in bitmap_union.md (#4919) 2020-11-20 09:49:31 +08:00
d6497fedc4 [Config] Change config name 'streaming_load_max_batch_size_mb' to 'streaming_load_json_max_mb' (#4791)
The old name is very close to another config name, and the two are hard to tell apart,
so this PR changes the name.
The documentation has been updated accordingly.
2020-10-28 23:27:33 +08:00
a95ce69c0d [Doc] Bug fix that help commend not work (#4760)
There are 2 docs with the same name "bitmap", which causes an error
when building the help system.
2020-10-20 09:47:51 +08:00
a605b3160f [Docs] update data types doc and fix some typo (#4712)
* update data types doc and fix some typo

Co-authored-by: lixueyan07 <lixueyan07@meituan.com>
2020-10-14 09:34:58 +08:00
1f3a430b40 fix docs typo (#4726) 2020-10-14 09:28:07 +08:00
28f4e922a7 [CREATE TABLE]Support new syntax CREATE TABLE LIKE to clone an existe… (#4705)
Support the new syntax CREATE TABLE [IF NOT EXISTS] [db_name].table_name LIKE [db_name2].table_name2;
to create a new table with the same schema as an existing table.
ISSUE: #4355
2020-10-10 21:16:53 +08:00
f3cdf167d1 [Feature] Add time_round builtin functions (#4640)
#4619 
Add time_round functions that provide `time_floor` & `time_ceil` for each time unit.

Fix two related bugs.
- #4618 
- Fix `struct TimeInterval` to use `int64_t` instead of `int32_t`, in case the difference in seconds overflows
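The rounding itself can be sketched as below. This only illustrates floor and ceil on a fixed period; the function signatures, the period expressed in seconds, and the 1970-01-01 origin are assumptions for the example and may differ from the built-in functions.

```python
from datetime import datetime, timedelta

# Hypothetical origin that periods are aligned to (an assumption for this sketch).
ORIGIN = datetime(1970, 1, 1)


def time_floor(ts, period_seconds, origin=ORIGIN):
    """Round ts down to the nearest multiple of period_seconds after origin."""
    elapsed = (ts - origin) // timedelta(seconds=1)  # whole seconds, floored
    return origin + timedelta(seconds=elapsed - elapsed % period_seconds)


def time_ceil(ts, period_seconds, origin=ORIGIN):
    """Round ts up to the nearest multiple of period_seconds after origin."""
    floored = time_floor(ts, period_seconds, origin)
    if floored == ts:
        return ts  # already on a boundary
    return floored + timedelta(seconds=period_seconds)


if __name__ == "__main__":
    ts = datetime(2020, 10, 9, 16, 5, 51)
    print(time_floor(ts, 3600))  # 2020-10-09 16:00:00
    print(time_ceil(ts, 3600))   # 2020-10-09 17:00:00
```

Python integers do not overflow, so the sketch does not reproduce the `int32_t` problem; in C++ the elapsed-seconds value is exactly the quantity that needs a 64-bit type.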
2020-10-09 16:05:51 +08:00
0475aa9b93 [Bug]Fix delete on clause may not work in routineLoad (#4683)
Fix the DELETE ON clause not taking effect in some cases; this is described in #4682.
2020-09-30 09:56:19 +08:00
4e3b576fd3 [NewFeature] Support ExternalCatalogResource to simplify external table manage operation. (#4559)
1. Add new Resource ExternalCatalogResource

```
create external resource "odbc"
properties
(
   "type" = "external_catalog", (required)
   "user" = "test",(required)
   "password" = "", (required)
   "host" = "192.168.0.1", (required)
   "port" = "8086", (required)
   "type" = "oracle" , (optional, only used by ODBC external tables)
   "driver" = "Oracle 19 ODBC driver" (optional, only used by ODBC external tables)
)
```

2. After creating an ExternalCatalogResource, you can create an external table like this:

```
CREATE TABLE `test_mysql` (
  `k1` tinyint(4) NOT NULL,
  `k2` smallint(6) NOT NULL,
  `k3` int(11) NOT NULL,
  `k4` bigint(20) NOT NULL,
  `k5` decimal(9,3) NOT NULL,
  `k6` char(5) NOT NULL,
  `k10` date DEFAULT NULL,
  `k11` datetime DEFAULT NULL,
  `k7` varchar(20) NOT NULL,
  `k8` double NOT NULL,
  `k9` float NOT NULL
) ENGINE=MYSQL
PROPERTIES (
"external_catalog_resource" = "odbc",
"database" = "test",
"table" = "test"
);
```
2020-09-25 10:20:33 +08:00
fd37c4f352 [Document] Fix some typo in alter table document
Fix some typos in the document that confuse users.
2020-09-22 16:23:23 +08:00
a1f52ec2ab [SQL] Support where, limit, order clause in show resourcestmt. (#4502)
* [SQL] Support where, limit, order clause in show resourcestmt.

Grammar

    SHOW RESOURCES
    [
        WHERE
        [NAME [ = "your_resource_name" | LIKE "name_matcher"]]
        [RESOURCETYPE = ["SPARK"]]
    ]
    [ORDER BY ...]
    [LIMIT limit][OFFSET offset];

issue #4501
2020-09-16 17:57:48 +08:00
95111f9228 [Feature] Support alter table syntax for sequence column (#4582)
* enable sequence col

Co-authored-by: yangwenbo6 <yangwenbo3@jd.com>
2020-09-15 10:19:38 +08:00
068707484d Support sequence column for UNIQUE_KEYS Table (#4256)
* add sequence  col

Co-authored-by: yangwenbo6 <yangwenbo3@jd.com>
2020-09-04 10:10:17 +08:00
5166a6c6bc [Bug] function str_to_date()'s behavior on BE and FE is inconsistent (#4495)
Main CL:
1. Copy the code from BE to implement the `str_to_date()` function in FE. 
2. `str_to_date("2020-08-08", "%Y-%m-%d %H:%i:%s")` will return `2020-08-08 00:00:00` instead of `2020-08-08`.
2020-09-03 17:16:19 +08:00
wyb
ffe696d17c [Doc] Add spark load sql statement doc and update manual (#4463)
1. add sql statement in dml
2. update spark load manual
2020-08-30 21:09:17 +08:00
174c9f89ea [DOCS] Add batch delete docs (#4435)
update documents for batch delete #4051
2020-08-28 09:24:07 +08:00
a5d1d010c0 [Doc] Fix typo about plugin content (#4416) 2020-08-26 10:48:07 +08:00
bfb39a2826 [SQL][Function] Add replace() function (#4347)
replace is a user-defined function that replaces all occurrences of an old substring in a string with a new substring, as follows:
mysql> select replace("http://www.baidu.com:9090", "9090", "");
+--------------------------------------------------+
| replace('http://www.baidu.com:9090', '9090', '') |
+--------------------------------------------------+
| http://www.baidu.com:                            |
+--------------------------------------------------+
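For comparison, the same substitution in Python uses the built-in `str.replace`. This is only an analogy for the behavior shown above; the wrapper name is hypothetical, and edge cases such as an empty search string may behave differently in Doris.

```python
def replace(text, old, new):
    """Replace every occurrence of `old` in `text` with `new`."""
    return text.replace(old, new)


if __name__ == "__main__":
    print(replace("http://www.baidu.com:9090", "9090", ""))  # http://www.baidu.com:
```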
2020-08-20 09:28:53 +08:00
26fe510011 [Doc] modify the document error (#4357) 2020-08-17 23:06:23 +08:00
1d9b3aeee7 [Doc] Repair document format (#4336)
Many docs use the incorrect format '##keyword'. This PR repairs the document format. #4335
2020-08-13 23:39:41 +08:00
eefad13107 [Feature] Support InPredicate in delete statement (#4006)
This PR adds InPredicate support to the delete statement
and adds the max_allowed_in_element_num_of_delete variable to
limit the number of elements of an InPredicate in a delete statement.
2020-08-06 23:19:40 +08:00
5ba4b024e7 [Docs] Add Materialized view manual (#4229)
Add usage manual of materialized view in Chinese and English
2020-08-06 23:18:06 +08:00
237c0807a4 [RoutineLoad] Support modify routine load job (#4158)
Support ALTER ROUTINE LOAD JOB stmt, for example:

```
alter routine load db1.label1
properties
(
"desired_concurrent_number"="3",
"max_batch_interval" = "5",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"timezone" = "+08:00"
)
```

Details can be found in `alter-routine-load.md`
2020-08-06 23:11:02 +08:00
116d7ffa3c [SQL][Function] Add approx_count_distinct() function (#4221)
Add approx_count_distinct() function to replace the ndv() function
2020-08-01 17:54:19 +08:00
fdcc223ad2 [Bug][Json] Refactor the json load logic to fix some bug
1. Add `json_root` for nested json data.
2. Remove `_jmap` to make the logic reasonable.
2020-07-30 10:36:34 +08:00
237271c764 [Bug] Fix fe meta version problem, make drop meta check code easy to read and add doc content for drop meta check (#4205)
This PR mainly does three things:
1. Fix the FE meta version bug introduced by #4029 when resolving the conflict with #4086
2. Make drop check code easy to read
3. Add doc content for drop meta check
2020-07-30 09:54:20 +08:00
1b3af783e6 [Plugin] Add properties grammar in InstallPluginStmt (#4173)
This PR supports grammar like the following: INSTALL PLUGIN FROM [source] [PROPERTIES("KEY"="VALUE", ...)]
Users can set md5sum="xxxxxxx", so there is no need to provide an md5 URI.
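Checking a downloaded package against an md5sum supplied this way could look roughly like the sketch below. The helper name and the idea of passing the package as bytes are assumptions for illustration; this is not the FE's plugin loader.

```python
import hashlib


def verify_md5(payload: bytes, expected_md5: str) -> bool:
    """Return True if the MD5 hex digest of payload matches expected_md5."""
    actual = hashlib.md5(payload).hexdigest()
    return actual == expected_md5.strip().lower()


if __name__ == "__main__":
    package = b"hello"
    print(verify_md5(package, "5d41402abc4b2a76b9719d911017c592"))  # True
```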
2020-07-29 15:02:31 +08:00
d7893f0fa7 [Bug]Fix some schema change not work right (#4009)
This CL mainly fixes some schema changes to the varchar type that did not work correctly
because a logic check was forgotten. It also adds ConvertTypeResolver to register
the supported type conversions, so that the logic check cannot be forgotten again.
2020-07-11 10:18:29 +08:00
d2ab38a5e0 [Feature] Batch update partition's property in one command (#3981)
Support following command.
```
alter table tbl_name modify partition (p1, p2, p3) set ("replication_num" = "3");
```
2020-07-09 21:48:43 +08:00
b7051d0971 [Config]Make it easier for users to find configuration items needed (#3957)
This PR orders config items by key and supports the LIKE predicate in the ADMIN SHOW CONFIG statement.
2020-07-07 23:12:21 +08:00
c3d9feed75 [Load][Json] Refactor json load logic to make it more reasonable (#4020)
This CL mainly changes:

1. Reorganized the code logic to limit the supported json formats to two, which makes the import behavior more consistent.
2. Modified the statistical behavior of the number of error rows when loading in json format, so that the error rows can be counted correctly.
3. See `load-json-format.md` to get details of loading json format.
2020-07-07 23:07:28 +08:00
d396408861 Correct typos (#4024) 2020-07-07 13:33:46 +08:00
af1beb6ce4 [Enhance] Add prepare phase for some timestamp functions (#3947)
Fix: #3946 

CL:
1. Add a prepare phase for the `from_unixtime()`, `date_format()` and `convert_tz()` functions, so that the format string is handled only once.
2. Find the cctz timezone when initializing `runtime state`, so that the timezone does not need to be looked up for each row.
3. Add constant rewrite rule for `utc_timestamp()`
4. Add doc for `to_date()`
5. Comment out the `push_handler_test`; it cannot run in DEBUG mode and will be fixed later.
6. Remove `timezone_db.h/cpp` and add `timezone_utils.h/cpp`
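The benefit of a prepare phase (item 1 above) is that per-query work such as interpreting the format string is done once instead of once per row. A minimal sketch of that pattern follows; the class name, the two-entry specifier table, and the fixed UTC timezone are assumptions for the example, not the BE code.

```python
from datetime import datetime, timezone

# MySQL-style specifiers that differ from Python's strftime (partial, assumed table).
_MYSQL_TO_PYTHON = {"%i": "%M", "%s": "%S"}


class FromUnixtime:
    """Format Unix timestamps with a format string that is translated only once."""

    def __init__(self, mysql_format="%Y-%m-%d %H:%i:%s"):
        # Prepare phase: runs once per query, not once per row.
        python_format = mysql_format
        for mysql_spec, python_spec in _MYSQL_TO_PYTHON.items():
            python_format = python_format.replace(mysql_spec, python_spec)
        self._format = python_format

    def evaluate(self, unix_seconds):
        # Per-row phase: reuses the prepared format (UTC is an assumption here).
        moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
        return moment.strftime(self._format)


if __name__ == "__main__":
    fn = FromUnixtime()
    print([fn.evaluate(ts) for ts in (0, 86400)])
```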

The performance is shown below:

11,000,000 rows

SQL1: `select count(from_unixtime(k1)) from tbl1;`
Before: 8.85s
After: 2.85s

SQL2: `select count(from_unixtime(k1, '%Y-%m-%d %H:%i:%s')) from tbl1 limit 1;`
Before: 10.73s
After: 4.85s

Date string formatting still seems slow; we may need a further enhancement for it.
2020-06-29 19:15:09 +08:00
b2b9e22b24 [CreateTable] Check backend disk has available capacity by storage medium before create table (#3519)
Currently we choose BEs randomly without checking whether their disks are available,
so table creation does not fail until the create tablet task is sent to the BE
and the BE checks whether it has available capacity to create the tablet.
Checking backend disk availability by storage medium up front reduces unnecessary RPC calls.
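The kind of up-front filtering described here might be sketched as follows. All names (`Disk`, `Backend`, `backends_with_capacity`) and the byte-based threshold are hypothetical; the real check lives in the FE and uses its own metadata.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Disk:
    medium: str           # e.g. "HDD" or "SSD"
    available_bytes: int


@dataclass
class Backend:
    name: str
    disks: List[Disk]


def backends_with_capacity(backends, medium, min_free_bytes=1):
    """Keep only backends that have a disk of the given medium with free space."""
    return [
        backend
        for backend in backends
        if any(
            disk.medium == medium and disk.available_bytes >= min_free_bytes
            for disk in backend.disks
        )
    ]


if __name__ == "__main__":
    cluster = [
        Backend("be1", [Disk("HDD", 0)]),
        Backend("be2", [Disk("SSD", 10)]),
        Backend("be3", [Disk("HDD", 5), Disk("SSD", 0)]),
    ]
    print([b.name for b in backends_with_capacity(cluster, "HDD")])  # ['be3']
```

Choosing replicas only from the filtered list lets the statement fail fast when no backend qualifies, instead of after a round trip to a BE.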
2020-06-28 09:36:31 +08:00
feec4ee5bf [UDF] Support external users to contribute udf (#3760) 2020-06-23 13:43:08 +08:00
2f99f632e8 Modify docs format (#3896) 2020-06-18 09:43:28 +08:00
b3811f910f [Spark load][Fe 4/6] Add hive external table and update hive table syntax in loadstmt (#3819)
* Add hive external table and update hive table syntax in loadstmt

* Move check hive table from SelectStmt to FromClause and update doc

* Update hive external table en sql reference
2020-06-13 16:28:24 +08:00
wyb
44dbdf4986 Update hive external table en sql reference 2020-06-12 21:38:05 +08:00
wyb
7f7ee63723 Move check hive table from SelectStmt to FromClause and update doc 2020-06-11 16:53:41 +08:00
4adc9d45c2 [Doc] Update ALTER TABLE.md 2020-06-10 22:58:29 +08:00
wyb
4c2e73a5fe Add hive external table and update hive table syntax in loadstmt 2020-06-10 16:32:32 +08:00
a7bf006b51 Use BackendStatus to show BE's infomation in show backends; (#3713)
The information is displayed in JSON format. For example:
{"lastTabletReportTime":"2020-05-28 15:29:01"}
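A client that reads this column can decode it with any JSON parser. The sketch below parses the example value shown above; the function name is hypothetical, and treating the timestamp as a naive `YYYY-MM-DD HH:MM:SS` string is an assumption based on that single example.

```python
import json
from datetime import datetime


def parse_backend_status(raw):
    """Decode the JSON status string and convert the report time to a datetime."""
    status = json.loads(raw)
    status["lastTabletReportTime"] = datetime.strptime(
        status["lastTabletReportTime"], "%Y-%m-%d %H:%M:%S"
    )
    return status


if __name__ == "__main__":
    print(parse_backend_status('{"lastTabletReportTime":"2020-05-28 15:29:01"}'))
```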
2020-06-06 11:37:48 +08:00