Commit Graph

62 Commits

Author SHA1 Message Date
e93a6da0e5 [Doc] correct format errors in English doc (#5321)
Fix some English doc format errors
2021-02-26 11:32:14 +08:00
6ede4c6ec1 [Feature] Support backup,restore,load,export directly connect to s3 (#5399)
* [doris-1008] support backup and restore directly to cloud storage via aws s3 protocol

* [Internal][S3DirectAccess] Support backup, restore, load, export directly connect to s3
1. Support load and export data from/to s3 directly.
2. Add a config to auto convert broker access to s3 access when available

Change-Id: Iac96d4b3670776708bc96a119ff491db8cb4cde7

(cherry picked from commit 2f03832ca52221cc7436069b96c45c48c4bc7201)

* [Internal][S3DirectAccess] File path glob compatible with broker

Change-Id: Ie55e07a547aa22c6fa8d432ca926216c10384e68
(cherry picked from commit d4fb25544c0dc06d23e1ada571ec3f8edd4ba56f)

* [internal] [doris-1008] fix log4j class not found

Change-Id: I468176aca0d821383c74ee658d461aba9e7d5be3
(cherry picked from commit 029adaa9d6ded8503acbd6644c1519456f3db232)

* add poms

Co-authored-by: yangzhengguo01 <yangzhengguo01@baidu.com>
2021-02-22 16:07:56 +08:00
b098261253 docs(Doc): correct wrong num in create table help doc (#5365)
Co-authored-by: liuyuan <liuyuan.a@miaozhen.com>
2021-02-20 10:07:48 +08:00
a1808c1a71 [Function] Add BE udf bitmap_not (#5346) (#5357)
This function returns the NOT result of the two input bitmaps.
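The commit message does not spell out what "not" means for two bitmaps. A minimal sketch, assuming it is the set difference (elements of the first bitmap that are absent from the second) and modelling a bitmap as a Python set; the function name mirrors the UDF, but the body is only an illustration, not the BE implementation:

```python
def bitmap_not(lhs, rhs):
    """Assumed semantics: keep the elements of lhs that are not in rhs."""
    return set(lhs) - set(rhs)


# Bitmaps are modelled as plain sets of integers for illustration.
difference = bitmap_not({1, 2, 3}, {2, 4})
```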
2021-02-07 22:39:17 +08:00
780900ac9c [Feature] Support preceding filter original data when loading (#5338)
Support conditional filtering of original data in broker load and routine load
eg:

```
LOAD LABEL `label1`
(
DATA INFILE ('bos://cmy-repo/1.csv')
INTO TABLE tbl2
COLUMNS TERMINATED BY '\t'
(event_day, product_id, ocpc_stage, user_id)
SET (
	ocpc_stage = ocpc_stage + 100
)
PRECEDING FILTER user_id = 1381035
WHERE ocpc_stage > 30
)
...
```
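The ordering implied by the statement above can be sketched as follows. This is an illustration only, assuming that PRECEDING FILTER runs on the original rows before the SET mapping and that WHERE runs on the mapped rows; `load_rows` and its parameters are hypothetical names, not Doris APIs:

```python
def load_rows(raw_rows, preceding_filter, column_mapping, where):
    """Assumed order: PRECEDING FILTER -> SET mapping -> WHERE."""
    loaded = []
    for row in raw_rows:
        # 1. Filter the original data before any column transformation.
        if not preceding_filter(row):
            continue
        # 2. Apply the SET (...) expressions to a copy of the row.
        mapped = column_mapping(dict(row))
        # 3. Filter the transformed row.
        if where(mapped):
            loaded.append(mapped)
    return loaded


raw = [
    {"user_id": 1381035, "ocpc_stage": -50},
    {"user_id": 1381035, "ocpc_stage": -80},
    {"user_id": 42, "ocpc_stage": 10},
]
result = load_rows(
    raw,
    preceding_filter=lambda r: r["user_id"] == 1381035,
    column_mapping=lambda r: {**r, "ocpc_stage": r["ocpc_stage"] + 100},
    where=lambda r: r["ocpc_stage"] > 30,
)
```

Only the first row survives: the third is dropped by the preceding filter, and the second fails the WHERE check after mapping (20 is not greater than 30).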
2021-02-07 22:37:48 +08:00
b315244ba7 [Doc] Fix the error description for the number of bytes of double type. (#5273)
Modify the error description of double type: 12 bytes is modified to 8 bytes
2021-02-01 00:11:14 +08:00
de57667d6d [Delete] Support delete with multi partitions (#5252)
Support delete statement like:
1. delete from table partitions(p1, p2) where xxx;  // apply to p1, p2
2. delete from table where xxx;     // apply to all partitions

Also remove code about the deprecated sync/async delete job.

This CL changes FE meta version to 94
2021-01-30 20:33:34 +08:00
ca10205137 [Function] Support show create function statement (#5197)
* [Function]Support show create function stmt

Co-authored-by: caiconghui [蔡聪辉] <caiconghui@xiaomi.com>
2021-01-28 10:52:37 +08:00
83b7a23d5c fix alter routine load not work (#5257) 2021-01-20 10:52:02 +08:00
05ac7fcd4a [Function] Add BE udf bitmap_xor (#5098)
This function returns the XOR result of the two input bitmaps.
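As a rough illustration, XOR of two bitmaps keeps the elements that appear in exactly one of them. The sketch below models a bitmap as a Python set; it mirrors the UDF name but is not the BE implementation:

```python
def bitmap_xor(lhs, rhs):
    """Symmetric difference: elements present in exactly one input."""
    return set(lhs) ^ set(rhs)


xor_result = bitmap_xor({1, 2, 3}, {2, 4})
```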
2021-01-04 09:27:46 +08:00
279ae1cb75 Add fuzzy_parse option to speed up json import (#5114)
Add a fuzzy_parse flag. If all objects in the json file have the same keys in the same order, only the keys of the first row need to be parsed; values in later rows are then read by index instead of by key.
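The idea can be sketched in Python as below. This is only an illustration of the positional mapping, assuming flat (non-nested) rows, one JSON object per line; `parse_rows_fuzzy` is a hypothetical name, and Python's own parser still reads the keys internally, so the sketch shows the logic rather than the speed-up:

```python
import json


def parse_rows_fuzzy(lines):
    """Learn the key order from the first row, then map later rows by index."""
    it = iter(lines)
    first = json.loads(next(it))
    keys = list(first.keys())  # key order taken from the first row only
    rows = [first]

    def values_only(pairs):
        # Ignore the keys of later rows and keep values in file order.
        return [value for _, value in pairs]

    for line in it:
        values = json.loads(line, object_pairs_hook=values_only)
        rows.append(dict(zip(keys, values)))
    return rows


rows = parse_rows_fuzzy(['{"id": 1, "name": "a"}', '{"id": 2, "name": "b"}'])
```

If a later row had its keys in a different order, the values would be assigned to the wrong columns, which is why the option is only safe when every object shares the same key order.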
2020-12-25 09:19:42 +08:00
74bfd69595 [Bug] Forbidden creating table with dynamic partition when FE.config dynamic_partition_enable=false (#5043)
- There is an FE configuration called dynamic_partition_enable
    which switches the dynamic partition function on and off.
  When this configuration is false, no table supports dynamic partitioning.

- But when a user tried to create a dynamic partition table, Doris did not check this parameter.
  As a result, the user could create a dynamic partition table normally,
    but Doris could not actually create partitions for this table.

- This PR checks this config when building the table.
  The dynamic partition table can be created only when the dynamic_partition_enable configuration is true.
  If the configuration is false, the command to create a dynamic partition table will directly report an error.
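The check described above might look roughly like this. It is a sketch, not the FE code: `check_create_table`, `DdlError`, and the table property key `dynamic_partition.enable` are assumptions made for illustration; only the config name `dynamic_partition_enable` comes from the commit message:

```python
class DdlError(Exception):
    """Hypothetical error type raised when a DDL statement is rejected."""


def check_create_table(table_properties, fe_config):
    # Assumed property key that marks a table as dynamically partitioned.
    wants_dynamic = (
        table_properties.get("dynamic_partition.enable", "false").lower() == "true"
    )
    if wants_dynamic and not fe_config.get("dynamic_partition_enable", False):
        raise DdlError(
            "dynamic partition is disabled: set dynamic_partition_enable=true first"
        )
    return True
```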
2020-12-16 23:44:20 +08:00
650536d53e [Feature] Add Topn udaf (#4803)
For #4674 
This is a udaf for approximate topn using the Space-Saving algorithm. At present, we can only calculate
the frequent items and their frequencies in a certain column, based on which we can implement topN
functions similar to those supported by Kylin in the future.

I have also added a test to calculate the accuracy of this algorithm. The following is a rough running result.
The total amount of data is 1 million lines and follows the Zipfian distribution. Element cardinality
is the data cardinality; the columns 20X, 50X and 100X are the values of space_expand_rate (20, 50, 100),
which is used to set the number of counters in the Space-Saving algorithm.

```
zf exponent = 0.5
Element cardinality     20X     50X     100X
1000                    100%    100%    100%
10000                   100%    100%    100%
100000                  100%    100%    100%
500000                   94%     98%     99%

zf exponent = 0.6,1
Element cardinality     20X     50X     100X
1000                    100%    100%    100%
10000                   100%    100%    100%
100000                  100%    100%    100%
500000                  100%    100%    100%

```
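For readers unfamiliar with Space-Saving, a minimal Python sketch follows. It is not the udaf's code: `space_saving` and `top_n` are hypothetical names, and sizing the counter table as `n * space_expand_rate` is an assumption about how space_expand_rate is used:

```python
def space_saving(stream, num_counters):
    """Track approximate frequencies with at most num_counters counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < num_counters:
            counters[item] = 1
        else:
            # Evict the item with the smallest count; the newcomer inherits
            # that count plus one, so counts may overestimate but never
            # underestimate the true frequency.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


def top_n(stream, n, space_expand_rate=50):
    # Assumption: the counter table holds n * space_expand_rate entries.
    counters = space_saving(stream, n * space_expand_rate)
    return sorted(counters.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


top_two = top_n(["a"] * 5 + ["b"] * 3 + ["c"], 2, space_expand_rate=2)
```

A larger space_expand_rate means more counters and fewer evictions, which matches the accuracy trend in the table above.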
2020-12-16 21:58:34 +08:00
bc063ebce2 fix typo in docs (#5046) 2020-12-10 15:10:22 +08:00
55ce88da34 [Schema change] Support More column type in schema change (#4938)
1. Support modifying column type from CHAR to TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE/DATE,
and converting TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE to a wider range of numeric types (#4937)

2. Use templates to refactor the code of types.h and schema_change.cpp and delete redundant code.
2020-11-28 09:52:28 +08:00
d6497fedc4 [Config] Change config name 'streaming_load_max_batch_size_mb' to 'streaming_load_json_max_mb' (#4791)
The name is very close to another config name, and the two are hard to tell apart,
so this PR changes the name.
The document description has also been updated.
2020-10-28 23:27:33 +08:00
a95ce69c0d [Doc] Bug fix that help command not work (#4760)
There are 2 docs with the same name "bitmap", which causes an error
when building the help system.
2020-10-20 09:47:51 +08:00
a605b3160f [Docs] update data types doc and fix some typo (#4712)
* update data types doc and fix some typo

Co-authored-by: lixueyan07 <lixueyan07@meituan.com>
2020-10-14 09:34:58 +08:00
751aa05cc0 fix docs typo (#4725) 2020-10-14 09:27:50 +08:00
dec91a3d43 fix docs typo (#4723) 2020-10-14 09:27:31 +08:00
3f55c1425c fix docs typo (#4722) 2020-10-14 09:27:12 +08:00
28f4e922a7 [CREATE TABLE]Support new syntax CREATE TABLE LIKE to clone an existe… (#4705)
Support new syntax CREATE TABLE [IF NOT EXISTS] [db_name].table_name LIKE [db_name2].table_name2;
to create a new table from an existing table with the same table schema.
ISSUE: #4355
2020-10-10 21:16:53 +08:00
f3cdf167d1 [Feature] Add time_round builtin functions (#4640)
#4619 
Add time_round functions that provide `time_floor` & `time_ceil` at each time unit.

Fix two related bugs.
- #4618 
- Fix `struct TimeInterval` to use `int64_t` instead of `int32_t`, in case the difference in seconds overflows
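A rough sketch of what flooring and ceiling a timestamp to a period means. The function names mirror the builtins, but the signatures and the default `origin` are assumptions for illustration, not the Doris definitions:

```python
from datetime import datetime, timedelta

# Hypothetical default origin; the builtin's actual default is not stated here.
DEFAULT_ORIGIN = datetime(1970, 1, 1)


def time_floor(ts, period, origin=DEFAULT_ORIGIN):
    """Round ts down to the nearest multiple of period counted from origin."""
    steps = (ts - origin) // period  # timedelta // timedelta yields an int
    return origin + steps * period


def time_ceil(ts, period, origin=DEFAULT_ORIGIN):
    """Round ts up to the nearest multiple of period counted from origin."""
    floored = time_floor(ts, period, origin)
    return floored if floored == ts else floored + period


ts = datetime(2020, 10, 9, 16, 5, 51)
floored = time_floor(ts, timedelta(minutes=5))
ceiled = time_ceil(ts, timedelta(minutes=5))
```

The `int64_t` fix matters because a signed 32-bit count of seconds covers only about 68 years, so the distance between a timestamp and a far-away origin can exceed it.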
2020-10-09 16:05:51 +08:00
0475aa9b93 [Bug]Fix delete on clause may not work in routineLoad (#4683)
Fix the `delete on` clause not working in some cases, as described in #4682
2020-09-30 09:56:19 +08:00
4e3b576fd3 [NewFeature] Support ExternalCatalogResource to simplify external table manage operation. (#4559)
1. Add new Resource ExternalCatalogResource

```
create external resource "odbc"
properties
(
   "type" = "external_catalog", (required)
   "user" = "test", (required)
   "password" = "", (required)
   "host" = "192.168.0.1", (required)
   "port" = "8086", (required)
   "type" = "oracle", (optional, only used by odbc external tables)
   "driver" = "Oracle 19 ODBC driver" (optional, only used by odbc external tables)
)
```

2. After creating the ExternalCatalogResource, an external table can be created like:

```
CREATE TABLE `test_mysql` (
  `k1` tinyint(4) NOT NULL,
  `k2` smallint(6) NOT NULL,
  `k3` int(11) NOT NULL,
  `k4` bigint(20) NOT NULL,
  `k5` decimal(9,3) NOT NULL,
  `k6` char(5) NOT NULL,
  `k10` date DEFAULT NULL,
  `k11` datetime DEFAULT NULL,
  `k7` varchar(20) NOT NULL,
  `k8` double NOT NULL,
  `k9` float NOT NULL
) ENGINE=MYSQL
PROPERTIES (
"external_catalog_resource" = "odbc",
"database" = "test",
"table" = "test"
);
```
2020-09-25 10:20:33 +08:00
fd37c4f352 [Document] Fix some typo in alter table document
Fix some typos in the document that confuse users.
2020-09-22 16:23:23 +08:00
a1f52ec2ab [SQL] Support where, limit, order clause in show resourcestmt. (#4502)
* [SQL] Support where, limit, order clause in show resourcestmt.

Grammar

    SHOW RESOURCES
    [
        WHERE
        [NAME [ = "your_resource_name" | LIKE "name_matcher"]]
        [RESOURCETYPE = ["SPARK"]]
    ]
    [ORDER BY ...]
    [LIMIT limit][OFFSET offset];

issue #4501
2020-09-16 17:57:48 +08:00
95111f9228 [Feature] Support alter table syntax for sequence column (#4582)
* enable sequence col

Co-authored-by: yangwenbo6 <yangwenbo3@jd.com>
2020-09-15 10:19:38 +08:00
5166a6c6bc [Bug] function str_to_date()'s behavior on BE and FE is inconsistent (#4495)
Main CL:
1. Copy the code from BE to implement the `str_to_date()` function in FE. 
2. `str_to_date("2020-08-08", "%Y-%m-%d %H:%i:%s")` will return `2020-08-08 00:00:00` instead of `2020-08-08`.
2020-09-03 17:16:19 +08:00
0db9194dc0 [Doc] Fix wrong doc name (#4477)
Co-authored-by: morningman <chenmingyu@baidu.com>
2020-08-28 11:56:59 +08:00
174c9f89ea [DOCS] Add batch delete docs (#4435)
update documents for batch delete #4051
2020-08-28 09:24:07 +08:00
a5d1d010c0 [Doc] Fix typo about plugin content (#4416) 2020-08-26 10:48:07 +08:00
bfb39a2826 [SQL][Function] Add replace() function (#4347)
replace is a user-defined function that replaces all occurrences of an old substring in a string with a new substring, as follows:
mysql> select replace("http://www.baidu.com:9090", "9090", "");
+--------------------------------------------------+
| replace('http://www.baidu.com:9090', '9090', '') |
+--------------------------------------------------+
| http://www.baidu.com:                            |
+--------------------------------------------------+
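The replace-all behaviour shown above can be sketched in Python. `replace_all` is a hypothetical helper, and leaving the string unchanged for an empty pattern is an assumption, not documented behaviour of the Doris function:

```python
def replace_all(text, old, new):
    """Replace every non-overlapping occurrence of old in text with new."""
    if not old:
        return text  # assumption: an empty pattern leaves the string unchanged
    parts = []
    start = 0
    while True:
        idx = text.find(old, start)
        if idx == -1:
            parts.append(text[start:])  # no more matches: keep the tail
            break
        parts.append(text[start:idx])
        parts.append(new)
        start = idx + len(old)  # continue scanning after the match
    return "".join(parts)


replaced = replace_all("http://www.baidu.com:9090", "9090", "")
```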
2020-08-20 09:28:53 +08:00
26fe510011 [Doc] modify the document error (#4357) 2020-08-17 23:06:23 +08:00
1d9b3aeee7 [Doc] Repair document format (#4336)
The erroneous format '##keyword' appears in a lot of docs. This PR repairs the document format. #4335
2020-08-13 23:39:41 +08:00
eefad13107 [Feature] Support InPredicate in delete statement (#4006)
This PR adds InPredicate support to the delete statement,
and adds the max_allowed_in_element_num_of_delete variable to
limit the number of elements of an InPredicate in a delete statement.
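A minimal sketch of such a limit check. `check_in_predicate` is a hypothetical helper; only the variable name comes from the commit message, and the limit values used are arbitrary examples rather than Doris defaults:

```python
def check_in_predicate(values, max_allowed_in_element_num_of_delete):
    """Reject an IN list that has more elements than the configured limit."""
    values = list(values)
    if len(values) > max_allowed_in_element_num_of_delete:
        raise ValueError(
            f"IN predicate has {len(values)} elements, "
            f"limit is {max_allowed_in_element_num_of_delete}"
        )
    return values
```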
2020-08-06 23:19:40 +08:00
237c0807a4 [RoutineLoad] Support modify routine load job (#4158)
Support ALTER ROUTINE LOAD JOB stmt, for example:

```
alter routine load db1.label1
properties
(
"desired_concurrent_number"="3",
"max_batch_interval" = "5",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"timezone" = "+08:00"
)
```

Details can be found in `alter-routine-load.md`
2020-08-06 23:11:02 +08:00
116d7ffa3c [SQL][Function] Add approx_count_distinct() function (#4221)
Add approx_count_distinct() function to replace the ndv() function
2020-08-01 17:54:19 +08:00
fdcc223ad2 [Bug][Json] Refactor the json load logic to fix some bug
1. Add `json_root` for nested json data.
2. Remove `_jmap` to make the logic reasonable.
2020-07-30 10:36:34 +08:00
237271c764 [Bug] Fix fe meta version problem, make drop meta check code easy to read and add doc content for drop meta check (#4205)
This PR mainly does three things:
1. Fix the fe meta version bug introduced by #4029 while resolving conflicts with #4086
2. Make the drop check code easier to read
3. Add doc content for the drop meta check
2020-07-30 09:54:20 +08:00
1b3af783e6 [Plugin] Add properties grammar in InstallPluginStmt (#4173)
This PR supports grammar like the following: INSTALL PLUGIN FROM [source] [PROPERTIES("KEY"="VALUE", ...)]
Users can set md5sum="xxxxxxx", so a md5 uri no longer needs to be provided.
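Checking a downloaded plugin against an md5sum property could look roughly like this. The sketch uses Python's standard `hashlib`; `verify_md5sum` is a hypothetical helper and not part of the plugin installer:

```python
import hashlib


def verify_md5sum(payload: bytes, expected_md5sum: str) -> bool:
    """Compare the MD5 hex digest of payload with the value from PROPERTIES."""
    actual = hashlib.md5(payload).hexdigest()
    return actual == expected_md5sum.strip().lower()


ok = verify_md5sum(b"hello", "5d41402abc4b2a76b9719d911017c592")
```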
2020-07-29 15:02:31 +08:00
d7893f0fa7 [Bug]Fix some schema change not work right (#4009)
This CL mainly fixes some schema changes to varchar type that did not work correctly
because a logic check was forgotten. It also adds ConvertTypeResolver to register the
supported conversion types, so that the logic check cannot be forgotten again.
2020-07-11 10:18:29 +08:00
d2ab38a5e0 [Feature] Batch update partition's property in one command (#3981)
Support the following command:
```
alter table tbl_name modify partition (p1, p2, p3) set ("replication_num" = "3");
```
2020-07-09 21:48:43 +08:00
b7051d0971 [Config]Make it easier for users to find configuration items needed (#3957)
This PR orders config items by key and supports the LIKE predicate in the admin show config stmt
2020-07-07 23:12:21 +08:00
c3d9feed75 [Load][Json] Refactor json load logic to make it more reasonable (#4020)
This CL mainly changes:

1. Reorganized the code logic to limit the supported json format to two, and the import behavior is more consistent.
2. Modified the statistical behavior of the number of error rows when loading in json format, so that the error rows can be counted correctly.
3. See `load-json-format.md` to get details of loading json format.
2020-07-07 23:07:28 +08:00
af1beb6ce4 [Enhance] Add prepare phase for some timestamp functions (#3947)
Fix: #3946 

CL:
1. Add prepare phase for `from_unixtime()`, `date_format()` and `convert_tz()` functions, to handle the format string once for all.
2. Find the cctz timezone when initializing `runtime state`, so that the timezone does not need to be looked up for each row.
3. Add constant rewrite rule for `utc_timestamp()`
4. Add doc for `to_date()`
5. Comment out the `push_handler_test`; it cannot run in DEBUG mode and will be fixed later.
6. Remove `timezone_db.h/cpp` and add `timezone_utils.h/cpp`

The performance is shown below:

11,000,000 rows

SQL1: `select count(from_unixtime(k1)) from tbl1;`
Before: 8.85s
After: 2.85s

SQL2: `select count(from_unixtime(k1, '%Y-%m-%d %H:%i:%s')) from tbl1 limit 1;`
Before: 10.73s
After: 4.85s

Date string formatting still seems slow; we may need a further enhancement for it.
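The prepare-phase idea from items 1 and 2 above (handle the format string and the timezone once, not per row) can be sketched as follows. This is an illustration in Python, not the BE code: `prepare_format` and `from_unixtime_all` are hypothetical names, the specifier table covers only `%i` and `%s`, and UTC is assumed as the session timezone:

```python
from datetime import datetime, timezone

# Partial, illustrative mapping from MySQL-style specifiers to strftime ones.
_MYSQL_TO_STRFTIME = {"%i": "%M", "%s": "%S"}


def prepare_format(mysql_format):
    """Translate the format string once, before any row is processed."""
    fmt = mysql_format
    for mysql_spec, py_spec in _MYSQL_TO_STRFTIME.items():
        fmt = fmt.replace(mysql_spec, py_spec)
    return fmt


def from_unixtime_all(timestamps, mysql_format="%Y-%m-%d %H:%i:%s"):
    fmt = prepare_format(mysql_format)  # prepare phase: done once
    tz = timezone.utc                   # timezone resolved once (assumed UTC)
    return [datetime.fromtimestamp(ts, tz).strftime(fmt) for ts in timestamps]


formatted = from_unixtime_all([0, 86400])
```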
2020-06-29 19:15:09 +08:00
b2b9e22b24 [CreateTable] Check backend disk has available capacity by storage medium before create table (#3519)
Currently we choose BEs randomly without checking whether a disk is available,
so create table does not fail until the create tablet task is sent to the BE
and the BE checks whether it has available capacity to create the tablet.
Checking backend disk availability by storage medium up front reduces unnecessary RPC calls.
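A sketch of the up-front check, under the assumption that the FE knows each backend's disks along with their storage medium and free space. The `Disk` and `Backend` types and `backends_with_capacity` are hypothetical, introduced only for illustration:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Disk:
    medium: str            # e.g. "SSD" or "HDD"
    available_bytes: int


@dataclass
class Backend:
    name: str
    disks: List[Disk] = field(default_factory=list)


def backends_with_capacity(backends, medium, min_available_bytes=1):
    """Keep only backends that have a disk of the given medium with free space."""
    return [
        be
        for be in backends
        if any(
            d.medium == medium and d.available_bytes >= min_available_bytes
            for d in be.disks
        )
    ]


cluster = [
    Backend("be1", [Disk("SSD", 0), Disk("HDD", 10)]),
    Backend("be2", [Disk("SSD", 5)]),
]
ssd_candidates = [be.name for be in backends_with_capacity(cluster, "SSD")]
```

Filtering candidates before sending any create tablet task is what saves the RPC round trip to a backend that would reject the request anyway.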
2020-06-28 09:36:31 +08:00
b3811f910f [Spark load][Fe 4/6] Add hive external table and update hive table syntax in loadstmt (#3819)
* Add hive external table and update hive table syntax in loadstmt

* Move check hive table from SelectStmt to FromClause and update doc

* Update hive external table en sql reference
2020-06-13 16:28:24 +08:00
wyb
44dbdf4986 Update hive external table en sql reference 2020-06-12 21:38:05 +08:00
4adc9d45c2 [Doc] Update ALTER TABLE.md 2020-06-10 22:58:29 +08:00