Commit Graph

42 Commits

Author SHA1 Message Date
068707484d Support sequence column for UNIQUE_KEYS Table (#4256)
* add sequence  col

Co-authored-by: yangwenbo6 <yangwenbo3@jd.com>
2020-09-04 10:10:17 +08:00
5166a6c6bc [Bug] function str_to_date()'s behavior on BE and FE is inconsistent (#4495)
Main CL:
1. Copy the code from BE to implement the `str_to_date()` function in FE. 
2. `str_to_date("2020-08-08", "%Y-%m-%d %H:%i:%s")` will return `2020-08-08 00:00:00` instead of `2020-08-08`.
2020-09-03 17:16:19 +08:00
wyb
ffe696d17c [Doc] Add spark load sql statement doc and update manual (#4463)
1. add sql statement in dml
2. update spark load manual
2020-08-30 21:09:17 +08:00
174c9f89ea [DOCS] Add batch delete docs (#4435)
update documents for batch delete #4051
2020-08-28 09:24:07 +08:00
a5d1d010c0 [Doc] Fix typo about plugin content (#4416) 2020-08-26 10:48:07 +08:00
bfb39a2826 [SQL][Function] Add replace() function (#4347)
replace is an user defined function, which is to replace all old substrings with a new substring in a string, as follow:
mysql> select replace("http://www.baidu.com:9090", "9090", "");
+------------------------------------------------------+
| replace('http://www.baidu.com:9090', '9090', '') |
+------------------------------------------------------+
| http://www.baidu.com: |
+------------------------------------------------------+
2020-08-20 09:28:53 +08:00
26fe510011 [Doc] modify the document error (#4357) 2020-08-17 23:06:23 +08:00
1d9b3aeee7 [Doc] Repair document format (#4336)
The error format '##keyword' in a lot of docs. This pr is to repair document format. #4335
2020-08-13 23:39:41 +08:00
eefad13107 [Feature] Support InPredicate in delete statement (#4006)
This PR is to add inPredicate support to delete statement,
and add max_allowed_in_element_num_of_delete variable to
limit element num of InPredicate in delete statement.
2020-08-06 23:19:40 +08:00
5ba4b024e7 [Docs] Add Materialized view manual (#4229)
Add usage manual of materialized view in Chinese and English
2020-08-06 23:18:06 +08:00
237c0807a4 [RoutineLoad] Support modify routine load job (#4158)
Support ALTER ROUTINE LOAD JOB stmt, for example:

```
alter routine load db1.label1
properties
(
"desired_concurrent_number"="3",
"max_batch_interval" = "5",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"timezone" = "+08:00"
)
```

Details can be found in `alter-routine-load.md`
2020-08-06 23:11:02 +08:00
116d7ffa3c [SQL][Function] Add approx_count_distinct() function (#4221)
Add approx_count_distinct() function to replace the ndv() function
2020-08-01 17:54:19 +08:00
fdcc223ad2 [Bug][Json] Refactor the json load logic to fix some bug
1. Add `json_root` for nest json data.
2. Remove `_jmap` to make the logic reasonable.
2020-07-30 10:36:34 +08:00
237271c764 [Bug] Fix fe meta version problem, make drop meta check code easy to read and add doc content for drop meta check (#4205)
This PR is mainly do three things:
1. Fix fe meta version bug introduced by #4029 , when fix conflict with #4086 
2. Make drop check code easy to read
3. Add doc content for drop meta check
2020-07-30 09:54:20 +08:00
1b3af783e6 [Plugin] Add properties grammar in InstallPluginStmt (#4173)
This PR is to support grammar like the following: INSTALL PLUGIN FROM [source] [PROPERTIES("KEY"="VALUE", ...)]
user can set md5sum="xxxxxxx", so we don't need to provide a md5 uri.
2020-07-29 15:02:31 +08:00
d7893f0fa7 [Bug]Fix some schema change not work right (#4009)
[Bug]Fix some schema change not work right
This CL mainly fix some schema change to varchar type not work right
because forget to logic check && Add ConvertTypeResolver to add
supported convert type in order to avoid forget logic check
2020-07-11 10:18:29 +08:00
d2ab38a5e0 [Feature] Batch update partition's property in one command (#3981)
Support following command.
```
alter table tbl_name modify partition (p1, p2, p3) set ("replication_num" = "3");
```
2020-07-09 21:48:43 +08:00
b7051d0971 [Config]Make it easier for users to find configuration items needed (#3957)
This PR is to make config items ordered by key and support like predicate for admin show config stmt
2020-07-07 23:12:21 +08:00
c3d9feed75 [Load][Json] Refactor json load logic to make it more reasonable (#4020)
This CL mainly changes:

1. Reorganized the code logic to limit the supported json format to two, and the import behavior is more consistent.
2. Modified the statistical behavior of the number of error rows when loading in json format, so that the error rows can be counted correctly.
3. See `load-json-format.md` to get details of loading json format.
2020-07-07 23:07:28 +08:00
d396408861 Correct typos (#4024) 2020-07-07 13:33:46 +08:00
af1beb6ce4 [Enhance] Add prepare phase for some timestamp functions (#3947)
Fix: #3946 

CL:
1. Add prepare phase for `from_unixtime()`, `date_format()` and `convert_tz()` functions, to handle the format string once for all.
2. Find the cctz timezone when init `runtime state`, so that don't need to find timezone for each rows.
3. Add constant rewrite rule for `utc_timestamp()`
4. Add doc for `to_date()`
5. Comment out the `push_handler_test`, it can not run in DEBUG mode, will be fixed later.
6. Remove `timezone_db.h/cpp` and add `timezone_utils.h/cpp`

The performance shows bellow:

11,000,000 rows

SQL1: `select count(from_unixtime(k1)) from tbl1;`
Before: 8.85s
After: 2.85s

SQL2: `select count(from_unixtime(k1, '%Y-%m-%d %H:%i:%s')) from tbl1 limit 1;`
Before: 10.73s
After: 4.85s

The date string format seems still slow, we may need a further enhancement about it.
2020-06-29 19:15:09 +08:00
b2b9e22b24 [CreateTable] Check backend disk has available capacity by storage medium before create table (#3519)
Currently we choose BE random without check disk is available, 
the create table will failed until create tablet task is sent to BE
and BE will check is there has available capacity to create tablet.
So check backend disk available by storage medium will reduce unnecessary RPC call.
2020-06-28 09:36:31 +08:00
feec4ee5bf [UDF] Support external users to contribute udf (#3760) 2020-06-23 13:43:08 +08:00
2f99f632e8 Modify docs format (#3896) 2020-06-18 09:43:28 +08:00
b3811f910f [Spark load][Fe 4/6] Add hive external table and update hive table syntax in loadstmt (#3819)
* Add hive external table and update hive table syntax in loadstmt

* Move check hive table from SelectStmt to FromClause and update doc

* Update hive external table en sql reference
2020-06-13 16:28:24 +08:00
wyb
44dbdf4986 Update hive external table en sql reference 2020-06-12 21:38:05 +08:00
wyb
7f7ee63723 Move check hive table from SelectStmt to FromClause and update doc 2020-06-11 16:53:41 +08:00
4adc9d45c2 [Doc] Update ALTER TABLE.md 2020-06-10 22:58:29 +08:00
wyb
4c2e73a5fe Add hive external table and update hive table syntax in loadstmt 2020-06-10 16:32:32 +08:00
a7bf006b51 Use BackendStatus to show BE's infomation in show backends; (#3713)
The infomation is displayed in JSON format.For example:
{"lastTabletReportTime":"2020-05-28 15:29:01"}
2020-06-06 11:37:48 +08:00
wyb
fdf3415d06 [Website] Fix CREATE RESOURCE sidebar text and link not right bug (#3777) 2020-06-05 09:20:36 +08:00
wyb
4978bd6c81 [Spark load] Add resource manager (#3418)
1. User interface:

1.1 Spark resource management

Spark is used as an external computing resource in Doris to do ETL work. In the future, there may be other external resources that will be used in Doris, for example, MapReduce is used for ETL, Spark/GPU is used for queries, HDFS/S3  is used for external storage. We introduced resource management to manage these external resources used by Doris.

```sql
-- create spark resource
CREATE EXTERNAL RESOURCE resource_name
PROPERTIES 
(                 
  type = spark,
  spark_conf_key = spark_conf_value,
  working_dir = path,
  broker = broker_name,
  broker.property_key = property_value
)

-- drop spark resource
DROP RESOURCE resource_name

-- show resources
SHOW RESOURCES
SHOW PROC "/resources"

-- privileges
GRANT USAGE_PRIV ON RESOURCE resource_name TO user_identity
GRANT USAGE_PRIV ON RESOURCE resource_name TO ROLE role_name

REVOKE USAGE_PRIV ON RESOURCE resource_name FROM user_identity
REVOKE USAGE_PRIV ON RESOURCE resource_name FROM ROLE role_name
```



- CREATE EXTERNAL RESOURCE:

FOR user_name is optional. If there has, the external resource belongs to this user. If not, the external resource belongs to the system and all users are available.

PROPERTIES:
1. type: resource type. Only support spark now.
2. spark configuration: follow the standard writing of Spark configurations, refer to: https://spark.apache.org/docs/latest/configuration.html.
3. working_dir: optional, used to store ETL intermediate results in spark ETL.
4. broker: optional, used in spark ETL. The ETL intermediate results need to be read with the broker when pushed into BE.

Example: 

```sql
CREATE EXTERNAL RESOURCE "spark0"
PROPERTIES 
(                                                                             
  "type" = "spark",                   
  "spark.master" = "yarn",
  "spark.submit.deployMode" = "cluster",
  "spark.jars" = "xxx.jar,yyy.jar",
  "spark.files" = "/tmp/aaa,/tmp/bbb",
  "spark.yarn.queue" = "queue0",
  "spark.executor.memory" = "1g",
  "spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
  "spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
  "working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
  "broker" = "broker0",
  "broker.username" = "user0",
  "broker.password" = "password0"
)
```



- SHOW RESOURCES:
General users can only see their own resources.
Admin and root users can show all resources.




1.2 Create spark load job

```sql
LOAD LABEL db_name.label_name 
(
  DATA INFILE ("/tmp/file1") INTO TABLE table_name, ...
)
WITH RESOURCE resource_name
[(key1 = value1, ...)]
[PROPERTIES (key2 = value2, ... )]
```

Example:

```sql
LOAD LABEL example_db.test_label 
(
  DATA INFILE ("hdfs:/127.0.0.1:10000/tmp/file1") INTO TABLE example_table
)
WITH RESOURCE "spark0"
(
  "spark.executor.memory" = "1g",
  "spark.files" = "/tmp/aaa,/tmp/bbb"
)
PROPERTIES ("timeout" = "3600")
```

The spark configurations in load stmt can override the existing configuration in the resource for temporary use.

#3010
2020-05-26 18:21:21 +08:00
77b9acc242 [Stmt] Add rowCount column to SHOW DATA stmt (#3676)
User can see the row count of all materialized indexes of a table.

```
mysql> show data from test;
+-----------+-----------+-----------+--------------+----------+
| TableName | IndexName | Size      | ReplicaCount | RowCount |
+-----------+-----------+-----------+--------------+----------+
| test2     | r1        | 10.000MB  | 30           | 10000    |
|           | r2        | 20.000MB  | 30           | 20000    |
|           | test2     | 50.000MB  | 30           | 50000    |
|           | Total     | 80.000    | 90           |          |
+-----------+-----------+-----------+--------------+----------+
```

Fix #3675
2020-05-26 15:53:38 +08:00
ba7d2dbf7b [Function] Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad (#3638)
Support utf-8 encoding for string function `instr`, `locate`, `locate_pos`, `lpad`, `rpad`
and add unit test for them
2020-05-22 14:34:26 +08:00
ef8fd1fcbe [Load] Support load json-data into Doris by RoutineLoad or StreamLoad (#3553)
Doris support load json-data by RoutineLoad or StreamLoad
2020-05-21 13:00:49 +08:00
0d66e6bd15 Support bitmap_intersect (#3571)
* Support bitmap_intersect

Support aggregate function Bitmap Intersect, it is mainly used to take intersection of grouped data.
The function 'bitmap_intersect(expr)' calculates the intersection of bitmap columns and returns a bitmap object.
The defination is following:
FunctionName: bitmap_intersect,
InputType: bitmap,
OutputType: bitmap

The scenario is as follows:
Query which users satisfy the three tags a, b, and c at the same time.

```
select bitmap_to_string(bitmap_intersect(user_id)) from
(
    select bitmap_union(user_id) user_id from bitmap_intersect_test
    where tag in ('a', 'b', 'c')
    group by tag
) a
```
Closed #3552.

* Add docs of bitmap_union and bitmap_intersect

* Support null of bitmap_intersect
2020-05-20 21:12:02 +08:00
4464328d8f [Doc] Add doc link to char_length (#3548) 2020-05-14 21:21:31 +08:00
488aa22938 [Doc] Update plugin document (#3447) (#3505) 2020-05-09 19:19:38 +08:00
a656a7ddd4 Support append_trailing_char_if_absent function (#3439) 2020-05-09 08:59:34 +08:00
94b3a2bd50 [Bug] Fix string functions not support multibyte string (#3345)
Let string functions support utf8 encoding
2020-05-08 12:52:46 +08:00
f591976976 [Doc] Fix the incorrect docs (#3501) 2020-05-08 12:47:00 +08:00
432965e360 [Enhancement] documents rebuild with Vuepress (#3408) (#3414) 2020-04-29 09:14:31 +08:00