Commit Graph

368 Commits

Author SHA1 Message Date
c6f2b5ef0d [Doris On ES][Docs] refactor documentation for doe (#3867) 2020-06-17 10:54:28 +08:00
6c4d7c60dd [Feature] Add QueryDetail to store query statistics. (#3744)
1. Store the query statistics in memory.
2. Support a RESTful interface to get the statistics.
2020-06-15 18:16:54 +08:00
2211cb0ee0 [Metrics] Add metrics document and 2 new metrics of TCP (#3835) 2020-06-15 09:48:09 +08:00
3c09e1e1d8 [trace] Adapt trace util to compaction module (#3814)
The trace util is helpful for diagnosing compaction performance problems.
We can get a trace log for base compaction like:
```
W0610 11:26:33.804431 56452 storage_engine.cpp:552] Trace:
0610 11:23:03.727535 (+     0us) storage_engine.cpp:554] start to perform base compaction
0610 11:23:03.728961 (+  1426us) storage_engine.cpp:560] found best tablet 546859
0610 11:23:03.728963 (+     2us) base_compaction.cpp:40] got base compaction lock
0610 11:23:03.729029 (+    66us) base_compaction.cpp:44] rowsets picked
0610 11:24:51.784439 (+108055410us) compaction.cpp:46] got concurrency lock and start to do compaction
0610 11:24:51.784818 (+   379us) compaction.cpp:74] prepare finished
0610 11:26:33.359265 (+101574447us) compaction.cpp:87] merge rowsets finished
0610 11:26:33.484481 (+125216us) compaction.cpp:102] output rowset built
0610 11:26:33.484482 (+     1us) compaction.cpp:106] check correctness finished
0610 11:26:33.513197 (+ 28715us) compaction.cpp:110] modify rowsets finished
0610 11:26:33.513300 (+   103us) base_compaction.cpp:49] compaction finished
0610 11:26:33.513441 (+   141us) base_compaction.cpp:56] unused rowsets have been moved to GC queue
Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"input_rowsets_data_size":1256413170,"input_segments_num":44,"merge_rowsets_latency_us":101574444,"merged_rows":0,"output_row_num":3346807,"output_rowset_data_size":1228439659,"output_segments_num":6}
```
and a trace log for cumulative compaction like:
```
W0610 11:14:18.714366 56468 storage_engine.cpp:518] Trace:
0610 11:14:08.068484 (+     0us) storage_engine.cpp:520] start to perform cumulative compaction
0610 11:14:08.069844 (+  1360us) storage_engine.cpp:526] found best tablet 547083
0610 11:14:08.069846 (+     2us) cumulative_compaction.cpp:42] got cumulative compaction lock
0610 11:14:08.069947 (+   101us) cumulative_compaction.cpp:46] calculated cumulative point
0610 11:14:08.070141 (+   194us) cumulative_compaction.cpp:50] rowsets picked
0610 11:14:08.070143 (+     2us) compaction.cpp:46] got concurrency lock and start to do compaction
0610 11:14:08.070518 (+   375us) compaction.cpp:74] prepare finished
0610 11:14:15.389893 (+7319375us) compaction.cpp:87] merge rowsets finished
0610 11:14:15.390916 (+  1023us) compaction.cpp:102] output rowset built
0610 11:14:15.390917 (+     1us) compaction.cpp:106] check correctness finished
0610 11:14:15.409460 (+ 18543us) compaction.cpp:110] modify rowsets finished
0610 11:14:15.409496 (+    36us) cumulative_compaction.cpp:55] compaction finished
0610 11:14:15.410138 (+   642us) cumulative_compaction.cpp:65] unused rowsets have been moved to GC queue
Metrics: {"filtered_rows":0,"input_row_num":136707,"input_rowsets_count":302,"input_rowsets_data_size":76617836,"input_segments_num":302,"merge_rowsets_latency_us":7319372,"merged_rows":0,"output_row_num":136707,"output_rowset_data_size":53893280,"output_segments_num":1}
```
2020-06-13 19:31:51 +08:00
b3811f910f [Spark load][Fe 4/6] Add hive external table and update hive table syntax in loadstmt (#3819)
* Add hive external table and update hive table syntax in loadstmt

* Move check hive table from SelectStmt to FromClause and update doc

* Update hive external table en sql reference
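For reference, a minimal sketch of what a hive external table definition could look like (table, column, and metastore names are illustrative placeholders, not taken from this commit):

```sql
-- Hypothetical example; names and the metastore URI are placeholders.
CREATE EXTERNAL TABLE example_db.hive_table
(
  k1 INT,
  k2 VARCHAR(50)
)
ENGINE = hive
PROPERTIES
(
  "database" = "hive_db",
  "table" = "hive_tbl",
  "hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
```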
2020-06-13 16:28:24 +08:00
414a0a35e5 [Dynamic Partition] Use ZonedDateTime to support set timezone (#3799)
This CL mainly supports time zones in dynamic partition:
1. Use the new Java Time API to replace Calendar.
2. Support setting the time zone in dynamic partition parameters.
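A usage sketch, assuming the parameter is exposed as a `dynamic_partition.*` table property (the property name and value below are illustrative; the dynamic partition docs are authoritative):

```sql
-- Assumed property name, for illustration only.
ALTER TABLE example_table SET
(
  "dynamic_partition.time_zone" = "Asia/Shanghai"
);
```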
2020-06-13 16:27:09 +08:00
b8ee84a120 [Doc] Add docs to OLAP_SCAN_NODE query profile (#3808) 2020-06-13 16:25:40 +08:00
wyb
44dbdf4986 Update hive external table en sql reference 2020-06-12 21:38:05 +08:00
wyb
7f7ee63723 Move check hive table from SelectStmt to FromClause and update doc 2020-06-11 16:53:41 +08:00
8d11ad3a16 [Doc] Fix website doc error (#3823) 2020-06-11 10:01:54 +08:00
86d235a76a [Extension] Logstash Doris output plugin (#3800)
This plugin is used to output data from Logstash to Doris.
It uses the HTTP protocol to interact with the Doris FE HTTP interface
and loads data through Doris's stream load.
2020-06-11 08:54:51 +08:00
4adc9d45c2 [Doc] Update ALTER TABLE.md 2020-06-10 22:58:29 +08:00
de91037d8c [Doc] Add some routine load docs (#3796)
Add some documentation about using routine load in the cloud environment
2020-06-10 22:57:00 +08:00
4cb5f7a535 [Config] Remove max_user_connections from config (#3805)
Update max_user_connections by user property:

```
set property `user` max_user_connections=100;
```
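The current value can then be read back with `SHOW PROPERTY`, e.g.:

```sql
SHOW PROPERTY FOR 'user' LIKE '%max_user_connections%';
```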
2020-06-10 22:56:05 +08:00
wyb
4c2e73a5fe Add hive external table and update hive table syntax in loadstmt 2020-06-10 16:32:32 +08:00
a7bf006b51 Use BackendStatus to show BE's information in show backends (#3713)
The information is displayed in JSON format. For example:
{"lastTabletReportTime":"2020-05-28 15:29:01"}
2020-06-06 11:37:48 +08:00
ed9022a908 Ignore broken disk when BE starts up (#3741) 2020-06-05 10:26:07 +08:00
73719f263d Fix document (#3773) 2020-06-05 10:19:17 +08:00
wyb
fdf3415d06 [Website] Fix CREATE RESOURCE sidebar text and link not right bug (#3777) 2020-06-05 09:20:36 +08:00
01c1de1870 [Load] Add more metric to trace the time cost in stream load and make brpc_num_threads configurable (#3703) 2020-06-04 13:37:28 +08:00
27046c5b61 [Enhancement] Improve the performance of query with IN predicate (#3694)
This CL mainly changes:
1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions of a single column that can be pushed down to the storage engine.

2. Add 2 new session variables `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at the session level and override the config values in BE.
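A session-level override might look like this (the value is illustrative):

```sql
-- Override the BE config value for the current session only.
SET doris_max_scan_key_num = 48;
```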
2020-06-04 11:39:00 +08:00
fc33ee3618 [Plugin] Add timeout of connection when downloading the plugins from URL (#3755)
If no timeout is set, the download process may be blocked forever.
2020-06-04 11:37:18 +08:00
791f8fee49 [Bug][Outfile] Fix bug that column separator is missing in output file. (#3765)
When outputting the result of a query using the `OUTFILE` statement, if some
of the output columns are null, the following column separator is missing.
2020-06-04 10:35:32 +08:00
2ad1b20b24 [Config] Add new BE config for tcmalloc (#3732)
Add a new BE config `tc_max_total_thread_cache_bytes`
2020-06-03 21:58:13 +08:00
3194aa129d Add a link to Tablet Meta URL (#3745) 2020-06-03 10:10:32 +08:00
761a0ccd12 [Bug] Fix running profile time display problem in FE web page and add the running profile doc (#3722) 2020-06-02 11:07:15 +08:00
bc35f3a31f [DynamicPartition] Optimize the rule of creating dynamic partition (#3679)
The problem is described in issue #3678.
This CL mainly changes the rule of creating dynamic partitions.

1. If the time unit is DAY, the logic remains unchanged.
2. If the time unit is WEEK, the logic changes as follows:

	1. Allow setting the start day of each week; the default is Monday, and any day from Monday to Sunday can be chosen.
	2. Assuming the start day is Tuesday, a partition ranges from Tuesday of one week to Monday of the next week.

3. If the time unit is MONTH, the logic changes as follows:

	1. Allow setting the start date of each month; the default is the 1st, and any date from the 1st to the 28th can be chosen.
	2. Assuming the start date is the 2nd, a partition ranges from the 2nd of one month to the 1st of the next month.

4. The `SHOW DYNAMIC PARTITION TABLES` statement adds a `StartOf` column to show the start day of week or month.

Refer to the examples in `dynamic-partition.md` for details; a property sketch also follows below.

TODO:
It would be better to also support HOUR and YEAR time units, maybe in the next PR.

FIX: #3678
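A sketch of how the start day might be configured (the property names `dynamic_partition.start_day_of_week` / `dynamic_partition.start_day_of_month` are assumptions based on the description above; `dynamic-partition.md` is authoritative):

```sql
-- Assumed property names; "2" = Tuesday in a Monday-based week numbering.
ALTER TABLE example_table SET
(
  "dynamic_partition.time_unit" = "WEEK",
  "dynamic_partition.start_day_of_week" = "2"
);
```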
2020-05-27 16:42:41 +08:00
wyb
4978bd6c81 [Spark load] Add resource manager (#3418)
1. User interface:

1.1 Spark resource management

Spark is used as an external computing resource in Doris to do ETL work. In the future, other external resources may be used in Doris as well; for example, MapReduce for ETL, Spark/GPU for queries, HDFS/S3 for external storage. We introduce resource management to manage these external resources used by Doris.

```sql
-- create spark resource
CREATE EXTERNAL RESOURCE resource_name
PROPERTIES 
(                 
  type = spark,
  spark_conf_key = spark_conf_value,
  working_dir = path,
  broker = broker_name,
  broker.property_key = property_value
)

-- drop spark resource
DROP RESOURCE resource_name

-- show resources
SHOW RESOURCES
SHOW PROC "/resources"

-- privileges
GRANT USAGE_PRIV ON RESOURCE resource_name TO user_identity
GRANT USAGE_PRIV ON RESOURCE resource_name TO ROLE role_name

REVOKE USAGE_PRIV ON RESOURCE resource_name FROM user_identity
REVOKE USAGE_PRIV ON RESOURCE resource_name FROM ROLE role_name
```



- CREATE EXTERNAL RESOURCE:

FOR user_name is optional. If present, the external resource belongs to that user. If not, the external resource belongs to the system and is available to all users.

PROPERTIES:
1. type: resource type. Only spark is supported now.
2. spark configuration: follows the standard Spark configuration format; refer to https://spark.apache.org/docs/latest/configuration.html.
3. working_dir: optional, used to store ETL intermediate results in spark ETL.
4. broker: optional, used in spark ETL. The ETL intermediate results need to be read with the broker when pushed into BE.

Example: 

```sql
CREATE EXTERNAL RESOURCE "spark0"
PROPERTIES 
(                                                                             
  "type" = "spark",                   
  "spark.master" = "yarn",
  "spark.submit.deployMode" = "cluster",
  "spark.jars" = "xxx.jar,yyy.jar",
  "spark.files" = "/tmp/aaa,/tmp/bbb",
  "spark.yarn.queue" = "queue0",
  "spark.executor.memory" = "1g",
  "spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
  "spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
  "working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
  "broker" = "broker0",
  "broker.username" = "user0",
  "broker.password" = "password0"
)
```



- SHOW RESOURCES:
General users can only see their own resources.
Admin and root users can see all resources.




1.2 Create spark load job

```sql
LOAD LABEL db_name.label_name 
(
  DATA INFILE ("/tmp/file1") INTO TABLE table_name, ...
)
WITH RESOURCE resource_name
[(key1 = value1, ...)]
[PROPERTIES (key2 = value2, ... )]
```

Example:

```sql
LOAD LABEL example_db.test_label 
(
  DATA INFILE ("hdfs:/127.0.0.1:10000/tmp/file1") INTO TABLE example_table
)
WITH RESOURCE "spark0"
(
  "spark.executor.memory" = "1g",
  "spark.files" = "/tmp/aaa,/tmp/bbb"
)
PROPERTIES ("timeout" = "3600")
```

The Spark configurations in the load statement can temporarily override the existing configuration in the resource.

#3010
2020-05-26 18:21:21 +08:00
77b9acc242 [Stmt] Add rowCount column to SHOW DATA stmt (#3676)
Users can see the row count of all materialized indexes of a table.

```
mysql> show data from test;
+-----------+-----------+-----------+--------------+----------+
| TableName | IndexName | Size      | ReplicaCount | RowCount |
+-----------+-----------+-----------+--------------+----------+
| test2     | r1        | 10.000MB  | 30           | 10000    |
|           | r2        | 20.000MB  | 30           | 20000    |
|           | test2     | 50.000MB  | 30           | 50000    |
|           | Total     | 80.000    | 90           |          |
+-----------+-----------+-----------+--------------+----------+
```

Fix #3675
2020-05-26 15:53:38 +08:00
963d4d48aa Override the style of sidebar's sub-directory (#3683)
Override the style of sidebar's sub-directory.
2020-05-26 09:07:55 +08:00
3ffc447b38 [OUTFILE] Support INTO OUTFILE to export query result (#3584)
This CL mainly changes:

1. Support `SELECT INTO OUTFILE` command.
2. Support export query result to a file via Broker.
3. Support CSV export format with specified column separator and line delimiter.
2020-05-25 21:24:56 +08:00
e6864a1cda Allow user to set thrift_client_timeout_ms config for thrift server (#3670)
1. Allow user to set thrift_client_timeout_ms config for thrift server
2. Add doc for thrift_client_timeout_ms config
2020-05-25 11:32:14 +08:00
ba7d2dbf7b [Function] Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad (#3638)
Support utf-8 encoding for the string functions `instr`, `locate`, `locate_pos`, `lpad`, `rpad`
and add unit tests for them.
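With UTF-8 support these functions count characters rather than bytes; a sketch of the expected behavior (inputs and results are illustrative):

```sql
-- 'Doris' is 5 characters, so the match starts at character position 6, not at a byte offset.
SELECT instr('Doris数据库', '数据');
-- Pads to a total length of 4 characters, yielding 'xx数据'.
SELECT lpad('数据', 4, 'x');
```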
2020-05-22 14:34:26 +08:00
dbfe8a067f [Doc] Add docs of max_running_txn_num_per_db (#3657)
Change-Id: Ibdbc19a5558b0eb3f6a5fc4ef630de255b408a92
2020-05-22 10:22:11 +08:00
f6b5c8839b [Bug] Ignore loading DELETE status tablet error when restarting BE (#3641)
Fix: #3640 

Also add a `batch delete meta` feature for `meta tool`
Fix #3639
2020-05-21 19:08:28 +08:00
ef8fd1fcbe [Load] Support loading JSON data into Doris by RoutineLoad or StreamLoad (#3553)
Doris supports loading JSON data by RoutineLoad or StreamLoad.
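A routine load sketch (job name, table, and Kafka settings are placeholders; the `"format" = "json"` property follows the feature description):

```sql
CREATE ROUTINE LOAD example_db.example_json_job ON example_table
PROPERTIES
(
  "format" = "json"
)
FROM KAFKA
(
  "kafka_broker_list" = "127.0.0.1:9092",
  "kafka_topic" = "example_topic"
);
```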
2020-05-21 13:00:49 +08:00
0d66e6bd15 Support bitmap_intersect (#3571)
* Support bitmap_intersect

Support the aggregate function bitmap_intersect, which is mainly used to take the intersection of grouped data.
The function `bitmap_intersect(expr)` calculates the intersection of bitmap columns and returns a bitmap object.
The definition is as follows:
FunctionName: bitmap_intersect,
InputType: bitmap,
OutputType: bitmap

The scenario is as follows:
Query which users satisfy the three tags a, b, and c at the same time.

```
select bitmap_to_string(bitmap_intersect(user_id)) from
(
    select bitmap_union(user_id) user_id from bitmap_intersect_test
    where tag in ('a', 'b', 'c')
    group by tag
) a
```
Closed #3552.

* Add docs of bitmap_union and bitmap_intersect

* Support null of bitmap_intersect
2020-05-20 21:12:02 +08:00
6be7a6232f [Config] Add ignore config to determine whether to continue to start BE when loading a tablet from its header fails. (#3632)
Add config ignore_load_tablet_failure to determine whether to continue to start BE when loading a tablet from its header fails.
2020-05-20 09:40:50 +08:00
4cbcae1574 [Spark on Doris] Shade and provide the thrift lib in spark-doris-connector (#3631)
Mainly changes:
1. Shade and provide the thrift lib in spark-doris-connector
2. Add a `build.sh` for spark-doris-connector
3. Move the README.md of spark-doris-connector to `docs/`
4. Change the line delimiter of `fe/src/test/java/org/apache/doris/analysis/AggregateTest.java`
2020-05-19 14:20:21 +08:00
87caa697a9 [Doc] Update table-restore-tool.md
Fix some formatting.

NOTICE (#3622):
This is a "revert of revert" pull request.
This PR mainly combines the commits of PRs that were scattered and
submitted due to a wrong merge method into a single complete commit.
2020-05-18 14:42:17 +08:00
24ca937877 Revert "[Doc] Update table-restore-tool.md" (#3606) 2020-05-18 12:08:54 +08:00
0d76c78537 [Doc] Update table-restore-tool.md 2020-05-18 11:12:24 +08:00
d4ff6dcdd6 fix by review 2020-05-18 10:56:12 +08:00
a4e98953be [website] modify download links & remove some links' suffix _EN(master) (#3573)
modify download links & remove some links' suffix _EN
2020-05-15 14:03:28 +08:00
4464328d8f [Doc] Add doc link to char_length (#3548) 2020-05-14 21:21:31 +08:00
47bce081d2 [website] Support documents' fulltext searching (master) (#3535)
Add documents' full-text search powered by Algolia
2020-05-13 21:18:42 +08:00
95c67db712 [community] Add Committer Guide (#3522) 2020-05-13 21:17:12 +08:00
40cd5365ce [Doc] Update table-restore-tool.md
Fix some formatting.
2020-05-13 18:51:11 +08:00
56db6e7a35 [Config] Allow user to configure BRPC socket_max_unwritten_bytes (#3488)
Add new BE config `brpc_socket_max_unwritten_bytes`
2020-05-10 17:56:14 +08:00
488aa22938 [Doc] Update plugin document (#3447) (#3505) 2020-05-09 19:19:38 +08:00