Commit Graph

357 Commits

Author SHA1 Message Date
86d235a76a [Extension] Logstash Doris output plugin (#3800)
This plugin is used by Logstash to output data to Doris.
It uses the HTTP protocol to interact with the Doris FE HTTP interface
and loads data through Doris's stream load.
2020-06-11 08:54:51 +08:00
4adc9d45c2 [Doc] Update ALTER TABLE.md 2020-06-10 22:58:29 +08:00
de91037d8c [Doc] Add some routine load docs (#3796)
Add some documentation about using routine load in the cloud environment
2020-06-10 22:57:00 +08:00
4cb5f7a535 [Config] Remove max_user_connections from config (#3805)
Update max_user_connections via the user property instead:

```
SET PROPERTY FOR 'user' 'max_user_connections' = '100';
```
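To verify, the property can presumably be checked with:

```sql
SHOW PROPERTY FOR 'user' LIKE 'max_user_connections';
```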
2020-06-10 22:56:05 +08:00
a7bf006b51 Use BackendStatus to show BE's information in show backends (#3713)
The information is displayed in JSON format. For example:
{"lastTabletReportTime":"2020-05-28 15:29:01"}
2020-06-06 11:37:48 +08:00
ed9022a908 Ignore broken disk when BE starts up (#3741) 2020-06-05 10:26:07 +08:00
73719f263d Fix document (#3773) 2020-06-05 10:19:17 +08:00
wyb
fdf3415d06 [Website] Fix CREATE RESOURCE sidebar text and link not right bug (#3777) 2020-06-05 09:20:36 +08:00
01c1de1870 [Load] Add more metrics to trace the time cost in stream load and make brpc_num_threads configurable (#3703) 2020-06-04 13:37:28 +08:00
27046c5b61 [Enhancement] Improve the performance of query with IN predicate (#3694)
This CL mainly changes:
1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions on a single column that can be pushed down to the storage engine.

2. Add 2 new session variables `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at the session level and override the config value in BE.
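A minimal usage sketch, assuming both behave like ordinary session variables (48 is an illustrative value):

```sql
-- session-level override of the scan key limit
SET max_scan_key_num = 48;
SET doris_max_scan_key_num = 48;
```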
2020-06-04 11:39:00 +08:00
fc33ee3618 [Plugin] Add timeout of connection when downloading the plugins from URL (#3755)
If no timeout is set, the download process may be blocked forever.
2020-06-04 11:37:18 +08:00
791f8fee49 [Bug][Outfile] Fix bug that column separator is missing in output file. (#3765)
When outputting the result of a query using the `OUTFILE` statement, if some
output column is null, then the following column separator is missing.
2020-06-04 10:35:32 +08:00
2ad1b20b24 [Config] Add new BE config for tcmalloc (#3732)
Add a new BE config `tc_max_total_thread_cache_bytes`.
2020-06-03 21:58:13 +08:00
3194aa129d Add a link to Tablet Meta URL (#3745) 2020-06-03 10:10:32 +08:00
761a0ccd12 [Bug] Fix bug that the running profile shows time incorrectly on the FE web page, and add the running profile doc (#3722) 2020-06-02 11:07:15 +08:00
bc35f3a31f [DynamicPartition] Optimize the rule of creating dynamic partition (#3679)
Problem is described in ISSUE #3678 
This CL mainly changes the rule of creating dynamic partitions.

1. If the time unit is DAY, the logic remains unchanged.
2. If the time unit is WEEK, the logic changes as follows:

	1. Allow setting the start day of each week. The default is Monday; options range from Monday to Sunday.
	2. Assuming the starting day is Tuesday, the range of a partition is from Tuesday of one week to Monday of the next week.

3. If the time unit is MONTH, the logic changes as follows:

	1. Allow setting the start date of each month. The default is the 1st, and it can be selected from the 1st to the 28th.
	2. Assuming the starting date is the 2nd, the range of a partition is from the 2nd of one month to the 1st of the next month.

4. The `SHOW DYNAMIC PARTITION TABLES` statement adds a `StartOf` column to show the start day of the week or month.

It is recommended to refer to the example in `dynamic-partition.md`; a minimal sketch is also given below.
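A sketch under the new rule, assuming the `dynamic_partition.start_day_of_week` property introduced here (1 = Monday, so 2 below means weeks run Tuesday to Monday):

```sql
CREATE TABLE example_db.tbl1 (k1 DATE, v1 INT)
PARTITION BY RANGE (k1) ()
DISTRIBUTED BY HASH (k1) BUCKETS 8
PROPERTIES
(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "WEEK",
    "dynamic_partition.start" = "-2",
    "dynamic_partition.end" = "2",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "8",
    "dynamic_partition.start_day_of_week" = "2"  -- assumed: partitions span Tuesday to Monday
);
```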

TODO:
It would be better to also support HOUR and YEAR time units, maybe in a later PR.

FIX: #3678
2020-05-27 16:42:41 +08:00
wyb
4978bd6c81 [Spark load] Add resource manager (#3418)
1. User interface:

1.1 Spark resource management

Spark is used as an external computing resource in Doris to do ETL work. In the future, there may be other external resources used in Doris, for example, MapReduce for ETL, Spark/GPU for queries, HDFS/S3 for external storage. We introduce resource management to manage these external resources used by Doris.

```sql
-- create spark resource
CREATE EXTERNAL RESOURCE resource_name
PROPERTIES 
(                 
  type = spark,
  spark_conf_key = spark_conf_value,
  working_dir = path,
  broker = broker_name,
  broker.property_key = property_value
)

-- drop spark resource
DROP RESOURCE resource_name

-- show resources
SHOW RESOURCES
SHOW PROC "/resources"

-- privileges
GRANT USAGE_PRIV ON RESOURCE resource_name TO user_identity
GRANT USAGE_PRIV ON RESOURCE resource_name TO ROLE role_name

REVOKE USAGE_PRIV ON RESOURCE resource_name FROM user_identity
REVOKE USAGE_PRIV ON RESOURCE resource_name FROM ROLE role_name
```



- CREATE EXTERNAL RESOURCE:

`FOR user_name` is optional. If present, the external resource belongs to that user. If not, the external resource belongs to the system and is available to all users.

PROPERTIES:
1. type: resource type. Only spark is supported now.
2. spark configuration: follows the standard writing of Spark configurations; refer to: https://spark.apache.org/docs/latest/configuration.html.
3. working_dir: optional, used to store intermediate ETL results in Spark ETL.
4. broker: optional, used in Spark ETL. The intermediate ETL results need to be read with the broker when pushed into BE.

Example: 

```sql
CREATE EXTERNAL RESOURCE "spark0"
PROPERTIES 
(                                                                             
  "type" = "spark",                   
  "spark.master" = "yarn",
  "spark.submit.deployMode" = "cluster",
  "spark.jars" = "xxx.jar,yyy.jar",
  "spark.files" = "/tmp/aaa,/tmp/bbb",
  "spark.yarn.queue" = "queue0",
  "spark.executor.memory" = "1g",
  "spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
  "spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
  "working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
  "broker" = "broker0",
  "broker.username" = "user0",
  "broker.password" = "password0"
)
```



- SHOW RESOURCES:
General users can only see their own resources.
Admin and root users can see all resources.




1.2 Create spark load job

```sql
LOAD LABEL db_name.label_name 
(
  DATA INFILE ("/tmp/file1") INTO TABLE table_name, ...
)
WITH RESOURCE resource_name
[(key1 = value1, ...)]
[PROPERTIES (key2 = value2, ... )]
```

Example:

```sql
LOAD LABEL example_db.test_label 
(
  DATA INFILE ("hdfs:/127.0.0.1:10000/tmp/file1") INTO TABLE example_table
)
WITH RESOURCE "spark0"
(
  "spark.executor.memory" = "1g",
  "spark.files" = "/tmp/aaa,/tmp/bbb"
)
PROPERTIES ("timeout" = "3600")
```

The Spark configurations in the load statement override the existing configuration in the resource for temporary use.

#3010
2020-05-26 18:21:21 +08:00
77b9acc242 [Stmt] Add rowCount column to SHOW DATA stmt (#3676)
Users can see the row count of all materialized indexes of a table.

```
mysql> show data from test;
+-----------+-----------+-----------+--------------+----------+
| TableName | IndexName | Size      | ReplicaCount | RowCount |
+-----------+-----------+-----------+--------------+----------+
| test2     | r1        | 10.000MB  | 30           | 10000    |
|           | r2        | 20.000MB  | 30           | 20000    |
|           | test2     | 50.000MB  | 30           | 50000    |
|           | Total     | 80.000MB  | 90           |          |
+-----------+-----------+-----------+--------------+----------+
```

Fix #3675
2020-05-26 15:53:38 +08:00
963d4d48aa Override the style of sidebar's sub-directory (#3683)
Override the style of sidebar's sub-directory.
2020-05-26 09:07:55 +08:00
3ffc447b38 [OUTFILE] Support INTO OUTFILE to export query result (#3584)
This CL mainly changes:

1. Support `SELECT INTO OUTFILE` command.
2. Support export query result to a file via Broker.
3. Support CSV export format with specified column separator and line delimiter.
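A minimal sketch of the new command (table, broker name, and path are assumed placeholders):

```sql
SELECT * FROM example_tbl
INTO OUTFILE "hdfs://127.0.0.1:10000/tmp/result_"
FORMAT AS CSV
PROPERTIES
(
    "broker.name" = "broker0",       -- assumed broker name
    "column_separator" = ",",
    "line_delimiter" = "\n"
);
```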
2020-05-25 21:24:56 +08:00
e6864a1cda Allow user to set thrift_client_timeout_ms config for thrift server (#3670)
1. Allow user to set thrift_client_timeout_ms config for thrift server
2. Add doc for thrift_client_timeout_ms config
2020-05-25 11:32:14 +08:00
ba7d2dbf7b [Function] Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad (#3638)
Support UTF-8 encoding for the string functions `instr`, `locate`, `locate_pos`, `lpad`, `rpad`,
and add unit tests for them.
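A quick sketch of the expected character-based (rather than byte-based) semantics:

```sql
-- with UTF-8 aware semantics these count characters, not bytes
SELECT instr('你好世界', '世界');  -- expected: 3
SELECT lpad('世界', 4, 'ab');      -- expected: 'ab世界'
```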
2020-05-22 14:34:26 +08:00
dbfe8a067f [Doc] Add docs of max_running_txn_num_per_db (#3657)
Change-Id: Ibdbc19a5558b0eb3f6a5fc4ef630de255b408a92
2020-05-22 10:22:11 +08:00
f6b5c8839b [Bug] Ignore loading DELETE status tablet error when restarting BE (#3641)
Fix: #3640 

Also add a `batch delete meta` feature for `meta tool`
Fix #3639
2020-05-21 19:08:28 +08:00
ef8fd1fcbe [Load] Support loading JSON data into Doris by RoutineLoad or StreamLoad (#3553)
Doris supports loading JSON data by RoutineLoad or StreamLoad.
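A minimal routine load sketch for JSON (job name, table, topic, and jsonpaths are assumed placeholders):

```sql
CREATE ROUTINE LOAD example_db.json_job ON example_tbl
PROPERTIES
(
    "format" = "json",
    "jsonpaths" = "[\"$.k1\", \"$.k2\"]"  -- assumed column mapping
)
FROM KAFKA
(
    "kafka_broker_list" = "127.0.0.1:9092",
    "kafka_topic" = "topic0"
);
```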
2020-05-21 13:00:49 +08:00
0d66e6bd15 Support bitmap_intersect (#3571)
* Support bitmap_intersect

Support the aggregate function bitmap_intersect; it is mainly used to take the intersection of grouped data.
The function `bitmap_intersect(expr)` calculates the intersection of bitmap columns and returns a bitmap object.
The definition is as follows:
FunctionName: bitmap_intersect,
InputType: bitmap,
OutputType: bitmap

The scenario is as follows:
Query which users satisfy all three tags a, b, and c at the same time.

```
select bitmap_to_string(bitmap_intersect(user_id)) from
(
    select bitmap_union(user_id) user_id from bitmap_intersect_test
    where tag in ('a', 'b', 'c')
    group by tag
) a
```
Closed #3552.

* Add docs of bitmap_union and bitmap_intersect

* Support null of bitmap_intersect
2020-05-20 21:12:02 +08:00
6be7a6232f [Config] Add ignore config to determine whether BE continues to start when loading a tablet from its header fails. (#3632)
Add config ignore_load_tablet_failure to determine whether BE should continue to start when loading a tablet from its header fails.
2020-05-20 09:40:50 +08:00
4cbcae1574 [Spark on Doris] Shade and provide the thrift lib in spark-doris-connector (#3631)
Mainly changes:
1. Shade and provide the thrift lib in spark-doris-connector
2. Add a `build.sh` for spark-doris-connector
3. Move the README.md of spark-doris-connector to `docs/`
4. Change the line delimiter of `fe/src/test/java/org/apache/doris/analysis/AggregateTest.java`
2020-05-19 14:20:21 +08:00
87caa697a9 [Doc] Update table-restore-tool.md
Fix some format.

NOTICE (#3622):
This is a "revert of revert" pull request.
This PR is mainly used to combine PRs whose commits were scattered
due to the wrong merge method into a single complete commit.
2020-05-18 14:42:17 +08:00
24ca937877 Revert "[Doc] Update table-restore-tool.md" (#3606) 2020-05-18 12:08:54 +08:00
0d76c78537 [Doc] Update table-restore-tool.md 2020-05-18 11:12:24 +08:00
d4ff6dcdd6 fix by review 2020-05-18 10:56:12 +08:00
a4e98953be [website] modify download links & remove some links' suffix _EN(master) (#3573)
modify download links & remove some links' suffix _EN
2020-05-15 14:03:28 +08:00
4464328d8f [Doc] Add doc link to char_length (#3548) 2020-05-14 21:21:31 +08:00
47bce081d2 [website] Support documents' fulltext searching (master) (#3535)
add documents' fulltext search powered by algolia
2020-05-13 21:18:42 +08:00
95c67db712 [community] Add Committer Guide (#3522) 2020-05-13 21:17:12 +08:00
40cd5365ce [Doc] Update table-restore-tool.md
Fix some format.
2020-05-13 18:51:11 +08:00
56db6e7a35 [Config] Allow user to configure BRPC socket_max_unwritten_bytes (#3488)
Add new BE config `brpc_socket_max_unwritten_bytes`
2020-05-10 17:56:14 +08:00
488aa22938 [Doc] Update plugin document (#3447) (#3505) 2020-05-09 19:19:38 +08:00
a656a7ddd4 Support append_trailing_char_if_absent function (#3439) 2020-05-09 08:59:34 +08:00
94b3a2bd50 [Bug] Fix string functions not supporting multibyte strings (#3345)
Make string functions support UTF-8 encoding.
2020-05-08 12:52:46 +08:00
f591976976 [Doc] Fix the incorrect docs (#3501) 2020-05-08 12:47:00 +08:00
5e63629b8b [Decommission] Support NOT dropping BE after decommission (#3461)
Add a new config `drop_backend_after_decommission` in FE. If this config
is false, the BE will not be dropped after finishing the decommission operation.

This new config tries to solve the problem described in ISSUE #3460; a sketch of its use is given below.
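A minimal sketch of the two together (host and heartbeat port are assumed placeholders):

```sql
-- keep the BE in the cluster even after decommission finishes
ADMIN SET FRONTEND CONFIG ("drop_backend_after_decommission" = "false");
ALTER SYSTEM DECOMMISSION BACKEND "127.0.0.1:9050";
```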

TODO:
This method will generate a lot of data migration, so it is only a temporary solution.
After that, we should try to solve the problem of data balancing within the BE.

This CL also adds the documents of FE and BE configuration.
These documents are incomplete and can be extended later.
2020-05-06 17:14:24 +08:00
dafb356b42 [Bugfix] Fix navbar not showing on mobile clients (#3419) & image relative path problem (#3427) 2020-05-06 11:57:03 +08:00
a1500eb544 Update doris-on-es.md (#3446) 2020-05-03 12:48:48 +08:00
2cb4027164 Update doris-on-es.md (#3441) 2020-05-03 12:48:19 +08:00
54da5a491c Fix delete statement doc display not correctly (#3445) 2020-05-01 19:20:00 +08:00
73a3c59efb [Bug] Fix bug that help-resource.zip file is missing. (#3423) 2020-04-29 19:25:28 +08:00
432965e360 [Enhancement] Documents rebuilt with Vuepress (#3408) (#3414) 2020-04-29 09:14:31 +08:00
9a934ec9f6 [Load] Add more info in SHOW LOAD result (#3391)
Fix #3390
This CL adds more info in the `JobDetails` column of the `SHOW LOAD` result for Broker Load jobs.

For example:
```
{
    "Unfinished backends": {
        "9c3441027ff948a0-8287923329a2b6a7": [10002]
    },
    "All backends": {
        "9c3441027ff948a0-8287923329a2b6a7": [10002, 10004, 10006]
    },
    "ScannedRows": 2390016,
    "TaskNumber": 1,
    "FileNumber": 1,
    "FileSize": 1073741824
}
```

2 newly added keys:

`Unfinished backends` indicates the BEs whose tasks are not yet finished.
`All backends` indicates all BEs on which this job has tasks.

One more thing: I pass the backend ID along with the heartbeat message from FE to BE, so that each BE can know its own ID.
2020-04-26 21:30:23 +08:00