Commit Graph

449 Commits

Author SHA1 Message Date
ac3bbdd3ab [BatchDelete] Add a configuration indicating whether to enable the batch delete function (#4493) 2020-09-03 16:56:37 +08:00
f207036cad [Spark load][Document] Add docs about spark and yarn client for spark load (#4489)
Add docs about spark and yarn client for spark load
2020-09-02 10:52:49 +08:00
8bb65863f5 [Doc] Update doc of fe-idea-dev.md (#4485) 2020-08-31 10:09:10 +08:00
1d93ba027a [Compaction] Compaction show policy type and disk format (#4466)
Add more information in compaction show api
1. Add cumulative policy type
2. Format rowset total disk size
2020-08-30 21:09:47 +08:00
wyb
ffe696d17c [Doc] Add spark load sql statement doc and update manual (#4463)
1. add sql statement in dml
2. update spark load manual
2020-08-30 21:09:17 +08:00
0db9194dc0 [Doc] Fix wrong doc name (#4477)
Co-authored-by: morningman <chenmingyu@baidu.com>
2020-08-28 11:56:59 +08:00
174c9f89ea [DOCS] Add batch delete docs (#4435)
update documents for batch delete #4051
2020-08-28 09:24:07 +08:00
ad738fa198 Add OLAP_ERR_DATE_QUALITY_ERR error status to display schema change failure (#4388)
While historical data is being transformed for a materialized view, the transformation may fail because of data quality problems.
Add an error status code, OLAP_ERR_DATE_QUALITY_ERR, to identify whether a data problem caused the failure.

#3344
2020-08-27 17:52:53 +08:00
8b0b120aca [Profile] Add 2 Segment related metrics in query profile (#4348)
Total number of segments and number of filtered segments
2020-08-27 12:07:21 +08:00
b4d8b3d9ba Forbid illegal column types on BITMAP_UNION or HLL_UNION mv (#4432)
1. The base column of bitmap_union must be an integer type. Largeint is not supported either.
2. The base column of hll_union must not be decimal.

Check error msg of const expr in Union Node

If a user tries to insert a negative number into a bitmap mv, Doris throws the exception 'invalid input'.
The const value in Union Node is checked in this commit.
2020-08-26 10:49:32 +08:00
a5d1d010c0 [Doc] Fix typo about plugin content (#4416) 2020-08-26 10:48:07 +08:00
1410d4e623 [Doc] Add in predicate support content in delete-manual.md (#4404)
Add in predicate support content in delete-manual.md
2020-08-24 21:52:28 +08:00
67b842ce04 [License] Organize and modify the license of the code (#4371)
1. Disable the MySQL client and LZO library by default when building Doris.

    MySQL client library is used for MySQL external table feature.
    This feature will be replaced by the new ODBC external table soon.

    LZO library is used to compress/decompress data of some old data format of Doris,
    which is no longer used.

2. Add missing license to some files.

3. All non-Apache-License code is explained in the NOTICE file, and the corresponding license is declared.

4. Remove the js source code from webroot; it will be downloaded as a thirdparty dependency
2020-08-24 21:51:55 +08:00
976820ba20 [SegmentV2] Change the default storage format to SegmentV2 (#4387)
Since Segment V2 has been released for a long time, we should make it the default storage format for newly created tables.

This CL mainly changes:
1. For all newly created tables, the default storage format is Segment V2.
2. For all existing tablets, the storage format remains unchanged.
3. Fix the bugs described in #4384 and #4385
2020-08-24 21:51:17 +08:00
0715c54004 Fix misspelling (#4407)
Centers to centos
2020-08-21 09:14:21 +08:00
04a75b7c28 [Doc] Fix spelling errors in dynamic partition docs (#4395)
Change-Id: I84de1602b99c6b89b59ccc5869c96516c40a181d
2020-08-20 09:31:33 +08:00
bfb39a2826 [SQL][Function] Add replace() function (#4347)
replace is a user-defined function that replaces all occurrences of an old substring in a string with a new substring, as follows:
mysql> select replace("http://www.baidu.com:9090", "9090", "");
+------------------------------------------------------+
| replace('http://www.baidu.com:9090', '9090', '') |
+------------------------------------------------------+
| http://www.baidu.com: |
+------------------------------------------------------+
2020-08-20 09:28:53 +08:00
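The replace() semantics shown in bfb39a2826 can be sketched in Python as follows. This is only an illustration of the behaviour in the commit's example, where every occurrence of the old substring is substituted. The helper name `doris_like_replace` is hypothetical, and the handling of an empty old substring is an assumption rather than documented Doris behaviour.

```python
def doris_like_replace(text: str, old: str, new: str) -> str:
    """Replace every occurrence of `old` in `text` with `new` (illustrative only)."""
    # Assumption: an empty `old` leaves the input unchanged. Python's own
    # str.replace would otherwise insert `new` between every character.
    if old == "":
        return text
    return text.replace(old, new)


# Same arguments as the mysql example in the commit message.
result = doris_like_replace("http://www.baidu.com:9090", "9090", "")
print(result)
```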
4c571cb6f5 Revert "[Metrics] Support tablet level metrics (#4327)" (#4397)
This reverts commit 56260a65c87830ffe34109195ee4d6f1d543e630.

Co-authored-by: morningman <chenmingyu@baidu.com>
2020-08-19 22:37:52 +08:00
f92428248f Support udaf_orthogonal_bitmap (#4198)
The original Doris bitmap aggregation functions perform poorly when computing intersections and unions of bitmaps with a cardinality above one billion. There are two reasons. First, when the bitmap cardinality is large and the data size exceeds 1 GB, network and disk I/O time increases. Second, all BE instances send their sink data to the top node for the intersection and union calculation, which puts pressure on that single node and makes it the bottleneck.

My solution is to create a fixed-schema table based on the Doris sharding rule and to hash-partition the bitmap's ID range, that is, cut the ID range vertically to form small cubes. The resulting bitmap blocks are smaller and evenly distributed across all BE instances. Based on this schema table, some new high-performance UDAF aggregation functions are developed. All scan nodes participate in the intersection and union calculation, and the top node only summarizes the results.

The design goal is that, for bitmaps with a cardinality of more than 10 billion, the response time of intersection and union calculations at 100-dimension granularity is within 5 s.

There are three udaf functions in this commit: orthogonal_bitmap_intersect_count, orthogonal_bitmap_union_count, orthogonal_bitmap_intersect.
2020-08-19 10:29:13 +08:00
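The idea behind f92428248f can be sketched in Python under simplifying assumptions: plain sets stand in for bitmaps, a modulo stands in for the hash bucketing, and the names `NUM_BUCKETS`, `bucket_of`, `partition`, `orthogonal_intersect_count` and `orthogonal_union_count` are hypothetical, not the Doris implementation. Because every ID falls into exactly one bucket, each bucket can be intersected or unioned locally, and the top level only has to add up the per-bucket counts.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count; a real deployment would choose its own


def bucket_of(user_id: int) -> int:
    """Assign an ID to a bucket (a modulo stands in for the real hash rule)."""
    return user_id % NUM_BUCKETS


def partition(ids):
    """Split one logical bitmap (a set of IDs) into disjoint per-bucket pieces."""
    parts = defaultdict(set)
    for i in ids:
        parts[bucket_of(i)].add(i)
    return parts


def orthogonal_intersect_count(bitmaps):
    """Intersect per bucket, then sum the local counts (needs at least one bitmap)."""
    parts = [partition(b) for b in bitmaps]
    total = 0
    for bucket in range(NUM_BUCKETS):
        # Each "scan node" only sees the pieces that belong to its bucket.
        local = set.intersection(*(p.get(bucket, set()) for p in parts))
        total += len(local)
    return total


def orthogonal_union_count(bitmaps):
    """Union per bucket, then sum the local counts."""
    parts = [partition(b) for b in bitmaps]
    total = 0
    for bucket in range(NUM_BUCKETS):
        local = set().union(*(p.get(bucket, set()) for p in parts))
        total += len(local)
    return total


tag_a = {1, 2, 3, 4, 5, 6}
tag_b = {4, 5, 6, 7, 8}
intersect_count = orthogonal_intersect_count([tag_a, tag_b])
union_count = orthogonal_union_count([tag_a, tag_b])
print(intersect_count, union_count)
```

Summing the local counts is only valid because the buckets are disjoint by construction; with overlapping partitions the same ID would be counted more than once.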
dc3ed1c525 [Compaction]Compaction rules optimization (#4212)
Optimize the compaction rules; see #4164 for the detailed problem description and design.
This PR adds two functions:
(1) Make the cumulative policy configurable, and implement the original policy.
(2) Implement the universal policy, the optimized version described in #4164.
2020-08-19 09:34:13 +08:00
56260a65c8 [Metrics] Support tablet level metrics (#4327)
Sometimes we want to detect the hotspots of a cluster, for example, frequently scanned or frequently written tablets,
but we have no insight into the tablets in the cluster.
This patch introduces tablet-level metrics to help achieve this goal. Four tablet metrics are now supported: `query_scan_bytes`, `query_scan_rows`, `flush_bytes`, `flush_count`.
However, one BE may hold hundreds of thousands of tablets, so I added a parameter to the metrics HTTP request,
and tablet-level metrics are not returned by default.
2020-08-18 16:56:12 +08:00
8a3eaeecf1 Update support batch delete storage design document (#4234)
* Update delete index design document
2020-08-18 15:37:14 +08:00
3359467b9a [Tablet][Recovery] Support using empty tablet to repair the damaged or missing tablet (#4255)
In some very special circumstances, such as code bugs, or human misoperation, etc.,
all replicas of some tablets may be lost. In this case, the data has been substantially lost.
However, in some scenarios, the business still wants queries to run without reporting errors
despite the data loss, so that the impact on the user layer is reduced.
At this point, we can use a blank tablet to fill the missing replica so that queries can be executed normally.

Add a new FE config `recover_with_empty_tablet`. The default is false; true means an empty tablet is used to fill the missing one.

Also fix the bug described in #4274
2020-08-18 06:13:53 +00:00
26fe510011 [Doc] modify the document error (#4357) 2020-08-17 23:06:23 +08:00
d6028863f3 [Compaction] Manually trigger compaction RESTapi interface (#4312)
Add a REST API to BE that manually triggers a compaction task. The detailed design is in #4311.
2020-08-13 23:41:46 +08:00
05fa55047e [Doc][Json Load] Improve json data format load documents (#4337)
Add some detailed explanation of the JsonPath and Columns parameters
2020-08-13 23:39:57 +08:00
1d9b3aeee7 [Doc] Repair document format (#4336)
The incorrect format '##keyword' appears in many docs. This PR repairs the document format. #4335
2020-08-13 23:39:41 +08:00
d655b271b8 [Feature][Web] Add new feature to list all tablets on a particular BE (#4268)
A new feature has been added to acquire the tablet id and schema hash of all the tablets on a particular BE node
via the Web page, so that more detailed information about each tablet can be obtained from these
tablet ids and schema hashes. Depending on the web request, there are two ways
(table and json) to show the acquired tablet ids and schema hashes on the Web page.
2020-08-12 20:55:19 +08:00
651a7e50d0 [Doc] Update compilation.md (#4297) 2020-08-09 20:50:33 +08:00
eefad13107 [Feature] Support InPredicate in delete statement (#4006)
This PR adds InPredicate support to the delete statement
and adds the max_allowed_in_element_num_of_delete variable to
limit the number of elements of an InPredicate in a delete statement.
2020-08-06 23:19:40 +08:00
5ba4b024e7 [Docs] Add Materialized view manual (#4229)
Add usage manual of materialized view in Chinese and English
2020-08-06 23:18:06 +08:00
237c0807a4 [RoutineLoad] Support modify routine load job (#4158)
Support ALTER ROUTINE LOAD JOB stmt, for example:

```
alter routine load db1.label1
properties
(
"desired_concurrent_number"="3",
"max_batch_interval" = "5",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"timezone" = "+08:00"
)
```

Details can be found in `alter-routine-load.md`
2020-08-06 23:11:02 +08:00
421828d52a [Doc] Fix format in doris_storage_optimization.md (#4250) 2020-08-05 21:45:03 +08:00
1b341601fe Generate java files using maven (#4133)
Generate the generated-java files using maven instead of build.sh
2020-08-05 15:20:39 +08:00
3f31866169 [Bug][Load][Json] #4124 Load json format with stream load failed (#4217)
Stream load should read all the data completely before parsing the json.
And also add a new BE config streaming_load_max_batch_read_mb
to limit the data size when loading json data.

Fix the bug of loading empty json array []

Add doc to explain certain cases of loading json format data.

Fix: #4124
2020-08-04 12:55:53 +08:00
bdaef84a10 [FE] [HttpServer] Config netty param in HttpServer (#4225)
Now, if a URL is longer than 4096 bytes, netty will refuse it.
The case can be reproduced by constructing a very long URL (longer than 4096 bytes).

Add 2 http server params:
1. http_max_line_length
2. http_max_header_size
2020-08-01 17:59:01 +08:00
116d7ffa3c [SQL][Function] Add approx_count_distinct() function (#4221)
Add approx_count_distinct() function to replace the ndv() function
2020-08-01 17:54:19 +08:00
b4cb8fb9b2 [Feature][Cache]Add interface, metric, variable and config for query cache (#4159) 2020-07-30 11:24:20 +08:00
fdcc223ad2 [Bug][Json] Refactor the json load logic to fix some bug
1. Add `json_root` for nested json data.
2. Remove `_jmap` to make the logic reasonable.
2020-07-30 10:36:34 +08:00
237271c764 [Bug] Fix fe meta version problem, make drop meta check code easy to read and add doc content for drop meta check (#4205)
This PR mainly does three things:
1. Fix the fe meta version bug introduced by #4029 when resolving the conflict with #4086
2. Make the drop check code easy to read
3. Add doc content for drop meta check
2020-07-30 09:54:20 +08:00
8a169981cf [Bug][TabletRepair] Fix bug that too many replicas generated when decommission BE (#4148)
Try to select a BE with an existing replica as the destination BE for the
REPLICA_RELOCATING clone task.
Fix #4147 

Also add 2 new FE configs `max_clone_task_timeout_sec` and `min_clone_task_timeout_sec`
2020-07-30 09:46:33 +08:00
1b3af783e6 [Plugin] Add properties grammar in InstallPluginStmt (#4173)
This PR supports grammar like the following: INSTALL PLUGIN FROM [source] [PROPERTIES("KEY"="VALUE", ...)]
Users can set md5sum="xxxxxxx", so an md5 uri does not need to be provided.
2020-07-29 15:02:31 +08:00
59676a1117 [BUG] fix 4149, add sessionVariable to choose broadcastjoin first when cardinality cannot be estimated (#4150) 2020-07-29 12:28:52 +08:00
9e5ca697f3 [Doc] Fix typo for stream load content in basic-usage.md (#4185) 2020-07-27 16:50:15 +08:00
4608f9786e Support checking database used data quota when data load job begin a new txn (#3955)
Now, we only check the database used data quota when creating or altering a table, or in some old types of load job, but not for routine load jobs and stream load jobs. This PR provides a uniform solution that checks the db used data quota when a data load job begins a new txn.
2020-07-24 10:03:43 +08:00
a01d1aec56 [Compaction] track RowsetReader's mem & add metric (#4068)
Ref https://github.com/apache/incubator-doris/issues/3624#issuecomment-655933244
Only RowsetReaders in compaction are tracked.
Other RowsetReaders won't be affected, because the parent_tracker is nullptr.
2020-07-24 07:58:09 +08:00
2334f5d997 Fix some problem related with publish version task (#4089)
This PR mainly does the following three things:
1. Add the thread name to the fe log to make problems easier to trace.
2. Add the agent_task_resend_wait_time_ms config to avoid sending duplicate agent tasks to be.
3. Skip updating the replica version in fe when the new version is lower than the replica version.
2020-07-23 20:06:02 +08:00
fbf7bd6a1d [Bug] Change get load state interface (#4081)
Now, the PathTrie matches the wrong interface when choosing between
/api/{db}/{table} and /api/{db}/{label}
2020-07-20 15:51:27 +08:00
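Why the two routes named in fbf7bd6a1d collide can be sketched in Python. This is a minimal wildcard matcher written for illustration, not the PathTrie used by the FE; the function names and the sample path `/api/db1/label1` are assumptions. A `{name}` segment matches any value, so both patterns accept the same URL and a router cannot tell a table from a label.

```python
def is_wildcard(segment: str) -> bool:
    """A `{name}` segment is a placeholder that matches any value."""
    return segment.startswith("{") and segment.endswith("}")


def matches(pattern: str, path: str) -> bool:
    """Return True if every segment of `path` fits the corresponding pattern segment."""
    p_parts = pattern.strip("/").split("/")
    u_parts = path.strip("/").split("/")
    if len(p_parts) != len(u_parts):
        return False
    return all(is_wildcard(p) or p == u for p, u in zip(p_parts, u_parts))


patterns = ["/api/{db}/{table}", "/api/{db}/{label}"]

# Hypothetical request path: nothing in it says whether "label1" is a table or a label.
hits = [p for p in patterns if matches(p, "/api/db1/label1")]
print(hits)
```

Both patterns end up in `hits`, which is the ambiguity the commit describes; giving one route a distinguishing literal segment is one common way to remove it.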
03cf9b2a24 [Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039)
Related issue #4017. The main changes are as follows:
1. Add expired_snapshot_rs_version_map, _expired_snapshot_rs_metas
2. Add VersionedRowsetTracker to record compacted path versions
3. Record the path version when rowsets are compacted
4. In the gc process, add expired snapshot rowsets to the unused set for removal.
2020-07-19 22:03:59 +08:00
a0c19df18c [Website] Redesign the home page of document website (master) (#4069) 2020-07-16 11:36:24 +08:00