71 Commits

Author SHA1 Message Date
03d9f6d8b4 [Feature] support hour time unit with dynamic partition (#4514)
Many tables are so large that they need separate partitions with the "HOUR" time unit.
Until now, dynamic partition did not support the "HOUR" time unit; it was marked as "TODO".
This commit adds that support, and it works.
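
A minimal Python sketch of what hourly partitioning means, for illustration only. It is not the FE code. The partition name format (`p` followed by `yyyyMMddHH`) and the helper name `hourly_partitions` are assumptions, not taken from this commit.

```python
from datetime import datetime, timedelta


def hourly_partitions(now, ahead, prefix="p"):
    """Return (name, start, end) for the current hour and `ahead` future hours.

    Assumption: partitions are named prefix + yyyyMMddHH and each one covers
    the half-open range [start, end) of exactly one hour.
    """
    base = now.replace(minute=0, second=0, microsecond=0)
    result = []
    for i in range(ahead + 1):
        start = base + timedelta(hours=i)
        end = start + timedelta(hours=1)
        result.append((prefix + start.strftime("%Y%m%d%H"), start, end))
    return result
```

Calling `hourly_partitions(datetime(2020, 9, 6, 23, 10), 1)` yields one partition for 23:00 on 2020-09-06 and one for 00:00 on 2020-09-07, so the day rollover is handled by the date arithmetic.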
2020-09-06 20:25:27 +08:00
a64c3a7acd [ODBC SCAN NODE] 3/4 Add ODBC_TABLE and ODBC_SCAN NODE in FE. (#4430)
We can create an odbc_table with SQL like:

```
CREATE EXTERNAL TABLE `baseall_oracle` (
  `k1` decimal(9, 3) NOT NULL COMMENT "",
  `k2` char(10) NOT NULL COMMENT "",
  `k3` datetime NOT NULL COMMENT "",
  `k5` varchar(20) NOT NULL COMMENT "",
  `k6` double NOT NULL COMMENT ""
) ENGINE=ODBC
PROPERTIES (
"host" = "192.168.0.1",
"port" = "8086",
"user" = "happenlee",
"password" = "doris",
"database" = "doris",
"table" = "baseall",
"driver" = "Oracle 19 ODBC driver",
"type" = "oracle"
);
```

Currently only Oracle and MySQL databases are supported, and this feature is turned off by default via the config enable_odbc_table.
2020-09-04 09:30:01 +08:00
ac3bbdd3ab [BatchDelete] Add a configuration indicating whether to enable the batch delete function (#4493) 2020-09-03 16:56:37 +08:00
f207036cad [Spark load][Document] Add docs about spark and yarn client for spark load (#4489)
Add docs about spark and yarn client for spark load
2020-09-02 10:52:49 +08:00
1d93ba027a [Compaction] Compaction show policy type and disk format (#4466)
Add more information to the compaction show API:
1. Add the cumulative policy type
2. Format the rowset total disk size
2020-08-30 21:09:47 +08:00
174c9f89ea [DOCS] Add batch delete docs (#4435)
update documents for batch delete #4051
2020-08-28 09:24:07 +08:00
ad738fa198 Add OLAP_ERR_DATE_QUALITY_ERR error status to display schema change failure (#4388)
During the historical data transformation of materialized views, the transformation may fail due to data quality problems.
Add an error status code, OLAP_ERR_DATE_QUALITY_ERR, to indicate whether a data problem caused the failure.

#3344
2020-08-27 17:52:53 +08:00
8b0b120aca [Profile] Add 2 Segment related metrics in query profile (#4348)
Total number of segments and number of filtered segments
2020-08-27 12:07:21 +08:00
b4d8b3d9ba Forbid illegal column types on BITMAP_UNION or HLL_UNION mv (#4432)
1. The base column of bitmap_union must be an integer type. largeint is not supported either.
2. The base column of hll_union must not be decimal.

Check error msg of const expr in Union Node

If a user tries to insert a negative number into a bitmap mv, Doris throws the exception 'invalid input'.
The const value in Union Node is checked in this commit.
2020-08-26 10:49:32 +08:00
1410d4e623 [Doc] Add in predicate support content in delete-manual.md (#4404)
Add in predicate support content in delete-manual.md
2020-08-24 21:52:28 +08:00
976820ba20 [SegmentV2] Change the default storage format to SegmentV2 (#4387)
Since Segment V2 has been released for a long time, we should make it the default storage format for newly created tables.

This CL mainly changes:
1. For all newly created tables, their default storage format is Segment V2.
2. For all existing tablets, the storage format remains unchanged.
3. Fix the bugs described in #4384 and #4385
2020-08-24 21:51:17 +08:00
04a75b7c28 [Doc] Fix spelling errors in dynamic partition docs (#4395)
Change-Id: I84de1602b99c6b89b59ccc5869c96516c40a181d
2020-08-20 09:31:33 +08:00
4c571cb6f5 Revert "[Metrics] Support tablet level metrics (#4327)" (#4397)
This reverts commit 56260a65c87830ffe34109195ee4d6f1d543e630.

Co-authored-by: morningman <chenmingyu@baidu.com>
2020-08-19 22:37:52 +08:00
dc3ed1c525 [Compaction] Compaction rules optimization (#4212)
Compaction rules optimization. For the detailed problem description and design, see #4164.
This PR does two things:
(1) Make the cumulative policy configurable, and implement the original policy.
(2) Implement the universal policy, the optimized version described in #4164.
2020-08-19 09:34:13 +08:00
56260a65c8 [Metrics] Support tablet level metrics (#4327)
Sometimes we want to detect the hotspots of a cluster, for example hot scanned tablets or hot written tablets,
but we have no insight into the tablets in the cluster.
This patch introduces tablet level metrics to help achieve this goal. It now supports 4 metrics on tablets: `query_scan_bytes`, `query_scan_rows`, `flush_bytes`, `flush_count`.
However, one BE may hold hundreds of thousands of tablets, so I added a parameter to the metrics HTTP request,
and tablet level metrics are not returned by default.
2020-08-18 16:56:12 +08:00
3359467b9a [Tablet][Recovery] Support using empty tablet to repair the damaged or missing tablet (#4255)
In some very special circumstances, such as code bugs, or human misoperation, etc.,
all replicas of some tablets may be lost. In this case, the data has been substantially lost.
However, in some scenarios, the business still hopes to ensure that the query will not
report errors even if there is data loss, and reduce the perception of the user layer.
At this point, we can use the blank Tablet to fill the missing replica to ensure that the query can be executed normally.

Add a new FE config `recover_with_empty_tablet`. The default is false; true means an empty tablet is used to fill in the missing one.

Also fix the bug described in #4274
2020-08-18 06:13:53 +00:00
d6028863f3 [Compaction] Manually trigger compaction RESTapi interface (#4312)
Add a REST API to BE that runs a compaction task when triggered manually. The detailed design is in #4311.
2020-08-13 23:41:46 +08:00
05fa55047e [Doc][Json Load] Improve json data format load documents (#4337)
Add some detailed explanation of the JsonPath and Columns parameters
2020-08-13 23:39:57 +08:00
1d9b3aeee7 [Doc] Repair document format (#4336)
Many docs contain the incorrect format '##keyword'. This PR repairs the document format. #4335
2020-08-13 23:39:41 +08:00
d655b271b8 [Feature][Web] Add new feature to list all tablets on a particular BE (#4268)
A new feature has been added to get the tablet id and schema hash of all the tablets on a particular BE node
via a Web page, so that more detailed information about each tablet can be obtained from these
tablet ids and schema hashes. Depending on the web request, there are two ways
(table and json) to show the tablet ids and schema hashes on the Web page.
2020-08-12 20:55:19 +08:00
eefad13107 [Feature] Support InPredicate in delete statement (#4006)
This PR is to add inPredicate support to delete statement,
and add max_allowed_in_element_num_of_delete variable to
limit element num of InPredicate in delete statement.
2020-08-06 23:19:40 +08:00
5ba4b024e7 [Docs] Add Materialized view manual (#4229)
Add usage manual of materialized view in Chinese and English
2020-08-06 23:18:06 +08:00
237c0807a4 [RoutineLoad] Support modify routine load job (#4158)
Support ALTER ROUTINE LOAD JOB stmt, for example:

```
alter routine load db1.label1
properties
(
"desired_concurrent_number"="3",
"max_batch_interval" = "5",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"timezone" = "+08:00"
)
```

Details can be found in `alter-routine-load.md`
2020-08-06 23:11:02 +08:00
3f31866169 [Bug][Load][Json] #4124 Load json format with stream load failed (#4217)
Stream load should read all the data completely before parsing the json.
And also add a new BE config streaming_load_max_batch_read_mb
to limit the data size when loading json data.

Fix the bug of loading empty json array []

Add docs explaining certain cases of loading json format data.

Fix: #4124
2020-08-04 12:55:53 +08:00
bdaef84a10 [FE] [HttpServer] Config netty param in HttpServer (#4225)
Currently, if the URL is longer than 4096 bytes, netty refuses the request.
The case can be reproduced by constructing a very long URL (longer than 4096 bytes).

Add 2 http server params:
1. http_max_line_length
2. http_max_header_size
2020-08-01 17:59:01 +08:00
b4cb8fb9b2 [Feature][Cache]Add interface, metric, variable and config for query cache (#4159) 2020-07-30 11:24:20 +08:00
8a169981cf [Bug][TabletRepair] Fix bug that too many replicas generated when decommission BE (#4148)
Try to select a BE with an existing replica as the destination BE for the
REPLICA_RELOCATING clone task.
Fix #4147

Also add 2 new FE configs `max_clone_task_timeout_sec` and `min_clone_task_timeout_sec`
2020-07-30 09:46:33 +08:00
59676a1117 [BUG] fix 4149, add sessionVariable to choose broadcastjoin first when cardinality cannot be estimated (#4150) 2020-07-29 12:28:52 +08:00
4608f9786e Support checking database used data quota when data load job begins a new txn (#3955)
Currently, we only check the database used data quota when creating or altering a table, or in some old types of load job, but not for routine load jobs and stream load jobs. This PR provides a uniform solution that checks the db used data quota when a data load job begins a new txn.
2020-07-24 10:03:43 +08:00
a01d1aec56 [Compaction] track RowsetReader's mem & add metric (#4068)
Ref https://github.com/apache/incubator-doris/issues/3624#issuecomment-655933244
Only RowsetReaders in compaction are tracked.
Other RowsetReaders won't be affected, because their parent_tracker is nullptr.
2020-07-24 07:58:09 +08:00
2334f5d997 Fix some problems related to publish version task (#4089)
This PR mainly does the following three things:
1. Add the thread name to the FE log to make tracing problems easier.
2. Add the agent_task_resend_wait_time_ms config to avoid sending duplicate agent tasks to BE.
3. Skip updating the replica version in FE when the new version is lower than the replica version.
2020-07-23 20:06:02 +08:00
fbf7bd6a1d [Bug] Change get load state interface (#4081)
Currently, the PathTrie matches the wrong interface between
/api/{db}/{table} and /api/{db}/{label}
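
A small Python sketch of why two such templates collide. This is not the PathTrie implementation, only an illustration; the helper `match_template` is hypothetical. When two templates have the same number of segments and differ only in placeholder names, every concrete path that matches one also matches the other, so the router cannot tell them apart.

```python
def match_template(template, path):
    """Return the captured parameters if `path` fits `template`, else None."""
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    if len(t_parts) != len(p_parts):
        return None
    params = {}
    for t_seg, p_seg in zip(t_parts, p_parts):
        if t_seg.startswith("{") and t_seg.endswith("}"):
            # A placeholder accepts any segment, whatever its name is.
            params[t_seg[1:-1]] = p_seg
        elif t_seg != p_seg:
            return None
    return params
```

Here `match_template("/api/{db}/{table}", "/api/db1/x")` and `match_template("/api/{db}/{label}", "/api/db1/x")` both succeed, which is the ambiguity this commit avoids by changing the interface.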
2020-07-20 15:51:27 +08:00
03cf9b2a24 [Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039)
Related issue #4017, main changes as follows:
1. Add expired_snapshot_rs_version_map and _expired_snapshot_rs_metas
2. Add VersionedRowsetTracker to record compacted path versions
3. Record the path version when rowsets are compacted
4. In the gc process, add expired snapshot rowsets to the unused set so they are removed.
2020-07-19 22:03:59 +08:00
78a1dea19d Support using B/K/KB/M/MB/G/GB/T/TB/P/PB as unit in session variable exec_mem_limit (#4063)
Support using B/K/KB/M/MB/G/GB/T/TB/P/PB as unit in session variable exec_mem_limit
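
A minimal Python sketch of parsing such a value into bytes, for illustration only. It is not the FE parser. It assumes each unit step is a factor of 1024, that a bare number means bytes, and that units are case-insensitive; the commit message states none of these. The function name `parse_mem_limit` is hypothetical.

```python
import re

# Assumption: binary multiples (1K = 1024 bytes). The short and long forms
# of a unit (for example G and GB) are treated as equivalent.
_UNIT_POWERS = {
    "B": 0, "K": 1, "KB": 1, "M": 2, "MB": 2, "G": 3, "GB": 3,
    "T": 4, "TB": 4, "P": 5, "PB": 5,
}

_PATTERN = re.compile(r"^\s*(\d+)\s*([A-Za-z]*)\s*$")


def parse_mem_limit(text):
    """Convert a string such as '2G' or '512MB' into a number of bytes."""
    m = _PATTERN.match(text)
    if m is None:
        raise ValueError("invalid memory limit: %r" % text)
    number, unit = m.groups()
    unit = unit.upper() or "B"  # assumption: no unit means bytes
    if unit not in _UNIT_POWERS:
        raise ValueError("unknown unit: %r" % unit)
    return int(number) * 1024 ** _UNIT_POWERS[unit]
```

Under these assumptions, `parse_mem_limit("2G")` and `parse_mem_limit("2GB")` both give 2147483648.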
2020-07-13 20:54:14 +08:00
5a27981e49 [Config] Add thrift_client_retry_interval_ms config in be for thrift client to avoid avalanche disaster in fe thrift server (#4022)
This PR mainly adds the `thrift_client_retry_interval_ms` config in BE for the thrift client,
to avoid an avalanche in the FE thrift server. It also fixes some typos and some RPC
setting problems.
2020-07-08 21:07:00 +08:00
c3d9feed75 [Load][Json] Refactor json load logic to make it more reasonable (#4020)
This CL mainly changes:

1. Reorganized the code logic to limit the supported json formats to two, so that the import behavior is more consistent.
2. Modified the statistical behavior of the number of error rows when loading in json format, so that the error rows can be counted correctly.
3. See `load-json-format.md` to get details of loading json format.
2020-07-07 23:07:28 +08:00
913b2caac4 [Dynamic Partition]Support set replication number (#3965)
This CL mainly supports setting the replication_num property in dynamic partition
tables. If dynamic_partition.replication_num is not set, the value is the
same as the table's default replication_num.
2020-07-05 16:28:38 +08:00
7351f7c237 [Config] Allow user to config different thrift server model (#3986)
Doris only supports the TThreadPoolServer model in the thrift server, but this
server model is not effective in some high concurrency scenarios, so this
PR introduces a new config that lets users choose a different server model
for their scenario.
Add new FE config: `thrift_server_type`
2020-07-05 16:24:29 +08:00
48398232e7 [Bug] Fix bug that default_rowset_type has a session variable (#3953)
This PR mainly fixes the bug that `default_rowset_type` has a session variable
2020-06-29 19:16:42 +08:00
b2b9e22b24 [CreateTable] Check backend disk has available capacity by storage medium before create table (#3519)
Currently we choose BEs randomly without checking whether their disks are available,
so create table does not fail until the create tablet task is sent to a BE
and the BE checks whether it has available capacity to create the tablet.
Checking backend disk availability by storage medium up front reduces unnecessary RPC calls.
2020-06-28 09:36:31 +08:00
6c768f5303 [Doc] Add doc of enable_materialized_view (#3940) 2020-06-25 16:26:18 +08:00
46c64f0861 [Bug] Enable to get TCP metrics for linux kernel 2.x (#3921)
Fix #3920 

CL:
1. Parse the TCP metrics header in `/proc/net/snmp` to get the right position of the metrics.
2. Add 2 new metrics: `tcp_in_segs` and `tcp_out_segs`
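
A minimal Python sketch of the header-based parsing idea in point 1, for illustration only. It is not the BE code. In `/proc/net/snmp` each protocol has a line of field names followed by a line of values, so looking a field up by name keeps working when a kernel version has a different set of columns. The sample values below are made up, and mapping `tcp_in_segs` and `tcp_out_segs` to the `InSegs` and `OutSegs` fields is an assumption.

```python
def parse_tcp_snmp(text):
    """Map each Tcp field name to its value using the header line's positions."""
    rows = [line.split() for line in text.splitlines() if line.startswith("Tcp:")]
    if len(rows) < 2:
        raise ValueError("expected a Tcp header line and a Tcp value line")
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


# Illustrative content only; real files contain more protocols and fields.
SAMPLE = """\
Ip: Forwarding DefaultTTL
Ip: 1 64
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 10 20 0 1 5 1000 900 3 0 2
"""
```

With the sample text, `parse_tcp_snmp(SAMPLE)["InSegs"]` is 1000 and `parse_tcp_snmp(SAMPLE)["OutSegs"]` is 900, independent of which column they appear in.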
2020-06-24 21:29:07 +08:00
66a8383ac0 [Running_Profile] Fix all counter in DataStreamRecv and change the image path in docs (#3858) 2020-06-22 09:20:22 +08:00
5d40218ae6 [Config] Support max_stream_load_timeout_second config in fe (#3902)
This configuration is specifically used to limit timeout setting for stream load.
It is to prevent that failed stream load transactions cannot be canceled within
a short time because of the user's large timeout setting.
2020-06-19 17:09:27 +08:00
5a253bc2c6 [BE][Tool] Add segment v2 footer meta viewer (#3822)
Add segment v2 footer meta viewer tool
2020-06-19 09:32:11 +08:00
e9f7576b9d [Enhancement] make metrics api more clear (#3891) 2020-06-17 12:17:54 +08:00
6c4d7c60dd [Feature] Add QueryDetail to store query statistics. (#3744)
1. Store the query statistics in memory.
2. Support a RESTful interface to get the statistics.
2020-06-15 18:16:54 +08:00
2211cb0ee0 [Metrics] Add metrics document and 2 new metrics of TCP (#3835) 2020-06-15 09:48:09 +08:00
3c09e1e1d8 [trace] Adapt trace util to compaction module (#3814)
Trace util is helpful for diagnosing compaction performance problems.
We can get a trace log for base compaction like:
```
W0610 11:26:33.804431 56452 storage_engine.cpp:552] Trace:
0610 11:23:03.727535 (+     0us) storage_engine.cpp:554] start to perform base compaction
0610 11:23:03.728961 (+  1426us) storage_engine.cpp:560] found best tablet 546859
0610 11:23:03.728963 (+     2us) base_compaction.cpp:40] got base compaction lock
0610 11:23:03.729029 (+    66us) base_compaction.cpp:44] rowsets picked
0610 11:24:51.784439 (+108055410us) compaction.cpp:46] got concurrency lock and start to do compaction
0610 11:24:51.784818 (+   379us) compaction.cpp:74] prepare finished
0610 11:26:33.359265 (+101574447us) compaction.cpp:87] merge rowsets finished
0610 11:26:33.484481 (+125216us) compaction.cpp:102] output rowset built
0610 11:26:33.484482 (+     1us) compaction.cpp:106] check correctness finished
0610 11:26:33.513197 (+ 28715us) compaction.cpp:110] modify rowsets finished
0610 11:26:33.513300 (+   103us) base_compaction.cpp:49] compaction finished
0610 11:26:33.513441 (+   141us) base_compaction.cpp:56] unused rowsets have been moved to GC queue
Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"input_rowsets_data_size":1256413170,"input_segments_num":44,"merge_rowsets_latency_us":101574444,"merged_rows":0,"output_row_num":3346807,"output_rowset_data_size":1228439659,"output_segments_num":6}
```
and for cumulative compaction like:
```
W0610 11:14:18.714366 56468 storage_engine.cpp:518] Trace:
0610 11:14:08.068484 (+     0us) storage_engine.cpp:520] start to perform cumulative compaction
0610 11:14:08.069844 (+  1360us) storage_engine.cpp:526] found best tablet 547083
0610 11:14:08.069846 (+     2us) cumulative_compaction.cpp:42] got cumulative compaction lock
0610 11:14:08.069947 (+   101us) cumulative_compaction.cpp:46] calculated cumulative point
0610 11:14:08.070141 (+   194us) cumulative_compaction.cpp:50] rowsets picked
0610 11:14:08.070143 (+     2us) compaction.cpp:46] got concurrency lock and start to do compaction
0610 11:14:08.070518 (+   375us) compaction.cpp:74] prepare finished
0610 11:14:15.389893 (+7319375us) compaction.cpp:87] merge rowsets finished
0610 11:14:15.390916 (+  1023us) compaction.cpp:102] output rowset built
0610 11:14:15.390917 (+     1us) compaction.cpp:106] check correctness finished
0610 11:14:15.409460 (+ 18543us) compaction.cpp:110] modify rowsets finished
0610 11:14:15.409496 (+    36us) cumulative_compaction.cpp:55] compaction finished
0610 11:14:15.410138 (+   642us) cumulative_compaction.cpp:65] unused rowsets have been moved to GC queue
Metrics: {"filtered_rows":0,"input_row_num":136707,"input_rowsets_count":302,"input_rowsets_data_size":76617836,"input_segments_num":302,"merge_rowsets_latency_us":7319372,"merged_rows":0,"output_row_num":136707,"output_rowset_data_size":53893280,"output_segments_num":1}
```
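
A short Python sketch of how such a trace can be read programmatically, for illustration only. It is not part of the trace util. Each entry carries the time elapsed since the previous entry in the form `(+ Nus)`, so the entry with the largest delta marks the end of the slowest interval. The function names and the regular expression are assumptions based on the log lines shown above.

```python
import re

# date, time, "(+ <delta>us)", "<file>:<line>]", message
_ENTRY = re.compile(r"^\d{4} [\d:.]+ \(\+\s*(\d+)us\) (\S+)\] (.*)$")


def parse_trace(text):
    """Return (delta_us, location, message) for every trace entry line."""
    entries = []
    for line in text.splitlines():
        m = _ENTRY.match(line.strip())
        if m:
            entries.append((int(m.group(1)), m.group(2), m.group(3)))
    return entries


def slowest_entry(text):
    """Return the entry that closed the longest interval."""
    return max(parse_trace(text), key=lambda entry: entry[0])


# The first lines of the base compaction trace above.
SAMPLE = """\
W0610 11:26:33.804431 56452 storage_engine.cpp:552] Trace:
0610 11:23:03.727535 (+     0us) storage_engine.cpp:554] start to perform base compaction
0610 11:23:03.728961 (+  1426us) storage_engine.cpp:560] found best tablet 546859
0610 11:23:03.728963 (+     2us) base_compaction.cpp:40] got base compaction lock
0610 11:23:03.729029 (+    66us) base_compaction.cpp:44] rowsets picked
0610 11:24:51.784439 (+108055410us) compaction.cpp:46] got concurrency lock and start to do compaction
0610 11:24:51.784818 (+   379us) compaction.cpp:74] prepare finished
0610 11:26:33.359265 (+101574447us) compaction.cpp:87] merge rowsets finished
"""
```

For this sample, `slowest_entry(SAMPLE)` returns the `compaction.cpp:46` entry, which suggests that about 108 seconds were spent waiting for the concurrency lock, slightly more than the roughly 102 seconds spent merging rowsets.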
2020-06-13 19:31:51 +08:00
414a0a35e5 [Dynamic Partition] Use ZonedDateTime to support set timezone (#3799)
This CL mainly supports time zones in dynamic partition:
1. Use the new Java Time API to replace Calendar.
2. Support setting the time zone in dynamic partition parameters.
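
A minimal Python sketch of why the time zone matters for partition boundaries, for illustration only. The FE change itself uses Java's ZonedDateTime. The partition name format (`p` followed by `yyyyMMdd`) and the helper `day_partition_name` are assumptions.

```python
from datetime import datetime, timedelta, timezone


def day_partition_name(instant, utc_offset_hours, prefix="p"):
    """Name of the day partition that `instant` falls into at a given UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return prefix + instant.astimezone(tz).strftime("%Y%m%d")


# 18:00 UTC on 2020-06-13 is already 02:00 on 2020-06-14 at UTC+8.
INSTANT = datetime(2020, 6, 13, 18, 0, tzinfo=timezone.utc)
```

The same instant maps to `p20200613` at UTC+0 but to `p20200614` at UTC+8, so the configured time zone decides which day partition must exist when the data arrives.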
2020-06-13 16:27:09 +08:00