doris

Author	SHA1	Message	Date
caoyang10	03d9f6d8b4	[Feature] support hour time unit with dynamic parition (#4514 ) Many tables are so large that need seperate partitions with "HOUR" time unit. But now dynamic partition doesn't support "HOUR" time unit and it was marked as "TODO". So I support the feature and it works.	2020-09-06 20:25:27 +08:00
HappenLee	a64c3a7acd	[ODBC SCAN NODE] 3/4 Add ODBC_TABLE and ODBC_SCAN NODE in FE. (#4430 ) we can create odbc_table use SQL like ``` CREATE EXTERNAL TABLE `baseall_oracle` ( `k1` decimal(9, 3) NOT NULL COMMENT "", `k2` char(10) NOT NULL COMMENT "", `k3` datetime NOT NULL COMMENT "", `k5` varchar(20) NOT NULL COMMENT "", `k6` double NOT NULL COMMENT "" ) ENGINE=ODBC PROPERTIES ( "host" = "192.168.0.1", "port" = "8086", "user" = "happenlee", "password" = "doris", "database" = "doris", "table" = "baseall", "driver" = "Oracle 19 ODBC driver", "type" = "oracle" ); ``` Now we only support Oracle and MySQL Database and this feature default turned off by conf enable_odbc_table.	2020-09-04 09:30:01 +08:00
Zhengguo Yang	ac3bbdd3ab	[BatchDelete] Add a configuration indicating whether to enable the batch delete function (#4493 )	2020-09-03 16:56:37 +08:00
xy720	f207036cad	[Spark load][Document] Add docs about spark and yarn client for spark load (#4489 ) Add docs about spark and yarn client for spark load	2020-09-02 10:52:49 +08:00
ZhangYu0123	1d93ba027a	[Compaction] Compaction show policy type and disk format (#4466 ) Add more information in compaction show api 1、add cumulative policy type 2、format rowset total disk size	2020-08-30 21:09:47 +08:00
Zhengguo Yang	174c9f89ea	[DOCS] Add batch delete docs (#4435 ) update documents for batch delete #4051	2020-08-28 09:24:07 +08:00
HangyuanLiu	ad738fa198	Add OLAP_ERR_DATE_QUALITY_ERR error status to display schema change failure (#4388 ) In the process of historical data transformation of materialized views, it may occur that the transformation fails due to data quality. Add an error status code ：OLAP_ERR_DATE_QUALITY_ERR to determine if a data problem is causing the failure #3344	2020-08-27 17:52:53 +08:00
Mingyu Chen	8b0b120aca	[Profile] Add 2 Segment related metrics in query profile (#4348 ) Total number of segments and filterd number of segment	2020-08-27 12:07:21 +08:00
EmmyMiao87	b4d8b3d9ba	Forbidden the illegal column types on BITMAP_UNION OR HLL_UNION mv (#4432 ) 1. The base column of bitmap_union could must be integer. The largeint is not supported too. 2. The base column of hll_union could not be decimal. Check error msg of const expr in Union Node If user wants to insert a negative number into bitmap mv, Doris will thrown exception 'invalid input'. The const value in Union Node is checked in this commit.	2020-08-26 10:49:32 +08:00
caiconghui	1410d4e623	[Doc] Add in predicate support content in delete-manual.md (#4404 ) Add in predicate support content in delete-manual.md	2020-08-24 21:52:28 +08:00
Mingyu Chen	976820ba20	[SegmentV2] Change the default storage format to SegmentV2 (#4387 ) Since the Segment V2 has been released for a long time, we should make it as default storage format for newly created table. This CL mainly changes: 1. For all newly created tables, their default storage format is Segment V2. 2. For all already exist tablets, their storage format remain unchanged. 3. Fix bugs described in Fix #4384 and Fix #4385	2020-08-24 21:51:17 +08:00
EmmyMiao87	04a75b7c28	[Doc] Fix spelling errors in dynamic partition docs (#4395 ) Change-Id: I84de1602b99c6b89b59ccc5869c96516c40a181d	2020-08-20 09:31:33 +08:00
Mingyu Chen	4c571cb6f5	Revert "[Metrics] Support tablet level metrics (#4327 )" (#4397 ) This reverts commit 56260a65c87830ffe34109195ee4d6f1d543e630. Co-authored-by: morningman <chenmingyu@baidu.com>	2020-08-19 22:37:52 +08:00
ZhangYu0123	dc3ed1c525	[Compaction]Compaction rules optimization (#4212 ) Compaction rules optimization, the detail problem description and design to see #4164. This pr commits 2 functions: (1) add the cumulative policy configable, and implement original policy. (2) implement universal policy, the optimization version in #4164.	2020-08-19 09:34:13 +08:00
Yingchun Lai	56260a65c8	[Metrics] Support tablet level metrics (#4327 ) Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet, but we have no insight about tablets in the cluster. This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `. However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request, and not return tablet level metrics by default.	2020-08-18 16:56:12 +08:00
Mingyu Chen	3359467b9a	[Tablet][Recovery] Support using empty tablet to repair the damaged or missing tablet (#4255 ) In some very special circumstances, such as code bugs, or human misoperation, etc., all replicas of some tablets may be lost. In this case, the data has been substantially lost. However, in some scenarios, the business still hopes to ensure that the query will not report errors even if there is data loss, and reduce the perception of the user layer. At this point, we can use the blank Tablet to fill the missing replica to ensure that the query can be executed normally. Add a new FE config `recover_with_empty_tablet`. default is false. true means to use empty tablet to fill the missing one. Also fix a bug in Fix #4274	2020-08-18 06:13:53 +00:00
ZhangYu0123	d6028863f3	[Compaction] Manually trigger compaction RESTapi interface (#4312 ) Add restapi to be which do compaction task by manual trigger. The detail design in #4311 .	2020-08-13 23:41:46 +08:00
Mingyu Chen	05fa55047e	[Doc][Json Load] Improve json data format load documents (#4337 ) And some detail explaination of JsonPath and Columns parameter	2020-08-13 23:39:57 +08:00
ZhangYu0123	1d9b3aeee7	[Doc] Repair document format (#4336 ) The error format '##keyword' in a lot of docs. This pr is to repair document format. #4335	2020-08-13 23:39:41 +08:00
weizuo93	d655b271b8	[Feature][Web] Add new feature to list all tablets on a particular BE (#4268 ) A new feature has been added to acquire tablet id and schema hash of all the tablets on a particular BE node via Web page，so that more detailed information of each tablet can be obtained according to these tablet id and schema hash. In accordance with different web request, there are two ways (table and json)to show these acquired tablet id and schema hash on Web page.	2020-08-12 20:55:19 +08:00
caiconghui	eefad13107	[Feature] Support InPredicate in delete statement (#4006 ) This PR is to add inPredicate support to delete statement, and add max_allowed_in_element_num_of_delete variable to limit element num of InPredicate in delete statement.	2020-08-06 23:19:40 +08:00
EmmyMiao87	5ba4b024e7	[Docs] Add Materialized view manual (#4229 ) Add usage manual of materialized view in Chinese and English	2020-08-06 23:18:06 +08:00
Mingyu Chen	237c0807a4	[RoutineLoad] Support modify routine load job (#4158 ) Support ALTER ROUTINE LOAD JOB stmt, for example: ``` alter routine load db1.label1 properties ( "desired_concurrent_number"="3", "max_batch_interval" = "5", "max_batch_rows" = "300000", "max_batch_size" = "209715200", "strict_mode" = "false", "timezone" = "+08:00" ) ``` Details can be found in `alter-routine-load.md`	2020-08-06 23:11:02 +08:00
Mingyu Chen	3f31866169	[Bug][Load][Json] #4124 Load json format with stream load failed (#4217 ) Stream load should read all the data completely before parsing the json. And also add a new BE config streaming_load_max_batch_read_mb to limit the data size when loading json data. Fix the bug of loading empty json array [] Add doc to explain some certain case of loading json format data. Fix: #4124	2020-08-04 12:55:53 +08:00
Lijia Liu	bdaef84a10	[FE] [HttpServer] Config netty param in HttpServer (#4225 ) Now, if the length of URL is longer than 4096 bytes, netty will refuse. The case can be reproduced by constructing a very long URL(longer than 4096bytes) Add 2 http server params: 1. http_max_line_length 2. http_max_header_size	2020-08-01 17:59:01 +08:00
HaiBo Li	b4cb8fb9b2	[Feature][Cache]Add interface, metric, variable and config for query cache (#4159 )	2020-07-30 11:24:20 +08:00
Mingyu Chen	8a169981cf	[Bug][TabletRepair] Fix bug that too many replicas generated when decommission BE (#4148 ) Try to select the BE with an existing replicas as the destination BE for REPLICA_RELOCATING clone task. Fix #4147 Also add 2 new FE configs `max_clone_task_timeout_sec` and `min_clone_task_timeout_sec`	2020-07-30 09:46:33 +08:00
wutiangan	59676a1117	[BUG] fix 4149, add sessionVariable to choose broadcastjoin first when cardinality cannot be estimated (#4150 )	2020-07-29 12:28:52 +08:00
caiconghui	4608f9786e	Support checking database used data quota when data load job begin a new txn (#3955 ) Now, we only check database used data quota when create or alter table, or in some old type load job, but not for routine load job and stream load job. This PR provide a uniform solution to check db used data quota when data load job begin a new txn.	2020-07-24 10:03:43 +08:00
HuangWei	a01d1aec56	[Compaction] track RowsetReader's mem & add metric (#4068 ) Ref https://github.com/apache/incubator-doris/issues/3624#issuecomment-655933244 Only RowsetReaders in compaction are under the track. Other RowsetReaders won't be effected, because the parent_tracker is nullptr.	2020-07-24 07:58:09 +08:00
caiconghui	2334f5d997	Fix some problem related with publish version task (#4089 ) This PR is mainly do following three things: 1. Add thread name in fe log to make trace problem more easy. 2. Add agent_task_resend_wait_time_ms config to escape sending duplicate agent task to be. 3. Skip to continue to update replica version when new version is lower than replica version in fe.	2020-07-23 20:06:02 +08:00
lichaoyong	fbf7bd6a1d	[Bug] Change get load state interface (#4081 ) Now, the PathTrie will match wrong interface between /api/{db}/{table} and /api/{db}/{label}	2020-07-20 15:51:27 +08:00
ZhangYu0123	03cf9b2a24	[Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039 ) Related issue #4017, main changes as follows: 1. Add expired_snapshot_rs_version_map，_expired_snapshot_rs_metas， 2. Add VersionedRowsetTracker record compacted path version 3. Record path version when rowsets compact 4. In gc process, add expired snapshot rowsets to unused set to remove.	2020-07-19 22:03:59 +08:00
Zhengguo Yang	78a1dea19d	Support using B/K/KB/M/MB/G/GB/T/TB/P/PB as unit in session variable exec_mem_limit (#4063 ) Support using B/K/KB/M/MB/G/GB/T/TB/P/PB as unit in session variable exec_mem_limit	2020-07-13 20:54:14 +08:00
caiconghui	5a27981e49	[Config] Add thrift_client_retry_interval_ms config in be for thrift client to avoid avalanche disaster in fe thrift server (#4022 ) This PR is mainly to add `thrift_client_retry_interval_ms` config in be for thrift client to avoid avalanche disaster in fe thrift server and fix some typo and some rpc setting problems at the same time.	2020-07-08 21:07:00 +08:00
Mingyu Chen	c3d9feed75	[Load][Json] Refactor json load logic to make it more reasonable (#4020 ) This CL mainly changes: 1. Reorganized the code logic to limit the supported json format to two, and the import behavior is more consistent. 2. Modified the statistical behavior of the number of error rows when loading in json format, so that the error rows can be counted correctly. 3. See `load-json-format.md` to get details of loading json format.	2020-07-07 23:07:28 +08:00
WingC	913b2caac4	[Dynamic Partition]Support set replication number (#3965 ) This CL mainly support set replication_num property in dynamic partition table if dynamic_partition.replication_num is not set, the value is the same as table's default replication_num.	2020-07-05 16:28:38 +08:00
WingC	7351f7c237	[Config]Allower use to config different thrift server model (#3986 ) Doris only support TThreadPoolServer model in thrift server, but the server model is not effective in some high concurrency scenario, so this PR introduced new config to allow user to choose different server model by their scenario. Add new FE config: `thrift_server_type`	2020-07-05 16:24:29 +08:00
caiconghui	48398232e7	[Bug] Fix bug that default_rowset_type have a session variable (#3953 ) This PR is mainly for fixing bug that `default_rowset_type` have a session variable	2020-06-29 19:16:42 +08:00
WingC	b2b9e22b24	[CreateTable] Check backend disk has available capacity by storage medium before create table (#3519 ) Currently we choose BE random without check disk is available, the create table will failed until create tablet task is sent to BE and BE will check is there has available capacity to create tablet. So check backend disk available by storage medium will reduce unnecessary RPC call.	2020-06-28 09:36:31 +08:00
EmmyMiao87	6c768f5303	[Doc] Add doc of `enable_materialized_view` (#3940 )	2020-06-25 16:26:18 +08:00
Mingyu Chen	46c64f0861	[Bug] Enable to get TCP metrics for linux kernel 2.x (#3921 ) Fix #3920 CL: 1. Parse the TCP metrics header in `/proc/net/snmp` to get the right position of the metrics. 2. Add 2 new metrics: `tcp_in_segs` and `tcp_out_segs`	2020-06-24 21:29:07 +08:00
HappenLee	66a8383ac0	[Running_Profile] Fix all counter in DataStreamRecv and change the image path in docs (#3858 )	2020-06-22 09:20:22 +08:00
caiconghui	5d40218ae6	[Config] Support max_stream_load_timeout_second config in fe (#3902 ) This configuration is specifically used to limit timeout setting for stream load. It is to prevent that failed stream load transactions cannot be canceled within a short time because of the user's large timeout setting.	2020-06-19 17:09:27 +08:00
yangzhg	5a253bc2c6	[BE][Tool] Add segment v2 footer meta viewer (#3822 ) Add segment v2 footer meta viewer tool	2020-06-19 09:32:11 +08:00
lichaoyong	e9f7576b9d	[Enhancement] make metrics api more clear (#3891 )	2020-06-17 12:17:54 +08:00
lichaoyong	6c4d7c60dd	[Feature] Add QueryDetail to store query statistics. (#3744 ) 1. Store the query statistics in memory. 2. Supporting RESTFUL interface to get the statistics.	2020-06-15 18:16:54 +08:00
Mingyu Chen	2211cb0ee0	[Metrics] Add metrics document and 2 new metrics of TCP (#3835 )	2020-06-15 09:48:09 +08:00
Yingchun Lai	3c09e1e1d8	[trace] Adapt trace util to compaction module (#3814 ) Trace util is helpful for diagnosing compaction performance problems, we can get trace log for base compaction like: ``` W0610 11:26:33.804431 56452 storage_engine.cpp:552] Trace: 0610 11:23:03.727535 (+ 0us) storage_engine.cpp:554] start to perform base compaction 0610 11:23:03.728961 (+ 1426us) storage_engine.cpp:560] found best tablet 546859 0610 11:23:03.728963 (+ 2us) base_compaction.cpp:40] got base compaction lock 0610 11:23:03.729029 (+ 66us) base_compaction.cpp:44] rowsets picked 0610 11:24:51.784439 (+108055410us) compaction.cpp:46] got concurrency lock and start to do compaction 0610 11:24:51.784818 (+ 379us) compaction.cpp:74] prepare finished 0610 11:26:33.359265 (+101574447us) compaction.cpp:87] merge rowsets finished 0610 11:26:33.484481 (+125216us) compaction.cpp:102] output rowset built 0610 11:26:33.484482 (+ 1us) compaction.cpp:106] check correctness finished 0610 11:26:33.513197 (+ 28715us) compaction.cpp:110] modify rowsets finished 0610 11:26:33.513300 (+ 103us) base_compaction.cpp:49] compaction finished 0610 11:26:33.513441 (+ 141us) base_compaction.cpp:56] unused rowsets have been moved to GC queue Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"input_rowsets_data_size":1256413170,"input_segments_num":44,"merge_rowsets_latency_us":101574444,"merged_rows":0,"output_row_num":3346807,"output_rowset_data_size":1228439659,"output_segments_num":6} ``` for cumulative compaction like: ``` W0610 11:14:18.714366 56468 storage_engine.cpp:518] Trace: 0610 11:14:08.068484 (+ 0us) storage_engine.cpp:520] start to perform cumulative compaction 0610 11:14:08.069844 (+ 1360us) storage_engine.cpp:526] found best tablet 547083 0610 11:14:08.069846 (+ 2us) cumulative_compaction.cpp:42] got cumulative compaction lock 0610 11:14:08.069947 (+ 101us) cumulative_compaction.cpp:46] calculated cumulative point 0610 11:14:08.070141 (+ 194us) cumulative_compaction.cpp:50] rowsets picked 0610 11:14:08.070143 (+ 2us) compaction.cpp:46] got concurrency lock and start to do compaction 0610 11:14:08.070518 (+ 375us) compaction.cpp:74] prepare finished 0610 11:14:15.389893 (+7319375us) compaction.cpp:87] merge rowsets finished 0610 11:14:15.390916 (+ 1023us) compaction.cpp:102] output rowset built 0610 11:14:15.390917 (+ 1us) compaction.cpp:106] check correctness finished 0610 11:14:15.409460 (+ 18543us) compaction.cpp:110] modify rowsets finished 0610 11:14:15.409496 (+ 36us) cumulative_compaction.cpp:55] compaction finished 0610 11:14:15.410138 (+ 642us) cumulative_compaction.cpp:65] unused rowsets have been moved to GC queue Metrics: {"filtered_rows":0,"input_row_num":136707,"input_rowsets_count":302,"input_rowsets_data_size":76617836,"input_segments_num":302,"merge_rowsets_latency_us":7319372,"merged_rows":0,"output_row_num":136707,"output_rowset_data_size":53893280,"output_segments_num":1} ```	2020-06-13 19:31:51 +08:00
WingC	414a0a35e5	[Dynamic Partition] Use ZonedDateTime to support set timezone (#3799 ) This CL mainly support timezone in dynamic partition: 1. use new Java Time API to replace Calendar. 2. support set time zone in dynamic partition parameters.	2020-06-13 16:27:09 +08:00

1 2

71 Commits