Commit Graph

108 Commits

Author SHA1 Message Date
c9b2f4cb92 [workload](pipeline) Add cgroup cpu controller (#24052) 2023-09-21 21:49:33 +08:00
8eb14eec7c [enhancement](baddisk) record bad disk in be_custom.conf to handle (#24639) 2023-09-21 18:31:58 +08:00
36c9366a8b [improve](tablet schema) add config to modify tablet schema recycle interval (#24602) 2023-09-21 15:57:11 +08:00
4eb09ce1b2 [enhancement](config) do not abort when a disk is broken (#24692) 2023-09-21 15:21:42 +08:00
00b994fea2 [chore](exception) Add config item 'exit_on_exception' (#24529) 2023-09-21 14:51:05 +08:00
f2f591e280 [fix](memory) Optimize memory exceed limit logs (#22655)
After memory exceeds the limit, print the top 15 task trackers with the largest memory usage.
After memory exceeds the limit, print more detailed GC logs in stages.
Fix the large memory check.
2023-09-21 10:38:17 +08:00
1405b7ca82 [improve](scan) support lower the thread priority of scan thread (#24526)
This configuration item lowers the priority of scanner threads, typically
to ensure CPU scheduling for write operations.
2023-09-20 17:00:24 +08:00
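For illustration, a minimal be.conf sketch of such a setting. The config name scan_thread_nice_value is an assumption; the exact item added by #24526 is not shown in this log.

```
# Lower scan thread priority (a higher nice value means lower priority) so that
# write operations are scheduled first. Config name assumed, value illustrative.
scan_thread_nice_value = 5
```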
fc12362a6d [feature-wip](arrow-flight)(step2) FE support Arrow Flight server (#24314)
This is a POC; the design documentation will be updated soon.
2023-09-20 14:42:54 +08:00
a2e29d171a [enhancement](be-meta) sync rocksdb by default to protect data (#24571)
If the user's disk performance is low, the config can be changed to false;
this way, users know what would happen in the event of a kernel panic.
2023-09-20 11:41:26 +08:00
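A minimal be.conf sketch, assuming the switch involved is sync_tablet_meta (the config name is inferred from the commit description, not confirmed by this log):

```
# Trade durability for speed on slow disks: skip syncing rocksdb tablet meta on
# every write. If the kernel panics, recent meta writes may be lost. Name assumed.
sync_tablet_meta = false
```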
7fd72351f9 [fix](agg) window_funnel compatibility issue with multi backends (#24385) 2023-09-15 17:22:47 +08:00
927de33166 [config](log) disable StreamLoad log default and enable in regression pipeline (#24354)
2023-09-14 20:47:26 +08:00
86aa3802cf [log](config) set streamload record default to enable 2023-09-13 16:32:30 +08:00
563c3f75ff [feature](move-memtable) share delta writer v2 among sinks (#24066) 2023-09-13 14:39:29 +08:00
d8ef9dda59 [feature](merge-cloud) Rewrite FS interface (#23953) 2023-09-12 19:20:25 +08:00
71645a391c [debug](FileCache) fail over to remote file reader if local cache failed (#24097)
Fail over to the remote file reader if the local file cache fails. This increases the robustness of the file cache.
2023-09-10 12:26:17 +08:00
82dc970916 [feature](insert) Support group commit insert (#22829) 2023-09-08 15:51:03 +08:00
2f8b075b71 [improvement](bitmap) support version for ser/deser of bitmap (#23959) 2023-09-07 09:55:29 +08:00
09bcedb116 [feature](merge-cloud) Remove deprecated old cache (#23881)
* Remove deprecated old cache
2023-09-06 08:07:05 +08:00
801ddc0313 [feature-wip](arrow-flight) BE does not start Arrow Flight Service by default (#23901) 2023-09-05 14:48:29 +08:00
1d1a9e2bfc [improvement](graceful shutdown) wait for all queries to finish during graceful shutdown (#23865)
In some cloud-native deployment scenarios, BEs (especially Compute Node BEs) are added to and removed from the cluster very frequently. A user's query will fail if one of its fragments is running on a BE that is shutting down. With stop_be.sh --grace, the BE waits for all running queries to finish to avoid failing them; if the waiting time exceeds the limit, the BE exits directly. During this period, the FE does not send any new queries to the BE.
2023-09-05 09:52:28 +08:00
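A usage sketch of the graceful path described above; stop_be.sh --grace comes from the commit message, while the install path is an assumption:

```
# Ask BE to drain: it waits for running queries to finish (up to the limit), then exits.
cd /opt/doris/be    # assumed install path
sh bin/stop_be.sh --grace
```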
039c76cbc0 [feature-wip] (arrow-flight) (step1) BE support Arrow Flight server, read data only (#23765) 2023-09-04 19:19:55 +08:00
a6dff2faf0 [Feature](config) allow update multiple be configs in one request (#23702) 2023-09-02 14:26:54 +08:00
18d470ecf7 [improvement](config) add a specific be config for segment_cache_capacity (#23701)
* add segment_cache_capacity config instead of fd limit * 2/5
* default -1 for backward compatibility
2023-09-02 01:14:14 +08:00
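A be.conf sketch of the two modes described above; the explicit value is illustrative only:

```
# -1 (default) keeps the backward-compatible sizing derived from the fd limit * 2/5;
# an explicit value caps the segment cache directly (100000 is an illustrative number).
segment_cache_capacity = 100000
```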
c31cb5fd11 [enhance] use correct default value for show config action (#19284) 2023-09-01 11:28:26 +08:00
f1e43fcaa4 [opt](cache) Support segment cache dynamic opening and closing (#23659)
Dynamically modify the config to clear the cache; each time the cache is disabled, it is cleared only once.
TODO: support page cache and other caches.

curl -X POST http://xxxx:8040/api/update_config?disable_segment_cache=true
2023-08-31 18:48:26 +08:00
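To turn the segment cache back on, the same endpoint can be called with the flag reversed. A usage sketch; the persist parameter is an assumption about the update_config API, not confirmed by this log:

```
# Re-enable the segment cache; persist=true (to also write be_custom.conf) is assumed.
curl -X POST "http://xxxx:8040/api/update_config?disable_segment_cache=false&persist=true"
```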
c083336bbe [Improvement](pipeline) Cancel outdated query if original fe restarts (#23582)
If an FE restarts, queries emitted from that FE will be cancelled.

Implementation of #23704
2023-08-31 17:58:52 +08:00
449c595f9d [opt](FileReader) InMemoryReader is only used in s3 (#23486)
If the file size is < 8MB, the file will be read into memory; this idea comes from https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md#s3inmemoryinputstream. However, in some cases we only read one or two columns in a file, and the bytes actually required are only about 1%, resulting in a multi-fold increase in the amount of data read. Therefore, `InMemoryReader` should only be used on object storage, and the threshold is reduced.
2023-08-30 20:43:39 +08:00
05771e8a14 [Enhancement](Load) stream Load using SQL (#23362)
Using stream load in SQL mode

For example, with example.csv:

```
10000,北京
10001,天津
```

```
curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select c1,c2 from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
curl -v --location-trusted -u root: -H "sql: insert into test.t2(c1, c2, c3) select c1,c2, 'aaa' from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
curl -v --location-trusted -u root: -H "sql: insert into test.t3(c1, c2) select c1, count(1) from stream(\"format\" = \"CSV\", \"column_separator\" = \",\") group by c1" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
```
2023-08-30 19:02:48 +08:00
82a4f114e4 [improvement](compaction) add an option to delete stale rowsets based on _stale_rs_metas size when doing compaction (#23448) 2023-08-29 17:40:37 +08:00
bc020112fc [enhancement](routineload) add debug conf and set broker.name.ttl = 0 (#23302)
* set broker.name.ttl = 0

* add debug config for librdkafka
2023-08-26 10:56:35 +08:00
0a70cbfe99 [feature](move-memtable)[5/7] add olap table sink v2 and writers (#23458)
Co-authored-by: laihui <1353307710@qq.com>
2023-08-25 10:20:06 +08:00
71071ba057 [feature](move-memtable)[4/7] add stream sink file writer (#23416)
Co-authored-by: laihui <1353307710@qq.com>
2023-08-25 00:08:27 +08:00
37b49f60b7 [refactor](conf) add be conf for partition topn partitions threshold (#23220)
2023-08-21 10:52:41 +08:00
cd6453434b [Enhancement](merge-on-write) add correctness check for the calculation of delete bitmap (#22282)
Currently, for merge-on-write unique tables, the delete bitmap of a rowset is calculated during the flush phase, commit phase, and publish phase. In this PR, we add a special mark to every rowset considered when calculating the delete bitmap in these three phases. Before finally merging the delete bitmap into the table meta's delete bitmap, we check whether all rowsets contain the special mark, i.e. whether all rowsets were considered during the above three phases.
Because an executor cannot fail in the publish phase once the coordinator has received successful commit info from all executors, we only print logs if this correctness check fails rather than reporting a failure.
2023-08-11 21:12:35 +08:00
71807ceb5f [Enhancement](tvf) Table value function support reading local file (#17404)
I tested the local TVF with TPC-H queries. First, generate the `lineitem` dataset with 6,001,215 rows and load it into the `lineitem` table:
```
insert into lineitem select c11, c1, c4, c2, c3, c5, c6, c7, c8, c9, c10, c12, c13, c14, c15, c16 
from local(
        "file_path" = "tools/tpch-tools/bin/tpch-data/lineitem.tbl.1", 
        "backend_id" = "10003", 
        "format" = "csv", 
        "column_separator" = "|"
);
```
Then run the `q1` and `q16` TPC-H queries; the query results are correct.

It can also analyze the BE's log directly, for example:

```
mysql> select * from local(
        "file_path" = "log/be.out",
        "backend_id" = "10006",
        "format" = "csv")
       where c1 like "%start_time%" limit 10;
+--------------------------------------------------------+
| c1                                                     |
+--------------------------------------------------------+
| start time: 2023年 08月 07日 星期一 23:20:32 CST       |
| start time: 2023年 08月 07日 星期一 23:32:10 CST       |
| start time: 2023年 08月 08日 星期二 00:20:50 CST       |
| start time: 2023年 08月 08日 星期二 00:29:15 CST       |
+--------------------------------------------------------+
```
2023-08-10 20:07:42 +08:00
94d563f04d [improvement](garbage sweep) garbage sweep sleep for a while to reduce io (#22762) 2023-08-10 12:11:50 +08:00
f2731185c9 [fix](memory) fix cache clean thread (#22472)
Fix the page cache's update of the last visit time.
Fix the cache clean thread.
2023-08-08 15:38:29 +08:00
f036cdfde6 [feature](compaction) support delete in cumulative compaction (#19609) 2023-08-07 15:22:21 +08:00
1847e440b2 [fix](memory) enable Jemalloc arena dirty pages (#22639)
If there is a core dump here, it may mask the real stack; a stack trace pointing here can indicate heap corruption
(which led to invalid jemalloc metadata), such as a double free or use-after-free in the application.
Try sanitizers such as ASAN, or build jemalloc with --enable-debug to investigate further.
2023-08-06 19:18:44 +08:00
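A build sketch for the --enable-debug suggestion above; only the flag comes from the commit message, and the source directory is assumed:

```
# Build a debug jemalloc so invalid metadata (double free / use-after-free)
# is caught closer to the offending call. Assumes a release tarball, which
# ships a ready-made configure script.
cd jemalloc-src
./configure --enable-debug
make -j"$(nproc)"
```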
c2c01825c1 [opt](stacktrace) Optimize stacktrace output #22467 2023-08-06 15:53:53 +08:00
d628baba0a [improvement](hdfs) support hedged read (#22634)
In some cases, high load on HDFS may make reads take a long time,
slowing down overall query efficiency. The HDFS client provides Hedged Read.
This function starts another read thread to read the same data when a read request
exceeds a certain threshold without returning, and whichever returns first provides the result.

For example:

```
create catalog regression properties (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.16.47:7004',
    'dfs.client.hedged.read.threadpool.size' = '128',
    'dfs.client.hedged.read.threshold.millis' = '500'
);
```
2023-08-06 14:51:48 +08:00
38f9ac99df [fix](bug) fix be custom conf persistence path and read path are inconsistent (#22520)
The be_custom.conf persistence path is ${doris_home}/conf/be_custom.conf, but if ${custom_config_dir} is set to a different path, BE cannot read be_custom.conf from ${custom_config_dir}.

This change sets the be_custom.conf persistence path to ${custom_config_dir}.
2023-08-05 10:22:08 +08:00
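A usage sketch under the assumption that custom_config_dir is passed as an environment variable (how your deployment actually sets it may differ):

```
# Point BE at a custom config directory before starting it; after this fix,
# be_custom.conf is persisted to (and read back from) the same directory.
export custom_config_dir=/etc/doris/be     # assumed path and variable style
sh /opt/doris/be/bin/start_be.sh --daemon  # assumed install path
```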
93593a013d [feature](load) add segment bytes limit in segcompaction (#22526) 2023-08-04 18:00:52 +08:00
e90f95dfda [config](merge-on-write) use separate config to control primary key index cache (#22538) 2023-08-03 17:11:19 +08:00
c2db01037a [refactor](config) rename segcompaction_max_threads (#22468) 2023-08-02 22:35:14 +08:00
bc87002028 [opt](conf) remote scanner thread num is changed to core num * 10 (#22427) 2023-08-01 23:09:49 +08:00
19d1f49fbe [improvement](compaction) compaction policy and options in the properties of a table (#22461) 2023-08-01 22:02:23 +08:00
ff0fda460c [be](parameter) change default fragment_pool_thread_num_max from 512 to 2048 (#22448)
Changed some parameters' default values:
brpc_num_threads from -1 to 256
compaction_task_num_per_disk from 2 to 4
compaction_task_num_per_fast_disk from 4 to 8
fragment_pool_thread_num_max from 512 to 2048
fragment_pool_queue_size from 2048 to 4096


Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-08-01 20:33:41 +08:00
a371e1d4c5 [fix](window_funnel_function) fix upgrade compatibility due to the added field in WindowFunnelState (#22416) 2023-08-01 12:08:55 +08:00
5f25b924b3 [opt](conf) Modify brpc eovercrowded conf (#22407)
Make brpc ignore EOVERCROWDED for the data stream sender and the exchange sink buffer.
Modify the default value of brpc_socket_max_unwritten_bytes.
2023-08-01 08:47:55 +08:00
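For reference, a be.conf sketch; the value shown is illustrative and not necessarily the new default chosen in #22407:

```
# Raise the per-socket unwritten-bytes cap so large RPC payloads are less likely
# to trigger EOVERCROWDED. 1 GiB here is an illustrative value, not the default.
brpc_socket_max_unwritten_bytes = 1073741824
```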