Currently, for a merge-on-write unique table, the delete bitmap of a rowset is calculated during the flush, commit, and publish phases. In this PR, we add a special mark to every rowset that is considered when calculating the delete bitmap in these three phases. Before the delete bitmap is finally merged into the table meta's delete bitmap, we check whether all the rowsets carry the special mark, i.e. whether every rowset has been considered during the above three phases.
Because an executor cannot fail the publish phase once the coordinator has received successful commit info from all executors, we only print a log message if this correctness check fails, rather than reporting a failure.
If there is a core dump here, it may cover up the real stack: a stack trace that points here usually indicates heap corruption (which led to invalid jemalloc metadata), such as a double free or use-after-free in the application.
Try sanitizers such as ASAN, or build jemalloc with --enable-debug, to investigate further.
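As a hedged illustration of the second suggestion (assuming a jemalloc source tree that already ships a `configure` script; adjust paths and versions as needed):
```
# Build jemalloc with debug assertions enabled to catch corrupted metadata earlier.
./configure --enable-debug
make -j"$(nproc)"
```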
In some cases, high load on HDFS may cause reads from HDFS to take a long time, slowing down overall query performance. The HDFS client provides Hedged Read for this: when a read request has not returned within a certain threshold, another read thread is started to read the same data, and whichever read returns first provides the result.
e.g.:
create catalog regression properties (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.16.47:7004',
'dfs.client.hedged.read.threadpool.size' = '128',
'dfs.client.hedged.read.threshold.millis' = '500'
);
The persistence path of be_custom.conf is ${doris_home}/conf/be_custom.conf, but if ${custom_config_dir} is set to a different path, BE cannot read be_custom.conf from ${custom_config_dir}.
This change sets the persistence path of be_custom.conf to ${custom_config_dir}.
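For illustration, assuming `custom_config_dir` is set in be.conf (the path below is a made-up example):
```
# be.conf: persist and read be_custom.conf under a non-default directory.
custom_config_dir = /opt/doris/custom_conf
```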
Change the default values of some parameters (see the be.conf sketch after this list):
- brpc_num_threads: from -1 to 256
- compaction_task_num_per_disk: from 2 to 4
- compaction_task_num_per_fast_disk: from 4 to 8
- fragment_pool_thread_num_max: from 512 to 2048
- fragment_pool_queue_size: from 2048 to 4096
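If the previous behavior is preferred, the old values can still be set explicitly in be.conf, for example:
```
# be.conf: restore the pre-change defaults.
brpc_num_threads = -1
compaction_task_num_per_disk = 2
compaction_task_num_per_fast_disk = 4
fragment_pool_thread_num_max = 512
fragment_pool_queue_size = 2048
```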
Co-authored-by: yiguolei <yiguolei@gmail.com>
By default, if the machine has 32 cores or fewer, the following configs default to 128, 128, 10240, 10240 respectively; if it has more than 32 cores, they default to core num * 4, core num * 4, core num * 320, core num * 320 respectively (see the worked example after this list):
- brpc_heavy_work_pool_threads
- brpc_light_work_pool_threads
- brpc_heavy_work_pool_max_queue_size
- brpc_light_work_pool_max_queue_size
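For example, on a hypothetical 64-core machine the rule above resolves to the values below, written in be.conf key = value form as if set explicitly:
```
# 64 cores: 64 * 4 = 256 threads, 64 * 320 = 20480 queue slots.
brpc_heavy_work_pool_threads = 256
brpc_light_work_pool_threads = 256
brpc_heavy_work_pool_max_queue_size = 20480
brpc_light_work_pool_max_queue_size = 20480
```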
1. Because vertical compaction is enabled by default and consumes less memory, we can enlarge the default values of compaction-related configs.
2. Enlarge the default values of the lock-related shard sizes.
Enlarge some timeout configs: the bdbje election timeout is 30 seconds, so we enlarge thrift_rpc_timeout_ms and txn_commit_rpc_timeout_ms to 60s.
Also enlarge bdbje_lock_timeout_second from 1 to 5.
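A hedged sketch of the corresponding fe.conf entries after this change (60s expressed in milliseconds; assuming these are FE configs):
```
# fe.conf: timeouts enlarged to accommodate the 30s bdbje election timeout.
thrift_rpc_timeout_ms = 60000
txn_commit_rpc_timeout_ms = 60000
bdbje_lock_timeout_second = 5
```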
Use weak_ptr to cache the file handles of file segments. The maximum number of cached file handles can be configured with `file_cache_max_file_reader_cache_size`, default `1000000`.
Users can inspect the number of cached file handles by requesting the BE metrics endpoint `http://be_host:be_webserver_port/metrics`:
```
# TYPE doris_be_file_cache_segment_reader_cache_size gauge
doris_be_file_cache_segment_reader_cache_size{path="/mnt/datadisk1/gaoxin/file_cache"} 2500
```
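To change the capacity, the config can be set in be.conf; the value below is only an illustration:
```
# be.conf: cache at most 100000 file handles instead of the default 1000000.
file_cache_max_file_reader_cache_size = 100000
```
The metric above can also be fetched from the command line, e.g. `curl http://be_host:be_webserver_port/metrics | grep file_cache_segment_reader`.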
Introduce libunwind to get stack traces; the cost is negligible and the traces include line numbers.
Use StackTraceCache and PHDRCache to speed it up; it is customizable and includes some optimizations.
Other stack trace tools remain available in case they are needed: glog, boost, glibc.
TODO:
- Currently supports Linux __x86_64__, __arm__, __powerpc__; __FreeBSD__ and APPLE are not supported.
- Note: __arm__ and __powerpc__ have not been verified.
- Support signal handling.
- libunwind: support unw_backtrace for jemalloc.
- The compile option USE_MUSL is currently undefined; its use is left for later.
Add a new BE config `kerberos_ticket_lifetime_seconds`, default 86400.
It is better to set it to the same value as `ticket_lifetime` in `krb5.conf`.
If a cached HDFS fs handle has been alive for longer than HALF of this time, it will be marked as invalid and recreated,
and the Kerberos ticket will be renewed.
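For example, to keep the two settings in sync (a sketch; 24h in krb5.conf corresponds to 86400 seconds):
```
# be.conf
kerberos_ticket_lifetime_seconds = 86400

# krb5.conf, [libdefaults] section
ticket_lifetime = 24h
```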
* [Improve](dynamic schema) support filtering invalid data
1. Support filtering illegal data for dynamic schema.
2. Expand the regular expression for ColumnName to support more column names.
3. Be compatible with PropertyAnalyzer and support legacy tables.
4. Disable parsing multi-dimensional arrays by default, since some bugs remain unresolved.