doris

Author	SHA1	Message	Date
Lijia Liu	76bdcf1d26	[improvement](pipeline) task group scan entity (#19924 )	2023-06-25 14:43:35 +08:00
Xinyi Zou	2c9bdd64fa	[fix](memory) arena support memory reuse after clear() (#21033 )	2023-06-21 23:27:21 +08:00
Chenyang Sun	18a0824eb3	[fix](compaction)Modify time series compaction policy default config (#21079 )	2023-06-21 20:29:58 +08:00
DongLiang-0	442a734ef5	[improvement](config) update be config max_runnings_transactions_per_txn_map default value (#21060 )	2023-06-21 20:29:13 +08:00
zhannngchen	564b3533cf	[enhancement](merge-on-write) update publish/streamload/compaction co… (#21040 )	2023-06-21 14:49:51 +08:00
Xin Liao	9eade148dd	[enhancement](merge-on-write) add primary key data page size config (#20961 )	2023-06-20 19:51:02 +08:00
zzzxl	cc3f9ed9b7	[Fix](fd) fix fd limit over 100% (#20778 )	2023-06-17 19:54:10 +08:00
yongjinhou	2e295a1ee9	[Enhancement](http) unify http auth config (#20864 )	2023-06-16 16:55:46 +08:00
Xin Liao	f1af09ef87	[Enhancement](merge-on-write) parallel calculate delete bitmap when tablet has multi segments (#20706 )	2023-06-15 21:11:39 +08:00
Chenyang Sun	2a2e485456	[Enhancement](compaction) time-series scenario cumulative compaction policy (#20715 ) New compaction policy for log and time-series scenarios.	2023-06-14 23:48:44 +08:00
Mingyu Chen	4b15185e25	[improvement](hdfs) add parquet footer cache and hdfs file handle cache (#20544 ) 1. Add an HDFS file handle cache for the HDFS file reader. Copied from Impala, `https://github.com/apache/impala/blob/master/be/src/util/lru-multi-cache.h` (thanks to the Impala team). This is an LRU cache that can store multiple entries with the same key. The key is built from {file name + modification time}; the value is an hdfsFile pointer to a specific HDFS file. The cache avoids reopening the same HDFS file multiple times, which saves query time. Add a BE config `max_hdfs_file_handle_cache_num` to limit the maximum number of cached file handles; the default is 20000. 2. Add a file meta cache. The file meta cache is an LRU cache whose key is {file name + modification time} and whose value is the parsed file meta info of that file, which saves the time of re-parsing the file meta every time. Currently, it is only used to cache Parquet file footers. Tests show that when the cache is hit, `FileOpenTime` and `ParseFooterTime` in the query profile drop to almost 0, which saves time when there are many files to read.	2023-06-13 15:13:57 +08:00
Pxl	e010fa8d4f	[Chore](runtime filter) remove runtime filter ready_for_publish/publish_finally (#20593 )	2023-06-13 11:20:49 +08:00
yujun	bd5a26f240	[improvement](recover) Default disable check tablet path (#20565 ) Change the default value of the check tablet path interval to -1.	2023-06-09 08:47:39 +08:00
yujun	92577f45d3	[fix] (recover) fix can not recover a BE's tablet after deleting its data directory manual (#20273 ) (#20274 )	2023-06-07 22:27:50 +08:00
zhengyu	09344eaab5	[feature](load) introduce single-stream-multi-table load (#20006 ) For routine load (Kafka load), users can produce data for different tables into a single topic, and Doris will dispatch the data to the corresponding tables. Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2023-06-07 17:55:25 +08:00
Ashin Gau	3e186a8821	[opt](MergedIO) optimize merge small IO, prevent amplified read (#20305 ) Optimize the strategy of merging small IO to prevent severe read amplification, and turn off merged IO when the file cache is enabled. Adjustable parameters: ``` // the max amplified read ratio when merging small IO max_amplified_read_ratio=0.8 // the min segment size file_cache_min_file_segment_size = 1048576 ```	2023-06-03 10:51:24 +08:00
Jerry Hu	c03a19ea23	[improvement](bitmap) Using set to store a small number of elements to improve performance (#19973 ) Tested on SSB 100g: `select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey;` exec time: 4.388s. After creating the materialized view `create materialized view customer_uv as select lo_suppkey, bitmap_union(to_bitmap(lo_linenumber)) from lineorder group by lo_suppkey;`, the same query takes 12.908s; with this patch, it takes 5.790s.	2023-05-31 16:13:42 +08:00
Chenyang Sun	accaff1026	[Feature](compaction) wip: single replica compaction (#19237 ) Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica. The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica. The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool. When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.	2023-05-30 21:12:48 +08:00
lihangyu	ab8125d56f	[Improve](performance) introduce SchemaCache to cache TabletSchame & Schema (#20037 ) * [Improve](performance) introduce SchemaCache to cache TabletSchame & Schema 1. When the system is under high-concurrency load with wide-table point queries, the frequent memory allocation and deallocation of Schema becomes an evident system bottleneck. Additionally, the initialization of TabletSchema and Schema also becomes a CPU hotspot. Therefore, a SchemaCache is introduced to cache these resources for reuse. 2. Wrap some variables with std::unique_ptr. Performance: SchemaCache enabled: 501 QPS, avg response time 20ms, P99 response time 34ms; SchemaCache disabled: 321 QPS, avg response time 31ms, P99 response time 61ms. * handle schema change with schema version * remove useless header * rebase	2023-05-29 17:34:53 +08:00
Jack Drogon	93933308e6	[Feature-WIP](CCR): Add ccr doris interface (WIP) (#17881 )	2023-05-26 23:40:49 +08:00
qiye	9e70a9ef84	[opt](compaction) add pick rowset to compact interval config (#19868 )	2023-05-26 17:39:02 +08:00
ZhangYu0123	1c950d6930	[fix](config) fix memory config enable_query_memroy_overcommit spell problem #19898	2023-05-22 00:32:20 +08:00
Xinyi Zou	76c358b3e3	[revert](memory) revert page no use Allocator && default disable ChunkAllocator (#19905 ) The default chunk allocator reserve is 0, so enabling the chunk allocator is meaningless and only wastes memory.	2023-05-21 22:16:41 +08:00
ZhangYu0123	07bbf741fb	[enhence](memory) gc inverted index cache when there is not enough memory (#19622 ) Support GC of the inverted index cache when there is not enough memory. Previous problem: the inverted index cache (InvertedIndexSearcherCache and InvertedIndexQueryCache) may use 20% of memory, which cannot be released.	2023-05-18 16:41:51 +08:00
Xinyi Zou	7c8b7878cd	[fix](memory) Print all query/load memory before memory GC when `memory_debug=true` (#19720 )	2023-05-18 14:55:47 +08:00
Gabriel	851886cc18	[minor](datev2) remove datev2 because datev2 is used by default (#19777 )	2023-05-18 13:36:11 +08:00
yixiutt	943e5fb7e5	[improvement](MOW) use seperated cache for mow pk cache (#19686 ) In MoW, the primary key cache has a big impact on load performance, so we add a new cache type to separate it from the page cache, making it more flexible in some cases.	2023-05-18 13:27:09 +08:00
chenlinzhong	f412aec187	[improvement](load)disable shrink memory by default (#19714 ) Disable shrink memory by default, because it becomes very slow when importing large amounts of data. You can turn it on if you think it is necessary.	2023-05-18 11:25:39 +08:00
Xinyi Zou	d5d47703fe	[fix](memory) remove auto option in memory config and optimize memtracker logs #19706 fix mem_limit default value; memory_gc_sleep_time_s to memory_gc_sleep_time_ms; LoadChannelMgr::_handle_mem_exceed_limit: process_mem_limit to process soft mem limit; fix query mem tracker print	2023-05-18 08:54:03 +08:00
zhengyu	4566281cc3	[fix](sink) disable lazy-open partition by default (#19769 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2023-05-18 07:28:04 +08:00
Yongqiang YANG	d76e2e2254	[chore](config) ignore_eovercrowded to be true by default (#19282 )	2023-05-17 16:21:32 +08:00
zxealous	2d9cc8fe8f	[improvement](file cache)Support set min file segment size while use block file cache (#19536 )	2023-05-17 10:23:33 +08:00
Gabriel	8fd1eb0d1e	[minor](hash table) parameterize hash table (#19653 )	2023-05-17 09:58:26 +08:00
Yongqiang YANG	610f1c8ef5	[improvement](load) skip compression when memtable is small (#19300 ) * [improvement](load) skip compression when memtable is small * format	2023-05-16 12:08:41 +08:00
AlexYue	0617c7e56b	[enhance](Cold&Heat separation) use file block cache for cold heat separation rowset (#19410 ) For performance reasons, rowsets belonging to cold/heat separation tables always use the file block cache, regardless of the config the user has set. I've tested this using cold_heat_seperation_case_p2 and it works well.	2023-05-14 22:06:26 +08:00
HHoflittlefish777	f8ef25bb10	[enhancement](load) lazy-open necessary partitions when load (#18874 )	2023-05-14 16:09:55 +08:00
DeadlineFen	a05dbd3f81	[chore](compile) Improves PCH cache hit ratio (#19469 ) Supplement the be-clion-dev documentation to avoid an undefined DORIS_JAVA_HOME and a missing jni.h when developing with CLion without compiling directly through build.sh. Complete the classification of header files in pch.h and include some Doris header files that are not frequently modified. Separate the declarations and definitions in common/config.h; to modify a default configuration value, edit common/config.cpp. gen_cpp/version.h is regenerated on every recompilation, which may invalidate the PCH, so version information is now obtained indirectly rather than by including that header directly.	2023-05-10 12:49:01 +08:00

37 Commits
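The replica-selection rule described in the single replica compaction commit (#19237) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Doris code: `pick_compaction_replica` and `should_fetch_from_peer` are hypothetical names, `std::hash<int64_t>` stands in for whatever hash Doris actually applies to `replica_id`, and the tie-break on the smaller `replica_id` is an added assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: returns the replica_id whose hash(replica_id) is smallest.
// std::hash<int64_t> is a placeholder for the hash Doris actually uses.
// Precondition: replica_ids is non-empty.
int64_t pick_compaction_replica(const std::vector<int64_t>& replica_ids) {
    assert(!replica_ids.empty());
    std::hash<int64_t> hasher;
    int64_t best_id = replica_ids.front();
    std::size_t best_hash = hasher(best_id);
    for (int64_t id : replica_ids) {
        std::size_t h = hasher(id);
        // Break hash ties on the smaller replica_id so that every BE
        // evaluating the same peer list picks the same replica.
        if (h < best_hash || (h == best_hash && id < best_id)) {
            best_hash = h;
            best_id = id;
        }
    }
    return best_id;
}

// Hypothetical check done before submitting a compaction task: every replica
// except the chosen one fetches the compaction result from its peer instead
// of compacting locally.
bool should_fetch_from_peer(int64_t local_replica_id,
                            const std::vector<int64_t>& replica_ids) {
    return pick_compaction_replica(replica_ids) != local_replica_id;
}
```

Because every BE must reach the same answer independently, a real implementation needs a hash that is deterministic across builds and platforms. `std::hash` is implementation-defined (libstdc++, for example, maps an integer to itself), so it serves only as a placeholder here.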
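The HDFS file handle cache in #20544 is described as an LRU cache that can hold several entries under the same {file name + modification time} key. The sketch below illustrates that idea only; it is not Impala's `lru-multi-cache.h` or the Doris port. `FileKey`, `LruMultiCache`, `acquire`, and `release` are hypothetical names, and as a simplification the capacity bounds only idle (released) entries, so values that are checked out are never evicted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iterator>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache key: file name + modification time.
struct FileKey {
    std::string file_name;
    int64_t mtime;
    bool operator==(const FileKey& o) const {
        return file_name == o.file_name && mtime == o.mtime;
    }
};

struct FileKeyHash {
    std::size_t operator()(const FileKey& k) const {
        return std::hash<std::string>()(k.file_name) ^ (std::hash<int64_t>()(k.mtime) << 1);
    }
};

// Minimal LRU cache that allows several values (e.g. open file handles) per key.
// Values are checked out with acquire() and returned with release(); only
// released (idle) values count toward capacity and can be evicted.
template <typename V>
class LruMultiCache {
public:
    explicit LruMultiCache(std::size_t capacity) : capacity_(capacity) {}

    // Takes the most recently released idle value for `key`, if any.
    std::optional<V> acquire(const FileKey& key) {
        auto it = idle_.find(key);
        if (it == idle_.end()) return std::nullopt;
        auto entry = it->second.back();
        it->second.pop_back();
        if (it->second.empty()) idle_.erase(it);
        V value = std::move(entry->second);
        lru_.erase(entry);
        return value;
    }

    // Returns a value to the cache, evicting the least recently released
    // idle values while the cache is over capacity.
    void release(const FileKey& key, V value) {
        lru_.emplace_back(key, std::move(value));
        idle_[key].push_back(std::prev(lru_.end()));
        while (lru_.size() > capacity_) {
            auto oldest = lru_.begin();
            auto it = idle_.find(oldest->first);
            // The globally oldest entry is also the oldest one for its key.
            it->second.pop_front();
            if (it->second.empty()) idle_.erase(it);
            lru_.pop_front();
        }
    }

    std::size_t idle_size() const { return lru_.size(); }

private:
    using Entry = std::pair<FileKey, V>;
    using EntryIt = typename std::list<Entry>::iterator;

    std::size_t capacity_;
    std::list<Entry> lru_;  // front = least recently released
    std::unordered_map<FileKey, std::list<EntryIt>, FileKeyHash> idle_;
};
```

Keeping each key's list ordered by release time means the globally least recently used entry is always at the front of its own key's list, so each eviction takes constant time on average. Including the modification time in the key means a rewritten file never reuses a stale handle.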