doris

Author	SHA1	Message	Date
Kaijie Chen	c2db01037a	[refactor](config) rename segcompaction_max_threads (#22468 )	2023-08-02 22:35:14 +08:00
Xinyi Zou	bc87002028	[opt](conf) remote scanner thread num is changed to core num * 10 (#22427 )	2023-08-01 23:09:49 +08:00
Chenyang Sun	19d1f49fbe	[improvement](compaction) compaction policy and options in the properties of a table (#22461 )	2023-08-01 22:02:23 +08:00
yiguolei	ff0fda460c	[be](parameter) change default fragment_pool_thread_num_max from 512 to 2048 (#22448 ) Change the default values of some parameters: brpc_num_threads from -1 to 256; compaction_task_num_per_disk from 2 to 4; compaction_task_num_per_fast_disk from 4 to 8; fragment_pool_thread_num_max from 512 to 2048; fragment_pool_queue_size from 2048 to 4096. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-08-01 20:33:41 +08:00
bobhan1	a371e1d4c5	[fix](window_funnel_function) fix upgrade compatibility due to the added field in `WindowFunnelState` (#22416 )	2023-08-01 12:08:55 +08:00
Xinyi Zou	5f25b924b3	[opt](conf) Modify brpc eovercrowded conf (#22407 ) brpc ignores eovercrowded for the data stream sender and the exchange sink buffer. Modify the default value of brpc_socket_max_unwritten_bytes.	2023-08-01 08:47:55 +08:00
Xinyi Zou	c25b9071ad	[opt](conf) Modify brpc work pool conf default value #22406 By default, with 32 cores or fewer, the following configs are 128, 128, 10240, 10240 in turn; with more than 32 cores, they are core num * 4, core num * 4, core num * 320, core num * 320 in turn: brpc_heavy_work_pool_threads, brpc_light_work_pool_threads, brpc_heavy_work_pool_max_queue_size, brpc_light_work_pool_max_queue_size.	2023-07-31 20:38:34 +08:00
HHoflittlefish777	ee754307bb	[refactor](load) refactor memtable flush actively (#21634 )	2023-07-30 21:31:54 +08:00
lihangyu	0cc3232d6f	[Improve](topn opt) modify fetch rpc timeout from 20s to 30s, since fetch is quite heavy sometimes (#22163 )	2023-07-28 17:56:18 +08:00
lihangyu	5584d7a5ba	[Improve](point query) Improve lookup connection cache from DoubleBuffer to LRU cache for better item pruning (#22041 )	2023-07-27 22:22:50 +08:00
Yongqiang YANG	687d97e648	[improvement][default_config] enlarge default value compaction related (#22286 ) configs 1. Because vertical compaction is enabled by default and consumes less memory, we can enlarge the default values of compaction-related configs. 2. Enlarge the default value of the lock-related shard size.	2023-07-27 20:17:43 +08:00
Pxl	05be45bd35	[Improvement](brpc) adjust brpc_light_work_pool_threads/brpc_heavy_work_pool_threads (#22241 ) adjust brpc_light_work_pool_threads/brpc_heavy_work_pool_threads	2023-07-27 14:03:46 +08:00
Yongqiang YANG	31c856351a	[enhancement](default_config) change default value of rpc related (#22149 ) configs Bdbje elect timeout is 30 seconds, so we enlarge thrift_rpc_timeout_ms and txn_commit_rpc_timeout_ms to 60s. BTW: enlarge bdbje_lock_timeout_second from 1 to 5.	2023-07-27 11:12:26 +08:00
HHoflittlefish777	9e16c69925	[improvement](compression) support LZ4_HC algorithm and parse LZ4_RAW (#22165 )	2023-07-26 18:23:39 +08:00
Xinyi Zou	1f3de0eae3	[fix](memory) fix invalid large memory check && fix memory info thread safety (#22027 ) fix invalid large memory check fix memory info thread safety	2023-07-26 12:18:31 +08:00
Ashin Gau	30c21789c8	[opt](filecache) use weak_ptr to cache the file handle of file segment (#21975 ) Use weak_ptr to cache the file handle of file segment. The max cached number of file handles can be configured by `file_cache_max_file_reader_cache_size`, default `1000000`. Users can inspect the number of cached file handles by requesting BE metrics: `http://be_host:be_webserver_port/metrics`: ``` # TYPE doris_be_file_cache_segment_reader_cache_size gauge doris_be_file_cache_segment_reader_cache_size{path="/mnt/datadisk1/gaoxin/file_cache"} 2500 ```	2023-07-24 19:09:27 +08:00
HHoflittlefish777	e146969376	[Fix](config) delete unused lazy open config #22136	2023-07-24 15:02:34 +08:00
bobhan1	367ad9164a	[feature-wip](auto-inc)(step-2) support auto-increment column for duplicate table (#19917 )	2023-07-20 18:03:39 +08:00
YueW	c31e826756	[opt](config) rename alter_inverted_index_worker_count to alter_index_worker_count, and add docs (#21985 )	2023-07-20 17:50:04 +08:00
Xinyi Zou	d180ed418d	[fix](stacktrace) Speed up stack trace (#21755 ) Introduce libunwind to get stack traces; the cost is negligible and the traces include line numbers. Use StackTraceCache and PHDRCache to speed things up; this is customizable and has some optimizations. Other stack trace tools remain (glog, boost, glibc) in case they are needed. TODO: currently supports Linux on __x86_64__, __arm__, __powerpc__; __FreeBSD__ and APPLE are not supported. Note: __arm__ and __powerpc__ have not been verified. Support signal handling. libunwind supports unw_backtrace for jemalloc. Use of the undefined compile option USE_MUSL is left for later.	2023-07-19 15:43:14 +08:00
Xinyi Zou	4b30485d62	[improvement](memory) Refactor doris cache GC (#21522 ) Abstract CachePolicy, which controls the gc of all caches. Add stale sweep to all lru caches, including page caches, etc. I0710 18:32:35.729460 2945318 mem_info.cpp:172] End Full GC Free, Memory 3866389992 Bytes. cost(us): 112165339, details: FullGC: FreeTopMemoryQuery: - CancelCostTime: 1m51s - CancelTasksNum: 1 - FindCostTime: 0.000ns - FreedMemory: 2.93 GB WorkloadGroup: Cache name=DataPageCache: - CostTime: 15.283ms - FreedEntrys: 9.56K - FreedMemory: 691.97 MB - PruneAllNumber: 1 - PruneStaleNumber: 1	2023-07-11 20:21:31 +08:00
Xinyi Zou	38c8657e5e	[improve](memory) more graceful logging for memory exceed limit (#21311 ) More graceful logging for Allocator and MemTracker when memory exceeds the limit. Fix bthread graceful exit.	2023-07-05 14:59:06 +08:00
Mingyu Chen	13fb69550a	[improvement](kerberos) disable hdfs fs handle cache to renew kerberos ticket at a fixed interval (#21265 ) Add a new BE config `kerberos_ticket_lifetime_seconds`, default is 86400. It is best to set it to the same value as `ticket_lifetime` in `krb5.conf`. If an HDFS fs handle in the cache has lived longer than HALF of this time, it will be marked invalid and recreated, and the kerberos ticket will be renewed.	2023-07-04 17:13:34 +08:00
Gabriel	a3d34e1e08	[decimalv2](compatibility) add config to allow invalid decimalv2 literal (#21327 )	2023-07-03 10:55:27 +08:00
Xinyi Zou	0396f78590	[fix](memory) Remove ChunkAllocator & fix Allocator no use mmap (#21259 )	2023-06-28 16:10:24 +08:00
Xin Liao	5d1fb33f2d	[enhancement](merge-on-write) increasing the max_write_buffer_number parameter to improve save meta performance (#21243 )	2023-06-28 11:32:11 +08:00
lihangyu	50c1d55769	[Improve](dynamic schema) support filtering invalid data (#21160 ) 1. Support dynamic schema to filter illegal data. 2. Expand the regular expression for ColumnName to support more column names. 3. Be compatible with PropertyAnalyzer and support legacy tables. 4. Disable parsing of multi-dimension arrays by default, since some bugs are unresolved.	2023-06-26 19:32:43 +08:00
Lijia Liu	76bdcf1d26	[improvement](pipeline) task group scan entity (#19924 )	2023-06-25 14:43:35 +08:00
Xinyi Zou	2c9bdd64fa	[fix](memory) arena support memory reuse after clear() (#21033 )	2023-06-21 23:27:21 +08:00
Chenyang Sun	18a0824eb3	[fix](compaction)Modify time series compaction policy default config (#21079 )	2023-06-21 20:29:58 +08:00
DongLiang-0	442a734ef5	[improvement](config) update be config max_runnings_transactions_per_txn_map default value (#21060 )	2023-06-21 20:29:13 +08:00
zhannngchen	564b3533cf	[enhancement](merge-on-write) update publish/streamload/compaction co… (#21040 )	2023-06-21 14:49:51 +08:00
Xin Liao	9eade148dd	[enhancement](merge-on-write) add primary key data page size config (#20961 )	2023-06-20 19:51:02 +08:00
zzzxl	cc3f9ed9b7	[Fix](fd) fix fd limit over 100% (#20778 )	2023-06-17 19:54:10 +08:00
yongjinhou	2e295a1ee9	[Enhancement](http) unify http auth config (#20864 )	2023-06-16 16:55:46 +08:00
Xin Liao	f1af09ef87	[Enhancement](merge-on-write) parallel calculate delete bitmap when tablet has multi segments (#20706 )	2023-06-15 21:11:39 +08:00
Chenyang Sun	2a2e485456	[Enhancement](compaction) time-series scenario cumulative compaction policy (#20715 ) new compaction policy for log and time-series scenario	2023-06-14 23:48:44 +08:00
Mingyu Chen	4b15185e25	[improvement](hdfs) add parquet footer cache and hdfs file handle cache (#20544 ) 1. Add an hdfs file handle cache for the hdfs file reader. Copied from Impala, `https://github.com/apache/impala/blob/master/be/src/util/lru-multi-cache.h` (thanks to the Impala team). This is an LRU cache that can store multiple entries with the same key. The key is built from {file name + modification time}. The value is the hdfsFile pointer that points to a certain hdfs file. This cache avoids reopening the same hdfs file multiple times, which can save query time. Add a BE config `max_hdfs_file_handle_cache_num` to limit the max number of cached file handles, default is 20000. 2. Add a file meta cache. The file meta cache is an LRU cache. The key is {file name + modification time}, and the value is the parsed file meta info of that file, which saves the time of re-parsing the file meta every time. Currently, it is only used for caching the parquet file footer. Tests show that when the cache is hit, the `FileOpenTime` and `ParseFooterTime` are reduced to almost 0 in the query profile, which can save time when there are lots of files to read.	2023-06-13 15:13:57 +08:00
Pxl	e010fa8d4f	[Chore](runtime filter) remove runtime filter ready_for_publish/publish_finally (#20593 )	2023-06-13 11:20:49 +08:00
yujun	bd5a26f240	[improvement](recover) Default disable check tablet path (#20565 ) change check tablet path interval's default value to -1	2023-06-09 08:47:39 +08:00
yujun	92577f45d3	[fix] (recover) fix can not recover a BE's tablet after deleting its data directory manually (#20273 ) (#20274 )	2023-06-07 22:27:50 +08:00
zhengyu	09344eaab5	[feature](load) introduce single-stream-multi-table load (#20006 ) For routine load (kafka load), users can produce all data for different tables into a single topic, and Doris will dispatch the data to the corresponding tables. Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2023-06-07 17:55:25 +08:00
Ashin Gau	3e186a8821	[opt](MergedIO) optimize merge small IO, prevent amplified read (#20305 ) Optimize the strategy of merging small IO to prevent severe read amplification, and turn off merged IO when the file cache is enabled. Adjustable parameters: `max_amplified_read_ratio=0.8` (the max amplified read ratio when merging small IO) and `file_cache_min_file_segment_size = 1048576` (the min segment size).	2023-06-03 10:51:24 +08:00
Jerry Hu	c03a19ea23	[improvement](bitmap) Using set to store a small number of elements to improve performance (#19973 ) Test on SSB 100g. Query: select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey; exec time: 4.388s. Create materialized view: create materialized view customer_uv as select lo_suppkey, bitmap_union(to_bitmap(lo_linenumber)) from lineorder group by lo_suppkey; then run the query again: select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey; exec time: 12.908s. With the patch, exec time: 5.790s.	2023-05-31 16:13:42 +08:00
Chenyang Sun	accaff1026	[Feature](compaction) wip: single replica compaction (#19237 ) Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica. The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica. The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool. When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.	2023-05-30 21:12:48 +08:00
lihangyu	ab8125d56f	[Improve](performance) introduce SchemaCache to cache TabletSchema & Schema (#20037 ) 1. When the system is under high-concurrency load with wide table point queries, the frequent memory allocation and deallocation of Schema become evident system bottlenecks. Additionally, the initialization of TabletSchema and Schema also becomes a CPU hotspot. Therefore, a SchemaCache is introduced to cache these resources for reuse. 2. Wrap some variables with std::unique_ptr. Performance: with SchemaCache enabled, QPS is 501, average response time is 20ms, and P99 response time is 34ms; with SchemaCache disabled, QPS is 321, average response time is 31ms, and P99 response time is 61ms. Also: handle schema change with schema version; remove useless header; rebase.	2023-05-29 17:34:53 +08:00
Jack Drogon	93933308e6	[Feature-WIP](CCR): Add ccr doris interface (WIP) (#17881 )	2023-05-26 23:40:49 +08:00
qiye	9e70a9ef84	[opt](compaction) add pick rowset to compact interval config (#19868 )	2023-05-26 17:39:02 +08:00
ZhangYu0123	1c950d6930	[fix](config) fix memory config enable_query_memroy_overcommit spell problem #19898	2023-05-22 00:32:20 +08:00
Xinyi Zou	76c358b3e3	[revert](memory) revert page no use Allocator && default disable ChunkAllocator (#19905 ) default chunk allocator reserve is 0. At this time, it is meaningless to enable chunk allocator, it will only waste memory.	2023-05-21 22:16:41 +08:00
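
The core-count rule described in the message of commit c25b9071ad above can be sketched as follows. This is only an illustration of the stated rule, not the BE's actual code: the function name `brpc_work_pool_defaults` is hypothetical, and only the four config names and the numbers come from the commit message.

```python
def brpc_work_pool_defaults(num_cores: int) -> dict:
    """Return the default brpc work pool settings for a given core count.

    Hypothetical helper: mirrors the rule stated in the commit message
    (<= 32 cores: 128 threads and a queue of 10240; otherwise
    cores * 4 threads and a queue of cores * 320).
    """
    if num_cores <= 32:
        threads, queue_size = 128, 10240
    else:
        threads, queue_size = num_cores * 4, num_cores * 320
    # Heavy and light pools receive the same values under this rule.
    return {
        "brpc_heavy_work_pool_threads": threads,
        "brpc_light_work_pool_threads": threads,
        "brpc_heavy_work_pool_max_queue_size": queue_size,
        "brpc_light_work_pool_max_queue_size": queue_size,
    }
```

For example, a 64-core machine would get 256 threads per pool and a queue size of 20480 under this rule, while a 16-core machine stays at 128 and 10240.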

64 Commits
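
The replica selection described in the message of commit accaff1026 (single replica compaction) can be sketched roughly as below. The commit message only says that the replica with the smallest hash(replica_id) executes compaction and the others fetch from it; the choice of SHA-256 as the hash and all function names here are assumptions made for illustration, not Doris code.

```python
import hashlib


def replica_hash(replica_id: int) -> int:
    # Assumption: any deterministic hash works for the sketch; the commit
    # message does not say which hash function the BE uses.
    digest = hashlib.sha256(str(replica_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_compaction_replica(replica_ids):
    # The replica with the smallest hash value executes compaction.
    return min(replica_ids, key=replica_hash)


def should_fetch_from_peer(local_replica_id, replica_ids):
    # Every other replica fetches the compaction result from the chosen one.
    return pick_compaction_replica(replica_ids) != local_replica_id
```

Because every backend computes the same hashes over the same replica list, each one reaches the same choice independently, without extra coordination beyond the replica information requested from the FE.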