doris

Author	SHA1	Message	Date
plat1ko	25b6e4deb2	[fix](daemon) Fix incorrect initialization order of daemon services (#23578 ) Current initialization dependency (Daemon and BackendService both depend on StorageEngine): Daemon / BackendService ──► StorageEngine ──► ExecEnv ──► Disk/Mem/CpuInfo. However, the original code incorrectly initializes Daemon before StorageEngine. This PR also stops and joins the threads of daemon services in their destructors, to ensure that Daemon services release resources in reverse order of initialization via RAII.	2023-08-31 19:46:38 +08:00
Siyang Tang	1ac0ff0ea9	[feature](delete-predicate) support delete sub predicate v2 (#22442 ) New structure for delete sub predicate. The delete sub predicate currently uses a string-typed condition_str for temporary storage, and fields are extracted from it using std::regex, which may cause a stack overflow when matching an extremely large string (bug of libc). We now attempt to use a new PB structure to hold the delete sub predicate, to avoid that problem. message DeleteSubPredicatePB { optional int32 column_unique_id = 1; optional string column_name = 2; optional string op = 3; optional string cond_value = 4; } Currently, both versions of the sub predicate are filled. Queries use v2, while compaction still uses v1. For old rowset meta whose delete predicates carry sub predicate v1, an attempt is made to convert them to v2 when read from PB. Moreover, efforts will be made to rewrite these meta with the new delete sub predicate. This also prepares for using the column unique id to identify a column globally. Using the column unique id rather than the column name to identify a column is vital for flexible schema change. The rewritten delete predicate will attach the column unique id.	2023-08-29 19:37:23 +08:00
abmdocrt	da9eb79ac4	[Enhancement](Schema hash) Remove schema hash in tablet info (#23516 )	2023-08-29 10:05:12 +08:00
Yongqiang YANG	9c65b7ab96	[improvement](column_reader) move load once to index reader to reduce memory footprint of column reader (#23537 )	2023-08-29 09:34:27 +08:00
Kaijie Chen	0d7a61ae8c	[fix](load) fix duplicate register of memtable writer in memory limiter (#23205 )	2023-08-22 10:05:17 +08:00
Kaijie Chen	6cf1efc997	[refactor](load) use smart pointers to manage writers in memtable memory limiter (#23019 )	2023-08-16 16:34:57 +08:00
bobhan1	4510e16845	[improvement](delete) support delete predicate on value column for merge-on-write unique table (#21933 ) Previously, delete statements with conditions on value columns were only supported on duplicate tables. After we introduced the delete sign mechanism to do batch delete, a delete statement with conditions on value columns on a unique table is transformed into the corresponding insert into ..., __DELETE_SIGN__ select ... statement. However, for a unique table with merge-on-write enabled, the overhead of inserting this data can be eliminated. So this PR adds the ability to use delete predicates on value columns for merge-on-write unique tables.	2023-08-16 12:18:05 +08:00
yujun	b9b9071c9b	[improvement](create partition) create partition require quorum replicas succ (#22554 )	2023-08-11 11:59:05 +08:00
Kaijie Chen	58e7952eea	[refactor](load) use memtable writer in memtable memory limiter (#22780 )	2023-08-10 17:08:47 +08:00
Siyang Tang	4359089b9c	[fix](delete-pred) fix special char in delete sub condition #22667 For some users, their delete condition may contain special chars like '$', which will cause failure in parsing delete condition.	2023-08-09 00:04:26 +08:00
Kaijie Chen	9581d2b4eb	[refactor](load) split memtable writer out of delta writer (#21892 )	2023-08-08 22:02:42 +08:00
Siyang Tang	77e772e103	[enhancement](config) add some pre-process and pre-check for BE storage config attentions in docs (#22486 )	2023-08-07 18:16:57 +08:00
AlexYue	f036cdfde6	[feature](compaction) support delete in cumulative compaction (#19609 )	2023-08-07 15:22:21 +08:00
Pxl	7839a0e708	[Bug](brpc) fix brpc failed on big query came concurrently (#22600 ) fix PriorityThreadPool get_info get wrong number; change brpc pool from priority to fifo; do not use brpc pool when send eos	2023-08-05 21:24:32 +08:00
Kaijie Chen	93593a013d	[feature](load) add segment bytes limit in segcompaction (#22526 )	2023-08-04 18:00:52 +08:00
Chenyang Sun	19d1f49fbe	[improvement](compaction) compaction policy and options in the properties of a table (#22461 )	2023-08-01 22:02:23 +08:00
HHoflittlefish777	ee754307bb	[refactor](load) refactor memtable flush actively (#21634 )	2023-07-30 21:31:54 +08:00
Gabriel	103c473b96	[Bug](pipeline) fix pipeline shared scan + topn optimization (#21940 )	2023-07-25 12:48:27 +08:00
zhannngchen	86e80ae175	[enhancement](merge-on-write) support concurrent delete bitmap calc while close_wait (#21488 )	2023-07-24 10:09:28 +08:00
Chenyang Sun	f7ac827c90	[fix](compaction) fix time series compaction point policy (#21670 )	2023-07-21 23:09:02 +08:00
Xin Liao	f0d08da97c	[enhancement](merge-on-write) split delete bitmap from tablet meta (#21456 )	2023-07-12 19:13:36 +08:00
Pxl	ca71048f7f	[Chore](status) avoid empty error msg on status (#21454 ) avoid empty error msg on status	2023-07-11 13:48:16 +08:00
abmdocrt	7d4c47e250	[Enhancement](Compaction) Calculate all committed rowsets delete bitmaps when do compaction (#20907 ) Here we calculate the delete bitmaps of all rowsets that are committed but not yet published, to reduce the calculation pressure of the publish phase. Step1: collect the delete bitmaps of all committed rowsets of this tablet. Step2: calculate the delete bitmaps of all rowsets which are published during compaction. Step3: write back the updated delete bitmap and tablet info.	2023-07-10 14:06:11 +08:00
Mingyu Chen	2678afd2db	[fix][improvement](fs) add HdfsIO profile and modification time (#21638 ) Refactor the interface of create_file_reader: the file_size and mtime are merged into FileDescription, and are no longer in FileReaderOptions. Now the file handle cache can get the correct modification time of a file from FileDescription. Add HdfsIO for hdfs file reader, picked from "[Enhancement](multi-catalog) Add hdfs read statistics profile." #21442	2023-07-08 14:49:44 +08:00
zhannngchen	67afea73b1	[enhancement](merge-on-write) add more version and txn information for mow publish (#21257 )	2023-07-07 16:18:47 +08:00
Yongqiang YANG	fb14950887	[refactor](load) split flush_segment_writer into two parts (#21372 )	2023-07-06 11:13:34 +08:00
zhannngchen	ec0e398c50	[enhancement](merge-on-write) record precise primary key index size (#21196 )	2023-06-27 16:50:09 +08:00
Xin Liao	48065fce19	[bugfix](merge-on-write) optimize rowset tree and tablet header lock (#20911 )	2023-06-18 19:26:02 +08:00
zhannngchen	ce9a20a375	[enhancement](merge-on-write) format logs about MoW and add more stats for publish (#20853 )	2023-06-17 23:14:28 +08:00
Xin Liao	f1af09ef87	[Enhancement](merge-on-write) parallel calculate delete bitmap when tablet has multi segments (#20706 )	2023-06-15 21:11:39 +08:00
Chenyang Sun	2a2e485456	[Enhancement](compaction) time-series scenario cumulative compaction policy (#20715 ) new compaction policy for log and time-series scenario	2023-06-14 23:48:44 +08:00
Mingyu Chen	4b15185e25	[improvement](hdfs) add parquet footer cache and hdfs file handle cache (#20544 ) 1. Add hdfs file handle cache for hdfs file reader. Copied from Impala, `https://github.com/apache/impala/blob/master/be/src/util/lru-multi-cache.h`. (Thanks to the Impala team.) This is an LRU cache that can store multiple entries with the same key. The key is built from {file name + modification time}. The value is the hdfsFile pointer that points to a certain hdfs file. This cache avoids reopening the same hdfs file multiple times, which can save query time. Add a BE config `max_hdfs_file_handle_cache_num` to limit the max number of cached file handles; the default is 20000. 2. Add file meta cache. The file meta cache is an LRU cache. The key is {file name + modification time}, and the value is the parsed file meta info of that file, which saves the time of re-parsing the file meta every time. Currently, it is only used for caching parquet file footers. Tests show that when the cache is hit, the `FileOpenTime` and `ParseFooterTime` in the query profile are reduced to almost 0, which can save time when there are lots of files to read.	2023-06-13 15:13:57 +08:00
Pxl	a15a0b9193	[Chore](build) use file(GLOB_RECURSE xxx CONFIGURE_DEPENDS) to replace set cpp (#20461 ) use file(GLOB_RECURSE xxx CONFIGURE_DEPENDS) to replace set cpp	2023-06-08 19:36:21 +08:00
plat1ko	a68fc551f0	[bug](cooldown) Fix async_write_cooldown_meta and snapshot cooldowned version not continuous bug (#20437 )	2023-06-08 15:35:35 +08:00
Kaijie Chen	b0bbff0fd1	[performance](load) improve memtable sort performance (#20392 )	2023-06-04 20:33:15 +08:00
Jerry Hu	8ff8705b3f	[fix](olap) deletion statement with space conditions did not take effect (#20349 ) Deletion statement like this: delete from tb where k1 = ' '; The rows whose k1's value is ' ' will not be deleted.	2023-06-02 13:52:57 +08:00
xiongjx751	5b6b1b38a6	[Enhancement](merge-on-write) Performance optimization of calculations of delete bitmap between segments (#20153 ) 1. Use heap sort to find duplicated keys between segments and update the delete-bitmap. The old implementation traversed all keys in all segments, used each key to search for duplicates in earlier segments, and then marked them for deletion. 2. Trick: Each time the heap top is popped as a key1, the new heap top is key2, allowing for jumping directly from key1 to key2 instead of advancing iteratively. 3. Effect: This technique works well when there are many segments within the same rowset and the imported data is relatively ordered.	2023-06-01 10:12:59 +08:00
Chenyang Sun	accaff1026	[Feature](compaction) wip: single replica compaction (#19237 ) Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica. The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica. The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool. When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.	2023-05-30 21:12:48 +08:00
lihangyu	ab8125d56f	[Improve](performance) introduce SchemaCache to cache TabletSchema & Schema (#20037 ) * [Improve](performance) introduce SchemaCache to cache TabletSchema & Schema 1. When the system is under high-concurrency load with wide table point queries, the frequent memory allocation and deallocation of Schema become evident system bottlenecks. Additionally, the initialization of TabletSchema and Schema also becomes a CPU hotspot. Therefore, a SchemaCache is introduced to cache these resources for reuse. 2. Make some variables wrapped with std::unique<unique_ptr> Performance: \| Status \| QPS \| Average response time (avg) \| P99 response time \| \|------------------\|-----\|------------------\|-------------\| \| SchemaCache enabled \| 501 \| 20ms \| 34ms \| \| SchemaCache disabled \| 321 \| 31ms \| 61ms \| * handle schema change with schema version * remove useless header * rebase	2023-05-29 17:34:53 +08:00
Yongqiang YANG	e0d9f7f955	[enhancement](load) add some profile items for load (#20141 )	2023-05-29 09:54:03 +08:00
Jack Drogon	93933308e6	[Feature-WIP](CCR): Add ccr doris interface (WIP) (#17881 )	2023-05-26 23:40:49 +08:00
Pxl	15a7420661	[Chore](ub) fix some undefined behaviors (#19986 ) /home/zcp/repo_center/doris_master/doris/be/src/olap/rowset/segment_v2/column_reader.cpp:895:21: runtime error: load of value 423208544, which is not a valid value for type 'doris::ReaderType' /home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_decimal.cpp:260:33: runtime error: load of misaligned address 0x7fa3348b301c for type 'int64_t' (aka 'long'), which requires 8 byte alignment /home/zcp/repo_center/doris_master/doris/be/src/olap/block_column_predicate.cpp:82:24: runtime error: variable length array bound evaluates to non-positive value 0 /home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_string.h:225:26: runtime error: null pointer passed as argument 2, which is declared to never be null	2023-05-26 14:08:40 +08:00
yixiutt	943e5fb7e5	[improvement](MOW) use separated cache for mow pk cache (#19686 ) In mow, the primary key cache has a big impact on load performance, so we add a new cache type to separate it from the page cache, making it more flexible in some cases	2023-05-18 13:27:09 +08:00
Xinyi Zou	16f5d3d5b3	[Improvement](memory) new page use Allocator (#19472 )	2023-05-16 19:09:17 +08:00
zhannngchen	fad9237d30	[fix](storage) consider file size on page cache key (#19619 ) The core dump is caused by a DCHECK failure: F0513 22:48:56.059758 3996895 tablet.cpp:2690] Check failed: num_to_read == num_read Finally, we found that the DCHECK failure is due to the page cache: 1. At first we have 20 segments, with ids 0-19. 2. For a MoW table, the memtable flush process calculates the delete bitmap. In this procedure, the index pages and data pages of PrimaryKeyIndex are loaded into the cache. 3. Segment compaction compacts all these 10 segments into 2 segments, and renames them to ids 0 and 1. 4. Finally, before the load commit, we calculate the delete bitmap between segments in the current rowset. This procedure needs to iterate over the primary key index of each segment, but when we access data of the newly compacted segments, we read data of the old segments from the page cache. To fix this issue, the best policy is: 1. Add a crc32 or last modified time to CacheKey. 2. Or invalidate the related cache keys after segment compaction. For policy 1, we don't have a crc32 in the segment footer, and getting the last-modified-time requires 1 additional disk IO. For policy 2, we need to add additional page cache invalidation methods, which may make the page cache unstable. So I think we can simply add the file size to identify that the file has changed. In an LSM-Tree, all modifications generate new files; such file-name reuse is not the normal case (as far as I know, it only happens in segment compaction), so the file size is enough to identify the file change.	2023-05-15 17:16:31 +08:00
Pxl	dfad7b6b38	[Feature](generic-aggregation) some prework of generic aggregation (#19343 ) some prework of generic aggregation	2023-05-09 21:42:21 +08:00
xiaojunjie	9813406757	[Enhancement](HttpServer) Add http interface authentication for BE (#17753 )	2023-05-04 23:46:49 +08:00
yixiutt	aef9355cd3	[feature-wip](partial update) PART1: support basic partial write (#17542 )	2023-04-28 17:17:57 +08:00
Yongqiang YANG	6eb12640a1	[fix](segment_iter) do not init segment_iterator twice (#18337 ) * [fix](segment_iter) do not init segment_iterator twice SegmentIterator::init is called by Segment::new_iterator and BetaRowsetReader::get_segment_iterators twice.	2023-04-27 09:51:57 +08:00
caiconghui	a32fa219ec	Revert "[Enhancement](compaction) stop tablet compaction when table dropped (#18702 )" (#19086 ) This reverts commit 296b0c92f702675b92eee3c8af219f3862802fb2. we can use drop table force stmt to fast drop tablets, no need to check tablet dropped state in every report Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2023-04-26 18:27:46 +08:00
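
The heap-based duplicate-key search described in commit 5b6b1b38a6 (#20153) can be illustrated with a minimal sketch. This is not the Doris implementation: the function name `find_deleted_rows` and the input layout (one sorted list of unique keys per segment, where a higher segment id is assumed to hold the newer version of a key) are assumptions made for illustration, and the "jump from key1 to key2" optimization is not implemented here.

```python
import heapq


def find_deleted_rows(segments):
    """Return {(segment_id, row_id)} of rows shadowed by a later segment.

    Assumptions (hypothetical, for illustration only):
    - segments[i] is a sorted list of keys, unique within the segment;
    - when a key appears in several segments, the highest segment id wins.
    """
    # Seed the heap with the first key of every non-empty segment.
    heap = [(keys[0], seg_id, 0) for seg_id, keys in enumerate(segments) if keys]
    heapq.heapify(heap)

    deleted = set()
    prev = None  # (key, seg_id, row_id) of the most recently popped entry
    while heap:
        key, seg_id, row_id = heapq.heappop(heap)
        if prev is not None and prev[0] == key:
            # Equal keys pop in ascending seg_id order, so prev is the
            # older occurrence and gets marked as deleted.
            deleted.add((prev[1], prev[2]))
        prev = (key, seg_id, row_id)

        # Advance the cursor of the segment we just consumed from.
        next_row = row_id + 1
        if next_row < len(segments[seg_id]):
            heapq.heappush(heap, (segments[seg_id][next_row], seg_id, next_row))
    return deleted
```

Each key is pushed and popped once, so the cost is O(N log S) for N keys in S segments, instead of probing earlier segments once per key.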

496 Commits