doris

Author	SHA1	Message	Date
Kang	bd9a9a32f5	[bugfix](s3 fs) fix s3 uri parsing for http/https uri (#20656 )	2023-06-11 14:00:04 +08:00
Pxl	a15a0b9193	[Chore](build) use file(GLOB_RECURSE xxx CONFIGURE_DEPENDS) to replace set cpp (#20461 ) use file(GLOB_RECURSE xxx CONFIGURE_DEPENDS) to replace set cpp	2023-06-08 19:36:21 +08:00
plat1ko	a68fc551f0	[bug](cooldown) Fix async_write_cooldown_meta and snapshot cooldowned version not continuous bug (#20437 )	2023-06-08 15:35:35 +08:00
zhengyu	09344eaab5	[feature](load) introduce single-stream-multi-table load (#20006 ) For routine load (kafka load), user can produce all data for different table into single topic and doris will dispatch them into corresponding table. Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2023-06-07 17:55:25 +08:00
Pxl	7dc7ed97eb	[Chore](build) remove some unused code and remove some wno (#20326 ) remove some unused code about spinlock remove some wno and fix warning remove varadic macro usage	2023-06-05 10:48:07 +08:00
Kaijie Chen	b0bbff0fd1	[performance](load) improve memtable sort performance (#20392 )	2023-06-04 20:33:15 +08:00
Jerry Hu	8ff8705b3f	[fix](olap) deletion statement with space conditions did not take effect (#20349 ) Deletion statement like this: delete from tb where k1 = ' '; The rows whose k1's value is ' ' will not be deleted.	2023-06-02 13:52:57 +08:00
xiongjx751	5b6b1b38a6	[Enhancement](merge-on-write) Performance optimization of calculations of delete bitmap between segments (#20153 ) 1. Use heap sort to find duplicated keys between segments and update the delete-bitmap. The old implementation traversed all keys in all segments, used each key to search for duplicates in earlier segments, and then marked them for deletion. 2. Trick: Each time the heap top is popped as a key1, the new heap top is key2, allowing for jumping directly from key1 to key2 instead of advancing iteratively. 3. Effect: This technique works well when there are many segments within the same rowset and the imported data is relatively ordered.	2023-06-01 10:12:59 +08:00
Jerry Hu	c03a19ea23	[improvement](bitmap) Using set to store a small number of elements to improve performance (#19973 ) Test on SSB 100g: select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey; exec time: 4.388s create materialized view: create materialized view customer_uv as select lo_suppkey, bitmap_union(to_bitmap(lo_linenumber)) from lineorder group by lo_suppkey; select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey; exec time: 12.908s test with the patch, exec time: 5.790s	2023-05-31 16:13:42 +08:00
Chenyang Sun	accaff1026	[Feature](compaction) wip: single replica compaction (#19237 ) Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica. The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica. The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool. When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.	2023-05-30 21:12:48 +08:00
lihangyu	ab8125d56f	[Improve](performance) introduce SchemaCache to cache TabletSchame & Schema (#20037 ) * [Improve](performance) introduce SchemaCache to cache TabletSchame & Schema 1. When the system is under high-concurrency load with wide table point queries, the frequent memory allocation and deallocation of Schema become evident system bottlenecks. Additionally, the initialization of TabletSchema and Schema also becomes a CPU hotspot.Therefore, the introduction of a SchemaCache is implemented to cache these resources for reuse. 2. Make some variables wrapped with std::unique<unique_ptr> Performance: \| 状态 \| QPS \| 平均响应时间 (avg) \| P99 响应时间 \| \|------------------\|-----\|------------------\|-------------\| \| 开启 SchemaCache \| 501 \| 20ms \| 34ms \| \| 关闭 SchemaCache \| 321 \| 31ms \| 61ms \| * handle schema change with schema version * remove useless header * rebase	2023-05-29 17:34:53 +08:00
Jerry Hu	9f8de89659	[refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758 ) Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity. By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed. This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.	2023-05-29 11:47:31 +08:00
Yongqiang YANG	e0d9f7f955	[enhancement](load) add some profile items for load (#20141 )	2023-05-29 09:54:03 +08:00
amory	9d44918036	[Improve](data-type) Clean datatype uselesscode (#20145 ) * fix struct_export out data * delete useless code with data type	2023-05-28 20:48:29 +08:00
YueW	ae352997b4	[Enhancement](alter inverted index) Improve alter inverted index performance with light weight add or drop inverted index (#19063 )	2023-05-28 11:23:07 +08:00
Jack Drogon	93933308e6	[Feature-WIP](CCR): Add ccr doris interface (WIP) (#17881 )	2023-05-26 23:40:49 +08:00
amory	ee34b6de2d	[Refact] (serde) refact mysql serde with data type (#19543 ) refact mysql output (de)serialize with data type serde , avoid accoriding switch case Primitive type writed in mysqlWriter	2023-05-26 14:11:17 +08:00
Pxl	15a7420661	[Chore](ub) fix some undefined behaviors (#19986 ) /home/zcp/repo_center/doris_master/doris/be/src/olap/rowset/segment_v2/column_reader.cpp:895:21: runtime error: load of value 423208544, which is not a valid value for type 'doris::ReaderType' /home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_decimal.cpp:260:33: runtime error: load of misaligned address 0x7fa3348b301c for type 'int64_t' (aka 'long'), which requires 8 byte alignment /home/zcp/repo_center/doris_master/doris/be/src/olap/block_column_predicate.cpp:82:24: runtime error: variable length array bound evaluates to non-positive value 0 /home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_string.h:225:26: runtime error: null pointer passed as argument 2, which is declared to never be null	2023-05-26 14:08:40 +08:00
Jerry Hu	e5eed53b89	[improvement](bitmap) Use shared_ptr in BitmapValue to avoid deep copying (#19101 ) Currently bitmapvalue type is copied between columns, it cost a lot of memory. Use a shared ptr in bitmap value to avoid copy data.	2023-05-24 16:13:01 +08:00
amory	a6bd014b8a	[FIX](serde)pb ut is not stable #19907	2023-05-22 09:03:02 +08:00
ZhangYu0123	1c950d6930	[fix](config) fix memory config enable_query_memroy_overcommit spell problem #19898	2023-05-22 00:32:20 +08:00
lihangyu	fd4fa5c64e	[Optimize](row store) optimize serialization and deserialization (#19691 ) 1. Get DataTypeSerde in advance to avoid get temporary DataTypeSerde iterate each column 2. Iterate the original row once is enoungh for deserializing by introducing a map for record the index of each column's unique id	2023-05-18 16:22:38 +08:00
Kang	294599ee45	[feature](jsonb) rename JSONB type name and function name to JSON (#19774 ) To be more compatible with MySQL, rename JSONB type name and function name to JSON. The old JSONB type name and jsonb_xx function can still be used for backward compatibility. There is a function jsonb_extract remained since json_extract is used by json string function and more work need to change it. It will be changed further.	2023-05-18 16:16:52 +08:00
Xinyi Zou	068a32bc49	[Improvement](memory) faststring use Allocator #19762 After the outer catch exception, faststring resize reserve build may throw a memory alloc failure exception from the Allocator. Currently page body compress will catch memory alloc failure exception	2023-05-18 15:00:49 +08:00
yixiutt	943e5fb7e5	[improvement](MOW) use seperated cache for mow pk cache (#19686 ) In mow, primary key cache have a big impact on load performance, so we add a new cache type to seperate it from page cache to make it more flexible in some cases	2023-05-18 13:27:09 +08:00
herry2038	79d30cfe46	[feature](compact) Duplicate with no keys tables compaction coredump (#19490 ) Co-authored-by: yuxianbing <yuxianbing@yy.com>	2023-05-17 22:22:14 +08:00
Xinyi Zou	16f5d3d5b3	[Improvement](memory) new page use Allocator (#19472 )	2023-05-16 19:09:17 +08:00
zhannngchen	fad9237d30	[fix](storage) consider file size on page cache key (#19619 ) The core is due to a DCHECK: F0513 22:48:56.059758 3996895 tablet.cpp:2690] Check failed: num_to_read == num_read Finally, we found that the DCHECK failure is due to page cache: 1. At first we have 20 segments, which id is 0-19. 2. For MoW table, memtable flush process will calculate the delete bitmap. In this procedure, the index pages and data pages of PrimaryKeyIndex is loaded to cache 3. Segment compaction compact all these 10 segments to 2 segment, and rename it to id 0,1 4. Finally, before the load commit, we'll calculate delete bitmap between segments in current rowset. This procedure need to iterator primary key index of each segments, but when we access data of new compacted segments, we read data of old segments in page cache To fix this issue, the best policy is: 1. Add a crc32 or last modified time to CacheKey. 2. Or invalid related cache keys after segment compaction. For policy 1, we don't have crc32 in segment footer, and getting the last-modified-time needs to perform 1 additional disk IO. For policy 2, we need to add additional page cache invalidation methods, which may cause the page cache not stable So I think we can simply add a file size to identify that the file is changed. In LSM-Tree, all modification will generate new files, such file-name reuse is not normal case(as far as I know, only segment compaction), file size is enough to identify the file change.	2023-05-15 17:16:31 +08:00
HHoflittlefish777	f8ef25bb10	[enhancement](load) lazy-open necessary partitions when load (#18874 )	2023-05-14 16:09:55 +08:00
Pxl	9b7a419aed	[Chore](build) update some doc about build enviroment (#19325 ) update some doc about build enviroment	2023-05-10 16:18:44 +08:00
DeadlineFen	a05dbd3f81	[chore](compile) Improves PCH cache hit ratio (#19469 ) Supplement the documentation of be-clion-dev, avoid the problem of undefined DORIS_JAVA_HOME and inability to find jni.h when using clion development without directly compiling through build.sh Complete the classification of header files in pch.h and introduce some header files that are not frequently modified in doris. Separate the declaration and definition in common/config.h. If you need to modify the default configuration now, please modify it in common/config.cpp. gen_cpp/version.h is regenerated every time it is recompiled, which may cause PCH to fail, so now you need to get the version information indirectly rather than directly.	2023-05-10 12:49:01 +08:00
amory	b2371c1246	[Refact](Literal)refact literal get field and value (#19351 )	2023-05-10 09:01:17 +08:00
Pxl	dfad7b6b38	[Feature](generic-aggregation) some prowork of generic aggregation (#19343 ) some prowork of generic aggregation	2023-05-09 21:42:21 +08:00
Gabriel	4c6ca88088	Revert "[refactor](function) ignore DST for function `from_unixtime` (#19151 )" (#19333 ) This reverts commit 9dd6c8f87b73db238bfd38fb1d76f3796910f398.	2023-05-06 16:33:58 +08:00
Gabriel	9dd6c8f87b	[refactor](function) ignore DST for function `from_unixtime` (#19151 )	2023-05-05 11:51:49 +08:00
xiaojunjie	9813406757	[Enhancement](HttpServer) Add http interface authentication for BE (#17753 )	2023-05-04 23:46:49 +08:00
amory	e9a4cbcdf9	[Refact](type system) refact column with arrow serde (#19091 ) * refact arrow serde * add date serde * update arrow and fix nullable and date type	2023-05-04 15:28:46 +08:00
AlexYue	b0c215e694	[enhance](be)add more profile in prefetched buffered reader (#19119 )	2023-05-02 09:53:39 +08:00
yixiutt	aef9355cd3	[feature-wip](partial update) PART1: support basic partial write (#17542 )	2023-04-28 17:17:57 +08:00
Ashin Gau	65a82a0b57	[opt](FileReader) turn off prefetch data in parquet page reader when using MergeRangeFileReader (#19102 ) Using both `MergeRangeFileReader` and `BufferedStreamReader` simultaneously would waste a lot of memory, so turn off prefetch data in `BufferedStreamReader` when using MergeRangeFileReader.	2023-04-28 09:27:56 +08:00
Yongqiang YANG	6eb12640a1	[fix](segment_iter) do not init segment_iterator twice (#18337 ) * [fix](segment_iter) do not init segment_iterator twice SegmentIterator::init is called by Segment::new_iterator and BetaRowsetReader::get_segment_iterators twice.	2023-04-27 09:51:57 +08:00
caiconghui	a32fa219ec	Revert "[Enhancement](compaction) stop tablet compaction when table dropped (#18702 )" (#19086 ) This reverts commit 296b0c92f702675b92eee3c8af219f3862802fb2. we can use drop table force stmt to fast drop tablets, no need to check tablet dropped state in every report Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2023-04-26 18:27:46 +08:00
Lightman	1be5dac036	[improve] Refactor file cache and Improve the file cache strategy (#18652 ) 1. Refactor file cache. Before refactor, the file cache config format is "[{"path":"/path/to/file_cache","normal":21474836480,"persistent":10737418240,"query_limit":10737418240}]" and now change to "[{"path":"/mnt/disk3/selectdb_cloud/file_cache","total_size":21474836480,"query_limit":10737418240}]". It will be simpler than before. 2. Support more strategy. Support file cache priority. The file cache will have three queue, name as 'index'/'normal'/'disposable'. We can avoid that the higher priority data is eliminate by the lower priority data.	2023-04-25 23:14:28 +08:00
WenYao	339d804ec4	[Refactor](exceptionsafe) add factory creator to some class (#19000 )	2023-04-25 14:33:47 +08:00
Adonis Ling	16a394da0e	[chore](build) Use include-what-you-use to optimize includes (PART III) (#18958 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-24 14:51:51 +08:00
gitccl	296b0c92f7	[Enhancement](compaction) stop tablet compaction when table dropped (#18702 ) * [Enhancement](compaction) stop tablet compaction when table dropped * fix be ut	2023-04-24 11:04:27 +08:00
yiguolei	8d7a9fd21b	[refactor](exceptionsafe) add factory creator to some class (#18978 ) make vexprecontext,vexpr,function,query context,runtimestate thread safe. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-04-24 10:32:11 +08:00
AlexYue	c3baa65de3	[feature](io) enable s3 file writer with multi part uploading concurrently (#17585 ) Formerly S3FileWriter has to write each buffer with 5MB or more then upload one part, after all these works are done it could then process the incoming data, it's blocking and inefficient. This pr brings one bufferpool where the data could write into memory buffer immediately if has free buffer and then it would be uploaded into the S3. This pr doesn't provide the ability to elegantly support cases where there is no free buffer, i'll leave it as one future work.	2023-04-23 23:19:44 +08:00
yiguolei	3736530585	[refactor](query context) rename query fragments context to query context and make query context safe (#18950 ) * [refactor](query context) rename query fragments context to query context and make query context safe --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-04-23 22:53:56 +08:00
Ashin Gau	29f502380c	[opt](FileReader) merge small IO to optimize read performace (#18796 ) Add `MergeRangeFileReader` to merge small IO to optimize parquet&orc read performance. `MergeRangeFileReader` is a FileReader that efficiently supports random access in format like parquet and orc. In order to merge small IO in parquet and orc, the random access ranges should be generated when creating the reader. The random access ranges is a list of ranges that order by offset. The range in random access ranges should be reading sequentially, can be skipped, but can't be read repeatedly. When calling read_at, if the start offset located in random access ranges, the slice size should not span two ranges. For example, in parquet, the random access ranges is the column offsets in a row group. When reading at offset, if [offset, offset + 8MB) contains many random access ranges, the reader will read data in [offset, offset + 8MB) as a whole, and copy the data in random access ranges into small buffers(name as box, default 1MB, 64MB in total). A box can be occupied by many ranges, and use a reference counter to record how many ranges are cached in the box. If reference counter equals zero, the box can be release or reused by other ranges. When there is no empty box for a new read operation, the read operation will do directly. ## Effects The runtime of ClickBench reduces from 102s to 77s, and the runtime of Query 24 reduces from 24.74s to 9.45s. The profile of Query 24: ``` VFILE_SCAN_NODE (id=0):(Active: 8s344ms, % non-child: 83.06%) - FileReadBytes: 534.46 MB - FileReadCalls: 1.031K (1031) - FileReadTime: 28s801ms - GetNextTime: 8s304ms - MaxScannerThreadNum: 12 - MergedSmallIO: 0ns - CopyTime: 157.774ms - MergedBytes: 549.91 MB - MergedIO: 94 - ReadTime: 28s642ms - RequestBytes: 507.96 MB - RequestIO: 1.001K (1001) - NumScanners: 18 ``` 1001 request IOs has been merged into 94 IOs. ## Remaining problems 1. Add p2 regression test in nest PR 2. Profiles are scattered in various codes and will be refactored in the next PR 3. Support ORC reader	2023-04-23 10:51:38 +08:00

1 2 3 4 5 ...

1092 Commits