Improve the accuracy of sample stats collection. For non-distribution columns, estimate NDV with
`n*d / (n - f1 + f1*n/N)`
where `n` is the number of sampled rows (out of `N` total rows), `d` is the number of distinct values in the sample, and `f1` is the number of distinct values that occur exactly once in the sample.
For distribution columns, use `ndv(n) * fraction of tablets sampled` as the NDV estimate.
For very large tablets, apply a limit to bound the number of rows scanned. This applies to non-key columns only: key columns are sorted, so sampling them with a limit would read a biased prefix and give inaccurate stats.
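As a worked example with made-up numbers (not from this PR): sample `n = 10000` rows out of `N = 1000000`, and suppose the sample contains `d = 500` distinct values, `f1 = 100` of which appear exactly once. Then

$$
\hat{D} = \frac{n \cdot d}{n - f_1 + f_1 \cdot n/N}
        = \frac{10000 \cdot 500}{10000 - 100 + 100 \cdot 10000/1000000}
        = \frac{5000000}{9901} \approx 505.
$$

With few singletons the estimate stays close to `d`; at the other extreme, if every sampled row is distinct (`f1 = d = n`), the denominator collapses to `n^2/N` and the estimate rises to `N`, which is the desired limiting behavior.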
Currently, rowids may be significantly fragmented after `_get_row_ranges_by_column_conditions`, which can incur a high CPU cost when processing these scattered rowid ranges.
This PR enhances the `SegmentIterator` by eliminating the initial range read in the `BitmapRangeIterator` constructor and by introducing a `read_batch_rowids` method on both the `BitmapRangeIterator` and `BackwardBitmapRangeIterator` classes, avoiding redundant read operations and reducing execution time.
Moreover, to avoid unnecessary reads when the range is relatively complete, we employ a simple `is_continuous` check to determine if the block of rows is continuous. If so, we call `next_batch` instead of `read_by_rowids`, streamlining the processing of consecutive rowids.
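A minimal sketch of this dispatch, with simplified stand-in names and signatures (not Doris's actual `SegmentIterator` interfaces):

```cpp
#include <cstdint>
#include <vector>

using rowid_t = uint32_t;

// Stub standing in for a real column reader.
struct ColumnReader {
    void next_batch(rowid_t /*start*/, size_t /*n*/) { /* one sequential range read */ }
    void read_by_rowids(const rowid_t* /*ids*/, size_t /*n*/) { /* scattered point reads */ }
};

// A sorted, distinct rowid batch is continuous iff it covers
// [front, front + size) with no gaps.
bool is_continuous(const std::vector<rowid_t>& rowids) {
    return rowids.size() > 1 &&
           rowids.back() - rowids.front() + 1 == rowids.size();
}

void read_rows(ColumnReader& reader, const std::vector<rowid_t>& rowids) {
    if (is_continuous(rowids)) {
        // Consecutive rowids: one sequential read beats many point lookups.
        reader.next_batch(rowids.front(), rowids.size());
    } else {
        reader.read_by_rowids(rowids.data(), rowids.size());
    }
}
```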
We selected three SQL scenarios to test the effect of the optimization:
1. ```select COUNT() from wc_httplogs_inverted_index where request match "images" and (size >= 10 and status = 200);```
2. ```select COUNT() from wc_httplogs_inverted_index where request match "HTTP" and (size >= 10 and status = 200);```
3. ```select COUNT() from wc_httplogs_inverted_index where request match "GET" and (size >= 10 and status = 200);```
- The first SQL statement represents the scenario primarily targeted by this PR: the first read matches a large number of rows, but the matched rowids are highly fragmented.
- The second SQL statement represents a scenario where the first read fully hits, mainly to verify whether the PR causes any performance degradation when hitting a complete rowid range.
- The third SQL statement represents a near-total hit with only occasional misses, used to check whether the PR degrades when the rowid set contains many continuous ranges.
The results are as follows:

| SQL | Execution time (before) | FirstReadTime (before) | Execution time (after) | FirstReadTime (after) |
| --- | --- | --- | --- | --- |
| 1 | 0.32 sec | 6s628ms | 0.16 sec | 1s604ms |
| 2 | 0.16 sec | 682.816ms | 0.15 sec | 635.156ms |
| 3 | 0.16 sec | 787.904ms | 0.16 sec | 798.861ms |
This commit overhauls the project's JDBC connector, replacing the previous mechanism of fetching each ResultSet item through an individual JNI call with a more efficient, unified approach based on the VectorTable data structure.
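A rough sketch of the idea, using made-up names (`VectorTableView`, `append_column`) rather than the connector's real classes: the Java side fills a columnar batch once, and the C++ side copies each column out in bulk instead of crossing the JNI boundary for every cell:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for the columnar batch the Java side hands back through a
// single JNI call (buffer address + row count per column).
struct VectorTableView {
    const int64_t* column_data;  // start of one column's value buffer
    size_t num_rows;
};

// Bulk-append one column of the batch into the engine's own storage:
// a single memcpy replaces num_rows per-cell JNI getObject() round trips.
void append_column(std::vector<int64_t>& dst, const VectorTableView& src) {
    size_t old_size = dst.size();
    dst.resize(old_size + src.num_rows);
    std::memcpy(dst.data() + old_size, src.column_data,
                src.num_rows * sizeof(int64_t));
}
```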
The SQL `select * from t1 a join t1 b on b.id in (select 1) and a.id = b.id;` will report an error.
This PR supports uncorrelated subqueries in join conditions to fix this.
When running a LoadPendingTask or LoadLoadingTask, an `Error` such as `NoClassDefFoundError` may be thrown. Previously, we only caught Java's `Exception`, so other kinds of `Throwable` could not be shown clearly.
1. The closure should be managed by a `unique_ptr` and released by brpc; it should not be held by our code. If our code held it, we would have to wait for brpc to finish during cancel or close.
2. The closure should be exception safe: if any exception happens, it must not leak memory.
3. Using a specific callback interface implemented by Doris's code, we can write arbitrary logic there, and Doris manages the callback's lifecycle.
4. Using a `weak_ptr` between the callback and the closure, if the callback is destructed before the closure's `Run`, there should be no core dump (see the sketch after this list).
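A minimal sketch of this ownership scheme, with simplified stand-in names (`Callback`, `RpcClosure`); the real Doris and brpc classes differ (e.g. brpc closures derive from `google::protobuf::Closure`):

```cpp
#include <memory>
#include <utility>

// Interface implemented on the Doris side; Doris manages its lifecycle (point 3).
struct Callback {
    virtual ~Callback() = default;
    virtual void on_rpc_done() = 0;
};

// Closure handed to brpc. brpc invokes Run() exactly once and the closure
// deletes itself there, so our code never holds it after issuing the RPC (point 1).
class RpcClosure {
public:
    explicit RpcClosure(std::weak_ptr<Callback> cb) : _cb(std::move(cb)) {}

    void Run() {
        // The guard frees the closure even if on_rpc_done() throws (point 2).
        std::unique_ptr<RpcClosure> self_guard(this);
        if (auto cb = _cb.lock()) {  // callback may already be destructed (point 4)
            cb->on_rpc_done();       // lock() keeps it alive for this call
        }                            // expired weak_ptr: skip quietly, no core dump
    }

private:
    std::weak_ptr<Callback> _cb;  // never extends the callback's lifetime
};
```

The caller allocates the closure with `new` and hands it to brpc; since only `Run` deletes it, cancel and close paths never need to wait for brpc to finish.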
Currently, the update time of an hms table is generated independently on every FE node (each uses `System.currentTimeMillis()` separately), so it may differ between FE nodes; as a result, the same query can miss the sql-cache when submitted more than once through different FE nodes. This PR mainly makes the following changes to avoid this problem:
- Use `transient_lastDdlTime` instead of `System.currentTimeMillis()` as the `schemaUpdateTime` of hms tables
- Use the `eventTime` of the hms event instead of `System.currentTimeMillis()` as the update time when processing hms events