Problem:
When we create a table with the datatype varchar(), we treat it as the max length by default. But when we run `desc`, it does not show
the real length; it shows varchar() instead.
Reason:
When upgrading from version 2.0.1 to 2.0.2, we added support for creating varchar() without a length, and it is shown the same way in the
DDL schema. So users would be confused about the actual length of the varchar.
Solved:
Change the display of varchar() to varchar(65533), which is compatible with Hive.
Manually track query/load/compaction/etc. memory in Allocator instead of mem hook.
The Mem Hook can still be used for code segments where memory cannot be tracked manually, and for locating memory during debugging.
This causes some memory tracking loss for queries, less than 10% compared to the past, but it is expected to be more controllable.
Similarly, the Mem Hook will no longer attribute unowned memory to the orphan mem tracker by default, so the total memory of all MemTrackers will be less than before.
There is no longer a need to query jemalloc for the memory size in the Mem Hook on every alloc and free, which cost performance in the past.
Caching the bthread local in a pthread local is no longer required for the memory hook; in the past this caused core dumps inside bthread, which seems to be a bug in bthread.
Change the ThreadContext life cycle to manual control.
In the past, ThreadContext was automatically created the first time it was used (usually in the Jemalloc Hook on the first malloc) and automatically destroyed when the thread exited.
Now the creation and destruction of ThreadContext are controlled manually: it is mainly created when the task thread starts and destroyed before the task thread ends.
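A minimal sketch of this manually managed lifetime; `ScopedThreadContext` and `task_thread_run` are hypothetical names used only to illustrate the idea, not the actual Doris symbols:
```cpp
// The task thread creates its ThreadContext on entry and destroys it before
// it exits, instead of relying on the malloc hook to lazily create it and on
// thread exit to clean it up.
#include <memory>

struct ThreadContext {
    // per-thread memory tracking state ...
};

// Hypothetical RAII guard placed at the top of a task thread's run function.
class ScopedThreadContext {
public:
    ScopedThreadContext() : _ctx(std::make_unique<ThreadContext>()) {}
    ~ScopedThreadContext() = default;  // _ctx destroyed here, before the thread ends

    ThreadContext* get() { return _ctx.get(); }

private:
    std::unique_ptr<ThreadContext> _ctx;
};

void task_thread_run(/* task args */) {
    ScopedThreadContext scoped_ctx;  // created when the task thread starts
    // ... run the query / load / compaction task, tracking its memory
    //     against scoped_ctx.get() ...
}   // scoped_ctx (and its ThreadContext) destroyed before the thread exits
```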
Ran the 43 ClickBench query tests.
Using the Mem Hook in the past:
`MergeRangeFileReader` is used to merge small IOs, and `max_amplified_read_ratio` controls the proportion of read amplification. However, in some extreme cases (e.g. the `orc stripe size`/`parquet row group size` is less than 3MB), `max_amplified_read_ratio` does not control this well, resulting in a large number of small IOs.
Testing shows that the latency of a single IO remains basically unchanged for IO sizes smaller than 4KB on HDFS (512KB on OSS). Therefore, an equivalent IO size is used to measure merge effectiveness:
```
EquivalentIOSize = MergeSize / Request IOs
```
When `EquivalentIOSize` is greater than 4KB on HDFS, or 512KB on OSS, we consider the merge effective.
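As a rough illustration, a minimal sketch of this check; the helper name and the example numbers are assumptions, not taken from the actual `MergeRangeFileReader` code or tests:
```cpp
// Thresholds follow the text: 4KB for HDFS and 512KB for object storage (OSS).
#include <cstddef>

bool is_merge_effective(size_t merge_size_bytes, size_t request_io_count,
                        bool is_object_storage) {
    if (request_io_count == 0) {
        return false;
    }
    // EquivalentIOSize = MergeSize / Request IOs.
    // Example (assumed numbers): merging 8MB across 1000 requested IOs gives an
    // equivalent IO size of 8KB, which exceeds the 4KB HDFS threshold, so the
    // merge is considered effective.
    const size_t equivalent_io_size = merge_size_bytes / request_io_count;
    const size_t threshold = is_object_storage ? 512 * 1024 : 4 * 1024;
    return equivalent_io_size > threshold;
}
```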
1. Parquet with page v2 is parsed incorrectly when using codecs other than snappy, because `compressed_page_size` has the same meaning in page v1 and v2: it always contains the bytes of the definition levels, repetition levels, and the compressed data.
2. Add regression tests for decimal types stored as `fixed_len_byte_array`, and for dictionary-encoded date/datetime types.
* [fix] scanner hangs due to negative num_running_scanners
Before the patch, num_running_scanners was increased after submitting, so it could be decreased (by a scanner finishing)
before being increased. Negative values could then be seen by get_block_from_queue,
and an expected submit did not happen.
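A minimal sketch of the increase-before-submit ordering; the counter and function names here are hypothetical, only illustrating the idea of the fix:
```cpp
// Bump the running-scanner counter before submitting the scanner task, so a
// fast-finishing scanner that decrements the counter can never drive it negative.
#include <atomic>

std::atomic<int> num_running_scanners{0};

bool submit_scanner(/* Scanner* scanner */) {
    // Increase first, so the decrement in on_scanner_finished() can never run
    // before the increment (which previously allowed a negative value).
    num_running_scanners.fetch_add(1);
    bool submitted = /* scanner_thread_pool->submit(scanner) succeeded */ true;
    if (!submitted) {
        num_running_scanners.fetch_sub(1);  // roll back if nothing was submitted
    }
    return submitted;
}

void on_scanner_finished() {
    num_running_scanners.fetch_sub(1);
    // get_block_from_queue() can now rely on the counter never going negative.
}
```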
Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
close #26882
We should not use the singleNodePlan to generate the rootPlanFragment if the query is inside an insert statement, otherwise the distributedPlanner will be null.
introduced in #15491
Concurrent schema change and txn may cause a deadlock. An example:
Txn T is committed but not yet published;
A schema change or rollup runs on T's related partition and adds alter replica R;
The sc/rollup adds a sched txn watermark M;
The FE restarts;
After the FE restart, T's loadedTblIndexes is cleared because it is not saved to disk;
T will publish its version to all tablets, including the sc/rollup's new alter replica R;
Since R does not contain the txn's data, T will fail; it will then keep waiting for R's data;
The sc/rollup waits for txns before M to finish; only after that will it let R copy history data;
Since T is not finished, the sc/rollup will wait forever, so R will never copy history data;
Txn T and the sc/rollup wait for each other forever, causing a deadlock.
Fix: because the sc/rollup ensures double write after the sched watermark M, when finishing a transaction and checking an alter replica:
if the txn id is bigger than M, check it just like a normal replica;
otherwise, skip checking this replica; the BE will modify the history data later.
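A minimal sketch of this check (hypothetical names, written in C++ only for illustration; the real logic lives in the FE's transaction-finish path):
```cpp
// Hypothetical helper illustrating the rule above: alter replicas only need
// the normal check for txns scheduled after the sc/rollup watermark M, since
// double write is guaranteed from M onward.
#include <cstdint>

bool replica_need_check(int64_t txn_id, bool is_alter_replica,
                        int64_t sched_watermark_m) {
    if (!is_alter_replica) {
        return true;  // normal replicas are always checked
    }
    if (txn_id > sched_watermark_m) {
        return true;  // check it just like a normal replica
    }
    return false;     // skip; the BE will fill in the history data later
}
```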
Improve the accuracy of sample stats collection. For non-distribution columns, estimate the NDV as
`n*d / (n - f1 + f1*n/N)`
where `f1` is the number of distinct values that occur exactly once in our sample of `n` rows (from a total of `N`),
and `d` is the total number of distinct values in the sample (a worked sketch follows below).
For distribution columns, use `ndv(n) * fraction of tablets sampled` for the NDV.
For very large tablets, use a limit to control the total number of rows scanned (for non-key columns only, because key columns are sorted and the result would be inaccurate with a limit).
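As a rough illustration of the non-distribution-column estimator, a minimal sketch with assumed example numbers (not taken from any real table):
```cpp
// Sketch of the NDV estimator above: ndv ~= n * d / (n - f1 + f1 * n / N).
// The function name and the sample numbers below are assumptions for
// illustration only.
#include <iostream>

double estimate_ndv(double n,    // rows in the sample
                    double N,    // total rows in the table
                    double d,    // distinct values seen in the sample
                    double f1) { // distinct values seen exactly once in the sample
    return n * d / (n - f1 + f1 * n / N);
}

int main() {
    // Example: a 1,000-row sample of a 100,000-row table with 500 distinct
    // values, 200 of which appeared exactly once, yields roughly 623.4.
    std::cout << estimate_ndv(1000, 100000, 500, 200) << std::endl;
    return 0;
}
```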
Currently, rowids may be fragmented significantly after `_get_row_ranges_by_column_conditions`, potentially leading to high CPU costs when processing these scattered ranges of rowid.
This PR enhances the `SegmentIterator` by eliminating the initial range read in the `BitmapRangeIterator` constructor and introducing a `read_batch_rowids` method to both `BitmapRangeIterator` and `BackwardBitmapRangeIterator` classes. The aim is to boost performance by omitting redundant read operations, thereby reducing execution time.
Moreover, to avoid unnecessary reads when the range is relatively complete, we employ a simple `is_continuous` check to determine if the block of rows is continuous. If so, we call `next_batch` instead of `read_by_rowids`, streamlining the processing of consecutive rowids.
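A minimal sketch of the continuity check described above; the reader and block types are hypothetical stand-ins, while `next_batch` and `read_by_rowids` refer to the calls mentioned in the text:
```cpp
// Collect a batch of rowids and, if they form one continuous run, read them
// with a sequential next_batch instead of scattered point lookups.
#include <cstdint>
#include <vector>

// Hypothetical helper: true when the sorted rowids form a single continuous range.
static bool is_continuous(const std::vector<uint32_t>& rowids) {
    return !rowids.empty() &&
           rowids.back() - rowids.front() + 1 == static_cast<uint32_t>(rowids.size());
}

// Hypothetical read path over a generic column reader and output block.
template <typename ColumnReader, typename Block>
void read_rows(ColumnReader& reader, Block& block,
               const std::vector<uint32_t>& rowids) {
    if (is_continuous(rowids)) {
        reader.seek_to(rowids.front());            // one sequential read is cheaper
        reader.next_batch(rowids.size(), &block);
    } else {
        reader.read_by_rowids(rowids, &block);     // fall back to point reads
    }
}
```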
We selected three SQL statement scenarios to test the effects of the optimization, which are:
1. ```select COUNT(*) from wc_httplogs_inverted_index where request match "images" and (size >= 10 and status = 200);```
2. ```select COUNT(*) from wc_httplogs_inverted_index where request match "HTTP" and (size >= 10 and status = 200);```
3. ```select COUNT(*) from wc_httplogs_inverted_index where request match "GET" and (size >= 10 and status = 200);```
- The first SQL statement represents the scenario primarily optimized in this PR, where the first read matches a large number of rows but is highly fragmented.
- The second SQL statement represents a scenario where the first read fully hits, mainly to verify if there is any performance degradation in the PR when hitting a complete rowid range.
- The third SQL statement represents a near-total hit with only occasional misses, used to check if the PR degrades when the rowid range contains many continuous ranges.
The results are as follows:
1. For the first SQL statement:
1. Before optimization: Execution time: 0.32 sec, FirstReadTime: 6s628ms
2. After optimization: Execution time: 0.16 sec, FirstReadTime: 1s604ms
2. For the second SQL statement:
1. Before optimization: Execution time: 0.16 sec, FirstReadTime: 682.816ms
2. After optimization: Execution time: 0.15 sec, FirstReadTime: 635.156ms
3. For the third SQL statement:
1. Before optimization: Execution time: 0.16 sec, FirstReadTime: 787.904ms
2. After optimization: Execution time: 0.16 sec, FirstReadTime: 798.861ms
This commit overhauls the JDBC connector logic within our project, transitioning from the previous mechanism of fetching data through JNI calls for individual ResultSet items to a more efficient and unified approach using the VectorTable data structure.
The SQL `select * from t1 a join t1 b on b.id in (select 1) and a.id = b.id;` will report an error.
This PR supports uncorrelated subqueries in join conditions to fix it.
When running LoadPendingTask or LoadLoadingTask, an Error such as `NoClassDefFoundError` may be thrown,
but previously we only caught Java's `Exception`, so
other kinds of errors could not be reported clearly.