Commit Graph

3561 Commits

Author SHA1 Message Date
98c74f9ab8 [improvement](signal) add tid during core dump; the tid is equal to the tid in be.INFO (#15893)
Co-authored-by: yiguolei <yiguolei@gmail.com>
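Not the actual Doris handler, but a minimal sketch of the idea on Linux: a fatal-signal handler writes the kernel thread id (the same tid glog prints in be.INFO lines) before re-raising, so the core dump can be matched against the log.
```
#include <csignal>
#include <cstdio>
#include <unistd.h>
#include <sys/syscall.h>

static void fatal_signal_handler(int signum) {
    // Kernel thread id; matches the tid in be.INFO log lines.
    long tid = syscall(SYS_gettid);
    char buf[64];
    // snprintf is not formally async-signal-safe; kept here for brevity.
    int n = snprintf(buf, sizeof(buf), "signal %d received, tid=%ld\n", signum, tid);
    if (n > 0) write(STDERR_FILENO, buf, n);
    signal(signum, SIG_DFL); // restore default so the core dump still happens
    raise(signum);
}

int main() {
    signal(SIGSEGV, fatal_signal_handler);
    raise(SIGSEGV); // demo: trigger the handler
}
```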
2023-01-14 18:40:02 +08:00
84d6938a73 [Bug](pipeline) Fix BE crash caused by pipeline (#15890)
* [Bug](pipeline) Fix BE crash caused by pipeline

* update
2023-01-14 18:37:19 +08:00
c4475a8dbc [Enhancement](jdbc scanner) add profile for jdbc scanner (#15914) 2023-01-14 10:28:59 +08:00
313e14d220 [Bugfix] (ROLLUP) fix the coredump when add rollup by link schema change (#15654)
Because the rollup has the same keys in the same order as the base table, BE performs a linked schema change: the base tablet's segments are linked to the new rollup tablet. But column unique ids start from 0 in the base tablet, and in the rollup tablet as well, so they can collide. In this case, unique id 4 refers to column 'city' in the base table but to 'cost' in the rollup tablet, so BE decodes a varchar page as a bigint page and core dumps. Such a rollup needs to be rejected.

If a rollup can be added by linked schema change, the rollup is redundant: it brings no additional benefit and wastes storage space, so it should be rejected.
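A hypothetical sketch of the rejection rule (names are illustrative, not the actual FE/BE check): a rollup whose key columns equal the base table's key columns in the same order would take the linked-schema-change path, so it is rejected as redundant.
```
#include <cstddef>
#include <string>
#include <vector>

// Returns true when the rollup's key columns match the base table's key
// columns one-for-one in the same order -- the case that triggers a linked
// schema change and therefore adds no value over the base table.
bool is_redundant_rollup(const std::vector<std::string>& base_keys,
                         const std::vector<std::string>& rollup_keys) {
    if (rollup_keys.size() != base_keys.size()) return false;
    for (size_t i = 0; i < rollup_keys.size(); ++i) {
        if (rollup_keys[i] != base_keys[i]) return false; // order must match too
    }
    return true; // same keys, same order -> reject the rollup
}
```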
2023-01-14 10:20:07 +08:00
d8990522fb [conf](compaction) enable vertical_compaction ordered_data_compaction (#14945) 2023-01-13 23:12:42 +08:00
ecb5aea182 [Feature-WIP](inverted index) inverted index writer's implementation (#15821) 2023-01-13 21:30:44 +08:00
514de605b6 [Bug](predicate) add double predicate creator (#15762)
Add a double predicate creator, mirroring the existing integer predicate creator.
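A minimal sketch (names hypothetical, not the actual Doris predicate factory) of what "the same as the integer predicate creator" means: the templated factory simply gains a double instantiation alongside the integer one.
```
#include <memory>

template <typename T>
struct ComparisonPredicate {
    explicit ComparisonPredicate(T value) : _value(value) {}
    bool matches(T cell) const { return cell == _value; }
    T _value;
};

template <typename T>
std::unique_ptr<ComparisonPredicate<T>> create_comparison_predicate(T value) {
    return std::make_unique<ComparisonPredicate<T>>(value);
}

int main() {
    auto int_pred = create_comparison_predicate<int>(42);          // existing path
    auto double_pred = create_comparison_predicate<double>(3.14);  // the added path
    return int_pred->matches(42) && double_pred->matches(3.14) ? 0 : 1;
}
```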
2023-01-13 18:34:09 +08:00
049f8ad2f9 [Bug](sort)fix merge sorter might div zero when block bytes less than block rows (#15859)
If a block's byte size is smaller than its row count, integer division makes the avg_size_per_row zero, which ends up dividing by zero in the following logic.
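A minimal sketch of the guard (assuming block_rows > 0; names are illustrative): clamp the integer-division result so the later bytes-to-rows conversion can never divide by zero.
```
#include <algorithm>
#include <cstddef>

size_t rows_for_byte_budget(size_t block_bytes, size_t block_rows, size_t byte_budget) {
    // Integer division yields 0 whenever block_bytes < block_rows.
    size_t avg_size_per_row = block_bytes / block_rows;
    // The fix: never let the divisor drop to zero.
    avg_size_per_row = std::max<size_t>(avg_size_per_row, 1);
    return byte_budget / avg_size_per_row;
}
```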
2023-01-13 18:33:40 +08:00
1489e3cfbf [Fix](file system) Make the constructor of XxxFileSystem a private method (#15889)
Since FileSystem inherits std::enable_shared_from_this, it is dangerous to create a raw pointer to a FileSystem.
To avoid this, the constructor of each XxxFileSystem is made private, and the static method create(...) is used to obtain a new FileSystem object.
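A minimal sketch of the pattern (class name illustrative): calling shared_from_this() on an object not owned by a shared_ptr is undefined behavior, so a private constructor plus a static create() guarantees every instance is shared_ptr-managed.
```
#include <memory>
#include <string>

class LocalFileSystem : public std::enable_shared_from_this<LocalFileSystem> {
public:
    static std::shared_ptr<LocalFileSystem> create(std::string root) {
        // std::make_shared needs a public ctor; construct via new instead.
        return std::shared_ptr<LocalFileSystem>(new LocalFileSystem(std::move(root)));
    }
    std::shared_ptr<LocalFileSystem> getptr() { return shared_from_this(); }

private:
    explicit LocalFileSystem(std::string root) : _root(std::move(root)) {}
    std::string _root;
};

// LocalFileSystem fs("/tmp");                 // no longer compiles: ctor is private
// auto fs = LocalFileSystem::create("/tmp");  // the only way to get an instance
```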
2023-01-13 15:32:16 +08:00
34bb9cd5d3 [fix](parquet-reader) fix coredump when loading datetime data into Doris from parquet (#15794)
`date_time_v2` checks the scale when a datetimev2 value is constructed:
```
LOG(FATAL) << fmt::format("Scale {} is out of bounds", scale);
```

This [PR](https://github.com/apache/doris/pull/15510) fixed the issue, but the parquet reader does not use that constructor to create `TypeDescriptor`, leaving `scale = -1` when reading datetimev2 data.
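A hedged sketch of the failure mode and fix direction (assuming datetimev2's maximum scale of 6, i.e. microseconds; names are illustrative): normalize an uninitialized scale before it reaches the bounds check.
```
struct TypeDescriptor {
    int scale = -1; // the parquet path left this uninitialized (-1)
};

constexpr int MAX_DATETIMEV2_SCALE = 6; // microsecond precision

int normalize_scale(const TypeDescriptor& type) {
    if (type.scale < 0 || type.scale > MAX_DATETIMEV2_SCALE) {
        // Before the fix, this value reached LOG(FATAL) "Scale {} is out of bounds".
        return MAX_DATETIMEV2_SCALE; // fall back to a valid default
    }
    return type.scale;
}
```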
2023-01-13 11:51:11 +08:00
b1fb1277dd [fix](bitmap) fix bitmap iterator comparison error (#15779)
Fix the bug that bitmap.begin() == bitmap.end() is always true when the bitmap contains a single value.
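An illustrative reconstruction of how such a symptom can arise (not the actual roaring-bitmap iterator): if operator== compares only an "exhausted" flag that the single-value case also sets on begin(), begin() == end() is trivially true.
```
struct BitmapIter {
    bool end_flag; // true for end(); the buggy comparison looked only at this
    int pos;       // the position must also participate in the comparison
};

// Buggy: ignores position, so two iterators with matching flags always compare equal.
bool buggy_equal(const BitmapIter& a, const BitmapIter& b) {
    return a.end_flag == b.end_flag;
}

// Fixed: equal only if the flags agree AND (both are end, or positions match).
bool fixed_equal(const BitmapIter& a, const BitmapIter& b) {
    return a.end_flag == b.end_flag && (a.end_flag || a.pos == b.pos);
}
```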
2023-01-13 11:37:07 +08:00
9468711f9f [Bug](join) fix incorrect result of null aware left anti join (#15841) 2023-01-13 10:18:05 +08:00
688a0bb96a [feature](multi-catalog) support clickhouse jdbc catalog (#15780) 2023-01-13 10:07:22 +08:00
16862d9b43 [refactor](remove unused code) remove buffer pool and disk io mgr (#15853)
* [refactor](remove buffer pool and disk io mgr) remove unused code


Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-13 09:42:58 +08:00
bae29157aa [fix](olap) dictionary cannot be sorted after inserting some null values (#15829) 2023-01-13 09:28:55 +08:00
730571e386 [fix](sort spill) fix bug of failed to create spilled file (#15864)
Also increase buffered block size when it has started to spill.
2023-01-13 09:23:26 +08:00
174e5e601f [refactor](rpc fn) decouple vectorized remote function from row-based one (#15871) 2023-01-13 09:21:33 +08:00
0fbdf8e3e1 [Refactor](table function) Decouple vectorized table functions from non-vectorized ones (#15772) 2023-01-12 15:08:21 +08:00
ef0e0cf68d [enhancement](load) refine the reduce memory policy when process memory is nearly full (#15685)
If process memory is almost full but load jobs don't consume more than 5% (50% * 10%) of total memory, we don't need to reduce the memory of load jobs.
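A minimal sketch of the policy as described (the 90% "nearly full" threshold and the variable names are assumptions; only the 5% figure comes from the text):
```
#include <cstdint>

bool should_reduce_load_memory(int64_t process_mem, int64_t load_mem, int64_t mem_limit) {
    bool process_nearly_full = process_mem >= mem_limit * 9 / 10; // assumed threshold
    bool loads_significant = load_mem > mem_limit / 20;           // > 5% of total
    // Only shrink load jobs when they actually hold a meaningful share.
    return process_nearly_full && loads_significant;
}
```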
2023-01-12 14:43:33 +08:00
7441b4dc96 [Feature](function) Support width_bucket function (#14396) 2023-01-12 13:59:21 +08:00
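The commit above does not spell out the function's semantics; a sketch assuming the standard SQL width_bucket behavior, which Doris presumably follows: divide [min_value, max_value) into num_buckets equal buckets, returning 0 below the range and num_buckets + 1 at or above it.
```
#include <cstdint>

int64_t width_bucket(double expr, double min_value, double max_value, int64_t num_buckets) {
    // Assumes num_buckets > 0 and min_value < max_value.
    if (expr < min_value) return 0;
    if (expr >= max_value) return num_buckets + 1;
    double bucket_width = (max_value - min_value) / num_buckets;
    return static_cast<int64_t>((expr - min_value) / bucket_width) + 1;
}

// width_bucket(5.0, 0.0, 10.0, 2) == 2  (5.0 falls in the second bucket)
```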
92dd7c442a [enhancement](unique key) disable concurrent flush memtable for unique key (#15802) 2023-01-12 12:10:50 +08:00
791604ba1f [log](vlog) improve vlog print for query TExecPlanFragmentParams (#15806)
* [log] improve vlog print for query TExecPlanFragmentParams

* improvement
2023-01-12 09:27:59 +08:00
f3ef3f7e15 [fix](sink) fix memory leak in VNodeChannel (#15834) (#15835)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-01-12 09:24:51 +08:00
98d69d1568 [fix](compile) fix vscan node compile error (#15805)
Caused by a conflicting merge of #15604 and #15618.
2023-01-11 15:08:46 +08:00
fe5e5d2bf4 [refactor] separate agg and flush in memtable (#15713) 2023-01-11 10:07:34 +08:00
f5948eb4b0 [Build](cmake) Uniform capitalization keyword of cmake (#15728) 2023-01-11 09:58:07 +08:00
3fec5ff0f5 [refactor](scan-pool) move scan pool from env to scanner scheduler (#15604)
The original scan pools lived in exec_env.
But after enabling new_load_scan_node by default, the scan pool in exec_env is no longer used.
All scan tasks are now submitted to the scan pool in scanner_scheduler.

BTW, the scan pools are reorganized into 3 kinds:

- local scan pool: for the olap scan node
- remote scan pool: for the file scan node
- limited scan pool: for queries that set a cpu resource limit or carry a small limit clause
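A sketch of this three-way routing (enum and function names are illustrative, not the actual scanner_scheduler API):
```
enum class ScanPool { LOCAL, REMOTE, LIMITED };

ScanPool pick_pool(bool is_file_scan, bool has_cpu_limit_or_small_limit) {
    if (has_cpu_limit_or_small_limit) return ScanPool::LIMITED; // resource-limited queries
    if (is_file_scan) return ScanPool::REMOTE;                  // file scan node
    return ScanPool::LOCAL;                                     // olap scan node
}
```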

TODO:
- Use bthread to unify all IO tasks.

Some trivial issues:
- Fix a bug where the memtable flush size printed in the log was wrong.
- Add a RuntimeProfile param in VScanner.
2023-01-11 09:38:42 +08:00
d857b4af1b [refactor](remove row batch) remove impala rowbatch structure (#15767)
* [refactor](remove row batch) remove impala rowbatch structure

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-11 09:37:35 +08:00
8f31a36429 [feature] support spill to disk for sort node (#15624) 2023-01-11 08:40:58 +08:00
4bbc93b7ce [refactor](hashtable) simplify template args of partitioned hash table (#15736) 2023-01-11 08:39:13 +08:00
124c8662e8 [Bug](schema scanner) Fix wrong type in schema scanner (#15768) 2023-01-11 08:37:39 +08:00
90a92f0643 [feature-wip](multi-catalog) add iceberg tvf to read snapshots (#15618)
Support a new table-valued function `iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")`.
We can use the SQL `select * from iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")` to get the snapshot info of a table. Other iceberg metadata will be supported later as needed.

One usage example:

We already support the following SQL for time travel:
`select * from ice_table FOR TIME AS OF "2022-10-10 11:11:11"`;
`select * from ice_table FOR VERSION AS OF "snapshot_id"`;
Now we can use the snapshots metadata to get the `committed time` or `snapshot_id`,
and then use it as the time or version in the time travel clause.
2023-01-10 22:37:35 +08:00
c3da5a687a [fix] fix dangerous usage of namespace std (#15741)
Co-authored-by: zhaochangle <zhaochangle@selectdb.com>
2023-01-10 16:10:49 +08:00
f17d69e450 [feature](file cache)Import file cache for remote file reader (#15622)
The main purpose of this PR is to import a `fileCache` for lakehouse reads of remote files.
The local disk is used as a cache for remote file reads, so the next time a file is read,
the data can be fetched directly from the local disk.
In addition, this PR includes a few other minor changes.

Import File Cache:
1. The imported `fileCache` is called `block_file_cache`, and it uses an LRU replacement policy.
2. Implement a new FileReader, `CachedRemoteFileReader`, so that the `file cache` logic is hidden under `CachedRemoteFileReader`.

Other changes:
1. Add a new interface `fs()` to `FileReader`.
2. `IOContext` adds some statistics to track `FileCache` behavior.
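A hedged sketch of the read-through idea, using a hypothetical API rather than the actual `CachedRemoteFileReader`: look the requested block up in the local cache first; on a miss, fetch from the remote reader and populate the cache so subsequent reads hit local disk.
```
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

struct BlockCache {
    // key: block index -> cached bytes (LRU eviction omitted for brevity)
    std::map<int64_t, std::vector<char>> blocks;
};

std::vector<char> read_block(BlockCache& cache, int64_t block_idx,
                             const std::function<std::vector<char>(int64_t)>& remote_read) {
    auto it = cache.blocks.find(block_idx);
    if (it != cache.blocks.end()) return it->second; // local-disk hit
    std::vector<char> data = remote_read(block_idx); // miss: fetch remotely
    cache.blocks.emplace(block_idx, data);           // populate for next time
    return data;
}
```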

Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>
2023-01-10 12:23:56 +08:00
d0e8f84279 [feature](vectorized) Support MemoryScratchSink on vectorized engine (#15612) 2023-01-10 10:38:35 +08:00
9c0f96883a [fix](hashjoin) Fix right join pull output block memory overflow (#15440)
For full outer join / right outer join / right semi join, when HashJoinNode::pull -> process_data_in_hashtable outputs a block, it writes all rows of a key in the hash table into the block; only after a key's output is complete does it check whether the block size exceeds the batch size and, if so, stop the output.

If a key has more than 20 million rows, memory overflows when the subsequent block operations are performed on those rows.
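A sketch of the fix direction as described (all names illustrative): enforce the batch-size cap while emitting rows for a key, not only after the whole key is emitted, so one very large key cannot grow a single block unboundedly.
```
#include <cstddef>
#include <vector>

// Emits rows of one key into out_block, stopping once the cap is reached.
// Returns true when the key is fully emitted; next_row tracks resume position.
bool emit_rows_for_key(const std::vector<int>& rows_of_key, size_t batch_size,
                       std::vector<int>& out_block, size_t& next_row) {
    while (next_row < rows_of_key.size()) {
        out_block.push_back(rows_of_key[next_row++]);
        if (out_block.size() >= batch_size) return false; // block full, resume later
    }
    return true;
}
```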
2023-01-10 10:10:43 +08:00
9e3a61989b [refactor](es) remove BE generated dsl for es query #15751
Remove the FE config enable_new_es_dsl and all related code.
Now the DSL for ES is always generated on the FE side.
2023-01-10 08:40:32 +08:00
ab186a60ce [enhancement](compaction) Optimize judging delete rowset and picking candidate rowsets for compaction #15631
Tablet::version_for_delete_predicate has to traverse all rowset metas in the tablet meta, which is O(N); instead, we can directly judge whether a rowset is a delete rowset via RowsetMeta::has_delete_predicate, which is O(1).
Since we no longer call Tablet::version_for_delete_predicate when picking input rowsets for compaction, we can shrink the critical section guarded by Tablet::_meta_lock.
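A sketch of the O(N) -> O(1) change: has_delete_predicate is the accessor the commit names; the surrounding types and loop are illustrative.
```
#include <memory>
#include <vector>

struct RowsetMeta {
    bool delete_flag = false;
    bool has_delete_predicate() const { return delete_flag; } // O(1) per rowset
};

// Before: an O(N) walk over all rowset metas per candidate, under _meta_lock.
// After: one O(1) query per candidate, shrinking the locked critical section.
int count_delete_rowsets(const std::vector<std::shared_ptr<RowsetMeta>>& candidates) {
    int n = 0;
    for (const auto& rs_meta : candidates) {
        if (rs_meta->has_delete_predicate()) ++n;
    }
    return n;
}
```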
2023-01-10 08:32:15 +08:00
2c9c7c48ac [improvement](decimalv3) Java UDF and array type support DECIMALV3 (#15674) 2023-01-09 15:13:16 +08:00
699bf972e2 [Bug](bitmap) Fix bitmap_from_string for null constant (#15698) 2023-01-09 10:21:08 +08:00
1514b5ab5c [Feature](Materialized-View) support advanced Materialized-View (#15212) 2023-01-09 09:53:11 +08:00
c57fa7c930 [Pipeline] Fix PipScannerContext::can_finish return wrong status (#15259)
Now in ScannerContext::push_back_scanner_and_reschedule, _num_running_scanners-- happens before _num_scheduling_ctx++.
In PipScannerContext::can_finish, we check _num_running_scanners == 0 && _num_scheduling_ctx == 0 without holding _transfer_lock.
In the following case, PipScannerContext::can_finish returns the wrong result:

1. _num_running_scanners--
2. The check _num_running_scanners == 0 && _num_scheduling_ctx == 0 returns true.
3. _num_scheduling_ctx++

So we can move _num_running_scanners-- to the end of this function.

Changes:

1. PipScannerContext::get_block_from_queue no longer blocks.
2. Move _num_running_scanners-- to the end of ScannerContext::push_back_scanner_and_reschedule.
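A minimal sketch of the reordering (atomics stand in for the real members): with the old order there is a window where both counters read zero, so an unsynchronized can_finish() wrongly returns true.
```
#include <atomic>

std::atomic<int> num_running_scanners{1};
std::atomic<int> num_scheduling_ctx{0};

// Buggy order: between the two lines, can_finish() sees 0 && 0 -> true.
void push_back_and_reschedule_buggy() {
    --num_running_scanners;
    ++num_scheduling_ctx;
}

// Fixed order: bump the scheduling count first, decrement running last,
// so the two counters are never simultaneously zero mid-transition.
void push_back_and_reschedule_fixed() {
    ++num_scheduling_ctx;
    --num_running_scanners;
}

bool can_finish() {
    return num_running_scanners.load() == 0 && num_scheduling_ctx.load() == 0;
}
```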
2023-01-09 08:46:58 +08:00
ba54634d55 [refactor] delete non vec load from memtable (#15667)
* [refactor] delete non vec load from memtable
Delete the non-vectorized load path from memtable entirely.

Remove the keys_type() function from memtable.

Co-authored-by: zhoubintao <1229701101@qq.com>
2023-01-09 08:41:58 +08:00
36590da24b [fix](regression p0) add the alias function hist to histogram and fix p0 (#15708)
2023-01-08 11:31:23 +08:00
90be1a22a9 [bugfix](vertical compaction) fix dcheck failed in MOW tablet (#15638)
Fix a DCHECK failure for vertical compaction on a Merge-On-Write table.
When merging rowsets with empty segments, VerticalHeapMergeIterator::init
returns OK directly and _record_rowids is not set, so the DCHECK fails when
_unique_key_next_block calls current_block_row_locations.
2023-01-08 10:39:52 +08:00
707eab9a63 [opt](multi-catalog) cache and reuse position delete rows in iceberg v2 (#15670)
A delete file may apply to multiple data files. Each data file reads the full set of delete files,
so a delete file may be read repeatedly. Delete files can be cached, and multiple data files
can then reuse the content from the first read.

The performance is improved by 60% in the case of single thread, and by 30% in the case of multithreading.
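A hedged sketch of the reuse (hypothetical types, not the actual iceberg reader): cache the parsed position-delete rows keyed by delete-file path, so the first data file pays the read cost and later data files reuse the result.
```
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

using DeleteRows = std::vector<int64_t>; // positions of deleted rows

std::shared_ptr<DeleteRows> get_delete_rows(
        std::map<std::string, std::shared_ptr<DeleteRows>>& cache,
        const std::string& delete_file_path) {
    auto it = cache.find(delete_file_path);
    if (it != cache.end()) return it->second;   // reuse the first read
    auto rows = std::make_shared<DeleteRows>(); // parse the delete file here (omitted)
    cache.emplace(delete_file_path, rows);
    return rows;
}
```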
2023-01-07 22:29:11 +08:00
76ad599fd7 [enhancement](histogram) optimise aggregate function histogram (#15317)
This PR mainly optimizes the histogram aggregation function (https://github.com/apache/doris/pull/14910). It includes the following:
1. Support the input parameters `sample_rate` and `max_bucket_num`
2. Add UT and regression tests
3. Add documentation
4. Optimize the function's implementation logic

Parameter description:
- `sample_rate`: Optional. The proportion of sampled data used to generate the histogram. The default is 0.2.
- `max_bucket_num`: Optional. Limits the number of histogram buckets. The default value is 128.

---

Example:

```
MySQL [test]> SELECT histogram(c_float) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_float`)                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.2,"max_bucket_num":128,"bucket_num":3,"buckets":[{"lower":"0.1","upper":"0.1","count":1,"pre_sum":0,"ndv":1},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+

MySQL [test]> SELECT histogram(c_string, 0.5, 2) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_string`)                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.5,"max_bucket_num":2,"bucket_num":2,"buckets":[{"lower":"str1","upper":"str7","count":4,"pre_sum":0,"ndv":3},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+
```

Query result description:

```
{
    "sample_rate": 0.2, 
    "max_bucket_num": 128, 
    "bucket_num": 3, 
    "buckets": [
        {
            "lower": "0.1", 
            "upper": "0.2", 
            "count": 2, 
            "pre_sum": 0, 
            "ndv": 2
        }, 
        {
            "lower": "0.8", 
            "upper": "0.9", 
            "count": 2, 
            "pre_sum": 2, 
            "ndv": 2
        }, 
        {
            "lower": "1.0", 
            "upper": "1.0", 
            "count": 2, 
            "pre_sum": 4, 
            "ndv": 1
        }
    ]
}
```

Field description:
- sample_rate: the rate of sampling
- max_bucket_num: the maximum number of buckets allowed
- bucket_num: the actual number of buckets
- buckets: all buckets
    - lower: lower bound of the bucket
    - upper: upper bound of the bucket
    - count: the number of elements contained in the bucket
    - pre_sum: the total number of elements in all preceding buckets
    - ndv: the number of distinct values in the bucket

> Total number of histogram elements = number of elements in the last bucket (count) + total number of elements in the preceding buckets (pre_sum).
2023-01-07 00:50:32 +08:00
9c36278c4a [improvement](pipeline) Support sharing hash table for broadcast join (#15628) 2023-01-06 15:11:28 +08:00
1038093c29 [Pipeline](Exec) disable work steal of hash join build (#15652) 2023-01-06 15:08:10 +08:00
f24659c003 [Refactor](pipeline) refactor the code of channel buffer limit and change the default value (#15650) 2023-01-06 14:52:43 +08:00