Commit Graph

6933 Commits

Author SHA1 Message Date
8633a0c0cc [Opt](exec) enable top opt in string type (#31489)
enable top opt in string type
2024-02-29 08:42:35 +08:00
586217bf73 [Improve](Variant) support prune segment for quering variant (#31310) 2024-02-28 17:52:11 +08:00
f18c853495 [enhance](S3) Init default retry strategy for aws s3 sdk (#31329) 2024-02-28 13:08:36 +08:00
e86cc7e8e8 [chore](log) reduce a lot inject debug point log #31474 2024-02-28 13:07:47 +08:00
7f566f9365 Reset report_workload_runtime_status to optional (#31479) 2024-02-28 13:07:47 +08:00
747faeed17 [Enhancement](group commit) optimize some group commit code (#31392)
This PR optimizes some of the logic related to group commit:
1. Improved the error handling when there is insufficient WAL space during import.
2. Accounted for cases where the content length is negative during import.
3. Added missing error log printing in `group_commit_mgr.cpp`.
2024-02-28 13:05:57 +08:00
2f6251ccde [pipelineX](refactor) remove source state from operator functions (#31435) 2024-02-28 13:05:57 +08:00
41e31ee333 creat hdfs fs with it's resource id (#31505) 2024-02-28 11:33:34 +08:00
82fd3af54b [chore](log) change merge-on-write correctness check log to VLOG_NOTICE (#31414) (#31467) 2024-02-27 23:36:24 +08:00
f039ec8cfb [debug](Variant) sanitize variant type and column in find_and_set_leave_value (#31436) 2024-02-27 13:58:13 +08:00
4b0d6716dc [Fix](be)Fix gcc compile failed #31431 2024-02-27 10:12:53 +08:00
3b093cabd1 Fix building issue in be on ubuntu with test enabled. (#31407)
Co-authored-by: tangye <tangye@bestpay.com.cn>
2024-02-27 10:12:44 +08:00
1fbf32ead2 [enhancement](pipelinex) limit add_child to and dependency to avoid error (#31394)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-02-27 10:12:36 +08:00
6e62017ed5 [fix](scanner) allocated_bytes should be called after success (#31428)
allocated_bytes should be called after success
2024-02-27 10:12:36 +08:00
c34639245e [Improvement](executor)add remote scan thread pool (#31376)
* add remote scan thread pool

* +1
2024-02-27 10:12:33 +08:00
1127b0065a [Improment](executor)Add scanbytes/scanrows condition (#31364)
* Add scanbytes/scanrows condition

* fix reg
2024-02-27 10:12:33 +08:00
b8fe620ba3 Fix cgroup cpu controller file leak (#31398) 2024-02-27 10:12:33 +08:00
f163d56a98 [feature](function) support sequence function(alias of array_range), enhance both to handle datetimev2 (#30823) 2024-02-27 10:12:19 +08:00
3cee6c6722 [fix](function) fix unexpected be core in string search function (#31312)
Fix be core in multi_match_any/multi_search_all_positions functions.
2024-02-27 10:12:18 +08:00
35333d7a77 [opt](scanner) scan enough blocks in each scan task (#31277) 2024-02-27 10:12:18 +08:00
0f38769102 [fix](scan) Fix missing shared tablet header lock (#31433) 2024-02-26 23:39:10 +08:00
Pxl
3acffaa205 [Feature](agg-state) support write_column_to_pb from DataTypeFixedLengthObjectSerDe (#31171) 2024-02-26 19:07:10 +08:00
48804a978a [Fix](group commit) Fix group commit flink error message (#31350)
* When using stream processing frameworks like Flink with group commit mode enabled, the uncertain size of imported data makes such behavior prohibitive. Previously, to simplify the process, the error message for excessive data volume during streamload was combined with the one for group commit mode, leading to confusion for users when encountering errors indicating the data volume is too large during Flink imports. To address this issue, we are adjusting the logic: if a user employs stream processing imports like Flink with group commit mode enabled, we will automatically disable group commit mode, switching to the standard import mode instead. This is the essence of this PR.
2024-02-26 19:07:10 +08:00
3451cd6c23 [fix](datetime) fix hour 24 on be (#31304) 2024-02-26 19:07:10 +08:00
859e56ac16 [bugfix](wg) should set task group down after thread pool stopped 2024-02-25 18:09:39 +08:00
4e5147c6a4 [fix](parquet) Fix possible memory leak if ParquetReader::parse_thrift_footer failed (#31375) 2024-02-25 18:08:19 +08:00
70304bffd2 [refactor](wg) move memory gc logic to workload group (#31334)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-02-23 23:12:09 +08:00
db58104bc3 [fix](inverted index) Fix inverted index for MOR unique table #31051 (#31354) 2024-02-23 23:10:36 +08:00
1c443661c1 [Fix](rf) fix multi thread init error in RuntimeFilterMergeControllerEntity (#31337) 2024-02-23 20:44:43 +08:00
52c45e38af [Refactor](RF) refactor the profile of rf and pipeline-x support local ignore (#31287)
* [Refactor](RF) refactor the profile of rf and pipeline-x support local ignore

* fix local merge filter
2024-02-23 19:05:06 +08:00
8f77e6363a [Feature](function) Support xxhash function like murmur hash function (#31193) 2024-02-23 19:03:28 +08:00
1456785aa1 [fix](join) incorrect result of mark join in nested loop join (#31280) 2024-02-23 19:03:28 +08:00
f65876d803 [Feature](explode) support explode map type (#30151) 2024-02-22 20:08:44 +08:00
4c3a96e7df [fix](memory) Fix LRU cache frequent prune (#31220) 2024-02-22 19:51:20 +08:00
ad07dec0ed [Improve](InPredict) enhance in predict with struct type (#30840) 2024-02-22 13:01:49 +08:00
90a7f04349 [refine](pipelinex) get sink local state does not require an id. #31195 2024-02-22 13:01:49 +08:00
52b9af06fb [pipelineX](refactor) Delete subclasses inherited from Dependency (#31216) 2024-02-22 13:01:48 +08:00
b66583551c [fix](group_commit)Fix bound checking problem when reading wal block (#31112) 2024-02-22 13:01:48 +08:00
f2a38e6345 [chore](columns) remove update_hashes_with_value for SipHash (#31224) 2024-02-22 13:01:48 +08:00
c56cb0ac3e [Exec](RF) Support merge remote rf local first (#31067) 2024-02-22 13:01:48 +08:00
9c096710fa [Enhancement](group commit) Add bvar and log for group commit (#31017) 2024-02-22 13:01:32 +08:00
e183e9ac46 [fix](stream-load) print stream load profile for pipeline and pipelinex (#31198) 2024-02-22 13:01:32 +08:00
49dd411f87 [fix](datetime) fix datetime round on BE (#31205)
with tmp as (
            select CONCAT(
                YEAR('2024-02-06 03:37:07.157'), '-', 
                LPAD(MONTH('2024-02-06 03:37:07.157'), 2, '0'), '-',
                LPAD(DAY('2024-02-06 03:37:07.157'), 2, '0'), ' ',
                LPAD(HOUR('2024-02-06 03:37:07.157'), 2, '0'), ':',
                LPAD(MINUTE('2024-02-06 03:37:07.157'), 2, '0'), ':',
                LPAD(SECOND('2024-02-06 03:37:07.157'), 2, '0'), '.', "123456789" )
            AS generated_string)
            select generated_string, cast(generated_string as DateTime(6)) from tmp
before (incorrect round)

+-------------------------------+-----------------------------------------+
| generated_string              | cast(generated_string as DATETIMEV2(6)) |
+-------------------------------+-----------------------------------------+
| 2024-02-06 03:37:07.123456789 | 2024-02-06 03:37:07.123456              |
+-------------------------------+-----------------------------------------+
after (round up, keep consistent with mysql):

+-------------------------------+-----------------------------------------+
| generated_string              | cast(generated_string as DATETIMEV2(6)) |
+-------------------------------+-----------------------------------------+
| 2024-02-06 03:37:07.123456789 | 2024-02-06 03:37:07.123457              |
+-------------------------------+-----------------------------------------+
1 row in set (0.03 sec)
same work with #30744 but implemented on BE
2024-02-21 19:18:45 +08:00
8d889e434b [fix](topn) Fix key topn block reverse is missed in some cases (#31199)
* move reverse block row order operation after _next_batch_internal

* add testcase
2024-02-21 19:18:45 +08:00
8fc9d80479 [compatibility](MySQL) update charset to utf8mb4, collation to utf8mb4_0900_bin (#31046)
Doris's behaviour is more like utf8mb4 and utf8mb4_0900_bin than utf8 and utf8_general_ci
2024-02-21 17:01:39 +08:00
1abe9d4384 [fix](memory) Fix LRU cache stale sweep (#31122)
Remove LRUCacheValueBase, put last_visit_time into LRUHandle, and automatically update timestamp to last_visit_time during cache insert and lookup.

Do not rely on external modification of last_visit_time, which is often forgotten.
2024-02-21 17:01:29 +08:00
a8d8c6a271 [fix](file-writer) opt s3 file writer and fix empty file related issue #28983 #30703 #31169 (#31213)
* (feature)(cloud) Use dynamic allocator instead of static buffer pool for better elasticity. (#28983)

* [fix](outfile) Fix unable to export empty data (#30703)

Issue Number: close #30600
Fix unable to export empty data to hdfs / S3, this behavior is inconsistent with version 1.2.7,
version 1.2.7 can export empty data to hdfs/ S3, and there will be exported files on S3/HDFS.

* [fix](file-writer) avoid empty file for segment writer (#31169)

---------

Co-authored-by: AlexYue <yj976240184@gmail.com>
Co-authored-by: zxealous <zhouchangyue@baidu.com>
2024-02-21 16:48:54 +08:00
f121ee907f [fix](invert index) fix the inaccurate rows_inverted_index_filtered in the profile (#31158) 2024-02-21 13:53:39 +08:00
278b232e76 [Bug](json reader) object should stop processing when encounter error (#31159)
If DATA_QUALITY_ERROR encountered we should stop processing this document any more.Otherwise there will be UB in simdjson.
2024-02-21 13:53:32 +08:00
8b6d6d0165 [pipelineX](refactor) refactor streaming agg structure (#31151) 2024-02-21 13:53:32 +08:00