Commit Graph

1401 Commits

Author SHA1 Message Date
8ca399ab92 [exec](pipeline) runtime filter wait time (#35108) 2024-05-21 12:50:05 +08:00
6b1c441258 [fix](group_commit) Wal reader should check block length to avoid reading empty block (#34792) 2024-05-18 18:17:56 +08:00
6c515e0c76 [fix](group commit) Make compatibility issues on serializing and deserializing wal file more clear (#34793) 2024-05-18 18:12:43 +08:00
80dd027ce2 [opt](join) For left semi/anti join without mark join conjunct and without other conjuncts, stop probing after matching one row (#34703) 2024-05-18 18:08:50 +08:00
1f0c45204b [fix](iceberg) read the primary key columns if the table has equality deletes (#34884)
backport: #34835
2024-05-15 11:37:25 +08:00
02084fd91f [fix](iceberg_orc)Fixed the bug that the iceberg reader did not perform position delete when reading the orc file without a predicate. (#34814) (#34882)
bp #34814
2024-05-15 11:31:29 +08:00
9491b7d422 [fix](iceberg) prevent coredump if read position delete file failed (#34802) 2024-05-14 14:03:33 +08:00
8c237e82a3 [Bug](exec) fix intersections/differences bug (#34675) 2024-05-11 11:45:31 +08:00
cc00666be6 [opt](inverted index) add inlist condition handling to compound (#34134)
1. Previously, the compound did not support the inlist condition, which could impact performance if an inverted index was created.
2024-05-10 14:35:47 +08:00
e085f75a43 [opt](file-scanner) print current path when encountering error (#34365) (#34523)
bp #34365
2024-05-08 14:49:03 +08:00
4be589951b Revert "Revert "[fix](csv-reader) fix column split error when there is escape character (#34364)""
This reverts commit d127d67ebe989484bbdf340a4de5b79ded56eecc.
2024-05-07 18:03:56 +08:00
d127d67ebe Revert "[fix](csv-reader) fix column split error when there is escape character (#34364)"
This reverts commit 971e10a9db782c9986b20e1209468e4d7aeedf71.
2024-05-07 13:36:11 +08:00
9d0d7293f0 [fix](json) fix be crash while load json data (#34283) 2024-05-07 07:42:53 +08:00
971e10a9db [fix](csv-reader) fix column split error when there is escape character (#34364) 2024-05-07 07:38:35 +08:00
35f8563a75 [feature](iceberg) support iceberg equality delete (#34223) (#34327)
bp #34223

Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>
2024-04-30 11:51:29 +08:00
1bfe0f0393 [feature](iceberg)support read iceberg complex type,iceberg.orc format and position delete. (#33935) (#34256)
master #33935
2024-04-29 14:40:12 +08:00
99af54f779 [Fix](orc-reader) Fix the issue when string col has mixed plain and dict encoding in different stripes. (#34146) (#34248)
backport #34146
2024-04-28 19:43:57 +08:00
0f0c0a266b [opt](parquet)Skip page with offset index (#33082)
Make skip_page() in ColumnChunkReader more efficient: page headers are no longer read when the chunk has page locations.
2024-04-26 15:06:16 +08:00
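The idea behind skipping with an offset index can be sketched as follows. This is a minimal illustration, not the Doris implementation: `PageLocation` mirrors the fields of a Parquet offset-index entry, while `ChunkCursor`, `skip_page_by_index` and `skip_page_by_header` are hypothetical names invented for the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors a Parquet offset-index entry. compressed_page_size covers the
// page header plus the compressed page data.
struct PageLocation {
    int64_t offset;
    int32_t compressed_page_size;
    int64_t first_row_index;
};

// Hypothetical cursor over the pages of one column chunk.
struct ChunkCursor {
    std::vector<PageLocation> locations;  // empty when there is no offset index
    size_t page_index = 0;
    int64_t file_offset = 0;
    int headers_parsed = 0;  // counts slow-path header decodes

    // Slow path stand-in: a real reader would decode the page header at
    // file_offset to learn the page size; here the caller supplies it.
    void skip_page_by_header(int32_t page_size_from_header) {
        ++headers_parsed;
        file_offset += page_size_from_header;
        ++page_index;
    }

    // Fast path: jump straight past the page using its recorded location.
    // Returns false when no location exists for the current page.
    bool skip_page_by_index() {
        if (page_index >= locations.size()) {
            return false;
        }
        const PageLocation& loc = locations[page_index];
        file_offset = loc.offset + loc.compressed_page_size;
        ++page_index;
        return true;
    }
};
```

A caller would try `skip_page_by_index()` first and fall back to the header-based path only when it returns false.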
9f0a5690a6 [profile](scan) add projection time in scanner #34120 2024-04-26 07:43:40 +08:00
47b54d4bd5 Fix remote scan pool (#33976) 2024-04-25 15:04:43 +08:00
799c43686c [fix](jni-connector) avoid core dump if init connector failed (#34007)
_jni_scanner_cls may be null if connector init failed,
so it needs to be checked before it is deleted.
2024-04-24 17:13:50 +08:00
Pxl
5a5063be20 [bug](fix) heap use after free when json parse failed (#33955) 2024-04-22 22:33:24 +08:00
4d7ac82305 [profile](scanner) Fix wrong metrics (#33965) 2024-04-22 22:33:24 +08:00
c631f4f8a8 [fix](schema change) resolve the use count check of source logical column (#33932)
Fix error like:
```
8# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
 9# doris::vectorized::Block::clear_column_data(int) in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
10# doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:514
11# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vfile_scanner.cpp:333
12# doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:132
13# doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:99
```

This happens because the source logical column is also the destination logical column when the logical converter is consistent. Previously, the column reference was reset only after the conversion completed; if an EOF occurred, the function returned early and the reset was skipped, even though EOF is not a true error.
```
if (_logical_converter->is_consistent()) {
    // If logical converter is consistent, _src_logical_column is the final destination column,
    // other components will check the use count
    _src_logical_column.reset();
}
```
2024-04-22 12:31:46 +08:00
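One way to guarantee that the extra reference is released on every return path, including EOF, is a small scope guard. The sketch below is an assumption-laden illustration rather than the actual patch: `ColumnPtr`, `ReaderSketch`, `ResetOnExit` and `read_column` are hypothetical names, and a `std::shared_ptr` stands in for the reader's column pointer type.

```cpp
#include <memory>
#include <vector>

// Hypothetical column handle; a shared_ptr makes the use count observable.
using ColumnPtr = std::shared_ptr<std::vector<int>>;

struct ReaderSketch {
    bool consistent = true;        // stands in for _logical_converter->is_consistent()
    ColumnPtr src_logical_column;  // aliases the destination when consistent
};

// Drops the aliasing reference when the enclosing scope exits.
struct ResetOnExit {
    ColumnPtr& ref;
    bool enabled;
    ~ResetOnExit() {
        if (enabled) {
            ref.reset();
        }
    }
};

// Returns true on EOF. The guard runs on both the early return and the
// normal return, so the destination's use count is back to the caller's
// own references once the function has returned.
bool read_column(ReaderSketch& reader, const ColumnPtr& dst, bool eof) {
    if (reader.consistent) {
        reader.src_logical_column = dst;  // share the column, do not copy it
    }
    ResetOnExit guard{reader.src_logical_column, reader.consistent};
    if (eof) {
        return true;  // EOF is not an error, but the reference must still be dropped
    }
    dst->push_back(42);  // placeholder for decoding one batch
    return false;
}
```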
36a70ba1e7 [Fix](Csv-Reader)Fix the issue of BE core dump caused by improper configuration of column_separator and line_delimiter. (#33693) 2024-04-20 20:06:48 +08:00
0e3ad5cd9d [fix](parquet) fix time zone error(isAdjustedToUTC=true) in parquet reader (#33675) (#33924)
bp (#33675)

Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>
2024-04-20 19:06:54 +08:00
25358564ca [Fix](compile) Fix gcc compile on master (#33864)
This was introduced by #33511, which wrongly used

ColumnStr<T> ();

which violates the C++20 standard (see https://wg21.cmeerw.net/cwg/issue2237) but is still accepted by clang for now (see llvm/llvm-project#58112)
2024-04-19 23:41:37 +08:00
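The rule from CWG issue 2237 can be illustrated with a small template. This is a sketch only: `ColumnStrSketch` is a hypothetical stand-in, not the real `ColumnStr` class.

```cpp
// Hypothetical stand-in for a class template such as ColumnStr<T>.
template <typename T>
struct ColumnStrSketch {
    // Since C++20 (CWG 2237) a constructor may not be declared with a
    // template-id, so the following line would be ill-formed:
    //
    //     ColumnStrSketch<T>() {}
    //
    // Declaring it with the injected class name is valid in every standard.
    ColumnStrSketch() = default;

    T value{};  // value-initialized by the default member initializer
};
```

The same rule applies to destructors, so `~ColumnStrSketch<T>()` would need to be written as `~ColumnStrSketch()`.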
1300317723 [Exec](join) Support column string64 to avoid join failed in string size overflow the uint32 (#33511) (#33850) 2024-04-18 19:43:08 +08:00
b07e0a2f06 [FIX](cast)fix full/right out join for cast array (#33475)
In some cases, we have code like
```
if (_join_op == TJoinOp::RIGHT_OUTER_JOIN || _join_op == TJoinOp::FULL_OUTER_JOIN) {
    _probe_column_convert_to_null = _convert_block_to_null(*input_block);
}
```
and then run a following function such as cast. However, the cast function assumes the block column has the same type as from_type, so an error status is returned.
2024-04-17 23:42:13 +08:00
59de97be5e [improvement](mow) Add profile for delete_bitmap get_agg function (#33576) 2024-04-17 23:42:13 +08:00
2cd4012541 [opt](scan) read scan ranges in the order of partitions (#33515) (#33657)
backport: #33515
2024-04-17 23:42:12 +08:00
8ee8de7857 [Fix](executor)reset remote scan thread num #33579 2024-04-17 23:42:11 +08:00
ae68cca07d [fix](schema change) CastStringConverter fails to compile with g++ (#33546)
Following #32873, CastStringConverter fails to compile with g++ because of an uninitialized value, although it compiles with clang:
2024-04-17 23:42:00 +08:00
249a9c9875 [Feature](Variant) support aggregation model for Variant type (#33493)
Refactor: use `insert_from` instead of `replace_column_data` for variable-length columns
2024-04-17 23:42:00 +08:00
6bcf24b1f6 [bug](not in) if not in (null) could eos early (#33482)
2024-04-17 23:41:59 +08:00
9b7af4c0cf [feature](schema change) unified schema change for parquet and orc reader (#32873)
Following #25138, this unifies the schema change interface for the parquet and orc readers; it can be applied to other format readers as well.
The unified schema change interface for all format readers works in two steps:
- First, read the data into a source column according to the column type in the file;
- Second, convert the source column to the destination column, whose type is planned by FE.
2024-04-12 15:09:25 +08:00
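The two steps described above can be sketched as follows, assuming a file column stored as int32 and a destination type of string. The function names are hypothetical and plain standard containers stand in for Doris column types.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Step 1 (hypothetical): read the data using the column type found in the
// file, here int32. The values are placeholders for decoded file data.
std::vector<int32_t> read_source_column() {
    return {1, 2, 3};
}

// Step 2 (hypothetical): convert the source column to the destination type
// chosen by the planner, here string.
std::vector<std::string> convert_to_destination(const std::vector<int32_t>& src) {
    std::vector<std::string> dst;
    dst.reserve(src.size());
    for (int32_t v : src) {
        dst.push_back(std::to_string(v));
    }
    return dst;
}
```

Keeping the read and the conversion separate means a format reader only has to produce the file's native type, and the conversion step can be shared across formats.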
Pxl
5f30463bb3 [Chore](descriptors) remove unused codes for descriptors (#33408)
2024-04-12 15:09:25 +08:00
f7d52b5b1c [feature](expr) add type check when expr prepare (#33330) 2024-04-11 09:31:50 +08:00
Pxl
3081fc584d [Improvement](runtime-filter) support sync join node build side's size to init bloom runtime filter (#32180)
2024-04-11 09:31:50 +08:00
28acfaed2b [fix](pipeline)group by and output is empty (#33192) 2024-04-10 16:23:20 +08:00
Pxl
8fd6d4c41b [Chore](build) add -Wconversion and remove some unused code (#33127)
2024-04-10 15:26:08 +08:00
cc363f26c2 [fix](Nereids) fix group concat (#33091)
Fix the failure in regression_test/suites/query_p0/group_concat/test_group_concat.groovy

select
group_concat( distinct b1, '?'), group_concat( distinct b3, '?')
from
table_group_concat
group by
b2

exception:

lowestCostPlans with physicalProperties(GATHER) doesn't exist in root group

The root cause is that '?' is pushed down to a slot by NormalizeAggregate; AggregateStrategies then treats the slot as a distinct parameter and generates an invalid PhysicalHashAggregate, which is rejected by ChildOutputPropertyDeriver.

This fix avoids pushing literals down to slots in NormalizeAggregate, and forbids generating a stream aggregate node when the group-by slots are empty.
2024-04-10 14:59:46 +08:00
b0b5f84e40 [feature](load) support compressed JSON format data for broker load (#30809) 2024-04-10 14:20:53 +08:00
Pxl
e4993a19e5 [Chore](column) remove ColumnVectorHelper (#33036)
2024-04-10 11:56:41 +08:00
8e19cdd745 [feature](expr) support common subexpression elimination, BE part (#32673) 2024-04-10 11:56:21 +08:00
cf7595d423 [opt](memory) Optimize mem tracker accuracy (#32039) (#33140) 2024-04-10 11:42:19 +08:00
c5a3af5c27 [partitionsort](fix) Fix DCHECK failure (#33035) 2024-04-10 11:34:30 +08:00
3c4ccb3981 Revert "[opt](scan) read scan ranges in the order of partitions (#31630)"
This reverts commit 5d99dffe6f1a3fcb107ce56181aeff96ef222def.
2024-04-09 12:37:31 +08:00
0c8d3d007d [fix](jni) don't delete global ref if scanner is not opened (#33398) 2024-04-09 09:06:16 +08:00
0234976ab7 [refactor](meta scan) Remove RPC from execute threads (#33378) 2024-04-08 20:28:02 +08:00