Commit Graph

1428 Commits

Author SHA1 Message Date
4a277affdc [fix](scan) In-predicate should not be pushed down for non-key column(#35913) (#35968)
pick #35913
2024-06-11 11:13:34 +08:00
9e972cb0b9 [bugfix](iceberg)Fix the datafile path error issue for 2.1 (#36066)
bp: #35957
2024-06-08 21:51:46 +08:00
fe1a4c4136 [Feature](IP) support ipv4/ipv6 with inverted index and conjuncts for query (#35734)
Support the ipv4/ipv6 data types with an inverted index, so that conjunct
expressions on IP columns using >, <, >=, <=, IN or NOT IN can be
accelerated by the inverted index.
2024-06-03 23:24:03 +08:00
bc062a2595 [fix](orc)fix orc reader missing column. (#35735)
## Proposed changes
bp #35583 
2024-05-31 22:51:44 +08:00
fb9363f042 [fix](set) incorrect result of set operator (#35607)
If there are duplicated expressions in the select list, the result will
be incorrect.

## Proposed changes

Issue Number: close #28438

2024-05-30 19:59:37 +08:00
680be6d19f [fix](ub) fix uninitialized accesses in BE (#35370)
ubsan hints:
```c++
/root/doris/be/src/olap/hll.h:93:29: runtime error: load of value 3078029312, which is not a valid value for type 'HllDataType'
/root/doris/be/src/olap/hll.h:94:23: runtime error: load of value 3078029312, which is not a valid value for type 'HllDataType'
/root/doris/be/src/runtime/descriptors.h:439:38: runtime error: load of value 118, which is not a valid value for type 'bool'
/root/doris/be/src/vec/exec/vjdbc_connector.cpp:61:50: runtime error: load of value 35, which is not a valid value for type 'bool' 
```
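Reports like the ones above typically mean a `bool` or enum member was read before anything was written to it. A minimal sketch of one common remedy, default member initializers, is shown here; the `ScanState` struct, the `DataKind` enum and their fields are hypothetical and are not taken from the Doris sources, and the actual fix in this commit may differ.

```cpp
#include <cstdint>

// Hypothetical stand-in for an enum such as HllDataType.
enum class DataKind : std::uint8_t { Empty = 0, Explicit = 1, Sparse = 2 };

// Hypothetical struct: every member carries a default initializer, so a
// default-constructed object never holds an indeterminate bool or enum value.
struct ScanState {
    DataKind kind = DataKind::Empty;
    bool is_nullable = false;
    std::uint32_t register_count = 0;
};

// Reads the members right after construction; this is well defined only
// because of the initializers above.
inline bool is_fresh(const ScanState& s) {
    return s.kind == DataKind::Empty && !s.is_nullable && s.register_count == 0;
}
```

Without the initializers, `ScanState s; is_fresh(s);` would load indeterminate values, which is the kind of access UBSAN flags.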
2024-05-29 20:31:07 +08:00
b91d2caab8 [Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587)
backport #34929
2024-05-29 16:40:54 +08:00
86c7092f21 [opt](external) ignore not find files (#35319)
The file list is obtained from the external meta cache, so a file may
already have been removed from storage.
Not-found files should be ignored so that the query can continue.
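The behaviour described above can be sketched roughly as follows. `FakeStorage`, `OpenResult` and `scan_files` are hypothetical names invented for this illustration and do not correspond to the actual Doris file scanner.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical result codes for opening a file.
enum class OpenResult { Ok, NotFound, IoError };

// Hypothetical storage stand-in: only the listed paths still exist.
struct FakeStorage {
    std::unordered_set<std::string> existing;
    OpenResult open(const std::string& path) const {
        return existing.count(path) ? OpenResult::Ok : OpenResult::NotFound;
    }
};

// Walks the cached file list. Entries that no longer exist are skipped;
// any other failure aborts the scan. Returns false only for such failures.
inline bool scan_files(const FakeStorage& storage,
                       const std::vector<std::string>& cached_list,
                       std::vector<std::string>* scanned) {
    for (const auto& path : cached_list) {
        switch (storage.open(path)) {
        case OpenResult::Ok:
            scanned->push_back(path);
            break;
        case OpenResult::NotFound:
            continue; // stale cache entry: ignore it and keep going
        case OpenResult::IoError:
            return false;
        }
    }
    return true;
}
```

Only the not-found case is tolerated here; other errors still fail the scan, so genuine I/O problems are not hidden.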
2024-05-28 18:51:56 +08:00
d97788dec8 [Refactor](Status) Refactor the scanner scheduler code make return error msg means (#35286)
## Proposed changes

Before error msg:
```
Failed to submit scanner to scanner pool
```

After error msg:
```
Failed to submit scanner to scanner pool reason:Scan thread pool had shutdown|type 1

```
2024-05-28 18:49:55 +08:00
596fb6f327 [improve](ub) fix some runtime error of ubsan when downcast (#35343)
This code works correctly, but UBSAN reports runtime errors for it,
so refactor it to keep UBSAN clean.
2024-05-27 15:27:43 +08:00
c44affb43f Add downgrade scan thread num by column num (#35351) 2024-05-27 15:27:12 +08:00
68eda58a8c [Fix](multi-catalog) Fix string dict filtering when use null related function in parquet and orc reader. (#35335)
For SQL like the following, where a null-related function is applied to a dictionary column, the results are incorrect.
```
select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
```
```
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null'
```
```
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
2024-05-27 15:25:29 +08:00
7284b6959f [Configurations](multi-catalog)Fix enable_orc_filter_by_min_max functionality, the mistake for #35012. (#35320)
Fix the bug introduced by #35012.
2024-05-27 15:25:07 +08:00
2e20e38523 [improvement](jdbc catalog) remove useless jdbc catalog code (#34986) (#35418) 2024-05-27 14:25:26 +08:00
34e5030702 [bugfix](core) fix logical error of status check in nestedloop join (#35365) 2024-05-25 17:46:44 +08:00
4b91ad003f [opt](memory) avoid allocate memory in agg operator constructor (#35301)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-05-24 16:23:58 +08:00
eb49cd839b [refactor](datalake) return the error status instead of static_cast<void> (#34873)
Followup #34797
`static_cast<void>` silently discards error statuses, some of which should make the query finish with an error status, so replace `static_cast<void>` with `RETURN_IF_ERROR`.

The following three scenarios need to be handled separately and cannot be simply replaced:
1. The outer function returns void;
2. A status-returning function is called inside a constructor or destructor;
3. A status-returning function is called on a best-effort basis, so its error status should be ignored.
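
The difference between the two patterns can be sketched as below. The `Status` class and the macro are simplified, hypothetical stand-ins written for this illustration (the real Doris definitions carry more state), and `read_block` is an invented helper.

```cpp
#include <string>
#include <utility>

// Hypothetical, simplified Status type.
class Status {
public:
    static Status OK() { return Status(); }
    static Status Error(std::string msg) { return Status(std::move(msg)); }
    bool ok() const { return _ok; }
    const std::string& msg() const { return _msg; }

private:
    Status() = default;
    explicit Status(std::string msg) : _ok(false), _msg(std::move(msg)) {}
    bool _ok = true;
    std::string _msg;
};

// Simplified stand-in for a RETURN_IF_ERROR-style macro: return early from
// the enclosing function when the expression yields a non-OK status.
#define RETURN_IF_ERROR(expr)              \
    do {                                   \
        Status _st = (expr);               \
        if (!_st.ok()) return _st;         \
    } while (false)

// Invented helper that fails on demand.
inline Status read_block(bool fail) {
    return fail ? Status::Error("read failed") : Status::OK();
}

// Before: the error is discarded and the caller still sees success.
inline Status scan_ignoring_errors(bool fail) {
    static_cast<void>(read_block(fail));
    return Status::OK();
}

// After: the error is propagated to the caller.
inline Status scan_propagating_errors(bool fail) {
    RETURN_IF_ERROR(read_block(fail));
    return Status::OK();
}
```

The macro only compiles inside a function that itself returns `Status`, which is why a void outer function (scenario 1) and constructors or destructors (scenario 2) cannot use it directly.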
2024-05-23 19:06:21 +08:00
Pxl
e962a7309b [Chore](runtime-filter) adjust some check and error msg on runtime filter (#35018) (#35251)
adjust some check and error msg on runtime filter
2024-05-23 11:20:02 +08:00
adc364a6fd [feature](Paimon) support deletion vector for Paimon naive reader (#34743) (#35241)
bp #34743
Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-05-23 00:01:30 +08:00
3a5fb6265a [refactor](jdbc catalog) split trino jdbc executor (#34932) (#35176)
pick #34932
2024-05-22 19:09:57 +08:00
05a390e050 [refactor](jdbc catalog) split oceanbase jdbc executor (#34869) (#35175)
pick #34869
2024-05-22 19:09:35 +08:00
291cf57c54 [Configurations](multi-catalog) Add enable_parquet_filter_by_min_max and enable_orc_filter_by_min_max Session variables. (#35012) (#35164)
backport #35012
2024-05-22 19:06:12 +08:00
f38ecd349c [enhancement](memory) return error if allocate memory failed during add rows method (#35085)
* return error when add rows failed

* f

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-05-22 00:53:34 +08:00
903ff32021 [opt](fe) exit FE when transfer to (non)master failed (#34809) (#35158)
bp #34809
2024-05-21 22:31:47 +08:00
98f8eb5c43 [opt](split) get file splits in batch mode (#34032) (#35107)
bp  #34032
2024-05-21 22:27:07 +08:00
74d66e9650 [Fix](parquet-reader) Fix Timestamp Int96 min-max statistics is incorrect when was written by some old parquet writers by disable it. (#35041)
Parquet INT96 timestamp values were compared incorrectly for the purposes of producing statistics
by older parquet writers, so PARQUET-1065 deprecated them. The result is that any writer that produced
stats was producing unusable incorrect values, except the special case where min == max and an incorrect
ordering would not be material to the result. PARQUET-1026 made binary stats available and valid in that special case.
2024-05-21 13:00:22 +08:00
c0fd98abe5 [Fix](tvf) Fix that tvf reading empty files in compressed formats. (#34926)
1. Fix the issue with tvf reading empty compressed files.
2. Move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0.
2024-05-21 12:59:31 +08:00
8ca399ab92 [exec](pipeline) runtime filter wait time (#35108) 2024-05-21 12:50:05 +08:00
6b1c441258 [fix](group_commit) Wal reader should check block length to avoid reading empty block (#34792) 2024-05-18 18:17:56 +08:00
6c515e0c76 [fix](group commit) Make compatibility issues on serializing and deserializing wal file more clear (#34793) 2024-05-18 18:12:43 +08:00
80dd027ce2 [opt](join) For left semi/anti join without mark join conjunct and without other conjuncts, stop probing after matching one row (#34703) 2024-05-18 18:08:50 +08:00
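The idea behind this optimization can be illustrated with a deliberately simplified sketch. It uses a nested loop over integer keys rather than a hash table, and `left_semi_join` and its `comparisons` counter are hypothetical names introduced only for this example, not the Doris join operator.

```cpp
#include <cstddef>
#include <vector>

// Left semi join on int keys with no extra conjuncts: a probe row is emitted
// at most once, so probing can stop at its first match.
// `comparisons` counts key comparisons to make the saving visible.
inline std::vector<int> left_semi_join(const std::vector<int>& probe,
                                       const std::vector<int>& build,
                                       std::size_t* comparisons) {
    std::vector<int> out;
    for (int p : probe) {
        for (int b : build) {
            ++*comparisons;
            if (p == b) {
                out.push_back(p);
                break; // one match decides the row; skip remaining build rows
            }
        }
    }
    return out;
}
```

A left anti join can stop in the same way, since the first match is enough to exclude the probe row. Once other conjuncts are involved, a key match alone no longer decides the row, which is presumably why the commit limits the shortcut to joins without them.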
1f0c45204b [fix](iceberg) read the primary key columns if having equality delete (#34884)
backport: #34835
2024-05-15 11:37:25 +08:00
02084fd91f [fix](iceberg_orc)Fixed the bug that the iceberg reader did not perform position delete when reading the orc file without a predicate. (#34814) (#34882)
bp #34814
2024-05-15 11:31:29 +08:00
9491b7d422 [fix](iceberg) prevent coredump if read position delete file failed (#34802) 2024-05-14 14:03:33 +08:00
8c237e82a3 [Bug](exec) fix intersections/differences bug (#34675) 2024-05-11 11:45:31 +08:00
cc00666be6 [opt](inverted index) add inlist condition handling to compound (#34134)
1. Previously, the compound did not support the inlist condition, which could impact performance if an inverted index was created.
2024-05-10 14:35:47 +08:00
e085f75a43 [opt](file-scanner) print current path when encountering error (#34365) (#34523)
bp #34365
2024-05-08 14:49:03 +08:00
4be589951b Revert "Revert "[fix](csv-reader) fix column split error when there is escape character (#34364)""
This reverts commit d127d67ebe989484bbdf340a4de5b79ded56eecc.
2024-05-07 18:03:56 +08:00
d127d67ebe Revert "[fix](csv-reader) fix column split error when there is escape character (#34364)"
This reverts commit 971e10a9db782c9986b20e1209468e4d7aeedf71.
2024-05-07 13:36:11 +08:00
9d0d7293f0 [fix](json) fix be crash while load json data (#34283) 2024-05-07 07:42:53 +08:00
971e10a9db [fix](csv-reader) fix column split error when there is escape character (#34364) 2024-05-07 07:38:35 +08:00
35f8563a75 [feature](iceberg) support iceberg equality delete (#34223) (#34327)
bp #34223

Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>
2024-04-30 11:51:29 +08:00
1bfe0f0393 [feature](iceberg)support read iceberg complex type,iceberg.orc format and position delete. (#33935) (#34256)
master #33935
2024-04-29 14:40:12 +08:00
99af54f779 [Fix](orc-reader) Fix the issue when string col has mixed plain and dict encoding in different stripes. (#34146) (#34248)
backport #34146
2024-04-28 19:43:57 +08:00
0f0c0a266b [opt](parquet)Skip page with offset index (#33082)
Make skip_page() in ColumnChunkReader more efficient. Page headers are no longer read if the chunk has page locations.
2024-04-26 15:06:16 +08:00
9f0a5690a6 [profile](scan) add projection time in scanner #34120 2024-04-26 07:43:40 +08:00
47b54d4bd5 Fix remote scan pool (#33976) 2024-04-25 15:04:43 +08:00
799c43686c [fix](jni-connector) avoid core dump if init connector failed (#34007)
_jni_scanner_cls may be null if connector init failed,
so it needs to be checked before it is deleted.
2024-04-24 17:13:50 +08:00
Pxl
5a5063be20 [bug](fix) heap use after free when json parse failed (#33955) 2024-04-22 22:33:24 +08:00
4d7ac82305 [profile](scanner) Fix wrong metrics (#33965) 2024-04-22 22:33:24 +08:00