Commit Graph

132 Commits

Author SHA1 Message Date
e2bb86e7f8 [fix](inverted index) fixed in_list condition not indexed on pipelinex (#38178)
## Proposed changes

https://github.com/apache/doris/pull/36565
https://github.com/apache/doris/pull/37842
https://github.com/apache/doris/pull/37921
https://github.com/apache/doris/pull/37386

<!--Describe your changes.-->
2024-07-25 14:42:34 +08:00
4a277affdc [fix](scan) In-predicate should not be pushed down for non-key column(#35913) (#35968)
pick #35913
2024-06-11 11:13:34 +08:00
fe1a4c4136 [Feature](IP) support ipv4/ipv6 with inverted index and conjuncts for query (#35734)
support data type ipv4/ipv6 with inverted index 
and then we can query like "> or < or >= or <= or in/not in " this
conjuncts expr for ip with inverted index speeding up
2024-06-03 23:24:03 +08:00
cc00666be6 [opt](inverted index) add inlist condition handling to compound (#34134)
1. Previously, the compound did not support the inlist condition, which could impact performance if an inverted index was created.
2024-05-10 14:35:47 +08:00
6bcf24b1f6 [bug](not in) if not in (null) could eos early (#33482)
* [bug](not in) if not in (null) could eos early
2024-04-17 23:41:59 +08:00
Pxl
5f30463bb3 [Chore](descriptors) remove unused codes for descriptors (#33408)
remove unused codes for descriptors
2024-04-12 15:09:25 +08:00
a8232c67f9 [pipelineX](runtime filter) Fix task timeout caused by runtime filter (#33332) (#33369) 2024-04-08 16:30:32 +08:00
c8f3643890 [exec](runtimefilter) support null aware in runtime filter (#32152)
null aware in runtime filter
2024-03-15 18:05:13 +08:00
9bf22a872a [Bug](fix) fix or and "<=>" cause coredump in query (#31884) 2024-03-07 16:53:19 +08:00
52c45e38af [Refactor](RF) refactor the profile of rf and pipeline-x support local ignore (#31287)
* [Refactor](RF) refactor the profile of rf and pipeline-x support local ignore

* fix local merge filter
2024-02-23 19:05:06 +08:00
366a6792bf [refactor](scanner) refactoring and optimizing scanner scheduling (#30746) 2024-02-16 10:12:24 +08:00
378d9e7336 [Colo][Scan] delete the colo scan code (#30584) 2024-01-31 23:53:39 +08:00
d3bf23d70d [chore](removelogs) remove debug query timeout logs (#30006)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-16 18:48:18 +08:00
e4e57e9b05 [chore](removelogs) remove debug query timeout logs 2024-01-12 14:37:20 +08:00
0d691c638b [Feature](profile)Support report runtime workload statistics #29591 2024-01-12 11:59:27 +08:00
abb7640d37 [debug](timeout) add more log in scanner ctx to find timeout problem #29704
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-12 11:44:21 +08:00
bd8113f424 [bugfix](scannerscheduler) should minus num_of_scanners before check should schedule #28926 (#29331)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-03 20:47:35 +08:00
c75e63a2a5 [Improvement](scan) Use scanner to do projection of scan node (#29124) 2023-12-27 16:00:52 +08:00
1545c36d16 Revert "[bugfix](scannercore) scanner will core in deconstructor during collect profile (#28727)" (#28931)
This reverts commit 4066de375efe6ff8e156a61df4f9316b3d9eaa4e.
2023-12-24 20:37:33 +08:00
4066de375e [bugfix](scannercore) scanner will core in deconstructor during collect profile (#28727) 2023-12-23 11:09:46 +08:00
73f7b61019 [refactor](scanner) use weak ptr to lock task execution context to avoid core in scanner dctor (#28493)
using weak ptr as a lock between fragment execute thread and scanner thread, to solve the core problem in scanner's dctor to access scannode's profile.
2023-12-18 14:09:32 +08:00
9fe2fce306 [minor](refactor) remove unused code (#28383) 2023-12-14 17:16:41 +08:00
7398c3daf1 [Feature-Variant](Variant Type) support variant type query and index (#27676) 2023-11-29 10:37:28 +08:00
1ebb54afdc [fix](null equal) fix coredump of pushing eq_for_null (#27341) 2023-11-21 18:36:33 +08:00
504ec324bb Revert "[refactor](scan) delete bloom_filter_predicate (#26499)" (#26851)
This reverts commit 2bb3ef198144954583aea106591959ee09932cba.
2023-11-13 16:27:23 +08:00
a5565f68b2 [Refactor](opentelemetry) Remove opentelemetry (#26605) 2023-11-09 18:05:34 +08:00
2bb3ef1981 [refactor](scan) delete bloom_filter_predicate (#26499) 2023-11-07 19:37:31 +08:00
c3527672a5 [refactor & pipelineX][pick fix] Pick fix of predicate pushdown to pipelineX (#25953)
Co-authored-by: JackDrogon <jack.xsuperman@gmail.com>
2023-10-26 18:04:43 +08:00
6e1a4dbda2 [Fix](predicate pushdown) Common expression not acting on any slot should not be pushed down (#25901) 2023-10-26 11:20:12 +08:00
693982fd1a [feature](decimal) support decimal256 (#25386) 2023-10-25 15:47:51 +08:00
6b2eed779c [feature](AuditLog) add scanRows scanBytes in auditlog (#25435) 2023-10-25 10:00:35 +08:00
b5ee4a9dbb [enhancement](profilev2) add some fields for profile v2 (#25611)
Add 3 counters for ExecNode:

ExecTime - Total execution time(excluding the execution time of children).
OutputBytes - The total number of bytes output to parent.
BlockCount - The total count of blocks output to parent.
2023-10-23 15:55:40 +08:00
3d1206d325 [date](fix) modify push-down predicate for datev1 type (#25571)
For comparison predicate, two arguments must be cast to datetime and push down to storage if either one is date type. This PR disables predicate push-down for this case.
2023-10-19 14:18:27 +08:00
53b46b7e6c [FIX](filter) update for filter_by_select logic (#25007)
this pr is aim to update for filter_by_select logic and change delete limit

only support scala type in delete statement where condition
only support column nullable and predict column support filter_by_select logic, because we can not push down non-scala type to storage layer to pack in predict column but do filter logic
2023-10-09 21:27:40 +08:00
5c020be4d2 [Bug](join) corner case cause the mark join + null aware left join core dump in regression test in pipeline query engine (#25087) 2023-10-08 22:50:12 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
9579634eac [Debug](pipeline) add log of pipeline scan bug (#24804) 2023-09-25 08:38:31 +08:00
Pxl
35c5d71549 [Improvement](join) some improvement of hash join (#23972)
some improvement of hash join
2023-09-14 17:55:35 +08:00
dbf509edc0 [Debug](scan) Add debug log for find p0 scan coredump in pipeline (#24202) 2023-09-12 12:17:44 +08:00
962221cb18 [test](log) add log for debug case failure (#23506) 2023-08-28 10:45:25 +08:00
Pxl
d9db3f5431 [Improvement](scan) Remove redundant predicates on scan node (#23374)
* Remove redundant predicates on scan node

* update

* fix
2023-08-25 10:41:37 +08:00
Pxl
8ed4045df9 [Chore](primitive-type) remove VecPrimitiveTypeTraits (#22842) 2023-08-23 08:37:40 +08:00
a5ca6cadd6 [Improvement] Optimize count operation for iceberg (#22923)
Iceberg has its own metadata information, which includes count statistics for table data. If the table does not contain equli'ty delete, we can get the count data of the current table directly from the count statistics.
2023-08-18 09:57:51 +08:00
Pxl
56392e21ae [Bug](decimalv3) fix decimalv3 keyrange set wrong number #22818 2023-08-10 18:15:40 +08:00
c9dc715c5d [fix](broker-load) fix error when using multi data description for same table in load stmt (#22666)
For load request, there are 2 tuples on scan node, input tuple and output tuple.
The input tuple is for reading file, and it will be converted to output tuple based on user specified column mappings.

And the broker load support different column mapping in different data description to same table(or partition).
So for each scanner, the output tuples are same but the input tuple can be different.

The previous implements save the input tuple in scan node level, causing different scanner using same input tuple,
which is incorrect.
This PR remove the input tuple from scan node and save them in each scanners.
2023-08-07 20:03:03 +08:00
ae8a26335c [opt](hive)opt select count(*) stmt push down agg on parquet in hive . (#22115)
Optimization "select count(*) from table" stmtement , push down "count" type to BE.
support file type : parquet ,orc in hive .

1. 4kfiles , 60kwline num 
    before:  1 min 37.70 sec 
    after:   50.18 sec

2. 50files , 60kwline num
    before: 1.12 sec
    after: 0.82 sec
2023-07-29 00:31:01 +08:00
23e7423748 [pipeline](refactor) refactor pipeline task schedule logics (#22028) 2023-07-25 17:18:26 +08:00
Pxl
19492b06c1 [Bug](decimalv3) fix failed on test_dup_tab_decimalv3 due to wrong precision (#21890)
fix failed on test_dup_tab_decimalv3 due to wrong precision
2023-07-18 12:53:09 +08:00
f87a3ccba2 [fix](runtime_filter) runtime_profile was not initialized in multi_cast_data_stream_source (#21690) 2023-07-11 00:16:29 +08:00
90dd8716ed [refactor](multicast) change the way multicast do filter, project and shuffle (#21412)
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>

1. Filtering is done at the sending end rather than the receiving end
2. Projection is done at the sending end rather than the receiving end
3. Each sender can use different shuffle policies to send data
2023-07-04 16:51:07 +08:00