Commit Graph

60 Commits

Author SHA1 Message Date
8c0e13ab51 [improvement](profile) add detail memory counter for exec nodes (#14806)
* [improvement](profile) improve accuraccy of memory usage and add detail memory counter

* fix
2022-12-05 11:51:52 +08:00
12304bc0ee [Pipeline](exec) Support pipeline exec engine (#14736)
Co-authored-by: Lijia Liu <liutang123@yeah.net>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: Pxl <952130278@qq.com>
Co-authored-by: shee <13843187+qzsee@users.noreply.github.com>
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>

## Problem Summary:

### 1. Design

DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-027%3A+Support+Pipeline+Exec+Engine

### 2. How to use:

Set the environment variable `set enable_pipeline_engine = true; `
2022-12-02 17:11:34 +08:00
176f519fa1 [enhancement](memtracker) Optimize exec node memory tracking (#14711) 2022-12-01 14:52:21 +08:00
3e8b3658c7 [feature-wip](decimalv3) Support basic agg and arithmetic operations for decimal v3 (#14513) 2022-11-29 15:12:41 +08:00
1520e5c88a [enhancement](agg)use new method to serialize keys in batch if the key is too large (#14484)
* [enhancement](agg)use new method to serialize keys in batch if the key is too large

* fix compile error
2022-11-23 17:35:39 +08:00
2c42f0a905 [refactor](decimalv3) Refine code for DecimalV3 (#14394) 2022-11-19 16:57:17 +08:00
1f326fc0d6 [enhancement](be)limit mem cost to 16m when pre serialize keys in agg node (#14321)
* [enhancement](be)limit mem cost to 16m when pre serialize keys in agg node

* use only one chunk memory when serializing keys in agg node
2022-11-18 12:31:52 +08:00
1a035e2073 [fix](profile)(AggNode) fix the GetResultsTime is always zero (#14366)
add scoped_timer in _serialize_with_serialized_key_result
2022-11-17 22:30:21 +08:00
6d2e6d85d3 [enhancement](be)release memory in Node's close() method (#14258)
* [enhancement](be)release memory in Node's close() method

* format code
2022-11-15 15:59:23 +08:00
139c4a77f1 [enhancement](be)close ExecNode ASAP to release resource earlier (#14203) 2022-11-14 09:41:35 +08:00
dd11d5c0a5 [enhancement](memory) Support try catch bad alloc (#14135) 2022-11-13 11:22:56 +08:00
d6b72d9b89 [Bug](update) support to check optional value of agg_sort_infos (#13732) 2022-10-28 10:37:13 +08:00
4bc33a54a1 [Fix](agg) fix bitmap agg core dump when phmap pointer assert alignment (#13381) 2022-10-15 10:39:23 +08:00
8f4bb0f804 [improvement](agg) iterate aggregation data in memory written order (#12704)
Following the iteration order of the hash table will result in out-of-order access to aggregate states, which is very inefficient.
Traversing aggregate states in memory write order can significantly improve memory read efficiency.

Test
hash table items count: 3.35M

Before this optimization: insert keys into column takes 500ms
With this optimization only takes 80ms
2022-09-21 14:58:50 +08:00
8e4374b7ec [enhancement](agg)remove unnessasery mem alloc and dealloc in agg node (#12535) 2022-09-15 11:07:06 +08:00
14221adbbd [fix](agg) crash caused by failure of prepare (#12437) 2022-09-08 15:03:45 +08:00
3485dfa927 [chore](profile) add some counters in aggregatation & sender (#12385) 2022-09-07 10:09:05 +08:00
8c8078ad28 [fix](projections) get error row_descriptor when have projections on ExecNode (#12232)
When ExecNode's projections is not empty, it use output row descriptor to initialize the block before doing projection. But we should use original row descriptor. This PR fix it.
2022-09-01 10:48:10 +08:00
9a74ad1702 [feature](Nereids)add the ability of projection on each ExecNode and add column prune on OlapScan (#11842)
We have added logical project before, but to actually finish the prune to reduce the data IO, we need to add related supports in translator and BE.
This PR:
- add projections on each ExecNode in BE
- translate PhysicalProject into projections on PlanNode in FE
- do column prune on ScanNode in FE

Co-authored-by: HappenLee <happenlee@hotmail.com>
2022-08-30 16:17:10 +08:00
73a3471fbd [minor](conjuncts) remove row-based conjuncts from vectorized engine (#12053) 2022-08-25 10:13:20 +08:00
dc8f64b3e3 [improvement](agg) Serialize the fixed-length aggregation results with corresponding columns instead of ColumnString (#11801) 2022-08-22 10:12:06 +08:00
Pxl
cac317430f [Bug](aggregation) fix core dump on 2nd phase aggregate (#11843) 2022-08-18 14:42:34 +08:00
288b440b14 [improvement](vectorized) Improve count distinct performance by using fastunion (#11516)
Improve count distinct performance by using fastunion.
Testing our user real data has a 10-40% performance improvement.
2022-08-16 12:18:46 +08:00
01e4522612 [fix]collect_list/collect_set without GROUP BY for NOT NULL column (#11529)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-09 20:49:37 +08:00
092a394782 [improvement](agg)limit the output of agg node (#11461)
* [improvement](agg)limit the output of agg node
2022-08-05 07:53:55 +08:00
ecbf87d77b [bugfix](memtracker)fix exceed memory limit log (#11485) 2022-08-04 10:22:20 +08:00
842a5b8e24 [refactor](agg) Abstract the hash operation into a method" (#11399) 2022-08-02 17:27:19 +08:00
0325fa436e [fix](agg)Add field of 'is_first_phase' in TAggregationNode (#11321) 2022-08-01 11:49:50 +08:00
d360974dce [improvement](agg)Use phmap::flat_hash_set in AggregateFunctionUniq (#11363)
This reverts commit 688b55053dd1fc5113343a6f565ad732ddd9612a.
2022-08-01 10:36:11 +08:00
688b55053d Revert "[improvement]Use phmap::flat_hash_set in AggregateFunctionUniq (#11257)" (#11356)
This reverts commit a7199fb98e18b925664b38460b667d04cbee8e01.
2022-07-30 23:15:36 +08:00
a7199fb98e [improvement]Use phmap::flat_hash_set in AggregateFunctionUniq (#11257) 2022-07-29 16:55:22 +08:00
0b1d06bfd6 [Vectorized] Support order by aggregate function (#11187)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-07-28 09:12:58 +08:00
b74f36e009 [improvement]Use phmap for aggregation with integer keys (#11175) 2022-07-27 13:58:20 +08:00
37dff975a7 [bugfix] fix ASAN error alloc-dealloc-mismatch (#11168) 2022-07-25 18:14:20 +08:00
babab5d535 [feature-wip] support datetimev2 (#11085) 2022-07-23 16:07:59 +08:00
b7c9007776 [improvement][agg]Process aggregated results in the vectorized way (#11084) 2022-07-22 22:04:43 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
899acb6564 [improvement][agg]import sub hashmap (#10937) 2022-07-18 18:36:45 +08:00
d1573e1a4a [improvement]Use phmap for aggregation with serialized key (#10821) 2022-07-14 11:26:09 +08:00
3b46242483 [feature-wip] Optimize Decimal type (#10794)
* [feature-wip](decimalv3) support decimalv3

* [feature-wip] Optimize Decimal type

Co-authored-by: liaoxin <liaoxinbit@126.com>
2022-07-14 10:50:50 +08:00
89e2678f4e [improvement]Increase min_ht_mem of StreamingHtMinReductionEntry (#10787) 2022-07-12 22:20:02 +08:00
d5ea677282 [feature](tracing) Support query tracing to improve doris observability by introducing OpenTelemetry. (#10533)
The collection of query traces is implemented in fe and be, and the spans are exported to zipkin.
DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-012%3A+Introduce+opentelemetry
2022-07-09 15:50:40 +08:00
e293fbd277 [improvement]pre-serialize aggregation keys (#10700) 2022-07-09 06:21:56 +08:00
aecf6e09a9 [fix] fix agg_memleak (#10571)
The previous code did not call 'destroy' to release the resource after the' create 'operation,
resulting in a memory leak. So I added Destroy
2022-07-03 20:22:26 +08:00
c9f86bc7e2 [refactor] Refactoring Status static methods to format message using fmt(#9533) 2022-07-02 18:58:23 +08:00
ca94867b4e [Feature-wip] add date v2 type (#9916) 2022-06-26 16:07:56 +08:00
476be35961 [TYPO] fix typo 'destory' -> 'destroy' (#10373) 2022-06-24 19:11:28 +08:00
200557052a [BUGFIX] wrong answer with with as + two phase agg (#10303) 2022-06-22 14:39:39 +08:00
4c24586865 [Vectorized][UDF] support java-udaf (#9930) 2022-06-15 10:53:44 +08:00
f377c26bf7 [refactor][be] Optimize headers (#9708) 2022-05-30 16:12:10 +08:00