Commit Graph

774 Commits

Author SHA1 Message Date
Pxl
b727033906 [Chore](build) enable -Wextra and remove some -Wno (#15760)
enable -Wextra and remove some -Wno
2023-01-15 10:40:35 +08:00
16862d9b43 [refactor](remove unused code) remove buffer pool and disk io mgr (#15853)
* [refactor](remove buffer pool and disk io mgr) remove unused code


Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-13 09:42:58 +08:00
d857b4af1b [refactor](remove row batch) remove impala rowbatch structure (#15767)
* [refactor](remove row batch) remove impala rowbatch structure

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-11 09:37:35 +08:00
124c8662e8 [Bug](schema scanner) Fix wrong type in schema scanner (#15768) 2023-01-11 08:37:39 +08:00
90a92f0643 [feature-wip](multi-catalog) add iceberg tvf to read snapshots (#15618)
Support a new table valued function `iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")`.
We can use the SQL `select * from iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")` to get the snapshots info of a table. Other Iceberg metadata will be supported later when needed.

One usage example:

We currently use the following SQL to time travel:
`select * from ice_table FOR TIME AS OF "2022-10-10 11:11:11"`;
`select * from ice_table FOR VERSION AS OF "snapshot_id"`;
With this TVF we can query the snapshots metadata to get the `committed time` or `snapshot_id`,
and then use it as the time or version in the time travel clause (see the sketch after this entry).
2023-01-10 22:37:35 +08:00
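A minimal usage sketch of the snapshots-to-time-travel workflow described above. The catalog/db/table names and the snapshot id value are placeholders, and the idea of reading a committed time or snapshot id column from the TVF output is an assumption, not something spelled out in this commit:

```sql
-- List the snapshots of an Iceberg table through the new TVF
-- (catalog, db and table names are placeholders).
select * from iceberg_meta("table" = "ice.db.tbl", "query_type" = "snapshots");

-- Pick a committed time or snapshot id from the output above and plug it
-- into the existing time travel clause (values below are placeholders).
select * from ice.db.tbl FOR TIME AS OF "2022-10-10 11:11:11";
select * from ice.db.tbl FOR VERSION AS OF "6113938766029293000";
```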
c3da5a687a [fix] fix dangerous usage of namespace std (#15741)
Co-authored-by: zhaochangle <zhaochangle@selectdb.com>
2023-01-10 16:10:49 +08:00
d0e8f84279 [feature](vectorized) Support MemoryScratchSink on vectorized engine (#15612) 2023-01-10 10:38:35 +08:00
9e3a61989b [refactor](es) remove BE generated dsl for es query #15751
Remove the FE config `enable_new_es_dsl` and all related code.
Now the DSL for ES is always generated on the FE side.
2023-01-10 08:40:32 +08:00
1018657d9d [Enhancement](SparkLoad): avoid BE OOM in push task, fix #15572 (#15620)
Release the memory pool held by the Parquet reader once the data has been flushed by the rowset writer.
Co-authored-by: spaces-x <weixiang06@meituan.com>
2023-01-05 10:20:32 +08:00
17286861ef [Fix](multi catalog)Skip non-vectorized init code for NewFileScanNode. #15550 2023-01-03 09:22:17 +08:00
87110ad3e3 [chore](Sink)remove useless OlapTablePartitionParam-related code (#15549) 2023-01-02 22:47:16 +08:00
100834df8b [fix](nereids) fix some aggregate bugs in Nereids (#15326)
1. The agg function without the distinct keyword should be a "merge" function in threePhaseAggregateWithDistinct.
2. Use aggregateParam.aggMode.consumeAggregateBuffer instead of aggregateParam.aggPhase.isGlobal() to indicate whether an agg function is a "merge" function.
3. Add an AvgDistinctToSumDivCount rule to support avg(distinct xxx) in some cases.
4. AggregateExpression's nullable method should call the inner function's nullable method.
5. Add a bind slot rule to bind the pattern "logicalSort(logicalHaving(logicalProject()))".
6. Don't remove the project node in PhysicalPlanTranslator.
7. Add a cast to a bigint expr for count(distinct <date-like type>).
8. Fall back to the old optimizer if the bitmap runtime filter is enabled.
9. Fix an exchange node memory leak.
2022-12-30 23:07:37 +08:00
edecc2e706 [feature-wip](inverted index) API for inverted index reader and syntax for fulltext match (#14211)
* [feature-wip](inverted index)inverted index api: reader

* [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ANY/MATCH_ALL

* [feature-wip](inverted index) Adapt to index meta

* [enhance] add more metrics

* [enhance] add fulltext match query check for column type and index parser

* [feature-wip](inverted index) Support applying the inverted index to compound predicates, except for predicates that are leaf nodes of an AND node
2022-12-30 21:48:14 +08:00
85c7c531f1 [vectorized](jdbc) support array type in jdbc external table (#15303) 2022-12-30 00:29:08 +08:00
305dd15fea [improvement](index) Support bitmap index can be applied with compound predicate when enable vectorized engine query (#13035)
Currently the bitmap index can only be applied to pushed-down predicates in AND conditions. Predicates in OR conditions and other complex compound conditions are not pushed down to the storage layer, which leads to reading more data.

Based on that, this PR does the following:

1. Support applying the bitmap index to compound predicates, for query SQL like the following (see also the sketch after this entry):
select * from tb where a > 'hello' or b < 100;
select * from tb where a > 'hello' or b < 100 or c > 'ok';
select * from tb where (a > 'hello' or b < 100) and (a < 'world' or b > 200);
select * from tb where (not a > 'hello') or b < 100;
...
In the SQL above, columns a, b, and c each have a bitmap index.

2. This optimization reduces the amount of data read by using the index.
3. Set the config enable_index_apply_compound_predicates to use this optimization.
2022-12-28 20:08:57 +08:00
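A sketch of how the optimization above is meant to be used, assuming table `tb` already has bitmap indexes on columns `a` and `b`, and assuming `enable_index_apply_compound_predicates` can be set as a session-level switch (the exact scope of that config is not stated in the commit):

```sql
-- Hypothetical bitmap indexes for the columns used in the predicates below.
CREATE INDEX idx_a ON tb (a) USING BITMAP;
CREATE INDEX idx_b ON tb (b) USING BITMAP;

-- Turn on the optimization (assumed here to be a session-level switch).
SET enable_index_apply_compound_predicates = true;

-- OR / NOT compound predicates can now be served by the bitmap indexes
-- instead of being filtered only after reading the raw data.
SELECT * FROM tb WHERE a > 'hello' OR b < 100;
SELECT * FROM tb WHERE (NOT a > 'hello') OR b < 100;
```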
a807978882 [refactor](non-vec) Remove rowbatch code from delta writer and some rowbatch related code (#15349)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-12-26 08:54:51 +08:00
8515a03ef9 [fix](compile) fix compile error caused by mysql_scan_node.cpp not being found when enabling WITH_MYSQL (#15277) 2022-12-23 16:25:28 +08:00
b085ff49f0 [refactor](non-vec) delete non-vec data sink (#15283)
* [refactor](non-vec) delete non-vec data sink

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-12-23 14:10:47 +08:00
e9a201e0ec [refactor](non-vec) delete some non-vec exec node (#15239)
* [refactor](non-vec) delete some non-vec exec node
2022-12-22 14:05:51 +08:00
af54299b26 [Pipeline](projection) Support projection on pipeline engine (#15220) 2022-12-21 15:47:29 +08:00
efdc73777a [enhancement](load) verify the number of rows between different replicas when loading data to avoid data inconsistency (#15101)
It is very difficult to investigate the data inconsistency of multiple replicas.
When loading data, the number of rows between replicas is checked to avoid some data inconsistency problems.
2022-12-21 09:50:13 +08:00
494eb895d3 [vectorized](pipeline) support union node operator (#15031) 2022-12-19 22:01:56 +08:00
1597afcd67 [fix](multi-catalog) fix getting many same-name db/table entries in show ... where (#15076)
When running show databases/tables/table status where xxx, the statement is rewritten into a SelectStmt that selects its result from
information_schema. Scanning the schema table needs the catalog info; otherwise it may return many
database or table entries from multiple catalogs.

For example:
mysql> show databases where schema_name='test';
+----------+
| Database |
+----------+
| test |
| test |
+----------+

MySQL [internal.test]> show tables from test where table_name='test_dc';
+----------------+
| Tables_in_test |
+----------------+
| test_dc |
| test_dc |
+----------------+
2022-12-19 14:27:48 +08:00
401d5776b0 [fix](compile) fix compile error when building with DORIS_WITH_MYSQL #15105
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-12-15 20:40:33 +08:00
Pxl
c25a7235f9 [Pipeline](load) support pipeline broker load (#14940)
support pipeline broker load
2022-12-13 00:28:36 +08:00
f3aea7f0f0 [Enhancement](status) Unify error codes and enable custom err msg for BE internal errors (#14744) 2022-12-11 23:33:18 +08:00
0b945e4ee3 [fix](csv-reader) fix be crash when reading invalid value (#14951) 2022-12-10 18:45:47 +08:00
68092fe514 [pipeline](NLJ) support nested loop join for pipeline (#14966) 2022-12-10 00:20:16 +08:00
873b128fde [feature](pipeline) add intersect/except operators (#14868) 2022-12-09 14:13:48 +08:00
5292880310 [refactor](odbc) move param to config (#14596)
move param to config
2022-12-06 17:38:52 +08:00
b30cd86e9e [Refactor](pipeline) Refactor operator and builder code of pipeline (#14787) 2022-12-05 18:35:00 +08:00
12304bc0ee [Pipeline](exec) Support pipeline exec engine (#14736)
Co-authored-by: Lijia Liu <liutang123@yeah.net>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: Pxl <952130278@qq.com>
Co-authored-by: shee <13843187+qzsee@users.noreply.github.com>
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>

## Problem Summary:

### 1. Design

DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-027%3A+Support+Pipeline+Exec+Engine

### 2. How to use:

Set the session variable: `set enable_pipeline_engine = true;` (see the sketch after this entry)
2022-12-02 17:11:34 +08:00
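A minimal sketch of turning the pipeline engine on for one session; the table and query below are placeholders, not from the commit:

```sql
-- Enable the pipeline exec engine for the current session.
set enable_pipeline_engine = true;

-- Confirm the switch took effect.
show variables like 'enable_pipeline_engine';

-- Subsequent queries in this session run on the pipeline engine,
-- e.g. a simple aggregation over a placeholder table.
select c1, count(*) from example_tbl group by c1;
```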
176f519fa1 [enhancement](memtracker) Optimize exec node memory tracking (#14711) 2022-12-01 14:52:21 +08:00
898d0d42f1 [improvement](load) add more logs for a better bug-tracing experience for BE writes (#14424)
While recently tracing a bug that happened in version 1.1.4,
I found several places where we could add more logging for better tracing.
2022-11-29 22:28:39 +08:00
39c47d930b [improvement](load) add more log on rpc error (#14559)
* [improvement](load) add more log on rpc error

* update
2022-11-28 08:32:20 +08:00
9103ded1dd [improvement](join)optimize sharing hash table for broadcast join (#14371)
This PR makes hash table sharing for broadcast join more robust:

Add a session variable to enable/disable this feature.
Do not block the hash join node's close function.
Use a shared pointer to share the hash table and runtime filter among broadcast join nodes.
A hash join node that doesn't need to build the hash table closes its right child without reading any data (the child closes the corresponding sender).
2022-11-24 21:06:44 +08:00
7f4cc61286 [fix](cast)prevent be from crashing when cast function is not available (#14540)
* [fix](cast)prevent be from crashing when cast function is not available

* format code
2022-11-24 14:17:49 +08:00
Pxl
bcd641877f [Enhancement](scan) disable building key ranges and filters when push-down agg is in effect (#14248)
Disable building key ranges and filters when push-down agg is in effect.
2022-11-21 12:47:57 +08:00
41dae8b6bb [improvement](load) add a log when close OlapTableSink with error (#14257) 2022-11-21 10:33:37 +08:00
2c42f0a905 [refactor](decimalv3) Refine code for DecimalV3 (#14394) 2022-11-19 16:57:17 +08:00
512b787559 [fix](parquet-reader) fix stack-use-after-return error (#14411) 2022-11-19 10:52:50 +08:00
a82896f420 [fix](broker-load) fix that broker load does not set the BE exec version, and limit node channel memory (#14399) 2022-11-18 23:38:37 +08:00
d5af4f6558 [Nereids](Profile) Add projection timer for Nereids (#14286) 2022-11-17 22:17:55 +08:00
dba19e591c [cherry-pick](scanner) use the avg rowset size to calculate batch size instead of total_bytes, since the latter costs a lot of CPU (#14345)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-11-17 18:57:21 +08:00
20634ab7e3 [feature-wip](multi-catalog) support partition&missing columns in parquet lazy read (#14264)
PR https://github.com/apache/doris/pull/13917 supported lazy read for non-predicate columns in ParquetReader,
but it can't trigger lazy read when the predicate columns are partition or missing columns.
This PR supports such cases and fills partition and missing columns in `FileReader`.
2022-11-16 08:43:11 +08:00
3ea9d3f2e1 [enhancement](array) support read list(Array) type from orc file (#14132)
Before this PR, loading an ORC file with native list (or array) type data would crash the BE.
Complex types in an ORC file consist of multiple real columns, so we need to filter columns by column names;
otherwise we could not read all the columns we need.
Arrow release-7.0.0 only supports creating a stripe reader by column index, so we patch it to support creating one by column names.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-15 17:48:17 +08:00
5badd70db2 [fix](csv-reader) Fix core dump when load text into doris with special delimiter (#14196) 2022-11-15 16:06:59 +08:00
333c6390ee [fix](be-ut) AddressSanitizer detects container-overflow issues (#14255)
* [chore] Fix the container-overflow errors detected by address sanitizer

* Fix compilation errors
2022-11-15 15:49:55 +08:00
7eed5a292c [feature-wip](multi-catalog) Support hive partition cache (#14134) 2022-11-14 14:12:40 +08:00
dd11d5c0a5 [enhancement](memory) Support try catch bad alloc (#14135) 2022-11-13 11:22:56 +08:00