Commit Graph

423 Commits

Author SHA1 Message Date
73ee352705 [fix](multi catalog)Fix convert_to_doris_type missing break for some cases (#14992) 2022-12-13 13:34:55 +08:00
e7a84e4a16 [fix](multi-catalog)fix page index thrift deserialize (#15001)
fix the err when parse page index: Couldn't deserialize thrift msg.
use two buffer to store column index and offset index msg, avoid parse them in a buffer
2022-12-13 13:33:19 +08:00
8fe0729835 [fix](multi catalog)Check orc file reader is not null before using it. (#14988)
The external table file path cache may out of date, which will cause orc reader to visit non-exist files.
In this case, orc file reader is nullptr.
This pr is to check the reader before using it to avoid core dump of visiting nullptr.
2022-12-13 11:27:51 +08:00
Pxl
c25a7235f9 [Pipeline](load) support pipeline broker load (#14940)
support pipeline broker load
2022-12-13 00:28:36 +08:00
f3aea7f0f0 [Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744) 2022-12-11 23:33:18 +08:00
7fb695b51d [Pipeline](select node) Support select node on pipeline engine (#14928) 2022-12-11 21:31:32 +08:00
ef46b580d0 [Vectorized](operator) support analytic eval operator (#14774) 2022-12-10 19:32:11 +08:00
68092fe514 [pipeline](NLJ) support nested loop join for pipeline (#14966) 2022-12-10 00:20:16 +08:00
wxy
af50461211 [fix](statistics) fix CpuTimeMS in audit log when enable_vectorized_engine=true. (#14853)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2022-12-09 21:13:05 +08:00
0c8fdc90fb [pipeline](union) support union operator (#14963) 2022-12-09 19:55:40 +08:00
4611947920 [Pipeline](Operator) Add schema scan node operator (#14955) 2022-12-09 19:55:03 +08:00
873b128fde [feature](pipeline) add inersect/except operators (#14868) 2022-12-09 14:13:48 +08:00
5a1c7f6314 [improvement](analytic) improve memory counter (#14890) 2022-12-09 14:13:17 +08:00
9d36931038 [Refactor](NLJ) refactor the nested loop join node (#14911)
* [Refactor](NLJ) refactor the nested loop join node

* change the logic of alloc/release resource
2022-12-09 14:10:26 +08:00
00f44257e2 [feature-wip](file-reader) Merge hdfs reader to the new file reader (#14875) 2022-12-09 13:21:59 +08:00
20f2abb3d4 [vectorized](pipeline) support assert num rows operator (#14923) 2022-12-09 09:39:29 +08:00
0c817e6b3a [Pipeline](hashjoin) Support hash join on pipeline engine (#14898) 2022-12-08 15:43:02 +08:00
962810b973 [Vectorized](jdbc) add check type for jdbc table (#14501) 2022-12-08 10:27:47 +08:00
Pxl
48a9166aa4 [Pipeline](sink) support olap table sink operator (#14872)
* support olap table sink operator

* update config
2022-12-07 15:29:56 +08:00
fcea89bcf4 [fix](const_expr) fix coredump caused by unsupported cast const expr (#14825) 2022-12-06 10:31:15 +08:00
b30cd86e9e [Refactor](pipeline) Refactor operator and builder code of pipeline (#14787) 2022-12-05 18:35:00 +08:00
1190fd4cd6 [Pipeline](regression) Add ssb flat for pipeline (#14763) 2022-12-05 15:05:23 +08:00
8c0e13ab51 [improvement](profile) add detail memory counter for exec nodes (#14806)
* [improvement](profile) improve accuraccy of memory usage and add detail memory counter

* fix
2022-12-05 11:51:52 +08:00
wxy
e141664339 [fix](statistics) fix missing scanBytes and scanRows in query statist… (#14750)
* [fix](statistics) fix missing scanBytes and scanRows in query statistics when enable_vectorized_engine=true.

Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2022-12-05 09:17:51 +08:00
12304bc0ee [Pipeline](exec) Support pipeline exec engine (#14736)
Co-authored-by: Lijia Liu <liutang123@yeah.net>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: Pxl <952130278@qq.com>
Co-authored-by: shee <13843187+qzsee@users.noreply.github.com>
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>

## Problem Summary:

### 1. Design

DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-027%3A+Support+Pipeline+Exec+Engine

### 2. How to use:

Set the environment variable `set enable_pipeline_engine = true; `
2022-12-02 17:11:34 +08:00
9dd1d989e8 [test](decimalv3) add regression test cases for decimalv3 (#14672) 2022-12-01 15:18:40 +08:00
176f519fa1 [enhancement](memtracker) Optimize exec node memory tracking (#14711) 2022-12-01 14:52:21 +08:00
b4d32a0c44 [fix](join) runtime filter shared from other instance wasn't be published (#14717) 2022-12-01 14:17:23 +08:00
Pxl
bba77fa9dd [Enhancement](profile) enhance column predicates display on profile (#14664) 2022-12-01 13:07:12 +08:00
7873bc95a6 [Enhancement](bitmapfilter) Support bitmap filter to apply zone_map index to filter pages (#14635) 2022-12-01 10:41:09 +08:00
6c70d794f6 [fix](bitmapfilter) fix core dump caused by bitmap filter (#14702) 2022-12-01 09:56:22 +08:00
9272680d00 [feature](multi-catalog) support Jdbc catalog (#14527)
Issue Number: close #xxx

I add jdbc catalog for doris multi-catalog feature.
Currently, the jdbc catalog only supports MYSQL DBMS.

TODO:

support for postgre DB
Support for other databases.
Problem summary
For jdbc catalog, we can create catalog like:

CREATE CATALOG jdbc4 PROPERTIES (
    "type"="jdbc",
    "jdbc.user"="root",
    "jdbc.password"="123456",
    "jdbc.jdbc_url" = "jdbc:mysql://127.0.0.1:13396/demo?yearIsDateType=false",
    "jdbc.driver_url" = "file:/mnt/disk2/ftw/tools/jar/mysql-connector-java-5.1.47/mysql-connector-java-5.1.47.jar",
    "jdbc.driver_class" = "com.mysql.jdbc.Driver"
);
Note:
yearIsDateType is a param of jdbc:
If yearIsDateType configuration property is set to false, then the returned object type is java.sql.Short. If set to true (the default), then the returned object is of type java.sql.Date with the date set to January 1st, at midnight.
To compat with mysql, we force the use of yearIsDateType=false in FE. if user sets yearIsDateType=true, doris FE will force to change yearIsDateType=false.
2022-11-30 11:28:08 +08:00
3e8b3658c7 [feature-wip](decimalv3) Support basic agg and arithmetic operations for decimal v3 (#14513) 2022-11-29 15:12:41 +08:00
f7a827c06b [fix](new-scan) fix some bugs about new scan node and readers (#14504)
json reader DCHECK fail because of missing TYPE_STRING

fix bug that if no file is found, the tvf will throw NPE.

The predicate conjuncts can not be pushed down to parquet reader if this is a load task.
Because the predicate should be applied on column of dest table, not on column of source file.

Add a temp property "use_new_load_scan_node" of broker load to make regression test happy.
So that we can use new load scan node for a certain job and avoid setting global FE config.
2022-11-29 10:21:41 +08:00
7513c82431 [NLJoin](conjuncts) separate join conjuncts and general conjuncts (#14608) 2022-11-29 08:55:54 +08:00
78adecac1b [enhancemennt](be)optimize mem usage in join and set node (#14602) 2022-11-27 13:38:49 +08:00
36419fae48 [fix](JdbcExecutor) fix that JdbcExecutor did not load the class jar (#14598)
JdbcExecutor did not load jdbc driver jar, so add classloader to load jdbc jar.
2022-11-26 23:53:05 +08:00
064b8d2aa6 [fix](multi-catalog) fix coredump when querying partitioned hive table with text format (#14604)
BE will crash when querying partitioned hive table with text format
and put partition column at first of select items.

1. FE should use file slots to set the column mapping index of csv file.
2. BE should use `get_by_name` of block to get right column in a block in csv reader.
2022-11-26 11:42:40 +08:00
4728e75079 [feature](bitmap) Support in bitmap syntax and bitmap runtime filter (#14340)
1.Support in bitmap syntax, like 'where k1 in (select bitmap_column from tbl)';
2.Support bitmap runtime filter. Generate a bitmap filter using the right table bitmap and push it down to the left table storage layer for filtering.
2022-11-25 15:22:44 +08:00
25de068a05 [fix](parquet-reader) the value of null map will overflow when LazyRead merges too many empty batches (#14558)
The run length of null map is saved as `uint16_t`. Previously, the run length of null map was
limited by `batch_size` in the `ParquetReader`, by setting `batch_size = std::min(batch_size, (size_t)USHRT_MAX)`.
It works well when the batch size is less than `USHRT_MAX`.
However, [Lazy read](https://github.com/apache/doris/pull/13917) will merge empty batches until reading
a non-empty batch or reaching the EOF of a row group, so the `batch_size` may be greater than `USHRT_MAX`
in non-predicate columns.
In addition, even if the `batch_size` does not exceed `USHRT_MAX`, the adjacent batches may also make
the run  length exceed the `USHRT_MAX` in `ColumnSelectVector::get_next_run`.
2022-11-25 12:22:18 +08:00
9103ded1dd [improvement](join)optimize sharing hash table for broadcast join (#14371)
This PR is to make sharing hash table for broadcast more robust:

Add a session variable to enable/disable this function.
Do not block the hash join node's close function.
Use shared pointer to share hash table and runtime filter in broadcast join nodes.
The Hash join node that doesn't need to build the hash table will close the right child without reading any data(the child will close the corresponding sender).
2022-11-24 21:06:44 +08:00
6c7f758ef7 [improvement](hashjoin) support partitioned hash table in hash join (#14480) 2022-11-24 14:16:47 +08:00
d14e1d25ff [Bug](vectorized) Fix wrong column type (#14387) 2022-11-23 18:07:33 +08:00
1520e5c88a [enhancement](agg)use new method to serialize keys in batch if the key is too large (#14484)
* [enhancement](agg)use new method to serialize keys in batch if the key is too large

* fix compile error
2022-11-23 17:35:39 +08:00
30e1818724 [fix](tracing) fix tracing in the new scan node does not meet expectations (#14155)
Issue Number: close #14149

- Remove unexpected tracing, like 'vscanner::scan'
- Merge span vscannode::get_next
2022-11-22 16:44:02 +08:00
1ec7f45fb6 [Bug](avg) Fix avg for bigint (#14433) 2022-11-22 10:29:59 +08:00
fea9966728 [fix](parquet-orc) fix that be core dump when some columns specified are not in the parquet or orc file (#14440)
When some columns specified are not in the parquet or orc file in broker load, _batch->num_columns() will less than _num_of_columns_from_file. It will lead to be core dump.
To prevent be core dump, just return an error in this case.
2022-11-22 09:10:38 +08:00
Pxl
bcd641877f [Enhancement](scan) disable build key range and filters when push down agg work (#14248)
disable build key range and filters when push down agg work
2022-11-21 12:47:57 +08:00
2c42f0a905 [refactor](decimalv3) Refine code for DecimalV3 (#14394) 2022-11-19 16:57:17 +08:00
512b787559 [fix](parquet-reader) fix stack-use-after-return error (#14411) 2022-11-19 10:52:50 +08:00