Commit Graph

463 Commits

Author SHA1 Message Date
f8bb8c7829 [fix](broker) fix be core dump caused by broker load (#15390)
2022-12-28 10:57:41 +08:00
fc8f6a0715 [fix](multi-catalog) throw NPE when reading data after EOF (#15358)
1. Fix one bug:  
A null pointer exception was thrown when reading data after the reader reached the end of the file, so return directly when `_do_lazy_read` reads no data.

2. Optimize code:  
Remove unused parameters.

3. Fix the regression test.
2022-12-26 22:49:35 +08:00
bf71943605 [feature](load) stream load trim double quotes for csv (#15241) 2022-12-26 11:45:54 +08:00
ca4674ca68 [pipeline](opt) opt the exec performance of pipe exec engine (#15330)
2022-12-26 09:58:52 +08:00
6bec1ffc47 [feature](planner) remove restrict of offset without order by (#15218)
Support SELECT * FROM tbl LIMIT 5, 3;
2022-12-26 09:37:41 +08:00
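A note on the `LIMIT 5, 3` form from 6bec1ffc47 (#15218): in MySQL-style syntax the first number is the offset and the second is the row count. The following is only a minimal Python sketch of that semantics, not the planner code; `limit_offset` is a hypothetical helper. Without ORDER BY the row order is not guaranteed, so which rows come back depends on the order the scan happens to produce.

```python
from itertools import islice


def limit_offset(rows, offset, count):
    """Skip `offset` rows, then return at most `count` rows (LIMIT offset, count)."""
    return list(islice(rows, offset, offset + count))


# Stand-in for whatever order the scan produced.
rows = list(range(10))
print(limit_offset(rows, 5, 3))
```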
ec055e1acb [feature](new file reader) Integrate new file reader (#15175) 2022-12-26 08:55:52 +08:00
0e651365ca [profile](scanner) add per scanner running time profile (#15321)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-12-26 08:55:07 +08:00
a807978882 [refactor](non-vec) Remove rowbatch code from delta writer and some rowbatch related code (#15349)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-12-26 08:54:51 +08:00
5cefd05869 [fix](multi-catalog) fix and optimize iceberg v2 reader (#15274)
Fix three bugs when reading Iceberg v2 tables:
1. The `delete position` in a `delete file` is the position of the deleted row in the entire file, but the `read range` in `RowGroupReader` is relative to the current row group. Therefore, we need to subtract the position of the first row of the current row group from the `delete position`.
2. When only reading the partition columns, `RowGroupReader` skips processing the `delete position`.
3. If the `delete position` entries have deleted all rows in a row group, the `read range` is empty, but the whole row group was read in that case.

Optimize four performance issues:
1. Previously, `delete position` was converted to `delete range`, and `delete range` and `read range` were then merged into the final read ranges. This process is tedious and time-consuming; `delete position` and `read range` can be merged directly.
2. `delete position` is ordered in a `delete file`, so we can use merge-sort instead of an ordered set.
3. Initialize `RowGroupReader` when reading, instead of initializing all row groups when opening a `ParquetReader`, to save memory. The same applies to `IcebergReader`.
4. Change the recursive call of `_do_lazy_read` to a loop.
2022-12-24 16:02:07 +08:00
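The offset fix and the direct merge described in 5cefd05869 (#15274) can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the BE implementation: `rows_to_read` and its parameters are hypothetical names, read ranges are assumed to be sorted, non-overlapping, half-open `[start, end)` ranges relative to the row group, and delete positions are assumed to be sorted and relative to the whole file.

```python
def rows_to_read(read_ranges, delete_positions, row_group_first_row):
    """Subtract deleted rows from the read ranges in a single merge pass.

    read_ranges: sorted, non-overlapping [start, end) ranges, row-group relative.
    delete_positions: sorted row positions, relative to the whole file.
    row_group_first_row: file-level index of the first row of this row group.
    """
    result = []
    i = 0
    n = len(delete_positions)
    for start, end in read_ranges:
        cursor = start
        # Skip delete positions that fall before the current range.
        while i < n and delete_positions[i] - row_group_first_row < cursor:
            i += 1
        # Split the range around every delete position inside it.
        while i < n and delete_positions[i] - row_group_first_row < end:
            pos = delete_positions[i] - row_group_first_row
            if pos > cursor:
                result.append((cursor, pos))
            cursor = pos + 1
            i += 1
        if cursor < end:
            result.append((cursor, end))
    return result


# Row group starts at file row 100; file rows 102, 103 and 107 are deleted.
print(rows_to_read([(0, 10)], [102, 103, 107], 100))
```

Because both inputs are sorted, each is walked once. When every row of a range is deleted the result is empty, which corresponds to skipping the row group instead of reading it.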
e72404c537 [fix](scan) fix that be may core dump when the predicates are all false (#15332) 2022-12-24 15:27:43 +08:00
06f71f2bca [pipeline](fix) Fix bugs to pass all regression cases (#15306)
* [pipeline](fix) Fix bugs to pass all regression cases

* update

* update
2022-12-23 22:17:50 +08:00
e336178ef8 [Fix](multi catalog)Fix VFileScanner file not found status bug. #15226
The if condition that checks the NOT FOUND status for VFileScanner is incorrect; fix it.
2022-12-23 16:45:54 +08:00
8a810cd554 [fix](bitmapfilter) fix core dump caused by bitmap filter (#15296)
Do not push down the bitmap filter to a non-integer column
2022-12-23 16:42:45 +08:00
fe562bc3e7 [Bug](Agg) fix crash when encountering not supported agg function like last_value(bitmap) (#15257)
The former logic in aggregate_function_window.cpp would shut down BE when it encountered an agg function with a complex type like BITMAP. This PR prevents the crash and returns a more concrete error message that tells the user the unsupported function signature.
2022-12-23 14:23:21 +08:00
b085ff49f0 [refactor](non-vec) delete non-vec data sink (#15283)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-12-23 14:10:47 +08:00
388df291af [pipeline](schedule) Add profile for except node and fix steal task problem (#15282) 2022-12-22 22:42:37 +08:00
e331e0420b [improvement](topn)add per scanner limit check for new scanner (#15231)
Optimize key topn queries like `SELECT * FROM store_sales ORDER BY ss_sold_date_sk, ss_sold_time_sk LIMIT 100` 
(where ss_sold_date_sk, ss_sold_time_sk is a prefix of the table sort key). 

Check the limit per scanner and set eof to true to reduce the amount of data that needs to be read.
2022-12-22 22:39:31 +08:00
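The idea behind the per scanner limit in e331e0420b (#15231) can be sketched as below. This is a simplified Python illustration, not the scanner code: `topn_with_per_scanner_limit` is a hypothetical name, and each scanner is assumed to yield rows already ordered by the sort-key prefix. Under that assumption, no row after the first `limit` rows of a scanner can be in the global top `limit`, so each scanner can stop early.

```python
import heapq
from itertools import islice


def topn_with_per_scanner_limit(scanners, limit):
    """Merge pre-sorted scanner outputs, reading at most `limit` rows from each."""
    # Cap every scanner at `limit` rows (the per scanner limit check / eof).
    limited = [islice(scanner, limit) for scanner in scanners]
    # Merge the capped, sorted streams and keep the global top `limit`.
    return list(islice(heapq.merge(*limited), limit))


scanners = [iter([1, 4, 7, 10]), iter([2, 3, 8]), iter([5, 6, 9])]
print(topn_with_per_scanner_limit(scanners, 3))
```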
e9a201e0ec [refactor](non-vec) delete some non-vec exec node (#15239)
2022-12-22 14:05:51 +08:00
8ecf69b09b [pipeline](regression) nested loop join test get error result in pipeline engine and refactor the code for need more input data (#15208) 2022-12-21 19:03:51 +08:00
af54299b26 [Pipeline](projection) Support projection on pipeline engine (#15220) 2022-12-21 15:47:29 +08:00
a447121fc3 [fix](scanner scheduler) fix coredump of ScannerScheduler::_scanner_scan (#15199)
* [fix](scanner scheduler) fix coredump of ScannerScheduler::_scanner_scan

* fix
2022-12-21 15:44:47 +08:00
2445ac9520 [Bug](runtimefilter) Fix BE crash due to init failure (#15228) 2022-12-21 15:36:22 +08:00
732417258c [Bug](pipeline) Fix bugs to pass TPCDS cases (#15194) 2022-12-20 22:29:55 +08:00
5cf21fa7d1 [feature](planner) mark join to support subquery in disjunction (#14579)
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2022-12-20 15:22:43 +08:00
494eb895d3 [vectorized](pipeline) support union node operator (#15031) 2022-12-19 22:01:56 +08:00
7c67fa8651 [Bug](pipeline) fix bug of right anti join error result in pipeline (#15165) 2022-12-19 19:28:44 +08:00
0732f31e5d [Bug](pipeline) Fix bugs for scan node and join node (#15164)
* [Bug](pipeline) Fix bugs for scan node and join node

* update
2022-12-19 15:59:29 +08:00
1597afcd67 [fix](mutil-catalog) fix get many same name db/table when show where (#15076)
When running show databases/tables/table status where xxx, the statement is rewritten into a selectStmt that selects the result from information_schema. It needs the catalog info to scan the schema table; otherwise it may return many databases or tables with the same name from multiple catalogs.

For example:
mysql> show databases where schema_name='test';
+----------+
| Database |
+----------+
| test     |
| test     |
+----------+

MySQL [internal.test]> show tables from test where table_name='test_dc';
+----------------+
| Tables_in_test |
+----------------+
| test_dc        |
| test_dc        |
+----------------+
2022-12-19 14:27:48 +08:00
7730a88d11 [fix](multi-catalog) add support for orc binary type (#15141)
Fix three bugs:
1. DataTypeFactory::create_data_type is missing the conversion of the binary type, so OrcReader fails.
2. ScalarType#createType is missing the conversion of the binary type, so ExternalFileTableValuedFunction fails.
3. fmt::format can't generate the right format string and fails.
2022-12-19 14:24:12 +08:00
13bc8c2ef8 [Pipeline](runtime filter) Support runtime filters on pipeline engine (#15040) 2022-12-18 21:48:00 +08:00
874acdf68f [vectorized](join) add try catch in create thread (#15065) 2022-12-16 19:55:09 +08:00
ef21eea2e8 [fix](pipeline) _valid_element_in_hash_tbl was not set correctly (#15072) 2022-12-16 18:06:49 +08:00
728a238564 [vectorized](jdbc) fix external table of oracle with condition about … (#15092)
* [vectorized](jdbc) fix external table of oracle with condition about datetime report error

* formatter
2022-12-16 10:48:17 +08:00
0e1e5a802b [config](load) enable new load scan node by default (#14808)
Set FE `enable_new_load_scan_node` to true by default, so that all load tasks (broker load, stream load, routine load, insert into) use FileScanNode instead of BrokerScanNode to read data.

1. Support loading parquet files in stream load with the new load scan node.
2. Fix a bug where the new parquet reader cannot read a column without a logical or converted type.
3. Change the jsonb parser function to "jsonb_parse_error_to_null",
    so that if the input string is not a valid json string, it returns null for the jsonb column in a load task.
2022-12-16 09:41:43 +08:00
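The "parse error to null" behavior mentioned in 0e1e5a802b (#14808) can be illustrated with a small Python sketch. This is not the Doris function itself: `parse_json_or_null` is a hypothetical helper built on Python's standard `json` module, whose notion of valid JSON may differ in details from the jsonb parser.

```python
import json


def parse_json_or_null(text):
    """Return the parsed value, or None (standing in for SQL NULL) on invalid JSON."""
    try:
        return json.loads(text)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return None


print(parse_json_or_null('{"a": 1}'))
print(parse_json_or_null('{not json'))
```

Note that in this sketch a valid JSON `null` also maps to `None`, so the caller cannot tell it apart from a parse error.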
e0d528980f [fix](multi catalog)Return emtpy block while external table scanner couldn't find the file (#14997)
The FE file path cache for external tables may be out of date. In this case, BE may fail to find a file listed in the FE cache because it no longer exists. 
This PR handles this case: instead of returning an error message to the user, we return an empty result set.
2022-12-16 09:36:35 +08:00
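The behavior described in e0d528980f (#14997) amounts to treating a missing file as an empty input rather than an error. A minimal Python sketch of that pattern follows; `scan_file` is a hypothetical helper that reads a local text file, not the external table scanner.

```python
def scan_file(path):
    """Return the file's lines, or an empty result if the file no longer exists."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()
    except FileNotFoundError:
        # A stale path from the cache: report no rows instead of failing the query.
        return []
```

The trade-off is that a stale cache silently yields fewer rows, so other error kinds (for example permission errors) are deliberately not swallowed here.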
c6d93f739c [feature-wip](file reader) Merge stream_load_pipe to the new file reader (#15035)
Currently, there are two sets of file readers in Doris. This PR rewrites the old stream_load_pipe with the new file reader.
2022-12-15 16:31:22 +08:00
67e4292533 [fix](iceberg-v2) icebergv2 filter data path (#14470)
1. An icebergv2 delete file may span many data paths, so the path of a file split is required as a predicate to filter the rows of the delete file.
- create a delete file structure to save predicate parameters
- create a predicate for the file path
2. Add some logs to print the row range.
3. Fix a bug when creating file metadata.
2022-12-15 10:18:12 +08:00
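The file path predicate from 67e4292533 (#14470) can be sketched as follows. An Iceberg v2 position delete file stores `(file_path, pos)` rows, so the rows that apply to one data file are selected by its path. This is an illustrative Python sketch only; `delete_positions_for` and the sample paths are hypothetical.

```python
def delete_positions_for(data_file_path, delete_rows):
    """Pick the delete positions that apply to a single data file.

    delete_rows: iterable of (file_path, pos) pairs from a position delete file.
    """
    return sorted(pos for path, pos in delete_rows if path == data_file_path)


delete_rows = [
    ("s3://bucket/tbl/data/a.parquet", 7),
    ("s3://bucket/tbl/data/b.parquet", 1),
    ("s3://bucket/tbl/data/a.parquet", 2),
]
print(delete_positions_for("s3://bucket/tbl/data/a.parquet", delete_rows))
```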
b8f93681eb [feature-wip](file reader) Merge broker reader to the new file reader (#14980)
Currently, there are two sets of file readers in Doris. This PR rewrites the old broker reader with the new file reader.

TODO:
1. rewrite stream load pipe and kafka consumer pipe
2022-12-14 12:48:02 +08:00
wxy
bbf3a5420d [fix](statistics) fix missing scanBytes and scanRows in query statist… (#14828)
A patch for PR-14750. There's one modification missing in ISSUE-14750.
2022-12-14 09:37:05 +08:00
284a3351f4 [Refactor](exec) refactor the code of datasink eos logic (#15009) 2022-12-13 15:33:08 +08:00
73ee352705 [fix](multi catalog)Fix convert_to_doris_type missing break for some cases (#14992) 2022-12-13 13:34:55 +08:00
e7a84e4a16 [fix](multi-catalog)fix page index thrift deserialize (#15001)
Fix the error when parsing the page index: Couldn't deserialize thrift msg.
Use two buffers to store the column index and offset index messages, to avoid parsing them from a single buffer.
2022-12-13 13:33:19 +08:00
8fe0729835 [fix](multi catalog)Check orc file reader is not null before using it. (#14988)
The external table file path cache may be out of date, which causes the orc reader to visit non-existent files.
In this case, the orc file reader is nullptr.
This PR checks the reader before using it, to avoid a core dump from dereferencing nullptr.
2022-12-13 11:27:51 +08:00
Pxl
c25a7235f9 [Pipeline](load) support pipeline broker load (#14940)
2022-12-13 00:28:36 +08:00
f3aea7f0f0 [Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744) 2022-12-11 23:33:18 +08:00
7fb695b51d [Pipeline](select node) Support select node on pipeline engine (#14928) 2022-12-11 21:31:32 +08:00
ef46b580d0 [Vectorized](operator) support analytic eval operator (#14774) 2022-12-10 19:32:11 +08:00
68092fe514 [pipeline](NLJ) support nested loop join for pipeline (#14966) 2022-12-10 00:20:16 +08:00
wxy
af50461211 [fix](statistics) fix CpuTimeMS in audit log when enable_vectorized_engine=true. (#14853)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2022-12-09 21:13:05 +08:00
0c8fdc90fb [pipeline](union) support union operator (#14963) 2022-12-09 19:55:40 +08:00