doris

Author	SHA1	Message	Date
Mingyu Chen	d286aa7bf7	[fix](spark-load) no need to filter row group when doing spark load (#13116 ) 1. Fix issue #13115 2. Modify the `get_next_block` method of `GenericReader` to return "read_rows" explicitly. Some columns in a block may not be filled by the reader; if the first column is not filled, `block->rows()` cannot return the real number of rows. 3. Add more checks for broker load test cases.	2022-10-05 23:00:56 +08:00
Lightman	7b75c2df54	[fix](BE) fix the stream load error when upgrade BE from 1.1.2 to master (#13058 )	2022-10-05 12:13:26 +08:00
Ashin Gau	026ffaf10d	[feature-wip](parquet-reader) add detail profile for parquet reader (#13095 ) Add more detailed profile counters for ParquetReader: ParquetColumnReadTime: the total time of reading parquet columns; ParquetDecodeDictTime: time to parse dictionary page; ParquetDecodeHeaderTime: time to parse page header; ParquetDecodeLevelTime: time to parse page's definition/repetition level; ParquetDecodeValueTime: time to decode page data into doris column; ParquetDecompressCount: counter of decompressing page data; ParquetDecompressTime: time to decompress page data; ParquetParseMetaTime: time to parse parquet meta data.	2022-10-02 15:11:48 +08:00
Gabriel	287ff50a6f	[Bug](datev2) Fix compatible error between datev2 and date (#13024 )	2022-09-29 18:01:55 +08:00
slothever	820ec435ce	[feature-wip](parquet-reader) refactor parquet_predicate (#12896 ) This change serves the following purposes: 1. use ScanPredicate instead of TCondition for external table, it can reuse old code branch. 2. simplify and delete some useless old code 3. use ColumnValueRange to save predicate	2022-09-28 21:27:13 +08:00
Mingyu Chen	d80b7b9689	[feature-wip](new-scan) support more load situation (#12953 )	2022-09-27 21:48:32 +08:00
Pxl	8731eea26e	[Chore](clang) fix some build fail on clang15 (#12882 ) remove unused variables	2022-09-26 23:13:28 +08:00
Tiewei Fang	acd5d67355	[feature-wip](new-scan)Add new odbc scanner and new odbc scan node (#12899 )	2022-09-26 09:24:25 +08:00
yiguolei	7b230e41a8	[bugfix](scanner) olap scanner compute is wrong (#12857 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-09-24 09:59:59 +08:00
Ashin Gau	5bfdfac387	[feature-wip](parquet-reader) add parquet reader profile (#12797 ) Add profile for parquet reader. New counters: - ParquetFilteredGroups: Filtered row groups by `RowGroup` min-max statistics - ParquetReadGroups: The number of row groups to read - ParquetFilteredRowsByGroup: The number of filtered rows by `RowGroup` min-max statistics - ParquetFilteredRowsByPage: The number of filtered rows by page min-max statistics - ParquetFilteredBytes: The filtered bytes by `RowGroup` min-max statistics - ParquetReadBytes: The total bytes in `ParquetReadGroups`, may be further filtered If a page is skipped as a whole ## Result ``` ┌──────────────────────────────────────────────────────┐ │[0: VFILE_SCAN_NODE] │ │(Active: 1s29ms, non-child: 96.42) │ │ - Counters: │ │ - BytesRead: 0.00 │ │ - FileReadCalls: 1.826K (1826) │ │ - FileReadTime: 510.627ms │ │ - FileRemoteReadBytes: 65.23 MB │ │ - FileRemoteReadCalls: 1.146K (1146) │ │ - FileRemoteReadRate: 128.29331970214844 MB/sec │ │ - FileRemoteReadTime: 508.469ms │ │ - NumDiskAccess: 0 │ │ - NumScanners: 1 │ │ - ParquetFilteredBytes: 0.00 │ │ - ParquetFilteredGroups: 0 │ │ - ParquetFilteredRowsByGroup: 0 │ │ - ParquetFilteredRowsByPage: 6.600003M (6600003)│ │ - ParquetReadBytes: 2.13 GB │ │ - ParquetReadGroups: 20 │ │ - PeakMemoryUsage: 0.00 │ │ - PredicateFilteredRows: 3.399797M (3399797) │ │ - PredicateFilteredTime: 133.302ms │ │ - RowsRead: 3.399997M (3399997) │ │ - RowsReturned: 200 │ │ - RowsReturnedRate: 194 │ │ - TotalRawReadTime(*): 726.566ms │ │ - TotalReadThroughput: 0.0 /sec │ │ - WaitScannerTime: 1s27ms │ └──────────────────────────────────────────────────────┘ ```	2022-09-23 18:42:14 +08:00
HappenLee	f7e3ca29b5	[Opt](Vectorized) Support push down no grouping agg (#12803 ) Support push down no grouping agg	2022-09-23 18:29:54 +08:00
slothever	1ca6d559e4	[feature-wip](parquet-reader) refactor some arguments for parquet reader (#12771 ) refactor some arguments for parquet reader 1. Add new parquet context to wrap reader arguments 2. Reduced some arguments for function call Co-authored-by: jinzhe <jinzhe@selectdb.com>	2022-09-22 09:34:01 +08:00
Jibing-Li	fbdebe2424	[feature-wip](new-scan)Add load counter for VFileScanner (#12812 ) The new scanner (VFileScanner) needs a counter to record two values in a load job: 1. the number of rows unselected by the pre-filter, and 2. the number of rows filtered out because of an unmatched schema or another error. This PR implements the counter.	2022-09-21 20:59:13 +08:00
Jibing-Li	ec2b3bf220	[feature-wip](new-scan)Refactor VFileScanner, support broker load, remove unused functions in VScanner base class. (#12793 ) Refactor of scanners. Support broker load. This PR is part of the scanner refactoring tasks. It provides support for broker load using the new VFileScanner. Work still in progress.	2022-09-21 12:49:56 +08:00
Jibing-Li	5978fd9647	[refactor](file scanner)Refactor file scanner. (#12602 ) Refactor the scanners for hms external catalog, work in progress. Use VFileScanner, will remove NewFileParquetScanner, NewFileOrcScanner and NewFileTextScanner after fully tested. Query for parquet file has been tested, still need to add readers for orc file, text file and load logic as well.	2022-09-19 15:23:51 +08:00
Mingyu Chen	bc38b2fdfb	[improvement](new-scan) graceful quit scanner scheduler (#12715 )	2022-09-19 08:39:08 +08:00
TengJianPing	8364165e30	[regression_test](testcase) add regression test case from session variable skip_storage_engine_merge, skip_delete_predicate and show_hidden_columns (#12617 ) also add this function to new olap scan node.	2022-09-16 10:33:12 +08:00
Mingyu Chen	c5ad989065	[refactor](reader) refactor the interface of file reader (#12574 ) Currently, Doris has a variety of readers for different file formats, such as parquet reader, orc reader, csv reader, json reader and so on. The interfaces of these readers are not unified, which makes it impossible to call them through a unified method. In this PR, I added a `GenericReader` interface class, and other Readers will implement this interface class to use the `get_next_block()` method. This PR currently only modifies `arrow_reader` and `parquet reader`. Other readers will be modified one by one in subsequent PRs.	2022-09-14 22:31:11 +08:00
Pxl	9e49f68663	[fix](new-scan) try to fix invalid call to nullptr slot (#12552 )	2022-09-13 18:54:29 +08:00
Jibing-Li	dc80a993bc	[feature-wip](new-scan) New load scanner. (#12275 ) Related pr: https://github.com/apache/doris/pull/11582 https://github.com/apache/doris/pull/12048 Use the new file scan node and the new scheduling framework to do the load job, replacing the old broker scan node. The load part (BE part) is work in progress. The query part (FE) has been tested using the tpch benchmark. Please review only the FE code in this PR; the BE code has been disabled by the enable_new_load_scan_node configuration. Will send another PR soon to fix the BE-side code.	2022-09-13 13:36:34 +08:00
Mingyu Chen	8a274d7851	[feature-wip](new-scan) refactor some interface about predicate push down in scan node (#12527 ) This PR introduces a new enum type `PushDownType`: ``` enum class PushDownType { // The predicate can not be pushed down to data source UNACCEPTABLE, // The predicate can be pushed down to data source // and the data source can fully evaluate it ACCEPTABLE, // The predicate can be pushed down to data source // but the data source can not fully evaluate it. PARTIAL_ACCEPTABLE }; ``` A derived class of VScanNode can override the following methods to determine whether to accept a binary/in/function filter/bloom filter/is null predicate: ``` PushDownType _should_push_down_binary_predicate(); PushDownType _should_push_down_in_predicate(); PushDownType _should_push_down_function_filter(); PushDownType _should_push_down_bloom_filter(); PushDownType _should_push_down_is_null_predicate(); ```	2022-09-13 10:25:13 +08:00
Mingyu Chen	efd2bdb203	[improvement](new-scan) avoid too many scanner context scheduling (#12491 ) When selecting a large amount of data from a table, the profile shows: - ScannerCtxSchedCount: 2.82664M(2826640) But ScannerSchedCount is only 8; most of the scanners are busy running. After the improvement, ScannerCtxSchedCount is reduced to only 10.	2022-09-12 10:22:54 +08:00
Mingyu Chen	f98ec06783	[feature-wip](new-scan) Add memtracker and span for new olap scan node (#12281 ) Add memtracker and span for new olap scan node	2022-09-09 09:39:08 +08:00
Mingyu Chen	3ce305134a	[fix](scan) fix potential wrong cancel when sql has limit (#12224 )	2022-09-01 19:11:40 +08:00
HappenLee	8c8078ad28	[fix](projections) get error row_descriptor when have projections on ExecNode (#12232 ) When an ExecNode's projections are not empty, it uses the output row descriptor to initialize the block before doing the projection, but it should use the original row descriptor. This PR fixes it.	2022-09-01 10:48:10 +08:00
Jibing-Li	ec4863b63a	[feature-wip](new-scan)Add new file scan node (#12048 ) Related pr: #11582 This is the new file scan node and scanner for external hms catalog.	2022-09-01 10:01:20 +08:00
HappenLee	573e5476dd	[Opt](load) Speed up the vectorized load (#12146 ) * [Opt](load) Speed up the vectorized load	2022-08-31 16:23:36 +08:00
Kikyou1997	9a74ad1702	[feature](Nereids)add the ability of projection on each ExecNode and add column prune on OlapScan (#11842 ) We have added logical project before, but to actually finish the prune to reduce the data IO, we need to add related supports in translator and BE. This PR: - add projections on each ExecNode in BE - translate PhysicalProject into projections on PlanNode in FE - do column prune on ScanNode in FE Co-authored-by: HappenLee <happenlee@hotmail.com>	2022-08-30 16:17:10 +08:00
Mingyu Chen	a16cf0e2c8	[feature-wip](scan) add profile for new olap scan node (#12042 ) Copy most of the profiles from VOlapScanNode and VOlapScanner to NewOlapScanNode and NewOlapScanner. Fix some blocking bugs of the new scan framework. TODO: Memtracker, Opentelemetry span. The new framework is still disabled by default, so it will not affect other features.	2022-08-30 10:55:48 +08:00
yiguolei	2f192019d3	[bugfix](delete handler) delete predicate is merged and could not find schema cause core dump (#12161 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-30 09:18:21 +08:00
Mingyu Chen	05da3d947f	[feature-wip](new-scan) add scanner scheduling framework (#11582 ) There are currently many types of ScanNodes in Doris, and most of their logic is the same, including: runtime filters, predicate pushdown, and Scanner generation and scheduling. So I intend to unify the common logic of all ScanNodes. Different data sources only need to implement different Scanners for data access, so that future scan optimizations can be applied to all data sources while also reducing code duplication. This PR mainly adds 4 new classes: VScanner: the parent class of all Scanners. Subclasses can inherit this class to implement specific data access methods. VScanNode: the unified ScanNode, responsible for common logic including RuntimeFilter, predicate pushdown, and Scanner generation and scheduling. ScannerContext: responsible for recording the execution status of a group of Scanners corresponding to a ScanNode, including how many scanners are being scheduled, and for maintaining a producer-consumer block queue between scanners and scan nodes. ScannerContext is also the scheduling unit of ScannerScheduler. ScannerScheduler schedules one ScannerContext at a time and submits the Scanners to the scanner thread pool for data scanning. ScannerScheduler: responsible for all Scanner scheduling tasks in a unified way. Test: This work is still in progress and is disabled by default. I tested it with jmeter at a concurrency of 50, but currently the scanner just returns without data. The QPS can reach about 9000. I can't compare it to the original implementation because no data is read for now. I will test it when the new olap scanner is ready. Co-authored-by: morningman <morningman@apache.org>	2022-08-23 08:45:18 +08:00
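
The division of labour described in the oldest commit (#11582) can be illustrated with a minimal, single-threaded sketch. It is not the Doris implementation: every name ending in `Sketch`, plus `schedule_once` and `consume_all_rows`, is hypothetical, the block type is a stand-in, and the scanner thread pool is replaced by running each scanner inline for one block per scheduling round. It only shows the shape of the idea: scanners produce blocks, a context holds the shared queue and per-scanner status, a scheduler works on one context at a time, and the scan node consumes from the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a block of rows; not Doris's real block type.
struct BlockSketch {
    int num_rows = 0;
};

// Hypothetical scanner base class: subclasses implement the data access.
class ScannerSketch {
public:
    virtual ~ScannerSketch() = default;
    // Returns the next block, or std::nullopt once the source is exhausted.
    virtual std::optional<BlockSketch> get_block() = 0;
};

// Toy data source that yields a fixed number of equally sized blocks.
class FixedScannerSketch : public ScannerSketch {
public:
    FixedScannerSketch(int num_blocks, int rows_per_block)
            : _remaining(num_blocks), _rows_per_block(rows_per_block) {
        assert(num_blocks >= 0 && rows_per_block >= 0);
    }

    std::optional<BlockSketch> get_block() override {
        if (_remaining == 0) return std::nullopt;
        --_remaining;
        return BlockSketch{_rows_per_block};
    }

private:
    int _remaining;
    int _rows_per_block;
};

// Tracks a group of scanners and the block queue shared with the scan node.
struct ScannerContextSketch {
    std::vector<std::unique_ptr<ScannerSketch>> scanners;
    std::vector<bool> finished;
    std::deque<BlockSketch> blocks_queue;
    int sched_count = 0; // how many times this context has been scheduled

    void add_scanner(std::unique_ptr<ScannerSketch> scanner) {
        scanners.push_back(std::move(scanner));
        finished.push_back(false);
    }

    bool all_finished() const {
        for (bool f : finished) {
            if (!f) return false;
        }
        return true;
    }
};

// Schedules one context. A real scheduler would submit each scanner to a
// thread pool; here each unfinished scanner runs inline for a single block.
void schedule_once(ScannerContextSketch& ctx) {
    ++ctx.sched_count;
    for (std::size_t i = 0; i < ctx.scanners.size(); ++i) {
        if (ctx.finished[i]) continue;
        if (auto block = ctx.scanners[i]->get_block()) {
            ctx.blocks_queue.push_back(*block); // producer side
        } else {
            ctx.finished[i] = true;
        }
    }
}

// Consumer side (the scan node): drain the queue and ask for another
// scheduling round whenever it runs dry and scanners are still unfinished.
long long consume_all_rows(ScannerContextSketch& ctx) {
    long long total_rows = 0;
    while (true) {
        if (ctx.blocks_queue.empty()) {
            if (ctx.all_finished()) break;
            schedule_once(ctx);
            continue;
        }
        total_rows += ctx.blocks_queue.front().num_rows;
        ctx.blocks_queue.pop_front();
    }
    return total_rows;
}
```

In this sketch a new data source only needs another `ScannerSketch` subclass; the queueing and scheduling code stays unchanged, which is the kind of reuse the commit message aims for. The `sched_count` field loosely mirrors the idea of counting context scheduling rounds, but it is not the profile counter from #12491.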

31 Commits