Commit Graph

273 Commits

Author SHA1 Message Date
Pxl
8731eea26e [Chore](clang) fix some build failures on clang15 (#12882)
remove unused variables
2022-09-26 23:13:28 +08:00
acd5d67355 [feature-wip](new-scan)Add new odbc scanner and new odbc scan node (#12899) 2022-09-26 09:24:25 +08:00
692176ec07 [feature-wip](parquet-reader) pre read page data in advance to avoid frequent seek (#12898)
1. Fix the file position bug in `HdfsFileReader`
2. Reserve enough buffer for `ColumnColumnReader` to read large continuous memory
2022-09-25 21:21:06 +08:00
f1a64ea09f [fix](new-scan)Fix new scanner load job bugs (#12903)
Fix bugs:
1. FE needs to send the file format (e.g. parquet, orc ...) to BE while processing load jobs using the new scanner.
2. Try to get the parquet column type from SchemaElement.type before falling back to the logical type and converted type.
2022-09-24 17:21:19 +08:00
7b230e41a8 [bugfix](scanner) olap scanner compute is wrong (#12857)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-09-24 09:59:59 +08:00
5bfdfac387 [feature-wip](parquet-reader) add parquet reader profile (#12797)
Add profile for parquet reader. New counters:
- ParquetFilteredGroups: Row groups filtered by `RowGroup` min-max statistics
- ParquetReadGroups: The number of row groups to read
- ParquetFilteredRowsByGroup: The number of rows filtered by `RowGroup` min-max statistics
- ParquetFilteredRowsByPage: The number of rows filtered by page min-max statistics
- ParquetFilteredBytes: The bytes filtered by `RowGroup` min-max statistics
- ParquetReadBytes: The total bytes in `ParquetReadGroups`; may be further reduced if a page is skipped as a whole
## Result
```
┌──────────────────────────────────────────────────────┐
│[0: VFILE_SCAN_NODE]                                  │
│(Active: 1s29ms, non-child: 96.42)                    │
│  - Counters:                                         │
│      - BytesRead: 0.00                               │
│      - FileReadCalls: 1.826K (1826)                  │
│      - FileReadTime: 510.627ms                       │
│      - FileRemoteReadBytes: 65.23 MB                 │
│      - FileRemoteReadCalls: 1.146K (1146)            │
│      - FileRemoteReadRate: 128.29331970214844 MB/sec │
│      - FileRemoteReadTime: 508.469ms                 │
│      - NumDiskAccess: 0                              │
│      - NumScanners: 1                                │
│      - ParquetFilteredBytes: 0.00                    │
│      - ParquetFilteredGroups: 0                      │
│      - ParquetFilteredRowsByGroup: 0                 │
│      - ParquetFilteredRowsByPage: 6.600003M (6600003)│
│      - ParquetReadBytes: 2.13 GB                     │
│      - ParquetReadGroups: 20                         │
│      - PeakMemoryUsage: 0.00                         │
│      - PredicateFilteredRows: 3.399797M (3399797)    │
│      - PredicateFilteredTime: 133.302ms              │
│      - RowsRead: 3.399997M (3399997)                 │
│      - RowsReturned: 200                             │
│      - RowsReturnedRate: 194                         │
│      - TotalRawReadTime(*): 726.566ms                │
│      - TotalReadThroughput: 0.0 /sec                 │
│      - WaitScannerTime: 1s27ms                       │
└──────────────────────────────────────────────────────┘
```
2022-09-23 18:42:14 +08:00
f7e3ca29b5 [Opt](Vectorized) Support push down no grouping agg (#12803)
Support pushing down aggregation without grouping.
2022-09-23 18:29:54 +08:00
4b95b4e41d [feature-wip](file-scanner)Get column type from parquet schema (#12833)
Get the schema from the parquet reader.
The new VFileScanner needs to get the file schema (a column name to type map) from the parquet file while processing load jobs;
this PR sets the type information for parquet columns.
2022-09-22 09:35:37 +08:00
1ca6d559e4 [feature-wip](parquet-reader) refactor some arguments for parquet reader (#12771)
Refactor some arguments for the parquet reader:
1. Add a new parquet context to wrap reader arguments
2. Reduce the number of arguments in function calls
Co-authored-by: jinzhe <jinzhe@selectdb.com>
2022-09-22 09:34:01 +08:00
fbdebe2424 [feature-wip](new-scan)Add load counter for VFileScanner (#12812)
The new scanner (VFileScanner) needs counters to record two values in a load job:
1. the number of rows unselected by the pre-filter, and
2. the number of rows filtered due to unmatched schema or other errors. This PR implements the counters.
2022-09-21 20:59:13 +08:00
8f4bb0f804 [improvement](agg) iterate aggregation data in memory written order (#12704)
Following the iteration order of the hash table will result in out-of-order access to aggregate states, which is very inefficient.
Traversing aggregate states in memory write order can significantly improve memory read efficiency.

Test:
hash table item count: 3.35M

Before this optimization, inserting keys into the column takes 500ms.
With this optimization it takes only 80ms.
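For illustration only, a minimal sketch of the idea (hypothetical types, not the actual Doris code): keep aggregate states in a separate container in insertion (memory write) order and iterate that container when materializing results, instead of walking the hash table in bucket order.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct AggState { int64_t sum = 0; };  // hypothetical aggregate state

struct AggMap {
    std::unordered_map<int64_t, size_t> index;  // key -> position in write order
    std::vector<int64_t> keys;                  // keys in insertion (memory write) order
    std::vector<AggState> states;               // states in the same order

    AggState& find_or_insert(int64_t key) {
        auto [it, inserted] = index.emplace(key, states.size());
        if (inserted) {
            keys.push_back(key);
            states.emplace_back();
        }
        return states[it->second];
    }
};

// When producing the output, scan `keys`/`states` sequentially (cache friendly)
// instead of iterating the hash table's bucket order.
```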
2022-09-21 14:58:50 +08:00
ec2b3bf220 [feature-wip](new-scan)Refactor VFileScanner, support broker load, remove unused functions in VScanner base class. (#12793)
Refactor of scanners. Support broker load.
This PR is part of the scanner refactoring tasks. It provides support for broker load using the new VFileScanner.
Work is still in progress.
2022-09-21 12:49:56 +08:00
3cfaae0031 [Improvement](sort) Use heap sort to optimize sort node (#12700) 2022-09-21 10:01:52 +08:00
b837b2eb95 [feature-wip](parquet-reader) filter rows by page index (#12664)
# Proposed changes

[Parquet v1.11+ supports page skipping](https://github.com/apache/parquet-format/blob/master/PageIndex.md),
which helps the scanner reduce the amount of data scanned, decompressed, decoded, and inserted.
According to the performance FlameGraph, decompression takes up 20% of CPU time.
If a page can be filtered out as a whole, it does not need to be decompressed.

However, row numbers between pages are not aligned across columns. Columns with predicates can be filtered at page granularity,
but other columns need to be skipped within pages, so non-predicate columns only save decoding and insertion time.

Array columns need repetition levels to align with other columns, so they also only save decoding and insertion time.

## Explore
`OffsetIndex` in the column metadata locates each page's position.
Theoretically, a page can be skipped completely, including the time spent reading it from HDFS.
However, the average page size is around 500KB, and skipping a page requires calling `skip`.
`skip` performs poorly when called frequently,
and may not be better than continuously reading large blocks of data (such as 4MB).

If multiple consecutive pages are filtered, reads can be skipped according to `OffsetIndex`.
However, for the sake of simplicity and readability, the data of all pages is currently loaded and filtered in turn.
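A minimal sketch of page-level filtering with per-page min/max statistics (hypothetical simplified types; the real reader works against the parquet `ColumnIndex`/`OffsetIndex` structures):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one page's statistics and location.
struct PageStats {
    int64_t min_value;
    int64_t max_value;
    int64_t first_row;   // first row index of the page (from OffsetIndex)
    int64_t row_count;   // number of rows in the page
};

// Keep only pages whose [min, max] range may satisfy `col >= lower`.
// Rows in the dropped pages never need to be decompressed or decoded.
std::vector<PageStats> filter_pages_ge(const std::vector<PageStats>& pages, int64_t lower) {
    std::vector<PageStats> kept;
    for (const auto& p : pages) {
        if (p.max_value >= lower) {
            kept.push_back(p);  // the page may contain matching rows
        }
    }
    return kept;
}
```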
2022-09-20 15:55:19 +08:00
d435f0de41 [feature-wip](parquet-reader) add page index row range (#12652)
Add some utilities and provide the candidate row ranges (generated from the skipped row ranges of each column)
to read for the page index filter.
This version supports binary operator filters; a sketch of the range computation is shown below.
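A minimal sketch of that computation (hypothetical types, not the actual Doris utilities): the skipped set is the union of every column's skipped ranges, and the candidate ranges are its complement.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct RowRange { int64_t first; int64_t last; };  // half-open interval [first, last)

// Given the skipped row ranges collected from all predicate columns, return
// the candidate ranges that still have to be read from the row group.
std::vector<RowRange> candidate_ranges(std::vector<RowRange> skipped, int64_t num_rows) {
    std::sort(skipped.begin(), skipped.end(),
              [](const RowRange& a, const RowRange& b) { return a.first < b.first; });
    std::vector<RowRange> result;
    int64_t cursor = 0;  // first row not yet covered by a skipped range
    for (const auto& r : skipped) {
        if (r.first > cursor) result.push_back({cursor, r.first});
        cursor = std::max(cursor, r.last);
    }
    if (cursor < num_rows) result.push_back({cursor, num_rows});
    return result;
}
```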

todo: 
- use context instead of structures in close() 
- process complex type filter
- use this instead of row group minmax filter
- refactor _eval_binary() for row group filter and page index filter
2022-09-20 10:36:19 +08:00
ca3e52a0bb [fix](agg)the output of window function's nullability should be consistent with output slot (#12607)
FE may force a window function to output a nullable value in some cases; BE should follow this and change the nullability accordingly.
2022-09-20 09:29:44 +08:00
5978fd9647 [refactor](file scanner)Refactor file scanner. (#12602)
Refactor the scanners for the HMS external catalog, work in progress.
Use VFileScanner; NewFileParquetScanner, NewFileOrcScanner and NewFileTextScanner will be removed after it is fully tested.
Querying parquet files has been tested; readers for orc files, text files, and the load logic still need to be added.
2022-09-19 15:23:51 +08:00
fb9e48a34a [fix](vstream load) Fix bug when load json with jsonpath (#12660) 2022-09-19 10:13:18 +08:00
fa8ed2bccc [fix](array-type) fix the invalid format load for stream load (#12424)
This PR fixes loading of invalid array formats for stream load.
Before this change, we got an error when loading an invalid array format.
The original file to load:
1 [1, 2, 3]
2 [4, 5, 6]
3 \N
4 [7, \N, 8]
5 10, 11, 12
[hugo@xafj-palo]$ sh curl_cmd.sh
{
"TxnId": 11035,
"Label": "11c9f111-188e-4616-9a50-aec8b7814513",
"TwoPhaseCommit": "false",
"Status": "Fail",
"Message": "Array does not start with '[' character, found '1'",
"NumberTotalRows": 0,
"NumberLoadedRows": 0,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 55,
"LoadTimeMs": 7,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 3,
"CommitAndPublishTimeMs": 0
}
After this change, the load succeeds and returns an error URL that reports the error line.
[hugo@xafj-palo]$ sh curl_cmd.sh
{
"TxnId": 11046,
"Label": "249808ee-55f4-4c08-b671-b3d82689d614",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 5,
"NumberLoadedRows": 4,
"NumberFilteredRows": 1,
"NumberUnselectedRows": 0,
"LoadBytes": 55,
"LoadTimeMs": 39,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 19,
"CommitAndPublishTimeMs": 16,
"ErrorURL": "http://10.81.85.89:8502/api/_load_error_log?file=__shard_3/error_log_insert_stmt_8d4130f0c18aeb0a-ad7ffd4233c41893_8d4130f0c18aeb0a_ad7ffd4233c41893"
}

The SQL select result:
MySQL [example_db]> select * from array_test06;
+------+--------------+
| k1   | k2           |
+------+--------------+
|    1 | [1, 2, 3]    |
|    2 | [4, 5, 6]    |
|    3 | NULL         |
|    4 | [7, NULL, 8] |
+------+--------------+
4 rows in set (0.019 sec)

The URL page shows us:
"Reason: Invalid format for array column(k2). src line [10, 11, 12]; "

Issue Number: #7570
2022-09-19 08:52:59 +08:00
bc38b2fdfb [improvement](new-scan) graceful quit scanner scheduler (#12715) 2022-09-19 08:39:08 +08:00
8364165e30 [regression_test](testcase) add regression test case from session variable skip_storage_engine_merge, skip_delete_predicate and show_hidden_columns (#12617)
Also add this functionality to the new olap scan node.
2022-09-16 10:33:12 +08:00
c05d736331 [Improvement](sort) fallback to partial sort small block if topN is small (#12604)
* [Improvement](sort) fallback to partial sort small block if topN is small
2022-09-16 10:20:17 +08:00
2a063355ad [fix](vstream load) Fix the default value insertion problem when importing json (#12601)
* [fix](vstream load) Fix the default value insertion problem when importing json

* update
2022-09-16 09:54:45 +08:00
22a8d35999 [Feature](vectorized) support jdbc sink for insert into data to table (#12534) 2022-09-15 11:08:41 +08:00
8e4374b7ec [enhancement](agg)remove unnecessary mem alloc and dealloc in agg node (#12535) 2022-09-15 11:07:06 +08:00
b136d80e1a [enhancement](compress) reuse compression ctx and buffer (#12573)
Reuse the compression ctx and buffer.
Use a global instance for every compression algorithm, and use a
thread-safe buffer pool to reuse compression buffers. The pool size equals
the max number of parallel compression threads, so it will not be too large.

Tests show this feature improves data import and compaction performance by about 5%. A sketch of the buffer pool idea follows.
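A minimal sketch of the thread-safe buffer pool (hypothetical names, not the Doris implementation): buffers are borrowed for one compression call and returned afterwards, so at most one buffer is allocated per concurrently compressing thread.

```cpp
#include <memory>
#include <mutex>
#include <vector>

class CompressionBufferPool {
public:
    // Borrow a buffer; allocate a new one only if the pool is empty.
    std::unique_ptr<std::vector<char>> acquire() {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_buffers.empty()) {
            return std::make_unique<std::vector<char>>();
        }
        auto buf = std::move(_buffers.back());
        _buffers.pop_back();
        return buf;
    }

    // Return the buffer so the next compression call can reuse its capacity.
    void release(std::unique_ptr<std::vector<char>> buf) {
        std::lock_guard<std::mutex> lock(_mutex);
        _buffers.push_back(std::move(buf));
    }

private:
    std::mutex _mutex;
    std::vector<std::unique_ptr<std::vector<char>>> _buffers;
};
```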

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-09-15 10:59:46 +08:00
c5ad989065 [refactor](reader) refactor the interface of file reader (#12574)
Currently, Doris has a variety of readers for different file formats,
such as parquet reader, orc reader, csv reader, json reader and so on.

The interfaces of these readers are not unified, which makes it impossible to call them through a unified method.

In this PR, I added a `GenericReader` interface class; the other readers will implement this interface class
and expose the `get_next_block()` method.

This PR currently only modifies `arrow_reader` and `parquet reader`.
Other readers will be modified one by one in subsequent PRs.
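As a rough sketch of what such a unified interface might look like (the exact Doris signatures may differ; `Block` and `Status` are placeholders here):

```cpp
#include <cstddef>

struct Block {};                    // placeholder for Doris' vectorized Block
struct Status { bool ok = true; };  // placeholder for Doris' Status

// Every format-specific reader (parquet, orc, csv, json, ...) implements
// get_next_block(), so callers can consume any file format the same way.
class GenericReader {
public:
    virtual ~GenericReader() = default;
    virtual Status get_next_block(Block* block, size_t* read_rows, bool* eof) = 0;
};
```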
2022-09-14 22:31:11 +08:00
Pxl
0ead048b93 [Enhancement](column) remove ColumnString terminating zero and add a data_version for pblock (#12456)
1. remove ColumnString terminating zero
2. add a data_version for pblock
3. change EncryptionMode to enum class
2022-09-14 21:25:22 +08:00
6bf5fc6db5 [improvement](storage) For debugging problems: add session variable skip_storage_engine_merge to treat agg and unique data model as dup model (#11952)
For debugging purposes:
Add session variable skip_storage_engine_merge; when set to true, tables using the aggregate key model and unique key model will be read as the duplicate key model.
Add session variable skip_delete_predicate; when set to true, rows deleted with a delete statement will still be selected.
2022-09-13 19:18:56 +08:00
Pxl
9e49f68663 [fix](new-scan) try to fix invalid call to nullptr slot (#12552) 2022-09-13 18:54:29 +08:00
dc80a993bc [feature-wip](new-scan) New load scanner. (#12275)
Related PRs:
https://github.com/apache/doris/pull/11582
https://github.com/apache/doris/pull/12048

Use the new file scan node and the new scheduling framework to do the load job, replacing the old broker scan node.
The load part (BE side) is work in progress. The query part (FE side) has been tested using the TPC-H benchmark.

Please review only the FE code in this PR; the BE code has been disabled by the enable_new_load_scan_node configuration. Another PR will be sent soon to fix the BE-side code.
2022-09-13 13:36:34 +08:00
9f25544f2f [feature-wip](parquet-reader) page index bug fix (#12428)
Co-authored-by: jinzhe <jinzhe@selectdb.com>
2022-09-13 10:28:53 +08:00
8a274d7851 [feature-wip](new-scan) refactor some interface about predicate push down in scan node (#12527)
This PR introduces a new enum type `PushDownType`:
```
enum class PushDownType {
    // The predicate can not be pushed down to the data source
    UNACCEPTABLE,
    // The predicate can be pushed down to the data source
    // and the data source can fully evaluate it
    ACCEPTABLE,
    // The predicate can be pushed down to the data source
    // but the data source can not fully evaluate it
    PARTIAL_ACCEPTABLE
};
```

A derived class of VScanNode can override the following methods to determine whether to accept
a binary/in/bloom filter/is null predicate:

```
PushDownType _should_push_down_binary_predicate();
PushDownType _should_push_down_in_predicate();
PushDownType _should_push_down_function_filter();
PushDownType _should_push_down_bloom_filter();
PushDownType _should_push_down_is_null_predicate();
```
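For illustration, a hypothetical derived scan node could override these hooks like this (sketch only; the base class here is a minimal stand-in, not the real VScanNode):

```cpp
enum class PushDownType { UNACCEPTABLE, ACCEPTABLE, PARTIAL_ACCEPTABLE };

// Minimal stand-in base class, just enough to show the override pattern.
class VScanNode {
public:
    virtual ~VScanNode() = default;
protected:
    virtual PushDownType _should_push_down_in_predicate() { return PushDownType::UNACCEPTABLE; }
    virtual PushDownType _should_push_down_binary_predicate() { return PushDownType::UNACCEPTABLE; }
};

// A scan node whose data source fully evaluates IN predicates but only
// partially evaluates binary predicates.
class MyScanNode : public VScanNode {
protected:
    PushDownType _should_push_down_in_predicate() override {
        return PushDownType::ACCEPTABLE;
    }
    PushDownType _should_push_down_binary_predicate() override {
        return PushDownType::PARTIAL_ACCEPTABLE;
    }
};
```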
2022-09-13 10:25:13 +08:00
e33f4f90ae [fix](exec) Avoid query thread block on wait_for_start (#12411)
When FE sends a cancel RPC to BE, it does not notify the thread waiting in wait_for_start(), so the fragment is blocked and occupies the execution thread.
Add a max wait time to wait_for_start() so that it will not block forever.
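A minimal sketch of the bounded wait (illustrative only; names and structure are hypothetical, not the actual Doris code):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// The fragment thread waits until it is started or cancelled, but never
// longer than max_wait, so a missed notification cannot block it forever.
class FragmentStartGate {
public:
    bool wait_for_start(std::chrono::seconds max_wait) {
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait_for(lock, max_wait, [this] { return _started || _cancelled; });
        return _started;  // false on cancellation or timeout
    }

    void set_started() {
        { std::lock_guard<std::mutex> l(_mutex); _started = true; }
        _cv.notify_all();
    }

    void set_cancelled() {
        { std::lock_guard<std::mutex> l(_mutex); _cancelled = true; }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _started = false;
    bool _cancelled = false;
};
```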
2022-09-13 08:57:37 +08:00
efd2bdb203 [improvement](new-scan) avoid too many scanner context scheduling (#12491)
When selecting a large amount of data from a table, the profile shows:

- ScannerCtxSchedCount: 2.82664M (2826640)

But ScannerSchedCount is only 8, and most scanners are busy running.
After this improvement, ScannerCtxSchedCount is reduced to only 10.
2022-09-12 10:22:54 +08:00
66491ec137 [Improvement](sort) improve partial sort algorithm (#12349)
* [Improvement](sort) improve partial sort algorithm
2022-09-09 15:44:18 +08:00
f98ec06783 [feature-wip](new-scan) Add memtracker and span for new olap scan node (#12281)
Add memtracker and span for new olap scan node
2022-09-09 09:39:08 +08:00
b4663062da [feature-wip](parquet-reader) bug fix, parquet footer buffer is small when containing many columns (#12477)
Reading a parquet file with many columns (>1600) failed:

mysql> select int_col from types_sf100_r100w limit 5;
ERROR 1105 (HY000): errCode = 2, detailMessage = Couldn't deserialize thrift msg:
TProtocolException: Invalid data
parse_thrift_footer uses a fixed-length buffer (64KB) to read the parquet footer, but the metadata of a parquet file with 1600 columns can exceed 5MB.

Therefore, the buffer needs to be allocated according to the actual footer length.
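A sketch of how the footer size can be obtained from the parquet trailer instead of assuming 64KB (the `read_at` callback is a hypothetical helper, not a Doris API): a parquet file ends with the serialized FileMetaData, a 4-byte little-endian metadata length, and the 4-byte magic "PAR1".

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Read the footer metadata into a buffer sized from the file's own trailer.
bool read_footer_metadata(int64_t file_size,
                          void (*read_at)(int64_t offset, uint8_t* buf, uint32_t len),
                          std::vector<uint8_t>* metadata) {
    uint8_t trailer[8];
    read_at(file_size - 8, trailer, 8);
    if (std::memcmp(trailer + 4, "PAR1", 4) != 0) {
        return false;  // not a parquet file
    }

    uint32_t metadata_len;
    std::memcpy(&metadata_len, trailer, 4);  // assumes a little-endian host

    metadata->resize(metadata_len);  // allocate exactly what the footer needs
    read_at(file_size - 8 - metadata_len, metadata->data(), metadata_len);
    return true;
}
```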
2022-09-09 09:12:34 +08:00
2ccbbb5392 [fix](stream load) Fix wrong conversion of null value when vstream load json format (#12460) 2022-09-08 16:48:35 +08:00
14221adbbd [fix](agg) crash caused by failure of prepare (#12437) 2022-09-08 15:03:45 +08:00
dd2f834c79 [feature-wip](parquet-reader) bug fix, create compress codec before parsing dictionary (#12422)
## Fix five bugs:
1. Parquet dictionary data may be compressed, but `ColumnChunkReader` tries to parse dictionary data before creating the compression codec, causing unexpected data errors.
2. `FE` doesn't resolve the array type.
3. `ParquetFileHdfsScanner` doesn't fill partition values when the table is partitioned.
4. `ParquetFileHdfsScanner` sets `_scanner_eof = true` when a scan range is empty, causing the scanner to end early and resulting in data loss.
5. Typographical error in `PageReader`.
2022-09-08 09:54:25 +08:00
86e347f3bb [Bug](doe) fix closing scanner twice (#12408) 2022-09-07 22:45:30 +08:00
449d0c219f [Improvement](sort) Accumulate blocks to do partial sort (#12336) 2022-09-07 10:34:28 +08:00
42bdde8750 [Feature](Vectorized) support jdbc scan node (#12010) 2022-09-07 10:29:41 +08:00
3485dfa927 [chore](profile) add some counters in aggregatation & sender (#12385) 2022-09-07 10:09:05 +08:00
893567628e [fix](exec-node) fix nullptr of runtime state (#12395)
Remove default nullptr runtime state, which is very error-prone
2022-09-07 08:46:42 +08:00
4a55b504c0 [feature-wip](parquet-reader) bug fix, get the correct group reader (#12294)
Fix the problem that the lineitem table of TPC-H cannot be read, and the memory allocation error.
Co-authored-by: jinzhe <jinzhe@selectdb.com>
2022-09-06 13:59:35 +08:00
b8e38b9167 [Bug](load) block call clear_column_data may have ref not equal 1 (#12350) 2022-09-05 20:40:40 +08:00
e175a7ed63 [fix](memtracker) Fix the exceeded limit of the first query execution (#12332)
In some cases, when the user executes a query for the first time, an exceeded-mem-limit error is reported, and the query succeeds only on the second execution.

This is because when the query is executed for the first time, the memory consumed by populating the page cache and other caches is recorded in the query mem tracker, in order to unify the behavior across multiple queries.

As a temporary solution, remove the hook of the scanner thread. Tested with ClickBench q13:

Before removing the scanner thread hook:
Page cache enabled: 3G for the first query, 3G for the tracker; 900M for the second query, 900M for the tracker.
Page cache disabled: 1.9G for the first query, 1.9G for the tracker; 900M for the second query, 900M for the tracker.
After removing the scanner thread hook and fixing the MemTrackerLimiter::cache_consume_local bug:
Page cache enabled: 2916M for the first query, 1147M for the tracker; 979M for the second query, 1144M for the tracker.
Page cache disabled: 1809M for the first query, 1147M for the tracker; 975M for the second query, 1145M for the tracker.

TODO: a better solution is to track storage-related memory separately in the scanner thread. Otherwise, it is impossible to know where process memory grows during queries.
2022-09-05 19:22:46 +08:00
202ad5c659 [feature-wip](parquet-reader) bug fix, the number of rows are different among columns in a block (#12228)
1. `ExprContext` is deleted in `ParquetReader::close()`, but it has not been closed,
so the `DCHECK` in `~ExprContext()` fails. The lifetime of `ExprContext` is managed by the scan node,
so we should not delete its pointer in `ParquetReader::close()`.
2. `RowGroupReader::next_batch` updates `_read_rows` in every column loop,
and does not ensure the number of rows in every column is equal.
3. The skipped row ranges are stack variables, which are released when calling `ArrayColumnReader::read_column_data`, so we should copy them out.
2022-09-02 09:50:25 +08:00