Commit Graph

737 Commits

Author SHA1 Message Date
e22f5891d2 [WIP](row store) two phase opt read row store (#18654) 2023-05-16 13:21:58 +08:00
Pxl
b927f8cd37 [Chore](asan) change asan_suppr from interceptor_via_lib to interceptor_via_fun (#19636)
change asan_suppr from interceptor_via_lib to interceptor_via_fun
2023-05-16 10:51:43 +08:00
Pxl
2a02561863 [Bug](ubsan) fix some wrong downcast founded by ubsan (#19591)
fix some wrong downcast founded by ubsan.
```cpp
doris/be/src/olap/bloom_filter_predicate.h:43:32: runtime error: downcast of address 0x7f8ec2b691a0 which does not point to an object of type 'doris::BloomFilterColumnPredicate<doris::TYPE_DATE>::SpecificFilter' (aka 'BloomFilterFunc<(doris::PrimitiveType)11U>')
0x7f8ec2b691a0: note: object is of type 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'
 e5 55 00 00  10 74 58 42 e5 55 00 00  00 00 10 00 8e 7f 00 00  20 07 6f cc 8e 7f 00 00  80 fe 68 cc
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'  
```
1. TYPE_DATE/TYPE_DATETIME have same data format, so I change the cast about bloom filter to reinterpret cast.
```cpp
doris/be/src/vec/exec/format/orc/vorc_reader.h:281:17: runtime error: downcast of address 0x7f562f4c3180 which does not point to an object of type 'ColumnVector<int>'
0x7f562f4c3180: note: object is of type 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
 74 65 00 00  20 91 70 f5 ca 55 00 00  02 00 00 00 00 00 00 00  f0 d4 4c 2f 56 7f 00 00  f0 d4 4c 2f
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
```
2. doris use ColumnDecimal to store decimal elements.
2023-05-15 14:27:48 +08:00
Pxl
4eb2604789 [Bug](function) fix function define of Retention inconsist and change some static_cast to assert cast (#19455)
1. fix function define of `Retention` inconsist, this function return tinyint on `FE` and return uint8 on `BE`
2. make assert_cast support cast to derived
3. change some static cast to assert cast
4. support sum(bool)/avg(bool)
2023-05-15 11:50:02 +08:00
92bf485abd [Bug] Fix doris pipeline shared scan and top n opt (#19599) 2023-05-15 10:00:44 +08:00
8ef9212ddc [enhancement](exceptionsafe) force check exec node method's return value (#19538) 2023-05-12 10:21:00 +08:00
e9392780a9 [fix](nereids)fix some nereids planner bugs (#19509)
1.some encrypt and decrypt functions have wrong blockEncryptionMode
2.topN node should compare tuples from intermediate_row_desc with first_sort_slot.tuple_id
3.must keep the limit if it's an uncorrelated in-subquery with limit on sort, like select a from t1 where a in ( select b from t2 order by xx limit yy )
2023-05-12 09:06:16 +08:00
0b25376cf8 [feature](torc) support insert only transactional hive table on be side (#19518) 2023-05-11 14:15:09 +08:00
1d421a26d9 [bugfix](memory) merge block may allocate failed (#19507) 2023-05-11 10:42:47 +08:00
d7ad299154 [fix](NestedType) throw error when reading complex nested type in orc&parquet (#19489)
Doris block does not support complex nested type now, but orc and parquet reader has generated complex nested column,
which makes the output of mysql client wrong and users confused.
2023-05-11 07:51:02 +08:00
3ba3b6c66f [opt](FileCache) use modification time to determine whether the file is changed (#18906)
Get the last modification time from file status, and use the combination of path and modification time to generate cache identifier.
When a file is changed, the modification time will be changed, so the former cache path will be invalid.
2023-05-11 07:50:39 +08:00
95833426e8 [BugFix](table-value-function) Fix backends() tvf (#19452)
Change the `Alive/SystemDecommissioned/ClusterDecommissioned` field type of the `backends()`tvf to bool
2023-05-11 07:49:27 +08:00
9ffdbae442 [bugfix](jdbcconnector) jdbc connector cast string to array core (#19494)
introduced by https://github.com/apache/doris/pull/18328/files
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-05-10 21:46:20 +08:00
4483e3a6e1 [Improvement](scan) add a config for scan queue memory limit (#19439) 2023-05-10 13:14:23 +08:00
Pxl
5473795a51 [Bug](scan) forbiden push down in predicate when in_state->use_set is false (#19471)
forbiden push down in predicate when in_state->use_set is false
2023-05-10 11:12:20 +08:00
cf8ceb8586 [fix](scan) fix scanner mem tracker (#19354) 2023-05-10 09:56:41 +08:00
096aa25ca6 [improvement](orc-reader) Implements ORC lazy materialization (#18615)
- Implements ORC lazy materialization, integrate with the implementation of https://github.com/apache/doris-thirdparty/pull/56 and https://github.com/apache/doris-thirdparty/pull/62.
- Refactor code: Move `execute_conjuncts()` and `execute_conjuncts_and_filter_block()` in `parquet_group_reader `to `VExprContext`, used by parquet reader and orc reader.
- Add session variables `enable_parquet_lazy_materialization` and `enable_orc_lazy_materialization` to control whether enable lazy materialization.
- Modify `build.sh` to update apache-orc submodule or download package every time.
2023-05-09 23:33:33 +08:00
1bc405c06f [fix](catalog) fix doris jdbc catalog largeint select error (#19407)
when I use mysql-jdbc 5.1.47 create a doris jdbc catalog, the largeint cannot select
When mysql-jdbc reads largeint, it will convert the format to string because it is too long

mysql> select `largeint` from type3;
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Fail to convert jdbc type of java.lang.String to doris type LARGEINT on column: largeint. You need to check this column type between external table and doris table.
2023-05-09 17:34:48 +08:00
aeb3450151 [feature](graph)Support querying data from the Nebula graph database (#19209)
Support querying data from the Nebula graph database
This feature comes from the needs of commercial customers who have used Doris and Nebula, hoping to connect these two databases

changes mainly include:

* add New Graph Database JDBC Type
* Adapt the type and map the graph to the Doris type
2023-05-09 15:30:11 +08:00
673cbe3317 [chore](build) Porting to GCC-13 (#19293)
Support using GCC-13 to build the codebase.
2023-05-08 10:42:06 +08:00
b50e2a8c08 [Fix](parquet-reader) Fix dict cols not be converted back to string type in some cases. (#19348)
Fix dict cols not be converted back to string type in some cases, which includes introduced by #19039.
For dict cols, we will convert dict cols to int32 type firstly, then convert back to string type after read block. 
The block will be reuse it, so it is necessary to convert it back.
2023-05-07 10:05:23 +08:00
9edbfa37cd [Enhancement](Broker Load) New progress manager for showing loading progress status (#19170)
This work is in the early stage, current progress is not accurate because the scan range will be too large
for gathering information, what's more, only file scan node and import job support new progress manager

## How it works

for example, when we use the following load query:
```
LOAD LABEL test_broker_load
(
	DATA INFILE("XXX")
	INTO TABLE `XXX`
        ......
)
```

Initial Progress: the query will call `BrokerLoadJob` to create job, then `coordinator` is called to calculate scan range and its location. 
Update Progress: BE will report runtime_state to FE and FE update progress status according to jobID and fragmentID

we can use `show load` to see the progress

PENDING:
```
         State: PENDING
      Progress: 0.00%
```

LOADING:
```
         State: LOADING
      Progress: 14.29% (1/7)
```

FINISH:
```
         State: FINISHED
      Progress: 100.00% (7/7)
```

At current time, full output of `show load\G` looks like:

```
*************************** 1. row ***************************
         JobId: 25052
         Label: test_broker
         State: LOADING
      Progress: 0.00% (0/7)
          Type: BROKER
       EtlInfo: NULL
      TaskInfo: cluster:N/A; timeout(s):250000; max_filter_ratio:0.0
      ErrorMsg: NULL
    CreateTime: 2023-05-03 20:53:13
  EtlStartTime: 2023-05-03 20:53:15
 EtlFinishTime: 2023-05-03 20:53:15
 LoadStartTime: 2023-05-03 20:53:15
LoadFinishTime: NULL
           URL: NULL
    JobDetails: {"Unfinished backends":{"5a9a3ecd203049bc-85e39a765c043228":[10080]},"ScannedRows":39611808,"TaskNumber":1,"LoadBytes":7398908902,"All backends":{"5a9a3ecd203049bc-85e39a765c043228":[10080]},"FileNumber":1,"FileSize":7895697364}
 TransactionId: 14015
  ErrorTablets: {}
          User: root
       Comment: 
```

## TODO:

1. The current partition granularity of scan range is too large, resulting in an uneven loading process for progress."
2. Only broker load supports the new Progress Manager, support progress for other query
2023-05-06 22:44:40 +08:00
b6c7f3aeb8 [opt](FileCache) Add file cache metrics and management (#19177)
Add file cache metrics and management.
1. Get file cache metrics
> If the performance of file cache is not efficient, there are currently no metrics to investigate the cause. In practice, hit ratio, disk usage, and segments removed status are very important information. 

API: `http://be_host:be_webserver_port/metrics`
File cache metrics for each base path start with `doris_be_file_cache_` prefix. `hits_ratio` is the hit ratio of the cache since BE startup; `removed_elements` is the num of removed segment files since BE startup; Every cache path has three queues: index, normal and disposable. The capacity ratio of the three queues is 1:17:2.
```
doris_be_file_cache_hits_ratio{path="/mnt/datadisk1/gaoxin/file_cache"} 0.500000
doris_be_file_cache_hits_ratio{path="/mnt/datadisk1/gaoxin/small_file_cache"} 0.500000
doris_be_file_cache_removed_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 0
doris_be_file_cache_removed_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 0

doris_be_file_cache_normal_queue_max_size{path="/mnt/datadisk1/gaoxin/file_cache"} 912680550400
doris_be_file_cache_normal_queue_max_size{path="/mnt/datadisk1/gaoxin/small_file_cache"} 8500000000
doris_be_file_cache_normal_queue_max_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 217600
doris_be_file_cache_normal_queue_max_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 102400

doris_be_file_cache_normal_queue_curr_size{path="/mnt/datadisk1/gaoxin/file_cache"} 14129846
doris_be_file_cache_normal_queue_curr_size{path="/mnt/datadisk1/gaoxin/small_file_cache"} 14874904
doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 18
doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 22

...
```
2. Release file cache
> Frequent segment files swapping can seriously affect the performance of file cache. Adding a deletion interface helps users clean up the file cache.

API: `http://be_host:be_webserver_port/api/file_cache?op=release&base_path=${file_cache_base_path}`
Return the number of released segment files. If `base_path` is not provide in url, all cache paths will be released.
It's thread-safe to call this api, so only the segment files not been read currently can be released.
```
{"released_elements":22}
```
3. Specify the base path to store cache data
> Currently, regression testing lacks test cases of file cache, which cannot guarantee the stability of file cache. This interface is generally used in regression testing scenarios. Different queries use different paths to verify different usage cases and performance.

User can set session variable `file_cache_base_path` to specify the base path to store cache data. `file_cache_base_path="random"` as default, means chosing a random path from cached paths to store cache data.  If `file_cache_base_path` is not one of the base paths in BE configuration, a random path is used.
2023-05-05 14:28:01 +08:00
4e4fb33995 [refactor](conjuncts) simplify conjuncts in exec node (#19254)
Co-authored-by: yiguolei <yiguolei@gmail.com>
Currently, exec node save exprcontext**, but the object is in object pool, the code is very unclear. we could just use exprcontext*.
2023-05-04 18:04:32 +08:00
c74c2a4f8e [fix](Metadata tvf) Metadata TVF supports read the specified columns from Fe (#19110) 2023-04-29 00:06:08 +08:00
a324ee794c [fix](memory) Fix Aggregation null key memory leak due to incorrect aggfunc destroy #19201 2023-04-28 18:41:41 +08:00
65a82a0b57 [opt](FileReader) turn off prefetch data in parquet page reader when using MergeRangeFileReader (#19102)
Using both `MergeRangeFileReader` and `BufferedStreamReader` simultaneously would waste a lot of memory,
so turn off prefetch data in `BufferedStreamReader` when using MergeRangeFileReader.
2023-04-28 09:27:56 +08:00
28016c53f0 [profile](rf) refactor profile of runtime filters (#19134)
* [profile](rf) refactor profile of runtime filters


---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-04-28 08:46:42 +08:00
3ed5cf8350 [Optimize] add has_filter template param in get_next_run() to decrease _has_filter condition checking count in the loop. (#19043) 2023-04-27 21:23:36 +08:00
e4f7d77c5c [Optimize](parquet-reader) Opt by filtering null count statistics in rowgroup and page level. (#19106)
Issue Number: About #19038, we found in this case, l_orderkey has many nulls,
so we can filter it by null count statistics in the row group and page level,
then it can improve a lot of performance in this case.
2023-04-27 21:21:30 +08:00
9e2b118288 [RegressTest](Exec) Add DCHECK null_aware_left_anti_join in mark join (#19149) 2023-04-27 17:52:03 +08:00
f23c93b3c6 [fix](memory) Fix AggFunc memory leak due to incorrect destroy (#19126) 2023-04-27 14:58:32 +08:00
a262f42a28 [refactor](exceptionsafe) make scanner and scancontext exception safe (#19057) 2023-04-27 09:23:01 +08:00
aabcab9dbe [Improvement](runtime filter) Improve merge phase (#18828) 2023-04-26 21:01:20 +08:00
94b11af17c [fixbug](json-reader) fix memory leak of new_json_reader #19067 2023-04-26 12:54:47 +08:00
5bd4a3897e [optimize](multi-catalog) Skip whole row group in lazy_read if data has been filtered. (#19039)
We found qt_q11 in regression test test_external_catalog_hive is very slow.
The result is only one record, so other data should be filtered out in the parquet lazy read situation.
Then we found currently the parquet reader read many records because we can only skip parquet page. But in order to skip parquet page, currently we need to read page header, then it will caused prefetch data. Therefore, prefetch data in this case may be not good.

So there are two issues:

Skip whole row group in this case.
Prefetching data in this case may be not good, need to improve it.
This PR resolve issues 1.
2023-04-26 12:10:14 +08:00
339d804ec4 [Refactor](exceptionsafe) add factory creator to some class (#19000) 2023-04-25 14:33:47 +08:00
39d66ca2c6 [fix](parquet) hasn't initialize select vector when number of nested values equals zero (#18953)
Fix bug when reading array type in parquet file:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed,
reason = [IO_ERROR]Decode too many values in current page
```
When reading normal columns, `ScalarColumnReader::_read_values` still calls `ColumnSelectVector::set_run_length_null_map` to initialize select vector, but `ScalarColumnReader::_read_nested_column` hasn't do this, making the number of values wrong.
The situation where this error occurs is particularly extreme: The column pages have remaining values to be read,
but all of them are null values at ancestor level, so there's no actual read operation, just skipping null values at ancestor level.
2023-04-25 14:21:33 +08:00
b2c26e17e1 [Compile](vec) Fix compile by BHREAD_SCANNER (#18979) 2023-04-24 17:07:06 +08:00
8d7a9fd21b [refactor](exceptionsafe) add factory creator to some class (#18978)
make vexprecontext,vexpr,function,query context,runtimestate thread safe.


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-24 10:32:11 +08:00
8e4710079d [improvement](profile) Insert into add LoadChannel runtime profile (#18908)
TabletSink and LoadChannel in BE are M: N relationship,
Every once in a while LoadChannel will randomly return its own runtime profile to a TabletSink, so usually all LoadChannel runtime profiles are saved on each TabletSink, and the timeliness of the same LoadChannel profile saved on different TabletSinks is different, and each TabletSink will periodically send fe reports all the LoadChannel profiles saved by itself, and ensures to update the latest LoadChannel profile according to the timestamp.
2023-04-24 09:41:57 +08:00
3736530585 [refactor](query context) rename query fragments context to query context and make query context safe (#18950)
* [refactor](query context) rename query fragments context to query context and make query context safe

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-23 22:53:56 +08:00
29f502380c [opt](FileReader) merge small IO to optimize read performace (#18796)
Add `MergeRangeFileReader` to merge small IO to optimize parquet&orc read performance.

`MergeRangeFileReader` is a FileReader that efficiently supports random access in format like parquet and orc.
In order to merge small IO in parquet and orc, the random access ranges should be generated when creating the 
reader. The random access ranges is a list of ranges that order by offset.
The range in random access ranges should be reading sequentially, can be skipped, but can't be read repeatedly.
When calling read_at, if the start offset located in random access ranges, the slice size should not span two ranges.

For example, in parquet, the random access ranges is the column offsets in a row group.

When reading at offset, if [offset, offset + 8MB) contains many random access ranges,
the reader will read data in [offset, offset + 8MB) as a whole, and copy the data in random access ranges into small 
buffers(name as box, default 1MB, 64MB in total). A box can be occupied by many ranges,
and use a reference counter to record how many ranges are cached in the box. If reference counter equals zero,
the box can be release or reused by other ranges. When there is no empty box for a new read operation,
the read operation will do directly.

## Effects
The runtime of ClickBench reduces from 102s to 77s, and the runtime of Query 24 reduces from 24.74s to 9.45s.
The profile of Query 24:
```
 VFILE_SCAN_NODE  (id=0):(Active:  8s344ms,  %  non-child:  83.06%)
    -  FileReadBytes:  534.46  MB
    -  FileReadCalls:  1.031K  (1031)
    -  FileReadTime:  28s801ms
    -  GetNextTime:  8s304ms
    -  MaxScannerThreadNum:  12
    -  MergedSmallIO:  0ns
        -  CopyTime:  157.774ms
        -  MergedBytes:  549.91  MB
        -  MergedIO:  94
        -  ReadTime:  28s642ms
        -  RequestBytes:  507.96  MB
        -  RequestIO:  1.001K  (1001)
    -  NumScanners:  18
```
1001 request IOs has been merged into 94 IOs.

## Remaining problems
1. Add p2 regression test in nest PR
2. Profiles are scattered in various codes and will be refactored in the next PR
3. Support ORC reader
2023-04-23 10:51:38 +08:00
af20b2c95e [Bug](topn opt) Fix be crash when enable topn opt with larger thresho… (#18858)
topn opt should be inited when update it
2023-04-21 17:45:00 +08:00
63a76ed115 [refactor](exceptionsafe) disallow call new method explicitly (#18830)
disallow call new method explicitly
force to use create_shared or create_unique to use shared ptr
placement new is allowed
reference https://abseil.io/tips/42 to add factory method to all class.
I think we should follow this guide because if throw exception in new method, the program will terminate.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-21 09:13:24 +08:00
b26e2d5d50 [bugfix](memoryleak) close expr after it is pushdown to storage layer (#18849) (#18852)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-21 05:21:16 +08:00
c4e469c82c [feature](agg) Support spill to disk in aggregation (#18051) 2023-04-20 18:59:08 +08:00
3328a65b75 [Fix](mutli-catalog) Use decimal v3 type to fix decimal loss issue in multi-catalog module. (#18835)
Fix decimal v3 precision loss issues in the multi-catalog module.
Now it will use decimal v3 to represent decimal type in the multi-catalog module.
Regression Test: `test_load_with_decimal.groovy`
2023-04-20 11:02:53 +08:00
293e115536 [Improvement](bloom filter) initialize bloom filter with adaptive size (#18785) 2023-04-20 10:06:40 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00