Commit Graph

3766 Commits

Author SHA1 Message Date
8317c4a752 [Bug](cooldown) set new replica id when early exit in doing clone when no missed versions (#16644)
* set new replica id

* reduce lock

* reset when replica id is different
2023-02-13 14:39:03 +08:00
be9385d40a [improvement](lock raii) use raii to lock and unlock (#16652)
* [improvement](lock raii) use raii to lock and unlock

This is part of exception safe: #16366.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-13 14:06:36 +08:00
91c4d1cade [Feature-WIP](inverted index) step 1 for supporting range predicate pushing down to inverted index (#16615) 2023-02-13 10:30:51 +08:00
f41a2055d3 [feature](Load)Remove user/password in properties for mysql load to avoid double auth. (#16073)
Use FE cluster token to auth stream load.
This auth is only open for be, and fe auth still only support http basic auth.

I will use this auth for mysql load to build a no-auth stream load from fe to be.
And this will avoid double auth in mysql load.
More information to see the design doc.
2023-02-13 10:00:08 +08:00
1de4e312cc [fix](metric) Fix be core when set enable_system_metrics to false in be (#16646)
when enable_system_metrics is false, we should not use system_metrics any more

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-02-12 23:01:41 +08:00
cf739e7496 [Enhancement](Stmt) Set insert_into timeout session variable separately (#16343) 2023-02-12 16:56:10 +08:00
6a8fc35b78 [Bug](Cooldown) fix load balance causing no cooldown replica (#16641) 2023-02-12 16:47:38 +08:00
09b7c22f6b [Opt](exec) remove unless null key when no split in convert key range (#16624) 2023-02-11 15:44:35 +08:00
aba843bb2b [Improvement](inverted index) inverted index query match bitmap cache (#16578)
Add cache for inverted index query match bitmap to accelerate common query keyword, especially for keyword matching many rows. 

Tests result:
- large result: matching 99% out of 247 million rows shows 8x speed up.
- small result: matching 0.1% out of 247 million rows shows 2x speed up.
2023-02-11 13:38:58 +08:00
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.
2023-02-11 13:37:50 +08:00
b155fc07f6 [fix](fragment thread) fix thread in fragment thread pool hang (#16608)
process the return status for exec_state->execute() in function FragmentMgr::_exec_actual
2023-02-11 09:05:10 +08:00
171ae2892f [improvement](batch size) pass batch size of exec engine to storage engine (#16614)
Currently batch_size is not passed on to SegmentIterator, the SegmentIterator uses the hard coded value 4096 - 32 as the max row count of a block.


* fix bug
2023-02-11 09:01:44 +08:00
8749aedbae [Bug](point query) make get_rowset thread safe (#16609)
`get_rowset` calling from `lookup_row_data` without lock will lead to core dump if _rs_version_map, _stale_rs_version_map changed
2023-02-10 23:54:56 +08:00
c3110f8153 [fix](merge-on-write) fix that the query result has duplicate keys when load with sequence column (#16587) 2023-02-10 22:31:05 +08:00
75847f7f6a [bugfix](exchange node) should not depend on eos to judge the ending of stream receiver (#16600)
[bugfix](exchange node) should not depend on eos to judge the ending of stream receiver #16600

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-10 20:35:49 +08:00
ad141747b4 [fix](inverted index) fix array type inverted index query error (#16582) 2023-02-10 17:57:15 +08:00
43eca4f209 [Feature-WIP](inverted index) Implementation for alter inverted index. (#16371)
implementation for add/drop inverted index.
2023-02-10 17:56:17 +08:00
6a5277b391 [fix](sequence-column) MergeIterator does not use the correct seq column for comparison (#16494) 2023-02-10 17:51:15 +08:00
861f31205a [fix](window function) invalid order_by_start in VAnalyticEvalNode (#16589) 2023-02-10 17:40:40 +08:00
32188855ef [improve](topn) seperate multiget rpc to ThreadPool (#16598)
multiget_data working in bthread and may block the whole worker pthread of BRPC framework and effect other bthreads, so I seperate work task into a seperate task pool.
2023-02-10 17:39:31 +08:00
1f631c388d [enhance](cooldown)accelerate cooldown task produce efficiency (#16089) 2023-02-10 16:58:27 +08:00
b99e2dc727 [bug](jdbc) fix jdbc can't get object of PGobject (#16496)
when pg table have some  unsupported column type like: point, polygon, jsonb......
jdbc catalog will convert it to string type in doris. but get result set in java is org.postgresql.util.PGobject
 
Some test need this pr: #16442
2023-02-10 16:19:02 +08:00
06788bc2d0 [Bug](pipeline) Fix projection on streaming operator (#16592) 2023-02-10 15:57:26 +08:00
379bef598d [fix-core](block) clear block row_same_bit when block reuse (#16172) 2023-02-10 12:21:27 +08:00
1b3902baa2 [Feature](Complex-type) Add struct and map type to Doris (#16444)
This commit support:
1、Insert + select for struct/map type
2、Json stream load for struct type
3、m[key] function for map type

How to use:
Set the fe config to create table for struct and map type
1、admin set frontend config("enable_struct_type" = "true");
2、admin set frontend config("enable_map_type" = "true");

#16547

Co-authored-by: xy720 <xuyang25@baidu.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2023-02-10 11:00:33 +08:00
Pxl
266bb971a6 [Enchancement](function) display elements number on check_chars_length #16570 2023-02-10 08:52:41 +08:00
c1a1275870 [fix](memory) Fix parquet load stack overflow (#16537) 2023-02-10 08:48:12 +08:00
4fcd6cd236 [refactor](remove unused code) remove load stream mgr (#16580)
remove old stream load pipe
remove old stream load manager

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-10 07:46:18 +08:00
130b3599bc [Improvement](writer) make DeltaWriter close idempotent to be more robust (#16558)
return `Status::OK()` instead of `Status::Error<ALREADY_CLOSED>()` for close() in `DeltaWriter` if it's already closed.
2023-02-09 19:48:23 +08:00
a038fdaec6 [Bug](pipeline) Fix bug in non-local exchange on pipeline engine (#16463)
Currently, for broadcast shuffle, we serialize a block once and then send it by RPC through multiple channel. After this, we will serialize next block in the same memory for consideration of memory reuse. However, since the RPC is asynchronized, maybe the next block serialization will happen before sending the previous block.

So, in this PR, I use a ref count to identify if the serialized block can be reuse in broadcast shuffle.
2023-02-09 19:22:40 +08:00
539fd684e9 [improvement](filecache) use dynamic segment size to cache remote file block (#16485)
`CachedRemoteFileReader` has used fixed segment size(file_cache_max_file_segment_size=4M) to cache remote file blocks. However, the column size in a rowgroup/strip maybe smaller than 10K if a parquet/orc file has many columns, resulting in particularly serious read amplification. For example:
Q1 in clickbench: select count(*) from hits
```
-  FileCache:  0ns
  -  IOHitCacheNum:  552
  -  IOTotalNum:  835
  -  ReadFromFileCacheBytes:  19.98  MB
  -  ReadFromWriteCacheBytes:  0.00  
  -  ReadTotalBytes:  29.52  MB
  -  SkipCacheBytes:  0.00  
  -  WriteInFileCacheBytes:  915.77  MB
  -  WriteInFileCacheNum:  283 
```
Only 30MB of data is needed, but 900MB+ of data is read from hdfs. The query time of Q1(single scan thread) increased from **5.17s** to **24.45s** when enable file cache.

Therefore, this PR introduce dynamic segment size which is based on the `read_size` of the data. In order to prevent too small or too large IO, the segment size is limited in [4096, file_cache_max_file_segment_size].

Q1 in clickbench is **5.66s** when enable file cache. The performance is almost the same as if the cache is disabled, and the data size read from hdfs is reduced to 45MB.
```
-  FileCache:  0ns
    -  IOHitCacheNum:  297
    -  IOTotalNum:  835
    -  ReadFromFileCacheBytes:  8.73  MB
    -  ReadFromWriteCacheBytes:  0.00  
    -  ReadTotalBytes:  29.52  MB
    -  SkipCacheBytes:  0.00  
    -  WriteInFileCacheBytes:  45.66  MB
    -  WriteInFileCacheNum:  544
```
## Remaining Problems
Small queries may result in a large number of small files(4KB at least), and the `BE` saves too much meta information of cached segments.

## Fix bug
`FileCachePolicy` in `FileReaderOptions` is a constant reference, but the parameter passed in `FileFactory::create_file_reader` is a temporary variable, resulting in segmentation fault.
2023-02-09 16:39:10 +08:00
e48a033338 [Bug](pipeline) Support projection in UnionSourceOperator (#16525) 2023-02-09 14:43:44 +08:00
7d035486ad [Opt](vec) opt the fast execute logic to remove useless function call (#16532) 2023-02-09 14:12:40 +08:00
646ba2cc88 [bugfix](scannode) 1. make rows_read correct 2. use single scanner if has limit clause (#16473)
make rows_read correct so that the scheduler could using this correctly.
use single scanner if has limit clause. Move it from fragment context to scannode.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-02-09 14:12:18 +08:00
0142ef8b95 [improvement](scanner) Supports bthread scanner (#16031) 2023-02-09 10:24:56 +08:00
9f8753ffd2 [bugfix](vertical_compaction) fix base_compaction delete_sign handler (#16469)
In vertical base compaction, same rows will be filtered in vertical_merge_iterator,
we should skip these filtered rows when set agg flag of delete sign.
For example, schema is a,b,delete_sign, and data is
1,1,1
1,1,0
1,1,0
2,2,1
2,2
and Block we get in VerticalBlockReader is
1,1,1
2,2,1
and we should set agg flag idex 0,4 to true when handle delete sign, so
we add a function continuous_agg_count to skip same rows filtered in
VerticalMergeIterator.
2023-02-09 10:13:41 +08:00
e1f1386395 [fix](cooldown) Rewrite update cooldown conf (#16488)
Remove error-prone CooldownJob, and use CooldownConfHandler to update Tablet's cooldown conf.
Some bug fix about cooldown.
2023-02-09 09:12:55 +08:00
d1c6b81140 [Bug](log) add some log to find out bug (#16518) 2023-02-08 21:23:02 +08:00
f71fc3291f [Bug](fix) right anti join error result when batch size is low (#16510) 2023-02-08 17:26:19 +08:00
d956cb13af [Bug](point query) Reusable in PointQueryExecutor should call init before add to LookupCache (#16489)
Otherwise in high concurrent query, _block_pool maybe used before Reusable::init done in other threads
2023-02-08 16:05:59 +08:00
f6a20f844b [fix](hashjoin) join produce blocks with rows larger than batch size: handle join with other conjuncts (#16402) 2023-02-08 14:26:35 +08:00
41947c73eb [Feature](array-function) Support array functions for nested type datev2 and datetimev2 (#16382) 2023-02-08 12:51:07 +08:00
cf18de14b5 [fix](writer) add _is_closed state to DeltaWriter and avoid write/close core after close (#16453) 2023-02-07 22:40:26 +08:00
91325e5ca3 [fix](pipeline) incorrect result when disabling sharing hash table (#16476) 2023-02-07 21:25:32 +08:00
f90d844a53 [improvement](compaction) enable compaction in TABLET_NOTREADY (#16470)
If alter task in queue, compaction is not enabled and may cause too much version.
Keep last 10 version in new tablet so that base tablet's max version will
not be merged and than we can copy data from base tablet to new tablet.
2023-02-07 19:58:23 +08:00
9114896178 [DecimalV3](opt) opt the function of decimalv3 to_string logic (#16427) 2023-02-07 13:28:07 +08:00
d390e63a03 [enhancement](stream receiver) make stream receiver exception safe (#16412)
make stream receiver exception safe
change get_block(block**) to get_block(block* , bool* eos) unify stream semantic
2023-02-07 12:44:20 +08:00
6fdd35a6f2 [enhancement](mpp process) remove unused method and make report process more clear (#16441)
both update status and open_vectorized_internal will call send_report and stop report thread. move update_status code to open method and remove unnecessary send_report and stop_report_thread.


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-07 12:28:55 +08:00
27216dc7e0 [improvement](multi-catalog) push down all predicates into rowgroup/page filtering for ParquetReader (#16388)
Tow improvements:
1. Refactor rowgroup&page filtering in `ParquetReader`, and use the operator overloading of Doris native c++ type to process comparison.
2. Support decimal/decimal v3/date/datev2/datetime/datetimev2
2023-02-07 11:32:57 +08:00
91229bb87d [Bug](makr join) Fix mark join with other conjuncts (#16435) 2023-02-07 09:31:41 +08:00