Commit Graph

8852 Commits

Author SHA1 Message Date
56ebbf8bc9 [chore](tools) fix load-clickbench-data script cannot be interrupted #17000 2023-02-22 19:34:40 +08:00
8dd1a12ea6 [typo](docs)Add upgrade precautions #17027 2023-02-22 19:27:20 +08:00
e48d9c9d62 [doc](typo)update datax.md #17009 2023-02-22 19:27:03 +08:00
b194a7cf83 [improvement](memory) Support GC segment cache, when memory insufficient (#16987)
fix segment cache memory tracker statistics
support GC
2023-02-22 18:31:20 +08:00
e65a061256 [Enhancement](datetimev2-enhance) support 'microseconds_add' function for datetimev2 (#16970)
support 'microseconds_add' function for datetimev2
2023-02-22 17:49:41 +08:00
7956800df7 [refactor](Nereids) let type coercion same with legacy planner (#16844)
- change for Nereids
1. add a variable length parameter to the ctor of Count for a good error reporting of Count(a, b)
2. refactor StringRegexPredicate, let it inherit from ScalarFunction
3. remove useless class TypeCollection
4. use catalog.Type.Collection to check expression arguments type
5. change type coercion for TimestampArithmetic, divide, integral divide, comparison predicate, case when and in predicate. Let them same as legacy planner.

- change for legacy planner
1. change the common type of floating and Decimal from Decimal to Double
2023-02-22 17:29:37 +08:00
0b624d282d [enhancement](ut) add merge-on-write ut code back (#16939) 2023-02-22 16:29:15 +08:00
66ceab540a [fix](replica) Fix inconsistent replica id between BE and FE in corner case of tablet rebalance (#16889) 2023-02-22 16:21:11 +08:00
51eb147711 fix inverted index doc typo and reorganize index related docs (#16915) 2023-02-22 15:15:10 +08:00
0b3e18d060 [chore](macOS) Support LLVM Clang 15 (#16991)
Remove the deprecated classes std::codecvt_utf8_utf16<char16_t> and std::wstring_convert.
Use libiconv to convert UTF-8 strings to UTF-16LE ones.
2023-02-22 15:04:48 +08:00
3636d0a561 [feature](merge-on-write) add DCHECK in compaction to detect data inconsistency (#16564)
MoW marks all duplicate primary keys as deleted, so we can add a DCHECK during compaction: if MoW's delete bitmap works incorrectly, we can detect the issue as soon as possible.
In debug builds, DCHECK makes BE crash; in release builds, compaction fails and the load eventually fails with error -235.
2023-02-22 14:59:18 +08:00
0e3be4eff5 [Improvement](brpc) Using a thread pool for RPC service avoiding std::mutex block brpc::bthread (#16639)
Mainly includes:
- The brpc service adds two types of thread pools, "light" and "heavy". The number of threads in the two pools is different.
- Classify the interfaces of BE. Those related to data transmission are classified as heavy interfaces and the others as light interfaces.
- Add some monitoring to the thread pools, including the queue size and the number of active threads. Use these indicators to guide the configuration of the number of threads.
2023-02-22 14:15:47 +08:00
ad86b931d4 [Thirdparty](clucene) update clucene to v2.4.6 to fix bthread/pthread context bug (#16982)
1. change clucene version from 2.4.4->2.4.6
2. update build-thirdparty.sh clucene's build block, adding USE_BTHREAD CMAKE flag, this flag is inherited from doris's USE_BTHREAD_SCANNER.
2023-02-22 11:24:45 +08:00
29c46d6926 [fix](struct-type) fix be core when load array orc file (#16978)
* fix be core when load array orc file
2023-02-22 10:15:39 +08:00
4cb97b6fb7 [chore](macOS) Fix linkage errors for the release build (#17002)
Issue Number: close #17003

## Problem summary
The linker couldn't find some symbols because the implementation of a template member function doris::vectorized::Decoder::init_decimal_converter is missing in the header file in which the corresponding declaration is placed.
2023-02-22 10:01:51 +08:00
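16c4e42f42 [typo](doc) Field descriptions are inconsistent with the CREATE TABLE SQL (#16270)
* Field descriptions are inconsistent with the CREATE TABLE SQL

* 1. In the English docs, change `key_desc` to `keys_type`.

* 1. In the English docs, change `partition_desc` to `partition_info`.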

---------

Co-authored-by: unicornlee@dingtalk.com <lxb@201092104>
2023-02-21 23:00:26 +08:00
085f0826f6 update (#16975)
Co-authored-by: wudi <>
2023-02-21 22:53:49 +08:00
76ef4af29d [fix](alter inverted index) fix write edit log in replaymodifyTableAddOrDropInvertedIndices function (#16977)
Actually, modifyTableAddOrDropInvertedIndices does not need to write the logAlterJob edit log, because writing logModifyTableAddOrDropInvertedIndices is enough.
2023-02-21 22:36:56 +08:00
52f9e03eea [fix](cooldown) Use pending_remote_rowsets to avoid deleting rowset files being uploaded (#16803) 2023-02-21 21:58:20 +08:00
0de8f90a83 [enhancement](nereids) add a session variable to control join reorder algorithm (#16783)
1. disable join reorder in nereids if session variable disable_join_reorder is true.
2. add a session variable max_table_count_use_cascades_join_reorder to control the join reorder algorithm in nereids. DPHyp is used only when enable_dphyp_optimizer is true and the joined table count is more than max_table_count_use_cascades_join_reorder, whose default value is 10.
2023-02-21 21:08:39 +08:00
09d41c3479 [fix](log) clarify error msg for tablet writer write failure (#14078) (#16954) (#16950)
fmt::format doesn't support non-template objects as args, even if they implement
`to_string()` or `operator<<`, so the original code may cause `false` to be printed
instead of the real cause of the failure. So to_string() needs to be invoked manually.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-02-21 19:42:49 +08:00
54bf40b6e7 [feature](Nereids): Eliminate duplicate join condition. (#16910) 2023-02-21 19:40:44 +08:00
a95f47ac0a [ehancement](planner) Support filter the output of set operation node (#16666) 2023-02-21 19:22:09 +08:00
ed05f3b480 [regression-test](fuzzy) fuzzy session variable batch_size (#16384) 2023-02-21 17:53:19 +08:00
f37da6e789 [Function](vec) use const column to opt function current_time() (#16953) 2023-02-21 16:26:35 +08:00
cc839aead7 [fix](Nereids) fix signatures of some window functions (#16871)
change signatures of lead(), lag(), first_value(), last_value() to be equal with legacy optimizer;
these four functions only support Type.trivialTypes as returnType and input column type
2023-02-21 15:55:29 +08:00
004872c99a [fix](doc) fix invalid urls in tpch.md (#16949) 2023-02-21 15:45:31 +08:00
246dd65435 [fix](doc) fix export-manual.md (#16969) 2023-02-21 15:44:41 +08:00
6cb452c22d [improvement](test)Set compile required and add clickbench,arm to buildall (#16944) 2023-02-21 14:47:17 +08:00
879a729afb [improve](inverted index) not apply inverted index on 'in' or 'not_in' predicate which is produced by runtime_filter (#16952)
In a multi-table join query, many in or not_in predicates from runtime filters are pushed down to the storage layer. According to our tests, applying those predicates through the inverted index degrades performance because each in_predicate contains many conditions. Therefore, the inverted index is not applied to in or not_in predicates produced by runtime_filter.

Based on that situation, this PR will:
not apply the inverted index to in or not_in predicates produced by runtime_filter.
2023-02-21 14:24:50 +08:00
13ae8cd6c6 [doc](point query) add row cache doc for hight-concurrent-point-query (#16972)
This code in VCollectIterator::build_heap can cause a double free if cumu_iter->init() fails and returns early, because some LevelIterator* exist both in VCollectIterator::_children and cumu_iter::_children.
2023-02-21 14:18:37 +08:00
6f94e84da7 [improvement](memory) fix possible double free in vcollect iterator (#16875)
This code in VCollectIterator::build_heap can cause a double free if cumu_iter->init() fails and returns early, because some LevelIterator* exist both in VCollectIterator::_children and cumu_iter::_children.
2023-02-21 14:18:04 +08:00
5ec8c51366 [fix](union iterator) fix bug that result data order of VUnionIterator is different (#16938)
Fix a bug from #16680: the data order of the VUnionIterator output block is changed, which impacts compaction.
2023-02-21 14:17:21 +08:00
491d269412 [fix](tvf) fix bug that failed to get schema of tvf when file is empty (#16928)
In previous implementation, when querying tvf, FE will get schema from BE.
And BE will try to open the first file to get its schema info, but for orc or parquet format,
if the file is empty, it will return error.
But even for an empty file, we can still get schema info from file's footer.
So we should handle the empty file to get schema info correctly.

Also modify the catalog doc to add some FAQ.
2023-02-21 14:14:32 +08:00
c0bb2e33a8 [improvement](scan) separate scanner into local and remote scanner pool (#16891)
There are 2 kinds for scanner thread pool, local and remote.
Local is for local file reads, especially for the olap scanner.
Remote is for other external data source, such as file scanner, jdbc scanner.

This PR mainly changes:

For olap scanner, use cold or hot rowset to decide whether to use local or remote pool.
For other scanners, use the remote pool by default.
Add a new BE config doris_max_remote_scanner_thread_pool_thread_num, default is 512,
indicate the max thread number of the remote scanner thread pool

This will alleviate the problem of interaction between olap queries with load job and external queries.
2023-02-21 14:13:09 +08:00
113023fb86 (Enhancement)[load-json] support simdjson in new json reader (#16903)
be config:
enable_simdjson_reader=true

related PR #11665
2023-02-21 11:31:00 +08:00
0950a08efd [chore](tools) Support starting multiple FEs on single node (#16787)
Introduce a tool to start multiple FEs on single node.

Use case:

```
$ ./multi-fe
./multi-fe start|stop|clean [OPTIONS ...]

    start -n <NUM> -l <LIBRARY_PATH> -p <BASE_PORT>

             Start the FE cluster.
      -n     The number of FEs.
      -l     The FE library path (default: doris/output/fe/lib)
      -p     The base port to generate all needed ports (default: 9030).

    stop     Stop the FE cluster.

    clean    Clean the data (rm -rf "$(pwd)"/fe*).
```
2023-02-21 10:55:36 +08:00
44fed0e99b [Fix](multi catalog)(nereids)Enable runtime filter for external table (#16855)
Enable runtime filter for external table.
2023-02-21 10:35:58 +08:00
4522aeb74a [improvement](MOW) use shared_lock when get load info in publish txn (#16874) 2023-02-21 10:14:40 +08:00
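As a rough illustration of the shared_lock idea named in 4522aeb74a, the sketch below lets many readers hold a `std::shared_lock` at the same time while writers still take an exclusive lock. The class `LoadInfoTable`, its members, and the `long` transaction id are hypothetical names chosen for this example; they are not taken from the Doris source.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical table of load info keyed by transaction id (illustration only).
class LoadInfoTable {
public:
    // Writers take an exclusive lock, so they never overlap with readers.
    void put(long txn_id, std::string info) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        table_[txn_id] = std::move(info);
    }

    // Readers take a shared lock, so concurrent get() calls do not block each other.
    // Returns an empty string when the transaction id is unknown.
    std::string get(long txn_id) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = table_.find(txn_id);
        return it == table_.end() ? std::string() : it->second;
    }

private:
    mutable std::shared_mutex mutex_;  // mutable so const readers can lock it
    std::map<long, std::string> table_;
};
```

This requires C++17 for `std::shared_mutex`. Whether a shared lock pays off depends on how read-heavy the path is.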
08adf914f9 [improvement](vec) avoid creating a new column while filtering mutable columns (#16850)
Currently, when filtering a column, a new column is created to store the filtering result, which causes some performance loss. With this change, ssb-flat without pushdown expr improves from 19s to 15s.
2023-02-21 09:47:21 +08:00
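The in-place filtering described in 08adf914f9 can be sketched on a plain `std::vector`. This is only an illustration under simplifying assumptions: `filter_in_place` is a hypothetical helper, not the Doris column API, and the mask uses one byte per row where non-zero means "keep".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Keeps only the elements whose mask entry is non-zero by compacting them
// toward the front of `data` and then shrinking it. No second vector is
// allocated for the result. Returns the number of rows kept.
template <typename T>
std::size_t filter_in_place(std::vector<T>& data, const std::vector<std::uint8_t>& mask) {
    assert(data.size() == mask.size());
    std::size_t kept = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (mask[i] != 0) {
            if (kept != i) {
                data[kept] = std::move(data[i]);  // slide the survivor forward
            }
            ++kept;
        }
    }
    data.resize(kept);  // drop the filtered-out tail
    return kept;
}
```

Usage: for `data = {10, 20, 30, 40}` and `mask = {1, 0, 1, 0}`, the call leaves `data` holding `{10, 30}` and returns 2.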
57519fcf50 [fix](information_schema) catch and skip exception when getting schema from FE catalog (#16647)
When querying information_schema database, BE will call FE RPC
to get schema info such as db name list, table name list, etc.
But some external catalogs fail to return this info because of wrong connection info.
We should catch this kind of exception and skip it, so that the query can continue to
get schema info of other catalogs.
Otherwise, the whole query on information_schema will fail, even if the user just wants to get
info of the internal catalog.

And set jdbc connection timeout to 5s, to avoid thrift rpc timeout from BE to FE(default is 30s)
2023-02-21 08:43:09 +08:00
c618e69f59 [typo](docs)supplement the document content for grouping_id.md. (#16926)
* [typo](docs)supplement the document content for grouping_id.md.

* Update grouping_id.md

* Update grouping_id.md
2023-02-21 08:27:25 +08:00
e04c13b7a6 [enhancement](exception safe) make function state exception safe (#16771) 2023-02-20 23:01:45 +08:00
a46941c684 [Fix](multi-catalog) Fix switch-case fall-through issue in multi-catalog module. (#16931)
Fix switch-case fall-through issue in multi-catalog module.
2023-02-20 21:35:41 +08:00
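For readers unfamiliar with the bug class fixed in a46941c684, here is a generic sketch of switch-case fall-through. It is written in C++ purely for illustration and does not reproduce the multi-catalog code; `FileFormat` and `reader_name` are hypothetical names.

```cpp
#include <string>

// Hypothetical enum used only for this illustration.
enum class FileFormat { kParquet, kOrc, kCsv };

// Maps a format to a reader name. Each case ends with `break`; if the first
// `break` were missing, kParquet would fall through into the kOrc case and
// the function would return "orc" for a Parquet file.
std::string reader_name(FileFormat format) {
    std::string name = "unknown";
    switch (format) {
    case FileFormat::kParquet:
        name = "parquet";
        break;
    case FileFormat::kOrc:
        name = "orc";
        break;
    case FileFormat::kCsv:
        name = "csv";
        break;
    }
    return name;
}
```

Compilers can flag this pattern; GCC and Clang, for example, offer `-Wimplicit-fallthrough`.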
a1799e5506 [improve](point query) reuse rowset from lookup_row_key to eliminate tablet lock (#16770)
Reuse rowset for 2 reasons:
1. eliminate the tablet lock for performance: if another thread holds the lock too long, it could affect point query latency
2. rowset should be acquired during lookup procedure
2023-02-20 18:38:11 +08:00
83ab29fd56 [Fix](inverted index) fix compound directory unlock problem (#16861)
In DorisCompoundDirectory::FSIndexInput::close, use lock_guard to unlock automatically; otherwise the lock may leak.
2023-02-20 18:29:39 +08:00
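The lock_guard pattern mentioned in 83ab29fd56 can be sketched as follows. `GuardedInput` is a hypothetical stand-in and not the real `DorisCompoundDirectory::FSIndexInput`. The point is only that `std::lock_guard` releases the mutex on every return path, including early returns.

```cpp
#include <mutex>

// Hypothetical stand-in for an input object protected by a mutex.
class GuardedInput {
public:
    // Returns true on the first close and false if already closed.
    // The guard unlocks the mutex when it goes out of scope, so the early
    // return below cannot leave the mutex locked.
    bool close() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (closed_) {
            return false;  // early return: guard still unlocks
        }
        closed_ = true;
        return true;
    }

private:
    std::mutex mutex_;
    bool closed_ = false;
};
```

With manual `lock()` and `unlock()` calls, forgetting the `unlock()` on the early-return branch would make the second `close()` block forever. With the guard, the second call simply returns false.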
f32cd2c123 [fix](statistics) fix a problem with histogram statistics collection parameters (#16918)
1. Fixed a problem with histogram statistics collection parameters.
2. Solved the problem that it takes a long time to collect histogram statistics.

TODO: Optimize histogram statistics sampling method and make the sampling parameters effective.

The problem is that the histogram function works as expected in the single-node test, but doesn't work in the multi-node test. In addition, the current sampling-based histogram collection is slow, so collecting histogram information takes a long time.

Fixed the parameter issue and temporarily removed support for sampling to speed up the collection of histogram statistics.

Next, sampling will be supported again for collecting histogram information.
2023-02-20 16:33:18 +08:00
c98a0bf803 [Enchancement](merge-on-write) check the correctness of rowid conversion after compaction (#16689)
MoW updates the delete bitmap of the imported data during the compaction by rowid conversion. The correctness of rowid conversion is very important to the result of delete bitmap. So I add a rowid conversion result check.
2023-02-20 16:27:18 +08:00
3a5e8f83e8 [fix](merge-on-write) fix that be may coredump when sequence column is null (#16832)
To facilitate the use of the primary key index, encode the seq column to the minimum value of the corresponding length when the seq column is null.
2023-02-20 16:25:52 +08:00
a3aceab72b [Fix](inverted index) fix inverted index bkd reader memory leak problem (#16885)
The original get_bkd_reader implementation uses a raw pointer, which may cause a memory leak; use shared_ptr to avoid that.
2023-02-20 15:39:04 +08:00
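The shared_ptr approach named in a3aceab72b can be illustrated with a minimal sketch. `Reader` and `get_reader` are hypothetical names and do not mirror the real `get_bkd_reader` signature. The `live` counter exists only so the example can show that no object outlives its last owner.

```cpp
#include <memory>

// Hypothetical reader type that counts live instances (illustration only).
struct Reader {
    static int live;
    Reader() { ++live; }
    ~Reader() { --live; }
    Reader(const Reader&) = delete;
    Reader& operator=(const Reader&) = delete;
};
int Reader::live = 0;

// Creates a reader and returns it only when `ok` is true.
// On the failure path the local shared_ptr destroys the Reader automatically;
// with a raw `new Reader`, the same early return would leak it.
std::shared_ptr<Reader> get_reader(bool ok) {
    auto reader = std::make_shared<Reader>();
    if (!ok) {
        return nullptr;
    }
    return reader;
}
```

After `get_reader(false)` returns, `Reader::live` is back to 0, and a successfully returned reader is destroyed once the caller drops its last `shared_ptr`.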