Commit Graph

995 Commits

Author SHA1 Message Date
e94170d81f [feature](cooldown)add ut for cooldown on be (#17246)
* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be
2023-03-07 10:42:53 +08:00
Pxl
d8f0ca7108 [Chore](schema change) remove some unused code in schema change (#17459)
remove some unused code in schema change.
remove some row-based config and code.
2023-03-07 09:18:34 +08:00
1d858db617 [feature](filecache) add a const parameter to control the cache version (#17441)
* [feature](filecache) add a const parameter to control the cache version

* fix
2023-03-07 08:03:18 +08:00
9477c48ef8 [refactor](functioncontext) remove duplicate type definition in function context (#17421)
remove duplicate type definition in function context
remove unused method in function context
not need stale state in vexpr context because vexpr is stateless and function context saves state and they are cloned.
remove useless slot_size in all tuple or slot descriptor.
remove doris_udf namespace, it is useless.
remove some unused macro definitions.
init v_conjuncts in vscanner, not need write the same code in every scanner.
using unique ptr to manage function context since it could only belong to a single expr context.
Issue Number: close #xxx
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-06 16:07:09 +08:00
400b4bf7a7 [enhance](report) add local and remote size in tablet meta header action (#17406) 2023-03-06 10:43:57 +08:00
b9b028099d [enhancement](stream load pipe) using queryid or load id to identify stream load pipe instead of fragment instance id (#17362)
* [enhancement](stream load pipe) using queryid or load id to identify stream load pipe instead of fragment instance id

NewLoadStreamMgr already has pipe and other info. Do not need save the pipe into fragment state. and FragmentState should be more clear.

But this pr will change the behaviour of BE.
I will pick the pr to doris 1.2.3 and add the load id to FE support. The user could upgrade from 1.2.3 to 2.x
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-04 16:19:36 +08:00
3b94ca5ceb [chore](macOS) Use LLVM Clang by default (#17292)
Use LLVM Clang by default
2023-03-03 14:18:02 +08:00
e82b827bc8 [optimize](vectorization)Optimize to_string's performance. (#17076) 2023-03-03 10:35:59 +08:00
17f4990bd3 [enhancement](functioncontext) function context should use shared ptr and simply function context (#17311)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-02 16:23:54 +08:00
30df268c1f [fix](hdfs)(catalog) fix BE crash when hdfs-site.xml not exist in be/conf and fix compute node logic (#17244)
We set LIBHDFS3_CONF env in start_be.sh, so libhdfs3 will try to read this hdfs-site.xml,
if file does not exist, it will throw error. But Doris does not handle this error, cause BE crash.
This CL mainly changes:

Modify start_be.sh to only set LIBHDFS3_CONF if hdfs-site.xml exist.
Refactor the HDFSCommonBuilder so that it can return error correctly.
Add BE IP info in status, so that we can get ip from error msg like:
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]failed to init reader for file  000.snappy.orc, err: 
[INTERNAL_ERROR][172.21.0.101]failed to init HDFSCommonBuilder, please check check be/conf/hdfs-site.xml
The logic of prefer compute node is wrong, which causing the external table query can only assign up to 3 backends.
This CL refactor this logic and also change some FE config:

prefer_compute_node_for_external_table

If set to true, query on external table will prefer to assign to compute node.
And the max number of compute node is controlled by min_backend_num_for_external_table.
If set to false, query on external table will assign to any node.

min_backend_num_for_external_table

Only take effect when prefer_compute_node_for_external_table is true.
If the compute node number is less than this value, query on external table will try to get some mix node
to assign, to let the total number of node reach this value.
If the compute node number is larger than this value, query on external table will assign to compute node only.
2023-03-02 11:09:55 +08:00
62ec74f4e7 segcompaction featuring verticalcompaction (#16731)
This patchset applies the following changes:

using vertical compaction machanism to do segcompaction
basic (WIP) refraction to separate segcompaction logic from BetaRowsetWriter
add segcompaction specific ut and regression tests
2023-03-01 10:55:40 +08:00
a1db5c6f52 [fix](vec) crash caused by not-implemented function in ColumnFixedLengthObject (#17215) 2023-02-28 15:27:06 +08:00
33acaa067b [refactor](mempool) remove mempool parameter from key decoder methods (#17137)
decode method is only used for big int and other decode method is only used in unit test.

I remove the useless method and we can remove mempool parameter from decode method.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-27 11:16:14 +08:00
7229751bd9 [Improve](map-type) Add contains_null for map (#16948)
Add contains_null for map type.
2023-02-23 20:47:26 +08:00
e5f884a6fc [enhancement](cache) make segment cache prune more effectively (#17011)
BloomFilter in MoW table may consume lots of memory, and it's life cycle is same as segment. This patch try to improve the efficiency of recycling segment cache, to release the memory in time.
2023-02-23 18:24:18 +08:00
edead494cb [Enhancement](storage) add a new hidden column __DORIS_VERSION_COL__ for unique key table (#16509) 2023-02-23 15:47:17 +08:00
8eeb435963 [improvement](meta) Enhance Doris's fault tolerance to disk error (#16472)
Sense io error.
Retry query when io error.
Greylist: When finds one disk is completely broken, or the diff of tablet number in BE and FE meta is too large,reduce the query priority of the BE.
2023-02-23 08:40:45 +08:00
b194a7cf83 [improvement](memory) Support GC segment cache, when memory insufficient (#16987)
fix segment cache memory tracker statistics
support GC
2023-02-22 18:31:20 +08:00
0b624d282d [enhancement](ut) add merge-on-write ut code back (#16939) 2023-02-22 16:29:15 +08:00
5ec8c51366 [fix](union iterator) fix bug that result data order of VUnionIterator is different (#16938)
Fix bug of #16680, data order of VUnionIterator outout block is changed, which will impact compaction.
2023-02-21 14:17:21 +08:00
f32cd2c123 [fix](statistics) fix a problem with histogram statistics collection parameters (#16918)
1. Fixed a problem with histogram statistics collection parameters.
2. Solved the problem that it takes a long time to collect histogram statistics.

TODO: Optimize histogram statistics sampling method and make the sampling parameters effective.

The problem is that the histogram function works as expected in the single-node test, but doesn't work in the multi-node test. In addition, the performance of the current support sampling to collect histogram is low, resulting in a large time consumption when collecting histogram information.

Fixed the parameter issue and temporarily removed support for sampling to speed up the collection of histogram statistics.

Will next support sampling to collect histogram information.
2023-02-20 16:33:18 +08:00
Pxl
2bc014d83a [Enchancement](function) remove unused params on aggregate function (#16886)
remove unused params on aggregate function
2023-02-20 11:08:45 +08:00
30dafd6a44 [improve](inverted index) Add element count limit for inverted index searcher cache (#16758)
The element in InvertedIndexSearcherCache is inverted index searcher, which is a file descriptor of inverted index file, so InvertedIndexSearcherCache is actually cache file descriptor of inverted index file.

If open file descriptor limit of the Linux system is set too small and config inverted_index_searcher_cache_limit is too big, during high pressure load maybe cause "Too many open files".

So, when insert inverted index searcher into InvertedIndexSearcherCache, need also check whether reach file_descriptor_number limit for inverted index file.
2023-02-17 11:53:07 +08:00
e2245cbdd3 [improvement](filecache) split file cache into sharding directories (#16767)
Save cached file segment into path like `cache_path / hash(filepath).substr(0, 3) / hash(filepath) / offset`
to prevent too many directories in `cache_path`.
2023-02-16 16:04:29 +08:00
Pxl
f50edff59d [Chore](build) enable fallthrough check annd fix some fallthrough bug (#16748)
* enable fallthrough check annd fix some fallthrough bug

* fix

* fix
2023-02-15 15:58:43 +08:00
9b8c91e18c [improvement](rowset reader) fix possible memleak (#16680)
* [improvement](rowset reader) fix possible memleak

* fix be UT
2023-02-15 11:13:31 +08:00
d013d529c8 [Feature](ipv6)Support IPV6 (#14063)
Support IPV6 in Apache Doris, the main changes are:
1. enable binding to IPV6 address if network priority in config file contains an IPV6 CIDR string
2. BRPC and HTTP support binding to IPV6 address
3. BRPC and HTTP support visiting IPV6 Services
2023-02-14 21:43:10 +08:00
be9385d40a [improvement](lock raii) use raii to lock and unlock (#16652)
* [improvement](lock raii) use raii to lock and unlock

This is part of exception safe: #16366.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-13 14:06:36 +08:00
aba843bb2b [Improvement](inverted index) inverted index query match bitmap cache (#16578)
Add cache for inverted index query match bitmap to accelerate common query keyword, especially for keyword matching many rows. 

Tests result:
- large result: matching 99% out of 247 million rows shows 8x speed up.
- small result: matching 0.1% out of 247 million rows shows 2x speed up.
2023-02-11 13:38:58 +08:00
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.
2023-02-11 13:37:50 +08:00
1f631c388d [enhance](cooldown)accelerate cooldown task produce efficiency (#16089) 2023-02-10 16:58:27 +08:00
1b3902baa2 [Feature](Complex-type) Add struct and map type to Doris (#16444)
This commit support:
1、Insert + select for struct/map type
2、Json stream load for struct type
3、m[key] function for map type

How to use:
Set the fe config to create table for struct and map type
1、admin set frontend config("enable_struct_type" = "true");
2、admin set frontend config("enable_map_type" = "true");

#16547

Co-authored-by: xy720 <xuyang25@baidu.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2023-02-10 11:00:33 +08:00
4fcd6cd236 [refactor](remove unused code) remove load stream mgr (#16580)
remove old stream load pipe
remove old stream load manager

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-10 07:46:18 +08:00
d390e63a03 [enhancement](stream receiver) make stream receiver exception safe (#16412)
make stream receiver exception safe
change get_block(block**) to get_block(block* , bool* eos) unify stream semantic
2023-02-07 12:44:20 +08:00
6fdd35a6f2 [enhancement](mpp process) remove unused method and make report process more clear (#16441)
both update status and open_vectorized_internal will call send_report and stop report thread. move update_status code to open method and remove unnecessary send_report and stop_report_thread.


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-07 12:28:55 +08:00
2bee26b05a [fix](merge-on-write) fix that the query result has duplicate keys (#16336)
* [fix](merge-on-write) fix that the query result has duplicate keys

* add ut
2023-02-06 17:09:53 +08:00
bd8ef4edeb [fix](cooldown) Fix core in remove_all_remote_rowsets (#16374) 2023-02-04 22:31:38 +08:00
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
6ee0dbfb23 [fix](cooldown) Fix bugs in cooldown single replica files (#16299) 2023-02-02 19:31:26 +08:00
69f34cd1c3 [fix](load) sequence column do not compare correctly in memtable (#16211) 2023-02-02 11:00:23 +08:00
7c145faa80 [Enhance] use fast_float::from_chars to do str cast to float/double to avoid lose precision (#16190) 2023-02-01 23:53:34 +08:00
00a598a839 [feature](cooldown) Decouple storage policy and resource (#15873) 2023-01-31 14:13:47 +08:00
90b12143a3 [refactor](remove unused code) remove runtime tuple structure and useless utils class (#16237) 2023-01-30 16:45:14 +08:00
4b6a4b3cf7 [refactor](remove unused code) Remove unused mempool declare or function params (#16222)
* Remove unused mempool declare or function params

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-30 13:03:18 +08:00
69e748b076 [fix](schema scanner)change schema_scanner::get_next_row to get_next_block (#15718) 2023-01-30 10:01:50 +08:00
5eaa995704 [refactor](some mempool) not memset 0 in default value iterator (#16194)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-29 22:50:39 +08:00
Pxl
46347a51d2 [Bug](exec) enable warning on ignoring function return value for vctx (#16157)
* enable warning on ignoring function return value for vctx
2023-01-29 17:23:21 +08:00
f97c51d786 [enhancement](compaction) Optimize calculating level size of input rowset in SizeBasedCumulativeCompactionPolicy (#15633)
* Optimize calculating level size of input rowset in SizeBasedCumulativeCompactionPolicy
2023-01-29 17:18:50 +08:00
3235b636cc [refactor](remove unused code) remove thread pool manager (#16179)
* remove thread resource manager

* remove  string buffer
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-29 13:03:08 +08:00
241a956b20 [refactor](remove unused code) remove partition info from datastream sender (#16162)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-28 19:56:41 +08:00