* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. now quantile column only support not nullable
2. add up some regression test cases
3. set default enable_quantile_state_type = true
---------
Co-authored-by: spaces-x <weixiang06@meituan.com>
ECB algorithm, block_encryption_mode does not take effect, it only takes effect when init vector is provided.
Solved: 192/256 supports calculation without init vector
For other algorithms, an error should be reported when there is no init vector
Initialization Vector. The default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector.
Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt
Note: This fix does not support smooth upgrades. during upgrade process, query may report error: funciton not found
Result of select bitmap_to_string(bitmap_or(to_bitmap(1), null)) should be 1 instead of null.
This PR fix logic of bitmap_or and bitmap_or_count.
Other count related funcitons should also be checked and fix, they will be fixed in another PR.
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
* add ut for cooldown on be
remove duplicate type definition in function context
remove unused method in function context
not need stale state in vexpr context because vexpr is stateless and function context saves state and they are cloned.
remove useless slot_size in all tuple or slot descriptor.
remove doris_udf namespace, it is useless.
remove some unused macro definitions.
init v_conjuncts in vscanner, not need write the same code in every scanner.
using unique ptr to manage function context since it could only belong to a single expr context.
Issue Number: close #xxx
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
* [enhancement](stream load pipe) using queryid or load id to identify stream load pipe instead of fragment instance id
NewLoadStreamMgr already has pipe and other info. Do not need save the pipe into fragment state. and FragmentState should be more clear.
But this pr will change the behaviour of BE.
I will pick the pr to doris 1.2.3 and add the load id to FE support. The user could upgrade from 1.2.3 to 2.x
Co-authored-by: yiguolei <yiguolei@gmail.com>
We set LIBHDFS3_CONF env in start_be.sh, so libhdfs3 will try to read this hdfs-site.xml,
if file does not exist, it will throw error. But Doris does not handle this error, cause BE crash.
This CL mainly changes:
Modify start_be.sh to only set LIBHDFS3_CONF if hdfs-site.xml exist.
Refactor the HDFSCommonBuilder so that it can return error correctly.
Add BE IP info in status, so that we can get ip from error msg like:
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]failed to init reader for file 000.snappy.orc, err:
[INTERNAL_ERROR][172.21.0.101]failed to init HDFSCommonBuilder, please check check be/conf/hdfs-site.xml
The logic of prefer compute node is wrong, which causing the external table query can only assign up to 3 backends.
This CL refactor this logic and also change some FE config:
prefer_compute_node_for_external_table
If set to true, query on external table will prefer to assign to compute node.
And the max number of compute node is controlled by min_backend_num_for_external_table.
If set to false, query on external table will assign to any node.
min_backend_num_for_external_table
Only take effect when prefer_compute_node_for_external_table is true.
If the compute node number is less than this value, query on external table will try to get some mix node
to assign, to let the total number of node reach this value.
If the compute node number is larger than this value, query on external table will assign to compute node only.
This patchset applies the following changes:
using vertical compaction machanism to do segcompaction
basic (WIP) refraction to separate segcompaction logic from BetaRowsetWriter
add segcompaction specific ut and regression tests
decode method is only used for big int and other decode method is only used in unit test.
I remove the useless method and we can remove mempool parameter from decode method.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
BloomFilter in MoW table may consume lots of memory, and it's life cycle is same as segment. This patch try to improve the efficiency of recycling segment cache, to release the memory in time.
Sense io error.
Retry query when io error.
Greylist: When finds one disk is completely broken, or the diff of tablet number in BE and FE meta is too large,reduce the query priority of the BE.
1. Fixed a problem with histogram statistics collection parameters.
2. Solved the problem that it takes a long time to collect histogram statistics.
TODO: Optimize histogram statistics sampling method and make the sampling parameters effective.
The problem is that the histogram function works as expected in the single-node test, but doesn't work in the multi-node test. In addition, the performance of the current support sampling to collect histogram is low, resulting in a large time consumption when collecting histogram information.
Fixed the parameter issue and temporarily removed support for sampling to speed up the collection of histogram statistics.
Will next support sampling to collect histogram information.
The element in InvertedIndexSearcherCache is inverted index searcher, which is a file descriptor of inverted index file, so InvertedIndexSearcherCache is actually cache file descriptor of inverted index file.
If open file descriptor limit of the Linux system is set too small and config inverted_index_searcher_cache_limit is too big, during high pressure load maybe cause "Too many open files".
So, when insert inverted index searcher into InvertedIndexSearcherCache, need also check whether reach file_descriptor_number limit for inverted index file.
Save cached file segment into path like `cache_path / hash(filepath).substr(0, 3) / hash(filepath) / offset`
to prevent too many directories in `cache_path`.
Support IPV6 in Apache Doris, the main changes are:
1. enable binding to IPV6 address if network priority in config file contains an IPV6 CIDR string
2. BRPC and HTTP support binding to IPV6 address
3. BRPC and HTTP support visiting IPV6 Services
Add cache for inverted index query match bitmap to accelerate common query keyword, especially for keyword matching many rows.
Tests result:
- large result: matching 99% out of 247 million rows shows 8x speed up.
- small result: matching 0.1% out of 247 million rows shows 2x speed up.
Issue Number: close#16351
Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.
This commit support:
1、Insert + select for struct/map type
2、Json stream load for struct type
3、m[key] function for map type
How to use:
Set the fe config to create table for struct and map type
1、admin set frontend config("enable_struct_type" = "true");
2、admin set frontend config("enable_map_type" = "true");
#16547
Co-authored-by: xy720 <xuyang25@baidu.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: hucheng01 <hucheng01@baidu.com>
both update status and open_vectorized_internal will call send_report and stop report thread. move update_status code to open method and remove unnecessary send_report and stop_report_thread.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>