doris

Author	SHA1	Message	Date
Xinyi Zou	6964d9f99c	[fix](function) resubmit-fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17907 ) * Revert "[fix](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420)" This reverts commit 397cc011c4f1ba5a25c770258c13f1cd3f28b47d. * [fix-resubmit](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420) With the ECB algorithm, block_encryption_mode does not take effect; it only takes effect when an init vector is provided. Solved: 192/256 supports calculation without an init vector. For other algorithms, an error should be reported when there is no init vector. Initialization vector: the default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector. Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt Note: This fix does not support smooth upgrades. During the upgrade process, queries may report the error: function not found	2023-03-29 21:13:01 +08:00
Xinyi Zou	d9fe5f7b67	[enhancement](memory) Remove MemPool and replace it with Arena (#17820 ) Arena can replace MemPool in most scenarios. Except for memory reuse, MemPool supports reuse of previous memory chunks after clear, but Arena does not. Some comparisons between MemPool and Arena: 1. Expansion: Arena is less than 128M index 2 alloc chunk; more than 128M memory, allocate 128M * n > `size`, n is equal to the minimum value that satisfies the expression; MemPool less than 512K index 2 alloc chunk, greater than 512K memory, separately apply for a `size` length chunk. After Arena applied for a chunk larger than 128M last time, the minimum chunk applied for after that is 128M. Does this seem to be a waste of memory? MemPool is also similar. After the chunk of 512K was applied for last time, the minimum chunk of subsequent applications is 512K. 2. Alignment: MemPool defaults to 16 alignment, because memtable and other places that use int128 require 16 alignment; Arena has no default alignment. 3. Memory reuse: Arena only supports `rollback`, which reuses the memory of the current chunk, usually the memory requested last time. MemPool supports clear(), all chunks can be reused; or call ReturnPartialAllocation() to roll back the last requested memory; if the last chunk has no memory, search for the most free chunk for allocation. 4. Realloc: Arena supports realloc contiguous memory; it also supports realloc contiguous memory from any position at the time of the last allocation. The difference between `alloc_continue` and `realloc` is: 1. alloc_continue does not need to specify the old size, but the default old size = head->pos - range_start 2. alloc_continue supports expansion from range_start when additional_bytes is between head and pos, which is equivalent to reusing a part of memory, while realloc completely allocates a new memory. MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools. 5. Check mem limit: MemPool checks the mem limit, and Arena checks at the Allocator layer. 6. Support for ASAN: Arena does something extra. 7. Error handling: MemPool supports returning the error message of application failure directly through `Status`, and Arena throws Exception. Tests that Arena can consider: 1. After the last applied chunk is larger than 128M, the minimum applied chunk is 128M, which seems to waste memory; 2. Support clear, memory multiplexing; 3. Increase the large list, alloc the memory larger than 128M, and the size is equal to `size`, so as to avoid the current chunk not being fully used, which is wasteful. 4. In some cases, it may be possible to allocate backwards to find chunks t	2023-03-29 20:56:49 +08:00
Pxl	664fbffcba	[Enchancement](table-function) optimization for vectorized table function (#17973 )	2023-03-29 10:45:00 +08:00
Mingyu Chen	05db6e9b55	[refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009 ) Follow #17586. This PR mainly changes: Remove env/ Remove FileUtils/FilesystemUtils Some methods are moved to LocalFileSystem Remove olap/file_cache Add s3 client cache for s3 file system In my test, the time of open s3 file can be reduced significantly Fix cold/hot separation bug for s3 fs. This is the last PR of #17764. After this, all IO operation should be in io/fs. Except for tests in #17586, I also tested some case related to fs io: clone concurrency query on local/s3/hdfs load error log create and clean disk metrics	2023-03-29 09:00:52 +08:00
Liqf	012f7bd031	[feature](function)Add ST_Area function (#18138 )	2023-03-28 19:36:09 +08:00
herry2038	09e346e47c	[fix](type) Data precision is lost when converting DOUBLE type data to DECIMAL (#17191 ) (#17562 ) 1. Fix bug when converting DOUBLE to DECIMAL; 2. Fix bug when converting DOUBLE to DECIMALV3;	2023-03-28 09:46:43 +08:00
yiguolei	359f5be53e	[refactor](cgroup) remove cgroup manager it is useless (#18124 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-27 23:02:18 +08:00
zclllyybb	5191b4f473	[fix](ut)support run be-ut on release mode (#18119 ) Fixed improper usage. So now be ut could be run on release mode. btw, split be build type environment variable to be/be-ut.	2023-03-27 23:00:03 +08:00
YueW	2929a96224	[Refactor](inverted index cache) Use asc set instead of priority queue at the lru cache (#18033 ) use asc set instead of priority queue at the LRU cache, to keep the lifecycle of the LRUHandle consistent in the sorted set and the LRU free list	2023-03-27 10:27:37 +08:00
Liqf	bcf95cd920	[feature](function)Add ST_Angle_Sphere function (#17919 )	2023-03-27 10:14:46 +08:00
Mingyu Chen	7c0bcbdca1	[enhance](parquet-reader) cache file meta of parquet to speed up query (#18074 ) Problem: 1. FE will split the parquet file into splits, so a file can have several splits. 2. BE will scan each split and read the footer of the parquet file. 3. If 2 splits belong to the same parquet file, the footer of this file will be read twice. This PR mainly changes: 1. Use a kv cache to cache the footer of the parquet file. 2. The kv cache belongs to a scan node, so all parquet readers belonging to this scan node will share the same kv cache. 3. In the cache, the key is "meta_file_path", the value is the parsed thrift footer. The KV cache is sharded into multiple sub-caches, so that different files can use different sub-caches and avoid blocking each other. In my test, a query with 26 splits can reduce the footer parse time from 4s -> 1s	2023-03-25 23:22:57 +08:00
xueweizhang	50eeb2d9a4	[fix](json) change int to bigint for json function (#17769 )	2023-03-25 21:57:29 +08:00
hqx871	1999cccde9	[feature](array-type) Unique table support array value (#17024 ) Unique table support array value --------- Co-authored-by: huangqixiang.871 <huangqixiang.871@bytedance.com>	2023-03-24 10:18:59 +08:00
Xinyi Zou	ebef0c038d	Revert "[fix](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420 )" (#17887 ) This reverts commit 397cc011c4f1ba5a25c770258c13f1cd3f28b47d.	2023-03-22 13:28:25 +08:00
AlexYue	6cbf393665	[enhance](meta action) remove useless pb field and refactor writer cooldown meta code (#17652 )	2023-03-22 11:13:13 +08:00
Mingyu Chen	cb79e42e5c	[refactor](file-system)(step-1) refactor file system on BE and remove storage_backend (#17586 ) See #17764 for details. I have tested: - Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp - Outfile to local/s3/hdfs/broker. - Load from local/s3/hdfs/broker. - Query file on local/s3/hdfs/broker file system, with table value function and catalog. - Backup/Restore with local/s3/hdfs/broker file system. Not tested: - cold & hot data separation case.	2023-03-21 21:08:38 +08:00
Gabriel	bd8e3e6405	[refactor](date) unify DateTimeValue and VecDateTimeValue (#17670 )	2023-03-20 16:27:08 +08:00
yiguolei	dd53bc1c8d	[unify type system](remove unused type desc) remove some code (#17921 ) There are many type definitions in BE. Should unify the type system and simplify the development. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-19 14:05:02 +08:00
TengJianPing	dfa2528b5e	[fix](bitmap) fix wrong result of bitmap count functions for null values (#17849 ) bitmap count functions result is null when there are null values, which is not right:	2023-03-19 11:49:58 +08:00
Qi Chen	b4b126b817	[Feature](parquet-reader) Implements dict filter functionality parquet reader. (#17594 ) Implements dict filter functionality parquet reader to improve performance.	2023-03-16 20:29:27 +08:00
yixiutt	caed2155f5	[test](fix) use vertorized interface in test (#17649 )	2023-03-16 15:23:07 +08:00
yiguolei	77ab2fac20	[refactor](functioncontext) remove function context impl class (#17715 ) * [refactor](functioncontext) remove function context impl class Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> --------- Co-authored-by: yiguolei <yiguolei@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-03-14 11:21:45 +08:00
spaces-x	5b39fa9843	[Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562 ) * [Feature](vectorized)(quantile_state): support vectorized quantile state functions 1. now quantile column only support not nullable 2. add up some regression test cases 3. set default enable_quantile_state_type = true --------- Co-authored-by: spaces-x <weixiang06@meituan.com>	2023-03-14 10:54:04 +08:00
Pxl	16fc3a0e22	[Chore](compile) remove some unused static on inline function to reduce compile time (#17603 ) remove some unused static on inline function to reduce compile time	2023-03-13 11:11:59 +08:00
Pxl	65b8dfc7ff	[Enchancement](function) Inline some aggregate function && remove nullable combinator (#17328 ) 1. Inline some aggregate function 2. remove nullable combinator	2023-03-09 10:39:04 +08:00
Xinyi Zou	397cc011c4	[fix](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420 ) With the ECB algorithm, block_encryption_mode does not take effect; it only takes effect when an init vector is provided. Solved: 192/256 supports calculation without an init vector. For other algorithms, an error should be reported when there is no init vector. Initialization vector: the default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector. Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt Note: This fix does not support smooth upgrades. During the upgrade process, queries may report the error: function not found	2023-03-09 09:51:41 +08:00
ElvinWei	bd5ed2b0c2	[enhancement](histogram) optimize the histogram bucketing strategy, etc (#17264 ) * optimize the histogram bucketing strategy, etc * fix p0 regression of histogram	2023-03-08 20:12:05 +08:00
TengJianPing	eea6d770d7	[fix](bitmap) fix wrong result of bitmap_or for null (#17456 ) Result of select bitmap_to_string(bitmap_or(to_bitmap(1), null)) should be 1 instead of null. This PR fixes the logic of bitmap_or and bitmap_or_count. Other count-related functions should also be checked and fixed; they will be fixed in another PR.	2023-03-08 16:29:01 +08:00
AlexYue	273d2100ac	[enhance](cooldown) turn write cooldown meta async (#16813 )	2023-03-08 14:06:21 +08:00
yiguolei	9213dd906a	[enhancement](exception) add exception structure and using unique ptr in VExplodeBitmapTableFunction (#17531 ) add exception class in common. using unique ptr in VExplodeBitmapTableFunction support single exception or nested exception, like this: ---SingleException [E-100] test OS_ERROR bug @ 0x55e80b93c0d9 doris::Exception::Exception<>() @ 0x55e80b938df1 doris::ExceptionTest_NestedError_Test::TestBody() @ 0x55e82e16bafb testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x55e82e15ab3a testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x55e82e1361e3 testing::Test::Run() @ 0x55e82e136f29 testing::TestInfo::Run() @ 0x55e82e1376e4 testing::TestSuite::Run() @ 0x55e82e148042 testing::internal::UnitTestImpl::RunAllTests() @ 0x55e82e16dcab testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x55e82e15ce4a testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x55e82e147bab testing::UnitTest::Run() @ 0x55e80c4b39e3 RUN_ALL_TESTS() @ 0x55e80c4a99b5 main @ 0x7f0a619d0493 __libc_start_main @ 0x55e80b84602a _start @ (nil) (unknown)	2023-03-08 10:44:14 +08:00
yiguolei	4692d6764c	[refactor](remove string val) remove string val structure, it is same with string ref (#17461 ) remove stringval, decimalv2val, bigintval	2023-03-08 10:42:20 +08:00
pengxiangyu	e94170d81f	[feature](cooldown)add ut for cooldown on be (#17246 ) * add ut for cooldown on be	2023-03-07 10:42:53 +08:00
Pxl	d8f0ca7108	[Chore](schema change) remove some unused code in schema change (#17459 ) remove some unused code in schema change. remove some row-based config and code.	2023-03-07 09:18:34 +08:00
Ashin Gau	1d858db617	[feature](filecache) add a const parameter to control the cache version (#17441 ) * [feature](filecache) add a const parameter to control the cache version * fix	2023-03-07 08:03:18 +08:00
yiguolei	9477c48ef8	[refactor](functioncontext) remove duplicate type definition in function context (#17421 ) remove duplicate type definition in function context remove unused method in function context not need stale state in vexpr context because vexpr is stateless and function context saves state and they are cloned. remove useless slot_size in all tuple or slot descriptor. remove doris_udf namespace, it is useless. remove some unused macro definitions. init v_conjuncts in vscanner, not need write the same code in every scanner. using unique ptr to manage function context since it could only belong to a single expr context. Issue Number: close #xxx --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-06 16:07:09 +08:00
AlexYue	400b4bf7a7	[enhance](report) add local and remote size in tablet meta header action (#17406 )	2023-03-06 10:43:57 +08:00
yiguolei	b9b028099d	[enhancement](stream load pipe) using queryid or load id to identify stream load pipe instead of fragment instance id (#17362 ) * [enhancement](stream load pipe) using queryid or load id to identify stream load pipe instead of fragment instance id NewLoadStreamMgr already has pipe and other info. Do not need save the pipe into fragment state. and FragmentState should be more clear. But this pr will change the behaviour of BE. I will pick the pr to doris 1.2.3 and add the load id to FE support. The user could upgrade from 1.2.3 to 2.x Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-04 16:19:36 +08:00
Adonis Ling	3b94ca5ceb	[chore](macOS) Use LLVM Clang by default (#17292 ) Use LLVM Clang by default	2023-03-03 14:18:02 +08:00
ZhaoChangle	e82b827bc8	[optimize](vectorization)Optimize to_string's performance. (#17076 )	2023-03-03 10:35:59 +08:00
yiguolei	17f4990bd3	[enhancement](functioncontext) function context should use shared ptr and simply function context (#17311 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-02 16:23:54 +08:00
Mingyu Chen	30df268c1f	[fix](hdfs)(catalog) fix BE crash when hdfs-site.xml not exist in be/conf and fix compute node logic (#17244 ) We set the LIBHDFS3_CONF env in start_be.sh, so libhdfs3 will try to read this hdfs-site.xml; if the file does not exist, it will throw an error. But Doris does not handle this error, causing BE to crash. This CL mainly changes: Modify start_be.sh to only set LIBHDFS3_CONF if hdfs-site.xml exists. Refactor the HDFSCommonBuilder so that it can return errors correctly. Add BE IP info in status, so that we can get the IP from an error msg like: ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]failed to init reader for file 000.snappy.orc, err: [INTERNAL_ERROR][172.21.0.101]failed to init HDFSCommonBuilder, please check check be/conf/hdfs-site.xml The logic of prefer compute node is wrong, which causes external table queries to be assigned to at most 3 backends. This CL refactors this logic and also changes some FE configs: prefer_compute_node_for_external_table: If set to true, queries on external tables will prefer to be assigned to compute nodes, and the max number of compute nodes is controlled by min_backend_num_for_external_table. If set to false, queries on external tables will be assigned to any node. min_backend_num_for_external_table: Only takes effect when prefer_compute_node_for_external_table is true. If the number of compute nodes is less than this value, queries on external tables will try to get some mix nodes to assign, so that the total number of nodes reaches this value. If the number of compute nodes is larger than this value, queries on external tables will be assigned to compute nodes only.	2023-03-02 11:09:55 +08:00
zhengyu	62ec74f4e7	segcompaction featuring verticalcompaction (#16731 ) This patchset applies the following changes: using the vertical compaction mechanism to do segcompaction; basic (WIP) refactoring to separate segcompaction logic from BetaRowsetWriter; add segcompaction-specific ut and regression tests	2023-03-01 10:55:40 +08:00
Jerry Hu	a1db5c6f52	[fix](vec) crash caused by not-implemented function in ColumnFixedLengthObject (#17215 )	2023-02-28 15:27:06 +08:00
yiguolei	33acaa067b	[refactor](mempool) remove mempool parameter from key decoder methods (#17137 ) decode method is only used for big int and other decode method is only used in unit test. I remove the useless method and we can remove mempool parameter from decode method. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-27 11:16:14 +08:00
amory	7229751bd9	[Improve](map-type) Add contains_null for map (#16948 ) Add contains_null for map type.	2023-02-23 20:47:26 +08:00
zhannngchen	e5f884a6fc	[enhancement](cache) make segment cache prune more effectively (#17011 ) BloomFilter in a MoW table may consume lots of memory, and its life cycle is the same as the segment's. This patch tries to improve the efficiency of recycling the segment cache, to release the memory in time.	2023-02-23 18:24:18 +08:00
zhannngchen	edead494cb	[Enhancement](storage) add a new hidden column __DORIS_VERSION_COL__ for unique key table (#16509 )	2023-02-23 15:47:17 +08:00
Lijia Liu	8eeb435963	[improvement](meta) Enhance Doris's fault tolerance to disk error (#16472 ) Sense IO errors. Retry the query on IO error. Greylist: when one disk is found to be completely broken, or the difference in tablet count between BE and FE meta is too large, reduce the query priority of that BE.	2023-02-23 08:40:45 +08:00
Xinyi Zou	b194a7cf83	[improvement](memory) Support GC segment cache, when memory insufficient (#16987 ) fix segment cache memory tracker statistics support GC	2023-02-22 18:31:20 +08:00
Xin Liao	0b624d282d	[enhancement](ut) add merge-on-write ut code back (#16939 )	2023-02-22 16:29:15 +08:00
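
The initialization vector rule described in commits 6964d9f99c and 397cc011c4 (ECB needs no init vector; CBC, CFB1, CFB8, CFB128 and OFB all require one) can be sketched roughly as below. This is only an illustration in Python, not the Doris implementation: the helper names `requires_iv` and `check_iv` are hypothetical, and the mode string is assumed to follow the `aes-128-ecb` form quoted in the commit message.

```python
# Illustrative sketch only; requires_iv and check_iv are hypothetical names,
# not functions from Doris or MySQL.
from typing import Optional

DEFAULT_MODE = "aes-128-ecb"  # default block_encryption_mode per the commit message
IV_MODES = {"cbc", "cfb1", "cfb8", "cfb128", "ofb"}  # modes that require an IV


def requires_iv(mode: str = DEFAULT_MODE) -> bool:
    """Return True if the block encryption mode needs an initialization vector."""
    # The chaining mode is assumed to be the last dash-separated component.
    chaining = mode.lower().rsplit("-", 1)[-1]
    if chaining == "ecb":
        return False
    if chaining in IV_MODES:
        return True
    raise ValueError(f"unsupported block encryption mode: {mode}")


def check_iv(mode: str, iv: Optional[bytes]) -> None:
    """Raise an error when a mode that requires an IV is used without one."""
    if requires_iv(mode) and not iv:
        raise ValueError(f"mode {mode} requires an initialization vector")


check_iv("aes-256-ecb", None)             # accepted: ECB needs no IV
check_iv("aes-128-cbc", b"0123456789ab")  # accepted: IV supplied
```

Calling `check_iv("aes-128-cbc", None)` raises `ValueError`, which mirrors the "an error should be reported when there is no init vector" behavior the commit describes.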


1026 Commits
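
The footer cache described in commit 7c0bcbdca1 (one kv cache per scan node, sharded into sub-caches so that different files do not block each other) might look roughly like the following. This is a minimal sketch in Python under stated assumptions, not the BE code: `ShardedKVCache`, `parse_footer` and the `meta_` key prefix are hypothetical, and the shard is picked by a plain hash of the key.

```python
# Illustrative sketch only; ShardedKVCache and parse_footer are hypothetical
# names and do not correspond to classes in the Doris code base.
import threading
from typing import Any, Callable, Dict, List, Tuple


class ShardedKVCache:
    """A key-value cache split into independently locked sub-caches."""

    def __init__(self, num_shards: int = 8) -> None:
        self._shards: List[Tuple[threading.Lock, Dict[str, Any]]] = [
            (threading.Lock(), {}) for _ in range(num_shards)
        ]

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        # Keys that land in different shards never contend for the same lock.
        lock, entries = self._shards[hash(key) % len(self._shards)]
        with lock:
            if key not in entries:
                entries[key] = loader()  # load (parse) only on the first request
            return entries[key]


parsed: List[str] = []  # records which files were actually parsed


def parse_footer(path: str) -> Dict[str, str]:
    """Stand-in for reading and parsing a parquet footer."""
    parsed.append(path)
    return {"path": path}


cache = ShardedKVCache()
# Three splits, two of which belong to the same file.
splits = [("a.parquet", 0), ("a.parquet", 1), ("b.parquet", 0)]
footers = [
    cache.get(f"meta_{path}", lambda p=path: parse_footer(p))
    for path, _ in splits
]
```

In this sketch the two splits of `a.parquet` share one parsed footer, so `parse_footer` runs once per file rather than once per split.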