Commit Graph

8276 Commits

Author SHA1 Message Date
4caa1e8041 [optimization](array-type) update the docs for import data to array column (#13345)
1. This PR updates the JSON load docs for importing data into array columns.
When JSON is used to import data into an array column, RapidJSON can cause precision problems,
so the json-load docs are updated to explain how to avoid these problems.

Issue Number: #7570
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-10-17 12:43:22 +08:00
045bccdbea [Feature](Retention) support retention function (#13056) 2022-10-17 11:00:47 +08:00
6ea9a65bb6 [Opt](vec) opt runtime filter for TPCH Q22 (#13339) 2022-10-17 10:30:07 +08:00
c1588b2900 [thirdparty](zstd)update dist info and thirdparty change log (#13392) 2022-10-17 09:09:16 +08:00
2da7fe940c [fix](regression-test) fix that multiple cases conflict with the same table name (#13395) 2022-10-17 09:08:30 +08:00
9454bcca12 [fix](memory) Fix USE_JEMALLOC=true UBSAN compilation error #13398 2022-10-17 08:52:14 +08:00
e84d9a6c87 [fix](array-type) Fix cast null to array make be core (#13324)
Doris does not support explicitly casting NULL_TYPE to any type.

```
mysql> select cast(NULL as int);
ERROR 1105 (HY000): errCode = 2, detailMessage = Invalid type cast of NULL from NULL_TYPE to INT
```

So we should also forbid users from casting NULL_TYPE to the ARRAY type.

This commit will produce the following effect:

```
mysql> select cast(NULL as array<int>);
ERROR 1105 (HY000): errCode = 2, detailMessage = Invalid type cast of NULL from NULL_TYPE to ARRAY<INT(11)>
```
2022-10-17 00:04:50 +08:00
162e60eb19 [fix](array-type) check value valid while insert data into array column (#13365)
We should prevent the insert when a value overflows.

1. create table:
`CREATE TABLE test_array_load_test_array_int_insert_db.test_array_load_test_array_int_insert_tb ( k1 int NULL, k2 array<int> NULL ) DUPLICATE KEY(k1) DISTRIBUTED BY HASH(k1) BUCKETS 5`

2. Try inserting data less than INT_MIN.
`insert into test_array_load_test_array_int_insert_tb values (1005, [-2147483649])`

Before this PR, the insert would succeed, but the stored value was not correct.
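A minimal sketch of the intended behavior after this fix, reusing the table above (the exact rejection behavior and error message are assumptions):

```SQL
-- k2 is array<int>, so every element must fit in a 32-bit signed INT.
-- INT_MIN itself is still in range and should be accepted:
insert into test_array_load_test_array_int_insert_tb values (1006, [-2147483648]);
-- -2147483649 is below INT_MIN; after this fix the insert is expected to be
-- rejected instead of silently storing an incorrect value:
insert into test_array_load_test_array_int_insert_tb values (1007, [-2147483649]);
```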
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-17 00:01:03 +08:00
a83eaddfcf [test](cache)Add remote cache ut (#13377) 2022-10-16 23:59:50 +08:00
1d5ba9cbcc [Improvement](like) Change like function to batch call (#13314) 2022-10-16 16:18:22 +08:00
Pxl
632670a49c [Enhancement](function) refactor of date function (#13362)
refactor of date function
2022-10-16 14:31:26 +08:00
144486e220 [Opt](fun) simd the substring function and use stack buf to speed up (#13338) 2022-10-16 11:48:34 +08:00
a5f3880649 [improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285)
* disable page cache by default
* disable chunk allocator by default
* do not use the chunk allocator for the vectorized allocator by default
* add a new config memory_linear_growth_threshold = 128MB: do not allocate memory by RoundUpToPowerOf2 if the allocated size is larger than this threshold. This config is applied to MemPool, ChunkAllocator, PodArray, and Arena.
2022-10-15 17:27:17 +08:00
bf2e20c4c4 [fix](agg) reset the content of grouping exprs instead of replace it with original exprs (#13376)
* [fix](agg) reset the content of grouping exprs instead of replacing it with the original exprs

* keep old behavior if the grouping type is not GROUP_BY
2022-10-15 11:07:35 +08:00
52397df9f0 [thirdparty](update) zstd 1.5.0 to 1.5.2 #13378 2022-10-15 10:50:20 +08:00
f2fa9606c9 [fix](agg)count function should return 0 for null value (#13247)
count(null) should return 0 instead of 1. The streaming_agg_serialize_to_column function did not handle the case where the input value is null; this PR fixes it.
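A small illustration of the intended semantics; the table and column names here are hypothetical:

```SQL
-- Hypothetical table t(k int, v int) where some rows have v = NULL.
-- count(*) counts every row, while count(v) counts only non-null values of v,
-- so a row with v = NULL must contribute 0 to count(v), not 1.
SELECT count(*), count(v) FROM t;
```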
2022-10-15 10:40:52 +08:00
4bc33a54a1 [Fix](agg) fix bitmap agg core dump when phmap pointer assert alignment (#13381) 2022-10-15 10:39:23 +08:00
8218cfed40 [Bug](function) Fix constant predicate evaluation (#13346) 2022-10-15 01:05:29 +08:00
79a5125eff [Improvement](predicates) Use datev2 as the compatible type between string and datev2 (#13348)
If a string literal can be converted to datev2, we use datev2 as the compatible type instead of datetimev2.
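A hedged sketch of the kind of predicate this change affects; the table and column names are hypothetical:

```SQL
-- Hypothetical table orders(dt datev2, ...).
-- '2022-10-01' can be parsed as a date, so it is now compared as datev2
-- instead of promoting both sides to datetimev2.
SELECT * FROM orders WHERE dt >= '2022-10-01';
```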
2022-10-14 19:00:37 +08:00
993f38fe3c [feature](Nereids): use Multi join to rearrange join to eliminate cross join by using predicate. (#13353) 2022-10-14 17:26:34 +08:00
5bc8858571 [fix](jsonreader) teach jsonreader to release memory (#13336)
The rapidjson allocator does not release memory. This fix uses an allocator with a local buffer and calls Clear to release memory allocated beyond the local buffer.
2022-10-14 15:52:05 +08:00
6746434770 [improvement](schema change) avoid using column ptr swap (#13273) 2022-10-14 15:19:08 +08:00
b82e54a525 [feature](statistics) support to drop table or partition statistics (#13303)
Manually drop statistics for tables or partitions. A table or a partition can be specified; if neither is specified, all statistics under the current database will be deleted.

syntax:
```SQL
DROP STATS [tableName [PARTITIONS(partitionNames)]];

-- e.g.
DROP STATS;    -- drop all table statistics under the current database
DROP STATS t0;    -- drop t0 statistics
DROP STATS t1 PARTITIONS(p1);    -- drop partition p1 statistics of t1
```
2022-10-14 15:15:37 +08:00
005c2cd43b [feature](remote) support local cache GC by disk usage (#12897)
* support local cache gc by disk usage

* support gc per disk

* refactor file cache size logic

* also consider unused file caches while GC by disk size

* change config file_cache_max_size_per_disk from GB to B

* bugfix

* update

* use two-stage locks to avoid holding a lock during disk IO

* rdlock one by one for dummy file cache gc

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-14 15:11:29 +08:00
50ae9e6b19 [enhancement](planner) support select table sample (#10170)
### Motivation
TABLESAMPLE allows you to limit the number of rows from a table in the FROM clause.

Used for data exploration, quick verification of SQL accuracy, and table statistics collection.

### Grammar
```
[TABLET tids] TABLESAMPLE n [ROWS | PERCENT] [REPEATABLE seek]
```

Limits the number of rows read from the table in the FROM clause:
a number of Tablets is selected pseudo-randomly from the table according to the specified number of rows or percentage,
and the seed given in REPEATABLE returns the same selected samples again.
In addition, Tablet IDs can also be specified manually.
Note that this can only be used for OLAP tables.

### Example
Q1:
```
SELECT * FROM t1 TABLET(10001,10002) limit 1000;
```
explain:
```
partitions=1/1, tablets=2/12, tabletList=10001,10002
```
Selects the specified tablet IDs of t1.

Q2:
```
SELECT * FROM t1 TABLESAMPLE(1000000 ROWS) REPEATABLE 1 limit 1000;
```
explain:
```
partitions=1/1, tablets=3/12, tabletList=10001,10002,10003
```

Q3:
```
SELECT * FROM t1 TABLESAMPLE(1000000 ROWS) REPEATABLE 2 limit 1000;
```
explain:
```
partitions=1/1, tablets=3/12, tabletList=10002,10003,10004
```

Pseudo-randomly samples 1000 rows in t1.
Note that several Tablets are actually selected according to the table's statistics,
and the total number of rows in the selected Tablets may be greater than 1000,
so if you want to return exactly 1000 rows, you need to add a LIMIT.

### Design
First, determine how many rows to sample from each partition according to the number of partitions.
Then determine the number of Tablets to select from each partition according to the average number of rows per Tablet.
If seek is not specified, the required number of Tablets is selected pseudo-randomly from each partition.
If seek is specified, Tablets are selected sequentially starting from the seek-th Tablet of the partition.
Finally, any manually specified Tablet IDs are added to the selected Tablets.
2022-10-14 15:05:23 +08:00
a2a2be22a5 [ResourceTag](tag) Unified tag format verification (#13312) 2022-10-14 14:21:55 +08:00
71f167ac51 [fix](sort) fix nullable column sorting incorrectly (#13125) 2022-10-14 12:45:04 +08:00
a2e513720e [feature](Nereids) auto fallback to legacy planner if analyze failed (#13351)
1. Add NereidsException to wrap any exception thrown by Nereids.
2. When a NereidsException is caught and the 'enableFallbackToOriginalPlanner' switch is on, the legacy planner is used to plan the query again.
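For illustration, a sketch of enabling the fallback in a session; the snake_case session-variable name is assumed from the switch name above:

```SQL
-- Assumed session variable corresponding to 'enableFallbackToOriginalPlanner':
SET enable_fallback_to_original_planner = true;
-- If Nereids fails to analyze the next query, it is planned again by the legacy planner.
SELECT * FROM t;   -- t is a hypothetical table
```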
2022-10-14 10:38:21 +08:00
8d729f9386 [fix](error-code) fix misuse of OLAP_ERR_WRITE_PROTOBUF_ERROR (#13347) 2022-10-14 09:57:07 +08:00
ed73096f19 [improvement][doc] refine doc for unique key data model (#13319) 2022-10-14 09:55:52 +08:00
b58ae34d1b [Doc](Readme)Update the 1.1.3 release note. (#13358) 2022-10-14 09:55:18 +08:00
8dc09ad05c [enhancement](memory) Default Jemalloc as generic memory allocator #13367
gperftools/tcmalloc [https://github.com/gperftools/gperftools] is outdated; there have been no new features for many years, only bug fixes. It is what Doris currently uses by default.

google/tcmalloc [https://github.com/google/tcmalloc] is very active recently, has many new features, and is expected to perform better than jemalloc, but there is currently no stable version. Moreover, its compilation dependencies are complex and difficult to integrate, it is incompatible with gperftools/tcmalloc, and there are few reference documents.

jemalloc [https://github.com/jemalloc/jemalloc] performs better than gperftools/tcmalloc under high concurrency and is mature and stable, so we look forward to it becoming the default memory allocator.
Tested in Doris: #12496
2022-10-14 09:54:54 +08:00
5e0c34b35a [fix](join) should call getOutputTblRefIds to get child's tuple info (#13227)
* [fix](join) should call getOutputTblRefIds to get child's tuple info
2022-10-14 09:46:14 +08:00
88e08a92d8 [fix](array-type) fix the wrong result when import array element with double quotes (#12786)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-10-13 23:07:19 +08:00
87e5e2b48b [Fix](array-type) Disable schema change between array type columns (#13261)
Currently, we do not support schema change between array type columns.
We should forbid users from doing this operation.
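A minimal sketch of the kind of operation that is now rejected; the table and column names are hypothetical, and the exact error message is not shown:

```SQL
-- Hypothetical table with an array column:
CREATE TABLE t (k int, a array<int>) DUPLICATE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 1 PROPERTIES ("replication_num" = "1");
-- Changing the element type is a schema change between array type columns,
-- so it is expected to be rejected:
ALTER TABLE t MODIFY COLUMN a array<bigint>;
```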
2022-10-13 22:59:09 +08:00
de4315c1c5 [feature](function) support initcap string function (#13193)
support `initcap` string function
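A quick usage sketch of the new function:

```SQL
-- initcap upper-cases the first letter of each word and lower-cases the rest.
SELECT initcap('hello WORLD of doris');
-- expected result: 'Hello World Of Doris'
```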
2022-10-13 21:31:44 +08:00
cb300b0b39 [feature](agg) support any,any_value agg functions. (#13228) 2022-10-13 18:31:19 +08:00
71d2d61d33 [chore](build release) remove doris home and user info from doris_be --version output (#13344)
There will be personal info in doris_be --version, like this:

doris-0.0.0-trunk RELEASE (build git://hk-dev01/mnt/disk2/ygl/code/github/apache-doris/be/../@8b7d928af26318f71098f1be2ab03ed83b1955fd)
Built on Wed, 12 Oct 2022 18:36:44 CST by ygl@hk-dev01

Since we usually do not need this info and the commit id is enough, I removed the redundant info; the new result looks like this:

doris-0.0.0-trunk RELEASE (build git://hk-dev01@8b7d928)
Built on Thu, 13 Oct 2022 15:03:01 CST by hk-dev01
2022-10-13 18:24:04 +08:00
fe1524a287 [Enhancement](load) remove load mem limit (#13111)
#12716 removed the memory limit for a single load task. In this PR I propose removing the session variable load_mem_limit to avoid confusion.

For compatibility, load_mem_limit in Thrift is not removed; its value is set equal to exec_mem_limit in the FE.
2022-10-13 17:19:22 +08:00
4a6eb01ccb [refactor](Nereids): refactor UT by using Pattern and rename to remove consecutive (#13337)
* rename

* refactor UT
2022-10-13 16:41:51 +08:00
baf2689610 [Improvement](join) compute hash values by vectorized way (#13335) 2022-10-13 16:04:58 +08:00
87793b7c00 [bugfix](datatimev2) fix value column loss precision and scale (#13233)
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-10-13 15:39:53 +08:00
0ff04e81bc [fix](DynamicPartition) Not check max_dynamic_partition_num when disable DynamicPartition (#13267)
Disable the max_dynamic_partition_num check when DynamicPartition is disabled via ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "false"). When max_dynamic_partition_num is increased and later changed back to a lower value, the actual number of dynamic partitions may be larger than max_dynamic_partition_num, and DynamicPartition could not be disabled.
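For illustration, a hedged sketch of the scenario; max_dynamic_partition_num is assumed to be adjustable as an FE config, and tbl_name is the table from the statement above:

```SQL
-- Lower the FE limit after the table has already created more dynamic partitions
-- than the new value allows (assumed FE config syntax):
ADMIN SET FRONTEND CONFIG ("max_dynamic_partition_num" = "10");
-- Before this fix, the limit check also ran here and blocked disabling
-- dynamic partitioning; after the fix this statement succeeds:
ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "false");
```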
2022-10-13 14:37:39 +08:00
Pxl
c1ed7d4d7d [Bug](function) fix core dump on case when have 1000 condition #13315 2022-10-13 14:37:03 +08:00
bdb8e08bd3 [fix](ci) rename the checks name for branch-1.1 (#13342) 2022-10-13 14:36:18 +08:00
830183984a [fix](hash)update_hashes_with_value method should handle if input value is null (#13332)
* [fix](hash) update_hashes_with_value method should handle the case where the input value is null

* remove unnecessary xxHash64NullWithSeed
2022-10-13 14:36:01 +08:00
db7f955a70 [improve](Nereids): split otherJoinCondition with List. (#13216)
* split otherJoinCondition with List.
2022-10-13 13:49:46 +08:00
4248c6f37c [improve](Nereids): avoid duplicated stats derive. (#13293) 2022-10-13 13:49:21 +08:00
3e84c04195 [Bug](predicate) fix nullptr in scan node (#13316) 2022-10-13 12:14:42 +08:00
e08ba8d573 [feature](restore) Add new property 'reserve_dynamic_partition_enable' to restore statement (#12498)
Add a new restore property 'reserve_dynamic_partition_enable', which means you can
get a table whose dynamic_partition_enable property has the same value
as before the backup. Before this commit, you always got a table with
'dynamic_partition_enable=false' when restoring.
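For illustration, a hedged sketch of a restore that keeps the original setting; the repository, snapshot, timestamp, and table names are hypothetical, and only the 'reserve_dynamic_partition_enable' property comes from this commit:

```SQL
RESTORE SNAPSHOT example_db.snapshot_label
FROM example_repo
ON (example_tbl)
PROPERTIES
(
    "backup_timestamp" = "2022-10-13-11-00-00",
    "reserve_dynamic_partition_enable" = "true"
);
```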
2022-10-13 11:16:15 +08:00