Commit Graph

18263 Commits

Author SHA1 Message Date
Pxl
45ad297a1d [Enhancement](function) change aggregate function creator to return AggregateFunctionPtr (#18025)
Change creator_type to return AggregateFunctionPtr.
Remove some functions and use the creator directly.
2023-03-26 11:41:34 +08:00
5df011cd43 [typo](doc)Add cancel create materialized view grammar #18084 2023-03-26 11:39:25 +08:00
0347ae4dbd [Enhancement](proc) sort result by backend id when show backends (#18112) 2023-03-26 11:30:47 +08:00
c63807ccfe [chore](be) reduce log when trying to do async write cooldown meta (#18107) 2023-03-26 11:10:21 +08:00
5846b3fc54 [fix](memory) Remove PODArray peak allocated memory tracking #18010
#11740 solved the problem that the query memory statistics were higher than the actual physical memory: PODArray does not memset to 0 when allocating memory, and the query mem tracker tracks virtual memory.

But in extreme cases, such as csv load, frequent PODArray inserts cause performance problems. So revert part of #11740 and part of #12820.

There is currently no feedback on the accuracy of the query mem tracker, so it gets no further attention for now.
2023-03-26 09:45:10 +08:00
c5dcb633e9 [fix](hive)throw exception if complex type in text format table (#18013)
For the Hive text input format, the column types ARRAY/MAP/STRUCT are not supported yet.
They will be supported in later versions.

Co-authored-by: jinzhe <jinzhe@selectdb.com>
2023-03-25 23:26:52 +08:00
7c0bcbdca1 [enhance](parquet-reader) cache file meta of parquet to speed up query (#18074)
Problem:
1. FE splits a parquet file into splits, so one file can have several splits.
2. BE scans each split and reads the footer of the parquet file.
3. If 2 splits belong to the same parquet file, the footer of this file is read twice.

This PR mainly changes:
1. Use a kv cache to cache the footer of each parquet file.
2. The kv cache belongs to a scan node, so all parquet readers that belong to this scan node share the same kv cache.
3. In the cache, the key is "meta_file_path" and the value is the parsed thrift footer.

The KV cache is sharded into multiple sub-caches,
so that different files can use different sub-caches and avoid blocking each other.

In my test, a query with 26 splits reduced the footer parse time from 4s to 1s.
2023-03-25 23:22:57 +08:00
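The footer cache described in 7c0bcbdca1 can be sketched roughly as follows. This is a minimal Python illustration, not the BE implementation: the class `ShardedFooterCache`, the helpers `footer_key` and `parse_footer`, the shard count, and the exact key format (`"meta_" + file_path`) are all assumptions made for the example.

```python
import threading
from typing import Callable, Dict, List


class ShardedFooterCache:
    """Hypothetical sharded key-value cache for parsed file footers."""

    def __init__(self, num_shards: int = 8) -> None:
        # Each shard has its own dict and lock, so lookups for files that
        # land in different shards do not block each other.
        self._shards: List[Dict[str, object]] = [{} for _ in range(num_shards)]
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def _shard_index(self, key: str) -> int:
        return hash(key) % len(self._shards)

    def get_or_load(self, key: str, loader: Callable[[], object]) -> object:
        idx = self._shard_index(key)
        with self._locks[idx]:
            # Parse the footer only on the first request for this key.
            if key not in self._shards[idx]:
                self._shards[idx][key] = loader()
            return self._shards[idx][key]


def footer_key(file_path: str) -> str:
    # Assumed key format; the commit only says the key is "meta_file_path".
    return "meta_" + file_path


parse_calls: List[str] = []


def parse_footer(file_path: str) -> dict:
    # Stand-in for reading and parsing the thrift footer of a parquet file.
    parse_calls.append(file_path)
    return {"path": file_path}


# One cache per scan node; three splits, two of them from the same file.
cache = ShardedFooterCache(num_shards=4)
splits = ["/data/a.parquet", "/data/a.parquet", "/data/b.parquet"]
footers = [
    cache.get_or_load(footer_key(p), lambda p=p: parse_footer(p)) for p in splits
]
```

With this layout the second split of `/data/a.parquet` reuses the footer parsed for the first one, so `parse_footer` runs once per file rather than once per split.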
96f274b8f3 [fix](global-variable) fix bug that set default value for global variable will cause NullPointerException (#18004) 2023-03-25 22:45:26 +08:00
df0eca4003 [improvement] (schema change) Lightweight schema change of modify column with varchar length (#17207)
Signed-off-by: Yisong Han <yisong8686@gmail.com>
2023-03-25 22:38:19 +08:00
74fdb6c116 [refactor](regression-test) refactor ssl test from p0 to p2 (#17847) 2023-03-25 22:37:26 +08:00
cb6fca95b2 [fix](lambda-func) fix lambda functions exception message errors (#18068) 2023-03-25 22:36:55 +08:00
360d3050bc [Feature](array-function) Support array_reverse_sort function (#17754)
Co-authored-by: zhangyu209 <zhangyu209@meituan.com>
2023-03-25 21:58:11 +08:00
50eeb2d9a4 [fix](json) change int to bigint for json function (#17769) 2023-03-25 21:57:29 +08:00
855852d582 [enhancement](timeout) fix set timeout failure and simplify timeout logic (#17837) 2023-03-25 21:56:06 +08:00
193ae352e4 [fix](coalesce) fix problem that coalesce function may cause problem of block mem reuse (#17940) 2023-03-25 21:50:37 +08:00
Pxl
a8753faeb1 [Bug](function) fix column complex not resize after filter (#18043) 2023-03-25 21:48:13 +08:00
77c9550420 [fix](bitmapfilter) fix bitmap filter timeout unit error (#18110) 2023-03-25 21:46:32 +08:00
f9013f2668 [feature](Nereids): pullup all semijoin through join. (#18106) 2023-03-25 20:25:28 +08:00
f36465e76e [enhancement](memory) optimize jemalloc heap profile doc (#18094) 2023-03-25 13:04:45 +08:00
7ae51c856e [refactor](unify exception) unify exception definition and error code (#18006)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-25 12:41:07 +08:00
f84481886b [feature](string_functions) The 'split_part' function supports non-constant parameters (#18029) 2023-03-25 12:03:11 +08:00
e0518fd19d [fix](nereids)remove redundant visit call in Validator (#18103) 2023-03-25 11:41:34 +08:00
1164611393 [enhancement](planner) fix unclear exception msg when create mv (#17537)
A materialized view's FROM clause can only be a single table, not a sub-query, but the exception message was an NPE. This PR changes it to a clear message.
2023-03-25 11:36:40 +08:00
2408ca5da8 [Bug](DECIMALV3) Fix wrong precision for plus/minus (#18052)
Result type for DECIMAL(x, y) plus/minus DECIMAL(m, n) should be DECIMAL(max(x - y, m - n) + max(y, n) + 1, max(y, n))
2023-03-25 09:42:39 +08:00
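The precision rule in 2408ca5da8 can be worked through with a small Python sketch. It reads the result scale as max(y, n), which is an assumption about the intended formula, and the function name `decimal_add_result_type` is hypothetical. Any upper bound on precision that the engine applies is not modeled here.

```python
from typing import Tuple


def decimal_add_result_type(x: int, y: int, m: int, n: int) -> Tuple[int, int]:
    """Return (precision, scale) for DECIMAL(x, y) plus/minus DECIMAL(m, n)."""
    # The result keeps the larger of the two scales.
    scale = max(y, n)
    # Digits to the left of the decimal point: the larger integer part.
    integer_digits = max(x - y, m - n)
    # One extra digit for a possible carry out of the integer part.
    precision = integer_digits + scale + 1
    return precision, scale


# DECIMAL(10, 2) + DECIMAL(8, 4): integer digits max(8, 4) = 8, scale 4.
print(decimal_add_result_type(10, 2, 8, 4))  # (13, 4)
```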
b2c70b51cc [refactor](vectorized) delete row-based AnyVal and DateTimeVal (#18093) 2023-03-25 09:40:04 +08:00
dc4b719528 [enhancement](stats) Make estimation with histogram much more precisely (#18053) 2023-03-25 01:02:36 +08:00
51962fbfaf [fix](meta) FE should delete a colocate table's replica when it is redundant (#17998)
If a colocate table's tablet is healthy and a BE reports an extra replica to FE, FE will not delete it.
But it should be deleted; otherwise it will be reported again and again and no one will handle it.
2023-03-25 00:16:31 +08:00
80d2e6f4c1 [fix](nereids) should not assign stats after cast on the original slot (#18061)
Example: select * from T where A = 10.0, where A is an int column.
After stats derivation on `cast(A as double) = 10.0`, the column stats for `cast(A as double)` were set on `A`, which is incorrect.
2023-03-24 21:37:06 +08:00
473f0c45ff [Bug](delete) Fix bug of delete partition prune error (#18057) 2023-03-24 20:22:12 +08:00
0523860877 [Enhancement](streamload) print profile for streamload (#18015)
When both enable_profile and enable_stream_load_profile_log are true, the stream load profile is printed to the log.
2023-03-24 20:17:33 +08:00
7ac7d35703 [bugfix](publish) fix TabletLoadInfo may released by delete txn (#17986) 2023-03-24 20:14:34 +08:00
039688978c [docs](doc) Add autobucket doc (#16746)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-03-24 19:58:58 +08:00
219ef01c65 [bugfix](k8s)roll back jackson version (#18046)
After upgrading the version of jackson, the k8s client fails:

java.lang.NoClassDefFoundError: org/yaml/snakeyaml/LoaderOptions
at com.fasterxml.jackson.dataformat.yaml.YAMLParser.<init>(YAMLParser.java:191) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2]
at com.fasterxml.jackson.dataformat.yaml.YAMLFactory._createParser(YAMLFactory.java:509) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2]
at com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:413) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2]
at com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:386) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2]
at com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:15) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2]
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3677) ~[jackson-databind-2.14.2.jar:2.14.2]
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3645) ~[jackson-databind-2.14.2.jar:2.14.2]
at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:47) ~[kubernetes-client-5.12.2.jar:?]
...
2023-03-24 19:36:59 +08:00
7bdd854fdc [fix](nereids) bucket shuffle and colocate join is not correctly recognized (#17807)
1. close (https://github.com/apache/doris/issues/16458) for nereids
2. varchar and string type should be treated as same type in bucket shuffle join scenario.
```
create table shuffle_join_t1 ( a varchar(10) not null )
create table shuffle_join_t2 ( a varchar(5) not null, b string not null, c char(3) not null )
```
The 2 SQL statements below can use bucket shuffle join:
```
select * from shuffle_join_t1 t1 left join shuffle_join_t2 t2 on t1.a = t2.a;
select * from shuffle_join_t1 t1 left join shuffle_join_t2 t2 on t1.a = t2.b;
```
3. PushdownExpressionsInHashCondition should consider both hash and other conjuncts
4. visitPhysicalProject should handle MarkJoinSlotReference
2023-03-24 19:21:41 +08:00
562f572311 [enhancement](UDF) The user defined functions support global ('show functions'/'show create') operation (#16973) (#17964)
1. add the global keyword.

SHOW [GLOBAL] [FULL] [BUILTIN] FUNCTIONS [IN|FROM db] [LIKE 'function_pattern']

SHOW CREATE GLOBAL FUNCTION function_name(arg_type [, ...]);

2. show the details of the global udf.
2023-03-24 19:07:38 +08:00
354d109130 [feat](Nereids): check Memo Plan for Unit Test. (#18082) 2023-03-24 18:31:33 +08:00
eb7b59c1c6 [docs](plugins) Fix the information in auditlog plugin documentation #18073
The information in the document is incomplete, so users may get an error message like:

mysql> INSTALL PLUGIN FROM "http://127.0.0.1:8039/auditloader.zip";
ERROR 1105 (HY000): errCode = 2, detailMessage = http://127.0.0.1:8039/auditloader.zip.md5. you should set md5sum in plugin properties or provide a md5 URI to check plugin file
2023-03-24 18:16:16 +08:00
cd28e9f3b5 [fix](function) fix encrypt/decrypt function bug select list expression not produced by aggregation output #18078
Fix function analysis repeatedly adding the same child. The resulting error was:

select list expression not produced by aggregation output (missing from GROUP BY clause?): if(length(`r_2_3`.`name`) % 32 = 0, aes_decrypt(unhex(`r_2_3`.`name`), '***'), `r_2_3`.`name`)
2023-03-24 18:03:18 +08:00
ca0e4844e8 [typo](comment) code comment fix (#17870)
Co-authored-by: wangqingtao6 <wangqingtao6@jd.com>
2023-03-24 17:47:30 +08:00
b244c41371 [Bug](regression-test) Fix grace stop be coredump in pipeline (#18076) 2023-03-24 17:44:06 +08:00
1a3c6b7ed9 [bugfix](testcase) use different table name in map testcases to avoid conflict (#18077) 2023-03-24 17:43:18 +08:00
Pxl
8249441335 [Bug](planner) add conjunct slotref id to table function node to avoid result incorrect (#18063)
add conjunct slotref id to table function node to avoid result incorrect
2023-03-24 14:48:03 +08:00
e8b9587fe6 [Improvement](dict) compute hash only if needed (#18058) 2023-03-24 11:45:58 +08:00
aa3ea4beed [fix](planner) failed to create view when use window function (#17815)
Fix the failure to create a view that uses a window function; it failed because the view string contained a slot id, which cannot be parsed.
2023-03-24 10:58:52 +08:00
22fce33fb2 [fix](nereids) fix bitmap function nullable trait and dphyper bugs (#18041)
1. some bitmap functions like bitmap_or, bitmap_and_count, bitmap_or_count etc shouldn't follow constant fold rule for PropagateNullable functions. So remove PropagateNullable property and these functions would use their own constant fold logic correctly
2. dphyper's PlanReceiver class shouldn't change hyperGraph's complex project info. So make PlanReceiver use its own copy of complex project info now.
2023-03-24 10:53:45 +08:00
f9f87545d6 [improve](Nereids): check slot from children in validator. (#17951) 2023-03-24 10:52:12 +08:00
a65616a5cd [enhancement](MTMV) Add a timeout for regression tests (#18048)
MTMV regression tests may loop forever due to some potential bugs. Therefore, we add a timeout to avoid an endless loop. The timeout is currently hard-coded to 30 minutes.
2023-03-24 10:39:42 +08:00
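The idea behind a65616a5cd, bounding a polling loop with a deadline, can be sketched as follows. This is an illustrative Python version only, not the regression-test code itself; `wait_until` and its parameters are hypothetical names, and the 30-minute value is taken from the commit message.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_seconds: float,
    poll_interval_seconds: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `condition` until it is true or the timeout elapses.

    Returns True if the condition became true, False on timeout.
    """
    deadline = clock() + timeout_seconds
    while True:
        if condition():
            return True
        # Give up instead of looping forever once the deadline has passed.
        if clock() >= deadline:
            return False
        sleep(poll_interval_seconds)


# Usage: wait up to 30 minutes for a job that finishes on the third poll.
polls = {"count": 0}


def job_finished() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


finished = wait_until(job_finished, timeout_seconds=30 * 60, sleep=lambda _: None)
```

Passing `clock` and `sleep` as parameters keeps the helper testable without real waiting.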
1999cccde9 [feature](array-type) Unique table support array value (#17024)
Co-authored-by: huangqixiang.871 <huangqixiang.871@bytedance.com>
2023-03-24 10:18:59 +08:00
1f8ba4948d [Fix](multi-catalog) add handler for hms INSERT EVENT. (#17933)
When a hive client submits an `INSERT INTO TBL SELECT * FROM ...` or `INSERT INTO TBL VALUES ...`
statement and the table is non-partitioned, the hms generates an insert event. The insert statement may change the
hdfs file distribution of this table, but currently this is not handled, so the file cache of this table may be inaccurate.
2023-03-24 10:17:47 +08:00
2a35adbba8 [vectorized](udaf) fix java-udaf case of P0 is unstable (#18054)
The udaf case is unstable for this reason:
when enable_pipeline_engine=true, the agg function runs with only 1 instance,
so the default value is not merged; with more than 1 instance, the default value is merged.
2023-03-24 09:10:58 +08:00