doris

Author	SHA1	Message	Date
starocean999	fff1983f40	[fix](planner)use tupleId of agg node to get its unassigned conjuncts (#21949 )	2023-07-19 00:46:49 +08:00
yujun	beec0e9169	[Improvement](tablet clone) impr tablet sched speed and fix tablet sched failed too many times (#21856 )	2023-07-18 23:25:22 +08:00
Ashin Gau	dcb165cc9f	[opt](hudi) get hudi split concurrently by using parallelStream (#21871 ) This PR contains two optimizations: 1. Using a parallel stream to get hoodie splits concurrently. It reduces the split time from 1min20s to 12s when splitting 10,000 partitions. 2. Reading the hoodie meta table to get table partitions. It reduces the time to get partitions from 12min to 3s when reading 10,000 partitions.	2023-07-18 23:19:34 +08:00
morrySnow	d6d27ef428	[fix](Nereids) join other conjuncts should get slot from join output (#21840 )	2023-07-18 18:22:40 +08:00
Siyang Tang	e654b5ddfc	[enhancement](broker-load) support special partition path pattern (#21778 ) Some users may have non-ACID paths like `/path/to/k=v/1/filename`, introduced by the HQL statement `insert into union all`, for which the path partition `k=v` should be parsed normally in broker load.	2023-07-18 14:50:37 +08:00
starocean999	ec12a4159a	[fix](planner) push conjuncts into SetOperationStmt inline view (#21718 )	2023-07-18 14:17:07 +08:00
Xiangyu Wang	50b81a9c13	[Fix](multi-catalog) Filter invisible files for hive table. (#21867 ) Hive cannot read files whose names start with "." or "_", so we need to filter these files out.	2023-07-18 13:08:12 +08:00
Pxl	417e3e5616	[Feature](delete) support fold constant on delete stmt (#21833 )	2023-07-18 12:56:28 +08:00
Pxl	19492b06c1	[Bug](decimalv3) fix failed on test_dup_tab_decimalv3 due to wrong precision (#21890 )	2023-07-18 12:53:09 +08:00
mch_ucchi	e1a116af94	[fix](planner)normalize the behavior of from_unixtime() according to Nereids planner (#21723 ) If from_unixtime() receives an integer outside the int range, the function returns null.	2023-07-18 12:15:38 +08:00
starocean999	07e720e65d	[fix](planner)need recalculate nullable info of output slots for join node (#21650 )	2023-07-18 12:10:27 +08:00
Jibing-Li	489171e4c1	[Fix](multi catalog)Fix hive partition value contains special character such as / bug (#21876 ) Hive escapes some special characters in partition values to %XX; for example, / is escaped to %2F. Doris didn't handle this case, which caused Doris to fail to list the files under partitions with special characters. This PR fixes the bug.	2023-07-18 11:20:38 +08:00
yujun	ebd2a4b707	[fix](dynamic partition) fix create hot partition failed without error response (#20996 )	2023-07-18 10:56:37 +08:00
Mryange	b656f31cf2	[Enhancement](compatible) show decimalv3 to decimal (#21782 )	2023-07-18 09:17:14 +08:00
zhangstar333	b6517ed83b	[Enhance](function) add boolean type for sum agg function (#21862 ) Previously, the sum agg function was not registered for the boolean type, so boolean values had to be cast to another type before it could execute.	2023-07-18 08:06:52 +08:00
Tiewei Fang	83e5a29855	[Fix](Export) fix nullptr exception when upgrading from 1.2.3 to 2.0 (#21799 )	2023-07-18 00:07:09 +08:00
AKIRA	05cf095506	[feature](stats) Support full auto analyze (#21192 ) 1. Auto analyze all tables except for internal tables 2. make resource used by analyze configurable	2023-07-17 20:42:57 +08:00
yujun	be750e88b2	[fix](clone) fix cannot further repair clone replica which miss version data (#21382 )	2023-07-17 20:00:50 +08:00
zy-kkk	014b34bebb	[enhancement](jdbc catalog) Add mysql jdbc url param `rewriteBatchedStatements=true` (#21864 ) When `rewriteBatchedStatements=false`, the JDBC driver will not merge multiple insert statements into one larger insert statement. Therefore, during the batch insertion process, each insert statement needs to be sent to the MySQL server individually, leading to a higher number of network roundtrips. Network latency could potentially be a significant factor contributing to the performance degradation. For this reason, we propose to set this parameter to true by default, to enhance the performance of prepared statement batch inserts.	2023-07-17 17:39:26 +08:00
Jibing-Li	a92508c3f9	[Fix](statistics) Fix analyze db always use internal catalog bug (#21850 ) The `Analyze database db_name` command couldn't use the current catalog; it always used the internal catalog, so the command failed to find the db. This PR fixes the bug.	2023-07-17 15:28:54 +08:00
Mingyu Chen	5fc0a84735	[improvement](catalog) reduce the size of thrift params for external table query (#21771 ) ### 1 In the previous implementation, each FileSplit had a `TFileScanRange`, and each `TFileScanRange` contained a list of `TFileRangeDesc` and a `TFileScanRangeParams`. So with thousands of FileSplits there were thousands of `TFileScanRange`, which made the thrift data sent to BE too large, resulting in: 1. the rpc of sending fragment may fail due to timeout 2. FE will OOM For a given query request, the `TFileScanRangeParams` is the common part and is the same for all `TFileScanRange`. So I moved it to the `TExecPlanFragmentParams`. After that, each FileSplit has only a list of `TFileRangeDesc`. In my test, querying a hive table with 100000 partitions, the size of the thrift data was reduced from 151MB to 15MB, and the above 2 issues are gone. ### 2 When `max_external_file_meta_cache_num` is set to <= 0, the file meta cache for the parquet footer is not used. I found that for some wide tables the footer is too large (1MB after compact, and much more after being deserialized to thrift), so it consumes too much BE memory when there are many files. This will be optimized later; for now I just support disabling this cache.	2023-07-17 13:37:02 +08:00
lihangyu	1101d7d947	[chore](topn opt) disable two phase read when light schema change is disabled (#21809 )	2023-07-17 12:46:28 +08:00
zy-kkk	03b575842d	[Feature](table function) support explode_json_array_json (#21795 )	2023-07-17 11:40:02 +08:00
zclllyybb	d0775f8209	[log](profile) add doris version info to query profile (#21501 )	2023-07-17 11:18:05 +08:00
Pxl	86841d8653	[Bug](materialized-view) fix some problems of mv and make ssb mv work on nereids (#21559 )	2023-07-17 10:08:25 +08:00
herry2038	6fba092741	[optimization](show-frontends) Add start time in Show frontends (#21844 ) --------- Co-authored-by: yuxianbing <iloveqaz123>	2023-07-17 05:09:43 +08:00
starocean999	7a61953d17	[fix](nereids)SimplifyComparisonPredicate rule needs special care for decimalv3 and datetimev2 literal (#21575 )	2023-07-14 23:05:14 +08:00
mch_ucchi	c9a99ce171	[Feature](Nereids) support udf for Nereids (#18257 ) Support alias functions, Java UDFs, and Java UDAFs for Nereids. Implementation: UDFs (alias function, Java UD(A)F) are saved in the database object, and we get them by FunctionDesc, which requires the function name and arg types. So first we bind the expressions of the function's children so that we can get the return types of the args. Then we get the best selection. Second: For alias function: The original function of the alias function is represented as an original planner-style function. It is too hard to translate it to a Nereids-style expression, hence we convert it to the corresponding SQL and parse it. Now we get the Nereids-style function and try to bind it. The bound function also changes types by adding cast nodes on its children to match its expected input types, so if we traverse a bound function more than once, the cast nodes will be different. To solve the problem, we add a flag isAnalyzedFunction. It is false by default and is set to true when returning from the visitor function. If the flag is true, the visitor function returns immediately. Now we can ensure that the bound functions in children stay the same even though we traverse them more than once. We can then replace the alias function with its original function and bind the unbound functions. For JavaUDF and JavaUDAF: JavaUDF and JavaUDAF can be recognized as catalog functions and are hard to translate entirely to Nereids-style functions, so we create Nereids expression objects JavaUdf and JavaUdaf to wrap them. All in all, Nereids now supports UDFs and nesting them.	2023-07-14 17:02:01 +08:00
DeadlineFen	d57bb84842	[Enhancement] (binlog) TBinlog and BinlogManager V2 (#21674 )	2023-07-14 16:59:32 +08:00
minghong	f95d728d3e	[shape](nereids) TPCDS check all query shape, except ds64 (#21742 ) There is a known bug in ds64 analyze. Add the ds64 shape check later.	2023-07-14 16:56:46 +08:00
Pxl	4d44cea784	[Bug](materialized-view) check group expr at create mv (#21798 )	2023-07-14 15:39:38 +08:00
minghong	62214cd1f4	[feature](nereids) adjust min/max of column stats for cast function (#21772 ) cast(A as date), where A is a string column. The min/max of the result column stats should be calculated like this: convert A.minExpr to a date dateA, and then get the double value from dateA. Add "explain memo plan select ..." to print the memo from the mysql client. Dump column stats for FileScanNode, used in datalake.	2023-07-14 12:54:04 +08:00
jakevin	2c897b82ad	[enhance](Nereids) Pushdown Project Through OuterJoin. (#21730 ) PushdownJoinOtherCondition will pushdown expression in condition into project, it will block JoinReorder, so we need to pushdown project to help JoinReorder	2023-07-14 11:46:29 +08:00
谢健	b2778d0724	[fix](Nereids) use groupExpr's children to make logicalPlan (#21794 ) After mergeGroup, the children of the plan are different from GroupExpr. To avoid optimizing out-dated group, we should construct new plan with groupExpr's children rather than plan's children	2023-07-14 11:41:38 +08:00
zhangstar333	c07e2ada43	[improve](udaf) refactor java-udaf executor by using for loop (#21713 )	2023-07-14 11:37:19 +08:00
minghong	ea73dd5851	[improve](nereids)inner join estimation: assume children output at least one tuple #21792 this assumption is good to eliminate error propagation, when the filter estimation is too low, less than one.	2023-07-14 11:30:25 +08:00
Mryange	ebe771d240	[refactor](executor) remove unused variable	2023-07-14 10:35:59 +08:00
daidai	ca6e33ec0c	[feature](table-value-functions)add catalogs table-value-function (#21790 ) mysql> select * from catalogs() order by CatalogId;	2023-07-14 10:25:16 +08:00
Jibing-Li	352a0c2e17	[Improvement](multi catalog)Cache file system to improve list remote files performance (#21700 ) Use file system type and Conf as the key to cache the remote file system. This avoids getting a new file system for each external table partition's location. The time cost for fetching 100000 partitions with 1 file per partition is reduced from about 15 minutes to 22s.	2023-07-14 09:59:46 +08:00
Ashin Gau	4158253799	[feature](hudi) support hudi time travel in external table (#21739 ) Support hudi time travel in external table: ``` select * from hudi_table for time as of '20230712221248'; ``` PR(https://github.com/apache/doris/pull/15418) supports taking a timestamp or version as the snapshot ID in iceberg, but hudi only has timestamp as the snapshot ID. Therefore, when querying a hudi table with `for version as of`, an error will be thrown like: ``` ERROR 1105 (HY000): errCode = 2, detailMessage = Hudi table only supports timestamp as snapshot ID ``` The supported timestamp formats in hudi are: 'yyyy-MM-dd HH:mm:ss[.SSS]' or 'yyyy-MM-dd' or 'yyyyMMddHHmmss[SSS]', which is consistent with the [time-travel-query.](https://hudi.apache.org/docs/quick-start-guide#time-travel-query) ## Partitioning Strategies Before this PR, hudi's partitions needed to be synchronized to hive through [hive-sync-tool](https://hudi.apache.org/docs/syncing_metastore/#hive-sync-tool), or by setting very complex synchronization parameters in [spark conf](https://hudi.apache.org/docs/syncing_metastore/#sync-template). These processes are exceptionally complex and unnecessary, unless you want to query hudi data through hive. In addition, partitions can change in time travel, so we cannot guarantee the correctness of time travel through partition synchronization. So this PR directly obtains partitions by reading hudi meta information, caching and updating table partition information through the hudi instant timestamp, and reusing Doris' partition pruning.	2023-07-13 22:30:07 +08:00
slothever	c5dbd53e6f	[fix](multi-catalog)support oss-hdfs service (#21504 ) 1. support oss-hdfs if it is enabled when use dlf or hms catalog 2. add docs for aliyun dlf and mc.	2023-07-13 18:02:15 +08:00
mch_ucchi	d4bdd6768c	[Feature](Nereids) support select into outfile (#21197 )	2023-07-13 17:01:47 +08:00
Jack Drogon	14253b6a30	[fix](ccr) Add tableName in DropInfo && BatchDropInfo (#21736 ) Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>	2023-07-13 11:47:49 +08:00
LiBinfeng	f863c653e2	[Fix](Planner) fix limit execute before sort in show export job (#21663 ) Problem: Before this change, when running show export jobs, limit was executed before sort. So the result was not as expected, because limit always cut the results first and we could not get what we wanted. Example: we have export job1 and job2 with JobId1 > JobId2, and we want to get the job with JobId1: show export from db order by JobId desc limit 1; We did limit 1 first, so we would probably get Job2 because JobIds are assigned from small to large. Solution: We cannot cut the results first if there is an order by clause; cut the result set after sorting.	2023-07-13 11:17:28 +08:00
Calvin Kirs	2d2beb637a	[enhancement](RoutineLoad)Multi-table support pipeline load (#21678 )	2023-07-13 10:26:46 +08:00
Siyang Tang	e18465eac7	[feature](TVF) support path partition keys for external file TVF (#21648 )	2023-07-13 10:15:55 +08:00
Xiangyu Wang	105a162f94	[Enhancement](multi-catalog) Merge hms events every round to speed up events processing. (#21589 ) Currently we find that MetastoreEventsProcessor cannot keep up with the event production rate in our cluster, so we need to merge some hms events every round.	2023-07-12 23:41:07 +08:00
minghong	0243c403f1	[refactor](nereids)set session var for bushy join (#21744 ) add session var: MAX_JOIN_NUMBER_BUSHY_TREE, default is 5. If the table number is less than MAX_JOIN_NUMBER_BUSHY_TREE in a join cluster, nereids tries a bushy tree; otherwise, a zigzag tree.	2023-07-12 16:40:48 +08:00
ElvinWei	3b76428de9	[fix](stats) when some stat is NULL, causing an exception during display stats (#21588 ) During manual statistics injection, some statistics may be NULL, causing an exception during display.	2023-07-12 14:57:06 +08:00
AKIRA	a18b345459	[opt](stats)update tbl stats of statistics collection after system statistics collection job succeeded (#21528 ) So that if FE crashed while a system analyze task was running, the system task for the column could be created and run when FE recovered	2023-07-12 11:11:50 +08:00
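
The ordering problem described in commit f863c653e2 (limit applied before sort in show export) can be illustrated with a small sketch. This is not Doris code: the job list, the `Label` field, and both function names are hypothetical and exist only to show why the cut must happen after sorting.

```python
# Hypothetical in-memory stand-in for export jobs; not taken from Doris.
# Jobs are listed in creation order, so JobId grows from small to large.
jobs = [
    {"JobId": 10001, "Label": "job2"},
    {"JobId": 10002, "Label": "job1"},
]


def limit_then_sort(rows, limit):
    """Old behaviour: cut the rows first, then sort what is left."""
    return sorted(rows[:limit], key=lambda r: r["JobId"], reverse=True)


def sort_then_limit(rows, limit):
    """Fixed behaviour: sort all rows, then cut the result set."""
    return sorted(rows, key=lambda r: r["JobId"], reverse=True)[:limit]


# Equivalent of: show export from db order by JobId desc limit 1;
print(limit_then_sort(jobs, 1)[0]["Label"])  # the job with the smaller JobId
print(sort_then_limit(jobs, 1)[0]["Label"])  # the job with the larger JobId
```

With `order by JobId desc limit 1`, only the second function returns the job with the largest JobId, which is the behaviour the commit message asks for.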

5219 Commits