Support alias functions, Java UDFs, and Java UDAFs in Nereids.
Implementation:
UDFs (alias functions and Java UD(A)Fs) are stored as database objects; we look them up by FunctionDesc, which requires the function name and argument types. So firstly we bind the expressions of the children so that we can get the return types of the arguments, and then we pick the best-matching function.
Secondly:
For alias functions:
The original function of an alias function is stored as a legacy planner-style function, which is hard to translate into a Nereids-style expression directly, so we convert it to the corresponding SQL and parse it. Now we have the Nereids-style function and try to bind it.
Binding a function also changes the types of its children by adding cast nodes to the expected input types, so if we traverse a bound function more than once, the cast nodes will differ. To solve this, we add a flag isAnalyzedFunction. It is false by default and set to true when the visitor function returns; if the flag is already true, the visitor function returns immediately.
Now we can ensure that the bound functions in the children stay the same even if we traverse them more than once. We can then replace the alias function with its original function and bind the remaining unbound functions.
For Java UDF and Java UDAF:
A Java UDF or UDAF is a catalog function and hard to translate entirely into a Nereids-style function, so we create the Nereids expression objects JavaUdf and JavaUdaf to wrap it.
All in all, Nereids now supports UDFs, including nesting them.
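A minimal sketch of the flag-guarded binding described above, using simplified BoundFunction/FunctionBinder stand-ins rather than the real Nereids classes:
```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the actual Nereids classes.
class BoundFunction {
    final String name;
    final List<BoundFunction> children;
    // Marks that this function has already been analyzed, so a second visit
    // will not wrap its children in another layer of cast nodes.
    boolean isAnalyzedFunction = false;

    BoundFunction(String name, List<BoundFunction> children) {
        this.name = name;
        this.children = new ArrayList<>(children);
    }
}

class FunctionBinder {
    BoundFunction visit(BoundFunction fn) {
        if (fn.isAnalyzedFunction) {
            // Already analyzed: return immediately so repeated traversals
            // produce exactly the same tree.
            return fn;
        }
        // Analyze the children first; this is where casts to the expected
        // input types would be inserted.
        fn.children.replaceAll(this::visit);
        fn.isAnalyzedFunction = true;
        return fn;
    }
}
```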
For cast(A as date), where A is a string column, the min/max of the result column stats should be calculated like this:
convert A.minExpr to a date dateA, and then take the double value of dateA.
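A hedged sketch of that conversion; the helper below is illustrative, not the actual Doris statistics code, and assumes the min literal is an ISO date string and that a date's double value is its epoch-day count:
```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateStatsSketch {
    // Convert a string min/max literal (e.g. A.minExpr) to a date first, then
    // to the double value used by column statistics.
    static double dateLiteralToDouble(String literal) {
        LocalDate date = LocalDate.parse(literal, DateTimeFormatter.ISO_LOCAL_DATE);
        // Assumed encoding: days since epoch; the real code may encode dates
        // into a double differently.
        return (double) date.toEpochDay();
    }

    public static void main(String[] args) {
        System.out.println(dateLiteralToDouble("2023-07-12")); // 19550.0
    }
}
```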
add "explain memo plan select ..." to print memo from mysql client
Dump column stats for FileScanNode, used for data lake tables.
PushdownJoinOtherCondition pushes expressions from the join condition down into a project, which blocks JoinReorder, so we also need to push the project down to help JoinReorder.
After mergeGroup, the children of the plan may differ from those of the GroupExpression. To avoid optimizing an out-dated group, we should construct the new plan with the GroupExpression's children rather than the plan's children.
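A sketch of that reconstruction under assumed memo interfaces (Plan, Group, and GroupExpression here are simplified stand-ins, not the real Nereids types):
```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative memo types; not the actual Nereids classes.
interface Plan {
    Plan withChildren(List<Plan> children);
}

interface Group {
    Plan getLogicalPlan();
}

interface GroupExpression {
    List<Group> children();
    Plan getPlan();
}

class RebuildAfterMergeGroup {
    // Rebuild the plan from the GroupExpression's children, not from the
    // (possibly stale) plan's own children, so we never keep optimizing an
    // out-dated group.
    static Plan rebuild(GroupExpression groupExpression) {
        List<Plan> children = groupExpression.children().stream()
                .map(Group::getLogicalPlan)
                .collect(Collectors.toList());
        return groupExpression.getPlan().withChildren(children);
    }
}
```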
Use the file system type and the Conf as the key to cache remote file systems.
This avoids creating a new file system for each external table partition's location.
The time to fetch 100000 partitions with 1 file per partition is reduced from about 15 minutes to 22s.
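A minimal sketch of such a cache, assuming the key can be reduced to the file system type plus a digest of the relevant conf entries (all names below are illustrative):
```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Illustrative sketch: cache remote file systems by (type, conf fingerprint)
// instead of creating one per partition location.
class RemoteFileSystemCache<FS> {
    static final class Key {
        final String fsType;     // e.g. "hdfs", "s3"
        final String confDigest; // a stable digest of the relevant conf entries

        Key(String fsType, String confDigest) {
            this.fsType = fsType;
            this.confDigest = confDigest;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return fsType.equals(k.fsType) && confDigest.equals(k.confDigest);
        }

        @Override
        public int hashCode() {
            return Objects.hash(fsType, confDigest);
        }
    }

    private final ConcurrentMap<Key, FS> cache = new ConcurrentHashMap<>();

    // All partition locations that share the same fs type and conf reuse one instance.
    FS getOrCreate(Key key, Function<Key, FS> factory) {
        return cache.computeIfAbsent(key, factory);
    }
}
```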
Arrow does not support null elements in a map key column, but a Doris map key column is nullable by default, so we need to handle Doris map rows whose key column contains null elements: in that case we put null into Arrow.
In Parquet, the legacy min and max statistics may not handle UTF8 correctly.
The current approach is to use the min_value and max_value statistics introduced by PARQUET-1025 when they are present.
If they are not, the statistics are ignored for now. A better way would be to read the legacy min and max statistics when they contain
only ASCII characters. I will improve this in a future PR.
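A hedged sketch of the selection logic, with a simplified stand-in for the Parquet column statistics fields (the real reader works on the actual Parquet metadata):
```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Simplified, illustrative stand-in for the Parquet column statistics fields.
class ParquetColumnStats {
    byte[] min;      // legacy statistics: may be ordered incorrectly for UTF8
    byte[] max;
    byte[] minValue; // min_value/max_value introduced by PARQUET-1025
    byte[] maxValue;
}

class StatsSelector {
    // Prefer min_value/max_value; otherwise ignore the statistics for string
    // columns for now.
    static Optional<String> selectMin(ParquetColumnStats stats) {
        if (stats.minValue != null) {
            return Optional.of(new String(stats.minValue, StandardCharsets.UTF_8));
        }
        // Legacy min/max are ignored here. A possible future improvement is to
        // fall back to them when isAscii(stats.min) holds, since byte order and
        // UTF8 order agree for pure-ASCII values.
        return Optional.empty();
    }

    static boolean isAscii(byte[] bytes) {
        for (byte b : bytes) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }
}
```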
Support Hudi time travel for external tables:
```
select * from hudi_table for time as of '20230712221248';
```
PR https://github.com/apache/doris/pull/15418 supports using either a timestamp or a version as the snapshot ID in Iceberg, but Hudi only supports a timestamp as the snapshot ID. Therefore, when querying a Hudi table with `for version as of`, an error is thrown like:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = Hudi table only supports timestamp as snapshot ID
```
The supported timestamp formats in Hudi are 'yyyy-MM-dd HH:mm:ss[.SSS]', 'yyyy-MM-dd', and 'yyyyMMddHHmmss[SSS]', consistent with [time-travel-query](https://hudi.apache.org/docs/quick-start-guide#time-travel-query).
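As an illustration of those formats, here is a hedged sketch of normalizing the user-supplied timestamp (illustrative names only; not the actual parsing code used by Doris or Hudi):
```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class HudiTimeTravelSketch {
    // The patterns mirror the format list above.
    private static final DateTimeFormatter[] FORMATS = {
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss[.SSS]"),
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss[SSS]"),
    };

    static LocalDateTime parse(String ts) {
        for (DateTimeFormatter f : FORMATS) {
            try {
                return LocalDateTime.parse(ts, f);
            } catch (DateTimeParseException ignored) {
                // try the next format
            }
        }
        // 'yyyy-MM-dd' has no time part, so parse it as a date at midnight.
        return LocalDate.parse(ts, DateTimeFormatter.ofPattern("yyyy-MM-dd")).atStartOfDay();
    }

    public static void main(String[] args) {
        System.out.println(parse("20230712221248")); // 2023-07-12T22:12:48
    }
}
```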
## Partitioning Strategies
Before this PR, Hudi's partitions had to be synchronized to Hive through [hive-sync-tool](https://hudi.apache.org/docs/syncing_metastore/#hive-sync-tool), or by setting very complex synchronization parameters in [spark conf](https://hudi.apache.org/docs/syncing_metastore/#sync-template). These processes are exceptionally complex and unnecessary unless you want to query Hudi data through Hive.
In addition, partitions change under time travel, so we cannot guarantee the correctness of time travel through partition synchronization.
So this PR obtains partitions directly by reading Hudi meta information, caches and updates table partition information keyed by the Hudi instant timestamp, and reuses Doris' partition pruning.
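A minimal sketch of that caching idea, with hypothetical names (the real code reads the Hudi timeline/meta information and feeds Doris' partition pruning):
```java
import java.util.List;
import java.util.Objects;

// Illustrative sketch: keep one cached partition list per table and refresh it
// whenever the Hudi table advances to a newer instant timestamp.
class HudiPartitionSnapshot {
    private String cachedInstant;          // last instant the cache was built for
    private List<String> cachedPartitions; // partition values read from Hudi meta

    interface HudiMetaReader {
        String latestInstant(String table);
        List<String> listPartitions(String table, String instant);
    }

    synchronized List<String> partitionsFor(String table, HudiMetaReader reader) {
        String latest = reader.latestInstant(table);
        if (!Objects.equals(latest, cachedInstant)) {
            // The table has new commits: rebuild the partition list so
            // partition pruning sees an up-to-date view.
            cachedPartitions = reader.listPartitions(table, latest);
            cachedInstant = latest;
        }
        return cachedPartitions;
    }
}
```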
* process normal bloom filter indexes and ngram bf indexes differently
* fix review comments for readability
* add test case
* add test case for delete condition
* [Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query.
Related PR: #20732
There are two reasons for moving the delayed-deletion logic from the Tablet to the StorageEngine. The first is to consolidate the logic and unify the delayed operations. The second is that delayed garbage collection during queries can leave rowsets in the "stale rowsets" state, preventing the timely deletion of rowset metadata, which may cause the rowset metadata to grow too large.
* not use unused rowsets
Problem:
When executing `show export`, the limit was applied before the sort (before this change), so the result could be unexpected: the limit always cut the result set first, and we could not get the rows we wanted.
Example:
We have export job1 and job2 with JobId1 > JobId2, and we want the job with JobId1:
`show export from db order by JobId desc limit 1;`
Since limit 1 was applied first, we would probably get job2, because JobIds are assigned from small to large.
Solution:
We must not cut the result set first when there is an order by clause; instead, cut the result set after sorting.
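A small sketch of the fixed ordering under assumed types (illustrative only): sort the full result set first, then apply the limit.
```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// With an ORDER BY, sort the full job list first and apply the LIMIT
// afterwards, so `order by JobId desc limit 1` returns the job with the
// largest JobId instead of whichever job happened to be cut first.
class ShowExportSketch {
    static class ExportJob {
        final long jobId;
        ExportJob(long jobId) {
            this.jobId = jobId;
        }
        @Override
        public String toString() {
            return "ExportJob(" + jobId + ")";
        }
    }

    static List<ExportJob> orderByJobIdDescThenLimit(List<ExportJob> jobs, long limit) {
        return jobs.stream()
                .sorted(Comparator.comparingLong((ExportJob j) -> j.jobId).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ExportJob> jobs = Arrays.asList(new ExportJob(1), new ExportJob(2));
        System.out.println(orderByJobIdDescThenLimit(jobs, 1)); // [ExportJob(2)]
    }
}
```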
Currently we find that MetastoreEventsProcessor cannot keep up with the event production rate in our cluster, so we need to merge some HMS events in every round.
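A hedged sketch of one possible merging strategy, assuming events carry a type and a table identifier; the real MetastoreEventsProcessor decides which event kinds can safely be merged.
```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch: within one processing round, collapse consecutive
// events of the same kind on the same table into the latest one, so the
// processor can keep up with the HMS event production rate.
class HmsEventMerger {
    static class Event {
        final String type;  // e.g. "ALTER_PARTITION"
        final String table; // db.table
        final long eventId;
        Event(String type, String table, long eventId) {
            this.type = type;
            this.table = table;
            this.eventId = eventId;
        }
    }

    static List<Event> merge(List<Event> batch) {
        Deque<Event> merged = new ArrayDeque<>();
        for (Event e : batch) {
            Event last = merged.peekLast();
            if (last != null && last.type.equals(e.type) && last.table.equals(e.table)) {
                merged.pollLast(); // keep only the newest event of this kind per table
            }
            merged.addLast(e);
        }
        return new ArrayList<>(merged);
    }
}
```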
Add session variable MAX_JOIN_NUMBER_BUSHY_TREE, default 5.
If the number of tables in a join cluster is less than MAX_JOIN_NUMBER_BUSHY_TREE, Nereids tries a bushy tree; otherwise it uses a zigzag tree.
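A minimal sketch of the threshold check (the class and method names are illustrative; only the session variable name and default come from this PR):
```java
// Small join clusters explore bushy trees, larger ones fall back to zigzag trees.
class JoinShapeChooser {
    enum TreeShape { BUSHY, ZIGZAG }

    // maxJoinNumberBushyTree corresponds to the session variable
    // MAX_JOIN_NUMBER_BUSHY_TREE (default 5).
    static TreeShape choose(int tableCountInJoinCluster, int maxJoinNumberBushyTree) {
        return tableCountInJoinCluster < maxJoinNumberBushyTree
                ? TreeShape.BUSHY
                : TreeShape.ZIGZAG;
    }

    public static void main(String[] args) {
        System.out.println(choose(4, 5)); // BUSHY
        System.out.println(choose(6, 5)); // ZIGZAG
    }
}
```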