doris

Author	SHA1	Message	Date
Nitin-Kashyap	cd70f45ce2	[test](ut) added UT cases for show build index (#29561 ) Added UT case for show build index flow	2024-01-25 13:24:09 +08:00
koarz	ca5a314765	[fix](function) make STRLEFT and STRRIGHT and SUBSTR function DEPEND_ON_ARGUMENT (#28352 ) make STRLEFT and STRRIGHT function DEPEND_ON_ARGUMENT	2024-01-25 13:23:59 +08:00
zy-kkk	88fdb2677d	[fix](catalog) fix Uninitialized connection pool parameters in hmsjdbcclient (#30262 )	2024-01-25 13:23:59 +08:00
seawinde	2f68aac885	[Improvement](Nereids) Support to query rewrite by materialized view when join input has aggregate (#30230 ) Support to query rewrite by materialized view when join input has aggregate, the aggregate should be simple For example as following: The materialized view def is > select > l_linenumber, > count(distinct l_orderkey), > sum(case when l_orderkey in (1,2,3) then l_suppkey * l_linenumber else 0 end), > max(case when l_orderkey in (4, 5) then (l_quantity 2 + part_supp_a.qty_max) 0.88 else 100 end), > avg(case when l_partkey in (2, 3, 4) then l_discount + o_totalprice + part_supp_a.qty_sum else 50 end) > from lineitem > left join orders on l_orderkey = o_orderkey > left join > (select ps_partkey, ps_suppkey, sum(ps_availqty) qty_sum, max(ps_availqty) qty_max, > min(ps_availqty) qty_min, > avg(ps_supplycost) cost_avg > from partsupp > group by ps_partkey,ps_suppkey) part_supp_a > on l_partkey = part_supp_a.ps_partkey > and l_suppkey = part_supp_a.ps_suppkey > group by l_linenumber; when query is like following, it can be rewritten by mv above > select > l_linenumber, > sum(case when l_orderkey in (1,2,3) then l_suppkey * l_linenumber else 0 end), > avg(case when l_partkey in (2, 3, 4) then l_discount + o_totalprice + part_supp_a.qty_sum else 50 end) > from lineitem > left join orders on l_orderkey = o_orderkey > left join > (select ps_partkey, ps_suppkey, sum(ps_availqty) qty_sum, max(ps_availqty) qty_max, > min(ps_availqty) qty_min, > avg(ps_supplycost) cost_avg > from partsupp > group by ps_partkey,ps_suppkey) part_supp_a > on l_partkey = part_supp_a.ps_partkey > and l_suppkey = part_supp_a.ps_suppkey > group by l_linenumber;	2024-01-25 13:23:59 +08:00
Nitin-Kashyap	f85b04c2c6	[fix](datatype) fixed decimal type implicit cast handling in BinaryPredicate (#30181 )	2024-01-25 13:23:12 +08:00
nanfeng	c7360fd014	[feature](function) support ip function named ipv4_cidr_to_range(addr, cidr) (#29819 ) * support ip function ipv4_cidr_to_range * fix ipv4_cidr_to_range function only support ipv4 type	2024-01-24 10:02:03 +08:00
Jack Drogon	dde5ed5231	[fix](fe-memory) Fix fe schema change high memory usage (#30231 ) Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>	2024-01-24 10:02:03 +08:00
starocean999	2b3e7589b7	[fix](nereids)group by expr may be lost in EliminateGroupByConstant rule (#30274 )	2024-01-24 10:01:14 +08:00
minghong	4af3fd2a2e	[fix](Nereids) fix bug in case-when/if stats estimation (#30265 )	2024-01-24 10:00:25 +08:00
ZhangJian He	72f4e7e2d1	[security] Don't print token (#30227 )	2024-01-24 09:59:45 +08:00
zhangdong	b98acf2d90	[fix](mtmv)mtmv default open enableNereidsDML #30235	2024-01-24 09:59:45 +08:00
zxealous	4cbacb5b39	[enhancement](recover) Support skipping bad tablet in select by session variable (#30241 ) In some scenarios, user has a huge amount of data and only a single replica was specified when creating the table, if one of the tablet is damaged, the table will not be able to be select. If the user does not care about the integrity of the data, they can use this variable to temporarily skip the bad tablet for querying and load the remaining data into a new table.	2024-01-24 09:59:43 +08:00
jakevin	1b9f1f6483	[feature](Planner): Push down TopNDistinct through Join (#30216 ) Push down TopNDistinct through Outer/Cross Join	2024-01-24 09:59:13 +08:00
谢健	f4a10c3fbc	[enhancement](Nereids): add builder for hyper graph (#30061 )	2024-01-24 09:58:31 +08:00
morrySnow	9a8bcf2b1b	[fix](planner) row policy rewriter generate wrong plan on join table ref (#30233 )	2024-01-23 14:11:54 +08:00
Xujian Duan	2499ca6d89	[Enhancement](plan) Optimize preagg for aggregate function (#28886 )	2024-01-23 13:22:14 +08:00
Pxl	1e74ad3f3b	[Feature](materialized-view) support predicate apprear both on key and value mv column (#30215 ) support predicate apprear both on key and value mv column	2024-01-23 13:22:14 +08:00
Gabriel	0e5d56fc2e	[pipelineX](fix) Fix use-after-free MultiCastSourceDependency (#30199 )	2024-01-23 13:22:14 +08:00
zhangdong	510d88f315	[fix](mtmv)return MTMV with at least one available partition #30156	2024-01-23 10:12:37 +08:00
HHoflittlefish777	32c5153999	[fix](routine-load) pause job when json path is invalid #30197 If jsonpaths is set wrong, routine load job will report error but running all time.For example: CREATE ROUTINE LOAD jobName ON tableName PROPERTIES ( "format" = "json", "max_batch_interval" = "5", "max_batch_rows" = "300000", "max_batch_size" = "209715200", "jsonpaths" = "[\'t\',\'a\']" ) FROM KAFKA ( "kafka_broker_list" = "$IP:PORT", "kafka_topic" = "XXX", "property.kafka_default_offsets" = "OFFSET_BEGINNING" ); Jsonpaths ['t','a'] is invalid, but job will running all time.	2024-01-23 10:12:37 +08:00
meiyi	9c742d46a2	[fix](group commit) abort txn should use label if replay wal failed (#30219 )	2024-01-23 10:12:35 +08:00
wangbo	9e0c518aaf	[Feature](executor)Workload Group support Non-Pipeline Execution (#30164 )	2024-01-23 10:11:25 +08:00
morrySnow	ce47354d59	[fix](Nereids) result nullable of sum distinct in scalar agg is wrong (#30221 )	2024-01-23 10:09:54 +08:00
yangshijie	d5d0e5e611	[feature](function) support ip functions named to_ipv4[or_default, or_null](string) and to_ipv6[or_default, or_null](string) (#29838 )	2024-01-23 10:09:54 +08:00
xy	45f6cba837	[fix](Nereids) Fixed a bug where the execution plan was incorrect after ddl (#30107 ) should only compare column name when generate data dist info of PhysicalOlapScan Co-authored-by: xingying01 <xingying01@corp.netease.com>	2024-01-23 10:09:54 +08:00
morrySnow	8061597f2a	[fix](Nereids) nullable not adjust in output exprs in result sink node (#30206 )	2024-01-23 10:09:54 +08:00
Calvin Kirs	5c43708d92	[Fix](Job)Incorrect task query result of insert type (#30024 ) - IdToTask has no persistence, so the queried task will be lost once it is restarted. - The cancel task does not update metadata after being removed from the running task. - tvf displays an error when some fields in the query task result are empty - cycle scheduling job should not be STOP when task fail	2024-01-23 10:09:54 +08:00
starocean999	24c0900b41	[fix](planner) should return outputTupleDesc's id instead of tupleIds if outputTupleDesc is set in Plan Node (#30150 )	2024-01-23 10:09:54 +08:00
zhiqiang	e5dea910bf	[feature](bitwise function) bit_count/bit_shift_left/bit_shift_right implementation (#30046 )	2024-01-23 10:09:54 +08:00
Jibing-Li	62a46876b6	[improvement](statistics) Optimize drop stats operation (#30144 ) Before, drop stats operation need to call columns * followers times of isMaster() function and the same times of rpc to drop remote column stats. This pr is to reduce the rpc calls and use more efficient way to check master node instead of using isMaster()	2024-01-23 10:09:54 +08:00
HHoflittlefish777	3e73933857	[fix](routineload) check offset when schedule tasks (#30136 )	2024-01-23 10:09:54 +08:00
HHoflittlefish777	d0dd090458	[fix](routine-load) optimize error msg when meet out of range (#30118 )	2024-01-23 10:09:54 +08:00
seawinde	9a58cacf0f	[Improvement](nereids) Make sure to catch and record exception for every materialization context (#29953 ) 1. Make sure instance when change params of StructInfo,Predicates. 2. Catch and record exception for every materialization context, this make sure that if throw exception when one materialization context rewrite, it will not influence others. 3. Support to mv rewrite when hava count function when aggregate without group by	2024-01-23 10:09:54 +08:00
jakevin	ad1c19bd65	[refactor](Nereids): Eager Aggregation unify pushdown agg function (#30142 )	2024-01-23 10:09:54 +08:00
Chester	dfde10d4c8	[improvement](function) switch inet(6)_aton alias origin function (#30196 )	2024-01-23 10:09:54 +08:00
lihangyu	4480f751e6	[Improve](Variant) support implicit cast to numeric and string type (#30029 )	2024-01-23 10:09:54 +08:00
minghong	332b9cb619	[opt](nereids) do not change RuntimeFilter Type from IN-OR_BLOOM to BLOOM on broadcast join (#30148 ) 1. do not change RuntimeFilter Type from IN-OR_BLOOM to BLOOM on broadcast join tpcds1T, q48 improved from 4.x sec to 1.x sec 2. skip some redunant runtime filter example: A join B on A.a1=B.b and A.a1 = A.a2 RF B.b->(A.a1, A.a2) however, RF(B.b->A.a2) is implied by RF(B.a->A.a1) and A.a1=A.a2 we skip RF(B.b->A.a2) Issue Number: close #xxx	2024-01-23 10:07:51 +08:00
meiyi	1bb1d35f70	[fix](group commit) Fix some group commit case (#30132 )	2024-01-23 10:07:51 +08:00
Chester	ead3b4ac1d	[feature](function) support ip function is_ipv4_compat, is_ipv4_mapped (#29954 )	2024-01-23 10:07:51 +08:00
Gabriel	caf7790797	[pipelineX](filescan) Support parallel executing for external table scanning (#30121 )	2024-01-23 10:06:59 +08:00
Xiangyu Wang	8a75da0fec	[enhance-wip](multi-catalog) Speed up consume rate of hms events. (#27666 ) ## Proposed changes The current implement will persist all catalogs/databases of external catalogs, and only the master FE can handle hms events and make all slave nodes replay these events, this will bring some problems: - The hms event processor ( `MetastoreEventsProcessor` ) can not consume the events in time. (Add journal log is a synchronized method, we can not speed up the consume rate by using concurrent processing, and each add-journal-log operation costs about tens of milliseconds) So the meta info of hive maybe out of date. - Slave FE nodes maybe crashed if FE replays the journal logs of hms events failed. (In fact we have fixed some issues about this, but we can not make sure all the issues have been resolved) - There are many journal logs which are produced by hms events, but in fact these logs are not used anymore after FE restart. It makes the start time of all FE nodes very long. Now doris try to persis all databases/tables of external catalogs just to make sure that the dbId/tableId of databases/tables are the same through all FE nodes, it will be used by analysis jobs. In this pr, we use a meta id manager called `ExternalMetaIdMgr` to manage these meta ids. On every loop when master fetches a batch of hms events, it handles the meta ids first and produce only one meta id mappings log, slave FE nodes will replay this log to sync the changes about these meta ids. `MetastoreEventsProcessor` will start on every FE nodes and try to consume these hms events as soon as possible. ## Further comments I've submitted two prs ( #22869 #21589 ) to speed up the consume rate of hms events before, it works fine when there are many `AlterTableEvent` / `DropTableEvent` on hive cluster. But the improvement is not that significant when most of hms events are partition-events. Unfortunately, we performed a cluster upgrade (upgrade spark 2.x to spark 3.x), maybe this is the reason that resulting in the majority of Hive Metastore events became partition-events. This is also the reason for the existence of this pull request. Based on our observation, after merging this pull request, Doris is now capable of processing thousands of Hive Metastore events per second, compared to the previous capability of handling only a few dozen events. ```java 2023-12-07 05:17:03,518 INFO (replayer\|105) [Env.replayJournal():2614] replayed journal id is 18287902, replay to journal id is 18287903 2023-12-07 05:17:03,735 INFO (org.apache.doris.datasource.hive.event.MetastoreEventsProcessor\|37) [MetastoreEventFactory.mergeEvents():188] Event size on catalog [xxx] before merge is [1947], after merge is [1849] 2023-12-07 05:17:03,735 INFO (org.apache.doris.datasource.hive.event.MetastoreEventsProcessor\|37) [MetastoreEvent.infoLog():193] EventId: 357955309 EventType: ALTER_PARTITION catalogName:[xxx],dbName:[xxx],tableName:[xxx],partitionNameBefore:[partitions=2022-05-27],partitionNameAfter:[partitions=2022-05-27] 2023-12-07 05:17:03,735 INFO (org.apache.doris.datasource.hive.event.MetastoreEventsProcessor\|37) [MetastoreEvent.infoLog():193] EventId: 357955310 EventType: ALTER_PARTITION catalogName:[xxx],dbName:[xxx],tableName:[xxx],partitionNameBefore:[pday=20230318],partitionNameAfter:[pday=20230318] 2023-12-07 05:17:03,735 INFO (org.apache.doris.datasource.hive.event.MetastoreEventsProcessor\|37) [MetastoreEvent.infoLog():193] EventId: 357955311 EventType: ALTER_PARTITION catalogName:[xxx],dbName:[xxx],tableName:[xxx],partitionNameBefore:[pday=20190826],partitionNameAfter:[pday=20190826] 2023-12-07 05:17:03,735 INFO (org.apache.doris.datasource.hive.event.MetastoreEventsProcessor\|37) [MetastoreEvent.infoLog():193] EventId: 357955312 EventType: ALTER_PARTITION catalogName:[xxx],dbName:[xxx],tableName:[xxx],partitionNameBefore:[partitions=2021-09-16],partitionNameAfter:[partitions=2021-09-16] 2023-12-07 05:17:03,735 INFO (org.apache.doris.datasource.hive.event.MetastoreEventsProcessor\|37) [MetastoreEvent.infoLog():193] EventId: 357955314 EventType: ALTER_PARTITION catalogName:[xxx],dbName:[xxx],tableName:[xxx],partitionNameBefore:[partitions=2020-04-26],partitionNameAfter:[partitions=2020-04-26] 2023-12-07 05:17:03,735 INFO (org.apache.doris.datasource.hive.event.MetastoreEventsProcessor\|37) [MetastoreEvent.infoLog():193] EventId: 357955315 EventType: ALTER_PARTITION catalogName:[xxx],dbName:[xxx],tableName:[xxx],partitionNameBefore:[pday=20230702],partitionNameAfter:[pday=20230702] 2023-12-07 05:17:03,735 INFO (org.apache.doris.datasource.hive.event.MetastoreEventsProcessor\|37) [MetastoreEvent.infoLog():193] EventId: 357955317 EventType: ALTER_PARTITION catalogName:[xxx],dbName:[xxx],tableName:[xxx],partitionNameBefore:[pday=20211019],partitionNameAfter:[pday=20211019] ... 2023-12-07 05:17:03,989 INFO (org.apache.doris.datasource.hive.event.MetastoreEventsProcessor\|37) [MetastoreEvent.infoLog():193] EventId: 357957252 EventType: ALTER_PARTITION catalogName:[xxx],dbName:[xxx],tableName:[xxx],partitionNameBefore:[partitions=2021-08-27],partitionNameAfter:[partitions=2021-08-27] 2023-12-07 05:17:03,989 INFO (org.apache.doris.datasource.hive.event.MetastoreEventsProcessor\|37) [MetastoreEvent.infoLog():193] EventId: 357957253 EventType: ALTER_PARTITION catalogName:[xxx],dbName:[xxx],tableName:[xxx],partitionNameBefore:[partitions=2022-02-05],partitionNameAfter:[partitions=2022-02-05] 2023-12-07 05:17:04,661 INFO (replayer\|105) [Env.replayJournal():2614] replayed journal id is 18287903, replay to journal id is 18287904 2023-12-07 05:17:05,028 INFO (org.apache.doris.datasource.hive.event.MetastoreEventsProcessor\|37) [MetastoreEventsProcessor.realRun():116] Events size are 587 on catalog [xxx] 2023-12-07 05:17:05,662 INFO (org.apache.doris.datasource.hive.event.MetastoreEventsProcessor\|37) [MetastoreEventFactory.mergeEvents():188] Event size on catalog [xxx] before merge is [587], after merge is [587] ```	2024-01-23 10:06:44 +08:00
minghong	ddeed079d4	[opt](Nereids)make orToIn rule appliable to in-pred (#29990 ) make orToIn rule appliable to in-pred	2024-01-19 15:48:56 +08:00
yangshijie	97b2a3b993	[improvement](ip function) refactor some ip functions and remove dirty codes (#30080 )	2024-01-19 15:48:56 +08:00
Jibing-Li	668a68967c	[fix](statistics)Reanalyze olapTable if getRowCount is not 0 and last time row count is 0 (#30096 ) Sample analyze may write 0 result if getRowCount is not updated while analyzing. So we need to reanalyze the table if getRowCount > 0 and previous analyze row count is 0. Otherwise the stats for this table may stay 0 for ever before user load new data to this table.	2024-01-19 15:48:56 +08:00
Calvin Kirs	ad111be2d1	[Fix](Show-Delete)Missing Delete job information causes query exception (#30092 )	2024-01-19 15:48:56 +08:00
jakevin	097641b543	[fix](Nereids): fix AssertNumRows StatsCalculator (#30053 )	2024-01-19 15:48:15 +08:00
谢健	f1462f6cf4	[fix](Nereids): eliminate redundant join condition after inferring condition (#30093 ) eliminate redundant join when find hashing join condition such as for plan: ``` T1 join T2 on T1.id = T2.id join T3 on T1.id = T3.id and T2.id = T3.id ``` we infer a new predicate T1.id = T2.id which is redundant. Therefore we need to eliminate it when find hash condition	2024-01-19 15:48:15 +08:00
morrySnow	7d3a3fee65	[fix](Nereids) update assignment column name should case insensitive (#30071 )	2024-01-19 15:48:15 +08:00
Pxl	2ccb69dbed	[Feature](materialized-view) support some case unmached to materialized-view (#30036 ) same column appears in key and value like select id,count(id) group by id; complex expr in sum select sum(if(xxx));	2024-01-18 12:03:07 +08:00
zy-kkk	0ccd706a30	[Enhancement](Jdbc Catalog) Map Jdbc Catalog JSON Type to String for Improved Performance and Compatibility (#30035 ) This PR proposes mapping external catalog JSON types to String instead of JsonB in Apache Doris. This change is motivated by the realization that JDBC retrieves JSON data as a String JSON string, regardless of its storage format (Json(String) or Json(Binary)). Mapping to String streamlines data retrieval, simplifies write-backs, and ensures compatibility with all JSON(String) and JSON(Binary) functions, despite potentially misleading displays of JSON data as Strings in Doris. This approach avoids the performance overhead and complexity of converting each row of data from JsonB to String, making the process more efficient and elegant. About Upgrade To ensure query compatibility with existing Catalogs in the upgraded version,we currently still retain the capability to query external JSON types as JSONB. However, once you upgrade to the new version and either refresh the Catalog or create a new one, all external JSON types will be treated as Strings. To ensure consistent behavior,and possible future removal of support for JSON as JSONB query code, it is highly recommended that you manually refresh your Catalog as soon as possible after upgrading to the new version.	2024-01-18 12:03:07 +08:00

1 2 3 4 5 ...

5928 Commits