Commit Graph

8289 Commits

c68b353017 [feature][insert]add FE UT and support CTAS for external table (#32525)
1. add FE UT for creating Hive tables
2. support CTAS for external tables:

> source table:
```
mysql> show create table hive.jz3.test;

CREATE TABLE `test`(
  `id` int COMMENT 'col1',
  `name` string COMMENT 'col2')
PARTITIONED BY (
 `dt` string,
 `dtm` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/test'
TBLPROPERTIES (
  'transient_lastDdlTime'='1710837792',
  'file_format'='orc')
```


> create an unpartitioned target table
```
mysql> create table hive.jz3.ctas engine=hive as select * from hive.jz3.test;
mysql> show create table ctas;

CREATE TABLE `ctas`(
  `id` int COMMENT '',
  `name` string COMMENT '',
  `dt` string COMMENT '',
  `dtm` string COMMENT '')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/ctas'
TBLPROPERTIES (
  'transient_lastDdlTime'='1710860377')

```


> create a partitioned target table
```
mysql> create table hive.jz3.ctas1 engine=hive partition by list (dt,dtm) () as select * from hive.jz3.test;
mysql> show create table hive.jz3.ctas1;

CREATE TABLE `ctas1`(
  `id` int COMMENT '',
  `name` string COMMENT '')
PARTITIONED BY (
 `dt` string,
 `dtm` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/ctas1'
TBLPROPERTIES (
  'transient_lastDdlTime'='1710919070')
```
2024-04-12 09:58:49 +08:00
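As a rough illustration of the two CTAS cases above, here is a Python sketch (function and names are hypothetical, not the Doris FE code) of how the target column lists can be derived from the source schema: when the target declares no partition columns, the source's partition columns become ordinary columns of the target.

```python
def ctas_target_columns(source_cols, source_partition_cols, target_partition_cols):
    """Derive (regular columns, partition columns) for a CTAS target.

    Hypothetical helper: when the target declares no partitions, the
    source's partition columns fold back into the regular column list.
    """
    all_cols = list(source_cols) + list(source_partition_cols)
    regular = [c for c in all_cols if c not in target_partition_cols]
    return regular, list(target_partition_cols)
```

For the unpartitioned `ctas` example this yields `id, name, dt, dtm` as regular columns; for `ctas1`, `dt` and `dtm` move back into the partition list.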
36a1bf1d73 [feature][insert]Adapt the create table statement to the Nereids SQL (#32458)
issue: #31442

1. adapt the create table statement from Doris to Hive
2. fix insert overwrite for the table sink

> The Doris create Hive table statement:

```
mysql> CREATE TABLE buck2(
    ->     id int COMMENT 'col1',
    ->     name string COMMENT 'col2',
    ->     dt string COMMENT 'part1',
    ->     dtm string COMMENT 'part2'
    -> ) ENGINE=hive
    -> COMMENT "create tbl"
    -> PARTITION BY LIST (dt, dtm) ()
    -> DISTRIBUTED BY HASH (id) BUCKETS 16
    -> PROPERTIES(
    ->     "file_format" = "orc"
    -> );
```

> generated Hive create table statement:

```
CREATE TABLE `buck2`(
  `id` int COMMENT 'col1',
  `name` string COMMENT 'col2')
PARTITIONED BY (
 `dt` string,
 `dtm` string)
CLUSTERED BY (
  id)
INTO 16 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/buck2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1710840747',
  'doris.file_format'='orc')

```
2024-04-12 09:57:37 +08:00
dc8da9ee89 [Fix](nereids) fix qualifier problem that affects delete stmt in another catalog (#33528)
* [Fix](nereids) fix qualifier problem that affects delete stmt in another catalog

---------

Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-04-11 21:43:01 +08:00
3d66723214 [branch-2.1](auto-partition) pick auto partition and some more prs (#33523) 2024-04-11 17:12:17 +08:00
ff38e7c497 [log](chore) print isBad in Replica::toString() (#33427) 2024-04-11 09:31:50 +08:00
b5a84f7d23 Fix alter column stats without min max value deserialize failure. (#33406) 2024-04-11 09:31:50 +08:00
Pxl
3070eda58c [Bug](load) fix stream load file on hll type mv column (#33373)
fix stream load file on hll type mv column
2024-04-11 09:31:50 +08:00
f35dd3fc35 [chore](test) let some case suitable for legacy planner and nereids (#33352) 2024-04-11 09:31:50 +08:00
a38b97fbdd [bugfix](profile) should use backend ip:heartbeat port as key during merge profile (#33368) 2024-04-11 09:31:50 +08:00
2708641bee [Fix]fix insert overwrite non-partition table null pointer exception (#33205)
fix a legacy planner bug when insert overwrite targets a non-partitioned table.
2024-04-11 09:31:50 +08:00
38b2e58d59 [Improvement](executor)cancel query when a query is queued (#33339) 2024-04-11 09:31:50 +08:00
326eee5d04 [Fix](schema change) Fix schema change fault when add complex type column (#31824)
Problem: An error is encountered when executing a schema change on a unique table to add a column with a complex type, such as bitmap, as documented in https://github.com/apache/doris/issues/31365

Reason: The error arises because the schema change logic erroneously applies an aggregation check for new columns across all table types, demanding an explicit aggregation type declaration. However, unique and duplicate tables inherently assume default aggregation types for newly added columns, leading to this misstep.

Solution: The schema change process for introducing new columns needs to distinguish between table types accurately. For unique and duplicate tables, it should automatically assign the appropriate aggregation type, which, for the purpose of smooth integration with subsequent processes, should be set to NONE.
2024-04-11 09:31:50 +08:00
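The table-type distinction described in the solution above can be sketched as follows (hypothetical names, assuming a NONE default; this is not the actual Doris schema-change code):

```python
from enum import Enum

class KeysType(Enum):
    UNIQUE = "UNIQUE"
    DUPLICATE = "DUPLICATE"
    AGGREGATE = "AGGREGATE"

def resolve_agg_type(keys_type, declared_agg_type):
    """Resolve the aggregation type of a newly added column.

    Aggregate tables must declare one explicitly; unique and duplicate
    tables assume a default, set to NONE for downstream compatibility.
    """
    if keys_type == KeysType.AGGREGATE:
        if declared_agg_type is None:
            raise ValueError("aggregation type required for aggregate table")
        return declared_agg_type
    # unique / duplicate tables: implicit default
    return declared_agg_type or "NONE"
```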
Pxl
3081fc584d [Improvement](runtime-filter) support syncing the join node build side's size to initialize the bloom runtime filter (#32180)
support syncing the join node build side's size to initialize the bloom runtime filter
2024-04-11 09:31:50 +08:00
75f497976c [opt](Nereids) auto fallback when querying an unsupported table type (#33357) 2024-04-10 16:24:13 +08:00
ecbd92204d [fix](Nereids) variant push down not work on slot without table (#33356) 2024-04-10 16:23:41 +08:00
0e262ba0e4 [improvement](spill) improve cancel of spill and improve log printing (#33229)
* [improvement](spill) improve cancel of spill and improve log printing

* fix
2024-04-10 16:23:20 +08:00
045dd05f2a [fix](Nereids): don't transpose agg and join if join is mark join (#33312) 2024-04-10 16:23:20 +08:00
Pxl
6462264e77 [Improvement](materialized-view) adjust priority of materialized view match rule (#33305)
adjust priority of materialized view match rule
2024-04-10 16:23:04 +08:00
4eee1a1f0d [fix](nereids) make runtime filter targets in fixed order (#33191)
* make runtime filter targets in fixed order
2024-04-10 16:22:39 +08:00
159ebc76e7 [fix](npe) fix kafka be id npe (#33151) 2024-04-10 16:22:27 +08:00
741d4ff97e [fix](group commit) Fix syntax error when insert into table which column names contain keyword (#33322) 2024-04-10 16:22:09 +08:00
4079a7b6ab [fix](txn insert) Fix txn insert into values for sequence column or column name is keyword (#33336) 2024-04-10 16:21:31 +08:00
29777bc3a8 [fix](fe)reduce memory usage in alter (#32810) (#33474)
Co-authored-by: kylinmac <kylinmac@163.com>
2024-04-10 16:04:50 +08:00
d1099852b5 [fix](Nereids) partial update generate column in wrong way (#33326)
introduced by PR #31461
2024-04-10 16:02:54 +08:00
f31e273ae8 [fix](Nereids) variant column prune push down failed on variant literal (#33328) 2024-04-10 16:02:54 +08:00
93b20f0cc4 [chore](Nereids) create policy always allow fallback (#33226) 2024-04-10 16:01:58 +08:00
bcc819ddd9 [fix](Nereids) array_range not support amount without unit (#33231) 2024-04-10 16:01:58 +08:00
b8d4a87703 [chore](Nereids) load command always could fallback (#33233) 2024-04-10 16:00:53 +08:00
14c5247fb7 [feature](replica) support force set replicate allocation for olap tables (#32916)
Add a config to force set replication allocation for all OLAP tables and partitions.
2024-04-10 16:00:15 +08:00
Pxl
2092a862fc [Bug](materialized-view) fix wrong result when alias name is the same as a base slot on mv (#33198)
fix wrong result when the alias name is the same as a base slot on the mv
2024-04-10 16:00:05 +08:00
2785269d36 [Improvement](executor)Add BypassWorkloadGroup to pass query queue #33101 2024-04-10 15:56:41 +08:00
16f8afc408 [refactor](coordinator) split profile logic and instance report logic (#32010)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-04-10 15:51:32 +08:00
96867ff3fd [fix](Nereids) support update without filter (#33214) 2024-04-10 15:26:09 +08:00
b696909775 [fix](plsql) Fix plsql variable initialization (#33186) 2024-04-10 15:26:09 +08:00
edd1701963 [fix](Nereids) convert agg state type failed in some cases (#33208) 2024-04-10 15:26:09 +08:00
5e59c09a60 [Fix](nereids) modify the binding aggregate function in order by (#32758)
modify the binding logic so that ORDER BY behaves the same as MySQL when the sort child is an aggregate.
When an ORDER BY expression contains an aggregate function, all slots in that expression should first bind to the LogicalAggregate's non-aggregate-function outputs, then to the LogicalAggregate's child.
e.g.
select 2*abs(sum(c1)) as c1, c1, sum(c1)+c1 from t_order_by_bind_priority group by c1 order by sum(c1)+c1 asc;
In this SQL, both occurrences of c1 in the ORDER BY bind to the c1 column of t_order_by_bind_priority
2024-04-10 15:26:09 +08:00
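The two-phase binding priority described above can be sketched as follows (a hypothetical helper, not the actual Nereids binder; aliases containing aggregate functions are assumed to be excluded from the first map):

```python
def bind_order_by_slot(name, agg_output_aliases, agg_child_slots):
    """Bind a slot appearing in ORDER BY.

    Prefer the aggregate's non-aggregate-function outputs (aliases),
    then fall back to the aggregate's child slots.
    """
    if name in agg_output_aliases:
        return agg_output_aliases[name]
    if name in agg_child_slots:
        return agg_child_slots[name]
    raise KeyError(f"unbound slot: {name}")
```

In the quoted query, `2*abs(sum(c1)) as c1` contains an aggregate function, so it is not a candidate in the first phase, and `c1` binds to the child's column.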
67bb519613 [Fix](nereids) forward the user define variables to master (#33013) 2024-04-10 15:26:08 +08:00
6798a24a27 [Enhancement](Nereids) reduce child output rows if agg child is literal (#32188)
with group by:
select max(1) from t1 group by c1; -> select 1 from (select c1 from t1 group by c1);
without group by:
select max(1) from t1; -> select max(1) from (select 1 from t1 limit 1) tmp;
2024-04-10 15:26:08 +08:00
0ab8b57db7 [enhance](mtmv)support create mtmv with other mtmv (#32984) 2024-04-10 15:26:08 +08:00
77ad3f6a19 [feature](hive) Get updated information from the coordinator and commit (#32441) (#33466)
issue: #31442
1. get updated information from the coordinator and commit
2. refresh the table after commit

Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
2024-04-10 15:07:18 +08:00
9bc7902e5a [fix](Nereids) fix bind group by int literal (#33117)
This SQL fails because:

    the 2 in the GROUP BY binds to `1 AS col2` in BindExpression
    ResolveOrdinalInOrderByAndGroupBy then replaces that 1 with MIN(LENGTH(cast(age as varchar)))
    CheckAnalysis throws an exception because GROUP BY cannot contain an aggregate function

select MIN(LENGTH(cast(age as varchar))), 1 AS col2
from test_bind_groupby_slots
group by 2

we should move ResolveOrdinalInOrderByAndGroupBy into BindExpression

(cherry picked from commit 3fab4496c3fefe95b4db01f300bf747080bfc3d8)
2024-04-10 14:59:46 +08:00
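Resolving ordinals before alias binding, as the fix above proposes, can be sketched like this (illustrative only; names do not match the Nereids rules):

```python
def resolve_group_by_ordinals(group_by, select_exprs):
    """Replace integer ordinals in GROUP BY with the corresponding
    select-list expression (1-based) before any alias binding runs."""
    resolved = []
    for item in group_by:
        if isinstance(item, int):
            if not 1 <= item <= len(select_exprs):
                raise ValueError(f"GROUP BY ordinal {item} out of range")
            # ordinal k refers to the k-th select item, not a literal k
            resolved.append(select_exprs[item - 1])
        else:
            resolved.append(item)
    return resolved
```

With ordinals resolved first, `group by 2` maps to the second select item (`1 AS col2`) directly, so the later literal-to-expression substitution never mis-fires.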
cc363f26c2 [fix](Nereids) fix group concat (#33091)
Fix a failure in regression_test/suites/query_p0/group_concat/test_group_concat.groovy

select
group_concat( distinct b1, '?'), group_concat( distinct b3, '?')
from
table_group_concat
group by
b2

exception:

lowestCostPlans with physicalProperties(GATHER) doesn't exist in root group

The root cause: the '?' literal is pushed down to a slot by NormalizeAggregate; AggregateStrategies then treats that slot as a distinct parameter and generates an invalid PhysicalHashAggregate, which is rejected by ChildOutputPropertyDeriver.

I fix this bug by avoiding pushing literals down to slots in NormalizeAggregate, and by forbidding generation of a streaming aggregate node when the group-by slot list is empty
2024-04-10 14:59:46 +08:00
38d580dfb7 [fix](Nereids) fix link children failed (#33134)
#32617 introduced a bug: the rewrite may not work when a plan's arity >= 3.
This PR fixes it.

(cherry picked from commit 8b070d1a9d43aa7d25225a79da81573c384ee825)
2024-04-10 14:59:45 +08:00
ff990eb869 [enhancement](Nereids) refactor expression rewriter to pattern match (#32617)
This PR improves the performance of the Nereids planner in the plan stage.

1. refactor the expression rewriter to pattern matching, so that the many expression rewrite rules can be applied criss-cross in one big bottom-up iteration, rewriting until the expression becomes stable. We can now handle more cases, because originally there was no loop, and some rules only processed the top expression, like `SimplifyArithmeticRule`.
2. replace `Collection.stream()` with `ImmutableXxx.Builder` to avoid useless method calls
3. loop-unroll some code, like `Expression.<init>`, `PlanTreeRewriteBottomUpJob.pushChildrenJobs`
4. use type/arity-specific code, like `OneRangePartitionEvaluator.toNereidsLiterals()`, `PartitionRangeExpander.tryExpandRange()`, `PartitionRangeExpander.enumerableCount()`
5. refactor `ExtractCommonFactorRule`: we can now extract more cases, and I fixed the dead loop when using `ExtractCommonFactorRule` and `SimplifyRange` in one iteration, because `SimplifyRange` generates a right-deep tree while `ExtractCommonFactorRule` generates a left-deep tree
6. refactor `FoldConstantRuleOnFE` to support both visitor and pattern-match modes: in ExpressionNormalization, pattern matching can be applied criss-cross with the other rules; in PartitionPruner, the visitor can evaluate expressions faster
7. lazily compute and cache some operations
8. use an int field to compare dates
9. use a BitSet to find disableNereidsRules
10. a two-level loop is usually faster than building a Multimap when binding slots in Scope, so I reverted that code
11. `PlanTreeRewriteBottomUpJob` no longer needs to clearStatePhase

### test case
100 threads in parallel continuously send this SQL, which queries an empty table; tested on my Mac (M2 chip, 8 cores) with SQL cache enabled
```sql
select  count(1),date_format(time_col,'%Y%m%d'),varchar_col1
from tbl
where  partition_date>'2024-02-15'  and (varchar_col2 ='73130' or varchar_col3='73130') and time_col>'2024-03-04'
  and  time_col<'2024-03-05'
group by date_format(time_col,'%Y%m%d'),varchar_col1
order by date_format(time_col,'%Y%m%d') desc, varchar_col1 desc,count(1) asc
limit 1000
```

before this pr: 3100 peak QPS, about 2700 avg QPS
after this pr: 4800 peak QPS, about 4400 avg QPS

(cherry picked from commit 7338683fdbdf77711f2ce61e580c19f4ea100723)
2024-04-10 14:59:45 +08:00
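Item 5's common-factor extraction, e.g. `(a AND b) OR (a AND c)` -> `a AND (b OR c)`, can be sketched over sets of conjuncts (a toy model, not the Nereids implementation):

```python
def extract_common_factors(disjuncts):
    """Given a disjunction where each disjunct is a frozenset of
    conjunct names, split out the factors common to every disjunct:
    (a AND b) OR (a AND c)  ->  common {a}, remainder [{b}, {c}]."""
    common = frozenset.intersection(*disjuncts)
    remainder = [d - common for d in disjuncts]
    return common, remainder
```

The real rule additionally has to rebuild a canonical (left-deep) expression tree, which is where the dead-loop interaction with `SimplifyRange` came from.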
c61d6ad1e2 [Feature] support function uuid_to_int and int_to_uuid #33005 2024-04-10 14:53:56 +08:00
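`uuid_to_int` / `int_to_uuid` can be modeled as the standard bijection between a canonical UUID string and its 128-bit integer value; a Python sketch (not the Doris implementation) using the stdlib uuid module:

```python
import uuid

def uuid_to_int(u: str) -> int:
    """Map a canonical UUID string to its 128-bit integer value."""
    return uuid.UUID(u).int

def int_to_uuid(n: int) -> str:
    """Inverse mapping: 128-bit integer back to canonical UUID text."""
    return str(uuid.UUID(int=n))
```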
bf022f9d8d [enhancement](function truncate) truncate can use column as scale argument (#32746)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-04-10 14:53:56 +08:00
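Truncation with a per-row scale, as enabled above, can be sketched in Python (illustrative semantics under float arithmetic; the actual function follows Doris's decimal truncate rules):

```python
import math

def truncate(value: float, scale: int) -> float:
    """Truncate `value` toward zero at `scale` decimal places."""
    factor = 10 ** scale
    return math.trunc(value * factor) / factor
```

With a column as the scale argument, this simply runs per row, e.g. `[truncate(v, s) for v, s in zip(values, scales)]`.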
a69f3eb870 [fix](fe) partitionInfo is null, fe can not start (#33108) 2024-04-10 14:53:56 +08:00
02b24abed2 [Fix](Nereids) ntile function should check argument (#32994)
Problem:
when ntile is called with 0 as its parameter, the BE would core dump because the parameter was not checked
Solution:
check the parameter during FE analysis
2024-04-10 14:53:56 +08:00
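The FE-side argument check described above can be sketched as follows (hypothetical helper name, not the actual Doris analyzer):

```python
def check_ntile_argument(buckets) -> int:
    """ntile(n) requires a positive integer bucket count; reject it at
    analysis time instead of letting the BE crash on n <= 0."""
    if not isinstance(buckets, int) or buckets <= 0:
        raise ValueError(f"ntile expects a positive integer, got {buckets!r}")
    return buckets
```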
a7c8abe58c [feature](nereids) support common sub expression by multi-layer projections (fe part) (#33087)
* cse fe part
2024-04-10 14:53:56 +08:00
b0b5f84e40 [feature](load) support compressed JSON format data for broker load (#30809) 2024-04-10 14:20:53 +08:00