doris

Author	SHA1	Message	Date
TengJianPing	79289e32dc	[fix](cast) fix wrong result of casting empty string to array date (#22281 )	2023-07-30 21:15:03 +08:00
Jibing-Li	03761c37cd	[Improvement](multi catalog) Support Iceberg, Paimon and MaxCompute table in nereids. (#22338 )	2023-07-29 21:43:35 +08:00
Mryange	47c2cc5c74	[vectorized](udf) java udf support with return map type (#22300 )	2023-07-29 12:52:27 +08:00
daidai	ae8a26335c	[opt](hive)opt select count() stmt push down agg on parquet in hive . (#22115 ) Optimize the "select count() from table" statement by pushing the "count" type down to BE. Supported file types: parquet, orc in hive. 1. 4k files, 60kw lines: before 1 min 37.70 sec, after 50.18 sec. 2. 50 files, 60kw lines: before 1.12 sec, after 0.82 sec.	2023-07-29 00:31:01 +08:00
zhannngchen	53d255f482	[fix](partial update) remove CHECK on illegal number of partial columns (#22319 )	2023-07-28 23:11:58 +08:00
xzj7019	f7c106c709	[opt](nereids) enhance broadcast join cost calculation (#22092 ) Enhance broadcast join cost calculation by considering both the build-side effort of building a bigger hash table and the additional probe-side effort from the higher cost of ProbeWhenBuildSideOutput and ProbeWhenSearchHashTable when parallel_fragment_exec_instance_num is more than 1. The current solution applies a penalty factor to rightRowCount; the factor is the total instance number to the power of 2. A penalty on outputRows is not applied currently and will be refined in the next-generation cost model. This also brings some updates for shape checking: update the original control variable in the shape file from parallel_fragment_exec_instance_num to parallel_pipeline_task_num if pipeline is enabled, and fix an issue where the be_number variable was inactive.	2023-07-28 23:06:02 +08:00
Kaijie Chen	2f43e59535	[test](regression) add partial update seq_col delete cases (#22340 )	2023-07-28 17:36:55 +08:00
HHoflittlefish777	05abfbc5ef	[improvement](regression-test) add compression algorithm regression test (#22303 )	2023-07-28 17:28:52 +08:00
starocean999	5a0ad09856	[fix](nereids) SubqueryToApply may lost conjunct (#22262 ) Consider the SQL: ``` SELECT * FROM sub_query_correlated_subquery1 t1 WHERE coalesce(bitand( cast( (SELECT sum(k1) FROM sub_query_correlated_subquery3 ) AS int), cast(t1.k1 AS int)), coalesce(t1.k1, t1.k2)) is NULL ORDER BY t1.k1, t1.k2; ``` The `is NULL` conjunct is lost in the SubqueryToApply rule. This PR fixes it.	2023-07-28 15:08:56 +08:00
bobhan1	0c734a861e	[Enhancement](delete) eliminate reading the old values of non-key columns for delete stmt (#22270 )	2023-07-28 14:37:33 +08:00
morrySnow	5da5fac37a	[refactor](Nereids) add result sink node (#22254 ) Use ResultSink as the query root node so that the plan of a query statement has the same pattern as that of an insert statement.	2023-07-28 11:31:09 +08:00
zhangy5	adc44d9f46	[regression-test] add list partition case and multi partition keys case (#22042 ) * [regression-test] add list partition case and multi partition keys case * fix delete failed	2023-07-28 10:12:35 +08:00
Ashin Gau	0d7d9b92db	[fix](multi-catalog) complex types parsing failed, with unexpected nulls and rows (#22228 ) Fix two bugs: 1. Unexpected null values in an array column. If 65535 consecutive values are not null in a nullable array column, this error is triggered. The reason is that the array parser did not handle boundary conditions. 2. The number of rows of the key field and that of the value field in a map column are not equal. Similarly, the numbers of rows among the fields in a struct column are not the same. This is triggered when the numbers of rows are not equal among the parquet pages of different columns in a row group.	2023-07-28 10:03:08 +08:00
Qi Chen	8caa5a9ba4	[Fix](mutli-catalog) Fix null partitions error in iceberg tables. (#22185 ) ### Issue When a table has null partitions, it throws the error `Failed to fill partition column: t_int=null`. ### Resolution - Fix this null partition error in iceberg tables by replacing the null partition with '\N'. - Add a regression test for hive null partitions.	2023-07-27 23:57:35 +08:00
Jerry Hu	b5fa29e138	[fix](bitmap) incorrect result of function 'bitmap_from_array' (#22305 )	2023-07-27 22:44:06 +08:00
谢健	716d58f5ff	[fix](Nereids) decimal divide should not return null if numerator is zero (#22309 )	2023-07-27 20:23:04 +08:00
Jibing-Li	a87d34b19b	[Fix](multi catalog statistics)Improve external table statistics collection (#22224 ) Improve external table statistics collection, including logging and observability, and fix some bugs. 1. Add a Running state for statistics jobs. 2. Add progress for show analyze job (n/m tasks finished, n/m tasks failed and so on). 3. Add analyze time cost for show analyze task. 4. Make task failure messages clearer. 5. Synchronize the job status updating code in updateTaskStatus. 6. Fix NPE in HMSAnalyzeTask (avoid refreshing the statistics cache if the collection sql failed). 7. Return an error message for `with sync` collection on timeout. 8. Log level improvement. 9. Fix misuse of logCreateAnalysisJob for tasks.	2023-07-27 20:01:14 +08:00
morrySnow	ae5e39ad26	[opt](Nereids) add double signature back for round like function (#22284 ) add double signature back for round like function	2023-07-27 19:10:43 +08:00
lsy3993	6f1c03c766	[fix](jdbc_catalog) fix int and bigint in mysql view when use doris catalog (#22251 )	2023-07-27 16:50:42 +08:00
Kaijie Chen	0512e0b168	[test](regression) add cases for partial update with sequence_type (#22215 )	2023-07-27 15:51:01 +08:00
lsy3993	4f6a3c5bf0	[feature](catalog) support clob type in oracle jdbc catalog (#21532 )	2023-07-27 15:49:15 +08:00
zhangstar333	ddfdf62993	[opt](planner) support to parse scientific notation(aEb) (#22248 )	2023-07-27 13:31:37 +08:00
wuwenchi	41a230b721	[fix] iceberg catalog to specify the version and time (#22209 ) Problem: 1. Create an iceberg_type catalog. 2. Use the iceberg catalog to specify a version: ``` mysql> show catalog iceberg; +----------------------+--------------------------+ \| Key \| Value \| +----------------------+--------------------------+ \| type \| iceberg \| \| iceberg.catalog.type \| hms \| \| hive.metastore.uris \| thrift://127.0.0.1:9083 \| \| hadoop.username \| hadoop \| \| create_time \| 2023-07-25 16:51:00.522 \| +----------------------+--------------------------+ 5 rows in set (0.02 sec) mysql> select * from iceberg.iceberg_db.tb1 FOR VERSION AS OF 8783036402036752909; ERROR 5090 (42000): errCode = 2, detailMessage = Only iceberg/hudi external table supports time travel in current version ``` Change: Add the `ICEBERG_EXTERNAL_TABLE` type for specifying the version and time.	2023-07-27 12:04:41 +08:00
zy-kkk	619a2857e1	[improvement](jdbc catalog) improve mysql jdbc catalog read bytea`s types & else improve (#22233 )	2023-07-27 10:18:37 +08:00
Gabriel	341c45974c	[round](decimalv2) round precise decimalv2 value (#22258 )	2023-07-27 10:00:36 +08:00
Xinyi Zou	163a38a527	[opt](Nereids) support sql cache (#22144 ) 1. Let Nereids support sql cache. 2. Let the legacy planner's sql cache support union all.	2023-07-27 09:57:31 +08:00
Siyang Tang	8fb28ecc9e	[test](partial-update) add some cases for partial-update (#22210 )	2023-07-27 09:52:40 +08:00
HHoflittlefish777	dcd6844ea5	[improvement](regression-test) add partial update with schema change case (#22213 )	2023-07-27 09:51:42 +08:00
zhangstar333	fb41265c27	[opt](Nereids) add boolean type signature for sum aggregate function (#21959 )	2023-07-27 09:41:19 +08:00
TengJianPing	8ff487cc4b	[fix](cast) fix invalid value error when casting null date value to string then casting to date value (#22223 )	2023-07-26 17:59:01 +08:00
morrySnow	14dcc53135	[fix](Nereids) cast time should turn nullable on all valid types (#22242 ) valid types to cast to time/timev2: - TINYINT - SMALLINT - INT - BIGINT - LARGEINT - FLOAT - DOUBLE - CHAR - VARCHAR - STRING	2023-07-26 17:56:19 +08:00
bobhan1	be69025878	[opt](Nereids) add partial update support for delete stmt (#22184 ) Currently, the new optimizer doesn't consider partial update at all. This PR adds the ability to convert a delete statement to a partial update insert statement for merge-on-write unique tables.	2023-07-26 17:34:31 +08:00
jakevin	bb67a1467a	[fix](Nereids): mergeGroup should merge target Group into existed Group (#22123 )	2023-07-26 13:13:25 +08:00
morrySnow	21a3593a9a	[fix](Nereids) translate failed when enable topn two phase opt (#22197 ) 1. Should not add the rowid slot to resolvedTupleExprs. 2. Should set notMaterialize on the sort's tuple when doing two phase opt.	2023-07-26 11:38:50 +08:00
zy-kkk	cf677b327b	[fix](jdbc catalog) Fixed mappings with type errors for bool and tinyint(1) (#22089 ) MySQL does not have a boolean type; its boolean type is actually tinyint(1). In the previous logic, we forced tinyint(1) to be a boolean by passing tinyInt1isBit=true, which causes an error if a tinyint(1) value is not 0 or 1. Therefore, we need to match tinyint(1) as tinyint instead of boolean. This change does not affect the correctness of where k = 1 or where k = true queries.	2023-07-25 22:45:22 +08:00
zhengyu	5c8eda8685	[enhencement](regression) add UPDATE & DELETE tests for MOW partial update (#22212 )	2023-07-25 22:03:38 +08:00
airborne12	fc2b9db0ad	[Feature](inverted index) add tokenize function for inverted index (#21813 ) This PR introduces the TOKENIZE function for inverted index. It is used as follows: ``` SELECT TOKENIZE('I love my country', 'english'); ``` It takes two arguments: the first is the text to be tokenized, and the second is the parser type, which can be english, chinese or unicode. It can also be used with an existing table, like this: ``` mysql> SELECT TOKENIZE(c,"chinese") FROM chinese_analyzer_test; +---------------------------------------+ \| tokenize(`c`, 'chinese') \| +---------------------------------------+ \| ["来到", "北京", "清华大学"] \| \| ["我爱你", "中国"] \| \| ["人民", "得到", "更", "实惠"] \| +---------------------------------------+ ```	2023-07-25 15:05:35 +08:00
mch_ucchi	d96e31c4d7	[opt](Nereids) not push down global limit to avoid early gather (#21891 ) The global limit creates a gather action, so all the data is computed in one instance. If we push down the global limit, the nodes that run after the limit node will run slowly. We fix it by pushing down only the local limit. A join plan tree before fixing: ``` LogicalLimit(global) LogicalLimit(local) Plan() LogicalLimit(global) LogicalLimit(local) LogicalJoin LogicalLimit(global) LogicalLimit(local) Plan() LogicalLimit(global) LogicalLimit(local) Plan() after fixing: LogicalLimit(global) LogicalLimit(local) Plan() LogicalLimit(local) LogicalJoin LogicalLimit(local) Plan() LogicalLimit(local) Plan() ```	2023-07-25 14:45:20 +08:00
bobhan1	2b4bfe5be7	[fix](autoinc) fix `_fill_auto_inc_cols` when the input column is `ColumnConst` (#22175 )	2023-07-25 14:41:36 +08:00
YueW	c01230f99a	[fix](match) Optimize the logic for match_phrase function filter (#21622 )	2023-07-25 14:22:37 +08:00
Mryange	0f439bb1ca	[vectorized](udf) java udf support map type (#22059 )	2023-07-25 11:56:20 +08:00
Jerry Hu	b41fcbb783	[feature](agg) add the aggregation function 'mag_agg' (#22043 ) New aggregation function: map_agg. This function requires two arguments: a key and a value, which are used to build a map. select map_agg(column1, column2) from t group by column3;	2023-07-25 11:21:03 +08:00
Gabriel	a0463ea047	[round](decimalv2) round decimalv2 to precision value (#22138 ) * [round](decimalv2) round decimalv2 to precision value * update * update`	2023-07-25 03:29:48 +08:00
Qi Chen	752cec9e19	[Fix](multi-catalog) Fix not single slot filter conjuncts with dict filter issue. (#22052 ) ### Issue Dictionary filtering is a mechanism that directly reads the dictionary encoding of a single string column filter condition for filter comparison. But dictionary-filtered single string columns may also be included in other multi-column filter conditions. This can cause problems. For example: `select * from multi_catalog.lineitem_string_date_orc where l_commitdate < l_receiptdate and l_receiptdate = '1995-01-01' order by l_orderkey, l_partkey, l_suppkey, l_linenumber limit 10;` `l_receiptdate` is a string filter column, and it is also included in the multi-column filter condition `l_commitdate < l_receiptdate`. ### Solution Resolve it by separating out the multi-column filter conditions and executing them after the dictionary filter column is converted to string.	2023-07-24 22:31:18 +08:00
morrySnow	21deb57a4d	[fix](Nereids) remove double sigature of ceil, floor and round (#22134 ) We converted input parameters to double for the functions ceil, floor and round because DecimalV2 could not perform these operations. Since DecimalV3 was introduced, we should convert all parameters to DecimalV3 to get the correct result. For example, when we use double as parameters, we get a wrong result: ```sql select round(341/20000,4),341/20000,round(0.01705,4); +-------------------------+---------------+-------------------+ \| round((341 / 20000), 4) \| (341 / 20000) \| round(0.01705, 4) \| +-------------------------+---------------+-------------------+ \| 0.017 \| 0.01705 \| 0.0171 \| +-------------------------+---------------+-------------------+ ``` DecimalV3 gets the correct result: ```sql select round(341/20000,4),341/20000,round(0.01705,4); +-------------------------+---------------+-------------------+ \| round((341 / 20000), 4) \| (341 / 20000) \| round(0.01705, 4) \| +-------------------------+---------------+-------------------+ \| 0.0171 \| 0.01705 \| 0.0171 \| +-------------------------+---------------+-------------------+ ```	2023-07-24 16:08:00 +08:00
morrySnow	ac9480123c	[refactor](Nereids) push down all non-slot order key in sort and prune them upper sort (#22034 ) According to the implementation in the execution engine, all order keys in SortNode will be output. We must normalize LogicalSort accordingly. We push down all non-slot order keys in sort to materialize them behind sort. As a result, all order keys are slots and SortNode does not need to do projection itself. This simplifies the translation of SortNode by avoiding the generation of resolvedTupleExprs and sortTupleDesc.	2023-07-24 15:36:33 +08:00
xzj7019	b5f27b5349	[enhance](nereids) enable wf partition topn by default (#21860 )	2023-07-24 14:21:45 +08:00
minghong	138e6c2f01	[stats](nereids)keep min/max expr in colstats (#22064 ) columnStatistics.minExpr and maxExpr are useful when we derive stats for the cast function. This PR 1. maintains the min/max expr during stats derivation in filter conditions: col<literal, col>literal and col=literal; 2. adjusts the column stats range for the cast function (currently only casts from string to other types are supported). ds9 is changed, but with no performance issue: on tpcds_sf100_rf the execution time is 1.5~1.6 sec, the same as master.	2023-07-24 10:28:36 +08:00
Xin Liao	4f0158c458	[fix](partial-update) fix update core for merge-on-write table (#22090 )	2023-07-23 13:35:08 +08:00
Pxl	ae809fbeba	[Bug](storage )fix dead lock when create_tablet need lock two tablet && update mv_p0… (#21969 ) fix dead lock when create_tablet need lock two tablet && update mv_p0/ssb case	2023-07-22 15:27:05 +08:00

1482 Commits