doris

Author	SHA1	Message	Date
Gabriel	bbf88ecc49	[Bug](datetimev2) Fix BE crash if scale is invalid (#17763 )	2023-03-15 12:08:23 +08:00
morrySnow	049b70b957	[test](Nereids) add yandex metrica p2 regression case (#17082 )	2023-03-15 11:50:00 +08:00
谢健	97bf07fe26	[enhancement](Nereids) add new distributed cost model (#17556 )	2023-03-15 11:22:31 +08:00
Add a new distributed cost model in Nereids. The new cost model describes the cost of the pipeline execution engine by dividing it into start and run costs:
* START COST: the cost from starting to emitting the first tuple
* RUN COST: the cost from emitting the first tuple to emitting all tuples
For a parent operator and its child operator, we assume their timeline is:
```
child start ---> child run --------------------> finish
            |---> parent start ---> parent run -> finish
```
Therefore, in the parallel model, we get:
```
start_cost(parent) = start_cost(child) + start_cost(parent)
run_cost(parent) = max(run_cost(child), start_cost(parent) + run_cost(parent))
```
ZhaoChangle	66f3ef568e	(functions) optimize the conversion of const_column to a full column	2023-03-15 10:57:03 +08:00
zhangstar333	85080ee3c3	[vectorized](function) support array_map function (#17581 )	2023-03-15 10:51:29 +08:00
morrySnow	5ab758674e	[fix](planner) nested loop join with left semi generates repeated results (#17767 )	2023-03-15 09:56:44 +08:00
TengJianPing	64c2437be5	[fix](coalesce) support coalesce function for bitmap (#17798 )	2023-03-15 09:34:44 +08:00
morrySnow	6348819c27	[fix](Nereids) remove bitmap_union_int(bigint) signature (#17356 )	2023-03-14 20:42:47 +08:00
morrySnow	699159698e	[enhancement](planner) support update from syntax (#17639 )	2023-03-14 19:26:30 +08:00
Support the UPDATE ... FROM syntax. Note: enable_concurrent_update is not supported yet.
```
UPDATE <target_table>
SET <col_name> = <value> [ , <col_name> = <value> , ... ]
[ FROM <additional_tables> ]
[ WHERE <condition> ]
```
For example, with t1:
```
+----+----+----+-----+------------+
| id | c1 | c2 | c3  | c4         |
+----+----+----+-----+------------+
| 3  | 3  | 3  | 3.0 | 2000-01-03 |
| 2  | 2  | 2  | 2.0 | 2000-01-02 |
| 1  | 1  | 1  | 1.0 | 2000-01-01 |
+----+----+----+-----+------------+
```
t2:
```
+----+----+----+------+------------+
| id | c1 | c2 | c3   | c4         |
+----+----+----+------+------------+
| 4  | 4  | 4  | 4.0  | 2000-01-04 |
| 2  | 20 | 20 | 20.0 | 2000-01-20 |
| 5  | 5  | 5  | 5.0  | 2000-01-05 |
| 1  | 10 | 10 | 10.0 | 2000-01-10 |
| 3  | 30 | 30 | 30.0 | 2000-01-30 |
+----+----+----+------+------------+
```
t3:
```
+----+
| id |
+----+
| 1  |
| 5  |
| 4  |
+----+
```
run the update:
```sql
update t1
set t1.c1 = t2.c1, t1.c3 = t2.c3 * 100
from t2 inner join t3 on t2.id = t3.id
where t1.id = t2.id;
```
The result:
```
+----+----+----+--------+------------+
| id | c1 | c2 | c3     | c4         |
+----+----+----+--------+------------+
| 3  | 3  | 3  | 3.0    | 2000-01-03 |
| 2  | 2  | 2  | 2.0    | 2000-01-02 |
| 1  | 10 | 1  | 1000.0 | 2000-01-01 |
+----+----+----+--------+------------+
```
AKIRA	f1dde20315	[enhancement](nereids) Refactor statistics (#17637 ) 1. Support more expression types 2. Support derivation with histograms 3. Use StatisticRange to abstract the logic 4. Use Statistics rather than StatisDeriveResult	2023-03-14 13:10:55 +08:00
spaces-x	5b39fa9843	[Feature](vec)(quantile_state): support quantile state in the vectorized engine (#16562 ) * [Feature](vectorized)(quantile_state): support vectorized quantile state functions 1. quantile columns currently support only non-nullable values 2. add some regression test cases 3. set enable_quantile_state_type = true by default --------- Co-authored-by: spaces-x <weixiang06@meituan.com>	2023-03-14 10:54:04 +08:00
weij	ba0f5a2355	[test](mv) Add mv cases from fe ut (#17204 ) add some mv cases from the fe unit test MaterializedViewFunctionTest	2023-03-14 10:29:43 +08:00
Qi Chen	c6630a06c1	[Fix](multi-catalog) Fix "test_hive_other" regression test. (#17611 )	2023-03-14 09:16:48 +08:00
lihangyu	9b7596f1c6	[Feature](Dynamic schema table) step1 support schema change expression (#17494 ) 1. introduce a new type `VARIANT` to encapsulate dynamically generated columns, hiding the types and names of newly generated columns 2. introduce a new expression `SchemaChangeExpr` to perform schema changes, for extensibility	2023-03-13 15:12:42 +08:00
gitccl	c302fa2564	[Feature](array-function) Support array_pushfront function (#17584 )	2023-03-13 14:26:02 +08:00
zhengyu	2b31fc1472	[fix](regression) segcompaction timeout too short (#16731 ) (#17565 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2023-03-13 11:19:21 +08:00
chunping	b9fac82fb1	[fix](regression) adjust regression pipeline config (tablet_create_timeout_second) to avoid create partition timeout (#17668 ) This pull request fixes the following problem: failing regression pipeline cases always hit the error "Failed to create partition. Timeout. Unfinished mark: 10003=57059", so tablet_create_timeout_second is adjusted to 100	2023-03-13 11:18:03 +08:00
starocean999	782001c75b	[fix](planner) project should be done inside subquery (#17630 ) WITH t0 AS( SELECT report.date1 AS date2 FROM( SELECT DATE_FORMAT(date, '%Y%m%d') AS date1 FROM cir_1756_t1 ) report GROUP BY report.date1 ), t3 AS( SELECT date_format(date, '%Y%m%d') AS date3 FROM cir_1756_t2 ) SELECT row_number() OVER(ORDER BY date2) FROM( SELECT t0.date2 FROM t0 LEFT JOIN t3 ON t0.date2 = t3.date3 ) tx; The DATE_FORMAT(date, '%Y%m%d') was calculated in the GROUP BY node, which is wrong. This expr should be calculated inside the subquery.	2023-03-13 11:10:27 +08:00
abmdocrt	55c42da511	[Feature](array) Support array<decimalv3> data type (#16640 )	2023-03-13 10:48:13 +08:00
camby	3a6c0e7867	[fix](regression) fix test_array_export and test_map_export dir conflict #17636 regression tests test_array_export and test_map_export use the same output dir; if they run at the same time, the cases fail.	2023-03-13 10:35:50 +08:00
HappenLee	39b5682d59	[Pipeline](shared_scan_opt) Support shared scan opt in pipeline exec engine	2023-03-13 10:33:57 +08:00
Tiewei Fang	13e05c4a5d	[Enhancement](stream load) add some regression tests for json format stream load (#17520 )	2023-03-12 20:13:07 +08:00
slothever	455c800405	[feature](parquet-reader) add rle bool and delta decoders to read AWS Glue data (#17112 ) Support delta encoding and rle(bool) to read Glue data: add a delta bit pack decoder, a delta length byte array decoder, a delta byte array decoder, and an rle bool decoder. We found that some data types in AWS Glue data are stored with delta encoding, so it should be supported. For the definition of delta encoding, refer to the delta encoding in the parquet format.	2023-03-12 20:09:58 +08:00
Pxl	8328ab69ad	[Chore](Materialized-View) add some mv regression test cases (#17345 ) 1. add some mv regression test cases 2. rename materialized_view_p0 to mv_p0 (to avoid database creation failures caused by a long db name)	2023-03-11 10:55:11 +08:00
camby	6dcd791b74	[feature](struct-type) support CAST AS Struct type (#17553 ) 1. add support for `CAST AS Struct` from Struct type; 2. fix crash on `CAST('{}' AS Struct)`; 3. `CAST('' AS complex_type)` should return NULL instead of an empty object; --------- Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2023-03-10 21:21:16 +08:00
morrySnow	365c8eed7e	[fix](function) width_bucket should get min and max from each tuple (#17466 )	2023-03-10 13:14:12 +08:00
lihangyu	a79b8ede88	[Bug](ColumnArray) Fix mismatched `replicate_offsets` in array column replicate (#17616 )	2023-03-10 11:52:22 +08:00
The input replicate_offsets should be the same size as ColumnArray's offsets.
```
IColumn::Offsets replicate_offsets(get_offsets().size(), 0);
// |---------------------|-------------------------|-------------------------|
//  [0, begin)            [begin, begin + count_sz) [begin + count_sz, size())
//  do not need to copy   copy counts[n] times      do not need to copy
```
we should
lihangyu	fcd25b53bf	[Optimize](Random distribution) Improve the performance of tablet sin… (#17389 ) The current distribution model for Doris is as follows: OlapTableSink separates the original Block into several sub-blocks, one per node (BE), according to the tablet distribution and sends the sub-blocks to the storage engine of each backend; the storage engine then separates each sub-block across multiple tablet channels, and each delta writer handles part of the block. This model causes blocks to be split according to tablets, and the splitting process can be a relatively heavy operation. After splitting, the blocks are distributed to different DeltaWriters (Memtables) through RPCs to TabletChannels. The distribution operation on TabletChannels is also relatively heavy. If the distribution property of the table is RANDOM, we have the opportunity to distribute each block as a whole. The advantage is reduced memory copying and improved write locality, similar to appending the entire block to the memtable. This optimization could save 10% ~ 20% of the CPU cost of loading into RANDOM distribution tables when load_to_single_tablet is enabled.	2023-03-10 10:52:40 +08:00
Mingyu Chen	fe6361f4b5	[regression-test](p0) fix some unstable p0 cases (#17518 ) drop the database before creating it; remove some large, unused debug logs	2023-03-10 10:21:39 +08:00
bobhan1	e1bf9411de	[feature](array function) add support for array_enumerate_uniq (#17541 ) add support for array_enumerate_uniq()	2023-03-10 10:20:49 +08:00
huangzhaowei	4ba93efc98	[Enhance](DOE) Support parsing the default ES ISO datetime string (#17412 ) * support parsing the default ES ISO datetime string	2023-03-10 09:59:20 +08:00
morrySnow	006f7a91ac	[fix](planner) should not turn on push agg op when olapscan has conjuncts on it (#17598 ) we should not set PushAggOp to any type if the olap scan already has conjuncts on it.	2023-03-10 09:33:08 +08:00
WenYao	a745ab1703	[fix](schema scanner) fix querying some schema tables reporting invalid parameter (#17626 ) Example: SELECT ROUTINE_SCHEMA AS PROCEDURE_CAT, NULL AS PROCEDURE_SCHEM,ROUTINE_NAME AS PROCEDURE_NAME,NULL AS NUM_INPUT_PARAMS,NULL AS NUM_OUTPUT_PARAMS,NULL AS NUM_RESULT_SETS,ROUTINE_COMMENT AS REMARKS,IF(ROUTINE_TYPE = 'FUNCTION', 2,IF(ROUTINE_TYPE= 'PROCEDURE', 1, 0)) AS PROCEDURE_TYPE FROM INFORMATION_SCHEMA.ROUTINES WHERE ROUTINE_SCHEMA = DATABASE(); ERROR 1105 (HY000): errCode = 2, detailMessage = invalid parameter This is wrong, and some BI tools cannot work correctly because of it.	2023-03-10 08:52:09 +08:00
Jerry Hu	08f0170895	[fix](olap) The 'scan key' generated by the 'is null' expression causes incorrect query results (#17569 )	2023-03-10 08:51:06 +08:00
Xinyi Zou	f9baf9c556	[improvement](scan) Support pushdown execute expr ctx (#15917 ) In the past, only simple predicates (slot=const), and, like, or (only bitmap index) could be pushed down to the storage layer. Previous scan process: read part of the columns first and calculate the row ids with the simple pushed-down predicates; use the row ids to read the remaining columns and pass them to the scanner, which filters with the remaining predicates. This PR also pushes down the remaining predicates (functions, nested predicates...) in the scanner to the storage layer for filtering. New scan process: read part of the columns first and use the pushed-down simple predicates to calculate the row ids (same as above); use the row ids to read the columns needed for the remaining predicates, and use the pushed-down remaining predicates to reduce the number of row ids again; use the row ids to read the remaining columns and pass them to the scanner.	2023-03-10 08:35:32 +08:00
Xin Liao	849b5b7b8f	[fix](sequence) fix wrong result when loading multiple duplicate keys (#17575 )	2023-03-09 20:59:23 +08:00
YueW	4a0361914b	[fix](alter inverted index) add or drop inverted index also needs to change table state to SCHEMA_CHANGE (#17471 ) Before this PR, adding or dropping an inverted index did not change the table state, so multiple alter jobs might be executed at the same time, which may lead to unexpected problems.	2023-03-09 16:33:46 +08:00
AlexYue	62a03ec24c	[feature](regression) add http test action (#17567 )	2023-03-09 15:13:04 +08:00
chunping	e182e2426f	[fix](regression) close p0 fe regression pipeline config to avoid flink load failure (get tableList write lock timeout) (#17573 ) This pull request fixes the following problem: setting the fe config sys_log_verbose_modules = org.apache.doris makes fe take longer to get the write lock. With this config, a stream load fails with this message ([ANALYSIS_ERROR]errCode = 2, detailMessage = get tableList write lock timeout, tableList=(Table [id=86135, name=flink_connector, type=OLAP]))	2023-03-09 14:18:38 +08:00
morrySnow	6c894be007	[enhancement](Nereids) support decimalv3 and precision derive (#17393 )	2023-03-09 14:12:10 +08:00
谢健	e1ea2e1f2c	[fix](Nereids) store offset of Limit in exchangeNode (#17548 ) When the Limit has an offset, we should add an exchangeNode and store the offset in it	2023-03-09 13:43:12 +08:00
zhangstar333	4ef46159ae	[vectorized](udaf) support array type for java-udaf (#17351 )	2023-03-09 11:30:07 +08:00
amory	06dee69174	[Refactor](map) remove the column array in map to reduce offset columns (#17330 ) 1. remove column array in map 2. add offsets column in map The aim is to reduce the duplicate offsets of the key array and value array on disk	2023-03-09 11:22:26 +08:00
lihangyu	368e6a4f9c	[Bug](array filter) Fix bug caused by invalid in-place `size_at` in `ColumnArray::filter_generic` after `set_end_ptr` (#17554 ) We should make a new PodArray to add items instead of doing it in place	2023-03-09 10:59:29 +08:00
luozenglin	00727e8c11	[fix](in-bitmap) fix possibly wrong result when the left side of the in bitmap predicate is a constant (#17570 )	2023-03-09 10:59:05 +08:00
Xinyi Zou	397cc011c4	[fix](function) fix AES/SM3/SM4 encrypt/decrypt algorithm initialization vector bug (#17420 ) For the ECB algorithm, block_encryption_mode did not take effect; it only took effect when an init vector was provided. Solved: 192/256 supports calculation without an init vector. For other algorithms, an error should be reported when there is no initialization vector. The default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector. Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt Note: This fix does not support smooth upgrades. During the upgrade process, queries may report the error: function not found	2023-03-09 09:51:41 +08:00
starocean999	2b6d971c2f	[fix](nereids)fix first_value/lead/lag window function bug in nereids (#17315 ) * [fix](nereids)fix first_value/lead/lag window function bug in nereids * add more test * add order by to fix test case * fix test cases	2023-03-09 09:35:27 +08:00
minghong	4822b9811a	[feature](nereids)support bitmap runtime filter on nereids (#16927 ) * A in(B) -> bitmap_contains(bitmap_union(B), A) support bitmap runtime filter on nereids * GroupPlan -> Plan * fmt * fix target cast problem remove test code	2023-03-09 09:30:24 +08:00
qiye	f0bd002911	[fix](DOE) Fix esquery not working (#17566 ) The esquery function does not work because of a problem parsing the first parameter type: the first parameter, which is a SlotRef, is cast to a CastExpr, which causes an error while generating the ES DSL. Add more types to adapt the esquery function.	2023-03-08 21:51:17 +08:00
ElvinWei	bd5ed2b0c2	[enhancement](histogram) optimize the histogram bucketing strategy, etc (#17264 ) * optimize the histogram bucketing strategy, etc * fix p0 regression of histogram	2023-03-08 20:12:05 +08:00
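
The cost composition described in commit 97bf07fe26 (new distributed cost model) can be sketched as follows. This is a minimal Python illustration, not Nereids code: the `Cost` and `compose` names are hypothetical, and the sketch assumes that on the right-hand side of each formula `start_cost(parent)` and `run_cost(parent)` mean the parent operator's own costs, while the left-hand side is the parent's cumulative cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    """start: cost until the first tuple is emitted; run: first tuple to last tuple."""
    start: float
    run: float


def compose(child: Cost, parent_own: Cost) -> Cost:
    """Cumulative cost of a parent operator stacked on one child (hypothetical helper).

    The parent starts only after the child emits its first tuple, so start costs add.
    The parent then runs alongside the child, so the run phase is bounded by
    whichever side takes longer.
    """
    start = child.start + parent_own.start
    run = max(child.run, parent_own.start + parent_own.run)
    return Cost(start, run)


scan = Cost(start=2.0, run=10.0)      # made-up child costs
agg_own = Cost(start=3.0, run=4.0)    # made-up parent-only costs
print(compose(scan, agg_own))  # Cost(start=5.0, run=10.0)
```

Because start costs accumulate along a pipeline while run costs overlap, a slow child dominates the run cost here; with a cheaper child such as `Cost(1.0, 2.0)`, the parent's own `3.0 + 4.0` would dominate instead.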
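
The row-level effect of the `UPDATE ... FROM` example in commit 699159698e can be reproduced with a small in-memory model. This Python sketch only illustrates the semantics of that one example: the tables are hypothetical lists of dicts, `update_from` is a hypothetical helper, and it assumes each target row matches at most one joined source row (the commit does not say what happens with multiple matches).

```python
# Hypothetical in-memory copies of the example tables t1, t2 and t3.
t1 = [
    {"id": 3, "c1": 3, "c2": 3, "c3": 3.0, "c4": "2000-01-03"},
    {"id": 2, "c1": 2, "c2": 2, "c3": 2.0, "c4": "2000-01-02"},
    {"id": 1, "c1": 1, "c2": 1, "c3": 1.0, "c4": "2000-01-01"},
]
t2 = [
    {"id": 4, "c1": 4, "c2": 4, "c3": 4.0, "c4": "2000-01-04"},
    {"id": 2, "c1": 20, "c2": 20, "c3": 20.0, "c4": "2000-01-20"},
    {"id": 5, "c1": 5, "c2": 5, "c3": 5.0, "c4": "2000-01-05"},
    {"id": 1, "c1": 10, "c2": 10, "c3": 10.0, "c4": "2000-01-10"},
    {"id": 3, "c1": 30, "c2": 30, "c3": 30.0, "c4": "2000-01-30"},
]
t3 = [{"id": 1}, {"id": 5}, {"id": 4}]


def update_from(target, source_rows, join_rows):
    """Mimic: UPDATE t1 SET c1 = t2.c1, c3 = t2.c3 * 100
    FROM t2 INNER JOIN t3 ON t2.id = t3.id WHERE t1.id = t2.id."""
    join_ids = {row["id"] for row in join_rows}
    # FROM clause: t2 INNER JOIN t3 ON t2.id = t3.id, indexed by t2.id
    # (assumes t2.id is unique).
    source_by_id = {row["id"]: row for row in source_rows if row["id"] in join_ids}
    result = []
    for row in target:
        new_row = dict(row)  # copy so the input table is left untouched
        src = source_by_id.get(row["id"])  # WHERE t1.id = t2.id
        if src is not None:
            new_row["c1"] = src["c1"]
            new_row["c3"] = src["c3"] * 100
        result.append(new_row)
    return result


for row in update_from(t1, t2, t3):
    print(row)
```

Only the row with id 1 changes, which matches the result table in the commit: ids 2 and 3 exist in t2 but not in t3, so the inner join drops them before the WHERE condition is applied.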
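
The new scan process in commit f9baf9c556 (pushdown of execute expr ctx) can be sketched with a toy columnar store. The Python below is illustrative only: `scan`, the dict-of-lists store, and the predicate tuples are hypothetical stand-ins for Doris storage-layer structures. It shows the key idea that each step reads columns only for the row ids that survived the previous step.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical columnar segment: column name -> list of values.
Store = Dict[str, List[object]]
# Simple predicate on one column, e.g. slot = const.
SimplePred = Tuple[str, Callable[[object], bool]]
# Remaining predicate (function, nested predicate, ...) over one or more columns.
ExprPred = Tuple[Sequence[str], Callable[[Dict[str, object]], bool]]


def scan(store: Store, simple: List[SimplePred], remaining: List[ExprPred],
         output: Sequence[str]) -> List[Dict[str, object]]:
    num_rows = len(next(iter(store.values())))
    # Step 1: read the simple-predicate columns and compute the matching row ids.
    row_ids = [i for i in range(num_rows)
               if all(pred(store[col][i]) for col, pred in simple)]
    # Step 2: read only the columns the remaining predicates need, for the
    # surviving row ids, and use those predicates to shrink the row ids again.
    row_ids = [i for i in row_ids
               if all(pred({c: store[c][i] for c in cols})
                      for cols, pred in remaining)]
    # Step 3: read the output columns for the final row ids only.
    return [{c: store[c][i] for c in output} for i in row_ids]


store = {"k": [1, 2, 3, 4], "s": ["a", "bb", "ccc", "dd"], "v": [10, 20, 30, 40]}
rows = scan(
    store,
    simple=[("k", lambda k: k >= 2)],                  # like k >= 2
    remaining=[(("s",), lambda r: len(r["s"]) == 2)],  # like length(s) = 2
    output=["k", "v"],
)
print(rows)  # [{'k': 2, 'v': 20}, {'k': 4, 'v': 40}]
```

Before that change, step 2 would have happened in the scanner after all columns were read; moving it ahead of step 3 means column `v` is never materialized for rows that a function predicate rejects.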
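
The initialization-vector rule quoted in commit 397cc011c4 (ECB needs no IV, while CBC, CFB1, CFB8, CFB128 and OFB all require one) can be expressed as a small argument check. This Python sketch is hypothetical: `requires_iv` and `check_encrypt_args` are not Doris functions, and the sketch assumes mode strings follow the MySQL `block_encryption_mode` form `<algorithm>-<key bits>-<mode>`, such as `aes-128-ecb`.

```python
from typing import Optional


def requires_iv(block_encryption_mode: str) -> bool:
    """Return True if the mode needs an initialization vector (every mode except ECB)."""
    # Assumed format "<algorithm>-<key bits>-<mode>", e.g. "aes-256-cbc".
    mode = block_encryption_mode.lower().rsplit("-", 1)[-1]
    return mode != "ecb"


def check_encrypt_args(block_encryption_mode: str, iv: Optional[bytes] = None) -> None:
    """Report an error instead of silently ignoring the mode when the IV is missing."""
    if requires_iv(block_encryption_mode) and iv is None:
        raise ValueError(f"{block_encryption_mode} requires an initialization vector")


check_encrypt_args("aes-128-ecb")                # default mode: no IV needed
check_encrypt_args("aes-256-ecb")                # 256-bit ECB also works without an IV
check_encrypt_args("aes-128-cbc", iv=b"0" * 16)  # CBC with an IV passes the check
```

Calling `check_encrypt_args("aes-128-cbc")` without an IV raises `ValueError`, which mirrors the commit's statement that non-ECB algorithms should report an error when no initialization vector is given.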
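
The rewrite named in commit 4822b9811a, `A in(B) -> bitmap_contains(bitmap_union(B), A)`, can be illustrated with Python sets standing in for bitmaps. This is only a model of the semantics, not the Nereids implementation: the `bitmap_union` and `bitmap_contains` helpers here are hypothetical Python functions that merely mirror the SQL function names, and the column values are made up.

```python
from functools import reduce
from typing import Iterable, List, Set


def bitmap_union(bitmaps: Iterable[Set[int]]) -> Set[int]:
    # Aggregate every bitmap value of B (the build side) into one bitmap.
    return reduce(lambda acc, bm: acc | bm, bitmaps, set())


def bitmap_contains(bitmap: Set[int], value: int) -> bool:
    return value in bitmap


build_side_b = [{1, 3}, {5}, {3, 8}]   # hypothetical bitmap column B
probe_side_a = [1, 2, 5, 7, 8]         # hypothetical integer column A

# The union is built once and then used as a filter on the probe side.
runtime_filter = bitmap_union(build_side_b)
kept: List[int] = [a for a in probe_side_a if bitmap_contains(runtime_filter, a)]
print(kept)  # [1, 5, 8]
```

Building one unioned bitmap lets the probe side discard non-matching values of A early, which is what makes it usable as a runtime filter.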

1275 Commits