Commit Graph

1835 Commits

Author SHA1 Message Date
5f95e97c56 [fix](function) array distance should return null when result is nan (#25214) 2023-10-10 04:41:51 -05:00
181c58c691 [fix](Nereids) count_by_enum signature is wrong (#25167) 2023-10-10 13:05:20 +08:00
59dee6b235 [fix](Nereids) support string cast to complex type (#25154) 2023-10-10 10:26:33 +08:00
f5b826b66d [fix](mark join) mark join column should be nullable (#24910) 2023-10-10 10:10:36 +08:00
e2be5fafa9 [case](regresstest) update query for parquet/orc with array/map nested type and insert into (#24746) 2023-10-10 10:07:22 +08:00
53b46b7e6c [FIX](filter) update for filter_by_select logic (#25007)
This PR updates the filter_by_select logic and changes the delete limits:

Only scalar types are supported in a delete statement's where condition.
Only nullable columns and predicate columns support the filter_by_select logic, because non-scalar types cannot be pushed down to the storage layer and packed into a predicate column; they can only be filtered.
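A minimal sketch of the restriction, assuming a hypothetical table `db_test.t_sales` with scalar columns `region`, `amount` and an ARRAY column `tags`:
```sql
-- allowed: the where condition only references scalar-type columns
DELETE FROM db_test.t_sales WHERE region = 'north' AND amount > 100;

-- rejected after this change: the condition references a non-scalar (ARRAY) column,
-- which cannot be pushed down to the storage layer as a predicate column
-- DELETE FROM db_test.t_sales WHERE tags = [1, 2, 3];
```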
2023-10-09 21:27:40 +08:00
37247ac449 [opt](Nereids) add two args signature to trim family functions (#25169) 2023-10-09 07:17:52 -05:00
977d119545 [fix](Insert select tvf) fix NPE because tvf do not have catalog name (#25149) 2023-10-09 18:02:43 +08:00
d02ef36631 [opt](Nereids) match predicate support array as first arg (#25172) 2023-10-09 04:17:27 -05:00
263631e983 [improvement](meta) Infer the column name when create view if the column is expression (#24990)
## Proposed changes

Infer the column name when create view if the column is expression

## Further comments
The column-name inference strategy for expressions is as follows:
|      expr       |                example                    |           column name(before)             | Inferred column name(if position is 2)  |
|  -------------  | ---------------------------------------   | ------------------------------            | --------------------------------------  |
| function        | dayofyear()                               | dayofyear()                               | __dayofyear_1                           |
| cast            | cast(1 as bigint)                         | CAST(1 AS BIGINT)                         | __cast_1                                |
| analyticExpr    | min()                                     | min()                                     | __min_1                                 |
| predicate       | 1 in (1,2,3,4)                            | 1 IN (1, 2, 3, 4)                         | __in_predicate_1                        |
| literal         | 1 or 'string_var_name'                    | 1 or 'string_var_name'                    | __literal_1                             |
| arithmeticExpr  | &                                         | ... & ...                                 | __arithmetic_expr_1                     |
| identifier      | a or b                                    | a or b                                    | a or b                                  |
| case            | CASE WHEN remark = 's' THEN 1 ELSE 2 END  | CASE WHEN remark = 's' THEN 1 ELSE 2 END  | __case_1                                |
| window          | min(timestamp) OVER (...)                 | min(timestamp) OVER(...)                  | __min_1                                 |


Example SQL:
```sql
CREATE VIEW v1 AS 
SELECT 
  error_code,
  1, 
  'string', 
  now(), 
  dayofyear(op_time), 
  cast (source AS BIGINT), 
  min(`timestamp`) OVER (
    ORDER BY 
      op_time DESC ROWS BETWEEN UNBOUNDED PRECEDING
      AND 1 FOLLOWING
  ), 
  1 > 2,
  2 + 3,
  1 IN (1, 2, 3, 4), 
  remark LIKE '%like', 
  CASE WHEN remark = 's' THEN 1 ELSE 2 END,
  TRUE | FALSE 
FROM 
  db_test.table_test1
```

The output column names are as follows:
```
error_code
__literal_1
__literal_2
__now_3
__dayofyear_4
__cast_expr_5
__min_6
__binary_predicate_7
__arithmetic_expr_8
__in_predicate_9
__like_predicate_10
__case_expr_11
__arithmetic_expr_12
```
2023-10-09 04:14:01 -05:00
79fa1d1640 [enhancement](regression-test) add stream load json case (#25168) 2023-10-09 16:40:39 +08:00
320709b9ff [opt](Nereids) support like and regexp function (#25148) 2023-10-09 02:55:57 -05:00
cdba4c4775 [fix](Nereids) deep copier generate wrong slot for TVF (#25156) 2023-10-09 14:52:36 +08:00
b41ec6a8a4 [feature](Nereids): Pushdown LimitDistinct Through Join (#25113)
Push down limit-distinct through left/right outer join or cross join.

For example: select t1.c1 from t1 left join t2 on t1.c1 = t2.c1 order by t1.c1 limit 1;
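A hedged sketch of the distinct-plus-limit shape the rule name refers to (t1, t2 and c1 as above):
```sql
-- SELECT DISTINCT ... LIMIT over a left outer join; the limit-distinct may be pushed to the join's left side
SELECT DISTINCT t1.c1
FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1
LIMIT 1;
```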
2023-10-09 14:19:22 +08:00
5a55e47acd [Enhancement](Load) stream tvf support two phase commit (#23800) 2023-10-09 14:15:56 +08:00
7e9ffad933 [fix](ES catalog)Doris cannot parse ES date field without time zone (#24864)
1. Add support for Doris to parse ES date fields without time zone info, e.g. `2023-04-17T23:01:18.151`. Such a value is treated as UTC time, since ES assumes the time zone of time fields without a time zone is UTC.
2. Change the local time zone conversion from the system local time zone to the session variable time zone.
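An illustrative query for the behavior above; the catalog, database and index names are hypothetical:
```sql
-- a zone-less ES value such as '2023-04-17T23:01:18.151' is read as UTC
-- and converted using the session time zone rather than the system time zone
SET time_zone = 'Asia/Shanghai';
SELECT ts_col FROM es_catalog.es_db.es_index LIMIT 1;
```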
2023-10-08 19:28:08 +08:00
3a45001447 [fix](Nereids) fix error when the view has lambda functions (#25067)
1. To ensure compatibility with the original optimizer, expose the non-lambda signature of high-order functions externally.
2. Fix some bugs in the toSql function of the original optimizer.
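A minimal sketch of a view over a high-order (lambda) function; the table and column names are made up:
```sql
-- array_map takes a lambda expression; the view definition must round-trip through toSql correctly
CREATE VIEW v_lambda AS
SELECT array_map(x -> x + 1, int_arr) AS mapped
FROM db_test.t_arr;

SELECT * FROM v_lambda;
```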
2023-10-08 15:45:24 +08:00
feb1cbe9ed [bug](partition_sort)partition sort need sort all data in two phase global (#24960)
#24886 marked the phase in FE; this PR adds the corresponding changes in BE.
Partition sort needs to sort all data in the two-phase global mode.
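For reference, a query shape that produces a partition sort node (names are illustrative):
```sql
-- row_number over a partition, filtered on the row number
SELECT * FROM (
    SELECT c1, c2, row_number() OVER (PARTITION BY c1 ORDER BY c2) AS rn FROM t1
) v
WHERE rn <= 100;
```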
2023-10-08 10:46:43 +08:00
fddef8b473 [fix](es-catalog) fix error when querying the index, Elasticsearch version 8.9.1 (#24839)
Issue Number: close #24833
2023-10-08 10:19:45 +08:00
727fa2c0cd [opt](tvf) refine the class of ExternalFileTableValuedFunction (#24706)
`ExternalFileTableValuedFunction` now has 3 derived classes:

- LocalTableValuedFunction
- HdfsTableValuedFunction
- S3TableValuedFunction

All these tvfs read data from files. The difference is where the file is read from, e.g., from HDFS or from the local filesystem.

So I refined the fields and methods of these classes.
There are now 3 kinds of properties for these tvfs:

1. File format properties

	File format properties, such as `format` and `column_separator`, are common to all these tvfs,
	so they are analyzed in the parent class `ExternalFileTableValuedFunction`.
	
2. URI or file path

	The URI or file path property indicates the file location. The URI format differs between storage systems,
	so it is analyzed in each derived class.
	
3. Other properties

	All other properties are specific to a certain tvf,
	so they are analyzed in each derived class.
	
There are 2 new classes:

- `FileFormatConstants`: defines common property names and variables related to file formats.
- `FileFormatUtils`: defines utility methods related to file formats.

After this PR, if we want to add a common property for all these tvfs, we only need to handle it in
`ExternalFileTableValuedFunction`, which avoids missing it in any one of them.

### Behavior change

1. Remove the `fs.defaultFS` property from `hdfs()`; it can be derived from `uri` (see the sketch below).
2. Use `\t` as the default column separator for the csv format, the same as stream load.
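A hedged usage sketch of behavior change 1 together with the common format properties; the uri is a placeholder:
```sql
-- fs.defaultFS is no longer passed; the filesystem is taken from the uri itself,
-- while format and column_separator are common properties parsed in the parent class
SELECT * FROM hdfs(
    "uri" = "hdfs://namenode:8020/user/doris/demo.csv",
    "format" = "csv",
    "column_separator" = ","
);
```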
2023-10-07 12:44:04 +08:00
70f5b0006f [fix](Nereids) ctas throw npe when default value is null (#25009) 2023-10-06 22:39:32 -05:00
f1e948e5f4 [fix](planner)the common type of date and decimal should be double (#24956) 2023-10-07 11:27:19 +08:00
d1f4d69032 [regression-test](merge-on-write) Add cases for partial update using insert statement with schema change (#24902) 2023-10-05 22:09:22 +08:00
4ce5213b1c [fix](insert) Fix test_group_commit_stream_load and add more regression in test_group_commit_http_stream (#24954) 2023-10-03 20:56:24 +08:00
10f0c63896 [FIX](complex-type) fix agg table with complex type with replace state (#24873)
Fix aggregate-key tables that have a complex-type column with the REPLACE aggregation state.
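A hedged sketch of the table shape involved, with hypothetical names (an aggregate-key table whose complex-type value column uses the REPLACE state):
```sql
-- ARRAY value column with REPLACE aggregation on an AGGREGATE KEY table
CREATE TABLE t_agg_complex (
    k1 INT,
    v_arr ARRAY<INT> REPLACE
)
AGGREGATE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 1
PROPERTIES ("replication_num" = "1");
```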
2023-10-03 16:32:58 +08:00
2c25e0a681 [test](load) add more s3 load regression test cases (#24906) 2023-09-28 22:01:36 +08:00
4c94820ff9 [opt](nereids) adjust column stats in filter estimation (#24973)
TPCDS before:
query4  9335    8113    8070    8070
query13 3104    1386    1385    1385
query18 1704    1216    1151    1151
query48 840     840     839     839
query61 435     379     383     379
query71 715     570     579     570
query85 2822    2627    2612    2612
query88 1897    1816    1793    1793
Total cold run time: 20852 ms
Total hot run time: 16799 ms

after:
query4  9610    8287    8249    8249
query13 1721    1013    1042    1013
query18 1585    1186    1155    1155
query48 789     777     778     777
query61 384     387     381     381
query71 713     610     584     584
query85 2020    1867    1843    1843
query88 1859    1812    1805    1805
Total cold run time: 18681 ms
Total hot run time: 15807 ms
2023-09-28 21:34:17 +08:00
b50c1448df [fix](Nereids) should not replace slot by Alias when do NormalizeSlot (#24928)
When we do NormalizeToSlot, we push complex expressions down and keep only their slots: we collect
each alias and its child, compute the child in a bottom project, and keep the resulting slot in the
current node. For example:

Window(max(...), c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

But in some cases we removed some SlotReferences by mistake, for example:

Window(max(...), c1, c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

we lose the SlotReference c1. This PR fixes the problem. After this PR,
we get:

Window(max(...), c1, a1)
+-- Project(..., c1, c1 as a1)
2023-09-28 14:51:08 +08:00
4ff1ab7a4d [fix](regression-test) regenerate test_http_stream_properties.out file (#24946) 2023-09-28 10:39:15 +08:00
671b5f0a0a [Bug](pipeline) Fix block reusing for union source operator (#24977)
[CANCELLED][INTERNAL_ERROR]Merge block not match, self:[String], input:[String, Nullable(String), Nullable(String), Nullable(String), Nullable(String), DateV2]
2023-09-27 19:41:56 +08:00
bb7f8d18a8 [fix](nereids) push down filter through partition topn (#24944)
Support pushing a filter down through partition topn if the filter can pass through the window.
Fix a CreatePartitionTopNFromWindow bug that may unexpectedly generate two partition topn nodes.
Case:
select * from (select c2, row_number() over (partition by c2) as rn from t1) T where rn<=1 and c2 = 1;
before this pr:
| PhysicalResultSink                       |
| --PhysicalDistribute                     |
| ----filter((rn <= 1))                    |
| ------PhysicalWindow                     |
| --------PhysicalQuickSort                |
| ----------PhysicalDistribute             |
| ------------PhysicalPartitionTopN        |
| --------------filter((T.c2 = 1))         |
| ----------------PhysicalPartitionTopN    |
| ------------------PhysicalProject        |
| --------------------PhysicalOlapScan[t1] |
+------------------------------------------+
after:

| PhysicalResultSink                     |
| --PhysicalDistribute                   |
| ----filter((rn <= 1))                  |
| ------PhysicalWindow                   |
| --------PhysicalQuickSort              |
| ----------PhysicalDistribute           |
| ------------PhysicalPartitionTopN      |
| --------------PhysicalProject          |
| ----------------filter((T.c2 = 1))     |
| ------------------PhysicalOlapScan[t1] |
+----------------------------------------+
2023-09-27 19:38:04 +08:00
00e8d1c3b4 [Fix](Planner) disable bitmap type in compare expression (#24792)
Problem:
BE cores because of a bitmap calculation.

Reason:
when a BE check fails, it cores directly.

Example:
SELECT id_bitmap FROM test_bitmap WHERE id_bitmap IN (NULL) LIMIT 20;

Solution:
Forbid this kind of expression in FE during analysis, and also forbid bitmap type comparison in other unsupported expressions.
2023-09-27 16:57:06 +08:00
9562e280af [enhancement](Nereids): remove stats derivation in CostAndEnforce job (#24945)
1. remove stats derivation in the CostAndEnforce job
2. enforce that each stats is valid after estimation
2023-09-27 16:31:03 +08:00
26818de9c8 [feature](jni) support complex types in jni framework (#24810)
Support complex types in the jni framework, successfully running end-to-end on Hudi.
### How to Use
Other scanners only need to implement three interface methods in `ColumnValue`:
```
// Get array elements and append into values
void unpackArray(List<ColumnValue> values);

// Get map key array&value array, and append into keys&values
void unpackMap(List<ColumnValue> keys, List<ColumnValue> values);

// Get the struct fields specified by `structFieldIndex`, and append into values
void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values);
```
Developers can take `HudiColumnValue` as an example.
2023-09-27 14:47:41 +08:00
452318a9fc [Enhancement](streamload) stream tvf support user specified label (#24219)
stream tvf support user specified label
example:

curl -v --location-trusted -u root: -H "sql: insert into test.t1 WITH LABEL label1 select c1,c2 from http_stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_http_stream
return:

{
    "TxnId": 2064,
    "Label": "label1",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 2,
    "NumberLoadedRows": 2,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 27,
    "LoadTimeMs": 152,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 83,
    "ReadDataTimeMs": 92,
    "WriteDataTimeMs": 41,
    "CommitAndPublishTimeMs": 24
}
2023-09-27 12:09:35 +08:00
18b5f70a7c [Bug](materialized-view) enable rewrite on select materialized index with aggregate mode (#24691)
enable rewrite on select materialized index with aggregate mode
2023-09-27 11:30:36 +08:00
a8f312794e [feature](nereids)support stats estimation for is-null predicate (#24764)
1. condition order: filter/hashCondition/otherCondition
2. update regression out
3. remove tpch_sf500 shape case (covered by tpch sf1000)
4. implement is-null stats estimation
5. update ssb shape
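For item 4, an illustrative filter whose selectivity can now be estimated from null-count statistics (table and column are hypothetical):
```sql
SELECT count(*) FROM orders WHERE o_comment IS NULL;
```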
2023-09-27 10:04:35 +08:00
6d27a016b9 [Improvement](regression-test) add http_stream case (#24930) 2023-09-27 09:55:52 +08:00
9a78681d6f [fix](pipelineX) remove cases with conflicting table names (#24922) 2023-09-26 22:24:48 +08:00
90c5461ad2 [fix](Nereids) let dml work well (#24748)
Co-authored-by: sohardforaname <organic_chemistry@foxmail.com>

TODO:
1. support agg_state type
2. support implicit cast literal exception
3. use nereids execute dml for these regression cases:

- test_agg_state_nereids (for TODO 1)
- test_array_insert_overflow (for TODO 2)
- nereids_p0/json_p0/test_json_load_and_function (for TODO 2)
- nereids_p0/json_p0/test_json_unique_load_and_function (for TODO 2)
- nereids_p0/jsonb_p0/test_jsonb_load_and_function (for TODO 2)
- nereids_p0/jsonb_p0/test_jsonb_unique_load_and_function (for TODO 2)
- json_p0/test_json_load_and_function (for TODO 2)
- json_p0/test_json_unique_load_and_function (for TODO 2)
- jsonb_p0/test_jsonb_load_and_function (for TODO 2)
- jsonb_p0/test_jsonb_unique_load_and_function (for TODO 2)
- test_multi_partition_key (for TODO 2)
2023-09-26 21:08:24 +08:00
a6a0e78f32 [Enhancement](streamload) stream tvf support compress (#24303) 2023-09-26 20:58:20 +08:00
55d1090137 [feature](insert) Support group commit stream load (#24304) 2023-09-26 20:57:02 +08:00
c9cf9499b6 [impro](regression test) Add case for time cast #24895 2023-09-26 19:47:38 +08:00
04bf9bce54 [fix](planner)update explode slot's nullable info in analyze phase (#24879) 2023-09-26 18:14:04 +08:00
94082ae59c [Fix](inverted index) fix tokenize function coredump (#24896) 2023-09-26 17:31:10 +08:00
1abda1c446 [Fix](merge-on-write) Correct the alignment process when the existing rows with same key has marked delete sign (#24877) 2023-09-26 16:09:20 +08:00
bc747be511 [Improvement](regression-test) add stream load case (#24396) 2023-09-26 15:35:19 +08:00
733b71828c [fix](pipelineX) fix do not set per_fragment_instance_idx (#24890) 2023-09-26 13:10:30 +08:00
dae0dc1652 [test](load) add some S3 TVF load regression tests (#24719) 2023-09-26 12:21:42 +08:00
e4c0c98efa [fix](Nereids): round microsecond when specify scale of microsecond (#24854) 2023-09-26 10:11:53 +08:00