1851 Commits

Author SHA1 Message Date
42f8b253aa [function](nereids) support array_apply/array_repeat/group_uniq_array/ipv4numtostring (#25249)
nereids support functions: array_apply/array_repeat/group_uniq_array/ipv4numtostring
2023-10-12 11:08:42 +08:00
Pxl
a0d2b1ec56 [Bug](materialized-view) fix not match mv when some alias on agg (#25321)
fix not match mv when some alias on agg
2023-10-12 11:02:55 +08:00
7ca63665b4 [fix](agg) garbled characters in result of map_agg (#25318) 2023-10-12 10:10:55 +08:00
73c3e3ab55 [Feature](x-load) support config min replica num for loading data (#21118) 2023-10-11 21:07:35 +08:00
ba87f7d3a3 [fix](pipelineX) add table sink and some fix in pipelineX (#25314) 2023-10-11 20:18:08 +08:00
f680a2141d [enhancement](regression-test) add routine load json case (#25253) 2023-10-11 19:43:08 +08:00
c6b1c903e4 [fix](Regression-test) fix that the String type in a nested type should contain double quotes and add regression-test (#25115) 2023-10-11 18:30:26 +08:00
e514d52232 [fix](point-query) Support mow table with sequence column (#25308) 2023-10-11 18:22:16 +08:00
2d19f2fbfe [fix](planner)need call materializeSrcExpr for materialized slots in join node (#25204) 2023-10-11 16:34:53 +08:00
e9554e36a8 [fix](nereids)disable parallel scan in some case (#25089) 2023-10-11 16:32:09 +08:00
6d999f5b95 [enhancement](nereids)add eliminate filter on one row relation rule (#24980)
1.simplify PushdownFilterThroughSetOperation rule
2.add eliminate filter on one row relation rule
2023-10-11 16:12:24 +08:00
47578c0fc9 [fix](Nereids) fix toSql of date literal (#25243)
toSql should return '2023-2-1 ' for DateLiteral 2023-2-1
2023-10-11 13:04:05 +08:00
b91bce8a62 [feature](Nereids) add array distance functions (#25196)
- l1_distance
- l2_distance
- cosine_distance
- inner_product
2023-10-10 21:35:06 -05:00
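For reference, the four array distance functions named in #25196 can be sketched with their usual mathematical definitions. This is a minimal illustration, not Doris's implementation: it assumes `cosine_distance` means one minus the cosine similarity, and the `_check_same_length` helper is hypothetical.

```python
import math
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    # Hypothetical helper: the sketch only handles equal-length arrays.
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Sum of absolute element-wise differences (Manhattan distance).
    _check_same_length(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Square root of the sum of squared differences (Euclidean distance).
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    # Sum of element-wise products (dot product).
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed definition: 1 - cos(angle between a and b).
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)
```

Note that `cosine_distance` in this sketch raises `ZeroDivisionError` for an all-zero array; how the real functions treat such inputs is not covered here.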
5be29f859a [enhancement](node) add filter in partition sort node in BE #25188
add filter in partition sort node in BE
2023-10-11 10:30:15 +08:00
1fa8720164 [regression-test](merge-on-write) Fix partial update concurrency conflict case (#25212) 2023-10-11 10:17:01 +08:00
b7ac95a970 [enhancement](regression-test) open routine load regression test by default and add data check (#25122) 2023-10-11 10:03:16 +08:00
5f95e97c56 [fix](function) array distance should return null when result is nan (#25214) 2023-10-10 04:41:51 -05:00
181c58c691 [fix](Nereids) count_by_enum signature is wrong (#25167) 2023-10-10 13:05:20 +08:00
59dee6b235 [fix](Nereids) support string cast to complex type (#25154) 2023-10-10 10:26:33 +08:00
f5b826b66d [fix](mark join) mark join column should be nullable (#24910) 2023-10-10 10:10:36 +08:00
e2be5fafa9 [case](regresstest) update query for parquet/orc with array/map nested type and insert into (#24746) 2023-10-10 10:07:22 +08:00
53b46b7e6c [FIX](filter) update for filter_by_select logic (#25007)
This PR updates the filter_by_select logic and changes the restrictions on delete statements:

- Only scalar types are supported in the WHERE condition of a delete statement.
- Only nullable columns and predicate columns support the filter_by_select logic, because non-scalar types cannot be pushed down to the storage layer to be packed into a predicate column while still applying the filter logic.
2023-10-09 21:27:40 +08:00
37247ac449 [opt](Nereids) add two args signature to trim family functions (#25169) 2023-10-09 07:17:52 -05:00
977d119545 [fix](Insert select tvf) fix NPE because tvf do not have catalog name (#25149) 2023-10-09 18:02:43 +08:00
d02ef36631 [opt](Nereids) match predicate support array as first arg (#25172) 2023-10-09 04:17:27 -05:00
263631e983 [improvement](meta) Infer the column name when create view if the column is expression (#24990)
## Proposed changes

Infer the column name when create view if the column is expression

## Further comments
The column name inference strategy for expressions is as follows:
|      expr       |                example                    |           column name(before)             | Inferred column name(if position is 2)  |
|  -------------  | ---------------------------------------   | ------------------------------            | --------------------------------------  |
| function        | dayofyear()                               | dayofyear()                               | __dayofyear_1                           |
| cast            | cast(1 as bigint)                         | CAST(1 AS BIGINT)                         | __cast_1                                |
| analyticExpr    | min()                                     | min()                                     | __min_1                                 |
| predicate       | 1 in (1,2,3,4)                            | 1 IN (1, 2, 3, 4)                         | __in_predicate_1                        |
| literal         | 1 or 'string_var_name'                    | 1 or 'string_var_name'                    | __literal_1                             |
| arithmeticExpr  | &                                         | ... & ...                                 | __arithmetic_expr_1                     |
| identifier      | a or b                                    | a or b                                    | a or b                                  |
| case            | CASE WHEN remark = 's' THEN 1 ELSE 2 END  | CASE WHEN remark = 's' THEN 1 ELSE 2 END  | __case_1                                |
| window          | min(timestamp) OVER (...)                 | min(timestamp) OVER(...)                  | __min_1                                 |


SQL for example:
```sql
CREATE VIEW v1 AS 
SELECT 
  error_code,
  1, 
  'string', 
  now(), 
  dayofyear(op_time), 
  cast (source AS BIGINT), 
  min(`timestamp`) OVER (
    ORDER BY 
      op_time DESC ROWS BETWEEN UNBOUNDED PRECEDING
      AND 1 FOLLOWING
  ), 
  1 > 2,
  2 + 3,
  1 IN (1, 2, 3, 4), 
  remark LIKE '%like', 
  CASE WHEN remark = 's' THEN 1 ELSE 2 END,
  TRUE | FALSE 
FROM 
  db_test.table_test1
```

The output column names are as follows:
```
error_code
__literal_1
__literal_2
__now_3
__dayofyear_4
__cast_expr_5
__min_6
__binary_predicate_7
__arithmetic_expr_8
__in_predicate_9
__like_predicate_10
__case_expr_11
__arithmetic_expr_12
```
2023-10-09 04:14:01 -05:00
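The naming pattern visible in the example output of #24990 can be sketched as follows. This is only an illustration of the pattern `__<kind>_<position>`, with the position counted from 0 in the select list; the function `infer_column_names` and the `(kind, name)` input shape are hypothetical, and the kind labels are copied from the example output rather than from the actual implementation.

```python
from typing import List, Optional, Tuple


def infer_column_names(select_items: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Return one column name per select item.

    Each item is (kind, name). Plain identifiers keep their own name;
    every other expression gets "__<kind>_<position>".
    """
    names = []
    for position, (kind, name) in enumerate(select_items):
        if kind == "identifier":
            names.append(name)
        else:
            names.append(f"__{kind}_{position}")
    return names


# The select list of the CREATE VIEW example, labeled by hand.
example_items = [
    ("identifier", "error_code"),
    ("literal", None),            # 1
    ("literal", None),            # 'string'
    ("now", None),                # now()
    ("dayofyear", None),          # dayofyear(op_time)
    ("cast_expr", None),          # cast(source AS BIGINT)
    ("min", None),                # min(`timestamp`) OVER (...)
    ("binary_predicate", None),   # 1 > 2
    ("arithmetic_expr", None),    # 2 + 3
    ("in_predicate", None),       # 1 IN (1, 2, 3, 4)
    ("like_predicate", None),     # remark LIKE '%like'
    ("case_expr", None),          # CASE WHEN ... END
    ("arithmetic_expr", None),    # TRUE | FALSE
]

inferred = infer_column_names(example_items)
```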
79fa1d1640 [enhancement](regression-test) add stream load json case (#25168) 2023-10-09 16:40:39 +08:00
320709b9ff [opt](Nereids) support like and regexp function (#25148) 2023-10-09 02:55:57 -05:00
cdba4c4775 [fix](Nereids) deep copier generate wrong slot for TVF (#25156) 2023-10-09 14:52:36 +08:00
b41ec6a8a4 [feature](Nereids): Pushdown LimitDistinct Through Join (#25113)
Push down limit-distinct through left/right outer join or cross join.

such as select t1.c1 from t1 left join t2 on t1.c1 = t2.c1 order by t1.c1 limit 1;
2023-10-09 14:19:22 +08:00
5a55e47acd [Enhancement](Load) stream tvf support two phase commit (#23800) 2023-10-09 14:15:56 +08:00
7e9ffad933 [fix](ES catalog)Doris cannot parse ES date field without time zone (#24864)
1. Add support for Doris to parse ES date fields without time zone info, e.g. `2023-04-17T23:01:18.151`. Such a time is treated as UTC, since ES assumes UTC for time fields without a time zone.
2. Change the local time zone conversion from the system local time zone to the session variable time zone.
2023-10-08 19:28:08 +08:00
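The behavior described in #24864 can be illustrated with a small sketch: a date string without a time zone is interpreted as UTC and then converted to the session time zone. The function `parse_es_date` is hypothetical, and the `+08:00` session time zone is only an example value, not something the commit specifies.

```python
from datetime import datetime, timedelta, timezone


def parse_es_date(text: str, session_tz: timezone) -> datetime:
    """Parse an ISO-like date string and convert it to the session time zone."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # No zone info in the string: treat the value as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(session_tz)


# Example session time zone (assumed for illustration): UTC+8.
session_tz = timezone(timedelta(hours=8))
converted = parse_es_date("2023-04-17T23:01:18.151", session_tz)
print(converted.isoformat(timespec="milliseconds"))
```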
3a45001447 [fix](Nerids) fix error when the view has lambda functions (#25067)
1. To ensure compatibility with the original optimizer, expose the non-lambda signature of higher-order functions externally.
2. Fix some bugs in the toSql function of the original optimizer.
2023-10-08 15:45:24 +08:00
feb1cbe9ed [bug](partition_sort)partition sort need sort all data in two phase global (#24960)
#24886 marked the phase in FE; this PR adds the corresponding changes in BE.
Partition sort needs to sort all data in the two phase global case.
2023-10-08 10:46:43 +08:00
fddef8b473 [fix](es-catalog)fix error when querying the index ,elasticsearch version 8.9.1 (#24839)
Issue Number: close #24833
2023-10-08 10:19:45 +08:00
727fa2c0cd [opt](tvf) refine the class of ExternalFileTableValuedFunction (#24706)
`ExternalFileTableValuedFunction` now has 3 derived classes:

- LocalTableValuedFunction
- HdfsTableValuedFunction
- S3TableValuedFunction

All these tvfs read data from files. The difference is where the file is read from, e.g., from HDFS or from the local filesystem.

So I refined the fields and methods of these classes.
Now there are 3 kinds of properties for these tvfs:

1. File format properties

	File format properties, such as `format` and `column_separator`, are common to all these tvfs,
	so they should be analyzed in the parent class `ExternalFileTableValuedFunction`.
	
2. URI or file path

	The URI or file path property indicates the file location. The format of the URI differs between storage systems,
	so it should be analyzed in each derived class.
	
3. Other properties

	All other properties, which are specific to a certain tvf,
	should be analyzed in each derived class.
	
There are 2 new classes:

- `FileFormatConstants`: Define some common property names or variables related to file format.
- `FileFormatUtils`: Define some util methods related to file format.

After this PR, if we want to add a common property for all these tvfs, we only need to handle it in
`ExternalFileTableValuedFunction`, which avoids forgetting to handle it in any one of them.

### Behavior change

1. Remove the `fs.defaultFS` property in `hdfs()`; it can be derived from `uri`
2. Use `\t` as the default column separator of csv format, same as stream load
2023-10-07 12:44:04 +08:00
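The property split described in #24706 can be sketched as a small class hierarchy: the parent handles file format properties, each derived class handles its own location property, and everything left over is kept as tvf-specific. The class names `ExternalFileTvf`, `HdfsTvf` and `LocalTvf`, the `file_path` key, and the method `parse_location` are hypothetical stand-ins; only `format`, `column_separator`, `uri` and the `\t` default come from the commit message.

```python
from typing import Dict
from urllib.parse import urlparse


class ExternalFileTvf:
    """Parent class: analyzes the file format properties shared by all tvfs."""

    def __init__(self, props: Dict[str, str]) -> None:
        remaining = dict(props)
        # 1. File format properties, handled once in the parent.
        self.file_format = remaining.pop("format")
        self.column_separator = remaining.pop("column_separator", "\t")
        # 2. URI or file path, handled by each derived class.
        self.parse_location(remaining)
        # 3. Whatever is left is specific to the concrete tvf.
        self.other_props = remaining

    def parse_location(self, remaining: Dict[str, str]) -> None:
        raise NotImplementedError


class HdfsTvf(ExternalFileTvf):
    def parse_location(self, remaining: Dict[str, str]) -> None:
        self.uri = remaining.pop("uri")
        parsed = urlparse(self.uri)
        # The default filesystem is derived from the uri instead of
        # being passed as a separate property.
        self.default_fs = f"{parsed.scheme}://{parsed.netloc}"


class LocalTvf(ExternalFileTvf):
    def parse_location(self, remaining: Dict[str, str]) -> None:
        # "file_path" is an assumed property name for this sketch.
        self.file_path = remaining.pop("file_path")


hdfs_tvf = HdfsTvf({
    "uri": "hdfs://127.0.0.1:8020/data/a.csv",
    "format": "csv",
    "hadoop.username": "doris",
})
```

With this layout, adding a new common file format property means touching only the parent constructor.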
70f5b0006f [fix](Nereids) ctas throw npe when default value is null (#25009) 2023-10-06 22:39:32 -05:00
f1e948e5f4 [fix](planner)the common type of date and decimal should be double (#24956) 2023-10-07 11:27:19 +08:00
d1f4d69032 [regression-test](merge-on-write) Add cases for partial update using insert statement with schema change (#24902) 2023-10-05 22:09:22 +08:00
4ce5213b1c [fix](insert) Fix test_group_commit_stream_load and add more regression in test_group_commit_http_stream (#24954) 2023-10-03 20:56:24 +08:00
10f0c63896 [FIX](complex-type) fix agg table with complex type with replace state (#24873)
fix agg table with complex type with replace state
2023-10-03 16:32:58 +08:00
2c25e0a681 [test](load) add more s3 load regression test cases (#24906) 2023-09-28 22:01:36 +08:00
4c94820ff9 [opt](nereids) adjust column stats in filter estimation (#24973)
TPCDS before
query4  9335    8113    8070    8070
query13 3104    1386    1385    1385
query18 1704    1216    1151    1151
query48 840     840     839     839
query61 435     379     383     379
query71 715     570     579     570
query85 2822    2627    2612    2612
query88 1897    1816    1793    1793
Total cold run time: 20852 ms
Total hot run time: 16799 ms

after:
query4  9610    8287    8249    8249
query13 1721    1013    1042    1013
query18 1585    1186    1155    1155
query48 789     777     778     777
query61 384     387     381     381
query71 713     610     584     584
query85 2020    1867    1843    1843
query88 1859    1812    1805    1805
Total cold run time: 18681 ms
Total hot run time: 15807 ms
2023-09-28 21:34:17 +08:00
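The totals reported in #24973 can be re-derived from the per-query rows. The sketch below assumes the first numeric column is the cold run and the last column is the hot figure; that reading is an inference, supported only by the fact that the sums match the reported totals.

```python
# Per-query timings in ms, copied from the commit message.
before = {
    "query4": (9335, 8113, 8070, 8070),
    "query13": (3104, 1386, 1385, 1385),
    "query18": (1704, 1216, 1151, 1151),
    "query48": (840, 840, 839, 839),
    "query61": (435, 379, 383, 379),
    "query71": (715, 570, 579, 570),
    "query85": (2822, 2627, 2612, 2612),
    "query88": (1897, 1816, 1793, 1793),
}
after = {
    "query4": (9610, 8287, 8249, 8249),
    "query13": (1721, 1013, 1042, 1013),
    "query18": (1585, 1186, 1155, 1155),
    "query48": (789, 777, 778, 777),
    "query61": (384, 387, 381, 381),
    "query71": (713, 610, 584, 584),
    "query85": (2020, 1867, 1843, 1843),
    "query88": (1859, 1812, 1805, 1805),
}


def totals(runs):
    # Assumed column meaning: first = cold run, last = hot figure.
    cold = sum(row[0] for row in runs.values())
    hot = sum(row[-1] for row in runs.values())
    return cold, hot


before_cold, before_hot = totals(before)
after_cold, after_hot = totals(after)

# Relative reduction in total run time, in percent.
cold_gain = 100 * (before_cold - after_cold) / before_cold
hot_gain = 100 * (before_hot - after_hot) / before_hot
print(f"cold: {cold_gain:.1f}% faster, hot: {hot_gain:.1f}% faster")
```

Most of the gain comes from query13 and query85, while query4 is slightly slower after the change.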
b50c1448df [fix](Nereids) should not replace slot by Alias when do NormalizeSlot (#24928)
When we do NormalizeToSlot, we push down complex expressions and keep only
their slots. In doing so, we collect aliases and their children,
compute each child in the bottom project, and keep the result slot in the current
node. For example:

Window(max(...), c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

But, in some cases, we remove some SlotReference by mistake, for example

Window(max(...), c1, c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

We lose the SlotReference c1. This PR fixes this problem. After this PR,
we get

Window(max(...), c1, a1)
+-- Project(..., c1, c1 as a1)
2023-09-28 14:51:08 +08:00
4ff1ab7a4d [fix](regression-test) regenerate test_http_stream_properties.out file (#24946) 2023-09-28 10:39:15 +08:00
671b5f0a0a [Bug](pipeline) Fix block reusing for union source operator (#24977)
[CANCELLED][INTERNAL_ERROR]Merge block not match, self:[String], input:[String, Nullable(String), Nullable(String), Nullable(String), Nullable(String), DateV2]
2023-09-27 19:41:56 +08:00
bb7f8d18a8 [fix](nereids) push down filter through partition topn (#24944)
Support pushing down a filter through partition topn if the filter can pass through the window.
Fix a CreatePartitionTopNFromWindow bug which may unexpectedly generate two partition topn nodes.
case:
select * from (select c2, row_number() over (partition by c2) as rn from t1) T where rn<=1 and c2 = 1;
before this pr:
| PhysicalResultSink                       |
| --PhysicalDistribute                     |
| ----filter((rn <= 1))                    |
| ------PhysicalWindow                     |
| --------PhysicalQuickSort                |
| ----------PhysicalDistribute             |
| ------------PhysicalPartitionTopN        |
| --------------filter((T.c2 = 1))         |
| ----------------PhysicalPartitionTopN    |
| ------------------PhysicalProject        |
| --------------------PhysicalOlapScan[t1] |
+------------------------------------------+
after:

| PhysicalResultSink                     |
| --PhysicalDistribute                   |
| ----filter((rn <= 1))                  |
| ------PhysicalWindow                   |
| --------PhysicalQuickSort              |
| ----------PhysicalDistribute           |
| ------------PhysicalPartitionTopN      |
| --------------PhysicalProject          |
| ----------------filter((T.c2 = 1))     |
| ------------------PhysicalOlapScan[t1] |
+----------------------------------------+
2023-09-27 19:38:04 +08:00
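One common way to decide whether a filter "can pass through window", as mentioned in #24944, is to check that it only references the window's partition keys. The sketch below uses that rule as an assumption; it is not taken from the Nereids code, and `split_filters` and its input shape are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def split_filters(
    conjuncts: Iterable[Tuple[str, Set[str]]],
    partition_keys: Set[str],
) -> Tuple[List[str], List[str]]:
    """Split conjuncts into (pushable, kept).

    Each conjunct is (text, referenced_columns). Under the assumed rule,
    a conjunct is pushable below the window / partition topn only if
    every column it references is a partition key.
    """
    pushable, kept = [], []
    for text, columns in conjuncts:
        if columns <= partition_keys:
            pushable.append(text)
        else:
            kept.append(text)
    return pushable, kept


# The query from the commit message: partition by c2, filter rn <= 1 and c2 = 1.
pushable, kept = split_filters(
    [("rn <= 1", {"rn"}), ("c2 = 1", {"c2"})],
    partition_keys={"c2"},
)
```

In the "after" plan above, this corresponds to `filter((T.c2 = 1))` sitting directly on the scan while `filter((rn <= 1))` stays above the window.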
00e8d1c3b4 [Fix](Planner) disable bitmap type in compare expression (#24792)
Problem:
BE core dumps because of bitmap calculation.

Reason:
When a check fails in BE, it core dumps directly.

Example:
SELECT id_bitmap FROM test_bitmap WHERE id_bitmap IN (NULL) LIMIT 20;

Solution:
Forbid this kind of expression in FE during analysis, and also forbid bitmap type comparison in other unsupported expressions.
2023-09-27 16:57:06 +08:00
9562e280af [enhancement](Nereids): remove stats derivation in CostAndEnforce job (#24945)
1. remove stats derivation in CostAndEnforce job
2. enforce valid for each stats after estimating
2023-09-27 16:31:03 +08:00
26818de9c8 [feature](jni) support complex types in jni framework (#24810)
Support complex types in jni framework, and successfully run end-to-end on hudi.
### How to Use
Other scanners only need to implement three interfaces in `ColumnValue`:
```java
// Get array elements and append into values
void unpackArray(List<ColumnValue> values);

// Get map key array&value array, and append into keys&values
void unpackMap(List<ColumnValue> keys, List<ColumnValue> values);

// Get the struct fields specified by `structFieldIndex`, and append into values
void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values);
```
Developers can take `HudiColumnValue` as an example.
2023-09-27 14:47:41 +08:00