Commit Graph

6215 Commits

Author SHA1 Message Date
d6ff9744c9 [feature](Nereids) covert predicate to SARGABLE (#25180)
Convert predicates to SARGABLE form:
1. Support expressions of the form `1 - a`.
2. Support rearranging the `year/month/week/day/minutes/seconds_sub/add` functions.
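
As an illustration of the `1 - a` case, the sketch below moves the constant to the other side of a comparison so that the bare column is left on one side. It is not the Nereids rule itself: the `Predicate` class and the `rewrite_const_minus_column` helper are hypothetical names, and integer overflow and NULL handling are ignored.

```python
from dataclasses import dataclass

# Mirror image of each comparison operator, needed because the column is negated.
FLIP = {"<": ">", "<=": ">=", ">": "<", ">=": "<=", "=": "="}


@dataclass(frozen=True)
class Predicate:
    """A comparison with the bare column on the left-hand side (hypothetical type)."""
    column: str
    op: str
    value: int


def rewrite_const_minus_column(const: int, column: str, op: str, rhs: int) -> Predicate:
    """Rewrite `const - column <op> rhs` as `column <flipped op> const - rhs`.

    const - a > rhs  <=>  -a > rhs - const  <=>  a < const - rhs
    """
    return Predicate(column, FLIP[op], const - rhs)


# `1 - a > 5` becomes `a < -4`, which can be matched against an index or zone map on `a`.
print(rewrite_const_minus_column(1, "a", ">", 5))
```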
2023-10-12 14:46:56 +08:00
c63bf24c84 [Improvement](statistics) Improve sample count accuracy (#25175)
During a sampled analyze, the resulting row count, null count, and data size must be multiplied by a coefficient based on the sample percentage or sample rows. This PR calculates that coefficient from the ratio of the sampled file size to the total size.
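
A minimal sketch of the scaling step, assuming the coefficient is total size divided by sampled size. The function name and the dictionary keys are hypothetical and do not come from the Doris code base.

```python
from typing import Dict


def scale_sampled_stats(sampled: Dict[str, float],
                        sampled_bytes: int,
                        total_bytes: int) -> Dict[str, int]:
    """Scale statistics measured on a sample up to the whole table.

    Assumption: the coefficient is total_bytes / sampled_bytes.
    """
    if sampled_bytes <= 0:
        raise ValueError("sampled_bytes must be positive")
    coefficient = total_bytes / sampled_bytes
    return {name: round(value * coefficient) for name, value in sampled.items()}


# The sample covered 10 of 100 bytes, so every statistic is multiplied by 10.
estimate = scale_sampled_stats(
    {"row_count": 100, "null_count": 5, "data_size": 2000},
    sampled_bytes=10,
    total_bytes=100,
)
print(estimate)
```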
2023-10-12 14:42:02 +08:00
80a49ed97a [fix](nereids)fix some function signature issue (#25301)
1. remove wrong signature of nvl
2. the promoted type datetimev2 for datetime should be datetimev2(0)
2023-10-12 01:23:20 -05:00
a0d3206d78 [fix](Nereids) support nested complex type literal (#25287) 2023-10-12 01:17:38 -05:00
42f8b253aa [function](nereids) support array_apply/array_repeat/group_uniq_array/ipv4numtostring (#25249)
nereids support functions: array_apply/array_repeat/group_uniq_array/ipv4numtostring
2023-10-12 11:08:42 +08:00
Pxl
a0d2b1ec56 [Bug](materialized-view) fix not match mv when some alias on agg (#25321)
Fix the materialized view not being matched when an aggregate has an alias.
2023-10-12 11:02:55 +08:00
9a4baf7ccf [fix](Nereids)Fix the bug that count(*) does not push down for tables with only one column. (#25222)
After PR #22115.

Fixes a bug where, when selecting count(*) from a table that has only one column, the aggregate count was not pushed down.
2023-10-11 23:17:30 +08:00
d1f59a4025 [fix](catalog)fix when modifying comments in property, it will modify the comments in the catalog (#24857)
- fix a bug where modifying the comment in the properties also modified the comment of the catalog
- add `alter catalog modify comment` to modify the comment of a catalog
- move some of the `alter catalog` logic into the parent class
2023-10-11 23:16:19 +08:00
73c3e3ab55 [Feature](x-load) support config min replica num for loading data (#21118) 2023-10-11 21:07:35 +08:00
ba87f7d3a3 [fix](pipelineX) add table sink and some fix in pipelineX (#25314) 2023-10-11 20:18:08 +08:00
46be6c07e1 [opt](Nereids) expose multi distinct functions (#25309) 2023-10-11 05:42:39 -05:00
1e300d895d [improvement](checkpoint) checkpoint thread update tablet invert index (#25098) 2023-10-11 18:18:03 +08:00
2d19f2fbfe [fix](planner)need call materializeSrcExpr for materialized slots in join node (#25204) 2023-10-11 16:34:53 +08:00
dabeeb0338 [fix](planner)should always use plan node's getTblRefIds method to get unassigned conjuncts for this node (#25130) 2023-10-11 16:34:21 +08:00
2221c8e2ed [fix](planner)implicit cast should use type member variable instead of targetTypeDef (#24582) 2023-10-11 16:33:48 +08:00
e9554e36a8 [fix](nereids)disable parallel scan in some case (#25089) 2023-10-11 16:32:09 +08:00
a9b84ae6ee [test](nereids)add more case in PushdownFilterThroughAggregationTest (#24927) 2023-10-11 16:14:36 +08:00
6d999f5b95 [enhancement](nereids)add eliminate filter on one row relation rule (#24980)
1.simplify PushdownFilterThroughSetOperation rule
2.add eliminate filter on one row relation rule
2023-10-11 16:12:24 +08:00
47578c0fc9 [fix](Nereids) fix toSql of date literal (#25243)
toSql should return '2023-2-1 ' for DateLiteral 2023-2-1
2023-10-11 13:04:05 +08:00
0d603dd4c3 [Bug](delete) Use date as common type for date comparison (#25262) 2023-10-11 11:51:43 +08:00
1e6d34d1d0 [Enhancement](sql-cache) Add partition update time for hms table and use it at sql-cache. (#24491)
Currently the FE does not record the update time of an HMS table's partitions, so the SQL cache may be hit even when the Hive table's partitions have changed. This PR adds a field that records the partition update time and uses it when the SQL cache is enabled.
The cache will be missed if any partition has changed on the Hive side.

System.currentTimeMillis() is used rather than the event time of the HMS event, to keep the same measurement as the schemaUpdateTime of the external table. The value is added to ExternalObjectLog and replayed by the slave FEs so that all FEs hold the same value and the SQL cache can be hit by queries through different FEs.
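
The check described above can be sketched as follows. This is only an illustration under assumed names: `cache_is_valid`, `cached_at_ms`, and the list of partition update times are hypothetical and are not the FE's actual data structures.

```python
from typing import Iterable


def cache_is_valid(cached_at_ms: int, partition_update_times_ms: Iterable[int]) -> bool:
    """Return True only if no partition was updated after the entry was cached.

    Hypothetical model: every timestamp is a wall-clock value in milliseconds,
    recorded the same way on every FE.
    """
    return all(updated <= cached_at_ms for updated in partition_update_times_ms)


# One partition changed after the result was cached, so the cache must be missed.
print(cache_is_valid(1_000, [900, 950]))
print(cache_is_valid(1_000, [900, 1_200]))
```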
2023-10-11 11:05:16 +08:00
b91bce8a62 [feature](Nereids) add array distance functions (#25196)
- l1_distance
- l2_distance
- cosine_distance
- inner_product
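
For reference, the four measures can be sketched in plain Python as below. These are the textbook definitions, not the Doris implementation: `cosine_distance` is assumed to mean one minus the cosine similarity, and the handling of NULL elements and zero vectors in Doris is not covered.

```python
import math
from typing import Sequence


def _check(x: Sequence[float], y: Sequence[float]) -> None:
    if len(x) != len(y):
        raise ValueError("arrays must have the same length")


def l1_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of absolute element-wise differences."""
    _check(x, y)
    return sum(abs(a - b) for a, b in zip(x, y))


def l2_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance."""
    _check(x, y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def inner_product(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of element-wise products."""
    _check(x, y)
    return sum(a * b for a, b in zip(x, y))


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """One minus the cosine similarity (assumed definition); undefined for zero vectors."""
    norms = math.sqrt(inner_product(x, x)) * math.sqrt(inner_product(y, y))
    if norms == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - inner_product(x, y) / norms


print(l1_distance([1, 2], [4, 6]), l2_distance([1, 2], [4, 6]))
```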
2023-10-10 21:35:06 -05:00
d4673ce28a [Feature](Job)Jobs in the Finish state will be automatically deleted after three days. (#25170) 2023-10-11 10:04:19 +08:00
fb3b888ff1 [prune](partition)support prune partition when is auto partition with function call (#24747)
A table can now be created with auto partitioning:
AUTO PARTITION BY RANGE date_trunc(event_day, 'day')
so each value of event_day is inserted into the partition for date_trunc(event_day, 'day').
For example: select * from partition_range where date_trunc(event_day,"day")= "2023-08-07 11:00:00";
Some partitions can be pruned by evaluating date_trunc("2023-08-07 11:00:00","day").
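
The idea can be sketched as follows: apply the partition function to the literal, then keep only the partitions whose range contains the result. The partition names, the half-open range layout, and the helper names are assumptions made for illustration and are not taken from the Doris planner.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical layout: partition name -> [lower bound, upper bound)
Ranges = Dict[str, Tuple[datetime, datetime]]


def date_trunc_day(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its day, like date_trunc(ts, 'day')."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def prune_partitions(partitions: Ranges, literal: datetime) -> List[str]:
    """Keep only the partitions whose range contains date_trunc(literal, 'day')."""
    key = date_trunc_day(literal)
    return [name for name, (low, high) in partitions.items() if low <= key < high]


partitions = {
    "p20230806": (datetime(2023, 8, 6), datetime(2023, 8, 7)),
    "p20230807": (datetime(2023, 8, 7), datetime(2023, 8, 8)),
    "p20230808": (datetime(2023, 8, 8), datetime(2023, 8, 9)),
}
print(prune_partitions(partitions, datetime(2023, 8, 7, 11, 0, 0)))
```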
2023-10-10 20:39:43 +08:00
62a6b132be [Fix](func numbers) Remove backend_nums argument of numbers function (#25200) 2023-10-10 20:25:58 +08:00
fc1bad9a6b [feature](Nereids) support query MATERIALIZED_VIEW type table (#25227) 2023-10-10 06:44:29 -05:00
67ddfb1abc [fix](httpserver) creating this cookie without the "secure" flag and enabling cross-origin resource safe (#25107) 2023-10-10 06:25:09 -05:00
8b56ca84c7 [fix](Nereids) support AnyDataType in function signature (#25173)
1. support AnyDataType in function signature
2. update histogram signature
2023-10-10 06:09:47 -05:00
0435b286fb [feature](Nereids) support metadata tvf and fix bugs in group_commit() (#25224)
metadata tvf list:
- backends
- catalogs
- frontends
- frontends_disks
- group_commit
- iceberg_meta
- workload_groups

fix group_commit bugs
- an NPE is thrown when the properties do not contain 'table_id'
- an NPE is thrown when the table for table_id does not exist
- a class cast failure is thrown when the type of the table for table_id is not OLAP
2023-10-10 05:20:19 -05:00
7276665f1e [enhancement](Nereids) avoiding broadcast join heuristically and pruning more in CostAndEnforceJob (#25137)
When the rowCount exceeds a certain threshold, refrain from generating a broadcast join.
Only enforce the best expression in CostAndEnforce Job, rather than enforcing every expression.
Remove lower bound group pruning
2023-10-10 13:38:10 +08:00
181c58c691 [fix](Nereids) count_by_enum signature is wrong (#25167) 2023-10-10 13:05:20 +08:00
880d0d7e70 [Bug](pipeline) Support the auto partition in pipeline load (#25176) 2023-10-10 11:51:12 +08:00
59dee6b235 [fix](Nereids) support string cast to complex type (#25154) 2023-10-10 10:26:33 +08:00
f5b826b66d [fix](mark join) mark join column should be nullable (#24910) 2023-10-10 10:10:36 +08:00
90ad48cdb7 [feature](pipelineX) add node id and profilev2 in pipelineX (#25084) 2023-10-10 09:09:26 +08:00
5e8aef4756 [feature](Nereids) fold weeks_sub/add on fe (#25155)
support folding weeks_sub/add on fe
2023-10-09 21:52:44 +08:00
53b46b7e6c [FIX](filter) update for filter_by_select logic (#25007)
This PR updates the filter_by_select logic and changes the restrictions on delete:

Only scalar types are supported in the where condition of a delete statement.
Only nullable columns and predicate columns support the filter_by_select logic, because non-scalar types cannot be pushed down to the storage layer and packed into a predicate column, yet filtering still has to be applied.
2023-10-09 21:27:40 +08:00
37247ac449 [opt](Nereids) add two args signature to trim family functions (#25169) 2023-10-09 07:17:52 -05:00
08e7a7b932 [feat](optimizer) Scale sample stats with ratio to make it more precise (#25079)
Since Doris supports querying specific tablets only, we do not depend on tableSample for sampling; instead we use the TABLET(id) grammar. In OlapAnalyzeTask, we calculate which tablets would be hit and set their ids, so we can get how many rows were actually queried and, from that, the scale-up ratio.
2023-10-09 07:01:59 -05:00
400b9f2f97 [Enhancement](log) Improve Safety and Robustness of Log4j Configuration (#24861) 2023-10-09 06:44:37 -05:00
f8eb36158a [fix](Nereids) alias function support arithmetic functions (#25162) 2023-10-09 19:04:47 +08:00
977d119545 [fix](Insert select tvf) fix NPE because tvf do not have catalog name (#25149) 2023-10-09 18:02:43 +08:00
d02ef36631 [opt](Nereids) match predicate support array as first arg (#25172) 2023-10-09 04:17:27 -05:00
263631e983 [improvement](meta) Infer the column name when create view if the column is expression (#24990)
## Proposed changes

Infer the column name when creating a view if the column is an expression.

## Further comments
The column name inference strategy for expressions is as follows:
|      expr       |                example                    |           column name(before)             | Inferred column name(if position is 2)  |
|  -------------  | ---------------------------------------   | ------------------------------            | --------------------------------------  |
| function        | dayofyear()                               | dayofyear()                               | __dayofyear_1                           |
| cast            | cast(1 as bigint)                         | CAST(1 AS BIGINT)                         | __cast_1                                |
| analyticExpr    | min()                                     | min()                                     | __min_1                                 |
| predicate       | 1 in (1,2,3,4)                            | 1 IN (1, 2, 3, 4)                         | __in_predicate_1                        |
| literal         | 1 or 'string_var_name'                    | 1 or 'string_var_name'                    | __literal_1                             |
| arithmeticExpr  | &                                         | ... & ...                                 | __arithmetic_expr_1                     |
| identifier      | a or b                                    | a or b                                    | a or b                                  |
| case            | CASE WHEN remark = 's' THEN 1 ELSE 2 END  | CASE WHEN remark = 's' THEN 1 ELSE 2 END  | __case_1                                |
| window          | min(timestamp) OVER (...)                 | min(timestamp) OVER(...)                  | __min_1                                 |


Example SQL:
```sql
CREATE VIEW v1 AS 
SELECT 
  error_code,
  1, 
  'string', 
  now(), 
  dayofyear(op_time), 
  cast (source AS BIGINT), 
  min(`timestamp`) OVER (
    ORDER BY 
      op_time DESC ROWS BETWEEN UNBOUNDED PRECEDING
      AND 1 FOLLOWING
  ), 
  1 > 2,
  2 + 3,
  1 IN (1, 2, 3, 4), 
  remark LIKE '%like', 
  CASE WHEN remark = 's' THEN 1 ELSE 2 END,
  TRUE | FALSE 
FROM 
  db_test.table_test1
```

The output column names are as follows:
```
error_code
__literal_1
__literal_2
__now_3
__dayofyear_4
__cast_expr_5
__min_6
__binary_predicate_7
__arithmetic_expr_8
__in_predicate_9
__like_predicate_10
__case_expr_11
__arithmetic_expr_12
```
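
The naming pattern visible in the output above can be sketched as follows, assuming the suffix is the zero-based position of the item in the select list and that plain identifiers keep their own name. The function and the `(label, is_identifier)` input format are hypothetical; the labels are copied from the listing rather than derived from real expression classes.

```python
from typing import List, Tuple


def infer_column_names(select_items: List[Tuple[str, bool]]) -> List[str]:
    """Build view column names from (label, is_identifier) pairs in select order.

    Assumption: identifiers keep their name, and every other expression becomes
    `__<label>_<zero-based position>`.
    """
    names = []
    for position, (label, is_identifier) in enumerate(select_items):
        names.append(label if is_identifier else f"__{label}_{position}")
    return names


items = [
    ("error_code", True), ("literal", False), ("literal", False),
    ("now", False), ("dayofyear", False), ("cast_expr", False),
    ("min", False), ("binary_predicate", False), ("arithmetic_expr", False),
    ("in_predicate", False), ("like_predicate", False), ("case_expr", False),
    ("arithmetic_expr", False),
]
print(infer_column_names(items))
```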
2023-10-09 04:14:01 -05:00
320709b9ff [opt](Nereids) support like and regexp function (#25148) 2023-10-09 02:55:57 -05:00
7ceb029a17 [Fix](statistics)Fix alter column stats data size is always 0 bug (#24891)
Fix alter column stats data size is always 0 bug.
2023-10-09 15:48:11 +08:00
0bf954ba05 [fix](Nereids) unique table support bitmap column (#25160) 2023-10-09 02:39:11 -05:00
4f7fad5498 [fix](Nereids) properties parser should return map (#25150) 2023-10-09 02:32:56 -05:00
cdba4c4775 [fix](Nereids) deep copier generate wrong slot for TVF (#25156) 2023-10-09 14:52:36 +08:00
d34ab7accc [fix](Nereids) bind sink should use full base schema (#25153) 2023-10-09 01:40:57 -05:00