Create partitions using:
```sql
PARTITION BY RANGE(event_day)(
    FROM ("2000-11-14") TO ("2021-11-14") INTERVAL 1 YEAR,
    FROM ("2021-11-14") TO ("2022-11-14") INTERVAL 1 MONTH,
    FROM ("2022-11-14") TO ("2023-01-03") INTERVAL 1 WEEK,
    FROM ("2023-01-03") TO ("2023-01-14") INTERVAL 1 DAY,
    PARTITION p_20230114 VALUES [('2023-01-14'), ('2023-01-15'))
)
PARTITION BY RANGE(event_time)(
    FROM ("2023-01-03 12") TO ("2023-01-14 22") INTERVAL 1 HOUR
)
```
This creates yearly, monthly, weekly, daily, or hourly date partitions in a batch, and it is also compatible with the single-partition syntax.
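For illustration, a minimal sketch of how this clause could sit inside a complete CREATE TABLE statement; the `site_events` table, its columns, and the bucketing setup are assumptions, not part of this PR:
```sql
-- Hypothetical table; names and properties are illustrative only.
CREATE TABLE site_events (
    event_day DATE NOT NULL,
    site_id INT,
    pv BIGINT
)
DUPLICATE KEY(event_day, site_id)
PARTITION BY RANGE(event_day)(
    -- batch-create one partition per day over the range
    FROM ("2023-01-03") TO ("2023-01-14") INTERVAL 1 DAY
)
DISTRIBUTED BY HASH(site_id) BUCKETS 10;
```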
## Problem summary
This PR supports:
1. the `numbers` TableValuedFunction for Nereids tests, e.g. `select * from numbers(number = 10, backend_num = 1)`
2. bitmap/hll aggregate functions
3. finding variable-length functions in the function registry, like `coalesce`
4. a fix for a bug where printing the Nereids trace threw an exception because a RewriteRule was used in ApplyRuleJob (e.g. `AggregateDisassemble`), introduced by #13957
To support queries like:
`SELECT c1 + 1 AS a, sum(c2) FROM t GROUP BY c1 + 1 ORDER BY c1 + 1`
After the rewrite, the plan is equivalent to:
`SELECT c1 + 1 AS a, sum(c2) FROM t GROUP BY c1 + 1 ORDER BY a`
1. Add a post processor: runtime filter pruner.
Doris generates RFs (runtime filters) on Join nodes to reduce the probe table at the scan stage, but some RFs have no effect because their selectivity is 100%. This PR removes them (see the sketch after this list).
An RF is effective if
a. the build column's value range covers only part of the probe column's range, OR
b. the build column's NDV is less than that of the probe column, OR
c. the build column's ColumnStats.selectivity < 1, OR
d. the build column is reduced by another RF that satisfies the above criteria.
2. Explain graph:
a. add RF info to Join and Scan nodes
b. add a predicate count to Scan nodes
3. Rename a session variable:
rename `enable_remove_no_conjuncts_runtime_filter_policy` to `enable_runtime_filter_prune`
4. Fix a min/max column stats derivation bug:
for `select max(A) as X from T group by B`,
X.min is A.min, not A.max.
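For illustration, a hedged sketch of a join where the pruner could drop an ineffective RF; the tables and statistics are assumptions, not taken from this PR:
```sql
-- Hypothetical: suppose stats say d.id spans [1, 1000000] while f.dim_id
-- spans [1, 1000]. The RF built from d.id covers the entire value range of
-- f.dim_id, so it filters nothing (selectivity 100%) and can be pruned.
SELECT f.v
FROM fact f JOIN dim d ON f.dim_id = d.id;
```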
1. Support persisting collected statistics to a pre-built OLAP table named `column_statistics`.
2. Use a much simpler mechanism to collect statistics: all the gauges are collected in a single SQL statement for each partition and then for the whole column, as defined in the class `AnalysisJob` (see the sketch below).
3. Implement a cache to manage the statistics records in the FE.
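For illustration only, a hypothetical sketch of the kind of per-partition statement such a job could generate; the layout of `column_statistics`, the gauge list, and the PARTITION syntax are assumptions, not taken from this PR:
```sql
-- Hypothetical per-partition analysis statement; target schema and
-- gauge list are illustrative assumptions.
INSERT INTO column_statistics
SELECT
    'db1' AS db_name, 't' AS tbl_name, 'p1' AS part_name, 'c1' AS col_name,
    COUNT(1) AS row_count,
    NDV(c1) AS ndv,
    SUM(CASE WHEN c1 IS NULL THEN 1 ELSE 0 END) AS null_count,
    MIN(c1) AS min_value,
    MAX(c1) AS max_value
FROM db1.t PARTITION (p1);
```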
TODO:
1. Use OpenTelemetry to monitor the execution time of each job.
2. Format the internal analysis SQL.
3. Split the SQL to ensure the IN expression's child count does not exceed the FE limit on SQL generated for deleting expired records.
4. Implement SHOW statements.
When executing ANALYZE TABLE, Doris fails on decimal columns.
The root cause is that the scale in DecimalV2 is 9 while it is 2 in the schema.
There is no need to check the scale for DecimalV2, since it is not a floating point type.
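A minimal reproduction sketch, assuming a decimal column declared with scale 2; the table name and layout are assumptions:
```sql
-- Hypothetical repro: declared scale is 2, but DecimalV2 stores scale 9
-- internally, which tripped the scale check during ANALYZE.
CREATE TABLE t_dec (
    k INT,
    d DECIMAL(10, 2)
)
DISTRIBUTED BY HASH(k) BUCKETS 1;

ANALYZE TABLE t_dec;  -- previously failed on the decimal column
```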
1. Bind slots in ORDER BY that do not appear in the project list, such as:
`SELECT c1 FROM t WHERE c2 > 0 ORDER BY c3`
2. Do not check for unbound slots while binding slot references; instead, do it in the analysis check.
Introduce a SQL syntax for creating inverted indexes and the related metadata changes.
```sql
-- create table with INVERTED indexes
CREATE TABLE httplogs (
    ts datetime,
    clientip varchar(20),
    request string,
    status smallint,
    size int,
    INDEX idx_size (size) USING INVERTED,
    INDEX idx_status (status) USING INVERTED,
    INDEX idx_clientip (clientip) USING INVERTED PROPERTIES("parser"="none")
)
DUPLICATE KEY(ts)
DISTRIBUTED BY RANDOM BUCKETS 10;

-- add an INVERTED index to an existing table
CREATE INDEX idx_request ON httplogs(request) USING INVERTED PROPERTIES("parser"="english");
```
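As a usage illustration (not part of this PR), a query whose filter could be served by one of the indexes defined above:
```sql
-- Hypothetical usage: the equality filter on status can use idx_status.
SELECT COUNT(*) FROM httplogs WHERE status = 404;
```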
This PR implements predicate inference.
For example:
``` sql
select * from student left join score on student.id = score.sid where score.sid > 1
```
transformed logical plan tree:

            left join
           /         \
    filter(sid > 1)   filter(id > 1)   <---- inferred predicate
          |                 |
        scan              scan

See `InferPredicatesTest` for more cases.
The logic is as follows:
1. Pull up the bottom predicate, then infer additional predicates.
For example:
`select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id`
a. pull up the bottom predicate:
`select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1`
b. infer:
`select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1 and t2.id = 1`
The final transformed SQL is:
`select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t2.id = 1`
2. Put these predicates into `otherJoinConjuncts`; they are processed in the next round of predicate push-down.
Currently only `ComparisonPredicate` inference is supported.
TODO: we should determine whether `expression` satisfies the conditions for replacement, e.g. a non-deterministic `expression` must not be used for inference.
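For illustration, a hedged example of the non-deterministic case the TODO refers to; inferring a new predicate here could change results, since each evaluation of `random()` yields a different value:
```sql
-- Hypothetical: t1.id > random() must not be inferred onto t2.id,
-- because random() is non-deterministic.
select * from t1 join t2 on t1.id = t2.id where t1.id > random();
```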
1. Add back the TPC-H regression test cases.
2. Fix a decimal problem on the aggregate function sum and agg, introduced by #13764.
3. Fix a memo merge-group NPE introduced by #13900.
Support Aliyun DLF.
Support data on S3-compatible object storage, such as Aliyun OSS.
Refactor some catalog interfaces to make them tidier.
Fix a bug where the default text-format field delimiter for Hive should be `\x01`.
Add a new class `PooledHiveMetaStoreClient` to wrap `IMetaStoreClient`.
In `GraphSimplifier`, we can use the simple cost to calculate the benefit, and we only need to update recursively when the best neighbor of the applied step is the edge currently being processed.