Each release of Doris ships some experimental features.
These features may not be stable or qualified enough, so users have to enable them explicitly via a config or session variable,
e.g. set enable_mtmv = true; otherwise they are disabled by default.
We should explicitly tell users which features are experimental, so that they notice and can decide whether to
use them.
Changes
This PR adds support for the experimental_ prefix on FE configs and session variables.
Session Variable
Take enable_nereids_planner as an example.
The Nereids planner is an experimental feature in Doris, so the variable carries an EXPERIMENTAL annotation:
@VariableMgr.VarAttr(..., expType = ExperimentalType.EXPERIMENTAL)
private boolean enableNereidsPlanner = false;
For compatibility, users can set it with either name:
set enable_nereids_planner = true;
set experimental_enable_nereids_planner = true;
In show variables, only the experimental_enable_nereids_planner entry is listed.
You can also see all experimental session variables with:
show variables like "%experimental%"
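Putting the two forms together, a minimal usage sketch (the exact SHOW output depends on your build):
```sql
-- Both names set the same underlying session variable.
set enable_nereids_planner = true;
set experimental_enable_nereids_planner = true;

-- Per this PR, only the experimental_-prefixed entry is listed.
show variables like "%nereids_planner%";
```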
Config
Same as session variables; take enable_mtmv as an example.
@ConfField(..., expType = ExperimentalType.EXPERIMENTAL)
public static boolean enable_mtmv = false;
Users can set it in fe.conf or via the ADMIN SET FRONTEND CONFIG statement, with either name:
enable_mtmv
experimental_enable_mtmv
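For example, a minimal sketch of the statement form (either name should reach the same config entry):
```sql
ADMIN SET FRONTEND CONFIG ("enable_mtmv" = "true");
ADMIN SET FRONTEND CONFIG ("experimental_enable_mtmv" = "true");
```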
Users can see all experimental FE configs with:
ADMIN SHOW FRONTEND CONFIG LIKE "%experimental%";
TODO
Support this feature for BE configs.
The experimental prefix is currently only added for these session variables:
enable_pipeline_engine
enable_nereids_planner
enable_single_replica_insert
and these FE configs:
enable_mtmv
enable_ssl
enable_fqdn_mode
Other configs and session variables should be modified later.
The query
select cast(k1 as INT) as id from tbl1 order by id limit 2;
was not eligible for the topN optimization, because 'id' is
a cast expression rather than a table column from the scan node.
This PR addresses that issue.
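For comparison, a sketch of a query shape that already qualified for the topN optimization, since it orders by a plain column from the scan node (tbl1 and k1 are the names from the example above):
```sql
-- 'id' is just an alias of the scanned column k1, so topN applies.
select k1 as id from tbl1 order by id limit 2;
```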
Support deleting expired stats both periodically and manually.
The default cleaner interval is 2 days.
The syntax for manual cleanup is:
```sql
DROP EXPIRED STATS
```
TODO:
1. Process external catalogs' stats.
2. Run the drop at an appointed time.
3. Sleep for a short time after dropping each batch.
For `select count(*) from T group by A, B`, suppose `ndv(A) > ndv(B)`.
The estimated row count of the aggregate lies between ndv(A) and ndv(A) * ndv(B).
In the previous version we chose the upper bound, ndv(A) * ndv(B). The drawback of this choice is that the estimate is often bigger than the row count of T itself.
In this version, we choose the lower bound, ndv(A).
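A worked sketch with illustrative statistics, showing why the upper bound tends to overshoot:
```sql
-- Assume ndv(A) = 1000, ndv(B) = 50, and T has 10000 rows (made-up numbers).
-- upper bound: ndv(A) * ndv(B) = 50000  -- larger than T itself
-- lower bound: ndv(A)          = 1000   -- the estimate chosen in this version
select count(*) from T group by A, B;
```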
If we have an expression like the one below,
```
date(c1) -- c1's type is date or datev2
```
the result of the expression is exactly the same as c1, so we should
remove the date function. This expression optimization simplifies the
expression, speeds up execution, and increases the opportunity to
push filters down to the storage layer.
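An illustrative sketch of the rewrite on a filter (table and column names are made up):
```sql
-- before: the date() call wraps the column and blocks pushdown
select * from t1 where date(c1) = '2023-10-01';
-- after: c1 is already DATE/DATEV2, so the call is removed and the
-- plain column predicate can be pushed to the storage layer
select * from t1 where c1 = '2023-10-01';
```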
1. Fix an ambiguous-slot binding exception caused by selecting the same slots.
2. Fix a SetOperation being bound multiple times because of CTEs.
3. Fix case-when branches not being coerced to the same type.
4. Fix an exception when a set_var hint exists in a subquery or CTE.
Because of a limitation of ProjectPlanner, we have to keep agg functions materialized if there are any virtual slots in the group-by list, such as 'GROUPING_ID'.
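For instance, a sketch of the kind of query this applies to (assuming standard GROUPING SETS syntax; table and column names are illustrative):
```sql
-- grouping_id(k1, k2) is a virtual slot in the group-by output,
-- so sum(v) has to stay materialized.
select k1, k2, grouping_id(k1, k2), sum(v)
from t1
group by grouping sets ((k1, k2), (k1), ());
```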
Consider the SQL:
select table_B_alias.b from table_B_alias where table_B_alias.b in ( select a from table_A_alias );
If table_B_alias.b is INT and table_A_alias.a is BIGINT,
we should cast(b as bigint) so that its data type matches that of the InSubquery.
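Conceptually, the rewritten predicate looks like the sketch below (same tables as the example above):
```sql
select table_B_alias.b
from table_B_alias
where cast(table_B_alias.b as bigint) in ( select a from table_A_alias );
```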
When the group-by keys do not contain a unique column (see the sketch after this list):
1. Without distinct: we prefer a two-phase aggregate over a one-phase aggregate.
2. With distinct: we prefer a three-phase aggregate over a two-phase aggregate.
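A sketch of the two cases (t1, k, v are made-up names, and k is not a unique column):
```sql
-- without distinct: a two-phase aggregate is preferred
select k, count(v) from t1 group by k;
-- with distinct: a three-phase aggregate is preferred
select k, count(distinct v) from t1 group by k;
```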
Steps to reproduce:
1. create any catalog re; [OK]
2. switch re [OK]
3. show catalogs [OK]
4. drop catalog re [OK]
5. show catalogs [FAIL with "Current catalog is not exist, please switch catalog."]
Expected:
show catalogs should always succeed and not depend on the current catalog.
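The steps above as a SQL sketch (the catalog type and properties are placeholders):
```sql
create catalog re properties ("type" = "es", "hosts" = "http://127.0.0.1:9200");
switch re;
show catalogs;    -- OK
drop catalog re;
show catalogs;    -- failed before this fix with "Current catalog is not exist, please switch catalog."
```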