The image file of our cluster reaches 2.3G. After the checkpoint, Followers synchronize the image timeout, resulting in the continuous increase of the bdb directory.
related pr: #25768
could not run multi group_concat distinct with more than one parameters.
This bug is not just for group_concat, but we usually use literal as
parameters in group_concat. So group_concat brought the problem to light.
In the original logic, we think only distinct aggregate function with
zero or one parameter could run in multi distinct mode. But it is wrong.
We could process all distinct aggregate function with not more than one
input slots.
Think about sql:
```sql
SELECT
group_concat(distinct c1, ','), group_concat(distinct c2, ',')
FROM t
GROUP BY c3
```
we translate expression to legacy one when do fold constant on BE.
some times, we generate invalid expression that cannot be tranlsated.
So, we should catch translate exception to avoid query failed.
When replaying drop function edit log, the function may not be found, causing runtime exception and
FE will fail to start.
The function SHOULD be exist, but the reason is still unknown.
I change the logic to NOT throw exception if function is not found.
This is a workaround to make sure FE can start, and add some log for later debug.
we put bound expr into unbound group by list by mistake.
This will lead to bind twice on some exprssion.
Since binding is not idempotent, below exception will be thrown for sql
```sql
select k5 / k5 as nu, sum(k1) from test group by nu order by nu nulls first
```
```
Caused by: org.apache.doris.nereids.exceptions.AnalysisException: Input slot(s) not in child's output: k5#5 in plan: LogicalProject[176] ( distinct=false, projects=[(cast(k5#5 as DECIMALV3(16, 10)) / k5#5) AS `nu`#14, sum(k1)#15], excepts=[] ), child output is: [nu#16, sum(k1)#15]
plan tree:
LogicalProject[176] ( distinct=false, projects=[(cast(k5#5 as DECIMALV3(16, 10)) / k5#5) AS `nu`#14, sum(k1)#15], excepts=[] )
+--LogicalAggregate[168] ( groupByExpr=[nu#16], outputExpr=[nu#16, sum(k1#1) AS `sum(k1)`#15], hasRepeat=false )
+--LogicalProject[156] ( distinct=false, projects=[k1#1, (cast(k5#5 as DECIMALV3(16, 10)) / k5#5) AS `nu`#16], excepts=[] )
+--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.test, indexName=test, selectedIndexId=503229, preAgg=OFF, Aggregate function sum(k1) contains key column k1. )
at org.apache.doris.nereids.rules.analysis.CheckAfterRewrite.checkAllSlotReferenceFromChildren(CheckAfterRewrite.java:108) ~[classes/:?]
```
in predicate rewrite phase, we eliminate some conjuncts which contains un-interested columns.
for example: T (a, b) partition by (a)
interest cols: a
uninsterest cols: b
for parition prune,
filter "a=1 and a>b" is equivalent to "a=1",
filter "a=1 or a>b" is equivalent to "TRUE"
old error msg:
default value precision: 2023-10-25 14:45:30.292 can not be greater than type precision: DATETIME(1)
new error msg:
default value precision: CURRENT_TIMESTAMP(3) can not be greater than type precision: DATETIME(1)
Follow #25496.
In #25496, I fixed the issue that the aws s3 properties are invalid when passing from FE to BE.
But it missed the `use_path_style` property, which is useful for minio access.
This PR fix it.
This pr do two things:
1. Infer the column name if the column is expression in `select into outfile`. The rule for column name generation can be refered in pr: #24990
2. fix bug that it will core dump if the `_schema` fails to build in the open phase in vorc_transformer.cpp
TODO:
1. Support infer the column name if the column is expression in `select into outfile` in new optimizer(Nereids).
There is FE config `infodb_support_ext_catalog`, the default is false.
Which means that the tables in `information_schema` database will not return info of external catalog.
Because if there are too many external catalogs in Doris with lots of db/tbl (like running p0 regression tests),
querying infomation_schema db will take a long time and may causing rpc timeout.
And there is an unresolved issue that if thrift rpc timeout, the BE may be crashed in ASAN mode.
So to avoid this issue(not fix yet), this PR mainly changes:
if `infodb_support_ext_catalog` is false,
1. query info of external catalog in information_schema db is not allowed, such as
show database like "external_catalog";
show tables like "xxx"
2. select * from information_schema.tbl will not contains external catalogs' info
3. For external p0 regression test pipeline, set `infodb_support_ext_catalog` to true to run the tests related to external catalog
for example:
```sql
with a as (with a as (select * from t) select * from a) select * from a;
with a as (select * from t1), b as (with a as (select * from a) select * from a) select * from b;
```
Fix select table tablet not effective, table distributed by random.
If tabletID specified in query does not exist in this partition, skip scan partition.
extract slot and literal in comparison predicate. infer new one by equals predicates.
use TypeCoercion to add cast on new comparison predicate to ensure it is correct.
This reverts "[Fix](Nereids) Add cast comparison with slot reference when inferring predicate (#21171)"
commit 58f2593ba1b65713e7b3c1ed39fc84be8cc3ff2c.