1. Log the SQL when analyze fails.
2. Return directly from the analyze_test suite when there is more than one frontend.
3. Set query_timeout for the tpcds suites to avoid unnecessary failures caused by analyze sync (see the sketch below).
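A minimal sketch of the kind of change item 3 describes, assuming the suites simply raise the session-level timeout; the exact value here is an assumption:

```sql
-- Raise the session query timeout so that queries delayed by analyze sync
-- do not fail spuriously; 1800 seconds is an illustrative value.
SET query_timeout = 1800;
```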
A complex predicate in a delete stmt, like:
```sql
delete from t1 where t1.id in (select id from t2);
```
will be rewritten as an insert stmt:
```sql
insert into t1(id, __DORIS_DELETE_SIGN__) select id, 1 from t1 where id in (select id from t2);
```
Currently, inserting into a table while a materialized view is being created on it raises an exception. We fix this by using the create mv action to ensure that when we insert into a table, no materialized view is still being created on it.
* [Improve](dynamic schema) support filtering invalid data
1. Support dynamic schema to filter illegal data.
2. Expand the regular expression for ColumnName to support more column names.
3. Be compatible with PropertyAnalyzer and support legacy tables.
4. Disable parsing multi-dimensional arrays by default, since some bugs remain unresolved.
In doris regression-test/suites, many test cases quit immediately only on "FINISHED"; otherwise they wait until they time out. For example:
```groovy
while (max_try_secs--) {
    String res = getJobState(tbName1)
    if (res == "FINISHED") {
        sleep(3000)
        break
    } else {
        Thread.sleep(1000)
        if (max_try_secs < 1) {
            println "test timeout," + "state:" + res
            assertEquals("FINISHED", res)
        }
    }
}
```
This PR adds checks so that these test cases can also quit immediately on "CANCELLED", which is the only terminal status other than "FINISHED".
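A minimal sketch of the adjusted loop, reusing the names from the example above:

```groovy
while (max_try_secs--) {
    String res = getJobState(tbName1)
    if (res == "FINISHED" || res == "CANCELLED") {
        // Fail fast: a CANCELLED job can never become FINISHED, so assert
        // immediately instead of waiting for the timeout to expire.
        assertEquals("FINISHED", res)
        sleep(3000)
        break
    } else {
        Thread.sleep(1000)
        if (max_try_secs < 1) {
            println "test timeout," + "state:" + res
            assertEquals("FINISHED", res)
        }
    }
}
```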
1. The query cache for the Chinese tokenizer gives confusing results when w_char is simply converted to char.
2. Separate query_type from inverted_index_reader to clean up the code.
In the previous implementation, the check on the group-by exprs was skipped. This PR adds the necessary check to make sure the rule works correctly.
You can reproduce it by running the SQL below:
```sql
CREATE TABLE t_push_filter_through_agg (
    col1 varchar(11451) NOT NULL,
    col2 int NOT NULL,
    col3 int NOT NULL
)
UNIQUE KEY(col1)
DISTRIBUTED BY HASH(col1) BUCKETS 3
PROPERTIES (
    "replication_num" = "1"
);

CREATE VIEW `view_i` AS
SELECT
    `b`.`col1` AS `col1`,
    `b`.`col2` AS `col2`
FROM (
    SELECT
        `col1` AS `col1`,
        sum(`cost`) AS `col2`
    FROM (
        SELECT
            `col1` AS `col1`,
            sum(CAST(`col3` AS INT)) AS `cost`
        FROM `t_push_filter_through_agg`
        GROUP BY `col1`
    ) a
    GROUP BY `col1`
) b;

SELECT SUM(`col2`) FROM `view_i` WHERE `col1` BETWEEN '2023-06-12' AND '2023-06-18' LIMIT 1;
```
The FE fold-constant rule folds an array() function expr with const literals into an array literal and does not pass it to the BE. Therefore the FE's array string output format must match the BE's array string output format.
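A minimal repro sketch; the output format shown in the comment is an assumption:

```sql
-- array(1, 2, 3) is folded into a const array literal on the FE and returned
-- directly, so its string form (e.g. [1, 2, 3]) must match what the BE would
-- print for the same array computed at runtime.
SELECT array(1, 2, 3);
```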
Problem:
When using `select group_concat(distinct a, 'seg1'), group_concat(distinct b, 'seg2') ...`, an error is raised.
Reason:
The group_concat function treats the separator 'seg' as one of its arguments too, so a multi-distinct-column error is raised.
Solution:
Let the multi-distinct group_concat function take only its first argument as the real distinct argument.
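A hedged repro sketch, assuming a table `t` with string columns `a` and `b`:

```sql
-- Before the fix, the constant separators 'seg1' and 'seg2' were counted as
-- extra distinct columns, so combining two multi-distinct group_concat calls
-- in one query raised an error.
SELECT
    group_concat(DISTINCT a, 'seg1'),
    group_concat(DISTINCT b, 'seg2')
FROM t;
```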
In a hash join condition, some equalities are trustable and some are not.
An equality is trustable if one side is almost unique, like a primary key; for such an equal condition we can estimate more accurately.
The problem is that the rewritten q20 has two equal conditions, one trustable and one not, but we treated both of them as trustable.
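A hedged illustration using TPC-H tables; the join conditions below are chosen for illustration and are not the ones in q20:

```sql
SELECT count(*)
FROM lineitem
JOIN orders
  ON l_orderkey = o_orderkey    -- trustable: o_orderkey is the primary key of orders
 AND l_shipdate = o_orderdate;  -- not trustable: neither side is close to unique
```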
Test result:
on tpch100, q20 goes from 2.2 sec to 0.44 sec
no impact on other tpch queries
no performance impact on tpcds queries