Add materialized view availability regression test
When the MV refresh time is within the grace period (`grace_period`, in seconds), the materialized view is used for
query rewrite regardless of whether the base table has been updated.
When the MV refresh time is outside the grace period, the base table is checked for updates;
if it has been updated, the materialized view is not used for query rewrite.
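The rule above can be sketched as a small predicate. This is an illustration only, not the Doris implementation: the class name, method name, and parameters are hypothetical, and treating the exact boundary of the grace period as still "within" it is an assumption.

```java
final class MvGracePeriodCheck {
    /**
     * Hypothetical helper: decides whether a materialized view may be used
     * for query rewrite. All times are in seconds.
     */
    static boolean usableForRewrite(long refreshTimeSec, long nowSec,
                                    long gracePeriodSec, boolean baseTableUpdated) {
        long sinceRefresh = nowSec - refreshTimeSec;
        // Within the grace period: use the MV even if the base table changed.
        // Assumption: the boundary itself counts as "within".
        if (sinceRefresh <= gracePeriodSec) {
            return true;
        }
        // Outside the grace period: only usable if the base table is unchanged.
        return !baseTableUpdated;
    }
}
```

For example, with a 60-second grace period, an MV refreshed 30 seconds ago is usable even when the base table was updated, while one refreshed 120 seconds ago is usable only if the base table is unchanged.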
[Opt] (multi-catalog) Opt split assignment to resolve uneven distribution. Currently only for `FileQueryScanNode`.
Following the implementation in Trino:
- Local node soft affinity optimization: prefer a node that holds a local replica.
- When the file cache is turned on, remote splits are assigned with a consistent hash algorithm. Because consistent hashing can be uneven, the splits are then readjusted so that the maximum and minimum split counts across hosts differ by at most `max_split_num_variance` splits.
- When the file cache is turned off, remote splits are assigned with a round-robin algorithm.
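As a rough sketch of two of the ideas above, the following shows a round-robin assignment and a simple rebalancing pass that moves splits from the most loaded host to the least loaded one until the spread is within a bound. It does not implement consistent hashing or local affinity, and it is not the code in this PR: the class, the method names, and the use of plain strings for hosts and splits are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SplitAssignmentSketch {
    /** Round-robin: split i goes to host (i mod number of hosts). */
    static Map<String, List<String>> roundRobin(List<String> hosts, List<String> splits) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String host : hosts) {
            assignment.put(host, new ArrayList<>());
        }
        for (int i = 0; i < splits.size(); i++) {
            assignment.get(hosts.get(i % hosts.size())).add(splits.get(i));
        }
        return assignment;
    }

    /**
     * Rebalances in place until (max count - min count) <= maxVariance.
     * The bound is clamped to at least 1, since a spread of 0 is not
     * reachable when the split count is not a multiple of the host count.
     */
    static void rebalance(Map<String, List<String>> assignment, int maxVariance) {
        if (assignment.isEmpty()) {
            return;
        }
        int bound = Math.max(1, maxVariance);
        while (true) {
            List<String> most = null;
            List<String> least = null;
            for (List<String> splits : assignment.values()) {
                if (most == null || splits.size() > most.size()) {
                    most = splits;
                }
                if (least == null || splits.size() < least.size()) {
                    least = splits;
                }
            }
            if (most.size() - least.size() <= bound) {
                return;
            }
            // Move one split from the most loaded host to the least loaded one.
            least.add(most.remove(most.size() - 1));
        }
    }
}
```

The rebalancing pass here ignores which host a split originally hashed to, so in a real cache-aware scheduler each move would cost some cache locality; that trade-off is presumably why a variance bound is configurable rather than forcing a perfectly even spread.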
The materialized view definition is as follows, and the query SQL is the same.
When the outer GROUP BY uses col1 from the inner GROUP BY subquery, the query can be rewritten by the materialized view.
select
t1.o_orderdate,
t1.o_orderkey,
t1.col1
from
(
select
o_orderkey,
o_custkey,
o_orderstatus,
o_orderdate,
sum(o_shippriority) as col1
from
orders
group by
o_orderkey,
o_custkey,
o_orderstatus,
o_orderdate
) as t1
left join lineitem on lineitem.l_orderkey = t1.o_orderkey
group by
t1.o_orderdate,
t1.o_orderkey,
t1.col1
This PR makes the following changes to the connection pool of JDBC Catalog:
1. Set the maximum connection survival time; the default is 30 minutes.
- One-half of the maximum survival time is the recyclable time.
- One-tenth is the check interval for recycling connections.
2. Keepalive only takes effect on the connection pool on BE, and is activated at one-fifth of the maximum survival time.
3. The maximum number of connections is changed from 100 to 10.
4. Add a connection cache recycling thread on BE, with a parameter to control the recycling time; the default is 28800 seconds (8 hours).
5. Add CatalogID to the key of the connection pool cache for better isolation. A catalog refresh is required for this to take effect.
6. Upgrade the druid connection pool to version 1.2.20.
7. Set JdbcResource default parameters when upgrading the FE version, to avoid errors caused by unset parameters.
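To make the ratios in items 1 and 2 concrete, here is a minimal sketch that derives the other intervals from the maximum survival time. The class and field names are hypothetical and do not correspond to actual Doris or Druid configuration keys.

```java
import java.time.Duration;

final class JdbcPoolTimings {
    final Duration maxLifetime;
    final Duration recyclableTime;    // one-half of the maximum survival time
    final Duration checkInterval;     // one-tenth of the maximum survival time
    final Duration keepAliveInterval; // one-fifth of the maximum survival time

    private JdbcPoolTimings(Duration maxLifetime) {
        this.maxLifetime = maxLifetime;
        this.recyclableTime = maxLifetime.dividedBy(2);
        this.checkInterval = maxLifetime.dividedBy(10);
        this.keepAliveInterval = maxLifetime.dividedBy(5);
    }

    /** Hypothetical factory: derives all intervals from the maximum survival time. */
    static JdbcPoolTimings fromMaxLifetime(Duration maxLifetime) {
        return new JdbcPoolTimings(maxLifetime);
    }
}
```

With the default of 30 minutes, this gives a recyclable time of 15 minutes, a check interval of 3 minutes, and a keepalive interval of 6 minutes.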
If there are too many backup/restore jobs, it may cause OOM. This PR allows the user to skip all backup/restore jobs if max_backup_restore_job_num_per_db is set to 0.
* add left anti join ut
* forbid getting the partition column on self join
* [Fix](nereids) Disable getting partition related table and column when self join
* fix code style
Issue Number: #30484
Objects stored in a PriorityQueue must implement the Comparable interface, or a custom `Comparator` must be passed to the queue.
Otherwise, running the program in a JDK17 environment reports an exception:
```java
Caused by: java.lang.AssertionError: Expect exception msg contains 'query wait timeout', but meet
'java.sql.SQLException: ClassCastException,
msg: class org.apache.doris.resource.workloadgroup.QueueToken cannot be cast to class java.lang.Comparable
(org.apache.doris.resource.workloadgroup.QueueToken is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')'
```
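A minimal sketch of the `Comparator` option is shown below. `Token` is a hypothetical stand-in for `QueueToken`, and ordering by an `id` field is an assumption made only for illustration; the actual ordering used in the fix may differ.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

final class QueueTokenSketch {
    /** Hypothetical stand-in for QueueToken; it does not implement Comparable. */
    static final class Token {
        final long id;

        Token(long id) {
            this.id = id;
        }
    }

    /** Builds a queue that orders tokens by id through an explicit Comparator. */
    static PriorityQueue<Token> newQueue() {
        return new PriorityQueue<>(Comparator.comparingLong((Token t) -> t.id));
    }
}
```

Because the ordering is supplied to the queue, `Token` never has to be cast to `Comparable`, so the `ClassCastException` above cannot arise from insertion.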
When a partition in an OlapTable is removed, the partition id should be used to delete the related stats records in column_statistics. Previously, id was used, which could delete useful stats of other partitions.
1. Skip parquet files that are only 4 bytes long and contain just the magic: PAR1
2. Refactor the schema init method of iceberg/hudi/hive table in hms catalog
   1. Remove some redundant methods of `getIcebergTable`
   2. Fix issue described in #23771
3. Support HoodieParquetInputFormatBase, treat it as normal hive table format
4. When listing files, skip all hidden dirs and files
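Items 1 and 4 can be pictured as two small filters applied while listing files. This is a sketch under stated assumptions, not the code in this PR: the class and method names are hypothetical, and treating any path component that starts with `.` or `_` as hidden is an assumption borrowed from common Hadoop/Hive conventions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FileListingFilters {
    /** The 4-byte parquet magic. */
    static final byte[] PARQUET_MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** True when the whole file content is just the 4-byte magic, i.e. it holds no data. */
    static boolean isMagicOnlyParquet(byte[] content) {
        return Arrays.equals(content, PARQUET_MAGIC);
    }

    /**
     * Assumption: a path is hidden if any of its components starts with '.' or '_'.
     */
    static boolean isHidden(String path) {
        for (String part : path.split("/")) {
            if (part.startsWith(".") || part.startsWith("_")) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, `tbl/.hive-staging/part-0` and `tbl/_SUCCESS` would be skipped, while `tbl/part-0.parquet` would be kept.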