computeColumnsFilter computes a filter over every column in the table's base schema. However, if the table is very wide (e.g., 5000 columns), this takes a long time. This PR compares the number of conjuncts with the number of columns: if there are fewer conjuncts than columns, it collects slots from the conjuncts instead of traversing all columns.
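The FE planner code here is Java; the following is only a minimal C++ sketch of the size comparison, with every type and name (Conjunct, compute_filter_slots, slot_ids) invented for illustration:

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical predicate node: each conjunct references a few slots (columns).
struct Conjunct {
    std::vector<int> slot_ids;
};

// Decide which columns need a filter. When the predicate list is shorter
// than the schema, walk the conjuncts instead of every column.
std::unordered_set<int> compute_filter_slots(const std::vector<Conjunct>& conjuncts,
                                             std::size_t num_columns) {
    std::unordered_set<int> slots;
    if (conjuncts.size() < num_columns) {
        // Wide table, few predicates: cost is O(#conjuncts), not O(#columns).
        for (const Conjunct& c : conjuncts) {
            slots.insert(c.slot_ids.begin(), c.slot_ids.end());
        }
    } else {
        // Narrow table: traversing the whole schema is cheap, keep the old path.
        for (int id = 0; id < static_cast<int>(num_columns); ++id) {
            slots.insert(id);
        }
    }
    return slots;
}
```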
1. Fix a bug where auto analyze of external tables recursively loaded the schema cache.
2. Move some functions from the StatisticsAutoAnalyzer class to TableIf, so that external tables and internal tables can implement the logic separately.
3. Disable auto analyze for external catalogs by default; it can be enabled by adding the catalog property "enable.auto.analyze"="true".
Before, `show column stats` ignored columns with errors.
With this PR, when the min or max value fails to deserialize, `show column stats` uses N/A as the min or max value and still shows the remaining stats (count, null_count, ndv, and so on).
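The stats code lives in the Java FE; this hedged C++ sketch only illustrates the fallback behavior, with both helper names (deserialize_bound, bound_or_na) invented:

```cpp
#include <optional>
#include <string>

// Hypothetical deserializer: returns std::nullopt when the stored literal
// is corrupt (a stand-in for the real min/max decoding).
std::optional<std::string> deserialize_bound(const std::string& raw) {
    if (raw.empty()) return std::nullopt;  // simulate a decode failure
    return raw;
}

// Fall back to "N/A" for a broken min/max instead of dropping the whole
// column; the remaining stats (count, null_count, ndv, ...) stay visible.
std::string bound_or_na(const std::string& raw) {
    auto v = deserialize_bound(raw);
    return v.has_value() ? *v : std::string("N/A");
}
```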
Fix for PR #23582.
Some FE code was deleted by [Improvement](pipeline) Cancel outdated query if original fe restarts #23582 and needs to be added back.
Fix the Mac build failure caused by a wrong Thrift declaration order.
1. "set" may overwrite the original ID.
2.A bloom filter may not necessarily be an IN_OR_BLOOM_FILTER.
before may be
RuntimeFilterInfo id -1: [type = BF, input = 25, filtered = 0]
now
RuntimeFilterInfo id 0: [type = BF, input = 25, filtered = 0]
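For point 1, assuming the per-filter stats are tracked in an associative container (an assumption about the bookkeeping, not a reading of the actual code), the overwrite pitfall and one way around it look like this:

```cpp
#include <cstdint>
#include <map>

// Register a runtime filter's stats under its id. Plain assignment via
// operator[] would clobber an entry that is already present; emplace()
// keeps the original entry and lets us merge instead.
void register_filter(std::map<int32_t, int64_t>& input_rows_by_id,
                     int32_t filter_id, int64_t input_rows) {
    auto [it, inserted] = input_rows_by_id.emplace(filter_id, input_rows);
    if (!inserted) {
        it->second += input_rows;  // merge, do not overwrite the original id
    }
}
```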
Current initialization dependency:
```
Daemon ─────────┬──► StorageEngine ──► ExecEnv ──► Disk/Mem/CpuInfo
                │
BackendService ─┘
```
However, the original code incorrectly initialized Daemon before StorageEngine.
This PR also stops and joins the threads of daemon services in their dtors, so that daemon services release resources in reverse order of initialization via RAII.
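A minimal sketch of the RAII pattern described, with an invented DaemonService class: the destructor stops and joins the worker thread, so simply destroying the services in reverse order of construction releases their resources.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Invented daemon service: the destructor stops and joins the worker, so
// destroying the object is all it takes to release the thread (RAII).
class DaemonService {
public:
    DaemonService() : _worker([this] { run(); }) {}

    ~DaemonService() {
        _stop.store(true);
        if (_worker.joinable()) {
            _worker.join();  // no worker thread outlives its service
        }
    }

private:
    void run() {
        while (!_stop.load()) {
            // ... periodic daemon work ...
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
    }

    std::atomic<bool> _stop{false};  // declared before _worker on purpose
    std::thread _worker;             // destroyed first, after being joined
};
```

Holding such services as members declared in initialization order then yields reverse-order teardown for free, since C++ destroys members bottom-up.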
1. When the file meta cache is disabled (by setting `max_external_file_meta_cache_num=0` in be.conf), the parquet meta info is owned by the parquet reader and released when `reader->close()` is called. But the underlying file reader of this parquet reader is released after `reader->close()`, which may cause a `heap-use-after-free` bug because parts of the meta info may still be referenced by the file reader. This PR fixes it by making sure the meta info is released after the file reader is released (see the sketch after this list).
2. Add the modification time to the file meta cache in BE, to avoid parquet read errors like `Failed to deserialize parquet page header`.
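A sketch of the ownership fix for item 1, under invented names: the file reader must be released before the meta info it may still reference, whether by explicit reset order in close() or by member declaration order.

```cpp
#include <memory>

struct FileMetaData { /* parsed parquet footer (invented stand-in) */ };
struct FileReader {
    FileMetaData* meta = nullptr;  // may point into the meta's memory
};

class ParquetReader {
public:
    void close() {
        // Release in dependency order: the file reader may still reference
        // parts of the meta info, so the reader has to go first.
        _file_reader.reset();
        _meta.reset();
    }

private:
    // Declaration order matters too: members are destroyed in reverse, so
    // _file_reader (declared last) dies before _meta even without close().
    std::unique_ptr<FileMetaData> _meta;
    std::unique_ptr<FileReader> _file_reader;
};
```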
1. add the scalar subquery's output to LogicalApply's output
2. for in and exists subqueries, add the mark join slot to LogicalApply's output
3. forbid pushing down an alias through a join if the project list contains any mark join slots
4. move the normalize aggregate rule to the analysis phase
The reason is that the SQL cache only uses partitionKey, latestVersion, and latestTime to check whether the cached result can be returned. If we delete some partition(s) that are not the latest updated partition, none of these values change, so the cache still hits.
Use a field to save the partition num of these tables, sum the partition nums, and send the sum to BE (see the sketch after this list). There are two situations that involve delete-partition ops:
- just deleting some partition(s): the sum of partition nums will be lower than before.
- deleting some partition(s) while also adding some partition(s): the latest time or latest version will be higher than before.
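The SQL cache lives in the Java FE; this hypothetical C++ struct only illustrates why the extra field closes the gap: a pure deletion now changes the key even though latestVersion and latestTime do not move. All field names are invented.

```cpp
#include <cstdint>

// Hypothetical cache-validity key. Before the fix it held only the first
// three fields, so deleting a non-latest partition changed none of them
// and the stale cache entry still matched.
struct SqlCacheKey {
    int64_t partition_key;
    int64_t latest_version;
    int64_t latest_time;
    int64_t sum_partition_num;  // new: total partitions across the tables

    bool operator==(const SqlCacheKey& o) const {
        return partition_key == o.partition_key &&
               latest_version == o.latest_version &&
               latest_time == o.latest_time &&
               sum_partition_num == o.sum_partition_num;
    }
};
```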
If resolving an inline view column fails, we try to resolve it again by removing the table name. But that is wrong if the table name (which may be the inline view's alias) is the same as the name of some table inside the inline view. So this PR checks the table name and only removes it when no table inside the inline view has the same name as the column's table name.
For example:
```sql
WITH A AS (SELECT * FROM B)
SELECT * FROM C
UNION
SELECT * FROM D
```
The scope of the CTE in Nereids is the first set operand.
The scope of the CTE in the legacy planner is the whole statement.
log error:
`W20230830 11:19:47.495721 3046231 status.h:363] meet error status: [INTERNAL_ERROR]user function's name should be function_id.checksum[.file_name].file_type, now the all split parts are by delimiter(.): 7119053928154065546.20c8228267b6c9ce620fddb39467d3eb.postgresql-42.5.0.jar`
When the JDBC driver had `.` in its name, we failed to split it properly.
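One way to make the split robust (whether the actual fix does exactly this is an assumption, and the helper name is invented): treat the first two and the last dot-separated parts as fixed, and rejoin whatever sits between them as the file name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split "function_id.checksum[.file_name].file_type" without assuming the
// file name itself is dot-free, e.g. "postgresql-42.5.0.jar".
bool parse_function_name(const std::string& s, std::string& function_id,
                         std::string& checksum, std::string& file_name,
                         std::string& file_type) {
    std::vector<std::string> parts;
    std::stringstream ss(s);
    std::string part;
    while (std::getline(ss, part, '.')) parts.push_back(part);
    if (parts.size() < 3) return false;  // need at least id.checksum.type

    function_id = parts[0];
    checksum = parts[1];
    file_type = parts.back();
    // Everything between checksum and file_type is the (optional) file
    // name; rejoin it with the dots that the split removed.
    file_name.clear();
    for (size_t i = 2; i + 1 < parts.size(); ++i) {
        if (!file_name.empty()) file_name += '.';
        file_name += parts[i];
    }
    return true;
}
```

On the name from the log above, this yields file_name `postgresql-42.5.0` and file_type `jar` instead of failing the part-count check.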