cast(A as date), where A is a string column: the min/max of the result column's stats should be calculated like this: convert A.minExpr to a date dateA, then take the double value of dateA (the max is handled the same way). An illustrative query is shown below.
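A hedged illustration of the kind of query this stat derivation affects (the table and column names are hypothetical):
```sql
-- dt_str is a string column; the min/max stats of the cast result are
-- derived by converting the string column's min/max to dates and then
-- taking their double values
select count(*) from t where cast(dt_str as date) >= '2023-01-01';
```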
add "explain memo plan select ..." to print memo from mysql client
Dump column stats for FileScanNode, used in data lake queries.
PushdownJoinOtherCondition pushes expressions in the join condition down into a project; this blocks JoinReorder, so we also need to push down the project to help JoinReorder (see the sketch below).
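For illustration, a hedged example of a join whose other-condition contains an expression (tables and columns are hypothetical):
```sql
-- the expression t1.b + 1 in the non-equi condition may be extracted into a
-- project below the join, which can block join reordering unless the project
-- is pushed down as well
select *
from t1
join t2 on t1.a = t2.a and t1.b + 1 > t2.c
join t3 on t2.a = t3.a;
```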
After mergeGroup, the children of the plan differ from those of the GroupExpr. To avoid optimizing an out-dated group, we should construct the new plan with the groupExpr's children rather than the plan's children.
Use the file system type and Conf as the key to cache remote file systems.
This avoids creating a new file system for each external table partition's location.
The time cost of fetching 100000 partitions with 1 file per partition is reduced from about 15 minutes to 22s.
Support Hudi time travel in external tables:
```
select * from hudi_table for time as of '20230712221248';
```
PR (https://github.com/apache/doris/pull/15418) supports taking either a timestamp or a version as the snapshot ID in Iceberg, but Hudi only has timestamps as snapshot IDs. Therefore, when querying a Hudi table with `for version as of`, an error will be thrown like:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = Hudi table only supports timestamp as snapshot ID
```
The supported timestamp formats in Hudi are 'yyyy-MM-dd HH:mm:ss[.SSS]', 'yyyy-MM-dd', or 'yyyyMMddHHmmss[SSS]', which is consistent with [time travel query](https://hudi.apache.org/docs/quick-start-guide#time-travel-query). Examples in each format are shown below.
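For example, the same instant expressed in each accepted format (the timestamp values are illustrative):
```sql
select * from hudi_table for time as of '2023-07-12 22:12:48';
select * from hudi_table for time as of '2023-07-12';
select * from hudi_table for time as of '20230712221248';
```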
## Partitioning Strategies
Before this PR, Hudi's partitions needed to be synchronized to Hive through the [hive-sync-tool](https://hudi.apache.org/docs/syncing_metastore/#hive-sync-tool), or by setting very complex synchronization parameters in the [spark conf](https://hudi.apache.org/docs/syncing_metastore/#sync-template). These processes are exceptionally complex and unnecessary, unless you want to query Hudi data through Hive.
In addition, partitions change across time travel, and we cannot guarantee the correctness of time travel through partition synchronization.
So this PR directly obtains partitions by reading Hudi meta information, caching and updating table partition information through the Hudi instant timestamp, and reusing Doris' partition pruning.
Problem:
When doing SHOW EXPORT, the LIMIT was executed before the sort prior to this change. So the result was not as expected, because the LIMIT always cut the results first and we could not get what we wanted.
Example:
We have export job1 and job2 with JobId1 > JobId2, and we want the job with JobId1:
show export from db order by JobId desc limit 1;
We applied LIMIT 1 first, so we would probably get job2, because JobIds are assigned from small to large.
Solution:
We must not cut the results first if there is an ORDER BY clause; instead, cut the result set after sorting.
Currently we find that MetastoreEventsProcessor cannot keep up with the event-producing rate in our cluster, so we need to merge some HMS events in every round.
Add session variable MAX_JOIN_NUMBER_BUSHY_TREE, default 5.
If the number of tables in a join cluster is less than MAX_JOIN_NUMBER_BUSHY_TREE, Nereids tries a bushy tree; otherwise it uses a zigzag tree (a usage sketch is shown below).
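A minimal usage sketch, assuming the variable can be set per session via SET (the chosen value is illustrative):
```sql
-- raise the threshold so join clusters with fewer than 8 tables try bushy trees
set max_join_number_bushy_tree = 8;
```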
Add a new rule named "OrToIn", used to convert a disjunction of multiple equalTo predicates that compare the same slot against literals into an InPredicate, so that it can be pushed down to the storage engine.
for example:
```sql
col1 = 1 or col1 = 2 or col1 = 3 and (col2 = 4)
col1 = 1 and col1 = 3 and col2 = 3 or col2 = 4
(col1 = 1 or col1 = 2) and (col2 = 3 or col2 = 4)
```
would be converted to
```sql
col1 in (1, 2) or col1 = 3 and (col2 = 4)
col1 = 1 and col1 = 3 and col2 = 3 or col2 = 4
(col1 in (1, 2) and (col2 in (3, 4)))
```
Before this PR, if a user connected to a follower and analyzed a table, the stats would not get cached in the follower FE, since the ANALYZE statement is forwarded to the master and the follower still lazily loads its cache. After this PR, once the analysis finishes on the master, the master syncs the stats to all followers and updates the followers' stats caches. A sketch of the scenario is shown below.
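A sketch of the scenario, assuming hypothetical database and table names:
```sql
-- run while connected to a follower FE; the statement is forwarded to master,
-- which after this PR syncs the resulting stats back to all followers
analyze table test_db.test_tbl;
```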
Load partition stats into column stats.
Enlarge jetty_server_max_http_header_size to avoid the "Request Header Fields Too Large" error when stream loading to FE.
1. Expand the semantics of the variable strict_mode to control the behavior of stream load: if strict_mode is true, the stream load can only update existing rows; if strict_mode is false, the stream load can insert new rows when the key is not present in the table.
2. When inserting a new row in a non-strict-mode stream load, the unmentioned columns must have default values or be nullable.
The cost estimation can be more accurate if partition statistics are available. But when we are dealing with big data, e.g. 1 TB, we cannot really import it.
So now we want to extend this by injecting partition statistics.
Syntax:
ALTER TABLE table_name MODIFY COLUMN column_name SET STATS ('stat_name' = 'stat_value', ...)
[ PARTITION (partition_name) ];
Explanation:
- table_name: the table whose statistics are modified. It can be in db_name.table_name form.
- column_name: the specified target column. It must be a column that exists in table_name. Statistics can only be modified for one column at a time.
- stat_name and stat_value: the name of the stat and its value. Multiple stats are comma separated. Statistics that can be modified include row_count, ndv, num_nulls, min_value, max_value, and data_size.
- partition_name: the target partition. It must be a partition existing in table_name. Multiple partitions are separated by commas.
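For example, a hedged sketch of injecting partition-level statistics with this syntax (the table, column, partition, and values are illustrative):
```sql
-- inject stats for column c1 of partition p202307 in table t
alter table t modify column c1 set stats (
    'row_count' = '1000000',
    'ndv'       = '50000',
    'num_nulls' = '0',
    'min_value' = '1',
    'max_value' = '1000000',
    'data_size' = '8000000'
) partition (p202307);
```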
After we forbid some cases of agg candidate plans, all local-phase aggs require DistributionSpecAny for their child, so we can enable parallel scan for them.
Add an alias name for system variables to fix the issue that the column name was the value of the system variable, like:
Before:
```
mysql> select @@character_set_client;
+--------+
| 'utf8' |
+--------+
| utf8   |
+--------+
```
After:
```
mysql> select @@character_set_client;
+------------------------+
| @@character_set_client |
+------------------------+
| utf8                   |
+------------------------+
```