Previously, there are 2 members: TableScanParams and IncrementalRelation
in HMSExternalTable.
These 2 members are for Hudi's incremental query, so their lifecycle
should be with query task,
should not be saved in HMSExternalTable.
This PR mainly changes:
- Add LogicalHudiScan and PhysicalHudiScan, extends from LogicalFileScan
and PhysicalFileScan.
- Move TableScanParams and IncrementalRelation from HMSExternalTable to
XXXHudiScan.
- Add or modify related Nereids rules
Also move the analysis exception of "Not support insert with partition
spec in hive catalog."
from create sink phase to bind sink phase.
So that when `set enable_fallback_to_original_planner=false;`, the
return error will be correct.
skip null partition when get base tablets for each be (for further usage
in dedup updated row count in MV) This may cause publish fail
cherry pick master #35475
Previously, FE logs were written to files. The main FE logs include
fe.log, fe.warn.log, fe.audit.log, fe.out, and fe.gc.log.
In a K8s deployment environment, logs usually need to be output to
standard output, and then other components process the log stream.
This PR made the following changes:
1. Modified the log4j configuration template
- When started with `--daemon`, logs are still written to various files,
and the format remains unchanged.
- When started with `--console`, all logs are output to standard output
and marked with different prefixes:
- `StdoutLogger`: logs for standard output
- `StderrLogger`: logs for standard error output
- `RuntimeLogger`: logs for fe.log or fe.warn.log
- `AuditLogger:` logs for fe.audit.log
- No prefix: logs for fe.gc.log
Examples are as follows:
```
RuntimeLogger 2024-06-03 14:54:51,229 INFO (binlog-gcer|62)
[BinlogManager.gc():359] begin gc binlog
```
2. Added a new FE config: `enable_file_logger`
Defaults to true. Indicates that logs will be recorded to files
regardless of the startup method. For example, if it is started with
`--console`, the log will be output to both the file and the standard
output. If it is `false`, the log will not be recorded in the file
regardless of the startup method.
3. Optimized the log format of standard output
The byte streams of stdout and stderr are captured. The logs previously
outputted using `System.out` will be captured in fe.log for unified
management.
cherry-pick #34313 to branch-2.1
MergePercentileToArray is to perform a transformation in this case:
select ss_item_sk, percentile(ss_quantity,0.9), percentile(ss_quantity,0.6), percentile(ss_quantity,0.3)
from store_sales group by ss_item_sk;
==>
select ss_item_sk, percentile_array(ss_quantity,[0.3,0.6,0.9]) from store_sales group by ss_item_sk;
support data type ipv4/ipv6 with inverted index
and then we can query like "> or < or >= or <= or in/not in " this
conjuncts expr for ip with inverted index speeding up
1. compute signature should call super#computeSignature first
2. fold const return type not changed after signature changed in #26827
we already have p0 for this case, but our regression framework has bug
that it report success when compare decimal type if real result lose
scale
backport: #35690
`PropertyConverter.setS3FsAccess` has add customized s3 providers:
```
public static final List<String> AWS_CREDENTIALS_PROVIDERS = Arrays.asList(
DataLakeAWSCredentialsProvider.class.getName(),
TemporaryAWSCredentialsProvider.class.getName(),
SimpleAWSCredentialsProvider.class.getName(),
EnvironmentVariableCredentialsProvider.class.getName(),
IAMInstanceCredentialsProvider.class.getName());
```
And these providers are set as configuration value of
`fs.s3a.aws.credentials.provider`, which will be used as configuration
to build s3 reader in JNI readers. However,
`DataLakeAWSCredentialsProvider` is in `fe-core`, that is not dependent
by JNI readers, so we have to move s3 providers to `fe-common'.