Skip null partitions when collecting the base tablets for each BE (used
later to deduplicate the updated row count in MVs). A null partition here
may cause the publish to fail.
cherry pick master #35475
Previously, FE logs were written to files. The main FE logs include
fe.log, fe.warn.log, fe.audit.log, fe.out, and fe.gc.log.
In a K8s deployment environment, logs usually need to be output to
standard output, and then other components process the log stream.
This PR made the following changes:
1. Modified the log4j configuration template
- When started with `--daemon`, logs are still written to various files,
and the format remains unchanged.
- When started with `--console`, all logs are output to standard output
and marked with different prefixes:
- `StdoutLogger`: logs for standard output
- `StderrLogger`: logs for standard error output
- `RuntimeLogger`: logs for fe.log or fe.warn.log
- `AuditLogger`: logs for fe.audit.log
- No prefix: logs for fe.gc.log
Examples are as follows:
```
RuntimeLogger 2024-06-03 14:54:51,229 INFO (binlog-gcer|62)
[BinlogManager.gc():359] begin gc binlog
```
2. Added a new FE config: `enable_file_logger`
Defaults to true. Indicates that logs will be recorded to files
regardless of the startup method. For example, if it is started with
`--console`, the log will be output to both the file and the standard
output. If it is `false`, the log will not be recorded in the file
regardless of the startup method.
3. Optimized the log format of standard output
The byte streams of stdout and stderr are captured. Logs previously
printed through `System.out` are now captured in fe.log for unified
management.
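The capture of stdout described in item 3 can be pictured with a small sketch. This is an illustration only, not the code from this PR: `LineForwardingStream` and the `sink` callback are hypothetical names, and the real change may hook the streams differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical helper: buffers bytes written to it and hands every complete
// line to a callback (for example, a logger that writes to fe.log).
class LineForwardingStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final Consumer<String> sink;

    LineForwardingStream(Consumer<String> sink) {
        this.sink = sink;
    }

    @Override
    public synchronized void write(int b) {
        if (b == '\n') {
            // End of line: forward the buffered text and start a new line.
            sink.accept(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
            buffer.reset();
        } else if (b != '\r') {
            // Drop carriage returns so Windows line endings behave the same.
            buffer.write(b);
        }
    }
}
```

Under these assumptions, something like `System.setOut(new PrintStream(new LineForwardingStream(line -> logger.info(line)), true))` would route `System.out` output through a logger, where `logger` stands for whatever logging handle is available.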
cherry-pick #34313 to branch-2.1
MergePercentileToArray rewrites multiple `percentile` calls on the same
column into a single `percentile_array` call, for example:
select ss_item_sk, percentile(ss_quantity,0.9), percentile(ss_quantity,0.6), percentile(ss_quantity,0.3)
from store_sales group by ss_item_sk;
==>
select ss_item_sk, percentile_array(ss_quantity,[0.3,0.6,0.9]) from store_sales group by ss_item_sk;
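The grouping idea behind the rewrite can be sketched in a few lines of Java. This is not the Nereids rule itself: `PercentileMergeSketch` is a hypothetical class, each `percentile` call is modeled as a plain `{column, level}` pair, and sorting the levels is an assumption taken from the example output above. The real rule also has to map each original `percentile` back to an element of the array, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class PercentileMergeSketch {
    // Each call is {column, level}, e.g. {"ss_quantity", "0.9"}.
    static List<String> merge(List<String[]> calls) {
        // Group the requested levels by column; TreeSet sorts and deduplicates.
        Map<String, TreeSet<Double>> levelsByColumn = new LinkedHashMap<>();
        for (String[] call : calls) {
            levelsByColumn
                    .computeIfAbsent(call[0], k -> new TreeSet<>())
                    .add(Double.parseDouble(call[1]));
        }
        // Emit one percentile_array call per column.
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Double>> e : levelsByColumn.entrySet()) {
            String levels = e.getValue().toString().replace(" ", "");
            merged.add("percentile_array(" + e.getKey() + "," + levels + ")");
        }
        return merged;
    }
}
```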
Support the ipv4/ipv6 data types with inverted indexes, so that conjunct
expressions on IP columns such as `>`, `<`, `>=`, `<=`, `in`, and `not in`
can be accelerated by the inverted index.
1. computeSignature should call super#computeSignature first.
2. The return type of constant folding was not updated after the
signature change in #26827.
We already have a p0 case for this, but the regression framework has a
bug: it reports success when comparing decimal values even if the actual
result loses scale.
backport: #35690
`PropertyConverter.setS3FsAccess` adds customized S3 credentials providers:
```
public static final List<String> AWS_CREDENTIALS_PROVIDERS = Arrays.asList(
        DataLakeAWSCredentialsProvider.class.getName(),
        TemporaryAWSCredentialsProvider.class.getName(),
        SimpleAWSCredentialsProvider.class.getName(),
        EnvironmentVariableCredentialsProvider.class.getName(),
        IAMInstanceCredentialsProvider.class.getName());
```
These providers are set as the value of
`fs.s3a.aws.credentials.provider`, which is used to build the S3 reader
in JNI readers. However, `DataLakeAWSCredentialsProvider` lives in
`fe-core`, which JNI readers do not depend on, so the S3 providers have
to be moved to `fe-common`.
## Proposed changes
When an unknown session variable is set, the error message suggests
similar variable names, e.g.:
mysql [(none)]>set enable_profileXXXXXXX=true;
ERROR 1228 (HY000): errCode = 2, detailMessage = Unknown system variable
'enable_profileXXXXXXX', the similar variables are {'enable_profile',
'enable_force_spill', 'enable_projection'}
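One way such suggestions could be produced is to rank the known variable names by edit distance to the unknown one. The sketch below does exactly that; it is an assumption for illustration, since the description does not say which similarity measure the PR uses, and `SimilarVariableSketch`, `distance`, and `suggest` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

class SimilarVariableSketch {
    // Classic Levenshtein edit distance, computed row by row.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // Return up to `limit` known names, closest to `unknown` first.
    static List<String> suggest(String unknown, Collection<String> known, int limit) {
        List<String> sorted = new ArrayList<>(known);
        sorted.sort(Comparator.comparingInt((String name) -> distance(unknown, name)));
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }
}
```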
Cherry-pick #35636.
The ccr-syncer does not support syncing temporary partitions, so this PR
adds a field to record whether this upsert record comes from a temporary
partition.
## Proposed changes
This pull request updates the function signatures where VarcharLiteral
is currently used, replacing it with StringLikeLiteral. This change aims
to enhance flexibility and consistency across functions that handle
similar types of string data. By adopting StringLikeLiteral, we can
support a broader range of string-like types beyond the basic VARCHAR
type, facilitating more robust and versatile string handling
capabilities in our codebase. This update ensures better type
abstraction and promotes code reusability.
## Proposed changes
Issue Number: close #31442
(Fix) [hive-writer] Fixed a failure when writing to S3 if partition
values contain spaces.
### Error msg
```
org.apache.doris.common.UserException: errCode = 2, detailMessage = java.net.URISyntaxException: Illegal character in path at index 114: oss://xxxxxxxxxxx/hive/tpcds1000_partition_oss/call_center/cc_call_center_sk=1/cc_mkt_class=A bit narrow forms matter animals. Consist/cc_market_manager=Daniel Weller/cc_rec_end_date=2001-12-31/f6b5ff4253414b06-9fd365ef68e5ddc5_133f02fb-a7e0-4109-9100-fb748a28259e-0.zlib.orc
at org.apache.doris.common.util.S3URI.validateUri(S3URI.java:134) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.S3URI.parseUri(S3URI.java:120) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.S3URI.<init>(S3URI.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.S3URI.create(S3URI.java:108) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.fs.obj.S3ObjStorage.deleteObject(S3ObjStorage.java:194) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.fs.remote.ObjFileSystem.delete(ObjFileSystem.java:150) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.fs.remote.SwitchingFileSystem.delete(SwitchingFileSystem.java:92) ~[doris-fe.jar:1.2-
```
### Root Cause
Hadoop encodes some special characters in partition names but leaves
spaces as they are, which differs from URI encoding. Constructing a URI
directly from such a path therefore fails.
### Solution
The solution is to parse the location with a regular expression and then
pass each part to a multi-argument URI constructor, which encodes each
part of the URI.
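A minimal sketch of this approach follows, assuming a simple `scheme://authority/path` layout. The class name `UriEncodingSketch` and the regular expression are hypothetical and not taken from the `S3URI` code; only the `java.net.URI` constructors are standard.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UriEncodingSketch {
    // Hypothetical pattern: scheme, authority (bucket), and an optional path.
    private static final Pattern LOCATION =
            Pattern.compile("^([a-zA-Z][a-zA-Z0-9+.-]*)://([^/]+)(/.*)?$");

    static URI toUri(String location) {
        Matcher m = LOCATION.matcher(location);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported location: " + location);
        }
        String path = m.group(3) == null ? "/" : m.group(3);
        try {
            // The multi-argument constructor quotes illegal characters
            // (such as spaces) in each component instead of rejecting them.
            return new URI(m.group(1), m.group(2), path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With this sketch, `toUri("oss://bucket/hive/cc_market_manager=Daniel Weller/f.orc")` succeeds, while `new URI(...)` on the same string throws `URISyntaxException`. One caveat from the `java.net.URI` documentation: the multi-argument constructors always quote `%`, so a path that already contains percent-escapes would be escaped a second time.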
## Proposed changes
Issue #31442
1. The unit of the seventh parameter of `ZonedDateTime.of` is
nanosecond, so we should multiply the microsecond by 1000.
2. When writing to a non-partitioned iceberg table, the data path has an
extra slash
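The unit conversion in the first item can be illustrated as follows. `MicrosToNanosSketch` is a hypothetical wrapper, not the code changed by this PR; `ZonedDateTime.of` itself is the standard `java.time` factory whose seventh argument is `nanoOfSecond`.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;

class MicrosToNanosSketch {
    // Builds a ZonedDateTime from a microsecond fraction (0..999999).
    static ZonedDateTime of(int year, int month, int day, int hour, int minute,
                            int second, int microsecond, ZoneId zone) {
        // ZonedDateTime.of expects nanoseconds, so scale microseconds by 1000.
        return ZonedDateTime.of(year, month, day, hour, minute, second,
                microsecond * 1000, zone);
    }
}
```

Passing the microsecond value unscaled would silently produce a timestamp whose fractional part is 1000 times too small.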
Improve performance in two ways: optimize the decideMatchMode method and
reuse the MV struct info:
1. Use java.util.Set#containsAll instead of java.util.List#containsAll
in AbstractMaterializedViewRule#decideMatchMode.
2. Reuse the MV struct info across queries, because the MV struct info
is immutable.
Notes: tableBitSet in struct info depends on the statementContext
in cascadesContext, so when the MV struct info is reused for a different
query, the table bitset must be regenerated and a new struct info
constructed with StructInfo#withTableBitSet.
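The first optimization can be sketched in isolation. `ContainsAllSketch` is a hypothetical helper, not the actual decideMatchMode code: for an `ArrayList`, `containsAll` scans the list once per candidate, while a `HashSet` answers each lookup in expected constant time.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ContainsAllSketch {
    // Checks whether every element of `candidates` appears in `container`.
    static <T> boolean containsAllViaSet(List<T> container, List<T> candidates) {
        // Build the hash set once, then do one hash lookup per candidate.
        Set<T> lookup = new HashSet<>(container);
        return lookup.containsAll(candidates);
    }
}
```

The gain grows with the size of both collections; for very small lists the cost of building the set may outweigh it.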
pick from master #35112
Functions supported by Doris must be configured through Custom;
otherwise an exception is thrown: Can not found function 'xxx'
pick from master #34548
The modification involving CloudGlobalTransactionMgr was not picked up
to 2.1 because the 2.1 branch does not yet have the Thunderbolt
CloudGlobalTransactionMgr
pick from master #35586
This is a temporary solution. To avoid affecting the existing backup
function, backing up MTMVs will be allowed only after a detailed design.
## Proposed changes
Change the `use_cnt` mechanism for incremental (auto partition) channels
and streams; it is now counted dynamically.
Use `close_wait()` of regular partitions as a synchronization point to
make sure all sinks are in the close phase before closing any incremental
(auto partition) channels and streams.
Add a dummy (fake) partition and tablet if there is no regular partition
in the auto partition table.
Backport #35287
Co-authored-by: zhaochangle <zhaochangle@selectdb.com>