This pull request changes the type of index_id in inverted index storage
format v2 to int64_t. The index_id is now stored in the inverted index
file using 8 bytes.
## Proposed changes
This PR enables `delete sub predicate v2` for compaction; the legacy
version of the delete predicate is still processed in the original way.
Add logs for partial update.
The master PR is #35802.
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
Previously, FE logs were written to files. The main FE logs include
fe.log, fe.warn.log, fe.audit.log, fe.out, and fe.gc.log.
In a K8s deployment environment, logs usually need to be output to
standard output, and then other components process the log stream.
This PR made the following changes:
1. Modified the log4j configuration template
- When started with `--daemon`, logs are still written to various files,
and the format remains unchanged.
- When started with `--console`, all logs are output to standard output
and marked with different prefixes:
- `StdoutLogger`: logs for standard output
- `StderrLogger`: logs for standard error output
- `RuntimeLogger`: logs for fe.log or fe.warn.log
- `AuditLogger`: logs for fe.audit.log
- No prefix: logs for fe.gc.log
Examples are as follows:
```
RuntimeLogger 2024-06-03 14:54:51,229 INFO (binlog-gcer|62)
[BinlogManager.gc():359] begin gc binlog
```
2. Added a new FE config: `enable_file_logger`
Defaults to `true`, which means logs are written to files regardless of
the startup method. For example, when FE is started with `--console`,
logs go to both the files and standard output. If it is `false`, logs
are not written to files regardless of the startup method.
3. Optimized the log format of standard output
The byte streams of stdout and stderr are captured. Logs that were
previously written with `System.out` are now captured in fe.log for
unified management.
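The capture of `System.out` described in item 3 can be pictured with the sketch below. It is only an illustration under assumptions: the class `PrefixedLineStream` and the helper `captureStdout` are hypothetical names, not the FE implementation, and the sink here is a plain list rather than a log4j logger.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: forwards every complete line written to the stream
// to a sink, tagged with a logger prefix such as "StdoutLogger".
final class PrefixedLineStream extends OutputStream {
    private final String prefix;
    private final Consumer<String> sink;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    PrefixedLineStream(String prefix, Consumer<String> sink) {
        this.prefix = prefix;
        this.sink = sink;
    }

    @Override
    public synchronized void write(int b) {
        if (b == '\n') {
            // A full line is available: emit it with the prefix.
            sink.accept(prefix + " " + buffer.toString());
            buffer.reset();
        } else if (b != '\r') {
            buffer.write(b);
        }
    }

    // Runs the action with System.out redirected, then restores it.
    // Text without a trailing newline stays in the buffer and is not emitted.
    static List<String> captureStdout(String prefix, Runnable action) {
        List<String> lines = new ArrayList<>();
        PrintStream original = System.out;
        PrintStream captured =
                new PrintStream(new PrefixedLineStream(prefix, lines::add), true);
        System.setOut(captured);
        try {
            action.run();
        } finally {
            captured.flush();
            System.setOut(original);
        }
        return lines;
    }
}
```

In a real setup the sink would presumably hand each line to the logger that writes fe.log instead of collecting it in memory.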
cherry-pick #34313 to branch-2.1
MergePercentileToArray merges several percentile calls on the same
column into a single percentile_array call, for example:
select ss_item_sk, percentile(ss_quantity,0.9), percentile(ss_quantity,0.6), percentile(ss_quantity,0.3)
from store_sales group by ss_item_sk;
==>
select ss_item_sk, percentile_array(ss_quantity,[0.3,0.6,0.9]) from store_sales group by ss_item_sk;
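The bookkeeping behind such a merge can be sketched as follows. This is an assumption-laden illustration, not the rule's code: `PercentileMergeSketch` and its methods are hypothetical, and the way the original outputs are mapped back to array positions is a guess at what a rewrite would need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper that mirrors the rewrite shown above.
final class PercentileMergeSketch {
    // Sorted, de-duplicated levels that feed one percentile_array call.
    static List<Double> mergedLevels(List<Double> levels) {
        return new ArrayList<>(new TreeSet<>(levels));
    }

    // 0-based position of each original level inside the merged array,
    // so every original percentile(...) output can be read back from it.
    static List<Integer> positions(List<Double> levels) {
        List<Double> merged = mergedLevels(levels);
        List<Integer> result = new ArrayList<>();
        for (Double level : levels) {
            result.add(merged.indexOf(level));
        }
        return result;
    }

    // Text of the merged aggregate call for one column.
    static String rewrite(String column, List<Double> levels) {
        String array = mergedLevels(levels).toString().replace(" ", "");
        return "percentile_array(" + column + "," + array + ")";
    }
}
```

For the query above, the levels 0.9, 0.6 and 0.3 merge into `[0.3,0.6,0.9]`, and the original three outputs correspond to positions 2, 1 and 0 of the array.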
Support the ipv4/ipv6 data types with inverted index, so that conjunct
expressions on IP columns using `>`, `<`, `>=`, `<=`, `in`, or `not in`
can be accelerated by the inverted index.
## Proposed changes
Issue Number: close #xxx
This reverts #35641 because it fails to compile on the ARM platform.
1. compute signature should call super#computeSignature first
2. the return type of constant folding was not updated after the
signature changed in #26827
We already have a p0 case for this, but our regression framework has a
bug: it reports success when comparing decimal values even if the actual
result loses scale.
## Proposed changes
1. Return an error when the bloom filter fails to allocate memory.
2. Return an error when deserializing a block fails; deserialization may
need a lot of memory.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
backport: #35690
`PropertyConverter.setS3FsAccess` adds customized S3 providers:
```
public static final List<String> AWS_CREDENTIALS_PROVIDERS = Arrays.asList(
DataLakeAWSCredentialsProvider.class.getName(),
TemporaryAWSCredentialsProvider.class.getName(),
SimpleAWSCredentialsProvider.class.getName(),
EnvironmentVariableCredentialsProvider.class.getName(),
IAMInstanceCredentialsProvider.class.getName());
```
These providers are set as the value of
`fs.s3a.aws.credentials.provider`, which is used as configuration to
build the S3 reader in JNI readers. However,
`DataLakeAWSCredentialsProvider` is in `fe-core`, which JNI readers do
not depend on, so we have to move the S3 providers to `fe-common`.
## Proposed changes
When an unknown session variable is set, the error message lists similar
variable names, e.g.:
mysql [(none)]>set enable_profileXXXXXXX=true;
ERROR 1228 (HY000): errCode = 2, detailMessage = Unknown system variable
'enable_profileXXXXXXX', the similar variables are {'enable_profile',
'enable_force_spill', 'enable_projection'}
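One way to produce such suggestions is to rank the known variable names by edit distance to the unknown one. The sketch below assumes Levenshtein distance and a fixed result limit; the description does not say which similarity measure the PR uses, and `SimilarVariableSketch` is a hypothetical name.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: suggest known names closest to an unknown one.
final class SimilarVariableSketch {
    // Classic Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns up to `limit` known names, closest first; ties are broken
    // alphabetically so the result is deterministic.
    static List<String> similar(String unknown, Collection<String> known, int limit) {
        return known.stream()
                .sorted(Comparator
                        .comparingInt((String name) -> editDistance(unknown, name))
                        .thenComparing(Comparator.<String>naturalOrder()))
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

A production version would probably also cap the distance so that unrelated names are never suggested.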
Cherry-pick #35636.
The ccr-syncer does not support syncing temporary partitions, so this PR
adds a field to record whether this upsert record comes from a temporary
partition.
## Proposed changes
cherry-pick #31268
## Proposed changes
This pull request updates the function signatures where VarcharLiteral
is currently used, replacing it with StringLikeLiteral. This change aims
to enhance flexibility and consistency across functions that handle
similar types of string data. By adopting StringLikeLiteral, we can
support a broader range of string-like types beyond the basic VARCHAR
type, facilitating more robust and versatile string handling
capabilities in our codebase. This update ensures better type
abstraction and promotes code reusability.
## Proposed changes
Display the load progress information so that users can see which step
the load has reached.
```
JobId: 49088
Label: rpt_10002184_syqzzywqkb10
State: FINISHED
Progress: 100.00% (10/10)
```
## Proposed changes
Issue Number: close #31442
(Fix) [hive-writer] Fixed the failure that occurs when partition values
contain spaces while writing to S3.
### Error msg
```
org.apache.doris.common.UserException: errCode = 2, detailMessage = java.net.URISyntaxException: Illegal character in path at index 114: oss://xxxxxxxxxxx/hive/tpcds1000_partition_oss/call_center/cc_call_center_sk=1/cc_mkt_class=A bit narrow forms matter animals. Consist/cc_market_manager=Daniel Weller/cc_rec_end_date=2001-12-31/f6b5ff4253414b06-9fd365ef68e5ddc5_133f02fb-a7e0-4109-9100-fb748a28259e-0.zlib.orc
at org.apache.doris.common.util.S3URI.validateUri(S3URI.java:134) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.S3URI.parseUri(S3URI.java:120) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.S3URI.<init>(S3URI.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.S3URI.create(S3URI.java:108) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.fs.obj.S3ObjStorage.deleteObject(S3ObjStorage.java:194) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.fs.remote.ObjFileSystem.delete(ObjFileSystem.java:150) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.fs.remote.SwitchingFileSystem.delete(SwitchingFileSystem.java:92) ~[doris-fe.jar:1.2-
```
### Root Cause
Hadoop partition names encode some special characters but not spaces,
which differs from URI encoding. As a result, an error is reported when
the URI is constructed.
### Solution
The solution is to parse the location with a regular expression and then
pass each part to a URI constructor that takes the parts separately.
That constructor encodes each part of the URI.
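The idea can be sketched with `java.net.URI`, whose multi-argument constructors quote illegal characters such as spaces. The regular expression and the class `UriEncodingSketch` below are assumptions made for illustration; they are not taken from `S3URI`.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: split a location into parts, then let the
// multi-argument URI constructor encode each part.
final class UriEncodingSketch {
    // scheme://authority/path (assumed shape of the location string)
    private static final Pattern LOCATION =
            Pattern.compile("^([a-zA-Z][a-zA-Z0-9+.-]*)://([^/]+)(/.*)?$");

    static URI toUri(String location) {
        Matcher m = LOCATION.matcher(location);
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized location: " + location);
        }
        String path = m.group(3) == null ? "" : m.group(3);
        try {
            // Illegal characters in the path (for example spaces) are quoted here.
            return new URI(m.group(1), m.group(2), path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // True when the single-string constructor accepts the location unchanged.
    static boolean parsesAsIs(String location) {
        try {
            new URI(location);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

Note that these constructors always quote the percent character, so a path that already contains percent-escapes would be escaped a second time; whether that matters depends on how the partition names were produced.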
## Proposed changes
Issue #31442
1. The unit of the seventh parameter of `ZonedDateTime.of` is
nanosecond, so we should multiply the microsecond by 1000.
2. When writing to a non-partitioned Iceberg table, the data path has an
extra slash.
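Item 1 can be illustrated with a small sketch. `ZonedDateTime.of` takes `nanoOfSecond` as its seventh argument, so a microsecond value has to be scaled by 1000 first. The wrapper class `MicrosToNanosSketch` is hypothetical.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical wrapper that accepts microseconds and converts them to the
// nanoOfSecond value expected by ZonedDateTime.of.
final class MicrosToNanosSketch {
    static ZonedDateTime of(int year, int month, int day, int hour, int minute,
                            int second, int microsecond, ZoneId zone) {
        // microsecond is at most 999_999, so the product fits in an int.
        int nanoOfSecond = microsecond * 1000;
        return ZonedDateTime.of(year, month, day, hour, minute, second, nanoOfSecond, zone);
    }
}
```

Passing the microsecond value through unchanged would silently shrink the fractional part by a factor of 1000 instead of raising an error.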
Follow-up for #35466.
We should ensure that closed tasks do not block other tasks.
## Proposed changes
Improve performance in two ways: optimize the decideMatchMode method and
reuse the MV struct info:
1. Use java.util.Set#containsAll instead of java.util.List#containsAll
in method AbstractMaterializedViewRule#decideMatchMode.
2. Reuse the MV struct info across different queries, because the MV
struct info is immutable.
Note: tableBitSet in struct info is tied to the statementContext in
cascadesContext, so when reusing the MV struct info for a different
query, we should regenerate the table bitset and construct a new struct
info with method StructInfo#withTableBitSet.
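The first point can be sketched as below. `List#containsAll` on a typical list scans the list once per element, while a `HashSet` answers each membership check in expected constant time. The class and method names are hypothetical, and strings stand in for whatever objects decideMatchMode actually compares.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch comparing the two containment checks.
final class ContainsAllSketch {
    // Roughly O(n * m): every view table triggers a scan of the query list.
    static boolean coversWithList(List<String> queryTables, List<String> viewTables) {
        return queryTables.containsAll(viewTables);
    }

    // Roughly O(n + m): build the set once, then do hash lookups.
    static boolean coversWithSet(List<String> queryTables, List<String> viewTables) {
        Set<String> querySet = new HashSet<>(queryTables);
        return querySet.containsAll(viewTables);
    }
}
```

Both methods return the same answer; the gain only shows up when the collections are large or the check runs many times, which is assumed to be the case during materialized view matching.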