This pull request changes the type of `index_id` in inverted index
storage format v2 to `int64_t`. The `index_id` is now stored in the
inverted index file using 8 bytes.
## Proposed changes
This PR enables `delete sub predicate v2` for compaction; the legacy
version of the delete predicate is still processed in the original way.
Add logs for partial update.
The master PR is #35802.
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
Support the ipv4/ipv6 data types with inverted index, so that conjunct
expressions on IP columns such as `>`, `<`, `>=`, `<=`, `in`, and
`not in` can be sped up by the inverted index.
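A minimal sketch of why such range predicates suit an index: an IPv4 address maps to a 32-bit unsigned integer whose numeric order matches the address order, so `>`, `<`, and range checks reduce to integer comparisons. The helpers `parse_ipv4` and `ipv4_between` are hypothetical names for illustration, not Doris APIs, and IPv6 (128-bit) is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Doris API): parse a dotted-quad IPv4 string
// into its 32-bit value, most significant octet first.
// Returns false on malformed input and leaves *out untouched.
bool parse_ipv4(const std::string& s, uint32_t* out) {
    uint32_t value = 0;  // accumulated address
    uint32_t cur = 0;    // value of the octet being read
    int octets = 0;      // completed octets so far
    int digits = 0;      // digits seen in the current octet
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (i == s.size() || s[i] == '.') {
            // An octet ends here; it must be non-empty and at most the 4th.
            if (digits == 0 || octets == 4) return false;
            value = (value << 8) | cur;
            ++octets;
            cur = 0;
            digits = 0;
        } else if (s[i] >= '0' && s[i] <= '9') {
            cur = cur * 10 + static_cast<uint32_t>(s[i] - '0');
            if (++digits > 3 || cur > 255) return false;
        } else {
            return false;
        }
    }
    if (octets != 4) return false;
    *out = value;
    return true;
}

// True when lo <= addr <= hi in address order. All inputs must be valid.
bool ipv4_between(const std::string& addr, const std::string& lo,
                  const std::string& hi) {
    uint32_t a = 0, l = 0, h = 0;
    bool ok = parse_ipv4(addr, &a) && parse_ipv4(lo, &l) && parse_ipv4(hi, &h);
    assert(ok);
    (void)ok;
    return l <= a && a <= h;
}
```

For example, `ipv4_between("10.0.0.5", "10.0.0.1", "10.0.0.255")` holds, while `"10.0.1.0"` falls outside the same range.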
## Proposed changes
1. Return an error when the bloom filter fails to allocate memory.
2. Return an error when deserializing a block fails, since it may need a lot of memory.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
## Proposed changes
Issue Number: close #xxx
cherry-pick #31268
## Proposed changes
Display the load progress info so that the user can tell which step the
load has reached.
```
JobId: 49088
Label: rpt_10002184_syqzzywqkb10
State: FINISHED
Progress: 100.00% (10/10)
```
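The `Progress` line above can be produced by a small formatter. This is a sketch only: `format_progress` is a hypothetical helper, and it assumes the two numbers in parentheses are finished and total units of work, which the output suggests but the description does not state.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: render "<percent>% (<finished>/<total>)" with the
// percentage shown to two decimal places, matching the sample output.
std::string format_progress(int finished, int total) {
    assert(total > 0 && finished >= 0 && finished <= total);
    double percent = 100.0 * finished / total;
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.2f%% (%d/%d)", percent, finished, total);
    return std::string(buf);
}
```

With this sketch, `format_progress(10, 10)` yields `100.00% (10/10)`.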
<!--Describe your changes.-->
## Proposed changes
Issue #31442
1. The unit of the seventh parameter of `ZonedDateTime.of` is
nanoseconds, so we should multiply the microsecond value by 1000.
2. When writing to a non-partitioned iceberg table, the data path has an
extra slash.
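Both fixes are small: a unit conversion and a path join that tolerates an empty partition segment. The sketch below illustrates the idea in C++ only; the actual change calls the Java `ZonedDateTime.of` API, and `micros_to_nanos`, `append_segment`, and `data_path` are hypothetical helpers. It assumes segments do not themselves begin with `/`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Microseconds to nanoseconds: one microsecond is 1000 nanoseconds.
int64_t micros_to_nanos(int64_t micros) {
    return micros * 1000;
}

// Append one path segment, inserting a single '/' only when needed.
// An empty segment (e.g. no partition directory) adds nothing.
void append_segment(std::string& path, const std::string& seg) {
    if (seg.empty()) return;
    if (!path.empty() && path.back() != '/') path += '/';
    path += seg;
}

// Hypothetical data path builder: base location, optional partition
// directory, then the file name.
std::string data_path(const std::string& base, const std::string& partition,
                      const std::string& file) {
    std::string out = base;
    append_segment(out, partition);
    append_segment(out, file);
    return out;
}
```

With an empty partition, `data_path("s3://b/tbl/data", "", "f.parquet")` gives `s3://b/tbl/data/f.parquet`, with no doubled slash.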
Follow-up for #35466.
We should ensure that closed tasks do not block other tasks.
## Proposed changes
Change the `use_cnt` mechanism for incremental (auto partition) channels
and streams; it is now counted dynamically.
Use `close_wait()` of regular partitions as a synchronization point to
make sure all sinks are in the close phase before closing any incremental
(auto partition) channels and streams.
Add dummy (fake) partition and tablet if there is no regular partition
in the auto partition table.
Backport #35287
Co-authored-by: zhaochangle <zhaochangle@selectdb.com>
If there are duplicated expressions in the select list, the result will
be incorrect.
## Proposed changes
Issue Number: close #28438
`float` and `double` columns are not allowed to build an inverted index.
We remove them in `inverted_index_writer` to stay consistent with FE
and to avoid unnecessary exceptions.
Co-authored-by: Luennng <luennng@gmail.com>
ubsan hints:
```c++
/root/doris/be/src/olap/hll.h:93:29: runtime error: load of value 3078029312, which is not a valid value for type 'HllDataType'
/root/doris/be/src/olap/hll.h:94:23: runtime error: load of value 3078029312, which is not a valid value for type 'HllDataType'
/root/doris/be/src/runtime/descriptors.h:439:38: runtime error: load of value 118, which is not a valid value for type 'bool'
/root/doris/be/src/vec/exec/vjdbc_connector.cpp:61:50: runtime error: load of value 35, which is not a valid value for type 'bool'
```
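Reports of this shape mean a `bool` or enum object was read while holding a bit pattern that is not one of its valid values; reading a member that was never initialized is one common cause. The sketch below assumes that cause. The type and member names are illustrative and are not the actual Doris definitions.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative enum; NOT the real HllDataType definition.
enum class SketchKind : uint8_t { EMPTY = 0, EXPLICIT = 1, SPARSE = 2, FULL = 3 };

// Default member initializers guarantee that every instance starts with
// valid values, so a later read of `kind` or `nullable` is well defined.
struct SketchState {
    SketchKind kind = SketchKind::EMPTY;
    bool nullable = false;
};

// Reads both members; safe because they are always initialized.
inline bool is_empty_non_nullable(const SketchState& s) {
    return s.kind == SketchKind::EMPTY && !s.nullable;
}
```

A default-constructed `SketchState` therefore satisfies `is_empty_non_nullable` without any explicit setup.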
If a pipeline task is cancelled by another thread while it is executing
`extract_dependencies`, the dependencies are read and written by
different threads at the same time, which can lead to serious errors.
- Add 2 new BE configs
- `s3_read_base_wait_time_ms` and `s3_read_max_wait_time_ms`
When an S3 429 error is returned, the "get" request
sleeps `s3_read_base_wait_time_ms (*1, *2, *3, *4)` ms and then tries again.
The max sleep time is `s3_read_max_wait_time_ms`
and the max number of retries is `max_s3_client_retry`.
- Add more metrics for s3 file reader
- `s3_file_reader_too_many_request`: counter of 429 error.
- `s3_file_reader_s3_get_request`: the QPS of s3 get request.
- `TotalGetRequest`: Get request counter in profile
- `TooManyRequestErr`: 429 error counter in profile
- `TooManyRequestSleepTime`: Sum of sleep time after 429 error in profile
- `TotalBytesRead`: Total bytes read from s3 in profile
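The retry schedule described for the two new configs can be sketched as follows. This is one reading of the description, not the actual reader code: it assumes the k-th retry waits `base * k` ms, that `s3_read_max_wait_time_ms` caps each individual sleep, and that `max_s3_client_retry` bounds the number of retries. The function name and parameter values are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed schedule: retry k (k = 1, 2, ...) sleeps base_ms * k, capped at
// max_ms per sleep, and at most max_retry retries are attempted.
// Returns the sleep time in milliseconds before each retry.
std::vector<int> s3_retry_waits_ms(int base_ms, int max_ms, int max_retry) {
    assert(base_ms >= 0 && max_ms >= 0 && max_retry >= 0);
    std::vector<int> waits;
    for (int k = 1; k <= max_retry; ++k) {
        waits.push_back(std::min(base_ms * k, max_ms));
    }
    return waits;
}
```

Under these assumptions, a base of 100 ms, a cap of 350 ms, and 5 retries give sleeps of 100, 200, 300, 350, and 350 ms; their sum is what a counter like `TooManyRequestSleepTime` would accumulate.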
do not process rf on HashJoinBuildSinkLocalState::close when query
```cpp
*** Query id: ee97f0c64a76436b-babc251c7d6702fb ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1716780426 (unix time) try "date -d @1716780426" if you are using GNU date ***
*** Current BE git commitID: 813074b ***
*** SIGSEGV address not mapped to object (@0x0) received by PID 12924 (TID 15847 OR 0x7efbe5aa5700) from PID 0; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:421
1# PosixSignals::chained_handler(int, siginfo_t*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
3# 0x00007F064FF1C090 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::BloomFilterFuncBase::merge(doris::BloomFilterFuncBase*) at /root/doris/be/src/exprs/bloom_filter_func.h:169
5# doris::RuntimePredicateWrapper::merge(doris::RuntimePredicateWrapper const*) at /root/doris/be/src/exprs/runtime_filter.cpp:507
6# doris::IRuntimeFilter::merge_from(doris::RuntimePredicateWrapper const*) at /root/doris/be/src/exprs/runtime_filter.cpp:1497
7# doris::IRuntimeFilter::publish(bool)::$_2::operator()() const in /home/work/unlimit_teamcity/TeamCity/Agents/20240527104837agent_172.16.0.93_1/work/60183217f6ee2a9c/output/be/lib/doris_be
8# doris::IRuntimeFilter::publish(bool) at /root/doris/be/src/exprs/runtime_filter.cpp:1015
9# doris::VRuntimeFilterSlots::publish(bool) at /root/doris/be/src/exprs/runtime_filter_slots.h:137
10# doris::pipeline::HashJoinBuildSinkLocalState::close(doris::RuntimeState*, doris::Status) in /home/work/unlimit_teamcity/TeamCity/Agents/20240527104837agent_172.16.0.93_1/work/60183217f6ee2a9c/output/be/lib/doris_be
11# doris::pipeline::DataSinkOperatorXBase::close(doris::RuntimeState*, doris::Status) at /root/doris/be/src/pipeline/exec/operator.h:491
12# doris::pipeline::PipelineTask::close(doris::Status) at /root/doris/be/src/pipeline/pipeline_task.cpp:436
13# doris::pipeline::_close_task(doris::pipeline::PipelineTask*, doris::Status) at /root/doris/be/src/pipeline/task_scheduler.cpp:88
14# doris::pipeline::TaskScheduler::_do_work(unsigned long) in /home/work/unlimit_teamcity/TeamCity/Agents/20240527104837agent_172.16.0.93_1/work/60183217f6ee2a9c/output/be/lib/doris_be
15# doris::ThreadPool::dispatch_thread() at /root/doris/be/src/util/threadpool.cpp:551
16# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:499
17# start_thread at /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:478
18# __clone at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
```