21868 Commits

Author SHA1 Message Date
974263d83c [fix](join) Should not use the build block's size to resize mark_join_flags (#50993) (#51089)
Pick #50993

Introduced by #51050

The build block may already have been cleared by
`clear_column_mem_not_keep` in the build phase when the operator is
closed, so its size cannot be used to resize `mark_join_flags`.

```cpp
Status HashJoinBuildSinkLocalState::close(RuntimeState* state, Status exec_status) {
    if (_closed) {
        return Status::OK();
    }
    auto& p = _parent->cast<HashJoinBuildSinkOperatorX>();
    Defer defer {[&]() {
        if (!_should_build_hash_table) {
            return;
        }
        // The build side hash key column maybe no need output, but we need to keep the column in block
        // because it is used to compare with probe side hash key column

        if (p._should_keep_hash_key_column && _build_col_ids.size() == 1) {
            p._should_keep_column_flags[_build_col_ids[0]] = true;
        }

        if (_shared_state->build_block) {
            // release the memory of unused column in probe stage
            _shared_state->build_block->clear_column_mem_not_keep(p._should_keep_column_flags,
                                                                  p._use_shared_hash_table);
        }

        if (p._use_shared_hash_table) {
            std::unique_lock lock(p._mutex);
            p._signaled = true;
            for (auto& dep : _shared_state->sink_deps) {
                dep->set_ready();
            }
            for (auto& dep : p._finish_dependencies) {
                dep->set_ready();
            }
        }
    }};
```

```
*** Aborted at 1747343165 (unix time) try "date -d @1747343165" if you are using GNU date ***
*** Current BE git commitID: e7a3e78b97 ***
*** SIGSEGV address not mapped to object (@0x1) received by PID 7474 (TID 9641 OR 0x7f3f8c0e5640) from PID 1; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F4368F76520 in /lib/x86_64-linux-gnu/libc.so.6
 4# doris::Status doris::pipeline::ProcessHashTableProbe<7>::finish_probing > > >(doris::vectorized::MethodKeysFixed > >&, doris::vectorized::MutableBlock&, doris::vectorized::Block*, bool*, bool) at /root/doris/be/src/pipeline/exec/join/process_hash_table_probe_impl.h:738
 5# std::__detail::__variant::__gen_vtable_impl (*)(doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const::$_1&&, std::variant > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodStringNoCache > > >&, std::variant, doris::pipeline::ProcessHashTableProbe<2>, doris::pipeline::ProcessHashTableProbe<8>, doris::pipeline::ProcessHashTableProbe<1>, doris::pipeline::ProcessHashTableProbe<4>, doris::pipeline::ProcessHashTableProbe<3>, doris::pipeline::ProcessHashTableProbe<7>, doris::pipeline::ProcessHashTableProbe<9>, doris::pipeline::ProcessHashTableProbe<10>, doris::pipeline::ProcessHashTableProbe<11> >&)>, std::integer_sequence >::__visit_invoke(doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const::$_1&&, std::variant > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodStringNoCache > > >&, std::variant, doris::pipeline::ProcessHashTableProbe<2>, doris::pipeline::ProcessHashTableProbe<8>, doris::pipeline::ProcessHashTableProbe<1>, 
doris::pipeline::ProcessHashTableProbe<4>, doris::pipeline::ProcessHashTableProbe<3>, doris::pipeline::ProcessHashTableProbe<7>, doris::pipeline::ProcessHashTableProbe<9>, doris::pipeline::ProcessHashTableProbe<10>, doris::pipeline::ProcessHashTableProbe<11> >&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/variant:1013
 6# doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const at /root/doris/be/src/pipeline/exec/hashjoin_probe_operator.cpp:281
 7# doris::pipeline::StatefulOperatorX::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/pipeline/exec/operator.cpp:670
 8# doris::pipeline::OperatorXBase::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/pipeline/exec/operator.cpp:381
 9# doris::pipeline::PipelineTask::execute(bool*) in /mnt/hdd01/ci/doris-deploy-master-local/be/lib/doris_be
10# doris::pipeline::TaskScheduler::_do_work(int) at /root/doris/be/src/pipeline/task_scheduler.cpp:144
11# doris::ThreadPool::dispatch_thread() at /root/doris/be/src/util/threadpool.cpp:622
12# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:469
13# start_thread at ./nptl/pthread_create.c:442
14# 0x00007F436905A850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```
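
The failure mode described above can be illustrated in isolation. The following is a minimal sketch, not the Doris code: `ToyBlock`, `ToyJoinState`, `clear_columns_not_kept`, and `make_closed_state` are hypothetical names invented for this example, and recording the row count at build time is an assumed remedy rather than the confirmed fix. It only shows why a flag vector sized from a block whose columns may already have been released can end up too small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a build block: one vector of values per column.
struct ToyBlock {
    std::vector<std::vector<int>> columns;

    // Row count as seen through the first column (0 if there are no columns).
    std::size_t rows() const { return columns.empty() ? 0 : columns.front().size(); }

    // Mimics the idea of clear_column_mem_not_keep: free every column whose
    // keep flag is false. This is NOT the Doris implementation.
    void clear_columns_not_kept(const std::vector<bool>& keep) {
        assert(keep.size() == columns.size());
        for (std::size_t i = 0; i < columns.size(); ++i) {
            if (!keep[i]) {
                std::vector<int>().swap(columns[i]);  // release the memory
            }
        }
    }
};

// Hypothetical shared state: the row count is recorded once at build time,
// before any column memory is released (an assumption of this sketch).
struct ToyJoinState {
    ToyBlock build_block;
    std::size_t build_rows = 0;
    std::vector<std::uint8_t> mark_join_flags;

    void finish_build() { build_rows = build_block.rows(); }

    // Fragile: depends on whatever the block looks like right now.
    void resize_flags_from_block() { mark_join_flags.assign(build_block.rows(), 0); }

    // Safer: depends only on the count captured at build time.
    void resize_flags_from_recorded_rows() { mark_join_flags.assign(build_rows, 0); }
};

// Builds a two-column block with `n` rows, then releases the first column,
// as a closing sink might do.
inline ToyJoinState make_closed_state(std::size_t n) {
    ToyJoinState state;
    state.build_block.columns = {std::vector<int>(n, 1), std::vector<int>(n, 2)};
    state.finish_build();
    state.build_block.clear_columns_not_kept({false, true});
    return state;
}
```

With a four-row block whose first column has been released, `resize_flags_from_block` leaves `mark_join_flags` empty, while `resize_flags_from_recorded_rows` gives it four entries. Any later indexing by build row would read out of bounds in the first case.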

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
2025-05-21 12:11:17 +08:00
16b3e5ff18 [improve](information schema) introduce routine load job system table (#48963) (#49286)
pick #48963

Part IV of https://github.com/apache/doris/issues/48511

doc https://github.com/apache/doris-website/pull/2196

**Introduce the routine load job statistics system table:**
```
mysql> show create table information_schema.routine_load_job\G
*************************** 1. row ***************************
       Table: routine_load_job
Create Table: CREATE TABLE `routine_load_job` (
  `JOB_ID` text NULL,
  `JOB_NAME` text NULL,
  `CREATE_TIME` text NULL,
  `PAUSE_TIME` text NULL,
  `END_TIME` text NULL,
  `DB_NAME` text NULL,
  `TABLE_NAME` text NULL,
  `STATE` text NULL,
  `CURRENT_TASK_NUM` text NULL,
  `JOB_PROPERTIES` text NULL,
  `DATA_SOURCE_PROPERTIES` text NULL,
  `CUSTOM_PROPERTIES` text NULL,
  `STATISTIC` text NULL,
  `PROGRESS` text NULL,
  `LAG` text NULL,
  `REASON_OF_STATE_CHANGED` text NULL,
  `ERROR_LOG_URLS` text NULL,
  `USER_NAME` text NULL,
  `CURRENT_ABORT_TASK_NUM` int NULL,
  `IS_ABNORMAL_PAUSE` boolean NULL
) ENGINE=SCHEMA;
1 row in set (0.00 sec)
```

**Making job statistics queryable through SQL has several benefits:**

- It can be used together with the metrics added in
https://github.com/apache/doris/pull/48209 to roughly locate abnormal
jobs when a Grafana alarm fires, for example with the following SQL:

```
SELECT JOB_NAME
FROM information_schema.routine_load_job
WHERE CURRENT_ABORT_TASK_NUM > 0
   OR IS_ABNORMAL_PAUSE = TRUE;
```

- Users can run `select * from information_schema.routine_load_job`
instead of `show routine load`. `show routine load` can only look up
jobs by name, whereas a SQL query can filter on any column to locate
jobs.

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
2025-05-21 12:10:34 +08:00
5c344ea043 branch-2.1: [opt](docker) add a script flag to control load data or not #51065 (#51083)
Cherry-picked from #51065

Co-authored-by: zgxme <zhenggaoxiong@selectdb.com>
2025-05-21 12:09:07 +08:00
d63815fa57 branch-2.1: [fix](mv) refresh failed while open enable_single_replica_insert #50986 (#51021)
Cherry-picked from #50986

Co-authored-by: camby <cambyzhu@tencent.com>
2025-05-21 12:07:48 +08:00
71deeec294 [conf](fe) Print jvm ClassHistogram in fe gc log after full gc (#44010) (#51007)
* Add `-XX:+PrintClassHistogramAfterFullGC` for JAVA_OPTS
* Add `classhisto*=trace` for JAVA_OPTS_FOR_JDK_17

With these options, fe.gc.log contains output like this:
```
2024-11-15T11:49:00.316+0800: 11.346: [Class Histogram (after full gc):
 num     #instances         #bytes  class name
----------------------------------------------
   1:          7464        7053464  [B
   2:         37465        3656360  [C
   3:          7076        2909880  [Ljava.lang.Object;
   4:          4915        2306872  [I
   5:          9167        1719552  [S
   6:         16229        1168488  io.grpc.netty.shaded.io.netty.buffer.PoolSubpage
......
```

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [x] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [x] Previous test can cover this change.
        - [x] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
2025-05-21 12:06:49 +08:00
6fe7c9087e branch-2.1: [hotfix](jdbc catalog) Fix jdbcclient repeated initialization (#51038)
pick #51036
2025-05-21 12:05:30 +08:00
b1a71fab2c branch-2.1: [fix](pipelinex) fix null aware left anti join instance num #51053 (#51067)
cherry pick from #51053
2025-05-21 11:51:20 +08:00
3b60e91a0c branch-2.1:[statistics](regression)Add utf-8 encoding regression test. (#50466) (#51016)
backport: https://github.com/apache/doris/pull/50466
2025-05-19 17:30:10 +08:00
edb8a51414 branch-2.1: [fix](nereids) fix create view use null literal #49881 (#51006)
Cherry-picked from #49881

Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2025-05-19 15:11:32 +08:00
51b39d0992 [fix](join)Consider mark join when computing right_col_idx(#50720) (#50727) 2025-05-19 14:42:15 +08:00
52fa3b5784 branch-2.1: [fix](mtmv)before mtmv refresh,check column type if change (#50730)
pick: https://github.com/apache/doris/pull/49041
2025-05-19 10:30:37 +08:00
967adc3ce1 branch-2.1: [feature](restore) introduce AgentBoundedBatchTask to manage concurrent restore tasks #50740 (#50845)
Cherry-picked from #50740

---------

Co-authored-by: walter <maochuan@selectdb.com>
Co-authored-by: wubiao02 <wubiao02@meituan.com>
2025-05-17 17:23:04 +08:00
5611a3988b [Fix](JsonPath) return null when meet unknown escape sequence, example '$.name\\k' (#50930)
cherry-pick from #50859
2025-05-17 17:21:22 +08:00
127deb6d2a [Fix](Variant) fix array with predicate push down (#50969)
Cherry-pick from https://github.com/apache/doris/pull/50934
2025-05-17 17:19:47 +08:00
080bc8cbbe branch-2.1: [enhancement](compaction) generate multiple compaction tasks each round #49547 (#50991)
Cherry-picked from #49547

Co-authored-by: Luwei <luwei@selectdb.com>
2025-05-17 17:18:51 +08:00
fd378a303b branch-2.1: [Fix](external catalog) where tables in the information_schema could not be displayed #49607 (#50878)
Cherry-picked from #49607

Co-authored-by: shee <13843187+qzsee@users.noreply.github.com>
Co-authored-by: garenshi <garenshi@tencent.com>
2025-05-17 16:21:11 +08:00
35af34f1ce branch-2.1: [fix](binlog) Record rollup index info for alterJob binlog #50850, #50337 (#50874)
cherry pick from #50850, #50337

---------

Co-authored-by: Uniqueyou <wangyixuan@selectdb.com>
2025-05-17 16:19:57 +08:00
505c9af95a [fix](inverted index) fix query error (#50860) (#50909)
pick from master #50860
2025-05-17 16:19:15 +08:00
e3fbf39722 branch-2.1: [fix](jdbc catalog) fix a jdbc catalog npe #50901 (#50928)
Cherry-picked from #50901

Co-authored-by: zy-kkk <zhongyk10@gmail.com>
2025-05-17 16:16:00 +08:00
13fbc9efa6 branch-2.1: [fix](hive) fix write hive partition by Doris #50864 (#50921)
Cherry-picked from #50864

Co-authored-by: Socrates <suxiaogang223@icloud.com>
2025-05-17 16:14:23 +08:00
040cacd635 branch-2.1: [fix](cooldown) allow cooldown_ttl = 0 when altering storage policy #50830 (#50856)
Cherry-picked from #50830

Co-authored-by: Luwei <luwei@selectdb.com>
2025-05-17 09:17:33 +08:00
82d1375dc5 branch-2.1: [fix](Nereids) we should also push down expr in join's mark conjuncts #50886 (#50955)
Cherry-picked from #50886

Co-authored-by: morrySnow <zhangwenxin@selectdb.com>
2025-05-15 22:46:54 +08:00
01f70deb8b branch-2.1: [fix](nereids) fix parse date time exception #50810 (#50900)
cherry pick from #50810
2025-05-14 23:07:15 +08:00
c4d0e1e693 branch-2.1: [fix](job scheduler) specifies both startTime and immediate, it will trigger one fewer task execution #50624 (#50897)
Cherry-picked from #50624

Co-authored-by: zhangdong <zhangdong@selectdb.com>
2025-05-14 22:59:48 +08:00
33df5ba180 2.1.10-rc01 (#50917) 2025-05-14 21:26:28 +08:00
c4e2f05563 branch-2.1: [fix](ut) fix unstable FE ut case for schema change job #50694 (#50887)
Cherry-picked from #50694

Co-authored-by: airborne12 <jiangkai@selectdb.com>
2025-05-14 21:16:20 +08:00
7459764650 [regression-test](fuzzy) accelerate tests by enlarge batch_size in fuzzy (#50861) 2025-05-13 21:44:53 +08:00
2d17330ff6 [regression-test](fix) fix github_events_p2 bug (#50826) 2025-05-13 16:08:09 +08:00
a868fa1976 branch-2.1: [Fix](regression-test) fix test_export_max_file_size case #50795 (#50811)
Cherry-picked from #50795

Co-authored-by: Tiewei Fang <fangtiewei@selectdb.com>
2025-05-13 16:00:55 +08:00
068531129d branch-2.1: [fix](http) remove file before downloading #50754 (#50827)
Cherry-picked from #50754

Co-authored-by: walter <maochuan@selectdb.com>
2025-05-13 15:44:34 +08:00
c5c4b0ac10 branch-2.1: [opt](jdbc scan) Add more jdbc scan profile items #46460 (#50799)
Cherry-picked from #46460

Co-authored-by: zy-kkk <zhongyk10@gmail.com>
2025-05-13 15:43:34 +08:00
9db7a46536 branch-2.1: [bugfix](nerids) align locate function behavior with BE side #50797 (#50832)
Cherry-picked from #50797

Co-authored-by: XLPE <crykix@gmail.com>
2025-05-13 15:19:21 +08:00
550df8f4f1 [fix](case) adjust remote_fragment_exec_timeout_ms to avoid unstable … #50801 (#50807) 2025-05-12 17:20:26 +08:00
0f50cea3d8 branch-2.1: [fix](memory) Fix PODArray::add_num_element (#50785)
pick #50756
2025-05-11 19:14:25 +08:00
b662b346fd branch-2.1: [fix](test) fix regression test same table name in one database #50737 (#50779)
Cherry-picked from #50737

Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2025-05-10 12:18:35 +08:00
0efd97055d branch-2.1: [fix](jdbc catalog) Improve conjunct expression handling in JdbcScanNode #50542 (#50648)
Cherry-picked from #50542

Co-authored-by: zy-kkk <zhongyk10@gmail.com>
2025-05-10 08:38:45 +08:00
1e336ed7ea [fix](restore) avoid change the hard link files of a snapshot (#50753) 2025-05-10 08:27:46 +08:00
87eeddc52c branch-2.1: [fix](regress) fix join_condition #50719 (#50738)
Cherry-picked from #50719

Co-authored-by: yujun <yujun@selectdb.com>
2025-05-09 22:02:45 +08:00
79056d4d7a branch-2.1: [feat](hive) add catalog level partition cache property #50724 (#50762)
Cherry-picked from #50724

Co-authored-by: Mingyu Chen (Rayner) <morningman@163.com>
2025-05-09 22:01:49 +08:00
f8de821d49 branch-2.1:[fix](regression)Fix test analyze mv case. (#50701) (#50751)
backport: https://github.com/apache/doris/pull/50701
2025-05-09 17:38:40 +08:00
0002500757 branch-2.1: [Fix](inverted index) fix rename column build index bug #50056 (#50732)
pick #47562 #50056 from master

---------

Co-authored-by: qiye <luen@selectdb.com>
2025-05-09 17:13:46 +08:00
fee5d40e07 [fix](planner) fix show variable display wrong enable_nereids_planner value (#50746)
### What problem does this PR solve?

Fix show variable displaying the wrong `enable_nereids_planner` value,
a bug introduced by #49913.

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [x] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [x] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
2025-05-09 17:09:07 +08:00
04cbb2ac66 [fix](information_schema) fix backend_active_tasks table only return one backend's data (#50721) (#50722)
cherry pick from #50721
2025-05-09 15:01:22 +08:00
5923d97804 branch-2.1: [improvement](regression)Add log to print jdbc url in prepare stmt test #50711 (#50729)
Cherry-picked from #50711

Co-authored-by: James <lijibing@selectdb.com>
2025-05-09 14:37:45 +08:00
fde8d05f5d branch-2.1 [opt](nereids) catch all exceptions in StatsCalculator (#49415) (#50364) 2025-05-09 11:24:18 +08:00
9422c973af branch-2.1: [fix](binlog) Acquire migration lock before ingesting binlog #50663 (#50709)
cherry pick from #50663
2025-05-09 11:16:18 +08:00
0347e8a3c6 branch-2.1: proper planning of shadow columns for load and schema change concurrent execution (#49332) (#50710) 2025-05-09 11:15:08 +08:00
f0f0f21e5f [regression-test](case) move github_events to nonConcurrent (#50733) 2025-05-09 11:14:11 +08:00
37f3c8f0c7 [fix](nereids) fix fold constant return wrong scale of datetime type (#50142) (#50716)
cherry pick from #50142
2025-05-09 11:12:07 +08:00
bf5885a8f8 branch-2.1: [opt](Nereids) avoid generate nested alias expr when plan insert values (#50388) 2025-05-09 11:06:24 +08:00