Commit Graph

18429 Commits

Author SHA1 Message Date
d9bbeca431 [improve](env) Improve catalog not ready tips (#27715) 2023-12-01 22:52:43 +08:00
b74388c3b1 [case](regression) Add backup restore test with specified partition (#27694) 2023-12-01 22:31:59 +08:00
1706699e7e [fix](multi-catalog)support the max compute partition prune (#27154)
1. max compute partition prune:
previously only '=' filters on mc partitions were supported, which can select just one partition.
To support multi-partition filters and range operators ('>', '<', '>=', ...), partition pruning is now supported.

2. add max compute row count cache and partitionValues cache

3. add max compute regression case
2023-12-01 22:28:26 +08:00
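The pruning idea described in #27154 can be sketched as below. This is a minimal illustration, not the Doris implementation: the class `PartitionPruneSketch`, the method `prune`, and the choice to compare partition values as plain strings are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionPruneSketch {
    /**
     * Keeps only the partition values that satisfy "value op literal".
     * Assumption: values compare correctly as strings (e.g. fixed-width dates).
     */
    public static List<String> prune(List<String> partitionValues, String op, String literal) {
        List<String> kept = new ArrayList<>();
        for (String value : partitionValues) {
            int cmp = value.compareTo(literal);
            boolean keep;
            switch (op) {
                case "=":  keep = cmp == 0; break;
                case ">":  keep = cmp > 0;  break;
                case ">=": keep = cmp >= 0; break;
                case "<":  keep = cmp < 0;  break;
                case "<=": keep = cmp <= 0; break;
                default:   keep = true;     break; // unknown operator: do not prune
            }
            if (keep) {
                kept.add(value);
            }
        }
        return kept;
    }
}
```

With only '=' available, a filter can match at most one partition value; the range operators let one predicate select several partitions at once.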
f4afcae452 [case](regression) Stream load 2pc exceptions (#27804)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-12-01 22:27:40 +08:00
68525fc112 [feature](profile) add RuntimeFilterInfo in merge profile #27869 2023-12-01 21:42:25 +08:00
fcfd0aa8e0 [fix](doc) spell error (#27079)
Fixed spelling errors in metadata-operation and cold-hot-separation
2023-12-01 21:30:50 +08:00
d80bfc19c9 [fix](doc) spell error fixes for FE & BE Config documents (#27619) 2023-12-01 20:53:26 +08:00
327035f2b0 [fix](doc) chinese translation replaced and case fix (#27611) 2023-12-01 20:53:07 +08:00
8749e5208f [fix](jdbc catalog) fix insert into jdbc table column order (#27855) 2023-12-01 20:46:48 +08:00
7e3d6bc9f1 [Fix](Variant) Implement ColumnObject::update_hash_with_value (#27873) 2023-12-01 20:14:47 +08:00
1451a835b7 [fix](stats) Don't save colToPartitions anymore to save mem (#27879) 2023-12-01 19:54:30 +08:00
19281e3590 [tpcds] remove useless tpcds tools config (#27867)
Co-authored-by: zhongjian.xzj <zhongjian.xzj@192.168.2.31>
2023-12-01 18:47:12 +08:00
3f20cf1456 [fix](nereids)set operation's result type is wrong if decimal overflows (#27870) 2023-12-01 18:40:06 +08:00
c93e5d9e89 [doc](flink-connector) update flink doc and options (#27875)
---------

Co-authored-by: wudi <>
2023-12-01 17:40:08 +08:00
007506ce42 [fix](like_func) incorrect result of like with 'NO_BACKSLASH_ESCAPES' mode (#27842) 2023-12-01 17:32:46 +08:00
26e81b6573 [fix](stats)min and max return NaN when table is empty (#27862)
Fix bugs when analyzing an empty table and when min/max values are null:
1. Skip empty analyze tasks for sample analyze tasks. (Full analyze tasks are already skipped.)
2. Check that the sample row count is not 0 before calculating the scale factor.
3. Remove ' from the SQL template now that base64 encoding of min/max values has been removed.
2023-12-01 17:00:56 +08:00
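Point 2 of #27862 guards a division. The sketch below assumes the scale factor is the ratio of total rows to sampled rows, which the commit message does not spell out; the class and method names, and the fallback value of 1.0, are hypothetical.

```java
final class SampleScaleSketch {
    /** Unguarded version: 0 rows sampled from an empty table yields 0.0 / 0.0, which is NaN. */
    public static double unguardedScaleFactor(long totalRows, long sampleRows) {
        return (double) totalRows / sampleRows;
    }

    /** Guarded version: checks the sample row count before dividing. */
    public static double scaleFactor(long totalRows, long sampleRows) {
        if (sampleRows == 0) {
            return 1.0; // assumption: a neutral factor when nothing was sampled
        }
        return (double) totalRows / sampleRows;
    }

    public static void main(String[] args) {
        System.out.println(Double.isNaN(unguardedScaleFactor(0, 0)));
        System.out.println(scaleFactor(1000, 100));
    }
}
```

In Java, floating-point division of zero by zero produces NaN rather than throwing, so a value computed this way can flow silently into later statistics unless it is checked first.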
18338a33b6 [bugfix](mergeprofile) ignore null profile to avoid bug (#27860)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-12-01 16:56:29 +08:00
34c85c962f [opt](Nereids) improve semi/anti join estimation when column stats are unavailable #27793
This change improves the performance of tpch q20: on sf500, it improved from 6.3 sec to 1.1 sec.
This change has no impact on tpcds.

When column stats are unknown,
the basic algorithm estimates the output row count of a left semi join as the output row count of its left child.
q1: "A left semi join B on A.x=B.x"
The output row count is estimated as A.rowCount.

But the basic algorithm does not work well for the following pattern:
q2: "A left semi join filter(B) on A.x=B.x"
Because there is a filter on B, this left semi join usually also reduces the row count of A, so we estimate
the output of q2 as A.rowCount * Filter.rowCount / B.rowCount
2023-12-01 15:48:33 +08:00
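The q2 estimate from #27793 can be written out as a small function. This is a sketch of the formula only, not the Nereids code: the class and parameter names are hypothetical, and both the fallback for an empty B and the cap on the ratio are assumptions added to keep the example well-defined.

```java
final class SemiJoinEstimateSketch {
    /**
     * Estimated output rows of "A left semi join filter(B)" without column stats:
     * A.rowCount * Filter.rowCount / B.rowCount.
     */
    public static double estimate(double aRowCount, double bRowCount, double filterRowCount) {
        if (bRowCount <= 0) {
            // Assumption: fall back to the basic estimate (left child row count).
            return aRowCount;
        }
        // Assumption: never estimate more rows than the left child produces.
        double ratio = Math.min(1.0, filterRowCount / bRowCount);
        return aRowCount * ratio;
    }

    public static void main(String[] args) {
        // A has 1000 rows; the filter keeps 50 of B's 200 rows.
        System.out.println(estimate(1000, 200, 50)); // prints 250.0
    }
}
```

With no filter on B (Filter.rowCount equal to B.rowCount), the ratio is 1 and the result matches the q1 estimate of A.rowCount.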
137f94eac9 [Bug](func) coredump in equal for null in function (#27844) 2023-12-01 15:48:01 +08:00
94b75515e5 [minor](stats) Throw error when sync analyze failed (#27845) 2023-12-01 15:44:27 +08:00
Pxl
64fad89eb1 [Chore](case) add case of join with big hashtable (#27825)
add case of join with big hashtable
2023-12-01 15:32:23 +08:00
39692266d3 [minor](stats) Update olap table row count after analyze (#27814) 2023-12-01 13:51:42 +08:00
e868c990ff [feature](Nereids) support add constraint on table (#27627)
support add constraint on the table including
- primary key constraint
- unique constraint
- foreign key constraint
2023-12-01 13:28:48 +08:00
48d7df205f [chore](log) Add more detail msg for waitRPC exception #27771 2023-12-01 11:59:47 +08:00
765f2b4809 [community](collaborator)add new collaborator KassieZ (#27708) 2023-12-01 11:46:41 +08:00
776f0205f3 [Fix](test) Fix an auto partition conflict and add many testcases (#27730)
Fix an auto partition conflict and add many testcases
2023-12-01 09:58:44 +08:00
2afbece0b8 [Fix](type) fix wrong type transform for unix_timestamp (#27728)
fix wrong type transform for unix_timestamp
2023-12-01 09:58:20 +08:00
c1d73ecefb [chore](load) rm some load related redundant code (#27102) 2023-12-01 09:29:28 +08:00
60bc3be8a2 [Opt](Compression) Opt zstd block decompression by ZSTD_decompressDCtx(). (#27534)
Optimize zstd block decompression by using `ZSTD_decompressDCtx()` in place of streaming decompression.
This improves performance but consumes more memory.

Test result: 
- env: 1 node(16 cores, 64G).
- parquet column: 100 million rows of char(255) column.
- result: 5.2 -> 4.6.
2023-12-01 09:10:32 +08:00
6a614c3e7b [regression](nereids) add regression case for transposeSemiJoinAgg/transposeSemiJoinAggProject rules (#27664)
add case for transposeSemiJoinAgg/transposeSemiJoinAggProject rules
2023-12-01 08:19:16 +08:00
2b2c2dd772 [fix](sequence column) insert into should require sequence column in all scenario (#27780) 2023-11-30 23:27:58 +08:00
0b7becd4b7 [fix](executor)Fix memtracker not set to task group #27699 2023-11-30 22:35:51 +08:00
6c4ec3cb82 [FIX](complextype)fix array/map/struct impl hashcode and equals (#27717) 2023-11-30 22:08:15 +08:00
c93b5727b3 [fix](profile) fix double add in aggcounter #27826 2023-11-30 21:45:15 +08:00
97105e9a16 [regression](compaction) Add case to test single replica compaction (#27199) 2023-11-30 21:27:13 +08:00
a2fa0b3745 [compability](segment) fix compability issue introduced by #27676 (#27799)
Prior to PR #27676, data was written with empty path information. Consequently, after #27676 was implemented, data that already exists in a segment is not included in `column_id_to_footer_ordinal`. This leads to an `invalid nonexistent column without default value` error.
2023-11-30 21:24:59 +08:00
c0aac043b6 [pipelineX](local shuffle) Use local shuffle to optimize BHJ (#27823) 2023-11-30 21:08:45 +08:00
16fb7a507c [fix](colocate) bucket index cannot be set correctly when do colocate balance (#27741)
for (Partition partition : olapTable.getPartitions()) {
    short replicationNum = replicaAlloc.getTotalReplicaNum();
    long visibleVersion = partition.getVisibleVersion();
    // Here we only get VISIBLE indexes. All other indexes are not queryable.
    // So it does not matter if tablets of other indexes are not matched.
    for (MaterializedIndex index : partition.getMaterializedIndices(IndexExtState.VISIBLE)) {
        Preconditions.checkState(backendBucketsSeq.size() == index.getTablets().size(),
                backendBucketsSeq.size() + " vs. " + index.getTablets().size());
        int idx = 0;
        for (Long tabletId : index.getTabletIdsInOrder()) {
            counter.totalTabletNum++;
            Set<Long> bucketsSeq = backendBucketsSeq.get(idx);
            Preconditions.checkState(bucketsSeq.size() == replicationNum,
                    bucketsSeq.size() + " vs. " + replicationNum);
            Tablet tablet = index.getTablet(tabletId);
            TabletStatus st = tablet.getColocateHealthStatus(
                    visibleVersion, replicaAlloc, bucketsSeq);
            if (st != TabletStatus.HEALTHY) {
                counter.unhealthyTabletNum++;
                unstableReason = String.format("get unhealthy tablet %d in colocate table."
                        + " status: %s", tablet.getId(), st);
                LOG.debug(unstableReason);

                if (!tablet.readyToBeRepaired(infoService, Priority.NORMAL)) {
                    counter.tabletNotReady++;
                    // idx must be incremented here; otherwise bucketsSeq and the
                    // tablet replicas' backends no longer line up
                    idx++;
                    continue;
                }

                TabletSchedCtx tabletCtx = new TabletSchedCtx(
                        TabletSchedCtx.Type.REPAIR,
                        db.getId(), tableId, partition.getId(), index.getId(), tablet.getId(),
                        replicaAlloc, System.currentTimeMillis());
                // the tablet status will be set again when being scheduled
                tabletCtx.setTabletStatus(st);
                tabletCtx.setPriority(Priority.NORMAL);
                tabletCtx.setTabletOrderIdx(idx);

                AddResult res = tabletScheduler.addTablet(tabletCtx, false /* not force */);
                if (res == AddResult.LIMIT_EXCEED || res == AddResult.DISABLED) {
                    // tablet in scheduler exceed limit, or scheduler is disabled,
                    // skip this group and check next one.
                    LOG.info("tablet scheduler return: {}. stop colocate table check", res.name());
                    break OUT;
                } else if (res == AddResult.ADDED) {
                    counter.addToSchedulerTabletNum++;
                }  else {
                    counter.tabletInScheduler++;
                }
            }
            idx++;
        }
    }
}
2023-11-30 20:28:18 +08:00
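The fix in #27741 comes down to keeping a manual index in step with a for-each loop when an iteration is skipped. The sketch below reproduces that pattern in isolation; the class `ParallelIndexSketch`, the method `pair`, and the `advanceOnSkip` switch are hypothetical and exist only to show both behaviours side by side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ParallelIndexSketch {
    /** Pairs each non-skipped tablet id with the bucket sequence at the same position. */
    public static List<String> pair(List<Long> tabletIds, List<String> bucketsSeqs,
                                    List<Long> skipped, boolean advanceOnSkip) {
        List<String> out = new ArrayList<>();
        int idx = 0;
        for (Long tabletId : tabletIds) {
            if (skipped.contains(tabletId)) {
                if (advanceOnSkip) {
                    idx++; // keep idx aligned with the loop before skipping
                }
                continue;
            }
            out.add(tabletId + "->" + bucketsSeqs.get(idx));
            idx++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> tablets = Arrays.asList(10L, 11L, 12L);
        List<String> buckets = Arrays.asList("b0", "b1", "b2");
        List<Long> skipped = Arrays.asList(11L);
        System.out.println(pair(tablets, buckets, skipped, false)); // [10->b0, 12->b1]
        System.out.println(pair(tablets, buckets, skipped, true));  // [10->b0, 12->b2]
    }
}
```

Without the increment before `continue`, every tablet after the skipped one is matched against the previous tablet's bucket sequence.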
435192bddc [doc](stats) add auto_analyze_table_width_threshold description. (#27818) 2023-11-30 18:15:11 +08:00
9d2aac305f [doc](partial update) update suggestions of using unique key table (#27810) 2023-11-30 17:27:39 +08:00
Pxl
96f2ef3d99 [Improvement](schema-change) Reserve some memory for use by other parts except hold block of schem… (#27800)
Reserve some memory for use by other parts except hold block of schema change job
2023-11-30 17:01:51 +08:00
Pxl
f573918aa4 [Chore](materialized-view) output reference column info when create mv can't find ref column (#27182)
output reference column info when create mv can't find ref column
2023-11-30 16:48:06 +08:00
f1846c10a1 [fix](stop)fix missing notify_all() after the stop (#27796) 2023-11-30 16:04:13 +08:00
8ca8a0655e [fix](memtracking) require size in Allocator::free (#27795) 2023-11-30 15:57:15 +08:00
db8e56b9f2 [improve](move-memtable) increase open load stream timeout (#26909) 2023-11-30 15:27:29 +08:00
5a4948f0f9 [fix](load) fix DataSink prepared check in PlanFragmentExecutor (#27735) 2023-11-30 15:24:04 +08:00
838225b6be [fix](move-memtable) wait stream close before releasing streams (#27791) 2023-11-30 15:03:07 +08:00
3e910e2978 [refactor](simd_json_reader) refactor simd json reader to adapt to parse multi json (#27272) 2023-11-30 15:01:06 +08:00
f10b7bf7e7 [test](Planner): add regression-test for eager-aggregate (#27732) 2023-11-30 14:42:26 +08:00
e4149c6e4c [Fix](parquet-reader) Fix null map issue in parquet reader. (#27777)
Fix a null map issue in the parquet reader which causes incorrect results for functions such as `min()` and `max()`.

To avoid copying, the null map is shared between the converted parquet src column and the dst column. This is tricky: it requires calling the mutable function `doris_nullable_column->get_null_map_column_ptr()`, which sets `_need_update_has_null = true`. The call matters because some operations, such as agg, call `has_null()`, which sets `_need_update_has_null = false`.
2023-11-30 13:55:37 +08:00
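The interaction between the mutable accessor and the cached `has_null()` answer in #27777 can be illustrated with a small model. This is a Java sketch of the caching pattern only, not the Doris C++ column classes; every name in it is hypothetical.

```java
final class NullMapCacheSketch {
    private final boolean[] nullMap;
    private boolean needUpdateHasNull = true;
    private boolean cachedHasNull;

    public NullMapCacheSketch(int size) {
        nullMap = new boolean[size];
    }

    /** Mutable access: the caller may write to the map, so the cached answer is invalidated. */
    public boolean[] mutableNullMap() {
        needUpdateHasNull = true;
        return nullMap;
    }

    /** Rescans the map only when the cache was invalidated, then marks it fresh. */
    public boolean hasNull() {
        if (needUpdateHasNull) {
            cachedHasNull = false;
            for (boolean isNull : nullMap) {
                if (isNull) {
                    cachedHasNull = true;
                    break;
                }
            }
            needUpdateHasNull = false;
        }
        return cachedHasNull;
    }
}
```

If a shared reference to the map is written after `hasNull()` has cached its answer, the cache stays stale until the mutable accessor is called again, which is the kind of mismatch that can make null-aware operations return wrong results.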