Commit Graph

18263 Commits

Author SHA1 Message Date
18338a33b6 [bugfix](mergeprofile) ignore null profile to avoid bug (#27860)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-12-01 16:56:29 +08:00
34c85c962f [opt](Nereids) improve semi/anti join estimation when column stats are unavailable #27793
This change improves the performance of TPC-H q20: on sf500, runtime drops from 6.3 sec to 1.1 sec.
This change has no impact on TPC-DS.

When column stats are unknown,
the basic algorithm estimates the output row count of a left semi join as the output row count of its left child.
q1: "A left semi join B on A.x=B.x"
The output row count is estimated as A.rowCount.

But the basic algorithm does not work well for the following pattern:
q2: "A left semi join filter(B) on A.x=B.x"
Because there is a filter on B, this left semi join usually also reduces the row count of A, so we estimate
the output of q2 as A.rowCount * Filter.rowCount/B.rowCount
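
The estimate described in this message can be sketched as follows. This is only an illustration of the arithmetic, not the Nereids implementation: the class name `SemiJoinEstimate`, the method `estimate`, its parameter names, and the clamping of the ratio to the range [0, 1] are all assumptions made for the example.

```java
/** Hypothetical helper illustrating the estimate above; not the actual Nereids code. */
final class SemiJoinEstimate {

    /**
     * @param leftRowCount        A.rowCount
     * @param rightRowCount       B.rowCount (before the filter)
     * @param rightFilteredCount  Filter.rowCount (rows of B that pass the filter)
     */
    static double estimate(double leftRowCount, double rightRowCount, double rightFilteredCount) {
        if (rightRowCount <= 0) {
            // Nothing is known about B: fall back to the basic estimate (q1).
            return leftRowCount;
        }
        // Assumption: clamp the ratio so the estimate never exceeds A.rowCount
        // and never goes negative.
        double ratio = Math.max(0.0, Math.min(1.0, rightFilteredCount / rightRowCount));
        return leftRowCount * ratio;
    }

    public static void main(String[] args) {
        // q1-like case: no filter on B, so the ratio is 1.
        System.out.println(estimate(1000, 200, 200));
        // q2-like case: the filter keeps a quarter of B.
        System.out.println(estimate(1000, 200, 50));
    }
}
```

With a filter that keeps 50 of 200 rows of B, the sketch scales A.rowCount by 0.25; with no filtering it returns A.rowCount unchanged, which matches the basic algorithm.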
2023-12-01 15:48:33 +08:00
137f94eac9 [Bug](func) coredump in equal for null in function (#27844) 2023-12-01 15:48:01 +08:00
94b75515e5 [minor](stats) Throw error when sync analyze failed (#27845) 2023-12-01 15:44:27 +08:00
Pxl
64fad89eb1 [Chore](case) add case of join with big hashtable (#27825)
add case of join with big hashtable
2023-12-01 15:32:23 +08:00
39692266d3 [minor](stats) Update olap table row count after analyze (#27814) 2023-12-01 13:51:42 +08:00
e868c990ff [feature](Nereids) support add constraint on table (#27627)
support add constraint on the table including
- primary key constraint
- unique constraint
- foreign key constraint
2023-12-01 13:28:48 +08:00
48d7df205f [chore](log) Add more detail msg for waitRPC exception #27771 2023-12-01 11:59:47 +08:00
765f2b4809 [community](collaborator)add new collaborator KassieZ (#27708) 2023-12-01 11:46:41 +08:00
776f0205f3 [Fix](test) Fix an auto partition conflict and add many testcases (#27730)
Fix an auto partition conflict and add many testcases
2023-12-01 09:58:44 +08:00
2afbece0b8 [Fix](type) fix wrong type transform for unix_timestamp (#27728)
fix wrong type transform for unix_timestamp
2023-12-01 09:58:20 +08:00
c1d73ecefb [chore](load) rm some load related redundant code (#27102) 2023-12-01 09:29:28 +08:00
60bc3be8a2 [Opt](Compression) Opt zstd block decompression by ZSTD_decompressDCtx(). (#27534)
Optimize zstd block decompression by using `ZSTD_decompressDCtx()` in place of streaming decompression.
This improves performance but consumes more memory.

Test result: 
- env: 1 node(16 cores, 64G).
- parquet column: 100 million rows of char(255) column.
- result: 5.2 -> 4.6.
2023-12-01 09:10:32 +08:00
6a614c3e7b [regression](nereids) add regression case for transposeSemiJoinAgg/transposeSemiJoinAggProject rules (#27664)
add case for transposeSemiJoinAgg/transposeSemiJoinAggProject rules
2023-12-01 08:19:16 +08:00
2b2c2dd772 [fix](sequence column) insert into should require sequence column in all scenario (#27780) 2023-11-30 23:27:58 +08:00
0b7becd4b7 [fix](executor)Fix memtracker not set to task group #27699 2023-11-30 22:35:51 +08:00
6c4ec3cb82 [FIX](complextype)fix array/map/struct impl hashcode and equals (#27717) 2023-11-30 22:08:15 +08:00
c93b5727b3 [fix](profile) fix double add in aggcounter #27826 2023-11-30 21:45:15 +08:00
97105e9a16 [regression](compaction) Add case to test single replica compaction (#27199) 2023-11-30 21:27:13 +08:00
a2fa0b3745 [compability](segment) fix compability issue introduced by #27676 (#27799)
Prior to PR #27676, data was written with empty path information. Consequently, after #27676, data that already exists in a segment is not included in `column_id_to_footer_ordinal`, which leads to an `invalid nonexistent column without default value` error.
2023-11-30 21:24:59 +08:00
c0aac043b6 [pipelineX](local shuffle) Use local shuffle to optimize BHJ (#27823) 2023-11-30 21:08:45 +08:00
16fb7a507c [fix](colocate) bucket index cannot be set correctly when do colocate balance (#27741)
for (Partition partition : olapTable.getPartitions()) {
    short replicationNum = replicaAlloc.getTotalReplicaNum();
    long visibleVersion = partition.getVisibleVersion();
    // Here we only get VISIBLE indexes. All other indexes are not queryable.
    // So it does not matter if tablets of other indexes are not matched.
    for (MaterializedIndex index : partition.getMaterializedIndices(IndexExtState.VISIBLE)) {
        Preconditions.checkState(backendBucketsSeq.size() == index.getTablets().size(),
                backendBucketsSeq.size() + " vs. " + index.getTablets().size());
        int idx = 0;
        for (Long tabletId : index.getTabletIdsInOrder()) {
            counter.totalTabletNum++;
            Set<Long> bucketsSeq = backendBucketsSeq.get(idx);
            Preconditions.checkState(bucketsSeq.size() == replicationNum,
                    bucketsSeq.size() + " vs. " + replicationNum);
            Tablet tablet = index.getTablet(tabletId);
            TabletStatus st = tablet.getColocateHealthStatus(
                    visibleVersion, replicaAlloc, bucketsSeq);
            if (st != TabletStatus.HEALTHY) {
                counter.unhealthyTabletNum++;
                unstableReason = String.format("get unhealthy tablet %d in colocate table."
                        + " status: %s", tablet.getId(), st);
                LOG.debug(unstableReason);

                if (!tablet.readyToBeRepaired(infoService, Priority.NORMAL)) {
                    counter.tabletNotReady++;
                    // idx must be incremented here; otherwise bucketsSeq and the
                    // tablet replica backends no longer line up.
                    idx++;
                    continue;
                }

                TabletSchedCtx tabletCtx = new TabletSchedCtx(
                        TabletSchedCtx.Type.REPAIR,
                        db.getId(), tableId, partition.getId(), index.getId(), tablet.getId(),
                        replicaAlloc, System.currentTimeMillis());
                // the tablet status will be set again when being scheduled
                tabletCtx.setTabletStatus(st);
                tabletCtx.setPriority(Priority.NORMAL);
                tabletCtx.setTabletOrderIdx(idx);

                AddResult res = tabletScheduler.addTablet(tabletCtx, false /* not force */);
                if (res == AddResult.LIMIT_EXCEED || res == AddResult.DISABLED) {
                    // tablet in scheduler exceed limit, or scheduler is disabled,
                    // skip this group and check next one.
                    LOG.info("tablet scheduler return: {}. stop colocate table check", res.name());
                    break OUT;
                } else if (res == AddResult.ADDED) {
                    counter.addToSchedulerTabletNum++;
                } else {
                    counter.tabletInScheduler++;
                }
            }
            idx++;
        }
    }
}
2023-11-30 20:28:18 +08:00
435192bddc [doc](stats) add auto_analyze_table_width_threshold description. (#27818) 2023-11-30 18:15:11 +08:00
9d2aac305f [doc](partial update) update suggestions of using unique key table (#27810) 2023-11-30 17:27:39 +08:00
Pxl
96f2ef3d99 [Improvement](schema-change) Reserve some memory for use by other parts except hold block of schem… (#27800)
Reserve some memory for use by other parts except hold block of schema change job
2023-11-30 17:01:51 +08:00
Pxl
f573918aa4 [Chore](materialized-view) output reference column info when create mv can't find ref column (#27182)
output reference column info when create mv can't find ref column
2023-11-30 16:48:06 +08:00
f1846c10a1 [fix](stop)fix missing notify_all() after the stop (#27796) 2023-11-30 16:04:13 +08:00
8ca8a0655e [fix](memtracking) require size in Allocator::free (#27795) 2023-11-30 15:57:15 +08:00
db8e56b9f2 [improve](move-memtable) increase open load stream timeout (#26909) 2023-11-30 15:27:29 +08:00
5a4948f0f9 [fix](load) fix DataSink prepared check in PlanFragmentExecutor (#27735) 2023-11-30 15:24:04 +08:00
838225b6be [fix](move-memtable) wait stream close before releasing streams (#27791) 2023-11-30 15:03:07 +08:00
3e910e2978 [refactor](simd_json_reader) refactor simd json reader to adapt to parse multi json (#27272) 2023-11-30 15:01:06 +08:00
f10b7bf7e7 [test](Planner): add regression-test for eager-aggregate (#27732) 2023-11-30 14:42:26 +08:00
e4149c6e4c [Fix](parquet-reader) Fix null map issue in parquet reader. (#27777)
Fix a null map issue in the parquet reader that causes incorrect results for functions such as `min()` and `max()`.

To avoid copying, the null map is shared between the parquet converted src column and the dst column. This is tricky: sharing calls the mutable function `doris_nullable_column->get_null_map_column_ptr()`, which sets `_need_update_has_null = true`. This matters because some operations, such as agg, call `has_null()`, which sets `_need_update_has_null = false`.
2023-11-30 13:55:37 +08:00
da03a50824 [refine](pipelineX) refine dataqueue set source ready block (#27733) 2023-11-30 13:00:18 +08:00
7f13dcc726 [refactor](cluster)(step-3) remove cluster related to Auth (#27718)
Remove `default_cluster` prefix related to:
1. User
2. Role
3. UserManager
4. RoleManager
5. UserRoleManager
6. UserProperty
7. Create/Drop user Stmt
8. Create/Drop role Stmt
9. Grant/Revoke
2023-11-30 12:46:08 +08:00
112ae59aa4 [fix](move-memtable) add timeout for load stream close wait (#27439) 2023-11-30 12:00:06 +08:00
34e53acaea [pipelineX](fix) Fix local exchange on pipelineX engine (#27763) 2023-11-30 11:16:20 +08:00
5739167142 [feature](window_function) support to secondary argument to ignore null values in first_value/last_value (#27623) 2023-11-30 09:56:43 +08:00
e9debca97c [Improve](sort) avoid too may tmp vectors for get_columns (#27734) 2023-11-30 09:47:31 +08:00
1f9aa8ab16 [fix](group commit) Fix some group commit problems (#27769) 2023-11-29 23:43:21 +08:00
d96e2dfefb [feature-wip](arrow-flight)(step5) Support JDBC and PreparedStatement and Fix Bug (#27661) 2023-11-29 21:17:20 +08:00
19ecb3a8a2 [opt](stats) Use escape rather than base64 for min/max value (#27746) 2023-11-29 21:13:33 +08:00
acc14d7e4c [feature](Planner): Push down LimitDistinct through Union (#27745) 2023-11-29 21:12:42 +08:00
83ed8d3cba [Feat](Nereids) join hint support stage one (#27378)
support view as an independent unit of leading hint
add random test check of leading hint query
add more test with data of leading hint query
add random test check of distribute hint
2023-11-29 21:08:08 +08:00
6cdaf8ea32 [bugfix](profile) insert into select profile could not build successfully(#27756)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-11-29 20:44:25 +08:00
d00c04ccc1 [enhancement](stats) limit bq cap size for analyze task (#27685) 2023-11-29 20:35:56 +08:00
ba6219c3c3 [fix](stats) Fix show auto analyze missing jobs bug (#27755) 2023-11-29 20:33:29 +08:00
ce271ff382 [fix](parquet)fix can not read parquet lz4 compress. (#27383)
Fix reading of parquet data in lz4 compressed format. By default, data is decompressed according to the Hadoop lz4 format; if that fails, the reader falls back to the standard lz4 format.
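
The try-then-fall-back order described here can be sketched as below. This is a minimal illustration of the control flow only, not the Doris parquet reader: `Lz4FallbackSketch`, `decompress`, and the two decoder parameters are hypothetical names, and the decoders are passed in as plain functions so that the example does not depend on any particular lz4 library.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of a try-then-fall-back decode; not the Doris parquet reader. */
final class Lz4FallbackSketch {

    /**
     * Tries the Hadoop-framed decoder first and falls back to the standard decoder.
     *
     * @param hadoopLz4   assumed decoder that returns Optional.empty() when the input
     *                    does not parse as Hadoop-framed lz4
     * @param standardLz4 assumed decoder for the standard lz4 format
     */
    static byte[] decompress(byte[] input,
                             Function<byte[], Optional<byte[]>> hadoopLz4,
                             Function<byte[], byte[]> standardLz4) {
        return hadoopLz4.apply(input).orElseGet(() -> standardLz4.apply(input));
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        // Stub decoders for demonstration only: the first always reports failure,
        // so the second one is used.
        byte[] out = decompress(data, in -> Optional.empty(), in -> in.clone());
        System.out.println(out.length);
    }
}
```

Modeling the first decoder's failure as an empty `Optional` rather than an exception is a design choice made for this sketch; the commit message does not say how failure is detected.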
2023-11-29 19:04:53 +08:00
c99eb5d80f [fix](multi-catalog)add properties converter fe ut (#27254) 2023-11-29 19:01:29 +08:00