this change improves performance of tpch q20. on sf500, improved from 6.3sec to 1.1 sec
this change has no impaction on tpcds
when column stats is unknown,
the basic algorithm to estimate left semi join output row count is its left child output row count.
q1: "A left semi join B on A.x=B.x"
the output row is estimated as A.rowCount.
But the basic algorithm is not good to following pattern:
q2: "A left semi join filter(B) on A.x=B.x"
Because there is a filter on B, usually this left semi join also reduce the row count of A, and we estimate
the output of q2 as A.rowCount * Filter.rowCount/B.rowCount
Opt zstd block decompression by `ZSTD_decompressDCtx()` to replace streaming decompression.
It will improve performance but consume more memory.
Test result:
- env: 1 node(16 cores, 64G).
- parquet column: 100 million rows of char(255) column.
- result: 5.2 -> 4.6.
Prior to PR #27676, data was written with empty path information. Consequently, after implementing #27676, data that already exists in a segment is not included in `column_id_to_footer_ordinal`. This issue will lead to `invalid nonexistent column without default value` error.
for (Partition partition : olapTable.getPartitions()) {
short replicationNum = replicaAlloc.getTotalReplicaNum();
long visibleVersion = partition.getVisibleVersion();
// Here we only get VISIBLE indexes. All other indexes are not queryable.
// So it does not matter if tablets of other indexes are not matched.
for (MaterializedIndex index : partition.getMaterializedIndices(IndexExtState.VISIBLE)) {
Preconditions.checkState(backendBucketsSeq.size() == index.getTablets().size(),
backendBucketsSeq.size() + " vs. " + index.getTablets().size());
int idx = 0;
for (Long tabletId : index.getTabletIdsInOrder()) {
counter.totalTabletNum++;
Set<Long> bucketsSeq = backendBucketsSeq.get(idx);
Preconditions.checkState(bucketsSeq.size() == replicationNum,
bucketsSeq.size() + " vs. " + replicationNum);
Tablet tablet = index.getTablet(tabletId);
TabletStatus st = tablet.getColocateHealthStatus(
visibleVersion, replicaAlloc, bucketsSeq);
if (st != TabletStatus.HEALTHY) {
counter.unhealthyTabletNum++;
unstableReason = String.format("get unhealthy tablet %d in colocate table."
+ " status: %s", tablet.getId(), st);
LOG.debug(unstableReason);
if (!tablet.readyToBeRepaired(infoService, Priority.NORMAL)) {
counter.tabletNotReady++;
// 这里需要将 idx++ ,否则 bucketsSeq和 tablet replicas backends 对应不上
idx++;
continue;
}
TabletSchedCtx tabletCtx = new TabletSchedCtx(
TabletSchedCtx.Type.REPAIR,
db.getId(), tableId, partition.getId(), index.getId(), tablet.getId(),
replicaAlloc, System.currentTimeMillis());
// the tablet status will be set again when being scheduled
tabletCtx.setTabletStatus(st);
tabletCtx.setPriority(Priority.NORMAL);
tabletCtx.setTabletOrderIdx(idx);
AddResult res = tabletScheduler.addTablet(tabletCtx, false /* not force */);
if (res == AddResult.LIMIT_EXCEED || res == AddResult.DISABLED) {
// tablet in scheduler exceed limit, or scheduler is disabled,
// skip this group and check next one.
LOG.info("tablet scheduler return: {}. stop colocate table check", res.name());
break OUT;
} else if (res == AddResult.ADDED) {
counter.addToSchedulerTabletNum++;
} else {
counter.tabletInScheduler++;
}
}
idx++;
}
}
}
Fix null map issue in parquet reader which cause result incorrect such as `min()`, `max()`.
In order to share null map between parquet converted src column and dst column to avoid copying. It is very tricky that will call mutable function `doris_nullable_column->get_null_map_column_ptr()` which will set `_need_update_has_null = true`. Because some operations such as agg will call `has_null()` to set `_need_update_has_null = false`.
Remove `default_cluster` prefix related to:
1. User
2. Role
3. UserManager
4. RoleManager
5. UserRoleManager
6. UserProperty
7. Create/Drop user Stmt
8. Create/Drop role Stmt
9. Grant/Revoke
support view as a independent unit of leading hint
add random test check of leading hint query
add more test with data of leading hint query
add random test check of distribute hint
Fixed the problem of not being able to read parquet lz4 compressed format. By default, it is decompressed according to the Hadoop lz4 format. If it fails, it will fall back to the standard lz4 compression format.