Currently, many profiles use "add child profile" to organize profile counters into blocks, but that is wrong: a child profile carries its own total time counter. What we actually need is just a label, for example:
- MemoryUsage:
  - HashTable: 23.98 KB
  - SerializeKeyArena: 446.75 KB
Add a new macro `ADD_LABEL_COUNTER` that adds just a label to the profile.
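For intuition, here is a toy Java sketch of the difference (the real change is a C++ macro in the BE RuntimeProfile code; every name below is made up for illustration): a label only groups counters for display, whereas a child profile would also carry a total time counter.

```
import java.util.LinkedHashMap;
import java.util.Map;

// Toy profile: counters are grouped under plain labels, no per-group time counter.
final class ProfileSketch {
    private final Map<String, Map<String, String>> countersByLabel = new LinkedHashMap<>();

    void addLabelCounter(String label) {
        countersByLabel.putIfAbsent(label, new LinkedHashMap<>());
    }

    void addCounter(String label, String name, String value) {
        countersByLabel.get(label).put(name, value);
    }

    void print() {
        countersByLabel.forEach((label, counters) -> {
            System.out.println("- " + label + ":"); // label line only, no total time
            counters.forEach((name, value) -> System.out.println("  - " + name + ": " + value));
        });
    }

    public static void main(String[] args) {
        ProfileSketch profile = new ProfileSketch();
        profile.addLabelCounter("MemoryUsage");
        profile.addCounter("MemoryUsage", "HashTable", "23.98 KB");
        profile.addCounter("MemoryUsage", "SerializeKeyArena", "446.75 KB");
        profile.print();
    }
}
```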
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Currently, we check that the OLAP table state is NORMAL outside the write lock scope, so the state may change to an abnormal one while we perform the alter operation.
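A minimal sketch of the idea, with illustrative class and method names rather than the actual Doris code: the NORMAL-state check must happen inside the write lock, otherwise a concurrent alter can invalidate it right after the check.

```
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative guard: the state is re-checked while holding the write lock.
final class OlapTableGuard {
    enum State { NORMAL, ALTER, DROPPED }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile State state = State.NORMAL;

    void runAlter(Runnable alterOp) {
        lock.writeLock().lock();
        try {
            // Checking the state before taking the lock is not enough; verify it
            // again here so a concurrent state change cannot slip through.
            if (state != State.NORMAL) {
                throw new IllegalStateException("table state is not NORMAL: " + state);
            }
            state = State.ALTER;
            alterOp.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```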
---------
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
The formula used to compute NDV after a filter assumes that the new row count is smaller than the original row count. When we apply this formula to joins, we should add a branch for the case where the new row count is larger than the original row count:
when the new row count is larger, the NDV is unchanged.
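A hedged sketch of that branch (the linear scaling below is a stand-in for the actual NDV formula, used only to show where the branch belongs):

```
// Toy NDV estimator: only scale NDV down when the row count shrinks.
public class NdvEstimator {
    static double estimateNdv(double originalNdv, double originalRowCount, double newRowCount) {
        if (newRowCount >= originalRowCount) {
            // A join may expand the row count; duplicating rows does not create
            // new distinct values, so keep the original NDV.
            return originalNdv;
        }
        // When rows are removed, assume distinct values disappear proportionally
        // (placeholder for the real formula).
        return Math.min(originalNdv, originalNdv * newRowCount / originalRowCount);
    }

    public static void main(String[] args) {
        System.out.println(estimateNdv(100, 1000, 500));  // 50.0  (rows shrink)
        System.out.println(estimateNdv(100, 1000, 5000)); // 100.0 (rows expand, NDV unchanged)
    }
}
```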
Support collecting statistics for HMS external tables with specific partitions. Add session variables to limit the partitions to collect for whole-table row count and column statistics.
PR (https://github.com/apache/doris/pull/19909) implemented the framework of the Hudi reader for MOR tables. This PR completes all functionality for reading MOR tables and enables end-to-end queries.
Key Implementations:
1. Use Hudi metadata to generate the table schema, instead of getting it from the Hive client.
2. Use the Hive client to list Hudi partitions, so this strongly depends on the sync tools (https://hudi.apache.org/docs/syncing_metastore/) that sync Hudi partitions into the Hive metastore. However, we could also get the Hudi partitions directly from the .hoodie directory.
3. Remove `HudiHMSExternalCatalog`, because other catalogs like Glue are compatible with the Hive catalog.
4. Read COW tables natively from C++, as before.
5. The Hudi RecordReader uses ProcessBuilder to start a HotSpot debugger process, which may get stuck when attaching to the original JNI process, so I use a tricky method to kill this useless process (sketched below).
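For illustration only (this is not the exact workaround used in the PR), one way to get rid of such a stray child process is to enumerate and destroy the JVM's child processes right after the reader is created; real code would match the specific debugger process instead of killing every child.

```
import java.util.stream.Stream;

// Illustrative cleanup: forcibly terminate direct child processes of the current JVM.
public final class ChildProcessCleaner {
    public static void killChildren() {
        try (Stream<ProcessHandle> children = ProcessHandle.current().children()) {
            // In practice, filter for the unwanted debugger process before destroying.
            children.forEach(ProcessHandle::destroyForcibly);
        }
    }
}
```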
We have some pruning logic in the Cascades framework. However, it does not work as expected: if we prune one Group, we may then run thousands of optimization attempts on its parent without any successful result. This PR removes the pruning provisionally; we will add it back after re-designing it.
If a BE crashed, the error would be logged, but the analysis task would still be marked as finished, which is incorrect.
In this PR, the analysis task is updated according to the query state.
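A minimal sketch of the idea (the enum and method names are assumptions, not the actual Doris API): derive the task state from the query state instead of unconditionally marking the task finished.

```
// Toy mapping from query state to analysis task state.
enum QueryState { FINISHED, FAILED, CANCELLED }
enum AnalysisState { FINISHED, FAILED }

final class AnalysisTaskUpdater {
    static AnalysisState toTaskState(QueryState queryState) {
        // Only a successfully finished query marks the task as finished;
        // a crashed BE or failed query marks the task as failed.
        return queryState == QueryState.FINISHED ? AnalysisState.FINISHED : AnalysisState.FAILED;
    }

    public static void main(String[] args) {
        System.out.println(toTaskState(QueryState.FAILED)); // FAILED, not FINISHED
    }
}
```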
1. Support multiple tables for routine load
2. Set table information dynamically for multi-table loads
3. Add multi-table syntax rules
4. Add a new multi-table execution plan
The precision handling for division with DECIMALV3 is as follows (excluding cases where division increases precision):
(p1, s1) / (p2, s2) ----> (p1 + s2, s1)
However, to compensate for the precision loss of division, the precision of the left operand is increased first:
(p1, s1) / (p2, s2) =====> (p1 + s2, s1 + s2) / (p2, s2) ----> (p1 + s2, s1)
However, the legacy optimizer repeats the analyze and substitute steps for an expression, which can result in the accumulation of precision:
(p1, s1) / (p2, s2) =====> (p1 + s2, s1 + s2) / (p2, s2) =====> (p1 + s2 + s2, s1 + s2 + s2) / (p2, s2)
To address this, the previous approach was to forcibly convert the left operand of DECIMALV3 calculations. This results in rewriting the expression as:
(p1, s1) / (p2, s2) =====> cast((p1, s1) as (p1 + s2, s1 + s2)) / (p2, s2)
Then, during the substitution step, a check is performed: if the expression is an implicit cast, the original expression under the cast is extracted:
cast((p1, s1) as (p1 + s2, s1 + s2)) =====> (p1, s1)
```
protected Expr substituteImpl(ExprSubstitutionMap smap, ExprSubstitutionMap disjunctsMap, Analyzer analyzer) {
    if (isImplicitCast()) {
        // Strip the implicit cast so substitution re-analyzes the original operand,
        // preventing the precision adjustment from being applied a second time.
        return getChild(0).substituteImpl(smap, disjunctsMap, analyzer);
    }
    // ... fall through to the normal substitution logic ...
```
This way, the expression is not analyzed repeatedly, which prevents the continuous increase in precision. However, if the left expression is a constant (literal), the precision could in theory still keep increasing. Unfortunately, the code removed in this PR (#19926) obscured this issue:
```
for (Expr child : children) {
    if (child instanceof DecimalLiteral && child.getType().isDecimalV3()) {
        // Shrink the literal's DECIMALV3 type to the narrowest type that holds its value.
        ((DecimalLiteral) child).tryToReduceType();
    }
}
```
It tries to reduce the precision of literals in the expression; however, this snippet can cause the following bug:
```
mysql [test]> select cast(1 as DECIMALV3(16, 2)) / cast(3 as DECIMALV3(16, 2));
+-----------------------------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / CAST(3 AS DECIMALV3(16, 2)) |
+-----------------------------------------------------------+
|                                                      0.00 |
+-----------------------------------------------------------+
```
Due to the reduced precision, 1.00 / 3.00 effectively becomes 1 / 3, which is why the result above is 0.00.
Use consistent hashing to select BEs only when the file cache is enabled, and move the consistent BE assignment code into FederationBackendPolicy.
Also fix a bug where the split number and file size shown in explain output were incorrect.
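For reference, a simplified consistent-hashing sketch (illustrative only; the real logic lives in FederationBackendPolicy and uses Doris's own hashing): each backend occupies several points on a hash ring, and a split is assigned to the first backend at or after its hash, so the same split keeps hitting the same BE and its file cache stays warm.

```
import java.util.SortedMap;
import java.util.TreeMap;

// Toy consistent-hash ring mapping split paths to backend ids.
final class ConsistentBackendRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    /** Place a backend on the ring with several virtual nodes for balance. */
    void addBackend(String backendId, int virtualNodes) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put((backendId + "#" + i).hashCode(), backendId);
        }
    }

    /** The same split path always maps to the same backend. */
    String selectBackend(String splitPath) {
        SortedMap<Integer, String> tail = ring.tailMap(splitPath.hashCode());
        Integer key = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(key);
    }

    public static void main(String[] args) {
        ConsistentBackendRing ring = new ConsistentBackendRing();
        ring.addBackend("be-1", 32);
        ring.addBackend("be-2", 32);
        System.out.println(ring.selectBackend("hdfs://warehouse/t/part-0001.parquet"));
    }
}
```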
If a query hits a materialized view that has row storage enabled but does not contain the row storage column, the query crashes. Therefore, the row storage column must be included when creating the materialized view, and must be serialized during SchemaChange execution.
MTMV tasks persist only the finished status, to reduce the overhead of edit logging.
After this change, unfinished tasks are simply lost when the FE master restarts.
We should not plan any Filter or Project above a CTEAnchor, because sometimes a Project or Filter already exists under the anchor,
and then the whole plan cannot be translated into a valid plan for the BE.
For routine load (Kafka load), users can produce data for different tables into a single topic, and Doris will dispatch each record to the corresponding table.
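A toy sketch of the dispatch idea; the `tableName|payload` record format below is purely an assumption for illustration, not Doris's actual message format.

```
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy dispatcher: group records from one topic by the table they target.
final class MultiTableDispatcher {
    /** Assume each record is "tableName|payload"; split it into (table, row). */
    static Map<String, List<String>> dispatch(List<String> records) {
        Map<String, List<String>> rowsByTable = new HashMap<>();
        for (String record : records) {
            int sep = record.indexOf('|');
            String table = record.substring(0, sep);
            String payload = record.substring(sep + 1);
            rowsByTable.computeIfAbsent(table, t -> new ArrayList<>()).add(payload);
        }
        return rowsByTable;
    }

    public static void main(String[] args) {
        // Each table's rows would then be loaded into that table.
        System.out.println(dispatch(List.of("orders|1,foo", "users|2,bar", "orders|3,baz")));
    }
}
```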
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
1. Move PushdownFilterThroughCTEAnchor and PushdownProjectThroughCTEAnchor into the PUSH_DOWN_FILTERS rule set.
2. Move PushdownFilterThroughProject before MergeProjectPostProcessor.
1. Skip the forbidUnknownColStats check for invisible columns
2. Use columnStatistics.isUnknown to tell whether the stats are unknown
3. Skip the unknown-stats check for the internal schema
Add a rule for checking join nodes in the `analysis/CheckAnalysis.java` file. When we check a join node, we should also check its ON clause; if it contains a subquery expression, we should throw an exception. A simplified sketch of such a check is shown after the examples below.
Before this PR
```
mysql> select a.k1 from baseall a join test b on b.k2 in (select 49);
ERROR 1105 (HY000): errCode = 2, detailMessage = Unexpected exception: nul
```
After this PR
```
mysql> select a.k1 from baseall a join test b on b.k2 in (select 49);
ERROR 1105 (HY000): errCode = 2, detailMessage = Unexpected exception: Not support OnClause contain Subquery, expr:k2 IN (INSUBQUERY) (LogicalOneRowRelation ( projects=[49 AS `49`#28], buildUnionNode=true ))
```
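For reference, a self-contained toy version of such a check (the real rule lives in CheckAnalysis and works on Nereids expression classes): recursively walk the ON-clause expression tree and fail as soon as a subquery node is found.

```
import java.util.List;

// Toy ON-clause check: reject any conjunct that contains a subquery node.
final class OnClauseSubqueryCheck {
    interface Expr { List<Expr> children(); }

    record Subquery(List<Expr> children) implements Expr {}
    record InPredicate(List<Expr> children) implements Expr {}

    static void checkOnClause(Expr onClause) {
        if (onClause instanceof Subquery) {
            throw new IllegalStateException("Not support OnClause contain Subquery");
        }
        onClause.children().forEach(OnClauseSubqueryCheck::checkOnClause);
    }

    public static void main(String[] args) {
        Expr onClause = new InPredicate(List.of(new Subquery(List.of())));
        try {
            checkOnClause(onClause);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Not support OnClause contain Subquery
        }
    }
}
```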