18854 Commits

Author SHA1 Message Date
eefeb4d80c [fix](spill) fix wrong disk usage of spill (#35423)
2024-05-28 18:53:55 +08:00
72a27a0938 [fix](paimon)fix paimon cache bug (#35309)
Issue Number: close #35024 
The bug occurs because the FE sets the update time of the Paimon
catalog incorrectly, so the BE cannot refresh Paimon's schema in time.
```java
    private void initTable() {
        PaimonTableCacheKey key = new PaimonTableCacheKey(ctlId, dbId, tblId, paimonOptionParams, dbName, tblName);
        TableExt tableExt = PaimonTableCache.getTable(key);
        if (tableExt.getCreateTime() < lastUpdateTime) {
            LOG.warn("invalidate cache table:{}, localTime:{}, remoteTime:{}", key, tableExt.getCreateTime(),
                    lastUpdateTime);
            PaimonTableCache.invalidateTableCache(key);
            tableExt = PaimonTableCache.getTable(key);
        }
        this.table = tableExt.getTable();
        paimonAllFieldNames = PaimonScannerUtils.fieldNames(this.table.rowType());
        if (LOG.isDebugEnabled()) {
            LOG.debug("paimonAllFieldNames:{}", paimonAllFieldNames);
        }
    }
```
2024-05-28 18:52:51 +08:00
2e1318b8a0 [fix] (compaction) fix CompactionPermitLimiter causing compaction to stall (#35078)
## BUG
1. config::total_permits_for_compaction_score = 20000
2. Thread-B requests 11000 permits, so used_permits = 11000
3. Thread-A requests 12000 permits and waits for used_permits + 12000 <=
20000
4. config::total_permits_for_compaction_score is adjusted to 10000
5. Thread-B releases its permits, so used_permits = 0, and notifies
Thread-A; the wait condition is now used_permits + 12000 <= 10000, which
never holds, so Thread-A keeps waiting

## Fix
We need to initialize total_permits instead of using the config directly.
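
The scenario above can be sketched in Java. This is only an illustration of one possible reading of the fix (the limiter keeps its own total, captured once, instead of re-reading a mutable config value in the wait condition). The class and method names are hypothetical, and the real Doris BE code is C++ and may differ.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; names and structure are assumptions, not the Doris BE code.
final class PermitLimiterSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition permitsReleased = lock.newCondition();
    // Snapshot of the configured total, taken once at construction (assumption).
    private final long totalPermits;
    private long usedPermits = 0;

    PermitLimiterSketch(long configuredTotal) {
        this.totalPermits = configuredTotal;
    }

    // Blocks until the request fits under the snapshotted total.
    void request(long permits) throws InterruptedException {
        lock.lock();
        try {
            while (usedPermits + permits > totalPermits) {
                permitsReleased.await();
            }
            usedPermits += permits;
        } finally {
            lock.unlock();
        }
    }

    // Non-blocking variant, convenient for deterministic checks.
    boolean tryRequest(long permits) {
        lock.lock();
        try {
            if (usedPermits + permits > totalPermits) {
                return false;
            }
            usedPermits += permits;
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Returns permits and wakes every waiter so each can re-check its condition.
    void release(long permits) {
        lock.lock();
        try {
            usedPermits -= permits;
            permitsReleased.signalAll();
        } finally {
            lock.unlock();
        }
    }

    long used() {
        lock.lock();
        try {
            return usedPermits;
        } finally {
            lock.unlock();
        }
    }
}
```

With a total of 20000, a 12000-permit request that was refused while 11000 permits were held succeeds once those permits are released, because the total it is compared against cannot shrink underneath it. A request larger than the total would still block forever in `request`, so a real limiter would need to handle that case separately.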
2024-05-28 18:52:34 +08:00
86c7092f21 [opt](external) ignore not find files (#35319)
The file list is obtained from the external meta cache, so a file may
already have been removed from storage.
We should ignore files that are not found and let the query continue.
2024-05-28 18:51:56 +08:00
efdce7e9b3 [fix](binlog) Fix add partition record sql (#35461)
1. support adding a temporary partition
2. remove extra parentheses in the list partition value set
3. support unpartitioned partition item
2024-05-28 18:50:05 +08:00
d97788dec8 [Refactor](Status) Refactor the scanner scheduler code make return error msg means (#35286)
## Proposed changes

Before error msg:
```
Failed to submit scanner to scanner pool
```

After error msg:
```
Failed to submit scanner to scanner pool reason:Scan thread pool had shutdown|type 1

```
2024-05-28 18:49:55 +08:00
70106067ab Revert "[fix](group commit) should set wal id in runtime_state when building pipeline task (#35506)"
This reverts commit 9f6d82672f5d445822f0a2d5b13a6c9ffdcca13a.
2024-05-28 18:22:20 +08:00
50e81d9db7 [feat](nereids) add more rules to eliminate empty relation (#34997) -branch-2.1 (#35534)
Eliminate empty relations for the following patterns:
topn->empty
sort->empty
distribute->empty
project->empty
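
A minimal sketch of the idea behind these patterns, assuming a toy plan tree where each node has a type name and at most one child. The `Plan` class and `rewrite` method are hypothetical and are not Nereids classes; a real rule would also have to preserve the output schema of the eliminated node, which this sketch ignores.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model; not the Nereids rule implementation.
final class EliminateEmptySketch {
    static final class Plan {
        final String type;
        final Plan child; // null for leaf nodes

        Plan(String type, Plan child) {
            this.type = type;
            this.child = child;
        }
    }

    // Operators from the list above that yield no rows when their input is empty.
    static final Set<String> EMPTY_PROPAGATING =
            new HashSet<>(Arrays.asList("topn", "sort", "distribute", "project"));

    // Rewrites bottom-up, replacing "op -> empty" with "empty" for the listed operators.
    static Plan rewrite(Plan plan) {
        if (plan.child == null) {
            return plan;
        }
        Plan child = rewrite(plan.child);
        if (EMPTY_PROPAGATING.contains(plan.type) && "empty".equals(child.type)) {
            return child;
        }
        return new Plan(plan.type, child);
    }
}
```

Because the rewrite is bottom-up, a chain such as project over sort over empty collapses to a single empty node in one pass.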

(cherry picked from commit 8340f23946c0c8e40510ce937acd3342cb2e28b7)

2024-05-28 18:12:42 +08:00
84e9a14063 [Fix](hive-writer) Fix partition column orders issue when the partition fields inserted into the target table are inconsistent with the field order of the query source table and the schema field order of the query source table. (#35543)
## Proposed changes

backport #35347

2024-05-28 18:11:55 +08:00
27cf5a667f [enhancement](export) filter empty partition before export table to remote storage (#35389) (#35542)
## Proposed changes

Linked PR : #35389 

2024-05-28 18:11:12 +08:00
b78dae040a Revert "[fix](nereids) push filter through window, using slot equal-set (#35361)" (#35541)
This reverts commit d2df392994e8dc00dfb5f8e49cca83fca97cb565.

This PR should not have been picked to branch-2.1, because the
infrastructure it relies on is not in branch-2.1.
2024-05-28 17:54:13 +08:00
9f6d82672f [fix](group commit) should set wal id in runtime_state when building pipeline task (#35506)
pick from master #35445

2024-05-28 17:48:10 +08:00
69da39b43d [improvement](statistics)Use defaultSessionVariable instead of clone a new one. (#34672) (#35531)
backport https://github.com/apache/doris/pull/34672

2024-05-28 17:19:38 +08:00
aa4fd3fd79 [fix](statistics)Improve analyze timeout. (#33836) (#35530)
backport https://github.com/apache/doris/pull/33836

2024-05-28 17:12:53 +08:00
63e63e114d [fix](Nereids) could not push down filter through cte producer sometimes (#35507)
pick from master #35463
commit id 0632309209cc3f9b6523ef7054eb1abdb9d0e7d8

When the consumer side eliminates some consumers from the plan, the
recorded number of consumers becomes wrong, so some filters cannot be
pushed down on the producer side. This PR fixes the problem by updating
the consumer set after rewriting the outer side.
2024-05-28 16:53:51 +08:00
9d04d18c94 [improvement](statistics)Write audit log while doing drop stats. (#34433) (#35526)
backport https://github.com/apache/doris/pull/34433

2024-05-28 16:46:27 +08:00
43890ffd3a [fix](compaction) fix repeatedly picking tablets with disable auto compaction (#35472) (#35505)
pick master #35472
2024-05-28 15:57:54 +08:00
96a4159f73 [opt](scan) Use lazy-init for segment iterators and avoid caching all segments in the rowset reader (#35432)
2024-05-28 13:19:18 +08:00
4e7e8d700f [enhancement](atomicstatus) use lock to make the status object more stable (#35476)
Previously, if the error code was not OK and the status was then read,
the status might still be OK, so some DCHECKs could fail.

This PR uses a std mutex to make this behavior stable.


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-05-28 13:18:42 +08:00
Pxl
87c90094a7 [Bug](materialized-view) fix unmatch mv coz table name (#35444)
fix unmatch mv coz table name
2024-05-28 13:17:33 +08:00
97a5f55a37 [fix](function) bitmap to base64 error length check (#35117) 2024-05-28 13:17:16 +08:00
8599e8ee64 [improvement](mtmv) Add id to statistics map in statement context for cost estimation later (#35436)
Add the id to the statistics map in the statement context for later cost estimation.
This improves the probability of using a materialized view when
querying a single table with aggregates and many filters.
2024-05-28 13:17:05 +08:00
d2df392994 [fix](nereids) push filter through window, using slot equal-set (#35361)
example:

filter (y=1)
+-- window( ... partition by x)
    +-- project( A as x, A as y)

filter(y=1) is equivalent to filter(x=1),
because x and y are in the same equal-set in window#logicalProperties.
Hence filter(y=1) can be pushed through the window operator.
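
The check described in this example can be sketched as follows. This is a simplified illustration that models slots as plain strings and equal-sets as sets of slot names; the class and method names are hypothetical and do not come from the Nereids code base.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch; slots are modeled as strings, not Nereids Slot objects.
final class EqualSetPushDownSketch {
    // Returns true when filterSlot is a partition key, or shares an equal-set with one.
    static boolean canPushThroughWindow(String filterSlot,
                                        Set<String> partitionKeys,
                                        List<Set<String>> equalSets) {
        if (partitionKeys.contains(filterSlot)) {
            return true;
        }
        for (Set<String> equalSet : equalSets) {
            if (!equalSet.contains(filterSlot)) {
                continue;
            }
            for (String key : partitionKeys) {
                if (equalSet.contains(key)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

For the plan above, the partition key is x and the equal-set is {x, y}, so a filter on y qualifies, while a filter on an unrelated slot does not.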
2024-05-28 13:16:53 +08:00
2310915c26 [fix](pipeline) Fix query hang if limited rows is reached (#35466)
## Proposed changes

Some operators have a limit condition, and the source operator should
notify the sink operator that the limit has been reached.
The FE also has limit logic, but it does not always send it.

2024-05-28 13:15:31 +08:00
f6540d52cb [regression-test](fix) fix schema_change_p2/test_schema_change.groovy case (#35470) 2024-05-28 13:14:27 +08:00
2f7280be7d [regression-test](fix) fix sql_block_rule_p0/test_sql_block_rule.groovy case bug (#35471) 2024-05-28 13:14:27 +08:00
9c15a857d3 [fix](tools) update tools cases #35467
Remove useless filter of tpcds sf1000 query78
2024-05-28 13:13:47 +08:00
dfcabf8d47 [fix](nereids) set mark join reference for bitmap-in-apply (#35435)
The bitmap filter was implemented before mark-join. When mark-join support was added, the bitmap-filter branch was not updated.
When converting a bitmap-apply-in to a join, we should set the mark join reference on the join if one exists.
2024-05-28 13:13:41 +08:00
ac49576229 [Fix](nereids) fix merge aggregate setting top projection bug (#35348)
introduced by #31811

sql like this:

    select col1, col2 from  (select a as col1, a as col2 from mal_test1 group by a) t group by col1, col2 ;

Transformation Description:
In the process of optimizing the query, an agg-project-agg pattern is transformed into a project-agg pattern:
Before Transformation:

LogicalAggregate
+-- LogicalProject
    +-- LogicalAggregate

After Transformation:

LogicalProject
+-- LogicalAggregate

Before the transformation, the projection in the LogicalProject was a AS col1, a AS col2, and the outer aggregate group by keys were col1, col2. After the transformation, the aggregate group by keys became a, a, and the projection remained a AS col1, a AS col2.

Problem:
When building the project projections, the group by key a, a needed to be transformed to a AS col1, a AS col2. The old code had a bug where it used the slot as the map key and the alias in the projections as the map value. This approach did not account for the situation where aliases might have the same slot.

Solution:
The new code fixes this issue by using the original outer aggregate group by expression's exprId. It searches within the original project projections to find the NamedExpression that has the same exprId. These expressions are then placed into the new projections. This method ensures that the correct aliases are maintained, resolving the bug.
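
The difference between the two lookups can be shown with a small sketch. It assumes a simplified `Alias` type carrying an output exprId, an output name, and the slot it wraps; these names are hypothetical stand-ins, not the Nereids classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Alias is a simplified stand-in for a NamedExpression.
final class MergeAggProjectionSketch {
    static final class Alias {
        final int exprId;       // id of the alias output
        final String name;      // output name, e.g. col1
        final String childSlot; // slot the alias wraps, e.g. a

        Alias(int exprId, String name, String childSlot) {
            this.exprId = exprId;
            this.name = name;
            this.childSlot = childSlot;
        }
    }

    // Old approach: slot -> alias. Aliases that wrap the same slot overwrite each other.
    static List<String> namesBySlot(List<Alias> projections, List<String> groupBySlots) {
        Map<String, Alias> slotToAlias = new HashMap<>();
        for (Alias alias : projections) {
            slotToAlias.put(alias.childSlot, alias);
        }
        List<String> names = new ArrayList<>();
        for (String slot : groupBySlots) {
            names.add(slotToAlias.get(slot).name);
        }
        return names;
    }

    // New approach: find each outer group-by key by exprId in the original projections.
    static List<String> namesByExprId(List<Alias> projections, List<Integer> outerGroupByExprIds) {
        List<String> names = new ArrayList<>();
        for (int exprId : outerGroupByExprIds) {
            for (Alias alias : projections) {
                if (alias.exprId == exprId) {
                    names.add(alias.name);
                    break;
                }
            }
        }
        return names;
    }
}
```

With projections `a AS col1` and `a AS col2`, the slot-keyed map keeps only the last alias for `a`, so both group-by keys resolve to col2; the exprId lookup returns col1 and col2 as intended.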
2024-05-28 13:13:31 +08:00
7c808fcecf [bugfix] Fix the case is unstable because Table[tbl_scalar_types_dup]'s state(ROLLUP) is not NORMAL (#35460) 2024-05-28 13:12:27 +08:00
3aab6b1d61 [chore](regression) add debug log for flaky case of test_stream_load_cast (#35441) 2024-05-28 13:12:15 +08:00
c38c939b52 [bug](Fe) fix potential deadlock in show proc statement (#34988) 2024-05-28 13:12:03 +08:00
f8fcd17f33 [fix](memory) Fix nested scoped tracker and nested reserve memory (#35257)
SCOPED_ATTACH_TASK cannot be nested, but SCOPED_SWITCH_THREAD_MEM_TRACKER_LIMITER can continue to be called, so attach_limiter_tracker may be nested.
2024-05-28 13:12:03 +08:00
9d6b2d66ca [feature](metrics)support be jvm metrics. (#35023)
Support BE JVM metrics.
If you run `curl http://be_host:webserver_port/metrics`, you will get:
```
doris_be_jvm_heap_size_bytes{type="max"} 8589934592
doris_be_jvm_heap_size_bytes{type="committed"} 8589934592
doris_be_jvm_heap_size_bytes{type="used"} 364159504

doris_be_jvm_non_heap_size_bytes{type="committed"} 117899264
doris_be_jvm_non_heap_size_bytes{type="used"} 115330424

doris_be_jvm_young_size_bytes{type="used"} 255852544
doris_be_jvm_young_size_bytes{type="peak_used"} 255852544
doris_be_jvm_young_size_bytes{type="max"} 8589934592

doris_be_jvm_old_size_bytes{type="used"} 94393344
doris_be_jvm_old_size_bytes{type="peak_used"} 94393344
doris_be_jvm_old_size_bytes{type="max"} 8589934592

doris_be_jvm_gc{name="G1 Young Generation Count", type="count"} 3
doris_be_jvm_gc{name="G1 Young Generation Time", type="time"} 33
doris_be_jvm_gc{name="G1 Old Generation Count", type="count"} 0
doris_be_jvm_gc{name="G1 Old Generation Time", type="time"} 0

doris_be_jvm_thread{type="count"} 147
doris_be_jvm_thread{type="peak_count"} 147
doris_be_jvm_thread{type="new_count"} 0
doris_be_jvm_thread{type="runnable_count"} 25
doris_be_jvm_thread{type="blocked_count"} 0
doris_be_jvm_thread{type="waiting_count"} 48
doris_be_jvm_thread{type="timed_waiting_count"} 74
doris_be_jvm_thread{type="terminated_count"} 0
```
2024-05-28 13:12:03 +08:00
79cd726132 [Fix](inverted index) fix race condition in index build (#35427)
Fix a race condition introduced by #35366, which causes heap-use-after-free.
2024-05-28 13:12:03 +08:00
d8eefd0be8 [fix] fix wrong result of spill agg with limit (#35403) 2024-05-28 13:12:03 +08:00
7058b31edd [fix](move-memtable) clear load streams before shutdown SegmentFileWriterThreadPool (#35217) 2024-05-28 13:12:03 +08:00
f0e883c968 [Fix](executor)Fix backend_active_tasks only scan one be (#35490)
## Proposed changes
Fix the issue where `select * from backend_active_tasks` returns info
for only one random BE.
2024-05-28 11:48:42 +08:00
238e218312 [fix](httpapi) restore compaction/run_status api can show be's overall compaction status and refactor code (#35409) 2024-05-28 09:43:43 +08:00
8ff95a00f3 [Fix](test) fix test case output for inverted_index_p0.test_tokenize (#35464) 2024-05-27 19:19:24 +08:00
8c4f5af708 [opt](Nereids) auto fallback when insert unsupport catalog (#33353) (#35453)
pick from master #33353
2024-05-27 16:58:35 +08:00
1a52e4f7db [chore](mtmv)Optimize mtmv logs and exception information (#34957) (#35446)
pick from master #34957

1. Change some logs to debug.
2. Error prompt changed from MTMV to async materialized view
2024-05-27 16:35:13 +08:00
a32db25070 [enhance](mtmv) allow add index for MTMV (#34225) (#35443)
Previously, whether an operation could be performed on a materialized view was decided by its `opType`.

Now, each of the various `clauses` implements an `allowOpMTMV()` method.

This is needed because some operations share the same `opType`, yet some of them are allowed and others are not.

For example, the `opType` for both `add column` and `create index` is `SCHEMA-CHANGE`, but `add column` is not allowed and `create index` is allowed.
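
A minimal sketch of this design, assuming a simplified clause interface. Only the method name `allowOpMTMV()` and the `SCHEMA-CHANGE` value come from the description above; the interface and the two clause classes are hypothetical and do not mirror the actual Doris FE class hierarchy.

```java
// Hypothetical sketch; the interface and class names are assumptions.
final class AllowOpMtmvSketch {
    interface AlterClause {
        String opType();

        // Each clause decides for itself whether it may run on an MTMV.
        boolean allowOpMTMV();
    }

    static final class AddColumnClause implements AlterClause {
        public String opType() {
            return "SCHEMA-CHANGE";
        }

        public boolean allowOpMTMV() {
            return false;
        }
    }

    static final class CreateIndexClause implements AlterClause {
        public String opType() {
            return "SCHEMA-CHANGE";
        }

        public boolean allowOpMTMV() {
            return true;
        }
    }
}
```

Both clauses report the same `opType`, so a check based on `opType` alone cannot tell them apart, while the per-clause method can.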
2024-05-27 16:22:16 +08:00
d71e9d34fe [Bugfix] Fix mv column type is not changed when do schema change (#34598) 2024-05-27 15:28:12 +08:00
596fb6f327 [improve](ub) fix some runtime error of ubsan when downcast (#35343)
This code works correctly, but it reports some runtime errors under UBSAN,
so it is refactored to run cleanly under UBSAN.
2024-05-27 15:27:43 +08:00
c44affb43f Add downgrade scan thread num by column num (#35351) 2024-05-27 15:27:12 +08:00
6d362c1061 [fix](hint) fix hint tests with different be instances (#35188)
Problem:
When testing hints with a distribute hint on multiple BEs, the result is unstable.
Solution:
Add an ordered hint to every distribute hint, and change some leading hint cases to check that the hint information is contained.
2024-05-27 15:27:05 +08:00
68eda58a8c [Fix](multi-catalog) Fix string dict filtering when use null related function in parquet and orc reader. (#35335)
For SQL like the following, where null-related functions are applied to a dictionary column, the results are incorrect.
```
select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
```
```
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null'
```
```
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
2024-05-27 15:25:29 +08:00
Pxl
82ff29faea [Chore](materialized-view) forbid create mv on row store table (#35360)
forbid create mv on row store table
2024-05-27 15:25:16 +08:00
7284b6959f [Configurations](multi-catalog)Fix enable_orc_filter_by_min_max functionality, the mistake for #35012. (#35320)
Fix the bug introduced by #35012
2024-05-27 15:25:07 +08:00