Commit Graph

18802 Commits

Author SHA1 Message Date
b1795d44ec [bugfix](hive)fix testcase for test_hive_write_different_path (#35209)
Hive's test environment runs in Docker, so when 127.0.0.1 is used,
BE writes the file into the Docker container on its own machine.
But if FE and BE are not on the same machine,
FE cannot read this file, because it can only access the Docker container on its own machine.
Therefore, the address 127.0.0.1 cannot be used in the test environment.
2024-05-27 15:24:30 +08:00
5ab5ec3d0d [Fix](inverted index) fix build index wrong size for inverted index (#35366) 2024-05-27 15:24:17 +08:00
2422439e45 [Update](regression) add case for inverted index (#35305)
Co-authored-by: Kang <kxiao.tiger@gmail.com>
2024-05-27 15:24:09 +08:00
a82c6e869e [fix](Nereids) LogicalEmptyRelation type is wrong (#35382) 2024-05-27 15:23:46 +08:00
f99b2f0f82 [branch-2.1][hotfix](jdbc table) Restoring a table type that should not be deleted (#35434)
* [hotfix](jdbc table) Restoring a table type that should not be deleted

* add comment
2024-05-27 14:39:36 +08:00
2e20e38523 [improvement](jdbc catalog) remove useless jdbc catalog code (#34986) (#35418) 2024-05-27 14:25:26 +08:00
e3b4d4e630 Reset workload_group_max_num for regression test (#35430) 2024-05-27 14:10:25 +08:00
b6eaf95720 [fix](memory) Fix BE memory info compatible with Cgroup (#35412) (#35425)
1. `memory.usage_in_bytes ~= free.used + free.(buff/cache) - (buff)`; the free cache can be reused,
   so change cgroup_memory_usage to memory.usage_in_bytes - memory.meminfo["Cached"].
2. If the system is not configured with cgroup, looking up the cgroup file path fails; refactor the cgroup memory info refresh to be compatible with that lookup failure (see the sketch below).
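A minimal Java sketch of the adjusted accounting, assuming cgroup v1 file locations and taking the reusable page cache from memory.stat's `cache` field (the actual BE code is C++; all names here are illustrative):

```
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

public class CgroupMemory {
    // Reads cgroup v1 memory usage and subtracts the reusable page cache,
    // following cgroup_memory_usage = memory.usage_in_bytes - cached.
    // Returns empty if the cgroup files cannot be found or parsed, so callers
    // can fall back to /proc/meminfo instead of treating it as an error.
    static OptionalLong cgroupMemoryUsage(Path cgroupMemDir) {
        try {
            long usage = Long.parseLong(
                    Files.readString(cgroupMemDir.resolve("memory.usage_in_bytes")).trim());
            long cached = 0;
            for (String line : Files.readAllLines(cgroupMemDir.resolve("memory.stat"))) {
                if (line.startsWith("cache ")) {
                    cached = Long.parseLong(line.substring("cache ".length()).trim());
                    break;
                }
            }
            return OptionalLong.of(usage - cached);
        } catch (IOException | NumberFormatException e) {
            // cgroup not configured or files unreadable: report "not available".
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        OptionalLong usage = cgroupMemoryUsage(Path.of("/sys/fs/cgroup/memory"));
        System.out.println(usage.isPresent()
                ? "cgroup memory usage: " + usage.getAsLong() + " bytes"
                : "cgroup memory info unavailable, fall back to /proc/meminfo");
    }
}
```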
2024-05-27 12:31:44 +08:00
af986c370b [feat](Nereids): Put the Child with Least Row Count in the First Position of Intersect (#34290) (#35339)
In this pull request, we optimize the ordering of children in the Intersect operator to improve query performance. The proposed change is to place the child with the least row count in the first position of the Intersect operator.

The rationale behind this optimization is that the Intersect operator works by first evaluating the leftmost child and then iterating through the results of the other children to find matching rows. By placing the child with the least row count first, we can minimize the number of iterations required to find the matching rows, thereby reducing the overall execution time of the query.
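A minimal sketch of the reordering, with a hypothetical Child record standing in for the actual Nereids plan nodes and their row-count statistics:

```
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class IntersectReorder {
    // Hypothetical stand-in for an optimizer plan node with a row-count estimate.
    record Child(String name, long rowCount) {}

    // Move the child with the smallest estimated row count to the front,
    // keeping the relative order of the remaining children unchanged.
    static List<Child> putSmallestFirst(List<Child> children) {
        if (children.size() < 2) {
            return children;
        }
        Child smallest = children.stream()
                .min(Comparator.comparingLong(Child::rowCount))
                .orElseThrow();
        List<Child> reordered = new ArrayList<>();
        reordered.add(smallest);
        for (Child c : children) {
            if (c != smallest) {
                reordered.add(c);
            }
        }
        return reordered;
    }

    public static void main(String[] args) {
        List<Child> children = List.of(
                new Child("t1", 1_000_000), new Child("t2", 500), new Child("t3", 40_000));
        // Expected order: t2, t1, t3 -- the smallest input drives the intersection.
        System.out.println(putSmallestFirst(children));
    }
}
```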
2024-05-27 11:52:35 +08:00
a9bd98d65b [fix](nereids)AdjustNullable rule should handle union node with no children (#35323) 2024-05-27 10:06:20 +08:00
83cbb4e255 fix cloud mode 2024-05-27 09:56:26 +08:00
8f5deb10be [be](oom) add stacktrace in debugmode to find oom reason 2024-05-26 23:39:46 +08:00
ade1841a01 [fix](shuffle) Do not return error if local recvr is null (#35399) 2024-05-26 20:20:50 +08:00
6e17dc1e87 (cherry-pick)[branch-2.1] add calc tablet file crc and fix single compaction test #33076 #34915 (#35215)
* [fix](compaction test) show single replica compaction status and fix test (#33076)
* [improve](http action) add http interface to calculate the crc of all files in tablet (#34915)
2024-05-26 17:15:09 +08:00
a79b436b12 remove iscloud mode 2024-05-25 19:29:47 +08:00
65b9e5ab69 [fix](chore) fix DCHECK failure of BufferWritable if failed to alloc memory (#35345) 2024-05-25 17:48:04 +08:00
fff6ab933c [fix](clean trash) Add clean trash regression case (#35330) 2024-05-25 17:47:51 +08:00
952875b437 [chore](restore) Add logs about the restore table state (#35363) 2024-05-25 17:47:38 +08:00
806b7d68e4 [regression-test](fix) runtime_filter.groovy case bug (#35368) 2024-05-25 17:47:29 +08:00
Pxl
b143f0dfe2 [Improvement](date) shortcut for str to date parse (#35288)
2024-05-25 17:47:20 +08:00
80ba873d84 [regression-test](fix) test_date_diff case bug (#35356) 2024-05-25 17:46:57 +08:00
34e5030702 [bugfix](core) fix logical error of status check in nestedloop join (#35365) 2024-05-25 17:46:44 +08:00
c6c90ff63e [chore](routine-load) make routine_load_consumer_pool_size can update using HTTP API (#35315) 2024-05-25 17:46:29 +08:00
41c3a27bce [minor](nereids): remove useless code (#35325) 2024-05-25 17:44:39 +08:00
5bcdc75283 fix compile 2024-05-25 09:00:48 +08:00
9c6a6893d9 [fix](mtmv) Fix npe when the id of base table in mv is larger than Integer.MAX_VALUE (#35294) (#35384)
This was introduced by #34768
2024-05-24 23:27:08 +08:00
9af493f3f9 [fix](mtmv) Fix table id overturn and optimize get table qualifier method (#34768) (#35381)
commitid: 806e241
pr: #34768

Table ids may be the same even though the tables are actually different, so we optimize
org.apache.doris.nereids.rules.exploration.mv.mapping.RelationMapping#getTableQualifier with the following code:

Objects.hash(table.getDatabase().getCatalog().getId(), table.getDatabase().getId(), table.getId())

The table id is a long, but the tables used in mv rewrite are identified with a BitSet, which can only index by int, so during mv rewrite we map the long id to an int id for every query.
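A minimal sketch of the two ideas (the qualifier hash over catalog/db/table ids, and the per-query long-to-int mapping for the BitSet); class and method names here are hypothetical, not the actual RelationMapping code:

```
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class RelationIdMapping {
    // Qualifier that distinguishes tables even when bare table ids collide:
    // hash catalog id, database id and table id together.
    static int tableQualifier(long catalogId, long dbId, long tableId) {
        return Objects.hash(catalogId, dbId, tableId);
    }

    // Per-query mapping from a (long) table id to a small int index,
    // so the tables touched by mv rewrite can be tracked in a BitSet.
    private final Map<Long, Integer> longIdToIndex = new HashMap<>();

    int indexOf(long tableId) {
        return longIdToIndex.computeIfAbsent(tableId, k -> longIdToIndex.size());
    }

    public static void main(String[] args) {
        RelationIdMapping mapping = new RelationIdMapping();
        BitSet usedTables = new BitSet();
        long hugeTableId = 1L + Integer.MAX_VALUE; // larger than any int index
        usedTables.set(mapping.indexOf(hugeTableId));
        usedTables.set(mapping.indexOf(42L));
        usedTables.set(mapping.indexOf(hugeTableId)); // same table maps to the same bit
        System.out.println("bits set: " + usedTables.cardinality()); // 2
        System.out.println("qualifier: " + tableQualifier(1L, 10L, hugeTableId));
    }
}
```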
2024-05-24 21:19:15 +08:00
62998719df [opt](mtmv) Add threshold for relation mapping num when query rewrite (#34694) (#35378)
If the query and the mv definition are as follows:

    def mv1_1 = """
        select  t1.L_LINENUMBER,t2.l_extendedprice, t2.L_ORDERKEY
        from lineitem t1
        inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
    """
    def query1_1 = """
        select  t1.L_LINENUMBER, t2.L_ORDERKEY
        from lineitem t1
        inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
    """

This generates relation mappings via a Cartesian product; if there are too many self joins, it causes a performance problem.
So we add the `materialized_view_relation_mapping_max_count` session variable (default 8): if the actual number of mappings is greater than this value, the excess relation mappings are discarded.
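A minimal sketch of capping the Cartesian expansion of candidate mappings; only the threshold value mirrors the session variable, everything else is hypothetical:

```
import java.util.ArrayList;
import java.util.List;

public class RelationMappingLimit {
    // Mirrors the materialized_view_relation_mapping_max_count session variable.
    static final int MAX_MAPPING_COUNT = 8;

    // Enumerate assignments of query-side relations to mv-side relations of the
    // same table (a permutation, i.e. a bounded Cartesian expansion), but stop
    // once MAX_MAPPING_COUNT complete mappings have been collected.
    static List<int[]> boundedMappings(int queryRelations, int mvRelations) {
        List<int[]> result = new ArrayList<>();
        permute(new int[queryRelations], new boolean[mvRelations], 0, result);
        return result;
    }

    private static void permute(int[] current, boolean[] used, int pos, List<int[]> out) {
        if (out.size() >= MAX_MAPPING_COUNT) {
            return; // excess mappings are discarded
        }
        if (pos == current.length) {
            out.add(current.clone());
            return;
        }
        for (int mv = 0; mv < used.length; mv++) {
            if (!used[mv]) {
                used[mv] = true;
                current[pos] = mv;
                permute(current, used, pos + 1, out);
                used[mv] = false;
            }
        }
    }

    public static void main(String[] args) {
        // 4 self-join occurrences on each side would give 4! = 24 mappings;
        // only the first 8 are kept.
        System.out.println(boundedMappings(4, 4).size()); // 8
    }
}
```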
2024-05-24 20:36:29 +08:00
0f550aeda7 [fix](compression) handle exception to reuse compression context (#35338) (#35380)
* [fix](compression) handle exception to reuse compression context

Otherwise there is a memory leak: a new context is allocated each time, and the resulting TLB flushes consume a lot of sys CPU.
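A minimal Java sketch of the reuse pattern (the real code is C++ and codec-specific; Deflater and the pool are stand-ins): the point is that the failure path also returns the context to the pool instead of leaking it:

```
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.zip.Deflater;

public class CompressionContextPool {
    // Tiny pool of reusable compression contexts.
    private final Deque<Deflater> pool = new ArrayDeque<>();

    Deflater acquire() {
        Deflater ctx = pool.poll();
        return ctx != null ? ctx : new Deflater();
    }

    void release(Deflater ctx) {
        ctx.reset();     // make the context reusable
        pool.push(ctx);
    }

    byte[] compress(byte[] input) {
        Deflater ctx = acquire();
        try {
            ctx.setInput(input);
            ctx.finish();
            byte[] buf = new byte[input.length + 64];
            int n = ctx.deflate(buf);
            byte[] out = new byte[n];
            System.arraycopy(buf, 0, out, 0, n);
            return out;
        } finally {
            // Runs on success and on exception alike: the context goes back to
            // the pool either way, so the failure path does not leak it or
            // force a fresh allocation on the next call.
            release(ctx);
        }
    }

    public static void main(String[] args) {
        CompressionContextPool pool = new CompressionContextPool();
        System.out.println(pool.compress("hello hello hello".getBytes()).length + " compressed bytes");
    }
}
```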
2024-05-24 19:56:27 +08:00
3eeb83ff11 [test](fix) Fix test check fail when test nested mv hit (#34293) (#35375)
pick from master commit id: d20b18f pr: #34293

If mv3 is defined as follows:
select c1, c2, c3 from t1;

and mv4 is defined as follows:
select c1, c2 from mv3;

then for the query
select c1, c2 from t1;

both mv3 and mv4 can be rewritten successfully.
2024-05-24 19:47:16 +08:00
cf84998711 Revert "[fix](broker load) Make Config.enable_pipeline_load works as expected for BrokerLoad (#35105)"
This reverts commit e8fb47bec1a1cfc7b07a6ed4eb36283407a4a9fe.
2024-05-24 19:28:34 +08:00
c4b2ddd688 [Fix](Variant) clear block after a flush complete (#35226) (#35372)
Otherwise it results in a crash:

```
*** SIGSEGV address not mapped to object (@0x0) received by PID 4149909 (TID 4152328 OR 0x7efefc60d700) from PID 0; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F031AD0E090 in /lib/x86_64-linux-gnu/libc.so.6
 4# doris::Status doris::vectorized::MutableBlock::merge_impl<doris::vectorized::Block const&>(doris::vectorized::Block const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/core/block.h:586
 5# doris::Status doris::vectorized::MutableBlock::merge<doris::vectorized::Block const&>(doris::vectorized::Block const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/core/block.h:521
```
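A minimal sketch of the fix's shape in a generic writer with a hypothetical buffer type (not the Variant/MutableBlock code itself): the reused block is cleared once a flush completes, so a later merge never sees stale state:

```
import java.util.ArrayList;
import java.util.List;

public class FlushingWriter {
    // Stand-in for the reused in-memory block that rows are merged into.
    private final List<String> block = new ArrayList<>();
    private final List<String> flushed = new ArrayList<>();

    void merge(List<String> rows) {
        block.addAll(rows);
    }

    void flush() {
        flushed.addAll(block);
        // The fix: clear the reused block after the flush completes, so the
        // next merge starts from a clean state instead of stale contents.
        block.clear();
    }

    public static void main(String[] args) {
        FlushingWriter w = new FlushingWriter();
        w.merge(List.of("r1", "r2"));
        w.flush();
        w.merge(List.of("r3"));
        w.flush();
        System.out.println(w.flushed); // [r1, r2, r3] -- no duplicated rows
    }
}
```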
2024-05-24 19:10:07 +08:00
41f29cf4cd [fix](decompress)(review) context leaked in failure path (#33622) (#35364)
* [fix](decompress)(review) context leaked in failure path

* [fix](decompress)(review) context leaked in failure path review fix

Co-authored-by: Vallish Pai <vallishpai@gmail.com>
2024-05-24 17:40:13 +08:00
88e2753e40 [fix](Nereids) fix ShowProcedureStatusCommand sendResultSet (#35355) 2024-05-24 17:22:07 +08:00
639c7ee7fb [fix](decimalv2) fix scale of decimalv2 to string (#35222) (#35359)
* [fix](decimalv2) fix scale of decimalv2 to string
2024-05-24 17:20:43 +08:00
ca86ee7b15 [fix](load) fix wrong assert and cancel load error (#35362) 2024-05-24 17:11:01 +08:00
1e07971a98 [Feat](nereids)when dealing insert into stmt with empty table source, fe returns directly (#35333)
* [Feat](nereids) when dealing insert into stmt with empty table source, fe returns directly (#34418)

When a LogicalOlapScan has no partitions, transform it into a LogicalEmptyRelation.
When dealing with an insert-into statement whose table source is empty, FE returns directly (a sketch of the rewrite follows at the end of this commit body).

* [Fix](nereids) fix when insert into select empty table

---------

Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
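A minimal sketch of the rewrite idea referenced above, with hypothetical plan-node types rather than the actual Nereids rule:

```
import java.util.List;

public class EmptySourceRewrite {
    // Hypothetical, heavily simplified plan nodes.
    sealed interface Plan permits OlapScan, EmptyRelation {}
    record OlapScan(String table, List<String> selectedPartitions) implements Plan {}
    record EmptyRelation(String reason) implements Plan {}

    // If the scan selects no partitions, it can produce no rows: replace it
    // with an empty relation so downstream (e.g. INSERT INTO ... SELECT)
    // can short-circuit instead of running a full plan.
    static Plan rewrite(Plan plan) {
        if (plan instanceof OlapScan scan && scan.selectedPartitions().isEmpty()) {
            return new EmptyRelation("no partitions in " + scan.table());
        }
        return plan;
    }

    public static void main(String[] args) {
        Plan rewritten = rewrite(new OlapScan("t1", List.of()));
        if (rewritten instanceof EmptyRelation) {
            System.out.println("source is empty, insert returns directly");
        }
    }
}
```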
2024-05-24 16:25:00 +08:00
bfe293c725 [fix](nereids) AdjustNullable rule should handle union node with no children (#35074)
The output slot's nullable info is not correctly calculated for the union node,
because the old code only produced the correct result when the union node had children.
But a union node may have no children and only a constantExprList.
In that case, we should calculate the output's nullable info from both the children and the constantExprList.
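A minimal sketch of the nullable computation, with plain boolean arrays standing in for the real slot types: a column is nullable if any child output or any constant row supplies a nullable value at that position, which also covers a union with no children:

```
import java.util.List;

public class UnionNullable {
    static boolean[] outputNullable(List<boolean[]> childOutputsNullable,
                                    List<boolean[]> constantRowsNullable,
                                    int columnCount) {
        boolean[] result = new boolean[columnCount];
        for (boolean[] source : childOutputsNullable) {
            or(result, source);
        }
        for (boolean[] source : constantRowsNullable) {
            or(result, source);
        }
        return result;
    }

    private static void or(boolean[] acc, boolean[] source) {
        for (int i = 0; i < acc.length; i++) {
            acc[i] |= source[i];
        }
    }

    public static void main(String[] args) {
        // Union with no children, only constant rows: SELECT 1, NULL UNION SELECT 2, 3
        boolean[] nullable = outputNullable(
                List.of(),
                List.of(new boolean[] {false, true}, new boolean[] {false, false}),
                2);
        System.out.println(nullable[0] + " " + nullable[1]); // false true
    }
}
```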
2024-05-24 16:23:58 +08:00
f6beeb1ddd [Enhancement](tvf) select tvf supports using resource (#35139)
Create an S3/HDFS resource that a TVF can use directly to access the data source.
2024-05-24 16:23:58 +08:00
d6e8fb7d77 [feature](mtmv) Support agg state roll up and optimize the roll up code (#35026)
agg_state is an aggregate's intermediate state; for details see the state combinator: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/combinators/state

This supports aggregate function roll up as follows:

| query                | materialized view                            | roll up              |
| -------------------- | -------------------------------------------- | -------------------- |
| agg_function()       | agg_function_union() or agg_function_state() | agg_function_merge() |
| agg_function_union() | agg_function_union() or agg_function_state() | agg_function_union() |
| agg_function_merge() | agg_function_union() or agg_function_state() | agg_function_merge() |

For example, the following can be rewritten by the mv successfully.

The MV definition is

```
            select
            o_orderstatus,
            l_partkey,
            l_suppkey,
            sum_union(sum_state(o_shippriority)),
            group_concat_union(group_concat_state(l_shipinstruct)),
            avg_union(avg_state(l_linenumber)),
            max_by_union(max_by_state(l_shipmode, l_suppkey)),
            count_union(count_state(l_orderkey)),
            multi_distinct_count_union(multi_distinct_count_state(l_shipmode))
            from lineitem
            left join orders
            on lineitem.l_orderkey = o_orderkey and l_shipdate = o_orderdate
            group by
            o_orderstatus,
            l_partkey,
            l_suppkey;
```

Query is

```
            select
            o_orderstatus,
            l_suppkey,
            sum(o_shippriority),
            group_concat(l_shipinstruct),
            avg(l_linenumber),
            max_by(l_shipmode,l_suppkey),
            count(l_orderkey),
            multi_distinct_count(l_shipmode)
            from lineitem
            left join orders 
            on l_orderkey = o_orderkey and l_shipdate = o_orderdate
            group by
            o_orderstatus,
            l_suppkey;
```
2024-05-24 16:23:58 +08:00
4b91ad003f [opt](memory) avoid allocate memory in agg operator constructor (#35301)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-05-24 16:23:58 +08:00
c4776a48f2 [fix](regression-test) fix test_tvf_view_count_p2 regression test (#35216)
Caused by: #34642

It must set verbose to true.
2024-05-24 16:23:58 +08:00
e6027ca9d7 [fix](p2-test) fix test_export_with_parallelism case (#35283) 2024-05-24 16:23:58 +08:00
bbf502dfcf [fix](create-table)The CREATE TABLE IF NOT EXISTS AS SELECT statement should refrain from performing any INSERT operations if the table already exists (#35210) 2024-05-24 16:23:58 +08:00
708b5b548c [fix](ui): fix data preview error (#34521) 2024-05-24 16:23:58 +08:00
bd4dd94c24 [Fix](nereids) add checkBlockRules() check for create view and alter view (#34104) 2024-05-24 16:23:58 +08:00
d85ea83b73 [test](case) Remove sensitive information in k8s deploy test (#35185)
Remove sensitive information from the k8s deployment test, otherwise the code base security check fails.
2024-05-24 16:23:58 +08:00
0e2b7480b7 [fix](regression-test) line_delimiter parse error in regression_test test_tvf_based_broker_load (#35001) 2024-05-24 16:23:58 +08:00
309503855e [Fix](bloom filter) Fix bloom filter memory leak (#34871)
* Issue: Doris occasionally encounters an issue where memory usage becomes exceptionally high and does not decrease. The leaked memory is occupied by Bloom filters stored in memory.

Reason: The segment cache stores segment objects read from files into memory. It functions as an LRU cache with an eviction strategy: when the number of segments exceeds the maximum number, or the total memory size of segment objects in the cache exceeds the maximum usage, it evicts the older segments.

However, there is a piece of logic in the code that first reads the segment object into memory, assuming it occupies memory size A, and then places the read segment object into the cache (at this point, the cache considers the segment object size to be A). It then reads the segment's Bloom filter from the file and assigns it to the segment's Bloom filter member variable, assuming the Bloom filter occupies memory size B. Thus, the total size of the segment object at this point is A+B. However, the cache does not update this size, so the actual size of the segment object stored in the cache (A+B) is larger than the size the cache accounts for (A).

When the number of segment objects in the cache increases to a certain extent, the used memory surges dramatically, yet the cache does not perceive its accounted size as reaching the eviction limit, so it does not evict the segment objects. In such cases, a memory leak arises.

Solution: Since each segment object only reads the Bloom filter once, the issue can be resolved by changing the logic from reading the segment, placing it into the cache, and then reading the Bloom filter to reading the segment, reading the Bloom filter, and then placing it into the cache.
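A minimal sketch of the accounting problem and the reordered fix, using a generic size-bounded LRU (hypothetical types, not the BE segment cache): the Bloom filter is loaded before insertion, so the size the cache charges matches what it actually holds:

```
import java.util.LinkedHashMap;

public class SegmentCacheSketch {
    record Segment(String name, long baseBytes, long bloomFilterBytes) {
        long totalBytes() { return baseBytes + bloomFilterBytes; }
    }

    // A size-bounded LRU: evicts the oldest entries once the charged bytes exceed the cap.
    static class LruBySize {
        private final long capacityBytes;
        private long usedBytes = 0;
        private final LinkedHashMap<String, Long> charged = new LinkedHashMap<>(16, 0.75f, true);

        LruBySize(long capacityBytes) { this.capacityBytes = capacityBytes; }

        void put(String key, long chargedBytes) {
            charged.put(key, chargedBytes);
            usedBytes += chargedBytes;
            var it = charged.entrySet().iterator();
            while (usedBytes > capacityBytes && it.hasNext()) {
                usedBytes -= it.next().getValue();
                it.remove();
            }
        }

        long usedBytes() { return usedBytes; }
    }

    public static void main(String[] args) {
        LruBySize cache = new LruBySize(1_000);

        // Buggy order: insert with size A, then load the Bloom filter (size B) afterwards.
        // The cache only ever accounts for A, so real memory can exceed the cap unnoticed.
        Segment s1 = new Segment("s1", 400, 500);
        cache.put(s1.name(), s1.baseBytes());      // charges 400, real usage is 900

        // Fixed order: load the Bloom filter first, then insert and charge A + B.
        Segment s2 = new Segment("s2", 400, 500);
        cache.put(s2.name(), s2.totalBytes());     // charges 900, eviction sees the real size

        System.out.println("charged bytes: " + cache.usedBytes());
    }
}
```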
2024-05-24 16:23:58 +08:00
e02dcecb0a [optimize](regression)Add retry for curl request (#35260)
Co-authored-by: Luennng <luennng@gmail.com>
2024-05-24 16:23:58 +08:00