doris

Author	SHA1	Message	Date
wuwenchi	b1795d44ec	[bugfix](hive)fix testcase for test_hive_write_different_path (#35209 ) Hive's test environment uses docker, so when using 127.0.0.1, BE will write the file to the docker of its own machine. But if FE and BE are not on the same machine, FE cannot read this file because it can only read docker on its own machine. Therefore, the address 127.0.0.1 cannot be used in the test environment.	2024-05-27 15:24:30 +08:00
airborne12	5ab5ec3d0d	[Fix](inverted index) fix build index wrong size for inverted index (#35366 )	2024-05-27 15:24:17 +08:00
airborne12	2422439e45	[Update](regression) add case for inverted index (#35305 ) Co-authored-by: Kang <kxiao.tiger@gmail.com>	2024-05-27 15:24:09 +08:00
morrySnow	a82c6e869e	[fix](Nereids) LogicalEmptyRelation type is wrong (#35382 )	2024-05-27 15:23:46 +08:00
zy-kkk	f99b2f0f82	[branch-2.1][hotfix](jdbc table) Restoring a table type that should not be deleted (#35434 ) * [hotfix](jdbc table) Restoring a table type that should not be deleted * add comment	2024-05-27 14:39:36 +08:00
zy-kkk	2e20e38523	[improvement](jdbc catalog) remove useless jdbc catalog code (#34986 ) (#35418 )	2024-05-27 14:25:26 +08:00
wangbo	e3b4d4e630	Reset workload_group_max_num for regression test (#35430 )	2024-05-27 14:10:25 +08:00
Xinyi Zou	b6eaf95720	[fix](memory) Fix BE memory info compatible with Cgroup (#35412 ) (#35425 ) 1. `memory.usage_in_bytes ~= free.used + free.(buff/cache) - (buff)`; the cache reported by free can be reused, so change the calculation to cgroup_memory_usage = memory.usage_in_bytes - memory.meminfo["Cached"]. 2. If the system is not configured with cgroup, finding the cgroup file path fails; refactor the cgroup memory info refresh to be compatible with that failure.	2024-05-27 12:31:44 +08:00
谢健	af986c370b	[feat](Nereids): Put the Child with Least Row Count in the First Position of Intersect (#34290 ) (#35339 ) In this pull request, we optimize the ordering of children in the Intersect operator to improve query performance. The proposed change is to place the child with the least row count in the first position of the Intersect operator. The rationale behind this optimization is that the Intersect operator works by first evaluating the leftmost child and then iterating through the results of the other children to find matching rows. By placing the child with the least row count first, we can minimize the number of iterations required to find the matching rows, thereby reducing the overall execution time of the query.	2024-05-27 11:52:35 +08:00
starocean999	a9bd98d65b	[fix](nereids)AdjustNullable rule should handle union node with no children (#35323 )	2024-05-27 10:06:20 +08:00
yiguolei	83cbb4e255	fix cloud mode	2024-05-27 09:56:26 +08:00
yiguolei	8f5deb10be	[be](oom) add stacktrace in debugmode to find oom reason	2024-05-26 23:39:46 +08:00
Gabriel	ade1841a01	[fix](shuffle) Do not return error if local recvr is null (#35399 )	2024-05-26 20:20:50 +08:00
Sun Chenyang	6e17dc1e87	(cherry-pick)[branch-2.1] add calc tablet file crc and fix single compaction test #33076 #34915 (#35215 ) * [fix](compaction test) show single replica compaction status and fix test (#33076) * [improve](http action) add http interface to calculate the crc of all files in tablet (#34915)	2024-05-26 17:15:09 +08:00
yiguolei	a79b436b12	remove iscloud mode	2024-05-25 19:29:47 +08:00
TengJianPing	65b9e5ab69	[fix](chore) fix DCHECK failure of BufferWritable if failed to alloc memory (#35345 )	2024-05-25 17:48:04 +08:00
deardeng	fff6ab933c	[fix](clean trash) Add clean trash regression case (#35330 )	2024-05-25 17:47:51 +08:00
walter	952875b437	[chore](restore) Add logs about the restore table state (#35363 )	2024-05-25 17:47:38 +08:00
shuke	806b7d68e4	[regression-test](fix) runtime_filter.groovy case bug (#35368 )	2024-05-25 17:47:29 +08:00
Pxl	b143f0dfe2	[Improvement](date) shortcut for str to date parse (#35288 ) shortcut for str to date parse	2024-05-25 17:47:20 +08:00
shuke	80ba873d84	[regression-test](fix) test_date_diff case bug (#35356 )	2024-05-25 17:46:57 +08:00
zhiqiang	34e5030702	[bugfix](core) fix logical error of status check in nestedloop join (#35365 )	2024-05-25 17:46:44 +08:00
HHoflittlefish777	c6c90ff63e	[chore](routine-load) make routine_load_consumer_pool_size updatable via HTTP API (#35315 )	2024-05-25 17:46:29 +08:00
jakevin	41c3a27bce	[minor](nereids): remove useless code (#35325 )	2024-05-25 17:44:39 +08:00
yiguolei	5bcdc75283	fix compile	2024-05-25 09:00:48 +08:00
seawinde	9c6a6893d9	[fix](mtmv) Fix npe when the id of base table in mv is larger than Integer.MAX_VALUE (#35294 ) (#35384 ) This was introduced by #34768	2024-05-24 23:27:08 +08:00
seawinde	9af493f3f9	[fix](mtmv) Fix table id overturn and optimize get table qualifier method (#34768 ) (#35381 ) commitid: 806e241 pr: #34768 Table ids may be the same even though they refer to different tables, so we optimize org.apache.doris.nereids.rules.exploration.mv.mapping.RelationMapping#getTableQualifier with the following code: Objects.hash(table.getDatabase().getCatalog().getId(), table.getDatabase().getId(), table.getId()). The table id is a long, but the tables used in mv rewrite are identified with a bitSet, which can only use int, so we map the long id to an int id in every query during mv rewrite.	2024-05-24 21:19:15 +08:00
seawinde	62998719df	[opt](mtmv) Add threshold for relation mapping num when query rewrite (#34694 ) (#35378 ) If the query and mv definition are as follows: def mv1_1 = """ select t1.L_LINENUMBER,t2.l_extendedprice, t2.L_ORDERKEY from lineitem t1 inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY; """ def query1_1 = """ select t1.L_LINENUMBER, t2.L_ORDERKEY from lineitem t1 inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY; """ then the relation mappings are generated as a Cartesian product. If there are too many self joins, this causes a performance problem, so we add the `materialized_view_relation_mapping_max_count` session variable, default 8. If the actual number of mappings is greater than this value, the excess relation mappings are discarded.	2024-05-24 20:36:29 +08:00
Yongqiang YANG	0f550aeda7	[fix](compression) handle exception to reuse compression context (#35338 ) (#35380 ) * [fix](compression) handle exception to reuse compression context Otherwise, there is a memory leak and a new context is allocated, and the resulting TLB flushes consume a lot of sys CPU.	2024-05-24 19:56:27 +08:00
seawinde	3eeb83ff11	[test](fix) Fix test check fail when test nested mv hit (#34293 ) (#35375 ) pick from master commit id: d20b18f pr: #34293 If mv3 is defined as: select c1, c2, c3 from t1; and mv4 is defined as: select c1, c2 from mv3; then when the query is select c1, c2 from t1; both mv3 and mv4 can be rewritten successfully	2024-05-24 19:47:16 +08:00
yiguolei	cf84998711	Revert "[fix](broker load) Make Config.enable_pipeline_load works as expected for BrokerLoad (#35105 )" This reverts commit e8fb47bec1a1cfc7b07a6ed4eb36283407a4a9fe.	2024-05-24 19:28:34 +08:00
lihangyu	c4b2ddd688	[Fix](Variant) clear block after a flush complete (#35226 ) (#35372 ) Otherwise result in crash ``` * SIGSEGV address not mapped to object (@0x0) received by PID 4149909 (TID 4152328 OR 0x7efefc60d700) from PID 0; stack trace: * 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t, void) at /home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:421 1# PosixSignals::chained_handler(int, siginfo, void) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so 3# 0x00007F031AD0E090 in /lib/x86_64-linux-gnu/libc.so.6 4# doris::Status doris::vectorized::MutableBlock::merge_impl<doris::vectorized::Block const&>(doris::vectorized::Block const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/core/block.h:586 5# doris::Status doris::vectorized::MutableBlock::merge<doris::vectorized::Block const&>(doris::vectorized::Block const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/core/block.h:521 ```	2024-05-24 19:10:07 +08:00
Yongqiang YANG	41f29cf4cd	[fix](decompress)(review) context leaked in failure path (#33622 ) (#35364 ) * [fix](decompress)(review) context leaked in failure path * [fix](decompress)(review) context leaked in failure path review fix Co-authored-by: Vallish Pai <vallishpai@gmail.com>	2024-05-24 17:40:13 +08:00
zy-kkk	88e2753e40	[fix](Nereids) fix ShowProcedureStatusCommand sendResultSet (#35355 )	2024-05-24 17:22:07 +08:00
TengJianPing	639c7ee7fb	[fix](decimalv2) fix scale of decimalv2 to string (#35222 ) (#35359 ) * [fix](decimalv2) fix scale of decimalv2 to string	2024-05-24 17:20:43 +08:00
Mingyu Chen	ca86ee7b15	[fix](load) fix wrong assert and cancel load error (#35362 )	2024-05-24 17:11:01 +08:00
feiniaofeiafei	1e07971a98	[Feat](nereids) when handling an insert into stmt with an empty table source, fe returns directly (#35333 ) * [Feat](nereids) when handling an insert into stmt with an empty table source, fe returns directly (#34418) When a LogicalOlapScan has no partitions, transform it to a LogicalEmptyRelation. When handling an insert into stmt with an empty table source, fe returns directly. * [Fix](nereids) fix when insert into select empty table --------- Co-authored-by: feiniaofeiafei <moailing@selectdb.com>	2024-05-24 16:25:00 +08:00
starocean999	bfe293c725	[fix](nereids) AdjustNullable rule should handle union node with no children (#35074 ) The output slot's nullable info is not correctly calculated in the union node, because the old code only gets the correct result if the union node has children. But a union node may have no children and only a constantExprList. In that case, we should calculate the output's nullable info from both children and constantExprList.	2024-05-24 16:23:58 +08:00
Tiewei Fang	f6beeb1ddd	[Enhancement](tvf) select tvf supports using resource (#35139 ) Create an S3/HDFS resource that TVF can use directly to access the data source.	2024-05-24 16:23:58 +08:00
seawinde	d6e8fb7d77	[feature](mtmv) Support agg state roll up and optimize the roll up code (#35026 ) agg_state is the intermediate state of an aggregation; for details see the state combinator: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/combinators/state This supports agg function roll up as follows: +---------------------+---------------------------------------------+---------------------+ \| query \| materialized view \| roll up \| \| ------------------- \| ------------------------------------------- \| ------------------- \| \| agg_function() \| agg_function_union() or agg_function_state() \| agg_function_merge() \| \| agg_function_union() \| agg_function_union() or agg_function_state() \| agg_function_union() \| \| agg_function_merge() \| agg_function_union() or agg_function_state() \| agg_function_merge() \| +---------------------+---------------------------------------------+---------------------+ For example, the following query can be rewritten by the mv successfully. MV definition is ``` select o_orderstatus, l_partkey, l_suppkey, sum_union(sum_state(o_shippriority)), group_concat_union(group_concat_state(l_shipinstruct)), avg_union(avg_state(l_linenumber)), max_by_union(max_by_state(l_shipmode, l_suppkey)), count_union(count_state(l_orderkey)), multi_distinct_count_union(multi_distinct_count_state(l_shipmode)) from lineitem left join orders on lineitem.l_orderkey = o_orderkey and l_shipdate = o_orderdate group by o_orderstatus, l_partkey, l_suppkey; ``` Query is ``` select o_orderstatus, l_suppkey, sum(o_shippriority), group_concat(l_shipinstruct), avg(l_linenumber), max_by(l_shipmode,l_suppkey), count(l_orderkey), multi_distinct_count(l_shipmode) from lineitem left join orders on l_orderkey = o_orderkey and l_shipdate = o_orderdate group by o_orderstatus, l_suppkey; ```	2024-05-24 16:23:58 +08:00
yiguolei	4b91ad003f	[opt](memory) avoid allocate memory in agg operator constructor (#35301 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2024-05-24 16:23:58 +08:00
Tiewei Fang	c4776a48f2	[fix](regression-test) fix test_tvf_view_count_p2 regression test (#35216 ) caused by: #34642; verbose must be set to true	2024-05-24 16:23:58 +08:00
Tiewei Fang	e6027ca9d7	[fix](p2-test) fix test_export_with_parallelism case (#35283 )	2024-05-24 16:23:58 +08:00
Calvin Kirs	bbf502dfcf	[fix](create-table)The CREATE TABLE IF NOT EXISTS AS SELECT statement should refrain from performing any INSERT operations if the table already exists (#35210 )	2024-05-24 16:23:58 +08:00
Jeffrey	708b5b548c	[fix](ui): fix data preview error (#34521 )	2024-05-24 16:23:58 +08:00
feiniaofeiafei	bd4dd94c24	[Fix](nereids) add checkBlockRules() check for create view and alter view (#34104 )	2024-05-24 16:23:58 +08:00
yongjinhou	d85ea83b73	[test](case) Remove sensitive information in k8s deploy test (#35185 ) Remove sensitive information from the k8s deployment test, otherwise the code base security check fails.	2024-05-24 16:23:58 +08:00
133tosakarin	0e2b7480b7	[fix](regression-test) line_delimiter parse error in regression_test test_tvf_based_broker_load (#35001 )	2024-05-24 16:23:58 +08:00
abmdocrt	309503855e	[Fix](bloom filter) Fix bloom filter memory leak (#34871 ) * Issue: Doris occasionally encounters an issue where memory usage becomes exceptionally high and does not decrease. The leaked memory is occupied by Bloom filters stored in memory. Reason: The segment cache stores segment objects read from files into memory. It functions as an LRU cache with an eviction strategy: when the number of segments exceeds the maximum number, or the total memory size of segment objects in the cache exceeds the maximum usage, it evicts the older segments. However, there is a piece of logic in the code that first reads the segment object into memory, assuming it occupies memory size A, then places the read segment object into the cache (at this point, the cache considers the segment object size to be A). It then reads the segment's Bloom filter from the file and assigns it to the segment's Bloom filter member variable, assuming the Bloom filter occupies memory size B. Thus, the total size of the segment object at this point is A+B. However, the cache does not update this size, leading to the actual size of the segment object stored in the cache (A+B) being larger than the size considered by the cache (A). When the number of segment objects in the cache increases to a certain extent, the used memory will surge dramatically. However, the cache does not perceive the size as reaching the eviction limit, so it does not evict the segment objects. In such cases, a memory leak issue arises. Solution: Since each segment object only reads the Bloom filter once, the issue can be resolved by changing the logic from reading the segment, placing it into the cache, and then reading the Bloom filter to reading the segment, reading the Bloom filter, and then placing it into the cache.	2024-05-24 16:23:58 +08:00
qiye	e02dcecb0a	[optimize](regression)Add retry for curl request (#35260 ) Co-authored-by: Luennng <luennng@gmail.com>	2024-05-24 16:23:58 +08:00
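
The accounting problem described in commit 309503855e (bloom filter memory leak) can be illustrated with a small sketch. This is not Doris code: `SizedLRUCache`, `Segment`, `fill`, and the byte sizes are hypothetical names and numbers chosen only to show why charging an entry before its Bloom filter is loaded makes the cache undercount, and why loading the filter first restores eviction.

```python
from collections import OrderedDict


class Segment:
    """Hypothetical segment: base size A plus an optional Bloom filter of size B."""

    def __init__(self, base_bytes):
        self.base_bytes = base_bytes
        self.bloom_bytes = 0

    def load_bloom_filter(self, bloom_bytes):
        self.bloom_bytes = bloom_bytes

    def size(self):
        return self.base_bytes + self.bloom_bytes


class SizedLRUCache:
    """Hypothetical LRU cache that evicts by the bytes charged at insert time."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.charged_bytes = 0
        self._entries = OrderedDict()  # key -> (segment, charge)

    def insert(self, key, segment, charge):
        if key in self._entries:
            _, old_charge = self._entries.pop(key)
            self.charged_bytes -= old_charge
        self._entries[key] = (segment, charge)
        self.charged_bytes += charge
        # Evict the oldest entries until the charged total fits again.
        while self.charged_bytes > self.capacity_bytes and len(self._entries) > 1:
            _, (_, evicted_charge) = self._entries.popitem(last=False)
            self.charged_bytes -= evicted_charge

    def actual_bytes(self):
        # What the cached segments really occupy right now.
        return sum(seg.size() for seg, _ in self._entries.values())


def fill(cache, count, load_before_insert):
    for i in range(count):
        seg = Segment(base_bytes=100)          # size A
        if load_before_insert:
            seg.load_bloom_filter(900)         # size B, known before charging
            cache.insert(i, seg, seg.size())   # charged A + B
        else:
            cache.insert(i, seg, seg.size())   # charged A only
            seg.load_bloom_filter(900)         # B is never reported to the cache
    return cache


buggy = fill(SizedLRUCache(10_000), 50, load_before_insert=False)
fixed = fill(SizedLRUCache(10_000), 50, load_before_insert=True)
print(buggy.charged_bytes, buggy.actual_bytes())
print(fixed.charged_bytes, fixed.actual_bytes())
```

In the first ordering the cache believes it is at half capacity while holding five times its limit; in the second the charged and actual sizes agree and eviction kicks in.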


18802 Commits
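
Commit 9af493f3f9 above describes identifying a table by catalog id, database id, and table id together, and mapping long table ids to small int ids per query so they fit in a bit set. A minimal sketch of that idea follows, assuming a Python int as the bit set; `TableIdMapper` and `to_bitset` are hypothetical names, not the Doris implementation.

```python
class TableIdMapper:
    """Hypothetical per-query mapper from (catalog, db, table) ids to dense ints."""

    def __init__(self):
        self._dense = {}

    def dense_id(self, catalog_id, db_id, table_id):
        key = (catalog_id, db_id, table_id)
        # The first time a key is seen it gets the next unused small integer.
        return self._dense.setdefault(key, len(self._dense))


def to_bitset(mapper, tables):
    """Encode a collection of tables as an int used as a bit set."""
    bits = 0
    for catalog_id, db_id, table_id in tables:
        bits |= 1 << mapper.dense_id(catalog_id, db_id, table_id)
    return bits


mapper = TableIdMapper()
big_id = 2**31 + 5                 # does not fit in a signed 32-bit int
orders = (0, 10, big_id)
lineitem = (0, 11, big_id)         # same table id, different database
part = (0, 10, 7)

print(mapper.dense_id(*orders), mapper.dense_id(*lineitem), mapper.dense_id(*part))
print(bin(to_bitset(mapper, [orders, part])))
```

Because the mapper is created per query, the dense ids stay small regardless of how large the underlying ids are, and two tables that share a table id in different databases still receive different bits.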