Consider the following window-function expression:
```sql
substr(
ref_1.cp_type,
sum(CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END) OVER (),
1)
```
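For context, a minimal query shape in which this expression could appear (illustrative only; the table name and alias are hypothetical):

```sql
SELECT
    substr(
        ref_1.cp_type,
        sum(CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END) OVER (),
        1) AS c0
FROM some_table AS ref_1;
```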
Before this PR, only "CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END" was pushed down.
But both "ref_1.cp_type" and "CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END"
should be pushed down.
This PR fixes that.
In some cases, the original value must be unescaped, for example when converting a string to JSONB.
Without unescaping, the subsequent JSONB parse fails.
fix: #21136
The mem tracker group uses class static variables instead of global variables.
https://stackoverflow.com/questions/2204608/does-c-call-destructors-for-global-and-class-static-variables
TODO: a mem tracker manager is still needed; relying on global variables leads to destruction-order problems at exit, like the one shown below.
==3623982==ERROR: AddressSanitizer: heap-use-after-free on address 0x60f0000056b8 at pc 0x56478bbe3ae0 bp 0x7f20953d2270 sp 0x7f20953d2268
READ of size 8 at 0x60f0000056b8 thread T41 (memory_tracker_)
*** Query id: 0-0 ***
*** Aborted at 1689749969 (unix time) try "date -d @1689749969" if you are using GNU date ***
*** Current BE git commitID: b3e9cad48e ***
*** SIGSEGV address not mapped to object (@0x0) received by PID 3623982 (TID 3624277 OR 0x7f19e06dd640) from PID 0; stack trace: ***
#0 0x56478bbe3adf in std::__shared_ptr::operator bool() const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1295:16
#1 0x56478bbe306e in doris::MemTracker::refresh_profile_counter() /doris/be/src/runtime/memory/mem_tracker.h:149:13
#2 0x56478bbec669 in doris::MemTrackerLimiter::refresh_all_tracker_profile() /doris/be/src/runtime/memory/mem_tracker_limiter.cpp:119:22
#3 0x564788f53fa0 in doris::Daemon::memory_tracker_profile_refresh_thread() /doris/be/src/common/daemon.cpp:295:9
#4 0x564788f5d04b in doris::Daemon::start()::$_4::operator()() const /doris/be/src/common/daemon.cpp:473:30
#5 0x564788f5cff6 in void std::__invoke_impl(std::__invoke_other, doris::Daemon::start()::$_4&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
#6 0x564788f5cf78 in std::enable_if, void>::type std::__invoke_r(doris::Daemon::start()::$_4&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:111:2
#7 0x564788f5cdae in std::_Function_handler::_M_invoke(std::_Any_data const&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291:9
#8 0x56478903f576 in std::function::operator()() const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560:9
#9 0x56478c4a35af in doris::Thread::supervise_thread(void*) /doris/be/src/util/thread.cpp:465:5
#10 0x7f217c8a244f in start_thread nptl/pthread_create.c:473:8
#11 0x7f217cb27d52 in __clone misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
0x60f0000056b8 is located 56 bytes inside of 168-byte region [0x60f000005680,0x60f000005728)
freed by thread T0 here:
#0 0x564788e7280d in operator delete(void*) (/mnt/hdd01/dorisTestEnv/NEREIDS_ASAN/be/lib/doris_be+0x1758280d) (BuildId: 219493cc924323ee)
#1 0x56478acec1d5 in std::default_delete::operator()(doris::MemTrackerLimiter*) const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2
#2 0x56478ace9faf in std::unique_ptr >::~unique_ptr() /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4
#3 0x56478ace1471 in doris::ShardedLRUCache::~ShardedLRUCache() /doris/be/src/olap/lru_cache.cpp:581:1
#4 0x56478ace14c8 in doris::ShardedLRUCache::~ShardedLRUCache() /doris/be/src/olap/lru_cache.cpp:572:37
#5 0x56478acd0984 in std::default_delete::operator()(doris::Cache*) const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2
#6 0x56478acceddf in std::unique_ptr >::~unique_ptr() /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4
#7 0x56478ad96dc6 in doris::StoragePageCache::~StoragePageCache() /doris/be/src/olap/page_cache.h:78:7
#8 0x7f217ca54146 in __run_exit_handlers stdlib/exit.c:108:8
previously allocated by thread T0 here:
#0 0x564788e71fad in operator new(unsigned long) (/mnt/hdd01/dorisTestEnv/NEREIDS_ASAN/be/lib/doris_be+0x17581fad) (BuildId: 219493cc924323ee)
#1 0x56478ace9c90 in std::_MakeUniq::__single_object std::make_unique, std::allocator > const&>(doris::MemTrackerLimiter::Type&&, std::__cxx11::basic_string, std::allocator > const&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:962:30
#2 0x56478acde930 in doris::ShardedLRUCache::ShardedLRUCache(std::__cxx11::basic_string, std::allocator > const&, unsigned long, doris::LRUCacheType, unsigned int, unsigned int) /doris/be/src/olap/lru_cache.cpp:526:20
#3 0x56478ace22e1 in doris::new_lru_cache(std::__cxx11::basic_string, std::allocator > const&, unsigned long, doris::LRUCacheType, unsigned int) /doris/be/src/olap/lru_cache.cpp:670:16
#4 0x56478ad91da2 in doris::StoragePageCache::StoragePageCache(unsigned long, int, long, unsigned int) /doris/be/src/olap/page_cache.cpp:47:17
#5 0x56478ad9156e in doris::StoragePageCache::create_global_cache(unsigned long, int, long, unsigned int) /doris/be/src/olap/page_cache.cpp:31:29
#6 0x56478b98b3d3 in doris::ExecEnv::_init_mem_env() /doris/be/src/runtime/exec_env_init.cpp:251:5
#7 0x56478b98946c in doris::ExecEnv::_init(std::vector > const&) /doris/be/src/runtime/exec_env_init.cpp:182:5
#8 0x56478b987139 in doris::ExecEnv::init(doris::ExecEnv*, std::vector > const&) /doris/be/src/runtime/exec_env_init.cpp:98:17
#9 0x564788e79b50 in main /doris/be/src/service/doris_main.cpp:429:5
#10 0x7f217ca38564 in __libc_start_main csu/../csu/libc-start.c:332:16
Thread T41 (memory_tracker_) created by T0 here:
#0 0x564788e1fcaa in pthread_create (/mnt/hdd01/dorisTestEnv/NEREIDS_ASAN/be/lib/doris_be+0x1752fcaa) (BuildId: 219493cc924323ee)
#1 0x56478c4a2366 in doris::Thread::start_thread(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::function const&, unsigned long, scoped_refptr*) /doris/be/src/util/thread.cpp:419:15
#2 0x564788f59b91 in doris::Status doris::Thread::create(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, doris::Daemon::start()::$_4 const&, scoped_refptr*) /doris/be/src/util/thread.h:50:16
#3 0x564788f58165 in doris::Daemon::start() /doris/be/src/common/daemon.cpp:471:10
#4 0x564788e79a96 in main /doris/be/src/service/doris_main.cpp:420:12
#5 0x7f217ca38564 in __libc_start_main csu/../csu/libc-start.c:332:16
"Enabling two-phase query for similar select * from tbl into outfile "file:/xxx/" format as orc; queries can lead to performance issues due to the fetch operation."
Problem:
When inferring predicates, we assume the expressions to match are plain slot references. But consider this case:
create table tb1(l1 smallint) ...;
create table tb2(l2 int) ...;
select * from tb1 inner join tb2 where tb1.l1 = tb2.l2 and tb2.l2 = 1;
We cannot derive the tb1.l1 = 1 filter, because a cast is added to l1 and the condition becomes cast(l1 as int) = l2.
Solution:
Take casts into account when inferring predicates, and adjust the equality check so it recognizes both slot references and cast expressions. However, inferring a predicate through a cast from a wider type to a narrower type would be a logical error.
For example:
select * from tb1 inner join tb2 where tb1.l1 = cast(tb2.l2 as smallint) and tb2.l2 = <some value between the SMALLINT max and the INT max>;
The tb2.l2 value cannot be inferred to the left side, because the constant does not fit in smallint and the inferred tb1.l1 predicate would be wrong; and if we add one more condition such as tb1.l1 = tb3.l3 (smallint), the wrong predicate would propagate there as well.
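For the first query above (with tb2.l2 = 1), cast-aware inference can safely add the extra filter; an illustrative sketch of the effective rewrite:

```sql
select * from tb1 inner join tb2
where cast(tb1.l1 as int) = tb2.l2
  and tb2.l2 = 1
  and tb1.l1 = 1; -- newly inferred; safe because 1 fits in smallint
```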
Introduce libunwind to collect stack traces; its cost is negligible and the traces include line numbers.
Use StackTraceCache and PHDRCache to speed it up; it is customizable and has some optimizations.
Other stack trace tools (glog, boost, glibc) are kept in case they are needed.
TODO:
Currently supports Linux on __x86_64__, __arm__, and __powerpc__; __FreeBSD__ and APPLE are not supported.
Note: __arm__ and __powerpc__ have not been verified.
Support signal handling.
Add libunwind unw_backtrace support for jemalloc.
Handle the currently undefined compile option USE_MUSL later.
REFACTOR:
1. Generate CTEAnchor, CTEProducer, and CTEConsumer during analysis.
For example, take the statement `WITH cte1 AS (SELECT * FROM t) SELECT * FROM cte1`.
Before this PR, the analyzed plan looked like this:
```
logicalCTE(LogicalSubQueryAlias(cte1))
+-- logicalProject()
+-- logicalCteConsumer()
```
We only have a LogicalCteConsumer in the plan, but no LogicalCteProducer.
This is not a valid plan and should not be the final result of analysis.
After this PR, the analyzed plan looks like this:
```
logicalCteAnchor()
|-- logicalCteProducer()
+-- logicalProject()
+-- logicalCteConsumer()
```
This is a valid plan containing both a LogicalCteProducer and a LogicalCteConsumer.
2. Replace re-analyzing the unbound plan with deep-copying the plan when doing CTEInline.
Because we generate LogicalCteAnchor and LogicalCteProducer during analysis,
we can no longer re-analyze to generate the CTE inline plan.
Another reason is that we reuse relation ids between unbound and bound relations.
So, if we re-analyze an unresolved CTE plan, we get two relations
with the same RelationId. This is wrong, because we use RelationId to distinguish
two different relations.
This PR implements two helper classes, `LogicalPlanDeepCopier` and `ExpressionDeepCopier`,
to deep copy a new plan from the CTEProducer.
3. New rewrite framework to ensure CTEInline is done in the right way.
Before this PR, we did CTEInline before applying any rewrite rule.
But sometimes, some CteConsumers can be eliminated by the rewrite.
After this PR, we do CTEInline after the plans relying on the CTEProducer have
been rewritten, so we can still do CTEInline when the number of CteConsumers
drops below the CTEInline threshold, as sketched below.
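A hypothetical illustration (table and column names are made up): the CTE below has two consumers; if a rewrite eliminates the always-false branch, only one consumer remains, which can bring the count under the CTEInline threshold so the CTE can be inlined.

```sql
WITH cte1 AS (SELECT k1, k2 FROM t)
SELECT * FROM cte1 WHERE k1 > 0
UNION ALL
SELECT * FROM cte1 WHERE 1 = 0; -- always-false branch; eliminating it removes one consumer
```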
4. Add a relation id to every relation plan node.
5. Let all relations generated from tables implement the CatalogRelation trait.
6. Reuse the relation id between an unbound relation and the relation after binding.
ENHANCEMENT:
1. Pull up CTEAnchor before RBO to avoid breaking other rules' patterns.
Before this PR, CTEAnchor and LogicalCTE were generated in the middle of the plan,
so every rule had to handle LogicalCTEAnchor, otherwise it would generate an unexpected plan.
For example, push-down-filter and push-down-project would have to add patterns like:
```
logicalProject(logicalCTE)
...
logicalFilter(logicalCteAnchor)
...
```
Projects and filters must be pushed through these virtual plan nodes to ensure all projects
and filters can be merged together in the right order. For example:
```
logicalProject
+-- logicalFilter
+-- logicalCteAnchor
+-- logicalProject
+-- logicalFilter
+-- logicalOlapScan
```
The plan above leads to a translation error, because we cannot apply filter and
project twice on the bottom logicalOlapScan.
BUGFIX:
1. Recursively analyze LogicalCTE to avoid binding an outer relation to an inner CTE.
For example:
```sql
SELECT * FROM (WITH cte1 AS (SELECT * FROM t1) SELECT * FROM cte1)v1, cte1 v2;
```
Before this PR, the nested CTE name was used when binding the outer plan,
so the outer cte1 with alias v2 was bound to the inner cte1.
After this PR, this SQL throws a 'Table not exists' exception during binding.
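A form in which both references bind, because cte1 is defined at the outer level (illustrative sketch, not part of the fix):

```sql
WITH cte1 AS (SELECT * FROM t1)
SELECT * FROM (SELECT * FROM cte1) v1, cte1 v2;
```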
2. Do withChildren in CTEProducer in the right way and remove its projects attribute.
Before this PR, an attribute named projects was added to CTEProducer to represent its output,
because calling the `getOutput` method on it did not return the right output.
The root cause was the wrong implementation of computeOutput in LogicalCteProducer.
This PR fixes the problem and removes the projects attribute from CTEProducer.
3. The adjust-nullable rule updates the CTEConsumer's output from the CTEProducer's output.
This PR processes nullable on LogicalCteConsumer so that the CteConsumer's output carries the right
nullable info when the CteProducer's output nullability has been adjusted.
4. Binding set operation expressions should not change the nullability of the children's output.
This PR fixes a problem introduced by the previous PR #21168: the nullable info of a
SetOperation's children should not change after the SetOperation is bound.
In the original computeMultiCastFragmentParams process, we did not handle the scenario where the CTE is the broadcast (right) side of a join, which leads to the buildHashTableForBroadcastJoin flag not being set to true and finally the SQL hanging.
mysql >select sec_to_time(time_to_sec(cast('16:32:18' as time)));
+----------------------------------------------------+
| sec_to_time(time_to_sec(CAST('16:32:18' AS TIME))) |
+----------------------------------------------------+
| 16:32:18 |
+----------------------------------------------------+
1 row in set (0.53 sec)
mysql [test]>select sec_to_time(59538);
+--------------------+
| sec_to_time(59538) |
+--------------------+
| 16:32:18 |
+--------------------+
1 row in set (0.03 sec)
This PR contains two optimizations:
1. Use a parallel stream to fetch hoodie splits concurrently. This reduces the split time from 1min20s to 12s when splitting 10,000 partitions.
2. Read the hoodie meta table to get the table partitions. This reduces the partition-fetch time from 12min to 3s when reading 10,000 partitions.
Recently we encountered a strange bug when running the restore P2 case: the log reported that the file length does not match, file=/mnt/hdd01/master/NO_AVX2/doris.HDD/snapshot/20230713122303.26.72000/45832/536215111/45832.hdr, file_length=, real_file_length=0. After checking the file on the remote storage, we suspected that the local file deserialization caused this situation.
We then analyzed the layout of the struct and the content of the hdr file, and found that it must be a wrong layout that caused the wrong content to be read.
When the two-level hash table is enabled, a zero value in the existing one-level hash table causes an infinite loop while converting to the two-level hash table, because the PartitionedHashTable::_is_partitioned flag is not set correctly during the conversion.