Commit Graph

11729 Commits

Author SHA1 Message Date
d39bca5ec7 [fix](nereids) don't build cte producer if the consumer is empty relation (#21317)
explain WITH cte_0 AS ( SELECT 1 AS a ) SELECT * from cte_0 t1 join cte_0 t2 on true WHERE false;
before:
```
+----------------------------+
| Explain String             |
+----------------------------+
| PLAN FRAGMENT 0            |
|   OUTPUT EXPRS:            |
|     a[#1]                  |
|     a[#2]                  |
|   PARTITION: UNPARTITIONED |
|                            |
|   VRESULT SINK             |
|                            |
|   1:VEMPTYSET              |
|                            |
| PLAN FRAGMENT 1            |
|   OUTPUT EXPRS:            |
|     a[#0]                  |
|   PARTITION: UNPARTITIONED |
|                            |
|   MultiCastDataSinks       |
|                            |
|   0:VUNION                 |
|      constant exprs:       |
|          1                 |
+----------------------------+
```
after:

```
+----------------------------+
| Explain String             |
+----------------------------+
| PLAN FRAGMENT 0            |
|   OUTPUT EXPRS:            |
|     a[#0]                  |
|     a[#1]                  |
|   PARTITION: UNPARTITIONED |
|                            |
|   VRESULT SINK             |
|                            |
|   0:VEMPTYSET              |
+----------------------------+
```
2023-07-07 18:12:28 +08:00
cad9e8849c [minor](stats) ADD LOG in analyze task (#21362) 2023-07-07 18:04:15 +08:00
2d445bbb6d [opt](Nereids) forbid some bad case on agg plans (#21565)
1. forbid all candidates that need a gather step, except when gathering is mandatory
2. forbid a local agg after the reshuffle of a two-phase distinct agg
3. forbid one-phase agg after a reshuffle
4. forbid three- or four-phase distinct agg if any stage needs a reshuffle
5. forbid multi-distinct for a single distinct agg if no reshuffle is needed
2023-07-07 17:45:55 +08:00
b471cf2045 Revert "[Enhancement](multi-catalog) Add hdfs read statistics profile. (#21442)" (#21618)
This reverts commit 57729bad6841ea9728e6b2cf0bd484133e7b9ead.
To fix a compile error
2023-07-07 17:45:31 +08:00
6b1a74af61 [Enhancement](planner&Nereids) support sql_select_limit for master (#21138)
Support sql_select_limit for both the original planner and Nereids.
When the variable is enabled:
- In the original planner, a limit is added to the top PlanNode.
- In Nereids, a limit node is added on top during the preprocess phase.
2023-07-07 17:18:38 +08:00
dc44345ee4 [Fix](Planner) change non boolean return type to boolean (#20599)
Problem: When a non-boolean expression is used in a WHERE or HAVING clause, the analyzer checks the return type and throws an error. But some other databases allow this usage.

Solved: Cast the return type to boolean in WHERE and HAVING clauses, e.g. `select *** from *** where case when *** then 1 else 0 end;`
2023-07-07 17:12:41 +08:00
0b7b5dc991 [fix](catalog) wrong required slot info causing BE crash (#21598)
For the file scan node, there is a special field `requiredSlot`, which is set based on the `isMaterialized` info of each slot.
But the `isMaterialized` info can change during the planning process, so we must update `requiredSlot`
in the `finalize` phase of the scan node; otherwise, it may cause a BE crash due to mismatched slot info.
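An illustrative sketch of the rule described above (hypothetical names, not the actual Doris scan-node code): the required slot ids must be recomputed from the current materialized flags at finalize time instead of being cached from an earlier planning phase.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reduction of the idea above: slot descriptors whose
// materialized flag may flip during planning.
struct Slot {
    int id;
    bool is_materialized;
};

// Recompute the required slot ids from the *current* flags. Calling this
// in finalize() keeps the required-slot list in sync with any changes
// made earlier in the planning process.
std::vector<int> required_slot_ids(const std::vector<Slot>& slots) {
    std::vector<int> ids;
    for (const auto& s : slots) {
        if (s.is_materialized) {
            ids.push_back(s.id);
        }
    }
    return ids;
}
```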
2023-07-07 17:10:50 +08:00
02149ff329 [fix](nereids) Agg on unknown-stats column (#21428) 2023-07-07 17:03:04 +08:00
67afea73b1 [enhancement](merge-on-write) add more version and txn information for mow publish (#21257) 2023-07-07 16:18:47 +08:00
29dd0158cf Delete alter system modify broker related documents (#21578) 2023-07-07 15:34:59 +08:00
871002c882 [fix](kerberos) should renew the kerberos ticket each half of ticket lifetime (#21546)
Following #21265, the renew interval of the kerberos ticket should be half of config::kerberos_expiration_time_seconds
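A minimal sketch of the interval rule (the helper name is illustrative, not the actual Doris BE code): renew at every half of the configured ticket lifetime so a renewal always happens well before the ticket expires.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helper: given a ticket lifetime such as
// config::kerberos_expiration_time_seconds, schedule a renewal every
// half of that lifetime.
int64_t kerberos_renew_interval_seconds(int64_t expiration_time_seconds) {
    return expiration_time_seconds / 2;
}
```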
2023-07-07 14:52:36 +08:00
57729bad68 [Enhancement](multi-catalog) Add hdfs read statistics profile. (#21442)
Add hdfs read statistics profile.
```
  -  HdfsIO:  0ns
    -  TotalBytesRead:  133.47  MB
    -  TotalLocalBytesRead:  133.47  MB
    -  TotalShortCircuitBytesRead:  133.47  MB
    -  TotalZeroCopyBytesRead:  0.00  
```
2023-07-07 14:52:14 +08:00
f908ea5573 [fix](Nereids) union distinct should not prune any column (#21610) 2023-07-07 14:38:28 +08:00
b5f247f73f [Improve](mysql) ensure constant time for computing hash value (#21569) 2023-07-07 14:04:11 +08:00
70f2ac308a [fix](sink) fix OlapTableSink early close causes load failure #21545 2023-07-07 14:03:54 +08:00
2a721be4f7 [fix](partial update) correct col_nums when init agg state in memtable (#21592) 2023-07-07 14:03:33 +08:00
612265c717 [Enhancement](inverted index) reset global instance for InvertedIndexSearcherCache when destroy (#21601)
This PR addresses the need to reset the InvertedIndexSearcherCache when doris_be is destroyed. Given that InvertedIndexSearcherCache is a global instance, its members must be reset explicitly.

Implementing this change will effectively eliminate the memory leak information that currently appears when doris_be is stopped gracefully. This contributes to a cleaner and more efficient shutdown process.
2023-07-07 13:00:43 +08:00
b70fb4ca8e [fix](test) build internal table for TPCHTest to fix testRank (#21566) 2023-07-07 12:46:07 +08:00
d76293d9bf [improve](doc) add doc about explain plan (#21561) 2023-07-07 12:34:52 +08:00
53c10a2389 (chore) Disable ssl connection to FE by default for compatibility reason (#20230)
Older MySQL client (< 5.7.28) will try to connect to server with tls1.1,
which is insecure and is not supported by Doris FE. The connection will
fail.

We disable ssl connection support on Doris FE to keep users' applications
unaffected. To enable ssl support explicitly, add the following to fe.conf:
```
enable_ssl = true
```
2023-07-07 12:24:55 +08:00
bb985cd9a1 [refactor](udf) refactor java-udf execute method by using for loop (#21388) 2023-07-07 11:43:11 +08:00
8272232e21 [fix](dbt) fix _MISSING_TYPE object is not callable bug (#21577) 2023-07-07 10:45:42 +08:00
64d0e28ed0 [improvement](multi catalog)Use getPartitionsByNames to retrieve hive partitions (#21562)
Before, we fetched hive partitions using the HMS getPartition api, which requires one api call per partition. The performance is very poor when the partition number is large. This pr uses getPartitionsByNames to fetch multiple partitions in one api call.
To fetch 90000 partitions, the time cost is reduced from 108s to 14s.
2023-07-07 10:37:33 +08:00
9ee7fa45d1 [Refactor](multi-catalog) Refactor to process splitted conjuncts for dict filter. (#21459)
Conjuncts are currently split, so refactor the source code to handle split conjuncts for dict filters.
2023-07-07 09:19:08 +08:00
9bcf79178e [Improvement](statistics, multi catalog)Support iceberg table stats collection (#21481)
Fetch iceberg table stats automatically while querying a table.
Collect accurate statistics for Iceberg table by running analyze sql in Doris (remove collect by meta option).
2023-07-07 09:18:37 +08:00
79221a54ca [refactor](Nereids): remove withLogicalProperties & check children size (#21563) 2023-07-06 20:37:17 +08:00
fba3ae96b9 Revert "[Fix](planner) Set inline view output as non constant after analyze (#21212)" (#21581)
This reverts commit 0c3acfdb7c744decb7b60e372007707a55d14e00.
2023-07-06 20:30:27 +08:00
181dad4181 [fix](executor) make elt / repeat smooth upgrade. (#21493)
BE: 2.0, FE: 1.2

before

mysql [(none)]>select elt(1, 'aaa', 'bbb');
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Function elt get failed, expr is VectorizedFnCall[elt](arguments=,return=String) and return type is String.

mysql [test]> INSERT INTO tbb VALUES (1, repeat("test1111", 8192)), (2, repeat("test1111", 131072));
mysql [test]>select k1, md5(v1), length(v1) from tbb;
+------+----------------------------------+--------------+
| k1   | md5(`v1`)                        | length(`v1`) |
+------+----------------------------------+--------------+
| 1    | d41d8cd98f00b204e9800998ecf8427e |            0 |
| 2    | d41d8cd98f00b204e9800998ecf8427e |            0 |
+------+----------------------------------+--------------+

now

mysql [test]>select elt(1, 'aaa', 'bbb');
+----------------------+
| elt(1, 'aaa', 'bbb') |
+----------------------+
| aaa                  |
+----------------------+

mysql [test]>select k1, md5(v1), length(v1) from tbb;
+------+----------------------------------+--------------+
| k1   | md5(`v1`)                        | length(`v1`) |
+------+----------------------------------+--------------+
| 1    | 1f44fb91f47cab16f711973af06294a0 |        65536 |
| 2    | 3c514d3b89e26e2f983b7bd4cbb82055 |      1048576 |
+------+----------------------------------+--------------+
2023-07-06 19:15:06 +08:00
2d94477748 [fix](type system) fix datetimev2 write column to arrow (#21529)
*** Query id: c1d804d455a24dee-a8967d16a258fc15 ***
*** Aborted at 1688530361 (unix time) try "date -d @1688530361" if you are using GNU date ***
*** Current BE git commitID: f2025b9 ***
*** SIGSEGV unknown detail explain (@0x0) received by PID 3709755 (TID 3710413 OR 0x7f5661d57700) from PID 0; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:413
 1# os::Linux::chained_handler(int, siginfo*, void*) in /usr/lib/jvm/TencentKona-8.0.12-352/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/TencentKona-8.0.12-352/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo*, void*) in /usr/lib/jvm/TencentKona-8.0.12-352/jre/lib/amd64/server/libjvm.so
 4# 0x00007F5795EE5B50 in /lib64/libc.so.6
 5# doris::vectorized::DateV2Value<doris::vectorized::DateTimeV2ValueType>::to_buffer(char*, int) const at /root/doris/be/src/vec/runtime/vdatetime_value.cpp:2409
 6# doris::vectorized::DataTypeDateTimeV2SerDe::write_column_to_arrow(doris::vectorized::IColumn const&, unsigned char const*, arrow::ArrayBuilder*, int, int) const in /mnt/disk2/zhaobingquan/doris/be/lib/doris_be
 7# doris::vectorized::DataTypeNullableSerDe::write_column_to_arrow(doris::vectorized::IColumn const&, unsigned char const*, arrow::ArrayBuilder*, int, int) const at /root/doris/be/src/vec/data_types/serde/data_type_nullable_serde.cpp:120
 8# doris::FromBlockConverter::convert(std::shared_ptr<arrow::RecordBatch>*) at /root/doris/be/src/util/arrow/block_convertor.cpp:392
 9# doris::convert_to_arrow_batch(doris::vectorized::Block const&, std::shared_ptr<arrow::Schema> const&, arrow::MemoryPool*, std::shared_ptr<arrow::RecordBatch>*) in /mnt/disk2/zhaobingquan/doris/be/lib/doris_be
10# doris::vectorized::MemoryScratchSink::send(doris::RuntimeState*, doris::vectorized::Block*, bool) at /root/doris/be/src/vec/sink/vmemory_scratch_sink.cpp:83
11# doris::PlanFragmentExecutor::open_vectorized_internal() in /mnt/disk2/zhaobingquan/doris/be/lib/doris_be
12# doris::PlanFragmentExecutor::open() at /root/doris/be/src/runtime/plan_fragment_executor.cpp:273
13# doris::FragmentExecState::execute() at /root/doris/be/src/runtime/fragment_mgr.cpp:263
14# doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /root/doris/be/src/runtime/fragment_mgr.cpp:527
15# std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
16# doris::ThreadPool::dispatch_thread() in /mnt/disk2/zhaobingquan/doris/be/lib/doris_be
17# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:466
18# start_thread in /lib64/libpthread.so.0
19# __clone in /lib64/libc.so.6
2023-07-06 17:33:49 +08:00
8e6b9b4026 [fix](sink) Fix NodeChannel add_block_closure null pointer (#21534)
NodeChannel's add_block_closure was a null pointer when the channel was canceled before open_wait created a new closure.
2023-07-06 17:09:43 +08:00
dac2b638c6 [refactor](load) move memtable flush logic to flush token and rowset writer (#21547) 2023-07-06 17:04:30 +08:00
457de3fc55 [refactor](load) move find_tablet out of VOlapTableSink (#21462) 2023-07-06 16:51:32 +08:00
2e651bbc9a [fix](nereids) fix some planner bugs (#21533)
1. allow casting boolean to date-like types in Nereids; the result is null
2. the PruneOlapScanTablet rule can prune tablets even if a mv index is selected
3. constant conjuncts should not be pushed through the agg node in the old planner
2023-07-06 16:13:37 +08:00
0c3acfdb7c [Fix](planner) Set inline view output as non constant after analyze (#21212)
Problem:
The select list should be non-constant when the from list has tables or multiple tuples. Otherwise the outer query will get a wrong isConstant value
and perform wrong constant folding.
For example: when using the nullif function with a subquery that results in two alternative constants, the planner would treat it as a constant expr, so the analyzer would report an error that the order by clause cannot be constant.

Solution:
Change the inline view output to non-constant, because for (select 1 a from table) as view, a in the output is not constant when we see
view.a outside.
2023-07-06 15:37:43 +08:00
068fe44493 [feature](profile) Add important time of legacy planner to profile (#20602)
Add important time points of the planning process:
- queryJoinReorderFinishTime (join reorder end time): recorded after analyze, when join reorder finishes
- queryCreateSingleNodeFinishTime (single-node plan end time): recorded after join reorder, when the single-node plan has been created
- queryDistributedFinishTime (distributed plan end time): recorded after the single-node plan, when the distributed plan has been created
2023-07-06 15:36:25 +08:00
bb3b6770b5 [Enhancement](multi-catalog) Make meta cache batch loading concurrently. (#21471)
This enhances the performance of querying the meta cache of hms tables in 2 steps:
**Step1** : use concurrent batch loading for meta cache
**Step2** : execute some other tasks concurrently as soon as possible

**This pr mainly for step1 and it mainly do the following things:**
- Create a `CacheBulkLoader` for batch loading
- Remove the executor of the previous async cache loader and change the loader's type to `CacheBulkLoader` (We do not set any refresh strategies for LoadingCache, so the previous executor is not useful)
- Use a `FixedCacheThreadPool` to replace the `CacheThreadPool` (the previous `CacheThreadPool` just logs warnings and does not throw any exceptions when the pool is full).
- Remove parallel streams and use the `CacheBulkLoader` to do batch loadings
- Change the value of `max_external_cache_loader_thread_pool_size` to 64, and set the pool size of hms client pool to `max_external_cache_loader_thread_pool_size`
- Fix the spelling mistake for `max_hive_table_catch_num`
2023-07-06 15:18:30 +08:00
fde73b6cc6 [Fix](multi-catalog) Fix hadoop short circuit reading can not enabled in some environments. (#21516)
Fix hadoop short-circuit reading not being enabled in some environments.
- Revert #21430 because it causes a performance degradation issue.
- Add `$HADOOP_CONF_DIR` to `$CLASSPATH`.
- Remove the empty `hdfs-site.xml`, because in some environments it prevents hadoop short-circuit reading from being enabled.
- Copy the hadoop common native libs (which are copied from https://github.com/apache/doris-thirdparty/pull/98) and add them to `LD_LIBRARY_PATH`, because in some environments `LD_LIBRARY_PATH` does not contain the hadoop common native libs, which prevents hadoop short-circuit reading from being enabled.
2023-07-06 15:00:26 +08:00
06451c4ff1 fix: infinite loop when handling exceeded memory limit (#21556)
In some situations, _handle_mem_exceed_limit will alloc a large memory block, more than 5G. After adding some logs, we found that:

- the alloc was made in vector::insert_realloc
- writers_to_reduce_mem's size was more than 8 million

which indicated that an infinite loop was hit in `while (!tablets_mem_heap.empty())`.
By reviewing the code, `if (std::get<0>(tablet_mem_item)++ != std::get<1>(tablet_mem_item))` is wrong;
it must be `if (++std::get<0>(tablet_mem_item) != std::get<1>(tablet_mem_item))`.
In the original code, we incremented the end iterator and then compared it to the end iterator; that behavior is undefined.
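A reduced sketch of the difference (using an integer cursor in a tuple instead of the real iterator, so the past-the-end step is observable rather than undefined): the post-increment form compares the old value, so the cursor is advanced one step past the end before the loop exits.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical reduction of the loop condition above: each tuple holds
// (current position, end position). With real iterators, advancing past
// the end is undefined behavior; with ints we can observe where the
// cursor lands.
int final_cursor_post_increment(int begin, int end) {
    std::tuple<int, int> item{begin, end};
    // Buggy form: compares the value *before* the increment, so the
    // cursor is bumped once more after it has already reached `end`.
    while (std::get<0>(item)++ != std::get<1>(item)) {
    }
    return std::get<0>(item);
}

int final_cursor_pre_increment(int begin, int end) {
    std::tuple<int, int> item{begin, end};
    // Fixed form: compares the freshly incremented value, so the loop
    // stops with the cursor exactly at `end`.
    while (++std::get<0>(item) != std::get<1>(item)) {
    }
    return std::get<0>(item);
}
```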
2023-07-06 14:34:29 +08:00
4d17400244 [profile](join) add collisions into profile (#21510) 2023-07-06 14:30:10 +08:00
8839518bfb [Performance](Nereids): add withGroupExprLogicalPropChildren to reduce new Plan (#21477) 2023-07-06 14:10:31 +08:00
009b300abd [Fix](ScannerScheduler) fix dead lock when shutdown group_local_scan_thread_pool (#21553) 2023-07-06 13:09:37 +08:00
013bfc6a06 [Bug](row store) Fix column aggregate info lost when table is unique model (#21506) 2023-07-06 12:06:22 +08:00
9d2f879bd2 [Enhancement](inverted index) make InvertedIndexReader shared_from_this (#21381)
This PR proposes several changes to improve code safety and readability by replacing raw pointers with smart pointers in several places.

Use enable_factory_creator in InvertedIndexIterator and InvertedIndexReader, and remove explicit new construction.
Make InvertedIndexReader shared_from_this, since it may be destructed while InvertedIndexIterator is still using it.
2023-07-06 11:52:59 +08:00
fb14950887 [refactor](load) split flush_segment_writer into two parts (#21372) 2023-07-06 11:13:34 +08:00
80be2bb220 [bugfix](RowsetIterator) use valid stats when creating segment iterator (#21512) 2023-07-06 10:35:16 +08:00
b1be59c799 [enhancement](query) enable strong consistency by syncing max journal id from master (#21205)
Add a session variable & config enable_strong_consistency_read to solve the problem that a loading result may be shortly invisible to followers, meeting users' requirements in strong-consistency read scenarios.

It syncs the max journal id from the master and waits for replaying.
2023-07-06 10:25:38 +08:00
6a0a21d8b0 [regression-test](load) add streamload default value test (#21536) 2023-07-06 10:14:13 +08:00
688a1bc059 [refactor](load) expand OlapTableValidator to VOlapTableBlockConvertor (#21476) 2023-07-06 10:11:53 +08:00
a2e679f767 [fix](status) Return the correct error code when clucene error occured (#21511) 2023-07-06 09:08:11 +08:00
c1e82ce817 [fix](backup) fix show snapshot causing mysql connection lost (#21520)
If there is no `info file` in the repository, the mysql connection may be lost when the user executes `show snapshot on repo`:
```
2023-07-05 09:22:48,689 WARN (mysql-nio-pool-0|199) [ReadListener.lambda$handleEvent$0():60] Exception happened in one session(org.apache.doris.qe.ConnectContext@730797c1).
java.io.IOException: Error happened when receiving packet.
    at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:691) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_322]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322]
```

This is because some fields are missing in the returned result set.
2023-07-05 22:44:57 +08:00