doris

Author	SHA1	Message	Date
airborne12	009b300abd	[Fix](ScannerScheduler) fix dead lock when shutdown group_local_scan_thread_pool (#21553 )	2023-07-06 13:09:37 +08:00
Ashin Gau	9adbca685a	[opt](hudi) use spark bundle to read hudi data (#21260 ) Use spark-bundle to read hudi data instead of using hive-bundle to read hudi data. Advantage for using spark-bundle to read hudi data: 1. The performance of spark-bundle is more than twice that of hive-bundle 2. spark-bundle using `UnsafeRow` can reduce data copying and GC time of the jvm 3. spark-bundle support `Time Travel`, `Incremental Read`, and `Schema Change`, these functions can be quickly ported to Doris Disadvantage for using spark-bundle to read hudi data: 1. More dependencies make hudi-dependency.jar very cumbersome(from 138M -> 300M) 2. spark-bundle only provides `RDD` interface and cannot be used directly	2023-07-04 17:04:49 +08:00
morrySnow	90dd8716ed	[refactor](multicast) change the way multicast do filter, project and shuffle (#21412 ) Co-authored-by: Jerry Hu <mrhhsg@gmail.com> 1. Filtering is done at the sending end rather than the receiving end 2. Projection is done at the sending end rather than the receiving end 3. Each sender can use different shuffle policies to send data	2023-07-04 16:51:07 +08:00
Xinyi Zou	b86dd11a7d	[fix](pipeline) refactor olap table sink close (#20771 ) For pipeline, olap table sink close is divided into three stages, try_close() --> pending_finish() --> close() only after all node channels are done or canceled, pending_finish() will return false, close() will start. this will avoid block pipeline on close(). In close, check the index channel intolerable failure status after each node channel failure, if intolerable failure is true, the close will be terminated in advance, and all node channels will be canceled to avoid meaningless blocking.	2023-07-04 11:27:51 +08:00
yongjinhou	df23ab3f29	[Enhancement](tvf) Add authentication for workload group tvf (#21323 )	2023-06-30 12:56:23 +08:00
Jerry Hu	7f0e37069f	[improvement](olap) filter the whole segment by dictionary (#21239 )	2023-06-29 10:34:29 +08:00
DongLiang-0	a6b51ec19a	[Feature](avro) Support Apache Avro file format (#19990 ) support read avro file by hdfs() or s3() . ```sql select * from s3( "uri" = "http://127.0.0.1:9312/test2/person.avro", "ACCESS_KEY" = "ak", "SECRET_KEY" = "sk", "FORMAT" = "avro"); +--------+--------------+-------------+-----------------+ \| name \| boolean_type \| double_type \| long_type \| +--------+--------------+-------------+-----------------+ \| Alyssa \| 1 \| 10.0012 \| 100000000221133 \| \| Ben \| 0 \| 5555.999 \| 4009990000 \| \| lisi \| 0 \| 5992225.999 \| 9099933330 \| +--------+--------------+-------------+-----------------+ select * from hdfs( "uri" = "hdfs://127.0.0.1:9000/input/person2.avro", "fs.defaultFS" = "hdfs://127.0.0.1:9000", "hadoop.username" = "doris", "format" = "avro"); +--------+--------------+-------------+-----------+ \| name \| boolean_type \| double_type \| long_type \| +--------+--------------+-------------+-----------+ \| Alyssa \| 1 \| 8888.99999 \| 89898989 \| +--------+--------------+-------------+-----------+ ``` current avro reader only support common data type, the complex data types will be supported later.	2023-06-28 21:15:35 +08:00
Gabriel	e348b9464e	[scan](freeblocks) use ConcurrentQueue to replace vector for free blocks (#21241 )	2023-06-28 15:10:07 +08:00
Lijia Liu	76bdcf1d26	[improvement](pipeline) task group scan entity (#19924 )	2023-06-25 14:43:35 +08:00
TsukiokaKogane	3dfeee3946	[fix](typesystem) fix wrong return type argument cause type check fail (#21082 )	2023-06-22 00:04:46 +08:00
Gabriel	81abdeffbc	[Improvement](pipeline) Improve shared scan performance (#20785 )	2023-06-21 14:36:05 +08:00
Kang	2c11ce0a02	[bugfix](topn) fix key topn merge block conflict with index predicate result columns (#20820 )	2023-06-20 21:23:00 +08:00
Ashin Gau	923f7edad0	[opt](hudi) using native reader to read the base file with no log file (#20988 ) Two optimizations: 1. Insert string bytes directly to remove decoding&encoding process. 2. Use native reader to read the hudi base file if it has no log file. Use `explain` to show how many splits are read natively.	2023-06-20 11:20:21 +08:00
yongjinhou	26cca5e00a	[Enhancement](tvf) Add frontends table-valued-function (#20857 )	2023-06-19 13:57:40 +08:00
YueW	d6b7640cf0	[fix](inverted index) fix check failed for block erase temp column (#20924 )	2023-06-18 19:27:48 +08:00
xzj7019	ab32299ba4	[feature](nereids) Support multi target rf #20714 Support multi target runtime filter, mainly for set operation, such as union/intersect/except.	2023-06-16 20:26:00 +08:00
Qi Chen	b7a50a09fe	[Opt](orc-reader) Optimize orc reader by dict filtering. (#20806 ) Optimize orc reader by dict filtering. It is similar with #17594. Test result ssb-flat-100: (3 nodes) \| Query \| before opt \| after opt \| \| ------------- \|:-------------:\| ---------:\| Q1.1 \| 1.239 \| 1.145 Q1.2 \| 1.254 \| 1.128 Q1.3 \| 1.931 \| 1.644 Q2.1 \| 1.359 \| 1.006 Q2.2 \| 1.229 \| 0.674 Q2.3 \| 0.934 \| 0.427 Q3.1 \| 2.226 \| 1.712 Q3.2 \| 2.042 \| 1.562 Q3.3 \| 1.631 \| 1.021 Q3.4 \| 1.618 \| 0.732 Q4.1 \| 2.294 \| 1.858 Q4.2 \| 2.511 \| 1.961 Q4.3 \| 1.736 \| 1.446 total \| 22.004 \| 16.316	2023-06-16 13:11:37 +08:00
Pxl	17a395f5e3	[Bug](runtime-filter) fix runtime filter not register on vdata_gen_scan_node (#20787 ) fix runtime filter not register on vdata_gen_scan_node	2023-06-15 14:06:14 +08:00
yiguolei	31a4f96f01	[refactor](exprcontext) move close to expr context's dector method (#20747 ) The close method does nothing. But I am not sure we could remove it. So that I add it to dector method and remove many many calls.	2023-06-14 18:01:07 +08:00
lihangyu	0f470fec0e	[Bug](topn opt) Fix Two-Phase read when some rowset swept (#20732 ) * [Bug](topn opt) Fix Two-Phase read when some rowset swept If this is a Two-Phase read query, and we need to delay the release of Rowset by row->update_delayed_expired_timestamp() to expand the lifespan of rowsets. This is necessary to avoid data loss during the second phase reading, where some stale rowsets may be swept and result in missing data.	2023-06-14 15:46:29 +08:00
Pxl	9244cb6553	[Chore](runtime-filter) do not make query fail when rf publish failed (#20742 ) do not make query fail when rf publish failed	2023-06-13 18:23:46 +08:00
lihangyu	2dddab03a1	[compatibility](schema cache) ensure schema version when using schema cache (#20729 ) When FE is old version, be is new version, issue a schema change(add column) and then query, old version of FE query without schema version could result in reading stale schema from schema cache	2023-06-13 15:19:26 +08:00
Mingyu Chen	4b15185e25	[improvement](hdfs) add parquet footer cache and hdfs file handle cache (#20544 ) 1. Add hdfs file handle cache for hdfs file reader Copied from Impala, `https://github.com/apache/impala/blob/master/be/src/util/lru-multi-cache.h`. (Thanks for the Impala team) This is a lru cache that can store multi entries with same key. The key is build with {file name + modification time} The value is the hdfsFile pointer that point to a certain hdfs file. This cache is to avoid reopen same hdfs file mutli time, which can save query time. Add a BE config `max_hdfs_file_handle_cache_num` to limit the max number of file handle cache, default is 20000. 2. Add file meta cache The file meta cache is a lru cache. the key is {file name + modification time}, the value is the parsed file meta info of the certain file, which can save the time of re-parsing file meta everytime. Currently, it is only used for caching parquet file footer. The test show that is cache is hit, the `FileOpenTime` and `ParseFooterTime` is reduce to almost 0 in query profile, which can save time when there are lots of files to read.	2023-06-13 15:13:57 +08:00
lexluo09	57656b2459	[Enhancement](java-udf) java-udf module split to sub modules (#20185 ) The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as : java-commom、java-udf、jdbc-scanner、hudi-scanner、 paimon-scanner. Co-authored-by: lexluo <lexluo@tencent.com>	2023-06-13 09:41:22 +08:00
Qi Chen	73ad885e19	[Feature][Fix](multi-catalog) Implements transactional hive full acid tables. (#20679 ) After supporting insert-only transactional hive full acid tables #19518, #19419, this PR support transactional hive full acid tables. Support hive3 transactional hive full acid tables. Hive2 transactional hive full acid tables need to run major compactions.	2023-06-13 08:55:16 +08:00
Xujian Duan	0b228b3414	[fix](load)Support load json data with default value (#20624 ) * support json default value --------- Co-authored-by: duanxujian <duanxujian@jd.com>	2023-06-12 14:51:31 +08:00
yiguolei	a6f625676b	[profile](remove child) child is for node, should not be used to organize counters (#20676 ) Currently, there are many profiles using add child profile to orgnanize profile into blocks. But it is wrong. Child profile will have a total time counter. Actually, what we should use is just a label. - MemoryUsage: - HashTable: 23.98 KB - SerializeKeyArena: 446.75 KB Add a new macro ADD_LABEL_COUNTER to add just a label in the profile. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-06-12 10:00:35 +08:00
Ashin Gau	9a83d78dfe	[Enhancement](hudi) support hudi mor table, step2 follow #19909 (#20570 ) PR(https://github.com/apache/doris/pull/19909) has implemented the framework of hudi reader for MOR table. This PR completes all functions of reading MOR table and enables end-to-end queries. Key Implementations: 1. Use hudi meta information to generate the table schema, not from hive client. 2. Use hive client to list hudi partitions, so it strongly depends the sync-tools(https://hudi.apache.org/docs/syncing_metastore/) which syncs the partitions of hudi into hive metastore. However, we may get the hudi partitions directly from .hoodie directory. 3. Remove `HudiHMSExternalCatalog`, because other catalogs like glue is compatible with hive catalog. 4. Read the COW table originally from c++. 5. Hudi RecordReader will use ProcessBuilder to start a hotspot debugger process, which may be stuck when attaching the origin JNI process, soI use a tricky method to kill this useless process.	2023-06-10 12:25:53 +08:00
YueW	656b9ad3da	[enhancement](index) Nereids support no need to read raw data for index column that only in filter conditions (#20605 )	2023-06-09 21:54:48 +08:00
Xinyi Zou	93b53cf2f4	[improvement](exception-safe) create and prepare node/sink support exception safe (#20551 )	2023-06-09 21:06:59 +08:00
Jibing-Li	195beec3a8	[Fix](external scan node)Use consistent hash to collect BE only when the file cache is enabled. #20560 Use consistent hash to collect BE only when the file cache is enabled. And move the consistent BE assign code to FederationBackendPolicy. Fix explain split number and file size incorrect bug.	2023-06-09 08:43:12 +08:00
TengJianPing	841094960f	[fix](olapscanner) fix coredump caused by concurrent acccess of olap scan node _conjuncts (#20534 ) =3073084==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60601897db80 at pc 0x55b2c993666e bp 0x7d1fbbfb66b0 sp 0x7d1fbbfb66a8 READ of size 8 at 0x60601897db80 thread T610 (_scanner_scan) #0 0x55b2c993666d in std::__shared_ptr<doris::vectorized::VExprContext, (__gnu_cxx::_Lock_policy)2>::get() const /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1291:16 #1 0x55b2dae86ec5 in doris::vectorized::VExprContext::clone(doris::RuntimeState, std::shared_ptr<doris::vectorized::VExprContext>&) /mnt/disk2/tengjianping/doris-master/be/src/vec/exprs/vexpr_context.cpp:98:5 #2 0x55b2e757b6d8 in doris::vectorized::VScanner::prepare(doris::RuntimeState, std::vector<std::shared_ptr<doris::vectorized::VExprContext>, std::allocator<std::shared_ptr<doris::vectorized::VExprContext>>> const&) /mnt/disk2/tengjianping/doris-master/be/src/vec/exec/scan/vscanner.cpp:47:13 #3 0x55b2e78e8155 in doris::vectorized::NewOlapScanner::init() /mnt/disk2/tengjianping/doris-master/be/src/vec/exec/scan/new_olap_scanner.cpp:109:5 #4 0x55b2e7551c81 in doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler, doris::vectorized::ScannerContext, std::shared_ptr<doris::vectorized::VScanner>) /mnt/disk2/tengjianping/doris-master/be/src/vec/exec/scan/scanner_scheduler.cpp:279:27 #5 0x55b2e7554d5e in doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext)::$_0::operator()() const::'lambda0'()::operator()() const /mnt/disk2/tengjianping/doris-master/be/src/vec/exec/scan/scanner_scheduler.cpp:202:31 #6 0x55b2e7554c14 in void std::__invoke_impl<void, doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext)::$_0::operator()() const::'lambda0'()&>(std::__invoke_other, doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext)::$_0::operator()() const::'lambda0'()&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 #7 0x55b2e7554bb4 in std::enable_if<is_invocable_r_v<void, doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext)::$_0::operator()() const::'lambda0'()&>, void>::type std::__invoke_r<void, doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext)::$_0::operator()() const::'lambda0'()&>(doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext)::$_0::operator()() const::'lambda0'()&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:111:2 #8 0x55b2e7554a1c in std::_Function_handler<void (), doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext)::$_0::operator()() const::'lambda0'()>::_M_invoke(std::_Any_data const&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291:9 #9 0x55b2c80f2cd2 in std::function<void ()>::operator()() const /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560:9 #10 0x55b2e755f3e4 in doris::PriorityWorkStealingThreadPool::work_thread(int) /mnt/disk2/tengjianping/doris-master/be/src/util/priority_work_stealing_thread_pool.hpp:135:17 #11 0x55b2e7563c72 in void std::__invoke_impl<void, void (doris::PriorityWorkStealingThreadPool:: const&)(int), doris::PriorityWorkStealingThreadPool&, int&>(std::__invoke_memfun_deref, void (doris::PriorityWorkStealingThreadPool:: const&)(int), doris::PriorityWorkStealingThreadPool&, int&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:74:14 #12 0x55b2e7563b44 in std::__invoke_result<void (doris::PriorityWorkStealingThreadPool:: const&)(int), doris::PriorityWorkStealingThreadPool&, int&>::type std::__invoke<void (doris::PriorityWorkStealingThreadPool:: const&)(int), doris::PriorityWorkStealingThreadPool&, int&>(void (doris::PriorityWorkStealingThreadPool:: const&)(int), doris::PriorityWorkStealingThreadPool&, int&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14 #13 0x55b2e7563b14 in decltype(std::__invoke((this)._M_pmf, std::forward<doris::PriorityWorkStealingThreadPool&>(fp), std::forward<int&>(fp))) std::_Mem_fn_base<void (doris::PriorityWorkStealingThreadPool::)(int), true>::operator()<doris::PriorityWorkStealingThreadPool&, int&>(doris::PriorityWorkStealingThreadPool&, int&) const /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:131:11 #14 0x55b2e7563ae4 in void std::__invoke_impl<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)>&, doris::PriorityWorkStealingThreadPool&, int&>(std::__invoke_other, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)>&, doris::PriorityWorkStealingThreadPool&, int&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 #15 0x55b2e7563a54 in std::enable_if<is_invocable_r_v<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)>&, doris::PriorityWorkStealingThreadPool&, int&>, void>::type std::__invoke_r<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)>&, doris::PriorityWorkStealingThreadPool&, int&>(std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)>&, doris::PriorityWorkStealingThreadPool&, int&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:111:2 #16 0x55b2e75639c3 in void std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)> (doris::PriorityWorkStealingThreadPool, int)>::__call<void, 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:570:11 #17 0x55b2e756382d in void std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)> (doris::PriorityWorkStealingThreadPool, int)>::operator()<>() /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:629:17 #18 0x55b2e7563744 in void std::__invoke_impl<void, std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)> (doris::PriorityWorkStealingThreadPool, int)>>(std::__invoke_other, std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)> (doris::PriorityWorkStealingThreadPool, int)>&&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 #19 0x55b2e7563704 in std::__invoke_result<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)> (doris::PriorityWorkStealingThreadPool, int)>>::type std::__invoke<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)> (doris::PriorityWorkStealingThreadPool, int)>>(std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)> (doris::PriorityWorkStealingThreadPool, int)>&&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14 #20 0x55b2e75636dc in void std:🧵:_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)> (doris::PriorityWorkStealingThreadPool, int)>>>::_M_invoke<0ul>(std::_Index_tuple<0ul>) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:253:13 #21 0x55b2e75636b4 in std:🧵:_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)> (doris::PriorityWorkStealingThreadPool, int)>>>::operator()() /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:260:11 #22 0x55b2e7563638 in std:🧵:_State_impl<std:🧵:_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::)(int)> (doris::PriorityWorkStealingThreadPool, int)>>>>::_M_run() /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211:13 #23 0x55b2eb41d0ef in execute_native_thread_routine /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/thread.cc:82:18 #24 0x7f1dfd4e1179 in start_thread pthread_create.c #25 0x7f1dfdd7bdf2 in clone (/lib64/libc.so.6+0xfcdf2) (BuildId: 20ee73ce1b6ac38a52440bab82ec7e28f0f5c5b9)	2023-06-07 17:00:29 +08:00
yuxuan-luo	fe63a0a3bb	[Feature](multi-catalog)support paimon catalog (#19681 ) CREATE CATALOG paimon_n2 PROPERTIES ( "dfs.ha.namenodes.HDFS1006531" = "nn2,nn1", "dfs.namenode.rpc-address.HDFS1006531.nn2" = "172.16.65.xx:4007", "dfs.namenode.rpc-address.HDFS1006531.nn1" = "172.16.65.xx:4007", "hive.metastore.uris" = "thrift://172.16.65.xx:7004", "type" = "paimon", "dfs.nameservices" = "HDFS1006531", "hadoop.username" = "hadoop", "paimon.catalog.type" = "hms", "warehouse" = "hdfs://HDFS1006531/data/paimon1", "dfs.client.failover.proxy.provider.HDFS1006531" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" );	2023-06-06 15:08:30 +08:00
Mryange	c7888f4bfa	[feature](profile)Add the filtering info of the in filter in profile #20321 image Currently, it is difficult to obtain the id of in filters,so, the some in filters's id is -1.	2023-06-06 10:24:59 +08:00
wangbo	1fc48e83f2	[fix](executor)Fix duplicate timer and add open timer #20448 1 Currently, Node's total timer couter has timed twice(in Open and alloc_resource), this may cause timer in profile is not correct. 2 Add more timer to find more code which may cost much time.	2023-06-06 08:55:52 +08:00
slothever	b7fc17da68	[feature-wip](multi-catalog)(step2)support read max compute data by JNI (#19819 ) Issue Number: #19679	2023-06-05 22:10:08 +08:00
lihangyu	f0513a861d	[Improve](Scan) add a session variable to make scan run serial (#20220 ) Parallel scanning can result in some read amplification, for example, select * from xx where limit 1 actually requires only one row of data. However, due to parallel scanning of multiple tablets, read amplification occurs, leading to performance bottlenecks in high-concurrency scenarios. This PR Adding a SessionVariable to enforce serial scanning can help mitigate this issue.	2023-06-01 15:06:35 +08:00
Lijia Liu	f9dfcb923d	[Enhancement] Change Create Resource Group Grammar (#20249 )	2023-05-31 15:23:24 +08:00
Mingyu Chen	0c98355fff	[fix](catalog) fix create catalog with resource replay issue and kerberos auth issue (#20137 ) 1. Fix create catalog with resource replay bug. If user create catalog using `create catalog hive with resource xxx`, when replaying edit log, there is a bug that resource may be dropped, causing NPE and FE will fail to start. In this PR, I add a new FE config `disallow_create_catalog_with_resource`, default is true. So that `with resource` will not be allowed, and it will be deprecated later. And also fix the replay bug to avoid NPE. 2. Fix issue when creating 2 hive catalogs to connect with and without kerberos authentication. When user create 2 hive catalogs, one use simple auth, the other use kerberos auth. The query may fail with error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.` So I add a default property for hive catalog: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`. Which means this property will be added automatically when user creating hive catalog, to avoid such problem. 3. Fix calling `hdfsExists()` issue When calling `hdfsExists()` with non-zero return code, should check if it encounters error or is file not found. 3. Some code refactor Avoid import `org.apache.parquet.Strings`	2023-05-30 16:57:39 +08:00
YueW	de08c4a57b	[enhance](match) Support match query without inverted index (#19936 )	2023-05-30 15:02:57 +08:00
lihangyu	ab8125d56f	[Improve](performance) introduce SchemaCache to cache TabletSchame & Schema (#20037 ) * [Improve](performance) introduce SchemaCache to cache TabletSchame & Schema 1. When the system is under high-concurrency load with wide table point queries, the frequent memory allocation and deallocation of Schema become evident system bottlenecks. Additionally, the initialization of TabletSchema and Schema also becomes a CPU hotspot.Therefore, the introduction of a SchemaCache is implemented to cache these resources for reuse. 2. Make some variables wrapped with std::unique<unique_ptr> Performance: \| 状态 \| QPS \| 平均响应时间 (avg) \| P99 响应时间 \| \|------------------\|-----\|------------------\|-------------\| \| 开启 SchemaCache \| 501 \| 20ms \| 34ms \| \| 关闭 SchemaCache \| 321 \| 31ms \| 61ms \| * handle schema change with schema version * remove useless header * rebase	2023-05-29 17:34:53 +08:00
Gabriel	55ccddb62c	[Conf](decimalv3) enable decimalv3 by default	2023-05-29 15:38:31 +08:00
Pxl	8376e5eefb	[Chore](build) add non-virtual-dtor, remove no-embedded-directive/no-zero-length-array (#20118 ) add non-virtual-dtor, remove no-embedded-directive/no-zero-length-array	2023-05-29 14:42:47 +08:00
Jerry Hu	9f8de89659	[refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758 ) Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity. By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed. This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.	2023-05-29 11:47:31 +08:00
Pxl	15a7420661	[Chore](ub) fix some undefined behaviors (#19986 ) /home/zcp/repo_center/doris_master/doris/be/src/olap/rowset/segment_v2/column_reader.cpp:895:21: runtime error: load of value 423208544, which is not a valid value for type 'doris::ReaderType' /home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_decimal.cpp:260:33: runtime error: load of misaligned address 0x7fa3348b301c for type 'int64_t' (aka 'long'), which requires 8 byte alignment /home/zcp/repo_center/doris_master/doris/be/src/olap/block_column_predicate.cpp:82:24: runtime error: variable length array bound evaluates to non-positive value 0 /home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_string.h:225:26: runtime error: null pointer passed as argument 2, which is declared to never be null	2023-05-26 14:08:40 +08:00
Mryange	92a6122f74	[feature](profile)Add the filtering information of the Bloom filter in profile. (#19789 )	2023-05-26 10:56:58 +08:00
Chuanle Chen	6efe6ef6e8	[Enhancement](scanner) allocate blocks in scanner_context on demand and free them on close (#19389 ) Firstly, to reduce memory usage, we do not pre-allocate blocks, instead we lazily allocate block when upper call get_free_block. And when upper call return_free_block to return free block, we add the block to a queue for memory reuse, and we will free the blocks in the queue when the scanner_context was closed instead of destructed. Secondly, to limit the memory usage of the scanner, we introduce a variable _free_blocks_capacity to indicate the current number of free blocks available to the scanners. The number of scanners that can be scheduled will be calculated based on this value. ssb flat test previous lineorder 1.2G: load time: 3s, query time: 0.355s lineorder 5.8G: load time: 330s, query time: 0.970s load time: 349s, query time: 0.949s load time: 349s, query time: 0.955s load time: 360s, query time: 0.889s (pipeline enabled) after lineorder 1.2G: load time: 3s, query time: 0.349s lineorder 5.8G: load time: 342s, query time: 0.929s load time: 337s, query time: 0.913s load time: 345s, query time: 0.946s load time: 346s, query time: 0.865s (pipeline enabled)	2023-05-23 18:17:21 +08:00
Qi Chen	53ba46e404	[Fix][Refactor] Fix 'not member call on null pointer of type 'doris::TextConverter' error in ubsan env and refactor text converter. (#19849 ) Fix 'not member call on null pointer of type doris::TextConverter' error in ubsan env and refactor text converter.	2023-05-22 21:00:19 +08:00
luozenglin	272a7565b8	[improvement](tracing) Remove useless span levels from be side tracing (#19665 ) 1. Remove an exec node method corresponding to a span and replace it with an exec node corresponding to a span; 2. Fix some problems with tracing in pipeline.	2023-05-17 19:04:52 +08:00
Pxl	7f73749b88	[Bug](pipeline) fix distributionColumnIds not updated correct when outputColumnUnique… (#19704 ) fix distributionColumnIds not updated correct when outputColumnUnique	2023-05-17 00:13:10 +08:00

1 2 3 4 5 ...

277 Commits