Commit Graph

1774 Commits

Author SHA1 Message Date
1433544c56 [fix](case expr) fix coredump of case for null value 3 #20711 2023-06-12 20:58:01 +08:00
9d47c6a871 [fix](columnstring) fix bug of columnstring prefetch (#20698) 2023-06-12 17:03:44 +08:00
99c0592157 [Feature](array-function) Support array_pushback function #17417 (#19988)
Implement array_pushback.

mysql> select array_pushback([1, 2], 3);
+--------------------------------+
| array_pushback(ARRAY(1, 2), 3) |
+--------------------------------+
| [1, 2, 3]                      |
+--------------------------------+
1 row in set (0.01 sec)
2023-06-12 16:51:12 +08:00
ea264ce9de [Opt](join) short circuit probe for join node (#20585)
Support the _short_circuit_for_probe for join node
2023-06-12 16:01:09 +08:00
0b228b3414 [fix](load)Support load json data with default value (#20624)
* support json default value

---------

Co-authored-by: duanxujian <duanxujian@jd.com>
2023-06-12 14:51:31 +08:00
Pxl
7f8c5c81e7 [Feature](agg_state) support agg_state combinator on nereids (#20164)
support agg_state combinator on nereids
2023-06-12 12:49:26 +08:00
14f59bef1d [improvement](profile)add sum/avg rpc time (#20511) 2023-06-12 11:34:49 +08:00
4c340f2851 [Feature] (Multi-Catalog) support query hll column in doris jdbc table - part 1 (#19413)
Issue Number: close #17895
2023-06-12 11:16:19 +08:00
a6f625676b [profile](remove child) child is for node, should not be used to organize counters (#20676)
Currently, there are many profiles using add child profile to orgnanize profile into blocks. But it is wrong. Child profile will have a total time counter. Actually, what we should use is just a label.

                          -  MemoryUsage:  
                              -  HashTable:  23.98  KB
                              -  SerializeKeyArena:  446.75  KB
Add a new macro ADD_LABEL_COUNTER to add just a label in the profile.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-06-12 10:00:35 +08:00
Pxl
ab7ac31d89 [Chore](case) fix failed on test_big_pad when enable pipeline engine #20644 2023-06-12 09:15:55 +08:00
a347063390 [fix](case expr) fix coredump of case for null value 2 (#20635)
fix coredump of case for null value 2
2023-06-11 23:08:53 +08:00
9a83d78dfe [Enhancement](hudi) support hudi mor table, step2 follow #19909 (#20570)
PR(https://github.com/apache/doris/pull/19909) has implemented the framework of hudi reader for MOR table. This PR completes all functions of reading MOR table and enables end-to-end queries.
Key Implementations:
1. Use hudi meta information to generate the table schema, not from hive client.
2. Use hive client to list hudi partitions, so it strongly depends the sync-tools(https://hudi.apache.org/docs/syncing_metastore/) which syncs the partitions of hudi into hive metastore. However, we may get the hudi partitions directly from .hoodie directory.
3. Remove `HudiHMSExternalCatalog`, because other catalogs like glue is compatible with hive catalog.
4. Read the COW table originally from c++.
5. Hudi RecordReader will use ProcessBuilder to start a hotspot debugger process, which may be stuck when attaching the origin JNI process, soI use a tricky method to kill this useless process.
2023-06-10 12:25:53 +08:00
656b9ad3da [enhancement](index) Nereids support no need to read raw data for index column that only in filter conditions (#20605) 2023-06-09 21:54:48 +08:00
93b53cf2f4 [improvement](exception-safe) create and prepare node/sink support exception safe (#20551) 2023-06-09 21:06:59 +08:00
abb2048d5d [performance](executor) remove repeated call within the loop in validate_column 2023-06-09 19:59:25 +08:00
3b17cc8eb3 [Improvement](column) reduce cache miss for data copy (#20583) 2023-06-09 13:10:57 +08:00
b60860c5e5 [refactor](profile) refactor the join profile when its shared hash table (#20391)
in join node, if it's broadcast_join
and shared hash table, some counter/timer about build hash table is useless,
so we could add those counter/timer in faker profile, and those will not display in web profile.
2023-06-09 08:59:49 +08:00
4c6b99d1f9 [Fix](orc-reader) Fix the inner reader of MergeRangeFileReader is not correct when creating MergeRangeFileReader in orc reader. (#20393)
Fix the inner reader of MergeRangeFileReader is not correct when creating MergeRangeFileReader in orc reader.
2023-06-09 08:53:27 +08:00
845d459f05 [Fix](orc-reader) Fix some bugs of orc lazy materialization. (#20410)
Fix some bugs of orc lazy materialization(#18615)
- Fix issue causing column size to continuously increase after `execute_conjuncts()` by calling `Block::erase_useless_column()`.
- Fix partition issues of orc lazy materialization. 
- Fix lazy materialization will not be used when the predicate column is inconsistent with the orc file.
2023-06-09 08:53:01 +08:00
c441cbf402 [Fix](dyncmic-partition) Check bucket size before find tablet. (#20488)
Co-authored-by: 王翔宇 <wangxiangyu@360shuke.com>
2023-06-09 08:44:41 +08:00
d03bd73795 [bug](udaf) fix java-udaf can't exectue add function (#20554)
In some case of agg function, maybe running as streaming agg firstly,
this will call the add function when serialize, so need implement add function also.
2023-06-09 08:44:12 +08:00
195beec3a8 [Fix](external scan node)Use consistent hash to collect BE only when the file cache is enabled. #20560
Use consistent hash to collect BE only when the file cache is enabled. And move the consistent BE assign code to FederationBackendPolicy.
Fix explain split number and file size incorrect bug.
2023-06-09 08:43:12 +08:00
dd71e101d3 [fix](case expr) fix coredump of case for null value (#20564)
be coredump when when expr is null:
2023-06-08 20:05:23 +08:00
Pxl
a15a0b9193 [Chore](build) use file(GLOB_RECURSE xxx CONFIGURE_DEPENDS) to replace set cpp (#20461)
use file(GLOB_RECURSE xxx CONFIGURE_DEPENDS) to replace set cpp
2023-06-08 19:36:21 +08:00
4faee4d8fd [Fix](multi-catalog) Fix be crashed when query hive table after schema changed(new column added). (#20537)
Fix be crashed when query hive table after schema changed(new column added).

Regression Test: test_hive_schema_evolution.groovy
2023-06-08 18:10:36 +08:00
Pxl
a56449f86e [Bug](Agg-state) try to make test_agg_state stable (#20574)
try to make test_agg_state stable
2023-06-08 17:17:51 +08:00
09344eaab5 [feature](load) introduce single-stream-multi-table load (#20006)
For routine load (kafka load), user can produce all data for different
table into single topic and doris will dispatch them into corresponding
table.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-06-07 17:55:25 +08:00
Pxl
fbbf4c420e [Bug](Agg-State) fix agg state function get wrong input argument list (#20546)
fix agg state function get wrong input argument list
2023-06-07 17:32:48 +08:00
d00b7ad04b [Opt](performance) opt the outer join for nested loop join (#20524) 2023-06-07 17:31:36 +08:00
841094960f [fix](olapscanner) fix coredump caused by concurrent acccess of olap scan node _conjuncts (#20534)
=3073084==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60601897db80 at pc 0x55b2c993666e bp 0x7d1fbbfb66b0 sp 0x7d1fbbfb66a8
READ of size 8 at 0x60601897db80 thread T610 (_scanner_scan)
    #0 0x55b2c993666d in std::__shared_ptr<doris::vectorized::VExprContext, (__gnu_cxx::_Lock_policy)2>::get() const /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1291:16
    #1 0x55b2dae86ec5 in doris::vectorized::VExprContext::clone(doris::RuntimeState*, std::shared_ptr<doris::vectorized::VExprContext>&) /mnt/disk2/tengjianping/doris-master/be/src/vec/exprs/vexpr_context.cpp:98:5
    #2 0x55b2e757b6d8 in doris::vectorized::VScanner::prepare(doris::RuntimeState*, std::vector<std::shared_ptr<doris::vectorized::VExprContext>, std::allocator<std::shared_ptr<doris::vectorized::VExprContext>>> const&) /mnt/disk2/tengjianping/doris-master/be/src/vec/exec/scan/vscanner.cpp:47:13
    #3 0x55b2e78e8155 in doris::vectorized::NewOlapScanner::init() /mnt/disk2/tengjianping/doris-master/be/src/vec/exec/scan/new_olap_scanner.cpp:109:5
    #4 0x55b2e7551c81 in doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler*, doris::vectorized::ScannerContext*, std::shared_ptr<doris::vectorized::VScanner>) /mnt/disk2/tengjianping/doris-master/be/src/vec/exec/scan/scanner_scheduler.cpp:279:27
    #5 0x55b2e7554d5e in doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_0::operator()() const::'lambda0'()::operator()() const /mnt/disk2/tengjianping/doris-master/be/src/vec/exec/scan/scanner_scheduler.cpp:202:31
    #6 0x55b2e7554c14 in void std::__invoke_impl<void, doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_0::operator()() const::'lambda0'()&>(std::__invoke_other, doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_0::operator()() const::'lambda0'()&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
    #7 0x55b2e7554bb4 in std::enable_if<is_invocable_r_v<void, doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_0::operator()() const::'lambda0'()&>, void>::type std::__invoke_r<void, doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_0::operator()() const::'lambda0'()&>(doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_0::operator()() const::'lambda0'()&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:111:2
    #8 0x55b2e7554a1c in std::_Function_handler<void (), doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_0::operator()() const::'lambda0'()>::_M_invoke(std::_Any_data const&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291:9
    #9 0x55b2c80f2cd2 in std::function<void ()>::operator()() const /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560:9
    #10 0x55b2e755f3e4 in doris::PriorityWorkStealingThreadPool::work_thread(int) /mnt/disk2/tengjianping/doris-master/be/src/util/priority_work_stealing_thread_pool.hpp:135:17
    #11 0x55b2e7563c72 in void std::__invoke_impl<void, void (doris::PriorityWorkStealingThreadPool::* const&)(int), doris::PriorityWorkStealingThreadPool*&, int&>(std::__invoke_memfun_deref, void (doris::PriorityWorkStealingThreadPool::* const&)(int), doris::PriorityWorkStealingThreadPool*&, int&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:74:14
    #12 0x55b2e7563b44 in std::__invoke_result<void (doris::PriorityWorkStealingThreadPool::* const&)(int), doris::PriorityWorkStealingThreadPool*&, int&>::type std::__invoke<void (doris::PriorityWorkStealingThreadPool::* const&)(int), doris::PriorityWorkStealingThreadPool*&, int&>(void (doris::PriorityWorkStealingThreadPool::* const&)(int), doris::PriorityWorkStealingThreadPool*&, int&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14
    #13 0x55b2e7563b14 in decltype(std::__invoke((*this)._M_pmf, std::forward<doris::PriorityWorkStealingThreadPool*&>(fp), std::forward<int&>(fp))) std::_Mem_fn_base<void (doris::PriorityWorkStealingThreadPool::*)(int), true>::operator()<doris::PriorityWorkStealingThreadPool*&, int&>(doris::PriorityWorkStealingThreadPool*&, int&) const /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:131:11
    #14 0x55b2e7563ae4 in void std::__invoke_impl<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)>&, doris::PriorityWorkStealingThreadPool*&, int&>(std::__invoke_other, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)>&, doris::PriorityWorkStealingThreadPool*&, int&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
    #15 0x55b2e7563a54 in std::enable_if<is_invocable_r_v<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)>&, doris::PriorityWorkStealingThreadPool*&, int&>, void>::type std::__invoke_r<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)>&, doris::PriorityWorkStealingThreadPool*&, int&>(std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)>&, doris::PriorityWorkStealingThreadPool*&, int&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:111:2
    #16 0x55b2e75639c3 in void std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>::__call<void, 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:570:11
    #17 0x55b2e756382d in void std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>::operator()<>() /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:629:17
    #18 0x55b2e7563744 in void std::__invoke_impl<void, std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>>(std::__invoke_other, std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>&&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
    #19 0x55b2e7563704 in std::__invoke_result<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>>::type std::__invoke<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>>(std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>&&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14
    #20 0x55b2e75636dc in void std:🧵:_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>>>::_M_invoke<0ul>(std::_Index_tuple<0ul>) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:253:13
    #21 0x55b2e75636b4 in std:🧵:_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>>>::operator()() /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:260:11
    #22 0x55b2e7563638 in std:🧵:_State_impl<std:🧵:_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>>>>::_M_run() /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211:13
    #23 0x55b2eb41d0ef in execute_native_thread_routine /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/thread.cc:82:18
    #24 0x7f1dfd4e1179 in start_thread pthread_create.c
    #25 0x7f1dfdd7bdf2 in clone (/lib64/libc.so.6+0xfcdf2) (BuildId: 20ee73ce1b6ac38a52440bab82ec7e28f0f5c5b9)
2023-06-07 17:00:29 +08:00
Pxl
36216f0925 [Bug](Agg-State) fix coredump when state combinator input const column (#20510)
fix coredump when state combinator input const column
2023-06-07 11:28:55 +08:00
03a679e33a [improvement](sink) reuse rows buffer in msyql_result_writer (#20482)
Creating a rows buffer for each block can impact non-negligible performance.
So it is necessary to reuse the rows buffer.

Test with a total of 1.7M rows, the AppendBatchTime reduced from 500ms to 280ms.
2023-06-07 10:09:32 +08:00
35a2be1074 [refactor](agg_state) refactor agg_state type to support fixed length object type (#20370)
before the agg_state type only support with datatype string,
But with some agg functions, eg: avg,sum,mix...
those functions need serialize type is fixed length object type
2023-06-07 10:05:00 +08:00
49f8f20fb1 [fix](regex) String with Chinese characters matching failed (#20493) 2023-06-07 07:27:47 +08:00
3691372054 [bug](table_function) fix table function node forget to call open function of expr (#20495) 2023-06-07 07:26:50 +08:00
fe63a0a3bb [Feature](multi-catalog)support paimon catalog (#19681)
CREATE CATALOG paimon_n2 PROPERTIES (
"dfs.ha.namenodes.HDFS1006531" = "nn2,nn1",
"dfs.namenode.rpc-address.HDFS1006531.nn2" = "172.16.65.xx:4007",
"dfs.namenode.rpc-address.HDFS1006531.nn1" = "172.16.65.xx:4007",
"hive.metastore.uris" = "thrift://172.16.65.xx:7004",
"type" = "paimon",
"dfs.nameservices" = "HDFS1006531",
"hadoop.username" = "hadoop",
"paimon.catalog.type" = "hms",
"warehouse" = "hdfs://HDFS1006531/data/paimon1",
"dfs.client.failover.proxy.provider.HDFS1006531" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);
2023-06-06 15:08:30 +08:00
1b94b6368f [fix](load) in strict mode, return error for insert if datatype convert fails (#20378)
* [fix](load) in strict mode, return error for load and insert if datatype convert fails

Revert "[fix](MySQL) the way Doris handles boolean type is consistent with MySQL (#19416)"

This reverts commit 68eb420cabe5b26b09d6d4a2724ae12699bdee87.

Since it changed other behaviours, e.g. in strict mode insert into t_int values ("a"),
it will result 0 is inserted into table, but it should return error instead.

* fix be ut

* fix regression tests
2023-06-06 12:04:03 +08:00
d02737a293 [feature](struct-type) support struct_element function (#19045)
This commit support a function allows return a field column in named struct column.
Since the function can return any type, this commit also supports ANY_STRUCT_TYPE
and ANY_ELEMENT_TYPE.
2023-06-06 10:44:08 +08:00
c7888f4bfa [feature](profile)Add the filtering info of the in filter in profile #20321
image Currently, it is difficult to obtain the id of in filters,so, the some in filters's id is -1.
2023-06-06 10:24:59 +08:00
1fc48e83f2 [fix](executor)Fix duplicate timer and add open timer #20448
1 Currently, Node's total timer couter has timed twice(in Open and alloc_resource), this may cause timer in profile is not correct.
2 Add more timer to find more code which may cost much time.
2023-06-06 08:55:52 +08:00
4f77578d8a [enhancement](profile) add build get child next time (#20460)
Currently, build time not include child(1)->get next time, it is very confusing during shared hash table scenario. So that I add a profile.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-06-06 08:55:19 +08:00
b7fc17da68 [feature-wip](multi-catalog)(step2)support read max compute data by JNI (#19819)
Issue Number: #19679
2023-06-05 22:10:08 +08:00
5b2efd196b [fix](execution) result_filter_data should be filled by 0 when can_filter_all is true (#20438) 2023-06-05 17:05:35 +08:00
d03bb4ba7b [Optimize](function) Optimize locate function by compare across strings (#20290)
Optimize locate function by compare across strings. about 90% speed up test by sum()
2023-06-05 12:43:14 +08:00
Pxl
7dc7ed97eb [Chore](build) remove some unused code and remove some wno (#20326)
remove some unused code about spinlock
remove some wno and fix warning
remove varadic macro usage
2023-06-05 10:48:07 +08:00
59a0f80233 [Improve](array-function)Improve array function intersect (#20085)
now we just support array function with 2 arrays , but intersect operator can support more than 2 arrays
2023-06-05 10:38:48 +08:00
92af861481 [Fix] Change open tablet log level #20306
Now, during the open tablet operation, logs will be printed, and a large number of logs will be generated under high-frequency import. It is necessary to reduce the log level.
2023-06-05 08:45:32 +08:00
Pxl
8e39f0cf6b [Enchancement](Agg State) storage function name and result is nullable in agg state type (#20298)
storage function name and result is nullable in agg state type
2023-06-04 22:44:48 +08:00
b0bbff0fd1 [performance](load) improve memtable sort performance (#20392) 2023-06-04 20:33:15 +08:00
34a1b7599f [Fix](lazy_open) fix lazy open commit info lose (#20404) 2023-06-04 19:08:36 +08:00