Commit Graph

4723 Commits

Author SHA1 Message Date
73ad885e19 [Feature][Fix](multi-catalog) Implements transactional hive full acid tables. (#20679)
After supporting insert-only transactional hive full acid tables #19518, #19419, this PR support transactional hive full acid tables.

Support hive3 transactional hive full acid tables.
Hive2 transactional hive full acid tables need to run major compactions.
2023-06-13 08:55:16 +08:00
Pxl
5e3a96d605 [Bug](pipeline) fix memory leak because pipeline shared ptr not release #20710 2023-06-13 08:50:34 +08:00
283c55720d [bug](cooldown) Fix the issue of unused remote files not being deleted (#19785) 2023-06-12 21:05:09 +08:00
1433544c56 [fix](case expr) fix coredump of case for null value 3 #20711 2023-06-12 20:58:01 +08:00
Pxl
5fd9f58bd3 [Chore](pipeline-engine) adjus queryt canceled log on pipeline engine (#20702)
adjus queryt canceled log on pipeline engine
2023-06-12 18:23:19 +08:00
9d47c6a871 [fix](columnstring) fix bug of columnstring prefetch (#20698) 2023-06-12 17:03:44 +08:00
99c0592157 [Feature](array-function) Support array_pushback function #17417 (#19988)
Implement array_pushback.

mysql> select array_pushback([1, 2], 3);
+--------------------------------+
| array_pushback(ARRAY(1, 2), 3) |
+--------------------------------+
| [1, 2, 3]                      |
+--------------------------------+
1 row in set (0.01 sec)
2023-06-12 16:51:12 +08:00
ea264ce9de [Opt](join) short circuit probe for join node (#20585)
Support the _short_circuit_for_probe for join node
2023-06-12 16:01:09 +08:00
0b228b3414 [fix](load)Support load json data with default value (#20624)
* support json default value

---------

Co-authored-by: duanxujian <duanxujian@jd.com>
2023-06-12 14:51:31 +08:00
Pxl
7f8c5c81e7 [Feature](agg_state) support agg_state combinator on nereids (#20164)
support agg_state combinator on nereids
2023-06-12 12:49:26 +08:00
14f59bef1d [improvement](profile)add sum/avg rpc time (#20511) 2023-06-12 11:34:49 +08:00
4c340f2851 [Feature] (Multi-Catalog) support query hll column in doris jdbc table - part 1 (#19413)
Issue Number: close #17895
2023-06-12 11:16:19 +08:00
a6f625676b [profile](remove child) child is for node, should not be used to organize counters (#20676)
Currently, there are many profiles using add child profile to orgnanize profile into blocks. But it is wrong. Child profile will have a total time counter. Actually, what we should use is just a label.

                          -  MemoryUsage:  
                              -  HashTable:  23.98  KB
                              -  SerializeKeyArena:  446.75  KB
Add a new macro ADD_LABEL_COUNTER to add just a label in the profile.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-06-12 10:00:35 +08:00
Pxl
ab7ac31d89 [Chore](case) fix failed on test_big_pad when enable pipeline engine #20644 2023-06-12 09:15:55 +08:00
a347063390 [fix](case expr) fix coredump of case for null value 2 (#20635)
fix coredump of case for null value 2
2023-06-11 23:08:53 +08:00
8ea61a1ce6 [fix](streamload) fix crash when be exit (#20662) 2023-06-11 15:58:44 +08:00
bd9a9a32f5 [bugfix](s3 fs) fix s3 uri parsing for http/https uri (#20656) 2023-06-11 14:00:04 +08:00
9a83d78dfe [Enhancement](hudi) support hudi mor table, step2 follow #19909 (#20570)
PR(https://github.com/apache/doris/pull/19909) has implemented the framework of hudi reader for MOR table. This PR completes all functions of reading MOR table and enables end-to-end queries.
Key Implementations:
1. Use hudi meta information to generate the table schema, not from hive client.
2. Use hive client to list hudi partitions, so it strongly depends the sync-tools(https://hudi.apache.org/docs/syncing_metastore/) which syncs the partitions of hudi into hive metastore. However, we may get the hudi partitions directly from .hoodie directory.
3. Remove `HudiHMSExternalCatalog`, because other catalogs like glue is compatible with hive catalog.
4. Read the COW table originally from c++.
5. Hudi RecordReader will use ProcessBuilder to start a hotspot debugger process, which may be stuck when attaching the origin JNI process, soI use a tricky method to kill this useless process.
2023-06-10 12:25:53 +08:00
Pxl
ab6c1f152c [Chore](build) adjust build script about pch setting (#20637)
try to make be-ut workflow stable
2023-06-09 22:27:13 +08:00
656b9ad3da [enhancement](index) Nereids support no need to read raw data for index column that only in filter conditions (#20605) 2023-06-09 21:54:48 +08:00
0f21166110 [fix](memory) Fix runtime state default mem tracker (#20615)
start time: Wed 07 Jun 2023 06:50:14 PM CST
*** Query id: e9000000e9-eb00000073 ***
*** Aborted at 1686136356 (unix time) try "date -d @1686136356" if you are using GNU date ***
*** Current BE git commitID: 5c33dd7a2c ***
*** SIGSEGV address not mapped to object (@0x23000000235) received by PID 2131238 (TID 2132258 OR 0x7f708eff7700) from PID 565; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/hdd01/repo_center/doris_branch-2.0-beta/doris/be/src/common/signal_handler.h:413
 1# 0x00007F727BBE3090 in /lib/x86_64-linux-gnu/libc.so.6
 2# doris::AttachTask::AttachTask(doris::RuntimeState*) at /mnt/hdd01/repo_center/doris_branch-2.0-beta/doris/be/src/runtime/thread_context.cpp:43
 3# std::_Function_handler<void (doris::PTabletWriterAddBlockResult const&, bool), doris::stream_load::VNodeChannel::open_wait()::$_1>::_M_invoke(std::_Any_data const&, doris::PTabletWriterAddBlockResult const&, bool&&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
 4# doris::stream_load::ReusableClosure<doris::PTabletWriterAddBlockResult>::Run() at /mnt/hdd01/repo_center/doris_branch-2.0-beta/doris/be/src/vec/sink/vtablet_sink.h:176
 5# brpc::Controller::EndRPC(brpc::Controller::CompletionInfo const&) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
 6# brpc::Controller::OnVersionedRPCReturned(brpc::Controller::CompletionInfo const&, bool, int) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
 7# brpc::policy::ProcessRpcResponse(brpc::InputMessageBase*) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
 8# brpc::InputMessenger::InputMessageClosure::~InputMessageClosure() in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
 9# brpc::InputMessenger::OnNewMessages(brpc::Socket*) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
10# brpc::Socket::ProcessEvent(void*) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
11# bthread::TaskGroup::task_runner(long) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
12# bthread_make_fcontext in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
2023-06-09 21:09:07 +08:00
93b53cf2f4 [improvement](exception-safe) create and prepare node/sink support exception safe (#20551) 2023-06-09 21:06:59 +08:00
abb2048d5d [performance](executor) remove repeated call within the loop in validate_column 2023-06-09 19:59:25 +08:00
05438eab0d remove DCHECK for rpc time (#20621) 2023-06-09 13:38:12 +08:00
3b17cc8eb3 [Improvement](column) reduce cache miss for data copy (#20583) 2023-06-09 13:10:57 +08:00
b60860c5e5 [refactor](profile) refactor the join profile when its shared hash table (#20391)
in join node, if it's broadcast_join
and shared hash table, some counter/timer about build hash table is useless,
so we could add those counter/timer in faker profile, and those will not display in web profile.
2023-06-09 08:59:49 +08:00
4c6b99d1f9 [Fix](orc-reader) Fix the inner reader of MergeRangeFileReader is not correct when creating MergeRangeFileReader in orc reader. (#20393)
Fix the inner reader of MergeRangeFileReader is not correct when creating MergeRangeFileReader in orc reader.
2023-06-09 08:53:27 +08:00
845d459f05 [Fix](orc-reader) Fix some bugs of orc lazy materialization. (#20410)
Fix some bugs of orc lazy materialization(#18615)
- Fix issue causing column size to continuously increase after `execute_conjuncts()` by calling `Block::erase_useless_column()`.
- Fix partition issues of orc lazy materialization. 
- Fix lazy materialization will not be used when the predicate column is inconsistent with the orc file.
2023-06-09 08:53:01 +08:00
bd5a26f240 [improvement](recover) Default disable check tablet path (#20565)
change check tablet path interval's default value to -1
2023-06-09 08:47:39 +08:00
c441cbf402 [Fix](dyncmic-partition) Check bucket size before find tablet. (#20488)
Co-authored-by: 王翔宇 <wangxiangyu@360shuke.com>
2023-06-09 08:44:41 +08:00
d03bd73795 [bug](udaf) fix java-udaf can't exectue add function (#20554)
In some case of agg function, maybe running as streaming agg firstly,
this will call the add function when serialize, so need implement add function also.
2023-06-09 08:44:12 +08:00
195beec3a8 [Fix](external scan node)Use consistent hash to collect BE only when the file cache is enabled. #20560
Use consistent hash to collect BE only when the file cache is enabled. And move the consistent BE assign code to FederationBackendPolicy.
Fix explain split number and file size incorrect bug.
2023-06-09 08:43:12 +08:00
14fe95578e [enhancement](heartbeat) print a warning log for long running heartbeat (#20559) 2023-06-09 08:39:35 +08:00
dd71e101d3 [fix](case expr) fix coredump of case for null value (#20564)
be coredump when when expr is null:
2023-06-08 20:05:23 +08:00
e801e3b737 [fix](memory) Fix crash at bthread_setspecific in brpc::Socket::CheckHealth() (#20450)
Only switch to bthread local when modifying the mem tracker in the thread context. No longer switches to bthread local by default when bthread starts
mem tracker increases brpc IOBufBlockMemory memory
remove thread mem tracker metrics
2023-06-08 19:48:19 +08:00
Pxl
a15a0b9193 [Chore](build) use file(GLOB_RECURSE xxx CONFIGURE_DEPENDS) to replace set cpp (#20461)
use file(GLOB_RECURSE xxx CONFIGURE_DEPENDS) to replace set cpp
2023-06-08 19:36:21 +08:00
4faee4d8fd [Fix](multi-catalog) Fix be crashed when query hive table after schema changed(new column added). (#20537)
Fix be crashed when query hive table after schema changed(new column added).

Regression Test: test_hive_schema_evolution.groovy
2023-06-08 18:10:36 +08:00
3054574bc1 [fix](load) fix ctx nullptr core in flush_single_memtable (#20573) 2023-06-08 17:40:02 +08:00
Pxl
a56449f86e [Bug](Agg-state) try to make test_agg_state stable (#20574)
try to make test_agg_state stable
2023-06-08 17:17:51 +08:00
a68fc551f0 [bug](cooldown) Fix async_write_cooldown_meta and snapshot cooldowned version not continuous bug (#20437) 2023-06-08 15:35:35 +08:00
Pxl
22985af4d7 [Bug](pipeline) set SourceState to MORE_DATA when UnionSourceOperator have const_expr/data_queue->remaining_has_data (#20557)
set SourceState to MORE_DATA when UnionSourceOperator have const_expr/data_queue->remaining_has_data
2023-06-08 14:47:35 +08:00
24fb05ec83 [Bug](row-store) Fix row store with materialize index (#20356)
If a query hits a materialized view that has row storage enabled, but the row storage column is not present in the materialized view, it will result in a query crash. Therefore, it is necessary to include the row storage column when creating the materialized view, and serialize the row storage column during the execution of SchemaChange.
2023-06-08 10:55:22 +08:00
f154319a49 [fix](load tablet) Fix restart slowly due to load deleted tablet error (#20552) 2023-06-08 09:16:33 +08:00
92577f45d3 [fix] (recover) fix can not recover a BE's tablet after deleting its data directory manual (#20273) (#20274) 2023-06-07 22:27:50 +08:00
03cb69c0ee [feature](backup-restore) Add local backup/restore not upload/download by broker (#20492) 2023-06-07 21:35:15 +08:00
296ca1d9dd [Fix](inverted index) if range query exceeds CLucene limits, downgrade it from inverted index (#20528)
CLucene may throw CL_ERR_TooManyClauses when a range query hits too many terms.
In this situation, we have to downgrade from inverted index.
2023-06-07 20:07:48 +08:00
2db900b775 [fix](lazy_open) fix lazy open null point (#20540) 2023-06-07 17:56:31 +08:00
09344eaab5 [feature](load) introduce single-stream-multi-table load (#20006)
For routine load (kafka load), user can produce all data for different
table into single topic and doris will dispatch them into corresponding
table.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-06-07 17:55:25 +08:00
Pxl
fbbf4c420e [Bug](Agg-State) fix agg state function get wrong input argument list (#20546)
fix agg state function get wrong input argument list
2023-06-07 17:32:48 +08:00
d00b7ad04b [Opt](performance) opt the outer join for nested loop join (#20524) 2023-06-07 17:31:36 +08:00