11162 Commits

Author SHA1 Message Date
c25c19bddc [test](regression) Add cases to test join condition push and not like (#20453)
Add testing cases to issue #19613
2023-06-12 18:26:23 +08:00
Pxl
5fd9f58bd3 [Chore](pipeline-engine) adjust query canceled log on pipeline engine (#20702)
adjust query canceled log on pipeline engine
2023-06-12 18:23:19 +08:00
565095eb52 [bug](function) fix is_null/is_not_null check is_const has error (#20562)
fix is_null/is_not_null check is_const has error
2023-06-12 18:21:12 +08:00
daf18a4b0e [fix](MTMV) Support refreshing data manually (#20108) 2023-06-12 17:57:06 +08:00
153f91f77e [typo](doc) Update doc for newly released 1.2.0 version of spark connector (#20639) 2023-06-12 17:42:10 +08:00
9d47c6a871 [fix](columnstring) fix bug of columnstring prefetch (#20698) 2023-06-12 17:03:44 +08:00
99c0592157 [Feature](array-function) Support array_pushback function #17417 (#19988)
Implement array_pushback.

mysql> select array_pushback([1, 2], 3);
+--------------------------------+
| array_pushback(ARRAY(1, 2), 3) |
+--------------------------------+
| [1, 2, 3]                      |
+--------------------------------+
1 row in set (0.01 sec)
2023-06-12 16:51:12 +08:00
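The semantics shown in the array_pushback commit above can be sketched outside the database as follows. This is a minimal illustration, not the Doris implementation: the Python function is hypothetical, and the NULL handling (a NULL array yields NULL) is an assumption that the commit message does not confirm.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def array_pushback(arr: Optional[List[T]], value: T) -> Optional[List[T]]:
    """Return a new list with `value` appended; the input list is not modified."""
    # Assumption (not stated in the commit): a NULL array produces NULL.
    if arr is None:
        return None
    # List concatenation builds a fresh list, mirroring a SQL function
    # that returns a new array value instead of mutating its argument.
    return arr + [value]


if __name__ == "__main__":
    # Same inputs as the mysql example in the commit message.
    print(array_pushback([1, 2], 3))
```

Returning a new list keeps the sketch side-effect free, which matches how a scalar SQL function behaves from the caller's point of view.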
141813b476 [tpcds](nereids) estimate distribution cost by byte size instead of row count (#20642)
this pr impacts tpch q16 Agg strategy, but no performance issue
this pr improves tpcds sf100

before:
cold 141 sec
hot 133 sec

after:
cold 137 sec
hot 128 sec
2023-06-12 16:23:49 +08:00
ea264ce9de [Opt](join) short circuit probe for join node (#20585)
Support the _short_circuit_for_probe for join node
2023-06-12 16:01:09 +08:00
0b228b3414 [fix](load)Support load json data with default value (#20624)
* support json default value

---------

Co-authored-by: duanxujian <duanxujian@jd.com>
2023-06-12 14:51:31 +08:00
10134ea8c6 [fix](planner) fix RewriteInPredicateRule may be useless (#20668)
Issue Number: close #20669

RewriteInPredicateRule may cast both children of an InPredicate expr to the same type. For example, in where cast(age as char) in ('11'), where age is of type int, RewriteInPredicateRule casts both children of the expr to int. In this example, child 0 then has the following structure:
```
child 0: type: int
    |---  child: type : char
            |-- child: type : int
```

Because RewriteInPredicateRule casts the type of the expr to int, the stmt is reanalyzed. However, the stmt is reset before it is reanalyzed, and the reset changes child 0 to the following structure:
```
child: type : char
    |-- child: type : int
```
As a result, both children are cast to varchar in func castAllToCompatibleType, and the work done by RewriteInPredicateRule is lost.

In 1.1-lts and 1.2-lts, a predicate such as "where cast(age as char) in ('11')" does not work, because func castAllToCompatibleType casts int to char, and int cannot be cast to char (master works because func castAllToCompatibleType casts int to varchar in this case).
```
MySQL [test]> select user_id from test_cast where cast(age as char) in ('45');
ERROR 1105 (HY000): errCode = 2, detailMessage = type not match, originType=INT, targeType=CHAR(*)
```
2023-06-12 14:39:01 +08:00
f90d5dbacf [fix](test) fix unstable dynamic partition regression test (#20674)
Define the variable with the def keyword
2023-06-12 14:28:30 +08:00
28fbdf3273 [BUG](es_catalog)Solve the problem of querying es catalog Unexpected exception: Index:… (#18743) 2023-06-12 13:48:12 +08:00
Pxl
7f8c5c81e7 [Feature](agg_state) support agg_state combinator on nereids (#20164)
support agg_state combinator on nereids
2023-06-12 12:49:26 +08:00
a02a2f4163 [doc](create-function) Update CREATE-FUNCTION.md to remove the usage of c++ (#20654) 2023-06-12 11:48:14 +08:00
14f59bef1d [improvement](profile)add sum/avg rpc time (#20511) 2023-06-12 11:34:49 +08:00
bcc37c9405 [fix](planner)the common type of floating and decimal should be floating type (#20634)
* [fix](planner)the common type of floating and decimal should be floating type

* fix test cases
2023-06-12 11:32:23 +08:00
4c340f2851 [Feature] (Multi-Catalog) support query hll column in doris jdbc table - part 1 (#19413)
Issue Number: close #17895
2023-06-12 11:16:19 +08:00
a6f625676b [profile](remove child) child is for node, should not be used to organize counters (#20676)
Currently, many profiles use add child profile to organize the profile into blocks. This is wrong: a child profile has a total time counter, whereas all that is actually needed is a label.

                          -  MemoryUsage:  
                              -  HashTable:  23.98  KB
                              -  SerializeKeyArena:  446.75  KB
Add a new macro ADD_LABEL_COUNTER to add just a label in the profile.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-06-12 10:00:35 +08:00
Pxl
ab7ac31d89 [Chore](case) fix failed on test_big_pad when enable pipeline engine #20644 2023-06-12 09:15:55 +08:00
a347063390 [fix](case expr) fix coredump of case for null value 2 (#20635)
fix coredump of case for null value 2
2023-06-11 23:08:53 +08:00
c9b08d5c20 [feature](planner) multi partition create by integer column (#19597)
Create partitions use :
```
PARTITION BY RANGE(integer_col)(
        FROM (10) TO (1000) INTERVAL 50
)
```
2023-06-11 22:42:21 +08:00
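The partition clause in the commit above can be read as a request to generate consecutive integer ranges. The sketch below enumerates them under two assumptions that the commit message does not state: each range is half-open, [lower, upper), and the final range is clipped to the TO bound when the interval does not divide the span evenly. The function name is hypothetical.

```python
from typing import List, Tuple


def integer_range_partitions(start: int, end: int, interval: int) -> List[Tuple[int, int]]:
    """Enumerate (lower, upper) bounds for FROM (start) TO (end) INTERVAL interval."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if start >= end:
        raise ValueError("start must be less than end")

    ranges: List[Tuple[int, int]] = []
    lower = start
    while lower < end:
        # Assumption: the last range is clipped to `end` rather than dropped
        # or extended past the TO bound.
        upper = min(lower + interval, end)
        ranges.append((lower, upper))
        lower = upper
    return ranges


if __name__ == "__main__":
    parts = integer_range_partitions(10, 1000, 50)
    print(len(parts), parts[0], parts[-1])
```

With the values from the commit message (10, 1000, 50) and the clipping assumption, the sketch produces 20 ranges, the first being (10, 60) and the last (960, 1000).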
8162d0062b [fix](alter) fix potential concurrent issue for alter when check olap table state normal outside write lock scope is not atomic (#20480)
Currently, the normal state of some olap tables is checked outside the write lock scope, so the table state may become abnormal while the alter operation is running.
---------

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-06-11 18:17:41 +08:00
8ea61a1ce6 [fix](streamload) fix crash when be exit (#20662) 2023-06-11 15:58:44 +08:00
bd9a9a32f5 [bugfix](s3 fs) fix s3 uri parsing for http/https uri (#20656) 2023-06-11 14:00:04 +08:00
ca1e2ddf43 [fix](regression) tests in unique_with_mow_p0/partial_update are flaky (#20633) 2023-06-11 13:51:49 +08:00
8a2e0504e4 [chore](coldHeatCases) drop table first to enhance robustness (#20629) 2023-06-11 13:51:05 +08:00
3d9e520fb2 [fix](ssl) fix ssl connection bug for JDBC 8.0.19 (#20659) 2023-06-11 13:50:03 +08:00
987b29ded5 [fix](nereids)avoid to derive rowCount NaN (#20523)
The formula used to compute ndv after a filter assumes that the new rowCount is smaller than the original rowCount. When this formula is applied to a join, a branch is needed for the case where the new row count is bigger than the original row count.
When the new row count is bigger, the ndv is unchanged.
2023-06-10 15:40:14 +08:00
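The guard described in the NaN commit above can be sketched as follows. The commit does not state the formula, so the estimate used here, ndv * (1 - (1 - newRows/oldRows)^(oldRows/ndv)), is an assumption chosen for illustration, and the function name is hypothetical. Under this assumed formula, a new row count larger than the original makes the base of the power negative, and a negative base raised to a non-integer exponent is not a real number (Java's Math.pow returns NaN in that case), which is one way a NaN could arise.

```python
def ndv_after_row_count_change(ndv: float, old_rows: float, new_rows: float) -> float:
    """Estimate the number of distinct values after the row count changes."""
    if ndv <= 0 or old_rows <= 0:
        return 0.0
    # Guard from the commit: when the row count grows (as a join can cause),
    # leave the ndv unchanged instead of applying the filter formula.
    if new_rows >= old_rows:
        return ndv
    # Assumed filter formula (not taken from the commit); here the ratio is
    # in [0, 1), so the base of the power is never negative.
    ratio = new_rows / old_rows
    return ndv * (1.0 - (1.0 - ratio) ** (old_rows / ndv))


if __name__ == "__main__":
    print(ndv_after_row_count_change(100, 1000, 500))   # row count shrinks
    print(ndv_after_row_count_change(100, 1000, 2000))  # row count grows: ndv kept
```

Placing the branch before the formula keeps the estimate within [0, ndv] for every input, so no invalid value can propagate into a derived row count.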
87bc405c41 [Improvement](statistics)Support external table partition statistics (#20415)
Support collecting statistics for HMS external tables with specific partitions. Add session variables to limit the partitions to collect for whole-table row count and column statistics.
2023-06-10 12:28:53 +08:00
9a83d78dfe [Enhancement](hudi) support hudi mor table, step2 follow #19909 (#20570)
PR(https://github.com/apache/doris/pull/19909) has implemented the framework of hudi reader for MOR table. This PR completes all functions of reading MOR table and enables end-to-end queries.
Key Implementations:
1. Use hudi meta information to generate the table schema, not from hive client.
2. Use hive client to list hudi partitions, so it strongly depends on the sync-tools(https://hudi.apache.org/docs/syncing_metastore/), which sync the partitions of hudi into hive metastore. However, we may get the hudi partitions directly from the .hoodie directory.
3. Remove `HudiHMSExternalCatalog`, because other catalogs like glue are compatible with hive catalog.
4. Read the COW table originally from c++.
5. Hudi RecordReader will use ProcessBuilder to start a hotspot debugger process, which may get stuck when attaching to the original JNI process, so I use a tricky method to kill this useless process.
2023-06-10 12:25:53 +08:00
206b5a4235 [doc](flink-connector) add flink cdc sync mysql database (#20486) 2023-06-10 10:52:15 +08:00
c79642781b [minor](Nereids) remove some invasive code of minidump in cascades framework (#20606) 2023-06-09 23:41:00 +08:00
def6a8ec94 [regression](nereids) check tpch sf1T and sf500 plan shape on 3 BE environment #20610 2023-06-09 22:46:40 +08:00
Pxl
ab6c1f152c [Chore](build) adjust build script about pch setting (#20637)
try to make be-ut workflow stable
2023-06-09 22:27:13 +08:00
656b9ad3da [enhancement](index) Nereids support no need to read raw data for index column that only in filter conditions (#20605) 2023-06-09 21:54:48 +08:00
0f21166110 [fix](memory) Fix runtime state default mem tracker (#20615)
start time: Wed 07 Jun 2023 06:50:14 PM CST
*** Query id: e9000000e9-eb00000073 ***
*** Aborted at 1686136356 (unix time) try "date -d @1686136356" if you are using GNU date ***
*** Current BE git commitID: 5c33dd7a2c ***
*** SIGSEGV address not mapped to object (@0x23000000235) received by PID 2131238 (TID 2132258 OR 0x7f708eff7700) from PID 565; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/hdd01/repo_center/doris_branch-2.0-beta/doris/be/src/common/signal_handler.h:413
 1# 0x00007F727BBE3090 in /lib/x86_64-linux-gnu/libc.so.6
 2# doris::AttachTask::AttachTask(doris::RuntimeState*) at /mnt/hdd01/repo_center/doris_branch-2.0-beta/doris/be/src/runtime/thread_context.cpp:43
 3# std::_Function_handler<void (doris::PTabletWriterAddBlockResult const&, bool), doris::stream_load::VNodeChannel::open_wait()::$_1>::_M_invoke(std::_Any_data const&, doris::PTabletWriterAddBlockResult const&, bool&&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
 4# doris::stream_load::ReusableClosure<doris::PTabletWriterAddBlockResult>::Run() at /mnt/hdd01/repo_center/doris_branch-2.0-beta/doris/be/src/vec/sink/vtablet_sink.h:176
 5# brpc::Controller::EndRPC(brpc::Controller::CompletionInfo const&) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
 6# brpc::Controller::OnVersionedRPCReturned(brpc::Controller::CompletionInfo const&, bool, int) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
 7# brpc::policy::ProcessRpcResponse(brpc::InputMessageBase*) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
 8# brpc::InputMessenger::InputMessageClosure::~InputMessageClosure() in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
 9# brpc::InputMessenger::OnNewMessages(brpc::Socket*) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
10# brpc::Socket::ProcessEvent(void*) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
11# bthread::TaskGroup::task_runner(long) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
12# bthread_make_fcontext in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be
2023-06-09 21:09:07 +08:00
93b53cf2f4 [improvement](exception-safe) create and prepare node/sink support exception safe (#20551) 2023-06-09 21:06:59 +08:00
abb2048d5d [performance](executor) remove repeated call within the loop in validate_column 2023-06-09 19:59:25 +08:00
54504fb61d [opt](Nereids) remove running in OptimizeGroup to avoid recompute on it parent (#20608)
We have some path pruning logic in the cascades framework. However, it does not work as expected: if we prune one Group, we may need to run thousands of optimizations on its parent without any successful result. This PR removes the pruning provisionally. We will add pruning back when we redesign it.
2023-06-09 19:16:39 +08:00
df1e526ec0 [opt](planner)(Nereids) add switch to determine if some unfixed functions will be folded on fe. (#20270)
Add a switch to determine whether the functions below can be folded on fe.
- now()
- current_date()
- current_time()
- unix_timestamp()
- utc_timestamp()
- uuid()
- rand()
2023-06-09 18:18:56 +08:00
70819fae22 [feature](alter) Add AlterDatabasePropertyStmt binlog impl (#20550) 2023-06-09 17:29:21 +08:00
a6aee1fc2c [enhancement](stats) Forbid unknown stats check for internal_column (#20535)
Ignore internal columns when the new optimizer is enabled and unknown stats are forbidden
2023-06-09 16:16:11 +08:00
b6386889d5 [fix](stats) set analysis job status to finished when be crashed by mistake (#20485)
If BE crashed, the error would be logged and the analysis task would be marked as finished, which is incorrect.
This PR updates the analysis task according to the query state.
2023-06-09 15:43:11 +08:00
c8bda9508e [doc](catalog) remove external table doc (#20632) 2023-06-09 14:16:44 +08:00
fe8233863a [enhancement](stats) ignore view by default when analyze whole DB #20630 2023-06-09 14:13:54 +08:00
05438eab0d remove DCHECK for rpc time (#20621) 2023-06-09 13:38:12 +08:00
3b17cc8eb3 [Improvement](column) reduce cache miss for data copy (#20583) 2023-06-09 13:10:57 +08:00
101e75d633 [pipeline](doc) Update pipeline doc (#20623) 2023-06-09 12:38:36 +08:00
44e20d9087 [feature](Nereids): push down alias into union outputs. (#20543) 2023-06-09 11:53:44 +08:00