Commit Graph

4942 Commits

Author SHA1 Message Date
fe18cfa2fb [improvement](pg jdbc)Support for automatically obtaining the precision of the postgresql timestamp type (#20909) 2023-06-16 23:41:09 +08:00
367f64e7bd [improvement](jdbc) support insert autoinc and default value column to mysql (#20765)
In JdbcMysqlClient, I've added methods to retrieve auto-increment and default value columns from MySQL. These columns are then mapped into Doris metadata to make them visible to users.

When handling the InsertStmt into an execution plan, Doris used to automatically fill in NULL or default values for columns not specified in the InsertStmt. However, in the JDBC catalog, we don't need Doris to handle these unspecified columns, so I've made changes to skip them directly.

For the insert prepared statement required for writing, our previous behavior was to obtain all columns for placeholders. So, the change I made is to pass in the columns processed by the execution plan during the sink task generation stage for dynamic generation.
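
As a rough illustration of the dynamic generation described above, the prepared statement can be built from only the columns the plan actually writes. This is a minimal sketch, not the code in JdbcMysqlClient: the class name, method name, and backtick quoting are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: class and method names are hypothetical, not Doris APIs.
final class InsertSqlSketch {
    // Build an INSERT with one placeholder per column the plan actually writes,
    // so auto-increment and default-value columns can be left to MySQL.
    static String buildInsertSql(String table, List<String> plannedColumns) {
        if (plannedColumns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        String columnList = plannedColumns.stream()
                .map(c -> "`" + c + "`") // assumed MySQL-style identifier quoting
                .collect(Collectors.joining(", "));
        String placeholders =
                String.join(", ", Collections.nCopies(plannedColumns.size(), "?"));
        return "INSERT INTO `" + table + "` (" + columnList + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildInsertSql("orders", Arrays.asList("user_id", "amount")));
    }
}
```

Columns omitted from the list never appear in the statement, so the remote database fills them in itself.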
2023-06-16 23:38:11 +08:00
e834637a5b [improvement](ck jdbc) Support for automatically getting the precision of clickhouse's datetime64 type (#20887) 2023-06-16 23:37:30 +08:00
bf197ee8d2 [opt](nereids) adjust cost model for BroadCastJoin and PartitionJoin (#20713)
we add a penalty for broadcast join (bc for short in the following).
the intuition behind the penalty is as follows:
1. if the build side is very small (< 1M), we prefer bc, and set `penalty=1`, which means no penalty
2. if the build side is larger than 1M, we consider the ratio of the probe row count to the build row count: the lower the ratio, the higher the penalty.
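
The two rules above can be sketched as a small function. The commit message does not give the actual formula, so everything below is an assumption for illustration: the threshold is treated as a row count, and `HYPOTHETICAL_RATIO_FACTOR` and the `factor / ratio` shape are invented only to show a penalty that grows as the probe/build ratio shrinks.

```java
// Illustrative sketch only: names, the row-count threshold, and the formula are assumptions,
// not the cost model implemented in the PR.
final class BroadcastPenaltySketch {
    static final double SMALL_BUILD_ROWS = 1_000_000d;     // "< 1M" from the commit message
    static final double HYPOTHETICAL_RATIO_FACTOR = 10d;   // made-up tuning constant

    static double broadcastPenalty(double buildRows, double probeRows) {
        if (buildRows < SMALL_BUILD_ROWS) {
            return 1.0; // rule 1: small build side, no penalty
        }
        // rule 2: the lower probe/build is, the higher the penalty (never below 1)
        double ratio = Math.max(probeRows, 1.0) / buildRows;
        return Math.max(1.0, HYPOTHETICAL_RATIO_FACTOR / ratio);
    }

    public static void main(String[] args) {
        System.out.println(broadcastPenalty(500_000d, 10_000_000d));
        System.out.println(broadcastPenalty(2_000_000d, 4_000_000d));
        System.out.println(broadcastPenalty(2_000_000d, 40_000_000d));
    }
}
```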

this pr has a positive impact on tpch queries. Only q3 is changed: in our test (tpch 1T, 3BE), q3 improved from 5.1 sec to 2.5 sec.
this pr has a positive impact on tpcds queries. In a test on tpcds sf100 (3BE), the cold run improves from 163 sec to 156 sec and the hot run improves from 155 sec to 149 sec.
2023-06-16 22:49:04 +08:00
bb4f10b457 [fix](Nereids) lost having when analyze sort-having-agg (#20914) 2023-06-16 22:13:28 +08:00
7ee744ff5a [opt](Nereids) add more unexpected expression check (#20901)
1. check sub-query after rewrite, should not meet any sub-query there
2. check below expression on specific plan
  - Aggregate
    - TableGeneratingFunction
  - Filter
    - AggregateFunction
    - GroupingScalarFunction
    - TableGeneratingFunction
    - WindowExpression
  - Generate
    - AggregateFunction
    - GroupingScalarFunction
    - WindowExpression
  - Having
    - TableGeneratingFunction
    - WindowExpression
  - Join
    - AggregateFunction
    - GroupingScalarFunction
    - TableGeneratingFunction
    - WindowExpression
  - Project
    - TableGeneratingFunction
  - Sort
    - AggregateFunction
    - GroupingScalarFunction
    - TableGeneratingFunction
    - WindowExpression
  - Window
    - GroupingScalarFunction
    - TableGeneratingFunction
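
The list above reads naturally as a lookup table from plan node to the expression kinds that must not appear in it. A minimal sketch, assuming plain string names rather than the real Nereids plan and expression classes (the class and method names here are hypothetical):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: uses strings instead of Nereids classes; names are hypothetical.
final class UnexpectedExpressionSketch {
    static final Map<String, Set<String>> FORBIDDEN = new LinkedHashMap<>();

    static {
        put("Aggregate", "TableGeneratingFunction");
        put("Filter", "AggregateFunction", "GroupingScalarFunction",
                "TableGeneratingFunction", "WindowExpression");
        put("Generate", "AggregateFunction", "GroupingScalarFunction", "WindowExpression");
        put("Having", "TableGeneratingFunction", "WindowExpression");
        put("Join", "AggregateFunction", "GroupingScalarFunction",
                "TableGeneratingFunction", "WindowExpression");
        put("Project", "TableGeneratingFunction");
        put("Sort", "AggregateFunction", "GroupingScalarFunction",
                "TableGeneratingFunction", "WindowExpression");
        put("Window", "GroupingScalarFunction", "TableGeneratingFunction");
    }

    private static void put(String plan, String... expressions) {
        FORBIDDEN.put(plan, new HashSet<>(Arrays.asList(expressions)));
    }

    // True when the expression kind is not listed as unexpected for the plan node.
    static boolean isAllowed(String plan, String expression) {
        return !FORBIDDEN.getOrDefault(plan, Collections.<String>emptySet()).contains(expression);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Filter", "WindowExpression"));
        System.out.println(isAllowed("Project", "AggregateFunction"));
    }
}
```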
2023-06-16 22:12:39 +08:00
ab32299ba4 [feature](nereids) Support multi target rf #20714
Support multi target runtime filter, mainly for set operation, such as union/intersect/except.
2023-06-16 20:26:00 +08:00
1cc611a913 [fix](match) fix regression case test_index_match_select and test_index_match_phrase (#20860)
1. add more checks for match expression in nereids:
  - match expressions are only supported in filter
  - the left child and right child of a match expression must both be string type
  - the left child of a match expression must be a SlotRef, and the right child must be a Literal

2. to fix regression case test_index_match_select and test_index_match_phrase
2023-06-16 20:18:29 +08:00
5dc0f90c7f [opt](Nereids) revert convert IN with 2 options to OR expression rule (#20894)
revert this rule because it has negative effect on predicate push-down-to-storage-layer
2023-06-16 19:11:37 +08:00
6cde7bc8ad [feature](Nereids) just reserve logical expression in memo after do dphyp (#20843)
After DPHyp, clear all physicalExpressions and everything else,
and keep only the logicalExpressions as the original plan that is the input of cascades optimization.
2023-06-16 18:12:39 +08:00
536bf56a35 [fix](planner) strip trailing zeros when cast string literal to decimal literal (#20743) 2023-06-16 17:08:36 +08:00
e63739e729 [fix](nereids) add regression test for stats analyze and fix some bugs (#20865)
1. Add regression test cases for analyze to make sure show/drop/analyze stats work as expected
2. Remove useless code that would block the cleanup of expired stats
3. Fix a bug in DropStats: before this PR, dropping the whole table's stats would cause an NPE when parsing the stmt
2023-06-16 16:43:49 +08:00
0bd9aecfcc [Fix](planner&Nereids) fold constant invalid yyyy-mm-31 to the last day of month incorrectly (#20685)
currently, the expression cast('20230631' as date) is incorrectly evaluated to 2023-06-30, while '20230632' evaluates to null. we fix the bug so that every invalid date evaluates to null.
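
The intended behavior can be reproduced outside Doris with `java.time`. This is only a sketch of the semantics, not the planner's folding code: the class and method names are hypothetical. Strict resolution rejects any day that does not exist in the month, so both invalid literals come back as null instead of one being clamped to the last day of the month.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;

// Illustrative only: mimics "invalid date folds to null", not Doris's implementation.
final class StrictDateFoldSketch {
    // "uuuu" (not "yyyy") is needed so STRICT resolution works without an era field.
    private static final DateTimeFormatter STRICT_BASIC_DATE =
            DateTimeFormatter.ofPattern("uuuuMMdd").withResolverStyle(ResolverStyle.STRICT);

    // Returns the parsed date, or null when the literal is not a real calendar date.
    static LocalDate foldToDateOrNull(String literal) {
        try {
            return LocalDate.parse(literal, STRICT_BASIC_DATE);
        } catch (DateTimeException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(foldToDateOrNull("20230630"));
        System.out.println(foldToDateOrNull("20230631"));
        System.out.println(foldToDateOrNull("20230632"));
    }
}
```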
2023-06-16 16:23:02 +08:00
c3b9e99350 [fix](regress-test)update config for disable_nested_complex_type (#20735) 2023-06-16 15:51:41 +08:00
ccfd6f1d23 [fix](nereids) only setHasColocatePlanNode when currentFragment's data partition is not RANDOM (#20885) 2023-06-16 14:54:22 +08:00
1a62c79970 [fix](replica) do not delete the only one replica (#20872) 2023-06-16 13:57:18 +08:00
9d41edd9eb [Feature](binlog) Add binlog gc && Auth master_token (#20854)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-06-16 11:25:11 +08:00
21bb7fea07 [minor](Nereids): convert if condition into a check (#20855) 2023-06-15 20:10:56 +08:00
d9b3c2aba2 [improvement](jdbc) support getting mysql information_schema's tables and clickhouse system's tables (#20768) 2023-06-15 14:53:51 +08:00
Pxl
01e53f4e67 [Bug](materialized-view) fix problems about create mv on ssb_flat q4.1 failed (#20658)
fix problems about create mv on ssb_flat q4.1 failed
2023-06-15 14:38:21 +08:00
31c17f1088 [improvement](tvf)Support hdfs and s3 tvf for nereids (#20829)
Support hdfs and s3 tvf for nereids.
2023-06-15 10:30:09 +08:00
71e8cb061c [Log](query) Add log of fragment instance num for query (#20597)
Co-authored-by: weizuo <weizuo@xiaomi.com>
2023-06-15 09:52:13 +08:00
09d187ec77 [improvement](ck jdbc) Optimized reading of datetime and ip types of the ClickHouse JDBC Catalog (#20804) 2023-06-14 23:28:08 +08:00
f82e43b96a [Improvement](jdbc external table)Support jdbc external table for nereids. (#20799)
The Nereids planner previously supported only the JDBC external catalog; this pr adds support for JDBC external tables in nereids.
2023-06-14 23:25:43 +08:00
7ed03f6b86 [fix](Nereids) EmptySetRelation should be Gather not Any (#20801) 2023-06-14 19:24:33 +08:00
1c9f107185 [feature](nereids) support match syntax (#20781)
Support match syntax in nereids.
The match syntax is used like this:
```sql
select * from test where msg match "hello";
select * from test where msg match_any "hello";
select * from test where msg match_all "hello hi";
select * from test where msg match_phrase "hello world";
```
`match` is the same as `match_any`.

the pr of match syntax in original planner: https://github.com/apache/doris/pull/14211
2023-06-14 17:30:27 +08:00
062641e8f8 [fix](hudi) set default class loader for hudi serializer (#20680)
The hudi serializer `org.apache.hudi.common.util.SerializationUtils$KryoInstantiator.newKryo` throws an error like `java.lang.IllegalArgumentException: classLoader cannot be null`. This fix sets the default class loader for the scan thread.
```
public Kryo newKryo() {
    Kryo kryo = new Kryo();
    ...
    // Thread.currentThread().getContextClassLoader() returns null
    kryo.setClassLoader(Thread.currentThread().getContextClassLoader());
    ...
    return kryo;
}
```
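
A guard of the following shape avoids the null class loader shown above. It is a minimal sketch rather than the change made in the PR: the class and method names are hypothetical, and falling back to the system class loader is an assumption (the commit message does not say which loader is used).

```java
// Illustrative only: hypothetical helper, not the code from the PR.
final class ContextClassLoaderSketch {
    // Make sure the current thread has a non-null context class loader before
    // calling code (such as Kryo setup) that reads it.
    static void ensureContextClassLoader() {
        Thread current = Thread.currentThread();
        if (current.getContextClassLoader() == null) {
            // Assumed fallback: the system class loader.
            current.setContextClassLoader(ClassLoader.getSystemClassLoader());
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().setContextClassLoader(null);
        ensureContextClassLoader();
        System.out.println(Thread.currentThread().getContextClassLoader() != null);
    }
}
```

Calling the helper at the start of the scan thread's work leaves an already-set loader untouched and only fills in the missing one.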
2023-06-14 16:02:56 +08:00
54d42244fe [feature](Nereids) add cbo rewrite framework (#20746)
The changes in this PR:
1. rename BatchRewriteJob to AbstractBatchJobExecutor
2. add a new rewrite job type, CostBasedRewriteJob. It receives a RewriteJob as input, compares the cost of the two candidate plans produced with and without the input RewriteJob, and returns the lower-cost plan as the rewrite result.
3. do some small refactoring on NereidsPlanner for better abstraction
4. do some refactoring on the dir structure of Nereids

The usage of the cbo rewrite framework:
if you want a rule or a rule list to run in the cbo rewrite framework, you just need to wrap the rule / rule list with the costBased function of class Rewriter, for example
```java
...
costBased(
       custom(RuleType.AGG_SCALAR_SUBQUERY_TO_WINDOW_FUNCTION,
                AggScalarSubQueryToWindowFunction::new)
),
...
```
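
The idea behind CostBasedRewriteJob (apply the rewrite, cost both candidates, keep the cheaper plan) can be sketched generically. This is not the Nereids implementation: the class below, its `costBased` signature, and the cost function parameter are hypothetical and use plain functional interfaces in place of RewriteJob and the real cost model.

```java
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

// Illustrative only: a generic stand-in for the cost-based rewrite idea.
final class CostBasedRewriteSketch {
    // Apply the rewrite, then keep whichever candidate plan has the lower cost.
    // Ties keep the original plan (an assumption of this sketch).
    static <P> P costBased(P original, UnaryOperator<P> rewrite, ToDoubleFunction<P> cost) {
        P rewritten = rewrite.apply(original);
        return cost.applyAsDouble(rewritten) < cost.applyAsDouble(original)
                ? rewritten
                : original;
    }

    public static void main(String[] args) {
        // Toy "plans" are strings and the toy cost is their length.
        String chosen = CostBasedRewriteSketch.<String>costBased(
                "scan-join-agg-agg", p -> "scan-join-window", p -> p.length());
        System.out.println(chosen);
    }
}
```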
2023-06-14 15:57:42 +08:00
bcf103e993 [enhancement](log4j) support high performance mode for log4j to escape potential bottleneck for doris read and write (#20759)
In sync mode, log4j2 can sometimes become a bottleneck in the doris fe when many logs are written, while asynchronous logging performs better. We also find that capturing caller location has a similar impact across all logging libraries, and slows down asynchronous logging by about 30-100x. So here we provide three log modes for log4j2 to meet the needs of different users.
refer to https://logging.apache.org/log4j/2.x/performance.html
2023-06-14 15:16:04 +08:00
f707dc9395 [fix](stats) Fix NPE when analyze database sync (#20775) 2023-06-14 15:01:02 +08:00
20ac940711 [Bug](pipeline) fix bug for file scan node on pipeline engine (#20763) 2023-06-14 12:52:56 +08:00
1c394f4964 [Fix](Nereids) insert into table not need unpartitioned as root fragment's data partition (#20737) 2023-06-14 11:57:41 +08:00
8726047f86 [fix](nereids) select text as minimum column unexpected (#20745)
columns of string and text types have width -1, and shouldn't be considered for the minimum-size column
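
A minimal sketch of the selection rule: columns whose width is reported as -1 are skipped when looking for the narrowest column. The `Column` type, the method name, and the fallback to the first column when every width is unknown are assumptions for illustration, not the Nereids code.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical types and names.
final class MinimumColumnSketch {
    static final class Column {
        final String name;
        final int width; // -1 means the width is unknown (string / text)

        Column(String name, int width) {
            this.name = name;
            this.width = width;
        }
    }

    // Pick the narrowest column, ignoring columns with a negative (unknown) width.
    static Column pickMinimumColumn(List<Column> columns) {
        Column best = null;
        for (Column c : columns) {
            if (c.width < 0) {
                continue; // -1 would otherwise always look like the smallest width
            }
            if (best == null || c.width < best.width) {
                best = c;
            }
        }
        // Assumed fallback when no column has a known width.
        return best != null ? best : columns.get(0);
    }

    public static void main(String[] args) {
        List<Column> columns = Arrays.asList(
                new Column("msg", -1), new Column("id", 8), new Column("flag", 1));
        System.out.println(pickMinimumColumn(columns).name);
    }
}
```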
2023-06-14 11:49:22 +08:00
35c19daec7 [opt](routine load) log BE id when get partitions failed. (#20749)
Log the BackendId when getting partitions fails, to make debugging easier.
2023-06-13 19:15:05 +08:00
37db0145b4 [fix](load) fix mysql load parse response npe (#20699) 2023-06-13 18:14:03 +08:00
7636dd1fdc [fix](nereids) always use colocate scan when agg's fragment has olap scan (#20695) 2023-06-13 17:59:17 +08:00
7942bd0bf9 [fix](planner) cast string literal to date like type should not be an implicit cast (#20709)
1. cast string literal to date like type should not be an implicit cast
2. the string representation of float like type should not be scientific notation
3. the data type of like function's regex expr should be string type even if it's a null literal
4. add -Xss4m in fe.conf to prevent stack overflow in some cases
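
Item 2 can be illustrated with the standard library: `Double.toString` switches to scientific notation for large and small magnitudes, while `BigDecimal.toPlainString` never does. The helper below is a sketch under that assumption and is not the code from the PR; its name is hypothetical.

```java
import java.math.BigDecimal;

// Illustrative only: hypothetical helper showing a non-scientific string form.
final class PlainFloatStringSketch {
    static String toPlainString(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            return Double.toString(value); // BigDecimal cannot represent these
        }
        // BigDecimal.valueOf goes through Double.toString, so no binary noise is added.
        return BigDecimal.valueOf(value).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(Double.toString(1.0E10)); // scientific form
        System.out.println(toPlainString(1.0E10));   // plain form
        System.out.println(toPlainString(1.5E-7));
    }
}
```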
2023-06-13 17:57:14 +08:00
0e82c0d7a2 [Fix](Nereids) constant folding for function timestamp() (#20607) 2023-06-13 17:41:58 +08:00
54a7dbeb4d [Refactor](External) Move Common ODBC Methods to JDBC Class and Add Default config to Disable ODBC Creation (#20566)
This PR addresses the refactoring of common methods that were originally located within the ODBC classes, but were used by the JDBC classes. These methods have now been moved to the JDBC classes to improve code readability and maintainability.

In addition, we have disabled the creation of ODBC external tables by default. However, this will not affect the existing usage of ODBC. You can still enable the ODBC external tables through the enable_odbc_table setting. Please be aware that we plan to completely remove the ODBC external tables in future versions, so we recommend using the JDBC Catalog as a priority.
2023-06-13 14:29:04 +08:00
eaa13e66f9 [fix](planner) implement constant folding for function to_monday() (#20708) 2023-06-13 11:40:44 +08:00
ee0e2b40da [Improvement](meta) support return brief info of restore job (#20653) 2023-06-13 10:47:31 +08:00
e28187feb7 [fix](hive) fix NPE of hive meta store client (#20664)
When connecting to the hive meta store fails, an exception is thrown.
But there is a bug where the exception object may not be set, causing an NPE.
2023-06-13 09:41:49 +08:00
57656b2459 [Enhancement](java-udf) java-udf module split to sub modules (#20185)
The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as java-common, java-udf, jdbc-scanner, hudi-scanner, and paimon-scanner.

Co-authored-by: lexluo <lexluo@tencent.com>
2023-06-13 09:41:22 +08:00
73ad885e19 [Feature][Fix](multi-catalog) Implements transactional hive full acid tables. (#20679)
After supporting insert-only transactional hive full acid tables (#19518, #19419), this PR supports transactional hive full acid tables.

Support hive3 transactional hive full acid tables.
Hive2 transactional hive full acid tables need to run major compactions.
2023-06-13 08:55:16 +08:00
939575f5f3 [fix](mtmv)create mtmv failed when not specifying refresh strategy #20696
* fix no refresh error

* add ut
2023-06-13 08:53:24 +08:00
412ca9059e [fix](routine-load) fix stackoverflow bug in routine load (#20704)
When executing a routine load job, a StackOverflowException may occur.
This is because the exprs in the column setting list are analyzed for each routine load sub task,
and there is a self-reference bug that may cause an endless loop when analyzing an expr.

The following columns expr list may trigger this bug:

```
columns(col1, col2,
col2=null_or_empty(col2),
col1=null_or_empty(col2))
```

This fix is verified by user, but I can't add regression test for this case, because I can't submit a routine load job
in our regression test, and this bug can only be triggered in routine load.
2023-06-13 00:07:56 +08:00
565095eb52 [bug](function) fix is_null/is_not_null check is_const has error (#20562)
fix is_null/is_not_null check is_const has error
2023-06-12 18:21:12 +08:00
daf18a4b0e [fix](MTMV) Support refreshing data manually (#20108) 2023-06-12 17:57:06 +08:00
99c0592157 [Feature](array-function) Support array_pushback function #17417 (#19988)
Implement array_pushback.

mysql> select array_pushback([1, 2], 3);
+--------------------------------+
| array_pushback(ARRAY(1, 2), 3) |
+--------------------------------+
| [1, 2, 3]                      |
+--------------------------------+
1 row in set (0.01 sec)
2023-06-12 16:51:12 +08:00
141813b476 [tpcds](nereids) estimate distribution cost by byte size instead of row count (#20642)
this pr impacts tpch q16 Agg strategy, but no performance issue
this pr improves tpcds sf100

before:
cold 141 sec
hot 133 sec

after:
cold 137 sec
hot 128 sec
2023-06-12 16:23:49 +08:00