Commit Graph

1306 Commits

Author SHA1 Message Date
fe18cfa2fb [improvement](pg jdbc)Support for automatically obtaining the precision of the postgresql timestamp type (#20909) 2023-06-16 23:41:09 +08:00
367f64e7bd [improvement](jdbc) support insert autoinc and default value column to mysql (#20765)
In JdbcMysqlClient, I've added methods to retrieve auto-increment and default value columns from MySQL. These columns are then mapped into Doris metadata to make them visible to users.

When handling the InsertStmt into an execution plan, Doris used to automatically fill in NULL or default values for columns not specified in the InsertStmt. However, in the JDBC catalog, we don't need Doris to handle these unspecified columns, so I've made changes to skip them directly.

For the insert prepared statement required for writing, our previous behavior was to obtain all columns for placeholders. So, the change I made is to pass in the columns processed by the execution plan during the sink task generation stage for dynamic generation.
2023-06-16 23:38:11 +08:00
e834637a5b [improvement](ck jdbc) Support for automatically getting the precision of clickhouse's datetime64 type (#20887) 2023-06-16 23:37:30 +08:00
bf197ee8d2 [opt](nereids) adjust cost model for BroadCastJoin and PartitionJoin (#20713)
we add penalty for broadcast join (bc for brief in the following).
the intuition of penalty is as follow:
1. if the build side is very small (< 1M), we prefer bc, and set `penalty=1`, which means no penalty
2. if build side is more than 1M, we consider the ratio of the probe row count to the build row count. the less the ratio is, the higher penalty is.

this pr has positive impact on tpch queries. Only q3 is changed. in out test (tpch 1T, 3BE) q3 improved from 5.1sec to 2.5 sec.
this pr has positive impact on tpcds queries. test on tpcds sf100 (3BE), cold run improve from 163 sec to 156 sec, hot run improves from 155 sec to 149 sec
2023-06-16 22:49:04 +08:00
5dc0f90c7f [opt](Nereids) revert convert IN with 2 options to OR expression rule (#20894)
revert this rule because it has negative effect on predicate push-down-to-storage-layer
2023-06-16 19:11:37 +08:00
5573858cb4 [Enhance](regression-test) use another db than test_query_db for nereids_p0 (#19467)
replace test_query_db to nereids_test_query_db in nereids_p0 directory
split test_join into five files to run faster.
2023-06-16 16:11:14 +08:00
97135a1cbb [Feature] (json)add json_contains function (#20824) 2023-06-16 15:10:12 +08:00
179b933ccc [test](regression) update some case in p2 (#20683) 2023-06-16 10:09:22 +08:00
d9b3c2aba2 [improvement](jdbc) support support get mysql information_schema's table and clickhouse system's table (#20768) 2023-06-15 14:53:51 +08:00
Pxl
01e53f4e67 [Bug](materialized-view) fix problems about create mv on ssb_flat q4.1 failed (#20658)
fix problems about create mv on ssb_flat q4.1 failed
2023-06-15 14:38:21 +08:00
09d187ec77 [improvement](ck jdbc) Optimized reading of datetime and ip types of the ClickHouse JDBC Catalog (#20804) 2023-06-14 23:28:08 +08:00
Pxl
a0d4f11667 [Bug](function) catch error state in function cast to avoid core dump (#20751)
catch error state in function cast to avoid core dump
2023-06-14 17:34:34 +08:00
1c9f107185 [feature](nereids) support match syntax (#20781)
Support match syntax in nereids.
match syntax use like:
```sql
select * from test where msg match "hello";
select * from test where msg match_any "hello";
select * from test where msg match_all "hello hi";
select * from test where msg match_phrase "hello world";
```
`match` is same as `match_any`.

the pr of match syntax in original planner: https://github.com/apache/doris/pull/14211
2023-06-14 17:30:27 +08:00
affe36d32e [test](find_in_set) add find_in_set function test case (#20718) 2023-06-14 09:43:48 +08:00
7636dd1fdc [fix](nereids) always use colocate scan when agg's fragment has olap scan (#20695) 2023-06-13 17:59:17 +08:00
7942bd0bf9 [fix](planner) cast string literal to date like type should not be an implict cast (#20709)
1. cast string literal to date like type should not be an implict cast
2. the string representation of float like type should not be scientific notation
3. the data type of like function's regex expr should be string type even if it's a null literal
4. add -Xss4m in fe.conf to prevent stack overflow in some case
2023-06-13 17:57:14 +08:00
feb21fc9e9 [fix](group_concat) use default seperator ',' instead of ', ' for group_concat, to be consistant with mysql (#20741) 2023-06-13 17:20:29 +08:00
2adf5169e6 [improvement](test) improve p2 case of githubevents (#20727)
Check rows of github_events table after restore finish.
2023-06-13 14:31:24 +08:00
73ad885e19 [Feature][Fix](multi-catalog) Implements transactional hive full acid tables. (#20679)
After supporting insert-only transactional hive full acid tables #19518, #19419, this PR support transactional hive full acid tables.

Support hive3 transactional hive full acid tables.
Hive2 transactional hive full acid tables need to run major compactions.
2023-06-13 08:55:16 +08:00
c25c19bddc [test](regression) Add cases to test join condition push and not like (#20453)
Add testing cases to issue #19613
2023-06-12 18:26:23 +08:00
99c0592157 [Feature](array-function) Support array_pushback function #17417 (#19988)
Implement array_pushback.

mysql> select array_pushback([1, 2], 3);
+--------------------------------+
| array_pushback(ARRAY(1, 2), 3) |
+--------------------------------+
| [1, 2, 3]                      |
+--------------------------------+
1 row in set (0.01 sec)
2023-06-12 16:51:12 +08:00
141813b476 [tpcds](nereids) estimate distribution cost by byte size instead of row count (#20642)
this pr impacts tpch q16 Agg strategy, but no performance issue
this pr improves tpcds sf100

before:
cold 141 sec
hot 133 sec

after:
code 137 sec
hot 128 sec
2023-06-12 16:23:49 +08:00
0b228b3414 [fix](load)Support load json data with default value (#20624)
* support json default value

---------

Co-authored-by: duanxujian <duanxujian@jd.com>
2023-06-12 14:51:31 +08:00
10134ea8c6 [fix](planner) fix RewriteInPredicateRule may be useless (#20668)
Issue Number: close #20669

RewriteInPredicateRule may cast InPredicate expr's two child to the same type, for example: where cast(age as char) in ('11'), the type of age is int, RewriteInPredicateRule will cast expr's two child type to int. As in the example above, child 0 will be such struct: 
```
child 0: type: int
    |---  child: type : char
            |-- child: type : int
```

Due to the RewriteInPredicateRule cast the type of the expr to int, it will reanalyze stmt, but it will reset stmt first before reanalyze the stmt, and reset opt will change child 0 to such struct:
```
child: type : char
    |-- child: type : int
```
It cause two child's type will be cast to varchar in func castAllToCompatibleType, the logic of RewriteInPredicateRule will be useless.

In 1.1-lts and 1.2-lts, such case  " where cast(age as char) in ('11')"  can't work well,  because func castAllToCompatibleType will cast int to char but int can't cast to char(master can work well because func castAllToCompatibleType will cast int to varchar in such case).
```
MySQL [test]> select user_id from test_cast where cast(age as char) in ('45');
ERROR 1105 (HY000): errCode = 2, detailMessage = type not match, originType=INT, targeType=CHAR(*)
```
2023-06-12 14:39:01 +08:00
Pxl
7f8c5c81e7 [Feature](agg_state) support agg_state combinator on nereids (#20164)
support agg_state combinator on nereids
2023-06-12 12:49:26 +08:00
bcc37c9405 [fix](planner)the common type of floating and decimal should be floating type (#20634)
* [fix](planner)the common type of floating and decimal should be floating type

* fix test cases
2023-06-12 11:32:23 +08:00
4c340f2851 [Feature] (Multi-Catalog) support query hll column in doris jdbc table - part 1 (#19413)
Issue Number: close #17895
2023-06-12 11:16:19 +08:00
a347063390 [fix](case expr) fix coredump of case for null value 2 (#20635)
fix coredump of case for null value 2
2023-06-11 23:08:53 +08:00
987b29ded5 [fix](nereids)avoid to derive rowCount NaN (#20523)
the formula used to compute ndv after filter implies that the new rowCount is smaller than the original rowCount. When we apply this formula to join, we should add branch if new row count is bigger than original row count.
when new row count is bigger, the ndv is not changed.
2023-06-10 15:40:14 +08:00
def6a8ec94 [regression](nereids) check tpch sf1T and sf500 plan shape on 3 BE environment #20610 2023-06-09 22:46:40 +08:00
656b9ad3da [enhancement](index) Nereids support no need to read raw data for index column that only in filter conditions (#20605) 2023-06-09 21:54:48 +08:00
54504fb61d [opt](Nereids) remove running in OptimizeGroup to avoid recompute on it parent (#20608)
we have some prunning path logical in cascades framework. However it do not work as we expected. if we do prunning on one Group, then maybe we need to do thousands of times optimization on its parent without any success result. This PR remove these prunning provisionally. We will add prunning back when we re-design it.
2023-06-09 19:16:39 +08:00
44e20d9087 [feature](Nereids): push down alias into union outputs. (#20543) 2023-06-09 11:53:44 +08:00
4c6df9062e [fix](DECIMALV3)fix cumulative precision when literal and DECIMALV3 operations in Legacy (#20354)
The precision handling for division with DECIMALV3 is as follows (excluding cases where division increases precision):

(p1, s1) / (p2, s2) ----> (p1 + s2, s1)

However, due to precision loss in division, it is considered to increase the precision of the left operand:

(p1, s1) / (p2, s2) =====> (p1 + s2, s1 + s2) / (p2, s2) ----> (p1 + s2, s1)

However, the legacy optimizer repeats the analyze and substitute steps for an expression, which can result in the accumulation of precision:

(p1, s1) / (p2, s2) =====> (p1 + s2, s1 + s2) / (p2, s2) =====> (p1 + s2 + s2, s1 + s2 + s2) / (p2, s2)

To address this, the previous approach was to forcibly convert the left operand of DECIMALV3 calculations. This results in rewriting the expression as:

(p1, s1) / (p2, s2) =====> cast((p1, s1) as (p1 + s2, s1 + s2)) / (p2, s2)

Then, during the substitution step, a check is performed. If it is a cast expression, the expression modified by the cast is extracted:

cast((p1, s1) as (p1 + s2, s1 + s2)) =====> (p1, s1)

protected Expr substituteImpl(ExprSubstitutionMap smap, ExprSubstitutionMap disjunctsMap, Analyzer analyzer) {
        if (isImplicitCast()) {
            return getChild(0).substituteImpl(smap, disjunctsMap, analyzer);
        }
This way, there won't be repeated analysis, preventing the continuous increase in precision. However, if the left expression is a constant (literal), theoretically, the precision would continue to increase. Unfortunately, the code that was removed in this PR (#19926) obscured this issue.

for (Expr child : children) {
    if (child instanceof DecimalLiteral && child.getType().isDecimalV3()) {
      ((DecimalLiteral)child).tryToReduceType();
    }
}
An attempt will be made to reduce the precision of literals in the expressions. However, this code snippet can cause such a bug.

mysql [test]>select cast(1 as DECIMALV3(16, 2)) /  cast(3 as DECIMALV3(16, 2));
+-----------------------------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / CAST(3 AS DECIMALV3(16, 2)) |
+-----------------------------------------------------------+
|                                                      0.00 |
+-----------------------------------------------------------+
1.00 / 3.00, due to reduced precision, becomes 1 / 3.
<--Describe your changes.-->
2023-06-09 08:58:55 +08:00
845d459f05 [Fix](orc-reader) Fix some bugs of orc lazy materialization. (#20410)
Fix some bugs of orc lazy materialization(#18615)
- Fix issue causing column size to continuously increase after `execute_conjuncts()` by calling `Block::erase_useless_column()`.
- Fix partition issues of orc lazy materialization. 
- Fix lazy materialization will not be used when the predicate column is inconsistent with the orc file.
2023-06-09 08:53:01 +08:00
dd71e101d3 [fix](case expr) fix coredump of case for null value (#20564)
be coredump when when expr is null:
2023-06-08 20:05:23 +08:00
4faee4d8fd [Fix](multi-catalog) Fix be crashed when query hive table after schema changed(new column added). (#20537)
Fix be crashed when query hive table after schema changed(new column added).

Regression Test: test_hive_schema_evolution.groovy
2023-06-08 18:10:36 +08:00
Pxl
a56449f86e [Bug](Agg-state) try to make test_agg_state stable (#20574)
try to make test_agg_state stable
2023-06-08 17:17:51 +08:00
Pxl
5fe7106b83 [Bug](planner) fix pre condition check fail on max(null) (#20509)
fix pre condition check fail on max(null)
2023-06-08 14:49:52 +08:00
24fb05ec83 [Bug](row-store) Fix row store with materialize index (#20356)
If a query hits a materialized view that has row storage enabled, but the row storage column is not present in the materialized view, it will result in a query crash. Therefore, it is necessary to include the row storage column when creating the materialized view, and serialize the row storage column during the execution of SchemaChange.
2023-06-08 10:55:22 +08:00
d2d6ce5d0b [fix](nereids) add push down filter and project through cte anchor rules (#20547)
we should not plan any Filter or Project above CteAnchor, because there are project or filter under anchor sometimes.
and the whole plan can not translate to a valid plan for BE.
2023-06-08 10:34:42 +08:00
187bf14d81 [feature-wip](auto-inc)(step-1) add syntax support for duplicate table (#20284)
Co-authored-by: yifeng <cnissnzg@126.com>
2023-06-07 22:01:28 +08:00
Pxl
36216f0925 [Bug](Agg-State) fix coredump when state combinator input const column (#20510)
fix coredump when state combinator input const column
2023-06-07 11:28:55 +08:00
cd70c37402 [fix](nereids) filter and project node should be pushed down through cte (#20508)
1.move PushdownFilterThroughCTEAnchor and PushdownProjectThroughCTEAnchor into PUSH_DOWN_FILTERS rule set
2.move PushdownFilterThroughProject before MergeProjectPostProcessor
2023-06-07 10:36:32 +08:00
49f8f20fb1 [fix](regex) String with Chinese characters matching failed (#20493) 2023-06-07 07:27:47 +08:00
05bdbce8fc [Feature](Nereids) support update unique table statement (#20313) 2023-06-06 20:32:43 +08:00
ae428c29e2 [feature](planner)(nereids) support user defined variable (#20334)
Support user-defined variables.
After this PR, we can use `set @a = xx` to define a user variable and use it in the query like `select @a`.

the changes of this PR:
1. Support the grammar for `set user variable` in the parser.
2. Add the `userVars` in `VariableMgr` to store the user-defined variables.
3. For the `set @a = xx`, we will store the variable name and its value in the `userVars` in `VariableMgr`.
4. For the `select @a`, we will get the value for the variable name in `userVars`.
2023-06-06 14:35:16 +08:00
1f032a551d [Improve](array-functions) support array first function (#20397)
add array_first(lambda, [1,2,3,null]) function for doris
2023-06-06 12:08:46 +08:00
1b94b6368f [fix](load) in strict mode, return error for insert if datatype convert fails (#20378)
* [fix](load) in strict mode, return error for load and insert if datatype convert fails

Revert "[fix](MySQL) the way Doris handles boolean type is consistent with MySQL (#19416)"

This reverts commit 68eb420cabe5b26b09d6d4a2724ae12699bdee87.

Since it changed other behaviours, e.g. in strict mode insert into t_int values ("a"),
it will result 0 is inserted into table, but it should return error instead.

* fix be ut

* fix regression tests
2023-06-06 12:04:03 +08:00
e553615a27 [opt](Nereids) perfer use datev2 / datetimev2 in date related functions (#20224)
1. update all date related functions' signatures order. 
1.1. if return value need to be compute with time info, args with datetimev2 at the top of the list, followed by datev2, datetime and date
1.2. if return value need to be compute with only date info, args with datev2 at the top of list, followed by datetimev2, date and datetime
2. Priority for use datev2, if we must cast date to datev2 or datetime/datetimev2
2023-06-06 11:42:29 +08:00