297 Commits

Author SHA1 Message Date
f1539761e8 [Bugfix](string_functions) rearrange code to avoid global buffer overflow in FindInSetOp::execute (#12677) 2022-09-21 09:19:38 +08:00
c5b6056b7a [fix](lateral_view) fix lateral view explode_split with temp table (#12643)
Problem description:

The following SQL returns a wrong result:
WITH example1 AS ( select 6 AS k1 ,'a,b,c' AS k2) select k1, e1 from example1 lateral view explode_split(k2, ',') tmp as e1;

Wrong result:

+------+------+
| k1   | e1   |
+------+------+
|    0 | a    |
|    0 | b    |
|    0 | c    |
+------+------+
Correct result should be:
+------+------+
| k1   | e1   |
+------+------+
|    6 | a    |
|    6 | b    |
|    6 | c    |
+------+------+
Why?
TableFunctionNode::outputSlotIds does not include column k1.

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-09-21 09:19:18 +08:00
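The behaviour described in the commit above can be illustrated with a small sketch. It is not Doris code: `explode_split` here is a hypothetical Python function, and `output_columns` is an assumed stand-in for the role that `TableFunctionNode::outputSlotIds` plays. It only shows that an outer column such as k1 reaches the result if it is listed among the outputs.

```python
def explode_split(rows, column, delimiter, output_columns):
    """Yield one output row per split element (hypothetical helper, not the Doris API)."""
    for row in rows:
        for element in row[column].split(delimiter):
            # Only columns named in output_columns are copied from the source row.
            out = {name: row[name] for name in output_columns}
            out["e1"] = element
            yield out


# Mirrors: WITH example1 AS (select 6 AS k1, 'a,b,c' AS k2) ...
example1 = [{"k1": 6, "k2": "a,b,c"}]
result = list(explode_split(example1, "k2", ",", output_columns=["k1"]))
```

If `output_columns` omitted `"k1"`, the column would be absent from every output row, which is the analogue of the missing slot reported above.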
7dfbb7c639 [chore](regression-test) add order by column in tpch_sf1_p1/tpch_sf1/nereids/q11.groovy (#12770) 2022-09-20 22:26:24 +08:00
cc072d35b7 [Bug](date) Fix wrong type in TimestampArithmeticExpr (#12727) 2022-09-20 21:08:48 +08:00
954c44db39 [enhancement](Nereids) compare LogicalProperties with output set instead of output list (#12743)
Previously, we compared two LogicalProperties by their output lists. Join reorder changes the order of a join plan's children, which changes the output list, so two join plans were no longer equal in the memo although they should be. We therefore had to add a project on top of the new join to keep the LogicalProperties the same.
This PR changes the equals and hashCode functions of LogicalProperties to compare the set of outputs instead. We no longer need to add the top project, which helps keep the memo simple and efficient.
2022-09-20 10:55:29 +08:00
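A minimal sketch of the idea in the commit above, written in Python rather than the Java used by Nereids. The `Slot` and `LogicalProperties` classes below are simplified assumptions for illustration and do not reproduce the real classes; the point is only that equality and hashing over a set ignore child order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """Hypothetical stand-in for an output slot."""
    name: str


class LogicalProperties:
    """Simplified model: equality and hash depend on the output set, not its order."""

    def __init__(self, output):
        self.output = list(output)

    def __eq__(self, other):
        if not isinstance(other, LogicalProperties):
            return NotImplemented
        return set(self.output) == set(other.output)

    def __hash__(self):
        # frozenset hashing is order-independent, so it is consistent with __eq__.
        return hash(frozenset(self.output))


a, b = Slot("a"), Slot("b")
left_first = LogicalProperties([a, b])   # output order before join reorder
right_first = LogicalProperties([b, a])  # output order after swapping children
```

With list-based comparison these two objects would differ; with set-based comparison they collapse into one entry when used as keys in a hash-based structure.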
ca3e52a0bb [fix](agg)the output of window function's nullability should be consistent with output slot (#12607)
FE may force a window function to output a nullable value in some cases; BE should follow this and change the nullability accordingly.
2022-09-20 09:29:44 +08:00
4f27692898 [fix](inlineview)the inlineview's slots' nullability property is not set correctly (#12681)
The output slots of an inline view may come from a table on the nullable side of an outer join, so they should be nullable.
2022-09-20 09:29:15 +08:00
d68b8cce1a [fix](intersect) fix intersect query failed in row storage code (#12712) 2022-09-19 11:47:50 +08:00
fb9e48a34a [fix](vstream load) Fix bug when load json with jsonpath (#12660) 2022-09-19 10:13:18 +08:00
1fa65708d7 [test](time_add or sub)add time_add and time_sub function case #12641 2022-09-19 09:22:53 +08:00
4669fa54cc [enhancement](test) add tpch_sf100_unique p2 test (#12697) 2022-09-19 09:19:17 +08:00
6d3ae1e69c [regression](left join)Add left join, the left table is empty, the query result is not empty case (#12344)
2022-09-19 08:53:50 +08:00
fa8ed2bccc [fix](array-type) fix the invalid format load for stream load (#12424)
This PR fixes stream load for data that contains an invalid array format.
1. The original file to load:
1 [1, 2, 3]
2 [4, 5, 6]
3 \N
4 [7, \N, 8]
5 10, 11, 12
2. Before this change, the whole load fails when a row has an invalid array format:
[hugo@xafj-palo]$ sh curl_cmd.sh
{
"TxnId": 11035,
"Label": "11c9f111-188e-4616-9a50-aec8b7814513",
"TwoPhaseCommit": "false",
"Status": "Fail",
"Message": "Array does not start with '[' character, found '1'",
"NumberTotalRows": 0,
"NumberLoadedRows": 0,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 55,
"LoadTimeMs": 7,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 3,
"CommitAndPublishTimeMs": 0
}
3. After this change, the load succeeds and returns an error URL that reports the invalid line.
[hugo@xafj-palo]$ sh curl_cmd.sh
{
"TxnId": 11046,
"Label": "249808ee-55f4-4c08-b671-b3d82689d614",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 5,
"NumberLoadedRows": 4,
"NumberFilteredRows": 1,
"NumberUnselectedRows": 0,
"LoadBytes": 55,
"LoadTimeMs": 39,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 19,
"CommitAndPublishTimeMs": 16,
"ErrorURL": "http://10.81.85.89:8502/api/_load_error_log?file=__shard_3/error_log_insert_stmt_8d4130f0c18aeb0a-ad7ffd4233c41893_8d4130f0c18aeb0a_ad7ffd4233c41893"
}

The SQL select result:
MySQL [example_db]> select * from array_test06;
+------+--------------+
| k1   | k2           |
+------+--------------+
|    1 | [1, 2, 3]    |
|    2 | [4, 5, 6]    |
|    3 | NULL         |
|    4 | [7, NULL, 8] |
+------+--------------+
4 rows in set (0.019 sec)

The error URL page shows:
"Reason: Invalid format for array column(k2). src line [10, 11, 12]; "

Issue Number: #7570
2022-09-19 08:52:59 +08:00
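The before/after behaviour in the commit above amounts to filtering a badly formatted row instead of failing the whole load. The sketch below illustrates that in Python; `parse_array` and `load` are hypothetical helpers, the whitespace-separated two-column layout is an assumption about the sample file, and none of this is the Doris stream load implementation.

```python
def parse_array(text):
    """Parse '[1, \\N, 3]' into a list; '\\N' alone means NULL; raise on a bad format."""
    if text == r"\N":
        return None
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"Invalid format for array column. src line [{text}]")
    inner = text[1:-1].strip()
    if not inner:
        return []
    return [None if item.strip() == r"\N" else int(item) for item in inner.split(",")]


def load(lines):
    """Load rows one by one, collecting bad rows as errors instead of aborting."""
    loaded, errors = [], []
    for line in lines:
        key, raw = line.split(maxsplit=1)
        try:
            loaded.append((int(key), parse_array(raw.strip())))
        except ValueError as exc:
            errors.append(str(exc))
    return loaded, errors


source_lines = [
    r"1 [1, 2, 3]",
    r"2 [4, 5, 6]",
    r"3 \N",
    r"4 [7, \N, 8]",
    r"5 10, 11, 12",
]
loaded, errors = load(source_lines)
```

For the five sample rows this yields four loaded rows and one filtered row, matching the NumberLoadedRows and NumberFilteredRows values in the "after" output.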
625ac83f72 [enhancement](test) add opensky cases to p2 (#12693) 2022-09-19 08:38:17 +08:00
fc8f4c787d [enhancement](test) add yandex_metrica cases to p2 (#12692) 2022-09-19 08:37:48 +08:00
e9f105aa1e [enhancement](regression-test) add some p0 cases (#12243) 2022-09-18 17:36:08 +08:00
c30453e9ab [enhancement](regression-test) add ssb_sf100 to p2 cases (#12286) 2022-09-18 17:35:16 +08:00
2e41976b07 update tpch regression test (#12687)
Turn on all TPC-H sf1 test cases except Q2. Q2 causes an infinite loop in join reorder and will be turned on after that is fixed.
2022-09-17 17:06:39 +08:00
3030a3606a [fix](load) fix stream load fail when setting strict mode (#12684) 2022-09-17 17:02:11 +08:00
e01986b8b9 [feature](light-schema-change) fix light-schema-change and add more cases (#12160)
Fix _delete_sign_idx and _seq_col_idx when append_column or build_schema is called during load.
The tablet schema cache now recycles a schema when the use count of its shared pointer equals 1.
Add an HTTP interface for flink-connector to sync DDL.
Improve tablet->tablet_schema() by using max_version_schema.
2022-09-17 11:29:36 +08:00
a4a5dae7dc [enhancement](test) add tpcds_sf100 to p2 cases (#12296) 2022-09-16 17:38:23 +08:00
9d6c199553 [Bug](vec) Fix avg overflow in clickbench (#12621) 2022-09-16 14:43:40 +08:00
8364165e30 [regression_test](testcase) add regression test case from session variable skip_storage_engine_merge, skip_delete_predicate and show_hidden_columns (#12617)
Also add this function to the new OLAP scan node.
2022-09-16 10:33:12 +08:00
380e3695f8 [test](window-function) add cte test in regression of window function #12635 2022-09-16 10:27:50 +08:00
2a063355ad [fix](vstream load) Fix the default value insertion problem when importing json (#12601)
2022-09-16 09:54:45 +08:00
a97f63141e [fix](cast) Add validity check for date conversion for non-vectorization (#12608)
Actual result:
select cast("0.0000031417" as date);
+------------------------------+
| CAST('0.0000031417' AS DATE) |
+------------------------------+
| 2000-00-00                   |
+------------------------------+

Expected result:
select cast("0.0000031417" as date);
+------------------------------+
| CAST('0.0000031417' AS DATE) |
+------------------------------+
| NULL                         |
+------------------------------+
2022-09-16 09:08:53 +08:00
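The commit above adds a validity check so that an impossible date becomes NULL. A rough Python sketch of such a check follows; `checked_date` is a hypothetical helper, and it assumes the conversion has already produced year, month, and day fields. How the Doris parser arrived at 2000-00-00 is not shown in the source and is not modelled here.

```python
import datetime


def checked_date(year, month, day):
    """Return a date, or None (SQL NULL) when the fields are not a valid calendar date."""
    try:
        return datetime.date(year, month, day)
    except ValueError:
        # Month 0 or day 0, as in 2000-00-00, is rejected by datetime.date.
        return None


invalid = checked_date(2000, 0, 0)
valid = checked_date(2022, 9, 16)
```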
5b6d48ed5b [feature](nereids) support distinct count (#12159)
Support distinct count with a GROUP BY clause.
For example:
SELECT count(distinct c_custkey + 1) FROM customer group by c_nation;

TODO: support distinct count without group by clause.
2022-09-15 13:01:47 +08:00
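To make the semantics of the example query in the commit above concrete, here is a small Python sketch of a grouped distinct count. The function name and the sample rows are invented for illustration, and SQL NULL handling (count distinct ignores NULLs) is not modelled.

```python
from collections import defaultdict


def count_distinct_by_group(rows, group_key, expr):
    """Count distinct values of expr(row) within each group (illustrative only)."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[group_key]].add(expr(row))
    return {group: len(values) for group, values in seen.items()}


# Made-up sample data; column names follow the query in the commit message.
customer = [
    {"c_custkey": 1, "c_nation": "CN"},
    {"c_custkey": 1, "c_nation": "CN"},
    {"c_custkey": 2, "c_nation": "CN"},
    {"c_custkey": 3, "c_nation": "US"},
]
counts = count_distinct_by_group(customer, "c_nation", lambda r: r["c_custkey"] + 1)
```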
beeb0ef3eb [Bug](lead) fix wrong child expression of lead function (#12587) 2022-09-15 08:44:18 +08:00
d4cb0bbdd5 [test](nereids) Add TPC-H regression test cases for nereids (#12600)
Disable some test cases that cannot run successfully yet. They will be enabled once the corresponding bugs are fixed.
2022-09-14 22:37:56 +08:00
8448867bed [regression-test](window-function) add big table in regression of window function #12562 2022-09-14 08:43:24 +08:00
56b2fc43d4 [enhancement](array-type) shrink column suffix zero for type ARRAY<CHAR> (#12443)
At the compute level, the CHAR type shrinks suffix zeros.
To keep the logic consistent with the CHAR type, we also shrink them for ARRAY or ARRAY<ARRAY> types.

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-09-13 23:24:48 +08:00
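A sketch of what the commit above describes, under the assumption that "suffix zeros" means trailing NUL padding on fixed-length CHAR values. `shrink_padding` is a hypothetical Python helper, not the Doris column code; it applies the same trimming recursively so nested arrays are covered as well.

```python
def shrink_padding(value):
    """Strip trailing NUL padding from strings, recursing into nested lists."""
    if isinstance(value, list):
        return [shrink_padding(item) for item in value]
    if isinstance(value, str):
        return value.rstrip("\0")
    # NULL elements (None) and other types pass through unchanged.
    return value


padded_char = "ab\0\0"
padded_nested = [["a\0", None], ["bc\0\0"]]
```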
58508aea13 [enhance](information_schema) show hll type and bitmap type instead of unknown (#12519)
Before this pr, when querying data type of hll/bitmap column, 'unknown' would be returned instead of the correct data type of queried column.
2022-09-13 19:43:42 +08:00
6b52e47805 [fix](agg)the intermediate slots should be materialized as output slots (#12441)
In some cases, the output slots of the agg info are materialized by calling SlotDescriptor's materializeSrcExpr method, but the intermediate slots are not. This PR sets the materialized info of the intermediate slots to keep them consistent with the output slots.
2022-09-13 11:28:27 +08:00
353f9e3782 [regression](json) add a nullable case for stream load with json format (#12505) 2022-09-13 10:45:01 +08:00
97cb095010 [test](join)add test join case4 #12508 2022-09-13 09:09:49 +08:00
8be5527be4 [test](join)add some join cases (#12501) 2022-09-13 08:59:32 +08:00
4c73755b40 [test](window-function) add regression test of window function (#12529) 2022-09-13 08:58:19 +08:00
a6a378c9ca [fix](regression-test) remove 2 regression cases for nereids temporarily which blocked the pipeline (#12517)
Removed the following cases from the regression suite nereids_syntax_p0/sub_query_correlated:
1. qt_not_exists_unCorrelated
2. qt_not_exist_uncorr
2022-09-09 22:20:35 +08:00
2b62ac2fef [Feature](Nereids) Main framework for selecting rollup index. (#12464)
# Proposed changes
First step of #12303 

## Problem summary

This is the first step for supporting rollup index selection for aggregate/unique key OLAP table.

This PR aims to select a rollup index when an aggregate node is present and the aggregate function matches the value type, so pre-aggregation is turned on by default. Cases where pre-aggregation should be turned off will be addressed in the next PR.

Main steps for rollup index selection: 

1. Keep only the rollup indexes that contain all the required columns.
2. Keep only the rollup indexes that match the longest key prefix.
3. Order the remaining rollup indexes by row count, column count, and rollup index id.

TODO remaining:
1. Address cases where pre-aggregation should be turned off. (next PR)
2. Add more test cases.

Refactor
- Add `Project.getSlotToProducer` to extract a map from the project output slot to its producing expression.
- Add `Filter.getConjuncts` to split the filter condition to conjunctive predicates.
- Move the usage of `ExpressionReplacer` to `ExpressionUtils.replace(expr, replaceMap)` to simplify the code.
2022-09-09 18:14:31 +08:00
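The three selection steps listed in the commit above can be sketched as follows. This is a Python illustration, not the Nereids implementation: `RollupIndex`, `prefix_match_length`, and `select_rollup` are hypothetical names, and "matching the key prefix" is assumed to mean the number of leading key columns that appear in the query's predicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollupIndex:
    """Hypothetical description of a rollup index."""
    index_id: int
    key_columns: tuple    # ordered key columns
    value_columns: tuple
    row_count: int

    @property
    def columns(self):
        return self.key_columns + self.value_columns


def prefix_match_length(index, predicate_columns):
    """Count leading key columns that are used in predicates (assumed definition)."""
    count = 0
    for column in index.key_columns:
        if column not in predicate_columns:
            break
        count += 1
    return count


def select_rollup(indexes, required_columns, predicate_columns):
    # Step 1: keep indexes that contain every required column.
    candidates = [i for i in indexes if set(required_columns) <= set(i.columns)]
    if not candidates:
        return None
    # Step 2: keep indexes with the longest matching key prefix.
    best = max(prefix_match_length(i, predicate_columns) for i in candidates)
    candidates = [i for i in candidates if prefix_match_length(i, predicate_columns) == best]
    # Step 3: order by row count, column count, index id and take the first.
    return min(candidates, key=lambda i: (i.row_count, len(i.columns), i.index_id))


base = RollupIndex(1, ("k1", "k2", "k3"), ("v1",), 1000)
r1 = RollupIndex(2, ("k1", "k2"), ("v1",), 100)
r2 = RollupIndex(3, ("k2",), ("v1",), 50)
chosen = select_rollup([base, r1, r2], {"k1", "v1"}, {"k1"})
```

In this example `r2` is dropped in step 1 because it lacks k1, `base` and `r1` tie in step 2, and `r1` is chosen in step 3 because it has fewer rows.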
dc7e5ca039 [fix](nereids) uncorrelated subquery can't get the correct result (#12421)
Executing an uncorrelated subquery currently reports an error that the corresponding column cannot be found.
The reason is that the tupleID of the child obtained in visitPhysicalNestedLoopJoin is not consistent with the child.

An uncorrelated subquery triggers this bug because it uses crossJoin.
This PR also adds subquery regression tests for uncorrelated and complex scenarios.

Co-authored-by: morrySnow <morrysnow@126.com>
2022-09-09 18:08:34 +08:00
b1db8aef58 [regression](array-type) add some case for array insert (#12474)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-09-09 11:18:06 +08:00
a536030979 [FOLLOWUP](load) fix nullable and add regression (#12375)
2022-09-08 00:05:04 +08:00
09b45f2b71 [Function](ELT)Add elt function (#12321) 2022-09-07 15:21:08 +08:00
46776af2a3 [fix](Nereids)plan translator lost other conjuncts on hash join node (#12391)
In the earlier PR #11812, we split the join condition into two parts: hash join conjuncts and other conditions. But we forgot to translate the other conditions into other conjuncts in the HashJoinNode of the legacy planner. So we get a wrong result if the query has other conditions on the join node. For example:

SELECT * FROM lineorder INNER JOIN part ON lo_partkey = p_partkey WHERE lo_orderkey > p_size;
2022-09-07 11:32:05 +08:00
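Why dropping the other conjuncts gives a wrong result can be seen in a toy hash join. The Python below is a sketch under assumptions: `hash_join` is a hypothetical function, the sample rows are made up, and it models only the split between equi-join keys and residual predicates, not the Doris HashJoinNode.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key, other_conjuncts=()):
    """Inner hash join: match on the key, then apply residual (other) conjuncts."""
    table = defaultdict(list)
    for r in right:
        table[r[right_key]].append(r)
    for l in left:
        for r in table.get(l[left_key], []):
            joined = {**l, **r}
            # Residual predicates are checked on each key-matched pair.
            if all(pred(joined) for pred in other_conjuncts):
                yield joined


# Made-up rows using the column names from the query in the commit message.
lineorder = [
    {"lo_orderkey": 10, "lo_partkey": 1},
    {"lo_orderkey": 2, "lo_partkey": 1},
]
part = [{"p_partkey": 1, "p_size": 5}]

with_other = list(hash_join(
    lineorder, part, "lo_partkey", "p_partkey",
    other_conjuncts=[lambda row: row["lo_orderkey"] > row["p_size"]],
))
without_other = list(hash_join(lineorder, part, "lo_partkey", "p_partkey"))
```

With the residual predicate, only the row with lo_orderkey 10 survives; without it, both key-matched rows are returned, which is the kind of wrong result the commit describes.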
449d0c219f [Improvement](sort) Accumulate blocks to do partial sort (#12336) 2022-09-07 10:34:28 +08:00
a465549f5e [feature](Nereids)support parse and analyze having clause (#12129)
Implement the having clause for Nereids Planner.

NOTE:

This PR only aims to make the Nereids Planner generate the correct logical plan and physical plan. Runtime correctness is not a goal of this PR because GROUP BY is not yet ready in the Nereids Planner.
2022-09-07 09:47:03 +08:00
772e5907f2 [enhancement](test) add some p0 cases (#12240) 2022-09-07 09:10:42 +08:00
b8cc576cba [fix](array-type) add data valid check for ARRAY type while insert or load (#12283)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-09-06 20:48:58 +08:00
4e95b3afaf [test](nereids) add subquery regression Testing (#12372)
Added regression tests for subqueries. Currently only correlated subqueries are covered. Uncorrelated subqueries will be added after project revision.
2022-09-06 16:37:17 +08:00
2019cf9406 [regression](test) add tpcds sf1 unique test (#12268) 2022-09-06 10:12:00 +08:00