347 Commits

Author SHA1 Message Date
33f5a86e69 [fix](array-type) forbid to create materialized view for array column (#12543)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-09-15 11:08:23 +08:00
beeb0ef3eb [Bug](lead) fix wrong child expression of lead function (#12587) 2022-09-15 08:44:18 +08:00
d4cb0bbdd5 [test](nereids) Add TPC-H regression test cases for nereids (#12600)
Disabled some test cases that cannot run successfully yet. They will be re-enabled once the corresponding bugs are fixed.
2022-09-14 22:37:56 +08:00
3130a19fe9 [feature](regression) Enhancement regression frame, support http post… (#12565) 2022-09-14 15:31:59 +08:00
3543f85ae5 [feature](nereids) merge push down and remove redundant operator rules into one batch (#12569)
1. For some related rules, we need to execute them together to get the expected plan.
2. Piggyback: add session variables to avoid falling back to the stale (legacy) planner when running the Nereids regression tests.
2022-09-14 14:37:36 +08:00
8448867bed [regression-test](window-function) add big table in regression of window function #12562 2022-09-14 08:43:24 +08:00
56b2fc43d4 [enhancement](array-type) shrink column suffix zero for type ARRAY<CHAR> (#12443)
At the compute level, the CHAR type shrinks trailing (suffix) zeros.
To keep the logic the same as for the CHAR type, we also shrink them for ARRAY and nested ARRAY<ARRAY> types.
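
As a rough illustration of the shrinking described above, here is a minimal Python sketch. It assumes CHAR values are zero-padded byte strings and models arrays as nested Python lists; `shrink_char` and `shrink_nested` are hypothetical names, not functions from the Doris code base.

```python
def shrink_char(value: bytes) -> bytes:
    """Strip trailing zero bytes from one zero-padded CHAR value."""
    return value.rstrip(b"\x00")


def shrink_nested(value):
    """Apply shrink_char to every CHAR value in an arbitrarily nested array.

    Assumption: arrays are modelled as Python lists and NULL as None.
    """
    if value is None:
        return None
    if isinstance(value, list):
        return [shrink_nested(item) for item in value]
    return shrink_char(value)
```

Only trailing zero bytes are removed; a zero byte in the middle of a value is kept.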

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-09-13 23:24:48 +08:00
58508aea13 [enhance](information_schema) show hll type and bitmap type instead of unknown (#12519)
Before this PR, querying the data type of an hll/bitmap column returned 'unknown' instead of the column's actual data type.
2022-09-13 19:43:42 +08:00
6b52e47805 [fix](agg)the intermediate slots should be materialized as output slots (#12441)
In some cases, the output slots of the agg info are materialized by calling SlotDescriptor's materializeSrcExpr method, but the intermediate slots are not. This PR sets the materialized info of the intermediate slots to keep them consistent with the output slots.
2022-09-13 11:28:27 +08:00
353f9e3782 [regression](json) add a nullable case for stream load with json format (#12505) 2022-09-13 10:45:01 +08:00
97cb095010 [test](join)add test join case4 #12508 2022-09-13 09:09:49 +08:00
8be5527be4 [test](join)add some join cases (#12501) 2022-09-13 08:59:32 +08:00
4c73755b40 [test](window-function) add regression test of window function (#12529) 2022-09-13 08:58:19 +08:00
fc605779ed [fix](array-type) support to export the array type to hdfs (#12504)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-09-12 10:23:33 +08:00
efd2bdb203 [improvement](new-scan) avoid too many scanner context scheduling (#12491)
When selecting a large amount of data from a table, the profile shows:

- ScannerCtxSchedCount: 2.82664M(2826640)

However, ScannerSchedCount is only 8, because most of the scanners are busy running.
After this improvement, ScannerCtxSchedCount is reduced to only 10.
2022-09-12 10:22:54 +08:00
a6a378c9ca [fix](regression-test) remove 2 regression cases for nereids temporarily which blocked the pipeline (#12517)
Removed the following cases from the regression suite nereids_syntax_p0/sub_query_correlated:
1. qt_not_exists_unCorrelated
2. qt_not_exist_uncorr
2022-09-09 22:20:35 +08:00
2b62ac2fef [Feature](Nereids) Main framework for selecting rollup index. (#12464)
# Proposed changes
First step of #12303 

## Problem summary

This is the first step for supporting rollup index selection for aggregate/unique key OLAP table.

This PR selects a rollup index when an aggregate node is present and the aggregate function matches the value type, so pre-aggregation is turned on by default. Cases where pre-aggregation should be turned off will be addressed in the next PR.

Main steps for rollup index selection: 

1. Keep only the rollup indexes that contain all the required columns.
2. Among those, keep the rollup indexes that match the key prefix the most.
3. Order the remaining rollup indexes by row count, column count, and rollup index id.
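
The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: `RollupIndex`, `matched_prefix_len`, and `select_rollup` are hypothetical names, "matching the key prefix" is modelled as the number of leading key columns that appear in the predicates, and the ordering is assumed to be ascending.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollupIndex:
    index_id: int
    key_columns: tuple   # key columns, in key order
    columns: frozenset   # all columns of the index (keys and values)
    row_count: int


def matched_prefix_len(index, predicate_columns):
    """Count the leading key columns that are used in predicates (assumed metric)."""
    matched = 0
    for column in index.key_columns:
        if column not in predicate_columns:
            break
        matched += 1
    return matched


def select_rollup(indexes, required_columns, predicate_columns):
    # Step 1: keep indexes that contain every required column.
    candidates = [i for i in indexes if set(required_columns) <= i.columns]
    if not candidates:
        return None
    # Step 2: keep indexes with the longest matched key prefix.
    best = max(matched_prefix_len(i, predicate_columns) for i in candidates)
    candidates = [i for i in candidates
                  if matched_prefix_len(i, predicate_columns) == best]
    # Step 3: order by row count, column count, index id (ascending is an assumption).
    return min(candidates, key=lambda i: (i.row_count, len(i.columns), i.index_id))
```

With a base index and a smaller rollup that both cover the required columns and match the same key prefix, the sketch picks the one with fewer rows.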

TODO remaining:
1. address cases where pre-aggregation should be turned off. (next PR)
2. add more test cases. 

Refactor
- Add `Project.getSlotToProducer` to extract a map from the project output slot to its producing expression.
- Add `Filter.getConjuncts` to split the filter condition into conjunctive predicates.
- Move the usage of `ExpressionReplacer` to `ExpressionUtils.replace(expr, replaceMap)` to simplify the code.
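
Splitting a filter condition into conjunctive predicates, as `Filter.getConjuncts` is described as doing, amounts to flattening a tree of ANDs. The Python sketch below shows the general idea only; the `And` class and `extract_conjuncts` are hypothetical and do not mirror the actual Java implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class And:
    """A binary AND node; leaves are modelled as plain strings (assumption)."""
    left: object
    right: object


def extract_conjuncts(expr):
    """Flatten nested AND nodes into a left-to-right list of predicates."""
    if isinstance(expr, And):
        return extract_conjuncts(expr.left) + extract_conjuncts(expr.right)
    return [expr]
```

For example, `extract_conjuncts(And(And("a > 1", "b = 2"), "c < 3"))` yields the three leaf predicates in order.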
2022-09-09 18:14:31 +08:00
dc7e5ca039 [fix](nereids) uncorrelated subquery can't get the correct result (#12421)
Currently, executing an uncorrelated subquery reports an error saying that the corresponding column cannot be found.
The reason is that the tupleID of the child obtained in visitPhysicalNestedLoopJoin is not consistent with the child.

An uncorrelated subquery triggers this bug because it uses crossJoin.
This PR also adds subquery regression tests for uncorrelated and complex scenarios.

Co-authored-by: morrySnow <morrysnow@126.com>
2022-09-09 18:08:34 +08:00
b1db8aef58 [regression](array-type) add some case for array insert (#12474)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-09-09 11:18:06 +08:00
9225dd16ca [fix](grouping sets) grouping sets cause be core or return wrong results (#12313) 2022-09-08 14:55:50 +08:00
74ffdbeebc [feature](Nereids) Support OneRowRelation and EmptyRelation (#12416)
Support OneRowRelation and EmptyRelation.

OneRowRelation: `select 100, 'abc', substring('abc', 1, 2)`
EmptyRelation: `select * from tbl limit 0`

Note:
PhysicalOneRowRelation will translate to UnionNode(constExpr) for BE execution
2022-09-08 12:21:13 +08:00
a536030979 [FOLLOWUP](load) fix nullable and add regression (#12375)
2022-09-08 00:05:04 +08:00
184be8d13c [fix](array-type) ARRAY is not supported in bloomfilter index (#12353)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-09-07 18:00:01 +08:00
09b45f2b71 [Function](ELT)Add elt function (#12321) 2022-09-07 15:21:08 +08:00
46776af2a3 [fix](Nereids)plan translator lost other conjuncts on hash join node (#12391)
In the earlier PR #11812, we split the join condition into two parts: hash join conjuncts and other conditions. However, we forgot to translate the other conditions into other conjuncts in the HashJoinNode of the legacy planner, so a query with other conditions on the join node returns wrong results. For example:

SELECT * FROM lineorder INNER JOIN part ON lo_partkey = p_partkey WHERE lo_orderkey > p_size;
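
To make the two parts concrete, the Python sketch below classifies join predicates into hash conjuncts (equalities between one column from each side) and other conjuncts. It is an illustration, not the planner's code: `Comparison` and `split_join_conjuncts` are hypothetical names, and it assumes the WHERE predicate has already been pushed into the join condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    left: str   # column name on the left of the operator
    op: str     # comparison operator, e.g. "=" or ">"
    right: str  # column name on the right of the operator


def split_join_conjuncts(conjuncts, left_columns, right_columns):
    """Return (hash_conjuncts, other_conjuncts) for a two-table join."""
    hash_conjuncts, other_conjuncts = [], []
    for c in conjuncts:
        spans_both_sides = (
            (c.left in left_columns and c.right in right_columns)
            or (c.left in right_columns and c.right in left_columns)
        )
        if c.op == "=" and spans_both_sides:
            hash_conjuncts.append(c)   # usable as a hash join key
        else:
            other_conjuncts.append(c)  # must still be evaluated on joined rows
    return hash_conjuncts, other_conjuncts
```

For the query above, `lo_partkey = p_partkey` lands in the hash conjuncts and `lo_orderkey > p_size` in the other conjuncts; dropping the second list is what produces wrong results.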
2022-09-07 11:32:05 +08:00
449d0c219f [Improvement](sort) Accumulate blocks to do partial sort (#12336) 2022-09-07 10:34:28 +08:00
d410797200 [fix](regression p0) fix regression p0 test qt_window_hang2 always failed because of timeout #12388
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-09-07 10:08:12 +08:00
9ccc39c164 [Enhancement](regression) add regression tests for executeSQL http rest api #12265 2022-09-07 10:02:37 +08:00
a465549f5e [feature](Nereids)support parse and analyze having clause (#12129)
Implement the having clause for Nereids Planner.

NOTE:

This PR only aims to make the Nereids Planner generate the correct logical and physical plans. Runtime correctness is not a goal of this PR because GROUP BY is not yet ready in the Nereids Planner.
2022-09-07 09:47:03 +08:00
772e5907f2 [enhancement](test) add some p0 cases (#12240) 2022-09-07 09:10:42 +08:00
3a0aae1b82 [enhancement](explain)add projections and output id in explain string (#12358)
In the earlier PR #11842, we added the ability to do projection on each ExecNode.
However, the projection expr list is not shown in explain, which makes debugging inconvenient.
This PR adds the projections to the explain string if they exist.
2022-09-06 21:03:02 +08:00
b8cc576cba [fix](array-type) add data valid check for ARRAY type while insert or load (#12283)
Add data validity checks for the ARRAY type during insert or load.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-09-06 20:48:58 +08:00
4e95b3afaf [test](nereids) add subquery regression Testing (#12372)
Added regression tests for subqueries. Currently only correlated subqueries are covered. Uncorrelated subqueries will be added after the project revision.
2022-09-06 16:37:17 +08:00
2019cf9406 [regression](test) add tpcds sf1 unique test (#12268) 2022-09-06 10:12:00 +08:00
86fa0e38e2 [fix](join) hash join should use children's output tuple ids not output tableref ids (#12261) 2022-09-06 09:53:45 +08:00
a47eb55d7c [regression](load)split dataset to cover more situation (#12311) 2022-09-05 19:25:01 +08:00
dadfd85c40 prune for agg with constant expr (#12274)
Currently, Nereids doesn't support aggregate functions with no slot reference in the query, since all the columns would be pruned, e.g.

SELECT COUNT(1) FROM t;

This PR keeps the column with the smallest amount of data when pruning columns in this situation.

Note that this PR ONLY handles aggregate functions. Projections with no slot reference still need to be handled in the future.
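
A minimal Python sketch of this fallback is shown below. It assumes "smallest amount of data" can be approximated by a per-column width in bytes; `prune_columns` and the width map are hypothetical and only illustrate the rule.

```python
def prune_columns(column_widths, referenced):
    """Return the columns a scan must output.

    column_widths: dict mapping column name to an assumed width in bytes.
    referenced: set of column names referenced by the query.
    """
    kept = [name for name in column_widths if name in referenced]
    if kept:
        return kept
    # Nothing is referenced (e.g. SELECT COUNT(1) FROM t): keep the
    # narrowest column so the scan still produces rows to count.
    smallest = min(column_widths, key=lambda name: (column_widths[name], name))
    return [smallest]
```

With widths `{"id": 8, "flag": 1, "name": 64}` and no referenced columns, the sketch keeps only `flag`.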
2022-09-05 19:09:00 +08:00
8bfb89c100 [feature-wip](array-type) Add some regression tests for nested array (#12322)
#11392 made _input_block in each BetaRowsetReaders sharable. However, for some types (e.g. nested arrays more than one level deep), the _column_vector_batches in RowBlockV2 can be nested, meaning that one ColumnVectorBatch sits inside another ColumnVectorBatch. In this case, the data of the inner ColumnVectorBatch may be corrupted because the data of _input_block is copied shallowly to the _output_block.
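
The hazard is the usual one with shallow copies of nested containers. The Python analogy below is not Doris code: the dictionaries merely stand in for an outer batch holding an inner batch, and `demo_shallow_copy_hazard` is a hypothetical name.

```python
import copy


def demo_shallow_copy_hazard():
    # The outer "batch" holds a reference to an inner "batch".
    input_block = {"offsets": [0, 2], "inner": {"values": [1, 2]}}

    shallow_output = copy.copy(input_block)   # the inner batch is shared
    deep_output = copy.deepcopy(input_block)  # the inner batch is duplicated

    # Reusing the shared input block for the next read overwrites the inner data.
    input_block["inner"]["values"][:] = [9, 9]

    return shallow_output["inner"]["values"], deep_output["inner"]["values"]
```

The shallow copy observes the overwritten values while the deep copy keeps the original ones, which is why sharing the input block is only safe when nested data is copied as well.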
2022-09-05 14:05:24 +08:00
34dd67f804 [feature](nereids) add weekOfYear to support ssb-flat benchmark (#12207)
Support the WeekOfYear function.
In the current implementation, WeekOfYear can be used in the WHERE clause, but not in the SELECT clause.
2022-09-03 12:04:51 +08:00
e7303c12c7 [Enhancement](array-type) Support Floating/Decimal type for array aggregation functions (#12271) 2022-09-03 09:55:56 +08:00
0d33c713d1 [Bug](CTAS) Fix CTAS error for use agg column as first. (#12299)
* FIX: ctas default use duplicate key.
2022-09-02 20:44:01 +08:00
7f7a3a7524 [feature](nereids) Convert subqueries into algebraic expressions and … (#11454)
1. Convert subqueries to Apply nodes.
2. Convert ApplyNode to ordinary join.

### Detailed design:

Three types of subquery expressions are currently handled: scalarSubquery, inSubquery, and Exists. A scalarSubquery is a subquery whose returned data is 1 row and 1 column.

**Subquery replacement**

```
before:
scalarSubquery:  filter(t1.a = scalarSubquery(output b));
inSubquery:  filter(inSubquery);   inSubquery = (t1.a in select ***);
exists:  filter(exists);   exists = (select ***);

after:
scalarSubquery:  filter(t1.a = b);
inSubquery:  filter(True);
exists:  filter(True);
```

**Subquery Transformation Rules**

```
PushApplyUnderFilter
 * before:
 *             Apply
 *          /              \
 * Input(output:b)    Filter(Correlated predicate/UnCorrelated predicate)
 *
 * after:
 *          Filter(Correlated predicate)
 *                      |
 *                  Apply
 *                /            \
 *      Input(output:b)    Filter(UnCorrelated predicate)
```

```
PushApplyUnderProject
 * before:
 *            Apply
 *         /              \
 * Input(output:b)    Project(output:a)
 *
 * after:
 *          Project(b,(if the Subquery is Scalar add 'a' as the output column))
 *          /               \
 * Input(output:b)      Apply
```

```
ApplyPullFilterOnAgg
 * before:
 *             Apply
 *          /              \
 * Input(output:b)    agg(output:fn,c; group by:null)
 *                              |
 *              Filter(Correlated predicate(Input.e = this.f)/UnCorrelated predicate)
 *
 * after:
 *          Apply(Correlated predicate(Input.e = this.f))
 *         /              \
 * Input(output:b)    agg(output:fn,this.f; group by:this.f)
 *                              |
 *                    Filter(UnCorrelated predicate)
```

```
ApplyPullFilterOnProjectUnderAgg
 * before:
 *              apply
 *         /              \
 * Input(output:b)        agg
 *                         |
 *                  Project(output:a)
 *                         |
 *              Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
 *                          |
 *                         child
 *
 * after:
 *              apply
 *         /              \
 * Input(output:b)        agg
 *                         |
 *              Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
 *                         |
 *                  Project(output:a,this.f, Unapply predicate(slots))
 *                          |
 *                         child

```

```
ScalarToJoin
 * UnCorrelated -> CROSS_JOIN
 * Correlated -> LEFT_OUTER_JOIN
```

```
InToJoin
 * Not In -> LEFT_ANTI_JOIN
 * In -> LEFT_SEMI_JOIN
```

```
existsToJoin
 * Exists
 *    Correlated -> LEFT_SEMI_JOIN
 *      correlated                  LEFT_SEMI_JOIN(Correlated Predicate)
 *      /       \         -->       /           \
 *    input    queryPlan          input        queryPlan
 *
 *    UnCorrelated -> CROSS_JOIN(limit(1))
 *      uncorrelated                    CROSS_JOIN
 *      /           \          -->      /       \
 *    input        queryPlan          input    limit(1)
 *                                               |
 *                                             queryPlan
 *
 * Not Exists
 *    Correlated -> LEFT_ANTI_JOIN
 *      correlated                  LEFT_ANTI_JOIN(Correlated Predicate)
 *       /       \         -->       /           \
 *     input    queryPlan          input        queryPlan
 *
 *   UnCorrelated -> CROSS_JOIN(Count(*))
 *                                    Filter(count(*) = 0)
 *                                          |
 *         apply                       Cross_Join
 *      /       \         -->       /           \
 *    input    queryPlan          input       agg(output:count(*))
 *                                               |
 *                                             limit(1)
 *                                               |
 *                                             queryPlan
```
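
The ScalarToJoin, InToJoin, and existsToJoin mappings listed above can be summarised as a small lookup. The Python sketch below only restates that table; `subquery_join_type` and its parameters are hypothetical, and it returns the join type alone, without the extra limit(1), count(*) aggregate, or filter nodes shown in the diagrams.

```python
def subquery_join_type(kind, correlated, negated=False):
    """Map a subquery kind to the join type it is rewritten to.

    kind: "scalar", "in", or "exists".
    negated: True for NOT IN / NOT EXISTS (ignored for scalar subqueries).
    """
    if kind == "scalar":
        return "LEFT_OUTER_JOIN" if correlated else "CROSS_JOIN"
    if kind == "in":
        return "LEFT_ANTI_JOIN" if negated else "LEFT_SEMI_JOIN"
    if kind == "exists":
        if correlated:
            return "LEFT_ANTI_JOIN" if negated else "LEFT_SEMI_JOIN"
        # Uncorrelated: a cross join over limit(1), plus a count(*) = 0
        # filter on top in the NOT EXISTS case.
        return "CROSS_JOIN"
    raise ValueError(f"unknown subquery kind: {kind}")
```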
2022-09-02 17:34:19 +08:00
f8eb480bec [fix](emptynode)fix empty node bug in vec engine (#12258)
* [fix](emptynode)fix empty node bug in vec engine

* update fe ut
2022-09-01 18:52:10 +08:00
ad8e2f4749 [fix](rpc) fix that coordinator rpc timeout too large may make show load blocked for long time (#12152)
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-09-01 18:05:37 +08:00
3bcab8bbef [feature](function) support now/current_timestamp functions with precision (#12219)
2022-09-01 14:35:12 +08:00
f294d33332 [bugfix](index) index page should not be bitshuffle decoded (#12231)
* [bugfix](index) index page should not be bitshuffle decoded

* minor change
2022-09-01 11:56:44 +08:00
fc05d54f0d [fix](array-type) array_sort function with empty input #12175
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-09-01 10:54:09 +08:00
65051d67cf [fix](yearweek) fixed the yearweek result error when mode is set to 1 (#12234) 2022-09-01 09:46:38 +08:00
d7e02a9514 [fix](join)join reorder by mistake (#12113) 2022-09-01 09:46:01 +08:00
f3cb0c24ee [enhancement](test) add restore action and s3 helper methond (#12084)
Co-authored-by: morrySnow <morrysnow@126.com>
Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com>
2022-08-31 23:08:23 +08:00