Commit Graph

5755 Commits

Author SHA1 Message Date
9225dd16ca [fix](grouping sets) grouping sets cause be core or return wrong results (#12313) 2022-09-08 14:55:50 +08:00
74ffdbeebc [feature](Nereids) Support OneRowRelation and EmptyRelation (#12416)
Support OneRowRelation and EmptyRelation.

OneRowRelation: `select 100, 'abc', substring('abc', 1, 2)`
EmptyRelation: `select * from tbl limit 0`

Note:
PhysicalOneRowRelation is translated to UnionNode(constExpr) for BE execution.
2022-09-08 12:21:13 +08:00
a6880ca573 [fix](Nereids) throw IndexOutOfBoundsException in DistributionSpecHash#equalsSatisfy (#12446)
In the earlier PR #11976, we changed DistributionSpecHash#equalsSatisfy but forgot to check whether both sides have the same length. When the required shuffle slot list is longer than the current one, an exception is thrown.
2022-09-08 11:41:48 +08:00
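The length check added in a6880ca573 can be illustrated with a minimal Python sketch. It assumes that equalsSatisfy compares the required and current shuffle columns position by position, and that each current column is represented by a set of equivalent expr ids. The function name `equals_satisfy` and this data layout are hypothetical and are not the Nereids Java API.

```python
from __future__ import annotations


def equals_satisfy(current: list[set[int]], required: list[int]) -> bool:
    """Hypothetical model of the check.

    current[i]: set of expr ids equivalent to the i-th shuffle column of the
                current hash distribution
    required[i]: expr id the parent requires at position i
    """
    # The check the earlier change forgot. In Java, indexing past the end of the
    # shorter list threw IndexOutOfBoundsException; here zip() would instead
    # silently ignore the extra required columns, which is also wrong.
    if len(current) != len(required):
        return False
    # Every required column must be equivalent to the current column at the same position.
    return all(req in equiv for equiv, req in zip(current, required))
```

For example, `equals_satisfy([{1, 4}, {2}], [4, 2])` holds. A required list that is longer than the current one is now rejected rather than read out of bounds.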
dd2f834c79 [feature-wip](parquet-reader) bug fix, create compress codec before parsing dictionary (#12422)
## Fix five bugs:
1. Parquet dictionary data may be compressed, but `ColumnChunkReader` tries to parse the dictionary data before creating the compression codec, causing unexpected data errors.
2. `FE` doesn't resolve the array type
3. `ParquetFileHdfsScanner` doesn't fill partition values when the table is partitioned
4. `ParquetFileHdfsScanner` sets `_scanner_eof = true` when a scan range is empty, which ends the scanner early and results in data loss
5. Typographical error in `PageReader`
2022-09-08 09:54:25 +08:00
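Fix 1 in dd2f834c79 is about ordering: resolve the codec and decompress first, then parse. The minimal Python sketch below shows that order under simplifying assumptions. The page header is omitted, the page holds only PLAIN-encoded INT32 values (4-byte little-endian), and only the UNCOMPRESSED and GZIP codecs are modeled. `read_dictionary_page` and `DECOMPRESSORS` are hypothetical names, not the Doris `ColumnChunkReader` API.

```python
from __future__ import annotations

import gzip
import struct

# Hypothetical codec table; real Parquet readers support more codecs (SNAPPY, ZSTD, ...).
DECOMPRESSORS = {
    "UNCOMPRESSED": lambda data: data,
    "GZIP": gzip.decompress,
}


def read_dictionary_page(page: bytes, codec: str) -> list[int]:
    """Decode a simplified dictionary page holding PLAIN-encoded INT32 values."""
    # Step 1: resolve the codec and decompress BEFORE parsing. Parsing the raw
    # page first would interpret compressed bytes as dictionary values.
    decompress = DECOMPRESSORS[codec]
    data = decompress(page)
    # Step 2: PLAIN-encoded INT32 values are 4-byte little-endian integers.
    if len(data) % 4 != 0:
        raise ValueError("truncated INT32 dictionary page")
    return list(struct.unpack(f"<{len(data) // 4}i", data))
```

For example, `read_dictionary_page(gzip.compress(struct.pack("<3i", 7, -1, 42)), "GZIP")` returns `[7, -1, 42]`.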
a536030979 [FOLLOWUP](load) fix nullable and add regression (#12375)
2022-09-08 00:05:04 +08:00
bdbce77227 [fix](nereids) cast left child of TimestampArithmetic to wrong type in BindFunction (#12423) 2022-09-07 20:32:47 +08:00
184be8d13c [fix](array-type) ARRAY is not supported in bloomfilter index (#12353)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-09-07 18:00:01 +08:00
941bda5a20 [enhancement](spark-load)support dynamic set env (#12276)
* [enhancement](spark-load)support dynamic set env and display spark appid

* [enhancement](spark-load)support dynamic set env
2022-09-07 16:24:29 +08:00
40f481049a [fix](Nereids)lowest cost plan map do not be merged when do group merge (#12396)
2022-09-07 16:13:11 +08:00
f2923f9180 [Refactor](Nereids) Simplify get input and output slots for plan/expression. (#12356)
Simplify the code for getting input/output slots from `Expression` or `Plan`.

**New interfaces**

`Expression`:
`getInputSlots`: Get all the input slots of the expression.

`Plan`:
- `getOutputSet`: Get the output slot set of the plan.
- `getInputSlots`: Get the input slot set of the plan.

**Changed interface**

`TreeNode`:
- `collect`: returns a `set` instead of a `list`.
2022-09-07 14:05:37 +08:00
0bb06a1fa7 [feature](Nereids) let nullable of Year, WeekOfYear and Divide be the same as implementation in BE (#12374)
These functions/expressions should always be nullable, so the overridden method simply returns true:
- Year
- WeekOfYear
- Divide
2022-09-07 13:09:08 +08:00
46776af2a3 [fix](Nereids)plan translator lost other conjuncts on hash join node (#12391)
In the earlier PR #11812, we split the join condition into two parts: hash join conjuncts and other conditions. But we forgot to translate the other conditions into other conjuncts in the legacy planner's HashJoinNode, so queries with other conditions on a join node return wrong results, for example:

SELECT * FROM lineorder INNER JOIN part ON lo_partkey = p_partkey WHERE lo_orderkey > p_size;
2022-09-07 11:32:05 +08:00
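The following Python toy model of a hash join shows why both parts of the split condition from 46776af2a3 must reach the join node. It rests on several assumptions. Each conjunct is a plain comparison between one left-child column and one right-child column. Equalities become hash conjuncts and everything else becomes an "other" conjunct. The row values are made up. `Conjunct`, `split_join_condition` and `hash_join` are hypothetical names, not Doris classes.

```python
import operator
from dataclasses import dataclass

# Comparison operators understood by this toy model.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


@dataclass(frozen=True)
class Conjunct:
    op: str     # "=", ">" or "<"
    left: str   # column of the LEFT join child (orientation assumed)
    right: str  # column of the RIGHT join child


def split_join_condition(conjuncts):
    """Split a join condition into hash join conjuncts and other conjuncts."""
    hash_conjuncts = [c for c in conjuncts if c.op == "="]
    other_conjuncts = [c for c in conjuncts if c.op != "="]
    return hash_conjuncts, other_conjuncts


def hash_join(left_rows, right_rows, hash_conjuncts, other_conjuncts):
    # Build phase: index the right rows by their equi-join key.
    table = {}
    for rrow in right_rows:
        key = tuple(rrow[c.right] for c in hash_conjuncts)
        table.setdefault(key, []).append(rrow)
    # Probe phase: every matched pair must also pass the other conjuncts.
    # Skipping this filter is the bug described above.
    result = []
    for lrow in left_rows:
        key = tuple(lrow[c.left] for c in hash_conjuncts)
        for rrow in table.get(key, []):
            if all(OPS[c.op](lrow[c.left], rrow[c.right]) for c in other_conjuncts):
                result.append({**lrow, **rrow})
    return result


# Made-up rows for the query in the commit message above.
conds = [Conjunct("=", "lo_partkey", "p_partkey"), Conjunct(">", "lo_orderkey", "p_size")]
lineorder = [{"lo_partkey": 1, "lo_orderkey": 5}, {"lo_partkey": 1, "lo_orderkey": 2}]
part = [{"p_partkey": 1, "p_size": 3}]
```

With both lists, only the row where `lo_orderkey > p_size` survives. If the other conjuncts are dropped, as in the bug, both lineorder rows are returned.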
42bdde8750 [Feature](Vectorized) support jdbc scan node (#12010) 2022-09-07 10:29:41 +08:00
232d17efea [Enhancement](sparkload) cast the src slot types of bitmap columns to bitmap when FE push tasks in spark load (#12394)
In the current spark load implementation, the types of the source data that BE reads from the Broker are all set to varchar.
However, varchar and bitmap are no longer compatible since version 1.1.0, which causes spark load to fail.

An example of spark load error message:

detailMessage = type not match, originType=VARCHAR(*), targeType=BITMAP

Set the src type of the bitmap columns from varchar to bitmap when FE pushes tasks.
2022-09-07 10:07:38 +08:00
a465549f5e [feature](Nereids)support parse and analyze having clause (#12129)
Implement the having clause for Nereids Planner.

NOTE:

This PR only aims to make the Nereids Planner generate the correct logical and physical plans. Runtime correctness is not a goal of this PR because GROUP BY is not ready in the Nereids Planner yet.
2022-09-07 09:47:03 +08:00
55fb90d6ae [feature](Nereids)add colocate, shuffle and bucket shuffle join algorithm to Nereids (#11976)
This PR
1. add support in Nereids for the following join algorithms, which are already supported by the legacy planner:
- colocate join
- bucket shuffle join
- shuffle join
- broadcast join

2. update all cost, enforce and derive utils:
- ChildOutputPropertyDeriver
- EnforceMissingPropertiesHelper
- RequestPropertyDeriver

3. add a local quick sort plan used by the enforcer
4. set PhysicalProperties on PhysicalPlan when choosing the best plan from the memo
5. rename Job#pushTask to Job#pushJob
2022-09-07 00:31:21 +08:00
4c36e3dfa6 [fix](Nereids)LogicalAggregate's equals and hashCode missing two attributes (#12393)
After applying the NormalizeAggregate rule, the owner groups of all aggregate children are removed.
The root cause is that the new aggregate node is regarded as equal to the old one, because LogicalAggregate.equals() does not take some attributes ("normalized", "disassembled") into account.
2022-09-07 00:07:26 +08:00
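Python dataclasses can mirror the bug fixed in 4c36e3dfa6. A field declared with `compare=False` is excluded from both the generated `__eq__` and `__hash__`, so two aggregates that differ only in their flags look identical to a hash-based lookup. The sketch below is a toy model. The field layout (`group_by`, `outputs`) is hypothetical, and the idea that the memo deduplicates plans through a hash set is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BuggyAggregate:
    """Before the fix: the two flags are ignored by __eq__ and __hash__."""
    group_by: tuple
    outputs: tuple
    normalized: bool = field(default=False, compare=False)
    disassembled: bool = field(default=False, compare=False)


@dataclass(frozen=True)
class FixedAggregate:
    """After the fix: the flags take part in __eq__ and __hash__."""
    group_by: tuple
    outputs: tuple
    normalized: bool = False
    disassembled: bool = False
```

In the buggy model, `BuggyAggregate(("a",), ("sum(b)",), normalized=True)` is found in a set containing the unnormalized aggregate. The normalized node is therefore mistaken for the old one. The fixed model keeps them distinct.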
3a0aae1b82 [enhancement](explain)add projections and output id in explain string (#12358)
In the earlier PR #11842, we added the ability to apply a projection on each ExecNode.
However, the projection expression list could not be seen in the explain output, which made debugging inconvenient.
This PR adds them to the explain string if they exist.
2022-09-06 21:03:02 +08:00
f1507f93ee [enhancement](chore)add single empty line rule to fe check style for Nereids (#12365) 2022-09-06 14:19:59 +08:00
d7dedfadad [fix](nereids) fix dead loop in unnesting subquery rule (#12345)
2022-09-06 11:50:30 +08:00
53b79d5a8c [Enhancement](restore) new add the property of reserve_replica to restore statement (#11942)
Add a new property called 'reserve_replica' to the RESTORE statement. When it is set, the restored table has the same partitions, with the same replication number, as before the backup.

Co-authored-by: Stalary <stalary@163.com>
Co-authored-by: camby <104178625@qq.com>
2022-09-06 10:32:21 +08:00
86fa0e38e2 [fix](join) hash join should use children's output tuple ids not output tableref ids (#12261) 2022-09-06 09:53:45 +08:00
f2aa87d797 Add ctas support config key type ut and doc. (#12327) 2022-09-06 09:16:02 +08:00
190717dbcc [enhancement](chore)add single space separator rule to fe check style (#12354)
Sometimes our code mistakenly uses more than one space as a separator. This PR adds the CheckStyle rule SingleSpaceSeparator to check for this in Nereids.
2022-09-05 21:59:58 +08:00
698bae09b2 [fix](Nereids)get NPE and group not be optimized when add REWRITE rule to Cascades Optimizer (#12346)
Fix some bugs that occur when adding REWRITE rules to the Cascades Optimizer:
- all rules should be set as non-rewrite rules when they are used in the Cascades Optimizer
- the promise of IMPLEMENT rules should be larger than that of other rules, since we should do exploration first.
2022-09-05 19:11:48 +08:00
f466a072d8 fix bug: tpch-q12 invalid type (#12347)
In the old planner, Predicate sets its type in analyzeImpl(). However, analyzeImpl() is only called on the old planner path, not on the Nereids path, so the type remains invalid.

Because every predicate has type bool, we now set its type in the constructor.
2022-09-05 19:09:27 +08:00
dadfd85c40 prune for agg with constant expr (#12274)
Currently, Nereids doesn't support aggregate functions with no slot reference in a query, since all the columns would be pruned, e.g.

SELECT COUNT(1) FROM t;

In this situation, this PR keeps the column with the smallest amount of data when pruning columns.

Note that this PR ONLY handles aggregate functions, so projections with no slot reference still need to be handled in the future.
2022-09-05 19:09:00 +08:00
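The pruning fallback from dadfd85c40 fits in a few lines of Python. This minimal sketch assumes per-column size estimates are available as a plain dict, and it is meant to be called only under an aggregate, matching the scope the PR states. `prune_columns` and its inputs are hypothetical and not the Nereids rule itself.

```python
from __future__ import annotations


def prune_columns(column_sizes: dict[str, int], referenced: set[str]) -> list[str]:
    """Column pruning under an aggregate (toy model).

    column_sizes maps each column to an estimated per-row size in bytes
    (hypothetical input); referenced holds the columns the query uses.
    """
    kept = [name for name in column_sizes if name in referenced]
    if not kept:
        # e.g. SELECT COUNT(1) FROM t references no column, but the scan must
        # still produce one row per table row, so keep the cheapest column.
        kept = [min(column_sizes, key=column_sizes.get)]
    return kept
```

For `SELECT COUNT(1) FROM t` over columns sized `{"id": 4, "name": 32, "flag": 1}`, only `flag` is scanned.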
8bfb89c100 [feature-wip](array-type) Add some regression tests for nested array (#12322)
#11392 made _input_block in each BetaRowsetReader shareable. However, for some types (e.g. nested arrays with a depth greater than 1), the _column_vector_batches in RowBlockV2 can be nested, meaning that one ColumnVectorBatch sits inside another. In this case, the data of the inner ColumnVectorBatch may be corrupted because the data of _input_block is copied shallowly to _output_block.
2022-09-05 14:05:24 +08:00
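The corruption described in 8bfb89c100 is a classic shallow-copy hazard. The toy Python model below makes it concrete. A one-level copy duplicates the outer values but still shares the nested child batch, so overwriting the shared input block on the next read changes the already-copied output. `ColumnVectorBatch` here is a stand-in class, not the BE's C++ type, and `read_twice` is a hypothetical driver.

```python
class ColumnVectorBatch:
    """Toy column batch; `child` models the inner batch of a nested array."""

    def __init__(self, values, child=None):
        self.values = values
        self.child = child


def shallow_copy(batch):
    # Copies this level's values but shares the nested child batch.
    return ColumnVectorBatch(list(batch.values), batch.child)


def deep_copy(batch):
    # Recursively copies every nesting level.
    child = deep_copy(batch.child) if batch.child is not None else None
    return ColumnVectorBatch(list(batch.values), child)


def read_twice(copy_fn):
    """Copy the shared input block, then let the next read overwrite it in place."""
    input_block = ColumnVectorBatch([0, 1], child=ColumnVectorBatch([10, 11, 12]))
    output_block = copy_fn(input_block)
    input_block.values[0] = -1        # next read reuses the shared input block
    input_block.child.values[0] = 99
    return output_block.values[0], output_block.child.values[0]
```

`read_twice(shallow_copy)` yields `(0, 99)`: the outer level survives, but the inner batch is corrupted. `read_twice(deep_copy)` yields `(0, 10)`.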
3b104e334a [Bug](load) fix missing nullable info in stream load (#12302) 2022-09-05 13:41:28 +08:00
2398cd3bb6 [enhancement](Nereids)print slot name in explain string (#12272)
Currently, the explain string prints every expression as a slot id, e.g. `<slot 1>`.
This PR prints its name together with the slot id instead, e.g. `column_a[#1]`. Details:
- print the qualified table name for OlapScanNode
- print the NamedExpression name with its SlotId instead of just the SlotId
- use "OlapScanNode" instead of the table name as OlapScanNode's node name
2022-09-05 11:31:35 +08:00
90a0baf5f8 [fix](array-type) Forbid ARRAY<NOT_NULL(T)> temporarily (#12262)
Currently, there are still lots of bugs related to ARRAY<NOT_NULL(T)>.

We have decided not to support ARRAY<NOT_NULL(T)> types in the first version, so all elements in an ARRAY are nullable.

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-09-03 14:26:08 +08:00
34dd67f804 [feature](nereids) add weekOfYear to support ssb-flat benchmark (#12207)
Support the function WeekOfYear.
In the current implementation, WeekOfYear can be used in the WHERE clause but not in the SELECT clause.
2022-09-03 12:04:51 +08:00
62561834a8 [Feature](array-type) Support is-null-predicate for array type (#12237) 2022-09-03 11:37:57 +08:00
c944496fb4 [chore](log) add cluster and tag message to exception (#12287) 2022-09-02 20:46:39 +08:00
0d33c713d1 [Bug](CTAS) Fix CTAS error for use agg column as first. (#12299)
* FIX: CTAS uses the duplicate key model by default.
2022-09-02 20:44:01 +08:00
7f7a3a7524 [feature](nereids) Convert subqueries into algebraic expressions and … (#11454)
1. Convert subqueries to Apply nodes.
2. Convert ApplyNode to ordinary join.

### Detailed design:

There are currently three types of subquery expressions: scalarSubquery, inSubquery, and Exists. A scalarSubquery returns 1 row and 1 column.

**Subquery replacement**

```
before:
scalarSubquery:  filter(t1.a = scalarSubquery(output b));
inSubquery:  filter(inSubquery);   inSubquery = (t1.a in select ***);
exists:  filter(exists);   exists = (select ***);

end:
scalarSubquery:  filter(t1.a = b);
inSubquery:  filter(True);
exists:  filter(True);
```

**Subquery Transformation Rules**

```
PushApplyUnderFilter
 * before:
 *             Apply
 *          /              \
 * Input(output:b)    Filter(Correlated predicate/UnCorrelated predicate)
 *
 * after:
 *          Filter(Correlated predicate)
 *                      |
 *                  Apply
 *                /            \
 *      Input(output:b)    Filter(UnCorrelated predicate)
```

```
PushApplyUnderProject
 * before:
 *            Apply
 *         /              \
 * Input(output:b)    Project(output:a)
 *
 * after:
 *          Project(b,(if the Subquery is Scalar add 'a' as the output column))
 *          /               \
 * Input(output:b)      Apply
```

```
ApplyPullFilterOnAgg
 * before:
 *             Apply
 *          /              \
 * Input(output:b)    agg(output:fn,c; group by:null)
 *                              |
 *              Filter(Correlated predicate(Input.e = this.f)/UnCorrelated predicate)
 *
 * end:
 *          Apply(Correlated predicate(Input.e = this.f))
 *         /              \
 * Input(output:b)    agg(output:fn,this.f; group by:this.f)
 *                              |
 *                    Filter(UnCorrelated predicate)
```

```
ApplyPullFilterOnProjectUnderAgg
 * before:
 *              apply
 *         /              \
 * Input(output:b)        agg
 *                         |
 *                  Project(output:a)
 *                         |
 *              Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
 *                          |
 *                         child
 *
 * after:
 *              apply
 *         /              \
 * Input(output:b)        agg
 *                         |
 *              Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
 *                         |
 *                  Project(output:a,this.f, Unapply predicate(slots))
 *                          |
 *                         child

```

```
ScalarToJoin
 * UnCorrelated -> CROSS_JOIN
 * Correlated -> LEFT_OUTER_JOIN
```

```
InToJoin
 * Not In -> LEFT_ANTI_JOIN
 * In -> LEFT_SEMI_JOIN
```

```
existsToJoin
 * Exists
 *    Correlated -> LEFT_SEMI_JOIN
 *      correlated                  LEFT_SEMI_JOIN(Correlated Predicate)
 *      /       \         -->       /           \
 *    input    queryPlan          input        queryPlan
 *
 *    UnCorrelated -> CROSS_JOIN(limit(1))
 *      uncorrelated                    CROSS_JOIN
 *      /           \          -->      /       \
 *    input        queryPlan          input    limit(1)
 *                                               |
 *                                             queryPlan
 *
 * Not Exists
 *    Correlated -> LEFT_ANTI_JOIN
 *      correlated                  LEFT_ANTI_JOIN(Correlated Predicate)
 *       /       \         -->       /           \
 *     input    queryPlan          input        queryPlan
 *
 *   UnCorrelated -> CROSS_JOIN(Count(*))
 *                                    Filter(count(*) = 0)
 *                                          |
 *         apply                       Cross_Join
 *      /       \         -->       /           \
 *    input    queryPlan          input       agg(output:count(*))
 *                                               |
 *                                             limit(1)
 *                                               |
 *                                             queryPlan
```
2022-09-02 17:34:19 +08:00
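The join-type rules listed for 7f7a3a7524 (ScalarToJoin, InToJoin, existsToJoin) can be condensed into a small Python lookup. This is a minimal sketch of the mapping only. Node names are plain strings, and node lists are ordered bottom-up as in the diagrams above. For the correlated cases, the correlated predicate would additionally become the join condition, which is not modeled here. `Subquery` and `apply_to_join` are hypothetical names.

```python
from enum import Enum


class Subquery(Enum):
    SCALAR = "scalar"
    IN = "in"
    NOT_IN = "not in"
    EXISTS = "exists"
    NOT_EXISTS = "not exists"


def apply_to_join(kind: Subquery, correlated: bool):
    """Return (join type, nodes added over the subquery plan, nodes added above the join)."""
    if kind is Subquery.SCALAR:
        return ("LEFT_OUTER_JOIN" if correlated else "CROSS_JOIN"), [], []
    if kind is Subquery.IN:
        return "LEFT_SEMI_JOIN", [], []
    if kind is Subquery.NOT_IN:
        return "LEFT_ANTI_JOIN", [], []
    if kind is Subquery.EXISTS:
        if correlated:
            return "LEFT_SEMI_JOIN", [], []
        # Only existence matters, so one row of the subquery is enough.
        return "CROSS_JOIN", ["limit(1)"], []
    # NOT EXISTS
    if correlated:
        return "LEFT_ANTI_JOIN", [], []
    # Count at most one subquery row and keep input rows only when the count is 0.
    return "CROSS_JOIN", ["limit(1)", "agg(count(*))"], ["filter(count(*) = 0)"]
```

The uncorrelated NOT EXISTS case is the only one that adds nodes on both sides of the join. A CROSS_JOIN alone cannot express "no rows", so an aggregate and a filter are needed.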
81c5732dc7 [feature-wip](MTMV) Support creating materialized view for multiple tables (#11646)
Support creating materialized view for multiple tables.

Examples:

mysql> CREATE TABLE t1 (pk INT, v1 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
mysql> CREATE TABLE t2 (pk INT, v2 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE KEY (mv_pk) DISTRIBUTED BY HASH (mv_pk) PROPERTIES ('replication_num' = '1') AS SELECT t1.pk as mv_pk FROM t1, t2 WHERE t1.pk = t2.pk;
2022-09-02 14:51:56 +08:00
87086ffe31 [enhancement](Nereids)enable normalize aggregate rule (#12194)
Enable the normalize aggregate rule introduced by #12013.
2022-09-01 19:20:37 +08:00
3ce305134a [fix](scan) fix potential wrong cancel when sql has limit (#12224) 2022-09-01 19:11:40 +08:00
f8eb480bec [fix](emptynode)fix empty node bug in vec engine (#12258)
* [fix](emptynode)fix empty node bug in vec engine

* update fe ut
2022-09-01 18:52:10 +08:00
ad8e2f4749 [fix](rpc) fix that coordinator rpc timeout too large may make show load blocked for long time (#12152)
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-09-01 18:05:37 +08:00
068e60145e [enhancement](Nereids)ban groupPlan() pattern to avoid misuse (#12250)
The `groupPlan()` pattern means finding a `GroupPlan` in the memo. Since there is no `GroupPlan` in the memo, it always returns nothing.
When we want to write a pattern that matches any GROUP, we should use `group()`. But the `groupPlan` pattern is very confusing and easy to misuse.
So this PR bans the `groupPlan()` pattern to avoid misuse.
2022-09-01 14:37:48 +08:00
3bcab8bbef [feature](function) support now/current_timestamp functions with precision (#12219)
2022-09-01 14:35:12 +08:00
d7e02a9514 [fix](join)join reorder by mistake (#12113) 2022-09-01 09:46:01 +08:00
a49bde8a71 [fix](Nereids)statistics calculator for Project and Aggregate lost some columns (#12196)
There are some bugs in Nereids' StatsCalculator.

1. Project: returns the child's column stats directly, so its parents cannot find column stats for the project's slots.
2. Aggregate: does not return columns that are Aliases, so its parents cannot find some column stats for the Aggregate's slots.
3. All: SlotReference is used as the key of the column-to-stats map, so we need to change SlotReference's equals and hashCode methods to use only ExprId, as we discussed.
2022-08-31 20:47:22 +08:00
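Item 3 of a49bde8a71 depends on the equals/hashCode contract. When a slot is used as a map key, equality and hashing must be based on the same attribute. The toy Python model below bases both on the expr id alone. Two references to the same column therefore find the same stats entry even if another attribute, such as nullability, differs. The constructor fields and the placeholder stats value are hypothetical, not the Nereids `SlotReference` class.

```python
class SlotReference:
    """Toy slot whose identity is its expr id only (name/nullable are ignored)."""

    def __init__(self, expr_id: int, name: str, nullable: bool = True):
        self.expr_id = expr_id
        self.name = name
        self.nullable = nullable

    def __eq__(self, other):
        return isinstance(other, SlotReference) and self.expr_id == other.expr_id

    def __hash__(self):
        # Must agree with __eq__: hash only the attribute used for equality.
        return hash(self.expr_id)


# Placeholder column-to-stats map keyed by slot.
stats = {SlotReference(1, "lo_partkey"): "stats of #1"}
```

A lookup with `SlotReference(1, "lo_partkey", nullable=False)` still finds `"stats of #1"`, while a slot with a different expr id does not.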
57051d3591 [fix](Nereids)cast StringType to DateType failed when bind TimestampArithmetic function (#12198)
When binding TimestampArithmetic, we always cast the left child to DateTimeType. But sometimes we need to cast it to DateType instead; this PR fixes this problem.
2022-08-31 19:52:03 +08:00
90c5180370 [Bug](array-type) Fix bug in creating view from table with array types (#12200) 2022-08-31 14:36:31 +08:00
da4ffd3c56 [Enhancement](metric-type) more readable error message for only metric type #12162
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-31 14:35:48 +08:00
3cdd19821d [fix](sort)the slot in sort node should be nullable if it's outer joined (#12193)
The sort node's output expr should be nullable if it is outer joined.
2022-08-31 14:34:14 +08:00
8999ba34ae [improve](Nereids)unify all plan toString() function (#12132)
Add a Util function that generates a uniformly formatted plan toString() for easier reading and debugging.
2022-08-31 14:28:44 +08:00