Commit Graph

13721 Commits

Author SHA1 Message Date
b154a1b45e [doc] fix some docs issue (#11101)
* fix some docs issue

* add -y for apt-get

Co-authored-by: chaow <941210239@qq.com>
2022-09-02 21:06:12 +08:00
c944496fb4 [chore](log) add cluster and tag message to exception (#12287) 2022-09-02 20:46:39 +08:00
0d33c713d1 [Bug](CTAS) Fix CTAS error for use agg column as first. (#12299)
* FIX: ctas default use duplicate key.
2022-09-02 20:44:01 +08:00
1fd3490c56 remove duplicate "comments" (#12264) 2022-09-02 18:57:10 +08:00
7f7a3a7524 [feature](nereids) Convert subqueries into algebraic expressions and … (#11454)
1.Convert subqueries to Apply nodes.
2.Convert ApplyNode to ordinary join.

### Detailed design:

There are three types of current subexpressions, scalarSubquery, inSubquery, and Exists. The scalarSubquery refers to the returned data as 1 row and 1 column.

**Subquery replacement**

```
before:
scalarSubquery:  filter(t1.a = scalarSubquery(output b));
inSubquery:  filter(inSubquery);   inSubquery = (t1.a in select ***);
exists:  filter(exists);   exists = (select ***);

end:
scalarSubquery:  filter(t1.a = b);
inSubquery:  filter(True);
exists:  filter(True);
```

**Subquery Transformation Rules**

```
PushApplyUnderFilter
 * before:
 *             Apply
 *          /              \
 * Input(output:b)    Filter(Correlated predicate/UnCorrelated predicate)
 *
 * after:
 *          Filter(Correlated predicate)
 *                      |
 *                  Apply
 *                /            \
 *      Input(output:b)    Filter(UnCorrelated predicate)
```

```
PushApplyUnderProject
 * before:
 *            Apply
 *         /              \
 * Input(output:b)    Project(output:a)
 *
 * after:
 *          Project(b,(if the Subquery is Scalar add 'a' as the output column))
 *          /               \
 * Input(output:b)      Apply
```

```
ApplyPullFilterOnAgg
 * before:
 *             Apply
 *          /              \
 * Input(output:b)    agg(output:fn,c; group by:null)
 *                              |
 *              Filter(Correlated predicate(Input.e = this.f)/UnCorrelated predicate)
 *
 * end:
 *          Apply(Correlated predicate(Input.e = this.f))
 *         /              \
 * Input(output:b)    agg(output:fn,this.f; group by:this.f)
 *                              |
 *                    Filter(UnCorrelated predicate)
```

```
ApplyPullFilterOnProjectUnderAgg
 * before:
 *              apply
 *         /              \
 * Input(output:b)        agg
 *                         |
 *                  Project(output:a)
 *                         |
 *              Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
 *                          |
 *                         child
 *              apply
 *         /              \
 * Input(output:b)        agg
 *                         |
 *              Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
 *                         |
 *                  Project(output:a,this.f, Unapply predicate(slots))
 *                          |
 *                         child

```

```
ScalarToJoin
 * UnCorrelated -> CROSS_JOIN
 * Correlated -> LEFT_OUTER_JOIN
```

```
InToJoin
 * Not In -> LEFT_ANTI_JOIN
 * In -> LEFT_SEMI_JOIN
```

```
existsToJoin
 * Exists
 *    Correlated -> LEFT_SEMI_JOIN
 *      correlated                  LEFT_SEMI_JOIN(Correlated Predicate)
 *      /       \         -->       /           \
 *    input    queryPlan          input        queryPlan
 *
 *    UnCorrelated -> CROSS_JOIN(limit(1))
 *      uncorrelated                    CROSS_JOIN
 *      /           \          -->      /       \
 *    input        queryPlan          input    limit(1)
 *                                               |
 *                                             queryPlan
 *
 * Not Exists
 *    Correlated -> LEFT_ANTI_JOIN
 *      correlated                  LEFT_ANTI_JOIN(Correlated Predicate)
 *       /       \         -->       /           \
 *     input    queryPlan          input        queryPlan
 *
 *   UnCorrelated -> CROSS_JOIN(Count(*))
 *                                    Filter(count(*) = 0)
 *                                          |
 *         apply                       Cross_Join
 *      /       \         -->       /           \
 *    input    queryPlan          input       agg(output:count(*))
 *                                               |
 *                                             limit(1)
 *                                               |
 *                                             queryPlan
```
2022-09-02 17:34:19 +08:00
08c5e0b1e3 [chore](deps) strip debug info of thirdparty dependencies (#12284)
Strip debug info of most of thridparty dependencies' static lib.
If can significantly reduce the size of thirdparty libs: 3.4G -> 1.6G
And the doris_be binary size will be reduced: 1.5G -> 868M (clang build)
And after compress, the BE binary is only 195M with debug info!
2022-09-02 15:43:29 +08:00
64302ff4c9 [typo](docs)Sidebar fix (#12297)
* sidebar fix
2022-09-02 15:09:26 +08:00
81c5732dc7 [feature-wip](MTMV) Support creating materialized view for multiple tables (#11646)
Support creating materialized view for multiple tables.

Examples:

mysql> CREATE TABLE t1 (pk INT, v1 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
mysql> CREATE TABLE t2 (pk INT, v2 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE KEY (mv_pk) DISTRIBUTED BY HASH (mv_pk) PROPERTIES ('replication_num' = '1') AS SELECT t1.pk as mv_pk FROM t1, t2 WHERE t1.pk = t2.pk;
2022-09-02 14:51:56 +08:00
Pxl
a8c8ebf5cf [Enhancement](compaction) empty string optimize for binary dict code (#12259)
improve write empty string perfomance.
2022-09-02 14:25:19 +08:00
7a4173b497 [typo](docs)Fix admin copy table format (#12288)
Fix admin copy table format
2022-09-02 14:08:56 +08:00
202ad5c659 [feature-wip](parquet-reader) bug fix, the number of rows are different among columns in a block (#12228)
1. `ExprContext` is delete in `ParquetReader::close()`, but it has not been closed,
so the `DCHECH` in `~ExprContext()` is failed. the lifetime of `ExprContext` is managed by scan node,
so we should not delete its pointer in `ParquetReader::close()`.
2. `RowGroupReader::next_batch` will update `_read_rows` in every column loop,
and does not ensure the number of rows in every column are equal.
3.  The skipped row ranges are variables in stack, which are released when calling `ArrayColumnReader::read_column_data`, so we should copy them out.
2022-09-02 09:50:25 +08:00
3ce6bb548d doc_stream_load_format (#12144)
doc_stream_load_format
2022-09-02 09:22:10 +08:00
10c3e683dd [docs]update users numbers (#12248)
update users numbers
2022-09-02 09:21:36 +08:00
1c91257c01 📝 fix create table doc typo (#12269)
fix create table doc typo
2022-09-02 09:20:46 +08:00
061b49b7bf [doc](website) update SHOW-PROC doc (#12229) 2022-09-01 19:50:25 +08:00
58c1d6ce9d [typo](docs)Modify the maximum handle parameter reference #12244 2022-09-01 19:50:07 +08:00
87086ffe31 [enhancment](Nereids)enable normalize aggregate rule (#12194)
enable normalize aggregate rule introduced by #12013
2022-09-01 19:20:37 +08:00
3ce305134a [fix](scan) fix potential wrong cancel when sql has limit (#12224) 2022-09-01 19:11:40 +08:00
f8eb480bec [fix](emptynode)fix empty node bug in vec engine (#12258)
* [fix](emptynode)fix empty node bug in vec engine

* update fe ut
2022-09-01 18:52:10 +08:00
ad8e2f4749 [fix](rpc) fix that coordinator rpc timeout too large may make show load blocked for long time (#12152)
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-09-01 18:05:37 +08:00
068e60145e [enhancement](Nereids)ban groupPlan() pattern to avoid misuse (#12250)
`groupPlan()` pattern means to find a `GroupPlan` in memo. Since we have no `GroupPlan` in memo, it is always return nothing.
When we want write a pattern to match any GROUP, we should use `group()`. But pattern `groupPlan` is very confusing, and easy misuse.
So, this PR ban `groupPlan()` pattern ti avoid misuse.
2022-09-01 14:37:48 +08:00
3bcab8bbef [feature](function) support now/current_timestamp functions with precision (#12219)
* [feature](function) support now/current_timestamp functions with precision
2022-09-01 14:35:12 +08:00
c5481dfdf7 [fix](remote)Fix bug for Segment::open() in case: config::file_cache_type (#12249)
* fix bug for Segment::open() in case: config::file_cache_type

* fix bug for Segment::open() in case: config::file_cache_type
2022-09-01 14:16:41 +08:00
df51c78593 [fix](dbt)fix dbt run abnormal #12242 2022-09-01 12:10:48 +08:00
f294d33332 [bugfix](index) index page should not be bitshuffle decoded (#12231)
* [bugfix](index) index page should not be bitshuffle decoded

* minor change
2022-09-01 11:56:44 +08:00
fc05d54f0d [fix](array-type) array_sort function with empty input #12175
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-09-01 10:54:09 +08:00
8c8078ad28 [fix](projections) get error row_descriptor when have projections on ExecNode (#12232)
When ExecNode's projections is not empty, it use output row descriptor to initialize the block before doing projection. But we should use original row descriptor. This PR fix it.
2022-09-01 10:48:10 +08:00
60a2fa7dea [Improvement](compaction) copy row in batch in VCollectIterator&VGenericIterator (#12214)
In VCollectIterator&VGenericIterator, use insert_range_from to copy rows
in a block which is continuous to save cpu cost.

If rows in rowset and segment are non overlapping, this whill improve 30%
throughput of compaction.If rows are completely overlapping such as load two
same files, the throughput goes nearly same as before.

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-09-01 10:20:17 +08:00
90fb3b7783 [Improvement](load) accelerate tablet sink (#12174) 2022-09-01 10:08:09 +08:00
ec4863b63a [feature-wip](new-scan)Add new file scan node (#12048)
Related pr: #11582
This is the new file scan node and scanner for external hms catalog.
2022-09-01 10:01:20 +08:00
65051d67cf [fix](yearweek) fixed the yearweek result error when mode is set to 1 (#12234) 2022-09-01 09:46:38 +08:00
d7e02a9514 [fix](join)join reorder by mistake (#12113) 2022-09-01 09:46:01 +08:00
f3cb0c24ee [enhancement](test) add restore action and s3 helper methond (#12084)
Co-authored-by: morrySnow <morrysnow@126.com>
Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com>
2022-08-31 23:08:23 +08:00
a49bde8a71 [fix](Nereids)statistics calculator for Project and Aggregate lost some columns (#12196)
There are some bugs in Nereids' StatsCalculator.

1. Project: return child column stats directly, so its parents cannot find column stats from project's slot.
2. Aggregate: do not return column that is Alias, its parents cannot find some column stats from Aggregate's slot.
3. All: use SlotReference as key of column to stats map. So we need change SlotReference's equals and hashCode method to just using ExprId as we discussed.
2022-08-31 20:47:22 +08:00
57051d3591 [fix](Nereids)cast StringType to DateType failed when bind TimestampArithmetic function (#12198)
When bind TimestampArithmetic, we always want to cast left child to DateTimeType. But sometimes, we need to cast it to DateType, this PR fix this problem.
2022-08-31 19:52:03 +08:00
1a198b3777 [typo](docs) Fix Bitmap index and Type QUANTILE_STATE: fix description (#12217)
Fix Bitmap index and Type QUANTILE_STATE: fix description
2022-08-31 18:10:17 +08:00
1cc9eeeb1a [feature-wip](parquet-reader) read and generate array column (#12166)
Read and generate parquet array column.

When D=1, R=0, representing an empty array. Empty array is not a null value, so the NullMap for this row is false,
the offset for this row is [offset_start, offset_end) whose `offset_start == offset_end`,
and offset_end is the start offset of the next row, so there is no value in the nested primitive column.

When D=0, R=0, representing a null array, and the NullMap for this row is true.
2022-08-31 17:08:12 +08:00
573e5476dd [Opt](load) Speed up the vectorized load (#12146)
* [Opt](load) Speed up the vectorized load
2022-08-31 16:23:36 +08:00
90c5180370 [Bug](array-type) Fix bug in creating view from table with array types (#12200) 2022-08-31 14:36:31 +08:00
d7e032bc38 Modify the startup script and print the log without using the --daemon parameter. (#12218) 2022-08-31 14:36:14 +08:00
da4ffd3c56 [Enhancement](metric-type) more readable error message for only metric type #12162
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-31 14:35:48 +08:00
c2f7e95c7f [typo](docs)Add somm json format load doc (#12213)
* Specify JSON root document when adding JSON import, and add JSON file size limit parameter prompt
2022-08-31 14:34:55 +08:00
3cdd19821d [fix](sort)the slot in sort node should be nullable if it's outer joined (#12193)
The sort node's output expr should be nullable if it is outer joined.
2022-08-31 14:34:14 +08:00
8999ba34ae [improve](Nereids)unify all plan toString() function (#12132)
Add a Util function to generate uniform format plan toString for easy reading and debugging
2022-08-31 14:28:44 +08:00
8b98e2021e [enhancement](array-type) Array type do not support compare with '=','>', '<', make the error message more readable (#12181)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-31 12:49:10 +08:00
254cb321b9 [optimize](remote) Optimize cache reader use a pre-created buffer when downloading the cache (#12165)
* optimize cache reader

* add description for config

* optimize cache reader

* optimize cache reader
2022-08-31 10:15:40 +08:00
f0cde35ea6 [performance improvement] Spark Load, SparkDpp processRDDAggregate performance improvement (#12186)
Co-authored-by: hourong <hourong@zhihu.com>
2022-08-31 09:14:13 +08:00
8251e7cbfc [refactor](column) remove confused field (#12187) 2022-08-31 09:13:31 +08:00
f72d2559cf [fix](compile) Fix compile error '<unknown>' may be used uninitialized in PODArray::insert_prepare #12202 2022-08-31 09:12:28 +08:00
f949262ddf [fix](planner) a slot id is bounded on a wrong tuple id, if cross join has a hash join as child (#12156) 2022-08-31 09:07:55 +08:00