6125 Commits

Author SHA1 Message Date
34dd67f804 [feature](nereids) add weekOfYear to support ssb-flat benchmark (#12207)
Support the function WeekOfYear.
In the current implementation, WeekOfYear can be used in the where clause, but not in the select clause.
2022-09-03 12:04:51 +08:00
62561834a8 [Feature](array-type) Support is-null-predicate for array type (#12237) 2022-09-03 11:37:57 +08:00
e7303c12c7 [Enhancement](array-type) Support Floating/Decimal type for array aggregation functions (#12271) 2022-09-03 09:55:56 +08:00
5d0b1868c2 [chore](docs)Add compile check for document format (#12300)
Add compile check for document format

This catches document formatting issues that would otherwise fail the daily build and release of the official website,
so that problems can be found and fixed early instead of requiring repeated modifications.
Because the website build now lives in the doris-website repo, the check pulls the code from that repo, deletes the documentation inside it, and copies in the documentation from doris master before running the compile check.
2022-09-03 09:44:20 +08:00
b154a1b45e [doc] fix some docs issue (#11101)
* fix some docs issue

* add -y for apt-get

Co-authored-by: chaow <941210239@qq.com>
2022-09-02 21:06:12 +08:00
c944496fb4 [chore](log) add cluster and tag message to exception (#12287) 2022-09-02 20:46:39 +08:00
0d33c713d1 [Bug](CTAS) Fix CTAS error for use agg column as first. (#12299)
* FIX: ctas default use duplicate key.
2022-09-02 20:44:01 +08:00
1fd3490c56 remove duplicate "comments" (#12264) 2022-09-02 18:57:10 +08:00
7f7a3a7524 [feature](nereids) Convert subqueries into algebraic expressions and … (#11454)
1.Convert subqueries to Apply nodes.
2.Convert ApplyNode to ordinary join.

### Detailed design:

There are currently three types of subquery expressions: scalarSubquery, inSubquery, and Exists. A scalarSubquery is one whose result is exactly 1 row and 1 column.

**Subquery replacement**

```
before:
scalarSubquery:  filter(t1.a = scalarSubquery(output b));
inSubquery:  filter(inSubquery);   inSubquery = (t1.a in select ***);
exists:  filter(exists);   exists = (select ***);

after:
scalarSubquery:  filter(t1.a = b);
inSubquery:  filter(True);
exists:  filter(True);
```

**Subquery Transformation Rules**

```
PushApplyUnderFilter
 * before:
 *             Apply
 *          /              \
 * Input(output:b)    Filter(Correlated predicate/UnCorrelated predicate)
 *
 * after:
 *          Filter(Correlated predicate)
 *                      |
 *                  Apply
 *                /            \
 *      Input(output:b)    Filter(UnCorrelated predicate)
```

```
PushApplyUnderProject
 * before:
 *            Apply
 *         /              \
 * Input(output:b)    Project(output:a)
 *
 * after:
 *          Project(b,(if the Subquery is Scalar add 'a' as the output column))
 *          /               \
 * Input(output:b)      Apply
```

```
ApplyPullFilterOnAgg
 * before:
 *             Apply
 *          /              \
 * Input(output:b)    agg(output:fn,c; group by:null)
 *                              |
 *              Filter(Correlated predicate(Input.e = this.f)/UnCorrelated predicate)
 *
 * after:
 *          Apply(Correlated predicate(Input.e = this.f))
 *         /              \
 * Input(output:b)    agg(output:fn,this.f; group by:this.f)
 *                              |
 *                    Filter(UnCorrelated predicate)
```

```
ApplyPullFilterOnProjectUnderAgg
 * before:
 *              apply
 *         /              \
 * Input(output:b)        agg
 *                         |
 *                  Project(output:a)
 *                         |
 *              Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
 *                          |
 *                         child
 *
 * after:
 *              apply
 *         /              \
 * Input(output:b)        agg
 *                         |
 *              Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
 *                         |
 *                  Project(output:a,this.f, Unapply predicate(slots))
 *                          |
 *                         child

```

```
ScalarToJoin
 * UnCorrelated -> CROSS_JOIN
 * Correlated -> LEFT_OUTER_JOIN
```

```
InToJoin
 * Not In -> LEFT_ANTI_JOIN
 * In -> LEFT_SEMI_JOIN
```

```
existsToJoin
 * Exists
 *    Correlated -> LEFT_SEMI_JOIN
 *      correlated                  LEFT_SEMI_JOIN(Correlated Predicate)
 *      /       \         -->       /           \
 *    input    queryPlan          input        queryPlan
 *
 *    UnCorrelated -> CROSS_JOIN(limit(1))
 *      uncorrelated                    CROSS_JOIN
 *      /           \          -->      /       \
 *    input        queryPlan          input    limit(1)
 *                                               |
 *                                             queryPlan
 *
 * Not Exists
 *    Correlated -> LEFT_ANTI_JOIN
 *      correlated                  LEFT_ANTI_JOIN(Correlated Predicate)
 *       /       \         -->       /           \
 *     input    queryPlan          input        queryPlan
 *
 *   UnCorrelated -> CROSS_JOIN(Count(*))
 *                                    Filter(count(*) = 0)
 *                                          |
 *         apply                       Cross_Join
 *      /       \         -->       /           \
 *    input    queryPlan          input       agg(output:count(*))
 *                                               |
 *                                             limit(1)
 *                                               |
 *                                             queryPlan
```
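
The ScalarToJoin, InToJoin, and existsToJoin rules above amount to a lookup from subquery kind and correlation to a join type. The following is only an illustrative sketch in Python, not the Nereids implementation: the function `choose_join` and its parameters are hypothetical names, and the extra `limit(1)` and `count(*) = 0` operators from the diagrams are mentioned in comments only.

```python
def choose_join(kind: str, correlated: bool, negated: bool = False) -> str:
    """Return the join type the rules above assign to a subquery.

    kind is "scalar", "in", or "exists" (hypothetical labels);
    negated marks NOT IN / NOT EXISTS.
    """
    if kind == "scalar":
        # ScalarToJoin: UnCorrelated -> CROSS_JOIN, Correlated -> LEFT_OUTER_JOIN
        return "LEFT_OUTER_JOIN" if correlated else "CROSS_JOIN"
    if kind == "in":
        # InToJoin: Not In -> LEFT_ANTI_JOIN, In -> LEFT_SEMI_JOIN
        return "LEFT_ANTI_JOIN" if negated else "LEFT_SEMI_JOIN"
    if kind == "exists":
        if correlated:
            # Correlated Exists -> LEFT_SEMI_JOIN, Not Exists -> LEFT_ANTI_JOIN
            return "LEFT_ANTI_JOIN" if negated else "LEFT_SEMI_JOIN"
        # Uncorrelated: CROSS_JOIN over limit(1) for Exists, or over
        # agg(count(*)) with a Filter(count(*) = 0) on top for Not Exists.
        return "CROSS_JOIN"
    raise ValueError(f"unknown subquery kind: {kind}")
```

For example, `choose_join("exists", correlated=True, negated=True)` yields `LEFT_ANTI_JOIN`, matching the Not Exists diagram.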
2022-09-02 17:34:19 +08:00
08c5e0b1e3 [chore](deps) strip debug info of thirdparty dependencies (#12284)
Strip debug info from most of the third-party dependencies' static libs.
This can significantly reduce the size of the third-party libs: 3.4G -> 1.6G.
The doris_be binary size is also reduced: 1.5G -> 868M (clang build).
After compression, the BE binary is only 195M with debug info!
2022-09-02 15:43:29 +08:00
64302ff4c9 [typo](docs)Sidebar fix (#12297)
* sidebar fix
2022-09-02 15:09:26 +08:00
81c5732dc7 [feature-wip](MTMV) Support creating materialized view for multiple tables (#11646)
Support creating materialized view for multiple tables.

Examples:

mysql> CREATE TABLE t1 (pk INT, v1 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
mysql> CREATE TABLE t2 (pk INT, v2 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE KEY (mv_pk) DISTRIBUTED BY HASH (mv_pk) PROPERTIES ('replication_num' = '1') AS SELECT t1.pk as mv_pk FROM t1, t2 WHERE t1.pk = t2.pk;
2022-09-02 14:51:56 +08:00
Pxl
a8c8ebf5cf [Enhancement](compaction) empty string optimize for binary dict code (#12259)
Improve write performance for empty strings.
2022-09-02 14:25:19 +08:00
7a4173b497 [typo](docs)Fix admin copy table format (#12288)
Fix admin copy table format
2022-09-02 14:08:56 +08:00
202ad5c659 [feature-wip](parquet-reader) bug fix, the number of rows are different among columns in a block (#12228)
1. `ExprContext` is deleted in `ParquetReader::close()` without having been closed,
so the `DCHECK` in `~ExprContext()` fails. The lifetime of `ExprContext` is managed by the scan node,
so we should not delete its pointer in `ParquetReader::close()`.
2. `RowGroupReader::next_batch` updates `_read_rows` in every column loop,
and does not ensure that the number of rows is equal across columns.
3. The skipped row ranges are stack variables, which are released when calling `ArrayColumnReader::read_column_data`, so we should copy them out.
2022-09-02 09:50:25 +08:00
3ce6bb548d doc_stream_load_format (#12144)
doc_stream_load_format
2022-09-02 09:22:10 +08:00
10c3e683dd [docs]update users numbers (#12248)
update users numbers
2022-09-02 09:21:36 +08:00
1c91257c01 📝 fix create table doc typo (#12269)
fix create table doc typo
2022-09-02 09:20:46 +08:00
061b49b7bf [doc](website) update SHOW-PROC doc (#12229) 2022-09-01 19:50:25 +08:00
58c1d6ce9d [typo](docs)Modify the maximum handle parameter reference #12244 2022-09-01 19:50:07 +08:00
87086ffe31 [enhancment](Nereids)enable normalize aggregate rule (#12194)
enable normalize aggregate rule introduced by #12013
2022-09-01 19:20:37 +08:00
3ce305134a [fix](scan) fix potential wrong cancel when sql has limit (#12224) 2022-09-01 19:11:40 +08:00
f8eb480bec [fix](emptynode)fix empty node bug in vec engine (#12258)
* [fix](emptynode)fix empty node bug in vec engine

* update fe ut
2022-09-01 18:52:10 +08:00
ad8e2f4749 [fix](rpc) fix that coordinator rpc timeout too large may make show load blocked for long time (#12152)
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-09-01 18:05:37 +08:00
068e60145e [enhancement](Nereids)ban groupPlan() pattern to avoid misuse (#12250)
The `groupPlan()` pattern is meant to find a `GroupPlan` in the memo. Since there is no `GroupPlan` in the memo, it always returns nothing.
To write a pattern that matches any GROUP, use `group()`. The `groupPlan()` pattern is confusing and easy to misuse.
So this PR bans the `groupPlan()` pattern to avoid misuse.
2022-09-01 14:37:48 +08:00
3bcab8bbef [feature](function) support now/current_timestamp functions with precision (#12219)
* [feature](function) support now/current_timestamp functions with precision
2022-09-01 14:35:12 +08:00
c5481dfdf7 [fix](remote)Fix bug for Segment::open() in case: config::file_cache_type (#12249)
* fix bug for Segment::open() in case: config::file_cache_type
2022-09-01 14:16:41 +08:00
df51c78593 [fix](dbt)fix dbt run abnormal #12242 2022-09-01 12:10:48 +08:00
f294d33332 [bugfix](index) index page should not be bitshuffle decoded (#12231)
* [bugfix](index) index page should not be bitshuffle decoded

* minor change
2022-09-01 11:56:44 +08:00
fc05d54f0d [fix](array-type) array_sort function with empty input #12175
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-09-01 10:54:09 +08:00
8c8078ad28 [fix](projections) get error row_descriptor when have projections on ExecNode (#12232)
When an ExecNode's projections are not empty, it uses the output row descriptor to initialize the block before doing the projection, but it should use the original row descriptor. This PR fixes it.
2022-09-01 10:48:10 +08:00
60a2fa7dea [Improvement](compaction) copy row in batch in VCollectIterator&VGenericIterator (#12214)
In VCollectIterator&VGenericIterator, use insert_range_from to copy continuous rows
in a block in one batch, saving CPU cost.

If rows in the rowsets and segments are non-overlapping, this improves compaction
throughput by 30%. If rows are completely overlapping, such as when loading the same
file twice, the throughput stays nearly the same as before.
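
The batching idea can be sketched as follows. This is only an illustration in Python, not the Doris C++ code: `continuous_runs` is a hypothetical helper that groups consecutive row indices into `(start, length)` runs, so that each run could be copied with a single range copy (the role `insert_range_from` plays above) instead of one copy per row.

```python
def continuous_runs(row_ids):
    """Group sorted row indices into (start, length) runs of consecutive rows."""
    runs = []
    for rid in row_ids:
        if runs and runs[-1][0] + runs[-1][1] == rid:
            # rid directly follows the last run, so extend that run by one row
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            # gap in the sequence: start a new run of length 1
            runs.append((rid, 1))
    return runs
```

With non-overlapping input the rows collapse into a few long runs, which is where a range copy saves work. With fully interleaved input every run has length 1, which is consistent with the unchanged throughput reported for the overlapping case.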

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-09-01 10:20:17 +08:00
90fb3b7783 [Improvement](load) accelerate tablet sink (#12174) 2022-09-01 10:08:09 +08:00
ec4863b63a [feature-wip](new-scan)Add new file scan node (#12048)
Related pr: #11582
This is the new file scan node and scanner for external hms catalog.
2022-09-01 10:01:20 +08:00
65051d67cf [fix](yearweek) fixed the yearweek result error when mode is set to 1 (#12234) 2022-09-01 09:46:38 +08:00
d7e02a9514 [fix](join)join reorder by mistake (#12113) 2022-09-01 09:46:01 +08:00
f3cb0c24ee [enhancement](test) add restore action and s3 helper methond (#12084)
Co-authored-by: morrySnow <morrysnow@126.com>
Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com>
2022-08-31 23:08:23 +08:00
a49bde8a71 [fix](Nereids)statistics calculator for Project and Aggregate lost some columns (#12196)
There are some bugs in Nereids' StatsCalculator.

1. Project: returns the child's column stats directly, so its parents cannot find column stats from the project's slots.
2. Aggregate: does not return columns that are Aliases, so its parents cannot find some column stats from the Aggregate's slots.
3. All: SlotReference is used as the key of the column-to-stats map, so SlotReference's equals and hashCode methods need to use only ExprId, as we discussed.
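
The third point, keying a map by an object whose equality depends only on its ID, can be sketched in Python. This is a hypothetical analogue, not the Nereids Java class: `SlotRef`, `expr_id`, and `name` are illustrative names, and `__eq__`/`__hash__` stand in for Java's `equals`/`hashCode`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SlotRef:
    """Hypothetical stand-in for SlotReference: identity is expr_id only."""
    expr_id: int
    # Excluded from the generated __eq__ and __hash__, so two slots with
    # the same expr_id but different names are treated as the same key.
    name: str = field(compare=False)


# A column-to-stats map keyed by slot; the stats value is a placeholder.
stats = {SlotRef(1, "a"): "stats for expr 1"}

# A parent holding an aliased slot with the same expr_id finds the entry.
found = stats.get(SlotRef(1, "alias_of_a"))
```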
2022-08-31 20:47:22 +08:00
57051d3591 [fix](Nereids)cast StringType to DateType failed when bind TimestampArithmetic function (#12198)
When binding TimestampArithmetic, we always try to cast the left child to DateTimeType. But sometimes it needs to be cast to DateType; this PR fixes this problem.
2022-08-31 19:52:03 +08:00
1a198b3777 [typo](docs) Fix Bitmap index and Type QUANTILE_STATE: fix description (#12217)
Fix Bitmap index and Type QUANTILE_STATE: fix description
2022-08-31 18:10:17 +08:00
1cc9eeeb1a [feature-wip](parquet-reader) read and generate array column (#12166)
Read and generate parquet array column.

D=1, R=0 represents an empty array. An empty array is not a null value, so the NullMap for this row is false;
the offset range for this row is [offset_start, offset_end) with `offset_start == offset_end`,
and offset_end is the start offset of the next row, so there is no value in the nested primitive column.

D=0, R=0 represents a null array, and the NullMap for this row is true.
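
A minimal sketch of this assembly in Python, under stated assumptions: the column is a nullable array of non-null elements, so the maximum definition level is 2 (D=2 means an element is present) and R=1 continues the current row. The function name `assemble_array_column` and the list-based layout are hypothetical, not the reader's actual C++ structures.

```python
def assemble_array_column(levels, values):
    """Build (null_map, offsets, nested) from (D, R) level pairs.

    levels: list of (definition_level, repetition_level) pairs.
    values: the leaf values, one per pair with definition level 2.
    """
    null_map = []   # True where the whole array is null
    offsets = [0]   # row i covers nested[offsets[i]:offsets[i + 1]]
    nested = []     # flattened element values
    leaf = iter(values)
    for d, r in levels:
        if r == 0:
            # R=0 starts a new row; D=0 marks a null array
            null_map.append(d == 0)
            offsets.append(offsets[-1])
        if d == 2:
            # an actual element: append it and extend the current row
            nested.append(next(leaf))
            offsets[-1] += 1
    return null_map, offsets, nested
```

For the rows `[1, 2]`, `[]`, `null`, `[3]`, the empty and null rows both get equal start and end offsets, and only the NullMap tells them apart.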
2022-08-31 17:08:12 +08:00
573e5476dd [Opt](load) Speed up the vectorized load (#12146)
* [Opt](load) Speed up the vectorized load
2022-08-31 16:23:36 +08:00
90c5180370 [Bug](array-type) Fix bug in creating view from table with array types (#12200) 2022-08-31 14:36:31 +08:00
d7e032bc38 Modify the startup script and print the log without using the --daemon parameter. (#12218) 2022-08-31 14:36:14 +08:00
da4ffd3c56 [Enhancement](metric-type) more readable error message for only metric type #12162
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-31 14:35:48 +08:00
c2f7e95c7f [typo](docs)Add somm json format load doc (#12213)
* Specify JSON root document when adding JSON import, and add JSON file size limit parameter prompt
2022-08-31 14:34:55 +08:00
3cdd19821d [fix](sort)the slot in sort node should be nullable if it's outer joined (#12193)
The sort node's output expr should be nullable if it is outer joined.
2022-08-31 14:34:14 +08:00
8999ba34ae [improve](Nereids)unify all plan toString() function (#12132)
Add a Util function to generate uniform format plan toString for easy reading and debugging
2022-08-31 14:28:44 +08:00
8b98e2021e [enhancement](array-type) Array type do not support compare with '=','>', '<', make the error message more readable (#12181)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-31 12:49:10 +08:00
254cb321b9 [optimize](remote) Optimize cache reader use a pre-created buffer when downloading the cache (#12165)
* optimize cache reader

* add description for config
2022-08-31 10:15:40 +08:00