b154a1b45e
[doc] fix some docs issue ( #11101 )
...
* fix some docs issue
* add -y for apt-get
Co-authored-by: chaow <941210239@qq.com >
2022-09-02 21:06:12 +08:00
c944496fb4
[chore](log) add cluster and tag message to exception ( #12287 )
2022-09-02 20:46:39 +08:00
0d33c713d1
[Bug](CTAS) Fix CTAS error for use agg column as first. ( #12299 )
...
* FIX: ctas default use duplicate key.
2022-09-02 20:44:01 +08:00
1fd3490c56
remove duplicate "comments" ( #12264 )
2022-09-02 18:57:10 +08:00
7f7a3a7524
[feature](nereids) Convert subqueries into algebraic expressions and … ( #11454 )
...
1.Convert subqueries to Apply nodes.
2.Convert ApplyNode to ordinary join.
### Detailed design:
There are currently three types of subquery expressions: scalarSubquery, inSubquery, and Exists. A scalarSubquery is a subquery that returns exactly 1 row and 1 column.
**Subquery replacement**
```
before:
scalarSubquery: filter(t1.a = scalarSubquery(output b));
inSubquery: filter(inSubquery); inSubquery = (t1.a in select ***);
exists: filter(exists); exists = (select ***);
after:
scalarSubquery: filter(t1.a = b);
inSubquery: filter(True);
exists: filter(True);
```
**Subquery Transformation Rules**
```
PushApplyUnderFilter
* before:
* Apply
* / \
* Input(output:b) Filter(Correlated predicate/UnCorrelated predicate)
*
* after:
* Filter(Correlated predicate)
* |
* Apply
* / \
* Input(output:b) Filter(UnCorrelated predicate)
```
```
PushApplyUnderProject
* before:
* Apply
* / \
* Input(output:b) Project(output:a)
*
* after:
* Project(b,(if the Subquery is Scalar add 'a' as the output column))
* / \
* Input(output:b) Apply
```
```
ApplyPullFilterOnAgg
* before:
* Apply
* / \
* Input(output:b) agg(output:fn,c; group by:null)
* |
* Filter(Correlated predicate(Input.e = this.f)/UnCorrelated predicate)
*
* after:
* Apply(Correlated predicate(Input.e = this.f))
* / \
* Input(output:b) agg(output:fn,this.f; group by:this.f)
* |
* Filter(UnCorrelated predicate)
```
```
ApplyPullFilterOnProjectUnderAgg
* before:
* apply
* / \
* Input(output:b) agg
* |
* Project(output:a)
* |
* Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
* |
* child
*
* after:
* apply
* / \
* Input(output:b) agg
* |
* Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
* |
* Project(output:a,this.f, Unapply predicate(slots))
* |
* child
```
```
ScalarToJoin
* UnCorrelated -> CROSS_JOIN
* Correlated -> LEFT_OUTER_JOIN
```
```
InToJoin
* Not In -> LEFT_ANTI_JOIN
* In -> LEFT_SEMI_JOIN
```
```
existsToJoin
* Exists
* Correlated -> LEFT_SEMI_JOIN
* correlated LEFT_SEMI_JOIN(Correlated Predicate)
* / \ --> / \
* input queryPlan input queryPlan
*
* UnCorrelated -> CROSS_JOIN(limit(1))
* uncorrelated CROSS_JOIN
* / \ --> / \
* input queryPlan input limit(1)
* |
* queryPlan
*
* Not Exists
* Correlated -> LEFT_ANTI_JOIN
* correlated LEFT_ANTI_JOIN(Correlated Predicate)
* / \ --> / \
* input queryPlan input queryPlan
*
* UnCorrelated -> CROSS_JOIN(Count(*))
* Filter(count(*) = 0)
* |
* apply Cross_Join
* / \ --> / \
* input queryPlan input agg(output:count(*))
* |
* limit(1)
* |
* queryPlan
```
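The join-selection rules in ScalarToJoin, InToJoin and existsToJoin above can be summarized as a small lookup. The sketch below is an illustration only, not Nereids code: `choose_join`, `JoinChoice` and the string labels are hypothetical names, and the extra plan nodes (limit(1), count(*) with a filter) are returned as plain descriptions rather than real plan objects.

```python
from typing import NamedTuple


class JoinChoice(NamedTuple):
    join_type: str          # join type named in the rules above
    right_side: str         # what the subquery plan is wrapped in
    extra_filter: str = ""  # filter placed above the join, if any


def choose_join(kind: str, negated: bool = False, correlated: bool = False) -> JoinChoice:
    """Map a subquery kind to the join described in the commit message.

    kind is one of "scalar", "in", "exists" (hypothetical labels).
    """
    if kind == "scalar":
        return JoinChoice("LEFT_OUTER_JOIN" if correlated else "CROSS_JOIN", "queryPlan")
    if kind == "in":
        return JoinChoice("LEFT_ANTI_JOIN" if negated else "LEFT_SEMI_JOIN", "queryPlan")
    if kind == "exists":
        if correlated:
            return JoinChoice("LEFT_ANTI_JOIN" if negated else "LEFT_SEMI_JOIN", "queryPlan")
        if negated:
            # Uncorrelated NOT EXISTS: count the rows, then keep input only if there are none.
            return JoinChoice("CROSS_JOIN", "agg(count(*)) over limit(1) over queryPlan",
                              "count(*) = 0")
        # Uncorrelated EXISTS: one row from the subquery is enough.
        return JoinChoice("CROSS_JOIN", "limit(1) over queryPlan")
    raise ValueError(f"unknown subquery kind: {kind}")
```

For example, `choose_join("exists", negated=True)` yields a `CROSS_JOIN` with the `count(*) = 0` filter, matching the last diagram above.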
2022-09-02 17:34:19 +08:00
08c5e0b1e3
[chore](deps) strip debug info of thirdparty dependencies ( #12284 )
...
Strip debug info from the static libs of most thirdparty dependencies.
This can significantly reduce the size of the thirdparty libs: 3.4G -> 1.6G
The doris_be binary size is also reduced: 1.5G -> 868M (clang build)
After compression, the BE binary is only 195M, with debug info!
2022-09-02 15:43:29 +08:00
64302ff4c9
[typo](docs)Sidebar fix ( #12297 )
...
* sidebar fix
2022-09-02 15:09:26 +08:00
81c5732dc7
[feature-wip](MTMV) Support creating materialized view for multiple tables ( #11646 )
...
Support creating materialized view for multiple tables.
Examples:
mysql> CREATE TABLE t1 (pk INT, v1 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
mysql> CREATE TABLE t2 (pk INT, v2 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE KEY (mv_pk) DISTRIBUTED BY HASH (mv_pk) PROPERTIES ('replication_num' = '1') AS SELECT t1.pk as mv_pk FROM t1, t2 WHERE t1.pk = t2.pk;
2022-09-02 14:51:56 +08:00
a8c8ebf5cf
[Enhancement](compaction) empty string optimize for binary dict code ( #12259 )
...
Improve the performance of writing empty strings.
2022-09-02 14:25:19 +08:00
7a4173b497
[typo](docs)Fix admin copy table format ( #12288 )
...
Fix admin copy table format
2022-09-02 14:08:56 +08:00
202ad5c659
[feature-wip](parquet-reader) bug fix, the number of rows are different among columns in a block ( #12228 )
...
1. `ExprContext` is deleted in `ParquetReader::close()` without having been closed,
so the `DCHECK` in `~ExprContext()` fails. The lifetime of `ExprContext` is managed by the scan node,
so we should not delete its pointer in `ParquetReader::close()`.
2. `RowGroupReader::next_batch` updates `_read_rows` on every iteration of the column loop,
and does not ensure that every column has the same number of rows.
3. The skipped row ranges are stack variables, which are released when calling `ArrayColumnReader::read_column_data`, so we should copy them out.
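Item 2 can be illustrated with a minimal Python sketch (the actual reader is C++). It assumes a hypothetical `RowGroupReaderSketch` whose column readers are plain callables; the only point is that the row counter is advanced once per batch, after every column has been checked to return the same number of rows.

```python
from typing import Callable, Dict, List


class RowGroupReaderSketch:
    """Hypothetical stand-in: one callable per column, each returning a list of values."""

    def __init__(self, column_readers: Dict[str, Callable[[int], List[object]]]):
        self.column_readers = column_readers
        self.read_rows = 0  # total rows handed out so far

    def next_batch(self, batch_size: int) -> Dict[str, List[object]]:
        block: Dict[str, List[object]] = {}
        batch_rows = None  # rows produced by the first column of this batch
        for name, read in self.column_readers.items():
            values = read(batch_size)
            if batch_rows is None:
                batch_rows = len(values)
            elif len(values) != batch_rows:
                raise ValueError(
                    f"column {name!r} returned {len(values)} rows, expected {batch_rows}")
            block[name] = values
        # Update the counter once per batch, not once per column.
        self.read_rows += batch_rows or 0
        return block
```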
2022-09-02 09:50:25 +08:00
3ce6bb548d
doc_stream_load_format ( #12144 )
...
doc_stream_load_format
2022-09-02 09:22:10 +08:00
10c3e683dd
[docs]update users numbers ( #12248 )
...
update users numbers
2022-09-02 09:21:36 +08:00
1c91257c01
📝 fix create table doc typo ( #12269 )
...
fix create table doc typo
2022-09-02 09:20:46 +08:00
061b49b7bf
[doc](website) update SHOW-PROC doc ( #12229 )
2022-09-01 19:50:25 +08:00
58c1d6ce9d
[typo](docs)Modify the maximum handle parameter reference #12244
2022-09-01 19:50:07 +08:00
87086ffe31
[enhancment](Nereids)enable normalize aggregate rule ( #12194 )
...
enable normalize aggregate rule introduced by #12013
2022-09-01 19:20:37 +08:00
3ce305134a
[fix](scan) fix potential wrong cancel when sql has limit ( #12224 )
2022-09-01 19:11:40 +08:00
f8eb480bec
[fix](emptynode)fix empty node bug in vec engine ( #12258 )
...
* [fix](emptynode)fix empty node bug in vec engine
* update fe ut
2022-09-01 18:52:10 +08:00
ad8e2f4749
[fix](rpc) fix that coordinator rpc timeout too large may make show load blocked for long time ( #12152 )
...
Co-authored-by: wuhangze <wuhangze@jd.com >
2022-09-01 18:05:37 +08:00
068e60145e
[enhancement](Nereids)ban groupPlan() pattern to avoid misuse ( #12250 )
...
The `groupPlan()` pattern is meant to find a `GroupPlan` in the memo. Since there is no `GroupPlan` in the memo, it always returns nothing.
To write a pattern that matches any GROUP, we should use `group()`. The `groupPlan` pattern is confusing and easy to misuse.
So this PR bans the `groupPlan()` pattern to avoid misuse.
2022-09-01 14:37:48 +08:00
3bcab8bbef
[feature](function) support now/current_timestamp functions with precision ( #12219 )
...
* [feature](function) support now/current_timestamp functions with precision
2022-09-01 14:35:12 +08:00
c5481dfdf7
[fix](remote)Fix bug for Segment::open() in case: config::file_cache_type ( #12249 )
...
* fix bug for Segment::open() in case: config::file_cache_type
2022-09-01 14:16:41 +08:00
df51c78593
[fix](dbt)fix dbt run abnormal #12242
2022-09-01 12:10:48 +08:00
f294d33332
[bugfix](index) index page should not be bitshuffle decoded ( #12231 )
...
* [bugfix](index) index page should not be bitshuffle decoded
* minor change
2022-09-01 11:56:44 +08:00
fc05d54f0d
[fix](array-type) array_sort function with empty input #12175
...
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com >
2022-09-01 10:54:09 +08:00
8c8078ad28
[fix](projections) get error row_descriptor when have projections on ExecNode ( #12232 )
...
When an ExecNode's projections are not empty, it uses the output row descriptor to initialize the block before doing the projection, but it should use the original row descriptor. This PR fixes it.
2022-09-01 10:48:10 +08:00
60a2fa7dea
[Improvement](compaction) copy row in batch in VCollectIterator&VGenericIterator ( #12214 )
...
In VCollectIterator&VGenericIterator, use insert_range_from to copy
contiguous rows of a block in one batch, which saves CPU cost.
If rows in rowsets and segments are non-overlapping, this improves compaction
throughput by 30%. If rows are completely overlapping, such as when loading the
same file twice, the throughput is nearly the same as before.
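The idea of copying contiguous rows in one batch can be sketched in Python as follows. This is not the Doris code: `contiguous_runs` and `copy_rows` are hypothetical helpers, and a list slice stands in for a range copy such as `insert_range_from`.

```python
from typing import Iterable, List, Tuple


def contiguous_runs(row_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Group row ids into (start, length) runs of consecutive ids."""
    runs: List[Tuple[int, int]] = []
    for rid in row_ids:
        if runs and runs[-1][0] + runs[-1][1] == rid:
            # rid directly follows the current run, so extend it.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            runs.append((rid, 1))
    return runs


def copy_rows(src: List[object], row_ids: Iterable[int]) -> List[object]:
    """Copy the selected rows, one slice per run instead of one append per row."""
    dst: List[object] = []
    for start, length in contiguous_runs(row_ids):
        dst.extend(src[start:start + length])
    return dst
```

When the selected rows are fully interleaved with rows from other sources, every run has length 1 and the batch copy degenerates to row-by-row copying, which is consistent with the overlapping case described above showing no gain.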
Co-authored-by: yixiutt <yixiu@selectdb.com >
2022-09-01 10:20:17 +08:00
90fb3b7783
[Improvement](load) accelerate tablet sink ( #12174 )
2022-09-01 10:08:09 +08:00
ec4863b63a
[feature-wip](new-scan)Add new file scan node ( #12048 )
...
Related pr: #11582
This is the new file scan node and scanner for external hms catalog.
2022-09-01 10:01:20 +08:00
65051d67cf
[fix](yearweek) fixed the yearweek result error when mode is set to 1 ( #12234 )
2022-09-01 09:46:38 +08:00
d7e02a9514
[fix](join)join reorder by mistake ( #12113 )
2022-09-01 09:46:01 +08:00
f3cb0c24ee
[enhancement](test) add restore action and s3 helper method ( #12084 )
...
Co-authored-by: morrySnow <morrysnow@126.com >
Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com >
2022-08-31 23:08:23 +08:00
a49bde8a71
[fix](Nereids)statistics calculator for Project and Aggregate lost some columns ( #12196 )
...
There are some bugs in Nereids' StatsCalculator.
1. Project: returns the child's column stats directly, so its parents cannot find column stats by the project's slots.
2. Aggregate: does not return columns that are Alias, so its parents cannot find some column stats by the Aggregate's slots.
3. All: SlotReference is used as the key of the column-to-stats map, so SlotReference's equals and hashCode methods need to use only ExprId, as we discussed.
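Item 3 is about map-key identity. A minimal Python analogue (the real class is Java; this `SlotReference` is a hypothetical stand-in with made-up fields) shows why equality and hashing based only on the expression id let a parent find stats recorded by its child:

```python
class SlotReference:
    """Hypothetical stand-in for a slot; identity is the expression id only."""

    def __init__(self, expr_id: int, name: str, nullable: bool = True):
        self.expr_id = expr_id
        self.name = name
        self.nullable = nullable

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SlotReference):
            return NotImplemented
        return self.expr_id == other.expr_id

    def __hash__(self) -> int:
        # Must agree with __eq__: equal slots hash to the same bucket.
        return hash(self.expr_id)


# Stats recorded by a child node under one slot object ...
column_stats = {SlotReference(7, "a"): {"ndv": 100}}
# ... are found by a parent holding a different object with the same id.
lookup = column_stats.get(SlotReference(7, "a_alias", nullable=False))
```

If name or nullability took part in equality, the lookup above would miss even though both objects refer to the same expression.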
2022-08-31 20:47:22 +08:00
57051d3591
[fix](Nereids)cast StringType to DateType failed when bind TimestampArithmetic function ( #12198 )
...
When binding TimestampArithmetic, we always cast the left child to DateTimeType. But sometimes it needs to be cast to DateType. This PR fixes this problem.
2022-08-31 19:52:03 +08:00
1a198b3777
[typo](docs) Fix Bitmap index and Type QUANTILE_STATE: fix description ( #12217 )
...
Fix Bitmap index and Type QUANTILE_STATE: fix description
2022-08-31 18:10:17 +08:00
1cc9eeeb1a
[feature-wip](parquet-reader) read and generate array column ( #12166 )
...
Read and generate parquet array column.
D=1, R=0 (D is the definition level, R the repetition level) represents an empty array. An empty array is not a null value, so the NullMap for this row is false,
the offset range for this row is [offset_start, offset_end) with `offset_start == offset_end`,
and offset_end is the start offset of the next row, so there is no value in the nested primitive column.
D=0, R=0 represents a null array, and the NullMap for this row is true.
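The level handling described above can be sketched in Python. This is an illustration under stated assumptions, not the reader's code: `assemble_array_column` is a hypothetical name, elements are assumed to be non-null so that D >= 2 marks a present element, and R = 0 is taken to start a new row.

```python
from typing import List, Tuple


def assemble_array_column(levels: List[Tuple[int, int]], values: List[object]):
    """Build (null_map, offsets, nested) from (D, R) pairs.

    Assumes D=0 is a null array, D=1 an empty array, and D>=2 an element;
    R=0 starts a new row. offsets[i] is the end offset of row i.
    """
    null_map: List[bool] = []
    offsets: List[int] = []
    nested: List[object] = []
    value_iter = iter(values)
    for d, r in levels:
        if r == 0:
            # New row: it starts (and, so far, ends) where the previous row ended.
            null_map.append(d == 0)
            offsets.append(len(nested))
        if d >= 2:
            # A real element: append it and move this row's end offset forward.
            nested.append(next(value_iter))
            offsets[-1] = len(nested)
    return null_map, offsets, nested
```

With levels `[(2, 0), (2, 1), (1, 0), (0, 0), (2, 0)]` and values `[1, 2, 3]`, the rows are `[1, 2]`, `[]`, NULL and `[3]`: the empty and null rows both have equal start and end offsets, and only the null map tells them apart.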
2022-08-31 17:08:12 +08:00
573e5476dd
[Opt](load) Speed up the vectorized load ( #12146 )
...
* [Opt](load) Speed up the vectorized load
2022-08-31 16:23:36 +08:00
90c5180370
[Bug](array-type) Fix bug in creating view from table with array types ( #12200 )
2022-08-31 14:36:31 +08:00
d7e032bc38
Modify the startup script and print the log without using the --daemon parameter. ( #12218 )
2022-08-31 14:36:14 +08:00
da4ffd3c56
[Enhancement](metric-type) more readable error message for only metric type #12162
...
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com >
2022-08-31 14:35:48 +08:00
c2f7e95c7f
[typo](docs)Add some json format load doc ( #12213 )
...
* Specify JSON root document when adding JSON import, and add JSON file size limit parameter prompt
2022-08-31 14:34:55 +08:00
3cdd19821d
[fix](sort)the slot in sort node should be nullable if it's outer joined ( #12193 )
...
The sort node's output expr should be nullable if it is outer joined.
2022-08-31 14:34:14 +08:00
8999ba34ae
[improve](Nereids)unify all plan toString() function ( #12132 )
...
Add a Util function to generate uniform format plan toString for easy reading and debugging
2022-08-31 14:28:44 +08:00
8b98e2021e
[enhancement](array-type) Array type do not support compare with '=','>', '<', make the error message more readable ( #12181 )
...
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com >
2022-08-31 12:49:10 +08:00
254cb321b9
[optimize](remote) Optimize cache reader use a pre-created buffer when downloading the cache ( #12165 )
...
* optimize cache reader
* add description for config
2022-08-31 10:15:40 +08:00
f0cde35ea6
[performance improvement] Spark Load, SparkDpp processRDDAggregate performance improvement ( #12186 )
...
Co-authored-by: hourong <hourong@zhihu.com >
2022-08-31 09:14:13 +08:00
8251e7cbfc
[refactor](column) remove confused field ( #12187 )
2022-08-31 09:13:31 +08:00
f72d2559cf
[fix](compile) Fix compile error '<unknown>' may be used uninitialized in PODArray::insert_prepare #12202
2022-08-31 09:12:28 +08:00
f949262ddf
[fix](planner) a slot id is bounded on a wrong tuple id, if cross join has a hash join as child ( #12156 )
2022-08-31 09:07:55 +08:00