doris

Author	SHA1	Message	Date
chaow	b154a1b45e	[doc] fix some docs issue (#11101 ) * fix some docs issue * add -y for apt-get Co-authored-by: chaow <941210239@qq.com>	2022-09-02 21:06:12 +08:00
Zhengguo Yang	c944496fb4	[chore](log) add cluster and tag message to exception (#12287 )	2022-09-02 20:46:39 +08:00
Stalary	0d33c713d1	[Bug](CTAS) Fix CTAS error for use agg column as first. (#12299 ) * FIX: ctas default use duplicate key.	2022-09-02 20:44:01 +08:00
lixiaobing-fabulous	1fd3490c56	remove duplicate "comments" (#12264 )	2022-09-02 18:57:10 +08:00
zhengshiJ	7f7a3a7524	[feature](nereids) Convert subqueries into algebraic expressions and … (#11454 )	2022-09-02 17:34:19 +08:00

1. Convert subqueries to Apply nodes.
2. Convert ApplyNode to ordinary join.

### Detailed design:

There are currently three types of subquery expressions: scalarSubquery, inSubquery, and exists. A scalarSubquery returns 1 row and 1 column.

Subquery replacement
```
before:
    scalarSubquery: filter(t1.a = scalarSubquery(output b));
    inSubquery:     filter(inSubquery);   inSubquery = (t1.a in select *);
    exists:         filter(exists);       exists = (select );
end:
    scalarSubquery: filter(t1.a = b);
    inSubquery:     filter(True);
    exists:         filter(True);
```

Subquery Transformation Rules
```
PushApplyUnderFilter
before:
              Apply
             /     \
 Input(output:b)    Filter(Correlated predicate/UnCorrelated predicate)

after:
          Filter(Correlated predicate)
                   |
                 Apply
                /     \
 Input(output:b)    Filter(UnCorrelated predicate)
```
```
PushApplyUnderProject
before:
              Apply
             /     \
 Input(output:b)    Project(output:a)

after:
    Project(b,(if the Subquery is Scalar add 'a' as the output column))
             /     \
 Input(output:b)    Apply
```
```
ApplyPullFilterOnAgg
before:
              Apply
             /     \
 Input(output:b)    agg(output:fn,c; group by:null)
                     |
                    Filter(Correlated predicate(Input.e = this.f)/UnCorrelated predicate)

end:
         Apply(Correlated predicate(Input.e = this.f))
             /     \
 Input(output:b)    agg(output:fn,this.f; group by:this.f)
                     |
                    Filter(UnCorrelated predicate)
```
```
ApplyPullFilterOnProjectUnderAgg
before:
              apply
             /     \
 Input(output:b)    agg
                     |
                    Project(output:a)
                     |
                    Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
                     |
                    child

after:
              apply
             /     \
 Input(output:b)    agg
                     |
                    Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
                     |
                    Project(output:a,this.f, Unapply predicate(slots))
                     |
                    child
```
```
ScalarToJoin
    UnCorrelated -> CROSS_JOIN
    Correlated   -> LEFT_OUTER_JOIN
```
```
InToJoin
    Not In -> LEFT_ANTI_JOIN
    In     -> LEFT_SEMI_JOIN
```
```
existsToJoin
Exists
    Correlated -> LEFT_SEMI_JOIN
        correlated                  LEFT_SEMI_JOIN(Correlated Predicate)
         /     \         -->           /     \
     input    queryPlan             input    queryPlan

    UnCorrelated -> CROSS_JOIN(limit(1))
        uncorrelated                CROSS_JOIN
         /     \         -->           /     \
     input    queryPlan             input    limit(1)
                                               |
                                             queryPlan

Not Exists
    Correlated -> LEFT_ANTI_JOIN
        correlated                  LEFT_ANTI_JOIN(Correlated Predicate)
         /     \         -->           /     \
     input    queryPlan             input    queryPlan

    UnCorrelated -> CROSS_JOIN(Count())
                                    Filter(count() = 0)
                                          |
          apply                       Cross_Join
         /     \         -->           /     \
     input    queryPlan             input    agg(output:count())
                                               |
                                             limit(1)
                                               |
                                             queryPlan
```
Mingyu Chen	08c5e0b1e3	[chore](deps) strip debug info of thirdparty dependencies (#12284 ) Strip debug info from most of the thirdparty dependencies' static libs. This can significantly reduce the size of the thirdparty libs: 3.4G -> 1.6G. The doris_be binary size is also reduced: 1.5G -> 868M (clang build). After compression, the BE binary is only 195M with debug info!	2022-09-02 15:43:29 +08:00
jiafeng.zhang	64302ff4c9	[typo](docs)Sidebar fix (#12297 ) * sidebar fix	2022-09-02 15:09:26 +08:00
Adonis Ling	81c5732dc7	[feature-wip](MTMV) Support creating materialized view for multiple tables (#11646 ) Support creating materialized view for multiple tables. Examples: mysql> CREATE TABLE t1 (pk INT, v1 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1'); mysql> CREATE TABLE t2 (pk INT, v2 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1'); mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE KEY (mv_pk) DISTRIBUTED BY HASH (mv_pk) PROPERTIES ('replication_num' = '1') AS SELECT t1.pk as mv_pk FROM t1, t2 WHERE t1.pk = t2.pk;	2022-09-02 14:51:56 +08:00
Pxl	a8c8ebf5cf	[Enhancement](compaction) empty string optimize for binary dict code (#12259 ) Improve the performance of writing empty strings.	2022-09-02 14:25:19 +08:00
jiafeng.zhang	7a4173b497	[typo](docs)Fix admin copy table format (#12288 ) Fix admin copy table format	2022-09-02 14:08:56 +08:00
Ashin Gau	202ad5c659	[feature-wip](parquet-reader) bug fix, the number of rows are different among columns in a block (#12228 ) 1. `ExprContext` is deleted in `ParquetReader::close()`, but it has not been closed, so the `DCHECK` in `~ExprContext()` fails. The lifetime of `ExprContext` is managed by the scan node, so we should not delete its pointer in `ParquetReader::close()`. 2. `RowGroupReader::next_batch` updates `_read_rows` in every column loop and does not ensure that the number of rows in every column is equal. 3. The skipped row ranges are stack variables, which are released when calling `ArrayColumnReader::read_column_data`, so we should copy them out.	2022-09-02 09:50:25 +08:00
catpineapple	3ce6bb548d	doc_stream_load_format (#12144 ) doc_stream_load_format	2022-09-02 09:22:10 +08:00
Luzhijing	10c3e683dd	[docs]update users numbers (#12248 ) update users numbers	2022-09-02 09:21:36 +08:00
zhou zhuohan	1c91257c01	📝 fix create table doc typo (#12269 ) fix create table doc typo	2022-09-02 09:20:46 +08:00
Xujian Duan	061b49b7bf	[doc](website) update SHOW-PROC doc (#12229 )	2022-09-01 19:50:25 +08:00
zy-kkk	58c1d6ce9d	[typo](docs)Modify the maximum handle parameter reference #12244	2022-09-01 19:50:07 +08:00
morrySnow	87086ffe31	[enhancment](Nereids)enable normalize aggregate rule (#12194 ) enable normalize aggregate rule introduced by #12013	2022-09-01 19:20:37 +08:00
Mingyu Chen	3ce305134a	[fix](scan) fix potential wrong cancel when sql has limit (#12224 )	2022-09-01 19:11:40 +08:00
starocean999	f8eb480bec	[fix](emptynode)fix empty node bug in vec engine (#12258 ) * [fix](emptynode)fix empty node bug in vec engine * update fe ut	2022-09-01 18:52:10 +08:00
Henry2SS	ad8e2f4749	[fix](rpc) fix that coordinator rpc timeout too large may make show load blocked for long time (#12152 ) Co-authored-by: wuhangze <wuhangze@jd.com>	2022-09-01 18:05:37 +08:00
morrySnow	068e60145e	[enhancement](Nereids)ban groupPlan() pattern to avoid misuse (#12250 ) The `groupPlan()` pattern is meant to find a `GroupPlan` in the memo. Since there is no `GroupPlan` in the memo, it always returns nothing. To write a pattern that matches any GROUP, use `group()`. Because the `groupPlan` pattern is confusing and easy to misuse, this PR bans the `groupPlan()` pattern to avoid misuse.	2022-09-01 14:37:48 +08:00
Gabriel	3bcab8bbef	[feature](function) support now/current_timestamp functions with precision (#12219 ) * [feature](function) support now/current_timestamp functions with precision	2022-09-01 14:35:12 +08:00
pengxiangyu	c5481dfdf7	[fix](remote)Fix bug for Segment::open() in case: config::file_cache_type (#12249 ) * fix bug for Segment::open() in case: config::file_cache_type * fix bug for Segment::open() in case: config::file_cache_type	2022-09-01 14:16:41 +08:00
catpineapple	df51c78593	[fix](dbt)fix dbt run abnormal #12242	2022-09-01 12:10:48 +08:00
TengJianPing	f294d33332	[bugfix](index) index page should not be bitshuffle decoded (#12231 ) * [bugfix](index) index page should not be bitshuffle decoded * minor change	2022-09-01 11:56:44 +08:00
camby	fc05d54f0d	[fix](array-type) array_sort function with empty input #12175 Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-09-01 10:54:09 +08:00
HappenLee	8c8078ad28	[fix](projections) get error row_descriptor when have projections on ExecNode (#12232 ) When an ExecNode's projections are not empty, it uses the output row descriptor to initialize the block before doing the projection, but it should use the original row descriptor. This PR fixes it.	2022-09-01 10:48:10 +08:00
yixiutt	60a2fa7dea	[Improvement](compaction) copy row in batch in VCollectIterator&VGenericIterator (#12214 ) In VCollectIterator&VGenericIterator, use insert_range_from to copy continuous rows in a block to save CPU cost. If rows in rowset and segment are non-overlapping, this improves compaction throughput by 30%. If rows are completely overlapping, such as when loading the same file twice, the throughput is nearly the same as before. Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-09-01 10:20:17 +08:00
Gabriel	90fb3b7783	[Improvement](load) accelerate tablet sink (#12174 )	2022-09-01 10:08:09 +08:00
Jibing-Li	ec4863b63a	[feature-wip](new-scan)Add new file scan node (#12048 ) Related pr: #11582 This is the new file scan node and scanner for external hms catalog.	2022-09-01 10:01:20 +08:00
luozenglin	65051d67cf	[fix](yearweek) fixed the yearweek result error when mode is set to 1 (#12234 )	2022-09-01 09:46:38 +08:00
starocean999	d7e02a9514	[fix](join)join reorder by mistake (#12113 )	2022-09-01 09:46:01 +08:00
Yongqiang YANG	f3cb0c24ee	[enhancement](test) add restore action and s3 helper methond (#12084 ) Co-authored-by: morrySnow <morrysnow@126.com> Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com>	2022-08-31 23:08:23 +08:00
morrySnow	a49bde8a71	[fix](Nereids)statistics calculator for Project and Aggregate lost some columns (#12196 ) There are some bugs in Nereids' StatsCalculator. 1. Project: returns child column stats directly, so its parents cannot find column stats from the project's slots. 2. Aggregate: does not return columns that are Alias, so its parents cannot find some column stats from the Aggregate's slots. 3. All: use SlotReference as the key of the column-to-stats map. So we need to change SlotReference's equals and hashCode methods to use only ExprId, as we discussed.	2022-08-31 20:47:22 +08:00
morrySnow	57051d3591	[fix](Nereids)cast StringType to DateType failed when bind TimestampArithmetic function (#12198 ) When binding TimestampArithmetic, we always cast the left child to DateTimeType. But sometimes we need to cast it to DateType; this PR fixes this problem.	2022-08-31 19:52:03 +08:00
siriume	1a198b3777	[typo](docs) Fix Bitmap index and Type QUANTILE_STATE: fix description (#12217 ) Fix Bitmap index and Type QUANTILE_STATE: fix description	2022-08-31 18:10:17 +08:00
Ashin Gau	1cc9eeeb1a	[feature-wip](parquet-reader) read and generate array column (#12166 ) Read and generate parquet array column. D=1, R=0 represents an empty array. An empty array is not a null value, so the NullMap for this row is false; the offset for this row is [offset_start, offset_end) with `offset_start == offset_end`, and offset_end is the start offset of the next row, so there is no value in the nested primitive column. D=0, R=0 represents a null array, and the NullMap for this row is true.	2022-08-31 17:08:12 +08:00
HappenLee	573e5476dd	[Opt](load) Speed up the vectorized load (#12146 ) * [Opt](load) Speed up the vectorized load	2022-08-31 16:23:36 +08:00
xy720	90c5180370	[Bug](array-type) Fix bug in creating view from table with array types (#12200 )	2022-08-31 14:36:31 +08:00
Yankee24	d7e032bc38	Modify the startup script and print the log without using the --daemon parameter. (#12218 )	2022-08-31 14:36:14 +08:00
camby	da4ffd3c56	[Enhancement](metric-type) more readable error message for only metric type #12162 Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-31 14:35:48 +08:00
zy-kkk	c2f7e95c7f	[typo](docs)Add somm json format load doc (#12213 ) * Specify JSON root document when adding JSON import, and add JSON file size limit parameter prompt	2022-08-31 14:34:55 +08:00
starocean999	3cdd19821d	[fix](sort)the slot in sort node should be nullable if it's outer joined (#12193 ) The sort node's output expr should be nullable if it is outer joined.	2022-08-31 14:34:14 +08:00
jakevin	8999ba34ae	[improve](Nereids)unify all plan toString() function (#12132 ) Add a Util function to generate uniform format plan toString for easy reading and debugging	2022-08-31 14:28:44 +08:00
camby	8b98e2021e	[enhancement](array-type) Array type do not support compare with '=','>', '<', make the error message more readable (#12181 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-31 12:49:10 +08:00
zxealous	254cb321b9	[optimize](remote) Optimize cache reader use a pre-created buffer when downloading the cache (#12165 ) * optimize cache reader * add description for config * optimize cache reader * optimize cache reader	2022-08-31 10:15:40 +08:00
HouRong	f0cde35ea6	[performance improvement] Spark Load, SparkDpp processRDDAggregate performance improvement (#12186 ) Co-authored-by: hourong <hourong@zhihu.com>	2022-08-31 09:14:13 +08:00
Mingyu Chen	8251e7cbfc	[refactor](column) remove confused field (#12187 )	2022-08-31 09:13:31 +08:00
Xinyi Zou	f72d2559cf	[fix](compile) Fix compile error '<unknown>' may be used uninitialized in PODArray::insert_prepare #12202	2022-08-31 09:12:28 +08:00
minghong	f949262ddf	[fix](planner) a slot id is bounded on a wrong tuple id, if cross join has a hash join as child (#12156 )	2022-08-31 09:07:55 +08:00
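
The join-type choices listed in commit 7f7a3a7524 (ScalarToJoin, InToJoin, existsToJoin) can be summarized as a small lookup. The Python sketch below is illustrative only and is not code from the Doris repository: the names `SubqueryKind` and `join_type_for` are hypothetical, and the join-type strings and notes are copied from that commit message.

```python
from enum import Enum


class SubqueryKind(Enum):
    """Hypothetical labels for the three subquery types named in the commit."""
    SCALAR = "scalar"
    IN = "in"
    EXISTS = "exists"


def join_type_for(kind, correlated, negated=False):
    """Return (join_type, note) following the mapping in the commit message.

    `negated` stands for NOT IN / NOT EXISTS; it has no meaning for scalar
    subqueries, so it is rejected there.
    """
    if kind is SubqueryKind.SCALAR:
        if negated:
            raise ValueError("a scalar subquery cannot be negated")
        return ("LEFT_OUTER_JOIN" if correlated else "CROSS_JOIN", "")

    if kind is SubqueryKind.IN:
        # The commit lists IN / NOT IN without a correlated/uncorrelated split.
        return ("LEFT_ANTI_JOIN" if negated else "LEFT_SEMI_JOIN", "")

    if kind is SubqueryKind.EXISTS:
        if correlated:
            return ("LEFT_ANTI_JOIN" if negated else "LEFT_SEMI_JOIN",
                    "join condition: correlated predicate")
        if negated:
            # Uncorrelated NOT EXISTS: count the (limited) subquery rows and
            # keep the input only when that count is zero.
            return ("CROSS_JOIN",
                    "right side: agg(count()) over limit(1); filter count() = 0")
        return ("CROSS_JOIN", "right side: limit(1)")

    raise ValueError(f"unknown subquery kind: {kind!r}")
```

For example, `join_type_for(SubqueryKind.EXISTS, correlated=True, negated=True)` yields the `LEFT_ANTI_JOIN` case from the "Not Exists" diagram.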

13721 Commits