doris

Author	SHA1	Message	Date
morrySnow	698bae09b2	[fix](Nereids) NPE and unoptimized group when adding a REWRITE rule to the Cascades Optimizer (#12346 ) Fix some bugs that occur when a REWRITE rule is added to the Cascades Optimizer: - every rule should be marked as a non-rewrite rule when used in the Cascades Optimizer - the promise of IMPLEMENT rules should be larger than that of other rules, since exploration should be done first.	2022-09-05 19:11:48 +08:00
minghong	f466a072d8	fix bug: tpch-q12 invalid type (#12347 ) In the old planner, Predicate sets its type in analyzeImpl(). However, analyzeImpl() is called on the old planner path but not on the Nereids path, so the type was left invalid. Because every predicate has type bool, we now set its type in the constructor.	2022-09-05 19:09:27 +08:00
Kikyou1997	dadfd85c40	prune for agg with constant expr (#12274 ) Currently, Nereids doesn't support aggregate functions with no slot reference in a query, since all the columns would be pruned, e.g. SELECT COUNT(1) FROM t; This PR keeps the column with the smallest amount of data when pruning columns in this situation. Note that this PR ONLY handles aggregate functions, so projections with no slot reference still need to be handled in the future.	2022-09-05 19:09:00 +08:00
Adonis Ling	8bfb89c100	[feature-wip](array-type) Add some regression tests for nested array (#12322 ) #11392 made _input_block in each BetaRowsetReader shareable. However, for some types (e.g. nested arrays with a depth greater than 1), the _column_vector_batches in RowBlockV2 can be nested, which means that there is a ColumnVectorBatch inside another ColumnVectorBatch. In this case, the data of the inner ColumnVectorBatch may be corrupted because the data of _input_block is copied shallowly to the _output_block.	2022-09-05 14:05:24 +08:00
Gabriel	3b104e334a	[Bug](load) fix missing nullable info in stream load (#12302 )	2022-09-05 13:41:28 +08:00
Jerry Hu	7b352c93ff	[improvement](sink) avoid frequent allocation and deallocation when serializing block (#12310 )	2022-09-05 12:23:43 +08:00
morrySnow	2398cd3bb6	[enhancement](Nereids)print slot name in explain string (#12272 ) Currently, the explain string prints every expression as a slot id, e.g. `<slot 1>`. This PR prints the name together with the slot id instead, e.g. `column_a[#1]`. Details: - print the qualified table name for OlapScanNode - print the NamedExpression name with its SlotId instead of just the SlotId - OlapScanNode's node name uses "OlapScanNode" instead of the table name	2022-09-05 11:31:35 +08:00
lsy3993	e5f3f0e730	[typo](docs) mix of SSD and HDD disks should specify the storage directory only (#12309 ) add notice of storage	2022-09-05 09:23:34 +08:00
jiafeng.zhang	74b6eaf44b	[typo](docs)Replace table link fix (#12317 )	2022-09-05 08:29:41 +08:00
TaoZex	7929500608	[typo](docs)The table_function calling reset() function should set _eos to false #12323	2022-09-05 08:29:19 +08:00
morrySnow	7f10fa9768	[fix](compile)compile error when using clang on the aarch64 platform (#12319 )	2022-09-05 08:28:51 +08:00
Gabriel	d5e5afe437	[Bug](function) disable LUT for yearweek (#12324 )	2022-09-05 08:27:43 +08:00
catpineapple	ef37396b63	[fix](dbt)fix dbt incremental bug (#12280 )	2022-09-04 16:40:40 +08:00
jiafeng.zhang	81664fd78c	github workflow build docs check fix (#12318 ) github workflow build docs check fix	2022-09-03 21:32:43 +08:00
camby	90a0baf5f8	[fix](array-type) Forbid ARRAY<NOT_NULL(T)> temporarily (#12262 ) Currently, there are still lots of bugs related to ARRAY<NOT_NULL(T)>. We decided not to support ARRAY<NOT_NULL(T)> types in the first version; all elements in an ARRAY are nullable. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-09-03 14:26:08 +08:00
pangzhili	3a30e12ffb	update data-model, add `error_code` into DUPLICATE KEY (#12131 )	2022-09-03 14:23:29 +08:00
minghong	34dd67f804	[feature](nereids) add weekOfYear to support ssb-flat benchmark (#12207 ) Support the function WeekOfYear. In the current implementation, WeekOfYear can be used in the WHERE clause, but not in the SELECT clause.	2022-09-03 12:04:51 +08:00
xy720	62561834a8	[Feature](array-type) Support is-null-predicate for array type (#12237 )	2022-09-03 11:37:57 +08:00
xy720	e7303c12c7	[Enhancement](array-type) Support Floating/Decimal type for array aggregation functions (#12271 )	2022-09-03 09:55:56 +08:00
jiafeng.zhang	5d0b1868c2	[chore](docs)Add compile check for document format (#12300 ) Add a compile check for the document format. This catches document formatting issues that would otherwise fail the daily build and release of the official website, so problems can be found and fixed promptly instead of requiring repeated modifications. Since the compiler for the website now lives in the doris-website repo, the check pulls the code from that repo, deletes the documentation inside it, and copies in the documentation from Doris master to perform the compile check.	2022-09-03 09:44:20 +08:00
chaow	b154a1b45e	[doc] fix some docs issue (#11101 ) * fix some docs issue * add -y for apt-get Co-authored-by: chaow <941210239@qq.com>	2022-09-02 21:06:12 +08:00
Zhengguo Yang	c944496fb4	[chore](log) add cluster and tag message to exception (#12287 )	2022-09-02 20:46:39 +08:00
Stalary	0d33c713d1	[Bug](CTAS) Fix CTAS error when an agg column is used as the first column. (#12299 ) * FIX: CTAS uses duplicate key by default.	2022-09-02 20:44:01 +08:00
lixiaobing-fabulous	1fd3490c56	remove duplicate "comments" (#12264 )	2022-09-02 18:57:10 +08:00
zhengshiJ	7f7a3a7524	[feature](nereids) Convert subqueries into algebraic expressions and … (#11454 )
1. Convert subqueries to Apply nodes.
2. Convert ApplyNode to ordinary join.
### Detailed design:
There are currently three types of subqueries: scalarSubquery, inSubquery, and Exists. A scalarSubquery returns exactly 1 row and 1 column.
Subquery replacement
```
before:
  scalarSubquery: filter(t1.a = scalarSubquery(output b));
  inSubquery: filter(inSubquery); inSubquery = (t1.a in select *);
  exists: filter(exists); exists = (select );
end:
  scalarSubquery: filter(t1.a = b);
  inSubquery: filter(True);
  exists: filter(True);
```
Subquery Transformation Rules
```
PushApplyUnderFilter
before:
          Apply
        /       \
Input(output:b)   Filter(Correlated predicate/UnCorrelated predicate)

after:
          Filter(Correlated predicate)
            |
          Apply
        /       \
Input(output:b)   Filter(UnCorrelated predicate)
```
```
PushApplyUnderProject
before:
          Apply
        /       \
Input(output:b)   Project(output:a)

after:
          Project(b,(if the Subquery is Scalar add 'a' as the output column))
        /       \
Input(output:b)   Apply
```
```
ApplyPullFilterOnAgg
before:
          Apply
        /       \
Input(output:b)   agg(output:fn,c; group by:null)
                    |
                  Filter(Correlated predicate(Input.e = this.f)/UnCorrelated predicate)

end:
          Apply(Correlated predicate(Input.e = this.f))
        /       \
Input(output:b)   agg(output:fn,this.f; group by:this.f)
                    |
                  Filter(UnCorrelated predicate)
```
```
ApplyPullFilterOnProjectUnderAgg
before:
          apply
        /       \
Input(output:b)   agg
                    |
                  Project(output:a)
                    |
                  Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
                    |
                  child

after:
          apply
        /       \
Input(output:b)   agg
                    |
                  Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
                    |
                  Project(output:a,this.f, Unapply predicate(slots))
                    |
                  child
```
```
ScalarToJoin
  UnCorrelated -> CROSS_JOIN
  Correlated   -> LEFT_OUTER_JOIN
```
```
InToJoin
  Not In -> LEFT_ANTI_JOIN
  In     -> LEFT_SEMI_JOIN
```
```
existsToJoin
Exists
  Correlated -> LEFT_SEMI_JOIN
      correlated          LEFT_SEMI_JOIN(Correlated Predicate)
       /     \     -->        /     \
    input  queryPlan        input  queryPlan

  UnCorrelated -> CROSS_JOIN(limit(1))
      uncorrelated        CROSS_JOIN
       /     \     -->        /     \
    input  queryPlan        input  limit(1)
                                      |
                                   queryPlan

Not Exists
  Correlated -> LEFT_ANTI_JOIN
      correlated          LEFT_ANTI_JOIN(Correlated Predicate)
       /     \     -->        /     \
    input  queryPlan        input  queryPlan

  UnCorrelated -> CROSS_JOIN(Count())
                          Filter(count() = 0)
                            |
      apply               Cross_Join
       /     \     -->        /     \
    input  queryPlan        input  agg(output:count())
                                      |
                                   limit(1)
                                      |
                                   queryPlan
```	2022-09-02 17:34:19 +08:00
Mingyu Chen	08c5e0b1e3	[chore](deps) strip debug info of thirdparty dependencies (#12284 ) Strip debug info from the static libs of most thirdparty dependencies. This significantly reduces the size of the thirdparty libs (3.4G -> 1.6G) and of the doris_be binary (1.5G -> 868M, clang build). After compression, the BE binary is only 195M with debug info!	2022-09-02 15:43:29 +08:00
jiafeng.zhang	64302ff4c9	[typo](docs)Sidebar fix (#12297 ) * sidebar fix	2022-09-02 15:09:26 +08:00
Adonis Ling	81c5732dc7	[feature-wip](MTMV) Support creating materialized view for multiple tables (#11646 ) Support creating materialized view for multiple tables. Examples: mysql> CREATE TABLE t1 (pk INT, v1 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1'); mysql> CREATE TABLE t2 (pk INT, v2 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1'); mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE KEY (mv_pk) DISTRIBUTED BY HASH (mv_pk) PROPERTIES ('replication_num' = '1') AS SELECT t1.pk as mv_pk FROM t1, t2 WHERE t1.pk = t2.pk;	2022-09-02 14:51:56 +08:00
Pxl	a8c8ebf5cf	[Enhancement](compaction) empty string optimize for binary dict code (#12259 ) improve the performance of writing empty strings.	2022-09-02 14:25:19 +08:00
jiafeng.zhang	7a4173b497	[typo](docs)Fix admin copy table format (#12288 ) Fix admin copy table format	2022-09-02 14:08:56 +08:00
Ashin Gau	202ad5c659	[feature-wip](parquet-reader) bug fix, the number of rows differs among columns in a block (#12228 ) 1. `ExprContext` is deleted in `ParquetReader::close()` without having been closed, so the `DCHECK` in `~ExprContext()` fails. The lifetime of `ExprContext` is managed by the scan node, so we should not delete its pointer in `ParquetReader::close()`. 2. `RowGroupReader::next_batch` updates `_read_rows` in every column loop and does not ensure that every column has the same number of rows. 3. The skipped row ranges are stack variables, which are released when calling `ArrayColumnReader::read_column_data`, so we should copy them out.	2022-09-02 09:50:25 +08:00
catpineapple	3ce6bb548d	doc_stream_load_format (#12144 ) doc_stream_load_format	2022-09-02 09:22:10 +08:00
Luzhijing	10c3e683dd	[docs]update users numbers (#12248 ) update users numbers	2022-09-02 09:21:36 +08:00
zhou zhuohan	1c91257c01	📝 fix create table doc typo (#12269 ) fix create table doc typo	2022-09-02 09:20:46 +08:00
Xujian Duan	061b49b7bf	[doc](website) update SHOW-PROC doc (#12229 )	2022-09-01 19:50:25 +08:00
zy-kkk	58c1d6ce9d	[typo](docs)Modify the maximum handle parameter reference #12244	2022-09-01 19:50:07 +08:00
morrySnow	87086ffe31	[enhancement](Nereids)enable normalize aggregate rule (#12194 ) enable normalize aggregate rule introduced by #12013	2022-09-01 19:20:37 +08:00
Mingyu Chen	3ce305134a	[fix](scan) fix potential wrong cancel when sql has limit (#12224 )	2022-09-01 19:11:40 +08:00
starocean999	f8eb480bec	[fix](emptynode)fix empty node bug in vec engine (#12258 ) * [fix](emptynode)fix empty node bug in vec engine * update fe ut	2022-09-01 18:52:10 +08:00
Henry2SS	ad8e2f4749	[fix](rpc) fix coordinator RPC timeout being so large that it may block SHOW LOAD for a long time (#12152 ) Co-authored-by: wuhangze <wuhangze@jd.com>	2022-09-01 18:05:37 +08:00
morrySnow	068e60145e	[enhancement](Nereids)ban groupPlan() pattern to avoid misuse (#12250 ) The `groupPlan()` pattern means finding a `GroupPlan` in the memo. Since there is no `GroupPlan` in the memo, it always returns nothing. To write a pattern that matches any GROUP, we should use `group()`. But the `groupPlan` pattern is very confusing and easy to misuse, so this PR bans the `groupPlan()` pattern to avoid misuse.	2022-09-01 14:37:48 +08:00
Gabriel	3bcab8bbef	[feature](function) support now/current_timestamp functions with precision (#12219 ) * [feature](function) support now/current_timestamp functions with precision	2022-09-01 14:35:12 +08:00
pengxiangyu	c5481dfdf7	[fix](remote)Fix bug for Segment::open() in case: config::file_cache_type (#12249 ) * fix bug for Segment::open() in case: config::file_cache_type * fix bug for Segment::open() in case: config::file_cache_type	2022-09-01 14:16:41 +08:00
catpineapple	df51c78593	[fix](dbt)fix dbt run abnormal #12242	2022-09-01 12:10:48 +08:00
TengJianPing	f294d33332	[bugfix](index) index page should not be bitshuffle decoded (#12231 ) * [bugfix](index) index page should not be bitshuffle decoded * minor change	2022-09-01 11:56:44 +08:00
camby	fc05d54f0d	[fix](array-type) array_sort function with empty input #12175 Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-09-01 10:54:09 +08:00
HappenLee	8c8078ad28	[fix](projections) wrong row_descriptor when ExecNode has projections (#12232 ) When an ExecNode's projections are not empty, it uses the output row descriptor to initialize the block before doing the projection, but it should use the original row descriptor. This PR fixes it.	2022-09-01 10:48:10 +08:00
yixiutt	60a2fa7dea	[Improvement](compaction) copy row in batch in VCollectIterator&VGenericIterator (#12214 ) In VCollectIterator&VGenericIterator, use insert_range_from to copy runs of continuous rows in a block, saving CPU cost. If rows in rowsets and segments are non-overlapping, this improves compaction throughput by 30%. If rows are completely overlapping, e.g. when loading two identical files, the throughput is nearly the same as before. Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-09-01 10:20:17 +08:00
Gabriel	90fb3b7783	[Improvement](load) accelerate tablet sink (#12174 )	2022-09-01 10:08:09 +08:00
Jibing-Li	ec4863b63a	[feature-wip](new-scan)Add new file scan node (#12048 ) Related pr: #11582 This is the new file scan node and scanner for external hms catalog.	2022-09-01 10:01:20 +08:00

6141 Commits