doris

Author	SHA1	Message	Date
Jerry Hu	b7c9007776	[improvement][agg]Process aggregated results in the vectorized way (#11084 )	2022-07-22 22:04:43 +08:00
Mingyu Chen	7e3fc0d321	[enhancement](vec) Support outer join for vectorized exec engine (#11068 ) Hash join node adds three new attributes. The following SQL is used as an example to illustrate the meaning of these three attributes: ``` select t1.a from t1 left join t2 on t1.a=t2.b; ``` 1. vOutputTupleDesc: Tuple2(a'') 2. vIntermediateTupleDescList: Tuple1(a', b'<nullable>) 3. vSrcToOutputSMap: <Tuple1(a'), Tuple2(a'')> Each slot in the intermediate tuple corresponds one-to-one to a slot in the output tuple through the expr calculation of the left child in vSrcToOutputSMap. This code mainly merges the contents of two PRs: 1. [fix](vectorized) Support outer join for vectorized exec engine (https://github.com/apache/doris/pull/10323) 2. [Fix](Join) Fix the bug of outer join function under vectorization #9954 The following is the specific description of the first PR: In a vectorized scenario, the query plan will generate a new tuple for the join node. This tuple mainly describes the output schema of the join node. Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema. For example: 1. The case where the null-side column produced by an outer join is converted to nullable. 2. The projection of the outer tuple. The following is the specific description of the second PR: This PR mainly fixes the following problems: 1. Queries that combine an inline view with an outer join. After adding a tuple to the join operator, the position of the `tupleisnull` function is inconsistent with the row storage. Currently the vectorized `tupleisnull` is calculated in the HashJoinNode.computeOutputTuple() function. 2. The column nullable property error. At present, once an outer join occurs, the columns on the null side are planned as nullable in the semantic parsing stage. For example: ``` select * from (select a as k1 from test) tmp right join b on tmp.k1=b.k1 ``` Here, the nullable property of column k1 in the `tmp` inline view should be true. In the vectorized code, the virtual `tableRef` of tmp is used in constructing the output tuple of HashJoinNode (specifically, in the function HashJoinNode.computeOutputTuple()), so the correctness of the column nullable property of this tableRef is very important. In the above case, since the tmp table performs a right join with the b table, tmp is the null side, and the column attributes involved in the tmp table must be changed to nullable. In non-vectorized code, since the virtual tableRef tmp is not used at all, the `TupleIsNull` function in `outputsmp` is used to ensure data correctness. That is to say, the a column of the original table test is still non-null, and this does not affect the correctness of the result. The vectorized nullable attribute requirements are very strict. An outer join changes the nullable attribute of the join column, thereby changing the nullable attribute of the column in the upper operators layer by layer. Since FE has no mechanism to modify the nullable attribute in the upper operator tuples layer by layer after the analyzer, at present we can only preset the attributes before the lower join as nullable in the analyzer stage in advance to avoid the problem. (At the same time, BE also contains some workaround code to deal with the null to non-null problem.) Co-authored-by: EmmyMiao87 Co-authored-by: HappenLee Co-authored-by: morrySnow Co-authored-by: EmmyMiao87 <522274284@qq.com>	2022-07-21 23:39:25 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
Jet He	f6cb7a838b	[Optimize] Improve performance like/not like filter through pushdown function to storage engine (#10355 ) * support like/not like conjuncts push down to storage engine * vectorized engine support like/not like conjuncts push down to storage engine * support both evaluate and evaluate_vec method in like predicate * reuse remove_pushed_conjuncts and prevent logic error during move function conjuncts * change #ifndef to pragma once as per comments * change enable_function_pushdown default to false Co-authored-by: heguangnan <heguangnan@bytedance.com>	2022-07-19 08:33:04 +08:00
Pxl	afc1d0c05c	[Chore][Compile] fix compile fail on clang (#10837 ) fix compile fail on clang because of output int128	2022-07-18 19:21:01 +08:00
plat1ko	3bc6655069	[refactor] remove BlockManager (#10913 ) * remove BlockManager * remove deprecated field in tablet meta	2022-07-17 14:10:06 +08:00
Jerry Hu	d245ab76cc	[improvement]Use uint32 instead of size_t to reduce agg key's length (#10832 )	2022-07-14 14:11:55 +08:00
Gabriel	3b46242483	[feature-wip] Optimize Decimal type (#10794 ) * [feature-wip](decimalv3) support decimalv3 * [feature-wip] Optimize Decimal type Co-authored-by: liaoxin <liaoxinbit@126.com>	2022-07-14 10:50:50 +08:00
Jerry Hu	277a7dd97e	[bugfix]ColumnDecimal missed some interfaces about pre-serialization (#10751 )	2022-07-11 14:00:58 +08:00
Jerry Hu	e293fbd277	[improvement]pre-serialize aggregation keys (#10700 )	2022-07-09 06:21:56 +08:00
Pxl	0b251481d5	[Enhancement][Storage] refactor Comparison Predicates (#10380 )	2022-07-04 09:22:27 +08:00
Pxl	a9d23ce337	[refactor] remove collator (#10518 )	2022-07-01 10:35:32 +08:00
Jerry Hu	18ad8ebfbb	[improvement]Add reading by rowids to speed up lazy materialization (#10506 )	2022-06-30 21:03:41 +08:00
Gabriel	ca94867b4e	[Feature-wip] add date v2 type (#9916 )	2022-06-26 16:07:56 +08:00
Gabriel	eebfbd0c91	Revert "[fix](vectorized) Support outer join for vectorized exec engine (#10323 )" (#10424 ) This reverts commit 2cc670dba697a330358ae7d485d856e4b457c679.	2022-06-25 22:18:08 +08:00
HappenLee	2cc670dba6	[fix](vectorized) Support outer join for vectorized exec engine (#10323 ) In a vectorized scenario, the query plan will generate a new tuple for the join node. This tuple mainly describes the output schema of the join node. Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema. For example: 1. The case where the null side column caused by outer join is converted to nullable. 2. The projection of the outer tuple.	2022-06-24 08:59:30 +08:00
carlvinhust2012	1541dcd919	fix some typo in comments (#10374 )	2022-06-24 07:20:08 +08:00
wangbo	d73f170eeb	[optimize](storage)optimize date in storage layer (#8967 ) * opt date in storage * code style Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-06-23 12:29:10 +08:00
camby	0e404edf54	[improvement] Change array offset type from UInt32 to UInt64 (#10070 ) Now column `Array<T>` contains column `offsets` and `data`, and type of column `offsets` is UInt32 now. If we call array_union to merge arrays repeatedly, the size of array may overflow. So we need to extend it before `Array Data Type` release.	2022-06-19 10:24:08 +08:00
Adonis Ling	5e47b03595	[feature-wip](array-type) Add array aggregation functions (#10108 )	2022-06-17 11:07:49 +08:00
Pxl	ae9c231925	[Enhancement][Storage] refactor InListPredicate/NotInListPredicate (#10139 ) * refactor in_list_pred * update	2022-06-16 18:09:29 +08:00
Pxl	5805f8077f	[Feature] [Vectorized] Some pre-refactorings or interface additions for schema change part2 (#10003 )	2022-06-16 10:50:08 +08:00
Zhengguo Yang	39a2785ce2	[enhancement] support simd instructions on arm cpus through sse2neon (#10068 ) * [enhancement] support simd instructions on arm cpus through sse2neon	2022-06-14 09:17:09 +08:00
minghong	9c52b4a508	[enhance] improve dict in-predicate evaluate (#10009 )	2022-06-09 00:25:30 +08:00
minghong	f3193c5ea3	[improvement]opt column_dictinary range filter (#9881 ) * opt column_dictinary range filter * fomart	2022-05-31 22:30:05 +08:00
Luwei	af2cfa2db4	[fix] Fix bug of bloom filter hash value calculation error (#9802 ) * Fix bug of bloom filter hash value calculation error * fix code style	2022-05-27 20:44:26 +08:00
Pxl	13c1d20426	[Bug] [Vectorized] add padding when load char type data (#9734 )	2022-05-26 16:51:01 +08:00
camby	2725127421	[fix] group by with two NULL rows after left join (#9688 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-05-25 16:43:55 +08:00
ZenoYang	bdaf0b3fcc	[fix](storage) low_cardinality_optimize core dump when is null predicate (#9586 ) Issue Number: close #9555 Make the last value of the dictionary null; when ColumnDict inserts a null value, add the encoding corresponding to the last value of the dictionary.	2022-05-18 14:57:13 +08:00
Gabriel	4312ef93d7	[Improvement] reduce string size in serialization (#9550 )	2022-05-17 22:38:34 +08:00
camby	650e3a6ba0	[feature-wip](array-type) array_contains support more nested data types (#9170 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-05-13 12:42:40 +08:00
wangbo	a0b95d8fcb	[fix](storage) fix core for string predicate in storage layer (#9500 ) Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-05-12 15:41:39 +08:00
Adonis Ling	718a51a388	[refactor][style] Use clang-format to sort includes (#9483 )	2022-05-10 21:25:35 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
wangbo	48222f1fb0	[fix](storage)bloom filter support ColumnDict (#9167 ) bloom filter support ColumnDict(#9167)	2022-04-28 20:03:26 +08:00
camby	a2edc6fd8b	[feature-wip](array-type) replicate impl for ColumnArray to support join with array column (#9070 ) SQL with JOIN and ARRAY columns will call the function ColumnArray::replicate. In this PR, we implement replicate for the ARRAY type, to support SQL like this: `SELECT count(lo_array),count(d_array),SUM(lo_extendedprice*lo_discount) AS REVENUE FROM lineorder, date WHERE lo_orderdate = d_datekey AND d_year = 1993 AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;`	2022-04-20 14:50:34 +08:00
Pxl	681f960257	[fix](storage)(vectorized) query get wrong result when read datetime type column (#8872 )	2022-04-18 19:34:06 +08:00
camby	52d18aa83c	permute impl for column array; and codes format (#8949 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-04-13 09:47:54 +08:00
zbtzbtzbt	6ed59bb98b	[refactor](code_style) remove useless inline #8933 1. Member functions defined in a class are inline by default (implicitly), so the keyword does not need to be added. 2. inline is a keyword used for implementation, which has no effect when placed before the function declaration.	2022-04-10 18:29:55 +08:00
ZenoYang	ca4055244e	[fix](storage) Fix core bug of convert to predicate column (#8833 ) Reproduction: when `enable_low_cardinality_optimize = true`, the following SQL query on the TPCH dataset causes a core dump: ```sql select count(*) from lineitem where l_comment = 'ously even exc'; ``` This SQL triggers the execution of `ColumnDictionary::convert_to_predicate_column_if_dictionary`, and `res->reserve(_codes.size())` is problematic because the current `_codes.size()` is smaller than its reserve value, so inserting a value into `PredicateColumn` causes a core dump.	2022-04-07 11:29:26 +08:00
ZenoYang	586bec79f5	[fix](storage) Fix query result error due to find code by bound (#8787 ) Reproduction: on the SSB single table `lineorder_flat`, the query SQL is as follows: ```sql SELECT sum(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS year, P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#22211111' AND P_BRAND <= 'MFGR#22281111' AND S_REGION = 'ASIA' and (LO_ORDERDATE DIV 10000) = 1992 GROUP BY year, P_BRAND ORDER BY year, P_BRAND; ``` When `enable_low_cardinality_optimize=false`, the query result is: ```sql +-------------------+------+-----------+ | sum(`LO_REVENUE`) | year | P_BRAND | +-------------------+------+-----------+ | 65423264312 | 1992 | MFGR#2222 | | 66936772687 | 1992 | MFGR#2223 | | 64047191934 | 1992 | MFGR#2224 | | 65744559138 | 1992 | MFGR#2225 | | 66993045668 | 1992 | MFGR#2226 | | 67411226147 | 1992 | MFGR#2227 | | 69390885970 | 1992 | MFGR#2228 | +-------------------+------+-----------+ ``` When `enable_low_cardinality_optimize=true`, the query result is: ```sql +-------------------+------+-----------+ | sum(`LO_REVENUE`) | year | P_BRAND | +-------------------+------+-----------+ | 66936772687 | 1992 | MFGR#2223 | | 64047191934 | 1992 | MFGR#2224 | | 65744559138 | 1992 | MFGR#2225 | | 66993045668 | 1992 | MFGR#2226 | | 67411226147 | 1992 | MFGR#2227 | | 69390885970 | 1992 | MFGR#2228 | +-------------------+------+-----------+ ``` This is one row fewer than the correct result. The reason is that 'MFGR#22211111' is not in the dictionary, so the boundary code is looked up (the `find_code_by_bound` method), but that method has a bug.	2022-04-03 10:38:14 +08:00
HappenLee	71ac86b183	[improvement](join) Support join project in query engine (#8722 )	2022-03-31 23:00:07 +08:00
ZenoYang	3724f94728	[refactor][optimize](storage) Code optimization and refactoring for low-cardinality columns in storage layer (#8627 ) * Optimize predicate calculation and refactor	2022-03-29 19:11:54 +08:00
Pxl	7fc22c2456	[fix][vectorized] fix core on get_predicate_column_ptr && fix double copy on _read_columns_by_rowids (#8581 )	2022-03-24 09:12:42 +08:00
Adonis Ling	2580da4f72	[feature-wip](array-type) Support insertion for vectorized engine. (#8494 ) (#8590 ) Please refer to #8493	2022-03-22 15:48:13 +08:00
camby	a498463ab5	[feature-wip](array-type)support select ARRAY data type on vectorized engine (#8217 ) (#8584 ) Usage Example: 1. create table for test; ``` CREATE TABLE `array_test` ( `k1` tinyint(4) NOT NULL COMMENT "", `k2` smallint(6) NULL COMMENT "", `k3` ARRAY<int(11)> NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`k1`) COMMENT "OLAP" DISTRIBUTED BY HASH(`k1`) BUCKETS 5 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2" ); ``` 2. insert some data ``` insert into array_test values(1, 2, [1, 2]); insert into array_test values(2, 3, null); insert into array_test values(3, null, null); insert into array_test values(4, null, []); ``` 3. open vectorized `set enable_vectorized_engine=true;` 4. query array data `select * from array_test;` +------+------+--------+ | k1 | k2 | k3 | +------+------+--------+ | 4 | NULL | [] | | 2 | 3 | NULL | | 1 | 2 | [1, 2] | | 3 | NULL | NULL | +------+------+--------+ 4 rows in set (0.061 sec) Code changes include: 1. add column_array, data_type_array codes; 2. codes about data_type creation by Field, TabletColumn, TypeDescriptor, PColumnMeta move to DataTypeFactory; 3. support create data_type for ARRAY data type; 4. RowBlockV2::convert_to_vec_block support ARRAY data type; 5. VMysqlResultWriter::append_block support ARRAY data type; 6. vectorized::Block serialize and deserialize support ARRAY data type;	2022-03-22 15:21:44 +08:00
Zhengguo Yang	7c1c2b1d17	[chore] fix compile error when use clang as compiler and a be ut problem (#8554 )	2022-03-21 15:38:59 +08:00
ZenoYang	2ec0b81030	[improvement](storage) Low cardinality string optimization in storage layer (#8318 ) Low cardinality string optimization in storage layer	2022-03-20 23:04:25 +08:00
wangbo	b8e6c3a00c	[fix] fix bitmap wrong result (#8478 ) Fix a bug where querying a bitmap column returns a wrong result, even for the simplest query. For example: ``` CREATE TABLE `pv_bitmap_fix2` ( `dt` int(11) NULL COMMENT "", `page` varchar(10) NULL COMMENT "", `user_id_bitmap` bitmap BITMAP_UNION NULL COMMENT "" ) ENGINE=OLAP AGGREGATE KEY(`dt`, `page`) COMMENT "OLAP" DISTRIBUTED BY HASH(`dt`) BUCKETS 2 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2" ) Insert a few hundred rows of data; select count(distinct user_id_bitmap) from pv_bitmap_fix2; the result is wrong ``` This is a bug in the vectorization of the storage layer.	2022-03-16 11:39:41 +08:00
HappenLee	2c63fc1d6c	[improvement](vectorized) Support BetweenPredicate enable fold const expr (#8450 )	2022-03-13 09:36:24 +08:00
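
The boundary lookup described in commit 586bec79f5 can be illustrated with a small standalone sketch. This is not the Doris `find_code_by_bound` implementation; the helper name `code_range_for_closed_interval` and the dictionary contents used below are assumptions made for illustration only. It shows one way to map a closed string range onto dictionary codes of a sorted dictionary when neither bound is itself a dictionary entry.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not Doris code): given a sorted dictionary, return the
// half-open code range [first, last) whose values v satisfy lo <= v <= hi.
// Neither bound has to be present in the dictionary.
std::pair<std::size_t, std::size_t> code_range_for_closed_interval(
        const std::vector<std::string>& sorted_dict, const std::string& lo,
        const std::string& hi) {
    assert(std::is_sorted(sorted_dict.begin(), sorted_dict.end()));
    // First code whose value is >= lo.
    auto first = std::lower_bound(sorted_dict.begin(), sorted_dict.end(), lo);
    // One past the last code whose value is <= hi.
    auto last = std::upper_bound(sorted_dict.begin(), sorted_dict.end(), hi);
    // If lo > hi the range is empty; clamp so that first == last.
    if (last < first) last = first;
    return {static_cast<std::size_t>(first - sorted_dict.begin()),
            static_cast<std::size_t>(last - sorted_dict.begin())};
}
```

With an assumed dictionary of `MFGR#2221` through `MFGR#2229`, the bounds `'MFGR#22211111'` and `'MFGR#22281111'` select the seven codes for `MFGR#2222` through `MFGR#2228`, which matches the seven rows of the correct result in that commit message.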

68 Commits