doris

Author	SHA1	Message	Date
Jerry Hu	d245ab76cc	[improvement]Use uint32 instead of size_t to reduce agg key's length (#10832)	2022-07-14 14:11:55 +08:00
Gabriel	3b46242483	[feature-wip] Optimize Decimal type (#10794) * [feature-wip](decimalv3) support decimalv3 * [feature-wip] Optimize Decimal type Co-authored-by: liaoxin <liaoxinbit@126.com>	2022-07-14 10:50:50 +08:00
Jerry Hu	277a7dd97e	[bugfix]ColumnDecimal was missing some pre-serialization interfaces (#10751)	2022-07-11 14:00:58 +08:00
Jerry Hu	e293fbd277	[improvement]pre-serialize aggregation keys (#10700)	2022-07-09 06:21:56 +08:00
Pxl	0b251481d5	[Enhancement][Storage] refactor Comparison Predicates (#10380)	2022-07-04 09:22:27 +08:00
Pxl	a9d23ce337	[refactor] remove collator (#10518)	2022-07-01 10:35:32 +08:00
Jerry Hu	18ad8ebfbb	[improvement]Add reading by rowids to speed up lazy materialization (#10506)	2022-06-30 21:03:41 +08:00
Gabriel	ca94867b4e	[Feature-wip] add date v2 type (#9916)	2022-06-26 16:07:56 +08:00
Gabriel	eebfbd0c91	Revert "[fix](vectorized) Support outer join for vectorized exec engine (#10323)" (#10424) This reverts commit 2cc670dba697a330358ae7d485d856e4b457c679.	2022-06-25 22:18:08 +08:00
HappenLee	2cc670dba6	[fix](vectorized) Support outer join for vectorized exec engine (#10323) In the vectorized engine, the query plan generates a new tuple for the join node that describes the join node's output schema. Adding this tuple solves the problem of the join node's input schema differing from its output schema, for example: 1. columns on the null side of an outer join are converted to nullable; 2. the projection of the outer tuple.	2022-06-24 08:59:30 +08:00
carlvinhust2012	1541dcd919	fix some typos in comments (#10374)	2022-06-24 07:20:08 +08:00
wangbo	d73f170eeb	[optimize](storage)optimize date in storage layer (#8967) * opt date in storage * code style Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-06-23 12:29:10 +08:00
camby	0e404edf54	[improvement] Change array offset type from UInt32 to UInt64 (#10070) Column `Array<T>` contains the columns `offsets` and `data`, and `offsets` is currently UInt32. Calling array_union repeatedly to merge arrays can make the array size overflow UInt32, so the offset type must be widened before the `Array Data Type` is released.	2022-06-19 10:24:08 +08:00
Adonis Ling	5e47b03595	[feature-wip](array-type) Add array aggregation functions (#10108)	2022-06-17 11:07:49 +08:00
Pxl	ae9c231925	[Enhancement][Storage] refactor InListPredicate/NotInListPredicate (#10139) * refactor in_list_pred * update	2022-06-16 18:09:29 +08:00
Pxl	5805f8077f	[Feature] [Vectorized] Some pre-refactorings or interface additions for schema change part2 (#10003)	2022-06-16 10:50:08 +08:00
Zhengguo Yang	39a2785ce2	[enhancement] support simd instructions on arm cpus through sse2neon (#10068) * [enhancement] support simd instructions on arm cpus through sse2neon	2022-06-14 09:17:09 +08:00
minghong	9c52b4a508	[enhance] improve dict in-predicate evaluation (#10009)	2022-06-09 00:25:30 +08:00
minghong	f3193c5ea3	[improvement]opt column_dictionary range filter (#9881) * opt column_dictionary range filter * format	2022-05-31 22:30:05 +08:00
Luwei	af2cfa2db4	[fix] Fix bug of bloom filter hash value calculation error (#9802) * Fix bug of bloom filter hash value calculation error * fix code style	2022-05-27 20:44:26 +08:00
Pxl	13c1d20426	[Bug] [Vectorized] add padding when loading char type data (#9734)	2022-05-26 16:51:01 +08:00
camby	2725127421	[fix] group by with two NULL rows after left join (#9688) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-05-25 16:43:55 +08:00
ZenoYang	bdaf0b3fcc	[fix](storage) low_cardinality_optimize core dump with IS NULL predicate (#9586) Issue Number: close #9555. Make the last value of the dictionary null; when ColumnDict inserts a null value, it adds the code corresponding to that last dictionary value.	2022-05-18 14:57:13 +08:00
Gabriel	4312ef93d7	[Improvement] reduce string size in serialization (#9550)	2022-05-17 22:38:34 +08:00
camby	650e3a6ba0	[feature-wip](array-type) array_contains support more nested data types (#9170) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-05-13 12:42:40 +08:00
wangbo	a0b95d8fcb	[fix](storage) fix core for string predicate in storage layer (#9500) Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-05-12 15:41:39 +08:00
Adonis Ling	718a51a388	[refactor][style] Use clang-format to sort includes (#9483)	2022-05-10 21:25:35 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
wangbo	48222f1fb0	[fix](storage)bloom filter support ColumnDict (#9167) bloom filter support ColumnDict (#9167)	2022-04-28 20:03:26 +08:00
camby	a2edc6fd8b	[feature-wip](array-type) replicate impl for ColumnArray to support join with array column (#9070) SQL that joins tables with ARRAY columns calls ColumnArray::replicate. This PR implements replicate for the ARRAY type to support SQL like this: `SELECT count(lo_array),count(d_array),SUM(lo_extendedprice*lo_discount) AS REVENUE FROM lineorder, date WHERE lo_orderdate = d_datekey AND d_year = 1993 AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;`	2022-04-20 14:50:34 +08:00
Pxl	681f960257	[fix](storage)(vectorized) query gets wrong result when reading datetime type column (#8872)	2022-04-18 19:34:06 +08:00
camby	52d18aa83c	permute impl for column array; and code formatting (#8949) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-04-13 09:47:54 +08:00
zbtzbtzbt	6ed59bb98b	[refactor](code_style) remove useless inline (#8933) 1. Member functions defined inside a class are implicitly inline, so the keyword does not need to be added. 2. inline is a keyword that belongs on the implementation; placed before a function declaration it has no effect.	2022-04-10 18:29:55 +08:00
ZenoYang	ca4055244e	[fix](storage) Fix core bug of convert to predicate column (#8833) Reproduction: with `enable_low_cardinality_optimize = true`, the following SQL on the TPCH dataset causes a core dump: ```sql select count(*) from lineitem where l_comment = 'ously even exc'; ``` This SQL triggers `ColumnDictionary::convert_to_predicate_column_if_dictionary`, where `res->reserve(_codes.size())` is problematic: the current `_codes.size()` is smaller than its reserved capacity, so inserting values into `PredicateColumn` causes a core dump.	2022-04-07 11:29:26 +08:00
ZenoYang	586bec79f5	[fix](storage) Fix query result error due to find code by bound (#8787) Reproduction: on the SSB single table `lineorder_flat`, run the following query: ```sql SELECT sum(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS year, P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#22211111' AND P_BRAND <= 'MFGR#22281111' AND S_REGION = 'ASIA' and (LO_ORDERDATE DIV 10000) = 1992 GROUP BY year, P_BRAND ORDER BY year, P_BRAND; ``` With `enable_low_cardinality_optimize=false`, the query result is: ```sql +-------------------+------+-----------+ | sum(`LO_REVENUE`) | year | P_BRAND | +-------------------+------+-----------+ | 65423264312 | 1992 | MFGR#2222 | | 66936772687 | 1992 | MFGR#2223 | | 64047191934 | 1992 | MFGR#2224 | | 65744559138 | 1992 | MFGR#2225 | | 66993045668 | 1992 | MFGR#2226 | | 67411226147 | 1992 | MFGR#2227 | | 69390885970 | 1992 | MFGR#2228 | +-------------------+------+-----------+ ``` With `enable_low_cardinality_optimize=true`, the query result is: ```sql +-------------------+------+-----------+ | sum(`LO_REVENUE`) | year | P_BRAND | +-------------------+------+-----------+ | 66936772687 | 1992 | MFGR#2223 | | 64047191934 | 1992 | MFGR#2224 | | 65744559138 | 1992 | MFGR#2225 | | 66993045668 | 1992 | MFGR#2226 | | 67411226147 | 1992 | MFGR#2227 | | 69390885970 | 1992 | MFGR#2228 | +-------------------+------+-----------+ ``` The second result is missing the MFGR#2222 row. The reason is that 'MFGR#22211111' is not in the dictionary, so its boundary code is looked up with the `find_code_by_bound` method, which has a bug.	2022-04-03 10:38:14 +08:00
HappenLee	71ac86b183	[improvement](join) Support join project in query engine (#8722)	2022-03-31 23:00:07 +08:00
ZenoYang	3724f94728	[refactor][optimize](storage) Code optimization and refactoring for low-cardinality columns in storage layer (#8627) * Optimize predicate calculation and refactor	2022-03-29 19:11:54 +08:00
Pxl	7fc22c2456	[fix][vectorized] fix core on get_predicate_column_ptr && fix double copy on _read_columns_by_rowids (#8581)	2022-03-24 09:12:42 +08:00
Adonis Ling	2580da4f72	[feature-wip](array-type) Support insertion for vectorized engine. (#8494) (#8590) Please refer to #8493	2022-03-22 15:48:13 +08:00
camby	a498463ab5	[feature-wip](array-type)support select ARRAY data type on vectorized engine (#8217) (#8584) Usage Example: 1. create table for test: ``` CREATE TABLE `array_test` ( `k1` tinyint(4) NOT NULL COMMENT "", `k2` smallint(6) NULL COMMENT "", `k3` ARRAY<int(11)> NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`k1`) COMMENT "OLAP" DISTRIBUTED BY HASH(`k1`) BUCKETS 5 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2" ); ``` 2. insert some data: ``` insert into array_test values(1, 2, [1, 2]); insert into array_test values(2, 3, null); insert into array_test values(3, null, null); insert into array_test values(4, null, []); ``` 3. enable the vectorized engine: `set enable_vectorized_engine=true;` 4. query array data: `select * from array_test;` +------+------+--------+ | k1 | k2 | k3 | +------+------+--------+ | 4 | NULL | [] | | 2 | 3 | NULL | | 1 | 2 | [1, 2] | | 3 | NULL | NULL | +------+------+--------+ 4 rows in set (0.061 sec) Code changes include: 1. add column_array and data_type_array code; 2. move the code that creates data_type from Field, TabletColumn, TypeDescriptor, and PColumnMeta into DataTypeFactory; 3. support creating data_type for the ARRAY data type; 4. RowBlockV2::convert_to_vec_block supports the ARRAY data type; 5. VMysqlResultWriter::append_block supports the ARRAY data type; 6. vectorized::Block serialization and deserialization support the ARRAY data type.	2022-03-22 15:21:44 +08:00
Zhengguo Yang	7c1c2b1d17	[chore] fix compile error when using clang as compiler and a BE UT problem (#8554)	2022-03-21 15:38:59 +08:00
ZenoYang	2ec0b81030	[improvement](storage) Low cardinality string optimization in storage layer (#8318) Low cardinality string optimization in storage layer	2022-03-20 23:04:25 +08:00
wangbo	b8e6c3a00c	[fix] fix bitmap wrong result (#8478) Fix a bug where bitmap queries return wrong results, even for the simplest query, such as: ``` CREATE TABLE `pv_bitmap_fix2` ( `dt` int(11) NULL COMMENT "", `page` varchar(10) NULL COMMENT "", `user_id_bitmap` bitmap BITMAP_UNION NULL COMMENT "" ) ENGINE=OLAP AGGREGATE KEY(`dt`, `page`) COMMENT "OLAP" DISTRIBUTED BY HASH(`dt`) BUCKETS 2 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2" ) Insert a few hundred rows of any data, then run: select count(distinct user_id_bitmap) from pv_bitmap_fix2 The result is wrong. ``` This is a bug in the vectorization of the storage layer.	2022-03-16 11:39:41 +08:00
HappenLee	2c63fc1d6c	[improvement](vectorized) Support BetweenPredicate enable fold const expr (#8450)	2022-03-13 09:36:24 +08:00
HappenLee	68dd799796	[improvement](vectorized) Support function tuple is null (#8442)	2022-03-11 16:54:37 +08:00
wangbo	d711d64dda	[fix](vectorization)Some small fixes for SegmentIter Vectorization (#8267) 1. No longer use short-circuit evaluation for the date type, because reading the date type is cheap and lazy materialization costs more. 2. Fix wrong results when reading hll/bitmap/date types.	2022-03-08 13:13:17 +08:00
awakeljw	b1e7343532	[Vectorized] [HashJoin] Opt HashJoin Performance (#8119) Co-authored-by: lihaopeng <happenlee@hotmail.com>	2022-02-23 10:28:16 +08:00
zuochunwei	802fcbbb05	(#8162) refactor binary dict Co-authored-by: zuochunwei <zuochunwei@meituan.com>	2022-02-22 11:23:54 +08:00
Zhengguo Yang	50864aca7d	[refactor] fix warnings when compiling with clang (#8069)	2022-02-19 11:29:02 +08:00
HappenLee	68b24d608f	[fix](vectorization) Fix nullable column hash value computation error (#8105) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-02-18 11:20:47 +08:00
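
Two commits above (#9881 and #8787) concern range filters on dictionary-encoded, low-cardinality string columns. A minimal sketch follows. It assumes the dictionary is kept sorted, so that a value's code is its index and code order matches value order. Under that assumption, a predicate `lower <= col <= upper` maps to a half-open range of codes through `std::lower_bound` and `std::upper_bound`, and this still works when a bound such as 'MFGR#22211111' is not itself a dictionary entry. `CodeRange` and `codes_in_range` are hypothetical names for illustration and are not Doris's `ColumnDictionary` or `find_code_by_bound` API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical illustration only; not Doris's ColumnDictionary API.
// Half-open range [lo, hi) of dictionary codes.
struct CodeRange {
    int32_t lo;
    int32_t hi;
};

// Assumes sorted_dict is sorted ascending, so a value's code is its index.
// Returns the codes whose values satisfy lower <= value <= upper; the bounds
// do not need to appear in the dictionary.
CodeRange codes_in_range(const std::vector<std::string>& sorted_dict,
                         const std::string& lower, const std::string& upper) {
    assert(std::is_sorted(sorted_dict.begin(), sorted_dict.end()));
    // First entry >= lower: a missing lower bound maps to the next larger entry.
    auto first = std::lower_bound(sorted_dict.begin(), sorted_dict.end(), lower);
    // First entry > upper: every entry before it is <= upper.
    auto last = std::upper_bound(sorted_dict.begin(), sorted_dict.end(), upper);
    auto lo = static_cast<int32_t>(first - sorted_dict.begin());
    auto hi = static_cast<int32_t>(last - sorted_dict.begin());
    // If upper < lower, return an empty range instead of an inverted one.
    return {lo, std::max(lo, hi)};
}
```

Consider a dictionary holding 'MFGR#2221' through 'MFGR#2229' and the bounds from #8787. The lower bound resolves to 'MFGR#2222', because 'MFGR#2221' is a prefix of 'MFGR#22211111' and therefore sorts before it. The upper bound stops after 'MFGR#2228'. That yields the seven brands of the correct result. Once the range is known, a scan only compares integer codes against `lo` and `hi` rather than comparing strings.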

62 Commits