Commit Graph

62 Commits

Author SHA1 Message Date
d245ab76cc [improvement]Use uint32 instead of size_t to reduce agg key's length (#10832) 2022-07-14 14:11:55 +08:00
3b46242483 [feature-wip] Optimize Decimal type (#10794)
* [feature-wip](decimalv3) support decimalv3

* [feature-wip] Optimize Decimal type

Co-authored-by: liaoxin <liaoxinbit@126.com>
2022-07-14 10:50:50 +08:00
277a7dd97e [bugfix]ColumnDecimal missed some interfaces about pre-serialization (#10751) 2022-07-11 14:00:58 +08:00
e293fbd277 [improvement]pre-serialize aggregation keys (#10700) 2022-07-09 06:21:56 +08:00
Pxl
0b251481d5 [Enhancement][Storage] refactor Comparison Predicates (#10380) 2022-07-04 09:22:27 +08:00
Pxl
a9d23ce337 [refactor] remove collator (#10518) 2022-07-01 10:35:32 +08:00
18ad8ebfbb [improvement]Add reading by rowids to speed up lazy materialization (#10506) 2022-06-30 21:03:41 +08:00
ca94867b4e [Feature-wip] add date v2 type (#9916) 2022-06-26 16:07:56 +08:00
eebfbd0c91 Revert "[fix](vectorized) Support outer join for vectorized exec engine (#10323)" (#10424)
This reverts commit 2cc670dba697a330358ae7d485d856e4b457c679.
2022-06-25 22:18:08 +08:00
2cc670dba6 [fix](vectorized) Support outer join for vectorized exec engine (#10323)
In a vectorized scenario, the query plan will generate a new tuple for the join node.
This tuple mainly describes the output schema of the join node.
Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema.
For example:
1. The case where the null side column caused by outer join is converted to nullable.
2. The projection of the outer tuple.
2022-06-24 08:59:30 +08:00
1541dcd919 fix some typo in comments (#10374) 2022-06-24 07:20:08 +08:00
d73f170eeb [optimize](storage)optimize date in storage layer (#8967)
* opt date in storage

* code style

Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-06-23 12:29:10 +08:00
0e404edf54 [improvement] Change array offset type from UInt32 to UInt64 (#10070)
Now column `Array<T>` contains column `offsets` and `data`, and type of column `offsets` is UInt32 now.
If we call array_union to merge arrays repeatedly, the size of array may overflow.
So we need to extend it before `Array Data Type` release.
2022-06-19 10:24:08 +08:00
5e47b03595 [feature-wip](array-type) Add array aggregation functions (#10108) 2022-06-17 11:07:49 +08:00
Pxl
ae9c231925 [Enhancement][Storage] refactor InListPredicate/NotInListPredicate (#10139)
* refactor in_list_pred

* update
2022-06-16 18:09:29 +08:00
Pxl
5805f8077f [Feature] [Vectorized] Some pre-refactorings or interface additions for schema change part2 (#10003) 2022-06-16 10:50:08 +08:00
39a2785ce2 [enhancement] support simd instructions on arm cpus through sse2neon (#10068)
* [enhancement] support simd instructions on arm cpus through sse2neon
2022-06-14 09:17:09 +08:00
9c52b4a508 [enhance] improve dict in-predicate evaluate (#10009) 2022-06-09 00:25:30 +08:00
f3193c5ea3 [improvement]opt column_dictinary range filter (#9881)
* opt column_dictinary range filter

* fomart
2022-05-31 22:30:05 +08:00
af2cfa2db4 [fix] Fix bug of bloom filter hash value calculation error (#9802)
* Fix bug of bloom filter hash value calculation error

* fix code style
2022-05-27 20:44:26 +08:00
Pxl
13c1d20426 [Bug] [Vectorized] add padding when load char type data (#9734) 2022-05-26 16:51:01 +08:00
2725127421 [fix] group by with two NULL rows after left join (#9688)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-05-25 16:43:55 +08:00
bdaf0b3fcc [fix](storage) low_cardinality_optimize core dump when is null predicate (#9586)
Issue Number: close #9555
Make the last value of the dictionary null, when ColumnDict inserts a null value,
add the encoding corresponding to the last value of the dictionary·
2022-05-18 14:57:13 +08:00
4312ef93d7 [Improvement] reduce string size in serialization (#9550) 2022-05-17 22:38:34 +08:00
650e3a6ba0 [feature-wip](array-type) array_contains support more nested data types (#9170)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-05-13 12:42:40 +08:00
a0b95d8fcb [fix](storage) fix core for string predicate in storage layer (#9500)
Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-05-12 15:41:39 +08:00
718a51a388 [refactor][style] Use clang-format to sort includes (#9483) 2022-05-10 21:25:35 +08:00
c9961c9bb9 [style] clang-format all c++ code (#9305)
- sh build-support/clang-format.sh  to  clang-format all c++ code
2022-04-29 16:14:22 +08:00
48222f1fb0 [fix](storage)bloom filter support ColumnDict (#9167)
bloom filter support ColumnDict(#9167)
2022-04-28 20:03:26 +08:00
a2edc6fd8b [feature-wip](array-type) replicate impl for ColumnArray to support join with array column (#9070)
SQL with JOIN and columns ARRAY, will call function ColumnArray::replicate. At this pr,
we implement replicate for ARRAY type, to support SQL like this:
`SELECT count(lo_array),count(d_array),SUM(lo_extendedprice*lo_discount) AS REVENUE FROM  lineorder, date WHERE  lo_orderdate = d_datekey AND d_year = 1993 AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;`
2022-04-20 14:50:34 +08:00
Pxl
681f960257 [fix](storage)(vectorized) query get wrong result when read datetime type column (#8872) 2022-04-18 19:34:06 +08:00
52d18aa83c permute impl for column array; and codes format (#8949)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-04-13 09:47:54 +08:00
6ed59bb98b [refactor](code_style) remove useless inline #8933
1.Member functions defined in a class are inline by default (implicitly), and do not need to be added
2.inline is a keyword used for implementation, which has no effect when placed before the function declaration
2022-04-10 18:29:55 +08:00
ca4055244e [fix](storage) Fix core bug of convert to predicate column (#8833)
recurrent:
When `enable_low_cardinality_optimize = true`, for the TPCH dataset, using the following SQL query will Core
```sql
select count(*) from lineitem where l_comment = 'ously even exc';
```

This SQL will trigger the execution of `ColumnDictionary::convert_to_predicate_column_if_dictionary`, and `res->reserve(_codes.size())` is problematic because the current `_codes.size()` is smaller than its reserve value, so inserting a value into `PredicateColumn` will Core.
2022-04-07 11:29:26 +08:00
586bec79f5 [fix](storage) Fix query result error due to find code by bound (#8787)
Problem recurrence
SSB single table `lineorder_flat`, the query SQL is as follows:
```sql
SELECT
    sum(LO_REVENUE),
    (LO_ORDERDATE DIV 10000) AS year,
    P_BRAND
FROM lineorder_flat
WHERE P_BRAND >= 'MFGR#22211111' AND P_BRAND <= 'MFGR#22281111' AND S_REGION = 'ASIA' and (LO_ORDERDATE DIV 10000) = 1992
GROUP BY
    year,
    P_BRAND
ORDER BY
    year,
    P_BRAND;
```

when `enable_low_cardinality_optimize=false`, query result:
```sql
+-------------------+------+-----------+
| sum(`LO_REVENUE`) | year | P_BRAND   |
+-------------------+------+-----------+
|       65423264312 | 1992 | MFGR#2222 |
|       66936772687 | 1992 | MFGR#2223 |
|       64047191934 | 1992 | MFGR#2224 |
|       65744559138 | 1992 | MFGR#2225 |
|       66993045668 | 1992 | MFGR#2226 |
|       67411226147 | 1992 | MFGR#2227 |
|       69390885970 | 1992 | MFGR#2228 |
+-------------------+------+-----------+
```

when `enable_low_cardinality_optimize=true`, query result:
```sql
+-------------------+------+-----------+
| sum(`LO_REVENUE`) | year | P_BRAND   |
+-------------------+------+-----------+
|       66936772687 | 1992 | MFGR#2223 |
|       64047191934 | 1992 | MFGR#2224 |
|       65744559138 | 1992 | MFGR#2225 |
|       66993045668 | 1992 | MFGR#2226 |
|       67411226147 | 1992 | MFGR#2227 |
|       69390885970 | 1992 | MFGR#2228 |
+-------------------+------+-----------+
```

One line less than the correct result.

The reason is that 'MFGR#22211111' is not in the dictionary, so get the boundary code (`find_code_by_bound` method), but there is a bug here.
2022-04-03 10:38:14 +08:00
71ac86b183 [improvement](join) Support join project in query engine (#8722) 2022-03-31 23:00:07 +08:00
3724f94728 [refactor][optimize](storage) Code optimization and refactoring for low-cardinality columns in storage layer (#8627)
* Optimize predicate calculation and refactor
2022-03-29 19:11:54 +08:00
Pxl
7fc22c2456 [fix][vectorized] fix core on get_predicate_column_ptr && fix double copy on _read_columns_by_rowids (#8581) 2022-03-24 09:12:42 +08:00
2580da4f72 [feature-wip](array-type) Support insertion for vectorized engine. (#8494) (#8590)
Please refer to #8493
2022-03-22 15:48:13 +08:00
a498463ab5 [feature-wip](array-type)support select ARRAY data type on vectorized engine (#8217) (#8584)
Usage Example:
1. create table for test;
```
`CREATE TABLE `array_test` (
  `k1` tinyint(4) NOT NULL COMMENT "",
  `k2` smallint(6) NULL COMMENT "",
  `k3` ARRAY<int(11)> NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 5
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2"
);`
```

2. insert some data
```
`insert into array_test values(1, 2, [1, 2]);`
`insert into array_test values(2, 3, null);`
`insert into array_test values(3, null, null);`
`insert into array_test values(4, null, []);`
```

3. open vectorized
`set enable_vectorized_engine=true;`

4. query array data
`select * from array_test;`
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    4 | NULL | []     |
|    2 |    3 | NULL   |
|    1 |    2 | [1, 2] |
|    3 | NULL | NULL   |
+------+------+--------+
4 rows in set (0.061 sec)

Code Changes include:
1. add column_array, data_type_array codes;
2. codes about data_type creation by Field, TabletColumn, TypeDescriptor, PColumnMeta move to DataTypeFactory;
3. support create data_type for ARRAY date type;
4. RowBlockV2::convert_to_vec_block support ARRAY date type;
5. VMysqlResultWriter::append_block support ARRAY date type;
6. vectorized::Block serialize and deserialize support ARRAY date type;
2022-03-22 15:21:44 +08:00
7c1c2b1d17 [chore] fix compile error when use clang as compiler and a be ut problem (#8554) 2022-03-21 15:38:59 +08:00
2ec0b81030 [improvement](storage) Low cardinality string optimization in storage layer (#8318)
Low cardinality string optimization in storage layer
2022-03-20 23:04:25 +08:00
b8e6c3a00c [fix] fix bitmap wrong result (#8478)
Fix a bug when query bitmap return wrong result, even the simplest query. 
Such as
```
CREATE TABLE `pv_bitmap_fix2` (
`dt` int(11) NULL COMMENT "",
`page` varchar(10) NULL COMMENT "",
`user_id_bitmap` bitmap BITMAP_UNION NULL COMMENT ""
) ENGINE=OLAP 
AGGREGATE KEY(`dt`, `page`) 
COMMENT "OLAP" DISTRIBUTED BY HASH(`dt`) BUCKETS 2
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2"
)

Insert any hundreds of rows of data

select count(distinct user_id_bitmap) from pv_bitmap_fix2

the result is wrong
```
This is a bug of vectorization of storage layer.
2022-03-16 11:39:41 +08:00
2c63fc1d6c [improvement](vectorized) Support BetweenPredicate enable fold const expr (#8450) 2022-03-13 09:36:24 +08:00
68dd799796 [improvement](vectorized) Support function tuple is null (#8442) 2022-03-11 16:54:37 +08:00
d711d64dda [fix](vectorization)Some small fix for SegmentIter Vectorization (#8267)
1. No longer using short-circuit to evaluate date type, because the cost of read date type is small,
    lazy materialization has higher costs.
2. Fix read hll/bitmap/date type error results.
2022-03-08 13:13:17 +08:00
b1e7343532 [Vectorized] [HashJoin] Opt HashJoin Performance (#8119)
Co-authored-by: lihaopeng <happenlee@hotmail.com>
2022-02-23 10:28:16 +08:00
802fcbbb05 (#8162)refactor binary dict
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
2022-02-22 11:23:54 +08:00
50864aca7d [refactor] fix warings when compile with clang (#8069) 2022-02-19 11:29:02 +08:00
68b24d608f [fix] (vectorization)Fix nullable column compute the hash value error (#8105)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-18 11:20:47 +08:00