Commit Graph

30 Commits

Author SHA1 Message Date
691f3c5ee7 [Performance](Variant) Improve load performance for variant type (#33890)
1. remove phmap for padding rows
2. add SimpleFieldVisitorToScarlarType for short circuit type deducing
3. correct type coercion for conflict types bettween integers
4. improve nullable column performance
5. remove shared_ptr dependancy for DataType use TypeIndex instead
6. Optimization by caching the order of fields (which is almost always the same)
and a quick check to match the next expected field, instead of searching the hash table.

benchmark:
In clickbench data, load performance:
12m36.799s ->7m10.934s about 43% latency reduce

In variant_p2/performance.groovy:
3min44s20 -> 1min15s80 about 66% latency reducy
2024-05-18 17:58:33 +08:00
249a9c9875 [Feature](Variant) support aggregation model for Variant type (#33493)
refactor use `insert_from` to replace `replace_column_data` for variable lengths columns
2024-04-17 23:42:00 +08:00
617cc667fe [Fix](Variant) fix variant serialize root node (#31769) 2024-03-21 14:07:50 +08:00
e8aa5ee7d5 [Improve](Variant) support bloom filter for variant subcolumns (#31347)
* [Improve](Variant) support bloom filter for variant subcolumns

* rebase
2024-03-09 19:45:03 +08:00
0da010603e [Improve](TabletSchemaCache) reduce duplicated memory consumption for column name and column path (#31141)
Both could be reference to related field in TabletColumn.And use shared_ptr for TabletColumn in TabletSchema for later memory reuse
2024-03-09 19:44:42 +08:00
7b79b77cc9 [Optimize](Variant) make tablet schema more well-organized (#99) (#30922) 2024-02-18 11:50:17 +08:00
0442d5dc0e [fix](Variant Type) Add sparse columns meta to fix compaction (#28673)
Co-authored-by: eldenmoon <15605149486@163.com>
2024-02-16 10:12:23 +08:00
e6fbccd3ed [Feature](Variant) support row store for variant type (#30052) 2024-01-31 23:53:39 +08:00
9aaa6ba351 [Fix](Variant) fix variant lost null info after cast_column (#30153)
This could result incorrect output in hirachinal cases

```
 sql """insert into ${table_name} values (-3, '{"a" : 1, "b" : 1.5, "c" : [1, 2, 3]}')"""
    sql """insert into  ${table_name} select -2, '{"a": 11245, "b" : [123, {"xx" : 1}], "c" : {"c" : 456, "d" : "null", "e" : 7.111}}'  as json_str
            union  all select -1, '{"a": 1123}' as json_str union all select *, '{"a" : 1234, "xxxx" : "kaana"}' as json_str from numbers("number" = "4096") limit 4096 ;"""

mysql> select v["c"] from var_rs where k = -3 or k = -2;
+----------------------+
| element_at(`v`, 'c') |
+----------------------+
| [1,2,3]              |
| []                   |
+----------------------+
2 rows in set (0.04 sec)
```
2024-01-27 09:08:29 +08:00
Pxl
3cf95d0fdf [Improvement](execute) optimize for ColumnNullable's serialize_vec/deserialize_vec (#28788)
optimize for ColumnNullable's serialize_vec/deserialize_vec
2024-01-12 11:59:52 +08:00
e75d91c91b [regression-test](Variant) Add more cases related to schema changes (#27958)
* [regression-test](Variant) Add more cases related to schema changes

And fix bugs about schema change for variant:
fix bug schema change crash on doing schema change with tablet schema that contains extracted columns
2023-12-08 10:15:12 +08:00
48935c14e2 [Improvement](variant) limit the column size on tablet schema (#27399) (#27785)
1. limit the column count to default 2048
2. fix get_inverted_index return nullptr when variant's unique id is -1, using it's parent unique id instead
3. avoid add same path subcolumn duplicately in tablet schema
4. make extracted column unique id -1
2023-12-04 14:47:36 +08:00
7398c3daf1 [Feature-Variant](Variant Type) support variant type query and index (#27676) 2023-11-29 10:37:28 +08:00
6183b298e1 [refactor](data_type) remove some unused functions (#26966) 2023-11-15 09:23:53 +08:00
44b51bf0b9 [Feature](Variant) support variant load (#26572) 2023-11-08 00:37:57 -06:00
bb670118f5 [coverage](test) Delete unused function to improve test coverage (#25233) 2023-10-11 11:50:51 +08:00
Pxl
477961dc21 [Chore](agg) refactor of hash map (#22958)
refactor of hash map
2023-08-18 17:59:30 +08:00
f2919567df [feature](datetime) Support timezone when insert datetime value (#21898) 2023-07-31 13:08:28 +08:00
0396f78590 [fix](memory) Remove ChunkAllocator & fix Allocator no use mmap (#21259) 2023-06-28 16:10:24 +08:00
50c1d55769 [Improve](dynamic schema) support filtering invalid data (#21160)
* [Improve](dynamic schema) support filtering invalid data

1. Support dynamic schema to filter illegal data.
2. Expand the regular expression for ColumnName to support more column names.
3. Be compatible with PropertyAnalyzer and support legacy tables.
4. Default disable parse multi dimenssion array, since some bug unresolved
2023-06-26 19:32:43 +08:00
9e21318834 [refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594)
1. make ColumnObject exception safe
2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema
3. add more test cases
2023-06-01 10:25:04 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
564446e52f [Refact](type system) refact serde for type system and pb serde impl (#18627) 2023-04-18 14:13:56 +08:00
Pxl
c9b4eaea76 [Chore](storage) change FieldType to enum class #18500 2023-04-10 08:53:44 +08:00
a77921d767 [refactor](typesystem) remove unused rpc common file and using function rpc (#18270)
rpc common is duplicate, all its method is included in function rpc. So that I remove it.
get_field_type is never used, remove it.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-31 18:13:25 +08:00
e0f6083e73 [refactor](dynamic table) add get_type_as_tprimitive_type and get_type_as_primitive_type in IDataType to get PrimitiveType and TPrimitiveType (#18260) 2023-03-31 09:03:06 +08:00
043f77200f [Bug](dynamic-table) Fix column alignment logic and support filtering null values when slot is not null (#17842)
Before this PR when encountering null values with some columns which is specified as `NOT NULL`, null values will not be filtered,thi behavior does not match with the original load behavior.
Second column alignment logic has bug :

```
template <typename ColumnInserterFn>
void align_variant_by_name_and_type(ColumnObject& dst, const ColumnObject& src, size_t row_cnt,
                                    ColumnInserterFn inserter) {
    CHECK(dst.is_finalized() && src.is_finalized());
    // Use rows() here instead of size(), since size() will check_consistency
    // but we could not check_consistency since num_rows will be upgraded even
    // if src and dst is empty, we just increase the num_rows of dst and fill
    // num_rows of default values when meet new data
    size_t num_rows = dst.rows();
```
2023-03-17 16:53:30 +08:00
9b7596f1c6 [Feature](Dynamic schema table) step1 support schema change expression (#17494)
1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns
2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility
2023-03-13 15:12:42 +08:00
36955a6769 [regression-test](dynamic-table) add regression test for dynamic table (#16656) 2023-02-14 00:03:19 +08:00
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.
2023-02-11 13:37:50 +08:00