doris

Author	SHA1	Message	Date
lihangyu	691f3c5ee7	[Performance](Variant) Improve load performance for variant type (#33890 ) 1. remove phmap for padding rows 2. add SimpleFieldVisitorToScarlarType for short-circuit type deduction 3. correct type coercion for conflicting types between integers 4. improve nullable column performance 5. remove shared_ptr dependency for DataType, use TypeIndex instead 6. optimize by caching the order of fields (which is almost always the same) and quickly checking for a match with the next expected field, instead of searching the hash table. Benchmark: on ClickBench data, load time drops from 12m36.799s to 7m10.934s, about a 43% latency reduction. In variant_p2/performance.groovy: 3min44s20 -> 1min15s80, about a 66% latency reduction	2024-05-18 17:58:33 +08:00
lihangyu	249a9c9875	[Feature](Variant) support aggregation model for Variant type (#33493 ) refactor: use `insert_from` instead of `replace_column_data` for variable-length columns	2024-04-17 23:42:00 +08:00
lihangyu	617cc667fe	[Fix](Variant) fix variant serialize root node (#31769 )	2024-03-21 14:07:50 +08:00
lihangyu	e8aa5ee7d5	[Improve](Variant) support bloom filter for variant subcolumns (#31347 )	2024-03-09 19:45:03 +08:00
lihangyu	0da010603e	[Improve](TabletSchemaCache) reduce duplicated memory consumption for column name and column path (#31141 ) Both can reference the related field in TabletColumn. Also use shared_ptr for TabletColumn in TabletSchema for later memory reuse	2024-03-09 19:44:42 +08:00
lihangyu	7b79b77cc9	[Optimize](Variant) make tablet schema more well-organized (#99 ) (#30922 )	2024-02-18 11:50:17 +08:00
Sun Chenyang	0442d5dc0e	[fix](Variant Type) Add sparse columns meta to fix compaction (#28673 ) Co-authored-by: eldenmoon <15605149486@163.com>	2024-02-16 10:12:23 +08:00
lihangyu	e6fbccd3ed	[Feature](Variant) support row store for variant type (#30052 )	2024-01-31 23:53:39 +08:00
lihangyu	9aaa6ba351	[Fix](Variant) fix variant lost null info after `cast_column` (#30153 ) This could result in incorrect output in hierarchical cases ``` sql """insert into ${table_name} values (-3, '{"a" : 1, "b" : 1.5, "c" : [1, 2, 3]}')""" sql """insert into ${table_name} select -2, '{"a": 11245, "b" : [123, {"xx" : 1}], "c" : {"c" : 456, "d" : "null", "e" : 7.111}}' as json_str union all select -1, '{"a": 1123}' as json_str union all select *, '{"a" : 1234, "xxxx" : "kaana"}' as json_str from numbers("number" = "4096") limit 4096 ;""" mysql> select v["c"] from var_rs where k = -3 or k = -2; +----------------------+ | element_at(`v`, 'c') | +----------------------+ | [1,2,3] | | [] | +----------------------+ 2 rows in set (0.04 sec) ```	2024-01-27 09:08:29 +08:00
Pxl	3cf95d0fdf	[Improvement](execute) optimize for ColumnNullable's serialize_vec/deserialize_vec (#28788 )	2024-01-12 11:59:52 +08:00
lihangyu	e75d91c91b	[regression-test](Variant) Add more cases related to schema changes (#27958 ) Also fix a schema-change bug for variant: a crash when running a schema change on a tablet schema that contains extracted columns	2023-12-08 10:15:12 +08:00
lihangyu	48935c14e2	[Improvement](variant) limit the column size on tablet schema (#27399 ) (#27785 ) 1. limit the column count to 2048 by default 2. fix get_inverted_index returning nullptr when the variant's unique id is -1, by using its parent's unique id instead 3. avoid adding a subcolumn with the same path to the tablet schema more than once 4. make the extracted column's unique id -1	2023-12-04 14:47:36 +08:00
lihangyu	7398c3daf1	[Feature-Variant](Variant Type) support variant type query and index (#27676 )	2023-11-29 10:37:28 +08:00
Jerry Hu	6183b298e1	[refactor](data_type) remove some unused functions (#26966 )	2023-11-15 09:23:53 +08:00
lihangyu	44b51bf0b9	[Feature](Variant) support variant load (#26572 )	2023-11-08 00:37:57 -06:00
Gabriel	bb670118f5	[coverage](test) Delete unused function to improve test coverage (#25233 )	2023-10-11 11:50:51 +08:00
Pxl	477961dc21	[Chore](agg) refactor of hash map (#22958 )	2023-08-18 17:59:30 +08:00
zclllyybb	f2919567df	[feature](datetime) Support timezone when insert datetime value (#21898 )	2023-07-31 13:08:28 +08:00
Xinyi Zou	0396f78590	[fix](memory) Remove ChunkAllocator & fix Allocator no use mmap (#21259 )	2023-06-28 16:10:24 +08:00
lihangyu	50c1d55769	[Improve](dynamic schema) support filtering invalid data (#21160 ) 1. Support dynamic schema to filter illegal data. 2. Expand the regular expression for ColumnName to support more column names. 3. Be compatible with PropertyAnalyzer and support legacy tables. 4. Disable parsing of multi-dimensional arrays by default, since some bugs remain unresolved	2023-06-26 19:32:43 +08:00
lihangyu	9e21318834	[refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594 ) 1. make ColumnObject exception safe 2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema 3. add more test cases	2023-06-01 10:25:04 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
amory	564446e52f	[Refact](type system) refact serde for type system and pb serde impl (#18627 )	2023-04-18 14:13:56 +08:00
Pxl	c9b4eaea76	[Chore](storage) change FieldType to enum class #18500	2023-04-10 08:53:44 +08:00
yiguolei	a77921d767	[refactor](typesystem) remove unused rpc common file and using function rpc (#18270 ) rpc common is a duplicate: all of its methods are included in function rpc, so it is removed. get_field_type is never used, so it is removed as well. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-31 18:13:25 +08:00
lihangyu	e0f6083e73	[refactor](dynamic table) add `get_type_as_tprimitive_type` and `get_type_as_primitive_type` in IDataType to get `PrimitiveType` and `TPrimitiveType` (#18260 )	2023-03-31 09:03:06 +08:00
lihangyu	043f77200f	[Bug](dynamic-table) Fix column alignment logic and support filtering null values when slot is not null (#17842 ) Before this PR, when null values were encountered in columns specified as `NOT NULL`, they were not filtered; this behavior does not match the original load behavior. Second, the column alignment logic has a bug: ``` template <typename ColumnInserterFn> void align_variant_by_name_and_type(ColumnObject& dst, const ColumnObject& src, size_t row_cnt, ColumnInserterFn inserter) { CHECK(dst.is_finalized() && src.is_finalized()); // Use rows() here instead of size(), since size() will check_consistency // but we could not check_consistency since num_rows will be upgraded even // if src and dst is empty, we just increase the num_rows of dst and fill // num_rows of default values when meet new data size_t num_rows = dst.rows(); ```	2023-03-17 16:53:30 +08:00
lihangyu	9b7596f1c6	[Feature](Dynamic schema table) step1 support schema change expression (#17494 ) 1. introduce a new type `VARIANT` to encapsulate dynamically generated columns, hiding the details of the types and names of newly generated columns 2. introduce a new expression `SchemaChangeExpr` for doing schema change, for extensibility	2023-03-13 15:12:42 +08:00
lihangyu	36955a6769	[regression-test](dynamic-table) add regression test for dynamic table (#16656 )	2023-02-14 00:03:19 +08:00
lihangyu	37d1519316	[WIP](dynamic-table) support dynamic schema table (#16335 ) Issue Number: close #16351 A dynamic schema table is a special type of table whose schema changes during the loading procedure. This feature is currently implemented mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema info from the original documents and infer the final type information. This special table reduces manual schema change operations, makes it easy to import semi-structured data, and extends its schema automatically.	2023-02-11 13:37:50 +08:00

30 Commits