doris

Files

Ashin Gau 938f768aba [fix](parquet) resolve offset check failed in parquet map type (#22510 )

Fix an error when reading empty map values in parquet: `offsets.back()` does not equal the number of elements in the map's key column.

### How does this happen
A map in parquet is stored as a repeated group, and `repeated_parent_def_level` was set incorrectly when parsing the map node in the parquet schema.
```
the map definition in parquet:
optional group <name> (MAP) {
  repeated group map (MAP_KEY_VALUE) {
    required <type> key;
    optional <type> value;
  }
}
```

### How to fix
Set the `repeated_parent_def_level` of the key/value nodes to the definition level of the map node.

`repeated_parent_def_level` is the definition level of the first ancestor node whose `repetition_type` equals `REPEATED`. Empty array/map values are not stored in the doris data column, so `repeated_parent_def_level` is needed to skip the empty or null values in ancestor nodes.

For instance, consider an array of strings with 3 rows like the following:
`null, [], [a, b, c]`
We can store four elements in the data column: `null, a, b, c`
and the offsets column is: `1, 1, 4`
and the null map is: `1, 0, 0`
For the `i-th` row in the array column, the range from `offsets[i - 1]` until `offsets[i]` holds the elements of this row. An empty row has `offsets[i - 1] == offsets[i]`, so it contributes nothing to the doris data column, and empty array/map values cannot be stored there. As a comparison, spark does not require `repeated_parent_def_level`, because the spark column stores empty array/map values and uses another length column to indicate empty values. See: https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java

Furthermore, we can also avoid storing null array/map values in the doris data column. For the same three rows as above, we can store only three elements in the data column: `a, b, c`
and the offsets column is: `0, 0, 3`
and the null map is: `1, 0, 0`
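
The two layouts above can be reproduced with a small sketch. This is only an illustration of the offsets/null-map bookkeeping, not the actual doris reader code; the function names `encode` and `decode` and the `store_null_rows` flag are hypothetical and introduced here for the example.

```python
from typing import List, Optional, Tuple

Row = Optional[List[Optional[str]]]


def encode(rows: List[Row], store_null_rows: bool) -> Tuple[list, List[int], List[int]]:
    """Flatten rows into (data, offsets, null_map).

    store_null_rows is a hypothetical switch: True keeps one placeholder
    element per null row (first layout), False stores nothing (second layout).
    """
    data: list = []
    offsets: List[int] = []
    null_map: List[int] = []
    for row in rows:
        if row is None:
            null_map.append(1)
            if store_null_rows:
                data.append(None)  # placeholder element for the null row
        else:
            null_map.append(0)
            data.extend(row)  # an empty row adds no elements
        offsets.append(len(data))  # offsets[i] is the end of row i
    return data, offsets, null_map


def decode(data: list, offsets: List[int], null_map: List[int]) -> List[Row]:
    """Rebuild rows: row i spans data[offsets[i - 1]:offsets[i]]."""
    rows: List[Row] = []
    start = 0
    for end, is_null in zip(offsets, null_map):
        rows.append(None if is_null else data[start:end])
        start = end
    return rows


rows = [None, [], ["a", "b", "c"]]
layout_a = encode(rows, store_null_rows=True)
layout_b = encode(rows, store_null_rows=False)

for data, offsets, null_map in (layout_a, layout_b):
    # The invariant behind the failed check: the last offset must equal
    # the number of elements in the data column.
    assert offsets[-1] == len(data)
    assert decode(data, offsets, null_map) == rows
```

In both layouts the empty row is represented only by two equal consecutive offsets, which is why the reader has to skip the corresponding entries instead of appending them to the data column.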

2023-08-02 22:33:10 +08:00

account_p0

[Enhance](auth)Users support multiple roles (#17236 )

2023-03-07 10:28:56 +08:00

backup_restore

[Enhancement](regression-test)Add regression test for MoW backup and restore (#21223 )

2023-07-05 15:16:04 +08:00

bitmap_functions

[fix](planner) Keep type of null literal expr when register conjuncts (#15878 )

2023-01-17 16:48:02 +08:00

bloom_filter_p0

[Enchancement](compatible) show decimalv3 to decimal (#21782 )

2023-07-18 09:17:14 +08:00

brown_p2/sql

[test](regression) update some case in brown_p2 #21037

2023-06-21 16:25:07 +08:00

cast_decimal_to_boolean

[Fix](Planner) fix cast from decimal to boolean (#19585 )

2023-05-15 15:13:16 +08:00

cast_double_to_decimal

[Conf](decimalv3) enable decimalv3 by default

2023-05-29 15:38:31 +08:00

compaction

[Feature](Compaction)Support full compaction (#21177 )

2023-07-16 13:21:15 +08:00

compression_p0

[improvement](regression-test) add compression algorithm regression test (#22303 )

2023-07-28 17:28:52 +08:00

compression_p1

[improvement](regression-test) add compression algorithm regression test (#22303 )

2023-07-28 17:28:52 +08:00

connector_p0/spark_connector

[regression-test](spark-connector) Add the regression case of the spark doris connector (#14877 )

2023-01-18 16:41:41 +08:00

correctness

[fix](decimal) fix cast rounding half up with negative number (#22450 )

2023-08-01 21:47:42 +08:00

correctness_p0

[feature](table-value-functions)add catalogs table-value-function (#21790 )

2023-07-14 10:25:16 +08:00

csv_header_p0

[Regression](datev2) Add test cases for datev2/datetimev2 (#11831 )

2022-08-19 10:57:55 +08:00

data_model_p0

[fix](autoinc) fix _fill_auto_inc_cols when the input column is ColumnConst (#22175 )

2023-07-25 14:41:36 +08:00

datatype_p0

[feature](datetime) Support timezone when insert datetime value (#21898 )

2023-07-31 13:08:28 +08:00

datatype_p2

[FIX](Map)fix calculate map offset in olap convertor (#18295 )

2023-04-07 17:04:08 +08:00

ddl_p0

[fix](create table) modify varchar default length 1 to 65533 (#21302 )

2023-07-10 17:57:21 +08:00

decimalv3/tpch_sf0.1_p1/sql

[feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811 )

2023-06-01 13:09:58 +08:00

delete_p0

[fix](Nereids) decimal divide should not return null if numerator is zero (#22309 )

2023-07-27 20:23:04 +08:00

demo_p0

[Bug](sort)fix merge sorter might div zero when block bytes less than block rows (#15859 )

2023-01-13 18:33:40 +08:00

dynamic_table_p0

[Improve](dynamic schema) support filtering invalid data (#21160 )

2023-06-26 19:32:43 +08:00

es_p0

[fix](DOE) es catalog not working with pipeline,datetimev2, array and esquery (#22046 )

2023-08-01 21:45:16 +08:00

export

[Conf](decimalv3) enable decimalv3 by default

2023-05-29 15:38:31 +08:00

export_p0

[Feature](Nereids) add executable function to support fold constant for functions (#18209 )

2023-05-17 21:26:31 +08:00

export_p2

[fix](exec) run exec_plan_fragment in pthread to avoid BE crash (#21343 )

2023-07-01 12:29:22 +08:00

external_catalog_p0/hive

[Fix](orc-reader) Fix Wrong data type for column error when column order in hive table is not same in orc file schema. (#21306 )

2023-07-03 09:32:55 +08:00

external_table_emr_p2

[fix](parquet) resolve offset check failed in parquet map type (#22510 )

2023-08-02 22:33:10 +08:00

flink_connector_p0

[regression](flink)add flink doris connector case (#15676 )

2023-01-10 17:25:06 +08:00

github_events_p2

[improvement](test) improve p2 case of githubevents (#20727 )

2023-06-13 14:31:24 +08:00

index_p0

[Enchancement](compatible) show decimalv3 to decimal (#21782 )

2023-07-18 09:17:14 +08:00

insert_overwrite_p0

[Feature](insert) support insert overwrite stmt (#19616 )

2023-05-14 20:01:30 +08:00

insert_p0

[fix](load) in strict mode, return error for insert if datatype convert fails (#20378 )

2023-06-06 12:04:03 +08:00

inverted_index_p0

[Refactor](inverted index) refact tokenize function for inverted index (#22313 )

2023-08-02 19:12:22 +08:00

inverted_index_p1/tpcds_sf1_index/sql

[feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811 )

2023-06-01 13:09:58 +08:00

javaudf_p0

[vectorized](udaf) java udaf support with map type (#22397 )

2023-08-02 15:03:44 +08:00

jdbc_catalog_p0

[fix](jdbc_catalog) fix int and bigint in mysql view when use doris catalog (#22251 )

2023-07-27 16:50:42 +08:00

jdbc_p0

[fix](multi-catalog) verify the precision of datetime types for each data source (#19544 )

2023-05-17 20:50:15 +08:00

json_p0

[improve](jsonb)Invalid json path prompts an error instead of null (#19646 )

2023-06-30 14:29:21 +08:00

jsonb_p0

[Improve](jsonb_extract) support jsonb_extract multi parse path (#21555 )

2023-07-12 21:37:36 +08:00

load/insert

[Conf](decimalv3) enable decimalv3 by default

2023-05-29 15:38:31 +08:00

load_p0

[FIX](complex-type)fix complex type nested col_const (#22375 )

2023-07-31 14:53:18 +08:00

load_p1/stream_load

[fix](stream-load) find line delimiter in csv should start with no offset (#18161 )

2023-03-30 14:42:34 +08:00

load_p2/broker_load

[fix](test) fix p2 broker load (#20196 )

2023-05-30 16:26:00 +08:00

map_p0

[Feature](map-type) Support stream load and fix some bugs for map type (#16776 )

2023-02-19 15:11:54 +08:00

mtmv_p0

[fix](MTMV) Tasks leak when dropping job (#17984 )

2023-03-21 23:22:17 +08:00

mv_p0

[Chore](materialized-view) update documentation about materialized-view and update test (#22350 )

2023-08-01 15:13:34 +08:00

mysql_fulltext

[refactor](config) Delete the environment variable enable_vectorized_engine (#18166 )

2023-04-07 14:23:16 +08:00

mysqldump_p0

[Enchancement](mysql-compatable) add regression-test for MySQLdump #18208

2023-04-03 09:49:07 +08:00

nereids_arith_p0

[feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811 )

2023-06-01 13:09:58 +08:00

nereids_function_p0

[opt](Nereids) add double signature back for round like function (#22284 )

2023-07-27 19:10:43 +08:00

nereids_p0

[Fix](Planner) fix parse error of view with group_concat order by (#22196 )

2023-07-31 17:20:23 +08:00

nereids_syntax_p0

[fix](nereids) recompute logical properties in plan post process (#22356 )

2023-07-31 21:04:39 +08:00

nereids_tpcds_shape_sf100_p0/shape

[stats](nereids) fix bug for avg-size (#22421 )

2023-08-01 17:13:00 +08:00

nereids_tpch_p0/tpch

[Bug](function) catch error state in function cast to avoid core dump (#20751 )

2023-06-14 17:34:34 +08:00

nereids_tpch_shape_sf500_p0/shape

[stats](nereids) fix bug for avg-size (#22421 )

2023-08-01 17:13:00 +08:00

nereids_tpch_shape_sf1000_p0/shape

[stats](nereids) fix bug for avg-size (#22421 )

2023-08-01 17:13:00 +08:00

opensky_p2/sql

[enhancement](test) add opensky cases to p2 (#12693 )

2022-09-19 08:38:17 +08:00

partition_p0

[regression-test] add list partition case and multi partition keys case (#22042 )

2023-07-28 10:12:35 +08:00

performance_p0

[opt](hive)opt select count(*) stmt push down agg on parquet in hive . (#22115 )

2023-07-29 00:31:01 +08:00

point_query_p0

[Bug](row-store) Fix row store with materialize index (#20356 )

2023-06-08 10:55:22 +08:00

query/regexp

[fix](test) move cases in query to query_p0 (#14452 )

2022-11-22 21:35:18 +08:00

query_p0

[regression](fix) fix test_round case (#22441 )

2023-08-01 11:35:44 +08:00

query_p1

[regression] add bitmap filter p1 regression case (#21591 )

2023-07-11 14:27:03 +08:00

query_p2

[Chore](case) remove load big lateral view from p1 to p2 (#17851 )

2023-03-20 13:10:12 +08:00

rollup

[Feature](Nereids) add executable function to support fold constant for functions (#18209 )

2023-05-17 21:26:31 +08:00

rollup_p0

[datetimev2](minor) Add scale parameter for datetimev2 (#21176 )

2023-06-27 19:55:35 +08:00

schema_change

[feature](table-metadata) support altering the property "light_schema_change" for the tables which created before 1.2 (#17704 )

2023-04-11 11:09:43 +08:00

schema_change_p0

Revert "[feature](merge-on-write) enable merge on write by default (#… (#21041 )

2023-06-21 18:36:46 +08:00

segcompaction_p2

[fix](regression) mv segcompaction_p1 to segcompaction_p2 (#18806 )

2023-04-26 15:34:46 +08:00

sql_block_rule_p0

[fix](test) try to let cases run in parallel (#13114 )

2022-10-04 20:56:22 +08:00

ssb_sf0.1_p1/sql

[refactor](config) Delete the environment variable enable_vectorized_engine (#18166 )

2023-04-07 14:23:16 +08:00

ssb_sf1_p2/sql

[enhancement](regression) split ssb sf1 to sf0.1 to get smaller test data size (#14437 )

2022-11-22 10:36:12 +08:00

ssb_sf100_p2/sql

Revert "[test](pipeline) Run nereids cases in p1/p2 (#16130 )" (#16792 )

2023-02-17 18:48:27 +08:00

ssb_unique_sql_zstd_p0/sql

[improvement](exchange) test: data stream sender stop sending data to receiver if it returns eos early (#20081 )

2023-05-26 16:05:38 +08:00

statistics

[Conf](decimalv3) enable decimalv3 by default

2023-05-29 15:38:31 +08:00

statistics_p1

[regressiontest](statistics) Collate and supplement statistics regression test (#19901 )

2023-05-24 20:17:28 +08:00

tpcds_sf1_p1

[feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811 )

2023-06-01 13:09:58 +08:00

tpcds_sf1_unique_p1/sql

[feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811 )

2023-06-01 13:09:58 +08:00

tpcds_sf100_dup_without_key_p2/sql

[test](regression) update some case in p2 (#21094 )

2023-06-25 11:16:56 +08:00

tpcds_sf100_p2/sql

[test](regression) update some case in p2 (#20436 )

2023-06-06 11:05:56 +08:00

tpcds_sf1000_p2/sql

[chore](testcase) change tpcds q67 testcase name to q67_ignore_temporarily (#19227 )

2023-05-10 15:06:23 +08:00

tpch_sf0.1_p1/sql

[feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811 )

2023-06-01 13:09:58 +08:00

tpch_sf0.1_unique_p1/sql

[feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811 )

2023-06-01 13:09:58 +08:00

tpch_sf1_p0/multi_catalog_query

[Fix](mutli-catalog) Use decimal v3 type to fix decimal loss issue in multi-catalog module. (#18835 )

2023-04-20 11:02:53 +08:00

tpch_sf1_p1

[fix](test) tpch_sf1_p1 and tpch_sf1_p1/tpch_sf1 are confusing (#14206 )

2022-11-28 19:30:32 +08:00

tpch_sf1_p2

[test](regression) update some case in p2 (#20436 )

2023-06-06 11:05:56 +08:00

tpch_sf1_unique_p1/sql

[fix](test) tpch_sf1_p1 and tpch_sf1_p1/tpch_sf1 are confusing (#14206 )

2022-11-28 19:30:32 +08:00

tpch_sf1_unique_p2/sql

[test](regression) update some case in p2 (#20436 )

2023-06-06 11:05:56 +08:00

tpch_sf100_p2/sql

[test](regression) update some case in p2 (#20436 )

2023-06-06 11:05:56 +08:00

tpch_sf100_unique_sql_p2/sql

[test](regression) update some case in p2 (#20683 )

2023-06-16 10:09:22 +08:00

tpch_unique_sql_zstd_p0/sql

[feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811 )

2023-06-01 13:09:58 +08:00

trino_p0

[Conf](decimalv3) enable decimalv3 by default

2023-05-29 15:38:31 +08:00

types

[Feature](aggregation) add agg_state define and ddl support (#19824 )

2023-05-22 11:45:53 +08:00

types_p0/unsigned

Revert "[feature](merge-on-write) enable merge on write by default (#… (#21041 )

2023-06-21 18:36:46 +08:00

unique_with_mow_p0

[fix](partial update) remove CHECK on illegal number of partial columns (#22319 )

2023-07-28 23:11:58 +08:00

unique_with_mow_p2/ssb_unique_sql_zstd/sql

[regression-test](merge-on-write) Optimize merge-on-write case (#18038 )

2023-03-23 17:59:49 +08:00

update

Revert "[feature](merge-on-write) enable merge on write by default (#… (#21041 )

2023-06-21 18:36:46 +08:00

view_p0

[Bug](view) fix AES_ENCRYPT have wrong result on view (#18034 )

2023-03-29 10:49:39 +08:00

with_clause_p0/sql

[enhancement](test) add more p0 cases (#12285 )

2022-09-29 10:45:17 +08:00

yandex_metrica_p2

[refactor](config) Delete the environment variable enable_vectorized_engine (#18166 )

2023-04-07 14:23:16 +08:00