Commit Graph

7122 Commits

Author SHA1 Message Date
2f2d488668 [opt](parquet) Support hive struct schema change (#32438)
Followup: #31128
This optimization allows Doris to correctly read struct-type data after the struct schema has been changed in Hive.

## Changing the struct schema in Hive:
```sql
hive> create table struct_test(id int,sf struct<f1: int, f2: string>) stored as parquet;

hive> insert into struct_test values
    >           (1, named_struct('f1', 1, 'f2', 's1')),
    >           (2, named_struct('f1', 2, 'f2', 's2')),
    >           (3, named_struct('f1', 3, 'f2', 's3'));

hive> alter table struct_test change sf sf struct<f1:int, f3:string>;

hive> select * from struct_test;
OK
1	{"f1":1,"f3":null}
2	{"f1":2,"f3":null}
3	{"f1":3,"f3":null}
Time taken: 5.298 seconds, Fetched: 3 row(s)
```

Previously, Doris returned:
```sql
mysql> select * from struct_test;
+------+-----------------------+
| id   | sf                    |
+------+-----------------------+
|    1 | {"f1": 1, "f3": "s1"} |
|    2 | {"f1": 2, "f3": "s2"} |
|    3 | {"f1": 3, "f3": "s3"} |
+------+-----------------------+
```

Now the result is the same as in Hive:

```sql
mysql> select * from struct_test;
+------+-----------------------+
| id   | sf                    |
+------+-----------------------+
|    1 | {"f1": 1, "f3": null} |
|    2 | {"f1": 2, "f3": null} |
|    3 | {"f1": 3, "f3": null} |
+------+-----------------------+
```
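
The before/after output is consistent with resolving struct fields by name rather than by position. The following is a minimal sketch of that idea, not the actual Doris reader code: `FileStruct` and `read_struct_by_name` are hypothetical names, and values are kept as strings for brevity.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// One struct value as stored in the Parquet file: field name -> value.
// (Hypothetical type; values are strings only to keep the sketch short.)
using FileStruct = std::unordered_map<std::string, std::string>;

// Resolve each field of the table's struct schema by NAME against the
// file's struct. A field that is missing from the file becomes NULL
// (std::nullopt) instead of inheriting the value at the same position.
std::vector<std::optional<std::string>> read_struct_by_name(
        const std::vector<std::string>& table_fields,
        const FileStruct& file_value) {
    std::vector<std::optional<std::string>> out;
    out.reserve(table_fields.size());
    for (const auto& name : table_fields) {
        auto it = file_value.find(name);
        if (it == file_value.end()) {
            out.emplace_back(std::nullopt);  // e.g. f3: not present in the file
        } else {
            out.emplace_back(it->second);    // e.g. f1: present in the file
        }
    }
    // One output slot per field of the table schema.
    assert(out.size() == table_fields.size());
    return out;
}
```

With the file row `{f1: "1", f2: "s1"}` and the table schema `{f1, f3}`, this yields `f1 = "1"` and `f3 = NULL`, which matches the Hive output above.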
2024-03-22 16:35:47 +08:00
647a0606aa [pipelineX](refactor) Wait for 2-phase execution before opening (#32613)
Wait for 2-phase execution before opening
2024-03-22 16:35:47 +08:00
66336e59e6 [fix](join) the result of left semi join with empty right side should be false, not null (#32477) 2024-03-22 16:35:43 +08:00
baf3ae1a93 [refactor](nereids)unify outputTupleDesc and projection be part (#32439) 2024-03-22 16:35:43 +08:00
ab467f53db [fix](partition) Fix be tablet partition id eq 0 By report tablet (#32179) (#32667) 2024-03-22 15:38:58 +08:00
ea71472d64 [fix](build index) fix core when build index for a new column which without data (#32550) (#32669)
Co-authored-by: Luennng <luennng@gmail.com>
Co-authored-by: Tanya-W <tanya1218w@163.com>
2024-03-22 15:05:19 +08:00
a4a191fe56 [fix](index compaction)Fix MOW index compaction core (#32121) (#32657) 2024-03-22 14:20:19 +08:00
23c12fd68f [fix](join) core caused by null-safe-equal join (#32623) 2024-03-22 08:53:47 +08:00
921fab2196 [fix](memory) Fix thread context not initialized in MacOS (#32570) 2024-03-22 08:53:47 +08:00
6b54171778 [bugfix](deadlock) pipelinex map lock should only scope in map not about pipelinectx's cancel method (#32622)
Both global locks in the fragment mgr should only protect the map logic; they must not be used to protect the cancel method.
The fragment ctx cancel method should be protected by its own lock.
Otherwise the call chain query ctx cancel --> pipelinex fragment cancel --> query ctx cancel will deadlock.
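
The locking rule can be sketched as follows. This is an illustration under assumptions, not the Doris code: `FragmentMgr`, `FragmentCtx`, and their members are simplified, hypothetical stand-ins. The map lock is held only while looking up the context, and `cancel()` runs after it is released, so a cancel path that calls back into the manager cannot block on a lock its own thread already holds.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

class FragmentMgr;

// Hypothetical stand-in for a fragment context.
class FragmentCtx {
public:
    FragmentCtx(FragmentMgr* mgr, int query_id) : _mgr(mgr), _query_id(query_id) {
        assert(mgr != nullptr);
    }
    void cancel();  // calls back into the manager, like the chain in the message
    bool cancelled() {
        std::lock_guard<std::mutex> l(_lock);
        return _cancelled;
    }

private:
    FragmentMgr* _mgr;
    int _query_id;
    std::mutex _lock;  // protects only this context's own state
    bool _cancelled = false;
};

class FragmentMgr {
public:
    void add(int query_id) {
        std::lock_guard<std::mutex> l(_map_lock);
        _ctxs[query_id] = std::make_shared<FragmentCtx>(this, query_id);
    }
    std::shared_ptr<FragmentCtx> find(int query_id) {
        std::lock_guard<std::mutex> l(_map_lock);  // scope: map access only
        auto it = _ctxs.find(query_id);
        return it == _ctxs.end() ? nullptr : it->second;
    }
    void cancel(int query_id) {
        // The map lock is released inside find(), before cancel() runs.
        std::shared_ptr<FragmentCtx> ctx = find(query_id);
        if (ctx != nullptr) {
            ctx->cancel();
        }
    }

private:
    std::mutex _map_lock;  // protects _ctxs and nothing else
    std::map<int, std::shared_ptr<FragmentCtx>> _ctxs;
};

void FragmentCtx::cancel() {
    {
        std::lock_guard<std::mutex> l(_lock);
        if (_cancelled) return;
        _cancelled = true;
    }
    // Re-entering the manager is safe here: the calling thread does not
    // hold _map_lock at this point.
    (void)_mgr->find(_query_id);
}
```

If `FragmentMgr::cancel` instead called `ctx->cancel()` while still holding `_map_lock`, the nested `find()` would try to lock the same non-recursive `std::mutex` on the same thread, which is the kind of self-deadlock the message describes.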
2024-03-22 08:52:38 +08:00
Pxl
6462d913ca [Improvement](brpc) log error message when AutoReleaseClosure meet brpc error or response… (#32628)
Log an error message when AutoReleaseClosure meets a brpc error or a response with an error status.
2024-03-22 08:52:38 +08:00
d3bdda6071 [fix](partial update) fix data correctness risk when load delete sign data into a table with sequence col (#32574) 2024-03-22 08:52:38 +08:00
55b7f7f019 [fix](inverted index) skip read index column data only for DUP and MOW table (#32594) 2024-03-22 08:52:16 +08:00
2cb652a7fa [FIX](compile)fix for gcc compile (#32508)
* fix for gcc compile
2024-03-22 08:52:16 +08:00
d7a3ff1ddf [Fix](Outfile) Fix the column type mapping in the orc/parquet file format (#32281)
| Doris Type | Orc Type                       | Parquet Type          |
|------------|--------------------------------|-----------------------|
| Date       | Long (logical: DATE)           | int32 (logical: Date) |
| DateTime   | TIMESTAMP (logical: TIMESTAMP) | int96                 |
2024-03-22 08:52:16 +08:00
fd0bc720e9 [opt](information_schema) Add DEFAULT_ENCRYPTION column to schemata table (#32501) 2024-03-22 08:52:16 +08:00
6888e52365 [pipelineX](fix) Fix illegal memory access (#32602) 2024-03-22 08:52:16 +08:00
844dd8b2ce [fix](spill) should wait for merging done before read agg result (#32537) 2024-03-22 08:52:16 +08:00
fd62af82d2 [enhancement](mow) Add bvar for bloom filter and segment (#32355) 2024-03-22 08:52:12 +08:00
0cde0cbf19 (invert index) modify of time series compaction policy 2024-03-22 08:16:30 +08:00
4c8aaa156a [fix](jni) remove 'push_down_predicates' and fix BE crash with decimal predicate (#32253) (#32599) 2024-03-21 14:07:50 +08:00
617cc667fe [Fix](Variant) fix variant serialize root node (#31769) 2024-03-21 14:07:50 +08:00
02ef02402a [pipelineX](debug) Add debug logs for long-running load task (#32534) 2024-03-21 14:07:50 +08:00
02430e6e53 [enhance](S3) Print the oss request id for each error s3 request (#32499) 2024-03-21 14:07:50 +08:00
7486e96b12 [improve](function) add error msg if exceeded maximum default value in repeat function (#32219)
Add an error message to the repeat function so that the user knows when the count exceeds the default maximum value.
2024-03-21 14:07:49 +08:00
6d076f9947 [improvement](group_comit) Add bvar to monitor the total wal count on disk (#31646) 2024-03-21 14:07:49 +08:00
09be4dc7ee [fix](random-bucket) tabletindex when there is no cached value in memory (#32336)
1. In cloud mode, getting the visible version is an RPC to the metaservice, while
loads would get the visible version for all partitions.
2. VunionNode should follow the batch size.
2024-03-21 14:07:49 +08:00
06bf5541f2 [pipelineX](fix) Fix running tasks API core dump (#32503) 2024-03-21 14:07:49 +08:00
0db402e154 [expr](fix) Not to throw exception when close failed (#32287) 2024-03-21 14:07:49 +08:00
a40463617e [feature](cpu cores) get the cores when running within a cgroup. (#32370)
get the cores when running within a cgroup
2024-03-21 14:07:49 +08:00
b92a764665 [feature](function) Support for aggregate function foreach combiner for some error function (#31913)
Support for aggregate function foreach combiner for some error function
2024-03-21 14:07:49 +08:00
b6a35d68b0 [code](Refactor) Del unless filter id in runtime filter func (#32502)
Delete the useless filter id in the runtime filter func
2024-03-21 14:07:49 +08:00
6871c964af [fix](nereids)NullSafeEqualToEqual rule only change to equal if both children are not nullable (#32374)
NullSafeEqualToEqual rule only change to equal if both children are not nullable
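
The nullability condition matters because `<=>` and `=` only agree when neither side can be NULL. A small sketch of the two semantics, using `std::optional` to model SQL NULL (the function names are made up for illustration and are not Nereids code):

```cpp
#include <cassert>
#include <optional>

using NullableInt = std::optional<int>;    // std::nullopt models SQL NULL
using NullableBool = std::optional<bool>;

// SQL `a = b`: the result is NULL if either side is NULL.
NullableBool sql_equal(NullableInt a, NullableInt b) {
    if (!a || !b) return std::nullopt;
    return *a == *b;
}

// SQL `a <=> b`: never NULL; two NULLs compare as true.
bool sql_null_safe_equal(NullableInt a, NullableInt b) {
    if (!a || !b) return !a && !b;
    return *a == *b;
}

// Usage: for non-NULL inputs the two operators give the same answer,
// which is why the rewrite is safe when both children are not nullable.
inline void check_agree_on_non_null() {
    assert(sql_equal(1, 2).value() == sql_null_safe_equal(1, 2));
    assert(sql_equal(3, 3).value() == sql_null_safe_equal(3, 3));
}
```

As soon as one child is nullable, `NULL <=> 1` is false while `NULL = 1` is NULL, so replacing one operator with the other could change the result.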
2024-03-21 14:07:49 +08:00
4efeb6618a [Fix](inverted index) fix inappropriate use of macro in inverted index fs directory error process (#32472) 2024-03-21 14:07:24 +08:00
50c247e08c [fix](snapshot-loader) Fix be crash caused by deref end() iterator (#32489)
The standard says that the input parameter `pos` of std::vector::erase
must be valid and dereferenceable, so the `end()` iterator cannot be used
as the value of `pos`. In my tests, the crash only occurs when the
vector is empty. Fortunately, `local_files` is usually not empty.
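
A minimal sketch of the guard this implies, assuming a lookup-then-erase pattern (the helper `erase_file` is hypothetical; only the name `local_files` comes from the message):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Remove the first occurrence of `name` from `local_files`.
// Returns true if an element was removed.
bool erase_file(std::vector<std::string>& local_files, const std::string& name) {
    auto it = std::find(local_files.begin(), local_files.end(), name);
    if (it == local_files.end()) {
        // Not found (always the case for an empty vector).
        // Calling erase(end()) here would be undefined behavior.
        return false;
    }
    local_files.erase(it);
    return true;
}

// Usage: the call is safe even when the vector is empty.
inline void erase_file_example() {
    std::vector<std::string> local_files;
    assert(!erase_file(local_files, "a.dat"));
}
```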
2024-03-21 14:07:24 +08:00
612d3595e4 [improvement](spill) optimize the spilling logic of hash join operator (#32202) 2024-03-21 14:07:24 +08:00
e892774c9a [improvement](agg) streaming agg should not take too much memory when spilling enabled (#32426) 2024-03-21 14:07:24 +08:00
2196c534e8 [fix](group commit) Fix compatibility issues on serializing and deserializing wal file (#32299) 2024-03-21 14:07:24 +08:00
14c9537679 [fix](decimal) fix Arithmetic Overflow error of converting string to decimal (#32246) 2024-03-21 14:07:24 +08:00
ab512f935c [pipelineX](api) Add api for long-running tasks (#32459) 2024-03-21 14:07:24 +08:00
f99db38998 [fix](ParquetReader) Fix Parquet Reader to read int96 parquet type problem (#32394)
`hi - JULIAN_EPOCH_OFFSET_DAYS` can be negative, so the operands cannot all be unsigned integers.
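
A sketch of the signed arithmetic, assuming the usual INT96 timestamp layout (nanoseconds of the day in the low 8 bytes, Julian day number in the high 4 bytes). The struct and function names are illustrative, not the Doris definitions; only `hi` and `JULIAN_EPOCH_OFFSET_DAYS` are taken from the message.

```cpp
#include <cassert>
#include <cstdint>

// Julian day number of 1970-01-01.
constexpr int64_t JULIAN_EPOCH_OFFSET_DAYS = 2440588;
constexpr int64_t MICROS_PER_DAY = 86400LL * 1000000LL;
constexpr uint64_t NANOS_PER_DAY = 86400ULL * 1000000000ULL;

// Assumed layout of a Parquet INT96 timestamp.
struct Int96Timestamp {
    uint64_t lo;  // nanoseconds within the day
    uint32_t hi;  // Julian day number
};

// Convert to microseconds since the Unix epoch.
int64_t to_unix_micros(const Int96Timestamp& ts) {
    assert(ts.lo < NANOS_PER_DAY);
    // Widen `hi` to a signed type BEFORE subtracting. With unsigned
    // operands, a date before 1970-01-01 would wrap around to a huge
    // positive day count instead of becoming negative.
    int64_t days = static_cast<int64_t>(ts.hi) - JULIAN_EPOCH_OFFSET_DAYS;
    return days * MICROS_PER_DAY + static_cast<int64_t>(ts.lo / 1000);
}
```

For example, a value with `hi = 2440587` (1969-12-31) and `lo = 0` maps to `-86400000000` microseconds, one day before the epoch.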
2024-03-21 14:07:24 +08:00
0635a8716c [improve](group commit) Group commit support chunked stream load in flink (#32135) 2024-03-21 14:07:24 +08:00
7422f185da [Fix](smooth-upgrade) Fix incompatibility when upgrade from 2.0 to 2.1 (#32444) 2024-03-21 14:07:24 +08:00
715eed0748 [opt](like) opt LIKE and REGEXP clause with concat(col, pattern_str) (#32333)
opt LIKE and REGEXP clause with concat(col, pattern_str)
2024-03-21 14:07:24 +08:00
6ea8e51261 [Performance](join) speed up the colocate and bucket shuffle join by change rf size (#32421) 2024-03-21 14:07:24 +08:00
a5f3611b88 [Fix](Regression) DCHECK failed in runtime filter wrapper (#32446) 2024-03-21 14:07:23 +08:00
7a0b591b8f [FIX](array_agg) fix array agg with other agg function (#32387)
fix array agg with other agg function
2024-03-21 14:07:23 +08:00
a0a3a2a2ce [Fix](Variant) fix variant with not null (#32248)
Ignore the null bitmap for NOT NULL columns and make subcolumn access slots always nullable.
2024-03-21 14:07:23 +08:00
590e1d52ec [pipelineX](streaming agg) Fix wrong columns produced by streaming agg (#32411)
* [pipelineX](streaming agg) Fix wrong columns produced by streaming agg

* update
2024-03-21 14:07:23 +08:00
4bf5a21ba3 [pipelineX](cancel) Remove lock for mapping query ctx to fragment (#32346) 2024-03-21 14:07:23 +08:00