Commit Graph

3453 Commits

Author SHA1 Message Date
75eded04d7 [Bug](compatibility) fix window funnel function coredump when upgrade (#39646)
## Proposed changes
this PR https://github.com/apache/doris/pull/39270 have change the agg
of window funnel
and max_be_exec_version is update to 5, in order to compatibility of the
agg function when upgrade.

<!--Describe your changes.-->
2024-08-21 08:46:50 +08:00
a3fd13fee6 [fix](catalog) set timeout for split fetch (#39346) (#39624)
bp #39346
2024-08-20 21:59:55 +08:00
12ed2951c4 [fix] (inverted index) remove tmp columns in block (#39369) (#39533) 2024-08-20 20:53:23 +08:00
5fcd6e6270 [Fix](load) Fix the incorrect src value printed in the error log when strict mode is true #39447 (#39587)
cherry pick from #39447
2024-08-20 12:02:13 +08:00
273a62584c [opt](inverted index) unified optimization judgment to prevent omissions (#39473)
https://github.com/apache/doris/pull/38027
2024-08-17 16:57:19 +08:00
b0da8430bc [opt](inverted index) Optimize the usage of the multi_match function (#39472)
## Proposed changes

https://github.com/apache/doris/pull/39193

<!--Describe your changes.-->
2024-08-17 16:53:52 +08:00
79aa079cc6 [fix](array-funcs) array min/max #39307 (#39484) 2024-08-17 10:56:44 +08:00
7687f2c53a [fix](ip-funcs) fix ip inet6_aton funcs #39415 (#39513) 2024-08-17 10:56:06 +08:00
824f035b98 [pick](Row store) fix row store with invalid json string in variant ty… (#39456)
#39394
2024-08-16 14:43:11 +08:00
021678c7c3 [fix](window_funnel) fix wrong result of window_funnel #38954 (#39270)
## Proposed changes

BP #38954
2024-08-16 09:59:31 +08:00
a44a274563 [Fix](parquet-reader) Fix and optimize parquet min-max filtering. (#39375)
Backport #38277.
2024-08-15 14:12:54 +08:00
acf07cab6f [refactor](minor) Init counter in prepare phase (#39287) (#39385)
pick #39287
2024-08-15 13:36:12 +08:00
226e01889c [fix](array_apply) pick array apply fix (#39328)
## Proposed changes
backport: https://github.com/apache/doris/pull/39105
Issue Number: close #xxx

<!--Describe your changes.-->
2024-08-14 18:52:29 +08:00
78d6e318fb [fix](ip)pick ip rowstore (#39345)
## Proposed changes
backport: https://github.com/apache/doris/pull/39258
Issue Number: close #xxx

<!--Describe your changes.-->
2024-08-14 18:51:58 +08:00
b26af32934 [fix](function) fix error return type in corr(float32,float32) (#39251) (#39350)
https://github.com/apache/doris/pull/39251
```
mysql [test11]>select corr(cast(x as float),cast(y as float)) from test_corr;
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]column_type not match data_types in agg node, column_type=Nullable(Float64), data_types=Nullable(Float32),column name=

```

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->
2024-08-14 18:47:14 +08:00
677435cef8 [Pick](Branch-2.1) pick json reader fix and support specify $. as column (#39271)
#39206
#38213
2024-08-13 17:44:45 +08:00
7e7729c4b0 [cherry-pick](branch-21) fix partition-topn calculate partition input rows have error (#39100) (#39281)
## Proposed changes

cherry-pick from master: #39100 

<!--Describe your changes.-->
2024-08-13 17:42:29 +08:00
b976cbf14d [opt](log) avoid lots of json parse error logs (#39190) (#39246)
pick https://github.com/apache/doris/pull/39190 to branch-2.1
2024-08-13 17:01:10 +08:00
60eeec3754 [fix] (inverted index) Fix match function without inverted index (#38989) (#39220)
## Proposed changes

pick from #38989
2024-08-13 10:55:54 +08:00
a6155a517d [fix] (topn) fix uncleared block in topn_next() (#39119) (#39224)
## Proposed changes

pick from master #39119
2024-08-13 10:34:17 +08:00
96260d97bd [fix](ub) undefined behavior in FixedContainer (#39191) (#39201)
## Proposed changes

pick #39191
Undefined behavior occurs if there is a null value in the list.

```
/root/doris/be/src/vec/common/string_ref.h:271:54: runtime error: null pointer passed as argument 2, which is declared to never be null
/var/local/ldb-toolchain/bin/../usr/include/string.h:64:33: note: nonnull attribute specified here
#0 0x5616d072245d in doris::StringRef::eq(doris::StringRef const&) const /root/doris/be/src/vec/common/string_ref.h:271:41
#1 0x5616d072245d in doris::StringRef::operator==(doris::StringRef const&) const /root/doris/be/src/vec/common/string_ref.h:274:60
#2 0x5616d072245d in doris::FixedContainer::find(doris::StringRef const&) const /root/doris/be/src/exprs/hybrid_set.h:76:36
#3 0x5616d072245d in void doris::StringValueSet>::_find_batch(doris::vectorized::IColumn const&, unsigned long, doris::vectorized::PODArray, 16ul, 15ul> const*, doris::vectorized::PODArray, 16ul, 15ul>&) /root/doris/be/src/exprs/hybrid_set.h:688:63
#4 0x5616d0747857 in doris::vectorized::FunctionIn::execute_impl(doris::FunctionContext*, doris::vectorized::Block&, std::vector> const&, unsigned long, unsigned long) const /root/doris/be/src/vec/functions/in.h:170:21
#5 0x5616c741fa3a in doris::vectorized::DefaultExecutable::execute_impl(doris::FunctionContext*, doris::vectorized::Block&, std::vector> const&, unsigned long, unsigned long) const /root/doris/be/src/vec/functions/function.h:462:26
#6 0x5616cbb5b650 in doris::vectorized::PreparedFunctionImpl::_execute_skipped_constant_deal(doris::FunctionContext*, doris::vectorized::Block&, std::vector> const&, unsigned long, unsigned long, bool) const /root/doris/be/src/vec/functions/function.cpp
#7 0x5616cbb4e14e in doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns(doris::FunctionContext*, doris::vectorized::Block&, std::vector> const&, unsigned long, unsigned long, bool) const /root/doris/be/src/vec/functions/function.cpp:244:12
#8 0x5616cbb4e3c2 in doris::vectorized::PreparedFunctionImpl::execute(doris::FunctionContext*, doris::vectorized::Block&, std::vector> const&, unsigned long, unsigned long, bool) const /root/doris/be/src/vec/functions/function.cpp:250:12
#9 0x5616c741cd68 in doris::vectorized::IFunctionBase::execute(doris::FunctionContext*, doris::vectorized::Block&, std::vector> const&, unsigned long, unsigned long, bool) const /root/doris/be/src/vec/functions/function.h:190:19
#10 0x5616c74cf712 in doris::vectorized::VInPredicate::execute(doris::vectorized::VExprContext*, doris::vectorized::Block*, int*) /root/doris/be/src/vec/exprs/vin_predicate.cpp:130:5
#11 0x5616c740d5c0 in doris::vectorized::VectorizedFnCall::_do_execute(doris::vectorized::VExprContext*, doris::vectorized::Block*, int*, std::vector>&) /root/doris/be/src/vec/exprs/vectorized_fn_call.cpp:183:9
#12 0x5616c740ecf5 in doris::vectorized::VectorizedFnCall::execute(doris::vectorized::VExprContext*, doris::vectorized::Block*, int*) /root/doris/be/src/vec/exprs/vectorized_fn_call.cpp:215:12
#13 0x5616c7462e24 in doris::vectorized::VCompoundPred::execute(doris::vectorized::VExprContext*, doris::vectorized::Block*, int*) /root/doris/be/src/vec/exprs/vcompound_pred.h:127:38
#14 0x5616c74bccec in doris::vectorized::VExprContext::execute(doris::vectorized::Block*, int*) /root/doris/be/src/vec/exprs/vexpr_context.cpp:54:5
#15 0x5616c74c1dcc in doris::vectorized::VExprContext::execute_conjuncts(std::vector, std::allocator>> const&, std::vector, 16ul, 15ul>, std::allocator, 16ul, 15ul>>> const*, bool, doris::vectorized::Block*, doris::vectorized::PODArray, 16ul, 15ul>, bool) /root/doris/be/src/vec/exprs/vexpr_context.cpp:169:9
#16 0x5616c74c5108 in doris::vectorized::VExprContext::execute_conjuncts_and_filter_block(std::vector, std::allocator>> const&, doris::vectorized::Block*, std::vector>&, int, doris::vectorized::PODArray, 16ul, 15ul>&) /root/doris/be/src/vec/exprs/vexpr_context.cpp:322:5
#17 0x5616ad8a7f1a in doris::segment_v2::SegmentIterator::_execute_common_expr(unsigned short*, unsigned short&, doris::vectorized::Block*) /root/doris/be/src/olap/rowset/segment_v2/segment_iterator.cpp:2680:5
#18 0x5616ad89e86e in doris::segment_v2::SegmentIterator::_next_batch_internal(doris::vectorized::Block*) /root/doris/be/src/olap/rowset/segment_v2/segment_iterator.cpp:2582:25
#19 0x5616ad892f5c in doris::segment_v2::SegmentIterator::next_batch(doris::vectorized::Block*)::$_0::operator()() const /root/doris/be/src/olap/rowset/segment_v2/segment_iterator.cpp:2315:9
#20 0x5616ad892f5c in doris::segment_v2::SegmentIterator::next_batch(doris::vectorized::Block*) /root/doris/be/src/olap/rowset/segment_v2/segment_iterator.cpp:2314:19
#21 0x5616ad6dd9cc in doris::segment_v2::LazyInitSegmentIterator::next_batch(doris::vectorized::Block*) /root/doris/be/src/olap/rowset/segment_v2/lazy_init_segment_iterator.h:44:33
#22 0x5616ad269d67 in doris::BetaRowsetReader::next_block(doris::vectorized::Block*) /root/doris/be/src/olap/rowset/beta_rowset_reader.cpp:380:29
#23 0x5616de6de110 in doris::vectorized::VCollectIterator::Level0Iterator::_refresh() /root/doris/be/src/vec/olap/vcollect_iterator.h
#24 0x5616de6c967f in doris::vectorized::VCollectIterator::Level0Iterator::refresh_current_row() /root/doris/be/src/vec/olap/vcollect_iterator.cpp:514:24
#25 0x5616de6ca8a6 in doris::vectorized::VCollectIterator::Level0Iterator::ensure_first_row_ref() /root/doris/be/src/vec/olap/vcollect_iterator.cpp:493:14
#26 0x5616de6d7008 in doris::vectorized::VCollectIterator::Level1Iterator::ensure_first_row_ref() /root/doris/be/src/vec/olap/vcollect_iterator.cpp:692:27
#27 0x5616de6bd200 in doris::vectorized::VCollectIterator::build_heap(std::vector, std::allocator>>&) /root/doris/be/src/vec/olap/vcollect_iterator.cpp:186:9
#28 0x5616de651b6c in doris::vectorized::BlockReader::_init_collect_iter(doris::TabletReader::ReaderParams const&) /root/doris/be/src/vec/olap/block_reader.cpp:157:5
#29 0x5616de65526f in doris::vectorized::BlockReader::init(doris::TabletReader::ReaderParams const&) /root/doris/be/src/vec/olap/block_reader.cpp:229:19
#30 0x5616e175a0f9 in doris::vectorized::NewOlapScanner::open(doris::RuntimeState*) /root/doris/be/src/vec/exec/scan/new_olap_scanner.cpp:237:32
#31 0x5616c736ad34 in doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr, std::shared_ptr) /root/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:236:5
#32 0x5616c736f05e in doris::vectorized::ScannerScheduler::submit(std::shared_ptr, std::shared_ptr)::$_1::operator()() const::'lambda'()::operator()() const::'lambda'()::operator()() const /root/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:176:21
#33 0x5616c736f05e in doris::vectorized::ScannerScheduler::submit(std::shared_ptr, std::shared_ptr)::$_1::operator()() const::'lambda'()::operator()() const /root/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:175:31
#34 0x5616c736f05e in void std::_invoke_impl, std::shared_ptr)::$_1::operator()() const::'lambda'()&>(std::_invoke_other, doris::vectorized::ScannerScheduler::submit(std::shared_ptr, std::shared_ptr)::$_1::operator()() const::'lambda'()&) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
#35 0x5616c736f05e in std::enable_if, std::shared_ptr)::$1::operator()() const::'lambda'()&>, void>::type std::_invoke_r, std::shared_ptr)::$_1::operator()() const::'lambda'()&>(doris::vectorized::ScannerScheduler::submit(std::shared_ptr, std::shared_ptr)::$_1::operator()() const::'lambda'()&) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:111:2
#36 0x5616c736f05e in std::_Function_handler, std::shared_ptr)::$_1::operator()() const::'lambda'()>::_M_invoke(std::_Any_data const&) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291:9
#37 0x5616aeed6a3b in doris::ThreadPool::dispatch_thread() /root/doris/be/src/util/threadpool.cpp:543:24
#38 0x5616aeeae4f7 in doris::Thread::supervise_thread(void*) /root/doris/be/src/util/thread.cpp:498:5
#39 0x7f7e663e3ac2 in start_thread nptl/pthread_create.c:442:8
#40 0x7f7e6647584f misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /root/doris/be/src/vec/common/string_ref.h:271:54 in
```

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->
2024-08-12 10:02:07 +08:00
b38caed808 [Improve](columns)replace fatal with exception #38035 (#38996) 2024-08-12 09:51:30 +08:00
3da2d1c9d6 [bug](parquet)Fix the problem that the parquet reader reads the missing sub-columns of the struct and fails. (#38718) (#39192)
bp #38718
2024-08-11 20:37:40 +08:00
5f77f909d9 [cherry-pick](branch-2.1) Pick "[feature](function) support ip functions named ipv4_to_ipv6 and cut_ipv6" (#39058)
## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->
pick https://github.com/apache/doris/pull/36883 and
https://github.com/apache/doris/pull/35239
2024-08-10 18:37:11 +08:00
5e1e725cee [feature](inverted index) Add multi_match function #37722 #38931 #39149 (#38877) 2024-08-10 15:20:08 +08:00
e15b6cfc68 [fix](be) return correct canceled status from scanner (#36392) (#39111)
## Proposed changes

pick #36392
2024-08-09 04:02:42 +08:00
0571342538 [fix](sink) The issue with 2GB limit of protocol buffer (#37990) (#39112)
```
Fail to serialize doris.PFetchDataResult
```

If the size of `PFetchDataResult` is greater than 2G, protocol buffer
cannot serialize the message.

pick #37990
2024-08-09 04:01:56 +08:00
efdd75f286 [fix](function) stddev with DecimalV2 type will result in an error (#… (#39072)
…38731)

https://github.com/apache/doris/pull/38731
The stddev function has a separate implementation for the DecimalV2
type, but there are issues with the implementation. Given that there is
almost no existing data for DecimalV2, it will be removed here. For be,
upgrading to this situation will result in an error directly.
```
SELECT STDDEV(data) FROM DECIMALV2_10_0_DATA;
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Agg Function stddev(decimal(10,0)) is not implemented
```
After removing DecimalV2, parameters of type DecimalV2 will be converted
to double for calculations.

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->
2024-08-08 17:53:17 +08:00
44cb7978a9 [opt](index) add more inverted index profile metrics #36696 (#38858) 2024-08-08 14:16:55 +08:00
0a3874f203 [fix](move-memtable) close stream when cancel load stream stub (#38912) (#39039)
backport #38912
2024-08-07 23:24:00 +08:00
773008d6fa [Fix](Json) fix some cast issue (#38683) (#39025)
#38683
2024-08-07 22:05:43 +08:00
7550fbaff7 [Fix](Exception) throw exception in defer may result std::terminate (… (#39007)
pick #38935
2024-08-07 13:46:23 +08:00
8cb5aa64f4 [test](inverted index) add an Inverted Index Testing Switch (#38077) (#38947)
https://github.com/apache/doris/pull/38077
2024-08-07 11:25:36 +08:00
e400859531 [fix](update null map) Fix update_null_map #38787 (#38920)
cherry pick from #38787
2024-08-07 10:21:41 +08:00
bc644cb253 [opt](catalog) merge scan range to avoid too many splits (#38311) (#38964)
bp #38311
2024-08-06 21:57:02 +08:00
3abb222064 [fix](group commit) Fix test_group_commit_async_wal_msg_fault_injection case (#35313) (#38911)
pick https://github.com/apache/doris/pull/35313
2024-08-06 17:57:22 +08:00
ff6fa33021 [opt](inverted index) mow supports index optimization #(#38180)
## Proposed changes

https://github.com/apache/doris/pull/37428
https://github.com/apache/doris/pull/37429

<!--Describe your changes.-->
2024-08-06 11:18:13 +08:00
c7b59b38ef [fix](hist) Fix unstable result of aggregrate function hist #38608 (#38893)
cherry pick from #38608
2024-08-06 08:52:03 +08:00
70a518e099 [Fix](multi-catalog) Fix not throw error when call close() in hive/iceberg writer. (#38902)
## Proposed changes
[Fix] (multi-catalog) Fix not throw error when call close() in
hive/iceberg writer.

When the file writer closes(), it will sync buffer to commit. Therefore,
sometimes data is written only when close() is called, which can expose
some errors. For example, hdfs_file_writer. Therefore, this error needs
to be captured in the entire close process.
2024-08-06 08:51:12 +08:00
65154f8abe [branch-2.1] (doris-future) Support auto partition name function (#38853)
cherry-pick https://github.com/apache/doris/pull/34258 to branch-2.1
2024-08-05 16:04:24 +08:00
Pxl
86ef0069ea [Feature](function) support group concat with distinct and order by (#38851)
pick from #38744 and #38776
2024-08-05 15:44:51 +08:00
607c0b82a9 [opt](serde)Optimize the filling of fixed values ​​into block columns without repeated deserialization. (#37377) (#38245) (#38810)
## Proposed changes
pick pr: #38575  and fix this pr bug :  #38245
2024-08-05 09:13:08 +08:00
2653087843 [pick](array-funcs)fix array with empty arg in be behavior (#38708)
## Proposed changes
backport: https://github.com/apache/doris/pull/36845
Issue Number: close #xxx

<!--Describe your changes.-->
2024-08-05 09:08:28 +08:00
5d02c48715 [feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. (#38432) (#38809)
bp #38432 

## Proposed changes
Add `hive_parquet_use_column_names` and `hive_orc_use_column_names`
session variables to read the table after rename column in `Hive`.

These two session variables are referenced from
`parquet_use_column_names` and `orc_use_column_names` of `Trino` hive
connector.

By default, these two session variables are true. When they are set to
false, reading orc/parquet will access the columns according to the
ordinal position in the Hive table definition.

For example:
```mysql
in Hive :
hive> create table tmp (a int , b string) stored as parquet;
hive> insert into table tmp values(1,"2");
hive> alter table tmp  change column  a new_a int;
hive> insert into table tmp values(2,"4");

in Doris :
mysql> set hive_parquet_use_column_names=true;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|  NULL | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)

mysql> set hive_parquet_use_column_names=false;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|     1 | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)
```

You can use `set
parquet.column.index.access/orc.force.positional.evolution = true/false`
in hive 3 to control the results of reading the table like these two
session variables. However, for the rename struct inside column parquet
table, the effects of hive and doris are different.
2024-08-05 09:06:49 +08:00
0603ec1d9d [enhancement](compaction) optimizing memory usage for compaction (#37099) (#37486) 2024-08-04 10:49:18 +08:00
7bdc508ac7 [Bug](fix) fix coredump case in (not null, null) execpt (not null, not null) case (#38756)
## Proposed changes

Issue Number: close #38612

<!--Describe your changes.-->
2024-08-04 10:44:10 +08:00
556f0fc784 [pick](json-keys) support json_keys function (#38631)
## Proposed changes
backport: https://github.com/apache/doris/pull/36411
Issue Number: close #xxx

<!--Describe your changes.-->
2024-08-02 19:10:00 +08:00
9b07cd2069 [pick](json-serde)pick jsonb string deserialize with spec char (#38711)
## Proposed changes
backport: https://github.com/apache/doris/pull/37176
Issue Number: close #xxx

<!--Describe your changes.-->
2024-08-02 13:37:41 +08:00
1d982ada45 [pick](array-funcs)pick array func array_enumerate_uniq bugfix (#38721)
## Proposed changes
backport: https://github.com/apache/doris/pull/38384
Issue Number: close #xxx

<!--Describe your changes.-->
2024-08-02 11:25:17 +08:00
f5bc65989c [pick](array-range)improve array_range func for large param (#38707)
## Proposed changes
backport: https://github.com/apache/doris/pull/38284
Issue Number: close #xxx

<!--Describe your changes.-->
2024-08-02 11:22:46 +08:00