7819 Commits

Author SHA1 Message Date
0e248e3594 [fix](inverted index) Corrected the issue of no_index_match failure caused by empty data #37947 (#38002) 2024-07-18 10:04:36 +08:00
38885d4b00 [fix](load) fix memtable agg functions (#38017) (#38021)
backport #38017
2024-07-17 23:04:57 +08:00
3d5043817a Revert "[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377)" (#38007)
Reverts apache/doris#37530
Needs more testing; reverted temporarily.
2024-07-17 21:44:25 +08:00
1875267796 [fix](routine-load) fix routine load pause when Kafka data deleted after TTL (#37288) (#37983)
pick (#37288)

When using routine load, the lag remains a positive number even after the
data load has completed:
```
  Lag: {"0":16,"1":15,"2":16,"3":16,"4":16,"5":16,"6":15,"7":16,"8":16,"9":16,"10":15,"11":16,"12":15,"13":15,"14":16,"15":16,"16":17,"17":15,"18":16,"19":15,"20":16,"21":16,"22":16,"23":16,"24":15,"25":17,"26":17,"27":16,"28":16,"29":16,"30":16,"31":17,"32":14,"33":16,"34":17,"35":16,"36":15,"37":15,"38":15,"39":16,"40":16,"41":16,"42":15,"43":15,"44":17,"45":16,"46":15,"47":15,"48":16,"49":17,"50":16,"51":15,"52":16,"53":15,"54":15,"55":17,"56":16,"57":17,"58":16,"59":16,"60":15,"61":15,"62":16,"63":16,"64":17,"65":16,"66":15,"67":16,"68":17,"69":16,"70":15,"71":17}
```
and the routine load is paused with an `out of range` error once the Kafka
data reaches its TTL and is deleted.

The cause is that the EOF has its own offset, which needs to be counted in
the statistics.

**Note (important):**
After this fix, if you set
```
"property.enable.partition.eof" = "false"
```
in your routine load job, you will run into this problem, because the EOF
has an offset. This property is true by default in Doris.
2024-07-17 13:47:26 +08:00
Pxl
db0a43bad2 [Chore](exchange) change LocalExchangeSharedState:mem_usage signed ty… (#37981)
pick from #36682
2024-07-17 13:46:51 +08:00
33b379a51d [bug](join) remove broadcast join check about shared hashtable signal (#37969)
2024-07-17 12:26:19 +08:00
b6e5281a1c [Fix](bug) fix the divide zero in local shuffle: (#37948)
## Proposed changes

cherry pick #37906
2024-07-17 01:03:53 +08:00
21c6b854f7 [fix](explode-json-object)fix explode json object (#37956)
2024-07-17 01:03:07 +08:00
359e50fc58 [fix](load) change tablet schema pointer to shared_ptr in memtable (#37927) (#37939)
backport #37927
2024-07-16 22:32:03 +08:00
b15ccdbe98 [Pick](Variant) pick some fix (#37922)
#37674
#37839
#37883 
#37857 
#37794
2024-07-16 21:38:47 +08:00
cc85f7b94c [fix](build index)Remove index_meta in tablet schema when the index is dropped. (#37646) (#37897) 2024-07-16 20:32:30 +08:00
36337c8bd9 [fix](multicast) should not ignore Status of block::merge #35886 (#37869)
## Proposed changes

BP #35886
2024-07-16 19:03:24 +08:00
8440303b91 [fix](delete) Incorrect precision detection for the decimal type in condition. (#37801) (#37904)
## Proposed changes

pick #37801

For a precision like Decimal(7,7), the value "0.1234567" should be
valid (the integer part is 0).
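
The rule above can be sketched as a digit count. This is a minimal illustration, not the Doris implementation: `fits_decimal` is a hypothetical helper, and it assumes an unsigned literal made of digits with at most one `.` (no sign, no exponent, no validation).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not Doris code): returns true when the unsigned decimal
// literal `text` fits DECIMAL(precision, scale).
bool fits_decimal(std::string_view text, int precision, int scale) {
    assert(scale >= 0 && precision >= scale);

    const std::size_t dot = text.find('.');
    const std::string_view int_part = text.substr(0, dot);
    const std::string_view frac_part =
            dot == std::string_view::npos ? std::string_view{} : text.substr(dot + 1);

    // Leading zeros do not consume integer digits, so "0.1234567" has none.
    const std::size_t first_nonzero = int_part.find_first_not_of('0');
    const std::size_t int_digits =
            first_nonzero == std::string_view::npos ? 0 : int_part.size() - first_nonzero;

    // At most (precision - scale) integer digits and at most `scale` fractional digits.
    return int_digits <= static_cast<std::size_t>(precision - scale) &&
           frac_part.size() <= static_cast<std::size_t>(scale);
}
```

Under these assumptions `fits_decimal("0.1234567", 7, 7)` is accepted, while `"1.1234567"` is rejected because Decimal(7,7) leaves no room for a non-zero integer digit.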

2024-07-16 19:02:02 +08:00
9861f81630 [branch-2.1](memory) Fix Jemalloc Cache Memory Tracker (#37905)
pick #37464
2024-07-16 19:01:31 +08:00
cc6ff12097 [opt](function) Optimize the trim function for single-char inputs (#3… (#37799)
https://github.com/apache/doris/pull/36497

before
```
mysql [test]>select count(ltrim(str,"1")) from stringDb2;
+------------------------+
| count(ltrim(str, '1')) |
+------------------------+
|               64000000 |
+------------------------+
1 row in set (7.79 sec)
```

now
```
mysql [test]>select count(ltrim(str,"1")) from stringDb2;
+------------------------+
| count(ltrim(str, '1')) |
+------------------------+
|               64000000 |
+------------------------+
1 row in set (0.73 sec)
```
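
A rough sketch of why a single-character fast path helps, assuming the second argument of `ltrim` is stripped as a whole pattern. The function names are hypothetical and this is not the code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical fast path: strip every leading occurrence of one byte with a
// single scan instead of repeated pattern comparisons.
std::string_view ltrim_single_char(std::string_view str, char c) {
    const std::size_t pos = str.find_first_not_of(c);
    if (pos == std::string_view::npos) {
        return {};  // the input is empty or consists only of `c`
    }
    return str.substr(pos);
}

// General path: strip the whole pattern while it keeps matching the prefix.
std::string_view ltrim_pattern(std::string_view str, std::string_view pattern) {
    if (pattern.empty()) {
        return str;
    }
    if (pattern.size() == 1) {
        return ltrim_single_char(str, pattern.front());
    }
    while (str.size() >= pattern.size() && str.substr(0, pattern.size()) == pattern) {
        str.remove_prefix(pattern.size());
    }
    return str;
}
```

With a one-byte pattern the loop of prefix comparisons collapses into one `find_first_not_of` scan, which is the kind of saving the timings above suggest.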

2024-07-16 17:52:52 +08:00
1f779ba9de [branch-2.1](arrow-flight-sql) Open regression-test/pipeline/p0/arrow_flight_sql (#37727)
pick #36854
2024-07-16 16:23:43 +08:00
Pxl
c1b1437fc3 [Bug](bitmap) clear set when bitmap fastunion (#37816) (#37875)
pick from #37816
2024-07-16 16:01:32 +08:00
02716598d4 [Fix](sql function) memory overflow to the left of string address when do_money_format has small negative value #36226 (#37870)
cherry pick from #36226

Co-authored-by: sparrow <38098988+biohazard4321@users.noreply.github.com>
2024-07-16 15:04:42 +08:00
Pxl
9bb37e3616 [Bug](runtime-filter) try to fix DCHECK fail on _acquire_runtime_filter (#37805)
## Proposed changes
pick from https://github.com/apache/doris/pull/35422
2024-07-16 14:37:54 +08:00
8e42871228 [fix](in expr) fix error result when in expr has null value and lef… (#37800)
https://github.com/apache/doris/pull/36024

## Proposed changes

```
create table t2 (id int, c1 int);
insert into t2 values(1, null);
select 0 in (c1, null) from t2; -- should return NULL, but returned 1
```
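
The expected result follows SQL three-valued logic. A minimal sketch of that rule, with `std::nullopt` standing for SQL NULL; `sql_in`, `NullableInt` and `NullableBool` are hypothetical names, not Doris code:

```cpp
#include <cassert>
#include <optional>
#include <vector>

using NullableInt = std::optional<int>;    // std::nullopt stands for SQL NULL
using NullableBool = std::optional<bool>;  // std::nullopt stands for SQL NULL

// Evaluates `lhs IN (list...)`: TRUE on a match, NULL when there is no match
// but a NULL took part in the comparison, FALSE otherwise.
NullableBool sql_in(const NullableInt& lhs, const std::vector<NullableInt>& list) {
    if (!lhs.has_value()) {
        return std::nullopt;
    }
    bool saw_null = false;
    for (const NullableInt& item : list) {
        if (!item.has_value()) {
            saw_null = true;
            continue;
        }
        if (*item == *lhs) {
            return true;
        }
    }
    if (saw_null) {
        return std::nullopt;
    }
    return false;
}
```

For the row inserted above, `c1` is NULL, so the query corresponds to `sql_in(0, {NULL, NULL})`, which yields NULL rather than 1.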

2024-07-16 14:04:35 +08:00
253f929234 [cherry-pick](branch-2.1) fix inverted index file size (#37836)
## Proposed changes

pick from master #37232
pick from master #37564
2024-07-16 11:28:47 +08:00
Pxl
d7e84b7ee3 [Enhancement](bitmap) optimize bitmap deserialize and remove some unused code (#37623)
## Proposed changes
pick from #35789
2024-07-16 11:21:54 +08:00
47096f2083 [test](regression) add cases for data quality error url (#34987) (#37777)
cherry-pick #34987
2024-07-16 11:12:52 +08:00
3cb1d4e842 [feature](json)support explode_json_object func #36887 (#37378) 2024-07-16 10:59:11 +08:00
e64f2997e9 [fix](function) fix core when input not null array in foreach functio… (#37798)
## Proposed changes
https://github.com/apache/doris/pull/37349
The code that caused the error:
```C++
return creator_without_type::create<AggregateFunctionForEach>(transform_arguments, true,
                                                                      nested_function);
```
`transform_arguments` holds the internal types of the array. All internal
types of the array are nullable, so an array that is not nullable was
mistakenly treated as a nullable array.

2024-07-16 10:57:11 +08:00
6932eef65e [opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377) (#37530)
bp #37377
2024-07-16 10:56:13 +08:00
0080815d11 [cherry-pick](branch-2.1)fix be metric doris_be_process_thread_num is zero (#36719)
## Proposed changes

cherry-pick #35511
2024-07-16 10:43:06 +08:00
e7a001c420 [enhance](mtmv)support partition tvf (#37795)
pick from: https://github.com/apache/doris/pull/36479 and
https://github.com/apache/doris/pull/37201
2024-07-16 09:27:44 +08:00
aa2b902633 [cherry-pick](branch-21) fix broadcast join running when hash table build not finished (#37844)
cherry-pick from master https://github.com/apache/doris/pull/37792

2024-07-16 09:20:06 +08:00
1d49d386aa [cherry-pick](branch-21) remove the useless code in column vector (#34432) (#37827)
cherry-pick from master https://github.com/apache/doris/pull/34432

Co-authored-by: HappenLee <happenlee@hotmail.com>
2024-07-15 22:10:58 +08:00
Pxl
010d9d88f8 [Feature](rpc) support set brpc_idle_timeout_sec and enable thrift so… (#37808)
pick from #37333
2024-07-15 21:12:25 +08:00
03e21dddff [cherry-pick](branch-21) fix cast string to int return wrong result (#36788) (#37803)
## Proposed changes
cherry-pick from master:
https://github.com/apache/doris/pull/36788
https://github.com/apache/doris/pull/36505

2024-07-15 18:48:49 +08:00
Pxl
e5219467dd [Bug](join) avoid overflow on bucket_size+1 (#37807)
## Proposed changes
pick from #37493
2024-07-15 18:47:36 +08:00
b7dbd5c186 [feature](inverted index) add ordered functionality to match_phrase query (#37751)
## Proposed changes

1. select count() from tbl where b match_phrase 'the brown ~2+';
2024-07-15 18:42:48 +08:00
967173d7d0 [cherry-pick-2.1](table-function) pick some table functions exec performance (#34090) (#37778)
## Proposed changes

pick from master:
https://github.com/apache/doris/pull/33904
https://github.com/apache/doris/pull/34090

Co-authored-by: HappenLee <happenlee@hotmail.com>
2024-07-15 17:15:56 +08:00
a4d37d96ca [opt](file-scanner) add not found file number in profile (#37042) (#37764)
bp #37042
2024-07-15 17:11:06 +08:00
e5339a4014 [feature](ES Catalog)Support control scroll level by config #37180 (#37290)
## Proposed changes

backport #37180
2024-07-15 16:41:38 +08:00
8df2432e94 [fix](inverted index) implementation of match function without index #36471 (#36918) 2024-07-15 16:19:41 +08:00
8360e3f6cf [fix](sleep) sleep with character const make be crash (#37681) (#37775)
cherry-pick #37681 to branch-2.1
2024-07-15 14:57:46 +08:00
de61887cdc [chore](log) reduce print warning msg during be starting up #36710 (#37780)
cherry pick from #36710
2024-07-15 14:46:54 +08:00
79f6b647d5 [FIX] should check fe host standing when coordinator is not found. (#37772)
fix https://github.com/apache/doris/pull/37707
2024-07-15 12:27:31 +08:00
232202b71f [improve](load) reduce memory reserved in memtable limiter (#37511) (#37699)
cherry-pick #37511
2024-07-15 11:09:09 +08:00
2759383365 [branch-2.1](timezone) refactor tzdata load to accelerate and unify timezone parsing (#37062) (#37269)
pick https://github.com/apache/doris/pull/37062

1. Revert https://github.com/apache/doris/pull/25097. We decided to rely
on the OS and no longer maintain an independent tzdata, to keep results
consistent.
2. Refactor timezone loading and remove the rwlock.

before:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (6.88 sec)
```
now:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (2.61 sec)
```
3. Timezone offset format strings like 'UTC+8' are no longer supported, as
already stated in
https://doris.apache.org/docs/dev/query/query-variables/time-zone/#usage
4. Support case-insensitive timezone parsing in Nereids.
5. Fix a bug in Nereids timezone parsing: DST should be checked against the
input time, but was previously checked against the current time.
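
Item 4 can be pictured as a lookup keyed on a lower-cased name. The sketch below is only an illustration under that assumption; `TimezoneNames` and `to_lower` are hypothetical and do not mirror the Doris or Nereids code.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <unordered_map>

// Returns a lower-case ASCII copy, used as the lookup key.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return s;
}

// Hypothetical registry mapping a lower-cased name to its canonical spelling.
class TimezoneNames {
public:
    void add(const std::string& canonical) { _names[to_lower(canonical)] = canonical; }

    // Returns the canonical spelling, or std::nullopt when the name is unknown.
    std::optional<std::string> resolve(const std::string& input) const {
        const auto it = _names.find(to_lower(input));
        if (it == _names.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::unordered_map<std::string, std::string> _names;
};
```

After registering `Asia/Shanghai`, both `asia/shanghai` and `ASIA/SHANGHAI` resolve to the canonical spelling, while a string that was never registered, such as `UTC+8`, resolves to nothing.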

doc pr: https://github.com/apache/doris-website/pull/810
2024-07-15 10:56:48 +08:00
351ba4aeb2 [opt](spill) handle oom exception in spill tasks (#35025) (#35171) 2024-07-15 10:33:33 +08:00
31b3afa2c8 [fix](pipeline) fix exception safety issue in MultiCastDataStreamer (#36814)
## Proposed changes

pick #36748

```cpp
RETURN_IF_ERROR(vectorized::MutableBlock(block).merge(*pos_to_pull->_block))
```
This line may throw an exception (cannot allocate memory).

```
*** Query id: b7b80bfd76cc42a5-a9916f8364d5a4d3 ***
*** tablet id: 0 ***
*** Aborted at 1719187603 (unix time) try "date -d @1719187603" if you are using GNU date ***
*** Current BE git commitID: a8c48f5328 ***
*** SIGSEGV address not mapped to object (@0x47) received by PID 1197117 (TID 1197376 OR 0x7f49a25e4640) from PID 71; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris_branch-2.0/doris/be/src/common/signal_handler.h:417
 1# os::Linux::chained_handler(int, siginfo_t*, void*) in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo_t*, void*) in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 4# 0x00007F4ABB927520 in /lib/x86_64-linux-gnu/libc.so.6
 5# std::default_delete<doris::vectorized::Block>::operator()(doris::vectorized::Block*) const at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85
 6# doris::pipeline::MultiCastDataStreamer::close_sender(int) at /root/doris_branch-2.0/doris/be/src/pipeline/exec/multi_cast_data_streamer.cpp:60
 7# doris::pipeline::MultiCastDataStreamerSourceOperator::close(doris::RuntimeState*) at /root/doris_branch-2.0/doris/be/src/pipeline/exec/multi_cast_data_stream_source.cpp:120
 8# doris::pipeline::PipelineTask::close() at /root/doris_branch-2.0/doris/be/src/pipeline/pipeline_task.cpp:334
 9# doris::pipeline::TaskScheduler::_try_close_task(doris::pipeline::PipelineTask*, doris::pipeline::PipelineTaskState) at /root/doris_branch-2.0/doris/be/src/pipeline/task_scheduler.cpp:353
10# doris::pipeline::TaskScheduler::_do_work(unsigned long) in /mnt/disk1/STRESS_ENV/be/lib/doris_be
11# doris::ThreadPool::dispatch_thread() in /mnt/disk1/STRESS_ENV/be/lib/doris_be
12# doris::Thread::supervise_thread(void*) at /root/doris_branch-2.0/doris/be/src/util/thread.cpp:499
13# start_thread at ./nptl/pthread_create.c:442
14# 0x00007F4ABBA0B850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```
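
One common way to make such a call exception safe is to catch the exception at the boundary and turn it into an error status, so the caller's normal cleanup path still runs. A minimal sketch of that pattern; `SimpleStatus` and `catch_as_status` are hypothetical stand-ins, not the Doris `Status` class or the actual fix:

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <string>
#include <utility>

// Hypothetical stand-in for a status type; not the Doris `Status` class.
struct SimpleStatus {
    bool ok = true;
    std::string message;
};

// Runs `fn` (which must return SimpleStatus) and converts any standard
// exception into an error status instead of letting it propagate.
template <typename Fn>
SimpleStatus catch_as_status(Fn&& fn) {
    try {
        return std::forward<Fn>(fn)();
    } catch (const std::bad_alloc&) {
        return {false, "cannot allocate"};
    } catch (const std::exception& e) {
        return {false, e.what()};
    }
}
```

A caller would wrap the throwing step, for example `catch_as_status([&] { return do_merge(); })` with a hypothetical `do_merge`, and then handle the returned status as usual.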

2024-07-15 10:32:20 +08:00
9556c07a16 [mac](compile) fix compile error on mac (#37726) 2024-07-15 10:19:42 +08:00
8de13c5cc8 [fix](function) error scale set in unix_timestamp (#36110) (#37619)
## Proposed changes

before
```
mysql [test]>set DEBUG_SKIP_FOLD_CONSTANT = true;
Query OK, 0 rows affected (0.00 sec)

mysql [test]>select cast(unix_timestamp("2024-01-01",'yyyy-MM-dd') as bigint);
+------------------------------------------------------------+
| cast(unix_timestamp('2024-01-01', 'yyyy-MM-dd') as BIGINT) |
+------------------------------------------------------------+
|                                           1704038400000000 |
+------------------------------------------------------------+
```
now
```
mysql [test]>select cast(unix_timestamp("2024-01-01",'yyyy-MM-dd') as bigint);
+------------------------------------------------------------+
| cast(unix_timestamp('2024-01-01', 'yyyy-MM-dd') as BIGINT) |
+------------------------------------------------------------+
|                                                 1704038400 |
+------------------------------------------------------------+
1 row in set (0.01 sec)
```

The column does not have a scale set, but the cast uses the scale to
perform the cast.
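
One way to read the two results: if the timestamp is held as an integer scaled by 10^6, the cast only yields seconds when it divides by the matching scale. The sketch below illustrates that arithmetic under this assumed representation; `fixed_point_to_bigint` is a hypothetical helper, not the Doris cast.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: `raw` stores value * 10^scale; the integer part is
// recovered by dividing by 10^scale.
std::int64_t fixed_point_to_bigint(std::int64_t raw, int scale) {
    std::int64_t divisor = 1;
    for (int i = 0; i < scale; ++i) {
        divisor *= 10;
    }
    return raw / divisor;
}
```

With scale 6 the raw value 1704038400000000 becomes 1704038400, while a scale of 0 leaves it unchanged, which matches the incorrect output shown first.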


2024-07-15 10:00:04 +08:00
b55dd6f644 [fix](delete) fix the error message for valid decimal data for 2.1 (#37710)
## Proposed changes

cherry-pick : #36802

2024-07-15 09:54:42 +08:00
747172237a [branch-2.1](memory) Pick some memory GC patch (#37725)
pick
#36768
#37164
#37174
#37525
2024-07-14 15:19:40 +08:00
5162789234 [Refactor](Variant) make many interfaces exception safe (#37640) (#37719) 2024-07-13 16:52:10 +08:00