Commit Graph

8441 Commits

Author SHA1 Message Date
157d67e7ca [enhance](hive) Add regression-test cases for hive text ddl and hive text insert and fix reading null string bug #42200 (#42273)
cherry pick from #42200

Co-authored-by: Socrates <suxiaogang223@icloud.com>
2024-10-22 23:56:57 +08:00
bde8e2d474 [2.1][improvement](jdbc catalog) Add catalog property to enable jdbc connection pool (#42255)
pick (#41992)

We initially introduced the JDBC connection pool to improve the connection
performance of the JDBC catalog, but it repeatedly caused unexpected
errors. We therefore added a catalog property, `enable_connection_pool`,
to control whether the JDBC catalog uses the connection pool. The default
is false. However, a catalog created before the upgrade keeps the
connection pool enabled after upgrading; only newly created catalogs
default to false.

We also ran performance tests on this change; the performance loss is
within the expected range.

- Enable connection pool: mysqlslap -uroot -h127.0.0.1 -P9030
--concurrency=1 --iterations=100 --query='SELECT * FROM mysql.test.test
limit 1;' --create-schema=mysql --delimiter=";" --verbose
Benchmark
        Average number of seconds to run all queries: 0.008 seconds
        Minimum number of seconds to run all queries: 0.004 seconds
        Maximum number of seconds to run all queries: 0.133 seconds
        Number of clients running queries: 1
        Average number of queries per client: 1

- Disable connection pool: mysqlslap -uroot -h127.0.0.1 -P9030
--concurrency=1 --iterations=100 --query='SELECT * FROM
mysql_no_pool.test.test limit 1;' --create-schema=mysql --delimiter=";"
--verbose
Benchmark
        Average number of seconds to run all queries: 0.054 seconds
        Minimum number of seconds to run all queries: 0.047 seconds
        Maximum number of seconds to run all queries: 0.184 seconds
        Number of clients running queries: 1
        Average number of queries per client: 1
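
To put the two runs side by side, the following minimal Python sketch computes the absolute and relative difference from the figures reported above. The dictionary and function names are illustrative only and are not part of Doris or mysqlslap.

```python
# Figures copied from the two mysqlslap runs above (seconds).
pool_enabled = {"avg": 0.008, "min": 0.004, "max": 0.133}
pool_disabled = {"avg": 0.054, "min": 0.047, "max": 0.184}


def overhead(enabled, disabled, key):
    """Return (absolute difference in seconds, slowdown ratio) for one metric."""
    diff = disabled[key] - enabled[key]
    ratio = disabled[key] / enabled[key]
    return diff, ratio


for key in ("avg", "min", "max"):
    diff, ratio = overhead(pool_enabled, pool_disabled, key)
    # Report the extra latency in milliseconds and the relative slowdown.
    print(f"{key}: +{diff * 1000:.0f} ms, {ratio:.2f}x")
```

On the averages this is roughly 46 ms of extra latency per query with the pool disabled (0.054 s versus 0.008 s).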
2024-10-22 23:28:28 +08:00
25d7d0b255 [fix](move-memtable) abstract multi-streams to one logical stream (#42039) (#42250)
backport #42039
2024-10-22 20:26:42 +08:00
38e529cd29 [cherry-pick](branch-2.1) support decimal256 for parquet reader (#42241)
## Proposed changes
pick pr: https://github.com/apache/doris/pull/41526
2024-10-22 19:42:09 +08:00
e2bdac39fb [fix] Implementing match_phrase_edge without index query method (#41658) (#42098)
pick from #41658
2024-10-22 18:44:21 +08:00
6f2bac012a [pick](branch-2.1) pick #39398 #41754 #41770 (#42231)
pick #39398 #41754 #41770
2024-10-22 18:05:40 +08:00
7eec0f8fbb [branch-2.1](datetime) Fix date floor functions overflow (#35477) (#42238)
pick https://github.com/apache/doris/pull/35477
2024-10-22 15:54:53 +08:00
8877267930 [pipeline](API) Add a new API to find pipeline tasks by a specific query ID (#35563) (#42233)

pick #35563
2024-10-22 14:03:45 +08:00
47ff6f1300 [fix](OrcReader) fix the issue that orc_reader can not read DECIMAL(0,0) type of orc file #41795 (#42220)
cherry pick from #41795

Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
2024-10-22 10:10:25 +08:00
1f8d685f26 [fix](inverted index) multi_match supports any, all, phrase. (#41663) (#42097)
https://github.com/apache/doris/pull/41663
2024-10-22 10:10:02 +08:00
e713b92321 [fix](multi-catalog) Disable string dictionary filtering when predicate express is not slot #42113 (#42222)
cherry pick from #42113

Co-authored-by: Socrates <suxiaogang223@icloud.com>
2024-10-22 09:43:29 +08:00
084434e25c [Test](tvf) add regression tests for testing orc reader #41606 #42188 (#42120)
cherry pick from #42031 #42188

---------

Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
Co-authored-by: TieweiFang <ftw2139@163.com>
2024-10-21 21:31:18 +08:00
a3c1657c4b [cherry-pick](branch-2.1) check end of file when reading page (#42159)
## Proposed changes
pick pr: https://github.com/apache/doris/pull/41816
2024-10-21 17:01:04 +08:00
a32ad0b1f7 [cherry-pick](branch-2.1) support reading brotli compressed parquet file (#42162)
pick pr: https://github.com/apache/doris/pull/41875
2024-10-21 16:48:09 +08:00
b9e2738ee6 [Fix](orc-reader) Fix StringRef nullptr data by add checking string_values empty. #42061 (#42154)
cherry pick from #42061

Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-10-21 16:26:23 +08:00
dc438649d9 [bugfix](handshake) brpc handshake should not use light pool (#42115) (#42127)
The light pool may be full. The handshake is used to check the brpc
connection state, so it should not be affected by the thread pool logic.

---------
pick #42115

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-10-19 16:19:17 +08:00
d5fef266ec [fix](inverted index) Fix incorrect exception handling (#42094)
https://github.com/apache/doris/pull/41874
2024-10-19 10:45:32 +08:00
5db44a1b91 [fix](arrays_overlap) support arrays overlap with inverted index (#42090)
## Proposed changes
backport : https://github.com/apache/doris/pull/41286
https://github.com/apache/doris/pull/41495
2024-10-18 22:08:39 +08:00
dde0bf92ce [fix](inverted index) Fix incorrect usage of regexp compile_err (#41944) (#42085)
https://github.com/apache/doris/pull/41944
2024-10-18 22:06:59 +08:00
460ff02997 [cherry-pick](branch-21)fix date_floor function return wrong result (#41948) (#42065)
## Proposed changes

cherry-pick from master https://github.com/apache/doris/pull/41948
2024-10-18 21:54:22 +08:00
03136baacf [fix](scanner) Fix incorrect _max_thread_num in scanner context when many queries are running. #41273 (#42016)
cherry pick from #41273
2024-10-18 18:08:07 +08:00
fb12e10272 [fix](array-funcs)fix array agg func with decimal type (#40839) (#42023)
## Proposed changes
backport: (https://github.com/apache/doris/pull/40839)
2024-10-17 20:47:39 +08:00
Pxl
4d04db467e [Bug](predicate) Fixed the problem that the number of rows in inlist #41824 (#41910)
pick from #41824
2024-10-17 17:13:00 +08:00
Pxl
f4d9ddcb00 [Improvement](runtime-filter) set some rf brpc request to ignore_eovercrowded #41698 (#41897)
pick from #41698
2024-10-17 16:57:26 +08:00
5806dae467 [fix](move-memtable) do not retry open streams (#41550) (#41999)
backport #41550
2024-10-17 15:56:56 +08:00
b4875c2789 [fix](jni)fix jni use timezone_obj get timezone be core. (#41956) (#42003)
bp #41956 

PR #40225 tried to pass time zone info from BE to JNI, using
`_state->timezone_obj().name()`
to get the time zone name.
However, during a rolling upgrade of BE, this may cause a core dump like:

```
*** SIGSEGV address not mapped to object (@0x610) received by PID 72661 (TID 73538 OR 0x7f2e898d1640) from PID 1552; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/common/signal_handler.h:421
 1# os::Linux::chained_handler(int, siginfo_t*, void*) in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo_t*, void*) in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 4# 0x00007F3070D3E520 in /lib/x86_64-linux-gnu/libc.so.6
 5# cctz::time_zone::name[abi:cxx11]() const in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
 6# doris::vectorized::JniConnector::open(doris::RuntimeState*, doris::RuntimeProfile*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/vec/exec/jni_connector.cpp:87
 7# doris::vectorized::AvroJNIReader::init_fetch_table_schema_reader() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/vec/exec/format/avro/avro_jni_reader.cpp:119
 8# std::_Function_handler::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
 9# doris::WorkThreadPool::work_thread(int) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/work_thread_pool.hpp:159
10# execute_native_thread_routine at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
11# start_thread at ./nptl/pthread_create.c:442
12# 0x00007F3070E22850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
172.20.50.206 last coredump sql: 2024-10-13 04:12:23,985 [query] 
```

This PR uses another method, `_state->timezone()`, which just returns a
string instead of reading and initializing the time zone info file,
to avoid the potential core dump.
2024-10-17 14:47:33 +08:00
67d057a711 [cherry-pick](branch-21) fix conv function parser string failure return wrong result (#40530) (#41964)
## Proposed changes

Issue Number: close #39618
cherry-pick from master (#40530)
2024-10-17 14:45:46 +08:00
0b41cd2472 [fix](serde)fix the bug in DataTypeNullableSerDe.deserialize_column_from_fixed_json (#41217) (#41960)
bp #41217
2024-10-17 14:36:01 +08:00
968e33f07e [cherry-pick](branch-21) pick (#39057) (#41352) (#41958)
## Proposed changes

pick from master (#39057) (#41352)

---------

Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com>
2024-10-17 14:30:40 +08:00
1b901f6fcc [cherry-pick](branch-2.1) add parquet tvf cases and fix some parquet bug (#41931)
## Proposed changes
pick pr:
  https://github.com/apache/doris/pull/41683
  https://github.com/apache/doris/pull/41506
  https://github.com/apache/doris/pull/41338
  https://github.com/apache/doris/pull/39326

---------

Co-authored-by: morningman <morningman@163.com>
2024-10-17 14:20:58 +08:00
b8214952a1 [branch-2.1] Fix is_partial_update parameter is not set in append_block_with_partial_content() (#41865)
https://github.com/apache/doris/pull/41439 forgot to set the
`is_partial_update` parameter for `Tablet::lookup_row_key()` in
`append_block_with_partial_content()`.
2024-10-17 12:44:41 +08:00
19784d420c [opt](inverted index) Improved top-N optimization by refining the sorting column check. (#39496) (#41954)
https://github.com/apache/doris/pull/39496
2024-10-17 11:31:11 +08:00
0b6447faeb [Fix](SchemaChange) refactor variant root column iterator to make row… (#41941)
pick #41700
2024-10-17 10:39:07 +08:00
7d99d5fcc4 [fix](analytic) Fix data distribution after analytic operator (#41902) (#41949)
Fix data distribution after analytic operator

pick #41902
2024-10-16 18:41:56 +08:00
5bd33fc88c [pick](branch-2.1) pick #41292 #41350 #41589 #41628 #41743 #41601 #41667 #41751 (#41927)
## Proposed changes

pick #41292 #41350 #41589 #41628 #41743 #41601 #41667 #41751

---------

Co-authored-by: Pxl <pxl290@qq.com>
2024-10-16 15:41:28 +08:00
e56216211e [pick](branch-2.1) pick #40667 #40714 (#41905)
pick
#40667
#40714

---------

Co-authored-by: wangbo <wangbo@apache.org>
2024-10-16 14:09:03 +08:00
e6545a36a3 [improvement](iceberg)Parallelize splits for count(*) for 2.1 (#41169) (#41880)
bp: #41169
2024-10-16 10:52:06 +08:00
b185dfcbf6 [pick](branch-2.1) pick #41676 #41740 #41857 (#41904)
pick #41676 #41740 #41857
2024-10-15 22:41:17 +08:00
b91d8e2327 [Improvement](minor) Reduce locking scope (#41845) (#41844)
pick #41845
2024-10-15 18:39:53 +08:00
78b6157aa9 [fix](ip/variant) fix information meta (#41871)
fix datatype information meta for ip/variant (#41666)
2024-10-15 18:01:14 +08:00
abcba778ff [fix](cancel) Fix cancel msg on branch-2.1 (#41798)
Make sure the cancel reason can be distinguished among:
1. user cancel
2. timeout
3. others

```text
mysql [demo]>set query_timeout=1;
--------------
set query_timeout=1
--------------

Query OK, 0 rows affected (0.00 sec)

mysql [demo]>select sleep(5);
--------------
select sleep(5)
--------------

ERROR 1105 (HY000): errCode = 2, detailMessage = Timeout

mysql [demo]>select sleep(5);
--------------
select sleep(5)
--------------

^C^C -- sending "KILL QUERY 0" to server ...
^C -- query aborted
ERROR 1105 (HY000): errCode = 2, detailMessage = cancel query by user from 127.0.0.1:64208
```
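
As a rough illustration of how a client might tell these cases apart, here is a minimal Python sketch that classifies the `detailMessage` text. The function name and the matching rules are assumptions derived only from the two sample messages above; they are not a Doris API, and real messages may differ.

```python
def classify_cancel_reason(detail_message: str) -> str:
    """Map a detailMessage string to 'user', 'timeout', or 'other'.

    The patterns are assumptions taken from the sample session above.
    """
    msg = detail_message.strip().lower()
    # User-initiated cancel, e.g. "cancel query by user from 127.0.0.1:64208".
    if msg.startswith("cancel query by user"):
        return "user"
    # Query exceeded query_timeout, reported as "Timeout" in the sample.
    if "timeout" in msg:
        return "timeout"
    # Anything else falls into the third bucket.
    return "other"


print(classify_cancel_reason("Timeout"))
print(classify_cancel_reason("cancel query by user from 127.0.0.1:64208"))
```

Checking the user-cancel prefix first keeps a user cancel from being misread as a timeout if the message text ever mentions both.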
2024-10-15 17:15:05 +08:00
77fbe6397a [fix](http) Remove file if downloading file is failed #41778 (#41827)
cherry pick from #41778
2024-10-15 15:30:29 +08:00
94687a2f3c [fix](array/map) fix resize impl in array/map (#41595) (#41699)
backport: https://github.com/apache/doris/pull/41595
2024-10-15 09:50:11 +08:00
d97642e9b5 [cherry-pick](branch-21) fix tablet sink shuffle without project not match the output tuple (#40299)(#41293) (#41327)
## Proposed changes

cherry-pick from master (#40299)(#41293)
2024-10-15 00:12:23 +08:00
4888c632f4 [cherry-pick](branch2.1) support escape.delim and serialization.null.format for hive text (#41684)
## Proposed changes
pick from master:
https://github.com/apache/doris/pull/40291
2024-10-15 00:08:23 +08:00
ff52e73a07 [Fix](inverted index) fix match null for inverted index #41746 (#41787)
cherry pick from #41746
2024-10-14 14:45:36 +08:00
f112af0fd2 [pick](branch-2.1) pick #41555 #41592 #38204 (#41781)
pick #41555 #41592 #38204
2024-10-14 14:05:08 +08:00
e10458baad [enhancement](err-msg) Output column info when size invalid in block data convertor (#41535) (#41764)
## Proposed changes

pick: #41535

As title.
2024-10-12 21:08:04 +08:00
2ae37626bb [opt](index compaction)Use RAM dir to create tmp index_writer (#41371) (#41705)
## Proposed changes

bp #41371
2024-10-12 17:13:55 +08:00
90d6985f91 [Fix](bug) Is null predicate get error query result (#41704)
cherry-pick #41668
2024-10-12 13:18:14 +08:00