Commit Graph

3988 Commits

Author SHA1 Message Date
2785202816 [Bug](regression-test) be coredump in pipeline when grace exit in regression test (#18131) 2023-03-27 18:36:27 +08:00
642c378fc7 [feature](table-valued-function) add Backends table-valued-function (#17667)
This PR implements a new metadata TVF called backends. A tutorial on the implementation process is in #17974.
2023-03-27 15:18:31 +08:00
f03598f214 [enhance](cooldown) no snapshot or migration action for cooldown tablet (#17658) 2023-03-27 13:35:32 +08:00
d1f34a3be4 [bugfix](inverted index)temporary disable skip read column data if it match inverted index (#18065)
The optimization that skips reading column data when a column matches an inverted index and is only used in the WHERE clause may produce wrong results for complex SQL.

This PR temporarily disables the optimization; later PRs will resolve the problem fundamentally.
2023-03-27 11:29:42 +08:00
2929a96224 [Refactor](inverted index cache) Use asc set instead of priority queue at the lru cache (#18033)
Use an ascending sorted set instead of a priority queue in the LRU cache, to keep the lifecycle of the LRUHandle consistent between the sorted set and the LRU free list (a simplified sketch follows this entry).
2023-03-27 10:27:37 +08:00
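The point of the sorted-set design, shown here as a simplified sketch that is not the actual Doris BE code (the class SortedSetLruCache and its shape are invented for illustration): unlike a binary-heap priority queue, a sorted set allows an arbitrary handle to be removed the moment it is freed, so the eviction-order structure and the handle lifecycle stay consistent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: an LRU cache that keeps eviction order in a sorted set
// instead of a binary-heap priority queue. A sorted set supports removing an
// arbitrary entry when its handle is freed, so the order structure never holds
// a stale handle; a heap would only let us pop the minimum.
public class SortedSetLruCache<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        long lastAccess; // monotonically increasing access sequence

        Entry(K key, V value, long lastAccess) {
            this.key = key;
            this.value = value;
            this.lastAccess = lastAccess;
        }
    }

    private final int capacity;
    private long accessSeq = 0;
    private final Map<K, Entry<K, V>> index = new HashMap<>();
    // Ascending order by access sequence: the first element is the LRU victim.
    private final TreeSet<Entry<K, V>> order =
            new TreeSet<>((a, b) -> Long.compare(a.lastAccess, b.lastAccess));

    public SortedSetLruCache(int capacity) {
        this.capacity = capacity;
    }

    public synchronized V get(K key) {
        Entry<K, V> e = index.get(key);
        if (e == null) {
            return null;
        }
        // Re-position the entry: remove, bump the sequence, re-insert.
        order.remove(e);
        e.lastAccess = ++accessSeq;
        order.add(e);
        return e.value;
    }

    public synchronized void put(K key, V value) {
        remove(key); // drop any existing entry for this key
        if (index.size() >= capacity && !order.isEmpty()) {
            Entry<K, V> victim = order.pollFirst(); // evict the least recently used
            index.remove(victim.key);
        }
        Entry<K, V> e = new Entry<>(key, value, ++accessSeq);
        index.put(key, e);
        order.add(e);
    }

    // Freeing a handle removes it from BOTH structures, keeping them consistent.
    public synchronized void remove(K key) {
        Entry<K, V> e = index.remove(key);
        if (e != null) {
            order.remove(e);
        }
    }
}
```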
bcf95cd920 [feature](function)Add ST_Angle_Sphere function (#17919) 2023-03-27 10:14:46 +08:00
fd5dd9a391 [Opt](Pipeline) opt pipeline code in mult tablet (#17999) 2023-03-27 10:02:48 +08:00
990479e177 [refactor](memory) Query waits for memory free in Allocator, after memory exceed limit. (#18075)
Previously, after memory exceeded the limit, a query waited for memory to be freed in the mem hook; this changes it to wait in the Allocator instead.

This is more controllable and safer (see the sketch after this entry).
2023-03-27 09:06:03 +08:00
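A hedged illustration of the wait-in-the-allocator idea, not the actual Doris Allocator (ThrottlingAllocator and its methods are invented for the sketch): an allocation that would exceed the limit blocks on a condition until memory is released or a timeout expires, instead of checking the limit in a malloc hook.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of "wait for memory inside the allocator": an allocation
// that would exceed the limit blocks until other work releases memory or a
// timeout expires, rather than enforcing the limit in a malloc hook.
public class ThrottlingAllocator {
    private final long limitBytes;
    private long usedBytes = 0;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition memoryFreed = lock.newCondition();

    public ThrottlingAllocator(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserve size bytes, waiting up to timeoutMs for memory to be freed. */
    public boolean allocate(long size, long timeoutMs) throws InterruptedException {
        lock.lock();
        try {
            long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (usedBytes + size > limitBytes) {
                long remaining = deadlineNanos - System.nanoTime();
                if (remaining <= 0) {
                    return false; // give up: the query should fail with a memory limit error
                }
                memoryFreed.awaitNanos(remaining); // wait for a release() to signal us
            }
            usedBytes += size;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Return size bytes and wake up any allocation waiting for memory. */
    public void release(long size) {
        lock.lock();
        try {
            usedBytes = Math.max(0, usedBytes - size);
            memoryFreed.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```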
78abb40fdc [improvement](string) throw exception instead of log fatal if string column exceed total size limit (#17989)
Throw an exception instead of logging fatal when a string column exceeds the total size limit, so that we can catch it and let the query fail instead of causing the BE to exit.
2023-03-27 08:55:26 +08:00
c2dd005efb [fix](chore) fix BE compile and FE protoc artifact issue (#18120)
- Add the <optional> header to solve the compilation issue.
- Use 3.12.9 as the protoc artifact version, because there is no 3.12.21.
  See: https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/
- Remove the --show-progress argument of wget because it is not supported by older versions of wget.
2023-03-27 08:53:42 +08:00
3e8b3d68fc [BugFix](jdbc catalog) fix OOM when jdbc catalog querys large data from doris #18067
When using the JDBC Catalog to query Doris data, Doris does not provide a cursor-based reading method (that is, fetchBatchSize has no effect), so Doris sends all the data to the client at once, resulting in a client OOM.

The MySQL protocol provides a streaming read mode, which Doris can use to avoid the OOM. Using the streaming mode requires setting the fetch size to Integer.MIN_VALUE and creating the statement with ResultSet.TYPE_FORWARD_ONLY and ResultSet.CONCUR_READ_ONLY (a sketch of this setup follows this entry).
2023-03-26 20:02:03 +08:00
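For reference, the streaming setup described in the entry above looks roughly like the following with a MySQL-protocol JDBC driver; the connection URL, credentials, and table name are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch of the streaming read described above: setting the fetch size to
// Integer.MIN_VALUE on a TYPE_FORWARD_ONLY / CONCUR_READ_ONLY statement makes
// the MySQL driver stream rows one by one instead of buffering the whole
// result set in client memory.
public class StreamingReadExample {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://127.0.0.1:9030/demo"; // placeholder Doris FE address
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(Integer.MIN_VALUE); // enable MySQL streaming mode
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                while (rs.next()) {
                    // process one row at a time without holding the full result in memory
                }
            }
        }
    }
}
```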
Pxl
45ad297a1d [Enchancement](function) change aggregate function creator to return AggregateFunctionPtr (#18025)
Change creator_type to return AggregateFunctionPtr.
Remove some functions and use the creator directly.
2023-03-26 11:41:34 +08:00
c63807ccfe [chore](be) reduce log when trying to do async write cooldown meta (#18107) 2023-03-26 11:10:21 +08:00
5846b3fc54 [fix](memory) Remove PODArray peak allocated memory tracking #18010
#11740 solved the problem of query memory statistics being higher than the actual physical memory: PODArray does not memset allocated memory to 0, while the query mem tracker counts virtual memory.

But in extreme cases, such as CSV load, frequent PODArray inserts cause performance problems, so this reverts part of #11740 and part of #12820.

As for the accuracy of the query mem tracker, there has been no feedback so far, so it gets no further attention for now.
2023-03-26 09:45:10 +08:00
7c0bcbdca1 [enhance](parquet-reader) cache file meta of parquet to speed up query (#18074)
Problem:
1. FE splits a Parquet file into splits, so one file can have several splits.
2. BE scans each split and reads the footer of the Parquet file.
3. If two splits belong to the same Parquet file, the footer of that file is read twice.

This PR mainly changes:
1. Use a KV cache to cache the footer of a Parquet file.
2. The KV cache belongs to a scan node, so all Parquet readers belonging to that scan node share the same KV cache.
3. In the cache, the key is "meta_file_path" and the value is the parsed Thrift footer.

The KV cache is sharded into multiple sub-caches, so different files can use different sub-caches and avoid blocking each other (a simplified sketch of such a sharded cache follows this entry).

In my test, a query with 26 splits reduced the footer parse time from 4s to 1s.
2023-03-25 23:22:57 +08:00
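A simplified sketch of the sharded-cache idea above, not the actual BE implementation (ShardedFooterCache and its API are hypothetical): the file path hashes to one of a fixed number of sub-caches, so readers of different files rarely contend on the same lock.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a sharded KV cache for parsed Parquet footers:
// the file path hashes to one of N sub-caches, so readers of different
// files do not block each other on a single cache lock.
public class ShardedFooterCache<F> {
    private static final int SHARDS = 16;
    private final Map<String, F>[] shards;

    @SuppressWarnings("unchecked")
    public ShardedFooterCache(int perShardCapacity) {
        shards = new Map[SHARDS];
        for (int i = 0; i < SHARDS; i++) {
            // Simple LRU per shard via an access-ordered LinkedHashMap.
            shards[i] = new LinkedHashMap<String, F>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, F> eldest) {
                    return size() > perShardCapacity;
                }
            };
        }
    }

    private Map<String, F> shardFor(String filePath) {
        return shards[(filePath.hashCode() & 0x7fffffff) % SHARDS];
    }

    /** Return the cached footer for this file, parsing it only on a miss. */
    public F getOrParse(String filePath, Function<String, F> parseFooter) {
        Map<String, F> shard = shardFor(filePath);
        synchronized (shard) {
            return shard.computeIfAbsent(filePath, parseFooter);
        }
    }
}
```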
360d3050bc [Feature](array-function) Support array_reverse_sort function (#17754)
Co-authored-by: zhangyu209 <zhangyu209@meituan.com>
2023-03-25 21:58:11 +08:00
50eeb2d9a4 [fix](json) change int to bigint for json function (#17769) 2023-03-25 21:57:29 +08:00
855852d582 [enhancement](timeout) fix set timeout failure and simplify timeout logic (#17837) 2023-03-25 21:56:06 +08:00
193ae352e4 [fix](coalesce) fix problem that coalesce function may cause problem of block mem reuse (#17940) 2023-03-25 21:50:37 +08:00
Pxl
a8753faeb1 [Bug](function) fix column complex not resize after filter (#18043) 2023-03-25 21:48:13 +08:00
77c9550420 [fix](bitmapfilter) fix bitmap filter timeout unit error (#18110) 2023-03-25 21:46:32 +08:00
7ae51c856e [refactor](unify exception) unify exception definition and error code (#18006)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-25 12:41:07 +08:00
f84481886b [feature](string_functions) The 'split_part' function supports non-constant parameters (#18029) 2023-03-25 12:03:11 +08:00
2408ca5da8 [Bug](DECIMALV3) Fix wrong precision for plus/minus (#18052)
The result type for DECIMAL(x, y) plus/minus DECIMAL(m, n) should be DECIMAL(max(x - y, m - n) + max(y, n) + 1, max(y, n)) (a worked example follows this entry).
2023-03-25 09:42:39 +08:00
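Applying the rule above to a concrete case: for DECIMAL(10, 2) plus DECIMAL(8, 4), the integer part is max(10 - 2, 8 - 4) = 8 digits and the scale is max(2, 4) = 4, so the result type should be DECIMAL(8 + 4 + 1, 4) = DECIMAL(13, 4).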
b2c70b51cc [refactor](vectorized) delete row-based AnyVal and DateTimeVal (#18093) 2023-03-25 09:40:04 +08:00
0523860877 [Enhancement](streamload) print profile for streamload (#18015)
When both enable_profile and enable_stream_load_profile_log are true, the stream load profile is printed to the log.
2023-03-24 20:17:33 +08:00
7ac7d35703 [bugfix](publish) fix TabletLoadInfo may released by delete txn (#17986) 2023-03-24 20:14:34 +08:00
b244c41371 [Bug](regression-test) Fix grace stop be coredump in pipeline (#18076) 2023-03-24 17:44:06 +08:00
e8b9587fe6 [Improvement](dict) compute hash only if needed (#18058) 2023-03-24 11:45:58 +08:00
1999cccde9 [feature](array-type) Unique table support array value (#17024)
Co-authored-by: huangqixiang.871 <huangqixiang.871@bytedance.com>
2023-03-24 10:18:59 +08:00
5445a86570 [Bug](array_product) Fix array_product for ARRAY<DECIMAL> (#18014) 2023-03-23 20:29:50 +08:00
b0948ea4cd [Fix](SAP Hana External Table) fix that SAP Hana external table can not insert batch values (#17957)
In the batch insertion scenario, the SAP HANA database does not support the syntax INSERT INTO table VALUES (...),(...);
what it supports is:
```sql
INSERT INTO table(col1,col2)
SELECT c1v1, c2v1 FROM dummy
UNION ALL
SELECT c1v2, c2v2 FROM dummy;
```
2023-03-23 18:49:50 +08:00
cedd36c786 [improvement](compaction)Support segcompaction for inverted index (#17874)
Since Doris supports segcompaction #12866 during loading, inverted index support is also needed.
2023-03-23 14:41:30 +08:00
11936d85f9 [fix](inverted index) fix erroneous judgement for inverted index not read raw data (#17992)
Applying an inverted index uses predicate_params() from the ColumnPredicate. If a comparison predicate is cloned, the clone does not copy predicate_params() along with it, which results in the wrong choice being made when applying the inverted index.
2023-03-23 14:40:08 +08:00
Pxl
4b626d260a [Build] fix build fail when WITH_MYSQL=OFF (#18021) 2023-03-23 14:01:21 +08:00
3870689cbb [Fix](parquet-reader) Fix iceberg_schema_evolution regression test caused by slot col name different with parquet col name. (#17988) 2023-03-23 11:23:08 +08:00
089a91ecd5 [vectorized](function) support array_exists lambda function (#17931)
Co-authored-by: zhangyu209 <zhangyu209@meituan.com>
2023-03-23 11:11:39 +08:00
cfa0a8b136 [Improvement](DECIMALV3) multiply/plus DECIMAL32 and DECIMAL64 safely and not check overflow (#18031) 2023-03-23 10:10:03 +08:00
4be1b9e784 [enhancement](load) add slow log for memtable flush (#17962) 2023-03-22 20:21:39 +08:00
e2e806a5e7 [improve](clickhouse jdbc) support clickhouse array type (#17993)
In this PR, the ClickHouse array type is mapped to the array type of Doris's JDBC external table.
2023-03-22 19:42:32 +08:00
ebef0c038d Revert "[fix](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420)" (#17887)
This reverts commit 397cc011c4f1ba5a25c770258c13f1cd3f28b47d.
2023-03-22 13:28:25 +08:00
34ead026d4 [Improvement](decimal) Improve cast function between decimal type (#17996) 2023-03-22 11:35:07 +08:00
Pxl
40ca250678 [Feature](materialized-view) support where clause on create materialized view (#17534)
2023-03-22 11:25:13 +08:00
Pxl
401836f523 [Bug](planner) fix core dump when lateral view above union node and have predicate (#17912)
2023-03-22 11:24:45 +08:00
6cbf393665 [enhance](meta action) remove useless pb field and refactor writer cooldown meta code (#17652) 2023-03-22 11:13:13 +08:00
7fd0ec7d17 [Bug](float) fix wrong value when enable fold constant by BE (#17901) 2023-03-22 09:51:03 +08:00
cb79e42e5c [refactor](file-system)(step-1) refactor file sysmte on BE and remove storage_backend (#17586)
See #17764 for details
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query file on local/s3/hdfs/broker file system, with table value function and catalog.
- Backup/Restore with local/s3/hdfs/broker file system

Not tested:
- Cold & hot data separation case.
2023-03-21 21:08:38 +08:00
4193884a32 [feature](array_zip) Support array_zip function (#17696) 2023-03-21 18:44:30 +08:00
7754619e2b [fix](quit) be can not quit cleanly due to deadlock (#17971) 2023-03-21 12:52:48 +08:00
656b01d191 [fix](agg) Avoid reusing a non-nullable column that has been converted to nullable within a block (#17944)
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:420
 1# os::Linux::chained_handler(int, siginfo*, void*) in /usr/local/java/jdk1.8.0_202/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/local/java/jdk1.8.0_202/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo*, void*) in /usr/local/java/jdk1.8.0_202/jre/lib/amd64/server/libjvm.so
 4# 0x00007F4051C9F400 in /lib64/libc.so.6
 5# memcpy at /root/doris/be/src/glibc-compatibility/memcpy/memcpy_x86_64.cpp:219
 6# doris::vectorized::ColumnString::deserialize_and_insert_from_arena(char const*) at /root/doris/be/src/vec/columns/column_string.cpp:226
 7# doris::vectorized::ColumnString::deserialize_vec_with_null_map(std::vector<StringRef, std::allocator<StringRef> >&, unsigned long, unsigned char const*) at /root/doris/be/src/vec/columns/column_string.cpp:283
 8# void doris::vectorized::AggregationNode::_serialize_with_serialized_key_result(doris::RuntimeState*, doris::vectorized::Block*, bool*)::{lambda(auto:1&&)#1}::operator()<doris::vectorized::AggregationMethodSerialized<PHHashMap<StringRef, char*, DefaultHash<StringRef, void>, false> >&>(doris::vectorized::AggregationMethodSerialized<PHHashMap<StringRef, char*, DefaultHash<StringRef, void>, false> >&) const at /root/doris/be/src/vec/exec/vaggregation_node.cpp:1232
 9# doris::vectorized::AggregationNode::_serialize_with_serialized_key_result(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/vec/exec/vaggregation_node.cpp:1294
10# std::_Function_handler<doris::Status (doris::RuntimeState*, doris::vectorized::Block*, bool*), std::_Bind_result<doris::Status, doris::Status (doris::vectorized::AggregationNode::*(doris::vectorized::AggregationNode*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>))(doris::RuntimeState*, doris::vectorized::Block*, bool*)> >::_M_invoke(std::_Any_data const&, doris::RuntimeState*&&, doris::vectorized::Block*&&, bool*&&) at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:293
11# doris::vectorized::AggregationNode::get_next(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/vec/exec/vaggregation_node.cpp:508
12# doris::ExecNode::get_next_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/exec/exec_node.cpp:852
13# doris::PlanFragmentExecutor::get_vectorized_internal(doris::vectorized::Block**) at /root/doris/be/src/runtime/plan_fragment_executor.cpp:352
14# doris::PlanFragmentExecutor::open_vectorized_internal() at /root/doris/be/src/runtime/plan_fragment_executor.cpp:300
15# doris::PlanFragmentExecutor::open() at /root/doris/be/src/runtime/plan_fragment_executor.cpp:253
16# doris::FragmentExecState::execute() at /root/doris/be/src/runtime/fragment_mgr.cpp:251
17# doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>) at /root/doris/be/src/runtime/fragment_mgr.cpp:498
18# std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::PlanFragmentExecutor*)>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:291
19# doris::ThreadPool::dispatch_thread() at /root/doris/be/src/util/threadpool.cpp:542
20# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:455
21# start_thread in /lib64/libpthread.so.0
22# clone in /lib64/libc.so.6
2023-03-21 09:00:06 +08:00