Commit Graph

11706 Commits

9ee7fa45d1 [Refactor](multi-catalog) Refactor to process split conjuncts for dict filter. (#21459)
Conjuncts are currently split, so refactor source code to handle split conjuncts for dict filters.
2023-07-07 09:19:08 +08:00
9bcf79178e [Improvement](statistics, multi catalog)Support iceberg table stats collection (#21481)
Fetch iceberg table stats automatically while querying a table.
Collect accurate statistics for Iceberg tables by running ANALYZE SQL in Doris (the collect-by-meta option is removed).
2023-07-07 09:18:37 +08:00
79221a54ca [refactor](Nereids): remove withLogicalProperties & check children size (#21563) 2023-07-06 20:37:17 +08:00
fba3ae96b9 Revert "[Fix](planner) Set inline view output as non constant after analyze (#21212)" (#21581)
This reverts commit 0c3acfdb7c744decb7b60e372007707a55d14e00.
2023-07-06 20:30:27 +08:00
181dad4181 [fix](executor) make elt / repeat smooth upgrade. (#21493)
BE: 2.0, FE: 1.2

before

mysql [(none)]>select elt(1, 'aaa', 'bbb');
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Function elt get failed, expr is VectorizedFnCall[elt](arguments=,return=String) and return type is String.

mysql [test]> INSERT INTO tbb VALUES (1, repeat("test1111", 8192))(2, repeat("test1111", 131072));
mysql [test]>select k1, md5(v1), length(v1) from tbb;
+------+----------------------------------+--------------+
| k1   | md5(`v1`)                        | length(`v1`) |
+------+----------------------------------+--------------+
| 1    | d41d8cd98f00b204e9800998ecf8427e |            0 |
| 2    | d41d8cd98f00b204e9800998ecf8427e |            0 |
+------+----------------------------------+--------------+

now

mysql [test]>select elt(1, 'aaa', 'bbb');
+----------------------+
| elt(1, 'aaa', 'bbb') |
+----------------------+
| aaa                  |
+----------------------+

mysql [test]>select k1, md5(v1), length(v1) from tbb;
+------+----------------------------------+--------------+
| k1   | md5(`v1`)                        | length(`v1`) |
+------+----------------------------------+--------------+
| 1    | 1f44fb91f47cab16f711973af06294a0 |        65536 |
| 2    | 3c514d3b89e26e2f983b7bd4cbb82055 |      1048576 |
+------+----------------------------------+--------------+
2023-07-06 19:15:06 +08:00
2d94477748 [fix](type system) fix datetimev2 write column to arrow (#21529)
*** Query id: c1d804d455a24dee-a8967d16a258fc15 ***
*** Aborted at 1688530361 (unix time) try "date -d @1688530361" if you are using GNU date ***
*** Current BE git commitID: f2025b9 ***
*** SIGSEGV unknown detail explain (@0x0) received by PID 3709755 (TID 3710413 OR 0x7f5661d57700) from PID 0; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:413
 1# os::Linux::chained_handler(int, siginfo*, void*) in /usr/lib/jvm/TencentKona-8.0.12-352/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/TencentKona-8.0.12-352/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo*, void*) in /usr/lib/jvm/TencentKona-8.0.12-352/jre/lib/amd64/server/libjvm.so
 4# 0x00007F5795EE5B50 in /lib64/libc.so.6
 5# doris::vectorized::DateV2Value<doris::vectorized::DateTimeV2ValueType>::to_buffer(char*, int) const at /root/doris/be/src/vec/runtime/vdatetime_value.cpp:2409
 6# doris::vectorized::DataTypeDateTimeV2SerDe::write_column_to_arrow(doris::vectorized::IColumn const&, unsigned char const*, arrow::ArrayBuilder*, int, int) const in /mnt/disk2/zhaobingquan/doris/be/lib/doris_be
 7# doris::vectorized::DataTypeNullableSerDe::write_column_to_arrow(doris::vectorized::IColumn const&, unsigned char const*, arrow::ArrayBuilder*, int, int) const at /root/doris/be/src/vec/data_types/serde/data_type_nullable_serde.cpp:120
 8# doris::FromBlockConverter::convert(std::shared_ptr<arrow::RecordBatch>*) at /root/doris/be/src/util/arrow/block_convertor.cpp:392
 9# doris::convert_to_arrow_batch(doris::vectorized::Block const&, std::shared_ptr<arrow::Schema> const&, arrow::MemoryPool*, std::shared_ptr<arrow::RecordBatch>*) in /mnt/disk2/zhaobingquan/doris/be/lib/doris_be
10# doris::vectorized::MemoryScratchSink::send(doris::RuntimeState*, doris::vectorized::Block*, bool) at /root/doris/be/src/vec/sink/vmemory_scratch_sink.cpp:83
11# doris::PlanFragmentExecutor::open_vectorized_internal() in /mnt/disk2/zhaobingquan/doris/be/lib/doris_be
12# doris::PlanFragmentExecutor::open() at /root/doris/be/src/runtime/plan_fragment_executor.cpp:273
13# doris::FragmentExecState::execute() at /root/doris/be/src/runtime/fragment_mgr.cpp:263
14# doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /root/doris/be/src/runtime/fragment_mgr.cpp:527
15# std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
16# doris::ThreadPool::dispatch_thread() in /mnt/disk2/zhaobingquan/doris/be/lib/doris_be
17# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:466
18# start_thread in /lib64/libpthread.so.0
19# __clone in /lib64/libc.so.6
2023-07-06 17:33:49 +08:00
8e6b9b4026 [fix](sink) Fix NodeChannel add_block_closure null pointer (#21534)
NodeChannel's add_block_closure is a null pointer when the channel is canceled before open_wait creates the new closure.
2023-07-06 17:09:43 +08:00
dac2b638c6 [refactor](load) move memtable flush logic to flush token and rowset writer (#21547) 2023-07-06 17:04:30 +08:00
457de3fc55 [refactor](load) move find_tablet out of VOlapTableSink (#21462) 2023-07-06 16:51:32 +08:00
2e651bbc9a [fix](nereids) fix some planner bugs (#21533)
1. Allow casting boolean as date-like types in Nereids; the result is null.
2. The PruneOlapScanTablet rule can prune tablets even if an mv index is selected.
3. Constant conjuncts should not be pushed through the agg node in the old planner.
2023-07-06 16:13:37 +08:00
0c3acfdb7c [Fix](planner) Set inline view output as non constant after analyze (#21212)
Problem:
The select list should be non-constant when the FROM list has tables or multiple tuples; otherwise the outer query gets a wrong isConstant value and performs wrong constant folding.
For example, when using the nullif function with a subquery that yields two alternative constants, the planner would treat it as a constant expr, so the analyzer would report an error that the ORDER BY clause cannot be constant.

Solution:
Change the inline view output to non-constant, because for (select 1 a from table) as view, a in the output is not constant when we see
view.a from outside.
2023-07-06 15:37:43 +08:00
068fe44493 [feature](profile) Add important time of legacy planner to profile (#20602)
Add the important timing points of the planning process:
- queryJoinReorderFinishTime: join reorder end time, i.e. after analyze and when join reorder finishes
- queryCreateSingleNodeFinishTime: single node plan end time, i.e. after join reorder and when the single node plan is created
- queryDistributedFinishTime: distributed plan end time, i.e. after the single node plan and when the distributed plan is created
2023-07-06 15:36:25 +08:00
bb3b6770b5 [Enhancement](multi-catalog) Make meta cache batch loading concurrently. (#21471)
This enhances the performance of querying the meta cache of HMS tables in 2 steps:
**Step 1**: use concurrent batch loading for the meta cache
**Step 2**: execute some other tasks concurrently as soon as possible

**This PR covers step 1 and mainly does the following:**
- Create a `CacheBulkLoader` for batch loading
- Remove the executor of the previous async cache loader and change the loader's type to `CacheBulkLoader` (we do not set any refresh strategy for the LoadingCache, so the previous executor was not useful)
- Use a `FixedCacheThreadPool` to replace the `CacheThreadPool` (the previous `CacheThreadPool` only logged warnings and would not throw any exception when the pool was full)
- Remove parallel streams and use the `CacheBulkLoader` for batch loading
- Change the value of `max_external_cache_loader_thread_pool_size` to 64, and set the pool size of the HMS client pool to `max_external_cache_loader_thread_pool_size`
- Fix the spelling mistake in `max_hive_table_catch_num`
2023-07-06 15:18:30 +08:00
fde73b6cc6 [Fix](multi-catalog) Fix hadoop short circuit reading cannot be enabled in some environments. (#21516)
Fix hadoop short circuit reading that cannot be enabled in some environments.
- Revert #21430 because it causes a performance degradation issue.
- Add `$HADOOP_CONF_DIR` to `$CLASSPATH`.
- Remove the empty `hdfs-site.xml`, because in some environments it prevents hadoop short circuit reading from being enabled.
- Copy the hadoop common native libs (copied from https://github.com/apache/doris-thirdparty/pull/98) and add them to `LD_LIBRARY_PATH`, because in some environments `LD_LIBRARY_PATH` doesn't contain the hadoop common native libs, which prevents hadoop short circuit reading from being enabled.
2023-07-06 15:00:26 +08:00
06451c4ff1 fix: infinite loop when handling exceeded memory limit (#21556)
In some situations, _handle_mem_exceed_limit will allocate a large memory block, more than 5 GB. After adding some logs, we found that:

- the allocation was made in vector::insert_realloc
- writers_to_reduce_mem's size was more than 8 million

which indicated an infinite loop in `while (!tablets_mem_heap.empty())`.
By reviewing the code, `if (std::get<0>(tablet_mem_item)++ != std::get<1>(tablet_mem_item))` is wrong;
it must be `if (++std::get<0>(tablet_mem_item) != std::get<1>(tablet_mem_item))`.
In the original code we performed ++ on the end iterator and then compared it to the end iterator, which is undefined behavior.
2023-07-06 14:34:29 +08:00
4d17400244 [profile](join) add collisions into profile (#21510) 2023-07-06 14:30:10 +08:00
8839518bfb [Performance](Nereids): add withGroupExprLogicalPropChildren to reduce new Plan (#21477) 2023-07-06 14:10:31 +08:00
009b300abd [Fix](ScannerScheduler) fix dead lock when shutdown group_local_scan_thread_pool (#21553) 2023-07-06 13:09:37 +08:00
013bfc6a06 [Bug](row store) Fix column aggregate info lost when table is unique model (#21506) 2023-07-06 12:06:22 +08:00
9d2f879bd2 [Enhancement](inverted index) make InvertedIndexReader shared_from_this (#21381)
This PR proposes several changes to improve code safety and readability by replacing raw pointers with smart pointers in several places.

Use enable_factory_creator in InvertedIndexIterator and InvertedIndexReader, removing the explicit new in constructors.
Make InvertedIndexReader shared_from_this, since it may be destructed while InvertedIndexIterator is still using it.
2023-07-06 11:52:59 +08:00
fb14950887 [refactor](load) split flush_segment_writer into two parts (#21372) 2023-07-06 11:13:34 +08:00
80be2bb220 [bugfix](RowsetIterator) use valid stats when creating segment iterator (#21512) 2023-07-06 10:35:16 +08:00
b1be59c799 [enhancement](query) enable strong consistency by syncing max journal id from master (#21205)
Add a session variable & config enable_strong_consistency_read to solve the problem that a load result may be briefly invisible to followers, meeting user requirements in strong consistency read scenarios.

Will sync the max journal id from the master and wait for replaying.
2023-07-06 10:25:38 +08:00
6a0a21d8b0 [regression-test](load) add streamload default value test (#21536) 2023-07-06 10:14:13 +08:00
688a1bc059 [refactor](load) expand OlapTableValidator to VOlapTableBlockConvertor (#21476) 2023-07-06 10:11:53 +08:00
a2e679f767 [fix](status) Return the correct error code when clucene error occured (#21511) 2023-07-06 09:08:11 +08:00
c1e82ce817 [fix](backup) fix show snapshot causing mysql connection lost (#21520)
If there is no `info file` in the repository, the mysql connection may be lost when the user executes `show snapshot on repo`:
```
2023-07-05 09:22:48,689 WARN (mysql-nio-pool-0|199) [ReadListener.lambda$handleEvent$0():60] Exception happened in one session(org.apache.doris.qe.ConnectContext@730797c1).
java.io.IOException: Error happened when receiving packet.
    at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:691) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_322]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322]
```

This is because some fields are missing in the returned result set.
2023-07-05 22:44:57 +08:00
b6a5afa87d [Feature](multi-catalog) support query hive-view for nereids planner. (#21419)
Relevant PR: #18815. Support querying hive views in the Nereids planner.
2023-07-05 21:58:03 +08:00
b3db904847 [fix](Nereids): when child is Aggregate, don't infer Distinct for it (#21519) 2023-07-05 19:39:41 +08:00
5d2739b5c5 [Fix](submodule) revert clucene version wrong rollback (#21523) 2023-07-05 19:10:15 +08:00
f868aa9d4a [Enhancement](multi-catalog) Add some checks for ShowPartitionsStmt. (#21446)
1. Add some validations for ShowPartitionsStmt with hive tables.
2. Make the behavior consistent with hive.
2023-07-05 16:28:05 +08:00
0da1bc7acd [Fix](multi-catalog) Fallback to refresh catalog when hms events are missing (#21333)
Fix #20227; the previous implementation has some problems and cannot catch the event-missing exception.
2023-07-05 16:27:01 +08:00
242a35fa80 [fix](s3) fix s3 fs benchmark tool (#21401)
1. Fix a concurrency bug of the s3 fs benchmark tool, to avoid crashes under multiple threads.
2. Add a `prefetch_read` operation to test the prefetch reader.
3. Add the `AWS_EC2_METADATA_DISABLED` env in `start_be.sh` to avoid calling EC2 metadata when creating the s3 client.
4. Add the `AWS_MAX_ATTEMPTS` env in `start_be.sh` to avoid warning logs from the s3 sdk.
2023-07-05 16:20:58 +08:00
39590f95b0 [pipeline](load) return error status in pipeline load (#21303) 2023-07-05 16:13:32 +08:00
37a52789bd [improvement](statistics, multi catalog)Estimate hive table row count based on file size. (#21207)
Support estimating the table row count based on file size.

With sample size=3000 (total partition number is 87491), load cache time is 45s.
With sample size=100000 (more than total partition number 87505), load cache time is 388s.
2023-07-05 16:07:12 +08:00
1121e7d0c3 [feature](Nereids): pushdown distinct through join. (#21437) 2023-07-05 15:55:21 +08:00
4d414c649a [fix](Nereids) set operation physical properties derive is wrong (#21496) 2023-07-05 15:44:40 +08:00
d8a549fe61 [Fix](Comment) Comment should be in English (#20964) 2023-07-05 15:41:34 +08:00
48bfb8e9cf [Enhancement](regression-test)Add regression test for MoW backup and restore (#21223) 2023-07-05 15:16:04 +08:00
38c8657e5e [improve](memory) more graceful logging for memory exceed limit (#21311)
More graceful logging for Allocator and MemTracker when memory exceeds the limit.
Fix graceful bthread exit.
2023-07-05 14:59:06 +08:00
f9bc433917 [fix](nereids) fix runtime filter expr order (#21480)
Currently, when pushing a runtime filter down into a CTE internal, we construct the runtime filter expr_order with an incrementing number, which is not correct. For CTE-internal RF pushdown, the join node will always be different, so the expr_order should be fixed at 0 without incrementing; otherwise the check of expr_order against probe_expr_size becomes illegal, or the query result is wrong.

This PR temporarily reverts 2827bc1, since it breaks the CTE RF pushdown plan pattern.
2023-07-05 14:27:35 +08:00
f02bec8ad1 [Chore](runtime filter) change runtime filter dcheck to error status or exception (#21475)
2023-07-05 14:03:55 +08:00
d3eeb233c8 [fix](dbt) dbt getconfig array or string (#21345)
{{ config(unique_key='id') }}
{{ config(unique_key=['id','name']) }}
Following dbt convention, use a string for a single column name and an array for multiple columns.
2023-07-05 11:42:38 +08:00
e510e6b0a6 [fix](dbt) dbt-doris match dbt-core==1.5 (#21392)
dbt-doris==0.2 matches dbt-core==1.3 or older versions.

Subsequent dbt-doris versions match dbt-core==1.4 and 1.5.
2023-07-05 11:42:19 +08:00
c9c183e498 [fix](dbt) dbt seed config read (#21492) 2023-07-05 11:41:59 +08:00
0084b9fd9a [fix](hudi) scala can't call Properties.putAll in jdk11 (#21494) 2023-07-05 10:53:09 +08:00
de5cfe34bf [fix](feut)should not create a DeriveStatsJob in fe ut (#21498) 2023-07-05 10:38:09 +08:00
15ec191a77 [Fix](CCR) Use tableId as the credential for CCR syncer instead of tableName (#21466) 2023-07-05 10:16:09 +08:00
93795442a4 [Fix](CCR) Binlog config is missed when create replica task (#21397) 2023-07-05 10:15:13 +08:00
0469c02202 [Test](regression) Temporarily disable quickTest for SHOW CREATE TABLE to adapt to enable_feature_binlog=true (#21247) 2023-07-05 10:12:02 +08:00