Commit Graph

1182 Commits

Author SHA1 Message Date
012e66729a [improvement](executor) Add tvf and regression test for Workload Scheduler (#28733)
1. Add select workload schedule policy TVF
2. Add regression test
2023-12-22 12:09:51 +08:00
0070909d30 [fix](group commit) Fix the issue of duplicate addition of wal path when encountering an exception (#28691) 2023-12-21 20:27:33 +08:00
4f1aebb8e8 (topN) runtime_predicate is only triggered when the column name is obtained (#28419)
Issue Number: close #27485
2023-12-21 18:08:23 +08:00
bcf2683b9d [fix](scanner) fix concurrency bugs when scanner is stopped or finished (#28650)
`ScannerContext` would schedule scanners even after being stopped, and the roles of `_is_finished` and `_should_stop` were confused.
This only fixes the concurrency bugs reported in https://github.com/apache/doris/pull/28384 that occur when the scanner is stopped or finished.
2023-12-21 10:37:58 +08:00
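As a rough illustration of the guard the fix above implies, here is a minimal sketch; it is not the actual Doris code, and only the names `ScannerContext`, `_is_finished` and `_should_stop` come from the commit message, the rest is hypothetical. Scheduling and the stop/finish checks happen under one lock, so a scanner can never be scheduled after the context is stopped.

```cpp
#include <mutex>

// Hypothetical illustration: schedule a scanner only while the context is
// neither stopped nor finished, and make both checks under one lock so a
// concurrent stop cannot slip in between them.
class ScannerContext {
public:
    bool try_schedule() {
        std::lock_guard<std::mutex> guard(_lock);
        if (_should_stop || _is_finished) {
            return false;  // never schedule scanners after stop/finish
        }
        ++_running_scanners;
        return true;
    }

    void stop() {
        std::lock_guard<std::mutex> guard(_lock);
        _should_stop = true;
    }

private:
    std::mutex _lock;
    bool _should_stop = false;
    bool _is_finished = false;
    int _running_scanners = 0;
};
```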
970e1c8475 [fix](group_commit) fix group commit cancel stuck (#28749) 2023-12-21 10:32:21 +08:00
280a01b815 [pipelineX](improvement) Support global runtime filter (#28692) 2023-12-20 20:06:26 +08:00
504693be7f [bug](coredump) Fix coredump in aggregation node's destruction (#28684)
2023-12-20 20:02:48 +08:00
36857006cd [Fix](json reader) fix json reader crash due to fmt::format_to (#28737)
```
4# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75
5# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
6# 0x00005622F33D22B1 in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
7# 0x00005622F33D2404 in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
8# fmt::v7::detail::error_handler::on_error(char const*) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
9# char const* fmt::v7::detail::parse_replacement_field<char, fmt::v7::detail::format_handler<fmt::v7::detail::buffer_appender<char>, char, fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<char>, char> >&>(char const*, char const*, fmt::v7::detail::format_handler<fmt::v7::detail::buffer_appender<char>, char, fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<char>, char> >&) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
10# void fmt::v7::detail::vformat_to<char>(fmt::v7::detail::buffer<char>&, fmt::v7::basic_string_view<char>, fmt::v7::basic_format_args<fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<fmt::v7::type_identity<char>::type>, fmt::v7::type_identity<char>::type> >, fmt::v7::detail::locale_ref) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
11# doris::vectorized::NewJsonReader::_append_error_msg(rapidjson::GenericValue<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool*) at /root/doris/be/src/vec/exec/format/json/new_json_reader.cpp:924
12# doris::vectorized::NewJsonReader::_set_column_value
```
2023-12-20 19:58:30 +08:00
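The trace above ends inside fmt's replacement-field parser, which is what typically happens when untrusted text (here, content assembled for an error message) is used as the format string itself. A minimal standalone sketch of the failure and the usual remedy follows; it is illustrative only and not the exact patch.

```cpp
#include <fmt/format.h>
#include <iostream>
#include <string>

int main() {
    // User-controlled text that happens to contain an unbalanced '{'.
    std::string user_data = R"(bad value: {col1)";

    try {
        // Dangerous: the untrusted text is used as the format string, so fmt's
        // replacement-field parser sees the stray '{' and throws format_error
        // (on fmt 8+ fmt::runtime() is needed just to pass a non-literal here;
        // fmt 7 accepts the runtime string directly).
        std::string bad = fmt::format(fmt::runtime(user_data));
        (void)bad;
    } catch (const fmt::format_error& e) {
        std::cout << "format_error: " << e.what() << '\n';
    }

    // Safe: pass the text as an argument, never as the format string.
    std::cout << fmt::format("append error msg: {}", user_data) << '\n';
}
```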
2b2d3d0eb1 [fix](meta_scanner) fix meta_scanner process ColumnNullable (#28711) 2023-12-20 17:41:38 +08:00
111185407c [Improve](tvf)jni-avro support split file (#27933) 2023-12-19 16:37:34 +08:00
b142ade69e [refactor](renamefile) rename some files according to the class names (#28606) 2023-12-19 14:10:11 +08:00
89d728290d [Chore](execute) remove some unused code and adjust check_row_nums #28576 2023-12-19 09:55:50 +08:00
66fbb22ad7 [fix](group commit) Fix some wal problems on group commit (#28554) 2023-12-19 09:51:03 +08:00
97e63516b7 [fix](streamload) catch exception when reading arrow data (#28558) 2023-12-18 22:03:57 +08:00
73f7b61019 [refactor](scanner) use weak ptr to lock task execution context to avoid core in scanner dtor (#28493)
Use a weak_ptr as a lock between the fragment execution thread and the scanner thread, to solve the crash when the scanner's destructor accesses the scan node's profile.
2023-12-18 14:09:32 +08:00
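A minimal sketch of the weak_ptr pattern described above, assuming simplified hypothetical types (`TaskExecutionContext`, `Scanner`); only the overall idea comes from the commit.

```cpp
#include <iostream>
#include <memory>

// Hypothetical stand-in for state owned by the fragment execution thread.
struct TaskExecutionContext {
    void update_profile(const char* what) { std::cout << "profile: " << what << '\n'; }
};

class Scanner {
public:
    explicit Scanner(const std::shared_ptr<TaskExecutionContext>& ctx) : _task_exec_ctx(ctx) {}

    ~Scanner() {
        // lock() either pins the context for the duration of the update or
        // returns nullptr if the fragment already tore it down -- no dangling
        // pointer is dereferenced either way.
        if (auto ctx = _task_exec_ctx.lock()) {
            ctx->update_profile("scanner destroyed");
        }
    }

private:
    std::weak_ptr<TaskExecutionContext> _task_exec_ctx;
};

int main() {
    auto ctx = std::make_shared<TaskExecutionContext>();
    {
        Scanner s(ctx);
    }            // context still alive: the destructor updates the profile
    ctx.reset(); // fragment finished and released the context
    Scanner late(ctx);
    // 'late' is destroyed after the context is gone: lock() fails and the
    // destructor skips the profile access instead of crashing.
    return 0;
}
```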
469edbdd3d [feature](executor) make scan task wait timeout configurable #28467 2023-12-16 11:36:15 +08:00
fb925bdd08 [Bug](memory) Fix exception-unsafe in aggregation node (#28483)
The alloc function may throw a std::bad_alloc exception when process memory exceeds the limit.

be.INFO:

W1214 09:14:17.434849 771103 mem_tracker_limiter.cpp:204] Memory limit exceeded:<consuming tracker:<Load#Id=28448230da1f432e-8a66597e10329235>, process memory used 20.41 GB exceed limit 18.76 GB or sys mem available 9.04 GB less than low water mark 1.60 GB, failed alloc size 1.86 MB>, executing msg:<execute:<>>. backend xx.x.x.xxx process memory used 20.41 GB, limit 18.76 GB. If query tracker exceed, set exec_mem_limit=8G to change limit, details see be.INFO.
Process Memory Summary:
    OS physical memory 31.26 GB. Process memory usage 20.41 GB, limit 18.76 GB, soft limit 16.88 GB. Sys available memory 9.04 GB, low water mark 1.60 GB, warning water mark 3.20 GB. Refresh interval memory growth 0 B
Alloc Stacktrace:
    @     0x555cd858bee9  doris::MemTrackerLimiter::print_log_usage()
    @     0x555cd859a384  doris::ThreadMemTrackerMgr::exceeded()
    @     0x555cd85a0ac4  malloc
    @     0x555cd8fcf368  Allocator<>::alloc()
    @     0x555cd8fdbdaf  doris::vectorized::Arena::add_chunk()
    @     0x555cd96dc0ab  doris::vectorized::AggregateDataContainer::_expand()
    @     0x555cd96aded8  (unknown)
    @     0x555cd969fa2c  doris::vectorized::AggregationNode::_pre_agg_with_serialized_key()
    @     0x555cd96d1d61  std::_Function_handler<>::_M_invoke()
    @     0x555cd967ab0b  doris::vectorized::AggregationNode::get_next()
    @     0x555cd81282a6  doris::ExecNode::get_next_after_projects()
    @     0x555cd8452968  doris::PlanFragmentExecutor::get_vectorized_internal()
    @     0x555cd845553b  doris::PlanFragmentExecutor::open_vectorized_internal()
    @     0x555cd8456a9e  doris::PlanFragmentExecutor::open()
    @     0x555cd842f200  doris::FragmentExecState::execute()
    @     0x555cd843280e  doris::FragmentMgr::_exec_actual()
    @     0x555cd8432d42  _ZNSt17_Function_handlerIFvvEZN5doris11FragmentMgr18exec_plan_fragmentERKNS1_23TExecPlanFragmentParamsESt8functionIFvPNS1_20PlanFragmentExecutorEEEEUlvE_E9_M_invokeERKSt9_Any_data
    @     0x555cd86ead05  doris::ThreadPool::dispatch_thread()
    @     0x555cd86e015f  doris::Thread::supervise_thread()
    @     0x7f3321593ea5  start_thread
    @     0x7f33218a69fd  __clone
    @              (nil)  (unknown)
2023-12-15 19:17:18 +08:00
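A minimal sketch of making such an allocation exception-safe by converting std::bad_alloc into a status the caller can handle; the `Status` and `expand_container` names below are simplified stand-ins, not Doris APIs.

```cpp
#include <new>
#include <string>
#include <vector>

// Simplified stand-in for an engine status type.
struct Status {
    bool ok = true;
    std::string msg;
    static Status OK() { return {}; }
    static Status MemoryLimitExceeded(std::string m) { return {false, std::move(m)}; }
};

// The aggregation container grows while keys are inserted; under memory
// pressure the underlying allocator may throw std::bad_alloc. Wrapping the
// growth step keeps the exception from unwinding through code that is not
// exception-safe and converts it into a Status instead.
Status expand_container(std::vector<char>& buf, size_t new_size) {
    try {
        buf.resize(new_size);
        return Status::OK();
    } catch (const std::bad_alloc&) {
        return Status::MemoryLimitExceeded("alloc failed while expanding aggregation data");
    }
}
```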
eb99e4270d [Fix](parquet_reader) Fix dict filtering not working with plain dict encoding in parquet reader. (#28290) 2023-12-15 09:27:02 +08:00
9fe2fce306 [minor](refactor) remove unused code (#28383) 2023-12-14 17:16:41 +08:00
e53cfa09da [fix](join) incorrect result of right anti join with nullable (#28301) 2023-12-14 14:07:12 +08:00
c00dca70e6 [pipelineX](local shuffle) Support parallel execution despite of tablet number (#28266) 2023-12-14 12:53:54 +08:00
48937fef48 [Performance](json reader) optimize filling default values (#25542)
Add a faster path for filling default values, since looking up the value map is relatively slow.
2023-12-14 10:20:29 +08:00
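A sketch of the optimization idea above, under hypothetical names: resolve each column's default once up front so the per-row path is a branch instead of a hash-map lookup.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-column state for a JSON reader.
struct ColumnSlot {
    std::string name;
    // Resolved once before reading rows, so the per-row path never touches the map.
    std::optional<std::string> cached_default;
};

void prepare_defaults(std::vector<ColumnSlot>& slots,
                      const std::unordered_map<std::string, std::string>& default_map) {
    for (auto& slot : slots) {
        auto it = default_map.find(slot.name);  // slow lookup, done once per column
        if (it != default_map.end()) slot.cached_default = it->second;
    }
}

// Per-row fast path: a branch on the cached value instead of a hash lookup.
std::string value_or_default(const ColumnSlot& slot, const std::string* parsed) {
    if (parsed != nullptr) return *parsed;
    return slot.cached_default.value_or("");
}
```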
ec91dd1129 [opt](vfilescanner) interrupt running parquet/orc readers when scannode is finished (#28223)
VScanNode::get_next checks whether the ScanNode has reached its limit condition and sends eos to the TaskScheduler, which then tries to close the ScanNode.
However, the ScanNode must wait for all running scanners to finish, so even if it has reached the limit condition, it cannot be closed immediately.
This PR interrupts the running readers to make the ScanNode end as soon as possible.
2023-12-13 19:31:08 +08:00
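A sketch of the interruption mechanism described above, with hypothetical names: the scan node sets a shared stop flag once the limit is reached, and the reader checks it between batches.

```cpp
#include <atomic>
#include <cstdio>

// Hypothetical reader loop: a shared stop flag set by the scan node when the
// limit is reached lets long-running parquet/orc readers bail out between
// batches instead of running to completion.
struct StopToken {
    std::atomic<bool> stopped{false};
};

void read_batches(StopToken& token, int total_batches) {
    for (int i = 0; i < total_batches; ++i) {
        if (token.stopped.load(std::memory_order_relaxed)) {
            std::printf("reader interrupted after %d batches\n", i);
            return;  // the ScanNode can now close without waiting for the rest
        }
        // ... decode one batch ...
    }
}
```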
764d893cbf Remove unused const variables NUMBER, ZERO in vnumbers_tvf.cpp (#28317)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-12-13 13:16:48 +08:00
13b9350aeb [Bug](scan) fix query timeout in some cases caused by scanners not being scheduled (#28243)
In the pipeline engine, when the result block queue is empty the scan is rescheduled and a batch of scanners is chosen.
But sometimes get_available_thread_slot_num() returns thread_slot_num <= 0, so nothing is scheduled
and the block queue stays empty,
with no chance to reschedule again until the query times out.
2023-12-12 21:00:22 +08:00
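A sketch of the scheduling hazard described above, with hypothetical names: if a scheduling pass silently drops the batch when no thread slot is free, nothing ever refills the block queue, so unscheduled scanners must stay queued for a later retry.

```cpp
#include <deque>

// Hypothetical scheduler step. The original bug: when no thread slot was
// available the batch of scanners was simply forgotten, so the empty block
// queue never got refilled and the query hung until timeout.
struct SchedulerState {
    std::deque<int> pending_scanners;  // scanner ids waiting to run
    int available_thread_slots = 0;
};

// Returns how many scanners were actually submitted; callers must keep the
// rest queued (or arrange another wake-up) instead of dropping them.
int schedule_batch(SchedulerState& s) {
    int submitted = 0;
    while (!s.pending_scanners.empty() && s.available_thread_slots > 0) {
        s.pending_scanners.pop_front();
        --s.available_thread_slots;
        ++submitted;
    }
    return submitted;  // if 0, the scanners stay in pending_scanners for a later retry
}
```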
9861cfc4bc [Fix](Transactional-Hive) Fix transactional hive core dump in TransactionalHiveReader::init_row_filters(). (#28238)
2023-12-12 14:17:26 +08:00
5ff110e845 [exec](profile) only build expr debug string when profile is enabled (#28261) 2023-12-12 09:13:37 +08:00
ac167f493b [fix](join) fix decimal overflow caused by left outer join (#28221)
For left outer join or full outer join, when the build side data is empty, null data is output for the build side, but the nested column data of the nullable column is not properly initialized, which may cause decimal arithmetic overflow.
2023-12-11 11:51:05 +08:00
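A simplified sketch of why the uninitialized nested data matters (toy types, not the Doris column classes): vectorized arithmetic touches every row's nested value, even for rows whose null flag is set, so null rows must still carry a defined value.

```cpp
#include <cstdint>
#include <vector>

// Simplified nullable column: values plus a null map. Rows marked null must
// still carry a defined (e.g. zero) value, because vectorized arithmetic
// processes every row and only consults the null map afterwards.
struct NullableDecimalColumn {
    std::vector<int64_t> values;    // scaled decimal values
    std::vector<uint8_t> null_map;  // 1 = null

    void insert_null() {
        values.push_back(0);        // default-initialize the nested value
        null_map.push_back(1);
    }
};
```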
e1587537bc [Fix](status) fix unhandled status in exprs #28218
which were marked static_cast<void> in https://github.com/apache/doris/pull/23395/files
Partially fixes #28160.
2023-12-11 11:04:58 +08:00
8f2202c89d [minor](log) Add debug info in operators (#28211) 2023-12-11 10:02:24 +08:00
abc802b5ba [bugfix](core) child block is shared between operator and node, it should be shared ptr (#28106)
_child_block in the nested loop join, table value function, and repeat node is shared between the ExecNode and the related operator, but the operator should not hold it as a unique ptr; it belongs to the exec node.

The block would be double freed if the operator's close method was not called correctly.

It should be a shared ptr; then it will not crash even if the operator's close method is not called.
2023-12-09 00:18:14 +08:00
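A minimal sketch of the ownership fix described above, using toy types: when two owners may outlive each other in either order, shared_ptr makes the last one free the block exactly once.

```cpp
#include <memory>
#include <vector>

// Simplified Block stand-in.
struct Block {
    std::vector<int> columns;
};

// The buggy shape: both the exec node and its operator held the child block
// as if they owned it exclusively, so tearing both down freed it twice.
// Sharing it explicitly makes destruction order irrelevant.
struct ExecNode {
    std::shared_ptr<Block> _child_block = std::make_shared<Block>();
};

struct Operator {
    explicit Operator(ExecNode& node) : _child_block(node._child_block) {}
    std::shared_ptr<Block> _child_block;  // shared, not unique: no double free
};

int main() {
    auto node = std::make_unique<ExecNode>();
    Operator op(*node);
    node.reset();  // node destroyed first, even without Operator::close()
    // 'op' still holds a valid Block; the last owner frees it exactly once.
    return 0;
}
```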
d8d8f15bf3 [improvement](vectorization) Use requires instead of specialization for doris::vectorized::Decimal (#28027)
2023-12-08 09:59:52 +08:00
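A minimal sketch of the technique named in the commit, with a toy `Decimal` rather than the Doris type: one constrained template (C++20 `requires`) replaces a family of explicit specializations.

```cpp
#include <cstdint>
#include <iostream>
#include <type_traits>

// Toy stand-in for doris::vectorized::Decimal.
template <typename T>
struct Decimal {
    T value;
};

template <typename T>
inline constexpr bool is_decimal_v = false;
template <typename T>
inline constexpr bool is_decimal_v<Decimal<T>> = true;

// Before: a primary template plus one explicit specialization per Decimal width.
// After: a single overload constrained with `requires`, shared by all Decimal
// instantiations, plus one for plain arithmetic types.
template <typename T>
    requires is_decimal_v<T>
auto to_raw(const T& x) {
    return x.value;
}

template <typename T>
    requires std::is_arithmetic_v<T>
T to_raw(T x) {
    return x;
}

int main() {
    Decimal<int64_t> d{12345};
    std::cout << to_raw(d) + to_raw(7) << '\n';  // 12352
}
```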
9461e86b10 [pipelineX](debug) add debug string (#28137)
* [pipelineX](debug) add debug string

* update
2023-12-07 23:21:10 +08:00
f9d4690023 [improve](stack_trace) avoid printing stack trace in csv and json reader #28129 2023-12-07 22:45:18 +08:00
81a0f8c041 [Feature](function) support generating const values from tvf numbers (#28051)
If const_value is specified, a column of constant values is produced; otherwise an incremental series is produced, as it always has been.

mysql> select * from numbers("number" = "5", "const_value" = "-123");
+--------+
| number |
+--------+
|   -123 |
|   -123 |
|   -123 |
|   -123 |
|   -123 |
+--------+
5 rows in set (0.11 sec)
2023-12-07 22:26:43 +08:00
cb9a6f63ab [refactor](simd_json_reader) refactor simd json parse to adapt stream parse (#27972) 2023-12-07 14:45:15 +08:00
8df94f0d07 [fix](remote-scanner-pool) missing _remote_thread_pool_max_size value (#28057) 2023-12-07 11:18:42 +08:00
54d062ddee [feature](stream load) (step one) Add arrow data type for stream load (#26709)
By using the Arrow data format, we can reduce the amount of data transferred in stream load and improve data import performance.
2023-12-06 23:29:46 +08:00
3e8c75e246 [minor](orc) opt the log info in orc reader (#27951) 2023-12-06 20:47:36 +08:00
1be513b927 [pipelineX](local shuffle) Fix local shuffle for colocate/bucket join (#28032) 2023-12-06 10:02:36 +08:00
fd1db4da3d [agg](profile) fix incorrect profile (#28004) 2023-12-05 20:48:10 +08:00
6074cddcf8 [feature](mtmv)add Job and task tvf (#27967)
add:
select * from jobs("type"="mv");
select * from tasks("type"="mv");
select * from jobs("type"="insert");
select * from tasks("type"="insert");

add check priv for mv_infos("database"="xxx");

change JobType MTMV==>MV
2023-12-05 15:12:36 +08:00
54fe1a166b [Refactor](scan) refactor scan scheduler to improve performance (#27948)
* [Refactor](scan) refactor scan scheduler to improve performance

* fix pipeline x core
2023-12-05 13:03:16 +08:00
2b4c4bb442 [Fix][Opt](parquet-reader) Fix filter push down with decimal types in parquet reader. (#27897)
Fix filter push down with decimal types in parquet reader introduced by #22842
2023-12-04 22:25:39 +08:00
e3d2425d47 [Improvement](join) remove insert_indices_from_join and the special handling for -1 (#27779)
2023-12-04 11:03:22 +08:00
d2a99aa03b [refactor](scan) change scan reschedule into scan context (#27766)
2023-12-04 10:25:52 +08:00
97d36b4f38 [fix](csv_reader) fix trim_double_quotes behavior change (#27882) 2023-12-03 22:57:55 +08:00
fc8b32be7a [Opt](multi-catalog) Opt parquet orc reader numeric copy by memcpy() and memset(). (#27545)
Opt parquet orc reader null map decoding by memset().
2023-12-03 09:55:05 +08:00
54b5d04ff9 [improve](csv_reader) handle csv reader error (#27892) 2023-12-02 10:05:02 +08:00
1706699e7e [fix](multi-catalog)support the max compute partition prune (#27154)
1. Max Compute partition prune.
Previously we only supported filtering MC partitions by '=', which can select just one partition.
To support multiple partition filters and range operators ('>', '<', '>=', ...), partition prune needs to be supported.

2. Add Max Compute row count cache and partitionValues cache.

3. Add Max Compute regression case.
2023-12-01 22:28:26 +08:00