Commit Graph

4770 Commits

Author SHA1 Message Date
5d2739b5c5 [Fix](submodule) revert clucene version wrong rollback (#21523) 2023-07-05 19:10:15 +08:00
242a35fa80 [fix](s3) fix s3 fs benchmark tool (#21401)
1. fix concurrency bug of s3 fs benchmark tool, to avoid crash on multi thread.
2. Add `prefetch_read` operation to test prefetch reader.
3. add `AWS_EC2_METADATA_DISABLED` env in `start_be.sh` to avoid call ec2 metadata when creating s3 client.
4. add `AWS_MAX_ATTEMPTS` env in `start_be.sh` to avoid warning log of s3 sdk.
2023-07-05 16:20:58 +08:00
39590f95b0 [pipeline](load) return error status in pipeline load (#21303) 2023-07-05 16:13:32 +08:00
d8a549fe61 [Fix](Comment) Comment should be in English (#20964) 2023-07-05 15:41:34 +08:00
38c8657e5e [improve](memory) more grace logging for memory exceed limit (#21311)
more grace logging for Allocator and MemTracker when memory exceed limit
fix bthread grace exit.
2023-07-05 14:59:06 +08:00
Pxl
f02bec8ad1 [Chore](runtime filter) change runtime filter dcheck to error status or exception (#21475)
change runtime filter dcheck to error status or exception
2023-07-05 14:03:55 +08:00
0469c02202 [Test](regression) Temporarily disable quickTest for SHOW CREATE TABLE to adapt to enable_feature_binlog=true (#21247) 2023-07-05 10:12:02 +08:00
122f5f6c2d [enchanment](udf) add more info when download jar package failed (#21440)
when download jar package, some times show the checksum is not equal,
but the root reason is unknown, now add some error msg if failed.
2023-07-04 20:35:35 +08:00
3b73604f74 [fix](memory) fix jemalloc purge arena dirty pages core dump (#21486)
Issue Number: close #xxx

jemalloc/jemalloc#2470
Occasional core dump during stress test.
2023-07-04 20:35:13 +08:00
81ee4d7402 [performance](group_concat) avoid extra copy in group_concat (#21432)
avoid extra copy in group_concat
2023-07-04 20:21:44 +08:00
13fb69550a [improvement](kerberos) disable hdfs fs handle cache to renew kerberos ticket at fix interval (#21265)
Add a new BE config `kerberos_ticket_lifetime_seconds`, default is 86400.
Better set it same as the value of `ticket_lifetime` in `krb5.conf`
If a HDFS fs handle in cache is live longer than HALF of this time, it will be set as invalid and recreated.
And the kerberos ticket will be renewed.
2023-07-04 17:13:34 +08:00
9adbca685a [opt](hudi) use spark bundle to read hudi data (#21260)
Use spark-bundle to read hudi data instead of using hive-bundle to read hudi data.

**Advantage** for using spark-bundle to read hudi data:
1. The performance of spark-bundle is more than twice that of hive-bundle
2. spark-bundle using `UnsafeRow` can reduce data copying and GC time of the jvm
3. spark-bundle support `Time Travel`, `Incremental Read`, and `Schema Change`, these functions can be quickly ported to Doris

**Disadvantage** for using spark-bundle to read hudi data:
1. More dependencies make hudi-dependency.jar very cumbersome(from 138M -> 300M)
2. spark-bundle only provides `RDD` interface and cannot be used directly
2023-07-04 17:04:49 +08:00
90dd8716ed [refactor](multicast) change the way multicast do filter, project and shuffle (#21412)
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>

1. Filtering is done at the sending end rather than the receiving end
2. Projection is done at the sending end rather than the receiving end
3. Each sender can use different shuffle policies to send data
2023-07-04 16:51:07 +08:00
09f414e0f4 fix lru cache handle field order (#21435)
For LRUHandle, all fields should be put ahead of key_data.
The LRUHandle is allocated using malloc and starting from field key_data is for key data.
2023-07-04 16:10:05 +08:00
b5da3f74f5 [improvement](join) avoid unnecessary copying in _build_output_block (#21360)
If the source columns are mutually exclusive within a temporary block, there is no need to duplicate the data.
2023-07-04 12:13:49 +08:00
b86dd11a7d [fix](pipeline) refactor olap table sink close (#20771)
For pipeline, olap table sink close is divided into three stages, try_close() --> pending_finish() --> close()
only after all node channels are done or canceled, pending_finish() will return false, close() will start.
this will avoid block pipeline on close().

In close, check the index channel intolerable failure status after each node channel failure,
if intolerable failure is true, the close will be terminated in advance, and all node channels will be canceled to avoid meaningless blocking.
2023-07-04 11:27:51 +08:00
b1c16b96d6 [refactor](load) move validator out of VOlapTableSink (#21460) 2023-07-04 10:16:56 +08:00
938c0765cd [improvement](memory) improve inserting sparse rows into string column (#21420)
For the following test, which simulate hash join outputing 435699854 rows from 5131 buiding rows:

    {
        auto col = doris::vectorized::ColumnString::create();
        constexpr int build_rows = 5131;
        constexpr int output_rows = 435699854;
        std::string str("01234567");
        for (int i = 0; i < build_rows; ++i) {
            col->insert_data(str.data(), str.size());
        }
        int indices[output_rows];
        for (int i = 0; i < output_rows; ++i) {
            indices[i] = i % build_rows;
        }
        auto col2 = doris::vectorized::ColumnString::create();
        doris::MonotonicStopWatch watch;
        watch.start();
        col2->insert_indices_from(*col, indices, indices + output_rows);
        watch.stop();
        LOG(WARNING) << "string column insert_indices_from, rows: " << output_rows << ", time: " << doris::PrettyPrinter::print(watch.elapsed_time(), doris::TUnit::TIME_NS);
    }
The ColumnString::insert_indices_from inserting time improve from 6s665ms to 3s158ms:

W0702 23:08:39.672044 1277989 doris_main.cpp:545] string column insert_indices_from, rows: 435699854, time: 3s153ms
W0702 23:09:36.368853 1282061 doris_main.cpp:545] string column insert_indices_from, rows: 435699854, time: 3s158ms


W0703 00:30:26.093307 1468640 doris_main.cpp:545] string column insert_indices_from, rows: 435699854, time: 6s761ms
W0703 00:31:21.043638 1472937 doris_main.cpp:545] string column insert_indices_from, rows: 435699854, time: 6s665ms
2023-07-04 09:34:10 +08:00
790b771a49 [improvement](execute) Eliminate virtual function calls when serializing and deserializing aggregate functions (#21427)
Eliminate virtual function calls when serializing and deserializing aggregate functions.

For example, in AggregateFunctionUniq::deserialize_and_merge method, calling read_pod_binary(ref, buf) in the for loop generates a large number of virtual function calls.

void deserialize_and_merge(AggregateDataPtr __restrict place, BufferReadable& buf,
                           Arena* arena) const override {
    auto& set = this->data(place).set;
    UInt64 size;
    read_var_uint(size, buf);
    set.rehash(size + set.size());
    for (size_t i = 0; i < size; ++i) {
        KeyType ref;
        read_pod_binary(ref, buf);
        set.insert(ref);
    }
}

template <typename Type>
void read_pod_binary(Type& x, BufferReadable& buf) {
    buf.read(reinterpret_cast<char*>(&x), sizeof(x));
}
BufferReadable has only one subclass, VectorBufferReader, so it is better to implement the BufferReadable class directly.

The following sql was tested on SSB-flat dataset:

SELECT COUNT (DISTINCT lo_partkey), COUNT (DISTINCT lo_suppkey) FROM lineorder_flat;
before: MergeTime: 415.398ms
after opt: MergeTime: 174.660ms
2023-07-04 09:26:37 +08:00
Pxl
f7c724f8a3 [Bug](excution) avoid core dump on filter_block_internal and add debug information (#21433)
avoid core dump on filter_block_internal and add debug information
2023-07-03 18:10:30 +08:00
7e02566333 [fix](pipeline) fix coredump caused by uncaught exception (#21387)
For pipeline engine, ExecNode::create_tree may throw exception sometimes, e.g. SELECT MIN(-3.40282347e+38) FROM t0; will throw a exception bacause of invalid decimal precision.

*** Query id: 346886bf48494e77-96eeea5361233618 ***
*** Aborted at 1688101183 (unix time) try "date -d @1688101183" if you are using GNU date ***
*** Current BE git commitID: 2fcb0e090b ***
*** SIGABRT unknown detail explain (@0x13ef42) received by PID 1306434 (TID 1306918 OR 0x7ff0763e1700) from PID 1306434; stack trace: ***
terminate called recursively
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:413
 1# 0x00007FFA8780E090 in /lib/x86_64-linux-gnu/libc.so.6
 2# raise at ../sysdeps/unix/sysv/linux/raise.c:51
 3# abort at /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:81
 4# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75
 5# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
 6# 0x000055B6C30C7401 in /mnt/ssd01/doris-master/VEC_ASAN/be/lib/doris_be
 7# 0x000055B6C30C7554 in /mnt/ssd01/doris-master/VEC_ASAN/be/lib/doris_be
 8# doris::vectorized::create_decimal(unsigned long, unsigned long, bool) at /home/zcp/repo_center/doris_master/doris/be/src/vec/data_types/data_type_decimal.cpp:167
 9# doris::vectorized::DataTypeFactory::create_data_type(doris::TypeDescriptor const&, bool) at /home/zcp/repo_center/doris_master/doris/be/src/vec/data_types/data_type_factory.cpp:185
10# doris::vectorized::AggFnEvaluator::AggFnEvaluator(doris::TExprNode const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exprs/vectorized_agg_fn.cpp:79
11# std::unique_ptr<doris::vectorized::AggFnEvaluator, std::default_delete<doris::vectorized::AggFnEvaluator> > doris::vectorized::AggFnEvaluator::create_unique<doris::TExprNode const&>(doris::TExprNode const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exprs/vectorized_agg_fn.h:49
12# doris::vectorized::AggFnEvaluator::create(doris::ObjectPool*, doris::TExpr const&, doris::TSortInfo const&, doris::vectorized::AggFnEvaluator**) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exprs/vectorized_agg_fn.cpp:92
13# doris::vectorized::AggregationNode::init(doris::TPlanNode const&, doris::RuntimeState*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/vaggregation_node.cpp:158
14# doris::ExecNode::create_tree_helper(doris::RuntimeState*, doris::ObjectPool*, std::vector<doris::TPlanNode, std::allocator<doris::TPlanNode> > const&, doris::DescriptorTbl const&, doris::ExecNode*, int*, doris::ExecNode**) at /home/zcp/repo_center/doris_master/doris/be/src/exec/exec_node.cpp:276
15# doris::ExecNode::create_tree(doris::RuntimeState*, doris::ObjectPool*, doris::TPlan const&, doris::DescriptorTbl const&, doris::ExecNode**) at /home/zcp/repo_center/doris_master/doris/be/src/exec/exec_node.cpp:231
16# doris::pipeline::PipelineFragmentContext::prepare(doris::TPipelineFragmentParams const&, unsigned long) at /home/zcp/repo_center/doris_master/doris/be/src/pipeline/pipeline_fragment_context.cpp:253
17# doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_1::operator()(int) const at /home/zcp/repo_center/doris_master/doris/be/src/runtime/fragment_mgr.cpp:895
18# doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0::operator()() const at /home/zcp/repo_center/doris_master/doris/be/src/runtime/fragment_mgr.cpp:926
19# void std::__invoke_impl<void, doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0&>(std::__invoke_other, doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61
2023-07-03 10:58:13 +08:00
a3d34e1e08 [decimalv2](compatibility) add config to allow invalid decimalv2 literal (#21327) 2023-07-03 10:55:27 +08:00
Pxl
59c1bbd163 [Feature](materialized view) support query match mv with agg_state on nereids planner (#21067)
* support create mv contain aggstate column

* update

* update

* update

* support query match mv with agg_state on nereids planner

update

* update

* update
2023-07-03 10:19:31 +08:00
Pxl
f90e8fcb26 [Chore](storage) add debug info for TabletColumn::get_aggregate_function (#21408) 2023-07-03 10:02:44 +08:00
ca0953ea51 [improvement](join) Serialize build keys in a vectorized (columnar) way (#21361)
There is a significant performance improvement in serializing keys in the aggregate node through vectorization. Now, applying it to the join node also brings performance improvement.
2023-07-03 09:29:10 +08:00
1c961f2272 [refactor](load) move generate_delete_bitmap from memtable to beta rowset writer (#21329) 2023-07-01 17:22:45 +08:00
4ad3a7a8de [fix](exec) run exec_plan_fragment in pthread to avoid BE crash (#21343)
If there is only one fragment of a query plan, FE will call `exec_plan_fragment` rpc to BE.
And on BE side, the `exec_plan_fragment()` will be executed directly in bthread, but it may call
some JNI method like `AttachCurrentThread()`, which will return error in bthread.

So I modify the `exec_plan_fragment` to make sure it will be executed in pthread pool.
2023-07-01 12:29:22 +08:00
1fe04b7242 [Chore](metrics) remove trace metrics code using runtime profile instead (#21394)
* commit

* fix

* format
2023-07-01 12:18:23 +08:00
Pxl
88cbea2b56 [Bug](agg-state) fix core dump on not nullable argument for aggstate's nested argument (#21331)
fix core dump on not nullable argument for aggstate's nested argument
2023-06-30 18:20:25 +08:00
b7d6a70868 [FIX](datatype) Implement hash func with array/map/struct type (#21334)
we do not Implement any hash functions in array/map/struct column , so we use sql like this will make be core

select * from (
        select
            bdp.nc_num,
            collect_list(distinct(bd.catalog_name)) as catalog_name,
            material_qty
        from
            dataease.bu_delivery_product bdp
            left join dataease.bu_trans_transfer btt on bdp.delivery_product_id = btt.delivery_product_id
            left join dataease.bu_delivery bd on bdp.delivery_id = bd.delivery_id
        where
            bd.val_status in ('10', '20', '30', '90')
            and bd.delivery_type in (0, 1, 2)
        group by
            nc_num,
            material_qty
        union
        ALL
        select
            bdp.nc_num,
            collect_list(distinct(bd.catalog_name)) as catalog_name,
            material_qty
        from
            dataease.bu_trans_transfer btt
            left join dataease.bu_delivery_product bdp on bdp.delivery_product_id = btt.delivery_product_id
            left join dataease.bu_delivery bd on bdp.delivery_id = bd.delivery_id
        where
            bd.val_status in ('10', '20', '30', '90')
            and bd.delivery_type in (0, 1, 2)
        group by
            nc_num,
            material_qty
) aa;
core :
2023-06-30 17:11:35 +08:00
53f90cb2e3 [fix](load) fix tablet id in RowsetWriterContext (#21336) 2023-06-30 14:59:43 +08:00
25b5bab22d [fix](memory) Fix hash table buf initialize null pointer (#21315)
When compiling FunctionArrayEnumerateUniq::_execute_by_hash, AllocatorWithStackMemory::free(buf)
will be called when delete HashMapContainer. the gcc compiler will think that size > N and buf is not heap memory,
and report an error ' void free(void*)' called on unallocated object 'hash_map'

This only fails on doris docker + gcc 11.1, no problem on doris docker + clang 16.0.1,
no problem on ldb_toolchanin gcc 11.1 and clang 16.0.1.
2023-06-30 14:50:53 +08:00
1ac724c2dd [enhance](BufferedReader) don't blocking wait on buffered reader's condition variable (#21153) 2023-06-30 14:34:27 +08:00
d76fa427a3 [improve](jsonb)Invalid json path prompts an error instead of null (#19646)
1. Invalid json path prompts an error instead of null:
before:
```sql
mysql> SELECT jsonb_extract('[{"k1":"v41","k2":400},1,"a",3.14]', '$[a]');
+-------------------------------------------------------------+
| jsonb_extract('[{"k1":"v41","k2":400},1,"a",3.14]', '$[a]') |
+-------------------------------------------------------------+
| NULL                                                        |
+-------------------------------------------------------------+
1 row in set (0.01 sec)
```
now
```sql
mysql> SELECT jsonb_extract('[{"k1":"v41","k2":400},1,"a",3.14]', '$[a]');
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]Json path error: Invalid Json Path for value: $[a]
```
2. fix some problem: https://github.com/apache/doris/pull/19185
   a. support negative numbers
```sql
mysql> SELECT jsonb_extract('[{"k1":"v41","k2":400},1,"a",3.14]', '$[-2]');
+--------------------------------------------------------------+
| jsonb_extract('[{"k1":"v41","k2":400},1,"a",3.14]', '$[-2]') |
+--------------------------------------------------------------+
| "a"                                                          |
+--------------------------------------------------------------+
1 row in set (0.02 sec)
```
  b. Avoid using unnecessary memory
3. Supplementary regression test
2023-06-30 14:29:21 +08:00
df23ab3f29 [Enhancement](tvf) Add authentication for workload group tvf (#21323) 2023-06-30 12:56:23 +08:00
2fcb0e090b [Fix](Snapshot) Shoule use false instead of 0 in while loop (#20966) 2023-06-30 10:22:51 +08:00
33fa5dd1e9 [fix](cast) fix coredump of cast string of invalid datetime (#21350)
For sql like select cast("627492340" as datetime); the string is an invalid datetime, function DateV2Value<T>::from_date_str cast it as datetime 2062-74-92 23:40:00, with an out-of-range month and day value, which cause memory violation in function DateV2Value<T>::format_datetime when trying to access s_days_in_month.

==256444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55a7c1a5cff8 at pc 0x55a7e5aa3d2a bp 0x7f3b805f0370 sp 0x7f3b805f0368
READ of size 4 at 0x55a7c1a5cff8 thread T390 (FragmentMgrThre)
    #0 0x55a7e5aa3d29 in doris::vectorized::DateV2Value<doris::vectorized::DateTimeV2ValueType>::format_datetime(unsigned int*, bool*) const /home/zcp/repo_center/doris_master/doris/be/src/vec/runtime/vdatetime_value.cpp:1821:31
    #1 0x55a7e5aa3052 in doris::vectorized::DateV2Value<doris::vectorized::DateTimeV2ValueType>::from_date_str(char const*, int, int) /home/zcp/repo_center/doris_master/doris/be/src/vec/runtime/vdatetime_value.cpp:1968:5
    #2 0x55a7d48f0c49 in bool doris::vectorized::read_datetime_v2_text_impl<unsigned long>(unsigned long&, doris::vectorized::ReadBuffer&, unsigned int) /home/zcp/repo_center/doris_master/doris/be/src/vec/io/io_helper.h:309:19
    #3 0x55a7ddb21642 in bool doris::vectorized::try_read_datetime_v2_text<unsigned long>(unsigned long&, doris::vectorized::ReadBuffer&, unsigned int) /home/zcp/repo_center/doris_master/doris/be/src/vec/io/io_helper.h:409:12
    #4 0x55a7ddb215ec in bool doris::vectorized::try_parse_impl<doris::vectorized::DataTypeDateTimeV2, unsigned int, void*>(doris::vectorized::DataTypeDateTimeV2::FieldType&, doris::vectorized::ReadBuffer&, DateLUTImpl const*, unsigned int) /home/zcp/repo_center/doris_master/doris/be/src/vec/functions/function_cast.h:839:16
    #5 0x55a7ddb21c84 in auto doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*)::'lambda'(void*, auto)::operator()<std::integral_constant<bool, false>, std::integral_constant<bool, true>>(void*, auto) const /home/zcp/repo_center/doris_master/doris/be/src/vec/functions/function_cast.h:1340:38
    #6 0x55a7ddb216f7 in void* std::__invoke_impl<doris::Status, doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*)::'lambda'(void*, auto), std::integral_constant<bool, false>, std::integral_constant<bool, true>>(std::__invoke_other, auto&&, std::integral_constant<bool, false>&&, std::integral_constant<bool, true>&&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
    #7 0x55a7ddb2167f in std::__invoke_result<void*, std::integral_constant<bool, false>, std::integral_constant<bool, true>>::type std::__invoke<doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*)::'lambda'(void*, auto), std::integral_constant<bool, false>, std::integral_constant<bool, true>>(void*&&, std::integral_constant<bool, false>&&, std::integral_constant<bool, true>&&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14
    #8 0x55a7ddb20d14 in std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<std::__detail::__variant::__deduce_visit_result<doris::Status> (*)(doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*)::'lambda'(void*, auto)&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&)>, std::integer_sequence<unsigned long, 0ul, 1ul>>::__visit_invoke(doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*)::'lambda'(void*, auto)&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/variant:1013:11
    #9 0x55a7ddb20c15 in decltype(auto) std::__do_visit<std::__detail::__variant::__deduce_visit_result<doris::Status>, doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*)::'lambda'(void*, auto), std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>>(auto&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/variant:1714:14
    #10 0x55a7ddb20b6a in decltype(auto) std::visit<doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*)::'lambda'(void*, auto), std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>>(void*&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/variant:1769:9
    #11 0x55a7ddb205ff in doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*) /home/zcp/repo_center/doris_master/doris/be/src/vec/functions/function_cast.h:1321:23
    #12 0x55a7ddb1f2c7 in doris::vectorized::FunctionConvertFromString<doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute_impl(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long) /home/zcp/repo_center/doris_master/doris/be/src/vec/functions/function_cast.h:1417:20
2023-06-30 10:12:31 +08:00
a3033bff42 [Fix](s3FileWriter) fix bytes_appended bug for s3_file_writer (#21348) 2023-06-29 22:06:49 +08:00
2643f3a167 [fix](merge-on-write) fix dead lock when publish (#21339) 2023-06-29 20:58:47 +08:00
d00326549f [Fix](inverted index) fix a bundle of inverted index exception process errors (#21328)
1. fix alloc/free mismatch problem.
2. fix buffer use after free problem.
2023-06-29 19:47:55 +08:00
41ccf77c7d [feature][fix](fs)(s3)add fs_s3 benchmark tool and fix s3 file writer bug (#20926)
1.
Fix bug that the field of s3_file_write_bufferpool is not initialized, causing undefined behavior.

2.
add fs_s3 benchmark tool,Reference to the usage of tools https://github.com/apache/doris/pull/20770
And opt the output:

`sh bin/run-fs-benchmark.sh --conf=conf/s3.conf --fs_type=s3 --operation=single_read --threads=1 --iterations=1`

```
------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                    Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1              7366 ms          123 ms            1 ReadRate(B/S)=12.1823M/s ReadTime(S)=7.36572 ReadTotal(B)=89.7314M
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1              6163 ms          116 ms            1 ReadRate(B/S)=14.5597M/s ReadTime(S)=6.16299 ReadTotal(B)=89.7314M
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1              6048 ms          110 ms            1 ReadRate(B/S)=14.8366M/s ReadTime(S)=6.04796 ReadTotal(B)=89.7314M
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_mean         6526 ms          116 ms            3 ReadRate(B/S)=13.8596M/s ReadTime(S)=6.52556 ReadTotal(B)=89.7314M
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_median       6163 ms          116 ms            3 ReadRate(B/S)=14.5597M/s ReadTime(S)=6.16299 ReadTotal(B)=89.7314M
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_stddev        730 ms         6.68 ms            3 ReadRate(B/S)=1.45914M/s ReadTime(S)=0.729876 ReadTotal(B)=0
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_cv          11.18 %          5.75 %             3 ReadRate(B/S)=10.53% ReadTime(S)=11.18% ReadTotal(B)=0.00%
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_max          7366 ms          123 ms            3 ReadRate(B/S)=14.8366M/s ReadTime(S)=7.36572 ReadTotal(B)=89.7314M
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_min          6048 ms          110 ms            3 ReadRate(B/S)=12.1823M/s ReadTime(S)=6.04796 ReadTotal(B)=89.7314M
```
2023-06-29 19:03:49 +08:00
8c532e8808 [fix](restore) work around, ingest binlog after backup/restore which local_tablet.partition_id is not correct, use req.partition_id (#21288)
* work around, ingest binlog after backup/restore which local_tablet.partition_id is not correct, use by
req.partition_id

Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-06-29 17:19:02 +08:00
Pxl
45f1909bc3 [Bug](lateral-view) make lateral view function's nullable mode work (#21242)
make lateral view function's nullable mode work
2023-06-29 10:50:07 +08:00
7f0e37069f [improvement](olap) filter the whole segment by dictionary (#21239) 2023-06-29 10:34:29 +08:00
3f99b91ddf [fix](gc_binlog) Fix tablet gc_binlogs nullptr (#21158) 2023-06-29 10:10:33 +08:00
Pxl
f8cfe5e579 [Bug](pipeline) add DCHECK for _instance_to_sending_by_pipeline = false on _send_rpc (#21169)
add DCHECK for _instance_to_sending_by_pipeline = false on _send_rpc
2023-06-29 10:03:57 +08:00
86af533e83 [Enhancement](heartbeat) make heartbeat ok when config repeated host-ip pairs (#21228) 2023-06-28 23:12:06 +08:00
a6b51ec19a [Feature](avro) Support Apache Avro file format (#19990)
support read avro file by hdfs() or s3() .
```sql
select * from s3(
         "uri" = "http://127.0.0.1:9312/test2/person.avro",
         "ACCESS_KEY" = "ak",
         "SECRET_KEY" = "sk",
         "FORMAT" = "avro");
+--------+--------------+-------------+-----------------+
| name   | boolean_type | double_type | long_type       |
+--------+--------------+-------------+-----------------+
| Alyssa |            1 |     10.0012 | 100000000221133 |
| Ben    |            0 |    5555.999 |      4009990000 |
| lisi   |            0 | 5992225.999 |      9099933330 |
+--------+--------------+-------------+-----------------+

select * from hdfs(
                "uri" = "hdfs://127.0.0.1:9000/input/person2.avro",
                "fs.defaultFS" = "hdfs://127.0.0.1:9000",
                "hadoop.username" = "doris",
                "format" = "avro");
+--------+--------------+-------------+-----------+
| name   | boolean_type | double_type | long_type |
+--------+--------------+-------------+-----------+
| Alyssa |            1 |  8888.99999 |  89898989 |
+--------+--------------+-------------+-----------+
```

current avro reader only support common data type, the complex data types will be supported later.
2023-06-28 21:15:35 +08:00
d2c42ec638 [fix](memory) Purge Jemalloc arena dirty pages when memory insufficient (#21237)
Jemalloc dirty page only use madvise MADV_FREE, memory is not release back to system, RSS won't reduce in time,

So when the process memory exceed limit or system available memory is insufficient,
manually transfer dirty page to the muzzy page, which will call MADV_DONTNEED to release the physical memory back to the system.

https://jemalloc.net/jemalloc.3.html#opt.dirty_decay_ms
2023-06-28 16:49:45 +08:00
0396f78590 [fix](memory) Remove ChunkAllocator & fix Allocator no use mmap (#21259) 2023-06-28 16:10:24 +08:00