Commit Graph

2512 Commits

Author SHA1 Message Date
5fb27eb652 [fix](compile) fix BE compile failure on Mac (#27206) 2023-11-17 23:52:51 +08:00
5d548935e0 [improvement](insert) support schema change and decommission for group commit (#26359) 2023-11-17 21:41:38 +08:00
c459408580 [fix](jni) avoid BE crash and NPE when close paimon reader (#27129)
1. Do not use a FATAL log when JNI encounters an error, to avoid crashing the BE.
2. Fix an NPE when closing PaimonReader: the reader may not have been assigned if opening the PaimonReader failed.
2023-11-17 20:01:08 +08:00
52995c528e [fix](iceberg) iceberg uses a custom method to encode special characters of field name (#27108)
Fix two bugs:
1. The missing-column check was case sensitive; change the column name to lower case in FE for hive/iceberg/hudi.
2. Iceberg uses a custom method to encode special characters in column names. Decode the column name to match the right column in the parquet reader.
2023-11-17 18:38:55 +08:00
593e3662b0 [Fix](match) fix match null for no index (#26983)
This pull request addresses an issue observed with inverted index tables or tables without indices when querying null values using the MATCH function. 
Previously, executing a query like `SELECT * FROM table WHERE column MATCH null;` would yield incorrect results. 

The update introduces enhanced handling of nullable columns within the MATCH function, ensuring accurate query results when null values are involved.
2023-11-17 15:57:50 +08:00
a0661ed9d2 [Fix](multi-catalog) Fix complex type crash when using dict filter facility in the parquet-reader. (#27151)
- Fix complex type crash when using the dict filter facility in the parquet-reader by turning off the dict filter facility in this case.
- Add orc complex types regression test.
2023-11-17 13:43:58 +08:00
91af86bc78 [fix](function) fix error when using a negative number in explode_numbers #27020 2023-11-17 12:02:14 +08:00
334260dff7 [feature](function) support ip function ipv4stringtonum(ordefault, ornull), inet_aton (#25510) 2023-11-17 10:27:07 +08:00
a4d78682ff [Optimize](point query) clear names to reduce mem consumption and cpu cost related to block column name (#26931) 2023-11-17 10:18:21 +08:00
afffcfd14c [fix](load) skip cancel already cancelled channels (#27111) 2023-11-16 18:38:40 +08:00
e29d8cb110 [feature](move-memtable) support pipelineX in sink v2 (#27067) 2023-11-16 15:00:55 +08:00
3ad865fef9 [refactor](storage) Expressing the types of computation layer and storage layer in PrimitiveTypeTraits (#26191) 2023-11-15 21:34:49 +08:00
035e593b26 remove useless hash function (#26955) 2023-11-15 20:37:21 +08:00
83edcdead9 [enhancement](random_sink) change tablet search algorithm from random to round-robin for random distribution table (#26611)
1. Fix a race condition when getting the tablet load index.
2. Change the tablet selection algorithm from random to round-robin for random-distribution tables when load_to_single_tablet is set to false (a minimal sketch follows).
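A minimal sketch of the round-robin idea, with illustrative names rather than the actual Doris code: a shared atomic counter replaces the random pick, which both removes the race on the load index and spreads rows across tablets evenly.
```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical picker for a random-distribution table: one atomic counter is
// shared by all concurrent loads, so the read-modify-write race of the old
// "random + cached index" approach disappears and tablets are visited
// round-robin.
class RoundRobinTabletPicker {
public:
    explicit RoundRobinTabletPicker(size_t tablet_count) : _tablet_count(tablet_count) {}

    size_t next_index() {
        // fetch_add is atomic: two threads never observe the same ticket.
        uint64_t ticket = _next.fetch_add(1, std::memory_order_relaxed);
        return static_cast<size_t>(ticket % _tablet_count);
    }

private:
    const size_t _tablet_count;
    std::atomic<uint64_t> _next{0};
};
```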
2023-11-15 19:55:31 +08:00
0491437a86 [Opt](scanner-scheduler) Optimize BlockingQueue, BlockingPriorityQueue and change remote scan thread pool. (#26784)
## Proposed changes
- Optimize `BlockingQueue` and `BlockingPriorityQueue` by swapping `notify` and `unlock` to reduce lock contention (see the sketch after this list). Ref: https://www.boost.org/doc/libs/1_54_0/boost/thread/sync_bounded_queue.hpp
- Change remote scan thread pool to `PriorityQueue`.
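A minimal sketch of the locking pattern referenced above, assuming a simplified queue rather than the real Doris `BlockingQueue`: the mutex is released before `notify_one()`, so the woken consumer does not immediately block on a still-held lock.
```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

template <typename T>
class SimpleBlockingQueue {
public:
    void put(T item) {
        {
            std::unique_lock<std::mutex> lock(_mutex);
            _queue.push_back(std::move(item));
        }                        // unlock first ...
        _not_empty.notify_one(); // ... then notify, to reduce lock contention
    }

    T take() {
        std::unique_lock<std::mutex> lock(_mutex);
        _not_empty.wait(lock, [this] { return !_queue.empty(); });
        T item = std::move(_queue.front());
        _queue.pop_front();
        return item;
    }

private:
    std::mutex _mutex;
    std::condition_variable _not_empty;
    std::deque<T> _queue;
};
```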

### Test result
Before:
```
mysql> select  sum(lo_partkey)  from  lineorder;
+-----------------+
| sum(lo_partkey) |
+-----------------+
| 300021444265405 |
+-----------------+
1 row in set (1.11 sec)
```

After:
```
mysql> select  sum(lo_partkey)  from  lineorder;
+-----------------+
| sum(lo_partkey) |
+-----------------+
| 300021444265405 |
+-----------------+
1 row in set (0.80 sec)
```
2023-11-15 18:24:36 +08:00
00896d8954 [fix](agg) fix coredump of multi distinct of decimal128I (#27014)
* [fix](agg) fix coredump of multi distinct of decimal128

* fix
2023-11-15 17:37:20 +08:00
6183b298e1 [refactor](data_type) remove some unused functions (#26966) 2023-11-15 09:23:53 +08:00
cdef768629 [fix](sink) crash caused by wild pointer of counter in VDataStreamSender (#26947)
If preparation fails, the counter _peak_memory_usage_counter is left as a wild pointer and is later dereferenced in VDataStreamSender::close() (a guarded sketch follows the stack trace).

*** SIGSEGV address not mapped to object (@0x454d49545f) received by PID 16992 (TID 18856 OR 0x7f4d05444700) from PID 1296651359; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:417
 1# os::Linux::chained_handler(int, siginfo*, void*) in /app/doris/Nexchip-doris-1.2.4.2-bin-x86_64/java8/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /app/doris/Nexchip-doris-1.2.4.2-bin-x86_64/java8/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo*, void*) in /app/doris/Nexchip-doris-1.2.4.2-bin-x86_64/java8/jre/lib/amd64/server/libjvm.so
 4# 0x00007F55C85B9400 in /lib64/libc.so.6
 5# doris::vectorized::VDataStreamSender::close(doris::RuntimeState*, doris::Status) at /root/doris/be/src/vec/sink/vdata_stream_sender.cpp:734
 6# doris::PlanFragmentExecutor::close() at /root/doris/be/src/runtime/plan_fragment_executor.cpp:543
 7# doris::PlanFragmentExecutor::~PlanFragmentExecutor() at /root/doris/be/src/runtime/plan_fragment_executor.cpp:95
 8# doris::FragmentExecState::~FragmentExecState() at /root/doris/be/src/runtime/fragment_mgr.cpp:112
 9# std::_Sp_counted_ptr<doris::FragmentExecState*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() at /root/ldb/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:348
10# doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /root/doris/be/src/runtime/fragment_mgr.cpp:855
11# doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&) at /root/doris/be/src/runtime/fragment_mgr.cpp:592
12# doris::PInternalServiceImpl::_exec_plan_fragment_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, doris::PFragmentRequestVersion, bool) at /root/doris/be/src/service/internal_service.cpp:463
13# doris::PInternalServiceImpl::_exec_plan_fragment_in_pthread(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*) at /root/doris/be/src/service/internal_service.cpp:305
14# doris::WorkThreadPool<false>::work_thread(int) at /root/doris/be/src/util/work_thread_pool.hpp:160
15# execute_native_thread_routine at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
16# start_thread in /lib64/libpthread.so.0
17# clone in /lib64/libc.so.6
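A hedged sketch of the kind of guard that prevents this crash; the names are illustrative and this is not the actual Doris fix: the counter pointer starts as nullptr, and close() only touches it if prepare() actually created it.
```cpp
#include <cstdint>

// Illustrative stand-in for a RuntimeProfile counter.
struct Counter { int64_t value = 0; };

class DataStreamSenderSketch {
public:
    bool prepare(bool will_fail) {
        if (will_fail) {
            return false;  // counter stays nullptr on the failed-prepare path
        }
        _peak_memory_usage_counter = new Counter();
        return true;
    }

    void close() {
        // Without this check, the uninitialized/wild pointer would be
        // dereferenced here when prepare() failed.
        if (_peak_memory_usage_counter != nullptr) {
            _peak_memory_usage_counter->value = 0;
            delete _peak_memory_usage_counter;
            _peak_memory_usage_counter = nullptr;
        }
    }

private:
    Counter* _peak_memory_usage_counter = nullptr;
};
```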
2023-11-14 19:05:49 +08:00
3585c7e216 [test](parquet) append parquet reader byte_array_decimal and rle_bool case (#26751) 2023-11-14 15:05:10 +08:00
39473cdf48 [performance](load) add vertical segment writer (#24403) 2023-11-14 11:53:09 +08:00
f6a9914bc7 [feature](move-memtable) support auto partition in sink v2 (#26914) 2023-11-14 11:39:44 +08:00
de6ecd2035 [fix](tls) Manually track memory in Allocator instead of mem hook and ThreadContext life cycle to manual control (#26904)
Manually track query/load/compaction/etc. memory in Allocator instead of mem hook.
The Mem Hook can still be used for code segments where memory cannot be tracked manually, and to find memory locations during debugging.
This will cause some loss of memory tracking for queries (less than 10% compared to the past), but it is expected to be more controllable.
Similarly, the Mem Hook will no longer track unowned memory to the orphan mem tracker by default, so the total memory of all MemTrackers will be less than before.
There is no longer a need to get the memory size from jemalloc in the Mem Hook on every alloc and free, which cost performance in the past.
Caching the bthread local in a pthread local for the memory hook is no longer required; in the past this caused core dumps inside bthread, which seems to be a bug in bthread.
ThreadContext life cycle under manual control:
In the past, ThreadContext was created automatically the first time it was used (usually in the jemalloc hook on the first malloc), and was destroyed automatically when the thread exited.
Now the create and destroy of ThreadContext are controlled manually: it is created when the task thread starts and destroyed before the task thread ends (a minimal sketch of the manual tracking idea follows).
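A minimal sketch of tracking memory in the Allocator instead of a malloc hook; `ThreadMemTracker` and the thread-local variable are illustrative assumptions, not the real Doris classes:
```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Illustrative per-thread tracker; the real code attaches it to the task's
// MemTracker via the ThreadContext created when the task thread starts.
struct ThreadMemTracker {
    int64_t consumed = 0;
    void consume(int64_t bytes) { consumed += bytes; }
    void release(int64_t bytes) { consumed -= bytes; }
};

thread_local ThreadMemTracker tls_tracker;

// Allocator that records sizes explicitly, so no jemalloc hook and no
// per-alloc size lookup from the allocator is needed.
struct TrackedAllocator {
    void* alloc(size_t size) {
        tls_tracker.consume(static_cast<int64_t>(size));
        return std::malloc(size);
    }
    void free(void* ptr, size_t size) {
        tls_tracker.release(static_cast<int64_t>(size));
        std::free(ptr);
    }
};
```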
Run 43 clickbench query tests.
Use MemHook in the past:
2023-11-14 10:30:42 +08:00
ec40603b93 [fix](parquet) compressed_page_size has the same meaning in page v1 and v2 (#26783)
1. Parquet files with page v2 were parsed incorrectly when using codecs other than snappy. Because `compressed_page_size` has the same meaning in page v1 and v2, it always contains the bytes of the definition levels, repetition levels, and the compressed data (see the sketch below).
2. Add regression test for `fix_length_byte_array` stored decimal type, and dictionary encoded date/datetime type.
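A small sketch of the size arithmetic implied by point 1; the struct is a simplified, illustrative view of the data page v2 header fields: since `compressed_page_size` also covers the (uncompressed) level bytes, the bytes handed to the decompressor must subtract them.
```cpp
#include <cstdint>

// Simplified view of a Parquet data page v2 header (illustrative fields only).
struct PageV2Header {
    int32_t compressed_page_size;           // levels + compressed data
    int32_t repetition_levels_byte_length;  // stored uncompressed in v2
    int32_t definition_levels_byte_length;  // stored uncompressed in v2
};

// Bytes that actually go through the decompressor for a v2 page.
inline int32_t compressed_data_size(const PageV2Header& h) {
    return h.compressed_page_size - h.repetition_levels_byte_length -
           h.definition_levels_byte_length;
}
```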
2023-11-14 08:30:42 +08:00
b19abac5e2 [fix](move-memtable) pass num local sink to backends (#26897) 2023-11-14 08:28:49 +08:00
de62c00f4e [fix](move-memtable) init auto partition context in VRowDistribution::open (#26911) 2023-11-14 08:16:14 +08:00
5ad49dceaa [fix](scanner_schedule) scanner hangs due to negative num_running_scanners (#26816)
* [fix] scanner hangs due to negative num_running_scanners

Before the patch, num_running_scanners was incremented after submitting the scanner, so it could be decremented before the increment. The resulting negative value could then be seen by get_block_from_queue, and an expected submit did not happen (a minimal sketch of the fixed ordering follows).
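A minimal sketch of the fixed ordering, with illustrative names rather than the real scanner scheduler code: the counter is incremented before the task is submitted, so a scanner that finishes immediately can never drive the count negative.
```cpp
#include <atomic>
#include <functional>

// Illustrative scanner context: only the counter ordering matters here.
struct ScannerContextSketch {
    std::atomic<int> num_running_scanners{0};

    template <typename Submit>
    bool submit_scanner(Submit submit, std::function<void()> scan_task) {
        // Increment BEFORE submitting; the old code incremented afterwards,
        // so a scanner that finished immediately could decrement first and
        // make the counter negative.
        num_running_scanners.fetch_add(1);
        bool ok = submit([this, scan_task] {
            scan_task();
            num_running_scanners.fetch_sub(1);
        });
        if (!ok) {
            // Roll back if the submission itself failed.
            num_running_scanners.fetch_sub(1);
        }
        return ok;
    }
};
```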

Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
2023-11-13 23:03:49 +08:00
504ec324bb Revert "[refactor](scan) delete bloom_filter_predicate (#26499)" (#26851)
This reverts commit 2bb3ef198144954583aea106591959ee09932cba.
2023-11-13 16:27:23 +08:00
2f32a721ee [refactor](jni) unified jni framework for jdbc catalog (#26317)
This commit overhauls the JDBC connector logic within our project, transitioning from the previous mechanism of fetching data through JNI calls for individual ResultSet items to a more efficient and unified approach using the VectorTable data structure.
2023-11-13 14:28:15 +08:00
fa3c7d98c8 [fix](map) the implementation of ColumnMap::replicate was incorrect (#26647) 2023-11-13 12:17:14 +08:00
c0fda8c5c2 [improve](group commit) Add a switch to wait internal group commit lo… (#26734)
* [improve](group commit) Add a switch to make the internal group commit load finish

* modify group commit tvf plan
2023-11-13 10:35:35 +08:00
7332b1b371 [fix](decimal) fix undefined behaviour of divide by zero when casting string to decimal (#26822)
* [fix](decimal) fix undefined behaviour of divide by zero when casting string to decimal

* fix format
2023-11-13 10:09:06 +08:00
d9e0a9fa2e [enhancement](230) print max version and spec version when -230 happens (#26643)
More information is provided.
2023-11-13 09:57:22 +08:00
66054a5c78 [opt](scanner) increase the connection num of s3 client (#26795) 2023-11-12 00:29:11 -06:00
8cf360fff7 [refactor](closure) remove ref count closure using auto release closure (#26718)
1. The closure should be managed by a unique_ptr and released by brpc; it should not be held by our code. If our code holds it, we have to wait for brpc to finish during cancel or close.
2. The closure should be exception safe: if any exception happens, it should not leak memory.
3. Using a specific callback interface implemented by Doris's code, we can write arbitrary code and Doris manages the callback's lifecycle.
4. A weak_ptr is used between the callback and the closure, so if the callback is destructed before the closure's Run, there is no core dump (a minimal sketch follows).
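A hedged sketch of the ownership pattern in items 1 and 4; the types are illustrative, not the real brpc/Doris classes: the closure holds only a weak_ptr to the callback, deletes itself after Run(), and skips the callback if it has already been destroyed.
```cpp
#include <memory>
#include <utility>

// Callback interface implemented by Doris-side code (illustrative).
struct RpcCallback {
    virtual ~RpcCallback() = default;
    virtual void on_done() = 0;
};

// Closure handed to the RPC framework; it releases itself after Run(), so our
// code never has to hold it or wait for it during cancel/close.
class SelfDeletingClosure {
public:
    explicit SelfDeletingClosure(std::weak_ptr<RpcCallback> cb) : _cb(std::move(cb)) {}

    void Run() {
        if (auto cb = _cb.lock()) {
            cb->on_done();  // callback still alive
        }                   // otherwise the callback is gone: no crash
        delete this;        // released here, not by the caller's code
    }

private:
    std::weak_ptr<RpcCallback> _cb;
};
```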
2023-11-12 11:57:46 +08:00
196fadc044 [enhancement](metrics) enhance visibility of flush thread pool (#26544) 2023-11-11 19:53:24 +08:00
2bf48d7829 Revert "[Coverage](BE) Delete vinfo_func in BE (#26562)" (#26723)
This reverts commit 01094fd25ed539a8025066d8823c1e907109048a.
2023-11-10 10:14:11 +08:00
d988193d39 [pipelineX](shuffle) block exchange sink by memory usage (#26595) 2023-11-09 21:28:22 +08:00
c07a70e22a [Fix](orc-reader) Add missing break introduced by #26548. (#26633)
Add missing break introduced by #26548. Sorry for this mistake.
2023-11-09 18:29:44 +08:00
a5565f68b2 [Refactor](opentelemetry) Remove opentelemetry (#26605) 2023-11-09 18:05:34 +08:00
22bf2889e5 [feature](tvf)(jni-avro)jni-avro scanner add complex data types (#26236)
Support Avro's enum, record, and union data types
2023-11-09 13:58:49 +08:00
d1438a8563 [Fix](orc-reader) Fix orc complex types when late materialization was turned on by disabling late materialization in this case. (#26548)
Fix orc complex types when late materialization was turned on in orc reader by disabling late materialization in this case.
2023-11-09 12:05:43 +08:00
01094fd25e [Coverage](BE) Delete vinfo_func in BE (#26562)
Delete vinfo_func in BE
2023-11-09 11:00:15 +08:00
74e452f19c [bug](bitmap) fix bitmap value copy operator not call reset (#26451)
When an empty bitmap is assigned to another bitmap, the target bitmap should reset itself first and then set the empty type (a minimal sketch follows).
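A minimal sketch of the fix as described, using a simplified stand-in for BitmapValue: the copy assignment resets the destination's old state before adopting the source's type, so assigning an empty bitmap really leaves the target empty.
```cpp
#include <cstdint>
#include <set>

// Simplified stand-in for a bitmap value with EMPTY / SINGLE / BITMAP states.
class BitmapSketch {
public:
    void add(uint64_t v) {
        if (_type == EMPTY) { _single = v; _type = SINGLE; }
        else if (_type == SINGLE) { _set = {_single, v}; _type = BITMAP; }
        else { _set.insert(v); }
    }

    BitmapSketch& operator=(const BitmapSketch& other) {
        if (this == &other) return *this;
        reset();  // the fix: clear the old state before adopting the new type
        _type = other._type;
        if (other._type == SINGLE) {
            _single = other._single;
        } else if (other._type == BITMAP) {
            _set = other._set;
        }
        // EMPTY: nothing to copy; without the reset above, the old contents
        // of _single / _set would silently survive.
        return *this;
    }

    void reset() { _type = EMPTY; _single = 0; _set.clear(); }

private:
    enum Type { EMPTY, SINGLE, BITMAP };
    Type _type = EMPTY;
    uint64_t _single = 0;
    std::set<uint64_t> _set;
};
```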
2023-11-09 10:05:09 +08:00
55b2988bfd [Opt](date_add/sub) Throw exception when result of date_add/sub out of range (#26475) 2023-11-09 08:46:51 +08:00
3bce6d3828 [Opt](orc-reader) Optimize orc string dict filter in not_single_conjunct case. (#26386)
Optimize the orc/parquet string dict filter in the not_single_conjunct case. The block is filtered first by dict code and then by the not_single_conjunct; because a dict code is an int, it filters faster than a string.

For example:
```
select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate  and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01';
```
`l_receiptdate` and `l_shipmode` will use string dict filtering, and `l_commitdate < l_receiptdate` is a not_single_conjunct that contains a dict-filtered field. Filtering the block first by dict code and then by the not_single_conjunct is faster because the dict-code comparison is on ints rather than strings (a minimal sketch follows).
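A hedged sketch of the two-phase filtering idea with illustrative names: rows are first filtered by cheap integer dict-code comparisons, and the not_single_conjunct is evaluated only on the survivors.
```cpp
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <vector>

// Phase 1: filter by integer dict codes (cheap); phase 2: evaluate the
// remaining "not single" conjunct only on rows that survived phase 1.
inline std::vector<uint32_t> two_phase_filter(
        const std::vector<int32_t>& dict_codes,                 // per-row dict code
        const std::unordered_set<int32_t>& accepted_codes,      // e.g. codes of 'MAIL', 'SHIP'
        const std::function<bool(uint32_t)>& other_conjunct) {  // e.g. l_commitdate < l_receiptdate
    std::vector<uint32_t> selected;
    for (uint32_t row = 0; row < dict_codes.size(); ++row) {
        if (accepted_codes.count(dict_codes[row]) == 0) continue;  // int compare, fast
        if (other_conjunct(row)) selected.push_back(row);          // row-level check, slower
    }
    return selected;
}
```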

### Test Result:
Before:
```
mysql> select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate  and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01';
+----------------------+
| count(l_receiptdate) |
+----------------------+
|             49314694 |
+----------------------+
1 row in set (6.87 sec)
```

After:
```
mysql> select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate  and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01';
+----------------------+
| count(l_receiptdate) |
+----------------------+
|             49314694 |
+----------------------+
1 row in set (4.85 sec)
```
2023-11-08 18:03:18 +08:00
58bf79f79e [fix](move-memtable) pass load stream num to backends (#26198) 2023-11-08 16:16:33 +08:00
a3666aa87e [feature](decimal) support decimal256 when creating table (#26308) 2023-11-08 15:21:01 +08:00
44b51bf0b9 [Feature](Variant) support variant load (#26572) 2023-11-08 00:37:57 -06:00
a2419a8eb4 [enhancement](sink) refactor code of auto partition and where clause and enable them on sinkv2 (#26432)
For better performance and elasticity, we move the memtable from the load channel to
the sink. VTabletSinkV2 is introduced, so both VTabletWriter and
VTabletSinkV2 now distribute rows to tablets. Where clauses on MVs are
executed in VTabletWriter, and VTabletSinkV2 needs them too, so the common code
is moved to row distribution.

Layering the code along the rows' data flow makes it much easier to
understand and maintain:

ScanNode -> Sink/Writer (RowDistribution -> IndexChannel / DeltaWriter)
2023-11-08 11:51:40 +08:00
1544110c1b [feature-wip](arrow-flight)(step4) Support other DML and DDL statements, besides Select (#25919)
Design Documentation Linked to #25514
2023-11-08 10:50:42 +08:00