Commit Graph

5049 Commits

Author SHA1 Message Date
a0463ea047 [round](decimalv2) round decimalv2 to precision value (#22138)
* [round](decimalv2) round decimalv2 to precision value

* update

* update`
2023-07-25 03:29:48 +08:00
752cec9e19 [Fix](multi-catalog) Fix not single slot filter conjuncts with dict filter issue. (#22052)
### Issue
Dictionary filtering is a mechanism that directly reads the dictionary encoding of a single string column filter condition for filter comparison. But dictionary filtered single string columns may be included in other multi-column filter conditions. This can cause problems.

For example:
`select * from multi_catalog.lineitem_string_date_orc where l_commitdate < l_receiptdate and l_receiptdate = '1995-01-01'  order by l_orderkey, l_partkey, l_suppkey, l_linenumber limit 10;`

`l_receiptdate` is string filter column,it is included by multi-column filter condition `l_commitdate < l_receiptdate`.

### Solution
Resolve it by separating the multi-column filter conditions and executing it after the dictionary filter column is converted to string.
2023-07-24 22:31:18 +08:00
7fcf702081 [improvement](multi catalog)paimon support filesystem metastore (#21910)
1.support filesystem metastore

2.support predicate and project when split

3.fix partition table query error

todo: Now you need to manually put paimon-s3-0.4.0-incubating.jar in be/lib/java_extensions when use s3 filesystem

doc pr: #21966
2023-07-24 22:02:57 +08:00
30c21789c8 [opt](filecache) use weak_ptr to cache the file handle of file segment (#21975)
Use weak_ptr to cache the file handle of file segment. The max cached number of file handles can be configured by `file_cache_max_file_reader_cache_size`, default `1000000`.
Users can inspect the number of cached file handles by request BE metrics: `http://be_host:be_webserver_port/metrics`:
```
# TYPE doris_be_file_cache_segment_reader_cache_size gauge
doris_be_file_cache_segment_reader_cache_size{path="/mnt/datadisk1/gaoxin/file_cache"} 2500
```
2023-07-24 19:09:27 +08:00
d2531db1cf [fix](inverted index) fix regression case test_index_change_7 occasional failure (#22066) 2023-07-24 15:39:08 +08:00
e146969376 [Fix](config) delete unuse lazy open config #22136 2023-07-24 15:02:34 +08:00
99bf901607 [fix](in) throw exception for unsupported data type of in expr (#22050) 2023-07-24 14:13:31 +08:00
86e80ae175 [enhancement](merge-on-write) support concurrent delete bitmap calc while close_wait (#21488) 2023-07-24 10:09:28 +08:00
Pxl
19ba6bec38 [Improvement](pipeline) support send eos on local exchange and remove some unused code (#22086)
support send eos on local exchange and remove some unused code
2023-07-24 09:25:32 +08:00
d0219062ef [refactor](be) use std::move to improve performance of push_back #22056 2023-07-24 08:51:28 +08:00
0396ac9d38 fix(compaction) release the block and segment iterator after reading to the end of the segment file (#22082)
When reading to the end of the segment file, clearing the block did not release the memory, leading to high memory usage during compaction.

When reading through segment file for columns that are dictionary encoded, the column iterator in the segment iterator will hold the dictionary. Release the segment iterator to free up the dictionary.
2023-07-24 08:47:19 +08:00
ddd7e9871d [improvement](Jsonb) optimization Jsonb path parse (#21495)
The previous logic was to read jsonbvalue while parsing the json path. For complex json paths, there will be a lot of repeated parsing work. The optimization idea is to separate the analysis and value of jsonpath
2023-07-23 18:59:12 +08:00
4f0158c458 [fix](partial-update) fix update core for merge-on-write table (#22090) 2023-07-23 13:35:08 +08:00
2c16fe0da9 [bugfix](runtimefilter) runtime filter is shared between multi instances with same node id, should not cache exprs (#22114)
runtime filter is shared among multi instances.
in the past, we cached pushdown expr(runtime filter generated)
every scannode[runtime filter consumer] will try to call prepare expr
but the expr may generated with different fn_context_id
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-07-23 13:04:33 +08:00
256051a965 [bug](node) fix partiton sort node core dump when eos (#22108)
fix partiton sort node core dump when eos
2023-07-23 12:00:53 +08:00
f8307f1a1a [bugfix](scanner) when scanner init failed during get tablet, not need call update counters (#22117)
Co-authored-by: yiguolei <yiguolei@gmail.com>
If the scanner is failed during init or open, then not need update counters because the query is fail and the counter is useless.
And it may core during update counters. For example, update counters depend on scanner's tablet, but the tablet == null when init failed.
2023-07-23 10:19:20 +08:00
Pxl
ae809fbeba [Bug](storage )fix dead lock when create_tablet need lock two tablet && update mv_p0… (#21969)
fix dead lock when create_tablet need lock two tablet && update mv_p0/ssb case
2023-07-22 15:27:05 +08:00
50c8563f35 [fix](partial update) fix some bugs of sequence column (#21896) 2023-07-22 15:26:48 +08:00
f7e3cc1553 [FIX](map)fix map proto contains_null #22107
when we select map in order by and limit; be node will coredump
2023-07-22 10:41:55 +08:00
afeac4419f [Bug](node) fix partition sort node forget handle some type of key in hashmap (#22037)
* [enhancement](repeat) add filter in repeat node in BE

* update
2023-07-21 23:30:40 +08:00
f7ac827c90 [fix](compaction) fix time series compaction point policy (#21670) 2023-07-21 23:09:02 +08:00
ef01988ae1 [opt](inverted index) support the same column create different type index (#21972) 2023-07-21 23:02:39 +08:00
bed940b7fc [fix](log) column index off-by-one error in scanner logs (#19747) 2023-07-21 18:30:01 +08:00
40299d280d [Fix](json reader) fix rapidjson array->PushBack may take ownership… (#21988)
With bellow json path
`["$.data","$.data.datatimestamp"]`

After `array_obj->PushBack` the `data` field owner will be taken from array_obj, and lead to null values for json path `$.data.datatimestamp`

Rapidjson doc:
```
//! Append a GenericValue at the end of the array.
  \note The ownership of \c value will be transferred to this array on success.
 */
GenericValue& PushBack(GenericValue& value, Allocator& allocator);
```
2023-07-21 17:02:01 +08:00
d1c5025bce [Fix](compaction) add error message when load unsupport compression code (#22033) 2023-07-21 16:50:46 +08:00
6b20cdb170 [Fix](compaction) Fix SizeBasedCumulativeCompactionPolicy pick_input_rowsets (#21732) 2023-07-21 16:48:53 +08:00
2b2ac10e93 [feature](partial update) add failure tolerance for strict mode partial update stream load 2023-07-21 16:46:44 +08:00
db69af1165 [fix](meger-on-write) fix query result wrong when schema change (#22044) 2023-07-21 15:29:04 +08:00
e4c6b9893a [improve](load) add more profiles in tablets channel (#21838) 2023-07-21 13:59:15 +08:00
732e0d14ff [Enhancement](window-funnel)add different modes for window_funnel() function (#20563) 2023-07-21 13:57:27 +08:00
6512893257 [refactor](vectorized) Remove useless control variables to simplify aggregation node code (#22026)
* [refactor](vectorized) Remove useless control variables to simplify aggregation node code

* fix
2023-07-21 12:45:23 +08:00
6875ef4b8b [refactor](mem_reuse) refactor mem_reuse in MutableBlock (#21564) 2023-07-20 22:53:19 +08:00
398defe10b [fix](publish) fix check_version_exist coredump (#22038) 2023-07-20 22:01:15 +08:00
7d488688b4 [fix](multi-catalog)fix minio default region and throw minio error msg, support s3 bucket root path (#21994)
1. check minio region, set default region if user region is not provided, and throw minio error msg
2. support read root path s3://bucket1
3. fix max compute public access
2023-07-20 20:48:55 +08:00
7947569993 [Bug][RegressionTest] fix the DCHECK failed in join code (#22021) 2023-07-20 18:12:20 +08:00
367ad9164a [feature-wip](auto-inc)(step-2) support auto-increment column for duplicate table (#19917) 2023-07-20 18:03:39 +08:00
c31e826756 [opt](config) rename alter_inverted_index_worker_count to alter_index_worker_count, and add docs (#21985) 2023-07-20 17:50:04 +08:00
650d7cfc8c [enhancement](repeat) add filter in repeat node in BE (#21984)
[enhancement](repeat) add filter in repeat node in BE (#21984)
2023-07-20 17:25:13 +08:00
9182b8d3c2 [Refactor](exec) Remove the unless header of vresult_writer (#22011)
Remove unless code of vresult_wirter;
2023-07-20 13:31:44 +08:00
0f5b973cb9 [Enhancement](http) Add HttpError to HttpClient::execute_with_retry (#21989) 2023-07-20 10:43:05 +08:00
8d36de3377 [fix](max_version) protect max_version by meta lock (#21948)
Otherwise, the be would core dump due to non thread safe access.
2023-07-20 10:35:23 +08:00
1afe090486 [improvement](memory) modify jemalloc conf in be.conf (#21943)
modify jemalloc conf in be.conf
    disable je_purge_all_arena_dirty_pages
2023-07-20 10:34:31 +08:00
20242d9a0e [Improve](simdjson) put unescaped string value after parsed (#21866)
In some cases, it is necessary to unescape the original value, such as when converting a string to JSONB.
If not unescape, then later jsonb parse will be failed
2023-07-20 10:33:17 +08:00
7e1299bcbc [fix](memory) fix mem tracker grace exit 2 (#22003)
fix: #21136
mem tracker group uses class static variables instead of global variables

https://stackoverflow.com/questions/2204608/does-c-call-destructors-for-global-and-class-static-variables
TODO: A mem tracker manager is required, don't use global variables, it will sad

==3623982==ERROR: AddressSanitizer: heap-use-after-free on address 0x60f0000056b8 at pc 0x56478bbe3ae0 bp 0x7f20953d2270 sp 0x7f20953d2268
READ of size 8 at 0x60f0000056b8 thread T41 (memory_tracker_)
*** Query id: 0-0 ***
*** Aborted at 1689749969 (unix time) try "date -d @1689749969" if you are using GNU date ***
*** Current BE git commitID: b3e9cad48e ***
*** SIGSEGV address not mapped to object (@0x0) received by PID 3623982 (TID 3624277 OR 0x7f19e06dd640) from PID 0; stack trace: ***
    #0 0x56478bbe3adf in std::__shared_ptr::operator bool() const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1295:16
    #1 0x56478bbe306e in doris::MemTracker::refresh_profile_counter() /doris/be/src/runtime/memory/mem_tracker.h:149:13
    #2 0x56478bbec669 in doris::MemTrackerLimiter::refresh_all_tracker_profile() /doris/be/src/runtime/memory/mem_tracker_limiter.cpp:119:22
    #3 0x564788f53fa0 in doris::Daemon::memory_tracker_profile_refresh_thread() /doris/be/src/common/daemon.cpp:295:9
    #4 0x564788f5d04b in doris::Daemon::start()::$_4::operator()() const /doris/be/src/common/daemon.cpp:473:30
    #5 0x564788f5cff6 in void std::__invoke_impl(std::__invoke_other, doris::Daemon::start()::$_4&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
    #6 0x564788f5cf78 in std::enable_if, void>::type std::__invoke_r(doris::Daemon::start()::$_4&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:111:2
    #7 0x564788f5cdae in std::_Function_handler::_M_invoke(std::_Any_data const&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291:9
    #8 0x56478903f576 in std::function::operator()() const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560:9
    #9 0x56478c4a35af in doris::Thread::supervise_thread(void*) /doris/be/src/util/thread.cpp:465:5
    #10 0x7f217c8a244f in start_thread nptl/pthread_create.c:473:8
    #11 0x7f217cb27d52 in __clone misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95

0x60f0000056b8 is located 56 bytes inside of 168-byte region [0x60f000005680,0x60f000005728)
freed by thread T0 here:
    #0 0x564788e7280d in operator delete(void*) (/mnt/hdd01/dorisTestEnv/NEREIDS_ASAN/be/lib/doris_be+0x1758280d) (BuildId: 219493cc924323ee)
    #1 0x56478acec1d5 in std::default_delete::operator()(doris::MemTrackerLimiter*) const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2
    #2 0x56478ace9faf in std::unique_ptr >::~unique_ptr() /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4
    #3 0x56478ace1471 in doris::ShardedLRUCache::~ShardedLRUCache() /doris/be/src/olap/lru_cache.cpp:581:1
    #4 0x56478ace14c8 in doris::ShardedLRUCache::~ShardedLRUCache() /doris/be/src/olap/lru_cache.cpp:572:37
    #5 0x56478acd0984 in std::default_delete::operator()(doris::Cache*) const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2
    #6 0x56478acceddf in std::unique_ptr >::~unique_ptr() /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4
    #7 0x56478ad96dc6 in doris::StoragePageCache::~StoragePageCache() /doris/be/src/olap/page_cache.h:78:7
    #8 0x7f217ca54146 in __run_exit_handlers stdlib/exit.c:108:8

previously allocated by thread T0 here:
    #0 0x564788e71fad in operator new(unsigned long) (/mnt/hdd01/dorisTestEnv/NEREIDS_ASAN/be/lib/doris_be+0x17581fad) (BuildId: 219493cc924323ee)
    #1 0x56478ace9c90 in std::_MakeUniq::__single_object std::make_unique, std::allocator > const&>(doris::MemTrackerLimiter::Type&&, std::__cxx11::basic_string, std::allocator > const&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:962:30
    #2 0x56478acde930 in doris::ShardedLRUCache::ShardedLRUCache(std::__cxx11::basic_string, std::allocator > const&, unsigned long, doris::LRUCacheType, unsigned int, unsigned int) /doris/be/src/olap/lru_cache.cpp:526:20
    #3 0x56478ace22e1 in doris::new_lru_cache(std::__cxx11::basic_string, std::allocator > const&, unsigned long, doris::LRUCacheType, unsigned int) /doris/be/src/olap/lru_cache.cpp:670:16
    #4 0x56478ad91da2 in doris::StoragePageCache::StoragePageCache(unsigned long, int, long, unsigned int) /doris/be/src/olap/page_cache.cpp:47:17
    #5 0x56478ad9156e in doris::StoragePageCache::create_global_cache(unsigned long, int, long, unsigned int) /doris/be/src/olap/page_cache.cpp:31:29
    #6 0x56478b98b3d3 in doris::ExecEnv::_init_mem_env() /doris/be/src/runtime/exec_env_init.cpp:251:5
    #7 0x56478b98946c in doris::ExecEnv::_init(std::vector > const&) /doris/be/src/runtime/exec_env_init.cpp:182:5
    #8 0x56478b987139 in doris::ExecEnv::init(doris::ExecEnv*, std::vector > const&) /doris/be/src/runtime/exec_env_init.cpp:98:17
    #9 0x564788e79b50 in main /doris/be/src/service/doris_main.cpp:429:5
    #10 0x7f217ca38564 in __libc_start_main csu/../csu/libc-start.c:332:16

Thread T41 (memory_tracker_) created by T0 here:
    #0 0x564788e1fcaa in pthread_create (/mnt/hdd01/dorisTestEnv/NEREIDS_ASAN/be/lib/doris_be+0x1752fcaa) (BuildId: 219493cc924323ee)
    #1 0x56478c4a2366 in doris::Thread::start_thread(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::function const&, unsigned long, scoped_refptr*) /doris/be/src/util/thread.cpp:419:15
    #2 0x564788f59b91 in doris::Status doris::Thread::create(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, doris::Daemon::start()::$_4 const&, scoped_refptr*) /doris/be/src/util/thread.h:50:16
    #3 0x564788f58165 in doris::Daemon::start() /doris/be/src/common/daemon.cpp:471:10
    #4 0x564788e79a96 in main /doris/be/src/service/doris_main.cpp:420:12
    #5 0x7f217ca38564 in __libc_start_main csu/../csu/libc-start.c:332:16
2023-07-20 10:33:03 +08:00
d180ed418d [fix](stacktrace) Speed up stack trace (#21755)
Introduce libunwind get stack trace, cost is negligible and has line numbers.
use StackTraceCache, PHDRCache speed up, is customizable and has some optimizations.
Other stack trace tools remain: glog, boost, glibc, in case for need.

TODO:

currently support linux __x86_64__, __arm__, __powerpc__, not supported __FreeBSD__, APPLE
Note: __arm__, __powerpc__ not been verified
Support signal handle
libunwid support unw_backtrace for jemalloc
Use of undefined compile option USE_MUSL for later
2023-07-19 15:43:14 +08:00
171f374f56 [improvement](invert index) Change the loading method of keyword type (#21893)
1. fix can not index Chinese
2. optimized invert index load
2023-07-19 15:26:49 +08:00
ce397a8d32 [FIX](map)fix arrow serde with map null key #21955 2023-07-19 12:09:34 +08:00
b35cfc5d5e [opt](join) Opt the performance of join probe (#21845) 2023-07-19 01:21:22 +08:00
845cf94a7a [feature](function) support time_to_sec (#21722)
mysql >select sec_to_time(time_to_sec(cast('16:32:18' as time)));
+----------------------------------------------------+
| sec_to_time(time_to_sec(CAST('16:32:18' AS TIME))) |
+----------------------------------------------------+
| 16:32:18                                           |
+----------------------------------------------------+
1 row in set (0.53 sec)

mysql [test]>select sec_to_time(59538);
+--------------------+
| sec_to_time(59538) |
+--------------------+
| 16:32:18           |
+--------------------+
1 row in set (0.03 sec)
2023-07-19 01:09:48 +08:00
Pxl
4171309b9b [Bug](scanner) fix core dump due to release ScannerContext too early #21946 2023-07-19 00:53:23 +08:00