Commit Graph

4953 Commits

Author SHA1 Message Date
8caa5a9ba4 [Fix](multi-catalog) Fix null partition errors in Iceberg tables. (#22185)
### Issue
When a table has null partitions, queries throw the error
`Failed to fill partition column: t_int=null`.

### Resolution
- Fix the null partition error in Iceberg tables by replacing null partition values with '\N' (see the sketch below).
- Add a regression test for Hive null partitions.
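A minimal sketch of the substitution (the helper name and types are illustrative, not the actual Doris code):
```
#include <optional>
#include <string>

// Iceberg reports a null partition value as an empty optional; fill the
// partition column with the literal "\N" (the Hive-style null marker)
// instead of failing with "Failed to fill partition column".
std::string partition_value_or_null_marker(const std::optional<std::string>& v) {
    return v.has_value() ? *v : "\\N";
}
```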
2023-07-27 23:57:35 +08:00
00863f25e9 [improvement](profile) add table name for file scan node (#22299)
```
VFILE_SCAN_NODE(region) (id=0): (Active: 3.537us, % non-child: 0.00%)
  - RuntimeFilters: :
  - UseSpecificThreadToken: False
  - AcquireRuntimeFilterTime: 501ns
  - AllocateResourceTime: 105.598us
```
2023-07-27 23:54:31 +08:00
b5fa29e138 [fix](bitmap) incorrect result of function 'bitmap_from_array' (#22305) 2023-07-27 22:44:06 +08:00
5584d7a5ba [Improve](point query) Improve lookup connection cache from DoubleBuffer to LRU cache for better item pruning (#22041) 2023-07-27 22:22:50 +08:00
8371171e44 [Feature](inverted index) add inverted index tool (#22207) 2023-07-27 21:28:34 +08:00
687d97e648 [improvement][default_config] enlarge default values of compaction-related configs (#22286)

1. Vertical compaction is enabled by default and consumes less memory, so we can enlarge the default values of compaction-related configs.
2. Enlarge the default value of the shard size related to locks.
2023-07-27 20:17:43 +08:00
d0c369d61b [fix](vec) Arena was not initialized in PartitionMethodSerialized (#22295) 2023-07-27 18:55:57 +08:00
6f1c03c766 [fix](jdbc_catalog) fix int and bigint in mysql views when using doris catalog (#22251) 2023-07-27 16:50:42 +08:00
aa75f79fad [fix](executor) cancel exchange buffer rpc when query is cancelled (#22226)
When a brpc client makes a request to a server, the server may not respond and may never respond (for example, after a BE restart). In that case the query can be cancelled at once, but the ExchangeSinkBuffer cannot be cancelled until the rpc times out.
So when the query is cancelled, the ExchangeSinkBuffer should be closed at once as well.
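A minimal sketch of the idea, assuming the buffer remembers the CallId of each in-flight rpc (the class is illustrative; `brpc::StartCancel` is the stock brpc cancellation API):
```
#include <mutex>
#include <vector>
#include <brpc/channel.h>
#include <brpc/controller.h>

// Illustrative holder, not the actual ExchangeSinkBuffer: remember the
// CallId of every in-flight rpc so all of them can be cancelled at once
// when the query is cancelled, instead of waiting for the rpc timeout.
class InFlightRpcs {
public:
    void remember(brpc::Controller* cntl) {
        std::lock_guard<std::mutex> guard(_lock);
        _call_ids.push_back(cntl->call_id());
    }

    // Called from the query-cancel path.
    void cancel_all() {
        std::lock_guard<std::mutex> guard(_lock);
        for (const brpc::CallId& id : _call_ids) {
            brpc::StartCancel(id); // wakes the pending rpc with ECANCELED
        }
        _call_ids.clear();
    }

private:
    std::mutex _lock;
    std::vector<brpc::CallId> _call_ids;
};
```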
2023-07-27 14:38:25 +08:00
9a95d664b9 [chore](third-party) Fix the build order for libunwind (#22244)
1. libunwind depends on lzma
2. Fix the missing header issues reported by GCC 13
2023-07-27 14:07:08 +08:00
Pxl
05be45bd35 [Improvement](brpc) adjust brpc_light_work_pool_threads/brpc_heavy_work_pool_threads (#22241)
2023-07-27 14:03:46 +08:00
31c856351a [enhancement](default_config) change default values of rpc-related configs (#22149)

The bdbje election timeout is 30 seconds, so we enlarge thrift_rpc_timeout_ms and txn_commit_rpc_timeout_ms to 60s.

Also enlarge bdbje_lock_timeout_second from 1 to 5.
2023-07-27 11:12:26 +08:00
341c45974c [round](decimalv2) round precise decimalv2 value (#22258) 2023-07-27 10:00:36 +08:00
7ed997cba8 [improvement](s3) increase the connection num of s3 client (#22049)
The default maxConnection of the s3 client is 25. It should be increased to improve query performance.

In my test, a TPC-H 300 benchmark with data stored on object storage, the total time dropped from 430s to 330s.
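For reference, the connection pool size in the AWS C++ SDK (which the BE's s3 client is built on) is set roughly like this; the value here is illustrative, not the one chosen by this PR:
```
#include <aws/core/Aws.h>
#include <aws/core/client/ClientConfiguration.h>
#include <aws/s3/S3Client.h>

int main() {
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::Client::ClientConfiguration config;
        // The SDK default is 25 connections; raise it so concurrent
        // scanners are not serialized on the connection pool.
        config.maxConnections = 512;
        Aws::S3::S3Client client(config);
        // ... issue GetObject requests from scanner threads ...
    }
    Aws::ShutdownAPI(options);
    return 0;
}
```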
2023-07-26 22:52:40 +08:00
Pxl
560731f392 [Bug](runtime-filter) fix probe expr prepared twice on minmax runtime filter (#22229)
2023-07-26 19:44:35 +08:00
21ea0055fc [improvement](scanner) use the session batch size instead of the limit to improve read performance (#22240) 2023-07-26 18:57:42 +08:00
9e16c69925 [improvement](compression) support LZ4_HC algorithm and parse LZ4_RAW (#22165) 2023-07-26 18:23:39 +08:00
8ff487cc4b [fix](cast) fix invalid value error when casting null date value to string then casting to date value (#22223) 2023-07-26 17:59:01 +08:00
a6311e2f95 fix erase (#22235) 2023-07-26 17:06:37 +08:00
Pxl
9451382428 [Improvement](aggregate) optimization for AggregationMethodKeysFixed::insert_keys_into_columns (#22216)
2023-07-26 16:19:15 +08:00
6c0c66e664 [exceptionsafe](expr) exprcontext's open and prepare method should catch exception (#22230)
ExprContext may throw an exception during expr->open, which causes a core in non-pipeline mode.

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-07-26 14:46:24 +08:00
1f3de0eae3 [fix](memory) fix invalid large memory check && fix memory info thread safety (#22027)
2023-07-26 12:18:31 +08:00
4c4f08f805 [fix](hudi) the required fields are empty if only reading partition columns (#22187)
1. If only the partition columns are read, the `JniConnector` will produce empty required fields, so `HudiJniScanner` should read at least the "_hoodie_record_key" field to know how many rows are in the current hoodie split. Even if the `JniConnector` doesn't read this field, the call to `releaseTable` in `JniConnector` will reclaim the resource.

2. To prevent BE failure and exit, `JniConnector` should call release methods only after `HudiJniScanner` is initialized (see the sketch below). Note that `VectorTable` is created lazily in `JniScanner`, so we don't need to reclaim the resource when `HudiJniScanner` fails to initialize.

## Remaining works
Other jni readers like `paimon` and `maxcompute` may encounter the same problems; each jni reader needs to handle this abnormal situation on its own, and currently this fix can only ensure that the BE will not exit.
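A tiny sketch of the ordering guard from point 2 (names are hypothetical, not the actual JniConnector code):
```
// Hypothetical handle for a jni-backed scanner.
struct JniScannerHandle {
    bool initialized = false;
};

// Only reclaim jni-side resources for scanners that actually finished
// init; releasing a half-initialized scanner could crash the whole BE.
// VectorTable is created lazily, so nothing needs to be reclaimed when
// init failed.
void close_scanner(JniScannerHandle* scanner) {
    if (!scanner->initialized) {
        return;
    }
    // ... invoke releaseTable through jni ...
}
```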
2023-07-26 10:59:45 +08:00
d4a4c172ea [Improve](serde)update serialize and deserialize text for data type (#21109) 2023-07-26 10:06:16 +08:00
b12c993f05 [Fix](multi-catalog) Do not throw exceptions when files do not exist for external hive tables. (#22140)
* [Fix](multi-catalog) Do not throw exceptions when a file does not exist for queries over an hms catalog.

---------

Co-authored-by: 王翔宇 <wangxiangyu@360shuke.com>
2023-07-26 09:05:52 +08:00
7b270d1ae9 [Fix](multi-catalog) Fix orc reader crash on hdfs read errors by catching the exception. (#22193)
The orc reader crashed when an hdfs read error occurred:

0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/zcp_repo/be/src/common/signal_handler.h:413
1# 0x00007F6F8B3C00C0 in /lib/x86_64-linux-gnu/libc.so.6
2# raise in /lib/x86_64-linux-gnu/libc.so.6
3# abort in /lib/x86_64-linux-gnu/libc.so.6
4# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75
5# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
6# 0x0000555CBC4718C1 in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
7# 0x0000555CBC471A14 in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
8# doris::vectorized::ORCFileInputStream::read(void*, unsigned long, unsigned long) at /home/zcp/repo_center/zcp_repo/be/src/vec/exec/format/orc/vorc_reader.cpp:121
9# orc::SeekableFileInputStream::Next(void const**, int*) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
10# orc::DecompressionStream::readHeader() in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
11# orc::DecompressionStream::Next(void const**, int*) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
12# void orc::RleDecoderV2::next<long>(long*, unsigned long, char const*) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
13# orc::StringDictionaryColumnReader::loadDictionary() in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
14# orc::StructColumnReader::loadStringDicts(std::unordered_map<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, orc::StringDictionary*, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, orc::StringDictionary*> > >, orc::StringDictFilter const*) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
15# orc::RowReaderImpl::startNextStripe(orc::ReadPhase const&) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
16# orc::RowReaderImpl::nextBatch(orc::ColumnVectorBatch&, void*) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
17# doris::vectorized::OrcReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) at /home/zcp/repo_center/zcp_repo/be/src/vec/exec/format/orc/vorc_reader.cpp:1420
18# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/zcp_repo/be/src/vec/exec/scan/vfile_scanner.cpp:250
19# doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
20# doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler*, doris::vectorized::ScannerContext*, std::shared_ptr<doris::vectorized::VScanner>) at /home/zcp/repo_center/zcp_repo/be/src/vec/exec/scan/scanner_scheduler.cpp:335
21# std::_Function_handler<void (), doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_1::operator()() const::
2023-07-26 08:57:31 +08:00
ba3a0922eb [fix](ipv6) Support IPv6 (#22219)
fe: remove the IPv4-only restrictions
be: specify the binding address for the thrift server
be: restore the changed code of `be/src/olap/task/engine_clone_task.cpp`
2023-07-26 08:40:32 +08:00
111957401b [improvement](inverted index) Added lucene 9.5 unicode tokenizer (#22217) 2023-07-26 00:50:24 +08:00
cf677b327b [fix](jdbc catalog) Fixed mappings with type errors for bool and tinyint(1) (#22089)
First of all, mysql does not have a native boolean type; its boolean is actually tinyint(1). In the previous logic, we forced tinyint(1) to be a boolean by passing tinyInt1isBit=true, which causes an error if a tinyint(1) value is not 0 or 1. Therefore, we need to map tinyint(1) to tinyint instead of boolean. This change does not affect the correctness of `where k = 1` or `where k = true` queries.
2023-07-25 22:45:22 +08:00
c498b2cf69 [fix](partial-update) disable partial update when undergoing a schema changing process (#22133) 2023-07-25 21:33:20 +08:00
a7446fa59e [fix](inverted index) make error message more friendly when query token is empty (#22118) 2023-07-25 18:00:35 +08:00
23e7423748 [pipeline](refactor) refactor pipeline task schedule logics (#22028) 2023-07-25 17:18:26 +08:00
226b75e074 [Fix](compaction) return internal error to avoid be core when finalize_columns_data (#21882)
Return an error instead of CHECK_EQ to avoid a BE core in finalize_columns_data.
2023-07-25 15:39:58 +08:00
fc2b9db0ad [Feature](inverted index) add tokenize function for inverted index (#21813)
In this PR, we introduce the TOKENIZE function for inverted index. It is used as follows:
```
SELECT TOKENIZE('I love my country', 'english');
```
It takes two arguments: the first is the text to be tokenized, the second is the parser type, which can be **english**, **chinese**, or **unicode**.
It can also be used with an existing table, like this:
```
mysql> SELECT TOKENIZE(c,"chinese") FROM chinese_analyzer_test;
+---------------------------------------+
| tokenize(`c`, 'chinese')              |
+---------------------------------------+
| ["来到", "北京", "清华大学"]          |
| ["我爱你", "中国"]                    |
| ["人民", "得到", "更", "实惠"]        |
+---------------------------------------+
```
2023-07-25 15:05:35 +08:00
2b4bfe5be7 [fix](autoinc) fix _fill_auto_inc_cols when the input column is ColumnConst (#22175) 2023-07-25 14:41:36 +08:00
28b714c371 [feature](executor) using fe version to set instance_num (#22047) 2023-07-25 14:37:42 +08:00
c01230f99a [fix](match) Optimize the logic for match_phrase function filter (#21622) 2023-07-25 14:22:37 +08:00
c251a574e8 [Fix](MoW) Fix dup key when do schema change add new key (#22154) 2023-07-25 14:18:01 +08:00
103c473b96 [Bug](pipeline) fix pipeline shared scan + topn optimization (#21940) 2023-07-25 12:48:27 +08:00
0f439bb1ca [vectorized](udf) java udf support map type (#22059) 2023-07-25 11:56:20 +08:00
7891c99e9f [fix](pipeline) fix wrong state of runtime filter of pipeline (#22179) 2023-07-25 11:29:09 +08:00
b41fcbb783 [feature](agg) add the aggregation function 'map_agg' (#22043)
New aggregation function: map_agg.

This function requires two arguments: a key and a value, which are used to build a map.

select map_agg(column1, column2) from t group by column3;
2023-07-25 11:21:03 +08:00
a0463ea047 [round](decimalv2) round decimalv2 to precision value (#22138)
2023-07-25 03:29:48 +08:00
752cec9e19 [Fix](multi-catalog) Fix the issue of non-single-slot filter conjuncts combined with dictionary filtering. (#22052)
### Issue
Dictionary filtering is a mechanism that evaluates a single string column's filter condition directly against the column's dictionary encoding. But a dictionary-filtered string column may also appear in multi-column filter conditions, which causes problems.

For example:
`select * from multi_catalog.lineitem_string_date_orc where l_commitdate < l_receiptdate and l_receiptdate = '1995-01-01'  order by l_orderkey, l_partkey, l_suppkey, l_linenumber limit 10;`

`l_receiptdate` is a string filter column, and it is also included in the multi-column filter condition `l_commitdate < l_receiptdate`.

### Solution
Resolve it by separating out the multi-column filter conditions and executing them after the dictionary-filtered column is converted back to strings, as in the sketch below.
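A toy illustration of the split, assuming each conjunct knows which column slots it references (all names are hypothetical, not Doris planner code):
```
#include <set>
#include <vector>

// Hypothetical predicate representation: the set of column slots it reads.
struct Conjunct {
    std::set<int> slot_ids;
};

// Conjuncts that touch only the dict-encoded slot can be evaluated
// against dictionary codes; anything referencing other slots must run
// after the column is converted back to strings.
void split_conjuncts(const std::vector<Conjunct>& all, int dict_slot,
                     std::vector<Conjunct>* dict_filter,
                     std::vector<Conjunct>* after_decode) {
    for (const Conjunct& c : all) {
        if (c.slot_ids.size() == 1 && c.slot_ids.count(dict_slot) > 0) {
            dict_filter->push_back(c);   // e.g. l_receiptdate = '1995-01-01'
        } else {
            after_decode->push_back(c);  // e.g. l_commitdate < l_receiptdate
        }
    }
}
```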
2023-07-24 22:31:18 +08:00
7fcf702081 [improvement](multi catalog)paimon support filesystem metastore (#21910)
1. Support filesystem metastore.

2. Support predicates and projection when splitting.

3. Fix partition table query errors.

TODO: for now you need to manually put paimon-s3-0.4.0-incubating.jar in be/lib/java_extensions when using an s3 filesystem.

doc pr: #21966
2023-07-24 22:02:57 +08:00
30c21789c8 [opt](filecache) use weak_ptr to cache the file handle of file segment (#21975)
Use weak_ptr to cache the file handles of file segments. The maximum number of cached file handles can be configured with `file_cache_max_file_reader_cache_size` (default `1000000`).
Users can inspect the number of cached file handles by requesting the BE metrics endpoint `http://be_host:be_webserver_port/metrics`:
```
# TYPE doris_be_file_cache_segment_reader_cache_size gauge
doris_be_file_cache_segment_reader_cache_size{path="/mnt/datadisk1/gaoxin/file_cache"} 2500
```
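The weak_ptr caching pattern itself looks roughly like this (toy types, not the actual Doris file cache):
```
#include <map>
#include <memory>
#include <string>

struct FileReader {}; // stands in for an open file handle

class ReaderCache {
public:
    std::shared_ptr<FileReader> get(const std::string& path) {
        auto it = _cache.find(path);
        if (it != _cache.end()) {
            if (auto alive = it->second.lock()) {
                return alive; // the handle is still open somewhere, reuse it
            }
        }
        // Miss or expired entry: reopen the file and store a non-owning
        // reference, so the cache never keeps an evicted handle alive.
        auto reader = std::make_shared<FileReader>();
        _cache[path] = reader;
        return reader;
    }

private:
    std::map<std::string, std::weak_ptr<FileReader>> _cache;
};
```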
2023-07-24 19:09:27 +08:00
d2531db1cf [fix](inverted index) fix regression case test_index_change_7 occasional failure (#22066) 2023-07-24 15:39:08 +08:00
e146969376 [Fix](config) delete unused lazy open config #22136 2023-07-24 15:02:34 +08:00
99bf901607 [fix](in) throw exception for unsupported data type of in expr (#22050) 2023-07-24 14:13:31 +08:00
86e80ae175 [enhancement](merge-on-write) support concurrent delete bitmap calc while close_wait (#21488) 2023-07-24 10:09:28 +08:00