Commit Graph

12095 Commits

SHA1 Message Date
6c0c66e664 [exceptionsafe](expr) ExprContext's open and prepare methods should catch exceptions (#22230)
Co-authored-by: yiguolei <yiguolei@gmail.com>
ExprContext may throw an exception during expr->open, which will cause a core dump in non-pipeline mode.
2023-07-26 14:46:24 +08:00
9f3960b460 [fix](kerberos)fix kerberos config read (#22081)
We should read the kerberos config from the properties first, so we use the overridden hive conf to set it.
2023-07-26 13:36:12 +08:00
bb67a1467a [fix](Nereids): mergeGroup should merge target Group into existing Group (#22123) 2023-07-26 13:13:25 +08:00
1f3de0eae3 [fix](memory) fix invalid large memory check && fix memory info thread safety (#22027)
fix invalid large memory check
fix memory info thread safety
2023-07-26 12:18:31 +08:00
21a3593a9a [fix](Nereids) translate failed when enable topn two phase opt (#22197)
1. should not add the rowid slot to resolvedTupleExprs
2. should set notMaterialize on sort's tuple when doing the two-phase opt
2023-07-26 11:38:50 +08:00
f4396ef8c7 [Fix](regression-test) nereids_p0/javaudf and nereids_p0/outfile cases cannot run on multi be cluster (#21929)
The cases named in the title will not pass in a multi-BE environment because the queried BE doesn't contain the outfile data. We fix it by copying the outfile to every instance.
2023-07-26 11:33:51 +08:00
4c4f08f805 [fix](hudi) the required fields are empty if only reading partition columns (#22187)
1. If only the partition columns are read, `JniConnector` will produce empty required fields, so `HudiJniScanner` should read at least the "_hoodie_record_key" field to know how many rows are in the current hoodie split (a triggering query is sketched after this list). Even if `JniConnector` doesn't read this field, the call to `releaseTable` in `JniConnector` will reclaim the resource.

2. To prevent BE failure and exit, `JniConnector` should call its release methods after `HudiJniScanner` is initialized. It should be noted that `VectorTable` is created lazily in `JniScanner`, so we don't need to reclaim the resource when `HudiJniScanner` fails to initialize.
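A hypothetical query shape that triggers case 1, selecting only hudi partition columns (the catalog, database, table, and column names are invented for illustration):

```sql
-- Only partition columns appear in the select list, so JniConnector would
-- otherwise hand HudiJniScanner an empty list of required fields.
SELECT dt, region
FROM hudi_catalog.db1.hudi_tbl
WHERE dt = '2023-07-25';
```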

## Remaining works
Other jni readers like `paimon` and `maxcompute` may encounter the same problem; each jni reader needs to handle this abnormal situation on its own, and currently this fix only ensures that BE will not exit.
2023-07-26 10:59:45 +08:00
d4a4c172ea [Improve](serde)update serialize and deserialize text for data type (#21109) 2023-07-26 10:06:16 +08:00
ea2a7a8e56 [Docs](docs) Rename Release Note Title and File name of CN & EN Version (#22157) 2023-07-26 09:20:06 +08:00
9abf32324b [improvement](jdbc) add timestamp put to datev2 (#21680) 2023-07-26 09:10:34 +08:00
b12c993f05 [Fix](multi-catalog) Do not throw exceptions when a file does not exist for external hive tables. (#22140)
* [Fix](multi-catalog) Do not throw exceptions when a file does not exist for queries of an hms catalog.

Co-authored-by: 王翔宇 <wangxiangyu@360shuke.com>
2023-07-26 09:05:52 +08:00
5f846056f7 [fix](forward) fix MissingFormatArgumentException when failed to forward stmt to Master (#22142) 2023-07-26 09:00:04 +08:00
7b270d1ae9 [Fix](multi-catalog) Fix orc reader crashed when hdfs reading error by catching exception. (#22193)
The orc reader crashed on an hdfs read error:

0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/zcp_repo/be/src/common/signal_handler.h:413
1# 0x00007F6F8B3C00C0 in /lib/x86_64-linux-gnu/libc.so.6
2# raise in /lib/x86_64-linux-gnu/libc.so.6
3# abort in /lib/x86_64-linux-gnu/libc.so.6
4# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75
5# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
6# 0x0000555CBC4718C1 in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
7# 0x0000555CBC471A14 in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
8# doris::vectorized::ORCFileInputStream::read(void*, unsigned long, unsigned long) at /home/zcp/repo_center/zcp_repo/be/src/vec/exec/format/orc/vorc_reader.cpp:121
9# orc::SeekableFileInputStream::Next(void const*, int) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
10# orc::DecompressionStream::readHeader() in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
11# orc::DecompressionStream::Next(void const*, int) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
12# void orc::RleDecoderV2::next<long>(long*, unsigned long, char const*) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
13# orc::StringDictionaryColumnReader::loadDictionary() in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
14# orc::StructColumnReader::loadStringDicts(std::unordered_map<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, orc::StringDictionary*, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, orc::StringDictionary*> > >, orc::StringDictFilter const) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
15# orc::RowReaderImpl::startNextStripe(orc::ReadPhase const&) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
16# orc::RowReaderImpl::nextBatch(orc::ColumnVectorBatch&, void*) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
17# doris::vectorized::OrcReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) at /home/zcp/repo_center/zcp_repo/be/src/vec/exec/format/orc/vorc_reader.cpp:1420
18# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/zcp_repo/be/src/vec/exec/scan/vfile_scanner.cpp:250
19# doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
20# doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler*, doris::vectorized::ScannerContext*, std::shared_ptr<doris::vectorized::VScanner>) at /home/zcp/repo_center/zcp_repo/be/src/vec/exec/scan/scanner_scheduler.cpp:335
21# std::_Function_handler<void (), doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_1::operator()() const::
2023-07-26 08:57:31 +08:00
b20af13966 [fix][jdbc_case]Change the method of obtaining the driver for case test_doris_jdbc_catalog 0724 #22164 2023-07-26 08:48:16 +08:00
cf717882d8 [fix](jdbc catalog) fix hana jdbc table bug (#22190) 2023-07-26 08:45:06 +08:00
ba3a0922eb [fix](ipv6)Support IPV6 (#22219)
fe: remove IPv4-only restrictions
be: specify the binding address for the thrift server
be: restore the changed code of "be/src/olap/task/engine_clone_task.cpp"
2023-07-26 08:40:32 +08:00
e8f4323e0f [Fix](jdbcCatalog) fix typo of some variable #22214 2023-07-26 08:34:45 +08:00
111957401b [improvement](inverted index) Added lucene9.5 unicode tokenizer (#22217) 2023-07-26 00:50:24 +08:00
3414d1a61f [fix](hudi) table schema is not the same as parquet schema (#22186)
Upgrade the hudi version from 0.13.0 to 0.13.1, and keep the hudi version of the jni scanner the same as that of FE.
This may fix the bug where the table schema is not the same as the parquet schema.
2023-07-26 00:29:53 +08:00
cf677b327b [fix](jdbc catalog) Fixed mappings with type errors for bool and tinyint(1) (#22089)
First of all, mysql does not have a boolean type; its boolean type is actually tinyint(1). In the previous logic, we forced tinyint(1) to be a boolean by passing tinyInt1isBit=true, which causes an error if a tinyint(1) value is not 0 or 1. Therefore, we need to map tinyint(1) to tinyint instead of boolean. This change does not affect the correctness of `where k = 1` or `where k = true` queries, as sketched below.
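A hypothetical illustration (the catalog and table names are invented): a MySQL tinyint(1) column k may hold values outside 0/1, such as 2, which broke the old boolean mapping, while both predicate forms keep working under the tinyint mapping:

```sql
-- k is tinyint(1) on the MySQL side and now maps to TINYINT in Doris
SELECT * FROM mysql_catalog.db1.tbl WHERE k = 1;    -- numeric comparison
SELECT * FROM mysql_catalog.db1.tbl WHERE k = true; -- still resolves correctly
```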
2023-07-25 22:45:22 +08:00
b2be42c31c [fix](jdbc catalog) fix jdbc catalog like expr query error (#22141) 2023-07-25 22:30:28 +08:00
f44660db1a [chore](merge-on-write) disable single replica load and compaction for mow table (#22188) 2023-07-25 22:05:22 +08:00
5c8eda8685 [enhancement](regression) add UPDATE & DELETE tests for MOW partial update (#22212) 2023-07-25 22:03:38 +08:00
c498b2cf69 [fix](partial-update) disable partial update when undergoing a schema changing process (#22133) 2023-07-25 21:33:20 +08:00
999fbdc802 [improvement](jdbc) add new type 'object' of int (#21681) 2023-07-25 21:29:46 +08:00
20f180c4e1 [fix](iceberg) fix error when query iceberg v2 format (#22182)
This bug was introduced by #21771.
The fileType field of TFileScanRangeParams was missing, so the delete file of iceberg v2 was treated as a local file
and failed to read.
2023-07-25 21:15:46 +08:00
6dd0ca6d0b [fix](nereids) fix runtime filter on cte sender and set operation (#22181)
The current rf pushdown framework doesn't handle the cte sender correctly. On the cte consumer it just returns false, which causes the rf to be generated at the wrong place and makes the expr_order check fail; the rf should actually be pushed down to the cte sender. Also, set-operation pushdown is unreachable if the outer stmt uses the alias of the set operation's output before the probeSlot's translation. Both issues are fixed in this pr; hypothetical query shapes for the two cases are sketched below.
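Hypothetical query shapes for the two cases (table and column names are invented):

```sql
-- Case 1: a join between two consumers of the same cte; the runtime filter
-- should be pushed down to the cte sender, not generated on a consumer.
WITH cte AS (SELECT k1 FROM t1)
SELECT a.k1 FROM cte a JOIN cte b ON a.k1 = b.k1;

-- Case 2: the outer query aliases the set operation's output (u.c1)
-- before the probeSlot is translated.
SELECT u.c1
FROM (SELECT k1 AS c1 FROM t1 UNION ALL SELECT k1 FROM t2) u
JOIN t3 ON u.c1 = t3.k1;
```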
2023-07-25 20:26:04 +08:00
1715a824dd [fix](nereids) fix partition dest overwrite bug when cte as bc right (#22177)
In the current cte multicast fragment param computing logic in the coordinator, if the shared hash table for bc is enabled, the number of destinations equals the number of be hosts. But the check for falling into the shared-hash-table bc code path is wrong: when a multicast's targets include both bc and partition destinations, the first bc info overwrites the following partition's, i.e. the destination info becomes host-level when it should be per instance. This causes the hash partition part to hang.
2023-07-25 19:26:29 +08:00
28bbfdd590 [Fix](Nereids) fix minidump unit test caused by column status change (#22201)
Problem:
The minidump unit test failed because column statistics deserialization needs a new column schema that had not been added to the minidump unit test file.

Solution:
Add the last update time to the unit test input file.
2023-07-25 19:23:12 +08:00
30965eed21 [fix](stats) Ignore complex type by default when collecting column statistics (#21965)
Before this PR, an error would be thrown if the Analyze stmt submitted by the user contained any complex type; such columns are now ignored by default, as sketched below.
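A minimal sketch with a hypothetical table: the ANALYZE statement now skips the complex-typed column instead of throwing.

```sql
CREATE TABLE t1 (k1 INT, arr ARRAY<INT>) DUPLICATE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 1
PROPERTIES("replication_num" = "1");

ANALYZE TABLE t1;  -- collects stats for k1 and ignores arr
```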
2023-07-25 18:26:49 +08:00
3b6702a1e3 [Bug](point query) cancel future when meet timeout in PointQueryExec (#21573)
1. cancel the future on timeout and add a config to modify the rpc timeout
2. add a config to modify the number of BackendServiceProxy instances, since under a highly concurrent workload the GRPC channel will be blocked
2023-07-25 18:18:09 +08:00
a7446fa59e [fix](inverted index) make error message more friendly when query token is empty (#22118) 2023-07-25 18:00:35 +08:00
f74f3e7944 [refactor](Nereids) add sink interface and abstract class (#22150)
1. add trait Sink
2. add abstract classes LogicalSink and PhysicalSink
3. replace some sink visitors with visitLogicalSink and visitPhysicalSink
2023-07-25 17:51:49 +08:00
23e7423748 [pipeline](refactor) refactor pipeline task schedule logics (#22028) 2023-07-25 17:18:26 +08:00
39ca91fc22 [opt](Nereids) always fallback when parse failed (#21865)
Always fall back to the legacy planner when parsing fails, even if enable_fallback_to_original_planner is set to false.
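A sketch of the behavior under the session variables named above:

```sql
set enable_nereids_planner=true;
set enable_fallback_to_original_planner=false;
-- A statement the Nereids parser cannot handle is now still executed by
-- the legacy planner instead of failing outright.
```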
2023-07-25 17:08:57 +08:00
527547b4ed [catalog](faq) add jdbc catalog faq (#22129) 2023-07-25 15:59:16 +08:00
1e8ae7ad16 [doc](flink-connector)improve flink connector doc (#22143) 2023-07-25 15:58:35 +08:00
226b75e074 [Fix](compaction) return internal error to avoid be core when finalize_columns_data (#21882)
Return an error instead of CHECK_EQ to avoid a BE core dump in finalize_columns_data.
2023-07-25 15:39:58 +08:00
f84af95ac4 [feature](Nereids) Add minidump replay and refactor user feature of minidump (#20716)
### Two main changes:
- 1. add minidump replay
- 2. change the minidump serialization of statistics messages and some interfaces between the main logic of the nereids optimizer and minidump

### Use of nereids ut:
- 1. save minidump files:
        Execute the following commands in mysql-client:
```
set enable_nereids_planner=true;
set enable_minidump=true;
```
        Then execute the SQL in mysql-client.
- 2. use the nereids-ut script to run a directory of minidump files:
```
cp -r ${DORIS_HOME}/minidump ${DORIS_HOME}/output/fe && cd ${DORIS_HOME}/output/fe
./nereids_ut --d ${directory_of_minidump_files}
```

### Refactor of minidump
- move the serialization of used statistics into the serialization of the input, serialized together with the catalogs
- generate the minidump file only when the enable_minidump flag is set; the minidump module interacts with the main optimizer only through:
serializeInputsToDumpFile(catalog, statistics, query) && serializeOutputsToDumpFile(outputplan).
2023-07-25 15:26:19 +08:00
fc2b9db0ad [Feature](inverted index) add tokenize function for inverted index (#21813)
In this PR, we introduce the TOKENIZE function for inverted index. It is used as follows:
```
SELECT TOKENIZE('I love my country', 'english');
```
It takes two arguments: the first is the text to be tokenized, the second is the parser type, which can be **english**, **chinese** or **unicode**.
It can also be used with an existing table, like this:
```
mysql> SELECT TOKENIZE(c,"chinese") FROM chinese_analyzer_test;
+---------------------------------------+
| tokenize(`c`, 'chinese')              |
+---------------------------------------+
| ["来到", "北京", "清华大学"]          |
| ["我爱你", "中国"]                    |
| ["人民", "得到", "更", "实惠"]        |
+---------------------------------------+
```
2023-07-25 15:05:35 +08:00
d96e31c4d7 [opt](Nereids) not push down global limit to avoid early gather (#21891)
The global limit creates a gather action, so all the data is calculated in one instance. If we push down the global limit, the nodes running after the limit node will run slowly.
We fix it by pushing down only the local limit. A query shape producing such plan trees is sketched after them.

a join plan tree before fixing:

```
LogicalLimit(global)
    LogicalLimit(local)
        Plan()
            LogicalLimit(global)
                LogicalLimit(local)
                    LogicalJoin
                        LogicalLimit(global)
                            LogicalLimit(local)
                                Plan()
                        LogicalLimit(global)
                            LogicalLimit(local)
                                Plan()    

after fixing:
LogicalLimit(global)
    LogicalLimit(local)      
        Plan()
            LogicalLimit(local)
                LogicalJoin
                    LogicalLimit(local)
                        Plan()
                    LogicalLimit(local)
                        Plan()
```
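A hypothetical query shape that produces plan trees like the above (table and column names are invented):

```sql
-- Each LIMIT yields a local/global limit pair; after the fix only the
-- local part is pushed below the join, avoiding an early gather.
SELECT *
FROM (SELECT * FROM t1 LIMIT 10) a
JOIN (SELECT * FROM t2 LIMIT 10) b ON a.id = b.id
LIMIT 10;
```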
2023-07-25 14:45:20 +08:00
2b4bfe5be7 [fix](autoinc) fix _fill_auto_inc_cols when the input column is ColumnConst (#22175) 2023-07-25 14:41:36 +08:00
28b714c371 [feature](executor) using fe version to set instance_num (#22047) 2023-07-25 14:37:42 +08:00
c01230f99a [fix](match) Optimize the logic for match_phrase function filter (#21622) 2023-07-25 14:22:37 +08:00
c251a574e8 [Fix](MoW) Fix dup key when do schema change add new key (#22154) 2023-07-25 14:18:01 +08:00
103c473b96 [Bug](pipeline) fix pipeline shared scan + topn optimization (#21940) 2023-07-25 12:48:27 +08:00
0f439bb1ca [vectorized](udf) java udf support map type (#22059) 2023-07-25 11:56:20 +08:00
7891c99e9f [fix](pipeline) fix wrong state of runtime filter of pipeline (#22179) 2023-07-25 11:29:09 +08:00
f6b47c34b3 [improvement](stats) show stats with updated time (#21377)
Support viewing the stats updated time.

After

```sql
mysql> show column stats t1;
+-------------+-------+------+----------+-----------+---------------+------+------+---------------------+
| column_name | count | ndv  | num_null | data_size | avg_size_byte | min  | max  | updated_time        |
+-------------+-------+------+----------+-----------+---------------+------+------+---------------------+
| col2        | 2.0   | 2.0  | 0.0      | 0.0       | 0.0           | 2    | 5    | 2023-06-30 15:50:24 |
| col3        | 2.0   | 2.0  | 0.0      | 0.0       | 0.0           | 3    | 6    | 2023-06-30 15:50:48 |
| col1        | 2.0   | 2.0  | 0.0      | 0.0       | 0.0           | '1'  | '4'  | 2023-06-30 15:50:48 |
+-------------+-------+------+----------+-----------+---------------+------+------+---------------------+
```

Before

```sql
mysql> show column stats t1;
+-------------+-------+------+----------+-----------+---------------+------+------+
| column_name | count | ndv  | num_null | data_size | avg_size_byte | min  | max  | 
+-------------+-------+------+----------+-----------+---------------+------+------+
| col2        | 2.0   | 2.0  | 0.0      | 0.0       | 0.0           | 2    | 5    | 
| col3        | 2.0   | 2.0  | 0.0      | 0.0       | 0.0           | 3    | 6    | 
| col1        | 2.0   | 2.0  | 0.0      | 0.0       | 0.0           | '1'  | '4'  | 
+-------------+-------+------+----------+-----------+---------------+------+------+
```
2023-07-25 11:22:08 +08:00
b41fcbb783 [feature](agg) add the aggregation function 'map_agg' (#22043)
New aggregation function: map_agg.

This function requires two arguments: a key and a value, which are used to build a map.

select map_agg(column1, column2) from t group by column3;
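A minimal usage sketch, assuming a hypothetical sales table: one map of region to amount is built per order date.

```sql
SELECT order_date, map_agg(region, amount)
FROM sales
GROUP BY order_date;
```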
2023-07-25 11:21:03 +08:00