Commit Graph

919 Commits

Author SHA1 Message Date
343a6dc29d [improvement](hash join) Return result early if probe side has no data (#23044) 2023-08-17 09:17:09 +08:00
9b42093742 [feature](agg) Make 'map_agg' support array type as value (#22945) 2023-08-15 14:44:50 +08:00
b49dc8042d [feature](load) refactor CSV reading process during scanning, and support enclose and escape for stream load (#22539)
## Proposed changes

Refactor thoughts: close #22383
Descriptions about `enclose` and `escape`: #22385

## Further comments

2023-08-09: 
It's a pity that experiment shows that the original way for parsing plain CSV is faster. Therefor, the refactor is only applied on enclose related code. The plain CSV parser use the original logic.

Fallback of performance is unavoidable anyway. From the `CSV reader`'s perspective, the real weak point may be the write column behavior, proved by the flame graph.
 
Trimming escape will be enable after fix: #22411 is merged

Cases should be discussed: 

1. When an incomplete enclose appears in the beginning of a large scale data, the line delimiter will be unreachable till the EOF, will the buffer become extremely large?
2. What if an infinite line occurs in the case? Essentially,  `1.` is equivalent to this.  

Only support stream load as trial in this PR, avoid too many unrelated changes. Docs will be added when `enclose` and `escape` is available for all kinds of load.
2023-08-15 09:23:53 +08:00
Pxl
d371101bfd [Improvement](aggregation) make fixed hashmap's bitmap_size flexable (#22573)
make fixed hashmap's bitmap_size flexable
2023-08-14 10:47:06 +08:00
5e2748d2b4 [Improve](complex-type)update orc reader for complex type and add regress tests (#22856) 2023-08-12 07:06:12 +08:00
db69457576 [fix](avro)Fix S3 TVF avro format reading failure (#22199)
This pr fixes two issues:

1. when using s3 TVF to query files in AVRO format, due to the change of `TFileType`, the originally queried `FILE_S3 ` becomes `FILE_LOCAL`, causing the query failed.
2. currently, both parameters `s3.virtual.key` and `s3.virtual.bucket` are removed. A new `S3Utils`  in jni-avro to parse the bucket and key of s3.
The purpose of doing this operation is mainly to unify the parameters of s3.
2023-08-11 17:22:48 +08:00
209f36f1bf [fix](multi-catalog)fix jdbc loader (#22814) 2023-08-11 14:36:19 +08:00
be1e0dcd27 [new-feature](complex-type) support read nested parquet and orc file with complex type (#22793) 2023-08-10 18:23:07 +08:00
Pxl
56392e21ae [Bug](decimalv3) fix decimalv3 keyrange set wrong number #22818 2023-08-10 18:15:40 +08:00
f2658dc7bd [Feature](multi-catalog) Truncate char or varchar columns if size is smaller than file columns or not found in the file column schema. (#22318)
Truncate char or varchar columns if size is smaller than file columns or not found in the file column schema by session var `truncate_char_or_varchar_columns`.
2023-08-10 14:37:20 +08:00
f1db6bd8c1 [feature](hive)append support for struct and map column type on textfile format of hive table (#22347)
1. append support for struct and map column type on textfile format  of hive table.
2. optimizer code that array column type.

```mysql
+------+------------------------------------+
| id   | perf                               |
+------+------------------------------------+
| 1    | {"key1":"value1", "key2":"value2"} |
| 1    | {"key1":"value1", "key2":"value2"} |
| 2    | {"name":"John", "age":"30"}        |
+------+------------------------------------+
```

```mysql
+---------+------------------+
| column1 | column2          |
+---------+------------------+
|       1 | {10, "data1", 1} |
|       2 | {20, "data2", 0} |
|       3 | {30, "data3", 1} |
+---------+------------------+
```
Summarizes support for complex types(support assign delimiter) :

1. array< primitive_type > and array< array< ... > >
2. map< primitive_type , primitive_type >
3. Struct< primitive_type , primitive_type ... >
2023-08-10 13:47:58 +08:00
eafdab0cfd [Enhancement](tvf) Add frontends_disks table-valued-function (#22568)
---------

Co-authored-by: yuxianbing <yuxianbing@yy.com>
Co-authored-by: yuxianbing <iloveqaz123>
2023-08-10 10:40:24 +08:00
919bfd73f1 [improvement](multi-catalog)add scanner isolation class loader (#22247)
Add scanner isolation class loader to make each plugin non-conflicting.
The BE will get scanner classes by JNI call and use JniClassLoader load them.
In the last version,we always get canner classes from the system class path by default,
so it cannot isolate the classes for each scanner
2023-08-10 10:02:46 +08:00
66784cef71 [Enhancement](Load) Stream Load using SQL (#22509)
This PR was originally #16940 , but it has not been updated for a long time due to the original author @Cai-Yao . At present, we will merge some of the code into the master first.

thanks @Cai-Yao @yiguolei
2023-08-08 13:49:04 +08:00
c9dc715c5d [fix](broker-load) fix error when using multi data description for same table in load stmt (#22666)
For load request, there are 2 tuples on scan node, input tuple and output tuple.
The input tuple is for reading file, and it will be converted to output tuple based on user specified column mappings.

And the broker load support different column mapping in different data description to same table(or partition).
So for each scanner, the output tuples are same but the input tuple can be different.

The previous implements save the input tuple in scan node level, causing different scanner using same input tuple,
which is incorrect.
This PR remove the input tuple from scan node and save them in each scanners.
2023-08-07 20:03:03 +08:00
Pxl
7839a0e708 [Bug](brpc) fix brpc failed on big query came concurrently (#22600)
fix PriorityThreadPool get_info get wrong number
change brpc pool from priority to fifo
do not use brpc pool when send eos
2023-08-05 21:24:32 +08:00
3024b82918 [fix](load)Fix wrong default value for char and varchar of reading json data (#22626)
If a column is defined as: col VARCHAR/CHAR NULL and no default value. Then we load json data which misses column col, the result queried is not correct:
+------+
| col |
+------+
| 1 |
+------+
But expect:
+------+
| col |
+------+
| NULL |
+------+

---------

Co-authored-by: duanxujian <duanxujian@jd.com>
2023-08-05 12:47:27 +08:00
Pxl
c1c38c956d [exec] fix coredump when limit<0 and limit!=-1 with 1.2 fe (#22622) 2023-08-04 22:18:45 +08:00
9c0528daf6 [Opt](orc-reader) opt the performance of date convertion. (#22381)
Opt the performance of date conversion in orc reader.

```
mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
|           600037902 |
+---------------------+
1 row in set (1.28 sec)

mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
|           600037902 |
+---------------------+
1 row in set (0.19 sec)
```
2023-08-04 10:52:09 +08:00
1ed1b69485 [refactor](reader) move reader from vec/exec/scan to vec/exec/format (#22371)
This readers should be in vec/exec/format
2023-08-04 09:47:20 +08:00
e7e73a618c [exec](join) Print join type in profile (#22567) 2023-08-03 20:46:15 +08:00
f7755aa538 [exec](set_operation) Support one child node in set operation (#22463)
Support one child node in set operation
2023-08-03 10:35:59 +08:00
938f768aba [fix](parquet) resolve offset check failed in parquet map type (#22510)
Fix error when reading empty map values in parquet. The `offsets.back()` doesn't not equal the number of elements in map's key column.

### How does this happen
Map in parquet is stored as repeated group, and `repeated_parent_def_level` is set incorrectly when parsing map node in parquet schema.
```
the map definition in parquet:
 optional group <name> (MAP) {
   repeated group map (MAP_KEY_VALUE) {
     required <type> key;
     optional <type> value;
   }
}
```

### How to fix
Set the `repeated_parent_def_level` of key/value node as the definition level of map node.

`repeated_parent_def_level` is the definition level of the first ancestor node whose `repetition_type` equals `REPEATED`.  Empty array/map values are not stored in doris column, so have to use `repeated_parent_def_level` to skip the empty or null values in ancestor node.

For instance, considering an array of strings with 3 rows like the following:
`null, [], [a, b, c]`
We can store four elements in data column: `null, a, b, c`
and the offsets column is: `1, 1, 4`
and the null map is: `1, 0, 0`
For the `i-th` row in array column: range from `offsets[i - 1]` until `offsets[i]` represents the elements in this row, so we can't store empty array/map values in doris data column. As a comparison, spark does not require `repeated_parent_def_level`, because the spark column stores empty array/map values , and use anther length column to indicate empty values. Please reference: https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java

Furthermore, we can also avoid store null array/map values in doris data column. The same three rows as above, We can only store three elements in data column: `a, b, c`
and the offsets column is: `0, 0, 3`
and the null map is: `1, 0, 0`
2023-08-02 22:33:10 +08:00
9d3f1dcf44 [improvement](vectorized) Deserialized elements of count distinct aggregation directly inserted into target hashset (#21888)
The original logic is to first deserialize the ColumnString into a HashSet (insert the deserialized elements into the hashset), and then traverse all the HashSet elements into the target HashSet during the merge phase.
After optimization, when deserializing, elements are directly inserted into the target HashSet, thereby reducing unnecessary hashset insert overhead.

In one of our internal query tests, 30 hashsets were merged in second phase aggregation(the average cardinality is 1,400,000), and the cardinality after merging is 42,000,000. After optimization, the MergeTime dropped from 5s965ms to 3s375ms.
2023-08-02 21:19:56 +08:00
4bc65aa921 [fix](load) PrefetchBufferedReader Crashing caused updating counter with an invalid runtime profile (#22464) 2023-08-02 18:19:48 +08:00
Pxl
f5e3cd2737 [Improvement](aggregation) optimization for aggregation hash_table_lazy_emplace (#22327)
optimization for aggregation hash_table_lazy_emplace
2023-08-02 11:50:21 +08:00
bc87002028 [opt](conf) remote scanner thread num is changed to core num * 10 (#22427) 2023-08-01 23:09:49 +08:00
3a11de889f [Opt](exec) opt the performance of date parquet convert by date dict (#22384)
before:

mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
| 600037902 |
+---------------------+
1 row in set (0.86 sec)
after:

mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
| 600037902 |
+---------------------+
1 row in set (0.36 sec)
2023-08-01 12:24:00 +08:00
89433f6a13 [fix](complex_type) throw error when reading complex types in broker/stream load (#22331)
Check whether there are complex types in parquet/orc reader in broker/stream load. Broker/stream load will cast any type as string type, and complex types will be casted wrong. This is a temporary method, and will be replaced by tvf.
2023-07-31 22:23:08 +08:00
f2919567df [feature](datetime) Support timezone when insert datetime value (#21898) 2023-07-31 13:08:28 +08:00
4077338284 [Opt](parquet) opt the performance of date convertion (#22360)
before:
```
mysql>  select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
|           600037902 |
+---------------------+
1 row in set (1.61 sec)
```

after:
```
mysql>  select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
|           600037902 |
+---------------------+
1 row in set (0.86 sec)
```
2023-07-30 15:54:13 +08:00
Pxl
210f6661b4 [Bug](profile) add lock on add_filter_info #22355
multiple scanner may update profile at same time
2023-07-29 12:45:50 +08:00
ae8a26335c [opt](hive)opt select count(*) stmt push down agg on parquet in hive . (#22115)
Optimization "select count(*) from table" stmtement , push down "count" type to BE.
support file type : parquet ,orc in hive .

1. 4kfiles , 60kwline num 
    before:  1 min 37.70 sec 
    after:   50.18 sec

2. 50files , 60kwline num
    before: 1.12 sec
    after: 0.82 sec
2023-07-29 00:31:01 +08:00
Pxl
f7e0479605 [Chore](refactor) remove some unused code (#22152)
remove some unused code
2023-07-28 17:30:46 +08:00
1c6246f7ee [improve](agg) support distinct agg node (#22169)
select c_name from customer union select c_name from customer
this sql used agg node to get distinct row of c_name,
so it's no need to wait for inserted all data to hash map,
could output the data which it's inserted into hash map successed.
2023-07-28 13:54:10 +08:00
0d7d9b92db [fix](multi-catalog) complex types parsing failed, with unexpected nulls and rows (#22228)
Fix tow bugs:
1. Unexpected null values in array column. If 65535 consecutive values are not null in nullable array column, this error will be triggered. The reason is that the array parser did not handle boundary conditions.
2. The number of rows of key filed, and that of value field in map column are not equal. Similarly, the number of rows among fields in struct column are not the same. This would be triggered when the number of rows are not equal among parquet pages of different columns in a row group.
2023-07-28 10:03:08 +08:00
8caa5a9ba4 [Fix](mutli-catalog) Fix null partitions error in iceberg tables. (#22185)
### Issue
when partition has null partitions, it throws error
`Failed to fill partition column: t_int=null`

### Resolution
- Fix the following null partitions error in iceberg tables by replacing null partition to '\N'.
- Add regression test for hive null partition.
2023-07-27 23:57:35 +08:00
00863f25e9 [improvement](profile) add table name for file scan node (#22299)
```
VFILE_SCAN_NODE(region)  (id=0):(Active:  3.537us,  %  non-child:  0.00%)
                                -  RuntimeFilters:  :  
                              -  UseSpecificThreadToken:  False
                              -  AcquireRuntimeFilterTime:  501ns
                              -  AllocateResourceTime:  105.598us
```
2023-07-27 23:54:31 +08:00
d0c369d61b [fix](vec) Arena was not initialized in PartitionMethodSerialized (#22295) 2023-07-27 18:55:57 +08:00
6f1c03c766 [fix](jdbc_catalog) fix int and bigint in mysql view when use doris catalog (#22251) 2023-07-27 16:50:42 +08:00
Pxl
560731f392 [Bug](runtime-filter) fix probe expr prepared twice on minmax runtime filter (#22229)
fix probe expr prepared twice on minmax runtime filter
2023-07-26 19:44:35 +08:00
21ea0055fc [improvement](scanner) use batch size of session instead of limit to improve performance of reading (#22240) 2023-07-26 18:57:42 +08:00
Pxl
9451382428 [Improvement](aggregate) optimization for AggregationMethodKeysFixed::insert_keys_into_columns (#22216)
optimization for AggregationMethodKeysFixed::insert_keys_into_columns
2023-07-26 16:19:15 +08:00
4c4f08f805 [fix](hudi) the required fields are empty if only reading partition columns (#22187)
1. If only read the partition columns, the `JniConnector` will produce empty required fields, so `HudiJniScanner` should read the "_hoodie_record_key" field at least to know how many rows in current hoodie split. Even if the `JniConnector` doesn't read this field, the call of `releaseTable` in `JniConnector` will reclaim the resource.

2. To prevent BE failure and exit, `JniConnector` should call release methods after `HudiJniScanner` is initialized. It should be noted that `VectorTable` is created lazily in `JniScanner`,  so we don't need to reclaim the resource when `HudiJniScanner` is failed to initialize.

## Remaining works
Other jni readers like `paimon` and `maxcompute` may encounter the same problems, the jni reader need to handle this abnormal situation on its own, and currently this fix can only ensure that BE will not exit.
2023-07-26 10:59:45 +08:00
7b270d1ae9 [Fix](mutli-catalog) Fix orc reader crashed when hdfs reading error by catching exception. (#22193)
orc reader crashed when hdfs reading error.

0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/zcp_repo/be/src/common/signal_handler.h:413
1# 0x00007F6F8B3C00C0 in /lib/x86_64-linux-gnu/libc.so.6
2# raise in /lib/x86_64-linux-gnu/libc.so.6
3# abort in /lib/x86_64-linux-gnu/libc.so.6
4# _gnu_cxx::_verbose_terminate_handler() [clone .cold] at ../../../../libstdc+-v3/libsupc+/vterminate.cc:75
5# _cxxabiv1::_terminate(void ()) at ../../../../libstdc+-v3/libsupc+/eh_terminate.cc:48
6# 0x0000555CBC4718C1 in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
7# 0x0000555CBC471A14 in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
8# doris::vectorized::ORCFileInputStream::read(void*, unsigned long, unsigned long) at /home/zcp/repo_center/zcp_repo/be/src/vec/exec/format/orc/vorc_reader.cpp:121
9# orc::SeekableFileInputStream::Next(void const*, int) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
10# orc::DecompressionStream::readHeader() in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
11# orc::DecompressionStream::Next(void const*, int) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
12# void orc::RleDecoderV2::next<long>(long*, unsigned long, char const*) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
13# orc::StringDictionaryColumnReader::loadDictionary() in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
14# orc::StructColumnReader::loadStringDicts(std::unordered_map<unsigned long, std::_cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, std::unordered_map<std::cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, orc::StringDictionary*, std::hash<std::cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::_cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, orc::StringDictionary*> > >, orc::StringDictFilter const) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
15# orc::RowReaderImpl::startNextStripe(orc::ReadPhase const&) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
16# orc::RowReaderImpl::nextBatch(orc::ColumnVectorBatch&, void*) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
17# doris::vectorized::OrcReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) at /home/zcp/repo_center/zcp_repo/be/src/vec/exec/format/orc/vorc_reader.cpp:1420
18# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/zcp_repo/be/src/vec/exec/scan/vfile_scanner.cpp:250
19# doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) in /mnt/hdd01/STRESS_ENV/be/lib/doris_be
20# doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler*, doris::vectorized::ScannerContext*, std::shared_ptr<doris::vectorized::VScanner>) at /home/zcp/repo_center/zcp_repo/be/src/vec/exec/scan/scanner_scheduler.cpp:335
21# std::_Function_handler<void (), doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_1::operator()() const::
2023-07-26 08:57:31 +08:00
cf677b327b [fix](jdbc catalog) Fixed mappings with type errors for bool and tinyint(1) (#22089)
First of all, mysql does not have a boolean type, its boolean type is actually tinyint(1), in the previous logic, We force tinyint(1) to be a boolean by passing tinyInt1isBit=true, which causes an error if tinyint(1) is not a 0 or 1, Therefore, we need to match tinyint(1) according to tinyint instead of boolean, and this change will not affect the correctness of where k = 1 or where k = true queries
2023-07-25 22:45:22 +08:00
23e7423748 [pipeline](refactor) refactor pipeline task schedule logics (#22028) 2023-07-25 17:18:26 +08:00
103c473b96 [Bug](pipeline) fix pipeline shared scan + topn optimization (#21940) 2023-07-25 12:48:27 +08:00
752cec9e19 [Fix](multi-catalog) Fix not single slot filter conjuncts with dict filter issue. (#22052)
### Issue
Dictionary filtering is a mechanism that directly reads the dictionary encoding of a single string column filter condition for filter comparison. But dictionary filtered single string columns may be included in other multi-column filter conditions. This can cause problems.

For example:
`select * from multi_catalog.lineitem_string_date_orc where l_commitdate < l_receiptdate and l_receiptdate = '1995-01-01'  order by l_orderkey, l_partkey, l_suppkey, l_linenumber limit 10;`

`l_receiptdate` is string filter column,it is included by multi-column filter condition `l_commitdate < l_receiptdate`.

### Solution
Resolve it by separating the multi-column filter conditions and executing it after the dictionary filter column is converted to string.
2023-07-24 22:31:18 +08:00
7fcf702081 [improvement](multi catalog)paimon support filesystem metastore (#21910)
1.support filesystem metastore

2.support predicate and project when split

3.fix partition table query error

todo: Now you need to manually put paimon-s3-0.4.0-incubating.jar in be/lib/java_extensions when use s3 filesystem

doc pr: #21966
2023-07-24 22:02:57 +08:00