310 Commits

Author SHA1 Message Date
db69457576 [fix](avro)Fix S3 TVF avro format reading failure (#22199)
This PR fixes two issues:

1. When using the S3 TVF to query files in AVRO format, a change to `TFileType` turned the originally queried `FILE_S3` into `FILE_LOCAL`, causing the query to fail.
2. The parameters `s3.virtual.key` and `s3.virtual.bucket` are both removed. A new `S3Utils` in jni-avro now parses the S3 bucket and key.
The main purpose of this change is to unify the S3 parameters.
2023-08-11 17:22:48 +08:00
Pxl
56392e21ae [Bug](decimalv3) fix decimalv3 keyrange set wrong number #22818 2023-08-10 18:15:40 +08:00
f2658dc7bd [Feature](multi-catalog) Truncate char or varchar columns if size is smaller than file columns or not found in the file column schema. (#22318)
When the session variable `truncate_char_or_varchar_columns` is set, truncate char or varchar columns if their size is smaller than that of the file columns or if they are not found in the file column schema.
2023-08-10 14:37:20 +08:00
eafdab0cfd [Enhancement](tvf) Add frontends_disks table-valued-function (#22568)
---------

Co-authored-by: yuxianbing <yuxianbing@yy.com>
Co-authored-by: yuxianbing <iloveqaz123>
2023-08-10 10:40:24 +08:00
c9dc715c5d [fix](broker-load) fix error when using multi data description for same table in load stmt (#22666)
For a load request, there are 2 tuples on the scan node: the input tuple and the output tuple.
The input tuple is used for reading the file, and it is converted to the output tuple based on the user-specified column mappings.

Broker load supports different column mappings in different data descriptions for the same table (or partition).
So for each scanner, the output tuples are the same but the input tuples can differ.

The previous implementation saved the input tuple at the scan node level, so different scanners used the same input tuple,
which is incorrect.
This PR removes the input tuple from the scan node and saves it in each scanner instead.
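
The ownership change described above can be sketched as follows. This is only an illustration under simplifying assumptions, not the code from the PR: `TupleLayout`, `Scanner`, and `make_scanners` are hypothetical names, and a tuple is reduced to a list of column names.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tuple descriptor: just the column names.
struct TupleLayout {
    std::vector<std::string> columns;
};

// Each scanner owns its input layout (taken from its own data description),
// while every scanner of the load shares one output layout.
class Scanner {
public:
    Scanner(TupleLayout input, std::shared_ptr<const TupleLayout> output)
            : _input(std::move(input)), _output(std::move(output)) {}

    const TupleLayout& input() const { return _input; }
    const TupleLayout& output() const { return *_output; }

private:
    TupleLayout _input;                          // per scanner
    std::shared_ptr<const TupleLayout> _output;  // shared by all scanners
};

// Build one scanner per data description; no input layout is kept at the
// scan-node level.
inline std::vector<Scanner> make_scanners(const std::vector<TupleLayout>& inputs,
                                          const TupleLayout& output) {
    auto shared_output = std::make_shared<const TupleLayout>(output);
    std::vector<Scanner> scanners;
    scanners.reserve(inputs.size());
    for (const auto& in : inputs) {
        scanners.emplace_back(in, shared_output);
    }
    return scanners;
}
```

With two data descriptions that map different file columns onto the same table, `make_scanners` returns two scanners whose `input()` layouts differ while `output()` refers to the same object.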
2023-08-07 20:03:03 +08:00
Pxl
7839a0e708 [Bug](brpc) fix brpc failed on big query came concurrently (#22600)
fix PriorityThreadPool get_info get wrong number
change brpc pool from priority to fifo
do not use brpc pool when send eos
2023-08-05 21:24:32 +08:00
Pxl
c1c38c956d [exec] fix coredump when limit<0 and limit!=-1 with 1.2 fe (#22622) 2023-08-04 22:18:45 +08:00
1ed1b69485 [refactor](reader) move reader from vec/exec/scan to vec/exec/format (#22371)
These readers belong in vec/exec/format.
2023-08-04 09:47:20 +08:00
4bc65aa921 [fix](load) PrefetchBufferedReader Crashing caused updating counter with an invalid runtime profile (#22464) 2023-08-02 18:19:48 +08:00
bc87002028 [opt](conf) remote scanner thread num is changed to core num * 10 (#22427) 2023-08-01 23:09:49 +08:00
89433f6a13 [fix](complex_type) throw error when reading complex types in broker/stream load (#22331)
Check whether there are complex types in the parquet/orc reader in broker/stream load. Broker/stream load casts every type to string, so complex types are cast incorrectly. This is a temporary solution and will be replaced by tvf.
2023-07-31 22:23:08 +08:00
Pxl
210f6661b4 [Bug](profile) add lock on add_filter_info #22355
Multiple scanners may update the profile at the same time.
2023-07-29 12:45:50 +08:00
ae8a26335c [opt](hive)opt select count(*) stmt push down agg on parquet in hive . (#22115)
Optimize `select count(*) from table` statements by pushing the "count" type down to BE.
Supported file types: parquet and orc in hive.

1. 4k files, 60kw lines
    before: 1 min 37.70 sec
    after:  50.18 sec

2. 50 files, 60kw lines
    before: 1.12 sec
    after:  0.82 sec
2023-07-29 00:31:01 +08:00
8caa5a9ba4 [Fix](mutli-catalog) Fix null partitions error in iceberg tables. (#22185)
### Issue
When a table has null partitions, the query throws the error
`Failed to fill partition column: t_int=null`

### Resolution
- Fix the null partitions error in iceberg tables by replacing null partition values with '\N'.
- Add regression test for hive null partition.
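
A minimal sketch of the substitution described in the resolution, assuming a partition value is modelled as an optional string. The names `kNullMarker`, `partition_text`, and `is_null_partition` are hypothetical and are not taken from the PR.

```cpp
#include <optional>
#include <string>

// "\N" is the text marker for NULL mentioned in the resolution above.
inline const std::string kNullMarker = "\\N";

// Return the text used to fill a partition column: the value itself, or the
// NULL marker when the partition value is missing.
inline std::string partition_text(const std::optional<std::string>& value) {
    return value.has_value() ? *value : kNullMarker;
}

// Hypothetical helper: the reader side treats the marker as SQL NULL.
inline bool is_null_partition(const std::string& text) {
    return text == kNullMarker;
}
```

Under this sketch, a partition such as `t_int=null` is carried as `t_int=\N`, which the filling code can recognize as NULL instead of trying to parse the literal text `null` as an integer.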
2023-07-27 23:57:35 +08:00
00863f25e9 [improvement](profile) add table name for file scan node (#22299)
```
VFILE_SCAN_NODE(region)  (id=0):(Active:  3.537us,  %  non-child:  0.00%)
                                -  RuntimeFilters:  :  
                              -  UseSpecificThreadToken:  False
                              -  AcquireRuntimeFilterTime:  501ns
                              -  AllocateResourceTime:  105.598us
```
2023-07-27 23:54:31 +08:00
21ea0055fc [improvement](scanner) use batch size of session instead of limit to improve performance of reading (#22240) 2023-07-26 18:57:42 +08:00
23e7423748 [pipeline](refactor) refactor pipeline task schedule logics (#22028) 2023-07-25 17:18:26 +08:00
103c473b96 [Bug](pipeline) fix pipeline shared scan + topn optimization (#21940) 2023-07-25 12:48:27 +08:00
7fcf702081 [improvement](multi catalog)paimon support filesystem metastore (#21910)
1. Support filesystem metastore

2. Support predicate and project when split

3. Fix partition table query error

TODO: for now, you need to manually put paimon-s3-0.4.0-incubating.jar in be/lib/java_extensions when using the s3 filesystem

doc pr: #21966
2023-07-24 22:02:57 +08:00
f8307f1a1a [bugfix](scanner) when scanner init failed during get tablet, not need call update counters (#22117)
Co-authored-by: yiguolei <yiguolei@gmail.com>
If the scanner fails during init or open, there is no need to update counters, because the query has failed and the counters are useless.
Updating counters may also cause a core dump. For example, updating counters depends on the scanner's tablet, but the tablet is null when init fails.
2023-07-23 10:19:20 +08:00
bed940b7fc [fix](log) column index off-by-one error in scanner logs (#19747) 2023-07-21 18:30:01 +08:00
Pxl
4171309b9b [Bug](scanner) fix core dump due to release ScannerContext too early #21946 2023-07-19 00:53:23 +08:00
Pxl
3089e4b3b6 [Bug](excution) fix ScannerContext is done make query failed (#21923)
fix ScannerContext is done make query failed
2023-07-18 17:58:00 +08:00
Pxl
19492b06c1 [Bug](decimalv3) fix failed on test_dup_tab_decimalv3 due to wrong precision (#21890)
fix failed on test_dup_tab_decimalv3 due to wrong precision
2023-07-18 12:53:09 +08:00
Pxl
b3d3ffa2de [Bug](pipeline) adjust scanner scheduler.submit and _num_scheduling_ctx maintain (#21843)
adjust scanner scheduler.submit and _num_scheduling_ctx maintain
2023-07-18 11:55:21 +08:00
5fc0a84735 [improvement](catalog) reduce the size thrift params for external table query (#21771)
### 1
In the previous implementation, each FileSplit had its own `TFileScanRange`, and each `TFileScanRange`
contained a list of `TFileRangeDesc` and a `TFileScanRangeParams`.
So with thousands of FileSplits there were thousands of `TFileScanRange`, which made the thrift
data sent to BE too large, resulting in:

1. the rpc that sends the fragment may fail due to timeout
2. FE will OOM

For a given query request, the `TFileScanRangeParams` is the common part and is the same for all `TFileScanRange`.
So I moved it to `TExecPlanFragmentParams`.
After that, each FileSplit only has a list of `TFileRangeDesc`.

In my test, querying a hive table with 100000 partitions, the size of the thrift data was reduced from 151MB to 15MB,
and the above 2 issues are gone.
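
The effect of hoisting the common part can be illustrated with a back-of-the-envelope model. The structs below are simplified stand-ins, not the real thrift definitions, and the function names are hypothetical. Payload size is approximated by string lengths.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-ins for the thrift structs; only byte counts matter here.
struct FileRangeDesc {
    std::string path;
};

struct ScanRangeParams {
    std::string serialized;  // common settings, identical for every split
};

// Old layout: every split carries its own copy of the params.
inline std::size_t size_with_params_per_split(const std::vector<FileRangeDesc>& ranges,
                                              const ScanRangeParams& params) {
    std::size_t total = 0;
    for (const auto& r : ranges) {
        total += r.path.size() + params.serialized.size();
    }
    return total;
}

// New layout: the params are sent once at the fragment level.
inline std::size_t size_with_shared_params(const std::vector<FileRangeDesc>& ranges,
                                           const ScanRangeParams& params) {
    std::size_t total = params.serialized.size();
    for (const auto& r : ranges) {
        total += r.path.size();
    }
    return total;
}
```

In this model the old layout grows with (number of splits × params size), while the new layout pays for the params once, which matches the direction of the 151MB to 15MB reduction reported above.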

### 2
Support disabling the file meta cache for parquet footers by setting `max_external_file_meta_cache_num` <= 0.
I found that for some wide tables the footer is too large (1MB after compact, and much more after being
deserialized to thrift), so it consumes too much BE memory when there are many files.

This will be optimized later; for now this change only makes it possible to disable the cache.
2023-07-17 13:37:02 +08:00
ca6e33ec0c [feature](table-value-functions)add catalogs table-value-function (#21790)
mysql> select * from catalogs() order by CatalogId;
2023-07-14 10:25:16 +08:00
9cad929e96 [Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query. (#21741)
* [Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query.

related pr #20732

There are two reasons for moving the delayed-deletion logic from the Tablet to the StorageEngine. The first is to consolidate the logic and unify the delayed operations. The second is that delayed garbage collection during queries can leave rowsets in the "stale rowsets" state, which prevents timely deletion of rowset metadata and may cause the rowset metadata to grow too large.

* not use unused rowsets
2023-07-13 11:46:12 +08:00
d86c67863d Remove unused code (#21735) 2023-07-12 14:48:13 +08:00
d3317aa33b [Fix](executor)Fix scan entity core #21696
After the last call to scan_task.scan_func(), the task should have ended, which means PipelineFragmentContext could be released.
Once PipelineFragmentContext is released, accessing its fields such as query_ctx or _state may cause a core dump.
However, this only explains core 2.

```
void ScannerScheduler::_task_group_scanner_scan(ScannerScheduler* scheduler,
                                                taskgroup::ScanTaskTaskGroupQueue* scan_queue) {
    while (!_is_closed) {
        taskgroup::ScanTask scan_task;
        auto success = scan_queue->take(&scan_task);
        if (success) {
            int64_t time_spent = 0;
            {
                SCOPED_RAW_TIMER(&time_spent);
                scan_task.scan_func();
            }
            scan_queue->update_statistics(scan_task, time_spent);
        }
    }
}
```
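
One general way to avoid this class of use-after-release is for the task to hold its own reference to whatever the worker touches after `scan_func()` returns. The sketch below illustrates that idea only. It is not the fix applied in this commit, and `ScanEntity`, `FragmentContext`, `ScanTask`, and `run_task` are hypothetical simplifications.

```cpp
#include <cstdint>
#include <functional>
#include <memory>

// Hypothetical statistics holder that the worker updates after running a task.
struct ScanEntity {
    int64_t total_time_ns = 0;
};

// Hypothetical context that owns query state and may be destroyed as soon as
// the last scan function has run.
struct FragmentContext {
    std::shared_ptr<ScanEntity> entity = std::make_shared<ScanEntity>();
};

// The task keeps its own shared_ptr to the entity, so the worker never has to
// reach back into the context after scan_func() returns.
struct ScanTask {
    std::function<void()> scan_func;
    std::shared_ptr<ScanEntity> entity;
};

inline void run_task(ScanTask& task, int64_t time_spent_ns) {
    task.scan_func();                             // may release the FragmentContext
    task.entity->total_time_ns += time_spent_ns;  // still valid: owned by the task
}
```

Here the statistics update stays safe even if the final `scan_func()` call destroys the context, because the entity's lifetime is tied to the task rather than to the context.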
2023-07-11 15:56:13 +08:00
Pxl
ca71048f7f [Chore](status) avoid empty error msg on status (#21454)
avoid empty error msg on status
2023-07-11 13:48:16 +08:00
f87a3ccba2 [fix](runtime_filter) runtime_profile was not initialized in multi_cast_data_stream_source (#21690) 2023-07-11 00:16:29 +08:00
9ee7fa45d1 [Refactor](multi-catalog) Refactor to process splitted conjuncts for dict filter. (#21459)
Conjuncts are currently split, so refactor source code to handle split conjuncts for dict filters.
2023-07-07 09:19:08 +08:00
009b300abd [Fix](ScannerScheduler) fix dead lock when shutdown group_local_scan_thread_pool (#21553) 2023-07-06 13:09:37 +08:00
9adbca685a [opt](hudi) use spark bundle to read hudi data (#21260)
Use spark-bundle to read hudi data instead of using hive-bundle to read hudi data.

**Advantages** of using spark-bundle to read hudi data:
1. The performance of spark-bundle is more than twice that of hive-bundle
2. spark-bundle uses `UnsafeRow`, which reduces data copying and JVM GC time
3. spark-bundle supports `Time Travel`, `Incremental Read`, and `Schema Change`; these functions can be quickly ported to Doris

**Disadvantages** of using spark-bundle to read hudi data:
1. More dependencies make hudi-dependency.jar very cumbersome (from 138M to 300M)
2. spark-bundle only provides the `RDD` interface and cannot be used directly
2023-07-04 17:04:49 +08:00
90dd8716ed [refactor](multicast) change the way multicast do filter, project and shuffle (#21412)
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>

1. Filtering is done at the sending end rather than the receiving end
2. Projection is done at the sending end rather than the receiving end
3. Each sender can use different shuffle policies to send data
2023-07-04 16:51:07 +08:00
b86dd11a7d [fix](pipeline) refactor olap table sink close (#20771)
For pipeline, closing the olap table sink is divided into three stages: try_close() --> pending_finish() --> close().
pending_finish() returns false only after all node channels are done or canceled, and only then does close() start.
This avoids blocking the pipeline in close().

In close(), the index channel's intolerable failure status is checked after each node channel failure;
if an intolerable failure has occurred, close() terminates early and all node channels are canceled to avoid meaningless blocking.
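
The three-stage protocol above can be sketched as a small state machine. This is a simplified model rather than the actual sink: `TableSink`, `ChannelState`, `mark_done`, `mark_failed`, and the `max_tolerable_failures` threshold are all hypothetical, and asynchronous channel completion is replaced by explicit calls.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class ChannelState { Open, Done, Canceled };

class TableSink {
public:
    TableSink(std::size_t num_channels, std::size_t max_tolerable_failures)
            : _channels(num_channels, ChannelState::Open),
              _max_tolerable_failures(max_tolerable_failures) {}

    // Stage 1: request the close without waiting for any channel.
    void try_close() { _close_requested = true; }

    // Stage 2: true while at least one channel is neither done nor canceled.
    bool pending_finish() const {
        return std::any_of(_channels.begin(), _channels.end(),
                           [](ChannelState s) { return s == ChannelState::Open; });
    }

    // Stage 3: only proceeds after try_close() and once nothing is pending,
    // so it never has to block. Returns false when called too early.
    bool close() {
        if (!_close_requested || pending_finish()) return false;
        _closed = true;
        return true;
    }

    void mark_done(std::size_t i) { _channels[i] = ChannelState::Done; }

    // A failed channel is canceled; once failures become intolerable, every
    // remaining open channel is canceled too so that close() can finish early.
    void mark_failed(std::size_t i) {
        _channels[i] = ChannelState::Canceled;
        if (++_failures > _max_tolerable_failures) {
            for (auto& s : _channels) {
                if (s == ChannelState::Open) s = ChannelState::Canceled;
            }
        }
    }

    bool closed() const { return _closed; }

private:
    std::vector<ChannelState> _channels;
    std::size_t _max_tolerable_failures;
    std::size_t _failures = 0;
    bool _close_requested = false;
    bool _closed = false;
};
```

A scheduler would call `try_close()` once, keep polling `pending_finish()` while running other tasks, and call `close()` only after it returns false, so no pipeline thread waits inside `close()`.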
2023-07-04 11:27:51 +08:00
df23ab3f29 [Enhancement](tvf) Add authentication for workload group tvf (#21323) 2023-06-30 12:56:23 +08:00
7f0e37069f [improvement](olap) filter the whole segment by dictionary (#21239) 2023-06-29 10:34:29 +08:00
a6b51ec19a [Feature](avro) Support Apache Avro file format (#19990)
Support reading Avro files via hdfs() or s3().
```sql
select * from s3(
         "uri" = "http://127.0.0.1:9312/test2/person.avro",
         "ACCESS_KEY" = "ak",
         "SECRET_KEY" = "sk",
         "FORMAT" = "avro");
+--------+--------------+-------------+-----------------+
| name   | boolean_type | double_type | long_type       |
+--------+--------------+-------------+-----------------+
| Alyssa |            1 |     10.0012 | 100000000221133 |
| Ben    |            0 |    5555.999 |      4009990000 |
| lisi   |            0 | 5992225.999 |      9099933330 |
+--------+--------------+-------------+-----------------+

select * from hdfs(
                "uri" = "hdfs://127.0.0.1:9000/input/person2.avro",
                "fs.defaultFS" = "hdfs://127.0.0.1:9000",
                "hadoop.username" = "doris",
                "format" = "avro");
+--------+--------------+-------------+-----------+
| name   | boolean_type | double_type | long_type |
+--------+--------------+-------------+-----------+
| Alyssa |            1 |  8888.99999 |  89898989 |
+--------+--------------+-------------+-----------+
```

The current avro reader only supports common data types; complex data types will be supported later.
2023-06-28 21:15:35 +08:00
e348b9464e [scan](freeblocks) use ConcurrentQueue to replace vector for free blocks (#21241) 2023-06-28 15:10:07 +08:00
76bdcf1d26 [improvement](pipeline) task group scan entity (#19924) 2023-06-25 14:43:35 +08:00
3dfeee3946 [fix](typesystem) fix wrong return type argument cause type check fail (#21082) 2023-06-22 00:04:46 +08:00
81abdeffbc [Improvement](pipeline) Improve shared scan performance (#20785) 2023-06-21 14:36:05 +08:00
2c11ce0a02 [bugfix](topn) fix key topn merge block conflict with index predicate result columns (#20820) 2023-06-20 21:23:00 +08:00
923f7edad0 [opt](hudi) using native reader to read the base file with no log file (#20988)
Two optimizations:
1. Insert string bytes directly to remove decoding&encoding process.
2. Use native reader to read the hudi base file if it has no log file. Use `explain` to show how many splits are read natively.
2023-06-20 11:20:21 +08:00
26cca5e00a [Enhancement](tvf) Add frontends table-valued-function (#20857) 2023-06-19 13:57:40 +08:00
d6b7640cf0 [fix](inverted index) fix check failed for block erase temp column (#20924) 2023-06-18 19:27:48 +08:00
ab32299ba4 [feature](nereids) Support multi target rf #20714
Support multi target runtime filter, mainly for set operation, such as union/intersect/except.
2023-06-16 20:26:00 +08:00
b7a50a09fe [Opt](orc-reader) Optimize orc reader by dict filtering. (#20806)
Optimize orc reader by dict filtering. It is similar to #17594.
Test result:

**ssb-flat-100** (3 nodes):

| Query | before opt | after opt |
| ------------- |:-------------:| ---------:|
| Q1.1 | 1.239 | 1.145 |
| Q1.2 | 1.254 | 1.128 |
| Q1.3 | 1.931 | 1.644 |
| Q2.1 | 1.359 | 1.006 |
| Q2.2 | 1.229 | 0.674 |
| Q2.3 | 0.934 | 0.427 |
| Q3.1 | 2.226 | 1.712 |
| Q3.2 | 2.042 | 1.562 |
| Q3.3 | 1.631 | 1.021 |
| Q3.4 | 1.618 | 0.732 |
| Q4.1 | 2.294 | 1.858 |
| Q4.2 | 2.511 | 1.961 |
| Q4.3 | 1.736 | 1.446 |
| total | 22.004 | 16.316 |
2023-06-16 13:11:37 +08:00