Commit Graph

1144 Commits

Author SHA1 Message Date
3e8c75e246 [minor](orc) opt the log info in orc reader (#27951) 2023-12-06 20:47:36 +08:00
1be513b927 [pipelineX](local shuffle) Fix local shuffle for colocate/bucket join (#28032) 2023-12-06 10:02:36 +08:00
fd1db4da3d [agg](profile) fix incorrent profile (#28004) 2023-12-05 20:48:10 +08:00
6074cddcf8 [feature](mtmv)add Job and task tvf (#27967)
add:
select * from jobs("type"="mv");
select * from tasks("type"="mv");
select * from jobs("type"="insert");
select * from tasks("type"="insert");

add check priv for mv_infos("database"="xxx");

change JobType MTMV==>MV
2023-12-05 15:12:36 +08:00
54fe1a166b [Refactor](scan) refactor scan scheduler to improve performance (#27948)
* [Refactor](scan) refactor scan scheduler to improve performance

* fix pipeline x core
2023-12-05 13:03:16 +08:00
2b4c4bb442 [Fix][Opt](parquet-reader) Fix filter push down with decimal types in parquet reader. (#27897)
Fix filter push down with decimal types in parquet reader introduced by #22842
2023-12-04 22:25:39 +08:00
Pxl
e3d2425d47 [Improvement](join) remove insert_indices_from_join and special judge for -1 (#27779)
remove insert_indices_from_join and special judge for -1
2023-12-04 11:03:22 +08:00
d2a99aa03b [refactor](scan) change scan reschedule into scan context (#27766)
* [refactor](scan) change scan reschedule into scan context
2023-12-04 10:25:52 +08:00
97d36b4f38 [fix](csv_reader) fix trim_double_quotes behavior change (#27882) 2023-12-03 22:57:55 +08:00
fc8b32be7a [Opt](multi-catalog) Opt parquet orc reader numeric copy by memcpy() and memset(). (#27545)
Opt parquet orc reader null map decoding by memset().
2023-12-03 09:55:05 +08:00
54b5d04ff9 [improve](csv_reader) handle csv reader error (#27892) 2023-12-02 10:05:02 +08:00
1706699e7e [fix](multi-catalog)support the max compute partition prune (#27154)
1. max compute partition prune,
we just support filter mc partitions by '=',it can filter just one partition
to support multiple partition filter and range operator('>','<', '>='..), the partition prune should be supported.

2. add max compute row count cache and partitionValues cache

3. add max compute regression case
2023-12-01 22:28:26 +08:00
68525fc112 [feature](profile) add RuntimeFilterInfo in merge profile #27869 2023-12-01 21:42:25 +08:00
3e910e2978 [refactor](simd_json_reader) refactor simd json reader to adapt to parse multi json (#27272) 2023-11-30 15:01:06 +08:00
e4149c6e4c [Fix](parquet-reader) Fix null map issue in parquet reader. (#27777)
Fix null map issue in parquet reader which cause result incorrect such as `min()`, `max()`.

In order to share null map between parquet converted src column and dst column to avoid copying. It is very tricky that will call mutable function `doris_nullable_column->get_null_map_column_ptr()` which will set `_need_update_has_null = true`. Because some operations such as agg will call `has_null()` to set `_need_update_has_null = false`.
2023-11-30 13:55:37 +08:00
498d27c905 [improve](json_reader) add prompt when all fields is null (#27630) 2023-11-29 18:26:42 +08:00
d9d5468621 [feature](audit-log) add audit-log in insert into (#27641) 2023-11-29 15:01:57 +08:00
7398c3daf1 [Feature-Variant](Variant Type) support variant type query and index (#27676) 2023-11-29 10:37:28 +08:00
Pxl
d969047b50 [Refactor](join) refactor of hash join (#27557)
Improve the performance under the tpch data set by reconstructing the join related code and the use of hash table

Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: BiteTheDDDDt <pxl290@qq.com>
2023-11-28 19:46:00 +08:00
Pxl
91b0edfaa2 [Bug](join) try fix wrong _has_null_in_build_side setted (#27684)
try fix wrong _has_null_in_build_side setted
2023-11-28 17:42:14 +08:00
f565f60bc3 [refactor](standard)BE:Initialize pointer variables in the class to nullptr by default (#27587) 2023-11-28 13:02:30 +08:00
fe7ff6f113 [Opt](functions) Opt tvf number for performance regression framework (#27582)
Opt tvf number for performance regression framework
2023-11-28 10:43:51 +08:00
d10a708fa2 [improve](jdbc catalog) add profile for jdbc scan (#27447) 2023-11-27 10:33:39 +08:00
dfe3a2dd01 [feature](mtmv)(3)Implementing multi table materialized views (#26146)
Introduction to Main Classes:
- MTMVService:MTMV services for other modules to call
- MTMVHookService:All operations that affect the MTMV
  - MTMVJobManager:All operations that affect the MTMV job
  - MTMVCacheManager:All operations that affect the MTMV Cache
- MTMVTask&MTMVJob:Inherit from job framework
2023-11-24 12:34:38 +08:00
dd65cc1d14 [opt](MergedIO) no need to merge large columns (#27315)
1. Fix a profile bug of `MergeRangeFileReader`, and add a profile `ApplyBytes` to show the total bytes  of ranges.
2. There's no need to merge large columns, because `MergeRangeFileReader` will increase the copy time.
2023-11-23 19:15:47 +08:00
2ea33518b0 [Opt](load) use batching to optimize auto partition (#26915)
use batching to optimize auto partition
2023-11-23 19:12:28 +08:00
b457856bd2 [chore](be) remove bthread scanner related codes (#27417) 2023-11-23 15:18:49 +08:00
Pxl
301bfe4d5d [Bug](mark-join) fix mark join report error when probe block have column do not output (#27360)
fix mark join report error when probe block have column do not output
2023-11-23 11:16:02 +08:00
5442e8d1fc [pipelineX](dependency) split different dependencies (#27366) 2023-11-22 12:50:39 +08:00
1ebb54afdc [fix](null equal) fix coredump of pushing eq_for_null (#27341) 2023-11-21 18:36:33 +08:00
459f75073f [pipelineX](dependency) remove OrDependency (#27242) 2023-11-20 13:05:34 +08:00
febd60c75f [fix](join) incorrect result of left join with other conjuncts (#27238) 2023-11-19 15:36:39 +08:00
b42828cf69 [fix](window_function) min/max/sum/avg should be always nullable (#27104)
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
2023-11-18 18:41:42 +08:00
b1eef30b49 [pipelineX](dependency) Wake up task by dependencies (#26879)
---------

Co-authored-by: Mryange <2319153948@qq.com>
2023-11-18 03:20:24 +08:00
5d548935e0 [improvement](insert) support schema change and decommission for group commit (#26359) 2023-11-17 21:41:38 +08:00
c459408580 [fix](jni) avoid BE crash and NPE when close paimon reader (#27129)
1. Do not use FATAL log when jni encounter error, to avoid crash.
2. Fix NPE when closing PaimonReader, the reader may not be assigned if PaimonReader open failed.
2023-11-17 20:01:08 +08:00
52995c528e [fix](iceberg) iceberg use customer method to encode special characters of field name (#27108)
Fix two bugs:
1. Missing column is case sensitive, change the column name to lower case in FE for hive/iceberg/hudi
2. Iceberg use custom method to encode special characters in column name. Decode the column name to match the right column in parquet reader.
2023-11-17 18:38:55 +08:00
a0661ed9d2 [Fix](multi-catalog) Fix complex type crash when using dict filter facility in the parquet-reader. (#27151)
- Fix complex type crash when using the dict filter facility in the parquet-reader by turning off the dict filter facility in this case.
- Add orc complex types regression test.
2023-11-17 13:43:58 +08:00
0491437a86 [Opt](scanner-scheduler) Optimize BlockingQueue, BlockingPriorityQueue and change remote scan thread pool. (#26784)
## Proposed changes
- Optimize `BlockingQueue`, `BlockingPriorityQueue` by swapping `notify` and `unlock` to reduce lock competition. Ref: https://www.boost.org/doc/libs/1_54_0/boost/thread/sync_bounded_queue.hpp
- Change remote scan thread pool to `PriorityQueue`.

### Test result
Before:
```
mysql> select  sum(lo_partkey)  from  lineorder;
+-----------------+
| sum(lo_partkey) |
+-----------------+
| 300021444265405 |
+-----------------+
1 row in set (1.11 sec)
```

After:
```
mysql> select  sum(lo_partkey)  from  lineorder;
+-----------------+
| sum(lo_partkey) |
+-----------------+
| 300021444265405 |
+-----------------+
1 row in set (0.80 sec)
```
2023-11-15 18:24:36 +08:00
3585c7e216 [test](parquet)append parquet reader byte_array_decimal and rle_bool case (#26751) 2023-11-14 15:05:10 +08:00
ec40603b93 [fix](parquet) compressed_page_size has the same meaning in page v1 and v2 (#26783)
1. Parquet with page v2 is parsed error when using other codec except snappy. Because `compressed_page_size` has the same meaning in page v1 and v2, it always contains the bytes of definition level, repetition level and compressed data.
2. Add regression test for `fix_length_byte_array` stored decimal type, and dictionary encoded date/datetime type.
2023-11-14 08:30:42 +08:00
5ad49dceaa [fix](scanner_schedule) scanner hangs due to negative num_running_scanners (#26816)
* [fix] scanner hangs due to negative num_running_scanners

Before the patch, num_running_scanners is increased after submitting,
then it may be decreased before increasing then negative values can
be seen by get_block_from_queue and a expected submit does not happend.

Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
2023-11-13 23:03:49 +08:00
504ec324bb Revert "[refactor](scan) delete bloom_filter_predicate (#26499)" (#26851)
This reverts commit 2bb3ef198144954583aea106591959ee09932cba.
2023-11-13 16:27:23 +08:00
2f32a721ee [refactor](jni) unified jni framework for jdbc catalog (#26317)
This commit overhauls the JDBC connector logic within our project, transitioning from the previous mechanism of fetching data through JNI calls for individual ResultSet items to a more efficient and unified approach using the VectorTable data structure.
2023-11-13 14:28:15 +08:00
d9e0a9fa2e [enhancement](230) print max version and spec version when -230 happens (#26643)
More information is provided.
2023-11-13 09:57:22 +08:00
66054a5c78 [opt](scanner) increase the connection num of s3 client (#26795) 2023-11-12 00:29:11 -06:00
196fadc044 [enhancement](metrics) enhance visibility of flush thread pool (#26544) 2023-11-11 19:53:24 +08:00
c07a70e22a [Fix](orc-reader) Add missing break introduced by #26548. (#26633)
Add missing break introduced by #26548. Sorry for this mistake.
2023-11-09 18:29:44 +08:00
a5565f68b2 [Refactor](opentelemetry) Remove opentelemetry (#26605) 2023-11-09 18:05:34 +08:00
22bf2889e5 [feature](tvf)(jni-avro)jni-avro scanner add complex data types (#26236)
Support avro's enum, record, union data types
2023-11-09 13:58:49 +08:00