Commit Graph

120 Commits

Author SHA1 Message Date
e3c9f535dc [refactor](wal) refactor some wal code (#29434) 2024-01-03 14:45:57 +08:00
69a01e0cf5 [improve](move-memtable) skip load stream stub close wait when cancel (#29427) 2024-01-02 23:35:50 +08:00
706463781c [refactor](group commit) refactor group commit wal code (#29375) 2024-01-02 15:52:03 +08:00
b7487430da Revert "[improve](move-memtable) cancel load rapidly when stream close wait (#29322)" (#29371)
This reverts commit bbf58c5aa42d40e66bc6ccc9ed91a4fcb4bdfff7.
2024-01-02 11:32:14 +08:00
bbf58c5aa4 [improve](move-memtable) cancel load rapidly when stream close wait (#29322) 2023-12-31 16:26:41 +08:00
7623b5cc31 [cleanup](move-memtable) remove namespace stream_load (#27441) 2023-12-30 20:08:23 +08:00
51cb15d032 [improve](move-memtable) cancel load immediately when back pressure in delta writer v2 (#29280) 2023-12-30 10:45:06 +08:00
9ff8bd2e9c [Enhancement](Wal)Support dynamic wal space limit (#27726) 2023-12-27 11:51:32 +08:00
b142ade69e [refactor](renamefile) rename some files according to the class names (#28606) 2023-12-19 14:10:11 +08:00
1e5ff40e17 [refactor](group commit) remove future block (#27720)
Co-authored-by: huanghaibin <284824253@qq.com>
2023-12-11 08:41:51 +08:00
5d548935e0 [improvement](insert) support schema change and decommission for group commit (#26359) 2023-11-17 21:41:38 +08:00
b19abac5e2 [fix](move-memtable) pass num local sink to backends (#26897) 2023-11-14 08:28:49 +08:00
58bf79f79e [fix](move-memtable) pass load stream num to backends (#26198) 2023-11-08 16:16:33 +08:00
519b48648e [fix](move-memtable) handle status when possible (#26526) 2023-11-08 10:09:06 +08:00
a4e415ab09 [feature](hive)Support hive tables after alter type. (#25138)
1.Reconstruct the logic of decode to read parquet. The parquet  reader first reads the data according to the parquet physical type, and then performs a type conversion.

2.Support hive alter table.
2023-11-02 00:24:21 +08:00
8f320944a8 [fix](move-memtable) fix DeltaWriterV2 profile use-after-free (#26110)
The sink who creates the delta writer may be closed while other sinks still using this delta writer.
The parent profile is deconstructed and when the last sink trying to update the profile, it will meet use-after-free.

To address this issue, we record the profile number in delta writer,
and the last sink who close the delta writer will create and update the profile.
2023-10-31 13:52:18 +08:00
9c9fc84f39 [feature](merge-cloud) Abstract BaseTablet for CloudTablet (#24929) 2023-10-18 20:29:04 +08:00
7ea456ef91 [fix](insert) make group commit wal_manager exit elegantly (#25250) 2023-10-14 23:14:06 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
082bcd820b [feature](insert) Support wal for group commit insert (#23053) 2023-09-26 14:46:24 +08:00
dc9fa1a4f1 [Refactor](Sink) convert to tablet sink to tablet writer (#24474) 2023-09-20 14:47:18 +08:00
563c3f75ff [feature](move-memtable) share delta writer v2 among sinks (#24066) 2023-09-13 14:39:29 +08:00
eaf2a6a80e [fix](date) return right date value even if out of the range of date dictionary(#23664)
PR(https://github.com/apache/doris/pull/22360) and PR(https://github.com/apache/doris/pull/22384) optimized the performance of date type. However hive supports date out of 1970~2038, leading wrong date value in tpcds benchmark.
How to fix:
1. Increase dictionary range: 1900 ~ 2038
2. The date out of 1900 ~ 2038 is regenerated.
2023-09-01 14:40:20 +08:00
62c075bf7e [improvement](Block) Replace Block(const PBlock&) with deserialize because it has heavy operations in ctor (#23672) 2023-08-31 14:44:17 +08:00
5ff7b57fc1 [fix](parquet) parquet reader confuses logical/physical/slot id of columns (#23198)
`ParquetReader` confuses logical/physical/slot id of columns. If only reading the scalar types, there's nothing wrong, but when reading complex types, `RowGroup` and `PageIndex` will get wrong statistics. Therefore, if the query contains complex types and pushed-down predicates, the probability of the result set is incorrect.
2023-08-22 13:35:29 +08:00
3a11de889f [Opt](exec) opt the performance of date parquet convert by date dict (#22384)
before:

mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
| 600037902 |
+---------------------+
1 row in set (0.86 sec)
after:

mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
| 600037902 |
+---------------------+
1 row in set (0.36 sec)
2023-08-01 12:24:00 +08:00
Pxl
19ba6bec38 [Improvement](pipeline) support send eos on local exchange and remove some unused code (#22086)
support send eos on local exchange and remove some unused code
2023-07-24 09:25:32 +08:00
c6063ed92f [Revert](lazy open) revert lazy open and add case (#21821) 2023-07-18 19:41:33 +08:00
ab8125d56f [Improve](performance) introduce SchemaCache to cache TabletSchame & Schema (#20037)
* [Improve](performance) introduce SchemaCache to cache TabletSchame & Schema

1. When the system is under high-concurrency load with wide table point queries, the frequent memory allocation and deallocation of Schema become evident system bottlenecks. Additionally, the initialization of TabletSchema and Schema also becomes a CPU hotspot.Therefore, the introduction of a SchemaCache is implemented to cache these resources for reuse.

2. Make some variables wrapped with std::unique<unique_ptr>

Performance:
| 状态              | QPS | 平均响应时间 (avg) | P99 响应时间 |
|------------------|-----|------------------|-------------|
| 开启 SchemaCache | 501 | 20ms             | 34ms        |
| 关闭 SchemaCache | 321 | 31ms             | 61ms        |

* handle schema change with schema version

* remove useless header

* rebase
2023-05-29 17:34:53 +08:00
9f8de89659 [refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758)
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity.

By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed.

This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.
2023-05-29 11:47:31 +08:00
f8ef25bb10 [enhancement](load) lazy-open necessary partitions when load (#18874) 2023-05-14 16:09:55 +08:00
Pxl
dfad7b6b38 [Feature](generic-aggregation) some prowork of generic aggregation (#19343)
some prowork of generic aggregation
2023-05-09 21:42:21 +08:00
16a394da0e [chore](build) Use include-what-you-use to optimize includes (PART III) (#18958)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-24 14:51:51 +08:00
63a76ed115 [refactor](exceptionsafe) disallow call new method explicitly (#18830)
disallow call new method explicitly
force to use create_shared or create_unique to use shared ptr
placement new is allowed
reference https://abseil.io/tips/42 to add factory method to all class.
I think we should follow this guide because if throw exception in new method, the program will terminate.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-21 09:13:24 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
Pxl
c9b4eaea76 [Chore](storage) change FieldType to enum class #18500 2023-04-10 08:53:44 +08:00
47aa8a6d8a [fix](file_cache) turn on file cache by FE session variable (#18340)
Fix tow bugs:
1. Enabling file caching requires both `FE session` and `BE` configurations(enable_file_cache=true) to be enabled.
2. `ParquetReader` has not used `IOContext` previously, but `CachedRemoteFileReader::read_at` needs `IOContext` after PR(#17586).
2023-04-05 15:51:47 +08:00
7c0bcbdca1 [enhance](parquet-reader) cache file meta of parquet to speed up query (#18074)
Problem:
1. FE will split the parquet file into split. So a file can have several splits.
2. BE will scan each split, read the footer of the parquet file.
3. If 2 splits belongs to a same parquet file, the footer of this file will be read twice.

This PR mainly changes:
1. Use kv cache to cache the footer of parquet file.
2. The kv cache is belong to a scan node, so all parquet reader belong to this scan node will share same kv cache.
3. In cache, the key is "meta_file_path", the value is parsed thrift footer.

The KV Cache is sharded into mutlti sub cache.
So that different file can use different sub cache, avoid blocking each other

In my test, a query with 26 splits can reduce the footer parse time from 4s -> 1s
2023-03-25 23:22:57 +08:00
cb79e42e5c [refactor](file-system)(step-1) refactor file sysmte on BE and remove storage_backend (#17586)
See #17764 for details
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query file on local/s3/hdfs/broker file system, with table value function and catalog.
- Backup/Restore with local/s3/hdfs/broker file system

Not test:
- cold & host data separation case.
2023-03-21 21:08:38 +08:00
bd8e3e6405 [refactor](date) unify DateTimeValue and VecDateTimeValue (#17670) 2023-03-20 16:27:08 +08:00
dd53bc1c8d [unify type system](remove unused type desc) remove some code (#17921)
There are many type definitions in BE. Should unify the type system and simplify the development.



---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-19 14:05:02 +08:00
b4b126b817 [Feature](parquet-reader) Implements dict filter functionality parquet reader. (#17594)
Implements dict filter functionality parquet reader to improve performance.
2023-03-16 20:29:27 +08:00
7229751bd9 [Improve](map-type) Add contains_null for map (#16948)
Add contains_null for map type.
2023-02-23 20:47:26 +08:00
5ec8c51366 [fix](union iterator) fix bug that result data order of VUnionIterator is different (#16938)
Fix bug of #16680, data order of VUnionIterator outout block is changed, which will impact compaction.
2023-02-21 14:17:21 +08:00
9b8c91e18c [improvement](rowset reader) fix possible memleak (#16680)
* [improvement](rowset reader) fix possible memleak

* fix be UT
2023-02-15 11:13:31 +08:00
4fcd6cd236 [refactor](remove unused code) remove load stream mgr (#16580)
remove old stream load pipe
remove old stream load manager

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-10 07:46:18 +08:00
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
69e748b076 [fix](schema scanner)change schema_scanner::get_next_row to get_next_block (#15718) 2023-01-30 10:01:50 +08:00
3235b636cc [refactor](remove unused code) remove thread pool manager (#16179)
* remove thread resource manager

* remove  string buffer
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-29 13:03:08 +08:00
79ad74637d [refactor](remove expr) remove non vectorized Expr and ExprContext related codes (#16136) 2023-01-24 10:45:35 +08:00