Commit Graph

149 Commits

Author SHA1 Message Date
df8bc8f23d branch-2.1: [fix](parquet) impl has_dict_page to replace old logic and fix write empty parquet row group bug #45740 (#45954)
Cherry-picked from #45740

Co-authored-by: Socrates <suyiteng@selectdb.com>
2024-12-26 15:17:49 +08:00
d4a6fd1850 Revert #43255 & #44615 (#45096)
Revert "branch-2.1: [enhance](orc) Optimize ORC Predicate Pushdown for
OR-connected Predicate #43255 (#44438)"
Revert "[fix](orc) check all the cases before build_search_argument
(#44615) (#44801)"
2024-12-06 21:14:13 +08:00
5f952cf6ed branch-2.1: [fix](iceberg)Bring field_id with parquet files And fix map type's key optional #44470 (#44828)
Cherry-picked from #44470

Co-authored-by: wuwenchi <wuwenchi@selectdb.com>
2024-12-02 10:24:07 +08:00
4b15b1f263 [fix](orc) check all the cases before build_search_argument (#44615) (#44801)
cherry-pick #44615

Co-authored-by: Socrates <suyiteng@selectdb.com>
2024-11-30 09:17:56 +08:00
dceaf97381 branch-2.1: [enhance](orc) Optimize ORC Predicate Pushdown for OR-connected Predicate #43255 (#44438)
Cherry-picked from #43255

Co-authored-by: Socrates <suyiteng@selectdb.com>
2024-11-22 22:52:53 +08:00
a44a274563 [Fix](parquet-reader) Fix and optimize parquet min-max filtering. (#39375)
Backport #38277.
2024-08-15 14:12:54 +08:00
fed632bf4a [fix](move-memtable) check segment num when closing each tablet (#36753) (#37536)
cherry-pick #36753 and #37660
2024-07-11 20:33:44 +08:00
b75533e72b [branch-2.1](beut) fix BE UT (#36147)
only for branch-2.1
2024-06-12 08:21:38 +08:00
b91d2caab8 [Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587)
backport #34929
2024-05-29 16:40:54 +08:00
98f8eb5c43 [opt](split) get file splits in batch mode (#34032) (#35107)
bp  #34032
2024-05-21 22:27:07 +08:00
0f0c0a266b [opt](parquet)Skip page with offset index (#33082)
Make skip_page() in ColumnChunkReader more efficient. No more reading page headers if there are pagelocations in chunk.
2024-04-26 15:06:16 +08:00
c631f4f8a8 [fix](schema change) resolve the use count check of source logical column (#33932)
Fix error like:
```
8# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
 9# doris::vectorized::Block::clear_column_data(int) in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
10# doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:514
11# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vfile_scanner.cpp:333
12# doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:132
13# doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:99
```

Because source logical column is the destination logical column if logical converter is consistent. Previously, the reference of column was reset after the conversion was completed, but if an EOF occurred, it was returned in advance, but EOF is not a true error.
```
if (_logical_converter->is_consistent()) {
            // If logical converter is consistent, _src_logical_column is the final destination column,
            // other components will check the use count
            _src_logical_column.reset();
}
```
2024-04-22 12:31:46 +08:00
9b7af4c0cf [feature](schema change) unified schema change for parquet and orc reader (#32873)
Following #25138, unified schema change interface for parquet and orc reader, and can be applied to other format readers as well.
Unified schema change interface for all format readers:
- First, read the data according to the column type of the file into source column;
- Second, convert source column to the destination column with type planned by FE.
2024-04-12 15:09:25 +08:00
Pxl
5f30463bb3 [Chore](descriptors) remove unused codes for descriptors (#33408)
remove unused codes for descriptors
2024-04-12 15:09:25 +08:00
Pxl
5688c28364 [Bug](runtime-filter) try to fix heap use after free on runtime filter send filter size (#33465) (#33522) 2024-04-11 13:10:24 +08:00
cf7595d423 [opt](memory) Optimize mem tracker accuracy (#32039) (#33140) 2024-04-10 11:42:19 +08:00
950ca68fac [fix](move-memtable) fix timeout to get tablet schema (#33256) (#33260) 2024-04-04 21:45:55 +08:00
df197c6a14 [fix](move-memtable) fix initial use count of streams for auto partition (#33165) (#33236)
Co-authored-by: Kaijie Chen <ckj@apache.org>
2024-04-03 20:31:29 +08:00
2196c534e8 [fix](group commit) Fix compatibility issues on serializing and deserializing wal file (#32299) 2024-03-21 14:07:24 +08:00
ef2151ae66 [Feature-WIP](multi-catalog) Add Hive sink on BE side. (#32306) (#32364)
bp #32306
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-03-18 11:23:01 +08:00
0da010603e [Improve](TabletSchemaCache) reduce duplicated memory consumption for column name and column path (#31141)
Both could be reference to related field in TabletColumn.And use shared_ptr for TabletColumn in TabletSchema for later memory reuse
2024-03-09 19:44:42 +08:00
b66583551c [fix](group_commit)Fix bound checking problem when reading wal block (#31112) 2024-02-22 13:01:48 +08:00
7a1bd6abb0 [improvment](group_commit) Refector scan wal function (#30939)
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
2024-02-20 09:12:38 +08:00
45b4189bb6 [Refactor](opt) Opt rf and remove unless code (#30900)
Opt rf and remove unless code
2024-02-18 11:50:16 +08:00
f7a340a2df [improve](move-memtable) add cancel method to load stream stub (#29994) 2024-01-16 20:23:09 +08:00
d494674ff4 [opt](parquet-reader) Opt parquet decimal type reading. (#29825) 2024-01-12 13:58:19 +08:00
7287c0ca15 [Opt](exec)(multi-catalog) Opt date type reading. (#29571) 2024-01-12 11:48:39 +08:00
48f58510a8 [refactor](tabletwriter) make tablet writer's rpc callback safe, could exit any time (#29684)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-12 11:46:29 +08:00
0b731800a0 [enhancement](group_commit) refector wal manager code (#29560) 2024-01-07 18:54:41 +08:00
e3c9f535dc [refactor](wal) refactor some wal code (#29434) 2024-01-03 14:45:57 +08:00
69a01e0cf5 [improve](move-memtable) skip load stream stub close wait when cancel (#29427) 2024-01-02 23:35:50 +08:00
706463781c [refactor](group commit) refactor group commit wal code (#29375) 2024-01-02 15:52:03 +08:00
b7487430da Revert "[improve](move-memtable) cancel load rapidly when stream close wait (#29322)" (#29371)
This reverts commit bbf58c5aa42d40e66bc6ccc9ed91a4fcb4bdfff7.
2024-01-02 11:32:14 +08:00
bbf58c5aa4 [improve](move-memtable) cancel load rapidly when stream close wait (#29322) 2023-12-31 16:26:41 +08:00
7623b5cc31 [cleanup](move-memtable) remove namespace stream_load (#27441) 2023-12-30 20:08:23 +08:00
51cb15d032 [improve](move-memtable) cancel load immediately when back pressure in delta writer v2 (#29280) 2023-12-30 10:45:06 +08:00
9ff8bd2e9c [Enhancement](Wal)Support dynamic wal space limit (#27726) 2023-12-27 11:51:32 +08:00
b142ade69e [refactor](renamefile) rename some files according to the class names (#28606) 2023-12-19 14:10:11 +08:00
1e5ff40e17 [refactor](group commit) remove future block (#27720)
Co-authored-by: huanghaibin <284824253@qq.com>
2023-12-11 08:41:51 +08:00
5d548935e0 [improvement](insert) support schema change and decommission for group commit (#26359) 2023-11-17 21:41:38 +08:00
b19abac5e2 [fix](move-memtable) pass num local sink to backends (#26897) 2023-11-14 08:28:49 +08:00
58bf79f79e [fix](move-memtable) pass load stream num to backends (#26198) 2023-11-08 16:16:33 +08:00
519b48648e [fix](move-memtable) handle status when possible (#26526) 2023-11-08 10:09:06 +08:00
a4e415ab09 [feature](hive)Support hive tables after alter type. (#25138)
1.Reconstruct the logic of decode to read parquet. The parquet  reader first reads the data according to the parquet physical type, and then performs a type conversion.

2.Support hive alter table.
2023-11-02 00:24:21 +08:00
8f320944a8 [fix](move-memtable) fix DeltaWriterV2 profile use-after-free (#26110)
The sink who creates the delta writer may be closed while other sinks still using this delta writer.
The parent profile is deconstructed and when the last sink trying to update the profile, it will meet use-after-free.

To address this issue, we record the profile number in delta writer,
and the last sink who close the delta writer will create and update the profile.
2023-10-31 13:52:18 +08:00
9c9fc84f39 [feature](merge-cloud) Abstract BaseTablet for CloudTablet (#24929) 2023-10-18 20:29:04 +08:00
7ea456ef91 [fix](insert) make group commit wal_manager exit elegantly (#25250) 2023-10-14 23:14:06 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
082bcd820b [feature](insert) Support wal for group commit insert (#23053) 2023-09-26 14:46:24 +08:00
dc9fa1a4f1 [Refactor](Sink) convert to tablet sink to tablet writer (#24474) 2023-09-20 14:47:18 +08:00