doris

Author	SHA1	Message	Date
Ashin Gau	c631f4f8a8	[fix](schema change) resolve the use count check of source logical column (#33932 ) Fix error like: ``` 8# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/master-deploy/be/lib/doris_be 9# doris::vectorized::Block::clear_column_data(int) in /mnt/hdd01/ci/master-deploy/be/lib/doris_be 10# doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block, unsigned long, bool) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:514 11# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState, doris::vectorized::Block, bool) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vfile_scanner.cpp:333 12# doris::vectorized::VScanner::get_block(doris::RuntimeState, doris::vectorized::Block, bool) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:132 13# doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState, doris::vectorized::Block, bool) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:99 ``` When the logical converter is consistent, the source logical column is the destination logical column, so its extra reference must be released before other components check the use count. Previously, that reference was reset only after the conversion completed. When EOF occurred, the function returned early and skipped the reset, even though EOF is not a true error. ``` if (_logical_converter->is_consistent()) { // If logical converter is consistent, _src_logical_column is the final destination column, // other components will check the use count _src_logical_column.reset(); } ```	2024-04-22 12:31:46 +08:00
Ashin Gau	9b7af4c0cf	[feature](schema change) unified schema change for parquet and orc reader (#32873 ) Following #25138, unify the schema change interface for the parquet and orc readers; the interface can be applied to other format readers as well. Unified schema change interface for all format readers: - First, read the data according to the column type of the file into source column; - Second, convert source column to the destination column with type planned by FE.	2024-04-12 15:09:25 +08:00
Pxl	5f30463bb3	[Chore](descriptors) remove unused codes for descriptors (#33408 ) remove unused codes for descriptors	2024-04-12 15:09:25 +08:00
Pxl	5688c28364	[Bug](runtime-filter) try to fix heap use after free on runtime filter send filter size (#33465 ) (#33522 )	2024-04-11 13:10:24 +08:00
Xinyi Zou	cf7595d423	[opt](memory) Optimize mem tracker accuracy (#32039 ) (#33140 )	2024-04-10 11:42:19 +08:00
Xin Liao	950ca68fac	[fix](move-memtable) fix timeout to get tablet schema (#33256 ) (#33260 )	2024-04-04 21:45:55 +08:00
Xin Liao	df197c6a14	[fix](move-memtable) fix initial use count of streams for auto partition (#33165 ) (#33236 ) Co-authored-by: Kaijie Chen <ckj@apache.org>	2024-04-03 20:31:29 +08:00
huanghaibin	2196c534e8	[fix](group commit) Fix compatibility issues on serializing and deserializing wal file (#32299 )	2024-03-21 14:07:24 +08:00
Mingyu Chen	ef2151ae66	[Feature-WIP](multi-catalog) Add Hive sink on BE side. (#32306 ) (#32364 ) bp #32306 Co-authored-by: Qi Chen <kaka11.chen@gmail.com>	2024-03-18 11:23:01 +08:00
lihangyu	0da010603e	[Improve](TabletSchemaCache) reduce duplicated memory consumption for column name and column path (#31141 ) Both can reference the related fields in TabletColumn. Also use shared_ptr for TabletColumn in TabletSchema for later memory reuse.	2024-03-09 19:44:42 +08:00
huanghaibin	b66583551c	[fix](group_commit)Fix bound checking problem when reading wal block (#31112 )	2024-02-22 13:01:48 +08:00
huanghaibin	7a1bd6abb0	[improvment](group_commit) Refector scan wal function (#30939 ) Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>	2024-02-20 09:12:38 +08:00
HappenLee	45b4189bb6	[Refactor](opt) Opt rf and remove unless code (#30900 ) Opt rf and remove unless code	2024-02-18 11:50:16 +08:00
Kaijie Chen	f7a340a2df	[improve](move-memtable) add cancel method to load stream stub (#29994 )	2024-01-16 20:23:09 +08:00
Qi Chen	d494674ff4	[opt](parquet-reader) Opt parquet decimal type reading. (#29825 )	2024-01-12 13:58:19 +08:00
Qi Chen	7287c0ca15	[Opt](exec)(multi-catalog) Opt date type reading. (#29571 )	2024-01-12 11:48:39 +08:00
yiguolei	48f58510a8	[refactor](tabletwriter) make tablet writer's rpc callback safe, could exit any time (#29684 ) --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2024-01-12 11:46:29 +08:00
huanghaibin	0b731800a0	[enhancement](group_commit) refector wal manager code (#29560 )	2024-01-07 18:54:41 +08:00
abmdocrt	e3c9f535dc	[refactor](wal) refactor some wal code (#29434 )	2024-01-03 14:45:57 +08:00
HHoflittlefish777	69a01e0cf5	[improve](move-memtable) skip load stream stub close wait when cancel (#29427 )	2024-01-02 23:35:50 +08:00
meiyi	706463781c	[refactor](group commit) refactor group commit wal code (#29375 )	2024-01-02 15:52:03 +08:00
HHoflittlefish777	b7487430da	Revert "[improve](move-memtable) cancel load rapidly when stream close wait (#29322 )" (#29371 ) This reverts commit bbf58c5aa42d40e66bc6ccc9ed91a4fcb4bdfff7.	2024-01-02 11:32:14 +08:00
HHoflittlefish777	bbf58c5aa4	[improve](move-memtable) cancel load rapidly when stream close wait (#29322 )	2023-12-31 16:26:41 +08:00
Kaijie Chen	7623b5cc31	[cleanup](move-memtable) remove namespace `stream_load` (#27441 )	2023-12-30 20:08:23 +08:00
HHoflittlefish777	51cb15d032	[improve](move-memtable) cancel load immediately when back pressure in delta writer v2 (#29280 )	2023-12-30 10:45:06 +08:00
abmdocrt	9ff8bd2e9c	[Enhancement](Wal)Support dynamic wal space limit (#27726 )	2023-12-27 11:51:32 +08:00
yiguolei	b142ade69e	[refactor](renamefile) rename some files according to the class names (#28606 )	2023-12-19 14:10:11 +08:00
meiyi	1e5ff40e17	[refactor](group commit) remove future block (#27720 ) Co-authored-by: huanghaibin <284824253@qq.com>	2023-12-11 08:41:51 +08:00
huanghaibin	5d548935e0	[improvement](insert) support schema change and decommission for group commit (#26359 )	2023-11-17 21:41:38 +08:00
Kaijie Chen	b19abac5e2	[fix](move-memtable) pass num local sink to backends (#26897 )	2023-11-14 08:28:49 +08:00
Kaijie Chen	58bf79f79e	[fix](move-memtable) pass load stream num to backends (#26198 )	2023-11-08 16:16:33 +08:00
Kaijie Chen	519b48648e	[fix](move-memtable) handle status when possible (#26526 )	2023-11-08 10:09:06 +08:00
daidai	a4e415ab09	[feature](hive)Support hive tables after alter type. (#25138 ) 1. Restructure the decoding logic for reading parquet: the parquet reader first reads the data according to the parquet physical type, then performs a type conversion. 2. Support Hive ALTER TABLE.	2023-11-02 00:24:21 +08:00
Kaijie Chen	8f320944a8	[fix](move-memtable) fix DeltaWriterV2 profile use-after-free (#26110 ) The sink that creates the delta writer may be closed while other sinks are still using this delta writer. Its parent profile is then destroyed, and when the last sink tries to update the profile, it hits a use-after-free. To address this issue, we record the profile number in the delta writer, and the last sink to close the delta writer creates and updates the profile.	2023-10-31 13:52:18 +08:00
plat1ko	9c9fc84f39	[feature](merge-cloud) Abstract BaseTablet for CloudTablet (#24929 )	2023-10-18 20:29:04 +08:00
huanghaibin	7ea456ef91	[fix](insert) make group commit wal_manager exit elegantly (#25250 )	2023-10-14 23:14:06 +08:00
bobhan1	642e5cdb69	[Fix](Status) Make `Status` `[[nodiscard]]` and handle returned `Status` correctly (#23395 )	2023-09-29 22:38:52 +08:00
huanghaibin	082bcd820b	[feature](insert) Support wal for group commit insert (#23053 )	2023-09-26 14:46:24 +08:00
HappenLee	dc9fa1a4f1	[Refactor](Sink) convert to tablet sink to tablet writer (#24474 )	2023-09-20 14:47:18 +08:00
Kaijie Chen	563c3f75ff	[feature](move-memtable) share delta writer v2 among sinks (#24066 )	2023-09-13 14:39:29 +08:00
Ashin Gau	eaf2a6a80e	[fix](date) return right date value even if out of the range of date dictionary(#23664 ) PR(https://github.com/apache/doris/pull/22360) and PR(https://github.com/apache/doris/pull/22384) optimized the performance of the date type. However, Hive supports dates outside 1970~2038, which led to wrong date values in the TPC-DS benchmark. How to fix: 1. Increase the dictionary range to 1900~2038. 2. Dates outside 1900~2038 are regenerated.	2023-09-01 14:40:20 +08:00
TengJianPing	62c075bf7e	[improvement](Block) Replace Block(const PBlock&) with deserialize because it has heavy operations in ctor (#23672 )	2023-08-31 14:44:17 +08:00
Ashin Gau	5ff7b57fc1	[fix](parquet) parquet reader confuses logical/physical/slot id of columns (#23198 ) `ParquetReader` confuses logical/physical/slot id of columns. Reading only scalar types works correctly, but when reading complex types, `RowGroup` and `PageIndex` get wrong statistics. Therefore, if the query contains complex types and pushed-down predicates, the result set may be incorrect.	2023-08-22 13:35:29 +08:00
HappenLee	3a11de889f	[Opt](exec) opt the performance of date parquet convert by date dict (#22384 ) before: mysql> select count(l_commitdate) from lineitem; +---------------------+ | count(l_commitdate) | +---------------------+ | 600037902 | +---------------------+ 1 row in set (0.86 sec) after: mysql> select count(l_commitdate) from lineitem; +---------------------+ | count(l_commitdate) | +---------------------+ | 600037902 | +---------------------+ 1 row in set (0.36 sec)	2023-08-01 12:24:00 +08:00
Pxl	19ba6bec38	[Improvement](pipeline) support send eos on local exchange and remove some unused code (#22086 ) support send eos on local exchange and remove some unused code	2023-07-24 09:25:32 +08:00
HHoflittlefish777	c6063ed92f	[Revert](lazy open) revert lazy open and add case (#21821 )	2023-07-18 19:41:33 +08:00
lihangyu	ab8125d56f	[Improve](performance) introduce SchemaCache to cache TabletSchame & Schema (#20037 ) * [Improve](performance) introduce SchemaCache to cache TabletSchame & Schema 1. When the system is under high-concurrency load with wide table point queries, the frequent memory allocation and deallocation of Schema become evident system bottlenecks. Additionally, the initialization of TabletSchema and Schema also becomes a CPU hotspot. Therefore, a SchemaCache is introduced to cache these resources for reuse. 2. Make some variables wrapped with std::unique<unique_ptr> Performance: | State | QPS | Average response time (avg) | P99 response time | |------------------|-----|------------------|-------------| | SchemaCache enabled | 501 | 20ms | 34ms | | SchemaCache disabled | 321 | 31ms | 61ms | * handle schema change with schema version * remove useless header * rebase	2023-05-29 17:34:53 +08:00
Jerry Hu	9f8de89659	[refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758 ) Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity. By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed. This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.	2023-05-29 11:47:31 +08:00
HHoflittlefish777	f8ef25bb10	[enhancement](load) lazy-open necessary partitions when load (#18874 )	2023-05-14 16:09:55 +08:00
Pxl	dfad7b6b38	[Feature](generic-aggregation) some prowork of generic aggregation (#19343 ) some prowork of generic aggregation	2023-05-09 21:42:21 +08:00
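
Two commits above describe related steps. #32873 first reads data into a source column typed by the file, then converts it to the FE-planned type. #33932 releases the source column's extra reference on the EOF path as well. The minimal C++ sketch below illustrates both. It assumes a consistent converter means the source column simply aliases the destination column. All names (`SketchReader`, `read_batch`, `Column`) are hypothetical simplifications, not Doris's actual `ParquetReader` API. The sketch uses a scope guard so the alias is dropped on every return path.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical simplified column types; the real Doris classes differ.
using Column = std::vector<int>;
using ColumnPtr = std::shared_ptr<Column>;

class SketchReader {
public:
    explicit SketchReader(bool consistent) : consistent_(consistent) {}

    // Reads one batch into dst and returns false on EOF.
    // Step 1: "read" file data into the source column (file's own type).
    // Step 2: "convert" the source column into the destination column.
    bool read_batch(ColumnPtr& dst, int rows_left) {
        if (consistent_) {
            src_ = dst;  // no conversion needed: source aliases destination
        } else if (!src_) {
            src_ = std::make_shared<Column>();
        }
        // Release the alias on every exit path, including the early EOF
        // return, so the caller sees dst.use_count() == 1 afterwards.
        struct ResetGuard {
            ColumnPtr& ptr;
            bool consistent;
            ~ResetGuard() {
                if (consistent) ptr.reset();
            }
        } guard{src_, consistent_};

        if (rows_left == 0) return false;  // EOF is not an error
        src_->push_back(rows_left);        // stand-in for decoding file data
        if (!consistent_) {
            dst->assign(src_->begin(), src_->end());  // stand-in for conversion
            src_->clear();
        }
        return true;
    }

private:
    bool consistent_;
    ColumnPtr src_;
};
```

Without the guard, the EOF return would leave `src_` still pointing at `dst`. A caller that asserts sole ownership, as `Block::clear_column_data` appears to in the stack trace above, would then abort.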

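The date-dictionary optimization (#22384) and its range fix (#23664) can be sketched as a lookup table over days since 1970-01-01. This is the encoding Parquet uses for its DATE type. The table covers 1900-01-01 to 2038-12-31, and dates outside that range are computed directly instead of being read from it. This is a hypothetical illustration, not Doris's implementation. `DateDict` and `YMD` are assumed names, and the conversion helpers follow Howard Hinnant's public-domain `days_from_civil` / `civil_from_days` algorithms.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct YMD {
    int y;
    unsigned m;
    unsigned d;
};

// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's algorithm).
inline long long days_from_civil(long long y, unsigned m, unsigned d) {
    y -= m <= 2;
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Inverse of days_from_civil.
inline YMD civil_from_days(long long z) {
    z += 719468;
    const long long era = (z >= 0 ? z : z - 146096) / 146097;
    const unsigned doe = static_cast<unsigned>(z - era * 146097);
    const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const long long y = static_cast<long long>(yoe) + era * 400;
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    const unsigned mp = (5 * doy + 2) / 153;
    const unsigned d = doy - (153 * mp + 2) / 5 + 1;
    const unsigned m = mp < 10 ? mp + 3 : mp - 9;
    return {static_cast<int>(y + (m <= 2)), m, d};
}

// Hypothetical dictionary: precomputed dates for 1900~2038, direct
// computation ("regeneration") for anything outside that range.
class DateDict {
public:
    DateDict()
            : lo_(days_from_civil(1900, 1, 1)), hi_(days_from_civil(2038, 12, 31)) {
        table_.reserve(static_cast<std::size_t>(hi_ - lo_ + 1));
        for (long long z = lo_; z <= hi_; ++z) table_.push_back(civil_from_days(z));
    }

    YMD lookup(long long days) const {
        if (days >= lo_ && days <= hi_) return table_[static_cast<std::size_t>(days - lo_)];
        return civil_from_days(days);  // out of dictionary range
    }

private:
    long long lo_;
    long long hi_;
    std::vector<YMD> table_;
};
```

The table trades about 51k small entries for a single indexed load per in-range value. The fallback keeps results correct for dates outside the dictionary range, such as the Hive dates that caused wrong TPC-DS results before #23664.
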

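The DeltaWriterV2 profile fix (#26110) relies on a simple ownership rule. The writer counts the sinks sharing it, and only the last sink to close it builds the profile. That way, no sink writes into a profile whose owner has already been destroyed. The sketch below illustrates that rule under this assumption. `SharedWriter`, `Profile` and `close_sink` are hypothetical names, not Doris classes.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a delta writer shared by several sinks.
class SharedWriter {
public:
    explicit SharedWriter(int num_sinks) : open_sinks_(num_sinks) {}

    void write(long long rows) { rows_.fetch_add(rows, std::memory_order_relaxed); }
    long long rows_written() const { return rows_.load(std::memory_order_relaxed); }

    // Returns true for exactly one caller: the sink that closes last.
    bool close_one() { return open_sinks_.fetch_sub(1, std::memory_order_acq_rel) == 1; }

private:
    std::atomic<int> open_sinks_;
    std::atomic<long long> rows_{0};
};

// Hypothetical profile record, created and owned by the last closing sink.
struct Profile {
    std::string sink_name;
    long long rows = 0;
};

// Only the last sink builds the profile; earlier sinks get nullptr and
// never touch profile state that might outlive them.
std::unique_ptr<Profile> close_sink(SharedWriter& writer, const std::string& sink_name) {
    if (!writer.close_one()) return nullptr;  // other sinks still use the writer
    auto profile = std::make_unique<Profile>();
    profile->sink_name = sink_name;
    profile->rows = writer.rows_written();
    return profile;
}
```
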
138 Commits