Commit Graph

1399 Commits

Author SHA1 Message Date
7058b31edd [fix](move-memtable) clear load streams before shutdown SegmentFileWriterThreadPool (#35217) 2024-05-28 13:12:03 +08:00
Pxl
b143f0dfe2 [Improvement](date) shortcut for str to date parse (#35288)
shortcut for str to date parse
2024-05-25 17:47:20 +08:00
639c7ee7fb [fix](decimalv2) fix scale of decimalv2 to string (#35222) (#35359)
* [fix](decimalv2) fix scale of decimalv2 to string
2024-05-24 17:20:43 +08:00
309503855e [Fix](bloom filter) Fix bloom filter memory leak (#34871)
* Issue: Doris occasionally encounters an issue where memory usage becomes exceptionally high and does not decrease. The leaked memory is occupied by Bloom filters stored in memory.

Reason: The segment cache stores segment objects read from files into memory. It functions as an LRU cache with an eviction strategy: when the number of segments exceeds the maximum number, or the total memory size of segment objects in the cache exceeds the maximum usage, it evicts the older segments. However, there is a piece of logic in the code that first reads the segment object into memory, assuming it occupies memory size A, then places the read segment object into the cache (at this point, the cache considers the segment object size to be A). It then reads the segment's Bloom filter from the file and assigns it to the segment's Bloom filter member variable, assuming the Bloom filter occupies memory size B. Thus, the total size of the segment object at this point is A+B. However, the cache does not update this size, leading to the actual size of the segment object stored in the cache (A+B) being larger than the size considered by the cache (A). When the number of segment objects in the cache increases to a certain extent, the used memory will surge dramatically. However, the cache does not perceive the size as reaching the eviction limit, so it does not evict the segment objects. In such cases, a memory leak issue arises.

Solution: Since each segment object only reads the Bloom filter once, the issue can be resolved by changing the logic from reading the segment, placing it into the cache, and then reading the Bloom filter to reading the segment, reading the Bloom filter, and then placing it into the cache.
2024-05-24 16:23:58 +08:00
a6f7747d29 [feature](datatype) add BE config to allow zero date (#34961)
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2024-05-23 19:12:39 +08:00
c23384ff07 [fix](decimal) Fix long string casting to decimalv2 (#35121) 2024-05-22 14:32:29 +08:00
98f8eb5c43 [opt](split) get file splits in batch mode (#34032) (#35107)
bp  #34032
2024-05-21 22:27:07 +08:00
b4a798240a [fix](inverted_index) donot use int32_t for index id to avoid overflow (#35062) 2024-05-21 12:58:38 +08:00
e3e5f18f26 [Fix](Json type) correct cast result for json type (#34764) 2024-05-18 18:40:17 +08:00
eb7eaee386 [fix](function) money format (#34680) 2024-05-18 18:35:29 +08:00
1a24895257 [opt](routine-load) optimize routine load task thread pool and related param(#32282) (#34896) 2024-05-15 12:42:02 +08:00
95b05928fd [fix](compaction) fix time series compaction merge empty rowsets priority #34562 (#34765) 2024-05-14 09:10:09 +08:00
0ae1b9c70a [chore](remove code) Remove dragonbox related (#34528)
* Revert "[refactor](mysql result format) use new serde framework to tuple convert (#25006)"

This reverts commit e5ef0aa6d439c3f9b1f1fe5bc89c9ea6a71d4019.

* run buildall

* MORE

* FIX
2024-05-13 22:16:57 +08:00
32cbd4a583 [chore](status) unify error code between thrift,pb, status.h (#34397)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-05-10 14:41:01 +08:00
9b712b03b4 [FIX]fix is_ip_address_in_range func with const param (#34266) 2024-05-10 14:37:20 +08:00
8fdfbcb3c4 Revert "[Opt](func) opt the percentile func performance (#34373) (#34416)"
This reverts commit 509ae425e416b4779ae94eab9c2b21f9850e03c3.
2024-05-07 07:23:48 +08:00
f7900b53ce [enhancement](function) floor/ceil/round/round_bankers can use column as scale argument (#34391) 2024-05-06 22:18:36 +08:00
509ae425e4 [Opt](func) opt the percentile func performance (#34373) (#34416) 2024-05-06 20:10:35 +08:00
0f0c0a266b [opt](parquet)Skip page with offset index (#33082)
Make skip_page() in ColumnChunkReader more efficient. No more reading page headers if there are pagelocations in chunk.
2024-04-26 15:06:16 +08:00
c631f4f8a8 [fix](schema change) resolve the use count check of source logical column (#33932)
Fix error like:
```
8# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
 9# doris::vectorized::Block::clear_column_data(int) in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
10# doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:514
11# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vfile_scanner.cpp:333
12# doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:132
13# doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:99
```

Because source logical column is the destination logical column if logical converter is consistent. Previously, the reference of column was reset after the conversion was completed, but if an EOF occurred, it was returned in advance, but EOF is not a true error.
```
if (_logical_converter->is_consistent()) {
            // If logical converter is consistent, _src_logical_column is the final destination column,
            // other components will check the use count
            _src_logical_column.reset();
}
```
2024-04-22 12:31:46 +08:00
7e91e69eb9 [fix](compaction) fix single compaction (#33907)
* [fix](compaction)Fix single compaction to get all local versions #33849

add test and comment

* remove single replica compaction prepare input rowsets

reviesd
2024-04-19 23:30:25 +08:00
ffd9da44a2 [fix](move-memtable) fix commit may fail due to duplicated reports (#32403) 2024-04-19 15:02:49 +08:00
9b7af4c0cf [feature](schema change) unified schema change for parquet and orc reader (#32873)
Following #25138, unified schema change interface for parquet and orc reader, and can be applied to other format readers as well.
Unified schema change interface for all format readers:
- First, read the data according to the column type of the file into source column;
- Second, convert source column to the destination column with type planned by FE.
2024-04-12 15:09:25 +08:00
a4924dabb7 [enhancement](exception) enble exception logic in pipeline execute thread (#33437)
* [enhancement](exception) enble exception logic in pipeline execute thread

* f

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-04-12 15:09:25 +08:00
Pxl
5f30463bb3 [Chore](descriptors) remove unused codes for descriptors (#33408)
remove unused codes for descriptors
2024-04-12 15:09:25 +08:00
26d9082b9a [Feature](function) Add function strcmp (#33272) 2024-04-12 15:09:25 +08:00
31984bb4f0 [feature](function) support quote string function #33055 2024-04-12 15:09:25 +08:00
3d66723214 [branch-2.1](auto-partition) pick auto partition and some more prs (#33523) 2024-04-11 17:12:17 +08:00
Pxl
5688c28364 [Bug](runtime-filter) try to fix heap use after free on runtime filter send filter size (#33465) (#33522) 2024-04-11 13:10:24 +08:00
bc929686e3 [feature](debug point) add macro DBUG_RUN_CALLBACK (#33407) 2024-04-11 09:31:50 +08:00
ef26479282 [improve](serde) support complex type in write/read pb serde (#33124)
support complex type and ip/jsonb in DataTypeSerDe::write_column_to_pb/read_column_from_pb function
2024-04-11 09:31:50 +08:00
e2ad7149c3 [feature](debug point) Add handler to debug point (#33350) 2024-04-10 16:24:13 +08:00
Pxl
8fd6d4c41b [Chore](build) add -Wconversion and remove some unused code (#33127)
add -Wconversion and remove some unused code
2024-04-10 15:26:08 +08:00
c61d6ad1e2 [Feature] support function uuid_to_int and int_to_uuid #33005 2024-04-10 14:53:56 +08:00
bf022f9d8d [enhancement](function truncate) truncate can use column as scale argument (#32746)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-04-10 14:53:56 +08:00
cf7595d423 [opt](memory) Optimize mem tracker accuracy (#32039) (#33140) 2024-04-10 11:42:19 +08:00
39fba884fb [fix](typo) typo fix for 'delete bimap' changing to 'delete bitmap' (#32341) 2024-04-10 11:34:30 +08:00
28e2d89ce3 [Improve](inverted_index) update clucene and improve array inverted index writer (#32436) 2024-04-10 11:34:29 +08:00
950ca68fac [fix](move-memtable) fix timeout to get tablet schema (#33256) (#33260) 2024-04-04 21:45:55 +08:00
df197c6a14 [fix](move-memtable) fix initial use count of streams for auto partition (#33165) (#33236)
Co-authored-by: Kaijie Chen <ckj@apache.org>
2024-04-03 20:31:29 +08:00
ff0da8108b [fix](RF) fix 'Invalid value' error of RF of decimal type (#32749) 2024-03-25 22:34:19 +08:00
d7a3ff1ddf [Fix](Outfile) Fix the column type mapping in the orc/parquet file format (#32281)
| Doris Type             | Orc Type                     |  Parquet Type                |
|---------------------|--------------------|------------------------|
| Date                            | Long (logical: DATE)                 |       int32 (Logical: Date)                                        |
| DateTime                    | TIMESTAMP (logical: TIMESTAMP)    |       int96                          |
2024-03-22 08:52:16 +08:00
7486e96b12 [improve](function) add error msg if exceeded maximum default value in repeat function (#32219)
add some error msg from repeat function, so the user could know the count is greater than default value.
2024-03-21 14:07:49 +08:00
2196c534e8 [fix](group commit) Fix compatibility issues on serializing and deserializing wal file (#32299) 2024-03-21 14:07:24 +08:00
8bd101129a [behavior change](output) change float output format (#32049) 2024-03-21 14:07:22 +08:00
0990014e94 [fix](datetime) fix datetime rounding on BE (#32075) 2024-03-21 14:07:19 +08:00
ef2151ae66 [Feature-WIP](multi-catalog) Add Hive sink on BE side. (#32306) (#32364)
bp #32306
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-03-18 11:23:01 +08:00
7b74b199a5 [fix](memory) Fix LRU cache deleter and memory tracking (#32080)
In order to add common code to the value deleter of LRU cache, let all lru cache values inherit from LRUCacheValueBase class and tracking memory in destructor.
2024-03-15 17:57:58 +08:00
847ec368be [Fix](smooth-upgrade) Fix incompatibility when upgrade from 2.0 to 2.1 (#32220) 2024-03-14 11:23:05 +08:00
0da010603e [Improve](TabletSchemaCache) reduce duplicated memory consumption for column name and column path (#31141)
Both could be reference to related field in TabletColumn.And use shared_ptr for TabletColumn in TabletSchema for later memory reuse
2024-03-09 19:44:42 +08:00