Commit Graph

1415 Commits

Author SHA1 Message Date
ceef9ee123 [feature](serde) support presto compatible output format (#37039) (#37253)
bp #37039
2024-07-04 13:56:05 +08:00
07278e9dcb [improvement](segmentcache) limit segment cache by memory or segment … (#37035)
…num (#37026)

pick ##37026
2024-06-30 20:34:13 +08:00
f27ae8fa09 [fix](bitmap) incorrect type of BitmapValue with fastunion (#36834) (#36896) 2024-06-28 11:29:03 +08:00
0cff539810 [feature](function) support new function replace_empty (#36283) (#36656)
#36283
2024-06-21 16:46:22 +08:00
c8f2a3f952 [fix](eq_for_null) fix incorrect logic in function eq_for_null #36004 (#36124)
cherry pick from #36004
cherry pick from #36164
2024-06-21 14:31:21 +08:00
612f2ae961 [feature](api) add BE HTTP /api/load_streams (#36312) (#36338)
cherry-pick #36312
2024-06-16 22:09:04 +08:00
b75533e72b [branch-2.1](beut) fix BE UT (#36147)
only for branch-2.1
2024-06-12 08:21:38 +08:00
596a9a16d3 [chore](Compile) Fix segment cache ut's compile error due to miss cherry-pick (#36099) 2024-06-11 17:12:42 +08:00
a0f3c1cd1e [chore](Compile) Fix S3 file writer ut's compile error due to miss cherry-pick (#36037)
The S3 File Writer's ut can't pass ut compile, this pr tries to fix it.
2024-06-08 22:21:20 +08:00
af779f5cd8 Pick "[fix](gclog) Skip tablet dir without schema hash dir in path gc (#32793)" (#35978)
## Proposed changes
Pick "[fix](gclog) Skip tablet dir without schema hash dir in path gc
(#32793)"
2024-06-06 22:24:30 +08:00
f80b856405 [enhancement](oom) return error when bloom filter allocate memory failed (#35790)
## Proposed changes


1. return error when bloom filter allocate memory failed
2. return error when deserialize a block,  it may need a lot of memory.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-06-03 18:22:11 +08:00
9dd573888a [bugfix](stdcallonce) replace std callonce with a lock because it is not exception safe (#35126) 2024-06-01 08:00:42 +08:00
9c270e5cdf [fix](delete) Fix unrecognized column name delete handler (#32429) (#35742)
pick doris-master #32429
2024-05-31 20:41:22 +08:00
680be6d19f [fix](ub) fix uninitialized accesses in BE (#35370)
ubsan hints:
```c++
/root/doris/be/src/olap/hll.h:93:29: runtime error: load of value 3078029312, which is not a valid value for type 'HllDataType'
/root/doris/be/src/olap/hll.h:94:23: runtime error: load of value 3078029312, which is not a valid value for type 'HllDataType'
/root/doris/be/src/runtime/descriptors.h:439:38: runtime error: load of value 118, which is not a valid value for type 'bool'
/root/doris/be/src/vec/exec/vjdbc_connector.cpp:61:50: runtime error: load of value 35, which is not a valid value for type 'bool' 
```
2024-05-29 20:31:07 +08:00
b91d2caab8 [Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587)
backport #34929
2024-05-29 16:40:54 +08:00
8fb28244d6 [improvement](page builder) avoid allocating big memory in ctor (#35493)
## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->

## Further comments

If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
2024-05-29 15:03:54 +08:00
7058b31edd [fix](move-memtable) clear load streams before shutdown SegmentFileWriterThreadPool (#35217) 2024-05-28 13:12:03 +08:00
Pxl
b143f0dfe2 [Improvement](date) shortcut for str to date parse (#35288)
shortcut for str to date parse
2024-05-25 17:47:20 +08:00
639c7ee7fb [fix](decimalv2) fix scale of decimalv2 to string (#35222) (#35359)
* [fix](decimalv2) fix scale of decimalv2 to string
2024-05-24 17:20:43 +08:00
309503855e [Fix](bloom filter) Fix bloom filter memory leak (#34871)
* Issue: Doris occasionally encounters an issue where memory usage becomes exceptionally high and does not decrease. The leaked memory is occupied by Bloom filters stored in memory.

Reason: The segment cache stores segment objects read from files into memory. It functions as an LRU cache with an eviction strategy: when the number of segments exceeds the maximum number, or the total memory size of segment objects in the cache exceeds the maximum usage, it evicts the older segments. However, there is a piece of logic in the code that first reads the segment object into memory, assuming it occupies memory size A, then places the read segment object into the cache (at this point, the cache considers the segment object size to be A). It then reads the segment's Bloom filter from the file and assigns it to the segment's Bloom filter member variable, assuming the Bloom filter occupies memory size B. Thus, the total size of the segment object at this point is A+B. However, the cache does not update this size, leading to the actual size of the segment object stored in the cache (A+B) being larger than the size considered by the cache (A). When the number of segment objects in the cache increases to a certain extent, the used memory will surge dramatically. However, the cache does not perceive the size as reaching the eviction limit, so it does not evict the segment objects. In such cases, a memory leak issue arises.

Solution: Since each segment object only reads the Bloom filter once, the issue can be resolved by changing the logic from reading the segment, placing it into the cache, and then reading the Bloom filter to reading the segment, reading the Bloom filter, and then placing it into the cache.
2024-05-24 16:23:58 +08:00
a6f7747d29 [feature](datatype) add BE config to allow zero date (#34961)
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2024-05-23 19:12:39 +08:00
c23384ff07 [fix](decimal) Fix long string casting to decimalv2 (#35121) 2024-05-22 14:32:29 +08:00
98f8eb5c43 [opt](split) get file splits in batch mode (#34032) (#35107)
bp  #34032
2024-05-21 22:27:07 +08:00
b4a798240a [fix](inverted_index) donot use int32_t for index id to avoid overflow (#35062) 2024-05-21 12:58:38 +08:00
e3e5f18f26 [Fix](Json type) correct cast result for json type (#34764) 2024-05-18 18:40:17 +08:00
eb7eaee386 [fix](function) money format (#34680) 2024-05-18 18:35:29 +08:00
1a24895257 [opt](routine-load) optimize routine load task thread pool and related param(#32282) (#34896) 2024-05-15 12:42:02 +08:00
95b05928fd [fix](compaction) fix time series compaction merge empty rowsets priority #34562 (#34765) 2024-05-14 09:10:09 +08:00
0ae1b9c70a [chore](remove code) Remove dragonbox related (#34528)
* Revert "[refactor](mysql result format) use new serde framework to tuple convert (#25006)"

This reverts commit e5ef0aa6d439c3f9b1f1fe5bc89c9ea6a71d4019.

* run buildall

* MORE

* FIX
2024-05-13 22:16:57 +08:00
32cbd4a583 [chore](status) unify error code between thrift,pb, status.h (#34397)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-05-10 14:41:01 +08:00
9b712b03b4 [FIX]fix is_ip_address_in_range func with const param (#34266) 2024-05-10 14:37:20 +08:00
8fdfbcb3c4 Revert "[Opt](func) opt the percentile func performance (#34373) (#34416)"
This reverts commit 509ae425e416b4779ae94eab9c2b21f9850e03c3.
2024-05-07 07:23:48 +08:00
f7900b53ce [enhancement](function) floor/ceil/round/round_bankers can use column as scale argument (#34391) 2024-05-06 22:18:36 +08:00
509ae425e4 [Opt](func) opt the percentile func performance (#34373) (#34416) 2024-05-06 20:10:35 +08:00
0f0c0a266b [opt](parquet)Skip page with offset index (#33082)
Make skip_page() in ColumnChunkReader more efficient. No more reading page headers if there are pagelocations in chunk.
2024-04-26 15:06:16 +08:00
c631f4f8a8 [fix](schema change) resolve the use count check of source logical column (#33932)
Fix error like:
```
8# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
 9# doris::vectorized::Block::clear_column_data(int) in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
10# doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:514
11# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vfile_scanner.cpp:333
12# doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:132
13# doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:99
```

Because source logical column is the destination logical column if logical converter is consistent. Previously, the reference of column was reset after the conversion was completed, but if an EOF occurred, it was returned in advance, but EOF is not a true error.
```
if (_logical_converter->is_consistent()) {
            // If logical converter is consistent, _src_logical_column is the final destination column,
            // other components will check the use count
            _src_logical_column.reset();
}
```
2024-04-22 12:31:46 +08:00
7e91e69eb9 [fix](compaction) fix single compaction (#33907)
* [fix](compaction)Fix single compaction to get all local versions #33849

add test and comment

* remove single replica compaction prepare input rowsets

reviesd
2024-04-19 23:30:25 +08:00
ffd9da44a2 [fix](move-memtable) fix commit may fail due to duplicated reports (#32403) 2024-04-19 15:02:49 +08:00
9b7af4c0cf [feature](schema change) unified schema change for parquet and orc reader (#32873)
Following #25138, unified schema change interface for parquet and orc reader, and can be applied to other format readers as well.
Unified schema change interface for all format readers:
- First, read the data according to the column type of the file into source column;
- Second, convert source column to the destination column with type planned by FE.
2024-04-12 15:09:25 +08:00
a4924dabb7 [enhancement](exception) enble exception logic in pipeline execute thread (#33437)
* [enhancement](exception) enble exception logic in pipeline execute thread

* f

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-04-12 15:09:25 +08:00
Pxl
5f30463bb3 [Chore](descriptors) remove unused codes for descriptors (#33408)
remove unused codes for descriptors
2024-04-12 15:09:25 +08:00
26d9082b9a [Feature](function) Add function strcmp (#33272) 2024-04-12 15:09:25 +08:00
31984bb4f0 [feature](function) support quote string function #33055 2024-04-12 15:09:25 +08:00
3d66723214 [branch-2.1](auto-partition) pick auto partition and some more prs (#33523) 2024-04-11 17:12:17 +08:00
Pxl
5688c28364 [Bug](runtime-filter) try to fix heap use after free on runtime filter send filter size (#33465) (#33522) 2024-04-11 13:10:24 +08:00
bc929686e3 [feature](debug point) add macro DBUG_RUN_CALLBACK (#33407) 2024-04-11 09:31:50 +08:00
ef26479282 [improve](serde) support complex type in write/read pb serde (#33124)
support complex type and ip/jsonb in DataTypeSerDe::write_column_to_pb/read_column_from_pb function
2024-04-11 09:31:50 +08:00
e2ad7149c3 [feature](debug point) Add handler to debug point (#33350) 2024-04-10 16:24:13 +08:00
Pxl
8fd6d4c41b [Chore](build) add -Wconversion and remove some unused code (#33127)
add -Wconversion and remove some unused code
2024-04-10 15:26:08 +08:00
c61d6ad1e2 [Feature] support function uuid_to_int and int_to_uuid #33005 2024-04-10 14:53:56 +08:00