126 Commits

Author SHA1 Message Date
fd62af82d2 [enhancement](mow) Add bvar for bloom filter and segment (#32355) 2024-03-22 08:52:12 +08:00
ecadb60bcd [Pick 2.1](inverted index) support inverted index format v2 (#30145) (#32418) 2024-03-19 08:11:33 +08:00
Pxl
6b08a4ec93 [Bug](top-n) do not get runtime predicate when predicate not initialized #32209 2024-03-14 09:12:09 +08:00
0da010603e [Improve](TabletSchemaCache) reduce duplicated memory consumption for column name and column path (#31141)
Both can be references to the related field in TabletColumn. Also use shared_ptr for TabletColumn in TabletSchema for later memory reuse.
2024-03-09 19:44:42 +08:00
Pxl
25d1934289 [Feature](topn) support multiple topn filter on backend (#31665)
2024-03-06 13:05:22 +08:00
7d1db6cd1f [refactor](exception safe) Refactor delete handler and block column predicates to make sure exception safe (#31618) 2024-03-01 14:21:17 +08:00
Pxl
d36ad56dce [Opt](Exec) Support runtime update topn filter (#31250) 2024-02-29 12:38:03 +08:00
586217bf73 [Improve](Variant) support prune segment for querying variant (#31310) 2024-02-28 17:52:11 +08:00
a3c78dd21a [chore](refactor) refactor some rf code and delete rpc file (#31031)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-02-18 11:50:17 +08:00
0442d5dc0e [fix](Variant Type) Add sparse columns meta to fix compaction (#28673)
Co-authored-by: eldenmoon <15605149486@163.com>
2024-02-16 10:12:23 +08:00
b23a785775 [Fix](Variant) support materialize view for variant and accessing variant subcolumns (#30603)
1. fix schema change losing the path, which led to invalid data reads
2. support element_at function in BE side and use simdjson to parse data
3. fix multi slot expression
2024-02-16 10:12:23 +08:00
e610044bae [Enhancement] (schema) add column type check (#28718) 2023-12-28 17:11:24 +08:00
6d817bc253 [fix](topn opt) avoid using topn runtime predicate which segment does not contain such column(column unique id) when pruning segment (#29148) 2023-12-27 20:31:03 +08:00
e9e1e2894b [performance](variant) support topn 2phase read for variant column (#28318)
2023-12-25 11:50:41 +08:00
341822ec05 [regression-test](Variant) add compaction case for variant and fix bugs (#28066) 2023-12-08 12:18:46 +08:00
a7d1e92fc2 [Fix](variant) handle StorageReadOptions to avoid crash in new_column_iterator_with_path (#27936)
In partial update, reading a variant without `opt` leads to a crash.
2023-12-04 17:02:35 +08:00
48935c14e2 [Improvement](variant) limit the column size on tablet schema (#27399) (#27785)
1. limit the column count to default 2048
2. fix get_inverted_index returning nullptr when the variant's unique id is -1 by using its parent's unique id instead
3. avoid adding a subcolumn with the same path to the tablet schema more than once
4. make extracted column unique id -1
2023-12-04 14:47:36 +08:00
a2fa0b3745 [compatibility](segment) fix compatibility issue introduced by #27676 (#27799)
Prior to PR #27676, data was written with empty path information. Consequently, after implementing #27676, data that already exists in a segment is not included in `column_id_to_footer_ordinal`. This issue will lead to `invalid nonexistent column without default value` error.
2023-11-30 21:24:59 +08:00
7398c3daf1 [Feature-Variant](Variant Type) support variant type query and index (#27676) 2023-11-29 10:37:28 +08:00
553e4a8903 [feature-wip](merge-on-write) MOW table support different primary keys and sort keys (#24788) 2023-11-24 16:37:30 +08:00
c51146df10 [Fix](segment) need to rebuild col_id_to_predicates when true predicates encountered (#25685) 2023-10-22 21:26:52 -05:00
dbf5787682 [fix](be) Make DorisCallOnce's function exception-safe (#25579) 2023-10-18 22:13:30 +08:00
80e5e72202 [fix](scanner) coredump caused by 'prune_predicates_by_zone_map' (#25555) 2023-10-18 16:11:41 +08:00
283bd59eba [improvement](scanner) Remove the predicate that is always true for the segment (#25366)
By utilizing the zonemap index of the segment, we can ascertain if a predicate is always true. For example, if the segment’s maximum value is 100 and the predicate is col < 101, then this predicate is always true for this segment.
2023-10-13 15:25:38 +08:00
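The always-true check described in the #25366 entry above can be illustrated with a minimal sketch. This is not the Doris implementation: `ZoneMap`, `always_true`, and `prune_predicates` are hypothetical names, values are plain integers, and NULL handling is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneMap:
    """Hypothetical per-segment min/max summary for one column."""
    min_value: int
    max_value: int


def always_true(zone: ZoneMap, op: str, literal: int) -> bool:
    """Return True when `col <op> literal` holds for every value in the zone."""
    if op == "<":
        return zone.max_value < literal
    if op == "<=":
        return zone.max_value <= literal
    if op == ">":
        return zone.min_value > literal
    if op == ">=":
        return zone.min_value >= literal
    if op == "=":
        return zone.min_value == zone.max_value == literal
    return False  # unknown operator: conservatively keep the predicate


def prune_predicates(zone: ZoneMap, predicates):
    """Drop predicates that cannot filter out any row of this segment."""
    return [(op, lit) for op, lit in predicates if not always_true(zone, op, lit)]
```

With a segment whose maximum is 100, `prune_predicates(ZoneMap(0, 100), [("<", 101), (">", 50)])` keeps only `(">", 50)`, since `col < 101` cannot reject any row.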
b9ddcbf729 [feature](merge-cloud) Rewrite code related to IOContext (#24269) 2023-09-15 19:57:58 +08:00
d8ef9dda59 [feature](merge-cloud) Rewrite FS interface (#23953) 2023-09-12 19:20:25 +08:00
bdacefa734 [Fix](status)Fix leaky abstraction and shield the status code END_OF_FILE from upper layers (#24165) 2023-09-12 11:10:52 +08:00
1228995dec [improvement](segment) reduce memory footprint of column_reader and segment (#24140) 2023-09-11 21:54:00 +08:00
153c7982f3 [Optimize](invert index) Optimize multiple terms conjunction query (#23871) 2023-09-09 01:52:58 +08:00
09bcedb116 [feature](merge-cloud) Remove deprecated old cache (#23881)
2023-09-06 08:07:05 +08:00
347cceb530 [Feature](inverted index) push count on index down to scan node (#22687)
Co-authored-by: airborne12 <airborne12@gmail.com>
2023-09-02 22:24:43 +08:00
e05a0466f2 [improve](Status) Add new status codes KEY_NOT_FOUND and KEY_ALREADY_EXISTS for merge on write (#23619) 2023-08-30 08:50:07 +08:00
2678afd2db [fix][improvement](fs) add HdfsIO profile and modification time (#21638)
Refactor the interface of create_file_reader.

The file_size and mtime are now merged into FileDescription and are no longer in FileReaderOptions.
The file handle cache can now get the correct file modification time from FileDescription.
Add HdfsIO for the hdfs file reader.
Picked from [Enhancement](multi-catalog) Add hdfs read statistics profile. #21442
2023-07-08 14:49:44 +08:00
9d2f879bd2 [Enhancement](inverted index) make InvertedIndexReader shared_from_this (#21381)
This PR improves code safety and readability by replacing raw pointers with smart pointers in several places.

Use enable_factory_creator in InvertedIndexIterator and InvertedIndexReader, and remove explicit new construction.
Make InvertedIndexReader shared_from_this, because it may otherwise be destructed while InvertedIndexIterator is still using it.
2023-07-06 11:52:59 +08:00
85ce6a22c0 [enhancement](merge-on-write) some misc optimizations (#21039) 2023-06-21 16:16:06 +08:00
87e3a79387 [enhancement](pk) add bvar latency recorder for pk (#20942) 2023-06-19 15:29:42 +08:00
48065fce19 [bugfix](merge-on-write) optimize rowset tree and tablet header lock (#20911) 2023-06-18 19:26:02 +08:00
15b9830859 [fix](partial-update) sequence column is not proceeded correctly #20813
When checking the keys in PrimaryKeyIndex, seq_col_length was not set to the correct value, so a NOT_FOUND result was returned for an existing key.
2023-06-15 14:07:00 +08:00
ab8125d56f [Improve](performance) introduce SchemaCache to cache TabletSchema & Schema (#20037)

1. When the system is under high-concurrency load with wide-table point queries, the frequent memory allocation and deallocation of Schema becomes an evident bottleneck. The initialization of TabletSchema and Schema also becomes a CPU hotspot. A SchemaCache is therefore introduced to cache these objects for reuse.

2. Wrap some variables in std::unique_ptr

Performance:
| State                | QPS | Average response time (avg) | P99 response time |
|----------------------|-----|-----------------------------|-------------------|
| SchemaCache enabled  | 501 | 20ms                        | 34ms              |
| SchemaCache disabled | 321 | 31ms                        | 61ms              |

* handle schema change with schema version

* remove useless header

* rebase
2023-05-29 17:34:53 +08:00
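The caching idea in the #20037 entry above can be sketched as a small LRU cache. This is only an illustration under stated assumptions: the `SchemaCache` class below, its `(index_id, schema_version)` key, and the `build` callback are hypothetical and are not taken from the Doris code.

```python
from collections import OrderedDict


class SchemaCache:
    """Hypothetical LRU cache of built schema objects.

    Assumption: a schema is identified by (index_id, schema_version), so a
    schema change produces a new key instead of mutating a cached entry.
    """

    def __init__(self, capacity, build):
        self._capacity = capacity
        self._build = build            # expensive constructor being avoided
        self._entries = OrderedDict()  # key -> schema, oldest first
        self.builds = 0                # how many times `build` actually ran

    def get(self, index_id, schema_version):
        key = (index_id, schema_version)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        schema = self._build(index_id, schema_version)
        self.builds += 1
        self._entries[key] = schema
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return schema
```

Repeated point queries against the same schema version then reuse one object instead of rebuilding it, and a lookup with a newer schema version misses the cache and triggers a fresh build.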
16f5d3d5b3 [Improvement](memory) new page use Allocator (#19472) 2023-05-16 19:09:17 +08:00
aef9355cd3 [feature-wip](partial update) PART1: support basic partial write (#17542) 2023-04-28 17:17:57 +08:00
6eb12640a1 [fix](segment_iter) do not init segment_iterator twice (#18337)
SegmentIterator::init was called twice: once by Segment::new_iterator and once by
BetaRowsetReader::get_segment_iterators.
2023-04-27 09:51:57 +08:00
3736530585 [refactor](query context) rename query fragments context to query context and make query context safe (#18950)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-23 22:53:56 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
e3ff2e3d21 [fix](file cache) Fix be core while use block/whole/sub file cache (#18440)
BE core dumps when using the whole/sub file cache.
Calls to CachedRemoteFileReader/WholeFileCache/SubFileCache::read_at_impl() did not pass an IOContext when reading the segment footer.
2023-04-07 16:39:59 +08:00
cb79e42e5c [refactor](file-system)(step-1) refactor file system on BE and remove storage_backend (#17586)
See #17764 for details
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query file on local/s3/hdfs/broker file system, with table value function and catalog.
- Backup/Restore with local/s3/hdfs/broker file system

Not tested:
- cold & hot data separation case.
2023-03-21 21:08:38 +08:00
e0cd8599d2 [fix](delete) fix delete from bug which can get wrong result (#17146)
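In theory, for two independent deletes such as delete from table where a=1; delete from table where a=2; this optimization should be usable. However, the current code puts the delete predicates of all versions and of all columns together, which loses the version information and the fact that predicates may be AND-ed with each other. Everything is weakened to the assumption that delete predicates are independent, so a page is dropped as soon as any single delete predicate is satisfied.
This PR keeps the current code and only guarantees correct page elimination when there is exactly one delete predicate, so a == 1 check is added everywhere before the delete predicates are passed on.
Evaluating pages against delete predicates from different versions and different columns with complete and rigorous logic would require considerable design changes. The current approach prioritizes fixing the bug. The logic that uses delete predicates to speed up zone-based page elimination can be improved later, so that delete predicates apply in more scenarios.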
2023-02-28 09:20:10 +08:00
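The `== 1` guard from the #17146 entry above can be shown with a small sketch. It is a simplification under stated assumptions: delete predicates are modeled as `(column, value)` equality conditions, page zones as `(min, max)` pairs, and `can_skip_page` is a hypothetical helper, not a Doris function.

```python
def page_matches_entirely(zone, value):
    """True when every value in the page's (min, max) zone equals `value`."""
    lo, hi = zone
    return lo == hi == value


def can_skip_page(zones, delete_predicates):
    """Decide whether a page may be dropped because all its rows are deleted.

    zones: {column: (min, max)} for one page.
    delete_predicates: list of (column, value) equality conditions.
    With more than one predicate, the version information and any AND
    relationship are lost, so the page is conservatively kept.
    """
    if len(delete_predicates) != 1:
        return False
    column, value = delete_predicates[0]
    zone = zones.get(column)
    return zone is not None and page_matches_entirely(zone, value)
```

For a page with zones `{"a": (1, 1), "b": (2, 3)}`, a single predicate `("a", 1)` lets the page be skipped. The pair `("a", 1)`, `("b", 2)`, which might come from `a=1 AND b=2`, does not, because rows with `b=3` must survive.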
b194a7cf83 [improvement](memory) Support GC segment cache, when memory insufficient (#16987)
fix segment cache memory tracker statistics
support GC
2023-02-22 18:31:20 +08:00
c98a0bf803 [Enchancement](merge-on-write) check the correctness of rowid conversion after compaction (#16689)
MoW updates the delete bitmap of the imported data during the compaction by rowid conversion. The correctness of rowid conversion is very important to the result of delete bitmap. So I add a rowid conversion result check.
2023-02-20 16:27:18 +08:00
9b8c91e18c [improvement](rowset reader) fix possible memleak (#16680)
* fix be UT
2023-02-15 11:13:31 +08:00