doris

Author	SHA1	Message	Date
Yongqiang YANG	1228995dec	[improvement](segment) reduce memory footprint of column_reader and segment (#24140 )	2023-09-11 21:54:00 +08:00
zzzxl	153c7982f3	[Optimize](invert index) Optimize multiple terms conjunction query (#23871 )	2023-09-09 01:52:58 +08:00
plat1ko	09bcedb116	[feature](merge-cloud) Remove deprecated old cache (#23881 )	2023-09-06 08:07:05 +08:00
airborne12	347cceb530	[Feature](inverted index) push count on index down to scan node (#22687 ) Co-authored-by: airborne12 <airborne12@gmail.com>	2023-09-02 22:24:43 +08:00
bobhan1	e05a0466f2	[improve](Status) Add new status code`KEY_NOT_FOUND` and `KEY_ALREADY_EXISTS` for merge on write (#23619 )	2023-08-30 08:50:07 +08:00
Mingyu Chen	2678afd2db	[fix][improvement](fs) add HdfsIO profile and modification time (#21638 ) Refactor the interface of create_file_reader: file_size and mtime are merged into FileDescription and are no longer in FileReaderOptions. The file handle cache can now get the file's correct modification time from FileDescription. Add HdfsIO for the hdfs file reader, picked from [Enhancement](multi-catalog) Add hdfs read statistics profile. #21442	2023-07-08 14:49:44 +08:00
airborne12	9d2f879bd2	[Enhancement](inverted index) make InvertedIndexReader shared_from_this (#21381 ) This PR proposes several changes to improve code safety and readability by replacing raw pointers with smart pointers in several places. Use enable_factory_creator in InvertedIndexIterator and InvertedIndexReader, and remove the explicit new constructor. Make InvertedIndexReader shared_from_this, because it may be destructed while InvertedIndexIterator is still using it.	2023-07-06 11:52:59 +08:00
zhannngchen	85ce6a22c0	[enhancement](merge-on-write) some misc optimizations (#21039 )	2023-06-21 16:16:06 +08:00
Yongqiang YANG	87e3a79387	[enhancement](pk) add bvar latency recorder for pk (#20942 )	2023-06-19 15:29:42 +08:00
Xin Liao	48065fce19	[bugfix](merge-on-write) optimize rowset tree and tablet header lock (#20911 )	2023-06-18 19:26:02 +08:00
zhannngchen	15b9830859	[fix](partial-update) sequence column is not proceeded correctly #20813 When checking the keys in PrimaryKeyIndex, seq_col_length was not set to the correct value, so an existing key returned a NOT_FOUND result.	2023-06-15 14:07:00 +08:00
lihangyu	ab8125d56f	[Improve](performance) introduce SchemaCache to cache TabletSchame & Schema (#20037 ) 1. Under high-concurrency load with wide-table point queries, the frequent memory allocation and deallocation of Schema become evident system bottlenecks. The initialization of TabletSchema and Schema also becomes a CPU hotspot. A SchemaCache is therefore introduced to cache these resources for reuse. 2. Wrap some variables with std::unique_ptr. Performance: \| State \| QPS \| Average response time (avg) \| P99 response time \| \|------------------\|-----\|------------------\|-------------\| \| SchemaCache enabled \| 501 \| 20ms \| 34ms \| \| SchemaCache disabled \| 321 \| 31ms \| 61ms \| * handle schema change with schema version * remove useless header * rebase	2023-05-29 17:34:53 +08:00
Xinyi Zou	16f5d3d5b3	[Improvement](memory) new page use Allocator (#19472 )	2023-05-16 19:09:17 +08:00
yixiutt	aef9355cd3	[feature-wip](partial update) PART1: support basic partial write (#17542 )	2023-04-28 17:17:57 +08:00
Yongqiang YANG	6eb12640a1	[fix](segment_iter) do not init segment_iterator twice (#18337 ) SegmentIterator::init was called twice, once by Segment::new_iterator and once by BetaRowsetReader::get_segment_iterators.	2023-04-27 09:51:57 +08:00
yiguolei	3736530585	[refactor](query context) rename query fragments context to query context and make query context safe (#18950 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-04-23 22:53:56 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
zxealous	e3ff2e3d21	[fix](file cache) Fix be core while use block/whole/sub file cache (#18440 ) BE core dumps when using the whole/sub file cache. The calls to CachedRemoteFileReader/WholeFileCache/SubFileCache::read_at_impl() did not pass an IOContext when reading the segment footer.	2023-04-07 16:39:59 +08:00
Mingyu Chen	cb79e42e5c	[refactor](file-system)(step-1) refactor file sysmte on BE and remove storage_backend (#17586 ) See #17764 for details. Tested: - Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp - Outfile to local/s3/hdfs/broker. - Load from local/s3/hdfs/broker. - Query file on local/s3/hdfs/broker file system, with table value function and catalog. - Backup/Restore with local/s3/hdfs/broker file system. Not tested: - cold & hot data separation case.	2023-03-21 21:08:38 +08:00
xueweizhang	e0cd8599d2	[fix](delete) fix delete from bug which can get wrong result (#17146 ) In theory, for two independent deletes, e.g. delete from table where a=1; delete from table where a=2;, delete predicates should be usable here. However, the current code puts the delete predicates of all versions and all columns together, losing the version information and the fact that predicates may be related by AND. They are uniformly weakened into independent delete predicates: as soon as one delete predicate matches, the page is dropped. On top of the current code, later page pruning is only guaranteed to be correct when there is exactly one delete predicate, so this PR adds an == 1 check everywhere before passing delete predicates on. Evaluating pages against delete predicates from different versions and different columns as one complete, rigorous piece of logic would require a fair amount of redesign. The current approach prioritizes fixing the bug; the logic that uses delete predicates to speed up zone checks for page pruning can be refined later to widen the scenarios where delete predicates apply.	2023-02-28 09:20:10 +08:00
Xinyi Zou	b194a7cf83	[improvement](memory) Support GC segment cache, when memory insufficient (#16987 ) Fix segment cache memory tracker statistics and support GC.	2023-02-22 18:31:20 +08:00
Xin Liao	c98a0bf803	[Enchancement](merge-on-write) check the correctness of rowid conversion after compaction (#16689 ) MoW updates the delete bitmap of the imported data during the compaction by rowid conversion. The correctness of rowid conversion is very important to the result of delete bitmap. So I add a rowid conversion result check.	2023-02-20 16:27:18 +08:00
TengJianPing	9b8c91e18c	[improvement](rowset reader) fix possible memleak (#16680 ) * fix be UT	2023-02-15 11:13:31 +08:00
Kang	aba843bb2b	[Improvement](inverted index) inverted index query match bitmap cache (#16578 ) Add a cache for the inverted index query match bitmap to accelerate queries on common keywords, especially keywords that match many rows. Test results: - large result: matching 99% out of 247 million rows shows an 8x speedup. - small result: matching 0.1% out of 247 million rows shows a 2x speedup.	2023-02-11 13:38:58 +08:00
lihangyu	1d8265c5a3	[refactor](row-store) make row store column a hidden column in meta (#16251 ) This could simplify the storage engine logic and make the code more readable, and we could analyze the length of the hidden `__DORIS_ROW_STORE_COL__`, etc.	2023-02-02 20:56:13 +08:00
yiguolei	5eaa995704	[refactor](some mempool) not memset 0 in default value iterator (#16194 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-29 22:50:39 +08:00
lihangyu	116e17428b	[Enhancement](point query optimize) improve performace of point query on primary keys (#15491 ) 1. support row format using codec of jsonb 2. short path optimize for point query 3. support prepared statement for point query 4. support mysql binary format	2023-01-20 13:33:01 +08:00
pengxiangyu	58c520dbfd	[Feature](remote) Cooldown cold data to object storage only one replica (#15832 )	2023-01-14 23:58:00 +08:00
Tiewei Fang	f17d69e450	[feature](file cache)Import `file cache` for remote file reader (#15622 ) The main purpose of this PR is to introduce `fileCache` for lakehouse reads of remote files. It uses the local disk as a cache for remote files, so the next time a file is read, the data can be obtained directly from the local disk. In addition, this PR includes a few other minor changes. Import File Cache: 1. The imported `fileCache` is called `block_file_cache`, which uses an LRU replacement policy. 2. Implement a new FileReader, `CachedRemoteFileReader`, so that the logic of `file cache` is hidden under `CachedRemoteFileReader`. Other changes: 1. Add a new interface `fs()` for `FileReader`. 2. `IOContext` adds some statistics that track `FileCache` usage. Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>	2023-01-10 12:23:56 +08:00
Kang	9d1f02c580	[Improvement](topn) runtime prune for topn query (#15558 )	2023-01-05 20:10:12 +08:00
Xin Liao	cc7a9d92ad	[refactor](non-vec) remove non vec code for indexed column reader (#15409 )	2022-12-30 23:01:54 +08:00
YueW	edecc2e706	[feature-wip](inverted index) API for inverted index reader and syntax for fulltext match (#14211 ) * [feature-wip](inverted index)inverted index api: reader * [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ALL/MATCH_ALL * [feature-wip](inverted index) Adapt to index meta * [enhance] add more metrics * [enhance] add fulltext match query check for column type and index parser * [feature-wip](inverted index) Support applying the inverted index in compound predicates, except for leaf nodes of AND nodes	2022-12-30 21:48:14 +08:00
Mingyu Chen	29492f0d6c	[refactor](file-cache) refactor the file cache interface (#15398 ) Refactor the usage of file cache ### Motivation There may be many kinds of file cache for different scenarios, so the logic of the file cache should be hidden inside the file reader. That way, a change to the file cache does not require modifying the upper-layer calling logic. ### Details 1. Add a `FileReaderOptions` param for `fs->open_file()`. `FileReaderOptions` contains: 1. `CachePathPolicy`: determines the cache file path for a given file path. Different `CachePathPolicy` implementations can serve different file caches. 2. `FileCacheType`: specifies the file cache type: SUB_FILE_CACHE, WHOLE_FILE_CACHE, FILE_BLOCK_SIZE, etc. 2. Hide the cache logic inside the file reader. The `RemoteFileSystem` handles the `CacheOptions` and determines whether to return a `CachedFileReader` or a `RemoteFileReader`. The file cache is managed by `CachedFileReader`.	2022-12-29 12:15:46 +08:00
Xinyi Zou	cffdeff4ec	[fix](memory) Fix memory leak by calling boost::stacktrace (#14269 ) boost::stacktrace::stacktrace() has memory leak, so use glog internal func to print stacktrace. The reason for the memory leak of boost::stacktrace is that a state is saved in the thread local of each thread but not actively released. The test found that each thread leaked about 100M after calling boost::stacktrace. refer to: boostorg/stacktrace#118 boostorg/stacktrace#111	2022-11-15 08:58:57 +08:00
pengxiangyu	d55faa7f6a	[feature](remote)Only query can use local cache when reading remote files. (#13865 ) When calling select on remote files, download cache files to local disk. When calling alter table on remote files, read files directly from remote storage. So if a tablet is too large, it will not take up too much local disk space by creating local cache files.	2022-11-14 10:30:15 +08:00
Xinyi Zou	3bc26f773d	[hotfix](memtracker) Fix expired `DCHECK(_limit != -1);` and segment_meta_mem_tracker inelegant end (#14223 )	2022-11-13 17:15:29 +08:00
Pxl	9d8b4bc176	[Enhancement](Dictionary-codec) update dict once on same segment (#13936 )	2022-11-08 10:59:35 +08:00
pengxiangyu	eab8876abc	[Feature](remote) Using heavy schema change if the table is not enable light weight schema change (#13487 )	2022-10-28 15:48:22 +08:00
zxealous	a83eaddfcf	[test](cache)Add remote cache ut (#13377 )	2022-10-16 23:59:50 +08:00
HappenLee	f7e3ca29b5	[Opt](Vectorized) Support push down no grouping agg (#12803 )	2022-09-23 18:29:54 +08:00
pengxiangyu	c5481dfdf7	[fix](remote)Fix bug for Segment::open() in case: config::file_cache_type (#12249 )	2022-09-01 14:16:41 +08:00
Gabriel	5f7d6e8f2b	[Refactor](predicate) Unify Conditions and ColumnPredicate (#11985 )	2022-08-29 12:11:22 +08:00
pengxiangyu	a6e2e2f3bc	[feature](remote)Add cache files cleaner for remote olap files (#11959 )	2022-08-26 23:59:36 +08:00
yixiutt	11dc5cad83	[feature-wip](unique-key-merge-on-write) add min/max key in segment (#11830 ) some feature: 1. add min max key in segment footer to speed up get_row_ranges_by_keys 2. do not load pk bloom filter in query Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-17 18:11:39 +08:00
Xin Liao	12c4d1f4dd	[feature-wip](unique-key-merge-on-write) unique key table with MOW supports sequence column (#11808 )	2022-08-17 10:56:14 +08:00
pengxiangyu	e5c2bb9699	[fix](remote)Fix bug for Cache Reader (#11629 )	2022-08-12 13:40:32 +08:00
Lightman	b5531c5caf	[BugFix](BE) fix condition index doesn't match (#11474 )	2022-08-05 07:57:18 +08:00
Xinyi Zou	346fdeeee0	[fix](ut) Fix BE UT BetaRowsetTest failed (#11500 )	2022-08-04 17:57:57 +08:00
pengxiangyu	a943adac1a	[feature](cache) Add FileCache for RemoteFile (#11186 ) Add FileCache for RemoteFile; it is enabled in StoragePolicy. Cold data in remote files will be downloaded to local cache files.	2022-08-04 10:57:32 +08:00
Lightman	b35daf0a04	[improvement](light-schema-change) Support tablet schema cache (#11131 )	2022-08-01 12:18:00 +08:00

99 Commits