doris

Author	SHA1	Message	Date
zuochunwei	5f50d9ae3b	predicate test bugfix (#8134 ) Co-authored-by: zuochunwei <zuochunwei@meituan.com>	2022-02-19 12:05:26 +08:00
Zhengguo Yang	50864aca7d	[refactor] fix warnings when compiling with clang (#8069 )	2022-02-19 11:29:02 +08:00
yiguolei	aea3e4e59b	[refactor] Remove version hash from BE and related test in BE (#8027 )	2022-02-14 09:29:27 +08:00
Pxl	64f71ddae3	[fix](be-ut) fix segmentation fault at unaligned address int128 (#8021 )	2022-02-14 09:29:05 +08:00
yiguolei	7d7e3a39f5	[refactor] Remove snapshot converter and unused Protobuf Definitions (#8026 ) 1. remove snapshot converter 2. remove unused protobuf definitions 3. move some macro as const variables	2022-02-12 16:06:04 +08:00
Mingyu Chen	c0e59e59aa	[fix][refactor] fix bugs and refactor some code by lint (#7871 ) 1. Fix some `passedByValue` issues. 2. Fix some `dereferenceBeforeCheck` issues. 3. Fix some `uninitMemberVar` issues. 4. Fix some iterator `eraseDereference` issues. 5. Fix compile issue introduced from #7923 #7905 #7848	2022-02-01 14:31:14 +08:00
zhangstar333	fb6e22f4ca	[Fix] fix memory leak in be unit test (#7857 ) 1. fix be unit test memory leak 2. ignore minidump test with ASAN test	2022-01-29 01:00:38 +08:00
Pxl	cd73a6b84b	[chore] fix clang compile error (#7883 )	2022-01-26 12:53:35 +08:00
wangbo	cf02e43ec1	[improvement](vectorized) optimize dict read (#7805 )	2022-01-22 10:18:30 +08:00
Amos Bird	800a36343a	[chore] Prolog of hermetic build with GCC 11 and Clang 13. (#7712 ) Prepare to generate hermetic build using GCC 11 and Clang 13. The ideal toolchain would be ldb toolchain generated by [ldb_toolchain_gen.sh](https://github.com/amosbird/ldb_toolchain_gen/releases/download/v0.3/ldb_toolchain_gen.sh) To kick off a clang build, set `DORIS_TOOLCHAIN=clang` before running any build scripts.	2022-01-21 12:12:04 +08:00
Mingyu Chen	ef984a6a72	[improvement](load) Improve load fault tolerance (#7674 ) Currently, if we encounter a problem with a replica of a tablet during the load process, such as a write error, rpc error, -235, etc., it will cause the entire load job to fail, which results in a significant reduction in Doris' fault tolerance. This PR mainly changes: 1. refined the judgment of failed replicas in the load process, so that the failure of a few replicas will not affect the normal completion of the load job. 2. fix a bug introduced from #7754 that may cause BE coredump	2022-01-20 09:23:21 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from clickhouse ClickHouse is an excellent implementation of the vectorized execution engine database, so here we have referenced and learned a lot from its excellent implementation in terms of data structure and function implementation. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code from Clickhouse, eg: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. 
Supported exec nodes and queries: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node The vectorized exec engine can run the SSB and TPC-H standard query sets and about 70% of the TPC-DS standard query set. ### 3. Data Model The vectorized exec engine supports Dup/Agg/Unq tables and a vectorized block reader. Vectorized segment reading is a work in progress. ### 4. How to use 1. Set the session variable `set enable_vectorized_engine = true;` (required) 2. Set the session variable `set batch_size = 4096;` (recommended) ### 5. Some differences from the original exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00
pengxiangyu	20ef8a6e21	[feature-wip](remote storage)(step1) use a struct instead of string for parameter path, add basic remote method (#7098 ) First, we need a parameter that describes whether the data is local or remote. Then, we need some basic functions to support operations on remote storage.	2021-12-22 22:58:23 +08:00
Dayue Gao	414c5a8b5a	[fix] LRUCache::prune_if may not remove all the entries matching the predicate (#7383 ) [fix] LRUCache::prune_if may not remove all the entries matching the predicate Co-authored-by: gaodayue <gaodayue@bytedance.com>	2021-12-13 21:09:47 +08:00
xinghuayu007	dd36ccc3bf	[feature](storage-format) Z-Order Implement (#7149 ) Support sort data by Z-Order: ``` CREATE TABLE table2 ( siteid int(11) NULL DEFAULT "10" COMMENT "", citycode int(11) NULL COMMENT "", username varchar(32) NULL DEFAULT "" COMMENT "", pv bigint(20) NULL DEFAULT "0" COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(siteid, citycode) COMMENT "OLAP" DISTRIBUTED BY HASH(siteid) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "data_sort.sort_type" = "ZORDER", "data_sort.col_num" = "2", "in_memory" = "false", "storage_format" = "V2" ); ```	2021-12-02 11:39:51 +08:00
Pxl	a74fdf184c	[refactor](be) refactor predicate function creator (#7054 ) Refactor predicate function creator, make MinMaxFunction/HybridSet/BloomFilter use a unified interface through template to get function.	2021-11-24 10:39:29 +08:00
Zhengguo Yang	e2d3d0134e	Add a method to get doris current memory usage (#6979 ) Add all memory usage check when TryConsume memory	2021-11-24 10:07:54 +08:00
Zhengguo Yang	6c6380969b	[refactor] replace boost smart ptr with stl (#6856 ) 1. replace all boost::shared_ptr with std::shared_ptr 2. replace all boost::scoped_ptr with std::unique_ptr 3. replace all boost::scoped_array with std::unique_ptr<T[]> 4. replace all boost::thread with std::thread	2021-11-17 10:18:35 +08:00
Zhengguo Yang	4bc5ba8819	mark the load job as failed when more than half of the replicas of a tablet fail to write (#7126 ) Previously, the code counted write failures across all replicas and checked whether more than half of them had failed.	2021-11-17 10:18:04 +08:00
Mingyu Chen	ed7a873a44	[Memory Usage] Implement segment lru cache to save memory of BE (#6829 )	2021-10-25 10:07:15 +08:00
Mingyu Chen	63dbcbc4e1	[UT] Fix ut bugs (#6862 ) Co-authored-by: morningman <chenmingyu@baidu.com>	2021-10-18 10:12:55 +08:00
Zhengguo Yang	24d38614a0	[Dependency] Upgrade thirdparty libs (#6766 ) Upgrade the following dependencies: libevent -> 2.1.12 OpenSSL 1.0.2k -> 1.1.1l thrift 0.9.3 -> 0.13.0 protobuf 3.5.1 -> 3.14.0 gflags 2.2.0 -> 2.2.2 glog 0.3.3 -> 0.4.0 googletest 1.8.0 -> 1.10.0 snappy 1.1.7 -> 1.1.8 gperftools 2.7 -> 2.9.1 lz4 1.7.5 -> 1.9.3 curl 7.54.1 -> 7.79.0 re2 2017-05-01 -> 2021-02-02 zstd 1.3.7 -> 1.5.0 brotli 1.0.7 -> 1.0.9 flatbuffers 1.10.0 -> 2.0.0 apache-arrow 0.15.1 -> 5.0.0 CRoaring 0.2.60 -> 0.3.4 orc 1.5.8 -> 1.6.6 libdivide 4.0.0 -> 5.0 brpc 0.97 -> 1.0.0-rc02 librdkafka 1.7.0 -> 1.8.0 After this PR, compiling Doris requires build-env:1.4.0	2021-10-15 13:03:04 +08:00
Pxl	4dd610c28d	[Feature] Support for storage layer benchmark (#6506 ) * add benchmark tool	2021-09-02 09:57:19 +08:00
Zhengguo Yang	8738ce380b	Add long text type STRING, with a maximum length of 2GB. Usage is similar to varchar, and there is no guarantee for the performance of storing extremely long data (#6391 )	2021-08-18 09:05:40 +08:00
HappenLee	9216735cfa	[New Feature] Support Vectorization Execution Engine Interface For Doris (#6329 ) 1. FE vectorized plan code 2. Function register vec function 3. Diff function nullable type 4. New thirdparty code and new thrift struct	2021-08-11 14:54:06 +08:00
HappenLee	6597a338dc	[Feature] Support config max length of zone map index (#6293 )	2021-07-30 09:23:11 +08:00
stdpain	776df2effc	[BUG][stack-buffer-overflow] fix overflow while calculate hash code in ArrayType and fix some warning	2021-07-27 13:41:00 +08:00
huangmengbin	2d78c31d49	[Enhance] improve performance of init_scan_key by sharing the schema (#6099 ) Co-authored-by: huangmengbin <huangmengbin@bytedance.com>	2021-07-21 10:50:31 +08:00
Mingyu Chen	68f988b78a	[Optimize] Use flat_hash_set to replace unordered_set in InPredicate (#6216 ) Co-authored-by: chenmingyu <chenmingyu@baidu.com>	2021-07-15 11:15:11 +08:00
Zhengguo Yang	ed3ff470ce	[ARRAY] Support array type load and select not include access by index (#5980 ) This is part of the array type support and has not been fully completed. The following functions are implemented 1. fe array type support and implementation of array function, support array syntax analysis and planning 2. Support import array type data through insert into 3. Support select array type data 4. The array type is supported only on the value columns of the duplicate table This PR merges some code from #4655 #4650 #4644 #4643 #4623 #2979	2021-07-13 14:02:39 +08:00
stdpain	290a844e04	[optimize] Optimize bloomfilter performance (#6180 ) Refactor the runtime filter bloomfilter and eliminate some virtual function calls, which gives a performance improvement of about 5%. Introduce a block bloom filter; the AVX version gives a 40% performance improvement. before: bloomfilter size:default, about 20 million items cost about 1s400ms after: bloomfilter size:524288, about 20 million items cost about 400ms	2021-07-10 10:12:12 +08:00
Zhengguo Yang	739c0268ff	[refactor] Remove decimal v1 related code from code base (#6079 ) remove ALL DECIMAL V1 type code, this is a part of #6073	2021-07-07 10:26:32 +08:00
stdpain	149def9e42	[Feature] Support RuntimeFilter in Doris (BE Implement) (#6077 ) 1. support in/bloomfilter/minmax 2. support broadcast/shuffle/bucket shuffle/colocate join 3. opt memory use and cpu cache miss while build runtime filter 4. opt memory use in left semi join (works well on tpcds-95)	2021-07-04 20:59:05 +08:00
Zhengguo Yang	fe65a623c1	Fix timeout error when delete condition contains invalid datetime format (#6030 ) * add date time format check in delete statement	2021-06-29 09:47:42 +08:00
Mingyu Chen	5cfe081b05	[Bug] Remove duplicate memtracker (#6041 ) * [Enhance] Remove duplicate memtracker This problem will cause frequent creation of memtracker and affect query concurrency.	2021-06-18 11:28:37 +08:00
Mingyu Chen	d57c2344e1	[MemTracker] Refactored the hierarchical structure of memtracker (#5956 ) To avoid showing too many memtracker on BE web pages. The MemTracker level now has 3 levels: OVERVIEW, TASK and VERBOSE. OVERVIEW Mainly used for main memory consumption module such as Query/Load/Metadata. TASK is mainly used to record the memory overhead of a single task such as a single query, load, and compaction task. VERBOSE is used for other more detailed memtrackers.	2021-06-16 09:44:24 +08:00
Yingchun Lai	6d6c3d9703	[Enhancement] Reduce memory consumption by releasing readers earlier (#5811 ) We create multiple rowset readers to read the data of one tablet. After one rowset reader has reached EOF, it can be released to reduce resource (typically memory) consumption. Likewise, we can release a segment reader when it reaches EOF.	2021-06-16 09:37:50 +08:00
Lijia Liu	4d64612b96	[ARRAY]Save array's size instead of offset. (#5983 ) * Save array's size instead of offset. * Optimize variable name * Fix comment	2021-06-10 12:32:58 +08:00
Mingyu Chen	81ecf3d097	[Bug] Rebuild version graph of a tablet when there are too many orphan vertices (#5945 ) The version information of the tablet is stored in memory in an adjacency graph data structure. As new versions are written and old versions are deleted, the data structure accumulates empty vertices with no edge associations (orphan vertices). These orphan vertices should be removed somehow.	2021-06-03 09:59:20 +08:00
weizuo93	e519a24c9a	dynamic adjust compaction policy (#5651 ) Co-authored-by: weizuo <weizuo@xiaomi.com>	2021-04-26 12:39:13 +08:00
Yingchun Lai	be733cfa9c	[Metrics] Add some large memtrackers' metric (#5614 ) MemTracker can provide memory consumption for us to find out which module consume more memory, but it's just a current value, this patch add metrics for some large memory consumers, then we can find out which module consume more memory in timeline, it would be useful to troubleshoot OOM problems and optimize configs.	2021-04-21 09:15:04 +08:00
HappenLee	b423274f17	[Enhance] Make MemTracker more accurate (#5515 ) (#5516 ) * [Enhance] Make MemTracker more accurate (#5515) This PR is mainly about: 1. Improve the readability of MemTrackers' name 2. Add the MemTracker of: * Load * Compaction * SchemaChange * StoragePageCache * TabletManager 3. Change SchemaChange to a Singleton * revise some code for Code Review * change the name of mem_tracker * keep reader_context have the same lifetime of rowset_reader in schema change. * change vlog notice to log(warning) in schema change	2021-04-08 09:14:55 +08:00
Zhengguo Yang	d641a26490	[Refactor] Remove boost filesystem (#5579 ) * use std::filesystem instead of boost Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>	2021-04-08 09:11:59 +08:00
stdpain	bfeb717abe	[Refactor] fix some warning in gcc higher than 7 make decimal12_t as a POD type (#5547 )	2021-03-23 09:37:10 +08:00
Yingchun Lai	8ead0aaad8	[Enhance] Sort directories by available space when do trash sweep (#5498 ) * [Enhance] Sort directories by available space when do trash sweep When one disk is about to be full, we want to sweep trash data on this disk as quickly as possible. The current trash sweep function removes trashed files in order of path name; however, disk data directories may differ widely in available space because of the load balance algorithm. This patch improves it to remove files in order of the directories' available space. * add log	2021-03-12 13:43:27 +08:00
HappenLee	462efeaf39	[Performance Optimization and Refactor] (#5358 ) (#5364 ) 1. Add BlockColumnPredicate support OR and AND column predicate in RowBlockV2 2. Support evaluate vectorization delete predicate in storage engine not in Reader in SegmentV2	2021-02-07 22:41:33 +08:00
Mingyu Chen	a6e2c3e3f1	[Bug][Clone] Fix the bug that incremental clone is not triggered (#5230 ) In version 0.13, we support a more efficient compaction logic. This logic will maintain multiple version paths of the tablet. This can avoid -230 errors and can also support incremental clone. But the previous incremental clone uses the incremental rowset meta recorded in `incr_rs_meta`. At present, the incremental rowset meta recorded in `incr_rs_meta` and the records in `stale_rs_meta` are duplicated, and the current clone logic does not adapt to the new multi-version path, resulting in many cases not triggering incremental clone. This CL mainly modified: 1. Removed `incr_rs_meta` metadata 2. Modified the clone logic. When the clone is incremented, it will try to read the rowset in `stale_rs_meta`. 3. Delete a lot of code that was previously used for version compatibility.	2021-02-06 22:04:48 +08:00
Skysheepwang	6c098e45fc	[Optimize][Cache]Implementation of Separated Page Cache (#5008 ) #4995 Implementation of Separated Page Cache - Add config "index_page_cache_ratio" to set the ratio of capacity of index page cache - Change the member of StoragePageCache to maintain two type of cache - Change the interface of StoragePageCache for selecting type of cache - Change the usage of page cache in read_and_decompress_page in page_io.cpp - add page type as argument - check if current page type is available in StoragePageCache (cover the situation of ratio == 0 or 1) - Add type as argument in superior call of read_and_decompress_page - Change Unit Test	2021-01-04 12:19:24 +08:00
HappenLee	5807413ad0	[UT] Add ut for column predicate of ColumnBlock (#5123 ) Add ut for column predicate of ColumnBlock	2021-01-04 09:29:30 +08:00
HuangWei	5e1a80bb22	[UT][Bug] fix LOOP_LESS_OR_MORE (#5157 ) This bug was introduced by #5131. When AllowSlowTests() is true, we should loop more.	2020-12-29 09:48:19 +08:00
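
Commit dd36ccc3bf above adds sorting by Z-Order over the first `data_sort.col_num` columns. As a rough illustration of the general idea only, the sketch below interleaves the bits of two non-negative integer columns into a single Morton key and sorts rows by it. The function name `interleave_bits`, the bit width, and the sample rows are assumptions made for this example; this is not the encoding used in the Doris BE, and it does not handle signed or non-integer types.

```python
def interleave_bits(x: int, y: int, bits: int = 32) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key.

    Hypothetical helper for illustration; assumes non-negative integers.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i + 1)  # bits of x go to odd positions
        key |= ((y >> i) & 1) << (2 * i)      # bits of y go to even positions
    return key


# Sample (siteid, citycode) pairs, made up for this example.
rows = [(1, 3), (2, 0), (0, 1), (3, 3)]

# Sorting by the interleaved key keeps rows that are close in both
# columns near each other, instead of ordering strictly by the first column.
rows_sorted = sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))
print(rows_sorted)
```

Compared with a plain lexicographic sort on `(siteid, citycode)`, the interleaved key gives both columns comparable influence on the ordering, which is the usual motivation for Z-Order layouts.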


263 Commits
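
Commit 4bc5ba8819 in the log above changes the load failure check so that it is evaluated per tablet: a load job fails when more than half of the replicas of some tablet fail to write. A minimal sketch of that rule follows, assuming hypothetical names (`tablet_write_failed`, `load_job_failed`) and a plain dictionary of per-tablet counts; the actual BE data structures are not shown in the commit message.

```python
def tablet_write_failed(num_replicas: int, num_failed: int) -> bool:
    """True when strictly more than half of a tablet's replicas failed to write."""
    return 2 * num_failed > num_replicas


def load_job_failed(tablets: dict) -> bool:
    """Apply the majority rule to each tablet separately.

    `tablets` maps a tablet id to (num_replicas, num_failed); this layout
    is an assumption made for the example.
    """
    return any(tablet_write_failed(n, f) for n, f in tablets.values())


# Three tablets with three replicas each; one failed replica per tablet.
# Each tablet still has a healthy majority, so the job does not fail,
# even though three replicas failed in total.
spread_out = {10001: (3, 1), 10002: (3, 1), 10003: (3, 1)}

# Two of three replicas of a single tablet failed, so the job fails.
concentrated = {10001: (3, 2), 10002: (3, 0), 10003: (3, 0)}

print(load_job_failed(spread_out), load_job_failed(concentrated))
```

Checking each tablet on its own matters because failures spread thinly over many tablets leave every tablet with a majority of good replicas, while the same number of failures concentrated on one tablet does not.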