doris

Author	SHA1	Message	Date
zhengyu	62ec74f4e7	segcompaction featuring verticalcompaction (#16731 ) This patchset applies the following changes: use the vertical compaction mechanism to do segcompaction; basic (WIP) refactoring to separate segcompaction logic from BetaRowsetWriter; add segcompaction-specific UT and regression tests.	2023-03-01 10:55:40 +08:00
Jerry Hu	a1db5c6f52	[fix](vec) crash caused by not-implemented function in ColumnFixedLengthObject (#17215 )	2023-02-28 15:27:06 +08:00
yiguolei	33acaa067b	[refactor](mempool) remove mempool parameter from key decoder methods (#17137 ) decode method is only used for big int and other decode method is only used in unit test. I remove the useless method and we can remove mempool parameter from decode method. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-27 11:16:14 +08:00
amory	7229751bd9	[Improve](map-type) Add contains_null for map (#16948 ) Add contains_null for map type.	2023-02-23 20:47:26 +08:00
zhannngchen	e5f884a6fc	[enhancement](cache) make segment cache prune more effectively (#17011 ) BloomFilter in MoW table may consume lots of memory, and its life cycle is the same as the segment's. This patch tries to improve the efficiency of recycling the segment cache, to release the memory in time.	2023-02-23 18:24:18 +08:00
zhannngchen	edead494cb	[Enhancement](storage) add a new hidden column __DORIS_VERSION_COL__ for unique key table (#16509 )	2023-02-23 15:47:17 +08:00
Lijia Liu	8eeb435963	[improvement](meta) Enhance Doris's fault tolerance to disk error (#16472 ) Sense IO errors. Retry the query on IO error. Greylist: when one disk is found to be completely broken, or the difference in tablet count between BE and FE meta is too large, reduce the query priority of the BE.	2023-02-23 08:40:45 +08:00
Xinyi Zou	b194a7cf83	[improvement](memory) Support GC segment cache, when memory insufficient (#16987 ) Fix segment cache memory tracker statistics; support GC.	2023-02-22 18:31:20 +08:00
Xin Liao	0b624d282d	[enhancement](ut) add merge-on-write ut code back (#16939 )	2023-02-22 16:29:15 +08:00
TengJianPing	5ec8c51366	[fix](union iterator) fix bug that result data order of VUnionIterator is different (#16938 ) Fix bug of #16680: the data order of the VUnionIterator output block is changed, which will impact compaction.	2023-02-21 14:17:21 +08:00
ElvinWei	f32cd2c123	[fix](statistics) fix a problem with histogram statistics collection parameters (#16918 ) 1. Fixed a problem with histogram statistics collection parameters. 2. Solved the problem that it takes a long time to collect histogram statistics. TODO: Optimize the histogram statistics sampling method and make the sampling parameters effective. The problem is that the histogram function works as expected in the single-node test, but doesn't work in the multi-node test. In addition, the current sampling-based histogram collection performs poorly, so collecting histogram information takes a long time. Fixed the parameter issue and temporarily removed support for sampling to speed up the collection of histogram statistics. Sampling-based histogram collection will be supported next.	2023-02-20 16:33:18 +08:00
Pxl	2bc014d83a	[Enchancement](function) remove unused params on aggregate function (#16886 ) remove unused params on aggregate function	2023-02-20 11:08:45 +08:00
YueW	30dafd6a44	[improve](inverted index) Add element count limit for inverted index searcher cache (#16758 ) The element in InvertedIndexSearcherCache is an inverted index searcher, which is a file descriptor of an inverted index file, so InvertedIndexSearcherCache is effectively a cache of file descriptors for inverted index files. If the open file descriptor limit of the Linux system is set too small and the config inverted_index_searcher_cache_limit is too big, high-pressure load may cause "Too many open files". So, when inserting an inverted index searcher into InvertedIndexSearcherCache, we also need to check whether the file_descriptor_number limit for inverted index files has been reached.	2023-02-17 11:53:07 +08:00
Ashin Gau	e2245cbdd3	[improvement](filecache) split file cache into sharding directories (#16767 ) Save cached file segment into path like `cache_path / hash(filepath).substr(0, 3) / hash(filepath) / offset` to prevent too many directories in `cache_path`.	2023-02-16 16:04:29 +08:00
Pxl	f50edff59d	[Chore](build) enable fallthrough check and fix some fallthrough bug (#16748 ) * enable fallthrough check and fix some fallthrough bug * fix * fix	2023-02-15 15:58:43 +08:00
TengJianPing	9b8c91e18c	[improvement](rowset reader) fix possible memleak (#16680 ) * [improvement](rowset reader) fix possible memleak * fix be UT	2023-02-15 11:13:31 +08:00
zhengshengjun	d013d529c8	[Feature](ipv6)Support IPV6 (#14063 ) Support IPV6 in Apache Doris, the main changes are: 1. enable binding to IPV6 address if network priority in config file contains an IPV6 CIDR string 2. BRPC and HTTP support binding to IPV6 address 3. BRPC and HTTP support visiting IPV6 Services	2023-02-14 21:43:10 +08:00
yiguolei	be9385d40a	[improvement](lock raii) use raii to lock and unlock (#16652 ) * [improvement](lock raii) use raii to lock and unlock This is part of exception safe: #16366. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-13 14:06:36 +08:00
Kang	aba843bb2b	[Improvement](inverted index) inverted index query match bitmap cache (#16578 ) Add cache for inverted index query match bitmap to accelerate common query keyword, especially for keyword matching many rows. Tests result: - large result: matching 99% out of 247 million rows shows 8x speed up. - small result: matching 0.1% out of 247 million rows shows 2x speed up.	2023-02-11 13:38:58 +08:00
lihangyu	37d1519316	[WIP](dynamic-table) support dynamic schema table (#16335 ) Issue Number: close #16351 A dynamic schema table is a special type of table whose schema changes during the loading procedure. We currently implement this feature mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema info from the original documents and infer the final type information. This special table can reduce manual schema change operations, making it easy to import semi-structured data and extend its schema automatically.	2023-02-11 13:37:50 +08:00
AlexYue	1f631c388d	[enhance](cooldown)accelerate cooldown task produce efficiency (#16089 )	2023-02-10 16:58:27 +08:00
xy720	1b3902baa2	[Feature](Complex-type) Add struct and map type to Doris (#16444 ) This commit supports: 1. Insert + select for struct/map type 2. JSON stream load for struct type 3. m[key] function for map type How to use: set the FE config to create tables with struct and map types: 1. admin set frontend config("enable_struct_type" = "true"); 2. admin set frontend config("enable_map_type" = "true"); #16547 Co-authored-by: xy720 <xuyang25@baidu.com> Co-authored-by: amory <wangqiannan@selectdb.com> Co-authored-by: cambyzju <zhuxiaoli01@baidu.com> Co-authored-by: hucheng01 <hucheng01@baidu.com>	2023-02-10 11:00:33 +08:00
yiguolei	4fcd6cd236	[refactor](remove unused code) remove load stream mgr (#16580 ) remove old stream load pipe remove old stream load manager --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-10 07:46:18 +08:00
yiguolei	d390e63a03	[enhancement](stream receiver) make stream receiver exception safe (#16412 ) make stream receiver exception safe; change get_block(block*) to get_block(block , bool* eos); unify stream semantic	2023-02-07 12:44:20 +08:00
yiguolei	6fdd35a6f2	[enhancement](mpp process) remove unused method and make report process more clear (#16441 ) both update status and open_vectorized_internal will call send_report and stop report thread. move update_status code to open method and remove unnecessary send_report and stop_report_thread. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-07 12:28:55 +08:00
Xin Liao	2bee26b05a	[fix](merge-on-write) fix that the query result has duplicate keys (#16336 ) * [fix](merge-on-write) fix that the query result has duplicate keys * add ut	2023-02-06 17:09:53 +08:00
plat1ko	bd8ef4edeb	[fix](cooldown) Fix core in remove_all_remote_rowsets (#16374 )	2023-02-04 22:31:38 +08:00
Pxl	5e4bb98900	[Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290 ) enable -Wpedantic and update lowest gcc version to 11.1	2023-02-03 11:28:48 +08:00
plat1ko	6ee0dbfb23	[fix](cooldown) Fix bugs in cooldown single replica files (#16299 )	2023-02-02 19:31:26 +08:00
zhannngchen	69f34cd1c3	[fix](load) sequence column do not compare correctly in memtable (#16211 )	2023-02-02 11:00:23 +08:00
HappenLee	7c145faa80	[Enhance] use fast_float::from_chars to do str cast to float/double to avoid losing precision (#16190 )	2023-02-01 23:53:34 +08:00
plat1ko	00a598a839	[feature](cooldown) Decouple storage policy and resource (#15873 )	2023-01-31 14:13:47 +08:00
yiguolei	90b12143a3	[refactor](remove unused code) remove runtime tuple structure and useless utils class (#16237 )	2023-01-30 16:45:14 +08:00
yiguolei	4b6a4b3cf7	[refactor](remove unused code) Remove unused mempool declare or function params (#16222 ) * Remove unused mempool declare or function params --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-30 13:03:18 +08:00
WenYao	69e748b076	[fix](schema scanner)change schema_scanner::get_next_row to get_next_block (#15718 )	2023-01-30 10:01:50 +08:00
yiguolei	5eaa995704	[refactor](some mempool) not memset 0 in default value iterator (#16194 ) --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-29 22:50:39 +08:00
Pxl	46347a51d2	[Bug](exec) enable warning on ignoring function return value for vctx (#16157 ) * enable warning on ignoring function return value for vctx	2023-01-29 17:23:21 +08:00
plat1ko	f97c51d786	[enhancement](compaction) Optimize calculating level size of input rowset in SizeBasedCumulativeCompactionPolicy (#15633 ) * Optimize calculating level size of input rowset in SizeBasedCumulativeCompactionPolicy	2023-01-29 17:18:50 +08:00
yiguolei	3235b636cc	[refactor](remove unused code) remove thread pool manager (#16179 ) * remove thread resource manager * remove string buffer --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-29 13:03:08 +08:00
yiguolei	241a956b20	[refactor](remove unused code) remove partition info from datastream sender (#16162 ) --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-28 19:56:41 +08:00
yiguolei	e49766483e	[refactor](remove unused code) remove many xxxVal structure (#16143 ) remove many xxxVal structure; remove BetaRowsetWriter::_add_row; remove anyval_util.cpp; remove non-vectorized geo functions; remove non-vectorized like predicate. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-28 14:17:43 +08:00
zhannngchen	4e64ff6329	[enhancement](load) avoid schema copy to reduce cpu usage (#16034 )	2023-01-28 11:13:57 +08:00
yiguolei	adb758dcac	[refactor](remove non vec code) remove json functions string functions match functions and some code (#16141 ) remove json functions code; remove string functions code; remove math functions code; move MatchPredicate to olap since it is only used in storage predicate process; remove some code in tuple (the Tuple structure should be removed in the future); remove much code in the collection value structure that is useless	2023-01-26 16:21:12 +08:00
yiguolei	615a5e7b51	[refactor](remove non vec code) remove non vec functions and AggregateInfo (#16138 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-25 12:53:05 +08:00
yiguolei	6e8eedc521	[refactor](remove unused code) remove storage buffer and orc reader (#16137 ) remove olap storage byte buffer; remove orc reader; remove time operator; remove read_write_util; remove aggregate funcs; remove compress.h and cpp; remove bhp_lib. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-24 22:29:32 +08:00
yiguolei	79ad74637d	[refactor](remove expr) remove non vectorized Expr and ExprContext related codes (#16136 )	2023-01-24 10:45:35 +08:00
yiguolei	a3cd0ddbdc	[refactor](remove broker scan node) it is not useful any more (#16128 ) remove broker scannode; remove broker table; remove broker scanner; remove json scanner; remove orc scanner; remove hive external table; remove hudi external table; remove broker external table, user could use broker table value function instead. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-23 19:37:38 +08:00
ZhaoChangle	199d7d3be8	[Refactor]Merged string_value into string_ref (#15925 )	2023-01-22 16:39:23 +08:00
lihangyu	116e17428b	[Enhancement](point query optimize) improve performance of point query on primary keys (#15491 ) 1. support row format using codec of jsonb 2. short path optimize for point query 3. support prepared statement for point query 4. support mysql binary format	2023-01-20 13:33:01 +08:00
Jibing-Li	3ebc98228d	[feature wip](multi catalog)Support iceberg schema evolution. (#15836 ) Support iceberg schema evolution for parquet file format. Iceberg uses a unique id for each column to support schema evolution. To support this feature in Doris, the FE side needs to get the current column id for each column and send the ids to the BE side. The BE reads the column ids from parquet key_value_metadata and sets the changed column names in Block to match the names in the parquet file before reading data, then sets the names back after reading data.	2023-01-20 12:57:36 +08:00

985 Commits