1. When process memory is insufficient, print the process memory statistics in a more timely and detailed manner.
2. Support regular GC of caches (currently only the page cache and chunk allocator), since many users reported that memory does not drop after a query ends.
3. Lower the system available-memory warning water mark to reduce memory waste.
4. Optimize soft mem limit logging.
If enable_system_metrics is set to false, BE goes down with the message "enable metric calculator failed,
maybe you set enable_system_metrics to false", so fix it.
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
remove json functions code
remove string functions code
remove math functions code
move MatchPredicate to olap since it is only used in storage predicate processing
remove some code in tuple; the Tuple structure should be removed in the future
remove much unused code from the collection value structure
1. Support a row format using the jsonb codec.
2. Short-path optimization for point queries.
3. Support prepared statements for point queries.
4. Support the MySQL binary protocol format.
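A rough sketch of the query shape these changes target (table and column names are hypothetical, not from this PR):

```sql
-- Point lookup on the table key: the shape served by the short path.
SELECT * FROM user_profile WHERE user_id = 1001;
```

With the MySQL binary protocol, a statement like this can be prepared once and executed repeatedly with different key values, e.g. from a JDBC client with server-side prepared statements enabled.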
This PR optimizes TopN queries like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.
A TopN plan is composed of a SortNode and a ScanNode. When the user table is wide (100+ columns) but the ORDER BY clause covers only a few columns, the ScanNode still has to scan all columns from the storage engine even if the limit is very small. This can lead to a lot of read amplification. So in this PR I divide a TopN query into two phases:
1. The first phase reads only `columnA`'s data from the storage engine, along with an extra RowId column called `__DORIS_ROWID_COL__`. The other columns are pruned from the ScanNode.
2. The second phase is placed in the ExchangeNode, because it is the central node for the TopN nodes in the cluster. The ExchangeNode spawns RPCs to the other nodes using the RowIds (sorted and limited by the SortNode) read in the first phase, and reads the required rows from the storage engine row by row.
After the second-phase read, the Block contains all the data needed for the query.
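For concreteness, a sketch of the target query shape (table and column names are made up):

```sql
-- Phase 1 reads only event_time plus __DORIS_ROWID_COL__, sorts, and keeps N RowIds;
-- phase 2 fetches the remaining wide columns only for those N rows.
SELECT * FROM wide_table ORDER BY event_time DESC LIMIT 100;
```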
Step 3 of DSIP-023: Add inverted index for full text search
Implementation of the inverted index writer for numeric types, using a BKD index.
Dependency PRs: #14207 #15807 #15821
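A minimal DDL sketch of what this enables (table, column, and index names are illustrative, not from this PR):

```sql
-- Inverted index on a numeric column; with this PR its data is written
-- by the BKD-based index writer.
CREATE TABLE httplog (
    ts DATETIME,
    status INT,
    url STRING,
    INDEX idx_status (status) USING INVERTED
)
DUPLICATE KEY(ts)
DISTRIBUTED BY HASH(ts) BUCKETS 10
PROPERTIES ("replication_num" = "1");
```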
The main purpose of this PR is to introduce a `fileCache` for reading remote files in lakehouse scenarios.
The local disk is used as the cache for remote file reads, so the next time a file is read,
the data can be obtained directly from the local disk.
In addition, this PR includes a few other minor changes.
Introduce File Cache:
1. The introduced `fileCache` is called `block_file_cache`, which uses an LRU replacement policy.
2. Implement a new FileReader, `CachedRemoteFileReader`, so that the `file cache` logic is hidden inside `CachedRemoteFileReader`.
Other changes:
1. Add a new interface `fs()` to `FileReader`.
2. `IOContext` adds some statistics to track `FileCache` behavior.
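A sketch of how this might be enabled (option names and values are assumptions based on the feature description, not taken from this PR):

```
# be.conf sketch (hypothetical option names)
enable_file_cache = true
# local disk path(s) used to cache remote file data
file_cache_path = [{"path": "/path/to/file_cache", "total_size": 107374182400}]
```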
Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>
* [feature-wip](inverted index)inverted index api: reader
* [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ANY/MATCH_ALL
* [feature-wip](inverted index) Adapt to index meta
* [enhance] add more metrics
* [enhance] add fulltext match query check for column type and index parser
* [feature-wip](inverted index) Support applying the inverted index to compound predicates, except for leaf nodes of AND nodes
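A rough sketch of the full-text syntax these items add (table and column names are hypothetical):

```sql
-- MATCH_ANY: at least one token must match; MATCH_ALL: every token must match.
SELECT * FROM docs WHERE content MATCH_ANY 'doris search';
SELECT * FROM docs WHERE content MATCH_ALL 'inverted index';
```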
Currently the bitmap index can only be applied to pushed-down predicates in AND conditions. Predicates in OR conditions and other complex compound conditions are not pushed down to the storage layer, which leads to reading more data.
Based on that, this PR does the following:
1. Support applying the bitmap index to compound predicates, for SQL like:
select * from tb where a > 'hello' or b < 100;
select * from tb where a > 'hello' or b < 100 or c > 'ok';
select * from tb where (a > 'hello' or b < 100) and (a < 'world' or b > 200);
select * from tb where (not a > 'hello') or b < 100;
...
In the SQL above, columns a, b, and c have bitmap indexes.
2. This optimization can reduce the amount of data read by using the index.
3. Set the config enable_index_apply_compound_predicates to use this optimization, as sketched below.
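A sketch of enabling it (assuming a BE-side config, as the wording suggests):

```
# be.conf sketch
enable_index_apply_compound_predicates = true
```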
This PR implements a new bloom filter index: the NGram bloom filter index, which was proposed in #10733.
The new index can improve LIKE query performance greatly; in some of our test cases it gives an order-of-magnitude improvement.
For how to use it, check the docs in this PR. The index depends on ```enable_function_pushdown```:
you need to set it to ```true``` to make the index work for LIKE queries.
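A usage sketch (table, column, and property values are illustrative; see the PR docs for the authoritative syntax):

```sql
CREATE TABLE logs (
    id BIGINT,
    msg VARCHAR(1024),
    INDEX idx_ngram_msg (msg) USING NGRAM_BF PROPERTIES("gram_size" = "3", "bf_size" = "256")
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES ("replication_num" = "1");

-- Required for the index to take effect for LIKE queries:
SET enable_function_pushdown = true;
SELECT * FROM logs WHERE msg LIKE '%timeout%';
```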
Add a new config "jdbc_drivers_dir" for both FE and BE.
Users can put JDBC driver jar files in this dir and specify only the file name in the "driver_url" property
when creating a JDBC resource.
Doris will then find the jar files in this dir.
Also modify the logic so that when a JDBC resource is modified, the corresponding JDBC table
gets the latest properties.
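A sketch of the intended usage (connection details are placeholders):

```sql
-- With jdbc_drivers_dir configured, "driver_url" can be just the jar file
-- name; Doris resolves it under that directory.
CREATE RESOURCE "jdbc_mysql" PROPERTIES (
    "type" = "jdbc",
    "user" = "root",
    "password" = "",
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/demo",
    "driver_url" = "mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver"
);
```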
Add conf enable_query_memory_overcommit.
If true, query memory is not limited while process memory stays under the soft mem limit; when process memory exceeds the soft mem limit, the query with the largest ratio of currently used memory to exec_mem_limit is canceled.
If false, cancel a query when the memory it uses exceeds exec_mem_limit, same as before.
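For example (a be.conf sketch; the value shown is illustrative):

```
enable_query_memory_overcommit = true
```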
1. Add a vertical compaction segment file size config, making it more
flexible to set the segment file size.
2. Add a config to disable skipping tablet compaction, so that if the current skip logic
has a bug we can still use the old logic.
3. Delete some useless logs.
When the system MemAvailable is less than the warning water mark, or the memory used by the BE process exceeds the mem soft limit, run minor GC and try to release caches.
When the system MemAvailable is less than the low water mark, or the memory used by the BE process exceeds the mem limit, run full GC: try to release caches, and cancel queries starting from the one with the largest memory usage until mem_limit * 20% of memory is released.