Commit Graph

472 Commits

Author SHA1 Message Date
2b014a0464 [Improve](doris::Status performance) fix a performance issue caused by copying std::string (#17411) 2023-03-04 15:08:59 +08:00
823d968452 [fix](expr) avoid crashes caused by overly deep expression trees (#17314) 2023-03-02 16:55:53 +08:00
30df268c1f [fix](hdfs)(catalog) fix BE crash when hdfs-site.xml does not exist in be/conf and fix compute node logic (#17244)
We set the LIBHDFS3_CONF env in start_be.sh, so libhdfs3 will try to read this hdfs-site.xml;
if the file does not exist, it will throw an error. But Doris did not handle this error, causing the BE to crash.
This CL mainly changes:

Modify start_be.sh to only set LIBHDFS3_CONF if hdfs-site.xml exists.
Refactor the HDFSCommonBuilder so that it can return errors correctly.
Add the BE IP to the status, so that we can get the IP from error messages like:
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]failed to init reader for file  000.snappy.orc, err: 
[INTERNAL_ERROR][172.21.0.101]failed to init HDFSCommonBuilder, please check check be/conf/hdfs-site.xml
The "prefer compute node" logic was wrong, causing external table queries to be assigned at most 3 backends.
This CL refactors that logic and also changes some FE configs (see the sketch after the config descriptions):

prefer_compute_node_for_external_table

If set to true, queries on external tables will prefer to be assigned to compute nodes,
and the max number of compute nodes used is controlled by min_backend_num_for_external_table.
If set to false, queries on external tables can be assigned to any node.

min_backend_num_for_external_table

Only takes effect when prefer_compute_node_for_external_table is true.
If the number of compute nodes is less than this value, queries on external tables will try to get some mix nodes
assigned as well, so that the total number of nodes reaches this value.
If the number of compute nodes is larger than this value, queries on external tables will be assigned to compute nodes only.
2023-03-02 11:09:55 +08:00
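A minimal sketch of the corrected assignment policy described above (hypothetical names; the real logic lives in the FE planner):

```cpp
#include <vector>

struct Backend {
    bool is_compute_node;
};

// Prefer compute nodes; only backfill with mix nodes when there are fewer
// compute nodes than min_backend_num_for_external_table.
std::vector<Backend> pick_backends(const std::vector<Backend>& all,
                                   bool prefer_compute_node,
                                   size_t min_backend_num) {
    if (!prefer_compute_node) return all;  // any node may serve the query
    std::vector<Backend> picked;
    for (const auto& be : all)
        if (be.is_compute_node) picked.push_back(be);
    // Backfill with mix nodes until the minimum node count is reached.
    for (const auto& be : all) {
        if (picked.size() >= min_backend_num) break;
        if (!be.is_compute_node) picked.push_back(be);
    }
    return picked;
}
```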
b7677beab7 [enhancement](memtracker) Add special counter for memtracker and fix thread create and destroy track #17301
Add a special counter for memtracker: faster, but with relaxed ordering and therefore not accurate in real time (see the sketch below)
Track thread create and destroy memory again; this tracking was previously removed due to performance loss and is now added back
2023-03-02 08:55:00 +08:00
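A minimal sketch of such a relaxed counter (hypothetical names, not Doris's actual MemTracker API):

```cpp
#include <atomic>
#include <cstdint>

// A consumption counter updated with relaxed memory ordering. Cheap on the
// hot allocation path, but readers may observe a slightly stale value,
// matching the "not accurate in real time" trade-off above.
class MemCounter {
public:
    void add(int64_t bytes) { _value.fetch_add(bytes, std::memory_order_relaxed); }
    void sub(int64_t bytes) { _value.fetch_sub(bytes, std::memory_order_relaxed); }
    int64_t value() const { return _value.load(std::memory_order_relaxed); }

private:
    std::atomic<int64_t> _value{0};
};
```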
a1e3b908d7 [fix](memory) split mem usage thread and gc thread to different threads (#17213)
Ensure that the memory status is refreshed in time
Avoid frequent GC
2023-03-01 19:19:05 +08:00
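A minimal sketch of the thread split described above, with the actual work left as assumed helpers:

```cpp
#include <chrono>
#include <thread>

// Refresh memory status and run GC on independent threads, so a slow GC
// pass cannot delay the status refresh, and a busy refresh loop does not
// trigger GC more often than intended.
void start_memory_daemons() {
    std::thread refresher([] {
        while (true) {
            // refresh_process_memory_usage();  // assumed helper
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
    });
    std::thread gc([] {
        while (true) {
            // maybe_trigger_gc();  // assumed helper, runs far less often
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    });
    refresher.detach();
    gc.detach();
}
```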
6eeba204f9 [Enhancement] path scan causes disk io to skyrocket (#16968) 2023-02-25 09:15:15 +08:00
e5f884a6fc [enhancement](cache) make segment cache prune more effectively (#17011)
BloomFilters in a MoW (merge-on-write) table may consume lots of memory, and their life cycle is the same as the segment's. This patch tries to improve the efficiency of recycling the segment cache, to release the memory in time.
2023-02-23 18:24:18 +08:00
8eeb435963 [improvement](meta) Enhance Doris's fault tolerance to disk error (#16472)
Sense IO errors.
Retry the query when an IO error occurs.
Greylist: when one disk is found to be completely broken, or the diff between the tablet numbers in BE and FE meta is too large, reduce the query priority of the BE.
2023-02-23 08:40:45 +08:00
a1c0054b4c [fix](memory) fix memory GC details and join probe catch bad_alloc (#16989)
Fix: on Red Hat 4.x, /proc/meminfo has no MemAvailable; disable using MemAvailable to control memory there.
Record vm_rss_str and mem_available_str when GC is triggered, to avoid memory changes during GC causing inaccurate logs.
Catch bad_alloc in the join probe, which may allocate 64G of memory at a time, to avoid OOM (see the sketch below).
Fix the names doris_be_all_segments_num and doris_be_all_rowsets_num in the documentation.
2023-02-23 08:33:30 +08:00
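A minimal sketch of the join-probe bad_alloc handling described above (hypothetical function; the real code works with Doris's Status and allocators):

```cpp
#include <cstddef>
#include <new>

// A join probe may try to allocate a very large buffer in one shot;
// catching std::bad_alloc turns a process OOM into a cancellable error.
bool try_expand_probe_buffer(char*& buf, size_t bytes) {
    try {
        buf = new char[bytes];  // may request tens of GB at once
        return true;
    } catch (const std::bad_alloc&) {
        buf = nullptr;          // caller cancels the query instead of OOM
        return false;
    }
}
```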
7b0fc17c04 [enhancement](inverted index) Support fulltext index evaluate equal query and list query (#16994)
A fulltext index is an inverted index with a specified tokenizer. Before this PR, a fulltext index could only evaluate match predicates; this PR adds support for evaluating equal predicates and list (IN) predicates.
2023-02-22 20:18:10 +08:00
0e3be4eff5 [Improvement](brpc) Using a thread pool for RPC service avoiding std::mutex block brpc::bthread (#16639)
This mainly includes:
- The brpc service adds two types of thread pools, "light" and "heavy", with different thread counts.
BE interfaces are classified accordingly: those related to data transmission are heavy and the others are light (see the sketch below).
- Add some monitoring to the thread pools, including the queue size and the number of active threads, and use these
indicators to guide the configuration of the number of threads.
2023-02-22 14:15:47 +08:00
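A minimal sketch of the light/heavy dispatch (a toy pool for illustration only; the real code uses Doris/brpc thread pool utilities and config-driven sizes):

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Toy fixed-size pool, just enough to show the dispatch.
class Pool {
public:
    explicit Pool(size_t n) {
        for (size_t i = 0; i < n; ++i) _workers.emplace_back([this] { run(); });
    }
    ~Pool() {
        { std::lock_guard<std::mutex> l(_mu); _stop = true; }
        _cv.notify_all();
        for (auto& t : _workers) t.join();
    }
    void submit(std::function<void()> task) {
        { std::lock_guard<std::mutex> l(_mu); _tasks.push(std::move(task)); }
        _cv.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> l(_mu);
                _cv.wait(l, [this] { return _stop || !_tasks.empty(); });
                if (_stop && _tasks.empty()) return;
                task = std::move(_tasks.front());
                _tasks.pop();
            }
            task();
        }
    }
    std::vector<std::thread> _workers;
    std::queue<std::function<void()>> _tasks;
    std::mutex _mu;
    std::condition_variable _cv;
    bool _stop = false;
};

// Two pools with different sizes (hypothetical numbers): heavy for data
// transmission RPCs, light for everything else.
Pool heavy_pool(32);
Pool light_pool(8);

void on_rpc(bool is_data_transmission, std::function<void()> handler) {
    (is_data_transmission ? heavy_pool : light_pool).submit(std::move(handler));
}
```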
c0bb2e33a8 [improvement](scan) separate scanner into local and remote scanner pool (#16891)
There are two kinds of scanner thread pools: local and remote.
Local is for local file reads, especially the olap scanner.
Remote is for other external data sources, such as the file scanner and the jdbc scanner.

This PR mainly changes:

For the olap scanner, use cold or hot rowsets to decide whether to use the local or the remote pool.
For other scanners, use the remote pool by default.
Add a new BE config doris_max_remote_scanner_thread_pool_thread_num (default 512),
which indicates the max thread number of the remote scanner thread pool.

This will alleviate the problem of olap queries interfering with load jobs and external queries (see the sketch below).
2023-02-21 14:13:09 +08:00
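A minimal sketch of the pool choice described above (hypothetical names; the remote pool size is capped by doris_max_remote_scanner_thread_pool_thread_num):

```cpp
enum class ScanPool { LOCAL, REMOTE };

struct ScannerCtx {
    bool is_olap_scanner;
    bool reads_cold_rowset;  // cold rowsets may live on remote storage
};

// Olap scanners go to the local pool unless they read cold rowsets;
// all other scanners (file, jdbc, ...) use the remote pool.
ScanPool choose_pool(const ScannerCtx& ctx) {
    if (!ctx.is_olap_scanner) return ScanPool::REMOTE;
    return ctx.reads_cold_rowset ? ScanPool::REMOTE : ScanPool::LOCAL;
}
```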
113023fb86 (Enhancement)[load-json] support simdjson in new json reader (#16903)
be config:
enable_simdjson_reader=true

related PR #11665
2023-02-21 11:31:00 +08:00
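For context, a minimal sketch of reading a document with simdjson's on-demand API, which the new reader builds on (field names are made up):

```cpp
#include <iostream>
#include <string_view>
#include "simdjson.h"

int main() {
    using namespace simdjson;
    ondemand::parser parser;
    // Hypothetical row; the reader parses one JSON document per row.
    padded_string json = R"({"id": 1, "name": "doris"})"_padded;
    ondemand::document doc = parser.iterate(json);
    int64_t id = doc["id"];               // throws simdjson_error on mismatch
    std::string_view name = doc["name"];
    std::cout << id << " " << name << "\n";
}
```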
30dafd6a44 [improve](inverted index) Add element count limit for inverted index searcher cache (#16758)
The elements in InvertedIndexSearcherCache are inverted index searchers, each of which holds a file descriptor of an inverted index file, so InvertedIndexSearcherCache effectively caches file descriptors of inverted index files.

If the open file descriptor limit of the Linux system is set too small and the config inverted_index_searcher_cache_limit is too big, a high-pressure load may cause "Too many open files".

So, when inserting an inverted index searcher into InvertedIndexSearcherCache, we also need to check whether the file descriptor limit for inverted index files has been reached (see the sketch below).
2023-02-17 11:53:07 +08:00
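A minimal sketch of an element-count-bounded LRU cache in the spirit described above (hypothetical types; the real cache stores searcher objects rather than raw fds):

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

class SearcherCache {
public:
    explicit SearcherCache(size_t max_elements) : _max(max_elements) {}

    void insert(const std::string& index_file, int searcher_fd) {
        if (_map.count(index_file) > 0) return;  // already cached (sketch skips refresh)
        // Evict least-recently-used entries so the fd budget is not exceeded.
        while (!_lru.empty() && _lru.size() >= _max) {
            _map.erase(_lru.back());
            _lru.pop_back();  // the real cache would also close the evicted fd
        }
        _lru.push_front(index_file);
        _map[index_file] = Entry{searcher_fd, _lru.begin()};
    }

    bool lookup(const std::string& index_file, int* fd) {
        auto it = _map.find(index_file);
        if (it == _map.end()) return false;
        _lru.splice(_lru.begin(), _lru, it->second.pos);  // mark as recently used
        *fd = it->second.fd;
        return true;
    }

private:
    struct Entry {
        int fd;
        std::list<std::string>::iterator pos;
    };
    size_t _max;
    std::list<std::string> _lru;
    std::unordered_map<std::string, Entry> _map;
};
```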
2426d8e6e8 [chore](be-config) set disable_storage_row_cache to true by default, disabling the row cache by default (#16827) 2023-02-17 10:21:28 +08:00
d013d529c8 [Feature](ipv6)Support IPV6 (#14063)
Support IPv6 in Apache Doris. The main changes are:
1. enable binding to an IPv6 address if the network priority in the config file contains an IPv6 CIDR string (see the sketch below)
2. BRPC and HTTP support binding to IPv6 addresses
3. BRPC and HTTP support visiting IPv6 services
2023-02-14 21:43:10 +08:00
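As an illustration of the low-level step behind change 1, a minimal sketch of binding a listening socket to an IPv6 address (generic POSIX code, not Doris's actual implementation):

```cpp
#include <arpa/inet.h>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int listen_v6(uint16_t port) {
    int fd = socket(AF_INET6, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in6 addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin6_family = AF_INET6;
    addr.sin6_addr = in6addr_any;  // or a specific address from the CIDR match
    addr.sin6_port = htons(port);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
        listen(fd, 128) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```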
f1b9185830 [feature](cooldown) Implement cold data compaction (#16681) 2023-02-14 15:21:54 +08:00
5014ad03e7 [feature](cooldown) Auto delete unused remote files (#16588) 2023-02-13 23:59:39 +08:00
f41a2055d3 [feature](Load)Remove user/password in properties for mysql load to avoid double auth. (#16073)
Use the FE cluster token to auth stream load.
This auth is only open for BE; FE auth still only supports HTTP basic auth.

I will use this auth for mysql load to build a no-auth stream load from FE to BE,
which avoids double auth in mysql load.
See the design doc for more information.
2023-02-13 10:00:08 +08:00
1de4e312cc [fix](metric) Fix be core when set enable_system_metrics to false in be (#16646)
when enable_system_metrics is false, we should not use system_metrics any more

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-02-12 23:01:41 +08:00
aba843bb2b [Improvement](inverted index) inverted index query match bitmap cache (#16578)
Add a cache for inverted index query match bitmaps to accelerate common query keywords, especially keywords matching many rows (see the sketch below).

Tests result:
- large result: matching 99% out of 247 million rows shows 8x speed up.
- small result: matching 0.1% out of 247 million rows shows 2x speed up.
2023-02-11 13:38:58 +08:00
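A minimal sketch of such a match-bitmap cache (hypothetical types; Doris uses a roaring bitmap and an LRU cache rather than this plain map):

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

using Bitmap = std::vector<uint32_t>;  // stand-in for a roaring bitmap of row ids

// Cache the row-id bitmap produced by an inverted index lookup, keyed by
// (index file, query term), so repeated keyword queries skip the index
// evaluation entirely.
class MatchBitmapCache {
public:
    std::shared_ptr<Bitmap> get(const std::string& index_file,
                                const std::string& term) {
        auto it = _cache.find(index_file + '\0' + term);
        return it == _cache.end() ? nullptr : it->second;
    }
    void put(const std::string& index_file, const std::string& term,
             std::shared_ptr<Bitmap> rows) {
        _cache[index_file + '\0' + term] = std::move(rows);
    }

private:
    std::unordered_map<std::string, std::shared_ptr<Bitmap>> _cache;
};
```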
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

A dynamic schema table is a special type of table whose schema changes with the loading procedure. We implemented this feature mainly for semi-structured data such as JSON: since JSON is schema-self-describing, we can extract schema info from the original documents and infer the final type information. This special table type reduces manual schema change operations, makes it easy to import semi-structured data, and extends the schema automatically.
2023-02-11 13:37:50 +08:00
43eca4f209 [Feature-WIP](inverted index) Implementation for alter inverted index. (#16371)
implementation for add/drop inverted index.
2023-02-10 17:56:17 +08:00
32188855ef [improve](topn) separate multiget rpc to ThreadPool (#16598)
multiget_data runs in a bthread and may block the whole worker pthread of the BRPC framework, affecting other bthreads, so I moved the work into a separate task pool (see the sketch below).
2023-02-10 17:39:31 +08:00
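A minimal sketch of the offloading pattern, assuming a pool type with a submit(std::function&lt;void()&gt;) method (hypothetical; the real code uses Doris's ThreadPool and brpc done callbacks):

```cpp
#include <functional>
#include <future>
#include <memory>

// Hand the heavy multiget work to a dedicated pool so the brpc worker
// pthread running the bthread is not blocked by it.
template <typename Pool>
std::future<void> offload_multiget(Pool& pool, std::function<void()> work) {
    auto task = std::make_shared<std::packaged_task<void()>>(std::move(work));
    std::future<void> fut = task->get_future();
    pool.submit([task] { (*task)(); });
    return fut;
}
```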
a038fdaec6 [Bug](pipeline) Fix bug in non-local exchange on pipeline engine (#16463)
Currently, for broadcast shuffle, we serialize a block once and then send it by RPC through multiple channels. After this, we serialize the next block into the same memory for the sake of memory reuse. However, since the RPC is asynchronous, the next block's serialization may happen before the previous block has been sent.

So, in this PR, I use a ref count to identify whether the serialized block can be reused in broadcast shuffle (see the sketch below).
2023-02-09 19:22:40 +08:00
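A minimal sketch of the ref-count idea (hypothetical names; the real code ties the release to the RPC completion callback):

```cpp
#include <atomic>
#include <string>

// A serialized block sent on N channels. The buffer may only be reused
// for the next block once every in-flight RPC has released its reference.
struct BroadcastBlock {
    std::string serialized;         // reused serialization buffer
    std::atomic<int> in_flight{0};  // one reference per unfinished RPC

    bool reusable() const { return in_flight.load() == 0; }
};

void send_broadcast(BroadcastBlock& blk, int num_channels) {
    blk.in_flight.store(num_channels);
    for (int i = 0; i < num_channels; ++i) {
        // async_rpc_send(blk.serialized, /*on_done=*/[&blk] {
        //     blk.in_flight.fetch_sub(1);  // release when the RPC finishes
        // });
    }
}
// The producer serializes the next block into blk.serialized only when
// blk.reusable() is true; otherwise it waits or picks another buffer.
```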
539fd684e9 [improvement](filecache) use dynamic segment size to cache remote file block (#16485)
`CachedRemoteFileReader` used a fixed segment size (file_cache_max_file_segment_size = 4M) to cache remote file blocks. However, the column size in a rowgroup/stripe may be smaller than 10K if a parquet/orc file has many columns, resulting in particularly serious read amplification. For example:
Q1 in clickbench: select count(*) from hits
```
-  FileCache:  0ns
  -  IOHitCacheNum:  552
  -  IOTotalNum:  835
  -  ReadFromFileCacheBytes:  19.98  MB
  -  ReadFromWriteCacheBytes:  0.00  
  -  ReadTotalBytes:  29.52  MB
  -  SkipCacheBytes:  0.00  
  -  WriteInFileCacheBytes:  915.77  MB
  -  WriteInFileCacheNum:  283 
```
Only 30MB of data is needed, but 900MB+ of data is read from hdfs. The query time of Q1 (single scan thread) increased from **5.17s** to **24.45s** with the file cache enabled.

Therefore, this PR introduces a dynamic segment size based on the `read_size` of the data. In order to prevent too-small or too-large IO, the segment size is limited to [4096, file_cache_max_file_segment_size] (see the sketch below).

Q1 in clickbench takes **5.66s** with the file cache enabled. The performance is almost the same as with the cache disabled, and the data size read from hdfs is reduced to 45MB.
```
-  FileCache:  0ns
    -  IOHitCacheNum:  297
    -  IOTotalNum:  835
    -  ReadFromFileCacheBytes:  8.73  MB
    -  ReadFromWriteCacheBytes:  0.00  
    -  ReadTotalBytes:  29.52  MB
    -  SkipCacheBytes:  0.00  
    -  WriteInFileCacheBytes:  45.66  MB
    -  WriteInFileCacheNum:  544
```
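A minimal sketch of the sizing rule described above (the config name follows the PR; the function name is made up):

```cpp
#include <algorithm>
#include <cstddef>

// Derive the cached segment size from the actual read size, clamped to
// [4096, file_cache_max_file_segment_size].
size_t pick_segment_size(size_t read_size, size_t max_segment_size) {
    constexpr size_t kMinSegmentSize = 4096;
    return std::clamp(read_size, kMinSegmentSize, max_segment_size);
}
```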
## Remaining Problems
Small queries may result in a large number of small files (4KB at least), and the `BE` then saves too much meta information for cached segments.

## Fix bug
`FileCachePolicy` in `FileReaderOptions` is a constant reference, but the argument passed in `FileFactory::create_file_reader` is a temporary variable, resulting in a segmentation fault.
2023-02-09 16:39:10 +08:00
0142ef8b95 [improvement](scanner) Supports bthread scanner (#16031) 2023-02-09 10:24:56 +08:00
cf18de14b5 [fix](writer) add _is_closed state to DeltaWriter and avoid write/close core after close (#16453) 2023-02-07 22:40:26 +08:00
f2fd47f238 [Improve](row-store) support row cache (#16263) 2023-02-06 11:16:39 +08:00
63d57b83f3 [fix](memory) Fix request jemalloc metrics wait lock je_malloc_mutex_lock_slow #16381
MetricRegistry::trigger_all_hooks holds the metrics lock and can get stuck in get_je_metrics, while to_prometheus waits for MetricRegistry::trigger_all_hooks to release the lock; so get_je_metrics is no longer called in MetricRegistry::trigger_all_hooks.
2023-02-04 22:49:22 +08:00
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
1d8265c5a3 [refactor](row-store) make row store column a hidden column in meta (#16251)
This simplifies the storage engine logic, makes the code more readable, and lets us
analyze the hidden `__DORIS_ROW_STORE_COL__` column (its length, etc.).
2023-02-02 20:56:13 +08:00
6470ae58ea [enhancement](config) remove config load_process_max_memory_limit_bytes (#15686) 2023-01-31 21:36:34 +08:00
17885acd09 [improvement](metrics) Metrics add all rowset nums and segment nums (#16208) 2023-01-30 09:55:32 +08:00
5eaa995704 [refactor](some mempool) do not memset 0 in default value iterator (#16194)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-29 22:50:39 +08:00
e9afd3210c [improvement](memory) Optimize the log of process memory insufficient and support regular GC cache (#16084)
1. When the process memory is insufficient, print the process memory statistics in a more timely and detailed manner.
2. Support regular GC cache, currently only page cache and chunk allocator are included, because many people reported that the memory does not drop after the query ends.
3. Reduce the system available memory warning water mark to reduce memory waste.
4. Optimize soft mem limit logging.
2023-01-29 10:02:04 +08:00
e49766483e [refactor](remove unused code) remove many xxxVal structure (#16143)
remove many xxxVal structure
remove BetaRowsetWriter::_add_row
remove anyval_util.cpp
remove non-vectorized geo functions
remove non-vectorized like predicate
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-28 14:17:43 +08:00
0148b39de0 [fix](metric) fix be down when enable_system_metrics is false (#16140)
If we set enable_system_metrics to false, the BE goes down with the following message: "enable metric calculator failed,
maybe you set enable_system_metrics to false", so fix it.
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-01-28 00:10:39 +08:00
adb758dcac [refactor](remove non vec code) remove json functions, string functions, match functions and some code (#16141)
remove json functions code
remove string functions code
remove math functions code
move MatchPredicate to olap since it is only used in storage predicate process
remove some code in tuple; the Tuple structure should be removed in the future.
remove a lot of code in the collection value structure; it is unused
2023-01-26 16:21:12 +08:00
615a5e7b51 [refactor](remove non vec code) remove non vec functions and AggregateInfo (#16138)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-25 12:53:05 +08:00
6e8eedc521 [refactor](remove unused code) remove storage buffer and orc reader (#16137)
remove olap storage byte buffer
remove orc reader
remove time operator
remove read_write_util
remove aggregate funcs
remove compress.h and cpp
remove bhp_lib

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-24 22:29:32 +08:00
79ad74637d [refactor](remove expr) remove non vectorized Expr and ExprContext related codes (#16136) 2023-01-24 10:45:35 +08:00
116e17428b [Enhancement](point query optimize) improve performance of point query on primary keys (#15491)
1. support row format using codec of jsonb
2. short path optimize for point query
3. support prepared statement for point query
4. support mysql binary format
2023-01-20 13:33:01 +08:00
6485221ffb [Feature-WIP](inverted index)(bkd) Support try query before query bkd to improve query efficiency (#16075) 2023-01-20 11:19:36 +08:00
05f0f63718 [fix](daemon) should use GetMonoTimeMicros() (#16070) 2023-01-19 10:44:06 +08:00
3894de49d2 [Enhancement](topn) support two phase read for topn query (#15642)
This PR optimizes topn queries like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.

TopN is composed of a SortNode and a ScanNode. When the user table is wide (100+ columns), the ORDER BY clause usually involves only a few columns, but the ScanNode still needs to scan all data from the storage engine even if the limit is very small, which may lead to lots of read amplification. So in this PR I divide the TopN query into two phases:
1. In the first phase we only read `columnA`'s data from the storage engine, along with an extra RowId column called `__DORIS_ROWID_COL__`. The other columns are pruned from the ScanNode.
2. The second phase happens in the ExchangeNode, because it is the central node for the topn nodes in the cluster. The ExchangeNode spawns RPCs to other nodes using the RowIds (sorted and limited by the SortNode) read in the first phase, and reads the rows from the storage engine row by row.

After the second-phase read, the Block contains all the data needed for the query (see the sketch below).
2023-01-19 10:01:33 +08:00
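A minimal sketch of the two-phase idea on plain vectors (hypothetical types; the real implementation works on Blocks and RPCs):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Phase 1 sorts only the key column plus row ids; phase 2 then fetches the
// full rows (all columns) for the N winners by row id.
struct KeyedRow {
    int64_t key;      // the ORDER BY column value
    uint64_t row_id;  // __DORIS_ROWID_COL__
};

std::vector<uint64_t> phase1_topn(std::vector<KeyedRow> keys, size_t n) {
    size_t limit = std::min(n, keys.size());
    std::partial_sort(keys.begin(), keys.begin() + limit, keys.end(),
                      [](const KeyedRow& a, const KeyedRow& b) { return a.key < b.key; });
    std::vector<uint64_t> winners;
    for (size_t i = 0; i < limit; ++i) winners.push_back(keys[i].row_id);
    return winners;  // phase 2 fetches these rows from the storage engine
}
```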
e579530c99 [Feature-WIP](inverted index) support use inverted index searcher cache (#16003)
use inverted index searcher cache to improve query performance

dependency pr: #14211 #15807 #15823
2023-01-18 09:30:55 +08:00
b1caa68706 [Feature-WIP](inverted index) inverted index reader's implementation, and add mysql_fulltext regression case to test fulltext query (#15823)
Issue Number: Step2 of DSIP-023: Add inverted index for full text search
implementation of inverted index reader

dependency pr: #14211 #15807 #15821
2023-01-17 09:13:56 +08:00
0206e0bc57 [Feature](inverted index) implementation of inverted index writer for numeric types, using bkd index (#15918)
Step3 of DSIP-023: Add inverted index for full text search
implementation of inverted index writer for numeric types, using bkd index
dependency pr: #14207 #15807 #15821
2023-01-14 21:06:51 +08:00
98c74f9ab8 [improvement](signal) add tid during core dump; the tid is equal to the tid in be.INFO (#15893)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-14 18:40:02 +08:00