doris

Author	SHA1	Message	Date
camby	798d9d6fc6	[pick21][opt](mow) reduce memory usage for mow table compaction (#36865 ) (#36968 ) cherry-pick https://github.com/apache/doris/pull/36865 to branch-2.1	2024-07-01 15:33:18 +08:00
Yongqiang YANG	07278e9dcb	[improvement](segmentcache) limit segment cache by memory or segment … (#37035 ) …num (#37026) pick ##37026	2024-06-30 20:34:13 +08:00
yujun	22cb7b8fcb	[improvement](compaction) be do not compact invisible version to avoid query error -230 #28082 (#36222 ) cherry pick from #28082	2024-06-27 13:45:21 +08:00
walter	a79b56ac23	[chore](be) Support config max message size for be thrift server (#36595 ) Cherry-pick #36467	2024-06-20 20:15:43 +08:00
Ashin Gau	f59dc4fb37	[opt](split) generate and get split batch concurrently (#36044 ) bp #36045, and turn on batch split, which was turned off in #36109. Generate and get split batches concurrently. `SplitSource.getNextBatch` removes the synchronization so that splits are fetched concurrently, and `SplitAssignment` generates splits asynchronously.	2024-06-19 16:16:02 +08:00
Tiewei Fang	c84b56140c	[Fix](outfile) Add a configuration for exporting data in Parquet format using `select into outfile` (#36143 ) backport: #36142	2024-06-13 11:49:46 +08:00
lihangyu	0b28420e1c	[pick](Variant) make remote schema fetch rpc timeout configurable (#35296 ) (#36174 )	2024-06-12 19:51:53 +08:00
Xin Liao	d1eb917076	[fix](rpc) fix transfer large data and enable transfer_large_data_by_brpc by default #35770 (#36169 ) cherry pick from #35770	2024-06-12 19:39:07 +08:00
Mingyu Chen	fbc82e0253	[opt](log) refine the BE logger (#35942 ) (#35988 ) bp #35942	2024-06-06 22:25:22 +08:00
plat1ko	c2b830e1e7	Pick "[Fix](Tablet) Fix the issue of redundant loading of stale rowset (#35768 )" (#35882 )	2024-06-05 07:55:04 +08:00
Mingyu Chen	e755d64e62	[feature](be jvm monitor)append enable_jvm_monitor in be.conf to control jvm monitor. (#35608 ) (#35764 ) bp #35608 Co-authored-by: daidai <2017501503@qq.com>	2024-06-02 00:18:44 +08:00
Qi Chen	b91d2caab8	[Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587 ) backport #34929	2024-05-29 16:40:54 +08:00
Mingyu Chen	5c40e87667	[opt](s3) auto retry when meeting 429 error (#35397 ) - Add 2 new BE configs - `s3_read_base_wait_time_ms` and `s3_read_max_wait_time_ms` When an S3 429 error is met, the "get" request sleeps `s3_read_base_wait_time_ms (1, 2, 3, 4)` ms and then tries again. The max sleep time is `s3_read_max_wait_time_ms` and the max retry count is `max_s3_client_retry`. - Add more metrics for s3 file reader - `s3_file_reader_too_many_request`: counter of 429 errors. - `s3_file_reader_s3_get_request`: the QPS of s3 get requests. - `TotalGetRequest`: Get request counter in profile - `TooManyRequestErr`: 429 error counter in profile - `TooManyRequestSleepTime`: Sum of sleep time after 429 errors in profile - `TotalBytesRead`: Total bytes read from s3 in profile	2024-05-28 23:00:31 +08:00
TengJianPing	eefeb4d80c	[fix](spill) fix wrong disk usage of spill (#35423 )	2024-05-28 18:53:55 +08:00
Xinyi Zou	b6eaf95720	[fix](memory) Fix BE memory info compatible with Cgroup (#35412 ) (#35425 ) 1. `memory.usage_in_bytes ~= free.used + free.(buff/cache) - (buff)`; the free cache can be reused, so set cgroup_memory_usage = memory.usage_in_bytes - memory.meminfo["Cached"]. 2. If the system is not configured with cgroup, finding the cgroup file path fails; refactor the cgroup memory info refresh to tolerate that failure.	2024-05-27 12:31:44 +08:00
HHoflittlefish777	c6c90ff63e	[chore](routine-load) make routine_load_consumer_pool_size can update using HTTP API (#35315 )	2024-05-25 17:46:29 +08:00
Kaijie Chen	a6f7747d29	[feature](datatype) add BE config to allow zero date (#34961 ) Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>	2024-05-23 19:12:39 +08:00
Ashin Gau	98f8eb5c43	[opt](split) get file splits in batch mode (#34032 ) (#35107 ) bp #34032	2024-05-21 22:27:07 +08:00
Xin Liao	5019aa03e9	[enhancement](be-meta) disable sync rocksdb by default for better performance (#32714 ) (#35122 )	2024-05-21 15:30:49 +08:00
HHoflittlefish777	1a24895257	[opt](routine-load) optimize routine load task thread pool and related param(#32282 ) (#34896 )	2024-05-15 12:42:02 +08:00
Mingyu Chen	cadbbdd2c0	[fix](config) for compatibility issue of log dir config (#34734 )	2024-05-12 09:44:50 +08:00
Lightman	093fe354c8	[Improve](cache) Estimated column reader memory to control segment cache (#34526 )	2024-05-10 22:05:20 +08:00
wangbo	8abd136ba2	[Improvement](executor)Refactor Workload group memory GC (#33797 ) * just gc group's overcommit query when minor gc * add process usage	2024-04-30 19:34:31 +08:00
TengJianPing	2c3e838971	[improvement](spill) improve config of spill thread pool (#33992 )	2024-04-25 12:01:44 +08:00
TengJianPing	615765c1c0	[improvement](spill) improve spill directory and fix bugs (#33900 )	2024-04-22 11:28:22 +08:00
Luwei	439027119e	[fix](schema change) fix schema change check does not calculate reader merged rows (#33825 ) (#33908 )	2024-04-19 22:57:25 +08:00
zhengyu	ee3b6fdf58	[fix](conf) make be conf disable_storage_page_cache modifiable (#33773 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2024-04-17 23:42:14 +08:00
TengJianPing	07a8f44443	[improvement](spill) improve config and fix spill bugs (#33519 )	2024-04-17 23:42:13 +08:00
Mingyu Chen	38c5030f97	[opt](log) refactor the log dir config (#32933 ) Refactor the config for log dir of FE and BE TLDR: - Use env variable `LOG_DIR` to set the root log dir - Remove `sys_log_dir` for FE and BE Details: 1. FE 1. The root log dir is set by env variable `LOG_DIR` in `fe.conf` 2. The default value of `audit_log_dir` is the same as `${LOG_DIR}/` 3. The default value of `spark_launcher_log_dir` is `${LOG_DIR}/spark_launcher_log` 4. The default value of `nereids_trace_log_dir` is `${LOG_DIR}/nereids_trace_log` 5. The original `sys_log_dir` is deprecated, and its default value is `""`. But for compatibility, if the user already set `sys_log_dir` before, Doris will still use it as the root log dir. 2. BE 1. The root log dir is set by env variable `LOG_DIR` in `be.conf` 2. Remove `pipeline_tracing_log_dir`, use `${LOG_DIR}` directly. 3. The original `sys_log_dir` is deprecated, and its default value is `""`. But for compatibility, if the user already set `sys_log_dir` before, Doris will still use it as the root log dir.	2024-04-17 23:41:59 +08:00
Qi Chen	e841d82ffb	[Enhancement](hive-writer) Adjust table sink exchange rebalancer params. (#33397 ) Issue Number: #31442 Change table sink exchange rebalancer params to node level and adjust these params to improve write performance by better balance. rebalancer params: ``` DEFINE_mInt64(table_sink_partition_write_min_data_processed_rebalance_threshold, "26214400"); // 25MB // Minimum partition data processed to rebalance writers in exchange when partition writing DEFINE_mInt64(table_sink_partition_write_min_partition_data_processed_rebalance_threshold, "15728640"); // 15MB ```	2024-04-12 13:09:56 +08:00
abmdocrt	1b3a11a02b	[Enhancement](merge-on-write) Support dynamic delete bitmap cache (#32991 ) * The default delete bitmap cache is set to 100MB, which can be insufficient and cause performance issues when the amount of user data is large. To mitigate the problem of an inadequate cache, we will take the larger of 5% of the total memory and 100MB as the delete bitmap cache size.	2024-04-10 14:53:56 +08:00
TengJianPing	517c12478f	[improvement](spill) spill trigger improvement (#32641 )	2024-04-10 14:52:46 +08:00
Xinyi Zou	cf7595d423	[opt](memory) Optimize mem tracker accuracy (#32039 ) (#33140 )	2024-04-10 11:42:19 +08:00
Mingyu Chen	c758a25dd8	[opt](fqdn) Add DNS Cache for FE and BE (#32869 ) Previously, when FQDN was enabled, Doris called the DNS resolver to get the IP from the hostname each time 1) FE gets BE's grpc client. 2) BE gets other BE's brpc client. So under high concurrency, the DNS resolver could be overloaded and fail to resolve hostnames. This PR mainly changes: 1. Add DNSCache for both FE and BE. The DNSCache runs on every FE and BE node. It has a cache whose key is the hostname and whose value is the IP. Callers can get the IP by hostname from this cache, and if the hostname does not exist, it will try to resolve it and update the cache. In addition, DNSCache has a daemon thread that refreshes the cache every 1 min, in case the IP changes at any time. There are other implementations of this DNS cache: 1. `36fed13997` This is for the BE side, but it does not handle the IP change case. 3. https://github.com/apache/doris/pull/28479 This is for the FE side, but it can only work with the Master FE. Other FE nodes will not be aware of the IP change. And there are a bunch of BackendServiceProxy, this PR only handles the cache in one of them.	2024-04-07 22:16:04 +08:00
airborne12	0122b8a6b4	[Update](inverted index) add config for inverted index query cache shards (#32666 )	2024-03-26 20:27:33 +08:00
zhangstar333	4ebbabf15e	[test](fuzzy) test fuzzy in BE (#31607 )	2024-03-24 08:06:13 +08:00
qiye	a4a191fe56	[fix](index compaction)Fix MOW index compaction core (#32121 ) (#32657 )	2024-03-22 14:20:19 +08:00
Mingyu Chen	e99b33c274	[opt](file-meta-cache) reduce file meta cache size and disable cache for some cases (#32340 ) File meta cache on BE is used to cache the meta of external tables' files, such as the parquet footer. This cache is limited by entry count, not memory consumption. So if a cached object is big (e.g., a large parquet footer), the total memory consumption of this cache will be large and can cause OOM. This PR mainly changes: 1. Add a new method `exceed_prune_limit()` for `CachePolicy` For `ObjLRUCache`, it always returns true so that the minor or full gc on BE will prune the cache each time. 2. Reduce the default capacity of file meta cache, from 20000 to 1000 Also change the default capacity of hdfs file handle cache, from 20000 to 1000 4. Change the judgement of whether to enable file meta cache when querying If the number of files to be read is larger than 1/3 of the file meta cache's capacity, file meta cache will be disabled for this query, because the cache is useless if there are too many files.	2024-03-21 14:07:22 +08:00
Mingyu Chen	ef2151ae66	[Feature-WIP](multi-catalog) Add Hive sink on BE side. (#32306 ) (#32364 ) bp #32306 Co-authored-by: Qi Chen <kaka11.chen@gmail.com>	2024-03-18 11:23:01 +08:00
Gabriel	4bf202db04	[pipelineX](exchange) Make exchange buffer size configurable (#32201 )	2024-03-16 20:58:20 +08:00
ryanzryu	c5ffeff833	[fix](s3 client)add default ca cert list for s3 client to avoid problem:'curlCode:77' (#32285 ) Co-authored-by: ryanzryu <ryanzryu@tencent.com>	2024-03-16 20:55:28 +08:00
TengJianPing	3358f76a7f	[feature](spill) Implement spill to disk for hash join, aggregation and sort for pipelineX (#31910 ) Co-authored-by: Jerry Hu <mrhhsg@gmail.com>	2024-03-12 14:12:09 +08:00
airborne12	daa171ee3a	[Update](cloud) add inverted index tmp dir support (#31484 )	2024-03-02 01:08:51 +08:00
zhangstar333	c40c16b8b3	[improve](conf)refactor fuzzy mode in BE (#31412 ) Refactor the fuzzy mode code in BE; more variables will be added to it so that cases can be tested under different modes.	2024-02-29 19:51:07 +08:00
zclllyybb	82add8dfc1	[Fix](timezone) Introduce a config to use Doris tzdata directly (#31561 )	2024-02-29 12:38:03 +08:00
zclllyybb	b177b26d39	[branch-2.1](tracing) Pick pipeline tracing and relative bugfix (#31367 ) * [Feature](pipeline) Trace pipeline scheduling (part I) (#31027) * [fix](compile) Fix performance compile fail #31305 * [fix](compile) Fix macOS compilation issues for PURE macro and CPU core identification (#31357) * [fix](compile) Correct PURE macro definition to fix compilation on macOS Co-authored-by: zy-kkk <zhongyk10@gmail.com>	2024-02-29 08:42:35 +08:00
AlexYue	f18c853495	[enhance](S3) Init default retry strategy for aws s3 sdk (#31329 )	2024-02-28 13:08:36 +08:00
Xinyi Zou	4c3a96e7df	[fix](memory) Fix LRU cache frequent prune (#31220 )	2024-02-22 19:51:20 +08:00
Mingyu Chen	a8d8c6a271	[fix](file-writer) opt s3 file writer and fix empty file related issue #28983 #30703 #31169 (#31213 ) * (feature)(cloud) Use dynamic allocator instead of static buffer pool for better elasticity. (#28983) * [fix](outfile) Fix unable to export empty data (#30703) Issue Number: close #30600 Fix being unable to export empty data to HDFS / S3. This behavior is inconsistent with version 1.2.7, which can export empty data to HDFS / S3 and leaves exported files there. * [fix](file-writer) avoid empty file for segment writer (#31169) Co-authored-by: AlexYue <yj976240184@gmail.com> Co-authored-by: zxealous <zhouchangyue@baidu.com>	2024-02-21 16:48:54 +08:00
Xin Liao	9a708806e0	[fix](segcompaction) enable segcompaction by default (#30810 )	2024-02-19 19:04:22 +08:00
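
Two of the sizing rules stated in the commit messages above (1b3a11a02b for the delete bitmap cache, e99b33c274 for the file meta cache) are simple enough to sketch. The snippet below is only an illustration in Python, not the BE implementation: the function names and the integer arithmetic are assumptions, and only the numbers (5%, 100MB, 1/3, default capacity 1000) come from the messages.

```python
MB = 1024 * 1024


def delete_bitmap_cache_bytes(total_memory_bytes: int) -> int:
    """Hypothetical helper: larger of 5% of total memory and 100MB (per #32991)."""
    five_percent = total_memory_bytes * 5 // 100
    return max(five_percent, 100 * MB)


def file_meta_cache_enabled(num_files: int, cache_capacity: int = 1000) -> bool:
    """Hypothetical helper: the cache is skipped for a query when the number
    of files to read exceeds 1/3 of the cache capacity (per #32340)."""
    # num_files > capacity / 3, written without floating point
    return num_files * 3 <= cache_capacity


if __name__ == "__main__":
    for gib in (1, 8, 100):
        size = delete_bitmap_cache_bytes(gib * 1024 * MB)
        print(f"{gib} GiB total memory: delete bitmap cache = {size // MB} MB")
    for n in (100, 333, 334, 5000):
        print(f"{n} files: file meta cache enabled = {file_meta_cache_enabled(n)}")
```

With the default capacity of 1000 named in the commit, the cutoff falls between 333 and 334 files under this reading; whether the real check rounds the same way is not stated in the message.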


275 Commits