doris

Author	SHA1	Message	Date
Socrates	c6df7c21a3	[branch-2.1](hive) support hive write text table (#38549 ) (#40063 ) 1. Support write hive text table 2. Add SessionVariable `hive_text_compression` to write compressed hive text table 3. Supported compression type: gzip, bzip2, snappy, lz4, zstd pick from https://github.com/apache/doris/pull/38549	2024-08-29 16:50:40 +08:00
Mingyu Chen	6915d76731	[opt](file-cache) add evict file number per round (#39721 ) Previously, when getting a block from the file cache, it might try to evict many blocks to reserve capacity for the LRU cache. This operation could take a long time while holding the lock, blocking other operations. This PR adds a new BE config `file_cache_max_evict_num_per_round` (default 1000) so that the lock is not held for a long time.	2024-08-28 08:49:12 +08:00
Luwei	cb312cabb2	[Fix](tablet-meta) limit the data size of tablet meta (#39455 ) (#39974 ) pick master #39455	2024-08-27 20:10:17 +08:00
Xinyi Zou	ae4d747c13	[branch-2.1](memory) Modify memory gc conf and add `crash_in_alloc_large_memory_bytes` (#39834 ) pick #39611	2024-08-24 09:21:35 +08:00
Xinyi Zou	1367f74e7a	[branch-2.1](memory) Optimize ClearCacheAction implementation (#39796 ) pick #38438	2024-08-23 01:51:14 +08:00
Xinyi Zou	8ce8887b75	[branch-2.1](memory) Refactor refresh workload groups weighted memory ratio and record refresh interval memory growth (#39760 ) pick #38168 overwrites changes in #37221 on workload_group_manager.cpp. If need to pick 37221, ignore it.	2024-08-22 17:33:11 +08:00
Yongqiang YANG	610f69432a	[improvement](segmentcache) limit segment cache by fd limit or memory… (#39689 ) … (#39658) remove a useless config.	2024-08-21 15:19:52 +08:00
zhiqiang	830f250a80	[opt](query cancel) cancel query if it has pipeline task leakage #39223 (#39537 ) pick #39223 with some modifications. Optimization will only be applied to pipeline x.	2024-08-19 14:33:59 +08:00
苏小刚	0680c8d314	[improve](cache) File cache async init (#39036 ) Do `load_cache_info_into_memory()` asynchronously in a background thread in `LRUFileCache::initialize()`. When the cache is not ready, `LRUFileCache::get_or_set()` returns a FileBlock whose state is SKIP_CACHE.	2024-08-15 16:27:51 +08:00
qiye	8678fcea32	[config](inverted index) Make inverted_index_ram_dir enabled by default (#35094 ) (#39120 ) bp #35094 Co-authored-by: Luennng <luennng@gmail.com>	2024-08-09 01:38:14 +08:00
Xr Ling	2543b569bb	[Optimize](Row store) pick #37145 , #38236 (#38932 )	2024-08-07 09:55:42 +08:00
Mingyu Chen	e9bf0776d7	[fix](parquet) disable parquet page index by default #38691 (#38901 ) bp #38691	2024-08-06 08:51:39 +08:00
Luwei	0603ec1d9d	[enhancement](compaction) optimizing memory usage for compaction (#37099 ) (#37486 )	2024-08-04 10:49:18 +08:00
qiye	b3f335ba5f	[enhancement](index compaction) Enable index compaction by default (#36812 ) (#38676 ) bp #36812	2024-08-02 12:03:57 +08:00
Kaijie Chen	0152a4e86f	[config](be) add be config migration_lock_timeout_ms (#38000 ) (#38337 ) backport #38000	2024-07-25 17:36:34 +08:00
Xinyi Zou	10c5c336d8	[branch-2.1](arrow-flight-sql) Add config arrow_flight_result_sink_buffer_size_rows (#38223 ) pick #38221	2024-07-24 15:15:39 +08:00
wangbo	7b141ffde7	[pick]add min scan thread num for workload group's scan thread (#38123 ) pick #38096	2024-07-19 18:43:05 +08:00
lihangyu	b15ccdbe98	[Pick](Variant) pick some fix (#37922 ) #37674 #37839 #37883 #37857 #37794	2024-07-16 21:38:47 +08:00
Xinyi Zou	9861f81630	[branch-2.1](memory) Fix Jemalloc Cache Memory Tracker (#37905 ) pick #37464	2024-07-16 19:01:31 +08:00
Pxl	010d9d88f8	[Feature](rpc) support set brpc_idle_timeout_sec and enable thrift so… (#37808 ) pick from #37333	2024-07-15 21:12:25 +08:00
Mingyu Chen	a4d37d96ca	[opt](file-scanner) add not found file number in profile (#37042 ) (#37764 ) bp #37042	2024-07-15 17:11:06 +08:00
Kaijie Chen	232202b71f	[improve](load) reduce memory reserved in memtable limiter (#37511 ) (#37699 ) cherry-pick #37511	2024-07-15 11:09:09 +08:00
zclllyybb	2759383365	[branch-2.1](timezone) refactor tzdata load to accelerate and unify timezone parsing (#37062 ) (#37269 ) pick https://github.com/apache/doris/pull/37062 1. revert https://github.com/apache/doris/pull/25097. we decide to rely on OS. not maintain independent tzdata anymore to keep result consistency 2. refactor timezone load. removed rwlock. before: ```sql mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates; +-------------------------------------------------------------------------------------+--------------------------------------------------------+ | count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) | +-------------------------------------------------------------------------------------+--------------------------------------------------------+ | 16000000 | 16000000 | +-------------------------------------------------------------------------------------+--------------------------------------------------------+ 1 row in set (6.88 sec) ``` now: ```sql mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates; +-------------------------------------------------------------------------------------+--------------------------------------------------------+ | count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) | +-------------------------------------------------------------------------------------+--------------------------------------------------------+ | 16000000 | 16000000 | +-------------------------------------------------------------------------------------+--------------------------------------------------------+ 1 row in set (2.61 sec) ``` 3. now don't support timezone offset format string like 'UTC+8', like we already said in https://doris.apache.org/docs/dev/query/query-variables/time-zone/#usage 4. support case-insensitive timezone parsing in nereids. 5. a bug when parse timezone using nereids. should check DST by input, but wrongly by now before. now fixed. doc pr: https://github.com/apache/doris-website/pull/810	2024-07-15 10:56:48 +08:00
Xinyi Zou	747172237a	[branch-2.1](memory) Pick some memory GC patch (#37725 ) pick #36768 #37164 #37174 #37525	2024-07-14 15:19:40 +08:00
Xinyi Zou	cf2fb6945a	[branch-2.1](memory) Refactor LRU cache policy memory tracking (#37658 ) pick #36235 #35965	2024-07-11 21:04:01 +08:00
Luwei	3337c1bbe3	[enhancement](compaction) adjust compaction concurrency based on compaction score and workload (#37491 ) adjust compaction concurrency based on compaction score and workload #36672 fix null pointer when retrieving CPU load average #37171	2024-07-09 09:56:35 +08:00
zhannngchen	494b54a5a5	[enhancement](trash) support skip trash, update trash default expire time (#37170 ) (#37409 ) cherry-pick #37170	2024-07-08 15:33:02 +08:00
wangbo	b272247a57	[pick]log thread num (#37258 ) pick #37159	2024-07-04 15:27:52 +08:00
Pxl	70e1c563b3	[Chore](runtime-filter) enlarge sync filter size rpc timeout limit (#37103 ) (#37225 ) pick from #37103	2024-07-03 21:02:26 +08:00
Mingyu Chen	e25717458e	[opt](catalog) add some profile for parquet reader and change meta cache config (#37040 ) (#37146 ) bp #37040	2024-07-02 20:58:43 +08:00
wangbo	f5572ac732	[pick]reset memtable flush thread num (#37092 ) pick #37028	2024-07-02 19:20:17 +08:00
camby	798d9d6fc6	[pick21][opt](mow) reduce memory usage for mow table compaction (#36865 ) (#36968 ) cherry-pick https://github.com/apache/doris/pull/36865 to branch-2.1	2024-07-01 15:33:18 +08:00
Yongqiang YANG	07278e9dcb	[improvement](segmentcache) limit segment cache by memory or segment … (#37035 ) …num (#37026) pick #37026	2024-06-30 20:34:13 +08:00
yujun	22cb7b8fcb	[improvement](compaction) be do not compact invisible version to avoid query error -230 #28082 (#36222 ) cherry pick from #28082	2024-06-27 13:45:21 +08:00
walter	a79b56ac23	[chore](be) Support config max message size for be thrift server (#36595 ) Cherry-pick #36467	2024-06-20 20:15:43 +08:00
Ashin Gau	f59dc4fb37	[opt](split) generate and get split batch concurrently (#36044 ) bp #36045, and turn on batch split, which was turned off in #36109. Generate and get split batches concurrently: `SplitSource.getNextBatch` removes the synchronization so that callers get their splits concurrently, and `SplitAssignment` generates splits asynchronously.	2024-06-19 16:16:02 +08:00
Tiewei Fang	c84b56140c	[Fix](outfile) Add a configuration for exporting data in Parquet format using `select into outfile` (#36143 ) backport: #36142	2024-06-13 11:49:46 +08:00
lihangyu	0b28420e1c	[pick](Variant) make remote schema fetch rpc timeout configurable (#35296 ) (#36174 )	2024-06-12 19:51:53 +08:00
Xin Liao	d1eb917076	[fix](rpc) fix transfer large data and enable transfer_large_data_by_brpc by default #35770 (#36169 ) cherry pick from #35770	2024-06-12 19:39:07 +08:00
Mingyu Chen	fbc82e0253	[opt](log) refine the BE logger (#35942 ) (#35988 ) bp #35942	2024-06-06 22:25:22 +08:00
plat1ko	c2b830e1e7	Pick "[Fix](Tablet) Fix the issue of redundant loading of stale rowset (#35768 )" (#35882 )	2024-06-05 07:55:04 +08:00
Mingyu Chen	e755d64e62	[feature](be jvm monitor)append enable_jvm_monitor in be.conf to control jvm monitor. (#35608 ) (#35764 ) bp #35608 Co-authored-by: daidai <2017501503@qq.com>	2024-06-02 00:18:44 +08:00
Qi Chen	b91d2caab8	[Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587 ) backport #34929	2024-05-29 16:40:54 +08:00
Mingyu Chen	5c40e87667	[opt](s3) auto retry when meeting 429 error (#35397 ) - Add 2 new BE configs: `s3_read_base_wait_time_ms` and `s3_read_max_wait_time_ms`. When an S3 429 error occurs, the "get" request sleeps `s3_read_base_wait_time_ms (1, 2, 3, 4)` ms and tries again. The max sleep time is `s3_read_max_wait_time_ms` and the max retry count is `max_s3_client_retry`. - Add more metrics for the S3 file reader: - `s3_file_reader_too_many_request`: counter of 429 errors. - `s3_file_reader_s3_get_request`: the QPS of S3 get requests. - `TotalGetRequest`: Get request counter in profile - `TooManyRequestErr`: 429 error counter in profile - `TooManyRequestSleepTime`: Sum of sleep time after 429 errors in profile - `TotalBytesRead`: Total bytes read from S3 in profile	2024-05-28 23:00:31 +08:00
TengJianPing	eefeb4d80c	[fix](spill) fix wrong disk usage of spill (#35423 )	2024-05-28 18:53:55 +08:00
Xinyi Zou	b6eaf95720	[fix](memory) Fix BE memory info compatible with Cgroup (#35412 ) (#35425 ) 1. `memory.usage_in_bytes ~= free.used + free.(buff/cache) - (buff)`; the cache reported by free can be reused, so modify cgroup_memory_usage = memory.usage_in_bytes - memory.meminfo["Cached"]. 2. If the system is not configured with cgroup, finding the cgroup file path fails; refactor the cgroup memory info refresh to tolerate that failure.	2024-05-27 12:31:44 +08:00
HHoflittlefish777	c6c90ff63e	[chore](routine-load) make routine_load_consumer_pool_size updatable via HTTP API (#35315 )	2024-05-25 17:46:29 +08:00
Kaijie Chen	a6f7747d29	[feature](datatype) add BE config to allow zero date (#34961 ) Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>	2024-05-23 19:12:39 +08:00
Ashin Gau	98f8eb5c43	[opt](split) get file splits in batch mode (#34032 ) (#35107 ) bp #34032	2024-05-21 22:27:07 +08:00
Xin Liao	5019aa03e9	[enhancement](be-meta) disable sync rocksdb by default for better performance (#32714 ) (#35122 )	2024-05-21 15:30:49 +08:00

306 Commits