304 Commits

Author SHA1 Message Date
cb312cabb2 [Fix](tablet-meta) limit the data size of tablet meta (#39455) (#39974)
pick master #39455
2024-08-27 20:10:17 +08:00
ae4d747c13 [branch-2.1](memory) Modify memory gc conf and add crash_in_alloc_large_memory_bytes (#39834)
pick #39611
2024-08-24 09:21:35 +08:00
1367f74e7a [branch-2.1](memory) Optimize ClearCacheAction implementation (#39796)
pick #38438
2024-08-23 01:51:14 +08:00
8ce8887b75 [branch-2.1](memory) Refactor refresh workload groups weighted memory ratio and record refresh interval memory growth (#39760)
pick #38168
This overwrites the changes from #37221 in workload_group_manager.cpp. If
#37221 is picked later, skip its changes to this file.
2024-08-22 17:33:11 +08:00
610f69432a [improvement](segmentcache) limit segment cache by fd limit or memory… (#39689)
… (#39658)

remove a useless config.
2024-08-21 15:19:52 +08:00
830f250a80 [opt](query cancel) cancel query if it has pipeline task leakage #39223 (#39537)
pick #39223 with some modifications. Optimization will only be applied
to pipeline x.
2024-08-19 14:33:59 +08:00
0680c8d314 [improve](cache) File cache async init (#39036)
## Proposed changes

Do `load_cache_info_into_memory()` asynchronously in a background thread
in `LRUFileCache::initialize()`.
While the cache is not ready, `LRUFileCache::get_or_set()` returns
a FileBlock whose state is SKIP_CACHE.
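
The behavior described in this message can be illustrated with a small sketch. It is not the Doris implementation (which is C++); the class `AsyncInitCache`, the `loader` callback, the `wait_ready()` helper, and the `"CACHED"` state name are hypothetical names chosen for illustration. Only `SKIP_CACHE` comes from the commit message.

```python
import threading

SKIP_CACHE = "SKIP_CACHE"  # state named in the commit message
CACHED = "CACHED"          # hypothetical state name for a normal hit or insert


class AsyncInitCache:
    """Toy cache whose metadata is loaded by a background thread."""

    def __init__(self, loader):
        self._loader = loader            # hypothetical: returns a dict of entries
        self._ready = threading.Event()  # set once loading has finished
        self._lock = threading.Lock()
        self._entries = {}

    def initialize(self):
        # Return immediately; the slow load runs in the background.
        threading.Thread(target=self._load, daemon=True).start()

    def _load(self):
        loaded = self._loader()
        with self._lock:
            self._entries.update(loaded)
        self._ready.set()

    def wait_ready(self, timeout=None):
        return self._ready.wait(timeout)

    def get_or_set(self, key, value=None):
        # Until loading completes, tell the caller to bypass the cache.
        if not self._ready.is_set():
            return (SKIP_CACHE, None)
        with self._lock:
            return (CACHED, self._entries.setdefault(key, value))
```

A caller that receives `SKIP_CACHE` would read directly from the underlying storage instead of blocking on cache startup.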
2024-08-15 16:27:51 +08:00
8678fcea32 [config](inverted index)Make inverted_index_ram_dir enable by default(#35094) (#39120)
## Proposed changes

bp #35094

Co-authored-by: Luennng <luennng@gmail.com>
2024-08-09 01:38:14 +08:00
2543b569bb [Optimize](Row store) pick #37145, #38236 (#38932) 2024-08-07 09:55:42 +08:00
e9bf0776d7 [fix](parquet) disable parquet page index by default #38691 (#38901)
bp #38691
2024-08-06 08:51:39 +08:00
0603ec1d9d [enhancement](compaction) optimizing memory usage for compaction (#37099) (#37486) 2024-08-04 10:49:18 +08:00
b3f335ba5f [enhancement](index compaction) Enable index compaction by default (#36812) (#38676)
## Proposed changes

bp #36812
2024-08-02 12:03:57 +08:00
0152a4e86f [config](be) add be config migration_lock_timeout_ms (#38000) (#38337)
backport #38000
2024-07-25 17:36:34 +08:00
10c5c336d8 [branch-2.1](arrow-flight-sql) Add config arrow_flight_result_sink_buffer_size_rows (#38223)
pick #38221
2024-07-24 15:15:39 +08:00
7b141ffde7 [pick]add min scan thread num for workload group's scan thread (#38123)
## Proposed changes

pick #38096
2024-07-19 18:43:05 +08:00
b15ccdbe98 [Pick](Variant) pick some fix (#37922)
#37674
#37839
#37883 
#37857 
#37794
2024-07-16 21:38:47 +08:00
9861f81630 [branch-2.1](memory) Fix Jemalloc Cache Memory Tracker (#37905)
pick #37464
2024-07-16 19:01:31 +08:00
Pxl
010d9d88f8 [Feature](rpc) support set brpc_idle_timeout_sec and enable thrift so… (#37808)
pick from #37333
2024-07-15 21:12:25 +08:00
a4d37d96ca [opt](file-scanner) add not found file number in profile (#37042) (#37764)
bp #37042
2024-07-15 17:11:06 +08:00
232202b71f [improve](load) reduce memory reserved in memtable limiter (#37511) (#37699)
cherry-pick #37511
2024-07-15 11:09:09 +08:00
2759383365 [branch-2.1](timezone) refactor tzdata load to accelerate and unify timezone parsing (#37062) (#37269)
pick https://github.com/apache/doris/pull/37062

1. Revert https://github.com/apache/doris/pull/25097. We decided to rely
on the OS and no longer maintain an independent tzdata, to keep results
consistent.
2. Refactor timezone loading and remove the rwlock.

before:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (6.88 sec)
```
now:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (2.61 sec)
```
3. Timezone offset format strings like 'UTC+8' are no longer supported, as
already stated in
https://doris.apache.org/docs/dev/query/query-variables/time-zone/#usage
4. Support case-insensitive timezone parsing in Nereids.
5. Fix a bug in timezone parsing in Nereids: DST should be checked against
the input time, but was wrongly checked against the current time.

doc pr: https://github.com/apache/doris-website/pull/810
2024-07-15 10:56:48 +08:00
747172237a [branch-2.1](memory) Pick some memory GC patch (#37725)
pick
#36768
#37164
#37174
#37525
2024-07-14 15:19:40 +08:00
cf2fb6945a [branch-2.1](memory) Refactor LRU cache policy memory tracking (#37658)
pick 
#36235
#35965
2024-07-11 21:04:01 +08:00
3337c1bbe3 [enhancement](compaction) adjust compaction concurrency based on compaction score and workload (#37491)
adjust compaction concurrency based on compaction score and workload
#36672
fix null pointer when retrieving CPU load average #37171
2024-07-09 09:56:35 +08:00
494b54a5a5 [enhancement](trash) support skip trash, update trash default expire time (#37170) (#37409)
cherry-pick #37170
2024-07-08 15:33:02 +08:00
b272247a57 [pick]log thread num (#37258)
## Proposed changes

pick #37159
2024-07-04 15:27:52 +08:00
Pxl
70e1c563b3 [Chore](runtime-filter) enlarge sync filter size rpc timeout limit (#37103) (#37225)
pick from #37103
2024-07-03 21:02:26 +08:00
e25717458e [opt](catalog) add some profile for parquet reader and change meta cache config (#37040) (#37146)
bp #37040
2024-07-02 20:58:43 +08:00
f5572ac732 [pick]reset memtable flush thread num (#37092)
## Proposed changes

pick #37028
2024-07-02 19:20:17 +08:00
798d9d6fc6 [pick21][opt](mow) reduce memory usage for mow table compaction (#36865) (#36968)
cherry-pick https://github.com/apache/doris/pull/36865 to branch-2.1
2024-07-01 15:33:18 +08:00
07278e9dcb [improvement](segmentcache) limit segment cache by memory or segment … (#37035)
…num (#37026)

pick #37026
2024-06-30 20:34:13 +08:00
22cb7b8fcb [improvement](compaction) be do not compact invisible version to avoid query error -230 #28082 (#36222)
cherry pick from #28082
2024-06-27 13:45:21 +08:00
a79b56ac23 [chore](be) Support config max message size for be thrift server (#36595)
Cherry-pick #36467
2024-06-20 20:15:43 +08:00
f59dc4fb37 [opt](split) generate and get split batch concurrently (#36044)
bp #36045, and turn on batch split, which was turned off in #36109.
Generate and get split batches concurrently.
`SplitSource.getNextBatch` no longer synchronizes, so each caller gets its splits concurrently, while `SplitAssignment` generates splits asynchronously.
2024-06-19 16:16:02 +08:00
c84b56140c [Fix](outfile) Add a configuration for exporting data in Parquet format using select into outfile (#36143)
backport: #36142
2024-06-13 11:49:46 +08:00
0b28420e1c [pick](Variant) make remote schema fetch rpc timeout configurable (#35296) (#36174) 2024-06-12 19:51:53 +08:00
d1eb917076 [fix](rpc) fix transfer large data and enable transfer_large_data_by_brpc by default #35770 (#36169)
cherry pick from #35770
2024-06-12 19:39:07 +08:00
fbc82e0253 [opt](log) refine the BE logger (#35942) (#35988)
bp #35942
2024-06-06 22:25:22 +08:00
c2b830e1e7 Pick "[Fix](Tablet) Fix the issue of redundant loading of stale rowset (#35768)" (#35882) 2024-06-05 07:55:04 +08:00
e755d64e62 [feature](be jvm monitor)append enable_jvm_monitor in be.conf to control jvm monitor. (#35608) (#35764)
bp #35608

Co-authored-by: daidai <2017501503@qq.com>
2024-06-02 00:18:44 +08:00
b91d2caab8 [Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587)
backport #34929
2024-05-29 16:40:54 +08:00
5c40e87667 [opt](s3) auto retry when meeting 429 error (#35397)
- Add 2 new BE configs

	- `s3_read_base_wait_time_ms` and `s3_read_max_wait_time_ms`

		When an S3 429 error occurs, the "get" request sleeps
		`s3_read_base_wait_time_ms (*1, *2, *3, *4)` ms and then tries again.
		The maximum sleep time is `s3_read_max_wait_time_ms`
		and the maximum number of retries is `max_s3_client_retry`.
		
- Add more metrics for s3 file reader

	- `s3_file_reader_too_many_request`: counter of 429 error.
	- `s3_file_reader_s3_get_request`: the QPS of s3 get request.

	- `TotalGetRequest`: Get request counter in profile
	- `TooManyRequestErr`: 429 error counter in profile
	- `TooManyRequestSleepTime`: Sum of sleep time after 429 error in profile
	- `TotalBytesRead`: Total bytes read from s3 in profile
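
The retry rule described above can be sketched roughly as follows. This is an illustration under assumptions, not the BE code: `get_with_retry`, `backoff_ms`, and the `do_get` callback are hypothetical names, and the sketch assumes the wait grows linearly with the attempt number (base × 1, × 2, ...) and is capped at the maximum wait.

```python
import time


def backoff_ms(attempt, base_wait_ms, max_wait_ms):
    """Wait before the next try: base * attempt, capped at max_wait_ms."""
    return min(base_wait_ms * attempt, max_wait_ms)


def get_with_retry(do_get, base_wait_ms, max_wait_ms, max_retry, sleep=time.sleep):
    """Call do_get() until it returns a non-429 status or tries run out.

    do_get is a hypothetical callback returning (status_code, body).
    Returns (status_code, body, total_sleep_ms).
    """
    slept_ms = 0
    for attempt in range(1, max_retry + 1):
        status, body = do_get()
        if status != 429:
            return status, body, slept_ms
        if attempt < max_retry:  # no point sleeping after the final try
            wait = backoff_ms(attempt, base_wait_ms, max_wait_ms)
            sleep(wait / 1000.0)
            slept_ms += wait
    return 429, None, slept_ms
```

The accumulated `slept_ms` plays the role of a counter like `TooManyRequestSleepTime`; passing a no-op `sleep` makes the function easy to test without real delays.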
2024-05-28 23:00:31 +08:00
eefeb4d80c [fix](spill) fix wrong disk usage of spill (#35423)
2024-05-28 18:53:55 +08:00
b6eaf95720 [fix](memory) Fix BE memory info compatible with Cgroup (#35412) (#35425)
1. `memory.usage_in_bytes ~= free.used + free.(buff/cache) - (buff)`. The cache part can be reclaimed,
   so compute cgroup_memory_usage = memory.usage_in_bytes - memory.meminfo["Cached"].
2. If the system is not configured with cgroup, finding the cgroup file path fails. Refactor the cgroup memory info refresh to tolerate that failure.
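
As a rough illustration of the two points above, the sketch below subtracts the `Cached` value from the cgroup usage and tolerates a missing cgroup file. It is not the BE code: the function names are hypothetical, and it assumes `Cached` is taken from `/proc/meminfo`-style text (values in kB) and that the usage file is the cgroup v1 `memory.usage_in_bytes`.

```python
def parse_meminfo_kb(text):
    """Parse '/proc/meminfo'-style text into {name: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def cgroup_memory_usage(usage_in_bytes, meminfo_text):
    """usage_in_bytes minus the reclaimable page cache, never below zero."""
    cached_bytes = parse_meminfo_kb(meminfo_text).get("Cached", 0) * 1024
    return max(usage_in_bytes - cached_bytes, 0)


def read_cgroup_usage(path="/sys/fs/cgroup/memory/memory.usage_in_bytes"):
    """Return the usage in bytes, or None when cgroup is not available."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None  # caller falls back to non-cgroup memory info
```

Returning `None` instead of raising lets the refresh loop carry on with system-wide memory figures on hosts without cgroup.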
2024-05-27 12:31:44 +08:00
c6c90ff63e [chore](routine-load) make routine_load_consumer_pool_size updatable via HTTP API (#35315) 2024-05-25 17:46:29 +08:00
a6f7747d29 [feature](datatype) add BE config to allow zero date (#34961)
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2024-05-23 19:12:39 +08:00
98f8eb5c43 [opt](split) get file splits in batch mode (#34032) (#35107)
bp  #34032
2024-05-21 22:27:07 +08:00
5019aa03e9 [enhancement](be-meta) disable sync rocksdb by default for better performance (#32714) (#35122) 2024-05-21 15:30:49 +08:00
1a24895257 [opt](routine-load) optimize routine load task thread pool and related param(#32282) (#34896) 2024-05-15 12:42:02 +08:00
cadbbdd2c0 [fix](config) for compatibility issue of log dir config (#34734)
2024-05-12 09:44:50 +08:00