Commit Graph

306 Commits

Author SHA1 Message Date
c6df7c21a3 [branch-2.1](hive) support hive write text table (#38549) (#40063)
1. Support write hive text table
2. Add SessionVariable `hive_text_compression` to write compressed hive
text table
3. Supported compression type: gzip, bzip2, snappy, lz4, zstd

pick from https://github.com/apache/doris/pull/38549
2024-08-29 16:50:40 +08:00
6915d76731 [opt](file-cache) add evict file number per round (#39721)
Previously, when getting block from file cache, it may try to evict
lots of blocks to reserve capacity for lru cache. This operation may
take long time
while hold the lock, causing other operation blocked.

This PR add a new BE config `file_cache_max_evict_num_per_round`,
default is 1000, so that it will not hold lock for a long time.
2024-08-28 08:49:12 +08:00
cb312cabb2 [Fix](tablet-meta) limit the data size of tablet meta (#39455) (#39974)
pick master #39455
2024-08-27 20:10:17 +08:00
ae4d747c13 [branch-2.1](memory) Modify memory gc conf and add crash_in_alloc_large_memory_bytes (#39834)
pick #39611
2024-08-24 09:21:35 +08:00
1367f74e7a [branch-2.1](memory) Optimize ClearCacheActionimplementation (#39796)
pick #38438
2024-08-23 01:51:14 +08:00
8ce8887b75 [branch-2.1](memory) Refactor refresh workload groups weighted memory ratio and record refresh interval memory growth (#39760)
pick #38168
overwrites changes in #37221 on workload_group_manager.cpp. If need to
pick 37221, ignore it.
2024-08-22 17:33:11 +08:00
610f69432a [improvement](segmentcache) limit segment cache by fd limit or memory… (#39689)
… (#39658)

remove a useless config.
2024-08-21 15:19:52 +08:00
830f250a80 [opt](query cancel) cancel query if it has pipeline task leakage #39223 (#39537)
pick #39223 with some modifications. Optimization will only be applied
to pipeline x.
2024-08-19 14:33:59 +08:00
0680c8d314 [improve](cache) File cache async init (#39036)
## Proposed changes

Do `load_cache_info_into_memory()` asynchronously in a background thread
in `LRUFileCache::initialize()`.
When the cache is not ready, `LRUFileCache::get_or_set()` will return
the FileBlock which state is SKIP_CACHE.
2024-08-15 16:27:51 +08:00
8678fcea32 [config](inverted index)Make inverted_index_ram_dir enable by default(#35094) (#39120)
## Proposed changes

bp #35094

Co-authored-by: Luennng <luennng@gmail.com>
2024-08-09 01:38:14 +08:00
2543b569bb [Optimize](Row store) pick #37145, #38236 (#38932) 2024-08-07 09:55:42 +08:00
e9bf0776d7 [fix](parquet) disable parquet page index by default #38691 (#38901)
bp #38691
2024-08-06 08:51:39 +08:00
0603ec1d9d [enhancement](compaction) optimizing memory usage for compaction (#37099) (#37486) 2024-08-04 10:49:18 +08:00
b3f335ba5f [enhancement](index compaction) Enable index compaction by default (#36812) (#38676)
## Proposed changes

bp #36812
2024-08-02 12:03:57 +08:00
0152a4e86f [config](be) add be config migration_lock_timeout_ms (#38000) (#38337)
backport #38000
2024-07-25 17:36:34 +08:00
10c5c336d8 [branch-2.1](arrow-flight-sql) Add config arrow_flight_result_sink_buffer_size_rows (#38223)
pick #38221
2024-07-24 15:15:39 +08:00
7b141ffde7 [pick]add min scan thread num for workload group's scan thread (#38123)
## Proposed changes

pick #38096
2024-07-19 18:43:05 +08:00
b15ccdbe98 [Pick](Variant) pick some fix (#37922)
#37674
#37839
#37883 
#37857 
#37794
2024-07-16 21:38:47 +08:00
9861f81630 [branch-2.1](memory) Fix Jemalloc Cache Memory Tracker (#37905)
pick #37464
2024-07-16 19:01:31 +08:00
Pxl
010d9d88f8 [Feature](rpc) support set brpc_idle_timeout_sec and enable thrift so… (#37808)
pick from #37333
2024-07-15 21:12:25 +08:00
a4d37d96ca [opt](file-scanner) add not found file number in profile (#37042) (#37764)
bp #37042
2024-07-15 17:11:06 +08:00
232202b71f [improve](load) reduce memory reserved in memtable limiter (#37511) (#37699)
cherry-pick #37511
2024-07-15 11:09:09 +08:00
2759383365 [branch-2.1](timezone) refactor tzdata load to accelerate and unify timezone parsing (#37062) (#37269)
pick https://github.com/apache/doris/pull/37062

1. revert https://github.com/apache/doris/pull/25097. we decide to rely
on OS. not maintain independent tzdata anymore to keep result
consistency
2. refactor timezone load. removed rwlock.

before:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (6.88 sec)
```
now:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (2.61 sec)
```
3. now don't support timezone offset format string like 'UTC+8', like we
already said in
https://doris.apache.org/docs/dev/query/query-variables/time-zone/#usage
4. support case-insensitive timezone parsing in nereids.
5. a bug when parse timezone using nereids. should check DST by input,
but wrongly by now before. now fixed.

doc pr: https://github.com/apache/doris-website/pull/810
2024-07-15 10:56:48 +08:00
747172237a [branch-2.1](memory) Pick some memory GC patch (#37725)
pick
#36768
#37164
#37174
#37525
2024-07-14 15:19:40 +08:00
cf2fb6945a [branch-2.1](memory) Refactor LRU cache policy memory tracking (#37658)
pick 
#36235
#35965
2024-07-11 21:04:01 +08:00
3337c1bbe3 [[enhancement](compaction) adjust compaction concurrency based on compaction score and workload (#37491)
adjust compaction concurrency based on compaction score and workload
#36672
fix null pointer when retrieving CPU load average #37171
2024-07-09 09:56:35 +08:00
494b54a5a5 [enhancement](trash) support skip trash, update trash default expire time (#37170) (#37409)
cherry-pick #37170
2024-07-08 15:33:02 +08:00
b272247a57 [pick]log thread num (#37258)
## Proposed changes

pick #37159
2024-07-04 15:27:52 +08:00
Pxl
70e1c563b3 [Chore](runtime-filter) enlarge sync filter size rpc timeout limit (#37103) (#37225)
pick from #37103
2024-07-03 21:02:26 +08:00
e25717458e [opt](catalog) add some profile for parquet reader and change meta cache config (#37040) (#37146)
bp #37040
2024-07-02 20:58:43 +08:00
f5572ac732 [pick]reset memtable flush thread num (#37092)
## Proposed changes

pick #37028
2024-07-02 19:20:17 +08:00
798d9d6fc6 [pick21][opt](mow) reduce memory usage for mow table compaction (#36865) (#36968)
cherry-pick https://github.com/apache/doris/pull/36865 to branch-2.1
2024-07-01 15:33:18 +08:00
07278e9dcb [improvement](segmentcache) limit segment cache by memory or segment … (#37035)
…num (#37026)

pick ##37026
2024-06-30 20:34:13 +08:00
22cb7b8fcb [improvement](compaction) be do not compact invisible version to avoid query error -230 #28082 (#36222)
cherry pick from #28082
2024-06-27 13:45:21 +08:00
a79b56ac23 [chore](be) Support config max message size for be thrift server (#36595)
Cherry-pick #36467
2024-06-20 20:15:43 +08:00
f59dc4fb37 [opt](split) generate and get split batch concurrently (#36044)
bp #36045, and turn on batch split, which is turn off in #36109
Generate and get split batch concurrently.
`SplitSource.getNextBatch` remove the synchronization, and make each get their splits concurrently, and `SplitAssignment` generates splits asynchronously.
2024-06-19 16:16:02 +08:00
c84b56140c [Fix](outfile) Add a configuration for exporting data in Parquet format using select into outfile (#36143)
backport: #36142
2024-06-13 11:49:46 +08:00
0b28420e1c [pick](Variant) make remote schema fetch rpc timeout configurable (#35296) (#36174) 2024-06-12 19:51:53 +08:00
d1eb917076 [fix](rpc) fix transfer large data and enable transfer_large_data_by_brpc by default #35770 (#36169)
cherry pick from #35770
2024-06-12 19:39:07 +08:00
fbc82e0253 [opt](log) refine the BE logger (#35942) (#35988)
bp #35942
2024-06-06 22:25:22 +08:00
c2b830e1e7 Pick "[Fix](Tablet) Fix the issue of redundant loading of stale rowset (#35768)" (#35882) 2024-06-05 07:55:04 +08:00
e755d64e62 [feature](be jvm monitor)append enable_jvm_monitor in be.conf to control jvm monitor. (#35608) (#35764)
bp #35608

Co-authored-by: daidai <2017501503@qq.com>
2024-06-02 00:18:44 +08:00
b91d2caab8 [Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587)
backport #34929
2024-05-29 16:40:54 +08:00
5c40e87667 [opt](s3) auto retry when meeting 429 error (#35397)
- Add 2 new BE config

	- `s3_read_base_wait_time_ms` and `s3_read_max_wait_time_ms`

		When meet s3 429 error, the "get" request will
		sleep `s3_read_base_wait_time_ms (*1, *2, *3, *4)` ms get try again.
		The max sleep time is s3_read_max_wait_time_ms
		and the max retry time is max_s3_client_retry
		
- Add more metrics for s3 file reader

	- `s3_file_reader_too_many_request`: counter of 429 error.
	- `s3_file_reader_s3_get_request`: the QPS of s3 get request.

	- `TotalGetRequest`: Get request counter in profile
	- `TooManyRequestErr`: 429 error counter in profile
	- `TooManyRequestSleepTime`: Sum of sleep time after 429 error in profile
	- `TotalBytesRead`: Total bytes read from s3 in profile
2024-05-28 23:00:31 +08:00
eefeb4d80c [fix](spill) fix wrong disk usage of spill (#35423)
## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->

## Further comments

If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
2024-05-28 18:53:55 +08:00
b6eaf95720 [fix](memory) Fix BE memory info compatible with Cgroup (#35412) (#35425)
1. `memory.usage_in_bytes ~= free.used + free.(buff/cache) - (buff)`, free cache can be reused,
   so, modify cgroup_memory_usage = memory.usage_in_bytes - memory.meminfo["Cached"].
2. If system not configured with cgroup, find cgroup file path will failed, refactor refresh cgroup memory info, compatible with find failed.
2024-05-27 12:31:44 +08:00
c6c90ff63e [chore](routine-load) make routine_load_consumer_pool_size can update using HTTP API (#35315) 2024-05-25 17:46:29 +08:00
a6f7747d29 [feature](datatype) add BE config to allow zero date (#34961)
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2024-05-23 19:12:39 +08:00
98f8eb5c43 [opt](split) get file splits in batch mode (#34032) (#35107)
bp  #34032
2024-05-21 22:27:07 +08:00
5019aa03e9 [enhancement](be-meta) disable sync rocksdb by default for better performance (#32714) (#35122) 2024-05-21 15:30:49 +08:00