Commit Graph

1579 Commits

Author SHA1 Message Date
e0ec2da29b [fix](routine-load) fix get kafka offset timeout may too long (#33502) 2024-04-17 23:42:12 +08:00
Pxl
e85a2c8866 [Chore](status) change unknow filter error to internal error (#33633) 2024-04-17 23:42:12 +08:00
8ee8de7857 [Fix](executor)reset remote scan thread num #33579 2024-04-17 23:42:11 +08:00
01f333086d [pipelineX](fix) Fix data pooling judgement for bucket join (#33533) 2024-04-17 23:42:00 +08:00
b2b385a4ff [improve](fold) support complex type for constant folding (#32867) 2024-04-17 23:41:59 +08:00
fefbde8927 [log](move-memtable) improve logs in vtablet_writer_v2 and load_stream (#33103) 2024-04-12 15:09:25 +08:00
a4924dabb7 [enhancement](exception) enble exception logic in pipeline execute thread (#33437)
* [enhancement](exception) enble exception logic in pipeline execute thread

* f

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-04-12 15:09:25 +08:00
Pxl
5f30463bb3 [Chore](descriptors) remove unused codes for descriptors (#33408)
remove unused codes for descriptors
2024-04-12 15:09:25 +08:00
e841d82ffb [Enhancement](hive-writer) Adjust table sink exchange rebalancer params. (#33397)
Issue Number:  #31442

Change table sink exchange rebalancer params to node level and adjust these params to improve write performance by better balance.

rebalancer params:
```
DEFINE_mInt64(table_sink_partition_write_min_data_processed_rebalance_threshold,
              "26214400"); // 25MB
// Minimum partition data processed to rebalance writers in exchange when partition writing
DEFINE_mInt64(table_sink_partition_write_min_partition_data_processed_rebalance_threshold,
              "15728640"); // 15MB
```
2024-04-12 13:09:56 +08:00
3d66723214 [branch-2.1](auto-partition) pick auto partition and some more prs (#33523) 2024-04-11 17:12:17 +08:00
Pxl
5688c28364 [Bug](runtime-filter) try to fix heap use after free on runtime filter send filter size (#33465) (#33522) 2024-04-11 13:10:24 +08:00
Pxl
3081fc584d [Improvement](runtime-filter) support sync join node build side's size to init bloom runtime filter (#32180)
support sync join node build side's size to init bloom runtime filter
2024-04-11 09:31:50 +08:00
6bef95eb4f [fix](memory) Fix memory tracker destructor deadlock (#33497) 2024-04-10 22:46:53 +08:00
f8d1fa2be3 [chore](multi-table-load) add context info in log when using single-stream-multi-table load (#33317) 2024-04-10 16:03:05 +08:00
2b1ab89b5b [fix](memory) Fix memory log compile by ASAN (#33162)
ASAN compiles BE, add markers in memory logs
2024-04-10 15:26:09 +08:00
Pxl
8fd6d4c41b [Chore](build) add -Wconversion and remove some unused code (#33127)
add -Wconversion and remove some unused code
2024-04-10 15:26:08 +08:00
cc363f26c2 [fix](Nereids) fix group concat (#33091)
Fix failed in regression_test/suites/query_p0/group_concat/test_group_concat.groovy

select
group_concat( distinct b1, '?'), group_concat( distinct b3, '?')
from
table_group_concat
group by
b2

exception:

lowestCostPlans with physicalProperties(GATHER) doesn't exist in root group

The root cause is '?' is push down to slot by NormalizeAggregate, AggregateStrategies treat the slot as a distinct parameter and generate a invalid PhysicalHashAggregate, and then reject by ChildOutputPropertyDeriver.

I fix this bug by avoid push down literal to slot in NormalizeAggregate, and forbidden generate stream aggregate node when group by slots is empty
2024-04-10 14:59:46 +08:00
517c12478f [improvement](spill) spill trigger improvement (#32641) 2024-04-10 14:52:46 +08:00
cf7595d423 [opt](memory) Optimize mem tracker accuracy (#32039) (#33140) 2024-04-10 11:42:19 +08:00
28e2d89ce3 [Improve](inverted_index) update clucene and improve array inverted index writer (#32436) 2024-04-10 11:34:29 +08:00
005f7af21f [bugfix](deadlock) should not use query cancelled in fragment mgr 2024-04-09 16:09:01 +08:00
bfc9260507 [bugfix](deadlock) avoid deadlock in memtracker cancel query (#33400)
get_query_ctx(hold query ctx map lock) ---> QueryCtx ---> runtime statistics mgr --->

runtime statistics mgr ---> allocate block memory ---> cancel query

memtracker will try to cancel query when memory is not available during allocator.
BUT the allocator is a foundermental API, if it call the upper API it may deadlock.
Should not call any API during allocator.
2024-04-09 12:20:54 +08:00
a8232c67f9 [pipelineX](runtime filter) Fix task timeout caused by runtime filter (#33332) (#33369) 2024-04-08 16:30:32 +08:00
d60d804d9c [fix](memory) Fix task repeat attach task DCHECK failed #32784 (#33343)
[branch-2.1](memory) Fix CCR task repeat attach task DCHECK failed3 #33366
2024-04-08 16:15:04 +08:00
466972926e [fix](dns-cache) do not detach the refresh thread (#33182) 2024-04-07 22:18:56 +08:00
c758a25dd8 [opt](fqdn) Add DNS Cache for FE and BE (#32869)
In previously, when enabling FQDN, Doris will call dns resolver to get IP from hostname
each time when 1) FE gets BE's grpc client. 2) BE gets other BE's brpc client.
So when in high concurrency case, the dns resolver be overloaded and failed to resolve hostname.

This PR mainly changes:

1. Add DNSCache for both FE and BE.
    The DNSCache will run on every FE and BE node. It has a cache, key is hostname and value is IP.
    Caller can get IP by hostname from this cache, and if hostname does not exist, it will try to resolve it
    and update the cache.
    In addition, DNSCache has a daemon thread to refresh the cache every 1 min, in case that the IP may
    be changed at anytime.

There are other implements of this dns cache:

1.  36fed13997
    This is for BE side, but it does not handle the IP change case.

3. https://github.com/apache/doris/pull/28479
    This is for FE side, but it can only work with Master FE. Other FE node will not be aware of the IP change.
    And there are a bunch of BackendServiceProxy, this PR only handle cache in one of them.
2024-04-07 22:16:04 +08:00
77349ca71a [pipelineX](fix) Fix coredump by incorrect cancel order (#33294) 2024-04-07 12:06:12 +08:00
950ca68fac [fix](move-memtable) fix timeout to get tablet schema (#33256) (#33260) 2024-04-04 21:45:55 +08:00
7675383c40 [bugfix](deadlock) fix dead lock in cancel fragment (#33181)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-04-03 13:41:24 +08:00
0122b8a6b4 [Update](inverted index) add config for inverted index query cache shards (#32666) 2024-03-26 20:27:33 +08:00
7b94cfdba1 Revert "[Fix](tests) add regression tests for trino-connector (#32552)"
This reverts commit 3fc3a4650681cb519405730899a2f22f268b38c1.
2024-03-25 22:38:21 +08:00
3fc3a46506 [Fix](tests) add regression tests for trino-connector (#32552) 2024-03-25 22:31:55 +08:00
62c7d0a421 [Fix](point query) add query options for short circuit queries (#32530) (#32684)
Some options like `be_exec_version` needed for functions
2024-03-22 18:03:18 +08:00
326a264fcd [Improvement](executor)Add spill property for workload group #32554 2024-03-22 16:38:19 +08:00
e41311d77d [bug](fold) fix fold constant core dump with variant type (#32265)
1. variant type core dump at call get_data_at function, as not impl this function.
2. some case can't pass at old planner and fold_constant_by_be = on.
3. open enable_fold_constant_by_be = true.
2024-03-22 16:37:33 +08:00
6b54171778 [bugfix](deadlock) pipelinex map lock should only scope in map not about pipelinectx's cancel method (#32622)
both global lock in fragment mgr should only protect the map logic, could not use it to protect cancel method.
fragment ctx cancel method should be protected by a lock.
query ctx cancel --> pipelinex fragment cancel ---> query ctx cancel will dead lock.
2024-03-22 08:52:38 +08:00
a40463617e [feature](cpu cores) get the cores when running within a cgroup. (#32370)
get the cores when running within a cgroup
2024-03-21 14:07:49 +08:00
b6a35d68b0 [code](Refactor) Del unless filter id in runtime filter func (#32502)
Del unless filter id in runtime filter func
2024-03-21 14:07:49 +08:00
50c247e08c [fix](snapshot-loader) Fix be crash caused by deref end() iterator (#32489)
The standard said that the input parameter `pos` of std::vector::erase
must be valid and dereferenceable, the `end()` iterator cannot be used
as a value of `pos`. I did some tests and the crash only occurs when the
vector is empty. Fortunately `local_files` is usually not empty.
2024-03-21 14:07:24 +08:00
2196c534e8 [fix](group commit) Fix compatibility issues on serializing and deserializing wal file (#32299) 2024-03-21 14:07:24 +08:00
ab512f935c [pipelineX](api) Add api for long-running tasks (#32459) 2024-03-21 14:07:24 +08:00
4bf5a21ba3 [pipelineX](cancel) Remove lock for mapping query ctx to fragment (#32346) 2024-03-21 14:07:23 +08:00
e99b33c274 [opt](file-meta-cache) reduce file meta cache size and disable cache for some cases (#32340)
File meta cache on BE is used to cache the meta for external table's file such as parquet footer.
This cache is counted by number, not memory consumption.
So if the cache object is big(eg, a large parquet footer), the total memory consumption of this cache
will be large and causing OOM.

This PR mainly changes:

1. Add a new method `exceed_prune_limit()` for `CachePolicy`
    For `ObjLRUCache`, it always return true so that the minor of full gc on BE will prune the cache each time.

2. Reduce the default capability of file meta cache, from 20000 to 1000

    Also change the default capability of hdfs file handle cache, from 20000 to 1000

4. Change judgement of whether enable file meta cache when querying

    If the number of file need to be read is larger than the 1/3 of the file meta cache's capability, file meta cache
    will be disabled for this query. Because cache is useless if there are too many files.
2024-03-21 14:07:22 +08:00
9eb2f90e27 [Optimize](inverted index) optimize inverted index bitmap copy (#32279) (#32469) 2024-03-19 17:28:59 +08:00
ef2151ae66 [Feature-WIP](multi-catalog) Add Hive sink on BE side. (#32306) (#32364)
bp #32306
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-03-18 11:23:01 +08:00
Pxl
5e4da61df9 [Bug](top-n) do not get runtime predicate when predicate not initialized (#32208) 2024-03-15 18:06:15 +08:00
c8f3643890 [exec](runtimefilter) support null aware in runtime filter (#32152)
null aware in runtime filter
2024-03-15 18:05:13 +08:00
62023d705d [refactor](rename) rename task group to workload group in be (#32204)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-03-15 18:04:02 +08:00
56a14c912a [fix](routineload) fix consume data too slow in partial partitions (#32126) 2024-03-15 18:01:22 +08:00
7b74b199a5 [fix](memory) Fix LRU cache deleter and memory tracking (#32080)
In order to add common code to the value deleter of LRU cache, let all lru cache values inherit from LRUCacheValueBase class and tracking memory in destructor.
2024-03-15 17:57:58 +08:00