Commit Graph

72 Commits

Author SHA1 Message Date
bd8113f424 [bugfix](scannerscheduler) should minus num_of_scanners before check should schedule #28926 (#29331)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-03 20:47:35 +08:00
2ed122b787 [improvement](task exec context) add parent class HasTaskExecutionCtx to own the task ctx (#29388)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-02 15:28:27 +08:00
c75e63a2a5 [Improvement](scan) Use scanner to do projection of scan node (#29124) 2023-12-27 16:00:52 +08:00
f30e50676e [opt](scanner) optimize the number of threads of scanners (#28640)
1. Remove `doris_max_remote_scanner_thread_pool_thread_num`, use `doris_scanner_thread_pool_thread_num` only.
2. Set the default value `doris_scanner_thread_pool_thread_num` as `std::max(48, CpuInfo::num_cores() * 4)`
2023-12-26 10:24:12 +08:00
7081139bdc [fix](block) fix be core while mutable block merge may cause different row size between columns in origin block (#27943) 2023-12-25 20:35:22 +08:00
1545c36d16 Revert "[bugfix](scannercore) scanner will core in deconstructor during collect profile (#28727)" (#28931)
This reverts commit 4066de375efe6ff8e156a61df4f9316b3d9eaa4e.
2023-12-24 20:37:33 +08:00
4066de375e [bugfix](scannercore) scanner will core in deconstructor during collect profile (#28727) 2023-12-23 11:09:46 +08:00
aca8406e31 [refactor](executor)remove scan group #28847 2023-12-22 17:05:50 +08:00
73f7b61019 [refactor](scanner) use weak ptr to lock task execution context to avoid core in scanner dctor (#28493)
using weak ptr as a lock between fragment execute thread and scanner thread, to solve the core problem in scanner's dctor to access scannode's profile.
2023-12-18 14:09:32 +08:00
8df94f0d07 [fix](remote-scanner-pool) missing _remote_thread_pool_max_size value (#28057) 2023-12-07 11:18:42 +08:00
54fe1a166b [Refactor](scan) refactor scan scheduler to improve performance (#27948)
* [Refactor](scan) refactor scan scheduler to improve performance

* fix pipeline x core
2023-12-05 13:03:16 +08:00
7398c3daf1 [Feature-Variant](Variant Type) support variant type query and index (#27676) 2023-11-29 10:37:28 +08:00
b457856bd2 [chore](be) remove bthread scanner related codes (#27417) 2023-11-23 15:18:49 +08:00
0491437a86 [Opt](scanner-scheduler) Optimize BlockingQueue, BlockingPriorityQueue and change remote scan thread pool. (#26784)
## Proposed changes
- Optimize `BlockingQueue`, `BlockingPriorityQueue` by swapping `notify` and `unlock` to reduce lock competition. Ref: https://www.boost.org/doc/libs/1_54_0/boost/thread/sync_bounded_queue.hpp
- Change remote scan thread pool to `PriorityQueue`.

### Test result
Before:
```
mysql> select  sum(lo_partkey)  from  lineorder;
+-----------------+
| sum(lo_partkey) |
+-----------------+
| 300021444265405 |
+-----------------+
1 row in set (1.11 sec)
```

After:
```
mysql> select  sum(lo_partkey)  from  lineorder;
+-----------------+
| sum(lo_partkey) |
+-----------------+
| 300021444265405 |
+-----------------+
1 row in set (0.80 sec)
```
2023-11-15 18:24:36 +08:00
5ad49dceaa [fix](scanner_schedule) scanner hangs due to negative num_running_scanners (#26816)
* [fix] scanner hangs due to negative num_running_scanners

Before the patch, num_running_scanners is increased after submitting,
then it may be decreased before increasing then negative values can
be seen by get_block_from_queue and a expected submit does not happend.

Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
2023-11-13 23:03:49 +08:00
66054a5c78 [opt](scanner) increase the connection num of s3 client (#26795) 2023-11-12 00:29:11 -06:00
196fadc044 [enhancement](metrics) enhance visibility of flush thread pool (#26544) 2023-11-11 19:53:24 +08:00
a5565f68b2 [Refactor](opentelemetry) Remove opentelemetry (#26605) 2023-11-09 18:05:34 +08:00
a4e415ab09 [feature](hive)Support hive tables after alter type. (#25138)
1.Reconstruct the logic of decode to read parquet. The parquet  reader first reads the data according to the parquet physical type, and then performs a type conversion.

2.Support hive alter table.
2023-11-02 00:24:21 +08:00
46d40b1952 [refactor](executor)Remove empty group logic #26005 2023-10-27 14:24:41 +08:00
54780c62e0 [improvement](executor)Using cgroup to implement cpu hard limit (#25489)
* Using cgroup to implement cpu hard limit

* code style
2023-10-19 18:56:26 +08:00
7b2ff38401 query cpu hard limit based on doris scheduler (#24844) 2023-10-07 12:03:07 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
39e6512a21 [bug](scanner) Fix memory out of bound in scanner scheduler (#24840) 2023-09-25 09:58:26 +08:00
c9b2f4cb92 [workload](pipeline) Add cgroup cpu controller (#24052) 2023-09-21 21:49:33 +08:00
1405b7ca82 [improve](scan) support lower the thread priority of scan thread (#24526)
The configuration item is used to lower the priority of the scanner thread,
typically employed to ensure CPU scheduling for write operations.
2023-09-20 17:00:24 +08:00
71dcb58db9 [improvement](scanner_schedule) reduce memory consumption of scanner (#24199)
* [improvement](scanner_schedule) reduce memory consumption of scanner

1. limit scanner by memory consumptin rather than blocks.
2. scheduler run correcty instread of at lest 1.
2023-09-19 21:36:23 +08:00
Pxl
35c5d71549 [Improvement](join) some improvement of hash join (#23972)
some improvement of hash join
2023-09-14 17:55:35 +08:00
c7ae2a7d22 [Refactor & Bugfix](static variables) move some static vairables to exec_env (#24029) 2023-09-13 09:27:03 +08:00
3317909141 [pipelineX](join) support nested loop join operator (#23756) 2023-09-04 10:08:22 +08:00
962221cb18 [test](log) add log for debug case failure (#23506) 2023-08-28 10:45:25 +08:00
Pxl
cf1865a1c8 [Bug](scan) fix core dump due to store_path_map (#23084)
fix core dump due to store_path_map
2023-08-17 15:24:43 +08:00
Pxl
7839a0e708 [Bug](brpc) fix brpc failed on big query came concurrently (#22600)
fix PriorityThreadPool get_info get wrong number
change brpc pool from priority to fifo
do not use brpc pool when send eos
2023-08-05 21:24:32 +08:00
bc87002028 [opt](conf) remote scanner thread num is changed to core num * 10 (#22427) 2023-08-01 23:09:49 +08:00
ae8a26335c [opt](hive)opt select count(*) stmt push down agg on parquet in hive . (#22115)
Optimization "select count(*) from table" stmtement , push down "count" type to BE.
support file type : parquet ,orc in hive .

1. 4kfiles , 60kwline num 
    before:  1 min 37.70 sec 
    after:   50.18 sec

2. 50files , 60kwline num
    before: 1.12 sec
    after: 0.82 sec
2023-07-29 00:31:01 +08:00
Pxl
4171309b9b [Bug](scanner) fix core dump due to release ScannerContext too early #21946 2023-07-19 00:53:23 +08:00
Pxl
b3d3ffa2de [Bug](pipeline) adjust scanner scheduler.submit and _num_scheduling_ctx maintain (#21843)
adjust scanner scheduler.submit and _num_scheduling_ctx maintain
2023-07-18 11:55:21 +08:00
d3317aa33b [Fix](executor)Fix scan entity core #21696
After the last time to call scan_task.scan_func(),the should be ended, this means PipelineFragmentContext could be released.
Then after PipelineFragmentContext is released, visiting its field such as query_ctx or _state may cause core dump.
But it can only explain core 2

void ScannerScheduler::_task_group_scanner_scan(ScannerScheduler* scheduler,
                                                taskgroup::ScanTaskTaskGroupQueue* scan_queue) {
    while (!_is_closed) {
        taskgroup::ScanTask scan_task;
        auto success = scan_queue->take(&scan_task);
        if (success) {
            int64_t time_spent = 0;
            {
                SCOPED_RAW_TIMER(&time_spent);
                scan_task.scan_func();
            }
            scan_queue->update_statistics(scan_task, time_spent);
        }
    }
}
2023-07-11 15:56:13 +08:00
009b300abd [Fix](ScannerScheduler) fix dead lock when shutdown group_local_scan_thread_pool (#21553) 2023-07-06 13:09:37 +08:00
76bdcf1d26 [improvement](pipeline) task group scan entity (#19924) 2023-06-25 14:43:35 +08:00
81abdeffbc [Improvement](pipeline) Improve shared scan performance (#20785) 2023-06-21 14:36:05 +08:00
0c98355fff [fix](catalog) fix create catalog with resource replay issue and kerberos auth issue (#20137)
1. Fix create catalog with resource replay bug.
	If user create catalog using `create catalog hive with resource xxx`, when replaying edit log,
	there is a bug that resource may be dropped, causing NPE and FE will fail to start.

	In this PR, I add a new FE config `disallow_create_catalog_with_resource`, default is true.
	So that `with resource` will not be allowed, and it will be deprecated later.

	And also fix the replay bug to avoid NPE.

2. Fix issue when creating 2 hive catalogs to connect with and without kerberos authentication.

	When user create 2 hive catalogs, one use simple auth, the other use kerberos auth.
	The query may fail with error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.`

	So I add a default property for hive catalog: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`.
	Which means this property will be added automatically when user creating hive catalog, to avoid such problem.

3. Fix calling `hdfsExists()` issue

	When calling `hdfsExists()` with non-zero return code, should check if it encounters error or is file not found.

3. Some code refactor

	Avoid import `org.apache.parquet.Strings`
2023-05-30 16:57:39 +08:00
1d421a26d9 [bugfix](memory) merge block may allocate failed (#19507) 2023-05-11 10:42:47 +08:00
cf8ceb8586 [fix](scan) fix scanner mem tracker (#19354) 2023-05-10 09:56:41 +08:00
a262f42a28 [refactor](exceptionsafe) make scanner and scancontext exception safe (#19057) 2023-04-27 09:23:01 +08:00
b2c26e17e1 [Compile](vec) Fix compile by BHREAD_SCANNER (#18979) 2023-04-24 17:07:06 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
fd5dd9a391 [Opt](Pipeline) opt pipeline code in mult tablet (#17999) 2023-03-27 10:02:48 +08:00
9b7596f1c6 [Feature](Dynamic schema table) step1 support schema change expression (#17494)
1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns
2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility
2023-03-13 15:12:42 +08:00
c0bb2e33a8 [improvement](scan) separate scanner into local and remote scanner pool (#16891)
There are 2 kinds for scanner thread pool, local and remote.
Local is for local file read, specially for olap scanner.
Remote is for other external data source, such as file scanner, jdbc scanner.

This PR mainly changes:

For olap scanner, use cold or hot rowset to decide whether to use local or remote pool.
For other scanner, user remote pool by default.
Add a new BE config doris_max_remote_scanner_thread_pool_thread_num, default is 512,
indicate the max thread number of the remote scanner thread pool

This will alleviate the problem of interaction between olap queries with load job and external queries.
2023-02-21 14:13:09 +08:00