We use the time wheel algorithm to schedule and trigger periodic tasks. The implementation is modeled on Netty's HashedWheelTimer.
Periodically (every 10 minutes by default), we put the events that need to fire in the upcoming cycle into the time wheel for scheduling. To keep triggering efficient and to prevent a blocked task from delaying subsequent scheduling, we use Disruptor to implement a producer-consumer model.
When a task expires and needs to be triggered, it is published to the Disruptor's RingBuffer, and a consumer thread then consumes it.
Consumers register for events, and event registration requires an event executor. An event executor is a functional interface with a single method that executes the event.
If it is a one-shot event, the event definition is deleted after scheduling completes; if it is a periodic event, it is put back into the time wheel according to its schedule once the current run completes.
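Below is a minimal, self-contained sketch of this flow using Netty's HashedWheelTimer and the LMAX Disruptor; the names TimerJobEvent, TimerJobExecutor, and jobId are placeholders for illustration, not the actual Doris classes.
```
// Minimal sketch of the time-wheel + Disruptor flow described above.
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;
import io.netty.util.HashedWheelTimer;

import java.util.concurrent.TimeUnit;

public class TimerSchedulerSketch {

    // Event executor: a functional interface with a single method that executes the event.
    @FunctionalInterface
    interface TimerJobExecutor {
        void execute(long jobId);
    }

    // Event object carried through the Disruptor RingBuffer.
    static class TimerJobEvent {
        long jobId;
        TimerJobExecutor executor;
    }

    public static void main(String[] args) throws Exception {
        // Consumer side: the registered handler executes expired tasks off the RingBuffer,
        // so a slow task never blocks the time wheel's tick thread.
        Disruptor<TimerJobEvent> disruptor = new Disruptor<>(
                TimerJobEvent::new, 1024, DaemonThreadFactory.INSTANCE);
        disruptor.handleEventsWith((event, sequence, endOfBatch) ->
                event.executor.execute(event.jobId));
        RingBuffer<TimerJobEvent> ringBuffer = disruptor.start();

        // Producer side: when the time wheel fires, the task is only published to the
        // RingBuffer; the actual execution happens on the consumer thread.
        HashedWheelTimer timer = new HashedWheelTimer(100, TimeUnit.MILLISECONDS, 512);
        TimerJobExecutor executor = jobId -> System.out.println("run job " + jobId);
        timer.newTimeout(timeout -> ringBuffer.publishEvent((event, seq) -> {
            event.jobId = 1L;
            event.executor = executor;
        }), 5, TimeUnit.SECONDS);

        Thread.sleep(6_000);
        timer.stop();
        disruptor.shutdown();
    }
}
```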
When compiling FunctionArrayEnumerateUniq::_execute_by_hash, AllocatorWithStackMemory::free(buf)
is called when the HashMapContainer is deleted. The GCC compiler concludes that size > N and that buf is not heap memory,
and reports the error "'void free(void*)' called on unallocated object 'hash_map'".
This only fails on doris docker + gcc 11.1; there is no problem on doris docker + clang 16.0.1,
nor on ldb_toolchain gcc 11.1 and clang 16.0.1.
Infer DISTINCT from a distinct SetOperator, and place the distinct above each child to reduce the data volume; for example, since `SELECT a FROM t1 UNION SELECT a FROM t2` deduplicates its result, each child can be deduplicated before the union.
tpcds_sf100 q14:
before: 100 rows in set (7.60 sec)
after: 100 rows in set (6.80 sec)
Add whether the Nereids planner and the pipeline engine are used to the query profile, for example:
Summary:
- Profile ID: 460e710601674438-9df2d685bdfc20f8
- Task Type: QUERY
...
- Is Nereids: Yes
- Is Pipeline: Yes
- Is Cached: No
This file is used when compiling Doris in the regression pipeline, and we can modify it to control the compile behavior.
I added BUILD_FS_BENCHMARK=ON so that fs_benchmark_tool will be built.
1. Fix a bug where a field of s3_file_write_bufferpool is not initialized, causing undefined behavior.
2. Add the fs_s3 benchmark tool; see https://github.com/apache/doris/pull/20770 for tool usage. Also improve the output:
`sh bin/run-fs-benchmark.sh --conf=conf/s3.conf --fs_type=s3 --operation=single_read --threads=1 --iterations=1`
```
------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 7366 ms 123 ms 1 ReadRate(B/S)=12.1823M/s ReadTime(S)=7.36572 ReadTotal(B)=89.7314M
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 6163 ms 116 ms 1 ReadRate(B/S)=14.5597M/s ReadTime(S)=6.16299 ReadTotal(B)=89.7314M
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 6048 ms 110 ms 1 ReadRate(B/S)=14.8366M/s ReadTime(S)=6.04796 ReadTotal(B)=89.7314M
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_mean 6526 ms 116 ms 3 ReadRate(B/S)=13.8596M/s ReadTime(S)=6.52556 ReadTotal(B)=89.7314M
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_median 6163 ms 116 ms 3 ReadRate(B/S)=14.5597M/s ReadTime(S)=6.16299 ReadTotal(B)=89.7314M
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_stddev 730 ms 6.68 ms 3 ReadRate(B/S)=1.45914M/s ReadTime(S)=0.729876 ReadTotal(B)=0
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_cv 11.18 % 5.75 % 3 ReadRate(B/S)=10.53% ReadTime(S)=11.18% ReadTotal(B)=0.00%
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_max 7366 ms 123 ms 3 ReadRate(B/S)=14.8366M/s ReadTime(S)=7.36572 ReadTotal(B)=89.7314M
S3ReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_min 6048 ms 110 ms 3 ReadRate(B/S)=12.1823M/s ReadTime(S)=6.04796 ReadTotal(B)=89.7314M
```
* Workaround: when ingesting a binlog after backup/restore, local_tablet.partition_id is not correct, so use req.partition_id instead.
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
Agg stats estimation should use the largest group-by key's NDV as the base and multiply it by an expansion factor calculated from the other group-by keys' NDVs.
Before, we used the smallest NDV as the base.
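As a rough formalization of the new estimate (the exact expansion function is not spelled out here, so it is left abstract as $E$): for group-by keys $k_1, \dots, k_n$ with $i^{*} = \arg\max_i \mathrm{NDV}(k_i)$,

$$\mathrm{NDV}_{\mathrm{agg}} \approx \mathrm{NDV}(k_{i^{*}}) \times \prod_{j \ne i^{*}} E\big(\mathrm{NDV}(k_j)\big), \qquad E(\cdot) \ge 1,$$

whereas the previous estimate used $\min_i \mathrm{NDV}(k_i)$ as the base.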
Support binding an external relation outside the Doris FE environment, for example to analyze SQL in another Java application.
See BindRelationTest.bindExternalRelation.
This PR adds the function for collecting Hive statistics. When the CBO fetches Hive table statistics, the statistics cache will
first load from the internal stats OLAP table. If not found, it then uses this PR's function to fetch the statistics from the remote Hive metastore.
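A minimal sketch of that two-level lookup, using a Guava LoadingCache; loadFromInternalStatsTable and fetchFromHiveMetastore are hypothetical helpers standing in for the real implementations.
```
// Sketch of the two-level statistics lookup: internal stats table first, HMS as fallback.
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.Optional;
import java.util.concurrent.TimeUnit;

public class HiveStatsCacheSketch {

    // Placeholder for a column-statistics value object.
    static class ColumnStatistic {
        final double ndv;
        ColumnStatistic(double ndv) { this.ndv = ndv; }
    }

    private final LoadingCache<String, Optional<ColumnStatistic>> cache =
            CacheBuilder.newBuilder()
                    .expireAfterWrite(10, TimeUnit.MINUTES)
                    .build(new CacheLoader<String, Optional<ColumnStatistic>>() {
                        @Override
                        public Optional<ColumnStatistic> load(String key) {
                            // 1. Prefer statistics persisted in the internal stats OLAP table.
                            Optional<ColumnStatistic> fromOlap = loadFromInternalStatsTable(key);
                            if (fromOlap.isPresent()) {
                                return fromOlap;
                            }
                            // 2. Fall back to the remote Hive metastore.
                            return fetchFromHiveMetastore(key);
                        }
                    });

    Optional<ColumnStatistic> getColumnStatistic(String key) {
        // Callers always go through the cache; the loader decides where the value comes from.
        return cache.getUnchecked(key);
    }

    Optional<ColumnStatistic> loadFromInternalStatsTable(String key) {
        return Optional.empty(); // hypothetical: query the internal stats OLAP table
    }

    Optional<ColumnStatistic> fetchFromHiveMetastore(String key) {
        return Optional.of(new ColumnStatistic(100)); // hypothetical: call the remote HMS
    }
}
```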
Keep hadoop-aliyun version consistent with hadoop main version (3.3.5)
upgrade jackson to 2.14.3
upgrade netty version to 4.1.94.Final
pin checkerframework version to 3.32.0
upgrade snappy-java to 1.1.10.1
upgrade hudi version to 0.13.1
upgrade spring version to 2.7.13
upgrade orc version to 1.8.4
revert nonsensical changes
In PR #21168, we refactored physical properties and the translator
to ensure no useless exchange is generated. An olap scan node
could be gather-distributed in Nereids but translated to hash partitioned.
Since the coordinator could not process a gather olap scan node,
we remove the candidate distribution spec of olap scan.
When creating a new Hive catalog or refreshing an existing one, the HiveMetaStore cache is refreshed,
which calls "FileInputFormat.setInputPaths()".
This method creates a new FileSystem instance and stores it in FileSystem's static cache.
So if the catalog is refreshed frequently, too many FileSystem instances accumulate in the cache, causing OOM.
This PR disables the FileSystem cache.
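For reference, a minimal illustration of the Hadoop FileSystem cache behavior; `fs.<scheme>.impl.disable.cache` is the standard Hadoop knob for this (shown here for HDFS with a placeholder URI), not necessarily the exact way this PR disables the cache.
```
// Demonstrates bypassing the static FileSystem cache via the standard Hadoop property.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class FsCacheDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // With the cache enabled (default), FileSystem.get() keeps one instance per
        // (scheme, authority, ugi) key alive in a static map, so repeated catalog
        // refreshes keep adding instances and can eventually cause OOM.
        // Disabling the per-scheme cache avoids that accumulation.
        conf.setBoolean("fs.hdfs.impl.disable.cache", true);

        // Placeholder URI; point this at a real cluster to run it.
        FileSystem fs = FileSystem.get(new URI("hdfs://nameservice1/"), conf);
        try {
            fs.exists(new Path("/tmp"));
        } finally {
            // Uncached instances must be closed explicitly by the caller.
            fs.close();
        }
    }
}
```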
Try to reuse an existing UGI in DFSFileSystem; otherwise, querying an HMS table with more than ten thousand partitions performs more than ten thousand login operations, and each login costs hundreds of milliseconds in my tests (see the sketch below).
Co-authored-by: 王翔宇 <wangxiangyu@360shuke.com>
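A minimal sketch of the reuse pattern referenced above; the cache key and helper name are placeholders, not the actual DFSFileSystem code.
```
// Reuse one logged-in UGI per (principal, keytab) instead of logging in on every call.
import org.apache.hadoop.security.UserGroupInformation;

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UgiCacheSketch {
    // Kerberos login costs hundreds of ms, so logging in once and reusing the UGI avoids
    // tens of thousands of logins when a table has tens of thousands of partitions.
    private static final Map<String, UserGroupInformation> UGI_CACHE = new ConcurrentHashMap<>();

    static UserGroupInformation getOrLogin(String principal, String keytab) {
        return UGI_CACHE.computeIfAbsent(principal + "|" + keytab, key -> {
            try {
                return UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab);
            } catch (IOException e) {
                throw new RuntimeException("kerberos login failed", e);
            }
        });
    }
}
```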
testEliminatingSortNode needs to check whether a SortNode exists in the plan tree, so it should check plan1.contains("order by:") rather than plan1.contains("SORT INFO:") or plan1.contains("SORT LIMIT:").