The current distribution model for Doris is as follows:
OlapTableSink separates the original Block into several sub-blocks per node (BE) according to the tablet distribution and sends the sub-blocks to the storage engines of the backends; each storage engine then splits its sub-block across multiple tablet channels, and each DeltaWriter handles part of the block.
This model splits blocks by tablet, and the splitting itself is a relatively heavy operation. After splitting, the blocks are distributed to different DeltaWriters (memtables) through RPCs to TabletChannels, and the distribution on the TabletChannels is also relatively heavy. If the table uses RANDOM distribution, we have the opportunity to distribute each block as a complete block. The advantage is reduced memory copying and better write locality, similar to appending the entire block to the memtable.
This optimization can save 10% ~ 20% of the CPU cost of loading a RANDOM-distribution table when load_to_single_tablet is enabled.
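A minimal sketch of the idea, with toy stand-ins for Block and Memtable (this is not the actual OlapTableSink code):

```cpp
#include <cstdint>
#include <vector>

// Toy stand-ins for illustration only.
struct Block { std::vector<int64_t> rows; };
struct Memtable { std::vector<int64_t> rows; };

// Split path: hash every row to a tablet and copy it into that tablet's
// memtable -- one copy per row, poor locality.
void split_and_distribute(const Block& block, std::vector<Memtable>& tablets) {
    for (int64_t row : block.rows) {
        size_t tablet = static_cast<size_t>(row) % tablets.size();
        tablets[tablet].rows.push_back(row);
    }
}

// Fast path: RANDOM distribution + load_to_single_tablet allows appending the
// whole block to a single memtable, skipping the per-row split.
void append_whole_block(const Block& block, Memtable& tablet) {
    tablet.rows.insert(tablet.rows.end(), block.rows.begin(), block.rows.end());
}
```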
In the past, only simple predicates (slot=const), AND, LIKE, and OR (bitmap index only) could be pushed down to the storage layer. The scan process was:
1. Read part of the columns first, and compute the row ids with the simple push-down predicates.
2. Use the row ids to read the remaining columns and pass them to the scanner, which filters with the remaining predicates.
This PR also pushes the remaining predicates (functions, nested predicates, ...) from the scanner down to the storage layer for filtering. The new scan process, sketched below, is:
1. Read part of the columns first, and compute the row ids with the simple push-down predicates (same as above).
2. Use the row ids to read the columns needed by the remaining predicates, and use those pushed-down predicates to reduce the row ids again.
3. Use the row ids to read the remaining columns and pass them to the scanner.
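A self-contained sketch of steps 1 and 2, with toy columns and std::function predicates standing in for the real segment iterator API:

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using RowId = uint32_t;
using Pred = std::function<bool(int64_t)>;

// columns: column-major toy data; simple_col/remaining_col are the columns
// read by the simple and remaining predicates. Steps 1-2 shrink the row id
// list before any other column is touched (step 3 is left to the caller).
std::vector<RowId> scan(const std::vector<std::vector<int64_t>>& columns,
                        size_t simple_col, const Pred& simple_pred,
                        size_t remaining_col, const Pred& remaining_pred) {
    // Step 1: evaluate the simple push-down predicate on its column.
    std::vector<RowId> row_ids;
    for (RowId r = 0; r < columns[simple_col].size(); ++r) {
        if (simple_pred(columns[simple_col][r])) row_ids.push_back(r);
    }
    // Step 2: read only the column the remaining predicate needs, and use it
    // to reduce the row ids again.
    std::vector<RowId> filtered;
    for (RowId r : row_ids) {
        if (remaining_pred(columns[remaining_col][r])) filtered.push_back(r);
    }
    return filtered;
}
```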
1. Change PipelineTaskState to an enum class.
2. Remove some row-based code in FoldConstantExecutor::_get_result.
3. Reduce memcpy in the minmax runtime filter function (now we can guarantee that the input data is aligned).
4. Add the -Wunused-template check, remove some unused functions, and change some static functions to inline functions.
This PR does three things:
1. Use Druid instead of HikariCP in JdbcClient.
2. When downloading a UDF jar, append the jar package name to the local file name.
3. Refactor some JdbcResource code.
Remove a duplicate type definition in the function context.
Remove an unused method in the function context.
Remove stale state from the vexpr context: vexpr is stateless, the function context holds the state, and both are cloned together.
Remove the useless slot_size from all tuple and slot descriptors.
Remove the doris_udf namespace; it is useless.
Remove some unused macro definitions.
Initialize v_conjuncts in VScanner, so the same code need not be written in every scanner.
Use a unique_ptr to manage the function context, since it can only belong to a single expr context; a minimal ownership sketch follows.
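Illustrative types only, not the real VExprContext API:

```cpp
#include <memory>

struct FunctionContext { int state = 0; };  // per-function runtime state

struct VExprContext {
    // unique_ptr documents single ownership: the function context lives and
    // dies with exactly one expr context.
    std::unique_ptr<FunctionContext> fn_ctx;

    // Cloning the expr context deep-copies the function context, so each
    // clone keeps its own state.
    VExprContext clone() const {
        VExprContext c;
        if (fn_ctx) c.fn_ctx = std::make_unique<FunctionContext>(*fn_ctx);
        return c;
    }
};
```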
Issue Number: close #xxx
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
* [enhancement](stream load pipe) use the query id or load id to identify the stream load pipe instead of the fragment instance id
NewLoadStreamMgr already has the pipe and other info, so there is no need to save the pipe in the fragment state, and FragmentState becomes clearer.
Note that this PR changes the behaviour of the BE.
I will pick this PR to Doris 1.2.3 and add load id support to the FE, so users can upgrade from 1.2.3 to 2.x.
Co-authored-by: yiguolei <yiguolei@gmail.com>
Reason: column_name[k5], decimal value is not valid for definition, value=123.123, precision=9, scale=3, min=-999999.999, max=-999999.999; . src line [];
#17273
Add a special counter to the memtracker: faster, but with relaxed ordering, so it is not accurate in real time. A sketch of the relaxed-counter idea follows.
Track memory for thread creation and destruction; this was previously removed due to performance loss and is now added back.
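A sketch of the relaxed-ordering counter (not the actual MemTracker code):

```cpp
#include <atomic>
#include <cstdint>

// Updates use std::memory_order_relaxed: cheap on the hot path, but readers
// may observe a slightly stale value, so it is not accurate in real time.
class RelaxedCounter {
public:
    void add(int64_t v) { _value.fetch_add(v, std::memory_order_relaxed); }
    void sub(int64_t v) { _value.fetch_sub(v, std::memory_order_relaxed); }
    int64_t value() const { return _value.load(std::memory_order_relaxed); }

private:
    std::atomic<int64_t> _value{0};
};
```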
* [enhancement](execute model) use a thread pool to execute report or join tasks instead of starting too many threads
Doris starts a report thread and a join thread during fragment execution. Creating and destroying threads very frequently causes many problems: jemalloc may not behave well and may even crash (jemalloc/jemalloc#1405).
It is better to use a thread pool for these tasks, as in the sketch below.
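A minimal illustrative pool (Doris has its own ThreadPool implementation; this only shows the pattern of long-lived workers picking up report/join tasks instead of one thread per fragment):

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(size_t num_threads) {
        for (size_t i = 0; i < num_threads; ++i) {
            _workers.emplace_back([this] {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lock(_mutex);
                        _cv.wait(lock, [this] { return _stop || !_tasks.empty(); });
                        if (_stop && _tasks.empty()) return;
                        task = std::move(_tasks.front());
                        _tasks.pop();
                    }
                    task();  // run report/join work on a long-lived thread
                }
            });
        }
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _stop = true;
        }
        _cv.notify_all();
        for (auto& w : _workers) w.join();
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _tasks.push(std::move(task));
        }
        _cv.notify_one();
    }

private:
    std::vector<std::thread> _workers;
    std::queue<std::function<void()>> _tasks;
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _stop = false;
};
```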
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
1. Add a defer in fold constant to close the context.
2. Add more types when calling the _get_result function in fold constant.
3. Fix IN not handling NULL, e.g. select 1 in (2, NULL, 1);
In a Java UDF, jni_ctx may be nullptr, in which case calling close would core dump. A sketch of the defer-with-null-check idiom follows.
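A sketch of the defer idiom; jni_ctx in the usage comment is the hypothetical context being guarded:

```cpp
#include <functional>
#include <utility>

// Runs the given cleanup on every exit path of the enclosing scope.
class Defer {
public:
    explicit Defer(std::function<void()> fn) : _fn(std::move(fn)) {}
    ~Defer() {
        if (_fn) _fn();
    }
    Defer(const Defer&) = delete;
    Defer& operator=(const Defer&) = delete;

private:
    std::function<void()> _fn;
};

// Usage: guard against a null context before calling close.
// Defer close_ctx([&] { if (jni_ctx != nullptr) jni_ctx->close(); });
```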
Fix: Red Hat 4.x has no MemAvailable in /proc/meminfo, so disable using MemAvailable to control memory.
Record vm_rss_str and mem_available_str when GC is triggered, to avoid memory changes during GC causing inaccurate logs.
Catch bad_alloc in the join probe, which may allocate 64 GB of memory at a time, to avoid OOM.
Update the documented names of doris_be_all_segments_num and doris_be_all_rowsets_num.
fmt::format doesn't support non-template objects as args, even if they implement `to_string()` or `operator<<`, so the original code could print `false` instead of the real cause of the failure. Therefore to_string() needs to be invoked manually, as in the sketch below.
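A sketch with an illustrative stand-in for a status type (not the real doris::Status):

```cpp
#include <fmt/format.h>
#include <string>

// Illustrative only: a result type whose message lives behind to_string().
struct Status {
    std::string _msg = "io error: disk full";
    std::string to_string() const { return _msg; }
};

int main() {
    Status st;
    // fmt has no formatter for Status, so the value must be converted
    // explicitly; otherwise the real cause of the failure is lost.
    fmt::print("query failed: {}\n", st.to_string());
    return 0;
}
```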
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
Previously we sent one thrift message per fragment instance, but instances of the same fragment share many identical fields. This PR extracts the shared part so that the thrift message only needs to be sent once per fragment.
Support IPv6 in Apache Doris. The main changes are:
1. Enable binding to an IPv6 address if the network priority in the config file contains an IPv6 CIDR string.
2. BRPC and HTTP support binding to an IPv6 address.
3. BRPC and HTTP support accessing IPv6 services.
Use the FE cluster token to authenticate stream loads.
This auth is only open to BEs; FE auth still only supports HTTP basic auth.
I will use this auth in MySQL load to build a no-auth stream load from FE to BE, which avoids double authentication in MySQL load.
See the design doc for more information.
Add a cache for inverted index query match bitmaps to accelerate common query keywords, especially keywords matching many rows. A minimal cache sketch follows the results.
Test results:
- large result: matching 99% of 247 million rows shows an 8x speed-up.
- small result: matching 0.1% of 247 million rows shows a 2x speed-up.
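A toy sketch of the cache shape (the key and structure are assumptions; the real implementation would be a bounded LRU cache keyed per segment):

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

using Bitmap = std::vector<uint32_t>;  // stand-in for a roaring bitmap of row ids

// key = index identifier + query keyword; a hot keyword skips the inverted
// index lookup entirely and reuses the cached match bitmap.
class IndexQueryCache {
public:
    std::shared_ptr<Bitmap> lookup(const std::string& key) const {
        auto it = _cache.find(key);
        return it == _cache.end() ? nullptr : it->second;
    }
    void insert(const std::string& key, std::shared_ptr<Bitmap> bitmap) {
        _cache[key] = std::move(bitmap);  // a real cache would bound memory (LRU)
    }

private:
    std::map<std::string, std::shared_ptr<Bitmap>> _cache;
};
```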
Issue Number: close #16351
A dynamic schema table is a special type of table whose schema changes with the loading procedure. We implemented this feature mainly for semi-structured data such as JSON: since JSON is schema self-describing, we can extract schema info from the original documents and infer the final type information. This special table reduces manual schema change operations, making it easy to import semi-structured data and extend the schema automatically.
This commit supports:
1. Insert + select for struct/map types
2. JSON stream load for the struct type
3. The m[key] function for the map type
How to use:
Set the FE config to allow creating tables with struct and map types:
1. admin set frontend config("enable_struct_type" = "true");
2. admin set frontend config("enable_map_type" = "true");
#16547
Co-authored-by: xy720 <xuyang25@baidu.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: hucheng01 <hucheng01@baidu.com>
Make rows_read correct so that the scheduler can use it correctly.
Use a single scanner when there is a limit clause, and move this logic from the fragment context to the scan node.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Both update_status and open_vectorized_internal call send_report and stop the report thread. Move the update_status code into the open method and remove the unnecessary send_report and stop_report_thread calls.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Calling pthread condition wait may block a brpc thread.
There is no need to wait for the fragment, because two-phase fragment execution already guarantees that the fragment instance exists when the runtime filter arrives, so the condition-wait code is removed.
Co-authored-by: yiguolei <yiguolei@gmail.com>
MetricRegistry::trigger_all_hooks holds the metrics lock and gets stuck in get_je_metrics, while to_prometheus waits for trigger_all_hooks to release that lock; so get_je_metrics is no longer called inside MetricRegistry::trigger_all_hooks. A minimal sketch of the shape of the fix follows.
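Names here are illustrative, not the real MetricRegistry code:

```cpp
#include <mutex>

std::mutex metrics_lock;  // guards the metric registry

void trigger_all_hooks() {
    // Before: a slow jemalloc stats collection ran here while holding
    // metrics_lock, blocking to_prometheus() on the same lock.
    // collect_je_metrics();  // removed from the locked section
    std::lock_guard<std::mutex> guard(metrics_lock);
    // ... run only the cheap metric hooks under the lock ...
}
```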
1. When mapping columns from an external data source, use date/datetimev2 as the default types.
2. Check `is_cancelled` when reading data, to avoid an endless loop after the query is cancelled.