1. Add memory leak detection for the `DeltaWriter` and `MemTable` mem trackers.
2. Make the memtable mem tracker virtual to avoid frequent recursive consumption of the parent tracker.
3. Stop the memtable flush thread from attaching the memtable tracker, so that the memtable mem tracker stays completely accurate.
4. Set `memory_verbose_track=false`. At present, frequently switching the thread mem tracker causes a performance problem.
- The mem tracker is held as a `shared_ptr` in thread local. On every switch, the atomic `use_count` of the current tracker's `shared_ptr` is decremented and the `use_count` of the replacement tracker is incremented. Frequent multi-threaded changes to the same tracker `shared_ptr` are slow.
- TODO: 1. Reduce unnecessary thread mem tracker switch, 2. Consider using raw pointers for mem tracker in thread local.
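The cost of switching can be illustrated with a minimal sketch. `Tracker`, `tls_tracker_shared`, `tls_tracker_raw`, `switch_shared`, and `switch_raw` are hypothetical names chosen for illustration and are not the Doris implementation. The sketch only shows that switching a thread-local `shared_ptr` touches the shared reference counts, while a raw pointer swap does not.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for a mem tracker (not the Doris class).
struct Tracker {
    long consumed = 0;
};

// Thread-local slot that co-owns the tracker through a shared_ptr.
thread_local std::shared_ptr<Tracker> tls_tracker_shared;
// Alternative from the TODO: a non-owning raw pointer in thread local.
thread_local Tracker* tls_tracker_raw = nullptr;

// Switching by shared_ptr: copying `next` atomically increments its
// use_count; when the returned old pointer is dropped, the previous
// tracker's use_count is atomically decremented.
std::shared_ptr<Tracker> switch_shared(const std::shared_ptr<Tracker>& next) {
    std::shared_ptr<Tracker> old = std::move(tls_tracker_shared);
    tls_tracker_shared = next;
    return old;
}

// Switching by raw pointer: a plain pointer swap with no reference counting.
// The caller must guarantee that the tracker outlives every thread using it.
Tracker* switch_raw(Tracker* next) {
    return std::exchange(tls_tracker_raw, next);
}
```

When many threads switch to the same tracker, every `switch_shared` call contends on the same atomic counter, which is the slowdown described above. The raw pointer variant avoids that at the price of manual lifetime management.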
Column `Array<T>` currently contains the columns `offsets` and `data`, and the `offsets` column is of type UInt32.
If `array_union` is called repeatedly to merge arrays, the size of the array may overflow.
So we need to widen the offsets type before the `Array Data Type` is released.
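The overflow can be reproduced with a small sketch. It assumes that `offsets` stores cumulative end positions, and it uses a 64-bit type as the wider alternative purely for comparison. `append_array` and `offsets_are_monotonic` are hypothetical helpers, not Doris functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append the cumulative end offset of a new array holding `len` elements.
// OffsetT is the integer type used by the `offsets` column.
template <typename OffsetT>
void append_array(std::vector<OffsetT>& offsets, uint64_t len) {
    OffsetT prev = offsets.empty() ? OffsetT{0} : offsets.back();
    // The sum is computed in 64 bits; the cast wraps modulo 2^bits
    // when the result does not fit into OffsetT.
    offsets.push_back(static_cast<OffsetT>(prev + len));
}

// Returns true if the offsets never decrease, i.e. no wrap-around happened.
template <typename OffsetT>
bool offsets_are_monotonic(const std::vector<OffsetT>& offsets) {
    for (std::size_t i = 1; i < offsets.size(); ++i) {
        if (offsets[i] < offsets[i - 1]) return false;
    }
    return true;
}
```

Appending three arrays of 2,000,000,000 elements each wraps a `uint32_t` offset column on the third append, while a `uint64_t` column keeps increasing.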
* [Vectorized][Function] add orthogonal bitmap agg functions
save some files for the orthogonal bitmap functions
add some files for rebase
update functions file
* refactor union_count function
refactor orthogonal union count functions
* remove bool is_variadic
In some cases, the query mem tracker does not exist in BE when a block is transmitted. This results in a null pointer when getting the query mem tracker in brpc `transmit_block`.
1. Provide an FE config to test reliability in the single-replica case when tablet scheduling is frequent.
2. Following #6063, apply nearly the same fix to the current code.
1. Fix the memory leak. When the load task is canceled, the `IndexChannel` and `NodeChannel` mem trackers cannot be destructed in time.
2. Fix load tasks being frequently canceled due to OOM and the inaccurate `LoadChannel` mem tracker limit, and rename the `mem limit` variable in `LoadChannel`.
3. Fix a core dump: when a task mem tracker is logged out, the phmap erase fails, resulting in repeated logout of the same tracker.
4. Fix a deadlock: when the mem limit is exceeded in `add_child_tracker`, calling `log_usage` deadlocks on `_child_trackers_lock`.
5. Fix frequent log printing when the thread mem tracker limit is exceeded, which hurts log readability and performance.
6. Optimize some details of mem tracker display.
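The deadlock in item 4 follows a common pattern: a function holds a non-recursive mutex and calls another function that takes the same mutex. The sketch below shows one way to avoid it, by releasing the lock before logging. `ParentTracker` and its members are hypothetical and simplified, and this is not necessarily how the actual fix is implemented.

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical, simplified tracker used only to illustrate the locking order.
class ParentTracker {
public:
    explicit ParentTracker(int64_t limit_bytes) : _limit(limit_bytes) {}

    // Returns false when adding the child would exceed the limit.
    // The usage report is built only after `_child_trackers_lock` has been
    // released, because log_usage() takes the same non-recursive mutex.
    bool add_child_tracker(const std::string& name, int64_t bytes,
                           std::string* report = nullptr) {
        bool exceeded = false;
        {
            std::lock_guard<std::mutex> guard(_child_trackers_lock);
            if (_consumed + bytes > _limit) {
                exceeded = true;
            } else {
                _children.push_back(name);
                _consumed += bytes;
            }
        }  // lock released here
        if (exceeded && report != nullptr) {
            *report = log_usage();  // safe: the lock is no longer held
        }
        return !exceeded;
    }

    // Takes the lock itself, so it must not be called while it is held.
    std::string log_usage() {
        std::lock_guard<std::mutex> guard(_child_trackers_lock);
        return "children=" + std::to_string(_children.size()) +
               " consumed=" + std::to_string(_consumed);
    }

private:
    std::mutex _child_trackers_lock;
    std::vector<std::string> _children;
    int64_t _consumed = 0;
    const int64_t _limit;
};
```

Calling `log_usage()` inside the scoped block instead would try to lock `_child_trackers_lock` a second time on the same thread, which is undefined behavior for `std::mutex` and typically hangs.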
When the length of `Tuple/Block data` is greater than 2G, serialize the protoBuf request and embed the
`Tuple/Block data` into the controller attachment and transmit it through http brpc.
This is to avoid errors when the length of the protoBuf request exceeds 2G:
`Bad request, error_text=[E1003]Fail to compress request`.
In #7164, `Tuple/Block data` was put into attachment and sent via default `baidu_std brpc`,
but when the attachment exceeds 2G, it will be truncated. There is no 2G limit for sending via `http brpc`.
Also, #7921 considered putting `Tuple/Block data` into the attachment by default, as this theoretically
saves one serialization and improves performance. However, testing showed that performance did not improve,
while the memory peak increased due to the additional memory copy.
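The resulting transport choice can be summarized in a short sketch. It assumes that "2G" means 2^31 bytes and that data at or below the threshold stays in the protobuf request on the default path. `Transport`, `kTwoGiB`, and `choose_transport` are hypothetical names, not brpc or Doris APIs.

```cpp
#include <cstdint>

// Hypothetical labels for the two paths described above.
enum class Transport {
    kDefaultInRequest,  // data stays inside the protobuf request
    kHttpAttachment     // data goes into the controller attachment over http brpc
};

// Assumption: "2G" is interpreted as 2^31 bytes.
constexpr uint64_t kTwoGiB = 2ULL * 1024 * 1024 * 1024;

// Only data larger than the threshold takes the http attachment path,
// since the attachment costs an extra memory copy.
inline Transport choose_transport(uint64_t data_bytes) {
    return data_bytes > kTwoGiB ? Transport::kHttpAttachment
                                : Transport::kDefaultInRequest;
}
```

Keeping the attachment path as the exception rather than the default matches the test result from #7921 mentioned above.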
This CL mainly changes:
1. Reduce the rpc timeouts caused by rpcs waiting for the worker threads of brpc.
    1. Merge multiple fragment instances on the same BE into one request to reduce the number of send fragment rpcs.
    2. If fragments size >= 3, use a 2-phase RPC: the first phase sends all fragments, the second starts them, so that there
    will be at most 2 RPCs for each query on one BE.
3. Set the timeout of send fragment rpc to the query timeout to ensure the consistency of users' expectation of query timeout period.
4. Do not close the connection anymore when rpc timeout occurs.
5. Change some log level from info to debug to simplify the fe.log content.
NOTICE:
1. The definition of the `execPlanFragment` rpc has changed, so BE must be upgraded first.
3. Remove FE config `remote_fragment_exec_timeout_ms`
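The merging and the 2-phase rule from this CL can be sketched as follows. `FragmentInstance`, `group_by_backend`, and `rpcs_per_backend` are hypothetical names. The sketch also assumes that, without the 2-phase protocol, a single merged RPC both sends and starts the fragments on a BE.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one fragment instance and its target BE.
struct FragmentInstance {
    int fragment_id;
    std::string backend;
};

// Merge all instances that target the same BE into one request payload.
std::map<std::string, std::vector<FragmentInstance>> group_by_backend(
        const std::vector<FragmentInstance>& instances) {
    std::map<std::string, std::vector<FragmentInstance>> grouped;
    for (const FragmentInstance& inst : instances) {
        grouped[inst.backend].push_back(inst);
    }
    return grouped;
}

// Number of RPCs sent to each BE for one query: with 3 or more fragments,
// one RPC sends all fragments and a second one starts them.
std::size_t rpcs_per_backend(std::size_t fragment_count) {
    return fragment_count >= 3 ? 2 : 1;
}
```

With this grouping, the number of RPCs per query depends on the number of BEs involved rather than on the number of fragment instances.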
1. Fix the negative consumption value of the Lru Cache MemTracker.
2. Fix the compaction Cache MemTracker not tracking anything.
3. Add USE_MEM_TRACKER compile option.
4. Make sure the malloc/free hook is not stopped at any time.
Hive and Trino/Presto automatically trim trailing spaces, but Doris doesn't.
This causes query results to differ from Hive's.
Add a new session variable "trim_tailing_spaces_for_external_table_query".
If set to true, the trailing spaces of each column are trimmed when reading csv from the broker scan node.
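The trimming behavior can be sketched as follows. It assumes that only trailing ASCII spaces are removed and that leading spaces are kept. `trim_trailing_spaces` and `read_csv_field` are hypothetical helpers, not the broker scan node code.

```cpp
#include <string>

// Remove trailing ASCII spaces only; leading spaces and inner spaces stay.
std::string trim_trailing_spaces(const std::string& field) {
    std::string::size_type end = field.find_last_not_of(' ');
    if (end == std::string::npos) {
        return "";  // the field is empty or consists of spaces only
    }
    return field.substr(0, end + 1);
}

// `trim_enabled` stands in for the value of the session variable.
std::string read_csv_field(const std::string& raw, bool trim_enabled) {
    return trim_enabled ? trim_trailing_spaces(raw) : raw;
}
```

With the flag off, the field is returned unchanged, which corresponds to the behavior before this change.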
PaloExternalSourcesService was designed for es_scan_node using the tcp protocol.
But the es tcp protocol requires deploying a tcp jar into the es code. Both the es version and the lucene version have been upgraded,
and the tcp jar is not maintained any more.
So I removed all the related code and thrift definitions.
This patch supports utf8mb4 for mysql external tables.
Some users need a mysql external table with the utf8mb4 charset, but only the utf8 charset is supported right now.
When creating a mysql external table, you can add an optional property "charset", which sets the character set of the mysql connection.
The default value is "utf8". You can set "utf8mb4" instead of "utf8" when you need it.
1. Fix the memory tracking accuracy of LoadTask, ChunkAllocator, TabletMeta, and Brpc.
2. Modified some MemTracker names, deleted some unnecessary trackers, and improved readability.
3. More powerful MemTracker debugging capabilities.
4. Avoid creating TabletColumn temporary objects and improve BE startup time by 8%.
5. Fix some other details.
```
CREATE ROUTINE LOAD iaas.dws_nat ON dws_nat
WITH APPEND PROPERTIES (
"desired_concurrent_number"="2",
"max_batch_interval" = "20",
"max_batch_rows" = "400000",
"max_batch_size" = "314572800",
"format" = "json",
"max_error_number" = "0"
)
FROM KAFKA (
"kafka_broker_list" = "xxxx:xxxx",
"kafka_topic" = "nat_nsq",
"property.kafka_default_offsets" = "2022-04-19 13:20:00"
);
```
In the create statement example above, you can see:
The user didn't specify custom partitions.
So (1) FE gets all kafka partitions from the server in the routine load scheduler.
The user set the default offset by datetime.
So (2) FE gets the kafka offsets by time from the server in the routine load scheduler.
When 1 succeeds but 2 fails, the progress of this routine load may not contain any partitions and offsets.
Nevertheless, since `newCurrentKafkaPartition`, which is fetched from the kafka server, may always be equal to `currentKafkaPartitions`,
the wrong progress will never be updated.
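The stuck state can be illustrated with a small sketch. The FE code is not shown in this message, so the functions below are hypothetical and written in C++ only for illustration. The second function is one possible guard, not necessarily the actual fix.

```cpp
#include <map>
#include <set>

// Check as described above: an update is triggered only when the set of
// partitions fetched from kafka differs from the currently known set.
bool needs_update_by_partitions_only(const std::set<int>& current,
                                     const std::set<int>& fetched) {
    return current != fetched;
}

// One possible guard (assumption): also update when the progress has no
// offset recorded for one of the fetched partitions.
bool needs_update(const std::set<int>& current,
                  const std::set<int>& fetched,
                  const std::map<int, long long>& progress) {
    if (current != fetched) {
        return true;
    }
    for (int partition : fetched) {
        if (progress.find(partition) == progress.end()) {
            return true;  // progress is missing this partition
        }
    }
    return false;
}
```

If fetching partitions succeeded but fetching offsets by time failed, the partition sets are equal while the progress is empty, so the first check never fires and the second one does.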