Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G
Implement a new way of memory statistics based on TCMalloc New/Delete Hook,
MemTracker and TLS, and it is expected that all memory new/delete/malloc/free
of the BE process can be counted.
Modify the implementation of MemTracker:
1. Simplify a lot of useless logic;
2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing;
3. Add cosume/release cache, trigger a cosume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes;
4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection
5. Added Virtual MemTracker, cosume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently;
6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env;
8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.;
Modify where MemTracker is used:
1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code;
2. Added trackers for global objects such as ChunkAllocator and StorageEngine;
3. Added more fine-grained trackers such as ExprContext;
4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;
This PR mainly changes:
1. Fix bug when enable `transfer_data_by_brpc_attachment`
In `data_stream_sender`, we will send a serialized PRowBatch data to multiple Channels.
And if `transfer_data_by_brpc_attachment` is enabled, we will mistakenly clear the data in PRowBatch
after sending PRowBatch to the first Channel.
As a result, the following Channel cannot receive the correct data, causing an error.
So I use a separate buffer instead of `tuple_data` in PRowBatch to store the serialized data
and reuse it in multiple channels.
2. Fix bug that the the offset in serialized row batch may overflow
Use int64 to replace int32 offset. And for compatibility, add a new field `new_tuple_offsets` in PRowBatch.
Currently, if we encounter a problem with a replica of a tablet during the load process,
such as a write error, rpc error, -235, etc., it will cause the entire load job to fail,
which results in a significant reduction in Doris' fault tolerance.
This PR mainly changes:
1. refined the judgment of failed replicas in the load process, so that the failure of a few replicas will not affect the normal completion of the load job.
2. fix a bug introduced from #7754 that may cause BE coredump
Now minidump file will be created when BE crashes.
And user can manually trigger a minidump by sending SIGUSR1 to BE process.
More details can be found in minidump.md documents
1. replace all boost::shared_ptr to std::shared_ptr
2. replace all boost::scopted_ptr to std::unique_ptr
3. replace all boost::scoped_array to std::unique<T[]>
4. replace all boost:thread to std::thread
1. Fix a memory leak in `collect_iterator.cpp` (Fix#6700)
2. Add a new BE config `max_segment_num_per_rowset` to limit the num of segment in new rowset.(Fix#6701)
3. Make the error msg of stream load more friendly.
* [Optimize] optimize the speed of converting integer to string
* Use fmt and std::from_chars to make convert integer to string and convert string to integer more efficient
Co-authored-by: caiconghui <caiconghui@xiaomi.com>
This is part of the array type support and has not been fully completed.
The following functions are implemented
1. fe array type support and implementation of array function, support array syntax analysis and planning
2. Support import array type data through insert into
3. Support select array type data
4. Only the array type is supported on the value lie of the duplicate table
this pr merge some code from #4655#4650#4644#4643#4623#2979
For PR #5792. This patch add a new param `cache type` to distinguish sql cache and partition cache.
When update sql cache, we make assure one sql key only has one version cache.
1. relocation R_X86_64_32 against `__gxx_personality_v0' can not be used when making a shared object; recompile with -fPIC
2. warning: the use of `tmpnam' is dangerous, better use `mkstemp'
3. Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test couldn't detect the number of threads.
Some invalid config value may cause BE work in an unexpected behavior,
this patch aim to support config validating when BE bootstrap and update BE's config by API
to reject invalid value.
This is a work to accomplish PR #4423
In the previous broker load, multiple OlapTableSinks would send data to the same LoadChannel,
and because of the lock granularity problem, LoadChannel could only process these requests serially,
which made it impossible to make full use of cluster resources.
This CL modifies the related locks so that LoadChannel can process these requests in parallel.
In the test, with a size of 20G, the load speed of 334 million rows of data in 3 nodes has been
increased from 9min to 5min, and after enabling 2 concurrency, it can be increased to 3min.
Also modify the profile of load job.
There are some long loops and sleeps in unit tests, it will cost a
very long time to run all unit tests, especially run in TSAN mode.
This patch speed up unit tests by shortening long loops and sleeps,
on my environment all unit tests finished in 1 minite. It's useful
to do basic functional unit tests.
You can switch to run in this mode by adding a new environment variable
'DORIS_ALLOW_SLOW_TESTS'. For example, you can set:
export DORIS_ALLOW_SLOW_TESTS=1
and also you can disable it by setting:
export DORIS_ALLOW_SLOW_TESTS=0
When partition cache is not cached continuely, range query may fail.
For example, partition key 20201011 and 20201013 is cached,
but rang query is between 20201011 and 20201013, the query will not hit the cache.
issue:#5059
This CL mainly changes:
1. Avoid repeated sending of common components in Fragments
In the previous implementation, a query may generate multiple Fragments,
these Fragments contain some common information, such as DescriptorTable.
Fragment will be sent to BE in a certain order, so these public information will be sent repeatedly
and generated repeatedly on the BE side.
In some complex SQL, these public information may be very large,
thereby increasing the execution time of Fragment.
So I improved this. For multiple Fragments sent to the same BE, only the first Fragment will carry
these public information, and it will be cached on the BE side, and subsequent Fragments
no longer need to carry this information.
In the local test, the execution time of some complex SQL can be reduced from 3 seconds to 1 second.
2. Add the time-consuming part of FE logic in Profile
Including SQL analysis, planning, Fragment scheduling and sending on the FE side, and the time to fetch data.