Code refactor: improve code readability and avoid const_cast.
1. Make loops simpler and clearer by using range-based for loops, which are safer than the old loop style.
2. Iterate over _row_desc.tuple_descriptors() with an index only, instead of mixing index and iterator.
3. Add a new function `To cast_to(From from)` that uses union-based casting between two types to replace reinterpret_cast; the new cast is more readable (see the sketch after this list).
4. Avoid reusing the same variable name in nested loops; it is error-prone.
5. Add the const keyword to member functions, following the CppCoreGuidelines.
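A minimal sketch of the union-based cast named in item 3; the template form and the static_assert are my assumptions about the implementation, not the exact Doris code:

```cpp
// Union-based cast between two equally sized, trivially copyable types,
// used in place of reinterpret_cast for readability. Note: type punning
// through a union is compiler-defined in C++ (GCC/Clang allow it);
// std::memcpy is the strictly conforming alternative.
template <typename To, typename From>
To cast_to(From from) {
    static_assert(sizeof(To) == sizeof(From), "To and From must match in size");
    union {
        From from_value;
        To to_value;
    } u;
    u.from_value = from;
    return u.to_value;
}

// Usage: view the bit pattern of a double as a 64-bit integer.
// int64_t bits = cast_to<int64_t>(3.14);
```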
The tuple String Slot's ptr and len are not assigned appropriately on the send side, so the receive side may crash in some situations.
Detailed description:
On the send side, when we call RowBatch::serialize(PRowBatch* output_batch) to pack a RowBatch, Tuple::deep_copy()
is called. For each String Slot, only non-null String Slots get ptr and len set to proper values; null String
Slots keep their original state, so the ptr member may point anywhere and the len member may hold an unexpected value.
On the receive side, unpacking is handled by RowBatch::RowBatch(const RowDescriptor&, const PRowBatch&...). In this
function, each String Slot's offset is converted to a valid string_val->ptr, whether the String Slot is null or not.
But some business logic depends on string_val->len == 0; for example, AggregateFuncTraits::init() and HyperLogLog::deserialize()
only return correctly if slice.size <= 0. So if string_val->len is set to 0 on the send side, everything works; otherwise the server
may crash.
From a network-communication viewpoint, we should make sure correct data is transferred: it is the sender's responsibility to set data to proper
values, without making any assumptions about how the receive side will use it.
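A minimal sketch of the send-side fix implied above; the `Tuple`/`SlotDescriptor`/`StringValue` accessor names here are illustrative assumptions, not the exact Doris API:

```cpp
// During serialization, explicitly zero the ptr/len of null String Slots so
// the receiver never sees a dangling pointer or a garbage length.
for (SlotDescriptor* slot_desc : string_slots) {           // names assumed
    if (tuple->is_null(slot_desc->null_indicator_offset())) {
        StringValue* string_val =
                tuple->get_string_slot(slot_desc->tuple_offset());
        string_val->ptr = nullptr;  // was: left pointing at random memory
        string_val->len = 0;        // downstream logic relies on len == 0
    }
}
```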
Transfer the RowBatch in the Protobuf request as a controller attachment
when the maximum length of the RowBatch in the Protobuf request is exceeded.
This avoids hitting the upper limit on the Protobuf request length (2GB),
and performance is expected to improve as well.
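A hedged sketch of the idea using brpc's controller attachment; the request field names and flag are illustrative assumptions, not the exact Doris protocol:

```cpp
#include <brpc/controller.h>

// When the serialized RowBatch would push the protobuf body past the size
// limit, ship it as raw bytes in the controller attachment instead. The
// attachment travels beside the protobuf body and is not subject to
// protobuf's 2GB message-length limit.
void pack_row_batch(brpc::Controller* cntl, PTransmitDataParams* request,
                    const std::string& serialized_batch, size_t max_body_size) {
    if (serialized_batch.size() > max_body_size) {
        cntl->request_attachment().append(serialized_batch);
        request->set_transfer_by_attachment(true);  // hypothetical flag
    } else {
        request->set_row_batch(serialized_batch);   // field name assumed
    }
}
```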
This is because the constant MAX_PHYSICAL_PACKET_LENGTH in FE should be 2^24 - 1,
but it was set to 2^24 - 2 by mistake.
2. Fix that bitmap_to_string may fail when the result is larger than 2GB.
1. Setting _report_thread_active to false does not need to be protected by _report_thread_lock, because
_report_thread_active's type is bool, and writing such a value is thread-safe as long as its size <= the machine word length.
2. The report_profile thread may terminate early: in report_profile(), the while (_report_thread_active) loop may
exit because _report_thread_active is still false, since the thread calling open() may be scheduled out between
_report_thread_started_cv.wait(l) and _report_thread_active = true. We should not assume how much time elapses
between two schedulings of a thread (see the sketch below).
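A minimal sketch of the fix for item 2, with member names taken from the description and the surrounding class structure assumed: set the flag before the thread can run its loop, so scheduling order no longer matters.

```cpp
#include <chrono>
#include <mutex>
#include <thread>

// Assumed skeleton of the fragment executor's reporting machinery.
class ReportThreadDemo {
public:
    void start_report_thread() {
        {
            std::lock_guard<std::mutex> l(_report_thread_lock);
            _report_thread_active = true;  // set BEFORE the thread runs
        }
        _report_thread = std::thread([this]() { report_profile(); });
    }

    void stop_report_thread() {
        _report_thread_active = false;  // plain bool write, as item 1 argues
        if (_report_thread.joinable()) _report_thread.join();
    }

private:
    void report_profile() {
        while (_report_thread_active) {
            // ... periodically report the runtime profile ...
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
    }

    std::mutex _report_thread_lock;
    bool _report_thread_active = false;
    std::thread _report_thread;
};
```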
Now a minidump file will be created when BE crashes.
Users can also manually trigger a minidump by sending SIGUSR1 to the BE process (see the sketch below).
More details can be found in the minidump.md document.
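A minimal sketch of the manual-trigger mechanism, assuming a breakpad-style backend; `write_minidump()` is a hypothetical wrapper, not the actual Doris function:

```cpp
#include <csignal>

extern void write_minidump();  // hypothetical wrapper around the minidump backend

// Signal handler: an operator can request an on-demand minidump with
// `kill -USR1 <be_pid>`.
static void on_sigusr1(int /*signum*/) {
    write_minidump();
}

void install_minidump_trigger() {
    struct sigaction sa = {};
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, nullptr);
}
```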
BitShufflePageDecoder now reuses the memory for storing decoded results and allocates that memory directly from the
`ChunkAllocator`, which improves performance to a certain extent.
In the case of #6285, total time consumption is reduced by 13.5%, the share of time spent in `~Reader()`
drops from 17.65% to 1.53%, and memory allocation is unified under `ChunkAllocator` for centralized
management, which helps subsequent memory optimization.
This also avoids the memory waste caused by `MemPool`, because a chunk can be freed at any time. However,
allocation is slower than from `MemPool`. Our guess is that, without `MemPool`'s secondary
allocation of large chunks, a large number of small chunks are requested directly from `ChunkAllocator`, and more time
is spent on locking in `pop_free_chunk` and `push_free_chunk` (though this is not confirmed by the flame graphs of BE's CPU and
contention).
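A hedged sketch of the reuse pattern described above; the `Chunk`/`ChunkAllocator` declarations are minimal stand-ins for the Doris classes:

```cpp
#include <cstddef>
#include <cstdint>

// Minimal stand-ins for Doris's ChunkAllocator API (shapes assumed).
struct Chunk { uint8_t* data = nullptr; size_t size = 0; };
struct ChunkAllocator {
    static ChunkAllocator* instance();
    bool allocate(size_t size, Chunk* chunk);
    void free(const Chunk& chunk);
};

// Keep one chunk across pages and grow it only when a page needs more
// space, instead of allocating per page; unlike MemPool memory, the chunk
// can be returned to the allocator at any time.
class DecodeBuffer {
public:
    ~DecodeBuffer() { release(); }

    uint8_t* reserve(size_t bytes) {
        if (bytes > _chunk.size) {
            release();
            ChunkAllocator::instance()->allocate(bytes, &_chunk);
        }
        return _chunk.data;
    }

private:
    void release() {
        if (_chunk.data != nullptr) {
            ChunkAllocator::instance()->free(_chunk);
            _chunk = Chunk{};
        }
    }

    Chunk _chunk;
};
```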
1. Replace all boost::shared_ptr with std::shared_ptr
2. Replace all boost::scoped_ptr with std::unique_ptr
3. Replace all boost::scoped_array with std::unique_ptr<T[]>
4. Replace all boost::thread with std::thread
1. Optimize HighWaterMarkCounter::add(): call `UpdateMax()` only if the delta is greater than 0,
to reduce the number of function calls (see the sketch after this list).
2. Delete useless code lines to keep MemTracker clean:
some data members are never set but their values are checked, so the if conditions are never met; these code paths are removed.
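A minimal sketch of the item-1 optimization, with the counter internals assumed:

```cpp
#include <atomic>
#include <cstdint>

// Assumed shape of the counter: a current value plus a high-water mark.
class HighWaterMarkCounter {
public:
    void add(int64_t delta) {
        int64_t new_val = _value.fetch_add(delta) + delta;
        // The high-water mark can only rise when the value rises, so skip
        // UpdateMax() entirely for non-positive deltas.
        if (delta > 0) {
            UpdateMax(new_val);
        }
    }

private:
    void UpdateMax(int64_t v) {
        int64_t cur = _max.load();
        while (v > cur && !_max.compare_exchange_weak(cur, v)) {
        }
    }

    std::atomic<int64_t> _value{0};
    std::atomic<int64_t> _max{0};
};
```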
Added a brpc stub cache check-and-reset API, used to test whether the brpc stub cache is available and to reset it.
Also added a config to automatically check and reset the brpc stub.
## Case
In the load process, each tablet has a memtable to hold the incoming data.
If the data in a memtable grows larger than 100MB, it is flushed to disk as a `segment` file, and then
a new memtable is created to hold the following data.
Assume a table with N buckets (tablets). The maximum total size of all memtables is then `N * 100MB`.
If N is large, this costs too much memory.
So, to limit memory, when the total size of all memtables reaches a threshold (2GB by default), Doris
tries to flush all current memtables to disk (even if their sizes have not reached 100MB).
As a result, each memtable is flushed when its size reaches about `2GB/N`, which may be much smaller
than 100MB, resulting in too many small segment files.
## Solution
When deciding to flush memtables to reduce memory consumption, do NOT flush all of them, but only a part
(a sketch follows at the end of this section).
For example, suppose there are 50 tablets (with 50 memtables) and the memory limit is 1GB. When each memtable
reaches 20MB, the total size reaches 1GB and a flush occurs.
If only 25 of the 50 memtables are flushed, then the next time the total size reaches 1GB, there will be 25 memtables
of about 10MB and 25 memtables of about 30MB. We can then flush the 30MB memtables, which are larger
than 20MB.
The main idea is to introduce some jitter during flushing so that memtable sizes become slightly uneven, ensuring that a flush is only triggered for memtables that are large enough.
In my test, loading a table with 48 buckets under a 2GB mem limit, the average memtable size was 44MB in the
previous version; after this change, the average size is 82MB.
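A minimal sketch of the partial-flush policy under the numbers above; `MemTable` with `memory_usage()`/`flush()` is an assumed interface, not the Doris class:

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Assumed minimal memtable interface.
struct MemTable {
    size_t memory_usage() const;
    void flush();  // write to disk as a segment and reset
};

// When the total size of all memtables exceeds the limit, flush only the
// larger half instead of all of them. This keeps sizes staggered, so each
// later flush picks memtables that have grown large enough.
void flush_partial(std::vector<std::shared_ptr<MemTable>>& memtables,
                   size_t mem_limit) {
    size_t total = 0;
    for (const auto& m : memtables) total += m->memory_usage();
    if (total < mem_limit) return;

    // Largest first; flush the biggest half.
    std::sort(memtables.begin(), memtables.end(),
              [](const auto& a, const auto& b) {
                  return a->memory_usage() > b->memory_usage();
              });
    for (size_t i = 0; i < memtables.size() / 2; ++i) {
        memtables[i]->flush();
    }
}
```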
Add a use_path_style property for S3.
Upgrade hadoop-common and hadoop-aws to 2.8.0 to support the path-style property.
Fix some S3 URI bugs.
Add some logs for tracing the load process.
When the memory usage of BE reaches mem_limit, printing ReservationTrackerCounters through MemTracker
may crash BE under high concurrency.
ReservationTrackerCounters is not actually used in current Doris, and the memory tracker in Doris
will be redesigned in the future.
1. Refactor the create method of the hdfs reader & writer.
libhdfs3 does not support arm64, so we should not support the hdfs reader & writer on arm64.
2. Add a macro for LowerUpperImpl.
1. Avoid sending tasks if there is no data to be consumed,
by fetching the latest offset of each partition before sending tasks. (Fix "[Optimize] Avoid too many abort task in routine load job" #6803)
2. Add a preCheckNeedSchedule phase in update() of routine load,
to avoid holding the job's write lock for a long time while getting all Kafka partitions from the Kafka server.
3. Upgrade librdkafka to version 1.7.0 to fix the "Local: Unknown partition" bug.
See "offsetsForTimes fails with 'Local: Unknown partition'" edenhill/librdkafka#3295
4. Avoid unnecessary storage migration tasks if the target storage medium does not exist on the BE.
(Fix "[Bug] Too many unnecessary storage migration tasks" #6804)
1. This bug was introduced by #6582.
2. Optimize the error message for the "Address already in use" error.
3. Add some documentation about compilation:
   1. Add a custom thirdparty download URL.
   2. Add a custom com.alibaba maven jar package for DataX.
4. Fix a bug where BE crashes when closing the scan node, introduced by #6622.
Support hdfs in the select outfile clause without a broker.
This PR implements an HDFS writer in BE which is used to write HDFS files directly without going through a broker
(a sketch of the write path follows the syntax example below).
Also, a syntax check for the hdfs outfile clause has been added in FE.
The syntax:
```
select * from xx into outfile "hdfs://user/outfile_" format as csv
properties ("hdfs.fs.dafultFS" = "xxx", "hdfs.hdfs_user" = "xxx");
```
Note that all hdfs configurations need to carry the prefix `hdfs.`.
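A hedged sketch of the BE write path using the stock libhdfs C API; the error handling and the way Doris wires in the `hdfs.`-prefixed properties are simplified assumptions:

```cpp
#include <fcntl.h>
#include <hdfs/hdfs.h>  // libhdfs C API

// Open an HDFS file via the NameNode and write the result set directly,
// with no broker in between.
bool write_outfile(const char* name_node, const char* path,
                   const char* data, size_t len) {
    hdfsFS fs = hdfsConnect(name_node, 0 /* port taken from the URI */);
    if (fs == nullptr) return false;

    hdfsFile file = hdfsOpenFile(fs, path, O_WRONLY, 0, 0, 0);
    if (file == nullptr) {
        hdfsDisconnect(fs);
        return false;
    }

    bool ok = hdfsWrite(fs, file, data, len) == static_cast<tSize>(len);
    hdfsCloseFile(fs, file);
    hdfsDisconnect(fs);
    return ok;
}
```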
1. Fix a memory leak in `collect_iterator.cpp`. (Fix #6700)
2. Add a new BE config `max_segment_num_per_rowset` to limit the number of segments in a new rowset. (Fix #6701)
3. Make the error msg of stream load more friendly.
1. Support the boolean data type in spark-doris-connector, since Doris already supports the boolean data type.
2. Fix a Doris BE core dump when Spark requests data from BE.
1. Inserting a very large string value may coredump.
2. Some analytic function and aggregate function results may be incorrect.
3. String comparison may coredump when the string is too large.
4. String types in delete conditions cannot be processed correctly.
5. Add text/blob as aliases of string to be compatible with MySQL.
6. Fix string type min/max aggregation, which may be processed incorrectly.
This PR mainly supports:
1. Exporting query result sets concurrently.
2. The s3 protocol for query result set export.
There are several preconditions for concurrently exporting query result sets:
1. The concurrent-export variable is enabled.
2. The query itself can be exported concurrently
(queries containing sort nodes at the top level cannot be exported concurrently).
3. The export uses the s3 protocol instead of a broker.
After exporting the result set concurrently,
the file prefix is changed to `outfile_{query_instance_id}_filenumber.{file_format}`.
* Fix bugs with the string type:
1. String is not supported with agg type min/max.
2. agg_update with a large string may coredump.
3. stringval with a large string may coredump.
4. String is not supported as a partition key.
1. Add license/total-lines/release badges.
2. Add monthly active contributor and contributor growth graphs.
3. Fix a pom.xml bug.
4. Modify some routine load logs on the BE side.