MemTracker can provide memory consumption for us to find out which
module consume more memory, but it's just a current value, this patch
add metrics for some large memory consumers, then we can find out
which module consume more memory in timeline, it would be useful to
troubleshoot OOM problems and optimize configs.
* update gcc to gcc 10 and support c++17
update brpc to 0.9.7
update boost to 1.73
remove third-party boost 1.54 for mysql
* update cmake version
* ignore jdk version
* remove unused patch
* avoid use SYS_getrandom call
At present, the application of vlog in the code is quite confusing.
It is inherited from impala VLOG_XX format, and there is also VLOG(number) format.
VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG
There are some long loops and sleeps in unit tests, it will cost a
very long time to run all unit tests, especially run in TSAN mode.
This patch speed up unit tests by shortening long loops and sleeps,
on my environment all unit tests finished in 1 minite. It's useful
to do basic functional unit tests.
You can switch to run in this mode by adding a new environment variable
'DORIS_ALLOW_SLOW_TESTS'. For example, you can set:
export DORIS_ALLOW_SLOW_TESTS=1
and also you can disable it by setting:
export DORIS_ALLOW_SLOW_TESTS=0
For #4674
This is a udaf for approximate topn using Space-Saving algorithm. At present, we can only calculate
the frequent items and their frequencies in a certain column, based on which we can implement similar
topN functions supported by Kylin in the future.
I have also added a test to calculate the accuracy of this algorithm. The following is a rough running result.
The total amount of data is 1 million lines and follows the Zipfian distribution, where Element Cardinality
represents the data cardinality, 20X, 50X.. The value representing space_expand_rate is 20,50, which is
used to set the counter number in the space-saving algorithm
```
zf exponent = 0.5
Element cardinality 20X 50X 100X
1000 100% 100% 100%
10000 100% 100% 100%
100000 100% 100% 100%
500000 94% 98% 99%
zf exponent = 0.6,1
Element cardinality 20X 50X 100X
1000 100% 100% 100%
10000 100% 100% 100%
100000 100% 100% 100%
500000 100% 100% 100%
```
#4619
Add time_round functions that provides `time_floor` & `time_ceil` at each time unit.
Fix two related bugs.
- #4618
- Fix `struct TimeInterval` to use `int64_t` instead of `int32_t`, in case when the second diff overflow
Use static local variable instead of create it every calls.
Time cost of the new added unit benchmark test could reduce
from about 60 seconds to 10 seconds.
The parameter 'part' of parse_url function does not support lower case, and parse protocol not right.
And This function does not support parse 'port'.
This PR tries to make parse_url function case insensitive and support parse 'port'.
The issue: #4451
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `.
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
replace is an user defined function, which is to replace all old substrings with a new substring in a string, as follow:
mysql> select replace("http://www.baidu.com:9090", "9090", "");
+------------------------------------------------------+
| replace('http://www.baidu.com:9090', '9090', '') |
+------------------------------------------------------+
| http://www.baidu.com: |
+------------------------------------------------------+
1. Rename run-ut.sh to run-be-ut.sh
2. Find all test files from build dir instead of declaring separately in the script
3. Add gtest output to collect the result of unit test.
from/to_base64 may return incorrect value when the value is null #4130
remove the duplicated base64 code
fix the base64 encoded string length is wrong, and this will cause the memory error
* Support bitmap_intersect
Support aggregate function Bitmap Intersect, it is mainly used to take intersection of grouped data.
The function 'bitmap_intersect(expr)' calculates the intersection of bitmap columns and returns a bitmap object.
The defination is following:
FunctionName: bitmap_intersect,
InputType: bitmap,
OutputType: bitmap
The scenario is as follows:
Query which users satisfy the three tags a, b, and c at the same time.
```
select bitmap_to_string(bitmap_intersect(user_id)) from
(
select bitmap_union(user_id) user_id from bitmap_intersect_test
where tag in ('a', 'b', 'c')
group by tag
) a
```
Closed#3552.
* Add docs of bitmap_union and bitmap_intersect
* Support null of bitmap_intersect
LSAN detected errors have been fixed by a prior pathch (#3326), but
there are still some ASAN detected errors.
This patch try to fix these errors to make Doris BE more robustness.
And then we can add CI run in LSAN/ASAN mode to detect memory errors
as early as possible.
In the past, when we want to modify some BE configs, we have to modify be.conf and then restart BE.
This patch provides a way to modify configs in the type of 'threshold', 'interval', 'enable flag'
when BE is running without restarting it.
You can update a single config once by BE's http API: `be_host:be_http_port/api/update_config?config_name=new_value`
Fixes#2771
Main changes in this CL
* RoaringBitmap is renamed to BitmapValue and moved into bitmap_value.h
* leveraging Roaring64Map to support unsigned BIGINT for BITMAP type
* introduces two new format (SINGLE64 and BITMAP64) for BITMAP type
So far we have three storage format for BITMAP type
```
EMPTY := TypeCode(0x00)
SINGLE32 := TypeCode(0x01), UInt32LittleEndian
BITMAP32 := TypeCode(0x02), RoaringBitmap(defined by https://github.com/RoaringBitmap/RoaringFormatSpec/)
```
In order to support BIGINT element and keep backward compatibility, introduce two new format
```
SINGLE64 := TypeCode(0x03), UInt64LittleEndian
BITMAP64 := TypeCode(0x04), CustomRoaringBitmap64
```
Please note that SINGLE64/BITMAP64 doesn't replace SINGLE32/BITMAP32. Doris will choose the smaller (in terms of space) type automatically during serializing. For example, BITMAP32 is preferred over BITMAP64 when the maximum element is <= UINT32_MAX. This will also make BE rollback possible as long as user didn't write element larger than UINT32_MAX into bitmap column.
Another important design decision is that we fork and maintain our own version of Roaring64Map instead of using the one in "roaring/roaring64map.hh". The reasons are
1. RoaringBitmap doesn't define a standard for the binary format of 64-bits bitmap. As a result, different implementations of Roaring64Map use different format. For example the [C++ version](https://github.com/RoaringBitmap/CRoaring/blob/v0.2.60/cpp/roaring64map.hh#L545) is different from the [Java version](35104c564e/src/main/java/org/roaringbitmap/longlong/Roaring64NavigableMap.java (L1097)). Even for CRoaring, the format may change in future releases. However Doris require the serialized format to be stable across versions. Fork is a safe way to achieve this.
2. We may want to make some code changes to Roaring64Map according to our needs. For example, in order to use the BITMAP32 format when the maximum element can be represented in 32 bits, we may want to access the private member of Roaring64Map. Another example is we want to further customize and optimize the format for BITMAP64 case, such as using vint64 instead of uint64 for map size.