Support GROUPING SETS, ROLLUP and CUBE to extend the GROUP BY statement
Support the GROUPING SETS syntax:
```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) );
```
CUBE or ROLLUP can be used like:
```
SELECT a, b, c, SUM( d ) FROM tab1 GROUP BY ROLLUP(a, b, c);
SELECT a, b, c, SUM( d ) FROM tab1 GROUP BY CUBE(a, b, c);
```
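As a rough illustration of how ROLLUP and CUBE relate to GROUPING SETS, the sketch below expands a column list into the equivalent grouping sets under standard SQL semantics. The helper names `rollup` and `cube` are hypothetical and are not taken from the Doris code base.

```python
from itertools import combinations


def rollup(cols):
    """ROLLUP(a, b, c) -> (a, b, c), (a, b), (a), ()."""
    # Each grouping set is a prefix of the column list, longest first.
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]


def cube(cols):
    """CUBE(a, b) -> (a, b), (a), (b), (): every subset of the columns."""
    return [combo
            for n in range(len(cols), -1, -1)
            for combo in combinations(cols, n)]


print(rollup(["a", "b", "c"]))
print(cube(["a", "b"]))
```

Under this reading, `CUBE(a, b)` yields the same grouping sets as the explicit `GROUPING SETS ( (a, b), (a), (b), ( ) )` statement above.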
[ADD] support grouping functions in expressions, like grouping(a) + grouping(b) (#2039)
[FIX] fix analyzer error in window function (#2039)
Fixes #2771
Main changes in this CL:
* RoaringBitmap is renamed to BitmapValue and moved into bitmap_value.h
* Roaring64Map is leveraged to support unsigned BIGINT for the BITMAP type
* Two new formats (SINGLE64 and BITMAP64) are introduced for the BITMAP type
So far we have three storage formats for the BITMAP type:
```
EMPTY := TypeCode(0x00)
SINGLE32 := TypeCode(0x01), UInt32LittleEndian
BITMAP32 := TypeCode(0x02), RoaringBitmap(defined by https://github.com/RoaringBitmap/RoaringFormatSpec/)
```
In order to support BIGINT elements and keep backward compatibility, two new formats are introduced:
```
SINGLE64 := TypeCode(0x03), UInt64LittleEndian
BITMAP64 := TypeCode(0x04), CustomRoaringBitmap64
```
Please note that SINGLE64/BITMAP64 don't replace SINGLE32/BITMAP32. Doris chooses the smaller type (in terms of space) automatically during serialization. For example, BITMAP32 is preferred over BITMAP64 when the maximum element is <= UINT32_MAX. This also makes BE rollback possible as long as the user hasn't written an element larger than UINT32_MAX into a bitmap column.
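The selection rule can be sketched as follows. This is a minimal Python illustration, not the actual C++ implementation: the function names are hypothetical, and it assumes that TypeCode is a single leading byte. Only the single-element formats are serialized here, since the Roaring payloads are out of scope for a short sketch.

```python
UINT32_MAX = 2**32 - 1

# Type codes as listed in the format definitions above.
EMPTY, SINGLE32, BITMAP32, SINGLE64, BITMAP64 = 0x00, 0x01, 0x02, 0x03, 0x04


def choose_type_code(values):
    """Pick the smallest format able to hold every element of `values`."""
    values = set(values)
    if not values:
        return EMPTY
    fits_32 = max(values) <= UINT32_MAX
    if len(values) == 1:
        return SINGLE32 if fits_32 else SINGLE64
    return BITMAP32 if fits_32 else BITMAP64


def serialize_single(value):
    """Serialize a one-element bitmap: type code byte + little-endian integer.

    Assumption: the type code occupies exactly one byte.
    """
    if value <= UINT32_MAX:
        return bytes([SINGLE32]) + value.to_bytes(4, "little")
    return bytes([SINGLE64]) + value.to_bytes(8, "little")


print(choose_type_code([1, 2, 3]))
print(serialize_single(1).hex())
```

Because the 32-bit formats are always chosen when they suffice, data written without any element above UINT32_MAX stays readable by an older BE that only knows type codes 0x00 to 0x02.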
Another important design decision is that we fork and maintain our own version of Roaring64Map instead of using the one in "roaring/roaring64map.hh". The reasons are:
1. RoaringBitmap doesn't define a standard binary format for 64-bit bitmaps. As a result, different implementations of Roaring64Map use different formats. For example, the [C++ version](https://github.com/RoaringBitmap/CRoaring/blob/v0.2.60/cpp/roaring64map.hh#L545) differs from the [Java version](35104c564e/src/main/java/org/roaringbitmap/longlong/Roaring64NavigableMap.java (L1097)). Even for CRoaring, the format may change in future releases. However, Doris requires the serialized format to be stable across versions. Forking is a safe way to achieve this.
2. We may want to change Roaring64Map according to our needs. For example, in order to use the BITMAP32 format when the maximum element can be represented in 32 bits, we may need to access the private members of Roaring64Map. Another example is that we may want to further customize and optimize the format for the BITMAP64 case, such as using vint64 instead of uint64 for the map size.
Standardize the return results of INSERT operations,
so that they are easier for users to consume and to use when locating problems.
More details can be found in insert-into-manual.md.
This CL changes:
1. Add functions bitmap_to_string and bitmap_from_string, which
convert a bitmap to/from a string that contains all the bits set in the bitmap
2. Add function murmur_hash3_32, which computes the murmur hash of
the input strings
3. Make casting a float to a string follow the same logic as the
result returned to the user
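To make the intent of the two bitmap string functions concrete, here is a small Python sketch that uses a `set` of integers as a stand-in for a bitmap. The comma-separated, ascending text format is an assumption made for illustration; the commit message only states that the string contains all the bits in the bitmap.

```python
def bitmap_to_string(bitmap):
    """Render every element of the bitmap as text (assumed: comma-separated, ascending)."""
    return ",".join(str(v) for v in sorted(bitmap))


def bitmap_from_string(text):
    """Parse the text form back into a bitmap; an empty string gives an empty bitmap."""
    if not text.strip():
        return set()
    return {int(part) for part in text.split(",")}


bits = {3, 1, 2}
text = bitmap_to_string(bits)
print(text)
print(bitmap_from_string(text) == bits)
```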
This CL makes bitmap_count, bitmap_union, and bitmap_union_count accept any expression whose return type is bitmap as input, so that we can support flexible bitmap expressions such as bitmap_count(bitmap_and(to_bitmap(1), to_bitmap(2))).
This CL also creates separate documentation for each bitmap UDF, to be consistent with other functions.
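The composition mentioned above can be modelled in a few lines of Python, again with a `set` standing in for a bitmap. The functions below only mirror the names of the SQL functions and are not the Doris implementation.

```python
def to_bitmap(value):
    """Build a bitmap holding a single element."""
    return {value}


def bitmap_and(left, right):
    """Intersection of two bitmaps."""
    return left & right


def bitmap_count(bitmap):
    """Number of elements in the bitmap."""
    return len(bitmap)


# Any expression that evaluates to a bitmap can be passed to bitmap_count.
print(bitmap_count(bitmap_and(to_bitmap(1), to_bitmap(2))))
print(bitmap_count(bitmap_and(to_bitmap(1), to_bitmap(1))))
```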
This commit adds a new statement, ALTER VIEW, for example:
ALTER VIEW view_name
(
col_1,
col_2,
col_3
)
AS SELECT k1, k2, SUM(v1) FROM exampleDb.testTbl GROUP BY k1,k2
**Authorization checking logic**
The current password and permission checking logic has some problems. For example:
First, we create a user by:
`create user cmy@"%" identified by "12345";`
Then 'cmy' can log in with password '12345' from any host.
Second, we create another user by:
`create user cmy@"192.168.%" identified by "abcde";`
Because "192.168.%" has a higher priority in the permission table than "%", when "cmy" tries
to log in with password "12345" from host "192.168.1.1", the login should match the second permission
entry and be rejected because the password is invalid.
But in the current implementation, Doris goes on to check the password against the first entry and then lets the login pass. So we should change it.
**Permission checking logic**
After a user logs in, the user should have a unique identity, which is taken from the permission table. For example,
when "cmy" logs in from host "192.168.1.1", the identity should be `cmy@"192.168.%"`. Doris
should use this identity to check other permissions, rather than the user's real identity, which is
`cmy@"192.168.1.1"`.
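The intended behaviour can be sketched as below. This is a simplified Python model, not the Doris implementation: passwords are compared in plain text, `%` is treated as a wildcard for any run of characters, and priority is approximated by the number of literal characters in the host pattern. All three points are assumptions made for the example.

```python
import re


def host_pattern_to_regex(pattern):
    """Translate a host pattern in which '%' matches any run of characters."""
    return re.compile("^" + ".*".join(re.escape(p) for p in pattern.split("%")) + "$")


def specificity(pattern):
    """Assumed priority rule: more literal characters means higher priority."""
    return len(pattern.replace("%", ""))


def login(entries, user, host, password):
    """Return the matched identity, or None if the login is rejected.

    Only the highest-priority matching entry is consulted; a wrong password
    there rejects the login instead of falling through to a broader entry.
    """
    candidates = [e for e in entries
                  if e["user"] == user and host_pattern_to_regex(e["host"]).match(host)]
    if not candidates:
        return None
    best = max(candidates, key=lambda e: specificity(e["host"]))
    if best["password"] != password:
        return None
    return f'{best["user"]}@"{best["host"]}"'


entries = [
    {"user": "cmy", "host": "%", "password": "12345"},
    {"user": "cmy", "host": "192.168.%", "password": "abcde"},
]
print(login(entries, "cmy", "192.168.1.1", "12345"))
print(login(entries, "cmy", "192.168.1.1", "abcde"))
```

The returned string is the identity from the permission table, which is what later permission checks would use in place of the real `user@host` pair.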
**Black list**
Functionally speaking, Doris only supports adding a WHITE LIST, which allows a user to log in from
the hosts in the white list. But in some cases, we do need a BLACK LIST function.
Fortunately, by changing the logic described above, we can simulate the effect of a BLACK LIST.
For example, first we add a user by:
`create user cmy@'%' identified by '12345';`
Now user 'cmy' can log in from any host. If we don't want 'cmy' to log in from host A, we
can add a new user by:
`create user cmy@'A' identified by 'other_passwd';`
Because "A" has a higher priority in the permission table than "%", if 'cmy' tries to log in from A using password '12345', the login will be rejected.
Some users require that only some of the columns be updated in
one load operation, while the others keep their original values. However, Doris
can't handle this situation, because the user must specify a value for all
columns. So if a column's aggregation method is REPLACE, the user must query
the original value in order to overwrite the column with it. This often means extra work
for the user.
If this CL is applied, the user can use REPLACE_IF_NOT_NULL instead of
REPLACE. Then, when loading data into the table, a user who doesn't intend to change
the value of this column can specify NULL for it. Doris will
retain the original value for this column.
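The difference between the two aggregation methods can be sketched as follows. This is an illustrative Python model in which `None` plays the role of NULL; the `load` helper and the column names are hypothetical.

```python
def replace(old, new):
    """REPLACE: the newly loaded value always wins."""
    return new


def replace_if_not_null(old, new):
    """REPLACE_IF_NOT_NULL: keep the original value when the new one is NULL."""
    return old if new is None else new


def load(table, key, row, aggregators):
    """Merge one loaded row into the table using each column's aggregation method."""
    existing = table.get(key)
    if existing is None:
        table[key] = dict(row)
        return
    for col, value in row.items():
        existing[col] = aggregators[col](existing[col], value)


aggregators = {"city": replace_if_not_null, "score": replace}
table = {}
load(table, 1, {"city": "Beijing", "score": 10}, aggregators)
# The second load only wants to update "score", so it passes NULL for "city".
load(table, 1, {"city": None, "score": 20}, aggregators)
print(table[1])
```

With plain REPLACE on `city`, the second load would have overwritten the original value with NULL.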
Add a new type: Object. Currently, it is mainly for complex aggregate metrics (HLL, Bitmap).
The Object type has the following constraints:
1. The Object type cannot be used as a key column type
2. The Object type doesn't support any indices (BloomFilter, short key, zone map, inverted index)
3. The Object type doesn't support filter and group by
In the implementation:
The Object type reuses StringValue and StringVal, because in the storage engine the Object type is binary data with a pointer and a length.
The prepare/close step of scalar functions is already supported in the execution framework. All we need to do is support it in the syntax and metadata in the frontend.
In addition, scalar functions of the 'Hive' binary type do NOT support the prepare/close step, so we need to add that support.
Random distribution has not been supported since version 0.9,
so we need a way to convert random distribution to hash distribution:
ALTER TABLE db.tbl SET ("distribution_type" = "hash");