This CL make bitmap_count, bitmap_union, and bitmap_union_count accept any expression whose return type is bitmap as input so that we can support flexible bitmap expression such as bitmap_count(bitmap_and(to_bitmap(1), to_bitmap(2))). This CL also create separate documentation for each bitmap UDF to conform with other functions.
4.4 KiB
BITMAP
Create table
The aggregation model needs to be used when creating the table. The data type is bitmap and the aggregation function is bitmap_union.
CREATE TABLE `pv_bitmap` (
`dt` int (11) NULL COMMENT" ",
`page` varchar (10) NULL COMMENT" ",
`user_id` bitmap BITMAP_UNION NULL COMMENT" "
) ENGINE = OLAP
AGGREGATE KEY (`dt`,` page`)
COMMENT "OLAP"
DISTRIBUTED BY HASH (`dt`) BUCKETS 2;
Note: When the amount of data is large, it is best to create a corresponding rollup table for high-frequency bitmap_union queries
ALTER TABLE pv_bitmap ADD ROLLUP pv (page, user_id);
Data Load
TO_BITMAP (expr): Convert 0 ~ 4294967295 unsigned int to bitmap
BITMAP_EMPTY (): Generate empty bitmap columns, used for insert or import to fill the default value
BITMAP_HASH (expr): Convert any type of column to a bitmap by hashing
Stream Load
cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = to_bitmap (user_id)" http: // host: 8410 / api / test / testDb / _stream_load
cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = bitmap_hash (user_id)" http: // host: 8410 / api / test / testDb / _stream_load
cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = bitmap_empty ()" http: // host: 8410 / api / test / testDb / _stream_load
Insert Into
id2's column type is bitmap
insert into bitmap_table1 select id, id2 from bitmap_table2;
id2's column type is bitmap
INSERT INTO bitmap_table1 (id, id2) VALUES (1001, to_bitmap (1000)), (1001, to_bitmap (2000));
id2's column type is bitmap
insert into bitmap_table1 select id, bitmap_union (id2) from bitmap_table2 group by id;
id2's column type is int
insert into bitmap_table1 select id, to_bitmap (id2) from table;
id2's column type is String
insert into bitmap_table1 select id, bitmap_hash (id_string) from table;
Data Query
Syntax
BITMAP_UNION (expr): Calculate the union of two Bitmaps. The return value is the new Bitmap value.
BITMAP_UNION_COUNT (expr): Calculate the cardinality of the union of two Bitmaps, equivalent to BITMAP_COUNT (BITMAP_UNION (expr)). It is recommended to use the BITMAP_UNION_COUNT function first, its performance is better than BITMAP_COUNT (BITMAP_UNION (expr)).
BITMAP_UNION_INT (expr): Count the number of different values in columns of type TINYINT, SMALLINT and INT, return the sum of COUNT (DISTINCT expr) same
INTERSECT_COUNT (bitmap_column_to_count, filter_column, filter_values ...): The calculation satisfies
filter_column The cardinality of the intersection of multiple bitmaps of the filter.
bitmap_column_to_count is a column of type bitmap, filter_column is a column of varying dimensions, and filter_values is a list of dimension values.
Example
The following SQL uses the pv_bitmap table above as an example:
Calculate the deduplication value for user_id:
select bitmap_union_count (user_id) from pv_bitmap;
select bitmap_count (bitmap_union (user_id)) from pv_bitmap;
Calculate the deduplication value of id:
select bitmap_union_int (id) from pv_bitmap;
Calculate the retention of user_id:
select intersect_count (user_id, page, 'meituan') as meituan_uv,
intersect_count (user_id, page, 'waimai') as waimai_uv,
intersect_count (user_id, page, 'meituan', 'waimai') as retention // Number of users appearing on both 'meituan' and 'waimai' pages
from pv_bitmap
where page in ('meituan', 'waimai');
keyword
BITMAP, BITMAP_COUNT, BITMAP_EMPTY, BITMAP_UNION, BITMAP_UNION_INT, TO_BITMAP, BITMAP_UNION_COUNT, INTERSECT_COUNT