Files
doris/docs/en/sql-reference/sql-functions/aggregate-functions/bitmap.md

4.5 KiB

title, language
title language
BITMAP en

BITMAP

Create table

The aggregation model needs to be used when creating the table. The data type is bitmap and the aggregation function is bitmap_union.

CREATE TABLE `pv_bitmap` (
  `dt` int (11) NULL COMMENT" ",
  `page` varchar (10) NULL COMMENT" ",
  `user_id` bitmap BITMAP_UNION NULL COMMENT" "
) ENGINE = OLAP
AGGREGATE KEY (`dt`,` page`)
COMMENT "OLAP"
DISTRIBUTED BY HASH (`dt`) BUCKETS 2;

Note: When the amount of data is large, it is best to create a corresponding rollup table for high-frequency bitmap_union queries

ALTER TABLE pv_bitmap ADD ROLLUP pv (page, user_id);

Data Load

TO_BITMAP (expr): Convert 0 ~ 18446744073709551615 unsigned bigint to bitmap

BITMAP_EMPTY (): Generate empty bitmap columns, used for insert or import to fill the default value

BITMAP_HASH (expr): Convert any type of column to a bitmap by hashing

Stream Load

cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = to_bitmap (user_id)" http: // host: 8410 / api / test / testDb / _stream_load
cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = bitmap_hash (user_id)" http: // host: 8410 / api / test / testDb / _stream_load
cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = bitmap_empty ()" http: // host: 8410 / api / test / testDb / _stream_load

Insert Into

id2's column type is bitmap

insert into bitmap_table1 select id, id2 from bitmap_table2;

id2's column type is bitmap

INSERT INTO bitmap_table1 (id, id2) VALUES (1001, to_bitmap (1000)), (1001, to_bitmap (2000));

id2's column type is bitmap

insert into bitmap_table1 select id, bitmap_union (id2) from bitmap_table2 group by id;

id2's column type is int

insert into bitmap_table1 select id, to_bitmap (id2) from table;

id2's column type is String

insert into bitmap_table1 select id, bitmap_hash (id_string) from table;

Data Query

Syntax

BITMAP_UNION (expr): Calculate the union of two Bitmaps. The return value is the new Bitmap value.

BITMAP_UNION_COUNT (expr): Calculate the cardinality of the union of two Bitmaps, equivalent to BITMAP_COUNT (BITMAP_UNION (expr)). It is recommended to use the BITMAP_UNION_COUNT function first, its performance is better than BITMAP_COUNT (BITMAP_UNION (expr)).

BITMAP_UNION_INT (expr): Count the number of different values ​​in columns of type TINYINT, SMALLINT and INT, return the sum of COUNT (DISTINCT expr) same

INTERSECT_COUNT (bitmap_column_to_count, filter_column, filter_values ​​...): The calculation satisfies filter_column The cardinality of the intersection of multiple bitmaps of the filter. bitmap_column_to_count is a column of type bitmap, filter_column is a column of varying dimensions, and filter_values ​​is a list of dimension values.

Example

The following SQL uses the pv_bitmap table above as an example:

Calculate the deduplication value for user_id:

select bitmap_union_count (user_id) from pv_bitmap;

select bitmap_count (bitmap_union (user_id)) from pv_bitmap;

Calculate the deduplication value of id:

select bitmap_union_int (id) from pv_bitmap;

Calculate the retention of user_id:

select intersect_count (user_id, page, 'meituan') as meituan_uv,
intersect_count (user_id, page, 'waimai') as waimai_uv,
intersect_count (user_id, page, 'meituan', 'waimai') as retention // Number of users appearing on both 'meituan' and 'waimai' pages
from pv_bitmap
where page in ('meituan', 'waimai');

keyword

BITMAP, BITMAP_COUNT, BITMAP_EMPTY, BITMAP_UNION, BITMAP_UNION_INT, TO_BITMAP, BITMAP_UNION_COUNT, INTERSECT_COUNT