I started a discussion on this earlier on the mailing list: https://lists.apache.org/thread/o770bc3k623kyfks2mzkt21qsc4g6328

To make it easier for everyone to reorganize the documentation, I created a new-docs directory under the incubator-doris directory, and the new directory structure lives there. So far I have only created the directories; the documents themselves still need to be rearranged.

For data import, to respect the reading habits of existing users, the import documentation is organized in two ways:

1. By usage scenario, which gives users clearer guidance. For example, if the data source is Kafka, the user can go straight to Routine Load.
2. By import method, which is the existing introduction to each of the import methods we provide.

To make local running and debugging easy, I migrated the entire .vuepress directory from the original docs. When the reorganization is complete, you only need to delete the original docs directory and rename new-docs to docs. In the meantime you can run the site locally, so you can see the contents of each document directory while you organize the documents.

For local debugging, switch to the new-docs directory and run:

```shell
npm install
npm run dev
```

Then open http://ip:port/zh-CN or http://ip:port/en in a browser.
| title | language |
|---|---|
| BITMAP_UNION | en |
BITMAP_UNION
Create table
Create the table with the Aggregate model: the column that stores the bitmap must have type BITMAP and aggregation function BITMAP_UNION.
```sql
CREATE TABLE `pv_bitmap` (
  `dt` int(11) NULL COMMENT "",
  `page` varchar(10) NULL COMMENT "",
  `user_id` bitmap BITMAP_UNION NULL COMMENT ""
) ENGINE = OLAP
AGGREGATE KEY(`dt`, `page`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`dt`) BUCKETS 2;
```
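To make the effect of the Aggregate model concrete, here is a minimal Python sketch, assuming a bitmap can be thought of as a set of unsigned integers. Rows that share the AGGREGATE KEY (`dt`, `page`) collapse into one row whose `user_id` bitmap is the union of the incoming bitmaps. The sample rows and dates are hypothetical, and this is only a conceptual model, not how Doris stores or merges bitmaps internally.

```python
from collections import defaultdict

# Hypothetical raw rows for pv_bitmap: (dt, page, user_id).
# Each user_id is loaded as a one-element bitmap, modeled here as a Python set.
raw_rows = [
    (20220101, "meituan", 1),
    (20220101, "meituan", 2),
    (20220101, "meituan", 1),  # same user again on the same key
    (20220101, "waimai", 2),
    (20220102, "meituan", 3),
]

def aggregate(rows):
    """Merge rows sharing the AGGREGATE KEY (dt, page) by unioning their bitmaps."""
    merged = defaultdict(set)
    for dt, page, user_id in rows:
        merged[(dt, page)] |= {user_id}  # BITMAP_UNION modeled as set union
    return dict(merged)

table = aggregate(raw_rows)
for key, bitmap in sorted(table.items()):
    print(key, sorted(bitmap))
# (20220101, 'meituan') [1, 2]
# (20220101, 'waimai') [2]
# (20220102, 'meituan') [3]
```

Note that the duplicate user on `(20220101, 'meituan')` is absorbed by the union, which is what makes the bitmap column usable for exact distinct counts.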
Note: when the data volume is large, it is best to create a rollup for high-frequency bitmap_union queries:
```sql
ALTER TABLE pv_bitmap ADD ROLLUP pv (page, user_id);
```
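Why a rollup on (page, user_id) helps can be sketched in a few lines of Python, again modeling bitmaps as sets. The rollup re-aggregates the base table on `page` alone, unioning bitmaps across all `dt` values, so a query grouped by page reads fewer, pre-merged rows. The base-table contents are hypothetical, and whether Doris actually routes a given query to the rollup is decided by its planner, which this sketch does not model.

```python
from collections import defaultdict

# Hypothetical contents of pv_bitmap after aggregation: (dt, page) -> set of user_ids.
base = {
    (20220101, "meituan"): {1, 2},
    (20220101, "waimai"): {2},
    (20220102, "meituan"): {2, 3},
}

def build_rollup(base_rows):
    """Re-aggregate on (page) only, unioning bitmaps across every dt value."""
    rollup = defaultdict(set)
    for (_dt, page), users in base_rows.items():
        rollup[page] |= users
    return dict(rollup)

pv = build_rollup(base)
print(len(base), "base rows ->", len(pv), "rollup rows")  # 3 base rows -> 2 rollup rows
print(sorted(pv["meituan"]))                              # [1, 2, 3]
```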
Data Load
TO_BITMAP(expr): converts an unsigned bigint in the range 0 to 18446744073709551615 into a bitmap.

BITMAP_EMPTY(): generates an empty bitmap; used to fill in a default value during INSERT or load.

BITMAP_HASH(expr): converts a column of any type into a bitmap by hashing its value.
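The three load-time functions can be modeled in Python as follows, treating a bitmap as a set. Two choices here are assumptions of the sketch rather than Doris behavior: out-of-range input to the `to_bitmap` model returns `None` (how Doris treats such values is not covered here), and `bitmap_hash` uses `zlib.crc32` purely as a stand-in 32-bit hash, not the function Doris uses. Either way, any 32-bit hash can map two different inputs to the same value, so distinct counts over hashed bitmaps can come out slightly low.

```python
import zlib

UINT64_MAX = 18446744073709551615  # upper bound accepted by TO_BITMAP, per the text above

def to_bitmap(value):
    """Model of TO_BITMAP: a one-element bitmap, or None (sketch's choice) if out of range."""
    if value is None or not 0 <= value <= UINT64_MAX:
        return None
    return {value}

def bitmap_empty():
    """Model of BITMAP_EMPTY: a bitmap with no elements."""
    return set()

def bitmap_hash(value):
    """Model of BITMAP_HASH: hash any value into a one-element bitmap.
    zlib.crc32 is a stand-in, NOT the hash function Doris uses."""
    return {zlib.crc32(str(value).encode("utf-8"))}

print(to_bitmap(42))  # {42}
print(to_bitmap(-1))  # None
```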
Stream Load
```shell
# user_id holds unsigned integers: store each value directly in the bitmap
cat data | curl --location-trusted -u user:passwd -T - -H "columns: dt, page, user_id, user_id = to_bitmap(user_id)" http://host:8410/api/test/testDb/_stream_load
# user_id holds values of another type: store a hash of each value
cat data | curl --location-trusted -u user:passwd -T - -H "columns: dt, page, user_id, user_id = bitmap_hash(user_id)" http://host:8410/api/test/testDb/_stream_load
# fill user_id with an empty bitmap
cat data | curl --location-trusted -u user:passwd -T - -H "columns: dt, page, user_id, user_id = bitmap_empty()" http://host:8410/api/test/testDb/_stream_load
```
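For readers loading from code rather than the shell, the sketch below builds (but does not send) the equivalent Stream Load request with Python's standard `urllib.request`: an HTTP PUT to `/api/{db}/{table}/_stream_load` with Basic auth and the same `columns` header as the curl commands above. The helper name `build_stream_load_request` and the sample body are hypothetical. Sending is deliberately left out: the curl flag `--location-trusted` re-sends the credentials when the request is redirected, and urllib's default redirect handling does not follow redirects for PUT, so a real client would have to handle that redirect itself.

```python
import base64
import urllib.request

def build_stream_load_request(host, port, db, table, user, password, columns, body):
    """Hypothetical helper: build a Stream Load PUT request mirroring the curl commands."""
    url = f"http://{host}:{port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",  # what curl -u user:passwd sends
        "columns": columns,                 # same value as curl -H "columns: ..."
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")

req = build_stream_load_request(
    "host", 8410, "test", "testDb", "user", "passwd",
    "dt, page, user_id, user_id = to_bitmap(user_id)",
    b"20220101\tmeituan\t1\n",  # hypothetical tab-separated row
)
print(req.get_method(), req.full_url)
print(req.get_header("Columns"))  # urllib stores header names capitalized
```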
Insert Into
```sql
-- id2 is a BITMAP column
insert into bitmap_table1 select id, id2 from bitmap_table2;

-- id2 is a BITMAP column
INSERT INTO bitmap_table1 (id, id2) VALUES (1001, to_bitmap(1000)), (1001, to_bitmap(2000));

-- id2 is a BITMAP column
insert into bitmap_table1 select id, bitmap_union(id2) from bitmap_table2 group by id;

-- id2 is an INT column
insert into bitmap_table1 select id, to_bitmap(id2) from table;

-- id_string is a STRING column
insert into bitmap_table1 select id, bitmap_hash(id_string) from table;
```
Data Query
Syntax
BITMAP_UNION(expr): an aggregate function that computes the union of all bitmaps in a group and returns the resulting bitmap.

BITMAP_UNION_COUNT(expr): computes the cardinality of that union; equivalent to BITMAP_COUNT(BITMAP_UNION(expr)). Prefer BITMAP_UNION_COUNT, since it performs better than BITMAP_COUNT(BITMAP_UNION(expr)).

BITMAP_UNION_INT(expr): counts the distinct values in a TINYINT, SMALLINT, or INT column; returns the same result as COUNT(DISTINCT expr).

INTERSECT_COUNT(bitmap_column_to_count, filter_column, filter_values ...): for each value in filter_values, merges the bitmaps of the rows whose filter_column equals that value, then returns the cardinality of the intersection of those merged bitmaps. bitmap_column_to_count is a BITMAP column, filter_column is the dimension column to filter on, and filter_values is a list of values of that dimension.
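The union-family functions described above can be modeled in Python with sets standing in for bitmaps. The sketch shows only the results: BITMAP_UNION_COUNT returns the same number as BITMAP_COUNT(BITMAP_UNION(expr)), and BITMAP_UNION_INT matches COUNT(DISTINCT expr). Its performance advantage comes from Doris's internal execution and is not something this model can show. The function names mirror the SQL functions but are hypothetical Python helpers, and the `bitmap_union_int` model assumes a column with no NULLs.

```python
from functools import reduce

def bitmap_union(bitmaps):
    """Model of aggregate BITMAP_UNION: the union of every bitmap in the group."""
    return reduce(lambda acc, b: acc | b, bitmaps, set())

def bitmap_count(bitmap):
    """Model of BITMAP_COUNT: the number of elements in a bitmap."""
    return len(bitmap)

def bitmap_union_count(bitmaps):
    """Model of BITMAP_UNION_COUNT: same result as BITMAP_COUNT(BITMAP_UNION(expr))."""
    return bitmap_count(bitmap_union(bitmaps))

def bitmap_union_int(values):
    """Model of BITMAP_UNION_INT on an integer column without NULLs:
    same result as COUNT(DISTINCT expr)."""
    return len(set(values))

group = [{1, 2}, {2}, {2, 3}]
print(sorted(bitmap_union(group)))  # [1, 2, 3]
print(bitmap_union_count(group))    # 3
print(bitmap_union_int([7, 7, 8]))  # 2
```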
Example
The following SQL uses the pv_bitmap table above as an example:
Count the distinct user_id values (both queries return the same result):
```sql
select bitmap_union_count(user_id) from pv_bitmap;
select bitmap_count(bitmap_union(user_id)) from pv_bitmap;
```
Count the distinct values of the INT column dt:
```sql
select bitmap_union_int(dt) from pv_bitmap;
```
Calculate user_id retention across the two pages:
```sql
select intersect_count(user_id, page, 'meituan') as meituan_uv,
       intersect_count(user_id, page, 'waimai') as waimai_uv,
       -- number of users who appear on both the 'meituan' and 'waimai' pages
       intersect_count(user_id, page, 'meituan', 'waimai') as retention
from pv_bitmap
where page in ('meituan', 'waimai');
```
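The retention query can be traced by hand with a small Python model of INTERSECT_COUNT, again treating bitmaps as sets. For each filter value, the model unions the bitmaps of the rows whose `page` equals that value, then counts the intersection of those unions. The rows are hypothetical sample data for `pv_bitmap`, and the model assumes at least one filter value is given.

```python
from functools import reduce

# Hypothetical pv_bitmap rows after aggregation: (dt, page, user_id bitmap).
pv_bitmap = [
    (20220101, "meituan", {1, 2, 3}),
    (20220101, "waimai", {2, 4}),
    (20220102, "meituan", {5}),
    (20220102, "waimai", {3, 5, 6}),
]

def intersect_count(rows, filter_values):
    """Model of INTERSECT_COUNT(user_id, page, v1, v2, ...):
    union the bitmaps per page value, then count the intersection of those unions."""
    per_value = []
    for value in filter_values:
        merged = set()
        for _dt, page, users in rows:
            if page == value:
                merged |= users
        per_value.append(merged)
    return len(reduce(set.intersection, per_value))

meituan_uv = intersect_count(pv_bitmap, ["meituan"])           # users {1, 2, 3, 5}
waimai_uv = intersect_count(pv_bitmap, ["waimai"])             # users {2, 3, 4, 5, 6}
retention = intersect_count(pv_bitmap, ["meituan", "waimai"])  # users {2, 3, 5}
print(meituan_uv, waimai_uv, retention)  # 4 5 3
```

Note that user 5 counts toward retention even though the 'meituan' and 'waimai' visits fall on different dates, because the bitmaps are merged across all rows for each page before intersecting.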
keyword
BITMAP, BITMAP_COUNT, BITMAP_EMPTY, BITMAP_UNION, BITMAP_UNION_INT, TO_BITMAP, BITMAP_UNION_COUNT, INTERSECT_COUNT