146 lines
4.5 KiB
Markdown
146 lines
4.5 KiB
Markdown
---
|
|
{
|
|
"title": "BITMAP",
|
|
"language": "en"
|
|
}
|
|
---
|
|
|
|
<!--
|
|
Licensed to the Apache Software Foundation (ASF) under one
|
|
or more contributor license agreements. See the NOTICE file
|
|
distributed with this work for additional information
|
|
regarding copyright ownership. The ASF licenses this file
|
|
to you under the Apache License, Version 2.0 (the
|
|
"License"); you may not use this file except in compliance
|
|
with the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing,
|
|
software distributed under the License is distributed on an
|
|
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
|
KIND, either express or implied. See the License for the
|
|
specific language governing permissions and limitations
|
|
under the License.
|
|
-->
|
|
|
|
|
|
# BITMAP
|
|
|
|
## Create table
|
|
|
|
The aggregation model needs to be used when creating the table. The data type is bitmap and the aggregation function is bitmap_union.
|
|
```
|
|
CREATE TABLE `pv_bitmap` (
|
|
`dt` int (11) NULL COMMENT" ",
|
|
`page` varchar (10) NULL COMMENT" ",
|
|
`user_id` bitmap BITMAP_UNION NULL COMMENT" "
|
|
) ENGINE = OLAP
|
|
AGGREGATE KEY (`dt`,` page`)
|
|
COMMENT "OLAP"
|
|
DISTRIBUTED BY HASH (`dt`) BUCKETS 2;
|
|
```
|
|
|
|
Note: When the amount of data is large, it is best to create a corresponding rollup table for high-frequency bitmap_union queries
|
|
|
|
```
|
|
ALTER TABLE pv_bitmap ADD ROLLUP pv (page, user_id);
|
|
```
|
|
|
|
## Data Load
|
|
|
|
`TO_BITMAP (expr)`: Convert 0 ~ 18446744073709551615 unsigned bigint to bitmap
|
|
|
|
`BITMAP_EMPTY ()`: Generate empty bitmap columns, used for insert or import to fill the default value
|
|
|
|
`BITMAP_HASH (expr)`: Convert any type of column to a bitmap by hashing
|
|
|
|
### Stream Load
|
|
|
|
```
|
|
cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = to_bitmap (user_id)" http: // host: 8410 / api / test / testDb / _stream_load
|
|
```
|
|
|
|
```
|
|
cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = bitmap_hash (user_id)" http: // host: 8410 / api / test / testDb / _stream_load
|
|
```
|
|
|
|
```
|
|
cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = bitmap_empty ()" http: // host: 8410 / api / test / testDb / _stream_load
|
|
```
|
|
|
|
### Insert Into
|
|
|
|
id2's column type is bitmap
|
|
```
|
|
insert into bitmap_table1 select id, id2 from bitmap_table2;
|
|
```
|
|
|
|
id2's column type is bitmap
|
|
```
|
|
INSERT INTO bitmap_table1 (id, id2) VALUES (1001, to_bitmap (1000)), (1001, to_bitmap (2000));
|
|
```
|
|
|
|
id2's column type is bitmap
|
|
```
|
|
insert into bitmap_table1 select id, bitmap_union (id2) from bitmap_table2 group by id;
|
|
```
|
|
|
|
id2's column type is int
|
|
```
|
|
insert into bitmap_table1 select id, to_bitmap (id2) from table;
|
|
```
|
|
|
|
id2's column type is String
|
|
```
|
|
insert into bitmap_table1 select id, bitmap_hash (id_string) from table;
|
|
```
|
|
|
|
|
|
## Data Query
|
|
|
|
### Syntax
|
|
|
|
|
|
`BITMAP_UNION (expr)`: Calculate the union of two Bitmaps. The return value is the new Bitmap value.
|
|
|
|
`BITMAP_UNION_COUNT (expr)`: Calculate the cardinality of the union of two Bitmaps, equivalent to BITMAP_COUNT (BITMAP_UNION (expr)). It is recommended to use the BITMAP_UNION_COUNT function first, its performance is better than BITMAP_COUNT (BITMAP_UNION (expr)).
|
|
|
|
`BITMAP_UNION_INT (expr)`: Count the number of different values in columns of type TINYINT, SMALLINT and INT, return the sum of COUNT (DISTINCT expr) same
|
|
|
|
`INTERSECT_COUNT (bitmap_column_to_count, filter_column, filter_values ...)`: The calculation satisfies
|
|
filter_column The cardinality of the intersection of multiple bitmaps of the filter.
|
|
bitmap_column_to_count is a column of type bitmap, filter_column is a column of varying dimensions, and filter_values is a list of dimension values.
|
|
|
|
### Example
|
|
|
|
The following SQL uses the pv_bitmap table above as an example:
|
|
|
|
Calculate the deduplication value for user_id:
|
|
|
|
```
|
|
select bitmap_union_count (user_id) from pv_bitmap;
|
|
|
|
select bitmap_count (bitmap_union (user_id)) from pv_bitmap;
|
|
```
|
|
|
|
Calculate the deduplication value of id:
|
|
|
|
```
|
|
select bitmap_union_int (id) from pv_bitmap;
|
|
```
|
|
|
|
Calculate the retention of user_id:
|
|
|
|
```
|
|
select intersect_count (user_id, page, 'meituan') as meituan_uv,
|
|
intersect_count (user_id, page, 'waimai') as waimai_uv,
|
|
intersect_count (user_id, page, 'meituan', 'waimai') as retention // Number of users appearing on both 'meituan' and 'waimai' pages
|
|
from pv_bitmap
|
|
where page in ('meituan', 'waimai');
|
|
```
|
|
|
|
|
|
## keyword
|
|
|
|
BITMAP, BITMAP_COUNT, BITMAP_EMPTY, BITMAP_UNION, BITMAP_UNION_INT, TO_BITMAP, BITMAP_UNION_COUNT, INTERSECT_COUNT |