---
{
"title": "BITMAP",
"language": "en"
}
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# BITMAP
## Create table
A table that stores bitmaps must be created with the aggregate model. The column's data type is `bitmap` and its aggregation function is `BITMAP_UNION`.
```
CREATE TABLE `pv_bitmap` (
  `dt` int(11) NULL COMMENT "",
  `page` varchar(10) NULL COMMENT "",
  `user_id` bitmap BITMAP_UNION NULL COMMENT ""
) ENGINE = OLAP
AGGREGATE KEY (`dt`, `page`)
COMMENT "OLAP"
DISTRIBUTED BY HASH (`dt`) BUCKETS 2;
```
Note: When the data volume is large, it is best to create a rollup for frequently run bitmap_union queries:
```
ALTER TABLE pv_bitmap ADD ROLLUP pv (page, user_id);
```
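
To make the aggregation semantics concrete, the following Python sketch models a bitmap as a `set` of integers and merges rows that share the same key columns by set union, which is conceptually what `BITMAP_UNION` does for the `user_id` column. The sample rows and the helper name `aggregate` are hypothetical, and the real storage engine uses compressed bitmaps rather than Python sets. The second call groups by `page` only, mirroring the `pv` rollup.

```python
from collections import defaultdict


def aggregate(rows, key_columns):
    """Merge rows that share the same key by unioning their user_id sets."""
    merged = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        merged[key] |= row["user_id"]  # set union stands in for BITMAP_UNION
    return dict(merged)


# Hypothetical input rows; each user_id is a one-element "bitmap".
rows = [
    {"dt": 20200101, "page": "meituan", "user_id": {1}},
    {"dt": 20200101, "page": "meituan", "user_id": {2}},
    {"dt": 20200101, "page": "meituan", "user_id": {1}},
    {"dt": 20200101, "page": "waimai", "user_id": {2}},
    {"dt": 20200102, "page": "meituan", "user_id": {3}},
]

base = aggregate(rows, ["dt", "page"])  # models AGGREGATE KEY (dt, page)
rollup = aggregate(rows, ["page"])      # models the rollup pv (page, user_id)

print(base[(20200101, "meituan")])  # {1, 2}
print(len(rollup[("meituan",)]))    # 3
```

Because the rollup holds one pre-merged bitmap per `page`, a query grouped by `page` has far fewer bitmaps to union than it would against the base table.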
## Data Load
`TO_BITMAP(expr)`: Converts an unsigned bigint in the range 0 ~ 18446744073709551615 to a bitmap.

`BITMAP_EMPTY()`: Generates an empty bitmap; used to fill in a default value during insert or import.

`BITMAP_HASH(expr)`: Converts a column of any type to a bitmap by hashing its value.
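
As a rough illustration of what these three functions produce, the sketch below again models a bitmap as a Python `set`. It rests on assumptions: `zlib.crc32` is only a stand-in for the hash Doris uses internally, and raising `ValueError` for out-of-range input is a choice made for the sketch, not documented Doris behavior.

```python
import zlib

UINT64_MAX = 18446744073709551615


def to_bitmap(value):
    """Model TO_BITMAP: a one-element bitmap holding an unsigned 64-bit integer."""
    if not 0 <= value <= UINT64_MAX:
        # Assumption for this sketch only; Doris may handle this differently.
        raise ValueError("value is outside the unsigned bigint range")
    return {value}


def bitmap_empty():
    """Model BITMAP_EMPTY: a bitmap with no elements."""
    return set()


def bitmap_hash(value):
    """Model BITMAP_HASH: hash any value to an integer and wrap it in a bitmap.

    zlib.crc32 is a stand-in hash, not the function Doris uses.
    """
    return {zlib.crc32(str(value).encode("utf-8"))}


print(to_bitmap(1000))      # {1000}
print(len(bitmap_empty()))  # 0
print(bitmap_hash("user_a") == bitmap_hash("user_a"))  # True
```

Since distinct inputs can hash to the same integer, distinct counts derived from hashed bitmaps can come out slightly low, whereas `TO_BITMAP` on integer IDs is exact.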
### Stream Load
```
cat data | curl --location-trusted -u user:passwd -T - -H "columns: dt,page,user_id, user_id=to_bitmap(user_id)" http://host:8410/api/test/testDb/_stream_load
```
```
cat data | curl --location-trusted -u user:passwd -T - -H "columns: dt,page,user_id, user_id=bitmap_hash(user_id)" http://host:8410/api/test/testDb/_stream_load
```
```
cat data | curl --location-trusted -u user:passwd -T - -H "columns: dt,page,user_id, user_id=bitmap_empty()" http://host:8410/api/test/testDb/_stream_load
```
### Insert Into
id2's column type is bitmap
```
insert into bitmap_table1 select id, id2 from bitmap_table2;
```
id2's column type is bitmap
```
INSERT INTO bitmap_table1 (id, id2) VALUES (1001, to_bitmap(1000)), (1001, to_bitmap(2000));
```
id2's column type is bitmap
```
insert into bitmap_table1 select id, bitmap_union(id2) from bitmap_table2 group by id;
```
id2's column type is int
```
insert into bitmap_table1 select id, to_bitmap(id2) from table;
```
id_string's column type is String
```
insert into bitmap_table1 select id, bitmap_hash(id_string) from table;
```
## Data Query
### Syntax
`BITMAP_UNION(expr)`: Computes the union of the input bitmaps and returns the result as a new bitmap.

`BITMAP_UNION_COUNT(expr)`: Computes the cardinality of the union of the input bitmaps; equivalent to `BITMAP_COUNT(BITMAP_UNION(expr))`. Prefer `BITMAP_UNION_COUNT`, which performs better than `BITMAP_COUNT(BITMAP_UNION(expr))`.

`BITMAP_UNION_INT(expr)`: Counts the distinct values in a column of type TINYINT, SMALLINT, or INT; the result is the same as `COUNT(DISTINCT expr)`.

`INTERSECT_COUNT(bitmap_column_to_count, filter_column, filter_values ...)`: Computes the cardinality of the intersection of the bitmaps selected by `filter_column`, one bitmap per value in `filter_values`. `bitmap_column_to_count` is a column of type bitmap, `filter_column` is the dimension column to filter on, and `filter_values` is a list of dimension values.
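
The semantics of these aggregates can be sketched with Python sets. This is a model under stated assumptions, not Doris's implementation: the function bodies, the `(bitmap, filter_column_value)` row layout, and the sample data are all hypothetical.

```python
def bitmap_union(bitmaps):
    """Model BITMAP_UNION: union of all input bitmaps."""
    result = set()
    for bitmap in bitmaps:
        result |= bitmap
    return result


def bitmap_union_count(bitmaps):
    """Model BITMAP_UNION_COUNT: cardinality of the union."""
    return len(bitmap_union(bitmaps))


def intersect_count(rows, filter_values):
    """Model INTERSECT_COUNT over rows of (bitmap, filter_column_value).

    Union the bitmaps for each filter value, then intersect those unions.
    """
    per_value = {value: set() for value in filter_values}
    for bitmap, value in rows:
        if value in per_value:
            per_value[value] |= bitmap
    groups = list(per_value.values())
    return len(set.intersection(*groups)) if groups else 0


# Hypothetical (user_id bitmap, page) rows.
rows = [
    ({1, 2, 3}, "meituan"),
    ({4}, "meituan"),
    ({2, 3, 5}, "waimai"),
]

print(bitmap_union_count(b for b, _ in rows))        # 5
print(intersect_count(rows, ["meituan"]))            # 4
print(intersect_count(rows, ["waimai"]))             # 3
print(intersect_count(rows, ["meituan", "waimai"]))  # 2
```

With a single filter value, `intersect_count` reduces to a per-value distinct count; with several values, it counts only the elements present under every one of them.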
### Example
The following examples use the pv_bitmap table created above.

Count the distinct user_id values:
```
select bitmap_union_count(user_id) from pv_bitmap;
select bitmap_count(bitmap_union(user_id)) from pv_bitmap;
```
Count the distinct values of id:
```
select bitmap_union_int(id) from pv_bitmap;
```
Calculate the retention of user_id:
```
select intersect_count(user_id, page, 'meituan') as meituan_uv,
intersect_count(user_id, page, 'waimai') as waimai_uv,
intersect_count(user_id, page, 'meituan', 'waimai') as retention -- number of users appearing on both the 'meituan' and 'waimai' pages
from pv_bitmap
where page in ('meituan', 'waimai');
```
## keyword
BITMAP, BITMAP_COUNT, BITMAP_EMPTY, BITMAP_HASH, BITMAP_UNION, BITMAP_UNION_INT, TO_BITMAP, BITMAP_UNION_COUNT, INTERSECT_COUNT