[typo](docs)add hive-bitmap compile and package des #13237

Liqf
2022-10-10 14:52:50 +08:00
committed by GitHub
parent 63903136c4
commit e094e6ca71
6 changed files with 78 additions and 18 deletions


@ -63,6 +63,17 @@ under the License.
2. `brew extract --version='0.13.0' thrift $USER/local-tap`
3. `brew install thrift@0.13.0`
Reference link: `https://gist.github.com/tonydeng/02e571f273d6cce4230dc8d5f394493c`
Linux:
1. Download the source package: `wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
2. Install dependencies: `yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
3. `tar zxvf thrift-0.13.0.tar.gz`
4. `cd thrift-0.13.0`
5. `./configure --without-tests`
6. `make`
7. `make install`
Check the version after installation completes: `thrift --version`
Note: If you have already compiled Doris, you do not need to install thrift; you can use `$DORIS_HOME/thirdparty/installed/bin/thrift` directly.
```
4. Go to `./fe` folder and run the following maven command to generate sources.


@ -603,7 +603,7 @@ The data type applicable to the aggregate column of the doris table is bitmap ty
There is no need to build a global dictionary, just specify the corresponding field in the load command, the format is: ```doris field name=binary_bitmap (hive table field name)```
Similarly, the binary (bitmap) type of data import is currently only supported when the upstream data source is a hive table.
Similarly, the binary (bitmap) type of data import is currently only supported when the upstream data source is a hive table. For hive bitmap usage, see [hive-bitmap-udf](../../../ecosystem/external-table/hive-bitmap-udf).
### Show Load


@ -43,23 +43,46 @@ CREATE TABLE IF NOT EXISTS `hive_bitmap_table`(
`k3` String COMMENT '',
`uuid` binary COMMENT 'bitmap'
) comment 'comment'
-- Example: Create Hive Table
CREATE TABLE IF NOT EXISTS `hive_table`(
`k1` int COMMENT '',
`k2` String COMMENT '',
`k3` String COMMENT '',
`uuid` int COMMENT ''
) comment 'comment'
```
### Hive Bitmap UDF Usage:
Hive Bitmap UDF used in Hive/Spark
Hive Bitmap UDF is used in Hive/Spark. First, you need to compile fe to get hive-udf-jar-with-dependencies.jar.
Compilation preparation: If you have compiled the ldb source code, you can compile fe directly. If you have not compiled the ldb source code, you need to install thrift manually.
Reference: [Setting Up dev env for FE](../../../community/developer-guide/fe-idea-dev).
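As a rough sketch of the preparation step, the following script picks a thrift binary before compiling fe. It assumes `DORIS_HOME` points at a Doris checkout that has already been compiled; the function name `find_thrift` is hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: decide which thrift binary to use before compiling fe.
# find_thrift is a hypothetical helper, not part of Doris.

# Print the path of a usable thrift binary, or return 1 if none is found.
find_thrift() {
  local bundled="${DORIS_HOME:-}/thirdparty/installed/bin/thrift"
  if [ -n "${DORIS_HOME:-}" ] && [ -x "$bundled" ]; then
    echo "$bundled"        # thrift built together with Doris
  elif command -v thrift >/dev/null 2>&1; then
    command -v thrift      # manually installed thrift found on PATH
  else
    return 1
  fi
}

if thrift_bin="$(find_thrift)"; then
  echo "Using thrift at: $thrift_bin"
  "$thrift_bin" --version
else
  echo "thrift not found; install it manually first" >&2
fi
```

The bundled binary is preferred here only because it is the one Doris itself was built against; swapping the order of the two checks is equally reasonable.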
```shell
# Clone the doris code
git clone https://github.com/apache/doris.git
# Install thrift
# Enter the fe directory
cd fe
# Execute the maven packaging command (all sub-modules of fe will be packaged)
mvn package -Dmaven.test.skip=true
# You can also package only the hive-udf module
mvn package -pl hive-udf -am -Dmaven.test.skip=true
```
After packaging completes, the hive-udf directory contains a target directory, which holds the hive-udf-jar-with-dependencies.jar package.
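A minimal sketch of uploading the packaged jar, assuming the `hdfs` CLI is on the PATH, that the script is run from the fe directory, and that `hdfs://node:9001/` (the address used in the statements below) is the HDFS target. The `upload_jar` function and the `DRY_RUN` variable are hypothetical names; by default the script only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch: upload the packaged hive-udf jar to HDFS so Hive can load it.
# upload_jar and DRY_RUN are hypothetical names used for illustration.

# Upload src to dest, or just print the command when DRY_RUN is 1 (the default).
upload_jar() {
  local src="$1" dest="$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "hdfs dfs -put -f $src $dest"
  else
    hdfs dfs -put -f "$src" "$dest"   # -f overwrites an existing copy
  fi
}

# Assumed location of the jar relative to the fe directory.
upload_jar "hive-udf/target/hive-udf-jar-with-dependencies.jar" "hdfs://node:9001/"
```

Setting `DRY_RUN=0` performs the actual upload; keeping the dry run as the default avoids touching the cluster by accident.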
```sql
-- Load the Hive Bitmap UDF jar package (upload the compiled hive-udf jar package to HDFS first)
add jar hdfs://node:9001/hive-udf-jar-with-dependencies.jar;
-- Create Hive Bitmap UDAF function
create temporary function to_bitmap as 'org.apache.doris.udf.ToBitmapUDAF';
create temporary function bitmap_union as 'org.apache.doris.udf.BitmapUnionUDAF';
create temporary function to_bitmap as 'org.apache.doris.udf.ToBitmapUDAF' USING JAR 'hdfs://node:9001/hive-udf-jar-with-dependencies.jar';
create temporary function bitmap_union as 'org.apache.doris.udf.BitmapUnionUDAF' USING JAR 'hdfs://node:9001/hive-udf-jar-with-dependencies.jar';
-- Create Hive Bitmap UDF function
create temporary function bitmap_count as 'org.apache.doris.udf.BitmapCountUDF';
create temporary function bitmap_and as 'org.apache.doris.udf.BitmapAndUDF';
create temporary function bitmap_or as 'org.apache.doris.udf.BitmapOrUDF';
create temporary function bitmap_xor as 'org.apache.doris.udf.BitmapXorUDF';
create temporary function bitmap_count as 'org.apache.doris.udf.BitmapCountUDF' USING JAR 'hdfs://node:9001/hive-udf-jar-with-dependencies.jar';
create temporary function bitmap_and as 'org.apache.doris.udf.BitmapAndUDF' USING JAR 'hdfs://node:9001/hive-udf-jar-with-dependencies.jar';
create temporary function bitmap_or as 'org.apache.doris.udf.BitmapOrUDF' USING JAR 'hdfs://node:9001/hive-udf-jar-with-dependencies.jar';
create temporary function bitmap_xor as 'org.apache.doris.udf.BitmapXorUDF' USING JAR 'hdfs://node:9001/hive-udf-jar-with-dependencies.jar';
-- Example: Generate bitmap by to_bitmap function and write to Hive Bitmap table
insert into hive_bitmap_table
select
@ -83,4 +106,4 @@ select k1,bitmap_union(uuid) from hive_bitmap_table group by k1
## Hive Bitmap import into Doris
see details: Load Data -> Spark Load -> Basic operation -> Create load(Example 3: when the upstream data source is hive binary type table)
See details: [Spark Load](../../data-operate/import/import-way/spark-load-manual) -> Basic operation -> Create load (Example 3: when the upstream data source is a hive binary type table)


@ -56,6 +56,18 @@ JDK1.8+, IntelliJ IDEA
2. `brew extract --version='0.13.0' thrift $USER/local-tap`
3. `brew install thrift@0.13.0`
Reference link: `https://gist.github.com/tonydeng/02e571f273d6cce4230dc8d5f394493c`
Linux:
1. Download the source package: `wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
2. Install dependencies: `yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
3.`tar zxvf thrift-0.13.0.tar.gz`
4.`cd thrift-0.13.0`
5.`./configure --without-tests`
6.`make`
7.`make install`
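Check the version after installation completes: `thrift --version`
Note: If you have already compiled Doris, you do not need to install thrift; you can use `$DORIS_HOME/thirdparty/installed/bin/thrift` directly.
4. On Mac or Linux, the code can be generated automatically with the following command: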


@ -559,7 +559,7 @@ WITH RESOURCE 'spark0'
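**Import of hive binary (bitmap) type columns**
Applicable when the data type of the aggregate column of the doris table is bitmap, and the data type of the corresponding column in the source hive table is binary (serialized by the `org.apache.doris.load.loadv2.dpp.BitmapValue` class in spark-dpp in FE). There is no need to build a global dictionary; just specify the corresponding field in the load command, in the format: `doris field name=binary_bitmap(hive table field name)`. Similarly, the binary (bitmap) type of data import is currently only supported when the upstream data source is a hive table.
Applicable when the data type of the aggregate column of the doris table is bitmap, and the data type of the corresponding column in the source hive table is binary (serialized by the `org.apache.doris.load.loadv2.dpp.BitmapValue` class in spark-dpp in FE). There is no need to build a global dictionary; just specify the corresponding field in the load command, in the format: `doris field name=binary_bitmap(hive table field name)`. Similarly, the binary (bitmap) type of data import is currently only supported when the upstream data source is a hive table. For hive bitmap usage, see [hive-bitmap-udf](../../../ecosystem/external-table/hive-bitmap-udf).
### Show Load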


@ -58,7 +58,21 @@ CREATE TABLE IF NOT EXISTS `hive_table`(
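### Hive Bitmap UDF Usage:
Hive Bitmap UDF needs to be used in Hive/Spark
Hive Bitmap UDF needs to be used in Hive/Spark. First, you need to compile fe to get hive-udf-jar-with-dependencies.jar.
Compilation preparation: If you have compiled the ldb source code, you can compile fe directly. If you have not compiled the ldb source code, you need to install thrift manually; see the compilation and installation part of [Setting Up dev env for FE](../../../community/developer-guide/fe-idea-dev).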
```shell
# Clone the doris source code
git clone https://github.com/apache/doris.git
# Install thrift
# Enter the fe directory
cd fe
# Execute the maven packaging command (all sub-modules of fe will be packaged)
mvn package -Dmaven.test.skip=true
# You can also package only the hive-udf module
mvn package -pl hive-udf -am -Dmaven.test.skip=true
```
After packaging completes, the hive-udf directory contains a target directory, which holds the packaged hive-udf-jar-with-dependencies.jar.
```sql
@ -66,14 +80,14 @@ CREATE TABLE IF NOT EXISTS `hive_table`(
add jar hdfs://node:9001/hive-udf-jar-with-dependencies.jar;
-- Create UDAF functions
create temporary function to_bitmap as 'org.apache.doris.udf.ToBitmapUDAF';
create temporary function bitmap_union as 'org.apache.doris.udf.BitmapUnionUDAF';
create temporary function to_bitmap as 'org.apache.doris.udf.ToBitmapUDAF' USING JAR 'hdfs://node:9001/hive-udf-jar-with-dependencies.jar';
create temporary function bitmap_union as 'org.apache.doris.udf.BitmapUnionUDAF' USING JAR 'hdfs://node:9001/hive-udf-jar-with-dependencies.jar';
-- Create UDF functions
create temporary function bitmap_count as 'org.apache.doris.udf.BitmapCountUDF';
create temporary function bitmap_and as 'org.apache.doris.udf.BitmapAndUDF';
create temporary function bitmap_or as 'org.apache.doris.udf.BitmapOrUDF';
create temporary function bitmap_xor as 'org.apache.doris.udf.BitmapXorUDF';
create temporary function bitmap_count as 'org.apache.doris.udf.BitmapCountUDF' USING JAR 'hdfs://node:9001/hive-udf-jar-with-dependencies.jar';
create temporary function bitmap_and as 'org.apache.doris.udf.BitmapAndUDF' USING JAR 'hdfs://node:9001/hive-udf-jar-with-dependencies.jar';
create temporary function bitmap_or as 'org.apache.doris.udf.BitmapOrUDF' USING JAR 'hdfs://node:9001/hive-udf-jar-with-dependencies.jar';
create temporary function bitmap_xor as 'org.apache.doris.udf.BitmapXorUDF' USING JAR 'hdfs://node:9001/hive-udf-jar-with-dependencies.jar';
-- Example: Generate bitmap by to_bitmap function and write to Hive Bitmap table
insert into hive_bitmap_table
@ -101,4 +115,4 @@ select k1,bitmap_union(uuid) from hive_bitmap_table group by k1
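## Hive Bitmap import into Doris
See details: Load Data -> Spark Load -> Basic operation -> Create load (Example 3: when the upstream data source is a hive binary type table)
See details: [Spark Load](../../data-operate/import/import-way/spark-load-manual) -> Basic operation -> Create load (Example 3: when the upstream data source is a hive binary type table)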