[Improve]The connector supports spark 3.0, flink 1.13 (#6449)

Modify the flink/spark compilation documentation
This commit is contained in:
jiafeng.zhang
2021-08-18 15:57:50 +08:00
committed by GitHub
parent 66a7a4b294
commit 4ea2fcefbc
6 changed files with 362 additions and 8 deletions


@ -26,7 +26,7 @@ under the License.
# Flink Doris Connector
Flink Doris Connector can support reading data stored in Doris through Flink.
Flink Doris Connector supports reading and writing data stored in Doris through Flink.
- You can map the `Doris` table to `DataStream` or `Table`.
@ -35,12 +35,33 @@ Flink Doris Connector can support reading data stored in Doris through Flink.
| Connector | Flink | Doris | Java | Scala |
| --------- | ----- | ------ | ---- | ----- |
| 1.0.0 | 1.11.2 | 0.13+ | 8 | 2.12 |
| 1.0.0 | 1.13.x | 0.13+ | 8 | 2.12 |
**Adapting to Flink 1.13.x**
```xml
<properties>
<scala.version>2.12</scala.version>
<flink.version>1.11.2</flink.version>
<libthrift.version>0.9.3</libthrift.version>
<arrow.version>0.15.1</arrow.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<doris.home>${basedir}/../../</doris.home>
<doris.thirdparty>${basedir}/../../thirdparty</doris.thirdparty>
</properties>
```
Just change `flink.version` here to match your Flink cluster version, then recompile.
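As a minimal sketch of the edit described above, assuming a POSIX shell and a `pom.xml` that contains a single `<flink.version>` element, the property could be rewritten with `sed`. The helper name `set_flink_version` and the version `1.13.1` are illustrative and not part of the connector.

```shell
# Hypothetical helper: replace the value of <flink.version> in a pom file.
# $1 = path to the pom file, $2 = Flink version to write.
set_flink_version() {
  pom_file="$1"
  new_version="$2"
  tmp_file="$(mktemp)"
  # Write to a temporary file first so the edit also works where `sed -i` differs.
  sed "s#<flink.version>[^<]*</flink.version>#<flink.version>${new_version}</flink.version>#" \
    "$pom_file" > "$tmp_file" && mv "$tmp_file" "$pom_file"
}

# Usage (assumed to run in extension/flink-doris-connector/):
#   set_flink_version pom.xml 1.13.1
#   sh build.sh
```

Editing the file keeps the change visible in the working tree, which matches the manual step the documentation describes.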
## Build and Install
Execute the following command in the `extension/flink-doris-connector/` directory:
**Notice:**
1. If you have not compiled the Doris source code as a whole, compile it first by executing `sh build.sh` in the `incubator-doris` directory; otherwise the thrift command will not be found.
2. It is recommended to compile in the Doris docker build environment `apache/incubator-doris:build-env-1.2`, because the JDK in the 1.3 image is version 11, which causes compilation problems.
```bash
sh build.sh
```
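The recommended docker environment could be entered roughly as follows. This is only a sketch: the image name comes from the notice above, while the helper name `print_build_env_cmd` and the mount target `/root/incubator-doris` are assumptions. The helper prints the command instead of running it, so you can review it first.

```shell
# Hypothetical helper: print a docker command that starts the build image
# with the local Doris source tree mounted into the container.
# $1 = absolute path to the local incubator-doris checkout.
print_build_env_cmd() {
  src_dir="$1"
  # /root/incubator-doris is an assumed mount point inside the container.
  printf 'docker run -it -v %s:/root/incubator-doris apache/incubator-doris:build-env-1.2\n' "$src_dir"
}

# Usage:
#   print_build_env_cmd /path/to/incubator-doris
# then run the printed command and execute `sh build.sh` inside the container.
```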


@ -37,14 +37,21 @@ Spark Doris Connector can support reading data stored in Doris through Spark.
| Connector | Spark | Doris | Java | Scala |
| --------- | ----- | ------ | ---- | ----- |
| 1.0.0 | 2.x | 0.12+ | 8 | 2.11 |
| 1.0.0 | 3.x | 0.12+ | 8 | 2.12 |
## Build and Install
Execute the following command in the `extension/spark-doris-connector/` directory:
**Notice:**
1. If you have not compiled the Doris source code as a whole, compile it first by executing `sh build.sh` in the `incubator-doris` directory; otherwise the thrift command will not be found.
2. It is recommended to compile in the Doris docker build environment `apache/incubator-doris:build-env-1.2`, because the JDK in the 1.3 image is version 11, which causes compilation problems.
```bash
sh build.sh
sh build.sh 3 ## spark 3.x version, the default is 3.1.2
sh build.sh 2 ## spark 2.x version, the default is 2.3.4
```
After successful compilation, the file `doris-spark-1.0.0-SNAPSHOT.jar` is generated in the `output/` directory. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, for `Spark` running in `Local` mode, put this file in the `jars/` folder; for `Spark` running in `Yarn` cluster mode, put it in the pre-deployment package.
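For the `Local` mode case, the copy step could look like the sketch below. The helper name `install_connector_jar` is hypothetical, and the assumption that the jars folder is `$SPARK_HOME/jars` depends on your Spark installation.

```shell
# Hypothetical helper: copy the built connector jar into a Spark jars directory.
# $1 = path to the jar, $2 = existing Spark jars directory.
install_connector_jar() {
  jar_path="$1"
  spark_jars_dir="$2"
  if [ ! -f "$jar_path" ]; then
    echo "jar not found: $jar_path" >&2
    return 1
  fi
  if [ ! -d "$spark_jars_dir" ]; then
    echo "jars directory not found: $spark_jars_dir" >&2
    return 1
  fi
  cp "$jar_path" "$spark_jars_dir/"
}

# Usage (assumes SPARK_HOME points at a local Spark installation):
#   install_connector_jar output/doris-spark-1.0.0-SNAPSHOT.jar "$SPARK_HOME/jars"
```

The helper refuses to create the target directory, so a mistyped path fails loudly instead of silently copying the jar somewhere Spark never reads.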


@ -26,7 +26,7 @@ under the License.
# Flink Doris Connector
Flink Doris Connector supports reading data stored in Doris through Flink.
Flink Doris Connector supports reading data stored in Doris through Flink.
- You can map a `Doris` table to a `DataStream` or `Table`.
@ -34,13 +34,34 @@ Flink Doris Connector can support reading data stored in Doris through Flink.
| Connector | Flink | Doris | Java | Scala |
| --------- | ----- | ------ | ---- | ----- |
| 1.0.0 | 1.11.2 | 0.13+ | 8 | 2.12 |
| 1.0.0 | 1.11.x , 1.12.x | 0.13+ | 8 | 2.12 |
| 1.0.0 | 1.13.x | 0.13+ | 8 | 2.12 |
**Adapting to Flink 1.13.x**
```xml
<properties>
<scala.version>2.12</scala.version>
<flink.version>1.11.2</flink.version>
<libthrift.version>0.9.3</libthrift.version>
<arrow.version>0.15.1</arrow.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<doris.home>${basedir}/../../</doris.home>
<doris.thirdparty>${basedir}/../../thirdparty</doris.thirdparty>
</properties>
```
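Just change `flink.version` here to match your Flink cluster version, then recompile.
## Build and Install
Execute the following command in the `extension/flink-doris-connector/` source directory:
**Notice:**
1. If you have not compiled the Doris source code as a whole, compile it first by executing `sh build.sh` in the `incubator-doris` directory; otherwise the thrift command will not be found.
2. It is recommended to compile in the Doris docker build environment `apache/incubator-doris:build-env-1.2`, because the JDK in the 1.3 image is version 11, which causes compilation problems.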
```bash
sh build.sh
```


@ -37,14 +37,21 @@ Spark Doris Connector can support reading data stored in Doris through Spark.
| Connector | Spark | Doris | Java | Scala |
| --------- | ----- | ------ | ---- | ----- |
| 1.0.0 | 2.x | 0.12+ | 8 | 2.11 |
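| 1.0.0 | 3.x | 0.12+ | 8 | 2.12 |
## Build and Install
Execute the following command in the `extension/spark-doris-connector/` source directory:
**Notice:**
1. If you have not compiled the Doris source code as a whole, compile it first by executing `sh build.sh` in the `incubator-doris` directory; otherwise the thrift command will not be found.
2. It is recommended to compile in the Doris docker build environment `apache/incubator-doris:build-env-1.2`, because the JDK in the 1.3 image is version 11, which causes compilation problems.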
```bash
sh build.sh
sh build.sh 3 ## spark 3.x version, the default is 3.1.2
sh build.sh 2 ## spark 2.x version, the default is 2.3.4
```
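After successful compilation, the file `doris-spark-1.0.0-SNAPSHOT.jar` is generated in the `output/` directory. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, for `Spark` running in `Local` mode, put this file in the `jars/` folder; for `Spark` running in `Yarn` cluster mode, put it in the pre-deployment package.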


@ -28,6 +28,7 @@ set -eo pipefail
ROOT=`dirname "$0"`
ROOT=`cd "$ROOT"; pwd`
export DORIS_HOME=${ROOT}/../../
# include custom environment variables
@ -37,6 +38,8 @@ fi
# check maven
MVN_CMD=mvn
if [[ ! -z ${CUSTOM_MVN} ]]; then
MVN_CMD=${CUSTOM_MVN}
fi
@ -45,9 +48,14 @@ if ! ${MVN_CMD} --version; then
exit 1
fi
export MVN_CMD
${MVN_CMD} clean package
# $1 selects the Spark major version; it is quoted so a missing argument does not break the test
if [ "$1" = 3 ]
then
${MVN_CMD} clean package -f pom_3.0.xml
fi
if [ "$1" = 2 ]
then
${MVN_CMD} clean package
fi
mkdir -p output/
cp target/doris-spark-1.0.0-SNAPSHOT.jar ./output/
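
The version switch above could also be written as a single lookup that rejects unknown arguments. This is only a sketch under the assumption that `2` maps to the default `pom.xml` and `3` to `pom_3.0.xml`, as in the script; the function name `select_pom` is hypothetical and not part of this commit.

```shell
# Hypothetical helper: map the Spark major version argument to a pom file.
# Prints the pom file name on success, or a usage message on stderr and fails.
select_pom() {
  case "${1:-}" in
    3) echo "pom_3.0.xml" ;;
    2) echo "pom.xml" ;;
    *) echo "usage: build.sh <2|3>" >&2; return 1 ;;
  esac
}

# Usage inside build.sh (sketch):
#   POM_FILE="$(select_pom "$1")" || exit 1
#   ${MVN_CMD} clean package -f "${POM_FILE}"
```

Failing early on a missing or unknown argument avoids reaching the `cp` step with no jar built.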


@ -0,0 +1,290 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.apache</groupId>
<artifactId>doris-spark</artifactId>
<version>1.0.0-SNAPSHOT</version>
<properties>
<scala.version>2.12</scala.version>
<spark.version>3.1.2</spark.version>
<libthrift.version>0.9.3</libthrift.version>
<arrow.version>1.0.1</arrow.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<profiles>
<!-- for custom internal repository -->
<profile>
<id>custom-env</id>
<activation>
<property>
<name>env.CUSTOM_MAVEN_REPO</name>
</property>
</activation>
<repositories>
<repository>
<id>custom-nexus</id>
<url>${env.CUSTOM_MAVEN_REPO}</url>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>custom-nexus</id>
<url>${env.CUSTOM_MAVEN_REPO}</url>
</pluginRepository>
</pluginRepositories>
</profile>
<!-- for general repository -->
<profile>
<id>general-env</id>
<activation>
<property>
<name>!env.CUSTOM_MAVEN_REPO</name>
</property>
</activation>
<repositories>
<repository>
<id>central</id>
<name>central maven repo https</name>
<url>https://repo.maven.apache.org/maven2</url>
</repository>
</repositories>
</profile>
</profiles>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libthrift</artifactId>
<version>${libthrift.version}</version>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-vector</artifactId>
<version>${arrow.version}</version>
</dependency>
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-core</artifactId>
<version>1.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-scala_${scala.version}</artifactId>
<version>1.4.7</version>
<exclusions>
<exclusion>
<artifactId>hamcrest-core</artifactId>
<groupId>org.hamcrest</groupId>
</exclusion>
</exclusions>
<scope>test</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<exclusions>
<exclusion>
<artifactId>hamcrest-core</artifactId>
<groupId>org.hamcrest</groupId>
</exclusion>
</exclusions>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.10.0</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.10.0</version>
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
<version>4.1.27.Final</version>
<scope>provided</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.thrift.tools</groupId>
<artifactId>maven-thrift-plugin</artifactId>
<version>0.1.11</version>
<executions>
<execution>
<id>thrift-sources</id>
<phase>generate-sources</phase>
<goals>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<id>scala-compile-first</id>
<phase>process-resources</phase>
<goals>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>scala-test-compile</id>
<phase>process-test-resources</phase>
<goals>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<args>
<arg>-feature</arg>
</args>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.1</version>
<configuration>
<artifactSet>
<excludes>
<exclude>com.google.code.findbugs:*</exclude>
<exclude>org.slf4j:*</exclude>
</excludes>
</artifactSet>
<relocations>
<relocation>
<pattern>org.apache.arrow</pattern>
<shadedPattern>org.apache.doris.shaded.org.apache.arrow</shadedPattern>
</relocation>
<relocation>
<pattern>io.netty</pattern>
<shadedPattern>org.apache.doris.shaded.io.netty</shadedPattern>
</relocation>
<relocation>
<pattern>com.fasterxml.jackson</pattern>
<shadedPattern>org.apache.doris.shaded.com.fasterxml.jackson</shadedPattern>
</relocation>
<relocation>
<pattern>org.apache.commons.codec</pattern>
<shadedPattern>org.apache.doris.shaded.org.apache.commons.codec</shadedPattern>
</relocation>
<relocation>
<pattern>com.google.flatbuffers</pattern>
<shadedPattern>org.apache.doris.shaded.com.google.flatbuffers</shadedPattern>
</relocation>
<relocation>
<pattern>org.apache.thrift</pattern>
<shadedPattern>org.apache.doris.shaded.org.apache.thrift</shadedPattern>
</relocation>
</relocations>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.jacoco</groupId>
<artifactId>jacoco-maven-plugin</artifactId>
<version>0.7.8</version>
<configuration>
<excludes>
<exclude>**/thrift/**</exclude>
</excludes>
</configuration>
<executions>
<execution>
<id>prepare-agent</id>
<goals>
<goal>prepare-agent</goal>
</goals>
</execution>
<execution>
<id>check</id>
<goals>
<goal>check</goal>
</goals>
</execution>
<execution>
<id>report</id>
<phase>test</phase>
<goals>
<goal>report</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.1</version>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
</plugins>
</build>
</project>