[Improve]The connector supports spark 3.0, flink 1.13 (#6449)
Modify the flink/spark compilation documentation
@@ -26,7 +26,7 @@ under the License.

# Flink Doris Connector

Flink Doris Connector can support reading data stored in Doris through Flink.
Flink Doris Connector supports reading and writing data stored in Doris through Flink.

- You can map the `Doris` table to `DataStream` or `Table`.

@@ -35,12 +35,33 @@ Flink Doris Connector can support reading data stored in Doris through Flink.
| Connector | Flink | Doris | Java | Scala |
| --------- | ----- | ------ | ---- | ----- |
| 1.0.0 | 1.11.2 | 0.13+ | 8 | 2.12 |
| 1.0.0 | 1.13.x | 0.13.+ | 8 | 2.12 |

**Adapting to Flink 1.13.x**

```xml
<properties>
    <scala.version>2.12</scala.version>
    <flink.version>1.11.2</flink.version>
    <libthrift.version>0.9.3</libthrift.version>
    <arrow.version>0.15.1</arrow.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <doris.home>${basedir}/../../</doris.home>
    <doris.thirdparty>${basedir}/../../thirdparty</doris.thirdparty>
</properties>
```

Change `flink.version` here to match the version of your Flink cluster, then rebuild the connector.

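The version change described above could also be scripted. The following is a minimal sketch, assuming the property lives in a file named `pom.xml`; the helper name `set_flink_version` and the example version `1.13.2` are illustrative assumptions and not part of the connector.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the connector): rewrite the
# <flink.version> property in a POM file.
set_flink_version() {
    local pom="$1" version="$2"
    # Write to a temporary file first so the edit works with both GNU and BSD sed.
    sed "s|<flink.version>[^<]*</flink.version>|<flink.version>${version}</flink.version>|" \
        "$pom" > "${pom}.tmp" && mv "${pom}.tmp" "$pom"
}

# Example usage (assumed path and version):
#   set_flink_version pom.xml 1.13.2
#   sh build.sh
```

Only the `flink.version` element is touched; every other property in the file is left unchanged.
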
## Build and Install

Run the following command in the `extension/flink-doris-connector/` directory:

**Notice:**

1. If you have never built the Doris source tree as a whole, build it first by running `sh build.sh` in the `incubator-doris` directory; otherwise the `thrift` command will not be found.
2. Compiling inside the Doris Docker build environment `apache/incubator-doris:build-env-1.2` is recommended. The 1.3 version of this environment uses JDK 11, which causes compilation problems.

```bash
sh build.sh
```

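Because the second notice hinges on the JDK version, a pre-flight check can catch a mismatch before Maven starts. This is only a sketch: the function names `java_major_version` and `check_jdk8` are hypothetical and do not exist in `build.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of build.sh).

# Extract the major version from a Java version string,
# e.g. "1.8.0_292" -> 8 and "11.0.11" -> 11.
java_major_version() {
    local version="$1"
    case "$version" in
        1.*) version="${version#1.}" ;;   # legacy "1.x" scheme used up to Java 8
    esac
    echo "${version%%[._-]*}"
}

# Return non-zero when the active JDK is not Java 8.
check_jdk8() {
    local raw
    raw="$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)"
    if [ "$(java_major_version "$raw")" != "8" ]; then
        echo "Expected JDK 8 but found: ${raw:-none}" >&2
        return 1
    fi
}

# Example usage:
#   check_jdk8 && sh build.sh
```
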
@@ -37,14 +37,21 @@ Spark Doris Connector can support reading data stored in Doris through Spark.
| Connector | Spark | Doris | Java | Scala |
| --------- | ----- | ------ | ---- | ----- |
| 1.0.0 | 2.x | 0.12+ | 8 | 2.11 |
| 1.0.0 | 3.x | 0.12.+ | 8 | 2.12 |


## Build and Install

Run the following command in the `extension/spark-doris-connector/` directory:

**Notice:**

1. If you have never built the Doris source tree as a whole, build it first by running `sh build.sh` in the `incubator-doris` directory; otherwise the `thrift` command will not be found.
2. Compiling inside the Doris Docker build environment `apache/incubator-doris:build-env-1.2` is recommended. The 1.3 version of this environment uses JDK 11, which causes compilation problems.

```bash
sh build.sh
sh build.sh 3 ## Spark 3.x version, the default is 3.1.2
sh build.sh 2 ## Spark 2.x version, the default is 2.3.4
```

After a successful build, the file `doris-spark-1.0.0-SNAPSHOT.jar` is generated in the `output/` directory. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, if `Spark` runs in `Local` mode, put the file in the `jars/` folder; if `Spark` runs in `Yarn` cluster mode, put it in the pre-deployment package.

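For the `Local` mode case, the copy step might look like the sketch below. The helper name `install_connector_jar` is hypothetical, and `SPARK_HOME` is assumed to point at a Spark installation that contains a `jars/` folder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy the built connector jar into a local Spark installation.
install_connector_jar() {
    local jar="$1" spark_home="$2"
    if [ ! -f "$jar" ]; then
        echo "Connector jar not found: $jar" >&2
        return 1
    fi
    if [ ! -d "$spark_home/jars" ]; then
        echo "No jars/ directory under: $spark_home" >&2
        return 1
    fi
    cp "$jar" "$spark_home/jars/"
}

# Example usage (SPARK_HOME is an assumption about the local setup):
#   install_connector_jar output/doris-spark-1.0.0-SNAPSHOT.jar "$SPARK_HOME"
```
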
@@ -26,7 +26,7 @@ under the License.

# Flink Doris Connector

Flink Doris Connector can support reading data stored in Doris through Flink.
Flink Doris Connector supports reading and writing data stored in Doris through Flink.

- You can map the `Doris` table to `DataStream` or `Table`.

@@ -34,13 +34,34 @@ Flink Doris Connector can support reading data stored in Doris through Flink

| Connector | Flink | Doris | Java | Scala |
| --------- | ----- | ------ | ---- | ----- |
| 1.0.0 | 1.11.2 | 0.13+ | 8 | 2.12 |
| 1.0.0 | 1.11.x , 1.12.x | 0.13+ | 8 | 2.12 |
| 1.0.0 | 1.13.x | 0.13.+ | 8 | 2.12 |

**Adapting to Flink 1.13.x**

```xml
<properties>
    <scala.version>2.12</scala.version>
    <flink.version>1.11.2</flink.version>
    <libthrift.version>0.9.3</libthrift.version>
    <arrow.version>0.15.1</arrow.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <doris.home>${basedir}/../../</doris.home>
    <doris.thirdparty>${basedir}/../../thirdparty</doris.thirdparty>
</properties>
```

Change `flink.version` here to match the version of your Flink cluster, then rebuild the connector.

## Build and Install

Run the following command in the `extension/flink-doris-connector/` source directory:

**Notice:**

1. If you have never built the Doris source tree as a whole, build it first by running `sh build.sh` in the `incubator-doris` directory; otherwise the `thrift` command will not be found.
2. Compiling inside the Doris Docker build environment `apache/incubator-doris:build-env-1.2` is recommended. The 1.3 version of this environment uses JDK 11, which causes compilation problems.

```bash
sh build.sh
```

@@ -37,14 +37,21 @@ Spark Doris Connector can support reading data stored in Doris through Spark
| Connector | Spark | Doris | Java | Scala |
| --------- | ----- | ------ | ---- | ----- |
| 1.0.0 | 2.x | 0.12+ | 8 | 2.11 |
| 1.0.0 | 3.x | 0.12.+ | 8 | 2.12 |


## Build and Install

Run the following command in the `extension/spark-doris-connector/` source directory:

**Notice:**

1. If you have never built the Doris source tree as a whole, build it first by running `sh build.sh` in the `incubator-doris` directory; otherwise the `thrift` command will not be found.
2. Compiling inside the Doris Docker build environment `apache/incubator-doris:build-env-1.2` is recommended. The 1.3 version of this environment uses JDK 11, which causes compilation problems.

```bash
sh build.sh
sh build.sh 3 ## Spark 3.x version, the default is 3.1.2
sh build.sh 2 ## Spark 2.x version, the default is 2.3.4
```

After a successful build, the file `doris-spark-1.0.0-SNAPSHOT.jar` is generated in the `output/` directory. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, if `Spark` runs in `Local` mode, put the file in the `jars/` folder; if `Spark` runs in `Yarn` cluster mode, put it in the pre-deployment package.

@@ -28,6 +28,7 @@ set -eo pipefail
ROOT=`dirname "$0"`
ROOT=`cd "$ROOT"; pwd`


export DORIS_HOME=${ROOT}/../../

# include custom environment variables
@@ -37,6 +38,8 @@ fi

# check maven
MVN_CMD=mvn


if [[ ! -z ${CUSTOM_MVN} ]]; then
    MVN_CMD=${CUSTOM_MVN}
fi
@@ -45,9 +48,14 @@ if ! ${MVN_CMD} --version; then
    exit 1
fi
export MVN_CMD

${MVN_CMD} clean package

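# Spark 3.x: build with the dedicated POM.
# "${1:-}" keeps the test valid when no argument is passed.
if [ "${1:-}" = "3" ]
then
    ${MVN_CMD} clean package -f pom_3.0.xml
fi
# Spark 2.x: build with the default pom.xml.
if [ "${1:-}" = "2" ]
then
    ${MVN_CMD} clean package
fi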

mkdir -p output/
cp target/doris-spark-1.0.0-SNAPSHOT.jar ./output/

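The script above dispatches on its first argument. One way to make that mapping explicit, and to reject unexpected values, is sketched below; the function name `pom_for_spark` and the usage message are hypothetical and not part of this commit. Running Maven with `-f pom.xml` is equivalent to the plain `clean package` call used for Spark 2.x, since `pom.xml` is Maven's default.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map the Spark major version argument to a POM file.
pom_for_spark() {
    case "${1:-}" in
        3) echo "pom_3.0.xml" ;;
        2) echo "pom.xml" ;;
        *) echo "Usage: build.sh <2|3>" >&2; return 1 ;;
    esac
}

# Example usage inside a build script (MVN_CMD as set earlier in the script):
#   pom="$(pom_for_spark "$1")" || exit 1
#   ${MVN_CMD} clean package -f "$pom"
```
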
290
extension/spark-doris-connector/pom_3.0.xml
Normal file
@@ -0,0 +1,290 @@
<?xml version="1.0" encoding="UTF-8"?>

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.apache</groupId>
    <artifactId>doris-spark</artifactId>
    <version>1.0.0-SNAPSHOT</version>

    <properties>
        <scala.version>2.12</scala.version>
        <spark.version>3.1.2</spark.version>
        <libthrift.version>0.9.3</libthrift.version>
        <arrow.version>1.0.1</arrow.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <profiles>
        <!-- for custom internal repository -->
        <profile>
            <id>custom-env</id>
            <activation>
                <property>
                    <name>env.CUSTOM_MAVEN_REPO</name>
                </property>
            </activation>

            <repositories>
                <repository>
                    <id>custom-nexus</id>
                    <url>${env.CUSTOM_MAVEN_REPO}</url>
                </repository>
            </repositories>

            <pluginRepositories>
                <pluginRepository>
                    <id>custom-nexus</id>
                    <url>${env.CUSTOM_MAVEN_REPO}</url>
                </pluginRepository>
            </pluginRepositories>
        </profile>

        <!-- for general repository -->
        <profile>
            <id>general-env</id>
            <activation>
                <property>
                    <name>!env.CUSTOM_MAVEN_REPO</name>
                </property>
            </activation>

            <repositories>
                <repository>
                    <id>central</id>
                    <name>central maven repo https</name>
                    <url>https://repo.maven.apache.org/maven2</url>
                </repository>
            </repositories>
        </profile>
    </profiles>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.thrift</groupId>
            <artifactId>libthrift</artifactId>
            <version>${libthrift.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.arrow</groupId>
            <artifactId>arrow-vector</artifactId>
            <version>${arrow.version}</version>
        </dependency>

        <dependency>
            <groupId>org.hamcrest</groupId>
            <artifactId>hamcrest-core</artifactId>
            <version>1.3</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.mockito</groupId>
            <artifactId>mockito-scala_${scala.version}</artifactId>
            <version>1.4.7</version>
            <exclusions>
                <exclusion>
                    <artifactId>hamcrest-core</artifactId>
                    <groupId>org.hamcrest</groupId>
                </exclusion>
            </exclusions>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <exclusions>
                <exclusion>
                    <artifactId>hamcrest-core</artifactId>
                    <groupId>org.hamcrest</groupId>
                </exclusion>
            </exclusions>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.10.0</version>
        </dependency>

        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.10.0</version>
        </dependency>
        <dependency>
            <groupId>io.netty</groupId>
            <artifactId>netty-all</artifactId>
            <version>4.1.27.Final</version>
            <scope>provided</scope>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.thrift.tools</groupId>
                <artifactId>maven-thrift-plugin</artifactId>
                <version>0.1.11</version>
                <executions>
                    <execution>
                        <id>thrift-sources</id>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.1</version>
                <executions>
                    <execution>
                        <id>scala-compile-first</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>scala-test-compile</id>
                        <phase>process-test-resources</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <args>
                        <arg>-feature</arg>
                    </args>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.1</version>
                <configuration>
                    <artifactSet>
                        <excludes>
                            <exclude>com.google.code.findbugs:*</exclude>
                            <exclude>org.slf4j:*</exclude>
                        </excludes>
                    </artifactSet>
                    <relocations>
                        <relocation>
                            <pattern>org.apache.arrow</pattern>
                            <shadedPattern>org.apache.doris.shaded.org.apache.arrow</shadedPattern>
                        </relocation>
                        <relocation>
                            <pattern>io.netty</pattern>
                            <shadedPattern>org.apache.doris.shaded.io.netty</shadedPattern>
                        </relocation>
                        <relocation>
                            <pattern>com.fasterxml.jackson</pattern>
                            <shadedPattern>org.apache.doris.shaded.com.fasterxml.jackson</shadedPattern>
                        </relocation>
                        <relocation>
                            <pattern>org.apache.commons.codec</pattern>
                            <shadedPattern>org.apache.doris.shaded.org.apache.commons.codec</shadedPattern>
                        </relocation>
                        <relocation>
                            <pattern>com.google.flatbuffers</pattern>
                            <shadedPattern>org.apache.doris.shaded.com.google.flatbuffers</shadedPattern>
                        </relocation>
                        <relocation>
                            <pattern>org.apache.thrift</pattern>
                            <shadedPattern>org.apache.doris.shaded.org.apache.thrift</shadedPattern>
                        </relocation>
                    </relocations>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.jacoco</groupId>
                <artifactId>jacoco-maven-plugin</artifactId>
                <version>0.7.8</version>
                <configuration>
                    <excludes>
                        <exclude>**/thrift/**</exclude>
                    </excludes>
                </configuration>
                <executions>
                    <execution>
                        <id>prepare-agent</id>
                        <goals>
                            <goal>prepare-agent</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>check</id>
                        <goals>
                            <goal>check</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>report</id>
                        <phase>test</phase>
                        <goals>
                            <goal>report</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.1</version>
                <configuration>
                    <source>8</source>
                    <target>8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>