From 3aae27634ae14f33a5bc389143b41b4ec24880ce Mon Sep 17 00:00:00 2001
From: wudi <676366545@qq.com>
Date: Wed, 28 Dec 2022 16:15:49 +0800
Subject: [PATCH] [doc](flink-connector) update flink connector faq (#15405)

---
 .../docs/ecosystem/flink-doris-connector.md | 36 ++++++++++++++++---
 .../docs/ecosystem/flink-doris-connector.md | 34 +++++++++++++++---
 2 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/docs/en/docs/ecosystem/flink-doris-connector.md b/docs/en/docs/ecosystem/flink-doris-connector.md
index e5d947babd..15a2caf2f2 100644
--- a/docs/en/docs/ecosystem/flink-doris-connector.md
+++ b/docs/en/docs/ecosystem/flink-doris-connector.md
@@ -139,6 +139,8 @@ Add flink-doris-connector Maven dependencies

1. Please replace the Connector and Flink dependency versions according to the Flink and Scala versions you use. Version 1.1.0 only supports Flink 1.14.

2. You can also download the jar package for the corresponding version from [here](https://repo.maven.apache.org/maven2/org/apache/doris/).

## How to use

There are three ways to use Flink Doris Connector.

@@ -419,9 +421,17 @@ The most suitable scenario for using Flink Doris Connector is to synchronize sou

1. The Flink Doris Connector mainly relies on Checkpoint for streaming writes, so the Checkpoint interval is the visible delay of the data.
2. To ensure Flink's Exactly-Once semantics, the Flink Doris Connector enables two-phase commit by default. Doris enables two-phase commit by default since version 1.1; in version 1.0 it can be enabled by modifying the BE parameters, see [two_phase_commit](../data-operate/import/import-way/stream-load-manual.md).

## FAQ

1. **After Doris Source finishes reading the data, why does the stream end?**

Currently Doris Source is a bounded stream and does not support reading in CDC mode.

2. **Can Flink push filter conditions down when reading Doris?**

Yes, by configuring the doris.filter.query parameter; see the configuration section for details. A minimal sketch is included at the end of this FAQ.

3. **How to write the Bitmap type?**

```sql
CREATE TABLE bitmap_sink (
dt int,
page string,
user_id int
)
WITH (
'connector' = 'doris',
'fenodes' = '127.0.0.1:8030',
'table.identifier' = 'test.bitmap_test',
'username' = 'root',
'password' = '',
'sink.label-prefix' = 'doris_label',
'sink.properties.columns' = 'dt,page,user_id,user_id=to_bitmap(user_id)'
)
```

4. **errCode = 2, detailMessage = Label [label_0_1] has already been used, relate to txn [19650]**

In the Exactly-Once scenario, the Flink job must be restarted from the latest Checkpoint/Savepoint, otherwise the above error is reported.
When Exactly-Once is not required, it can also be solved by turning off two-phase commit (sink.enable-2pc=false) or by switching to a different sink.label-prefix.

5. **errCode = 2, detailMessage = transaction [19650] not found**

This occurs in the commit phase: the transaction ID recorded in the checkpoint has already expired on the FE side, so committing it again raises the above error.
In this case the job cannot be started from that checkpoint. The expiration time can be extended by modifying the streaming_label_keep_max_second configuration in fe.conf, which defaults to 12 hours.

6. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**

This happens because the number of concurrent imports to the same database exceeds 100. It can be solved by adjusting the fe.conf parameter `max_running_txn_num_per_db`; for details, see [max_running_txn_num_per_db](https://doris.apache.org/zh-CN/docs/dev/admin-manual/config/fe-config/#max_running_txn_num_per_db).

7. **How to ensure the order of a batch of data when Flink writes to the Uniq model?**

You can add a sequence column configuration to guarantee the order; for details, see [sequence](https://doris.apache.org/zh-CN/docs/dev/data-operate/update-delete/sequence-column-manual).

8. **The Flink task does not report an error, but the data cannot be synchronized?**

Before Connector 1.1.0, writes were batched and driven by incoming data, so check whether the upstream actually produces data. Since 1.1.0, writing relies on Checkpoint, and Checkpoint must be enabled for data to be written.

9. **tablet writer write failed, tablet_id=190958, txn_id=3505530, err=-235**

This usually occurs before Connector 1.1.0 and is caused by writing too frequently, which produces too many versions. The Stream Load frequency can be reduced by setting the sink.batch.size and sink.batch.interval parameters.
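The following is a minimal sketch of the conditional pushdown described in item 2. The table name `flink_doris_source`, its schema, the connection values, and the `age > 18` filter are placeholder assumptions for illustration; `doris.filter.query` is the option being shown.

```sql
-- Hypothetical Doris source table; connection values and schema are placeholders.
CREATE TABLE flink_doris_source (
    name STRING,
    age INT
)
WITH (
    'connector' = 'doris',
    'fenodes' = '127.0.0.1:8030',
    'table.identifier' = 'test.student',
    'username' = 'root',
    'password' = '',
    -- The filter below is pushed down and evaluated by Doris instead of Flink.
    'doris.filter.query' = 'age > 18'
);

SELECT name, age FROM flink_doris_source;
```

Only rows matching the filter are read from Doris, which reduces the amount of data transferred to Flink.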
diff --git a/docs/zh-CN/docs/ecosystem/flink-doris-connector.md b/docs/zh-CN/docs/ecosystem/flink-doris-connector.md
index 0be44b1545..90438ed2ea 100644
--- a/docs/zh-CN/docs/ecosystem/flink-doris-connector.md
+++ b/docs/zh-CN/docs/ecosystem/flink-doris-connector.md
@@ -144,6 +144,8 @@ enable_http_server_v2 = true

1. Please replace the Connector and Flink dependency versions according to the Flink and Scala versions you use.

2. You can also download the jar package for the corresponding version from [here](https://repo.maven.apache.org/maven2/org/apache/doris/).

## How to use

There are two main ways for Flink to read and write Doris data.

@@ -416,9 +418,17 @@ insert into doris_sink select id,name from cdc_mysql_source;

1. The Flink Doris Connector mainly relies on Checkpoint for streaming writes, so the Checkpoint interval is the visible delay of the data.
2. To ensure Flink's Exactly-Once semantics, the Flink Doris Connector enables two-phase commit by default. Doris enables two-phase commit by default since version 1.1; in version 1.0 it can be enabled by modifying the BE parameters, see [two_phase_commit](../data-operate/import/import-way/stream-load-manual.md).

## FAQ

1. **After Doris Source finishes reading the data, why does the stream end?**

Currently Doris Source is a bounded stream and does not support reading in CDC mode.

2. **Can Flink push filter conditions down when reading Doris?**

Yes, by configuring the doris.filter.query parameter; see the configuration section for details.

3. **How to write the Bitmap type?**

```sql
CREATE TABLE bitmap_sink (
dt int,
page string,
user_id int
)
WITH (
'connector' = 'doris',
'fenodes' = '127.0.0.1:8030',
'table.identifier' = 'test.bitmap_test',
'username' = 'root',
'password' = '',
'sink.label-prefix' = 'doris_label',
'sink.properties.columns' = 'dt,page,user_id,user_id=to_bitmap(user_id)'
)
```

4. **errCode = 2, detailMessage = Label [label_0_1] has already been used, relate to txn [19650]**

In the Exactly-Once scenario, the Flink job must be restarted from the latest Checkpoint/Savepoint, otherwise the above error is reported.
When Exactly-Once is not required, it can also be solved by turning off two-phase commit (sink.enable-2pc=false) or by switching to a different sink.label-prefix; a minimal sink sketch is included at the end of this FAQ.

5. **errCode = 2, detailMessage = transaction [19650] not found**

This occurs in the commit phase: the transaction ID recorded in the checkpoint has already expired on the FE side, so committing it again raises the above error.
In this case the job cannot be started from that checkpoint. The expiration time can be extended by modifying the streaming_label_keep_max_second configuration in fe.conf, which defaults to 12 hours.

6. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**

This happens because the number of concurrent imports to the same database exceeds 100. It can be solved by adjusting the fe.conf parameter `max_running_txn_num_per_db`; for details, see [max_running_txn_num_per_db](https://doris.apache.org/zh-CN/docs/dev/admin-manual/config/fe-config/#max_running_txn_num_per_db).
7. **How to ensure the order of a batch of data when Flink writes to the Uniq model?**

You can add a sequence column configuration to guarantee the order; for details, see [sequence](https://doris.apache.org/zh-CN/docs/dev/data-operate/update-delete/sequence-column-manual).

8. **The Flink task does not report an error, but the data cannot be synchronized?**

Before Connector 1.1.0, writes were batched and driven by incoming data, so check whether the upstream actually produces data. Since 1.1.0, writing relies on Checkpoint, and Checkpoint must be enabled for data to be written.

9. **tablet writer write failed, tablet_id=190958, txn_id=3505530, err=-235**

This usually occurs before Connector 1.1.0 and is caused by writing too frequently, which produces too many versions. The Stream Load frequency can be reduced by setting the sink.batch.size and sink.batch.interval parameters; see the batch-option sketch at the end of this FAQ.
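For items 4 and 5 of this FAQ, the sketch below shows where the two sink options mentioned there are set. It mirrors the `doris_sink` table used in the usage example above; the connection values are placeholder assumptions, and only `sink.label-prefix` and `sink.enable-2pc` are the options under discussion.

```sql
-- Doris sink table; connection values are placeholders.
CREATE TABLE doris_sink (
    id INT,
    name STRING
)
WITH (
    'connector' = 'doris',
    'fenodes' = '127.0.0.1:8030',
    'table.identifier' = 'test.sink_table',
    'username' = 'root',
    'password' = '',
    -- Use a new label prefix when the job is not restored from a Checkpoint/Savepoint.
    'sink.label-prefix' = 'doris_label_1',
    -- Disable two-phase commit only when Exactly-Once is not required.
    'sink.enable-2pc' = 'false'
);
```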
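For items 8 and 9, which concern connector versions before 1.1.0, the sketch below shows where the batch options are configured. The table definition, connection values, and the option values are placeholder assumptions rather than recommendations from this document.

```sql
-- Hypothetical Doris sink table for connector versions before 1.1.0;
-- connection values and the option values below are placeholders.
CREATE TABLE doris_sink_batch (
    id INT,
    name STRING
)
WITH (
    'connector' = 'doris',
    'fenodes' = '127.0.0.1:8030',
    'table.identifier' = 'test.sink_table',
    'username' = 'root',
    'password' = '',
    -- Fewer, larger Stream Load requests reduce version pressure (err=-235).
    'sink.batch.size' = '10000',
    'sink.batch.interval' = '10s'
);
```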