Apache Flink Fault Tolerance

Problem Description

Apache Flink offers a fault tolerance mechanism to consistently recover the state of data streaming applications. The mechanism ensures that even in the presence of failures, the program's state will eventually reflect every record from the data stream exactly once.

I need to understand the answer in the following link: Flink exactly-once message processing

Does this mean that the Flink sink will produce duplicate events to an external system like Cassandra?

For example:

1 - I have the following flow: source -> flatMap with state -> sink, with a configured snapshot interval of 20 seconds.
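For reference, a minimal, self-contained sketch of such a job, assuming the Java DataStream API (the element types, keying, and counting logic are illustrative stand-ins for the real source, state, and sink):

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class CheckpointedPipeline {

    // FlatMap that keeps a running count per key in keyed state ("flatMap with state").
    static class CountingFlatMap extends RichFlatMapFunction<String, String> {
        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void flatMap(String value, Collector<String> out) throws Exception {
            long c = (count.value() == null) ? 1L : count.value() + 1L;
            count.update(c);
            out.collect(value + " seen " + c + " time(s)");
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Snapshot operator state every 20 seconds. EXACTLY_ONCE here refers to
        // Flink's internal state consistency, not to what the sink emits downstream.
        env.enableCheckpointing(20_000, CheckpointingMode.EXACTLY_ONCE);

        env.fromElements("a", "b", "a")        // stand-in for the real source
           .keyBy(v -> v)
           .flatMap(new CountingFlatMap())     // stateful flatMap
           .print();                           // stand-in for the real sink

        env.execute("checkpointed-pipeline");
    }
}
```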

What will happen if the task manager goes down (is killed) between two snapshots (10 seconds after the last snapshot and 10 seconds before the next one)?

What I know is that Flink will restart the job from the last snapshot.

In this case, will the sink reprocess all the records that were already processed between the last snapshot and the failure?

Recommended Answer

In the scenario you've described, the Flink sink will indeed reprocess the records that had previously been sent to it since the last snapshot.

But this does not necessarily mean that the external data store (e.g., a database, filesystem, or message queue) connected to the sink will end up persisting those duplicates. Flink can provide what is sometimes called an "exactly-once end-to-end" guarantee if the sink supports transactions, or if the data is written in an idempotent way.

Flink's Kafka producer and the StreamingFileSink are examples of sinks that can take advantage of transactions to avoid producing duplicate (or inconsistent) results.
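As an illustration, here is a sketch of how the Kafka sink can be configured for this, assuming Flink 1.14+ and the flink-connector-kafka module (the broker address, topic, and transactional-id prefix are placeholders):

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

public class ExactlyOnceKafkaSink {
    // Builds a Kafka sink whose writes are committed in Kafka transactions that
    // complete only when the enclosing Flink checkpoint completes, so records
    // replayed after a restore are not re-exposed to consumers reading with
    // isolation.level=read_committed.
    static KafkaSink<String> build() {
        return KafkaSink.<String>builder()
                .setBootstrapServers("broker:9092")           // placeholder address
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("output-topic")             // placeholder topic
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .setTransactionalIdPrefix("flink-sink")       // required for EXACTLY_ONCE
                .build();
    }
}
```

The sink would then be attached with stream.sinkTo(ExactlyOnceKafkaSink.build()); note that consumers only get the end-to-end guarantee when they read with isolation.level=read_committed.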

The situation with Cassandra is somewhat more complex -- see the documentation -- and Flink can only provide exactly-once semantics if you are using idempotent queries.
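For example, a minimal sketch of an idempotent write using Flink's Cassandra connector (assuming the flink-connector-cassandra module; the keyspace, table, host, and element type are placeholders). Because a Cassandra INSERT is an upsert on the primary key, replaying the same records after a restore overwrites the same rows rather than adding duplicates:

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.cassandra.CassandraSink;

public class IdempotentCassandraSink {
    // Attaches a Cassandra sink that writes each (word, count) pair with an
    // idempotent upsert keyed by the table's primary key.
    static void attach(DataStream<Tuple2<String, Long>> results) throws Exception {
        CassandraSink.addSink(results)
                .setQuery("INSERT INTO my_keyspace.word_counts (word, count) VALUES (?, ?);")
                .setHost("127.0.0.1")
                .build();
    }
}
```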
