Spark结构化流:多个接收器

编程入门 行业动态 更新时间:2024-10-24 12:22:57
本文介绍了Spark结构化流:多个接收器的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述
  • 我们从Kafka那里使用结构化流,并将处理后的数据集写入s3.

  • We are consuming from Kafka using structured streaming and writing the processed data set to s3.

    我们还希望将处理后的数据向前写入Kafka,是否可以从同一流查询中进行处理?(火花版本2.1.1)

    We also want to write the processed data to Kafka moving forward, is it possible to do it from the same streaming query ? (spark version 2.1.1)

    在日志中,我看到了流式查询进度输出,并且从日志中获得了一个示例持续时间JSON,能否有人请更清楚地说明 addBatch 和 getBatch ?

    In the logs, I see the streaming query progress output and I have a sample duration JSON from the log, can some one please provide more clarity on what the difference is between addBatch and getBatch?

    TriggerExecution-处理获取的数据和写入接收器都需要时间吗?

    TriggerExecution - is it the time take to both process the fetched data and writing to the sink?

    "durationMs" : { "addBatch" : 2263426, "getBatch" : 12, "getOffset" : 273, "queryPlanning" : 13, "triggerExecution" : 2264288, "walCommit" : 552 },

  • 推荐答案

  • 是.

    在Spark 2.1.1中,您可以使用 writeStream.foreach 将数据写入Kafka.此博客中有一个示例: databricks/blog/2017/04/04/real-time-end-to-end-integration-with-apache-kafka-in-apache-sparks-structured-streaming.html

    In Spark 2.1.1, you can use writeStream.foreach to write your data into Kafka. There is an example in this blog: databricks/blog/2017/04/04/real-time-end-to-end-integration-with-apache-kafka-in-apache-sparks-structured-streaming.html

    或者您可以使用Spark 2.2.0,该版本添加了Kafka接收器以支持正式写入Kafka.

    Or you can use Spark 2.2.0 which adds Kafka sink to support writing to Kafka officially.

    getBatch 测量从源创建DataFrame的时间.这通常非常快. addBatch 测量在接收器中运行DataFrame的时间.

    getBatch measures how long to create a DataFrame from source. This is usually pretty fast. addBatch measures how long to run the DataFrame in a sink.

    triggerExecution 测量执行一次触发器执行的时间,通常与 getOffset + getBatch + 几乎相同addBatch .

    triggerExecution measures how long to run a trigger execution, is usually almost the same as getOffset + getBatch + addBatch.

  • 更多推荐

    Spark结构化流:多个接收器

    本文发布于:2023-11-26 00:45:36,感谢您对本站的认可!
    本文链接:https://www.elefans.com/category/jswz/34/1631984.html
    版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
    本文标签:多个   接收器   结构化   Spark

    发布评论

    评论列表 (有 0 条评论)
    草根站长

    >www.elefans.com

    编程频道|电子爱好者 - 技术资讯及电子产品介绍!