I have a Spark job which reads a source table, does a number of map / flatten / reduce operations and then stores the results into a separate table we use for reporting. Currently this job is run manually using the spark-submit script. I want to schedule it to run every night so the results are pre-populated for the start of the day. How can I do that?
We are running Spark in Standalone mode.
Any suggestions appreciated!
Answer:
There is no built-in mechanism in Spark that will help. A cron job seems reasonable for your case. If you find yourself continuously adding dependencies to the scheduled job, try Azkaban.
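For example, one common pattern is to wrap the spark-submit invocation in a small script and call that script from cron. This is only a sketch: the paths, master URL, class name, and jar below are placeholders you would replace with your own values.

```shell
#!/usr/bin/env bash
# run-reporting-job.sh -- wrapper script so the cron entry stays a single
# short command. All paths, the master URL, and the class/jar names are
# placeholders for illustration.
set -euo pipefail

/opt/spark/bin/spark-submit \
  --master spark://master-host:7077 \
  --class com.example.ReportingJob \
  /opt/jobs/reporting-job.jar \
  >> /var/log/spark-nightly.log 2>&1
```

Then schedule it with `crontab -e`, e.g. to run every night at 02:00 (crontab entries must be a single line, which is why the wrapper script is useful):

```shell
0 2 * * * /opt/jobs/run-reporting-job.sh
```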