I have a Spark job which reads a source table, does a number of map / flatten / reduce operations and then stores the results into a separate table we use for reporting. Currently this job is run manually using the spark-submit script. I want to schedule it to run every night so the results are pre-populated for the start of the day. How can I do that?
We are running Spark in Standalone mode.
Any suggestions appreciated!
Answer:
There is no built-in mechanism in Spark that will help. A cron job seems reasonable for your case. If you find yourself continuously adding dependencies to the scheduled job, try Azkaban.
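For example, one common pattern is to wrap the spark-submit invocation in a small script and call that script from cron. This is only a sketch: the paths, master URL, class name, and jar below are placeholders you would replace with your own values.

```shell
#!/usr/bin/env bash
# run-reporting-job.sh -- wrapper script so the cron entry stays a single
# short command. All paths, the master URL, and the class/jar names are
# placeholders for illustration.
set -euo pipefail

/opt/spark/bin/spark-submit \
  --master spark://master-host:7077 \
  --class com.example.ReportingJob \
  /opt/jobs/reporting-job.jar \
  >> /var/log/spark-nightly.log 2>&1
```

Then schedule it with `crontab -e`, e.g. to run every night at 02:00 (crontab entries must be a single line, which is why the wrapper script is useful):

```shell
0 2 * * * /opt/jobs/run-reporting-job.sh
```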