Configuring Zeppelin's Spark interpreter on EMR when starting a cluster

Updated: 2024-10-14 00:25:59
Problem description

I am creating clusters on EMR and configuring Zeppelin to read its notebooks from S3. To do that, I use a JSON object that looks like this:

    [
      {
        "Classification": "zeppelin-env",
        "Properties": {},
        "Configurations": [
          {
            "Classification": "export",
            "Properties": {
              "ZEPPELIN_NOTEBOOK_STORAGE": "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
              "ZEPPELIN_NOTEBOOK_S3_BUCKET": "hs-zeppelin-notebooks",
              "ZEPPELIN_NOTEBOOK_USER": "user"
            },
            "Configurations": []
          }
        ]
      }
    ]

I paste this object into the Software configuration page of EMR. My question is: how and where can I configure the Spark interpreter directly, so that I don't have to configure it manually from Zeppelin each time I start a cluster?

Recommended answer

This is a bit involved; you need to do two things:

  • Edit Zeppelin's interpreter.json
  • Restart the interpreter

So you need to write a shell script and then add an extra step to the EMR cluster configuration that runs it.

The Zeppelin configuration is stored as JSON, so you can use jq (a command-line JSON processor) to manipulate it. I don't know exactly what you want to change, but here is an example that adds the (mysteriously missing) DepInterpreter to the Spark interpreter and then restarts it:

    #!/bin/bash
    set -e
    # 1. Edit the Spark interpreter setting (ID 2ANGGHHMQ on this cluster).
    #    Write to a temp file first: piping into tee on the same file can truncate it before jq reads it.
    jq '.interpreterSettings."2ANGGHHMQ".interpreterGroup |= . + [{"class":"org.apache.zeppelin.spark.DepInterpreter", "name":"dep"}]' /etc/zeppelin/conf/interpreter.json > /tmp/interpreter.json
    sudo -u zeppelin cp /tmp/interpreter.json /etc/zeppelin/conf/interpreter.json
    # 2. Trigger a restart of the Spark interpreter
    curl -X PUT localhost:8890/api/interpreter/setting/restart/2ANGGHHMQ
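
If the Spark interpreter's ID on your cluster is not `2ANGGHHMQ`, one option is to look it up by name instead of hard-coding it. This is a minimal sketch, assuming each entry under `interpreterSettings` carries a `name` field and that exactly one entry is named `spark`. The helper `spark_setting_id` and the trimmed sample file are hypothetical, standing in for the real /etc/zeppelin/conf/interpreter.json:

```shell
#!/bin/bash
set -euo pipefail

# Print the key of the interpreterSettings entry whose "name" is "spark".
# Assumption: interpreter.json keys each setting by its interpreter setting ID.
spark_setting_id() {
  jq -r '.interpreterSettings | to_entries[] | select(.value.name == "spark") | .key' "$1"
}

# Hypothetical, heavily trimmed stand-in for /etc/zeppelin/conf/interpreter.json
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"interpreterSettings": {
  "2ANGGHHMQ": {"id": "2ANGGHHMQ", "name": "spark",
                "interpreterGroup": [{"class": "org.apache.zeppelin.spark.SparkInterpreter", "name": "spark"}]},
  "2AMDSAMPLE": {"id": "2AMDSAMPLE", "name": "md", "interpreterGroup": []}}}
EOF

id=$(spark_setting_id "$sample")
echo "$id"   # prints 2ANGGHHMQ

# Apply the same DepInterpreter edit with the looked-up ID passed via --arg,
# then list the interpreter names in the Spark group to check the result.
jq --arg id "$id" \
  '.interpreterSettings[$id].interpreterGroup |= . + [{"class": "org.apache.zeppelin.spark.DepInterpreter", "name": "dep"}]' \
  "$sample" | jq -r --arg id "$id" '.interpreterSettings[$id].interpreterGroup[].name'
# prints "spark", then "dep"

rm -f "$sample"
```

On the cluster, the helper would be called as `spark_setting_id /etc/zeppelin/conf/interpreter.json`, and its output used in both the jq edit and the restart URL.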

Put the shell script that edits interpreter.json in an S3 bucket, then start your EMR cluster with a step that runs it through script-runner:

    --steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[s3://mybucket/script.sh]
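
The jar path above points at the eu-west-1 copy of script-runner. Assuming the same jar is published as `s3://<region>.elasticmapreduce/libs/script-runner/script-runner.jar` in the region your cluster runs in, a small sketch can build the `--steps` value for any region and pass it to `aws emr create-cluster` alongside the Zeppelin configuration JSON from the question. The helper `script_runner_step`, the file name `zeppelin-config.json`, the release label and the instance settings are all hypothetical placeholders:

```shell
#!/bin/bash
set -euo pipefail

# Build the --steps value that runs a shell script from S3 via script-runner.jar.
# Assumption: the jar lives at s3://<region>.elasticmapreduce/libs/script-runner/.
script_runner_step() {
  local region=$1 script_uri=$2
  printf 'Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://%s.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[%s]\n' \
    "$region" "$script_uri"
}

step=$(script_runner_step eu-west-1 s3://mybucket/script.sh)
echo "$step"

# Hypothetical full invocation (placeholders: release label, instance settings,
# and zeppelin-config.json holding the Configurations JSON from the question):
#   aws emr create-cluster --release-label <emr-release> \
#     --applications Name=Spark Name=Zeppelin \
#     --configurations file://zeppelin-config.json \
#     --use-default-roles --instance-type m5.xlarge --instance-count 3 \
#     --steps "$step"
```

Supplying the configuration and the step in one command means both the S3 notebook storage and the interpreter edit are applied at cluster creation, with nothing to redo by hand in Zeppelin.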

Published: 2023-11-23 16:36:21
Source: https://www.elefans.com/category/jswz/34/1622149.html