Configuring Zeppelin's Spark interpreter on EMR when starting a cluster

Updated: 2024-10-14 00:25:59

Problem description

I am creating clusters on EMR and configuring Zeppelin to read its notebooks from S3. To do that, I use a JSON object that looks like this:

[
  {
    "Classification": "zeppelin-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "ZEPPELIN_NOTEBOOK_STORAGE": "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
          "ZEPPELIN_NOTEBOOK_S3_BUCKET": "hs-zeppelin-notebooks",
          "ZEPPELIN_NOTEBOOK_USER": "user"
        },
        "Configurations": []
      }
    ]
  }
]

I paste this object into the Software configuration page of EMR. My question is: how and where can I configure the Spark interpreter directly, without having to configure it manually from Zeppelin each time I start a cluster?

Recommended answer

This is a bit involved; you will need to do two things:

1. Edit Zeppelin's interpreter.json

2. Restart the interpreter

So what you need to do is write a shell script, then add an extra step to the EMR cluster configuration that runs this script.

The Zeppelin configuration is stored as JSON, so you can use jq (a command-line JSON tool) to manipulate it. I don't know exactly what you want to change, but here is an example that adds the (mysteriously missing) DepInterpreter:

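#!/bin/bash
set -e

# 1. Edit the Spark interpreter: append DepInterpreter to its interpreter group.
# Write jq's output to a temporary file first; piping straight back into
# interpreter.json can truncate the file before it has been fully read.
tmp=$(mktemp)
jq '.interpreterSettings."2ANGGHHMQ".interpreterGroup += [{"class":"org.apache.zeppelin.spark.DepInterpreter", "name":"dep"}]' /etc/zeppelin/conf/interpreter.json > "$tmp"
sudo -u zeppelin tee /etc/zeppelin/conf/interpreter.json < "$tmp" > /dev/null
rm -f "$tmp"

# 2. Trigger a restart of the Spark interpreter
curl -X PUT localhost:8890/api/interpreter/setting/restart/2ANGGHHMQ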
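The setting ID 2ANGGHHMQ used in the script above comes from one particular installation and may differ on your cluster. A minimal sketch for looking the ID up by name follows. It assumes interpreter.json keeps its settings in an object keyed by ID (as the jq path above implies), that each entry has a "name" field, and that the Spark setting is named "spark". The helper name spark_setting_id is hypothetical.

```shell
#!/bin/bash
# Sketch: look up an interpreter setting ID by name instead of hard-coding it.
# Assumption: .interpreterSettings is an object keyed by setting ID and each
# entry carries a "name" field. spark_setting_id is a hypothetical helper.
spark_setting_id() {
  local conf_file="$1" name="${2:-spark}"
  # to_entries turns {"ID": {...}} into [{"key": "ID", "value": {...}}]
  jq -r --arg name "$name" \
    '.interpreterSettings | to_entries[] | select(.value.name == $name) | .key' \
    "$conf_file"
}

# Usage on the master node (path and port taken from the script above):
#   id=$(spark_setting_id /etc/zeppelin/conf/interpreter.json)
#   curl -X PUT "localhost:8890/api/interpreter/setting/restart/$id"
```

If the function prints nothing, no setting with that name exists, and it is worth inspecting interpreter.json by hand before editing it.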

Put this shell script in an S3 bucket. Then start your EMR cluster with:

--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[s3://mybucket/script.sh]
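For context, the --steps option could be combined with the Zeppelin configuration from the question roughly as sketched below. The cluster name, release label, instance settings and the file name zeppelin-config.json are placeholders I have assumed, not values from the original post, and build_step is a hypothetical helper. The script only prints the command so it can be reviewed before being run.

```shell
#!/bin/bash
# Sketch: assemble an `aws emr create-cluster` call that applies the Zeppelin
# configuration and runs the interpreter script as a step.
# Every value below is a placeholder (assumption); adjust before use.
REGION="eu-west-1"
SCRIPT_URI="s3://mybucket/script.sh"
CONFIG_FILE="zeppelin-config.json"   # the JSON object from the question, saved locally
RELEASE_LABEL="${RELEASE_LABEL:-<your-release-label>}"

# Build the --steps value; script-runner.jar is served from a per-region bucket.
build_step() {
  local region="$1" script_uri="$2"
  printf 'Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://%s.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[%s]' \
    "$region" "$script_uri"
}

# Print the command instead of executing it (dry run).
echo aws emr create-cluster \
  --name "zeppelin-cluster" \
  --release-label "$RELEASE_LABEL" \
  --applications Name=Spark Name=Zeppelin \
  --configurations "file://$CONFIG_FILE" \
  --instance-type m5.xlarge --instance-count 3 \
  --use-default-roles \
  --steps "$(build_step "$REGION" "$SCRIPT_URI")"
```

ActionOnFailure=CONTINUE keeps the cluster running even if the script fails, so it is worth checking the step's log when the interpreter change does not appear.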

Published: 2023-11-23 16:36:21

Article link: https://www.elefans.com/category/jswz/34/1622149.html