Is there a way to pause a Dataproc cluster so that I don't get billed when I'm not actively running spark-shell or spark-submit jobs? The cluster management instructions at this link, cloud.google.com/sdk/gcloud/reference/beta/dataproc/clusters/, only show how to delete a cluster, but I have, for example, installed the Spark Cassandra Connector. Is my only alternative to create an image that I'll need to install every time?
Recommended answer
In general, the best approach is to distill the steps you used to customize your cluster into setup scripts, and then use Dataproc's initialization actions to automate the installation during cluster deployment.
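As a minimal sketch of that workflow, the Python below only builds the argv for `gcloud dataproc clusters create` with the `--initialization-actions` and `--initialization-action-timeout` flags. The cluster name, region, and the GCS script path (`gs://my-bucket/install-cassandra-connector.sh`) are hypothetical placeholders; the script itself would contain whatever steps you used to install the connector by hand.

```python
import shlex


def build_create_command(cluster_name, region, init_action_uris, timeout="10m"):
    """Return the argv for creating a Dataproc cluster with initialization actions.

    init_action_uris: GCS paths of your setup scripts (hypothetical in the demo).
    """
    if not init_action_uris:
        raise ValueError("at least one initialization action is required")
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        # Comma-separated list of executables that Dataproc runs on every node
        # while the cluster is being created.
        "--initialization-actions=" + ",".join(init_action_uris),
        # Cluster creation fails if an action runs longer than this.
        f"--initialization-action-timeout={timeout}",
    ]


if __name__ == "__main__":
    cmd = build_create_command(
        "my-cluster",                                        # hypothetical name
        "us-central1",                                       # assumed region
        ["gs://my-bucket/install-cassandra-connector.sh"],   # hypothetical script
    )
    print(shlex.join(cmd))
    # To actually create the cluster you could pass cmd to subprocess.run(cmd, check=True).
```

Keeping the command as a list (rather than one shell string) lets you pass it straight to `subprocess.run` without quoting issues, and recreating the cluster later is just a matter of running the same command again.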
This way, you can easily reproduce the customizations without any manual involvement if you ever want to, for example, apply the same setup to multiple concurrent Dataproc clusters, change machine types, or pick up the sub-minor-version bug fixes that Dataproc occasionally releases.
There's indeed no officially supported way to pause a Dataproc cluster at the moment, largely because reproducible cluster deployments, along with several other considerations listed below, mean that 99% of the time it's better to use initialization-action customizations than to pause a cluster in place. That said, there are possible short-term hacks, such as going to the Google Compute Engine page, selecting the instances that belong to the Dataproc cluster you want to pause, and clicking "stop" without deleting them.
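The same stop (and later start) can be done from the command line with `gcloud compute instances stop` / `gcloud compute instances start`. The sketch below assumes a standard, non-HA cluster using Dataproc's default VM naming (`<cluster>-m` for the master, `<cluster>-w-0`, `<cluster>-w-1`, ... for primary workers) and no secondary workers; those naming details, the zone, and the helper names are assumptions, so check the actual instance names on the Compute Engine page first.

```python
import shlex


def cluster_instance_names(cluster_name, num_workers):
    # ASSUMPTION: default naming for a standard (non-HA) Dataproc cluster.
    # Secondary/preemptible workers are not covered by this sketch.
    master = [f"{cluster_name}-m"]
    workers = [f"{cluster_name}-w-{i}" for i in range(num_workers)]
    return master + workers


def instances_command(action, cluster_name, num_workers, zone):
    """Build a `gcloud compute instances stop|start` command for all cluster VMs."""
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    return [
        "gcloud", "compute", "instances", action,
        *cluster_instance_names(cluster_name, num_workers),
        f"--zone={zone}",
    ]


if __name__ == "__main__":
    print(shlex.join(instances_command("stop", "my-cluster", 2, "us-central1-a")))
    # prints: gcloud compute instances stop my-cluster-m my-cluster-w-0 my-cluster-w-1 --zone=us-central1-a
```

Using one command for all instances keeps the master and workers in the same state; passing `"start"` instead of `"stop"` produces the matching command for bringing the cluster back.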
Compute Engine hourly charges and Dataproc's per-vCPU charges are only incurred while the underlying instances are running. So while the instances are manually "stopped", you won't incur Dataproc or Compute Engine instance-hour charges, even though Dataproc still lists the cluster as "RUNNING" (albeit with warnings that you'll see on the "VM Instances" tab of the Dataproc cluster summary page).
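To make the arithmetic concrete, here is a simplified model (an assumption, not Dataproc's billing implementation): only intervals where the instance status is `RUNNING` accrue vCPU-hours, while stopped intervals (which Compute Engine reports as `TERMINATED`) accrue none. It deliberately ignores minimum billing increments and persistent-disk storage, which is not an instance-hour charge; multiply the result by the per-vCPU-hour rate from the current pricing page to get a cost.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    state: str    # Compute Engine instance status, e.g. "RUNNING" or "TERMINATED"
    hours: float  # length of the interval


def billable_vcpu_hours(vcpus, intervals):
    """vCPU-hours that accrue instance-hour charges under this simplified model."""
    running_hours = sum(iv.hours for iv in intervals if iv.state == "RUNNING")
    return vcpus * running_hours


if __name__ == "__main__":
    # A 4-vCPU worker: used for 3 h, stopped overnight for 20 h, used again for 1 h.
    day = [Interval("RUNNING", 3.0), Interval("TERMINATED", 20.0), Interval("RUNNING", 1.0)]
    print(billable_vcpu_hours(4, day))  # prints 16.0
```

In this example the 20 stopped hours contribute nothing, so the worker is billed for 16 vCPU-hours instead of the 96 it would accrue if left running for the full 24 hours.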
You should then be able to click "start" on the Google Compute Engine page to get the cluster running again, but it's important to consider the following caveats: