What level of control is required for Google Cloud ML?

Updated: 2024-10-27 08:37:32

When using Google Cloud ML to train models:

The official example https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/tensorflowcore/trainer/task.py uses hooks, is_client, MonitoredTrainingSession, and some other complexity.

Is this required for Cloud ML, or is using this example enough: https://github.com/amygdala/tensorflow-workshop/tree/master/workshop_sections/wide_n_deep ?

The documentation is a bit limited in terms of best practices and optimisation. Will GCP ML handle the client/worker mode, or do we need to set devices ourselves, e.g. with replica_device_setter and so on?

Best answer

CloudML Engine is largely agnostic to how you write your TensorFlow programs. You provide a Python program and the service executes it for you, passing it some environment variables (the cluster layout, the task type, the task index, and so on) that you can use to perform distributed training if necessary.
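As a rough illustration of reading those environment variables, here is a minimal sketch that assumes the information arrives in TF_CONFIG (the variable named later in this answer) as a JSON object with "cluster" and "task" keys. The helper name read_task_info and the host names are made up for the example, and the fallback to a single "master" task when TF_CONFIG is missing is an assumption, not something the service guarantees.

```python
import json
import os


def read_task_info(environ=None):
    """Return (cluster, task_type, task_index) parsed from TF_CONFIG.

    Hypothetical helper: falls back to a single-machine setup
    (empty cluster, task type "master", index 0) when TF_CONFIG
    is unset or empty.
    """
    environ = os.environ if environ is None else environ
    tf_config = json.loads(environ.get("TF_CONFIG") or "{}")
    cluster = tf_config.get("cluster") or {}
    task = tf_config.get("task") or {}
    task_type = task.get("type", "master")
    task_index = int(task.get("index", 0))
    return cluster, task_type, task_index


if __name__ == "__main__":
    # Made-up cluster layout, used only to exercise the parser locally.
    example_env = {
        "TF_CONFIG": json.dumps({
            "cluster": {
                "master": ["master-0:2222"],
                "ps": ["ps-0:2222"],
                "worker": ["worker-0:2222", "worker-1:2222"],
            },
            "task": {"type": "worker", "index": 1},
        })
    }
    cluster, task_type, task_index = read_task_info(example_env)
    print(task_type, task_index, sorted(cluster))
```

With the core TensorFlow approach you would do this parsing yourself and then decide what each task runs. The higher-level library does the equivalent for you.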

census/tensorflowcore demonstrates how to do things with the "core" TensorFlow library, that is, how to do everything "from scratch", including using replica_device_setter, MonitoredTrainingSession, etc. This is sometimes necessary for ultimate flexibility, but it can be tedious.

Alongside the census/tensorflowcore example, you'll also see a sample called census/estimator. This example is based on a higher-level library, which unfortunately is in contrib and therefore does not yet have a fully stable API (expect lots of deprecation warnings, etc.). Expect it to stabilize in a future version of TensorFlow.

That particular library (known as Estimators) is a higher-level API that takes care of a lot of the dirty work for you. It will parse TF_CONFIG for you and set up the replica_device_setter, as well as handle the MonitoredTrainingSession and the necessary hooks, while remaining fairly customizable.

This is the same library that the wide and deep example you pointed to is based on, and both are fully supported on the service.

Published: 2023-07-09 14:31:00

Link to this article: https://www.elefans.com/category/jswz/34/1087005.html