Can anyone provide guidance on how to set up TensorFlow to work on many CPUs across a network? All of the examples I have found so far use only one local box and, at best, multiple GPUs. I have found that I can pass in a list of targets in the session_opts, but I'm not sure how to set up TensorFlow on each box to listen for networked nodes/tasks. Any example would be greatly appreciated!
Recommended answer:
The open-source version (currently 0.6.0) of TensorFlow supports single-process execution only: in particular, the only valid target in the tensorflow::SessionOptions is the empty string, which means "current process."
The TensorFlow whitepaper describes the structure of the distributed implementation that we use inside Google (see Figure 3). The basic idea is that the Session interface can be implemented using RPC to a master, and the master can partition the computation across a set of devices in multiple worker processes, which also communicate with each other using RPC. Alas, the current version depends heavily on Google-internal technologies (like Borg), so a lot of work remains to make it ready for external use. We are currently working on this, and you can follow the progress on this GitHub issue.
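To make the master/worker idea concrete, here is a toy sketch of the pattern, using only Python's standard xmlrpc module as a stand-in for an RPC layer. It is not TensorFlow code and not how TensorFlow's runtime is implemented. The names start_worker, partial_sum and master_sum_of_squares are hypothetical, all "machines" are threads on localhost, and only the master-to-worker path is modeled (direct worker-to-worker transfers are omitted). The master splits the input into shards, sends one shard to each worker over RPC, and combines the partial results.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def start_worker(port=0):
    """Start a toy worker (hypothetical) exposing one RPC method, partial_sum.

    port=0 asks the OS for a free port; the real address is read back afterwards.
    """
    server = SimpleXMLRPCServer(("127.0.0.1", port), logRequests=False)
    # The worker's "computation": sum of squares of the shard it receives.
    server.register_function(lambda xs: sum(x * x for x in xs), "partial_sum")
    # Serve requests in a background thread so the caller can keep going.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, actual_port = server.server_address
    return server, f"http://{host}:{actual_port}"


def master_sum_of_squares(values, worker_urls):
    """Toy master: partition the input, dispatch shards over RPC, combine results."""
    n = len(worker_urls)
    # Round-robin partition: worker i gets values[i], values[i + n], ...
    shards = [values[i::n] for i in range(n)]
    total = 0
    for url, shard in zip(worker_urls, shards):
        # Calls are made one after another here for simplicity.
        with ServerProxy(url) as proxy:
            total += proxy.partial_sum(shard)
    return total


if __name__ == "__main__":
    servers, urls = zip(*(start_worker() for _ in range(3)))
    print(master_sum_of_squares(list(range(10)), list(urls)))  # 285
    for s in servers:
        s.shutdown()
        s.server_close()
```

Dispatching shards sequentially keeps the sketch short; a real system would issue the RPCs concurrently and would also have to handle worker failures and data placement.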
EDIT on 2/26/2016: Today we released an initial version of the distributed runtime to GitHub. It supports multiple machines and multiple GPUs.