Problem description
The MNIST For ML Beginners tutorial is giving me an error when I run print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})). Everything else runs fine.
Error and traceback:
InternalErrorTraceback (most recent call last)
<ipython-input-16-219711f7d235> in <module>()
----> 1 print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
338 try:
339 result = self._run(None, fetches, feed_dict, options_ptr,
--> 340 run_metadata_ptr)
341 if run_metadata:
342 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
562 try:
563 results = self._do_run(handle, target_list, unique_fetches,
--> 564 feed_dict_string, options, run_metadata)
565 finally:
566 # The movers are no longer used. Delete them.
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
635 if handle is None:
636 return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
--> 637 target_list, options, run_metadata)
638 else:
639 return self._do_call(_prun_fn, self._session, handle, feed_dict,
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
657 # pylint: disable=protected-access
658 raise errors._make_specific_exception(node_def, op, error_message,
--> 659 e.code)
660 # pylint: enable=protected-access
661
InternalError: Dst tensor is not initialized.
[[Node: _recv_Placeholder_3_0/_1007 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_312__recv_Placeholder_3_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
[[Node: Mean_1/_1011 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_319_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
I just switched to a more recent version of CUDA, so maybe this has something to do with that? Seems like this error is about copying a tensor to the GPU.
Stack: EC2 g2.8xlarge machine, Ubuntu 14.04
Update:
print(sess.run(accuracy, feed_dict={x: batch_xs, y_: batch_ys}))
runs fine. This leads me to suspect that the issue is that I'm trying to transfer a huge tensor to the GPU and it can't take it. Small tensors like a minibatch work just fine.
Update 2:
I've figured out exactly how big the tensors have to be to cause this issue:
batch_size = 7509 #Works.
print(sess.run(accuracy, feed_dict={x: mnist.test.images[0:batch_size], y_: mnist.test.labels[0:batch_size]}))
batch_size = 7510 #Doesn't work. Gets the Dst error.
print(sess.run(accuracy, feed_dict={x: mnist.test.images[0:batch_size], y_: mnist.test.labels[0:batch_size]}))
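Since mini-batches work but the full test set does not, one workaround is to evaluate accuracy in slices and combine the per-batch results, weighted by batch size. The combining arithmetic can be sketched without TensorFlow; in the real code, the per-batch accuracy would come from a sess.run(accuracy, ...) call on each slice, and the array names below are stand-ins for mnist.test data:

```python
import numpy as np

def batched_accuracy(predictions, labels, batch_size=1000):
    """Average per-batch accuracy, weighted by batch size, so the
    result equals the accuracy over the full set."""
    n = len(labels)
    weighted_correct = 0.0
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        # In the TensorFlow version, this line would be a sess.run(accuracy,
        # feed_dict={...}) call on just this slice of the test set.
        batch_acc = np.mean(predictions[start:end] == labels[start:end])
        weighted_correct += batch_acc * (end - start)
    return weighted_correct / n

# Toy data standing in for the 7510-example slice above: 10 classes.
rng = np.random.RandomState(0)
labels = rng.randint(0, 10, size=7510)
predictions = labels.copy()
predictions[::5] = (predictions[::5] + 1) % 10  # corrupt every 5th prediction
print(batched_accuracy(predictions, labels, batch_size=1000))
```

Because the average is weighted by the actual size of each slice (the last batch may be short), it matches the accuracy computed over the full set in one pass.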
Accepted answer
In short, this error message is generated when there is not enough GPU memory to handle the batch size.
Expanding on Steven's link (I cannot post comments yet), here are a few tricks to monitor/control memory usage in Tensorflow:
To monitor memory usage during runs, consider logging run metadata. You can then see the memory usage per node in your graph in Tensorboard. See the Tensorboard information page for more information and an example of this.
By default, Tensorflow will try to allocate as much GPU memory as possible. You can change this using the GPUOptions config, so that Tensorflow will only allocate as much memory as needed. See the documentation on this. There you will also find an option that allows you to allocate only a fraction of your GPU memory (though I have found this to be broken sometimes).
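As a sketch of the options mentioned above, using the TensorFlow 1.x-style API (treat this as illustrative; the exact API depends on your installed version):

```python
import tensorflow as tf

# Option 1: grow the GPU allocation on demand instead of grabbing it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Option 2: cap the allocation at a fixed fraction of total GPU memory.
# config.gpu_options.per_process_gpu_memory_fraction = 0.4

sess = tf.Session(config=config)

# To log run metadata for TensorBoard's per-node memory view, pass options
# and a RunMetadata object to sess.run, then hand the metadata to a writer:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
# sess.run(accuracy, feed_dict={...}, options=run_options, run_metadata=run_metadata)
# writer.add_run_metadata(run_metadata, 'eval_step')
```

With allow_growth enabled, TensorFlow still cannot exceed physical GPU memory, so very large feeds may need the batched evaluation approach regardless.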