TensorFlow: How to log GPU memory (VRAM) utilization?

Updated: 2024-10-09 15:17:59


Question


TensorFlow always (pre-)allocates all free memory (VRAM) on my graphics card, which is fine, since I want my simulations to run as fast as possible on my workstation.

However, I would like to log how much memory in total TensorFlow actually uses. It would also be very helpful if I could log how much memory individual tensors use.

This information is important for measuring and comparing the amount of memory that different ML/AI architectures need.

Any suggestions?

Recommended answer

Update: you can use TensorFlow ops to query the allocator (these live in tf.contrib, so this applies to TensorFlow 1.x):

# maximum across all sessions and .run calls so far
sess.run(tf.contrib.memory_stats.MaxBytesInUse())
# current usage
sess.run(tf.contrib.memory_stats.BytesInUse())
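For readers on TensorFlow 2.x, where tf.contrib no longer exists, here is a minimal sketch of the same idea. It assumes a recent 2.x release that provides tf.config.experimental.get_memory_info (I believe 2.5 or later), which returns a dict with 'current' and 'peak' byte counts. The helper names format_usage and log_gpu_memory are my own, not part of TensorFlow.

```python
def format_usage(info):
    """Render a {'current': bytes, 'peak': bytes} dict in MiB."""
    mib = 1024 * 1024
    return "current={:.2f} MiB peak={:.2f} MiB".format(
        info["current"] / mib, info["peak"] / mib)


def log_gpu_memory(device="GPU:0"):
    """Print allocator usage for one device; return None if no GPU is visible.

    Assumption: TensorFlow 2.x with tf.config.experimental.get_memory_info.
    """
    # Imported lazily so format_usage can be used without TensorFlow installed.
    import tensorflow as tf

    if not tf.config.list_physical_devices("GPU"):
        print("no GPU visible to TensorFlow")
        return None
    info = tf.config.experimental.get_memory_info(device)
    print(device, format_usage(info))
    return info


# Formatting check with made-up byte counts (not real measurements).
print(format_usage({"current": 1048576, "peak": 3 * 1048576}))
```

Calling log_gpu_memory() after each training step would give a running log of current and peak usage, which is roughly what BytesInUse and MaxBytesInUse report in the 1.x snippet.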

You can also get detailed information about a session.run call, including all memory allocated during the call, by looking at RunMetadata. For example:

run_metadata = tf.RunMetadata()
sess.run(c, options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE, output_partition_graphs=True), run_metadata=run_metadata)

Here's an end-to-end example: take a column vector and a row vector and add them to get a matrix of pairwise sums:

import tensorflow as tf

# Turn off graph optimizations so ops are not folded or merged before tracing.
no_opt = tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0,
                             do_common_subexpression_elimination=False,
                             do_function_inlining=False,
                             do_constant_folding=False)
# Expose three logical CPU devices so each tensor can be placed separately.
config = tf.ConfigProto(graph_options=tf.GraphOptions(optimizer_options=no_opt),
                        log_device_placement=True, allow_soft_placement=False,
                        device_count={"CPU": 3},
                        inter_op_parallelism_threads=3,
                        intra_op_parallelism_threads=1)

with tf.device("cpu:0"):
    a = tf.ones((13, 1))  # column vector
with tf.device("cpu:1"):
    b = tf.ones((1, 13))  # row vector
with tf.device("cpu:2"):
    c = a + b             # broadcasts to a 13x13 matrix

sess = tf.Session(config=config)
run_metadata = tf.RunMetadata()
# FULL_TRACE records per-op details, including allocations, in run_metadata.
sess.run(c,
         options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE,
                               output_partition_graphs=True),
         run_metadata=run_metadata)
# Dump the metadata as text so it can be inspected.
with open("/tmp/run2.txt", "w") as out:
    out.write(str(run_metadata))

If you open /tmp/run2.txt you'll see messages like this:

  node_name: "ones"

      allocation_description {
        requested_bytes: 52
        allocator_name: "cpu"
        ptr: 4322108320
      }
  ....

  node_name: "ones_1"

      allocation_description {
        requested_bytes: 52
        allocator_name: "cpu"
        ptr: 4322092992
      }
  ...
  node_name: "add"

      allocation_description {
        requested_bytes: 676
        allocator_name: "cpu"
        ptr: 4492163840
      }

So here you can see that a and b each allocated 52 bytes (13 float32 values at 4 bytes each), and the result allocated 676 bytes (13 * 13 * 4).
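If you want these numbers in a log rather than reading the dump by eye, a small parser is enough. The sketch below is only an illustration: it assumes the text dump keeps the node_name / requested_bytes layout shown in the excerpt, and the helper names requested_bytes_by_node and tensor_bytes are hypothetical. The sample string reuses the three values from the excerpt above.

```python
import re
from math import prod

# Excerpt with the same fields and values as the dump shown above.
SAMPLE = '''
node_name: "ones"
  allocation_description {
    requested_bytes: 52
    allocator_name: "cpu"
  }
node_name: "ones_1"
  allocation_description {
    requested_bytes: 52
    allocator_name: "cpu"
  }
node_name: "add"
  allocation_description {
    requested_bytes: 676
    allocator_name: "cpu"
  }
'''


def requested_bytes_by_node(text):
    """Sum the requested_bytes lines that follow each node_name line."""
    totals = {}
    current = None
    for line in text.splitlines():
        name = re.search(r'node_name:\s*"([^"]+)"', line)
        if name:
            current = name.group(1)
            totals.setdefault(current, 0)
            continue
        size = re.search(r'requested_bytes:\s*(\d+)', line)
        if size and current is not None:
            totals[current] += int(size.group(1))
    return totals


def tensor_bytes(shape, itemsize=4):
    """Bytes for a dense tensor; itemsize=4 assumes float32."""
    return prod(shape) * itemsize


usage = requested_bytes_by_node(SAMPLE)
print(usage)
print(tensor_bytes((13, 1)), tensor_bytes((13, 13)))
```

To use it on a real run, read /tmp/run2.txt and pass its contents to requested_bytes_by_node. A full dump may contain sections this simple line-based scan does not expect, so treat the totals as a starting point and check them against the shape arithmetic.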


Published: 2023-05-01 05:11:45

Article link: https://www.elefans.com/category/jswz/34/1405406.html