Problem description
TensorFlow always (pre-)allocates all free memory (VRAM) on my graphics card, which is fine, since I want my simulations to run as fast as possible on my workstation.
However, I would like to log how much memory (in total) TensorFlow actually uses. It would also be really nice if I could log how much memory individual tensors use.
This information is important for measuring and comparing the memory that different ML/AI architectures need.
Any suggestions?
Recommended answer
Update: you can use TensorFlow ops (from tf.contrib, so TensorFlow 1.x) to query the allocator:
# maximum across all sessions and .run calls so far
sess.run(tf.contrib.memory_stats.MaxBytesInUse())
# current usage
sess.run(tf.contrib.memory_stats.BytesInUse())
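For logging, the raw byte counts returned by these ops can be made easier to read. A minimal sketch in plain Python; format_bytes is a hypothetical helper, not part of TensorFlow, and the sample values below are illustrative only:

```python
def format_bytes(num_bytes):
    """Render a byte count with binary units (hypothetical helper)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit we know about.
        if value < 1024.0 or unit == units[-1]:
            if unit == "B":
                return "%d B" % int(value)
            return "%.2f %s" % (value, unit)
        value /= 1024.0


# Illustrative values only; in practice, pass the results of
# sess.run(tf.contrib.memory_stats.BytesInUse()) and MaxBytesInUse().
current, peak = 676, 3 * 1024 * 1024
print("in use: %s, peak: %s" % (format_bytes(current), format_bytes(peak)))
```

Logging both values after each training step would give a simple usage-over-time trace.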
You can also get detailed information about a session.run call, including all memory allocated during the call, by looking at RunMetadata, i.e. something like this:
run_metadata = tf.RunMetadata()
sess.run(c, options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE, output_partition_graphs=True), run_metadata=run_metadata)
Here's an end-to-end example: take a column vector and a row vector and add them to get a matrix of sums:
import tensorflow as tf

# Disable graph optimizations so that every op really runs and allocates memory.
no_opt = tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0,
                             do_common_subexpression_elimination=False,
                             do_function_inlining=False,
                             do_constant_folding=False)
# Three CPU devices so that each op can be pinned to its own device.
config = tf.ConfigProto(graph_options=tf.GraphOptions(optimizer_options=no_opt),
                        log_device_placement=True, allow_soft_placement=False,
                        device_count={"CPU": 3},
                        inter_op_parallelism_threads=3,
                        intra_op_parallelism_threads=1)

with tf.device("cpu:0"):
    a = tf.ones((13, 1))  # column vector
with tf.device("cpu:1"):
    b = tf.ones((1, 13))  # row vector
with tf.device("cpu:2"):
    c = a + b  # broadcasts to a 13x13 matrix

sess = tf.Session(config=config)
run_metadata = tf.RunMetadata()
sess.run(c, options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE,
                                  output_partition_graphs=True),
         run_metadata=run_metadata)
# Dump the collected metadata as text for inspection.
with open("/tmp/run2.txt", "w") as out:
    out.write(str(run_metadata))
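The metadata can also be inspected in code instead of through the text dump. A minimal sketch, assuming the TensorFlow 1.x RunMetadata layout (step_stats.dev_stats, node_stats, output, tensor_description.allocation_description); requested_bytes_by_node is a hypothetical helper, and the SimpleNamespace objects are mock stand-ins so the snippet runs without TensorFlow:

```python
from collections import defaultdict
from types import SimpleNamespace


def requested_bytes_by_node(run_metadata):
    """Sum requested_bytes over every output tensor of every node."""
    totals = defaultdict(int)
    for dev in run_metadata.step_stats.dev_stats:
        for node in dev.node_stats:
            for out in node.output:
                alloc = out.tensor_description.allocation_description
                totals[node.node_name] += alloc.requested_bytes
    return dict(totals)


def _mock_node(name, nbytes):
    # Mock of a NodeExecStats entry with a single output tensor.
    alloc = SimpleNamespace(requested_bytes=nbytes)
    desc = SimpleNamespace(allocation_description=alloc)
    return SimpleNamespace(node_name=name,
                           output=[SimpleNamespace(tensor_description=desc)])


# Mock metadata mirroring the three ops of the example, one per device.
mock_metadata = SimpleNamespace(step_stats=SimpleNamespace(dev_stats=[
    SimpleNamespace(node_stats=[_mock_node("ones", 52)]),
    SimpleNamespace(node_stats=[_mock_node("ones_1", 52)]),
    SimpleNamespace(node_stats=[_mock_node("add", 676)]),
]))

per_node = requested_bytes_by_node(mock_metadata)
print(per_node, "total:", sum(per_node.values()))
```

With a real session, the run_metadata object filled in by sess.run would be passed in place of mock_metadata.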
If you open /tmp/run2.txt, you'll see messages like this:
node_name: "ones"
  allocation_description {
    requested_bytes: 52
    allocator_name: "cpu"
    ptr: 4322108320
  }
....
node_name: "ones_1"
  allocation_description {
    requested_bytes: 52
    allocator_name: "cpu"
    ptr: 4322092992
  }
...
node_name: "add"
  allocation_description {
    requested_bytes: 676
    allocator_name: "cpu"
    ptr: 4492163840
  }
So here you can see that a and b allocated 52 bytes each (13*4), and the result allocated 676 bytes (13*13*4).
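These numbers can be cross-checked from the tensor shapes alone. A small sketch, assuming 4-byte float32 elements (the default dtype of tf.ones); tensor_bytes and broadcast_shape are hypothetical helpers, and broadcast_shape only handles same-rank shapes whose mismatched dimensions are 1:

```python
import math

FLOAT32_BYTES = 4  # assumption: tf.ones defaults to float32


def tensor_bytes(shape, itemsize=FLOAT32_BYTES):
    """Bytes needed for a dense tensor of the given shape."""
    return math.prod(shape) * itemsize


def broadcast_shape(shape_a, shape_b):
    """Result shape of a + b for same-rank shapes (size-1 dims stretch)."""
    return tuple(max(x, y) for x, y in zip(shape_a, shape_b))


shape_a, shape_b = (13, 1), (1, 13)
shape_c = broadcast_shape(shape_a, shape_b)

print(tensor_bytes(shape_a), tensor_bytes(shape_b), tensor_bytes(shape_c))
```

The allocator may round real allocations up, so requested_bytes is the value to compare against, not the size of the underlying block.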