Computation of Euclidean Distance in Tensorflow isn't running on GPU

I need to compute the Euclidean distance between each image i in an array of k images (1 <= i <= k) and an input image xr_j (1 <= j <= m); the k images and the input image xr_j together form column j of matrix IR. Once every column of IR has been processed, the result is a matrix D of shape (k, m) containing the Euclidean distance between each of the k images and the input image xr_j. The original code used to perform this task is shown in Code 1.

Code 1

D = np.zeros(shape=[ir_set[0].shape[0]-1, len(ir_set)])
for i in range(len(ir_set)):  # number of team members
    shape = ir_set[i].shape
    qtd_images = shape[0]  # number of 'k' similar images
    dim_image = shape[1:]  # dimensions of the reduced image
    for j in range(qtd_images-1):
        k = tf.placeholder(shape=dim_image, dtype=tf.float32)
        x = tf.placeholder(shape=dim_image, dtype=tf.float32)
        # L2 metric
        d = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(k, x))))
        distance = sess.run(d, feed_dict={k: ir_set[i][j], x: ir_set[i][qtd_images-1]})
        print('Computing distance: Model {0}/{1}, Image {2}/{3}'.format(i+1, len(ir_set), j+1, qtd_images-1), end='\r')
        D[j][i] = distance
print('\nAll distances computed. Matrix D shape: {0}'.format(D.shape))
return D

The problem with Code 1 is that it takes too long to compute all the distances, and my GPU sits idle the whole time. Why isn't this code making use of my GPU, when all my other Tensorflow procedures do? How can I modify Code 1 so that it uses the GPU and therefore runs faster?

Thanks in advance.

Best answer

You should use TensorFlow's broadcasting support to compute the Euclidean distances for all the image combinations at once, instead of looping over each one of them.

For example:

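import tensorflow as tf  # TensorFlow 1.x API

k = 10
m = 5
im_size = 32*32
IR = tf.random_normal((k+1, m, im_size))

# split IR into (k, m, im_size) and (1, m, im_size)
ir, xr = tf.split(IR, [k, 1], axis=0)

# Distance for all k*m values; xr broadcasts across the k images.
# tf.sqrt is applied so the result is the L2 distance used in Code 1, not its square.
distances = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(ir, xr)), 2))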

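Your code goes back and forth between CPU and GPU on every iteration, and most of the time is spent on the CPU feeding the placeholders. It also creates new placeholders and ops inside the loop, so the graph keeps growing as it runs. The version above runs entirely on the GPU.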
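
To connect this with the layout in Code 1, here is a rough NumPy sketch (NumPy standing in for the TensorFlow ops, since both follow the same broadcasting rules). It assumes every entry of ir_set has the same shape (k+1, ...) with the input image stored last, which is what Code 1 implies but the question does not state outright. The function names distances_loop and distances_broadcast are made up for this illustration.

```python
import numpy as np


def distances_loop(ir_set):
    """Per-pair computation, mirroring the nested loops of Code 1."""
    k = ir_set[0].shape[0] - 1          # last entry is the input image
    D = np.zeros((k, len(ir_set)))
    for i, member in enumerate(ir_set):
        x = member[-1]                  # input image for this column
        for j in range(k):
            D[j, i] = np.sqrt(np.sum(np.square(member[j] - x)))
    return D


def distances_broadcast(ir_set):
    """Same result computed in one shot with broadcasting.

    Assumption: all entries of ir_set share one shape, so they can be stacked.
    """
    stacked = np.stack(ir_set, axis=0)              # (m, k+1, *dim_image)
    m, k_plus_1 = stacked.shape[:2]
    flat = stacked.reshape(m, k_plus_1, -1)         # flatten each image
    ir = flat[:, :-1, :]                            # (m, k, pixels)
    xr = flat[:, -1:, :]                            # (m, 1, pixels), broadcasts over k
    d = np.sqrt(np.sum(np.square(ir - xr), axis=2)) # (m, k)
    return d.T                                      # (k, m), same layout as D


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # hypothetical data: m = 5 members, k = 3 similar images + 1 input image each
    demo_set = [rng.normal(size=(4, 8, 8)) for _ in range(5)]
    print(distances_broadcast(demo_set).shape)
```

The same stacking and slicing should carry over to TensorFlow by feeding the stacked array once and replacing the NumPy calls with their tf counterparts, so that a single sess.run covers all k*m distances.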
