Problem description
I read somewhere that if you choose a batch size that is a power of 2, training will be faster. What is this rule? Does it apply to other applications? Can you provide a reference paper?
Solution
Algorithmically speaking, using larger mini-batches reduces the variance of your stochastic gradient updates (by taking the average of the gradients in the mini-batch), and this in turn allows you to take bigger step sizes, which means the optimization algorithm will make progress faster.
However, the amount of work done (in terms of the number of gradient computations) to reach a certain accuracy in the objective will be the same: with a mini-batch size of n, the variance of the update direction is reduced by a factor of n, so the theory allows you to take step sizes that are n times larger, and a single step will take you roughly to the same accuracy as n steps of SGD with a mini-batch size of 1.
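The factor-of-n variance reduction above is easy to check empirically. A minimal NumPy sketch (the per-example gradients, noise level, and batch size 64 are illustrative assumptions, not from the original answer): we model each per-example gradient as the true gradient plus unit-variance noise, and compare the variance of single-example updates against mini-batch averages.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "per-example gradients": true gradient 1.0 plus unit-variance noise.
true_grad, noise_std, trials = 1.0, 1.0, 10_000

def minibatch_grad(n):
    """Average of n noisy per-example gradients: one mini-batch update direction."""
    samples = true_grad + noise_std * rng.standard_normal((trials, n))
    return samples.mean(axis=1)

var_1 = minibatch_grad(1).var()    # variance of SGD with batch size 1
var_64 = minibatch_grad(64).var()  # variance with batch size 64

# The ratio is close to the batch size: variance shrinks by a factor of n.
print(var_1 / var_64)
```

The printed ratio comes out near 64, matching the claim that a mini-batch of size n cuts the update variance by a factor of n.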
As for TensorFlow, I found no evidence supporting this claim, and it is a question that has been closed on GitHub: https://github.com/tensorflow/tensorflow/issues/4132
Note that resizing images to a power of two makes sense (because pooling is generally done in 2x2 windows), but that is a different thing altogether.
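To see why power-of-two image sizes interact nicely with 2x2 pooling, here is a small NumPy sketch (the 64- and 48-pixel side lengths and the reshape-based pooling helper are illustrative assumptions): a power-of-two side length can be halved by pooling all the way down without padding, while other sizes eventually hit an odd dimension.

```python
import numpy as np

def maxpool2x2(x):
    """2x2 max pooling on a square feature map whose side length is even."""
    h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0, "2x2 pooling needs even dimensions"
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A power-of-two side length survives repeated 2x2 pooling without padding:
x = np.arange(64 * 64, dtype=float).reshape(64, 64)
sizes = [x.shape[0]]
while x.shape[0] > 1 and x.shape[0] % 2 == 0:
    x = maxpool2x2(x)
    sizes.append(x.shape[0])
print(sizes)  # 64 halves cleanly down to 1

# By contrast, a side of 48 stops dividing evenly: 48 -> 24 -> 12 -> 6 -> 3 (odd).
```

This is purely about spatial dimensions of the input, not about the batch size, which is the point of the remark above.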