Recently, while hacking on a network that contains an InplaceABN module, I ran into the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 256, 7, 7]], which is output 0 of InPlaceABNBackward, is at version 3; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
When I first adopted InplaceABN I had never studied the paper or the code, so fixing this error cost me several hours of blind trial and error. I knew consecutive in-place operations were to blame, but I could not pinpoint which code in which block was responsible, and I kept trying clone() in all the wrong places. Only the next day, after reading through the project's GitHub issues, did I really understand the cause.
1. Blocks provided by InplaceABN
ABN is standard BN + activation (no memory savings).
InPlaceABN is BN + activation done in place (with memory savings).
InPlaceABNSync is BN + activation done in place (with memory savings) + computation of BN (fwd + bwd) with data from all the GPUs.
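For reference, a minimal sketch of constructing these blocks (this assumes the public API of the mapillary/inplace_abn package; constructor arguments such as activation and activation_param may vary between versions):

```python
from inplace_abn import ABN, InPlaceABN, InPlaceABNSync

# Standard BN + activation: leaves its input intact, no memory savings.
abn = ABN(256, activation="leaky_relu", activation_param=0.01)

# Fused BN + activation computed in place: the output reuses the input
# buffer, which is where the memory savings come from. Crucially, the
# *output* tensor is what gets saved for the backward pass, so it must
# not be modified in place afterwards.
iabn = InPlaceABN(256, activation="leaky_relu", activation_param=0.01)

# Same as InPlaceABN, but BN statistics are synchronized across all
# GPUs (requires a distributed setup).
iabn_sync = InPlaceABNSync(256)
```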
2. Inplace shortcut
Change out += residual to out = out + residual. Both += and add_() are in-place operations.
The problem I hit was exactly this: in my ResidualBlock, InplaceABN and add_ formed two consecutive in-place operations; the sketch below reproduces it.
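This is a minimal reproduction (hypothetical shapes and layers, assuming the mapillary/inplace_abn package; run it on a GPU, since some builds of InPlaceABN are CUDA-only):

```python
import torch
from torch import nn
from inplace_abn import InPlaceABN

conv = nn.Conv2d(256, 256, 3, padding=1).cuda()
iabn = InPlaceABN(256).cuda()

x = torch.randn(2, 256, 7, 7, device="cuda")
residual = torch.randn(2, 256, 7, 7, device="cuda")

out = iabn(conv(x))   # InPlaceABN saves its *output* to run its backward
out += residual       # second in-place op mutates that saved tensor
out.sum().backward()  # -> RuntimeError: "... modified by an inplace operation"
```

Replacing the in-place add with out = out + residual makes the backward pass succeed, because the addition now writes to a fresh tensor.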
3. Solution
References:
https://github.com/mapillary/inplace_abn/issues/6
inplace_abn/resnet.py at main · mapillary/inplace_abn · GitHub
inplace_abn/residual.py at main · mapillary/inplace_abn · GitHub
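Following those references, the fix is to make the shortcut addition out of place, so the tensor that InPlaceABN saved for its backward pass is never mutated. Below is a sketch of a corrected residual block; the layer layout is hypothetical and only loosely modeled on the referenced residual.py:

```python
import torch
from torch import nn
from inplace_abn import InPlaceABN

class ResidualBlock(nn.Module):
    """Residual block whose shortcut avoids a second in-place operation."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = InPlaceABN(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = InPlaceABN(channels)

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # Out-of-place add: writing `out += residual` or
        # `out.add_(residual)` here would mutate the tensor that bn2
        # just saved for its backward pass and trigger the error above.
        out = out + residual
        return out
```

The out-of-place add allocates one extra tensor; that small cost is what keeps InPlaceABN's saved output intact.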