Bug: 

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [4, 512, 512]] is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

[W python_anomaly_mode.cpp:104] Warning: Error detected in NllLoss2DBackward0. Traceback of forward call that caused the error:
  File "<string>", line 1, in <module>
  File "/data2/user10/anaconda3/envs/zwk/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/data2/user10/anaconda3/envs/zwk/lib/python3.8/multiprocessing/spawn.py", line 129, in _main
    return self._bootstrap(parent_sentinel)
  File "/data2/user10/anaconda3/envs/zwk/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/data2/user10/anaconda3/envs/zwk/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/data2/user10/anaconda3/envs/zwk/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/data2/user10/code/UDA_SS/ReCo_loveDA/train_semisup_loveDA.py", line 211, in main_train
    unsup_loss = compute_unsupervised_loss(pred_u_large, train_u_aug_label, train_u_aug_logits, args.strong_threshold,ignore_index=250)
  File "/data2/user10/code/UDA_SS/ReCo_loveDA/module_list.py", line 71, in compute_unsupervised_loss
    loss = F.cross_entropy(predict, target, reduction='none', ignore_index=ignore_index)# loss shape : torch.Size([2, 512, 512])
  File "/data2/user10/anaconda3/envs/zwk/lib/python3.8/site-packages/torch/nn/functional.py", line 2846, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
 (function _print_stack)
  0%|          | 0/145 [00:20<?, ?it/s]
Traceback (most recent call last):
  File "train_semisup_loveDA.py", line 398, in <module>
    mp.spawn(main_train, args=(world_size, args), nprocs=world_size, join=True)
  File "/data2/user10/anaconda3/envs/zwk/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/data2/user10/anaconda3/envs/zwk/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/data2/user10/anaconda3/envs/zwk/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/data2/user10/anaconda3/envs/zwk/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/data2/user10/code/UDA_SS/ReCo_loveDA/train_semisup_loveDA.py", line 231, in main_train
    loss.backward()
  File "/data2/user10/anaconda3/envs/zwk/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/data2/user10/anaconda3/envs/zwk/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [4, 512, 512]] is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
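
The "[W python_anomaly_mode.cpp]" warning above is produced by torch.autograd.set_detect_anomaly(True): it records the forward call whose backward later failed, here NllLoss2DBackward0, i.e. the F.cross_entropy call inside compute_unsupervised_loss. The offending tensor is a [4, 512, 512] cuda LongTensor, which is the integer target of that cross_entropy (the pseudo-label train_u_aug_label): it was modified in place somewhere between the forward pass and loss.backward(), so its version counter no longer matches the one autograd saved. A minimal, standalone sketch (shapes and the in-place edit are invented, not taken from the project) that reproduces the same class of error:

import torch
import torch.nn.functional as F

torch.autograd.set_detect_anomaly(True)   # produces the python_anomaly_mode warnings

logits = torch.randn(4, 6, 8, 8, requires_grad=True)   # [N, C, H, W] predictions
target = torch.randint(0, 6, (4, 8, 8))                # LongTensor pseudo-label [N, H, W]

loss = F.cross_entropy(logits, target, reduction='none').mean()

# Any in-place edit of the target AFTER the forward bumps its version counter;
# NllLoss2DBackward0 kept a reference to it, so backward() raises
# "modified by an inplace operation ... is at version 1; expected version 0".
target[target == 5] = 0

loss.backward()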


Fix: pass a clone() of train_u_aug_label into compute_unsupervised_loss. The clone gives cross_entropy its own copy of the pseudo-label to save for the backward pass, so whatever later modifies the original label in place no longer trips autograd's version check.
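A minimal sketch of the patched call site in train_semisup_loveDA.py (only the clone() is new; the arguments are copied from the traceback, and the surrounding code is assumed):

unsup_loss = compute_unsupervised_loss(
    pred_u_large,
    train_u_aug_label.clone(),   # clone: the tensor saved by cross_entropy for backward
                                 # is now a private copy, so later in-place edits to the
                                 # pseudo-label can no longer invalidate it
    train_u_aug_logits,
    args.strong_threshold,
    ignore_index=250,
)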

 
