DeepSpeed or bitsandbytes fails with "CUDA Setup failed"

Fix: downgrade bitsandbytes to 0.39.0 (any version below 0.40); this resolved the problem.
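
Because importing bitsandbytes is exactly what fails here, it can help to read the installed version from package metadata instead. The following is a minimal sketch, assuming the 0.40 cutoff stated above; the helper names `parse_release` and `needs_downgrade` are hypothetical and not part of bitsandbytes.

```python
import re
from importlib import metadata


def parse_release(version):
    """Return the leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def needs_downgrade(version, ceiling=(0, 40)):
    """True if `version` is at or above the ceiling (0.40, the cutoff assumed in this post)."""
    return parse_release(version) >= ceiling


if __name__ == "__main__":
    try:
        # Reads package metadata only; does not import bitsandbytes itself.
        installed = metadata.version("bitsandbytes")
    except metadata.PackageNotFoundError:
        print("bitsandbytes is not installed")
    else:
        if needs_downgrade(installed):
            print(f"bitsandbytes {installed} found; try: pip install bitsandbytes==0.39.0")
        else:
            print(f"bitsandbytes {installed} is below 0.40")
```

Only the leading numeric part of the version is compared, so suffixes such as `.dev0` or `+cu116` are ignored.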

# Environment:
# python==3.9.17
tqdm==4.62.3
bitsandbytes==0.39.0
deepspeed==0.9.5
numpy==1.25.1
pandas==1.2.5
protobuf==3.20.1
sentencepiece==0.1.99
tokenizers==0.13.3
torch==1.13.0+cu116
torchaudio==0.13.0+cu116
torchvision==0.14.0+cu116
urllib3==1.26.16
peft==0.4.0.dev0
transformers==4.31.0.dev0
accelerate==0.22.0.dev0

The error output was:

CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=116 make cuda11x
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/utils/import_utils.py", line 1096, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/training_args.py", line 67, in <module>
    from accelerate.state import AcceleratorState, PartialState
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/accelerate/__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/accelerate/accelerator.py", line 35, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/accelerate/utils/__init__.py", line 131, in <module>
    from .bnb import has_4bit_bnb_layers, load_and_quantize_model
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/accelerate/utils/bnb.py", line 42, in <module>
    import bitsandbytes as bnb
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError:
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/sda/yuzhao/code/github/Firefly/train.py", line 1, in <module>
    from transformers import (
  File "<frozen importlib._bootstrap>", line 1055, in _handle_fromlist
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/utils/import_utils.py", line 1086, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/mnt/sda/yuzhao/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/utils/import_utils.py", line 1098, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.training_args because of the following error (look up to see its traceback):

        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
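
The error text suggests checking whether the CUDA libraries are reachable through LD_LIBRARY_PATH. The sketch below lists CUDA runtime libraries found on that path. It assumes the runtime is named `libcudart.so*`, as on a typical Linux CUDA install, and the helper `find_cudart` is hypothetical. In the case described in this post the downgrade was what fixed the error, so treat this only as a supplementary check alongside `python -m bitsandbytes`.

```python
import os
from pathlib import Path


def find_cudart(search_path):
    """Return libcudart shared objects found in an os.pathsep-separated directory list."""
    hits = []
    for entry in search_path.split(os.pathsep):
        directory = Path(entry)
        # Skip empty entries and entries that are not existing directories.
        if not entry or not directory.is_dir():
            continue
        hits.extend(sorted(str(p) for p in directory.glob("libcudart.so*")))
    return hits


if __name__ == "__main__":
    found = find_cudart(os.environ.get("LD_LIBRARY_PATH", ""))
    if found:
        for path in found:
            print(path)
    else:
        print("no libcudart.so* found on LD_LIBRARY_PATH")
```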
