torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacity of 7.78 GiB of which 44.69 MiB is free. Including non-PyTorch memory, this process has 7.73 GiB memory in use.

> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
/home/rong/.local/lib/python3.10/site-packages/torch/__init__.py:696: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:451.)
  _C._set_default_tensor_type(t)
Traceback (most recent call last):
  File "/home/rong/python/ai_project/llama/./example_text_completion.py", line 69, in <module>
    fire.Fire(main)
  File "/home/rong/.local/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/rong/.local/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/rong/.local/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/rong/python/ai_project/llama/./example_text_completion.py", line 32, in main
    generator = Llama.build(
  File "/home/rong/python/ai_project/llama/llama/generation.py", line 119, in build
    model = Transformer(model_args)
  File "/home/rong/python/ai_project/llama/llama/model.py", line 443, in __init__
    self.layers.append(TransformerBlock(layer_id, params))
  File "/home/rong/python/ai_project/llama/llama/model.py", line 376, in __init__
    self.feed_forward = FeedForward(
  File "/home/rong/python/ai_project/llama/llama/model.py", line 337, in __init__
    self.w1 = ColumnParallelLinear(
  File "/home/rong/.local/lib/python3.10/site-packages/fairscale/nn/model_parallel/layers.py", line 262, in __init__
    self.weight = Parameter(torch.Tensor(self.output_size_per_partition, self.in_features))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacity of 7.78 GiB of which 44.69 MiB is free. Including non-PyTorch memory, this process has 7.73 GiB memory in use. Of the allocated memory 7.62 GiB is allocated by PyTorch, and 1.73 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[2024-03-28 16:23:42,808] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 5405) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/home/rong/.local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/rong/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/rong/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/home/rong/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/home/rong/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/rong/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
./example_text_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-28_16:23:42
  host      : pc
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 5405)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

From the error output, the core problem is `torch.cuda.OutOfMemoryError: CUDA out of memory`: the GPU does not have enough free memory for the model. Specifically, the code tried to allocate 86.00 MiB, but GPU 0 has a total capacity of 7.78 GiB of which only 44.69 MiB is still free. This typically happens when loading or training a large model on hardware with limited memory.

Here are some suggestions for resolving the problem:

  1. Reduce the batch size: the batch size is the number of samples used per training iteration. Reducing it lowers GPU memory usage; you can adjust it in the training loop or model configuration.
  2. Use a smaller model: if you are trying to run a very large model (such as LLaMA) on a GPU with limited memory, consider a smaller model variant.
  3. Use gradient accumulation: gradient accumulation simulates large-batch training without increasing memory usage. In each iteration you compute gradients but do not immediately update the weights; after accumulating gradients over a set number of steps, you perform a single weight update. This lets a small batch size emulate a larger one without extra GPU memory.
  4. Use more GPUs: if you have multiple GPUs available and your model supports distributed training, you can spread the memory load across them.
  5. Free tensors that are no longer needed: make sure tensors are released once you are done with them. You can call torch.cuda.empty_cache() to manually clear unused cached memory, and wrap operations that do not need gradients in with torch.no_grad():.
  6. Optimize the model architecture: review the model for anything that can be simplified or optimized to reduce memory usage.
  7. Upgrade the hardware: if possible, move to a GPU with more memory.
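Point 3 above can be sketched as follows. This is a minimal, self-contained CPU example with a toy model and random data; the model, optimizer, and the accumulation_steps value are placeholders, not taken from the failing script:

```python
import torch
import torch.nn as nn

# Toy model and data stand in for the real training setup.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accumulation_steps = 4  # one optimizer step per 4 micro-batches
optimizer.zero_grad()

for step in range(8):
    x = torch.randn(2, 16)  # small micro-batch keeps peak memory low
    y = torch.randn(2, 1)
    loss = loss_fn(model(x), y)
    # Scale the loss so the accumulated gradient matches one large-batch step.
    (loss / accumulation_steps).backward()

    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # update once per accumulation window
        optimizer.zero_grad()  # clear gradients for the next window
```

Only the micro-batch (here 2 samples) is ever resident at once, but each optimizer step sees gradients equivalent to a batch of 8.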

Choose the approach that fits your situation. Reducing the batch size and using gradient accumulation are usually the simplest places to start.
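For point 5, and for the PYTORCH_CUDA_ALLOC_CONF hint printed in the error message itself, here is a minimal sketch (the model is a placeholder, and the CUDA calls are guarded so the snippet also runs on a CPU-only machine):

```python
import os

# Must be set before CUDA is initialized -- e.g. at the top of the script,
# or exported in the shell before launching torchrun.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch
import torch.nn as nn

model = nn.Linear(16, 1)  # placeholder for the real model
x = torch.randn(4, 16)

# Inference without gradient tracking avoids storing activations.
with torch.no_grad():
    out = model(x)

del x  # drop references so the tensor's memory can be reused
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # return cached blocks to the CUDA driver
```

Note that empty_cache() only releases memory PyTorch has cached but is no longer using; it cannot shrink memory that live tensors still occupy.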
