torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacity of 7.78 GiB of which 44.69 MiB is free. Including non-PyTorch memory, this process has 7.73 GiB memory in use.
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
/home/rong/.local/lib/python3.10/site-packages/torch/__init__.py:696: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:451.)
  _C._set_default_tensor_type(t)
Traceback (most recent call last):
  File "/home/rong/python/ai_project/llama/./example_text_completion.py", line 69, in <module>
    fire.Fire(main)
  File "/home/rong/.local/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/rong/.local/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/rong/.local/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/rong/python/ai_project/llama/./example_text_completion.py", line 32, in main
    generator = Llama.build(
  File "/home/rong/python/ai_project/llama/llama/generation.py", line 119, in build
    model = Transformer(model_args)
  File "/home/rong/python/ai_project/llama/llama/model.py", line 443, in __init__
    self.layers.append(TransformerBlock(layer_id, params))
  File "/home/rong/python/ai_project/llama/llama/model.py", line 376, in __init__
    self.feed_forward = FeedForward(
  File "/home/rong/python/ai_project/llama/llama/model.py", line 337, in __init__
    self.w1 = ColumnParallelLinear(
  File "/home/rong/.local/lib/python3.10/site-packages/fairscale/nn/model_parallel/layers.py", line 262, in __init__
    self.weight = Parameter(torch.Tensor(self.output_size_per_partition, self.in_features))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacity of 7.78 GiB of which 44.69 MiB is free. Including non-PyTorch memory, this process has 7.73 GiB memory in use. Of the allocated memory 7.62 GiB is allocated by PyTorch, and 1.73 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[2024-03-28 16:23:42,808] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 5405) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/home/rong/.local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/rong/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/rong/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/home/rong/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/home/rong/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/rong/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
./example_text_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time       : 2024-03-28_16:23:42
  host       : pc
  rank       : 0 (local_rank: 0)
  exitcode   : 1 (pid: 5405)
  error_file : <N/A>
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Judging from the error output, the core problem is torch.cuda.OutOfMemoryError: CUDA out of memory, meaning the GPU does not have enough free memory for the model. Specifically, the code tried to allocate another 86.00 MiB on the GPU, but GPU 0 has a total capacity of only 7.78 GiB, of which just 44.69 MiB remained free. This typically happens when trying to run or train a large model on hardware with limited memory.
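Before applying any of the fixes below, it can help to confirm how much GPU memory is actually free at startup. A minimal check (this is a generic diagnostic, not part of the LLaMA example code) using torch.cuda.mem_get_info:

```python
import torch

# mem_get_info returns (free, total) in bytes for the given device.
has_cuda = torch.cuda.is_available()
if has_cuda:
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    print(f"GPU 0: {free_bytes / 2**30:.2f} GiB free of {total_bytes / 2**30:.2f} GiB")
else:
    print("No CUDA device visible to PyTorch")
```

Running this right before Llama.build shows whether other processes (a desktop environment, another notebook) are already holding part of the 7.78 GiB.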
Here are some suggestions for resolving the problem:
- Reduce the batch size: the batch size is the number of samples processed per iteration. Reducing it directly lowers GPU memory usage; it can usually be adjusted in the training loop or in the model/run configuration.
- Use a smaller model: if you are trying to run a very large model (such as LLaMA) on a GPU with limited memory, consider using a smaller variant of the model.
- Use gradient accumulation: gradient accumulation simulates large-batch training without increasing memory use. On each iteration you compute gradients but do not immediately update the weights; after accumulating gradients over several iterations, you perform a single weight update. This lets a small batch size emulate a larger effective batch without additional GPU memory.
- Use more GPUs: if multiple GPUs are available and your model supports distributed execution, you can spread the memory load across them.
- Free tensors you no longer need: make sure tensors are released once they are no longer referenced. You can call torch.cuda.empty_cache() to return cached but unused memory to the driver, and wrap operations that do not need gradients in with torch.no_grad(): so no autograd graph is retained.
- Optimize the model structure: review the architecture for anything that can be simplified to reduce memory use.
- Upgrade the hardware: if possible, move to a GPU with more memory.
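The gradient-accumulation idea above can be sketched in a few lines. This is a toy illustration on a stand-in nn.Linear model (the real case would use your Transformer, dataloader, and optimizer); accum_steps micro-batches are backpropagated before a single optimizer step:

```python
import torch
import torch.nn as nn

# Stand-in model and data; placeholders for the real training setup.
model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accum_steps = 4                  # effective batch = 4 x micro-batch
micro_x = torch.randn(8, 16)     # micro-batch small enough to fit in memory
micro_y = torch.randn(8, 4)

optimizer.zero_grad()
for step in range(accum_steps):
    loss = loss_fn(model(micro_x), micro_y)
    (loss / accum_steps).backward()  # scale so accumulated grads average out
grads_ready = all(p.grad is not None for p in model.parameters())
optimizer.step()                     # one weight update per accum_steps micro-batches
optimizer.zero_grad()
```

Dividing the loss by accum_steps keeps the accumulated gradient equal to the average over the large effective batch, so the learning rate does not need retuning.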
Choose the approach that fits your situation. In general, reducing the batch size and using gradient accumulation are the simplest places to start.
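For an inference-only run like example_text_completion.py, two low-effort measures are the allocator hint suggested in the error message itself (PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True) and running the forward pass under torch.no_grad(). A sketch, with a placeholder nn.Linear standing in for the real model:

```python
import os
# Allocator config must be set before the first CUDA allocation to take effect.
# This is the setting the OOM message recommends to reduce fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

model = torch.nn.Linear(16, 4)   # placeholder for the real model
x = torch.randn(2, 16)

with torch.no_grad():            # no autograd graph is kept -> less memory held
    y = model(x)

if torch.cuda.is_available():
    torch.cuda.empty_cache()     # return cached, unused blocks to the driver
```

Alternatively, the environment variable can be set in the shell before launching torchrun, which avoids import-order concerns entirely.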