[pytorch] How to fix the "device-side assert triggered" error

1. When the error occurred

I hit this error while building a new model on top of ResNet-50, loaded from torchvision.models with its fully connected (fc) layer replaced.

The error message was:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

or

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

The first error appeared on Colab; the second appeared on my local machine.
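The first message recommends CUDA_LAUNCH_BLOCKING=1. This makes CUDA kernel launches synchronous, so the stack trace points at the operation that actually failed instead of a later API call. Below is a minimal sketch, assuming you set the variable from inside your script. It must be set before CUDA is initialized, and setting it before importing torch is the safest way to guarantee that. From a shell you could instead run something like `CUDA_LAUNCH_BLOCKING=1 python train.py` (where `train.py` is a hypothetical script name).

```python
import os

# Make CUDA kernel launches synchronous so errors surface at the failing op.
# This must happen before CUDA is initialized, so set it before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402  (imported after setting the env var on purpose)

print(os.environ["CUDA_LAUNCH_BLOCKING"], torch.__version__)
```

Synchronous launches slow training down, so this setting is meant for debugging only.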

2. Solution

While searching around, I read that changing the batch size can fix this, so I tried several batch sizes, but that didn't help.

https://brstar96.github.io/devlog/shoveling/2020-01-03-device_error_summary/

("A roundup of GPU errors with no obvious cause (device-side assert, CUDA error, CUDNN_STATUS_NOT_INITIALIZED, etc.)", brstar96.github.io)

That post explains a variety of possible causes. One of them is that the error can occur when class index numbers are wrong.
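One way to check whether a bad class index is the cause is to run the loss on CPU. On the CPU, an out-of-range target raises an ordinary Python exception at the exact line that failed, rather than an asynchronous device-side assert. The sketch below uses made-up logits and targets with 10 classes and deliberately includes the out-of-range label 10.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 10
logits = torch.randn(4, NUM_CLASSES)    # CPU tensors: errors are raised immediately
targets = torch.tensor([1, 5, 9, 10])   # 10 is out of range; valid labels are 0..9

criterion = nn.CrossEntropyLoss()
error = None
try:
    criterion(logits, targets)
except (IndexError, RuntimeError) as e:  # the exception type may vary by PyTorch version
    error = e

print(type(error).__name__, error)
```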

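For example, if you classify 10 classes but your labels range from 1 to 10, you can get this error. Python and PyTorch index from 0, so the labels for 10 classes must run from 0 to 9.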
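The range check described above can be sketched in plain Python. This is a minimal sketch, assuming your labels are plain integers. `validate_labels` and `to_zero_based` are hypothetical helper names, not part of any library. The shift by one only applies if your labels are 1-based (1..K).

```python
def validate_labels(labels, num_classes):
    """Raise ValueError if any label is outside the 0-based range [0, num_classes)."""
    bad = sorted({y for y in labels if not 0 <= y < num_classes})
    if bad:
        raise ValueError(f"labels out of range [0, {num_classes}): {bad}")


def to_zero_based(labels):
    """Shift 1-based labels (1..K) to 0-based labels (0..K-1)."""
    return [y - 1 for y in labels]


NUM_CLASSES = 10
raw_labels = [1, 2, 10, 5]  # hypothetical 1-based labels

try:
    validate_labels(raw_labels, NUM_CLASSES)
except ValueError as e:
    print(e)  # labels out of range [0, 10): [10]

labels = to_zero_based(raw_labels)
validate_labels(labels, NUM_CLASSES)  # passes after the shift
print(labels)  # [0, 1, 9, 4]
```

Running a check like this once over the whole dataset before training is much cheaper than decoding a device-side assert later.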

When I checked again, I had set the class index in the fully connected layer incorrectly. After correcting the class index, the model trained without errors.


License: Attribution, NonCommercial, NoDerivatives


Korean Bioinformatics
