[pytorch] AttributeError: DistributedDataParallel has no attribute

2021. 4. 21. 22:27

I was loading a pretrained model and its parameters, freezing the _weight_ of every layer except a few, and fine-tuning, when I made yet another
silly mistake, so I am writing it down again.

Environment:

Training with DistributedDataParallel from apex.parallel.

How the error arises:

While training with DistributedDataParallel, _model.state_dict()_ is saved.
To train only the _weight_ of the backbone and a specific _head_ of that model, _requires_grad_ is set to _False_ for every _tensor_ that should not be trained.
The code looks roughly like this.

import torch
from apex.parallel import DistributedDataParallel

model = modelFactory(model)
# wrap the model with DistributedDataParallel
if torch.cuda.is_available():
    model.cuda()
    if torch.distributed.is_initialized():
        model = DistributedDataParallel(model)

# load the trained parameters onto this rank's GPU
checkpoint = torch.load(path, map_location=lambda storage, loc: storage.cuda(torch.distributed.get_rank()))
model.load_state_dict(checkpoint['state_dict'])
model.freezePartOfModel()  # <- the error is raised here

Following the workflow above can raise the error below.

AttributeError: 'DistributedDataParallel' object has no attribute 'freezePartOfModel'

Cause:

When a model is wrapped in DistributedDataParallel as below, the original _model_ is stored inside the wrapper as _module_.

modelDDP = DistributedDataParallel(model)

In other words, an _attribute_ that used to be reached as _model.a_ must now be reached as _modelDDP.module.a_.
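To make the wrapping concrete, here is a tiny torch-free sketch. ToyModel and ToyWrapper are hypothetical stand-ins I made up for illustration, not the real PyTorch or apex classes; the only thing they imitate is that the wrapper keeps the model under _module_ and does not forward unknown method calls to it.

```python
class ToyModel:
    """Hypothetical stand-in for the user's model class."""

    def __init__(self):
        self.frozen = False

    def freezePartOfModel(self):
        # The real method would set requires_grad = False on some tensors.
        self.frozen = True


class ToyWrapper:
    """Hypothetical stand-in for a parallel wrapper.

    It stores the wrapped model under .module and does not forward
    unknown attributes to it.
    """

    def __init__(self, module):
        self.module = module


model = ToyModel()
modelDDP = ToyWrapper(model)

error_message = None
try:
    modelDDP.freezePartOfModel()  # the wrapper itself has no such method
except AttributeError as err:
    error_message = str(err)
    print(error_message)

modelDDP.module.freezePartOfModel()  # reach the original model through .module
print(model.frozen)
```

The first call fails with the same kind of AttributeError as in the post, and the second call succeeds because it goes through _module_.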

Solution:

Changing the code as follows fixes it: call the method through _module_ when the model is wrapped.

if isinstance(model, DistributedDataParallel):
    model.module.freezePartOfModel()
else:  # for single GPU usage
    model.freezePartOfModel()
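If the same branching shows up in many places, one option is a small helper that returns the underlying model either way. This is only a sketch: unwrap_model is a name I made up, and ToyModel plus the SimpleNamespace object are hypothetical stand-ins for a real model and a wrapped model. It assumes the plain model has no attribute of its own called _module_; if it does, the helper would return the wrong object.

```python
from types import SimpleNamespace


def unwrap_model(model):
    """Return model.module if the model is wrapped, otherwise the model itself.

    Assumes an unwrapped model has no attribute named "module" of its own.
    """
    return getattr(model, "module", model)


class ToyModel:
    """Hypothetical stand-in for the user's model class."""

    def __init__(self):
        self.frozen = False

    def freezePartOfModel(self):
        self.frozen = True


plain = ToyModel()
wrapped = SimpleNamespace(module=plain)  # stand-in for a wrapped model

# The same call works whether or not the model is wrapped.
unwrap_model(wrapped).freezePartOfModel()
print(plain.frozen)
```

With a helper like this the calling code no longer needs to know whether it is running on a single GPU or in distributed mode.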
