Commit

PT distributed, fix GPU0 memory
Fix #1469
albertz committed Nov 29, 2023
1 parent 5b29a8c commit 8829b8f
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions returnn/torch/engine.py
@@ -108,6 +108,7 @@ def __init__(self, config: Config):
print(f"Start running torch distributed training on local rank {local_rank}.", file=log.v2)
assert self._device == "cuda", f"torch distributed: unexpected device {self._device!r}"
self._device = f"cuda:{local_rank}"
+ torch.cuda.set_device(local_rank)

# Theano and TensorFlow print sth like: Using gpu device 2: GeForce GTX 980 (...)
# Print in a similar format so that some scripts which grep our stdout work just as before.
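
The change above can be illustrated outside of RETURNN with a minimal sketch. It assumes the process is launched by `torchrun`, which sets the `LOCAL_RANK` environment variable; the helper names `device_for_local_rank` and `bind_process_to_gpu` are hypothetical and are not part of `returnn/torch/engine.py`. The presumed reason for the fix is that code relying on the implicit current CUDA device would otherwise default to GPU 0 in every process.

```python
import os


def device_for_local_rank(local_rank: int) -> str:
    """Return the per-process device string, e.g. "cuda:1" for local rank 1."""
    if local_rank < 0:
        raise ValueError(f"invalid local rank {local_rank}")
    return f"cuda:{local_rank}"


def bind_process_to_gpu(local_rank: int) -> str:
    """Make the rank's GPU the current CUDA device, if torch and CUDA are usable."""
    device = device_for_local_rank(local_rank)
    try:
        import torch
    except ImportError:
        return device  # torch not installed: nothing to bind
    if torch.cuda.is_available() and local_rank < torch.cuda.device_count():
        # Assumption: without this call, anything that uses the implicit
        # current device (e.g. plain "cuda") would land on GPU 0.
        torch.cuda.set_device(local_rank)
    return device


if __name__ == "__main__":
    # LOCAL_RANK is set by torchrun; fall back to 0 for a single-process run.
    print(bind_process_to_gpu(int(os.environ.get("LOCAL_RANK", "0"))))
```

Setting the current device in addition to storing the explicit `cuda:{local_rank}` string covers code paths that do not receive the device string and fall back to the current device instead.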