CUDA Out of Memory Error when Running get_embedding.py on Small Dataset #33
Comments
Hi,
Unfortunately, I don't currently have access to GPUs of that size. I tried running the code with float16 instead of float32 (i.e., using the torch package …). In light of these memory constraints, do you have any recommendations? And as a follow-up, is there any chance you will release the medium and smaller sized models?
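For what it's worth, a minimal sketch of the half-precision approach I tried (the layer and sizes below are illustrative stand-ins, not scFoundation's actual encoder, which get_embedding.py loads for you):

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the pretrained encoder.
model = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True).cuda().half()

# Inputs must match the weight dtype, so cast them to float16 as well.
x = torch.randn(2, 512, 768, device="cuda", dtype=torch.float16)
with torch.no_grad():
    out = model(x)  # forward pass in float16, roughly halving weight/activation memory
```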
Also, it requires ~80 GB of GPU memory on a normal bulk RNA dataset, which is quite crazy. I modified the encoder layers a bit to offload layers to different GPUs, and it worked well. However, it requires a decent number of GPUs whose memory adds up to a large total; in my case, 4x 3090s. To do so, modify the `pytorchTransformerModule` class as follows:

```python
import torch
import torch.nn as nn

visible_gpus = torch.cuda.device_count()


class pytorchTransformerModule(nn.Module):
    def __init__(
        self,
        max_seq_len,
        dim,
        depth,
        heads,
        ff_mult=4,
        norm_first=False,
    ):
        super(pytorchTransformerModule, self).__init__()
        self.max_seq_len = max_seq_len
        self.depth = depth

        layers = []
        for i in range(depth):
            # Place layers across GPUs in a round-robin manner; it may be better
            # to keep contiguous blocks on one device so fewer swaps happen.
            device_index = i % visible_gpus
            layers.append(
                nn.TransformerEncoderLayer(
                    d_model=dim,
                    nhead=heads,
                    dim_feedforward=dim * ff_mult,
                    batch_first=True,
                    norm_first=norm_first,
                    # activation="gelu",
                ).to(f"cuda:{device_index}")
            )
        self.transformer_encoder = nn.ModuleList(layers)
        self.norm = nn.LayerNorm(dim).to("cuda:0")

    def forward(self, x, padding_mask):
        b, n, _ = x.shape
        assert (
            n <= self.max_seq_len
        ), f"sequence length {n} must not exceed the max sequence length {self.max_seq_len}"

        # x holds the encodings [B, N, D]; batch_first is True.
        for index, mod in enumerate(self.transformer_encoder):
            # Move the activations and mask to the device holding this layer.
            device_index = index % visible_gpus
            x = x.to(f"cuda:{device_index}")
            padding_mask = padding_mask.to(f"cuda:{device_index}")
            x = mod(x, src_key_padding_mask=padding_mask)

        # The final LayerNorm lives on cuda:0.
        x = self.norm(x.to("cuda:0"))
        return x
```

And disable the …
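For anyone trying this, here is a quick smoke test of the modified module (the dimensions are made up for illustration, not scFoundation's actual configuration; it assumes at least one CUDA device):

```python
module = pytorchTransformerModule(max_seq_len=2048, dim=512, depth=4, heads=8)

x = torch.randn(2, 1024, 512, device="cuda:0")  # [B, N, D]
padding_mask = torch.zeros(2, 1024, dtype=torch.bool, device="cuda:0")  # True = padded

with torch.no_grad():
    out = module(x, padding_mask)
print(out.shape, out.device)  # torch.Size([2, 1024, 512]) cuda:0
```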
I encountered a CUDA Out of Memory error when running the script get_embedding.py with a small dataset containing 2 rows. Below are the details of the error and the command used to run the script.
Also, what is your suggested environment for running scFoundation? How much GPU capacity is recommended?
Command Used:

```
sbatch test.3.sh /home/sbnb/ddalton/projects/scFoundation/model/get_embedding.py --task_name SCAD_bulk_Etoposide --input_type bulk --output_type cell --pool_type all --tgthighres f1 --data_path X_df_sample.csv --save_path ./ --pre_normalized F --version ce --demo
```

X_df_sample.csv contains the same data as X_df.csv but with only 2 rows.

Error Log:
Memory Tracking
I also tracked memory usage with this function:

At various steps in the get_embedding.py script, just before geneemb = pretrainmodel.encoder(x, x_padding):

With the following output:
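For reference, a helper along these lines can report per-device usage; this is a sketch of a typical implementation, not necessarily the exact function used:

```python
import torch

def log_gpu_memory(tag: str) -> None:
    """Print allocated/reserved CUDA memory for each visible device."""
    for i in range(torch.cuda.device_count()):
        alloc = torch.cuda.memory_allocated(i) / 1024**3
        reserved = torch.cuda.memory_reserved(i) / 1024**3
        print(f"[{tag}] cuda:{i} allocated={alloc:.2f} GiB reserved={reserved:.2f} GiB")

# e.g., log_gpu_memory("before encoder") just before the encoder call
```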
Environment Details
PyTorch version: 1.13.1+cu117
CUDA version: 11.7
GPU: 24 GB total capacity
Thanks in advance!