batchConverter uses up a lot of RAM #2
Comments
Hi @joelmeili, can you share your example code and the error message? In theory, batchConverter does not take up a lot of memory.
Hi, thanks for responding! I would like to predict/calculate the embeddings for a list of amino acid sequences that I extract from a .fasta file. The corresponding .fasta file can be downloaded from here. Attached is a Python file with the code I used. The following error message gets thrown:
Hi,
Hi again,

```python
import torch
from torch.utils.data import Dataset, DataLoader
from Bio import SeqIO
from ProtFlash.pretrain import load_prot_flash_base  # import path assumed from the ProtFlash README

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = load_prot_flash_base().to(device)

class ProteinSequenceDataset(Dataset):
    ...  # class body not captured in the original comment

train_set = ProteinSequenceDataset(fasta_file)
print(next(iter(train_loader)))  # train_loader definition not captured in the original comment
```

But I guess there might be a smarter way to go about it; for example, how can you apply the transformation in `__getitem__` at the batch level instead of at the individual-entry level?
Okay, I think I found this workaround. Is this how you'd write it as well?

```python
import torch
from torch.utils.data import Dataset, DataLoader
from Bio import SeqIO
from ProtFlash.pretrain import load_prot_flash_base  # import path assumed from the ProtFlash README

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = load_prot_flash_base().to(device)

class ProteinSequenceDataset(Dataset):
    ...  # class body not captured in the original comment

def collate_fn(data):
    ...  # collate_fn body not captured in the original comment

train_set = ProteinSequenceDataset(fasta_file)
print(next(iter(train_loader)))  # train_loader definition not captured in the original comment
```
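For context, a minimal self-contained sketch of this Dataset-plus-collate_fn pattern could look like the following. This is not the code from the original comment; the class and collate_fn bodies, the batch size, and the `sequences.fasta` path are assumptions, and the batchConverter import path and `ids, batch_token, lengths` return signature are taken from the ProtFlash README.

```python
import torch
from torch.utils.data import Dataset, DataLoader
from Bio import SeqIO
from ProtFlash.pretrain import load_prot_flash_base
from ProtFlash.utils import batchConverter

class ProteinSequenceDataset(Dataset):
    """Yields (id, sequence) tuples read from a FASTA file."""
    def __init__(self, fasta_file):
        self.records = [(rec.id, str(rec.seq)) for rec in SeqIO.parse(fasta_file, "fasta")]

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        return self.records[idx]

def collate_fn(batch):
    # Tokenize and pad only the sequences in this batch, not the whole file.
    return batchConverter(batch)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = load_prot_flash_base().to(device).eval()

train_set = ProteinSequenceDataset("sequences.fasta")  # hypothetical path
train_loader = DataLoader(train_set, batch_size=16, collate_fn=collate_fn)

with torch.no_grad():
    for ids, batch_token, lengths in train_loader:
        # Depending on the model internals, `lengths` may also need to be moved to the device.
        embeddings = model(batch_token.to(device), lengths)
```

Because padding happens inside `collate_fn`, each batch is only padded to the longest sequence in that batch rather than the longest sequence in the whole file.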
@joelmeili Yes, I think your code is reasonable, but I would suggest fine-tuning the language model, which can bring large benefits. Example:
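The example referenced above was not captured in this thread. As a rough, hypothetical sketch of what end-to-end fine-tuning could look like, one could wrap the pretrained encoder in a downstream module and train all parameters with a small learning rate; the class name, head, pooling, and hyperparameters below are assumptions, not the maintainer's example.

```python
import torch
import torch.nn as nn
from ProtFlash.pretrain import load_prot_flash_base

class FineTunedProtFlash(nn.Module):
    """Hypothetical downstream model that fine-tunes the ProtFlash encoder."""
    def __init__(self, num_classes):
        super().__init__()
        self.encoder = load_prot_flash_base()   # weights stay trainable (not frozen)
        self.head = nn.LazyLinear(num_classes)  # infers the embedding dim at the first forward pass

    def forward(self, batch_token, lengths):
        token_embedding = self.encoder(batch_token, lengths)
        pooled = token_embedding.mean(dim=1)    # crude mean pooling over tokens
        return self.head(pooled)

model = FineTunedProtFlash(num_classes=10)
# A small learning rate is typical when fine-tuning a pretrained encoder.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
```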
@wangleiofficial Thanks! I'll look into it when I get to that point.
Hi @wangleiofficial, I tried to use the model inside another model for predicting protein functions. It seems to work on CPU, but when I try to use CUDA it no longer works. Essentially, I get this error message:
Hello, the PyTorch Lightning framework does not place the batch_token produced by the batchConverter function on the GPU; you need to move it manually:
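The maintainer's original snippet was not captured here. A minimal sketch of the manual move inside a LightningModule, assuming the batchConverter signature from the ProtFlash README (the class name and surrounding structure are hypothetical), might look like:

```python
import pytorch_lightning as pl
from ProtFlash.pretrain import load_prot_flash_base
from ProtFlash.utils import batchConverter

class ProteinFunctionModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.flash = load_prot_flash_base()

    def forward(self, batch):
        # batch is a list of (id, sequence) tuples
        ids, batch_token, lengths = batchConverter(batch)
        # Lightning only moves tensors it receives from the DataLoader; tensors
        # created here must be moved to the module's device explicitly.
        batch_token = batch_token.to(self.device)
        return self.flash(batch_token, lengths)
```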
If you have any problems using ProtFlash, you can contact me and I will be happy to respond. |
Cool, thanks! I did what you proposed, but I also had to move the flash model itself to `self.device` in the forward step.
Is there a way to run batchConverter so that it does not use up a lot of RAM? When I try to run it, it uses up all available RAM and then crashes. The issue seems to come from pad_sequence when there are a lot of proteins.
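One way to see why this happens: pad_sequence pads every sequence to the length of the longest one, so passing the whole file to batchConverter at once allocates a very large tensor. A hedged sketch of processing the sequences in smaller chunks instead (the chunk size, the `data` list of (id, sequence) tuples, and the import paths taken from the ProtFlash README are assumptions):

```python
import torch
from ProtFlash.pretrain import load_prot_flash_base
from ProtFlash.utils import batchConverter

model = load_prot_flash_base().eval()
chunk_size = 64  # assumption: tune to the available RAM

chunk_embeddings = []
with torch.no_grad():
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]  # list of (id, sequence) tuples
        ids, batch_token, lengths = batchConverter(chunk)
        # pad_sequence now only pads this chunk to its own longest sequence,
        # instead of padding every protein in the file at once.
        chunk_embeddings.append(model(batch_token, lengths))
```

Note that the per-chunk outputs have different padded lengths, so they are kept in a list rather than concatenated.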