Robocasa Language Embedding Cuda Out of Memory Error #191

Open
JacobB33 opened this issue Aug 23, 2024 · 0 comments
JacobB33 commented Aug 23, 2024

On line 222 of the Robocasa branch of robomimic/utils/train_utils.py, the dataset kwargs are deep-copied when each dataset is created. Since the language embedding model is one of the dataset kwargs, the model itself gets copied as well. This causes a CUDA out-of-memory error when training on a large number of dataset files. For example, with 90 Libero datasets there end up being 90 copies of the language embedding model in CUDA memory.
I made a quick modification that fixed this problem:

for i in range(len(ds_weights)):
    ds_kwargs_copy = deepcopy(ds_kwargs)
    # Re-point to the original encoder so we do not keep a separate copy of the
    # language embedding model in CUDA memory for every dataset
    if "lang_encoder" in ds_kwargs:
        ds_kwargs_copy["lang_encoder"] = ds_kwargs["lang_encoder"]

    keys = ["hdf5_path", "filter_by_attribute"]
    for k in keys:
        ds_kwargs_copy[k] = ds_kwargs[k][i]

    ds_kwargs_copy["dataset_lang"] = ds_langs[i]
    ds_list.append(ds_class(**ds_kwargs_copy))

Should I make this a PR? It might be more efficient to pop lang_encoder out of the kwargs before the deepcopy so the model is never copied at all (with the fix above, each copy is still created and then immediately discarded).
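For reference, a minimal sketch of what that pop-based variant could look like, assuming the same local variables as in the snippet above (ds_kwargs, ds_weights, ds_langs, ds_class, ds_list); not tested against the Robocasa branch:

from copy import deepcopy

# Remove the model before copying so deepcopy never touches it
lang_encoder = ds_kwargs.pop("lang_encoder", None)

for i in range(len(ds_weights)):
    ds_kwargs_copy = deepcopy(ds_kwargs)  # cheap now: no model inside

    if lang_encoder is not None:
        # Share the single encoder instance across all datasets
        ds_kwargs_copy["lang_encoder"] = lang_encoder

    for k in ["hdf5_path", "filter_by_attribute"]:
        ds_kwargs_copy[k] = ds_kwargs[k][i]

    ds_kwargs_copy["dataset_lang"] = ds_langs[i]
    ds_list.append(ds_class(**ds_kwargs_copy))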
