Fix excessive RAM usage of preference comparisons #842
Description
Previously we stored a view into the trajectory in the preference comparison dataset. This view holds a reference to the original trajectory and therefore keeps it from being garbage collected for as long as the view exists (i.e., for however long the comparison is stored in the dataset).
This is problematic when trajectories are both large and long, e.g., Atari (image observations) combined with SEALS (long episodes). In that setting, RAM can fill up quite quickly.
We can fix this by copying the fragments we want to store. That avoids keeping a reference to the original trajectory alive; only the fragment itself needs to be kept. The tradeoff is that copying adds some overhead and overlapping fragments are no longer deduplicated. See the sketch below.
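Below is a minimal sketch of the underlying mechanism, not the actual imitation code (the array shape and variable names are hypothetical): a NumPy slice is a view that pins the whole underlying buffer through its `.base` attribute, whereas a copy owns its own memory.

```python
import numpy as np

# Hypothetical long Atari episode: 10,000 frames of 84x84x4 uint8
# observations, roughly 280 MB in total.
trajectory_obs = np.zeros((10_000, 84, 84, 4), dtype=np.uint8)

# A slice is a view: it references the full underlying buffer, so
# storing it in the dataset keeps the entire episode alive.
fragment_view = trajectory_obs[100:150]
assert fragment_view.base is trajectory_obs

# A copy owns its own ~1.4 MB buffer and holds no reference back.
fragment_copy = trajectory_obs[100:150].copy()
assert fragment_copy.base is None

# Once only copies are stored, dropping the trajectory frees its memory;
# a stored view would have kept the whole episode pinned in RAM.
del trajectory_obs, fragment_view
```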
Testing
I trained on Atari Pong and verified that RAM usage remains roughly constant after this change. Prior to this change, it kept increasing until memory ran out.