
[Proposal] Allow users to shuffle ActivationCache dataset, rather than shuffling the pieces of the activation cache #277

naterush opened this issue Sep 3, 2024 · 2 comments

naterush (Contributor) commented Sep 3, 2024

Proposal

Add shuffle_dataset_upfront as a config option to CacheActivationsRunnerConfig.

Motivation

Currently, users can cache activations using the CacheActivationsRunner class. However, when caching these activations, an enormous portion of runtime is spent shuffling data pairwise within buffers. In my (highly-unscientific) experiments, shuffling (with default values) was >50% of the runtime, and ended up consistently triggering OOM errors on my GPU.

While it's currently possible to configure the CacheActivationsRunner to avoid shuffling altogether, doing so might hurt training of the resulting SAE, depending on how the initial dataset is ordered.

As such, it would be ideal to allow users to:

  1. Disable all pairwise shuffling between different saved activation tensors.
  2. Shuffle input token sequences upfront. Since token sequences are far smaller than the cached activation tensors, this moves less data around, and we only need to shuffle once.

Both of these changes could be enabled with backward-compatible extensions to CacheActivationsRunnerConfig: simply add a new param, shuffle_dataset_upfront (or similar), which would require streaming=False.
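As a sketch of the intended behavior (shuffle_dataset_upfront does not exist yet, and the function name below is hypothetical), the upfront shuffle amounts to permuting the small list of token sequences once, instead of repeatedly pairwise-shuffling large activation buffers:

```python
import random

def shuffle_sequences_upfront(token_sequences, seed=0):
    """Sketch of the proposed behavior: permute the input token sequences
    once, before any activations are computed. The sequences are just
    token-id lists, so this moves far less data than pairwise-shuffling
    the large activation tensors the runner saves later."""
    rng = random.Random(seed)
    shuffled = list(token_sequences)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled

sequences = [[1, 2], [3, 4], [5, 6], [7, 8]]
shuffled = shuffle_sequences_upfront(sequences, seed=42)
assert sorted(shuffled) == sorted(sequences)  # same sequences, new order
```

Because this happens before caching, the dataset must be fully materialized, which is why the option would require streaming=False.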

Pitch

I made this change - and it resulted in me being able to cache activations! Previously, I would just get OOM errors on my GPU during shuffling (which might be a related bug where old buffers are not cleaned up).

Alternatives

Not sure there are many: if you're aiming for a random order of activations, you need to shuffle either before or during caching. This proposal adds shuffling before.

Alternatively, the user could be responsible for shuffling the dataset and re-uploading it to Hugging Face before re-downloading it - but this is a lot of extra work that we could avoid entirely with a single extra param.

Checklist

  • I have checked that there is no similar issue in the repo (required)
naterush (Contributor, Author) commented Sep 3, 2024

Happy to take a shot at adding this, btw (would be a good early contribution for me) -- but let me know if there's an appetite for it, before I go for it.

Thanks!

jbloomAus (Owner) commented

I don't think shuffling tokens up front would help. We need activations from different contexts to get mixed. I'd be open to a PR that makes the shuffling less frequent or turns it off so people can move more quickly sometimes (though the shuffling is supposed to be important, according to Anthropic).
