Release 2.4.0 #702

Merged 2 commits on Aug 14, 2024

Commits on Aug 14, 2024

  1. Improve attention masking (#699)

    * Allow attention_mask to override the default mask in HookedTransformer.forward().
    
    * Add attention_mask argument to loss_fn() and lm_cross_entropy_loss() and adjust the cross entropy calculation to ignore masked (padding) tokens.
    
    ---------
    
    Co-authored-by: Bryce Meyer <[email protected]>
    UFO-101 and bryce13950 authored Aug 14, 2024
    Commit d6ab70a (a sketch of the new masking behavior follows the commit list below)
  2. add a demo for Patchscopes and Generation with Patching (#692)

    * add a demo for Patchscopes and Generation with Patching
    
    * added Patchscopes generation demo to tests
    
    * ignored a couple of cells
    
    ---------
    
    Co-authored-by: Bryce Meyer <[email protected]>
    HenryCai11 and bryce13950 authored Aug 14, 2024
    Commit 135adce (a Patchscopes sketch follows below)
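
For context on the first commit, here is a minimal sketch of the new masking behavior, assuming the post-2.4.0 TransformerLens API. The `masked_lm_loss` helper is hypothetical, written only to illustrate the padding-aware cross entropy that lm_cross_entropy_loss() now computes; it is not the library code.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

# A batch with uneven lengths: to_tokens pads the shorter prompt.
tokens = model.to_tokens(["Hello world", "A much longer example sentence"])

# Schematic mask: 1 for real tokens, 0 for padding. (For GPT-2 the pad
# token can coincide with BOS/EOS, so a real script should build the
# mask from the true sequence lengths instead.)
attention_mask = (tokens != model.tokenizer.pad_token_id).long()

# After #699, an explicit attention_mask overrides the mask that
# HookedTransformer.forward() would otherwise derive itself, and the
# returned loss ignores the padded positions.
loss = model(tokens, return_type="loss", attention_mask=attention_mask)

# Hypothetical helper showing the adjusted cross entropy: padded
# positions are zeroed out of the sum and excluded from the mean.
def masked_lm_loss(logits, tokens, attention_mask):
    log_probs = logits[:, :-1].log_softmax(dim=-1)
    nll = -log_probs.gather(-1, tokens[:, 1:, None])[..., 0]
    mask = attention_mask[:, 1:].float()
    return (nll * mask).sum() / mask.sum()
```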
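And a rough sketch of the technique the second commit's demo covers (not the notebook itself): cache an activation from a source prompt, then patch it into a target prompt during generation via hooks. The model, layer, position, and prompts are arbitrary choices for illustration.

```python
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")
LAYER, POS = 6, -1  # arbitrary layer / token position for illustration

# 1. Run a source prompt and cache the residual stream after LAYER.
source_tokens = model.to_tokens("The Eiffel Tower is in Paris")
_, cache = model.run_with_cache(source_tokens)
source_resid = cache[utils.get_act_name("resid_post", LAYER)][:, POS]

# 2. A hook that overwrites the target run's residual stream at POS.
def patch_resid(resid, hook):
    # Patch only the full prompt pass; later decode steps under KV
    # caching see a single position and are left untouched here.
    if resid.shape[1] > 1:
        resid[:, POS] = source_resid
    return resid

# 3. Generate from a target prompt with the patch active.
target_tokens = model.to_tokens("Tell me more about this place:")
hook_name = utils.get_act_name("resid_post", LAYER)
with model.hooks(fwd_hooks=[(hook_name, patch_resid)]):
    patched = model.generate(target_tokens, max_new_tokens=20)
print(model.to_string(patched))
```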