Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReducedFunctional.optimize_tape will optionally _replace_ self.tape with the optimized tape, without modifying original tape. #171

Closed
wants to merge 6 commits into from

Conversation

JHopeCollins
Copy link
Contributor

This PR addresses #170

Currently ReducedFunctional.optimize_tape will modify self.tape in-place. This means that if anything else is using the same tape, then they now also have a modified tape.

This PR adds an optional replace argument to optimize_tape to give the option of copying the tape and replacing self.tape with the optimized tape, without modifying the original tape. The optimized tape is also returned (whether a copy was made or not).
I did it this way so that the default call has the same behaviour as before, so it shouldn't break any existing code - unless someone is relying on it returning None, but seeing as this is implicit and non-documented behaviour I don't think we care.

@dham also suggested that the default behaviour should change so that it doesn't modify self.tape, but just returns the optimized tape. Is this something we want to do? If so, should it raise a warning for now that the default behaviour will change in the future, and delay making the change for a bit?
I think if this behaviour does change, it would be good to have an optimize_tape(modify=False) argument to optionally still update self.tape, otherwise we'll probably see Jhat.tape = Jhat.optimize_tape() popping up all over.

…h optimized tape, but not modify original tape
@JHopeCollins JHopeCollins added enhancement New feature or request question Further information is requested labels Oct 11, 2024
@JHopeCollins JHopeCollins requested a review from dham October 11, 2024 10:54
@JHopeCollins JHopeCollins self-assigned this Oct 11, 2024
@jrmaddison
Copy link
Contributor

I think if this behaviour does change, it would be good to have an optimize_tape(modify=False) argument to optionally still update self.tape, otherwise we'll probably see Jhat.tape = Jhat.optimize_tape() popping up all over.

Is optimizing the tape actually expensive (it involves just a few passes over the tape)? I'd expect it's cheap enough (compared e.g. to a gradient calculation) that is isn't necessary to cache the optimized tape.

@JHopeCollins
Copy link
Contributor Author

I think if this behaviour does change, it would be good to have an optimize_tape(modify=False) argument to optionally still update self.tape, otherwise we'll probably see Jhat.tape = Jhat.optimize_tape() popping up all over.

Is optimizing the tape actually expensive (it involves just a few passes over the tape)? I'd expect it's cheap enough (compared e.g. to a gradient calculation) that is isn't necessary to cache the optimized tape.

The optional argument to update self.tape is so that you can still have (close to) the same behaviour as you have currently, i.e. optimize_tape actually does modify self.tape.

Otherwise anywhere that Jhat.optimize_tape() is currently called will inevitably get replaced with Jhat.tape = Jhat.optimize_tape(). This feels like a bit of a code smell because what if at some point the implementation changes and there's other things that ReducedFunctional needs to do if self.tape changes?

I guess the answer in that case would be to make ReducedFunctional.tape a property, but the Jhat.tape = Jhat.optimize_tape() still feels clunky compared to just having Jhat.optimize_tape(modify=True) (or whatever you want to call the argument).

@jrmaddison
Copy link
Contributor

What I mean is that, since it's (probably) cheap, the activity analysis can be performed each time a derivative is computed. i.e. I think it makes more sense as an argument, ReducedFunctional.derivative(..., optimize=True), than as a method modifying the ReducedFunctional. The optimized version of the tape could easily be cached if that's still wanted, but not as ReducedFunctional.tape.

Modifying ReducingFunctional.tape probably also breaks Hessian actions.

@dham
Copy link
Member

dham commented Oct 14, 2024

Modifying ReducingFunctional.tape probably also breaks Hessian actions.

Can you explain this? I don't understand why it would.

@jrmaddison
Copy link
Contributor

Can you explain this? I don't understand why it would.

You're correct, I've worked through the second order and I think it's OK.

@JHopeCollins
Copy link
Contributor Author

I've tested out having ReducedFunctional.__init__ take a copy of the tape and optimize it, so that ReducedFunctional only ever has a view over the most minimal part of the tape required.

The main issue I ran into is that Tape.copy creates an incomplete copy that only works in the most simple situations, mainly down to not copying checkpoints. The current copy doesn't transfer over the checkpoint manager, so there were errors anywhere a manager was used. I made two changes in Tape.copy to "fix" this. Better fixes are almost certainly needed.

  1. The checkpoint manager is attached to the new tape. This fixed the errors for in-memory checkpointing.
    The private _checkpoint_manager member is set explicitly because Tape.enable_checkpointing will throw an error if the tape already has blocks on. The checkpoint manager now has a reference to the wrong tape (but at the time of the copy that wrong tape still has all the right blocks for the new Tape in it). I imagine this is a Very Bad Thing to do, open to suggestions for the Right Thing.
    I could have a go at implementing a copy for CheckpointManager, I imagine it would need to take a tape argument and try to copy only the bits of the schedule that correspond to that tape (and error if it doesn't have something it needs).

  2. Any items in the package_data that don't have a copy method just get transferred to the package_data for the new Tape. This fixes the errors for disk checkpointing.
    This was to get around the fact that firedrake.DiskCheckpointer doesn't implement copy. Again, open to suggestions for the Right Thing.

@JHopeCollins
Copy link
Contributor Author

Closing this PR as it isn't clear how to deal with the issues in the comment above, particularly what it means to copy and optimize a tape with a checkpoint manager.
The aim of this PR (multiple ReducedFunctionals with very localised and independent tapes) can be achieved with liberal use of set_working_tape and stop_annotating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Further information is requested
Projects
None yet
3 participants