
TAO interface #143

Merged · 72 commits · Oct 3, 2024

Conversation

@jrmaddison jrmaddison commented May 2, 2024

Optimisation using PETSc/TAO.

Uses the pyadjoint OverloadedType interface to provide a generic implementation. However, these interfaces don't provide enough information about the data decomposition under MPI parallelism. This is worked around in this PR using OverloadedType._ad_assign_numpy, but that requires a temporary, global NumPy array. Fixing this requires a change to OverloadedType and then further updates elsewhere (in particular in Firedrake).

Related, OverloadedType._ad_to_list returns a list rather than a NumPy ndarray.

Not added:

  • MaximisationProblem support.
  • Equality or inequality constraints.
  • Any optimisations to prevent extra forward or first adjoint reruns (e.g. with second order methods which might call TAOObjective.objective_gradient followed by TAOObjective.hessian).
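The NumPy-mediated packing described above can be sketched with a toy overloaded type. This is a pure-Python illustration of the pattern, not pyadjoint's implementation; `ToyFloat`, `pack_controls` and `unpack_controls` are hypothetical names, though the `_ad_to_list`/`_ad_assign_numpy` shapes follow the interfaces discussed in this PR:

```python
import numpy as np

class ToyFloat:
    """Minimal stand-in for an OverloadedType with the NumPy interface."""
    def __init__(self, value):
        self.value = float(value)

    def _ad_to_list(self):
        # Note: a Python list, not an ndarray (see the point above)
        return [self.value]

    @staticmethod
    def _ad_assign_numpy(dst, src, offset):
        # Consume entries of the flat array; return the new offset
        dst.value = float(src[offset])
        return dst, offset + 1

def pack_controls(controls):
    """Flatten all controls into one (global) NumPy array."""
    data = []
    for c in controls:
        data.extend(c._ad_to_list())
    return np.asarray(data)

def unpack_controls(controls, array):
    """Write a flat array back into the controls, tracking the offset."""
    offset = 0
    for c in controls:
        c, offset = c._ad_assign_numpy(c, array, offset)

controls = [ToyFloat(1.0), ToyFloat(2.0)]
flat = pack_controls(controls)          # the temporary, global array
unpack_controls(controls, flat * 10.0)  # controls now hold 10.0 and 20.0
```

The flat array here is the global temporary the PR description warns about: under MPI it holds every process's degrees of freedom.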

@jrmaddison jrmaddison marked this pull request as ready for review May 20, 2024 17:47
@jrmaddison jrmaddison requested a review from dham May 20, 2024 17:47
dham commented Jun 19, 2024

I think that even if you are going to go via Numpy, this should be done by giving OverloadedType an ._ad_as_petsc_vec method so that we can go back and do it properly for Firedrake Functions afterwards. As it stands, I think that the TAO interface would have to be substantially rewritten in order to support OverloadedTypes that had a native mechanism for casting to PETSc Vec, and that seems unfortunate.

@jrmaddison replied:

> I think that even if you are going to go via Numpy, this should be done by giving OverloadedType an ._ad_as_petsc_vec method so that we can go back and do it properly for Firedrake Functions afterwards. As it stands, I think that the TAO interface would have to be substantially rewritten in order to support OverloadedTypes that had a native mechanism for casting to PETSc Vec, and that seems unfortunate.

I think the 'one-or-more' pyadjoint interfaces mean that copying is going to be needed somewhere. e.g. there's nothing to stop a mixture of AdjFloats and firedrake.Function controls. Maybe those could be PETSc copy operations if the backend supplies PETSc Vecs -- I suggested NumPy as an interface layer as that's close to what OverloadedTypes already supply.
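Since controls may mix types (e.g. an AdjFloat alongside a Function), some copy layer with per-control offsets is hard to avoid. A pure-NumPy sketch of the idea; `pack`/`unpack` are illustrative names, not the PR's API:

```python
import numpy as np

def control_sizes(controls):
    """Degrees of freedom contributed by each control."""
    return [1 if np.isscalar(c) else np.asarray(c).size for c in controls]

def pack(controls):
    """Copy a mixed list of controls into one contiguous buffer."""
    sizes = control_sizes(controls)
    buf = np.empty(sum(sizes))
    offset = 0
    for c, size in zip(controls, sizes):
        buf[offset:offset + size] = c  # scalar broadcasts, array copies
        offset += size
    return buf

def unpack(controls, buf):
    """Split a buffer back into per-control values."""
    out, offset = [], 0
    for size in control_sizes(controls):
        chunk = buf[offset:offset + size]
        out.append(chunk[0] if size == 1 else chunk.copy())
        offset += size
    return out

# An AdjFloat-like scalar control and a Function-like vector control:
mixed = [3.0, np.array([1.0, 2.0, 4.0])]
buf = pack(mixed)   # one flat buffer: [3., 1., 2., 4.]
```

The same offset bookkeeping would apply if the copies were PETSc Vec operations rather than NumPy slices.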

dham commented Jun 20, 2024

OK, I can see what's going on here. I actually think it would be an easy fix to do this without the global gather; it just requires a few very simple methods to be added to OverloadedType, but I guess that can be delayed. Please add a health warning about the global gather in the class docstring, because the comment on the PR will not be visible to users.

@jrmaddison replied:

> OK, I can see what's going on here. I actually think it would be an easy fix to do this without the global gather; it just requires a few very simple methods to be added to OverloadedType, but I guess that can be delayed. Please add a health warning about the global gather in the class docstring, because the comment on the PR will not be visible to users.

It's not even a global gather: we just need to know the number of process-local (owned) degrees of freedom, and OverloadedType doesn't provide this except indirectly via _ad_convert_numpy.

dham commented Jun 20, 2024

> OK, I can see what's going on here. I actually think it would be an easy fix to do this without the global gather; it just requires a few very simple methods to be added to OverloadedType, but I guess that can be delayed. Please add a health warning about the global gather in the class docstring, because the comment on the PR will not be visible to users.

> It's not even a global gather: we just need to know the number of process-local (owned) degrees of freedom, and OverloadedType doesn't provide this except indirectly via _ad_convert_numpy.

OK, I'm confused again. I think the current implementation does a global gather onto a single numpy array. Is that not correct?

jrmaddison commented Jun 20, 2024

> OK, I'm confused again. I think the current implementation does a global gather onto a single numpy array. Is that not correct?

You're correct. I'd missed (or forgotten) how OverloadedType._ad_assign_numpy and OverloadedType._ad_to_list behave -- e.g. the use of a gather in OverloadedType._ad_to_list.

This obviously won't scale, but I also don't see an alternative via the current OverloadedType interface.
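To illustrate the scalability point: constructing a distributed vector needs only each process's owned dof count, from which the ownership ranges follow by an exclusive prefix sum, with no gather of the data itself. A serial sketch (`ownership_ranges` is a hypothetical helper, not pyadjoint or PETSc API):

```python
def ownership_ranges(local_sizes):
    """Exclusive prefix sum of per-rank owned dof counts.

    Returns one (start, end) pair per rank; rank r owns [start, end).
    In MPI terms this is what an allgather of a single integer (the
    local size) provides, versus gathering all of the data.
    """
    ranges, start = [], 0
    for n in local_sizes:
        ranges.append((start, start + n))
        start += n
    return ranges

# Four ranks owning 3, 4, 2 and 5 dofs respectively:
print(ownership_ranges([3, 4, 2, 5]))
# [(0, 3), (3, 7), (7, 9), (9, 14)]
```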

jrmaddison commented Jul 19, 2024

Firedrake test requires firedrakeproject/firedrake#3657

The method should implement a routine for assigning other to `self`.

Note:
This method should not be employed for immutable types.
@dham (Member):

I think this line here highlights an issue with this approach. OverloadedType actually conflates two sets of functionality:

  1. Things needed in order to tape a variable.
  2. Things needed in order to work with variables when computing taped operations (replay, tlm, adjoint etc.) and optimisation.

This functionality is part of category 2 (it's not needed for taping). At that stage of operations, all variables are treated as immutable, so assignment looks like a strange operation at best. The fact that it doesn't work for some types is a further red flag.

Assignment itself is also not really what is meant here. The use case in question seems to be assigning a scalar to a vector, which is not well-defined (think about non-point-evaluation FunctionSpaces, for example). I think what is really meant here is interpolate.

On the mutability point, I think this should be more like _ad_init_object and return a new value. I.e. the signature would be:

    @classmethod
    def _ad_interpolate(cls, obj, value):

and the required semantics would be to return a new object which is identical to obj except that its value approximates value.
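A minimal sketch of those proposed semantics for a toy NumPy-backed type. `ToyField` is hypothetical, and `_ad_interpolate` was only proposed in this thread, so this is an illustration of the suggested contract, not existing pyadjoint API:

```python
import numpy as np

class ToyField:
    """Toy NumPy-backed overloaded type."""
    def __init__(self, data, name=None):
        self.data = np.asarray(data, dtype=float)
        self.name = name

    @classmethod
    def _ad_interpolate(cls, obj, value):
        # Return a NEW object identical to obj except that its value
        # approximates `value` -- no in-place mutation of obj.
        return cls(np.full_like(obj.data, value), name=obj.name)

u = ToyField([1.0, 2.0, 3.0], name="u")
lb = ToyField._ad_interpolate(u, 0.5)  # new object, value 0.5 everywhere
```

The key property is that `u` is untouched, which sidesteps the immutability objection above.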

@jrmaddison (Author):

_ad_iadd and _ad_imul are existing in-place operations. However, I agree that an in-place operation is not needed here. I think the easiest fix is to change VecInterface.to_petsc to handle the scalar case using VecSetValues.

I also don't think scalar bounds are clearly defined. These could mean

@dham (Member):

This would bring us back to the option of requiring the user to specify bounds of the control type. I think this is probably the right thing - Pyadjoint can't interpret the meaning of variables beyond basic Hilbert space operations (i.e. vector space operations + inner product).

I guess the corner case is the no bounds case. We could either make that a user problem or we'd have to add _ad methods for producing variables that are element-wise minimal or maximal. That would be convenient for users but I'm not sure it smells good.

@jrmaddison (Author):

> requiring the user to specify bounds of the control type

I'd definitely prefer that. Existing pyadjoint examples use scalars, though.

> I guess the corner case is the no bounds case.

That at least can be handled with VecSetValues and np.finfo(PETSc.ScalarType).min/np.finfo(PETSc.ScalarType).max.
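A sketch of that fallback, mapping None to the extreme representable values. This assumes a real, double-precision PETSc.ScalarType and uses pure NumPy for illustration; `bound_array` is a hypothetical helper, not the PR's API:

```python
import numpy as np

# Stand-in for PETSc.ScalarType on a real, double-precision build.
ScalarType = np.float64

def bound_array(bound, n):
    """Expand one side of a (lb, ub) pair to a length-n array.

    None means unbounded, encoded here as the most negative
    representable value for a lower bound (an upper bound would use
    np.finfo(ScalarType).max instead); a scalar broadcasts to all
    dofs; an array is used as given.
    """
    if bound is None:
        return np.full(n, np.finfo(ScalarType).min)
    return np.broadcast_to(np.asarray(bound, dtype=ScalarType), (n,)).copy()

lb = bound_array(None, 4)  # effectively unbounded below
ub = bound_array(1.0, 4)   # scalar bound applied to every dof
```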

@jrmaddison (Author) commented Jul 30, 2024:

I've implemented the 'scalar bound = bound on all dofs' case using VecSet. This might still lead to some surprises depending on the basis, but I think is consistent with pyadjoint behaviour elsewhere.
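The 'scalar bound = bound on all dofs' behaviour can be illustrated with a pure-NumPy stand-in for the VecSet path; `to_flat` is a hypothetical helper, not the PR's VecInterface:

```python
import numpy as np

def to_flat(dst, value):
    """Copy `value` into the flat destination array.

    A scalar fills every entry (the VecSet case); an array must match
    the destination size (the ordinary copy case).
    """
    if np.isscalar(value):
        dst[:] = value
    else:
        value = np.asarray(value)
        if value.shape != dst.shape:
            raise ValueError("size mismatch")
        dst[:] = value
    return dst

x = np.zeros(4)
to_flat(x, 2.5)                        # scalar bound on all dofs
to_flat(x, np.array([1., 2., 3., 4.]))  # per-dof values
```

As noted above, filling dofs with a constant is only a pointwise bound in a nodal (point-evaluation) basis, which is where the possible surprises come from.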

@jrmaddison (Author):

Current pyadjoint behaviour is described here.

#: bounds: lower and upper bounds for the control (optional). None means
#: unbounded. if not None, then it must be a list of the same length as
#: the number controls for the reduced_functional. Each entry in the list
#: must be a tuple (lb, ub), where ub and lb are floats, or objects
#: of the same kind as the control.

@jrmaddison (Author):

The actual check allows None and int as well.

    for b in bound:
        b = create_overloaded_object(b, suppress_warning=True)
        klass = control.tape_value().__class__
        if not isinstance(b, (int, float, type(None), klass)):
            raise TypeError("This pair (lb, ub) should be None, a float, or a %s." % klass)

@jrmaddison (Author):

As discussed in the meeting, this should be fixed separately from this PR.

    def finalize_callback(*args):
        for arg in args:
            if arg is not None:
                arg.destroy()
@dham (Member):

@JDBetteridge says this needs to be .delayed_destroy.

@jrmaddison (Author):

Not available, but the possible deadlock is now documented.

@dham (Member):

@JDBetteridge is unconvinced. Let's discuss this next week.

@jrmaddison (Author) commented Sep 11, 2024:

This is intended to work around the following, which leaks memory on my machine (Ubuntu 24.04, run on two processes)

from firedrake import *
from firedrake.petsc import PETSc, garbage_cleanup

import numpy as np

from itertools import count
import gc
import weakref


class Mat:
    def __init__(self, space):
        self._x = Function(space)
        self._test = TestFunction(space)
        self._memory = np.full(1000000, 1, dtype=np.uint8)

    def mult(self, x, y):
        with self._x.dat.vec_wo as x_v:
            x.copy(result=x_v)
        y_c = assemble(inner(self._x, self._test) * dx)
        with y_c.dat.vec_ro as y_v:
            y_v.copy(result=y)


def attach_destroy_finalizer(obj, *args):
    def finalize_callback(*args):
        for arg in args:
            if arg is not None:
                arg.destroy()

    finalize = weakref.finalize(obj, finalize_callback,
                                *args)
    finalize.atexit = False


mesh = UnitSquareMesh(10, 10)
space = FunctionSpace(mesh, "Lagrange", 1)
u = Function(space, name="u")
with u.dat.vec_ro as u_v:
    n, N = u_v.getSizes()


class Test:
    pass


for _ in count():
    t = Test()
    A = Mat(space)
    mat = PETSc.Mat().createPython(((n, N), (n, N)), A, comm=u.comm)
    # attach_destroy_finalizer(t, mat)  # Uncomment to fix leak

    del t
    gc.collect()
    garbage_cleanup(u.comm)

Arguably it would be better to track down the source of the leak -- but the case where I've hit this was much harder to debug than the above, hence the defensive use of destroy. The main downside is if properties are accessed and retained, e.g.

tao_solver = TAOSolver(...)
tao = tao_solver.tao
del tao_solver
# TaoDestroy has been called for tao

@jrmaddison (Author) commented Sep 19, 2024:

The previous example leaks because garbage_cleanup(u.comm) doesn't call PETSc.garbage_cleanup(u.comm) (is this a separate bug?).

n = sum(vec.getLocalSize() for vec in vecs)
N = sum(vec.getSize() for vec in vecs)
_, isets = PETSc.Vec().concatenate(vecs)
for vec in vecs:
@dham (Member):

If the destructor is really not necessary, let's not do it.

If it is necessary, then _ad_to_petsc() should probably be a context manager so as to remove the footgun.
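The context-manager suggestion would scope the Vec's lifetime to the with-block. A sketch with a dummy resource standing in for a PETSc Vec; `ad_to_petsc` here reflects the behaviour proposed in this comment, not the merged implementation:

```python
from contextlib import contextmanager

class DummyVec:
    """Stand-in for a PETSc Vec whose destroy() must pair with creation."""
    def __init__(self):
        self.destroyed = False

    def destroy(self):
        self.destroyed = True

@contextmanager
def ad_to_petsc(obj):
    """Yield a Vec view of `obj`, guaranteeing destroy() on exit.

    Scoping the lifetime removes the footgun of a caller retaining a
    Vec that a finalizer later destroys behind its back.
    """
    vec = DummyVec()  # real code would create/copy into a PETSc Vec here
    try:
        yield vec
    finally:
        vec.destroy()

with ad_to_petsc(object()) as v:
    assert not v.destroyed  # safe to use inside the block
# after the block the Vec has been destroyed; it cannot be retained
```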

@jrmaddison (Author):

Removed.

x_sub.restoreSubVector(iset, x_sub)


class PETScOptions:
@dham (Member):

Pirate the PETSc Options handler from Firedrake.

@jrmaddison (Author):

Done.

to_petsc(ub_vec, ubs)
tao.setVariableBounds(lb_vec, ub_vec)
lb_vec.destroy()
ub_vec.destroy()
@dham (Member):

please don't destroy unless needed.

@jrmaddison (Author):

Done.

self, tao, H_matrix, M_inv_matrix, B_0_matrix_pc, B_0_matrix, x)

@property
def taoobjective(self):
@dham (Member):

PEP8 name should have underscore.

@jrmaddison (Author):

Fixed, although I followed ROLSolver.rolobjective.

self.tao.solve()
self._vec_interface.from_petsc(self.x, M)
if self.tao.getConvergedReason() <= 0:
raise RuntimeError("Convergence failure")
@dham (Member):

Is this the conventional exception for solver fails?

@jrmaddison (Author) commented Aug 8, 2024:

Updated to use a TAOConvergenceError with a message similar to Firedrake linear solve failures.

pyadjoint doesn't seem to raise exceptions for optimizer failures in the SciPy or ROL interfaces. I'm pretty sure that's unsafe for the SciPy interface, #166.

@dham dham merged commit b9574fb into dolfin-adjoint:master Oct 3, 2024
2 checks passed
@jrmaddison jrmaddison deleted the jrmaddison/tao branch October 3, 2024 13:07