[draft] Introduce the ADValue class for templated traceless forward mode #64

cgraeser · 2024-11-01T15:22:33Z

This was initially developed within the Dune module dune-fufem (https://gitlab.dune-project.org/fufem/dune-fufem).

The class adolc::ADValue<T, maxOrder,dim> implementes a traceless forward mode templated with respect to the underlying type T, the maximal tracked derivative order maxOrder and the domain dimension dim. Currently only maxOrder<=2 is implemented.

Using the special value dynamicDim allows to use a runtime dimension. In the latter case the default is to use std::vector as storage. If support for boost::pool is activated, this is used instead.

Notable differences to adtl::adouble are:

Being able to use other types that T=double e.g. allows for low/high precision or interval arithmetics.
Support for second order derivatives. The interface is also flexible for later implementation of higher order.
Fixing the dimension at compile time allows the compiler to do better optimization and generate faster code. Furthermore this guarantees thread-safety (x) by construction and allows to use different dimensions at the same time within a single program.
In the dynamic dimension case, the dimension is set per variable, which allows to have different dimensions at the same time. This is implemented by using one boost:pools per dimension instead of a single one for a globally fixed dimension. This also avoids the memory leaks of adtl::adouble when changing the global dimension. Since the pools are thread_local instead of global, this again provides thread-safety (X).

The last point could probably also be implemented for adtl::adouble.

(x) "Thread-safety" hear does not mean that concurrent access to a single ADValue object is safe. Instead if means that concurrent accesss to different objects in different threads is safe and not prone to race-conditions due to internally used global variables.

cgraeser · 2024-11-01T15:38:07Z

This is marked as draft, because there are several things that need to be discussed, details that may need to be changed, and features that may need to be added. This includes the following topics:

Naming: While the class is placed in the namespace adolc, the naming of types uses CamelCase while adolc uses lower case elsewhere.
User interface: This needs to be discussed. This in particular involves the was ADValues are created and initialized and derivatives are read. Currently the whole stored jet is exposed publicly. I'd propose to only expose individual partial derivatives in order to be more flexible for changing the internal data structure later on, e.g. to use a Taylor-based implementation for higher order.
Size policy for dynamic mode: Currently the dynamic mode carefully traces and adjusts internal dimensions. This allows to do arithmetics with default constructed values (with internal size=0). While the performance loss seems to be negligible, one may alternatively declare mixing dimensions UB to simplify the code and maybe improve performance.
Nonlinear functions: Only a small collection is implemented so far. But there's a simple interface to add more. Furthermore no special care has been taken for 'reasonable' return values in kinks or singularities.
Tests: Missing so far.
Benchmarks against existing AdolC modes may be useful.
Documentation only exists as Doxygen class docs but no manual style descriptive text or examples.
C++ standard: Currently this uses C++20 (for spaceship operator) but this could be easily downgraded to C++17 (maybe even C++14) at the expense of some more boiler-plate code.
Build system integration: This is currently a single self-contained header not using any other parts of AdolC. Thus the USE_BOOST_POOL macro has to be set manually so far.

cgraeser · 2024-11-07T21:25:10Z

Notice that the test failure is unrelated, since this pull request does not change anything about the existing code and the failure also shows up in current master.

cgraeser · 2024-11-11T12:03:51Z

In general one would expect that it is beneficial to exploit symmetry for second order derivatives. To my surprise this is slower for static dimension. My first guess on why computing only one triangle of the Hessian is slower was, that this is due to the conditional index flip when requesting values. It turns out that this is not the case. Even if one only queries the computed triangle and removes the index flip this is still slower. To be precise, using

for(int i=0; i<dim; ++i)
  for(int j=0; j<=dim; ++j)
    ...

did always produce faster code than

for(int i=0; i<dim; ++i)
  for(int j=i; j<=dim; ++j)
    ...

Surprisingly (in the static dim case) the following is still faster than the latter but not fully on par with the former

for(int i=0; i<dim; ++i)
  for(int j=0; j<=dim; ++j)
    if (j>=i)
      ...

I guess that the code for computing the full matrix benefits from auto-vectorization using SIMD operations.
In my tests (local Laplace assembler as Hessian of a local Dirichlet energy) the difference was almost a factor two (with static dim), i.e. this aspect is significant.

For dynamic dimension the situation is different: Here exploiting symmetry was measurably faster.

cgraeser · 2024-11-11T12:18:51Z

Another performance observation:

One can always compute higher order derivatives using nested ADValues. Simply using a nested 1st order dim-variate ADValue to compute the Hessian is slow. But interestingly you can even compute mixed 2nd order derivatives using nested univariate ADValues if you initialize the derivatives appropriately. Then you do one f-evaluation per derivative:

// Raw input vector
using Vector = std::array<double,dim>;

// Nested 1st order univariate AD-value to compute mixed second order derivatives
using Nested = ADValue<ADValue<double,1,1>, 1, 1>;

// AD-aware vector
using ADVector = std::array<NestedAD, dim>;

Vector x = ...
ADVector x_ad;
for(int i=0; i<dim; ++i)
  for(int j=i; j<=dim; ++j)
  {
    // Initialize values of AD-aware vector
    for(int k=0; k<=dim; ++k)
       x_ad[k] = x[k];
    // Track (i,j)-th derivative
    x_ad[i].partial(0).partial() = 1;
    x_ad[j].partial().partial(0) = 1;
    auto y = f_raw(x_ad);
    hessian[i][j] = y.partial(0).partial(0);
    hessian[j][i] = hessian[i][j];
  }

Surprisingly this was in my tests on par with the dynamic size 2nd order ADValue<double,2,dynamicDim> if you the latter also exploits symmetry. For me this put's a little bit in question if one really needs the (internally significantly more complicated) dynamic dimension version.

TimSiebert1 · 2024-11-11T14:17:51Z

I guess that the code for computing the full matrix benefits from auto-vectorization using SIMD operations.
In my tests (local Laplace assembler as Hessian of a local Dirichlet energy) the difference was almost a factor two (with static dim), i.e. this aspect is significant.

Hmmm interesting. Did your tests include higher dimensions and more complex functions? I'm not sure if I understand what your actual test function looks like. I would expect this to not hold in general. Otherwise, it would be something to keep in mind for the optimization of ADOL-C. I would further expect that this behavior changes for higher derivative (>2) tensors as well.

TimSiebert1 · 2024-11-11T14:23:07Z

Since we currently dont have a higher-order tape-less approach, its very cool that your approach can compute higher-order derivatives. At some point this could be replaced by an interface without having to nest all types, as with the tape-based method of ADOL-C.

cgraeser · 2024-11-11T14:34:00Z

I only tested 2nd order (more is not supported by ADValue currently). But since there's more duplicate derivatives for higher order you're probably right.

I basically used two test cases:

Compute Hessian of the local Dirichlet energy $f(v) = 1/2\int_e |\sum_{i=1}^8 v_i \lambda_i|^2$ where $v$ are the coefficients wrt the first order finite element shape functions $(\lambda_i)$ on a hexahedron $e$.
Compute gradient and Hessian of a nested nonlinear binary function designed to incorporate many "features" that ADValue should support (various combinations of arithmetic operations with constants/non-constant values, default construction, conditionals, interaction with custom matrix types, ...). This currently takes the following form, where Dune::FieldMatrix<T,n,m> is a simple statically sized nxm matrix with scalar entries of type T:

    auto f_raw = [](auto x) {
      using std::sin;
      using std::cos;
      using std::exp;
      using std::log;
      using std::pow;
      using std::fabs;
      using F = std::decay_t<decltype(x[0])>;
      F c1;
      c1 = 13;
      F c2(42);
      F c3;
      c3 = 1;
      Dune::FieldMatrix<F, 3, 3> A;
      for(auto i : Dune::range(3))
        for(auto j : Dune::range(3))
          A[i][j] = sin(x[0]*i + cos(x[1])) + exp(cos(j*x[0]*x[1]))/(1+x[0]*x[0]+x[1]*x[1]) + 2/(1+x[0]*x[0]+x[1]*x[1]);
      F z = 1;
      z *= pow(A.determinant(), 3) + pow(2, sin(x[0]*x[1])) + c1 + c2 + pow(sin(x[0])+2., cos(x[1])+2);
      z *= c3;
      if ((x[0]>0) and (x[0] <= x[1]))
        return F(z);
      else
        return F(0);
    };

cgraeser · 2024-11-11T14:38:12Z

Since we currently dont have a higher-order tape-less approach, its very cool that your approach can compute higher-order derivatives. At some point this could be replaced by an interface without having to nest all types, as with the tape-based method of ADOL-C.

This was initially developed within the Dune module dune-fufem (https://gitlab.dune-project.org/fufem/dune-fufem). The class `adolc::ADValue<T, maxOrder,dim>` implementes a traceless forward mode templated with respect to the underlying type `T`, the maximal tracked derivative order `maxOrder` and the domain dimension `dim`. Currently only `maxOrder<=2` is implemented. Using the special value `dynamicDim` allows to use a runtime dimension. In the latter case the default is to use `std::vector` as storage. If support for `boost::pool` is activated, this is used instead. Notable differences to `adtl::adouble` are: * Being able to use other types that `T=double` e.g. allows for low/high precision or interval arithmetics. * Support for second order derivatives. The interface is also flexible for later implementation of higher order. * Fixing the dimension at compile time allows the compiler to do better optimization and generate faster code. Furthermore this guarantees thread-safety (x) by construction and allows to use different dimensions at the same time within a single program. * In the dynamic dimension case, the dimension is set per variable, which allows to have different dimensions at the same time. This is implemented by using one `boost:pool`s per dimension instead of a single one for a globally fixed dimension. This also avoids the memory leaks of `adtl::adouble` when changing the global dimension. Since the pools are `thread_local` instead of global, this again provides thread-safety (X). The last point could probably also be implemented for `adtl::adouble`. (x) "Thread-safety" hear does _not_ mean that concurrent access to a single `ADValue` object is safe. Instead if means that concurrent accesss to different objects in different threads is safe and not prone to race-conditions due to internally used global variables.

The test is still missing checks for many features.

cgraeser mentioned this pull request Nov 1, 2024

Threadsafe traceless forward mode with 2nd order derivatives #62

Open

cgraeser force-pushed the feature/templated-traceless-forward branch from 3cd17ca to 263ff71 Compare November 1, 2024 15:52

cgraeser closed this Nov 11, 2024

cgraeser reopened this Nov 11, 2024

cgraeser added 2 commits November 15, 2024 11:46

[advalue] Add static assertion for non-implemented derivative order

418782f

cgraeser force-pushed the feature/templated-traceless-forward branch from de0872c to 418782f Compare November 15, 2024 10:46

cgraeser added 3 commits November 15, 2024 12:03

[advalue][format] Reformat advalue.h using clang-format

611dc1f

[test][advalue] Add rudimentary test of ADValue

8d912dd

The test is still missing checks for many features.

[advalue] Fix a bug in initialization for dim=0

ffece1a

cgraeser marked this pull request as draft November 15, 2024 23:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[draft] Introduce the ADValue class for templated traceless forward mode #64

[draft] Introduce the ADValue class for templated traceless forward mode #64

cgraeser commented Nov 1, 2024

cgraeser commented Nov 1, 2024 •

edited

Loading

cgraeser commented Nov 7, 2024

cgraeser commented Nov 11, 2024

cgraeser commented Nov 11, 2024

TimSiebert1 commented Nov 11, 2024

TimSiebert1 commented Nov 11, 2024

cgraeser commented Nov 11, 2024 •

edited

Loading

cgraeser commented Nov 11, 2024

[draft] Introduce the ADValue class for templated traceless forward mode #64

Are you sure you want to change the base?

[draft] Introduce the ADValue class for templated traceless forward mode #64

Conversation

cgraeser commented Nov 1, 2024

cgraeser commented Nov 1, 2024 • edited Loading

cgraeser commented Nov 7, 2024

cgraeser commented Nov 11, 2024

cgraeser commented Nov 11, 2024

TimSiebert1 commented Nov 11, 2024

TimSiebert1 commented Nov 11, 2024

cgraeser commented Nov 11, 2024 • edited Loading

cgraeser commented Nov 11, 2024

cgraeser commented Nov 1, 2024 •

edited

Loading

cgraeser commented Nov 11, 2024 •

edited

Loading