Cosmology, Cache, and Configuration data model #86

Draft: wants to merge 13 commits into master

Conversation

eelregit (Contributor)

  • frozen dataclass is semi-immutable (a minimal sketch follows right after this list)
  • aux_fields can be specified to be the pytree aux_data
    • I will add a Cosmology.config using this
    • this might be more flexible than the Container, worth switching?
  • cached intermediate results survive through JAX transformation unflattening.
    Are there cases where this is not desirable?
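
A minimal sketch of the frozen-dataclass-as-pytree idea (field names here are hypothetical, not the exact implementation in this PR): the config field goes into the static aux_data, while a cache dict travels with the children and therefore survives a flatten/unflatten round trip.

from dataclasses import dataclass, field

import jax
import jax.numpy as jnp

@jax.tree_util.register_pytree_node_class
@dataclass(frozen=True)
class Cosmology:
    Omega_m: float = 0.3      # dynamic field, becomes a pytree leaf
    config: str = "default"   # hypothetical aux field, kept in the static aux_data
    cache: dict = field(default_factory=dict, compare=False)  # cached intermediate results

    def tree_flatten(self):
        children = (self.Omega_m, self.cache)
        aux_data = (self.config,)
        return children, aux_data

    @classmethod
    def tree_unflatten(cls, aux_data, children):
        Omega_m, cache = children
        (config,) = aux_data
        return cls(Omega_m, config, cache)

cosmo = Cosmology(Omega_m=0.31)
cosmo.cache["chi_table"] = jnp.linspace(0., 1., 8)  # the dict is still mutable, hence only semi-immutable
leaves, treedef = jax.tree_util.tree_flatten(cosmo)
assert "chi_table" in jax.tree_util.tree_unflatten(treedef, leaves).cache  # cache survives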

This makes Cosmology semi-immutable, and allows cached results to
survive through unflattening of JAX transformations.
eelregit changed the title from "(WIP) Pytree dataclass as data containers" to "Pytree dataclass as data containers" on Jan 26, 2022
eelregit (Contributor Author) commented Jan 26, 2022

Dataclasses were introduced in Python 3.7, though; maybe most people have moved on from 3.6?

Edit: 3.6 dropped and 3.9 & 3.10 added. The CI was also made faster.
Edit 2: JAX requires Python >= 3.7 now.

eelregit (Contributor Author) commented Jan 26, 2022

With caches, many functions in background.py were not pure. 7c848b2 attempts to fix this,
but it breaks the current API and requires cosmo, out = func(cosmo, in) style signatures
on many functions (or, for convenience, probably all of them).
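
A hedged illustration of what such a signature could look like (the body is made up for this comment and reuses the Cosmology sketch from the description above; it is not the actual change in 7c848b2):

from dataclasses import replace

import jax.numpy as jnp

def radial_comoving_distance(cosmo, a, key="radial_comoving_distance"):
    if key not in cosmo.cache:
        a_grid = jnp.linspace(0.01, 1.0, 256)
        chi_grid = jnp.cumsum(1.0 / a_grid)  # stand-in for the real integral
        # return a new Cosmology carrying the extended cache, instead of mutating in place
        cosmo = replace(cosmo, cache={**cosmo.cache, key: (a_grid, chi_grid)})
    a_grid, chi_grid = cosmo.cache[key]
    chi = jnp.interp(a, a_grid, chi_grid)
    return cosmo, chi  # the caller has to thread the updated cosmo through

cosmo, chi1 = radial_comoving_distance(cosmo, 0.5)
cosmo, chi2 = radial_comoving_distance(cosmo, 1.0)  # reuses the cached table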

@EiffL What do you think about this?

Right now the cache is a dictionary, so there can still be side effects.
Do you think it's fine to use some immutable container for cache/workspace too?

Relevant discussion: jax-ml/jax#5344 (comment).

eelregit changed the title from "Pytree dataclass as data containers" to "Pytree dataclass and Configuration data model" on Jan 26, 2022
eelregit changed the title from "Pytree dataclass and Configuration data model" to "Cosmology, Cache, and Configuration data model" on Jan 26, 2022
EiffL (Member) commented Jan 28, 2022

Thanks @eelregit, there are a lot of great things in there ^^! The cache and dataclass look nice.

So, yeah, the way I see it there is a tradeoff between making the functions pure and having a simple API...

The only drawback of the current implementation is in the following case:

cosmo = jc.Planck15()
x = jitted_function1(cosmo, ...)
y = jitted_function2(cosmo, ...)

in that case the cache computed by the first function is not communicated to the second one, so you do some of the cosmology computation twice, but it doesn't lead to any wrong results.

To avoid this and be able to reuse the cache, I would just recommend writing that same code this way:

cosmo = jc.Planck15()

@jax.jit
def my_fun(cosmo):
  x = function1(cosmo, ...)
  y = function2(cosmo, ...)
  return x,y

my_fun(cosmo)

In practice, in many cases you would just jit the likelihood or the simulation code itself, and then you have no problem.

So the question is whether being able to reuse the cache across jitted functions is worth changing the API to have functions return the cosmology object...

I'm leaning towards keeping a simple interface:

chi = bkgrd.radial_comoving_distance(cosmo, a)

instead of

cosmo, chi = bkgrd.radial_comoving_distance(cosmo, a)

just because it would appear very surprising to a typical user.

EiffL (Member) commented Jan 28, 2022

Unless you have a compelling use case that really would benefit from the more optimized implementation.

I'm also thinking it could be an option/config to have the non-pure API by default, but if an advanced user wants it, they could retrieve the cosmology and its associated cache. What do you think?

eelregit (Contributor Author) commented Jan 28, 2022

Thanks @EiffL !

The previous non-pure API does not allow a functional cache in jitted inner functions like out = func(cosmo, in), which unfortunately is the case in pmwd. pmwd needs that for both functional reasons (e.g., if I/O is needed between time steps) and performance reasons (looping over time steps is faster than scanning them). (Besides, inner jitted functions may be quite common, e.g. many jax.numpy or lax functions are.)
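
To make this concrete with the threaded-cosmo sketch from my earlier comment (again hypothetical, not the pmwd code): an inner jitted step can only reuse the cache if the cache rides inside the Cosmology pytree that is passed in and returned.

import jax

@jax.jit
def step(cosmo, a):
    # inner jitted function: cosmo (with its cache) is both an input and an output
    cosmo, chi = radial_comoving_distance(cosmo, a)
    return cosmo, chi

for a in (0.2, 0.5, 1.0):        # a Python loop, so I/O could happen between steps
    cosmo, chi = step(cosmo, a)  # the returned cosmo carries the cache to the next step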

What do you think about the second pattern in jax-ml/jax#5344 (comment)?
That separates init and eval, which is also cumbersome but a quite common interface.
I am sure there should be some way to make the APIs compatible, right?

EiffL (Member) commented Jan 28, 2022

Hummmm we could precompute everything at the instantiation of the cosmology object... We could imagine a mechanism that "registers" all functions that use cached values and computes the cache before anything else happens...

Then the user API would stay the same, the functions would be pure.
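
For concreteness, such a registration mechanism could look roughly like this (all names are hypothetical and it builds on the Cosmology sketch from the PR description; this is not an existing jax_cosmo mechanism):

import jax.numpy as jnp

_CACHE_INITIALIZERS = {}

def registers_cache(key):
    # decorator collecting the initializer that fills cache[key]
    def decorator(init_fn):
        _CACHE_INITIALIZERS[key] = init_fn
        return init_fn
    return decorator

@registers_cache("chi_table")
def _init_chi_table(cosmo):
    a_grid = jnp.linspace(0.01, 1.0, 256)
    return a_grid, jnp.cumsum(1.0 / a_grid)  # stand-in for the real integral

def make_cosmology(**kwargs):
    cosmo = Cosmology(**kwargs)
    for key, init_fn in _CACHE_INITIALIZERS.items():
        cosmo.cache[key] = init_fn(cosmo)  # precompute every registered cache up front
    return cosmo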

But.... it would mean that creating a cosmology would be slow for interactive users....

Hummmm

EiffL (Member) commented Jan 28, 2022

And we could have an option to decide which type of execution you want: one that plays nicely with jitted functions, and one that sticks to the current behavior for easy interactive use.

eelregit (Contributor Author) commented Jan 28, 2022

Something like the following?

from functools import partial

def compute_y(cosmo, x):
    # initialize the cache and return the cosmo carrying it if the input is None
    if x is None:
        if cosmo.is_cached(key):
            return cosmo
        value = ...
        return cosmo.cache_set(key, value)

    if not cosmo.is_cached(key):
        cosmo = compute_y.init(cosmo)    # or, more strictly, just raise a RuntimeError?

    value = cosmo.cache_get(key)
    y = ...
    return y

# and/or something more explicit like
compute_y.init = partial(compute_y, x=None)

with some global Cosmology cache initialization like

class Cosmology:
    ...
    def cache_init(self, *args):
        cosmo = self
        for compute_y in args:
            cosmo = compute_y.init(cosmo)
        return cosmo

Contributors should call compute_y.init first in their probes.
And interactive users can call the global init to speed things up

cosmo = Planck15()
cosmo = cosmo.cache_init(compute_y, compute_z)

and are encouraged to think functionally.

Maybe we can iterate on this to find convergence ^^

eelregit (Contributor Author) commented Jan 28, 2022

If functools.lru_cache works with JAX transformations, that would be nice and simple:

from functools import lru_cache

@lru_cache
def precompute_y(cosmo):
    table = ...
    return table

def compute_y(cosmo, x):
    table = precompute_y(cosmo)
    y = ...
    return y

With this it seems like everything can be pure and one doesn't need to touch Cosmology once instantiated?

eelregit (Contributor Author)

Unfortunately, lru_cache doesn't work with tracing:

from functools import lru_cache
from typing import NamedTuple

import jax.numpy as jnp
from jax import jit

class C(NamedTuple):
    min: float = 0.
    max: float = 1.

@lru_cache()
def f(c):
    return jnp.linspace(c.min, c.max, 6)

@jit
def g(c, w, b):
    return b + w * f(c)

g(C(), 1., 0.)

results in

TypeError: unhashable type: 'DynamicJaxprTracer'

A similar issue in numba: numba/numba#4062
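
A hedged aside rather than a fix: the snippet above does run if the cosmology argument is marked static, at the cost of recompiling for every distinct cosmology and of losing the ability to differentiate through its fields.

g_static = jit(lambda c, w, b: b + w * f(c), static_argnums=0)
g_static(C(), 1., 0.)  # works: a concrete C() is hashable, so lru_cache can key on it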
