How to take two gradients for two separate flax.nn optimizers in one loss function? #390
Unanswered
BoyuanJackChen asked this question in General
Replies: 1 comment
As far as I can tell, you're doing things correctly from the point of view of the flax and jax APIs (based on a first quick look). If you fix the "fine model" params, does the coarse model still degenerate? My guess, if I had to make one, would be a subtle bug in the loss function or the model.
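One way to run that experiment is to block gradient flow into the fine model with `jax.lax.stop_gradient` and see whether the coarse model still collapses. A minimal sketch of the idea follows; `apply_model`, `init_params`, and the shapes are hypothetical stand-ins, not the original NeRF code:

```python
import jax
import jax.numpy as jnp

# Toy stand-ins for the coarse and fine networks (hypothetical; the
# real models are NeRF MLPs): a single linear layer each.
def apply_model(params, x):
    return x @ params["w"] + params["b"]

def init_params(key, d_in, d_out):
    return {"w": jax.random.normal(key, (d_in, d_out)) * 0.1,
            "b": jnp.zeros((d_out,))}

def frozen_fine_loss(coarse_params, fine_params, x, target):
    # stop_gradient blocks all gradient flow into the fine model,
    # so this loss trains only the coarse parameters.
    fine_params = jax.lax.stop_gradient(fine_params)
    rgb_c = apply_model(coarse_params, x)
    rgb_f = apply_model(fine_params, x)
    return jnp.mean((rgb_c - target) ** 2) + jnp.mean((rgb_f - target) ** 2)

x = jnp.ones((8, 3))
target = jnp.zeros((8, 3))
coarse_params = init_params(jax.random.PRNGKey(0), 3, 3)
fine_params = init_params(jax.random.PRNGKey(1), 3, 3)

g_coarse, g_fine = jax.grad(frozen_fine_loss, argnums=(0, 1))(
    coarse_params, fine_params, x, target)
# g_fine is all zeros: the fine model is effectively frozen, while
# the coarse model still trains on the full summed loss.
```

If the coarse model degenerates even with the fine params frozen, that points at the coarse branch itself rather than any interaction between the two optimizers.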
Description of the model to be implemented
I am implementing NeRF with flax. The paper describes hierarchical volume rendering with a coarse network and a fine network, both updated simultaneously from a single loss value. Below is my attempt at doing it. As you can see, rgb_c is generated by the coarse model, while rgb_f is generated by the fine model. The total loss is the sum of the mean squared error of each rendering. The code runs, and the fine model trains well. Nonetheless, the coarse model keeps getting worse until it renders everything completely black. I wonder if there is a problem in my loss function. I tried giving the coarse network a lower learning rate, but it didn't help; the gradients still push it toward black.
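For reference, here is a minimal sketch of the intended pattern, with hypothetical names (plain SGD via `jax.tree_util.tree_map` stands in for the flax optimizers, and `apply_model` for the NeRF MLPs): differentiate the single summed loss with `argnums=(0, 1)` so each parameter pytree gets its own gradient, then update each model only from its own gradient.

```python
import jax
import jax.numpy as jnp

LR_COARSE = 5e-4  # hypothetical learning rates; the real run may
LR_FINE = 5e-4    # use a schedule or different values per model

def apply_model(params, x):
    # Stand-in for the coarse/fine NeRF MLPs: one linear layer.
    return x @ params["w"] + params["b"]

def total_loss(coarse_params, fine_params, x, target):
    rgb_c = apply_model(coarse_params, x)  # coarse rendering
    rgb_f = apply_model(fine_params, x)    # fine rendering
    # Single scalar loss: sum of the two MSE terms, as in the paper.
    return jnp.mean((rgb_c - target) ** 2) + jnp.mean((rgb_f - target) ** 2)

@jax.jit
def train_step(coarse_params, fine_params, x, target):
    # argnums=(0, 1): one backward pass yields a separate gradient
    # pytree for EACH model from the same scalar loss.
    loss, (g_c, g_f) = jax.value_and_grad(total_loss, argnums=(0, 1))(
        coarse_params, fine_params, x, target)
    # Each model is updated only from its own gradients (with flax
    # optimizers, the analogue is one apply_gradient per optimizer).
    coarse_params = jax.tree_util.tree_map(
        lambda p, g: p - LR_COARSE * g, coarse_params, g_c)
    fine_params = jax.tree_util.tree_map(
        lambda p, g: p - LR_FINE * g, fine_params, g_f)
    return coarse_params, fine_params, loss
```

Wired this way, neither optimizer ever sees the other model's gradients, so if the coarse branch still turns black, the issue is more likely in how rgb_c is computed (e.g. the coarse sampling or compositing) than in the gradient plumbing.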
Dataset the model could be trained on
Image data
Specific points to consider
Below is the render_rays_cnf function
Reference implementations in other frameworks