Regarding train_step in NNX #4069
Is this the correct way to write a `train_step` function in NNX?

```python
@nnx.jit
def train_step(model, x, y, optimizer):
    def loss_fn(model, x, y):
        logits = model(x)
        loss = optax.softmax_cross_entropy_with_integer_labels(
            logits=logits, labels=y).mean()
        return logits, loss

    grad_fn = nnx.grad(loss_fn, has_aux=True)
    (logits, loss), grads = grad_fn(model, x, y)
    optimizer.update(grads)
    return loss
```
Unfortunately, the Flax documentation I'm following for NNX is not very detailed yet.

Edit: `nnx.grad` expects a scalar as output, but that is not possible when we are training a model in batches. It is very confusing to me. Also, what is the difference between …

Edit 2: Okay, I got it: `loss` is a scalar, so the grad function is expecting that...
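A minimal plain-JAX sketch of that point (the function names here are just illustrative, not from the thread): `grad` is only defined for scalar-output functions, so the per-example losses have to be reduced, e.g. with `.mean()`, before differentiating.

```python
import jax
import jax.numpy as jnp

def per_example_loss(w, x):
    # One loss value per example in the batch, shape (batch,).
    return (w * x) ** 2

def batched_loss(w, x):
    # Reducing with .mean() turns the batch of losses into a scalar,
    # which is what jax.grad requires.
    return per_example_loss(w, x).mean()

x = jnp.arange(4.0)
print(jax.grad(batched_loss)(2.0, x))   # works: scalar output
# jax.grad(per_example_loss)(2.0, x)    # raises: output is not a scalar
```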
Replies: 1 comment · 2 replies
Hey! Check the MNIST tutorial; I think you want:
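Presumably something along these lines, following the MNIST tutorial's pattern (a sketch, not the tutorial's exact code): `nnx.value_and_grad` returns the loss alongside the gradients, so the scalar is still available to return for logging.

```python
from flax import nnx
import optax

@nnx.jit
def train_step(model, optimizer, x, y):
    def loss_fn(model):
        logits = model(x)
        loss = optax.softmax_cross_entropy_with_integer_labels(
            logits=logits, labels=y).mean()
        # has_aux=True: scalar loss first, auxiliary output second.
        return loss, logits

    # value_and_grad returns ((loss, aux), grads), so the loss survives
    # the transform and can be returned from train_step.
    grad_fn = nnx.value_and_grad(loss_fn, has_aux=True)
    (loss, logits), grads = grad_fn(model)
    optimizer.update(grads)  # applies the optax update to the model in place
    return loss
```

Note the ordering inside `loss_fn`: with `has_aux=True` the scalar loss must come first and the logits second, which is the reverse of the `return logits, loss` in the question.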