# LBFGS optimizer

This repository provides an improved LBFGS (and LBFGS-B) optimizer for PyTorch. Further details are given in this paper; also see this introduction.

Examples of use:

Files included are:

- `lbfgsnew.py`: new LBFGS optimizer
- `lbfgsb.py`: LBFGS-B optimizer (with bound constraints)
- `cifar10_resnet.py`: CIFAR10 ResNet training example (see the figures below)
- `kan_pde.py`: Kolmogorov-Arnold network PDE example using LBFGS-B

*Figure: ResNet18/101 training loss and training time*

The figure above shows the training loss and training time on Colab with a single GPU, for the ResNet18 and ResNet101 models. Test accuracy after 20 epochs: 84% for LBFGS and 82% for Adam.

Changing the activation from the commonly used ReLU to alternatives such as ELU gives faster convergence with LBFGS, as seen in the figure below.

*Figure: Wide ResNet 50-2 training loss*
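
As an illustration of such an activation swap, here is a minimal sketch that replaces every ReLU in a torchvision ResNet18 with ELU before training; the helper function and the use of `torchvision.models.resnet18` are assumptions for illustration, not part of this repository.

```python
import torch.nn as nn
from torchvision.models import resnet18

def replace_relu_with_elu(module):
    # Recursively swap every nn.ReLU submodule for nn.ELU.
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.ELU())
        else:
            replace_relu_with_elu(child)

model = resnet18(num_classes=10)  # CIFAR10 has 10 classes
replace_relu_with_elu(model)
```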

Here is a comparison of both training error and test accuracy for ResNet9 using LBFGS and Adam.

*Figure: ResNet9 training loss and test accuracy*

Example usage in full batch mode:

```python
from lbfgsnew import LBFGSNew

optimizer = LBFGSNew(model.parameters(), history_size=7, max_iter=100, line_search_fn=True, batch_mode=False)
```
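
In full batch mode, a single `step()` call processes the whole dataset through a closure. The sketch below assumes LBFGSNew follows the closure-based `step()` interface of `torch.optim.LBFGS`; `inputs`, `targets` and `loss_fn` are placeholder names.

```python
def closure():
    # The optimizer may evaluate the closure several times per step,
    # so it must zero the gradients and recompute the loss each call.
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    return loss

optimizer.step(closure)
```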

Example usage in minibatch mode:

```python
from lbfgsnew import LBFGSNew

optimizer = LBFGSNew(model.parameters(), history_size=7, max_iter=2, line_search_fn=True, batch_mode=True)
```
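
In minibatch mode the same closure pattern is applied once per batch. A hedged sketch of a training loop, with `train_loader` and `criterion` as placeholders and the closure-based `step()` interface assumed as above:

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

for epoch in range(20):
    for inputs, targets in train_loader:  # placeholder DataLoader
        def closure():
            # Zero gradients and recompute the loss on this minibatch.
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            return loss
        optimizer.step(closure)
```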

Note: for certain problems the gradient can also be part of the cost, for example in TV regularization. In such situations, pass the option `cost_use_gradient=True` to `LBFGSNew()`. However, this will increase the computational cost, so use it only when needed.
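
A hedged sketch of such a case: the closure below adds a total-variation-style penalty that needs autograd gradients of the network output with respect to its input, so the gradient enters the cost itself. `net`, `coords` and `target` are placeholder names; only the `cost_use_gradient=True` option comes from the note above, and combining it with the other arguments shown is an assumption.

```python
import torch
from lbfgsnew import LBFGSNew

optimizer = LBFGSNew(net.parameters(), history_size=7, max_iter=2,
                     line_search_fn=True, batch_mode=True,
                     cost_use_gradient=True)  # gradient is part of the cost

def closure():
    optimizer.zero_grad()
    x = coords.clone().requires_grad_(True)   # input coordinates (placeholder)
    u = net(x)
    data_term = ((u - target) ** 2).mean()    # data-fit term (placeholder target)
    # TV-style penalty: evaluating the cost requires du/dx via autograd
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    loss = data_term + 1e-3 * du.abs().mean()
    loss.backward()
    return loss

optimizer.step(closure)
```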