First of all, thanks to the K-FAC team for their contribution!
While using the K-FAC optimizer to train an ANN, I noticed that the optimizer seems to have trouble understanding the structure of the parameter tree when a parameter is used more than once in constructing the neural network.
If the original ANN is denoted f(params, inputs), then simply using a modified ANN F(params, inputs) = f(params, inputs) + f(params, inputs) makes the program throw an error. I tried functools.partial to fix the parameters, but the program then appears to hang. If I instead use vmap, some of the parameters are labelled as 'orphan', and in my experiments this affects the optimization process.
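To make the failure pattern concrete, here is a minimal sketch (illustrative names only, not from the K-FAC codebase) of the parameter-reuse structure described above:

```python
# Hypothetical minimal sketch of the parameter-reuse pattern.
# f uses each parameter exactly once; F traces the same params twice.

def f(params, x):
    # a single "layer": one weight and one bias, each used once
    w, b = params
    return w * x + b

def F(params, x):
    # the same parameters now appear twice in the computation graph,
    # which is the pattern that reportedly confuses the optimizer
    return f(params, x) + f(params, x)
```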
Is there already a method to avoid these issues? Would you consider updating the optimizer to fix this bug?
Thanks again for the well-designed optimizer!
The K-FAC optimizer doesn't currently support parameters being used more than once in the graph. This doesn't rule out RNNs and transformers, since they usually use each parameter only once in the graph, just with an operation that has a time dimension. As of a week or two ago, the behavior upon finding such a parameter is to automatically register it as "generic", which falls back to a crude curvature approximation. If you use the TNT feature in the code, you can get generically-registered layers to do something more useful.
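One possible workaround (my own suggestion, not an official fix) is to algebraically rewrite the model so that each parameter appears exactly once in the traced graph. For the specific example in this issue, F = f + f can be folded into a scalar multiply, which is mathematically identical but single-use:

```python
# Sketch of rewriting a parameter-reusing model into a single-use one.
# Names are illustrative; f stands in for the original network.

def f(params, x):
    w, b = params
    return w * x + b

def F_reused(params, x):
    # params traced twice -- the unsupported pattern
    return f(params, x) + f(params, x)

def F_single_use(params, x):
    # equivalent output, but each parameter is used only once
    return 2.0 * f(params, x)
```

This only helps when the reuse can be factored out; more general weight sharing would still hit the "generic" registration path described above.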