tweak torch parameter registration mechanism #19908
base: master
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #19908 +/- ##
=======================================
Coverage 79.27% 79.27%
=======================================
Files 501 501
Lines 46921 46979 +58
Branches 8648 8666 +18
=======================================
+ Hits 37195 37241 +46
- Misses 7981 7985 +4
- Partials 1745 1753 +8
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
the failing pytorch test is actually passing on my env:
The uniqueness of the variable path should come from the parent object name, not from the variable name (e.g.
for seed generator if using path it will be
Variable names are never unique. For a unique string you can use
i thought that within the same layer the variable names (excluding the variables from its sub layers) should be unique as an implicit requirement; that was my original thought, but then i found out that seed generators can actually create variables with the same name if there are multiple seed generators in one layer, since a seed generator is not a layer.
and i do notice that i probably want to add a test with nested seed generators. in theory, seed states should be recursively collected by torch since it basically gets all
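To make the naming collision concrete, here is a minimal hedged sketch (the exact variable names depend on keras internals, but the point is that two seed generators created directly on one layer end up with identically named state variables):

```python
import keras


class Noisy(keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Two independent seed generators living directly on the same layer.
        self.gen_a = keras.random.SeedGenerator(seed=1)
        self.gen_b = keras.random.SeedGenerator(seed=2)


layer = Noisy()
# Both seed-state variables typically get the same short name, so a torch
# ParameterDict keyed only by variable.name would collide here.
print([v.name for v in layer.variables])
```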
Code looks good -- thanks for the changes. I will apply docstring fixes after merging.
Works for me locally as well. Might be a fluke.
There are actually various tests that reliably fail here: https://btx.cloud.google.com/invocations/c55a2ca4-5df3-411b-bd52-7c9873e839ce/targets/keras%2Fgithub%2Fubuntu%2Fgpu%2Ftorch%2Fpresubmit/log (not the numerical integration test)
i will address those today / tmr 👍 - and is it possible to configure CI to run pytest regardless of whether the integration test passes or not?
We'd have to move the integration testing to go after the general pytest command in
i am seeing a weird issue on
i am able to isolate that the model json looks good, but the restored model here: https://github.com/keras-team/keras/blob/master/keras/src/saving/saving_lib.py#L242 has duplicated
I don't understand the connection. You could try pruning things from your change until the test passes, then you'll have a good idea what particular lines are causing the issue.
@fchollet most of the unit tests are fixed, with one issue left. torch basically requires the user to use
there are two options that i think could work, let me know your thoughts:
let me know what you think.
I think we could do this, via
in theory - let me try it
it technically works, but i think this will be a pretty impactful workflow change for pytorch users:
I think supporting a
^ @fchollet let me know if the above proposal sounds good to you or if you have a better idea.
I would recommend not adding any new API. What about this approach:
Could it be fixed? What's the issue with deserialization?
@fchollet error message:
i think the main issue to me is that deserialization doesn't trigger
What are those
Why so? I would expect deserialization to trigger the same code path as regular instantiation.
the on deserialization trigger: the
Compare 5cc5892 to 68c2565
@fchollet sorry for the delay on this PR. i am able to get the final unit test fixed, please take a look. @james77777778 fyi, since you had a couple of PRs around torch variable tracking, please also take a look and let me know if this will break any of your current use cases.
Thanks for the update! There seems to be one failing test: https://github.com/keras-team/keras/actions/runs/10128018370/job/28006468053?pr=19908
The changes are significant, so it is difficult for me to say if this PR will break anything.
@fchollet should be ready to go
I have taken a look since I was mentioned in this PR.
```diff
@@ -115,7 +115,8 @@ def add(self, layer, rebuild=True):
                 f"add a different Input layer to it."
             )
 
-        self._layers.append(layer)
+        # append will not trigger __setattr__ for tracking purpose.
+        self._layers = self._layers + [layer]
```
Does this indicate that the tracking will fail if we add a stateful object without `__setattr__`? Or is this a special case for the `Sequential` model?
this is a keras thing: basically auto tracking relies on `__setattr__` being triggered (sorry, my understanding here is wrong; append should be well supported if it's just keras). if `__setattr__` is not triggered, then the variables / layers are actually not tracked by the auto tracker. When `__setattr__` is not properly triggered, all layers can still run properly, but anything that relies on auto tracking working correctly will break.
this is a slightly bigger thing in torch, since torch relies on `torch.nn.ModuleList` instead of a plain dictionary for any collection of layers. with the keras3 logic (where a hook is added in the torch layer when `__setattr__` is triggered to register modules) it is very important to ensure `__setattr__` is properly triggered.
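For context, here is a minimal sketch of the underlying torch behavior being described (plain `torch.nn.Module`, no keras involved): modules assigned via `__setattr__` are registered, while modules appended to a plain python list are invisible to torch.

```python
import torch


class Block(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = []                        # plain list: torch never sees it
        self.layers.append(torch.nn.Linear(3, 4))
        self.head = torch.nn.Linear(4, 1)       # __setattr__ registers this module


block = Block()
# Only "head" is registered; the Linear inside the plain list is not tracked,
# so its parameters won't show up in parameters()/state_dict() either.
print([name for name, _ in block.named_modules() if name])  # ['head']
print(len(list(block.parameters())))                        # 2 (head.weight, head.bias)
```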
It is still unclear to me why this PR breaks `keras/src/models/sequential_test.py::SequentialTest::test_serialization` when using `self._layers.append(layer)`.
Shouldn't we have tracked these layers in `_maybe_rebuild`? It doesn't break in the master branch.
I am worried that this discrepancy might break some models that use python containers (e.g. `list` and `dict`) for layer creation. Or is it already broken this way in the torch backend?
Or is it already broken this way in the torch backend?
yes - this is already broken on the torch backend. you can try running the snippet below on the main branch vs. with this change:
```python
# Requires the torch backend (KERAS_BACKEND=torch) so that named_modules() exists.
import numpy as np
import keras


class StateAddingWrong(keras.layers.Layer):
    def build(self, _):
        self.layers = []
        self.layers.append(keras.layers.Dense(4, name="0"))
        self.layers.append(keras.layers.Dense(5, name="1"))
        self.layers.append(keras.layers.Dense(6, name="2"))

    def call(self, x):
        y = x
        for l in self.layers:
            y = l(y)
        return y


class StateAddingCorrect(keras.layers.Layer):
    def build(self, _):
        layers = []
        layers.append(keras.layers.Dense(4, name="0"))
        layers.append(keras.layers.Dense(5, name="1"))
        layers.append(keras.layers.Dense(6, name="2"))
        self.layers = layers

    def call(self, x):
        y = x
        for l in self.layers:
            y = l(y)
        return y


layer = StateAddingWrong(name="wrong")
layer(np.ones((2, 3)))
print(f"wrong tracked layers {len(layer._tracker.stored_ids['layers'])}")
print("wrong", list(layer.named_modules()))

layer = StateAddingCorrect(name="correct")
layer(np.ones((2, 3)))
print(f"correct tracked layers {len(layer._tracker.stored_ids['layers'])}")
print("correct", list(layer.named_modules()))
```
basically `named_modules()` won't return the layers appended to `self.layers`. my change fixes this case by using the pattern of not appending in place. the most correct way to tackle this might be handling callbacks here: https://github.com/keras-team/keras/blob/master/keras/src/utils/tracking.py#L144-L147 (for illustration, see the sketch below). @fchollet let me know if you have better ideas that could also sync the appending cases for the torch backend.
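A hedged sketch of what such a callback hook could look like. This is illustrative only: the class and callback names below (`TrackedList`, `on_track`, `on_untrack`) are assumptions rather than the actual keras `tracking.py` API, and the torch-specific part is just a placeholder for re-registering modules.

```python
# Illustrative sketch only; not the actual keras tracking implementation.
class TrackedList(list):
    """A list that notifies backend-specific callbacks when mutated."""

    def __init__(self, values=None, on_track=None, on_untrack=None):
        super().__init__(values or [])
        self.on_track = on_track          # e.g. re-register torch modules
        self.on_untrack = on_untrack

    def append(self, value):
        super().append(value)
        if self.on_track is not None:
            self.on_track(value)          # torch backend: refresh _modules here

    def remove(self, value):
        super().remove(value)
        if self.on_untrack is not None:
            self.on_untrack(value)
```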
It is still unclear to me why this PR breaks keras/src/models/sequential_test.py::SequentialTest::test_serialization when using self._layers.append(layer)
so, a little bit deeper RCA: the main error is that the `_layers_{0, 1, 2}` attributes are missing in the deserialization test, and i have found that the main issue is due to the build & deserialization order.
here is the minimal reproducible test:
```python
# Requires the torch backend (KERAS_BACKEND=torch) so that named_modules() exists.
import json

import keras

model = keras.Sequential(name="seq")
model.add(keras.layers.Dense(4, name="0"))
model.add(keras.layers.Dense(5, name="1"))
model.add(keras.layers.Dense(6, name="2"))
model.build((2, 3))
for m in model.named_modules():
    print(f"module {m}")
print("_layers_0" in dir(model))

config = model.get_config()
config_json = json.dumps(config, sort_keys=True, indent=4)
revived_model = keras.Sequential.from_config(config)
for m in revived_model.named_modules():  # inspect the revived model
    print(f"module {m}")
print("_layers_1" in dir(revived_model))
```
during the build phase, `self._layers` is not set until `model.build` is called, since the conditions at https://github.com/keras-team/keras/blob/master/keras/src/models/sequential.py#L137-L145 are not met. thus, `__setattr__` actually fires with all layers in the model at the `model.build(shape)` call here: https://github.com/keras-team/keras/blob/master/keras/src/models/sequential.py#L179-L182. during the deserialization phase, an `InputLayer` is actually available, so every `add` rebuilds the model without resetting the layers. the fix i have essentially forces a reset every time.
functionality wise all things are good, since the `Functional` model inside `Sequential` is what actually gets invoked, and it has its layers reset all the time here: https://github.com/keras-team/keras/blob/master/keras/src/models/functional.py#L140
Thank you for the detailed explanation!
If this is the case, would it be more reasonable to put the logic in `_maybe_build`, since we might want to track these layers in both `add` and `pop`?
Actually, I think your solution of adding the logic in `tracking.py` is the best, but it might add some backend-specific code to it. This should wait for @fchollet 's call.
If this is the case, would it be more reasonable to put the logic in _maybe_build, since we might want to track these layers in both add and pop?
Agreed.
@fchollet what's your thought on adding it to `tracking.py`? putting the logic in `_maybe_build` can fix the Sequential-specific issue, but `append`/`pop` still won't be tracked by torch in the current integration unless the user triggers a `__setattr__`.
Options are:
- fix this by adding some backend-specific callbacks in `tracking.py` (along the lines of the sketch above)
- add a validation step in the torch layer and warn the user when there are layers that are not properly registered in torch.
^ @fchollet let me know if you have any thoughts on the above.
```python
        self._track_variables()

    def _track_variables(self):
        """Adaptation layer to make sure keras.layers.Layer works well with
```
Please use backticks around all code keywords in docstrings.
```python
        torch.nn.Module. Currently, the main modification are on parameter/module
        tracking and pointing torch.nn.Module.forward() to the right keras call.

        Module tracking:
```
For a section outside of the normal set of sections to be highlighted, use markdown syntax, e.g.
**Module tracking:**
(content)
```python
        tracking and pointing torch.nn.Module.forward() to the right keras call.

        Module tracking:
        All sublayers are tracked as modules in Module._modules. All module level
```
Text content does not need to be indented.
```python
        non trainable and seed generator states belongs to the current layer.

        Few additional points that user should be aware of:
        1. When torch backend is enabled KerasVariable.value is torch.nn.Parameter,
```
Make sure to introduce a line break (blank line) before a list or after a section title.
```python
        # as symbolic build is needed to make sure all layers variables are
        # initialized before invoke torch.compile(). _symbolic_build()
        # should refactored to not require _compile_metrics and optimizer
        # is defined.
```
_symbolic_build() should refactored to not require _compile_metrics and optimizer is defined.
Can you explain?
`_symbolic_build` requires https://github.com/keras-team/keras/blob/master/keras/src/trainers/trainer.py#L1005-L1014, and these attributes are only added during `model.compile`.
in this change, for torch params to work properly the requirement is that all layers are built before making any torch-specific api calls. the easiest way to achieve this inside the trainer is to ensure `_symbolic_build` is called whenever `jit_compile` is true, since `torch.compile` will call `named_parameters()` to compile the graph.
during unit testing, i found a few unit tests that break because they don't call `torch.compile` but still require calling `named_parameters`, and this fails during `_symbolic_build` due to the missing `compile_metrics`, `compile_loss` and `optimizer` attributes. i can try removing these requirements and fixing the unit tests instead.
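To make the ordering requirement concrete, here is a small hedged sketch (assuming the torch backend is active; exact parameter counts depend on the model) of why all layers must be built before torch-level APIs such as `named_parameters()` are consulted, which is what `_symbolic_build` ensures inside the trainer:

```python
import os
os.environ["KERAS_BACKEND"] = "torch"  # must be set before importing keras

import keras

model = keras.Sequential([keras.layers.Dense(4), keras.layers.Dense(1)])

# Building the model first creates all variables (and therefore the torch
# parameters). Anything that traces the graph via named_parameters(), such as
# torch.compile, needs this to have happened already.
model.build((None, 3))
print(len(list(model.named_parameters())))  # expected: 4 (two kernels, two biases)
```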
this is a follow up from the #19885 discussion where i am trying to make torch / keras play well together on tracking parameters.
the solution i ended up with:
- track parameters with `recurse=True` / `variable.name`, with just tracking the variables the current layer holds. however, the current seed generator actually creates duplicated variable names. if https://github.com/keras-team/keras/blob/master/keras/src/random/seed_generator.py#L80 can be changed to something like `f"{self.name}_generator_state"`, it will work with the ParameterDict approach.
- on `_post_track/untrack_variables`, refresh the entire torch params and its sublayers. this could be changed to not re-create all sublayers if this function ever becomes too slow.

i also added a few torch specific tests to reflect some of the assumptions and use cases that a torch user might have, e.g. using `state_dict`.
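As an example of the kind of torch-native use case those tests are meant to cover, here is a minimal hedged sketch (assuming the torch backend is enabled) of snapshotting and restoring a Keras layer's weights through the standard `torch.nn.Module` state APIs:

```python
import os
os.environ["KERAS_BACKEND"] = "torch"  # must be set before importing keras

import numpy as np
import keras

layer = keras.layers.Dense(4)
layer(np.ones((2, 3)))  # build the layer so its variables / torch params exist

# With the torch backend a Keras layer is also a torch.nn.Module, so a torch
# user can snapshot and restore its weights through state_dict().
state = layer.state_dict()
print(list(state.keys()))
layer.load_state_dict(state)  # round-trip should succeed once the layer is built
```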