Bayes by backprop #40
Comments
Great! It would fit neatly into Parmesan, so you are more than welcome to make a pull request. We are also a little more relaxed about catering for special layer needs, in case you need some special feature to make it work. |
I hope I'll not need any special features to make it work. |
Ok, let me know if you encounter any problems with parmesan.
Can't you just set the deterministic flag to True? |
I decorate the layer classes:

```python
from functools import wraps

import lasagne
import theano.tensor as T

# `prior`, `log_normal` and `log_normal3` are assumed to be helpers
# defined elsewhere (user code or parmesan.distributions)


class NormalQ(object):
    """Helper class providing the logic for initializing a
    random variable distributed as

        N(mean, log(1 + exp(rho))^2)

    with a user-defined prior, where `mean` and `rho` are the
    variational parameters fitted during training.

    Parameters
    ----------
    log_prior : callable
        user-defined prior
    """

    def __init__(self, log_prior=prior(log_normal, 0, 1)):
        self.log_prior = log_prior

    def __call__(self, layer, spec, shape, **tags):
        """
        Parameters
        ----------
        layer : wrapped layer instance
        shape : tuple of int
            a tuple of integers representing the desired shape
            of the parameter tensor
        tags : see :func:`lasagne.layers.base.Layer.add_param`
            for more information
        spec : Theano shared variable, expression, numpy array or callable
            Initial value, expression or initializer for the embedding
            matrix. This should be a matrix with shape
            ``(input_size, output_size)``.
            See :func:`lasagne.utils.create_param` for more information.

            .. note::
                can also be a dict of such specs,
                ``{'mu': spec, 'rho': spec}``,
                to avoid the default rho initialization

        Returns
        -------
        Theano tensor
        """
        # case when the user leaves the default init spec
        if not isinstance(spec, dict):
            spec = {'mu': spec}
        # important!
        # we declare that the params we add next are the ones
        # needed to fit the distribution -- they are variational
        tags['variational'] = True
        rho_spec = spec.get('rho', lasagne.init.Normal(1))
        mu_spec = spec.get('mu', lasagne.init.Normal(1))
        rho = layer.add_param(rho_spec, shape, **tags)
        mean = layer.add_param(mu_spec, shape, **tags)
        e = layer.acc.srng.normal(shape, std=1)
        # reparametrization trick (https://www.reddit.com/r/MachineLearning/
        # comments/3yrzks/eli5_the_reparameterization_trick/):
        # every time we add a param we apply the reparametrization,
        # but it is done in __init__ and the only way I see to get a
        # deterministic output is to replace `e` with `0` in the graph,
        # which is going to be tricky
        W = mean + T.log1p(T.exp(rho)) * e
        q_p = self.log_posterior_approx(W, mean, rho) - self.log_prior(W)
        layer.acc.add_cost(q_p)
        return W

    @staticmethod
    def log_posterior_approx(W, mean, rho):
        return log_normal3(W, mean, rho)


def bbpwrap(approximation=NormalQ()):
    def decorator(cls):
        def add_param_wrap(add_param):
            @wraps(add_param)
            def wrapped(self, spec, shape, name=None, **tags):
                # we should take care of the user's specification:
                # to avoid the bbp hook just set tags['variational'] = True
                if tags.get('variational', False):
                    return add_param(self, spec, shape, name, **tags)
                else:
                    # they don't need to be regularized, strictly speaking
                    tags['regularizable'] = False
                    param = self.approximation(self, spec, shape, **tags)
                    return param
            return wrapped

        def init_wrap(__init__):
            @wraps(__init__)
            def wrapped(self, acc, *args, **kwargs):
                self.acc = acc  # type: parmesan.utils.Accumulator
                __init__(self, *args, **kwargs)
            return wrapped

        cls.approximation = approximation
        cls.add_param = add_param_wrap(cls.add_param)
        cls.__init__ = init_wrap(cls.__init__)
        return cls
    return decorator
```

Another idea is to create such params with a special `init.SomeClass` thing that returns an expression, which is already supported in Lasagne. |
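For illustration, a minimal usage sketch of the decorator above, assuming an `Accumulator` like the `parmesan.utils.Accumulator` referenced in the type comment; the `BayesDenseLayer` subclass and the `acc.costs` attribute are hypothetical:

```python
import lasagne

# assumed: `bbpwrap` and `NormalQ` as defined above, and an `Accumulator`
# exposing `srng` and `add_cost` (plus a hypothetical `costs` list)
from parmesan.utils import Accumulator


# hypothetical layer: a DenseLayer whose parameters are fitted variationally
@bbpwrap(NormalQ())
class BayesDenseLayer(lasagne.layers.DenseLayer):
    pass


acc = Accumulator()
l_in = lasagne.layers.InputLayer((None, 784))
# the wrapped __init__ takes the accumulator as its first argument
l_out = BayesDenseLayer(acc, l_in, num_units=10,
                        nonlinearity=lasagne.nonlinearities.softmax)

# the accumulated log q - log prior terms form the KL part of the
# variational objective and would be added to the expected log-likelihood
kl_term = sum(acc.costs)  # hypothetical accessor
```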
Can you give a bit of background on what you want to achieve? |
I want to make a thing that will make it easy to create a variational topping. |
OK, I get it (and it sounds cool :) ). What specifically is your problem with getting a deterministic output? Do you want to get the output using the posterior mode of the weights instead of sampling the weights? |
Yes, the current implementation supports only sampling: to get some kind of deterministic output (it will just be more stable), it will be the prediction posterior mean. But in real tasks, i.e. production, this approach is too slow, and we can consider a prediction based on the weights from the q_w mean or mode. When I do this trick with overriding |
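One way to sketch such a posterior-mean prediction, assuming the accumulator keeps references to the sampled noise tensors (a hypothetical `noise_vars` list), is to clone the output graph with the noise zeroed out, so every weight collapses to its mean:

```python
import theano
import theano.tensor as T
import lasagne

# stochastic forward pass: weights are sampled as W = mean + softplus(rho) * e
stochastic_out = lasagne.layers.get_output(l_out)

# deterministic forward pass: replace every noise sample `e` with zeros,
# so that W = mean, i.e. the posterior mean of the weights
# (`acc.noise_vars` is an assumed list of the `e` tensors collected above)
deterministic_out = theano.clone(
    stochastic_out,
    replace={e: T.zeros_like(e) for e in acc.noise_vars},
)
```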
@casperkaae BTW, we need to have a closer look at the PyMC3-Lasagne bridge example. They do similar things, just inline, using the Lasagne API. That's exactly what we need. |
Hi again!
I've finished the work on my topping. It seems to be flexible. I opened a PR in Lasagne, but I think this is a better place to contribute to.