Add support for distributions with monotonically increasing bijector #297
Shouldn't this be `true`?
Yes! Good catch :)
Would it be useful to add to the docstring a comment about when this is safe to use? E.g. in general, if `d` has variable parameters, then this is unsafe, because it implicitly renormalizes the distribution, but the log of the new normalization constant is never computed, so it is never included in the log density or its gradient, though it would need to be when the parameters are variable.
When you say "variable parameters", do you mean that the size / length can vary?
No, I mean parameters that are being sampled or optimized in the model.
So you're saying … will be incorrect?
Okay, so I understand what you mean by this now, but I'm struggling a bit to understand why this is an issue here but not for other "typical" diffeomorphisms. The Jacobian correction is exactly so that we don't have to worry about normalization, no?
Well, another way to put it is that the function is not in fact a diffeomorphism, because it's injective, not bijective. The original distribution is defined on some support, but `ordered(...)` has a support that is a subset of the original distribution's. So we're not just applying a bijective map and worrying about the change of measure. We're additionally restricting the support, which the Jacobian cannot capture. If we had a magic function that could compute normalization factors, then we would use that to compute the normalized logpdf after restricting the support, and then `Ordered` would indeed be a bijection.
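Concretely: if $d$ has density $p_d(x \mid \theta)$ on support $S$, and $O = \{x \in S : x_1 < x_2 < \dots < x_n\}$ is the ordered subset, then the distribution we actually want is

$$p_{\mathrm{ord}}(x \mid \theta) = \frac{p_d(x \mid \theta)\,\mathbf{1}[x \in O]}{Z(\theta)}, \qquad Z(\theta) = \int_O p_d(x \mid \theta)\,\mathrm{d}x,$$

and $Z(\theta)$ is exactly the normalization constant that never gets computed. Dropping it is harmless only while $\theta$ is fixed; once $\theta$ is sampled or optimized, $\log Z(\theta)$ contributes to the log density and its gradient.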
So this I can get behind for `ordered`, but I don't get it for `exp`?
Same deal: you're taking a density whose support is the whole real line and applying a change of variables from the real line to the positive reals (not the whole real line). So you're changing the support if you use `Exp` for this distribution.
Nah, then I still don't get it. I'm with you that you're changing the domain, but $\exp: \mathbb{R} \to (0, \infty)$ is bijective, and those are the domains we're interested in. Why does it matter that $\exp$ treated as a map from $\mathbb{R} \to \mathbb{R}$ is not bijective? o.O
Could you maybe write down exactly what you mean mathematically, or alternatively point me to a resource where I can read about this? I don't think I fully understand exactly what issue you're pointing to here, but really feel like this is something I should know so very keen on learning more.
Btw just implemented sampling from the following model:
and even that fails to produce consistent results 😕
Are these two really supposed to be the same?
Here's the output of the above:
As you can see, the quantiles are different.
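(The model itself isn't reproduced above; roughly this kind of consistency check, as a hypothetical sketch with illustrative names and sample sizes:)

```julia
using Bijectors, Distributions, LinearAlgebra, Statistics

# Hypothetical sketch of the check: samples from ordered(d) should match
# plain rejection sampling from d restricted to sorted vectors.
d = MvNormal(zeros(2), I)

# Rejection sampling: draw whole vectors from d, keep only the sorted ones.
function rand_ordered_rejection(d, n)
    xs = Vector{Vector{Float64}}()
    while length(xs) < n
        x = rand(d)
        issorted(x) && push!(xs, x)
    end
    return reduce(hcat, xs)
end

x_rej = rand_ordered_rejection(d, 10_000)
x_ord = rand(Bijectors.ordered(d), 10_000)  # the transform under discussion

# Marginal quantiles of the first coordinate should agree (here they didn't).
quantile(x_rej[1, :], [0.25, 0.5, 0.75])
quantile(x_ord[1, :], [0.25, 0.5, 0.75])
```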
Hang on, is `transformed` even doing the right thing here? E.g. if `d` is an IID normal, then `ordered_b` is an `OrderedBijector` that maps from unconstrained `y` to constrained `x`. So the result is a distribution one samples from by first sampling from the IID normal and then constraining so that the result is sorted. But this is not what we want. This is not the same thing as constraining `x` to be distributed according to `d` but restricted to the ordered subset of the support. Instead, what we want here is something that wraps `d` but assigns it the bijector `ordered_b`. Random generation is not possible, but the log-density can be computed to within a normalization factor.

E.g. something like
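(a minimal sketch of what such a wrapper could look like; `WithBijector`, `with_bijector`, and the method table are hypothetical, not necessarily the exact code proposed):

```julia
using Distributions
using Bijectors

# Hypothetical sketch: keep the base distribution `d` but override the
# bijector used to map between constrained and unconstrained space.
struct WithBijector{D<:ContinuousMultivariateDistribution,B} <: ContinuousMultivariateDistribution
    dist::D
    bijector::B
end

with_bijector(d::Distribution, b) = WithBijector(d, b)

Base.length(d::WithBijector) = length(d.dist)

# Log-density of the base distribution; correct only up to the (uncomputed)
# normalization constant introduced by restricting the support.
Distributions._logpdf(d::WithBijector, x::AbstractVector{<:Real}) = logpdf(d.dist, x)

# The key part: report the custom bijector instead of the default one.
# Note: no `rand` is defined, since random generation is not possible here.
Bijectors.bijector(d::WithBijector) = d.bijector
```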
If you then use `return with_bijector(d, inverse(ordered_b))` in this line, your example in https://github.com/TuringLang/Bijectors.jl/pull/297/files#r1597099187 seems to work.
Ooookay, so now I'm with you! I originally thought that this was just a transformation that moved us to an ordered distribution, but didn't necessarily have any relation to the "base" distribution (until your previous comment on the docstring clarified what it was supposed to do), hence I never considered this to be an issue.

But yeah, this is a pretty significant bug then, and it has been around for a while 😬
Yeah, let's definitely get some tests in the CI that would have caught this bug.
Indeed. Working on a fix + tests as we speak.
Btw, something like `with_bijector` would be a convenient addition to the API for power users who want to manually select the bijector for a distribution. Applications include cases where the target distribution has high probability mass near a singularity of the default bijector, so one wants a custom bijector that moves the probability mass in the unconstrained space away from the singularity. Another application is programmatic benchmarking of alternative bijectors.
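For example, assuming the hypothetical `with_bijector` sketched above, manually selecting the ordered bijector could look like:

```julia
using Bijectors, Distributions, LinearAlgebra

# An IID normal restricted to the ordered subset of its support.
d = MvNormal(zeros(3), I)
ordered_b = Bijectors.OrderedBijector()        # unconstrained y -> ordered x
d_ord = with_bijector(d, inverse(ordered_b))   # constrained x -> unconstrained y

# Log-density is available up to the (uncomputed) normalization constant;
# `rand` is intentionally not defined for this wrapper.
logpdf(d_ord, [-0.5, 0.1, 2.3])
```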
Agreed! But I think this is such a special case that it warrants its own distribution (because we can implement `rand` using rejection sampling for several cases used in practice). But we should also add a more general `with_bijector` method 👍

But I tried implementing this, and I'm still seeing fairly drastic discrepancies in the quantiles 😕 Feeling a bit out of it today though, so I might just have messed up the impl or tests somewhere; let me know if you see anything obvious! I pushed the changes.
Ah, really? The example you posted earlier looks pretty good to me:
Comparing the quantiles, they're generally within 3 MCSEs of each other. Not really sure what the cut-off should be, but given that the MCSEs are themselves estimated using a method that relies on an asymptotic approximation, I'd guess this is good enough:
My target was incorrect 🤦♂️ I implemented sequential rejection sampling, i.e. sample `x[1]` and then, conditionally on it, sample the rest of the ordered vector using the marginals. This of course leads to a different distribution.

Using the correct rejection sampling approach now :)
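(To spell out the difference, a hypothetical sketch; the joint accept-if-sorted scheme from earlier is the correct one:)

```julia
using Distributions

# WRONG (sequential): sample x[1] from its marginal, then each subsequent
# coordinate from its marginal truncated below at the previous value.
# Each truncation renormalizes per coordinate, so the result is NOT the
# joint distribution restricted to the ordered set.
function rand_ordered_sequential(marginals)
    x = [rand(marginals[1])]
    for m in marginals[2:end]
        push!(x, rand(truncated(m; lower=x[end])))
    end
    return x
end

# e.g. rand_ordered_sequential(fill(Normal(), 3))
#
# CORRECT (joint): draw the whole vector from d and accept iff it is sorted,
# as in the rejection sampler sketched earlier in the thread.
```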
Further progress: it now works for most things, but I'm seeing correctness issues for product distributions with negative support (see tests).
Oh, I just saw all your comments below 🤦♂️ Will address those soon, hopefully (seem to have caught something, so I'm a bit under the weather atm).
Does this also need an implementation of `is_monotonically_decreasing`?
Yep yep!
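For reference, a sketch of what that could look like, mirroring the increasing variant; the method table here is hypothetical, not the PR's actual definitions:

```julia
# Conservative fallback: only claim monotonicity where we know it holds.
is_monotonically_decreasing(f) = false

# Base case: x ↦ -x is decreasing.
is_monotonically_decreasing(::typeof(-)) = true

# Composition: the composite is decreasing iff exactly one factor is
# decreasing and the other is increasing (assuming both are monotone).
is_monotonically_decreasing(f::ComposedFunction) =
    (is_monotonically_decreasing(f.outer) && is_monotonically_increasing(f.inner)) ||
    (is_monotonically_increasing(f.outer) && is_monotonically_decreasing(f.inner))
```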