Support fractional binding in algebras (spatial semantic pointers) #243
Conversation
Are the dtype and fractional power changes in any way interdependent? Otherwise, it might be better to have two separate PRs to discuss the changes individually. My biggest questions at the moment are:
At the moment fractional binding seems much more useful to me (and to involve less complexity) than the dtype option. So I would probably add fractional binding (I have thought a little bit about it before), even though it makes the algebra interface even larger (but the documentation could provide some more details on which methods might be skipped when implementing a new algebra). I'm more sceptical about the dtype option. It seems to add quite a bit of complexity which is probably only used by very few users.
Yep I can separate them. They should already be separate in the history except for a docstring in the documentation commit. But the dtype PR depends on this one to some degree as the
Yep that sounds right. I added logic for this to
Yes they should, and for VTB as well. But I'll add a check box to unit test this in the separate PR.
Right. We do have a network for fractional binding but I haven't personally tested it out to see how optimized / scalable it is... could add a baseline implementation that could likely be improved over time. Made a note.
Great. Yep, the dtype was more for my own experimentation and without a clear use case. I'll separate it out as a work in progress and then possibly revisit when there is a need.
Split
Gave it a more detailed look. There are a few points where the documentation might be improved a little bit, and the question of how to name certain things. To me it seems that "autobinding" might be a better name than "fractional binding".
How useful is fractional binding with non-degenerate SPs for VTB? If each such vector is its own inverse, the applicability seems somewhat limited.
A neural implementation would definitely be nice, but I could be convinced that this is also useful without.
        raise NotImplementedError()

    def make_nondegenerate(self, v):
        """Returns a nondegenerate vector based on the vector *v*.
Can you add a short explanation of what makes a vector (non)degenerate?
@@ -104,6 +104,38 @@ def bind(self, a, b):
        """
        raise NotImplementedError()

    def fractional_bind(self, v, exponent):
I'm wondering a bit what the best name for this method would be. Could there be algebras that support multiple auto-bindings, but only with integer exponents (in which case it wouldn't really be fractional binding)? If so, I could see this being implemented via the same method, but you wouldn't necessarily be able to use that algebra everywhere (depending on whether non-integer exponents are used).
pow would be analogous to the pow function, which this sort of binding is, but that might be non-obvious. What do you think of autobind?
Plate (1995; section 3.6.5) calls this "the convolutive power" with "fractional exponents". The "fractional" term here is also consistent with fractional differentiation. It is a bit of a misnomer because "fractional" isn't limited to the rationals. However this naming appears to be used elsewhere as well in the same context (e.g., scipy.linalg.fractional_matrix_power), so to us it seemed reasonable to bring in the fractional term here as well, since the emphasis is on the same generalization. This is also the naming we went with in the cogsci papers.
Could there be algebras that support multiple auto-bindings, but only with integer exponents (in which case it wouldn't really be fractional binding)?
Probably. Binary Spatter Codes might be one such instance.
I could see this being implemented via the same method, but you wouldn't necessarily be able to use that algebra everywhere (depending on whether non-integer exponents are used).
If we get there, I could see a separate method being implemented for the integer case for at least two reasons. One, because it seems likely we'd want different neural (or non-neural) implementations for the two methods, and two because we might later want to allow the fractional exponent to be provided on-the-fly to the network by some other ensemble (for the fractional binding case, and likely not for the integer case). Then having them separate in the interface would help make these differences in what algebra supports what clearer to both developers and users (IMO).
These are some good points for sure. Especially if this naming has been used in published literature before. Thinking a bit more about this, I might be less worried about the "fractional" part of the term, but more about the "binding" part. Binding is analogous to multiplication. But neither Plate nor the matrix example is talking about "fractional multiplication"; it is a "fractional power". Thus, it seems to me we're missing a word analogous to "power" for "binding" that should be used here. Also, "binding" somewhat implies to me that something is bound to something else, i.e. there are two Semantic Pointers involved (but maybe others don't have that intuition).
If we want to provide separate implementations for integer and fractional powers (which might be reasonable), it raises another interesting question: There is only one ** operator. Which power would it represent? How would we handle the other variant if there were neural implementations (e.g. what would state ** scalar mean, and how would one get the alternate implementation)?
Just want to make sure, the API and names are well thought out. Once it's in a release, it will be harder to change. :)
@@ -82,6 +91,28 @@ def bind(self, a, b):
            raise ValueError("Inputs must have same length.")
        return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=n)

    def fractional_bind(self, v, exponent):
        r = np.fft.ifft(np.fft.fft(v) ** exponent)
Wouldn't this work with rfft/irfft? That would allow returning r instead of r.real and reduce the number of operations by half. But it would require changing the check below (I guess you need to check the complex angle of the returned Fourier coefficients, but that would relate more closely to the message anyways).
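A minimal sketch of the rfft-based variant suggested here, assuming the check is moved onto the complex angle of the Fourier coefficients; the function name and tolerance are illustrative, not the PR's actual code:

import numpy as np

def fractional_bind_rfft(v, exponent, atol=1e-8):
    v = np.asarray(v, dtype=float)
    coeffs = np.fft.rfft(v)
    # Coefficients at a complex angle of +/- pi are ambiguous for
    # non-integer exponents (see the discussion below).
    if not float(exponent).is_integer() and np.any(
        np.isclose(np.abs(np.angle(coeffs)), np.pi, atol=atol)
    ):
        raise ValueError("Vector cannot be unambiguously fractionally bound.")
    return np.fft.irfft(coeffs ** exponent, n=len(v))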
    def fractional_bind(self, v, exponent):
        r = np.fft.ifft(np.fft.fft(v) ** exponent)
        if not np.allclose(r.imag, 0):
            raise ValueError(
I wonder whether this should really be a hard error. From the error message it seems to me that it is a case one usually wants to avoid, but nothing that mathematically wouldn't work. However, the check of r.imag suggests that we get imaginary components when we only want real-valued components.
The error message here is confusing, as a complex angle of +/- pi isn't what causes imaginary components. Those are caused by a negative zero or Nyquist frequency. It seems like there are two types of 'degenerate' unitary vectors: ones with ambiguous output (only useful for integer powers), and ones that produce output with imaginary components for non-integer powers. The latter could have a use when complex-valued semantic pointers are implemented, but they are not desired when working with purely real values.
        if d % 2 == 0:
            v_fft[d // 2] = np.abs(v_fft[d // 2])  # +/- Nyquist frequency
        r = np.fft.ifft(v_fft)
        assert np.allclose(r.imag, 0)  # this should never happen
The comment confused me ... if you are asserting the statement, it should always be true, shouldn't it?
Also, I'd prefer using rfft and irfft here, which would avoid the issue altogether.
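For illustration, a sketch of what the rfft-based variant of this helper could look like; this is assumed code, not the PR's implementation, and only the Nyquist handling above is taken from the diff:

import numpy as np

def make_nondegenerate_rfft(v):
    d = len(v)
    v_fft = np.fft.rfft(np.asarray(v, dtype=float))
    v_fft[0] = np.abs(v_fft[0])        # zero (DC) frequency
    if d % 2 == 0:
        v_fft[-1] = np.abs(v_fft[-1])  # Nyquist frequency
    return np.fft.irfft(v_fft, n=d)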
@@ -99,6 +107,30 @@ def bind(self, a, b):
        m = self.get_binding_matrix(b)
        return np.dot(m, a)

    def fractional_bind(self, v, exponent):
        from scipy.linalg import fractional_matrix_power
We might want to document that this requires scipy?
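One possible way to make the optional dependency explicit (a sketch under the assumption that scipy stays an optional requirement; the error message and structure are illustrative, not the PR's code):

    def fractional_bind(self, v, exponent):
        try:
            from scipy.linalg import fractional_matrix_power
        except ImportError as err:
            raise ImportError(
                "Fractional binding for the VTB algebra requires scipy. "
                "Install it with 'pip install scipy'."
            ) from err
        m = fractional_matrix_power(self.get_binding_matrix(v), exponent)
        ...  # convert m back to a vector as in the PR's implementation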
@@ -41,7 +41,15 @@ class VtbAlgebra(AbstractAlgebra):

    The VTB binding operation is neither associative nor commutative.

    Publications with further information are forthcoming.
    Fractional binding for this algebra is currently an experimental feature.
I assume the make_nondegenerate method is part of this feature / also to be considered experimental?
        The original object is not modified.

        A degenerate Semantic Pointer is one that cannot be fractionally bound
        using an arbitrary exponent in an unambiguous manner.
Wouldn't all unitary SPs u be degenerate by that definition, because there will always be an exponent e such that u**e == u (if I'm not mistaken)? And for non-unitary vectors, precision would become a problem, if I recall correctly ...
It is true that there is an exponent such that u**e == u; here I believe "unambiguous" is meant to mean that there is only one output for a given input. In the HRR case, fractional binding can be thought of as moving along a circle from an origin to the unitary vector, taking the shortest direction. If one of the Fourier coefficients has a complex angle of +/- pi, then it is on the opposite pole and it is ambiguous which direction is shortest. The output will depend on which way pi is rounded and could differ between machines.
Yeah, as Brent explains, we found this to be problematic when a complex value is at exactly -1 (a complex angle of +/- pi). Epsilon floating-point errors can make the difference between its square root (for instance) being +i or -i. When this happens, the inverse FFT can be inconsistent with itself and produce imaginary values. Let us know if you think of any alternative workarounds.
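A tiny numeric illustration of this sensitivity (an assumed example, not from the PR): two complex values an epsilon above and below an angle of pi have square roots on opposite sides of the real axis.

import numpy as np

eps = 1e-12
just_below = np.exp(1j * (np.pi - eps)) ** 0.5  # angle ~ +pi/2, i.e. ~ +1j
just_above = np.exp(1j * (np.pi + eps)) ** 0.5  # angle wraps to ~ -pi/2, i.e. ~ -1j
print(just_below, just_above)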
I now have a better understanding of the issue. However, as far as I understand, with a real-valued Semantic Pointer as input, one half of the Fourier coefficients should be the complex conjugate of the other half, and the exponentiation shouldn't change that; except that numpy.fft.fft produces coefficients where the complex-conjugate property doesn't hold if they are close to -1+0j. In that case, I would ask why not use numpy.fft.rfft to avoid the problem by having a single coefficient as the source of truth (so to say). That would still leave some vectors ambiguous (where it also seems important to recognize that this ambiguity comes in degrees, depending on the number of coefficients that are close to -1+0j), but maybe it does not matter? You couldn't predict in which direction you would rotate on the complex circle, but usually vectors are generated randomly anyways, so you wouldn't expect a particular direction. Moreover, this also seems to be an issue for integer binding, if you are a little bit off of -1+0j and do multiple bindings.
assert np.allclose((x ** 0).v, algebra.identity_element(d))
assert np.allclose((x ** 1).v, (~x).v)
assert np.allclose((x ** 2).v, (~x * x).v)
assert np.allclose((x ** 3).v, (((~x) * x) * x).v)
Why does x**1 give ~x?
Had to do with how I implemented fractional binding for VTB. Because it's repeated matrix multiplication, but a transpose is needed on one side (related to the non-commutativity). Sounds like @bjkomer has an alternative solution (#243 (comment)).
Probably it was a mistake to define B(x, y) = V_y x instead of B(x, y) = V_x y. But well ... it seems to me that after doing the fractional binding, the identity is bound on the wrong side (B(fractional_bound_vector, identity) instead of B(identity, fractional_bound_vector)). Due to the definition, it might not be possible to do the binding on the correct side directly, but applying the "swapping matrix" (Eq. 2.23 in the original VTB paper; there is an implementation in the algebra too) should do the trick. It's actually equivalent to the inversion matrix, which explains the results above.
Still have to look at Brent's alternative solution.
assert np.allclose(algebra.make_unitary(x.v), x.v)

# nondegenerate properties are more consistent with HRRs, but they always
# oscillate with a period of 2 because V^2 = I
I'm curious why this happens; is it due to using a Householder matrix for the nondegenerate case? From my experiments using fractional VTB binding, I got it to work with just unitary vectors, though it required a slight redefinition of the binding operator to make it commutative. I think this other fractional-binding version may be useful to include as well, to allow unique powers greater than 2 and to not require negative powers to be equivalent to the positive ones.
Yeah this is just what happens when using the householder matrix to generate those vectors. I don't think my solution is practical, but I didn't see another solution. Would you be open to submitting your version of VTB fractional binding (in this PR) to overwrite mine (and updating the unit tests to match their properties accordingly)?
While making the tests I realized my version doesn't actually work for fractional powers for all seeds of unitary vectors; some have the dot product decay after repeated binds. Interestingly, it seems to work for exactly 50% of them (I got lucky/unlucky in that everything I tried in the past worked). 100% of them work for repeated binds of integer powers. It would definitely be useful to characterize what makes them work and have a make_nondegenerate() based on that.
The binding change was simply to always bind with the inverse, since binding seemed to act more like subtraction, and 'flipping the sign' made it become more like addition and commutative. If you start by always binding onto the identity, you don't actually need this change.
I'll make a PR with the tests for now; I haven't gotten around to a new make_nondegenerate() yet.
Seems like it just needs to be unitary with a positive determinant to work. That explains the 50%
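A hedged sketch of checking this property empirically, assuming VtbAlgebra with its make_unitary and get_binding_matrix methods can be imported as below (adjust the import path to your nengo_spa version):

import numpy as np
from nengo_spa.algebras import VtbAlgebra  # assumed import path

algebra = VtbAlgebra()
d = 16  # VTB requires a square dimensionality
rng = np.random.RandomState(0)
v = algebra.make_unitary(rng.randn(d))
det = np.linalg.det(algebra.get_binding_matrix(v))
print("positive determinant:", det > 0)  # expected for roughly 50% of seeds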
I have to think more about this degenerate-vectors stuff and understand it better, but this statement seems weird to me:
"If you start by always binding onto the identity, you don't actually need this change."
How could binding to the identity have any effect? Isn't the definition of the identity that it does not change the bound SP?
I realized that with the current definition of binding in the VTB, there is actually no left-identity. Due to the orientation of the V_y' matrix, there can be no vector x that selects the right elements from V_y', so that V_y * x = y. This can be fixed by transposing V_y, as @bjkomer also pointed out in #247. I think this change would be desirable as it seems to give nicer mathematical properties. Though, we should verify that this does not break properties in other places.
I verified that the plots from the original paper turn out the same even with the transpose. Still have to verify what the impact on the mathematical properties is (we already know that it results in a left-identity, but what about other things, e.g. does it influence commutativity?)
That's good to hear! I've played around with it a bit experimentally, and from what I've seen things are still non-commutative (except interestingly in 1D and 4D it is commutative for non-degenerate vectors, not true above that). Possibly there is a different higher dimensional subspace that is commutative, but "unitary with positive determinant" seems to be the only restriction that SSPs need for their properties.
Commutativity and associativity properties are not changed by the transpose. However, I found one difference: there isn't a swapping matrix anymore, i.e. it is not possible to swap the operands in the bound state.
To me it is not obvious whether one variant should be preferred:
- Original implementation, which has a swapping matrix, but no left-identity (and probably no left-inverse, but I didn't verify this yet)
- Implementation with transposed binding matrix, which has no swapping matrix, but a left-identity; in fact left-identity = right-identity (I believe), and thus it allows for proper fractional binding powers. (What about the left-inverse?)
Does any of you think that either implementation is strictly better? Otherwise, the best might be to add a third algebra as a variant of the existing VTB. It might make sense to have the original VTB not support fractional binding powers. Furthermore, the API should probably be changed to clarify what is a left or right identity (or both), though this will be a breaking change.
Actually, right now the implementation converts the matrix (obtained from the fractional matrix power) by multiplying it with the left-identity (which is wrong). Wouldn't it be possible to take the matrix and reshape it back into a vector?
How do we want to proceed with this PR? To me it seems that first a distinction of left- and right-identity/inverse etc. should be introduced, as well as a VTB variant with the transposed matrix. These would be separate PRs (that I could do). Then two discussion points remain on this PR:
Just letting you know I haven't forgotten about this (in fact I've been using this branch quite regularly), but I have run out of time to look at requests here in detail. I will respond to this when I can. Thanks.
Edit: Just FYI, there's a simple formula for the dot product similarity between SSPs using the standard/nondegenerate base vectors (https://arxiv.org/abs/2007.13462). Maybe there's a natural place to incorporate or mention this somewhere. As a special case of this formula, the dot product between unitary
Update from my side: I implemented changes to the API regarding left/right-side only special elements (PR #265). I'm now in the process of implementing a "transposed vector-derived transformation binding" (TVTB) algebra (essentially VTB, but with a transposed binding matrix). Doing so I noticed a few more interesting properties:
Once TVTB is done, I will look into getting binding powers/fractional binding into NengoSPA.
I gained an (imo) exciting new insight into why there are "degenerate vectors" in fractional binding: they are the equivalent of negative numbers when doing powers of real numbers! 🤯 With real numbers, a fractional power of a negative number leaves the reals (the square root of -1 is imaginary), and degenerate vectors behave analogously. One question I'm still pondering is whether there is an easy way to determine the "sign" of a vector. My hypothesis (needs to be tested/proven though) is that the sum of the vector components equals the sign.
The zero frequency of the DFT is always equal to the sum of the vector. That is, np.fft.fft(v)[0] == np.sum(v).
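Quick numeric check of that statement (an illustrative snippet, not from the PR): the DC coefficient of the DFT equals the sum of the vector, so multiplying the vector by -1 flips its "sign".

import numpy as np

v = np.random.RandomState(0).randn(8)
assert np.isclose(np.fft.fft(v)[0].real, v.sum())
assert np.isclose(np.fft.fft(-v)[0].real, -v.sum())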
Yes, it seems that in even dimensionalities one also has to look at the Nyquist coefficient. What I'm wondering about: Do we always have 4 distinct unitary SP regions for an even dimensionality (>=4), or does the number increase? 🤔 In the latter case, I suppose that increasingly more operations become necessary, but the procedure with those two operations seems to work for 64 dimensions (but it's no proof and I only tried a few instances). And what is the story for odd dimensionalities? It seems that the -1 multiply might suffice, suggesting that there are only two distinct SSP regions.
What sort of relative structure is preserved? If I apply
Luckily we always do! The number of regions corresponds to the binary choice of +1/-1 for the constant Fourier coefficient and +1/-1 for the Nyquist Fourier coefficient (the extra one that only shows up in even dimensions). So odd dimensions have exactly 2 possibilities, and even dimensions have exactly 4. These coefficients must all be positive for the unitary vector to be non-degenerate. Section 3.2 of my thesis goes into various properties/visualizations of these vectors.
By that I meant it ends up being just a simple translation of the space. I guess mirroring through the origin preserves the structure too, just as a rotation instead of a translation, but there could be other methods that don't preserve things. I can see an argument for either way of doing it; it depends on which properties we consider important. I would think the one on the left would preserve distance and direction between nearby converted points, while the one on the right would only preserve distance. Would have to double-check though.
I'm surprised by this. Are these both degenerate unitary vectors? Actually, now that I think about it, they would have to be unitary vectors from the same 'region' for their relative properties to be preserved, and in this case I think they would be exactly preserved.
I think I missed what the shift was, I thought it was only multiplying by -1, or by shift do you mean the special case for even dimensionality?
I don't think we can make it truly invertible in the even-dimension case, because there are three spaces that map to one (unless we remember which one it came from).
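Based on the DC/Nyquist description above, a hypothetical helper for identifying which region a vector lies in; the function name and return convention are made up for illustration:

import numpy as np

def ssp_region(v):
    """Signs of the DC and (for even d) Nyquist DFT coefficients."""
    coeffs = np.fft.rfft(v)
    signs = [np.sign(coeffs[0].real)]           # constant / DC coefficient
    if len(v) % 2 == 0:
        signs.append(np.sign(coeffs[-1].real))  # Nyquist coefficient
    return tuple(signs)  # all +1 corresponds to the non-degenerate region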
It's on my reading list.
There's a typo in the sentence. I meant to say 'some', not 'same' (I edited my post to correct this). Apart from that, in even dimensions you may need to shift the vector dimensions by one (or I guess any odd number) in addition to the -1 multiply.
Yes, you need to remember which operations you applied to be able to apply the inverse operations (or, if you know which distinct region you want to move to, you could derive it from that). But I think that's exactly the definition of an invertible function.
Another way to obtain a non-degenerate vector is to square a unitary vector.
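A quick numeric check of this for the HRR (circular convolution) case, where binding a vector with itself squares its Fourier coefficients, so the DC and Nyquist coefficients become non-negative (an illustrative snippet, not from the PR):

import numpy as np

rng = np.random.RandomState(42)
d = 16
v_fft = np.fft.rfft(rng.randn(d))
unitary = np.fft.irfft(v_fft / np.abs(v_fft), n=d)      # unit-magnitude coefficients
squared = np.fft.irfft(np.fft.rfft(unitary) ** 2, n=d)  # unitary bound with itself
coeffs = np.fft.rfft(squared)
print(coeffs[0].real >= 0, coeffs[-1].real >= 0)        # True True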
Thought a bit more about everything, took another look at the implementation of the
Anyways, it might have taken a while, but it was important for me to think this through and really understand it. Thanks for the patience. :)
I started implementing the notion of signs in #271. I will then continue to implement fractional binding powers, potentially cherry-picking from this PR. Though, this will be the "easy" part. Currently, I am still unclear on how to design the API for non-degenerate/"positive" vectors. Suggestions welcome!
While #271 isn't completely ready yet, the fundamentals of the fractional binding can already be reviewed there.
Superseded by #271.
Motivation and context:
Fractional binding supports encoding and decoding of continuous quantities, which is useful when representing things like spatial maps (Komer et al., 2019). This is proposed as a core nengo-spa change since it is natural to have the pow (**) operator implement fractional binding.
In addition, a dtype is added to the SemanticPointer class to enable experimentation with other datatypes, including complex semantic pointers. Feedback is welcome on whether or not this addition is useful or necessary. Likewise, fractional binding with the VTB algebra is experimental and the implementations may be subject to change.
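A hypothetical usage sketch of the proposed ** operator; the constructor arguments and the .unitary() call follow the existing SemanticPointer API as I understand it, and exact names may differ:

import numpy as np
from nengo_spa import SemanticPointer  # assumes the top-level export

d = 64
x = SemanticPointer(np.random.randn(d)).unitary()  # unitary base vector
location = (x ** 2.5).v  # fractional binding via the pow operator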
How has this been tested?
Certain properties for both HRR and VTB are tested, checking both where they hold and where they don't.
How long should this take to review?
Where should a reviewer start?
Can start by looking at the algebra files to see how the methods are implemented. Then the unit tests to see how properties are validated.
Types of changes:
Checklist:
Still to do:
Moved to #245:
Ensure that dtype is included everywhere (e.g., vocabulary?)
Unit test complex dtypes with all operations.