-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Input argument dimensions and calculating DiffusionEMD between vectors in a feature matrix #5
Comments
Based on some simple tests, I think this is the general workflow I'm going for, but still not sure exactly how to handle labels: # Modules
import DiffusionEMD
from DiffusionEMD import DiffusionCheb
import numpy as np
from scipy.spatial import distance_matrix
#Setup
npts = 10
data = np.random.randn(npts, 3)
adj = graphtools.Graph(data, use_pygsp=True).W
labels = np.ones([npts,1]) # ???
#DiffusionEMD
D = DiffusionCheb()
embeddings = D.fit_transform(adj, labels)
dm = distance_matrix(embeddings, embeddings, p=1) At the very least, I'm getting the output size that I would expect. It does make me wonder about how labeling points as part of a single vs. several distributions affects the computation. Is there somewhere in the paper that addresses this? I haven't dug into the details too much yet. |
If you are trying to calculate the EMD between diracs (treating each point as a separate distribution), then you would want In general I'm surprised this example is definitely bugged, I would expect embeddings to be To me I would interpret Feel free to email me at |
Curious if you figured out where the np broadcasting issue might be (if that's the case). The shapes were a bit confusing to me as well. Thanks for the suggestions! To make sure we're on the same page (and if you have time), do you know how these parameters would correlate with |
TL;DR how to compute the earth mover's distance between arbitrary vectors using DiffusionEMD?
I'm going through the two notebooks (Line and Swiss) as well as testing on my own data. Part of this relates to #4 and defining a function with similar input/output to e.g. sklearn distance metrics.
Let's say you have 10 points embedded in 3 dimensions. We'll call this
data
, and say that it has 10 rows and 3 columns.data = np.random.randn(10, 3)
Based on the Jupyter notebooks, it seems like the first argument of
DiffusionCheb()
andDiffusionTree()
are the adjacency matrices (I couldn't find any documentation on this otherwise).The size of the adjacency matrix is the same as the number of vertices on a graph, so I believe this should be done by e.g.
adj = graphtools.Graph(data, use_pygsp=True).W
which produces a
10 x 10
matrix.The Line and Swiss examples show
ds.labels
as the 2nd argument to DiffusionCheb(). For Line, I notice thatself.labels = np.eye(N)
, whereas for the Swiss Rollself.labels = np.repeat(np.eye(n_distributions), n_points_per_distribution, axis=0)
. For the Line example, if there are 100 points, thenlabels
is100 x 100
, which is just an identity matrix.Assuming 5 points per distribution, this looks like:
So as the name implies, it's labeling each point by which distribution it came from. Assuming there are 50 distributions, naturally there will be 50 columns.
For the
10 x 3
data
I mentioned, I think I have the adjacency matrix correct, but what about the label matrix? Assuming it's unlabeled data, is it justnp.eye(data.shape[0])
? Or is it something else, such asnp.ones([data.shape[0],1])
? If I'm understanding correctly, these two cases would suggest that each point comes from a separate distribution and that each point comes from the same distribution, respectively. What I'm hoping to calculate is the earth mover's distance betweendata[0]
anddata[1]
, etc.Any help would be appreciated. Thanks!
The text was updated successfully, but these errors were encountered: