-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
som coordinates #6316
som coordinates #6316
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #6316 +/- ##
==========================================
+ Coverage 87.44% 87.48% +0.03%
==========================================
Files 316 316
Lines 67926 67971 +45
==========================================
+ Hits 59399 59462 +63
+ Misses 8527 8509 -18 |
@@ -774,6 +774,7 @@ def _assign_instances(self, weights, ssum_weights): | |||
for i, (x, y) in enumerate(assignments): | |||
members[(x, y)].append(i) | |||
members.pop(None, None) | |||
self.coordinates = np.array([[x, y] for i, (x, y) in enumerate(assignments)], dtype=int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- What am I overlooking? Doesn't this just copy assignments, as with
self.coordinates = assignments.copy()
? - If you do this here, it will record coordinates on each step of optimization. I think this is not what you want, because coordinates are only needed for output.
self.coordinates
stores the same information that is already stored inself.member_data
andself.cells
(combined). In my experience, it is better to have a single source of truth; if you store data in different places it may be too difficult to keep it synchronized and all too easy to forget to remove it. Case in point, methodclear
should includeself.coordinates = None
.
I suggest removing this line and reconstruct coordinates in update_output
.
More so because reconstructing coordinates from self.member_data
and self.cells
in one sweep (without loops) could be a great exercise in numpy
indexing, I think. :)
@@ -824,17 +825,20 @@ def update_output(self): | |||
rows = self.get_member_indices(x, y) | |||
indices[rows] = self.selection[x, y] | |||
|
|||
new_data = self.data.add_column(ContinuousVariable(name='som-x'), self.coordinates[:, 0], True) | |||
new_data = new_data.add_column(ContinuousVariable(name='som-y'), self.coordinates[:, 1], True) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one way of doing it, but I think it's missing great opportunity.
The new variables could include compute_value
that would compute the instance's cells based on its data. That is, given data, it would collect the same attributes as those used in this SOM projection and call SOM.winner_from_weights(self.cont_x, weights, ssum_weights, self.hexagonal)
. Using SharedComputeValue
this would be efficient.
With this we gain the possibility of running SOM on some data and then assign cells and coordinates to another, independent data. This can be great for any kind of testing, ensuring the sensibility/stability of SOM. For instance, your modifications will allow seeing SOM in, say Scatter plot, so one will be able to observe whether new data clusters similarly as the one used for training.
However, at this point, the cell assignments for "training" data are known and stored (self.cells
and self.member_data
), hence there's no need to recompute them. So this code here can stay as it is, and compute_value
can be added later.
If you're coming tomorrow, @markotoplak or I (but I'm not yet sure I'll come) can help you with that. It's quite easy, but requires knowing one specific mechanism in Orange. We can also help with some fancy numpy indexing to reconstruct coordinates. :)
@@ -824,17 +825,20 @@ def update_output(self): | |||
rows = self.get_member_indices(x, y) | |||
indices[rows] = self.selection[x, y] | |||
|
|||
new_data = self.data.add_column(ContinuousVariable(name='som-x'), self.coordinates[:, 0], True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A variable with this name may already exist in the data. You should use
name_x, name_y = get_unique_names(self.data.domain, ("som-x", "som-y"))
which will append numbers (e.g. som-x (3), som-y (3)) if needed.
I can't get this to work... It is advantageous to export model SOM coordinates (called Best Matching Unit BMU and similar to the U-matrix eg. for t-SNE). It looks like OWSOM was made to address this problem. I am not a programmer. I tried importing the OWSOM.py script into Orange Canvas and then fed this widget with data. However, it does not seem to have any output when I look at the results with a data table widget. Ideally, the export data would be the same as the input data plus columns for SOM-X and SOM-Y coordinates. Maybe I am just misunderstanding how to use OWSOM? Help gratefully received. As an aside it would be great to have a advanced usage tutorial on two-step clustering using SOM i.e having a large SOM (Emergent) which is then clustered (i.e DBScan or k-means)? |
Orange's code is huge and any programmer would need a lot of time to get used to it. What you've done is pretty good! It should be done differently, though. I opened an issue (#6540) as a reminder and somebody (probably me) will do it soon. |
Issue
Description of changes
Added SOM network coordinates to output of SOM widget.
Includes