Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

som coordinates #6316

Closed
wants to merge 1 commit into from
Closed

som coordinates #6316

wants to merge 1 commit into from

Conversation

lucixhub
Copy link
Contributor

Issue
Description of changes

Added SOM network coordinates to output of SOM widget.

Includes
  • Code changes
  • Tests
  • Documentation

@codecov
Copy link

codecov bot commented Jan 25, 2023

Codecov Report

Merging #6316 (49c7b8b) into master (da5fd99) will increase coverage by 0.03%.
Report is 343 commits behind head on master.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6316      +/-   ##
==========================================
+ Coverage   87.44%   87.48%   +0.03%     
==========================================
  Files         316      316              
  Lines       67926    67971      +45     
==========================================
+ Hits        59399    59462      +63     
+ Misses       8527     8509      -18     

@@ -774,6 +774,7 @@ def _assign_instances(self, weights, ssum_weights):
for i, (x, y) in enumerate(assignments):
members[(x, y)].append(i)
members.pop(None, None)
self.coordinates = np.array([[x, y] for i, (x, y) in enumerate(assignments)], dtype=int)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What am I overlooking? Doesn't this just copy assignments, as with self.coordinates = assignments.copy()?
  2. If you do this here, it will record coordinates on each step of optimization. I think this is not what you want, because coordinates are only needed for output.
  3. self.coordinates stores the same information that is already stored in self.member_data and self.cells (combined). In my experience, it is better to have a single source of truth; if you store data in different places it may be too difficult to keep it synchronized and all too easy to forget to remove it. Case in point, method clear should include self.coordinates = None.

I suggest removing this line and reconstruct coordinates in update_output.

More so because reconstructing coordinates from self.member_data and self.cells in one sweep (without loops) could be a great exercise in numpy indexing, I think. :)

@@ -824,17 +825,20 @@ def update_output(self):
rows = self.get_member_indices(x, y)
indices[rows] = self.selection[x, y]

new_data = self.data.add_column(ContinuousVariable(name='som-x'), self.coordinates[:, 0], True)
new_data = new_data.add_column(ContinuousVariable(name='som-y'), self.coordinates[:, 1], True)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one way of doing it, but I think it's missing great opportunity.

The new variables could include compute_value that would compute the instance's cells based on its data. That is, given data, it would collect the same attributes as those used in this SOM projection and call SOM.winner_from_weights(self.cont_x, weights, ssum_weights, self.hexagonal). Using SharedComputeValue this would be efficient.

With this we gain the possibility of running SOM on some data and then assign cells and coordinates to another, independent data. This can be great for any kind of testing, ensuring the sensibility/stability of SOM. For instance, your modifications will allow seeing SOM in, say Scatter plot, so one will be able to observe whether new data clusters similarly as the one used for training.

However, at this point, the cell assignments for "training" data are known and stored (self.cells and self.member_data), hence there's no need to recompute them. So this code here can stay as it is, and compute_value can be added later.

If you're coming tomorrow, @markotoplak or I (but I'm not yet sure I'll come) can help you with that. It's quite easy, but requires knowing one specific mechanism in Orange. We can also help with some fancy numpy indexing to reconstruct coordinates. :)

@@ -824,17 +825,20 @@ def update_output(self):
rows = self.get_member_indices(x, y)
indices[rows] = self.selection[x, y]

new_data = self.data.add_column(ContinuousVariable(name='som-x'), self.coordinates[:, 0], True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A variable with this name may already exist in the data. You should use

name_x, name_y = get_unique_names(self.data.domain, ("som-x", "som-y"))

which will append numbers (e.g. som-x (3), som-y (3)) if needed.

@seain3d
Copy link

seain3d commented Aug 18, 2023

I can't get this to work... It is advantageous to export model SOM coordinates (called Best Matching Unit BMU and similar to the U-matrix eg. for t-SNE). It looks like OWSOM was made to address this problem. I am not a programmer. I tried importing the OWSOM.py script into Orange Canvas and then fed this widget with data. However, it does not seem to have any output when I look at the results with a data table widget. Ideally, the export data would be the same as the input data plus columns for SOM-X and SOM-Y coordinates. Maybe I am just misunderstanding how to use OWSOM? Help gratefully received. As an aside it would be great to have a advanced usage tutorial on two-step clustering using SOM i.e having a large SOM (Emergent) which is then clustered (i.e DBScan or k-means)?

@janezd janezd reopened this Aug 18, 2023
@janezd janezd closed this Aug 18, 2023
@janezd
Copy link
Contributor

janezd commented Aug 18, 2023

I am not a programmer.

Orange's code is huge and any programmer would need a lot of time to get used to it. What you've done is pretty good!

It should be done differently, though. I opened an issue (#6540) as a reminder and somebody (probably me) will do it soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants