
[ENH] 4+ Angle Polarisation widget #615

Closed
wants to merge 51 commits

Conversation

callumgassner
Contributor

Introduces a widget ("4+ Angle Polarisation") that calculates molecular orientation information from polarised vibrational spectroscopy data collected at 4 or more polarisation angles.
Adds features to HyperSpectra that complement the results of the 4+ Angle Polarisation widget:

  • Added ability to overlay a vector field on hyperspectral images
  • Added "Jet", "Viridis" and "HSV" matplotlib colour maps

@markotoplak
Collaborator

@callumgassner, thanks a lot.

I did not check it yet. :)

Could you perhaps describe how we could most easily test the functionality of this widget manually: which data files can we use, how should we set up the widget, and what should we see? The datasets you added to the PR could work, of course.

Regarding the data for automated tests and demos: could you make it smaller? It currently takes 4 MB for this feature alone, which would double the distribution size. Could it be reduced by taking a subset of wavenumbers? If I understood the method correctly, wavenumbers are processed independently, so this should not influence the results in any way.

Also, please store the data in some format other than pickle, preferably in Orange's tab format. The pickle format is the least stable (in terms of how often it changes).

@callumgassner
Contributor Author

The 4+ Angle Polarisation (polar) widget can be manually tested with the datasets in the PR (the 4 files named 4-angle-ftir_multiin1.tab, ... 4.tab).
As outlined in the documentation, these datasets can be loaded into Quasar and passed to the polar widget one at a time with the "File" widget, or all at once with the "Multifile" widget.
When using the former, connect the File widget that loaded 4-angle-ftir_multiin1.tab to the polar widget first, then 4-angle-ftir_multiin2.tab, and so on. As you connect the File widgets, a list is constructed in the "Inputs" section of the polar widget's GUI showing each connected widget's name, with a box below it for entering the polarisation angle at which that data was collected. The polarisation angles are filled in automatically with a spacing of 180 degrees divided by the number of inputs.
Alternatively, when the Multifile widget is used to load the data and is connected to the polar widget, the drop-down menu in the Inputs section of the polar widget becomes enabled. This allows the data to be sorted into the individual polarisation angles by any categorical variable - in this instance the Filename variable. As with the previous method, once the Filename variable is selected from the menu a list is created, this time with each category as a separate polarisation angle.

To test the resulting data from this widget, select the features 1214.0 - 1258.0 and 1500.0 - 1580.0 in the Features column and set the X and Y axes to map_x and map_y, respectively. The alpha value should be 0 and the "invert angles" checkbox should be selected. Click Apply and connect the "Polar Data" output of the polar widget to a HyperSpectra widget.
Within the HyperSpectra widget, use the "viridis" colourmap and plot either "Intensity" feature. To plot the vector overlay, select the "plot vector overlay" checkbox, use the Azimuth Angle as the vector angle and the Amplitude as the vector magnitude - the scale slider can be used to shorten or lengthen the vectors if needed. Results should be almost identical to the image below:

[image: expected HyperSpectra result with vector overlay]

@markotoplak I have added a new dataset for this widget with your concerns (hopefully) addressed.

@markotoplak markotoplak changed the title 4+ Angle Polarisation widget [ENH] 4+ Angle Polarisation widget Jul 25, 2022
@markotoplak
Collaborator

Hm, for me the tests work locally, so I do not quite understand why they complain about a missing file here.

@markotoplak markotoplak mentioned this pull request Jul 25, 2022
@markotoplak
Collaborator

I have finally been looking into this PR. Works well, thanks! This is how I used the widget.

[image: screenshot of the workflow used to test the widget]

I like how flexible it is in terms of inputs.

It would probably be better if the "Filename" column coming out of the Multifile widget were made discrete, right? That would save me one widget when using Multifile. I think we could easily change this later; I cannot think of anything else that would break because of the change.

This widget outputs all the original data plus the new fitted polarization results. Therefore, the hyperspectral image itself is output 4 times, each time slightly different for a different angle. In terms of how informative the output is, this is perfect.

Do you think an option to average the intensities in the new widget would help users? That would not influence the resulting angles and intensities in any way, but it might give a friendlier output. Right now, even I do not know exactly what HyperSpectra shows when given 4 variants of an image (I know I could look it up, but normal users would not :) ). I could control HyperSpectra by using Select Rows with a specific angle criterion.

@markotoplak
Collaborator

Some other bugs/ideas:

  • If I save the previous workflow and open it again, both 4+pol widgets lose some settings.
  • If I reload the above workflow, the multi-input variant still shows "Filename" as the correct separator, but for the widget to actually work I need to reselect "Filename" so that the angles are shown.
  • The separate-file variant only loses the chosen features.
  • It would be cool if HyperSpectra's vector angle and vector azimuth had a "None" option which would just mean something constant. For some other kinds of data it might be useful to show just the angle; it gives you a cool extra dimension on the graph.

Can you verify whether the settings issues also happen for you? Were the things I noted as settings bugs perhaps intentional? I do understand that Orange settings are difficult to get right (I, at least, always struggle), so I can try fixing them myself. Did you also notice any other issues regarding settings that I missed?

I attach the workflow I played with. The data I used were your 4 sample files. untitled.ows.zip

-------
MultiFile Input

![](images/Polar-Example1.png)
Collaborator

Please check that your image references use the exact filenames: upper- and lowercase letters matter on some platforms (Linux). Two of the committed images have an uppercase ".PNG" extension, so these references need to be fixed.

@markotoplak
Collaborator

Also, before making any new changes it would be best to get rid of the merge commits in this branch. Updating your master branch to the latest version and then doing git rebase master in this branch worked for me.

@markotoplak
Collaborator

@callumgassner, I fixed the "Features" setting saving in my "Test" PR. The other one, single input initialization remains. I'll fix that next.

All Orange widgets are meant to be canvas-independent; as long as they get their inputs, that is fine. But this widget also asks the schema who is sending data onto its input. I understand why this is the case: it lets you label the angles of the inputs so that they do not get mixed up.

This is a no-go, as it breaks assumptions about what widgets are allowed to do (and might also hide bugs - what if the order is different and the labels are thus invalid?). Instead, we could:

  • Use data.name and name the inputs accordingly. The widget name won't be shown anywhere.
  • What we could do instead, in the future, is think about how to extend signalling so that the sender's name is also passed onto the input. But that change needs to go into orange-widget-base or orange-canvas-core.

@codecov-commenter

Codecov Report

Merging #615 (4cd35b6) into master (3f8a0ea) will decrease coverage by 0.59%.
The diff coverage is 77.68%.

@@            Coverage Diff             @@
##           master     #615      +/-   ##
==========================================
- Coverage   89.06%   88.46%   -0.60%     
==========================================
  Files          71       72       +1     
  Lines       11767    12483     +716     
==========================================
+ Hits        10480    11043     +563     
- Misses       1287     1440     +153     

@callumgassner
Contributor Author

@markotoplak, thank you for having a look at this PR.

First off, your fix for the Features setting seems to work well - I have made these changes in the latest commit. I worked today on getting the single-input initialization to save to settings. Changing the "angles" variable to a ContextSetting and adding a call to _change_angles() in line 720 appears to almost fix this; however, if one changes "Filename" from a discrete variable to a text variable, when a new context is opened it tries to use this text variable (which is not in the combobox's model) and so throws an error. I could not rectify this today but will give it a shot late next week.
I had a feeling that using the widget scheme to label the inputs in the widget wouldn't be acceptable. I wish I had realised that the input tables had a .name attribute I could use - this is what I was trying to do from the start but had no idea. Sadly I spent a lot of time on this, such a simple solution!! I have removed the widget labelling of the inputs and am now using the data.name method you suggested. Note that the test for this widget is based on using a mock scheme, so the tests will probably now fail; I will rewrite them.

And to answer your questions in previous posts:
Having Filename as a discrete variable would be preferred; I usually change this manually in the Multifile widget in any work I do with Quasar.
I agree that having the widget's output essentially repeated 4 times feels a bit odd. The reason I do this is to preserve the spectra from the input datasets. Personally, I think being able to observe all spectra on the output is important to understand the data properly, but some users might prefer to have the output spectra averaged.
Adding a None option to the vector plots should be straightforward - what angle should the azimuth default to, though? Probably horizontal (0 deg) or vertical (90 deg).
I can confirm that the issues you had with the settings happen for me as well. This was something I encountered early on, put into the too-hard pile, and evidently forgot about (especially the Features column - thank you again for fixing this!!!).

I tried merging the polar branch in my fork into my master branch using git rebase master, but I'm pretty sure I am missing something as it does not appear to have made any difference to this PR.

@markotoplak
Collaborator

> I worked today on getting the single-input initialization to save to settings. Changing the "angles" variable to a ContextSetting and adding a call to _change_angles() in line 720 appears to almost fix this; however, if one changes "Filename" from a discrete variable to a text variable, when a new context is opened it tries to use this text variable (which is not in the combobox's model) and so throws an error. I could not rectify this today but will give it a shot late next week.

I'll look into this. Context should (theoretically) handle the different types if used properly. I do not know, perhaps they have parameters?

> I had a feeling that using the widget scheme to label the inputs in the widget wouldn't be acceptable. I wish I had realised that the input tables had a .name attribute I could use - this is what I was trying to do from the start but had no idea. Sadly I spent a lot of time on this, such a simple solution!! I have removed the widget labelling of the inputs and am now using the data.name method you suggested. Note that the test for this widget is based on using a mock scheme, so the tests will probably now fail; I will rewrite them.

I saw how much care went into testing this... Sorry for not telling you about this earlier. :) Having widget names on the input in some easy, consistent way would indeed be useful in this case; much more so than that .name.

> Having Filename as a discrete variable would be preferred; I usually change this manually in the Multifile widget in any work I do with Quasar.

Yea, let's change it.

> I agree that having the widget's output essentially repeated 4 times feels a bit odd. The reason I do this is to preserve the spectra from the input datasets. Personally, I think being able to observe all spectra on the output is important to understand the data properly, but some users might prefer to have the output spectra averaged.

Introducing the average option or not is your call. I agree it is imperative to have the option to see the data. The thing is that what HyperSpectra shows is now undefined. I think it shows the last data point for that pixel, so it might even be misleading.

> Adding a None option to the vector plots should be straightforward - what angle should the azimuth default to, though? Probably horizontal (0 deg) or vertical (90 deg).

No idea. It could even draw a different shape, perhaps a circle? Anyway, this is optional, I was just thinking about reusing the same interface for other types of data. Perhaps @borondics has an idea?

> I tried merging the polar branch in my fork into my master branch using git rebase master, but I'm pretty sure I am missing something as it does not appear to have made any difference to this PR.

Hm, did you then push? Anyway, do not worry about it.

@markotoplak
Collaborator

Regarding your last commit: you misapplied my suggestion, so saving features does not work (yet): feats should become a context setting. The easiest way to avoid such errors in the future is to use git cherry-pick. If you fetched my test branch onto your computer, you could then just run git cherry-pick my-commit-hash on your branch.

In 99% of cases, any setting that refers to data variables needs to be a context setting. Context settings ensure that saved values are never loaded if the needed features do not exist in the data.

To explain my other changes: settings that should never become a general default are best saved as schema_only.

Then, why are you using some settings with packable=False? I do not understand why these should be settings at all.
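
To illustrate the distinction, here is a minimal sketch assuming Orange's standard settings API; the class and attribute names are made up for the example and are not the actual owpolar code:

```python
from Orange.data import Table
from Orange.widgets import widget
from Orange.widgets.settings import Setting, ContextSetting, DomainContextHandler


class OWSettingsSketch(widget.OWWidget):
    name = "Settings sketch"

    settingsHandler = DomainContextHandler()

    # Plain setting: saved as a general default for all workflows.
    alpha = Setting(0.0)
    # schema_only setting: stored only inside the saved workflow file,
    # never written back as a global default.
    invert_angles = Setting(False, schema_only=True)
    # Context setting: tied to the data domain, so it is only restored
    # when the matching variables actually exist in the incoming data.
    feats = ContextSetting([])

    class Inputs:
        data = widget.Input("Data", Table)

    @Inputs.data
    def set_data(self, data):
        self.closeContext()
        self.data = data
        if data is not None:
            self.openContext(data)
```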

@markotoplak
Collaborator

Also, SharedMemory blocks should not be given explicit names when they are created. As it is now, we can get conflicts when multiple widgets are running or when something did not exit cleanly. The construction of memory blocks should leave the name empty: this guarantees that each block receives a unique ID. These IDs then have to be passed to the child processes.

Now I'll have to restart my computer because some lingering process is still using the memory and I can not test the widget. :)
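
For reference, here is a minimal sketch of the unnamed-SharedMemory pattern described above, using only the standard library; the array contents and the worker are illustrative placeholders, not the widget's actual computation:

```python
import numpy as np
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory


def worker(shm_name, shape, dtype):
    # The child process attaches to the block by its auto-generated name.
    shm = SharedMemory(name=shm_name)
    try:
        arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        arr *= 2  # stand-in for the per-pixel fitting work
    finally:
        shm.close()


if __name__ == "__main__":
    data = np.arange(8, dtype=np.float64)
    # create=True with no explicit name: the OS assigns a unique name.
    shm = SharedMemory(create=True, size=data.nbytes)
    try:
        np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)[:] = data
        p = Process(target=worker, args=(shm.name, data.shape, data.dtype))
        p.start()
        p.join()
    finally:
        shm.close()
        shm.unlink()  # release the block even if something above failed
```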

@markotoplak
Collaborator

I fixed the last mentioned (naming SharedMemory) issue in my test branch because it was seriously bothering my testing. Please cherry-pick my commit onto this branch (do not copy-paste code, please). If you'd like to, I could edit this branch directly, but then you'll have to be careful with synchronization.

The issue partly still remains. If something crashes within the computation, the allocated memory is not always unlinked (as was seen when the error was bothering me). Therefore we now have a known memory leak.

@markotoplak
Collaborator

I also tried fitting by passing the Jacobian, which avoids some approximation within the curve_fit algorithm, so it should be more stable. Fortunately, the azimuth model was simple enough that even I could compute its derivatives.

It is very slightly faster. It does do fewer iterations, though... See the commit in the test PR. I'd suggest we use it because it should theoretically be more stable.
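
For readers unfamiliar with the option, here is a small sketch of passing an analytic Jacobian to scipy.optimize.curve_fit; the cosine-squared model is only an illustrative stand-in, not the widget's actual azimuth function:

```python
import numpy as np
from scipy.optimize import curve_fit


def model(theta, a0, a1, az):
    # intensity vs. polarisation angle (radians)
    return a0 + a1 * np.cos(theta - az) ** 2


def model_jac(theta, a0, a1, az):
    # Partial derivatives w.r.t. a0, a1 and az, stacked column-wise
    # in the (n_points, n_params) shape that curve_fit expects.
    c = np.cos(theta - az)
    return np.column_stack([
        np.ones_like(theta),
        c ** 2,
        2 * a1 * c * np.sin(theta - az),
    ])


angles = np.radians([0, 45, 90, 135])
intensities = model(angles, 1.0, 2.0, np.radians(30)) + 0.01
params, _ = curve_fit(model, angles, intensities, p0=[1.0, 1.0, 0.0], jac=model_jac)
```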

@callumgassner
Contributor Author

> Regarding your last commit: you misapplied my suggestion, so saving features does not work (yet): feats should become a context setting. The easiest way to avoid such errors in the future is to use git cherry-pick. If you fetched my test branch onto your computer, you could then just run git cherry-pick my-commit-hash on your branch.

Whoops, sorry about that! I had been copy-pasting code, as you figured out, and must have missed a line. I'm pretty sure I have git set up properly now, so I will use cherry-pick from now on.

> In 99% of cases, any setting that refers to data variables needs to be a context setting. Context settings ensure that saved values are never loaded if the needed features do not exist in the data.
>
> To explain my other changes: settings that should never become a general default are best saved as schema_only.
>
> Then, why are you using some settings with packable=False? I do not understand why these should be settings at all.

Good point - I reviewed these settings and I am not sure why I made them this way. I have now made them normal lists. I have manually tested the widget, and making these variables plain lists does not appear to break anything. As far as I can tell, I made them settings with packable=False because LineEditWFocusOut and QLabel are not picklable - why they were settings at all, though, I cannot say.

> Also, SharedMemory blocks should not be given explicit names when they are created. As it is now, we can get conflicts when multiple widgets are running or when something did not exit cleanly. The construction of memory blocks should leave the name empty: this guarantees that each block receives a unique ID. These IDs then have to be passed to the child processes.
>
> Now I'll have to restart my computer because some lingering process is still using the memory and I cannot test the widget. :)
>
> I fixed the last mentioned (naming SharedMemory) issue in my test branch because it was seriously bothering my testing. Please cherry-pick my commit onto this branch (do not copy-paste code, please). If you'd like to, I could edit this branch directly, but then you'll have to be careful with synchronization.
>
> The issue partly still remains. If something crashes within the computation, the allocated memory is not always unlinked (as was seen when the error was bothering me). Therefore we now have a known memory leak.

Thanks for fixing this, and sorry for the inconvenience. This was an initial concern of mine; however, no matter how hard I tried I could not get 2 widgets to write to the same memory - and so I had assumed the shared-memory array relied on the buffer rather than the name. Also, when any process had an error or crashed, the memory would be cleared when force-closing Quasar. This is absolutely an issue though, and I will try to come up with a solution; my initial thought is to use a manager to monitor the shared memory/processes. I will need to be able to recreate the issue first - any suggestions on how to recreate this? Or will it simply be a difference between Unix and Windows?

> I also tried fitting by passing the Jacobian, which avoids some approximation within the curve_fit algorithm, so it should be more stable. Fortunately, the azimuth model was simple enough that even I could compute its derivatives.
>
> It is very slightly faster. It does do fewer iterations, though... See the commit in the test PR. I'd suggest we use it because it should theoretically be more stable.

Happy to keep this in :) I am not very familiar with how this works, but it looks like it produces similar results, and if it is more stable that is definitely a plus.

@markotoplak
Collaborator

> I'm pretty sure I have git set up properly now, so I will use cherry-pick from now on.

Now it seems to work properly. Isn't it cool? :)

> Good point - I reviewed these settings and I am not sure why I made them this way. I have now made them normal lists. I have manually tested the widget, and making these variables plain lists does not appear to break anything.

We only need something to be a Setting if saving/loading it makes sense.

> Thanks for fixing this, and sorry for the inconvenience. This was an initial concern of mine; however, no matter how hard I tried I could not get 2 widgets to write to the same memory - and so I had assumed the shared-memory array relied on the buffer rather than the name. Also, when any process had an error or crashed, the memory would be cleared when force-closing Quasar. This is absolutely an issue though, and I will try to come up with a solution; my initial thought is to use a manager to monitor the shared memory/processes. I will need to be able to recreate the issue first - any suggestions on how to recreate this? Or will it simply be a difference between Unix and Windows?

Hm, yes, these things use system internals and there could easily be a difference between Windows and Linux. I think the only sure way to guarantee that some code runs is to put it into a finally block. I'll play with this.

> Happy to keep this in :) I am not very familiar with how this works, but it looks like it produces similar results, and if it is more stable that is definitely a plus.

Most optimization techniques compute approximate derivatives so that they know where to move. But if we can compute them exactly, it makes their lives easier.

@markotoplak
Collaborator

In my test PR I changed shared memory initialization to something that should be a bit more robust. Could you perhaps now also update tests, please?

@@ -838,11 +838,16 @@ def commit(self):
else:
sorted_data = self.sorted_data

self.setInvalidated(True)
Collaborator

This is not the way to do it.

Just calling self.cancel() in the commit function (the one that calls start) would be sufficient. Please try that instead.
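
A minimal sketch of that pattern, assuming the widget uses Orange's ConcurrentWidgetMixin; the class, the run_fit function and the attribute names are illustrative placeholders rather than the actual owpolar code:

```python
from Orange.widgets.utils.concurrent import ConcurrentWidgetMixin, TaskState
from Orange.widgets.widget import OWWidget


def run_fit(sorted_data, angles, state: TaskState):
    # Everything the task needs is passed in; the thread never reads
    # widget attributes that the GUI could change underneath it.
    if state.is_interruption_requested():
        return None
    ...  # the curve fitting would happen here
    return sorted_data


class OWPolarSketch(OWWidget, ConcurrentWidgetMixin):
    name = "Polar sketch"

    def __init__(self):
        OWWidget.__init__(self)
        ConcurrentWidgetMixin.__init__(self)
        self.sorted_data = None
        self.angles = []

    def commit(self):
        self.cancel()  # stop any task started by an earlier commit
        if self.sorted_data is not None:
            self.start(run_fit, self.sorted_data, self.angles)

    def on_done(self, result):
        ...  # push the fitted table to the widget's outputs

    def onDeleteWidget(self):
        self.shutdown()
        super().onDeleteWidget()
```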

Contributor Author

Hi Marko,

I originally tried

    def cancel(self):
        super().__cancel_task(wait=True)

in the hope that it would wait for the tasks to complete (I defined the function beneath onDeleteWidget). The issue I was having with that is that if the inputs were changed while the code was running in the SharedMemoryManager block, the code would continue but with empty data, and so the curve-fit procedure would try to fit an empty array.

I would have thought that simply calling self.cancel() in commit would always try to stop the process?

Collaborator

@markotoplak Aug 24, 2022

If you reimplement cancel, nothing happens unless you actually call it somewhere. And this is likely not needed.

Adding cancel() to commit should ensure that the running task is stopped before a new one starts, if that was the only issue.

Theoretically you could also call cancel in the input-handling functions, but since you seem to pass all needed data into start_compute and do everything within it (which is the correct way to do it), the actual objects are unchanged and that should not be needed. If, on the other hand, we were passing lists that are only edited in place when a new input arrives and never reinitialized, the particular list within the computation (the same object) would be edited under the computation's feet, and then we would have problems. That is what we would actually have to avoid.

So not adding cancel() to the input handlers should not cause crashes if the rest is written well.

For this kind of issue I'd advise first writing a test case in test_owpolar that showcases the error, and then fixing the code so that it disappears. That would also allow me to easily replicate your issue and debug it. And it is good to keep around situations that we know were problematic, in case we edit the code at some later time.
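
A minimal sketch of such a regression test, assuming the widget class is OWPolar in orangecontrib.spectroscopy.widgets.owpolar with a single data input; the input name, the placeholder dataset and the direct commit() call would need adapting to the real widget:

```python
from Orange.data import Table
from Orange.widgets.tests.base import WidgetTest
from orangecontrib.spectroscopy.widgets.owpolar import OWPolar


class TestInputChangeDuringComputation(WidgetTest):
    def setUp(self):
        self.widget = self.create_widget(OWPolar)
        self.data = Table("iris")  # placeholder dataset, not real polarisation data

    def test_disconnect_while_running(self):
        self.send_signal(self.widget.Inputs.data, self.data)
        self.widget.commit()                             # start the computation
        self.send_signal(self.widget.Inputs.data, None)  # change the input mid-run
        self.wait_until_finished()                       # must finish without errors
```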

Contributor Author

I gave this another shot today. Unfortunately calling cancel() doesn't resolve the issue :(

I did some testing and what I found was:

  • If the input is disconnected before the computation processes have started, the code appears to continue until the with block is finished, but a ValueError (operands could not be broadcast together with shapes (0,) (4,)) is eventually thrown, with the traceback originating from line 240: params = curve_fit(azimuth, x, temp, jac=azimuth_jac)[0]. So it looks like the data within the shared-memory arrays is being erased... however, the allocated memory is not being released, as seen by steadily increasing memory usage over repeated tries.
  • If the input is disconnected after the computation processes have started, no error is given, but the allocated memory is still not released.
  • In both of the above cases, no code is executed after the with block.
  • The SharedMemoryManager is created within a new child process, so it seems logical to me that the code within the with block continues to execute.

From this, it would seem to me that this is what is occurring, but I am not entirely sure how...

> If, on the other hand, we were passing lists that are only edited in place when a new input arrives and never reinitialized, the particular list within the computation (the same object) would be edited under the computation's feet, and then we would have problems. That is what we would actually have to avoid.

I have added a "kind-of" test for this in test_owpolar. It does not make the test fail, but it does show the ValueError mentioned above. As the error occurs in a child process, it is not easy to catch it in a way that makes the test fail.

I will continue trying to find another workaround for this, but any suggestions would be greatly appreciated!

@markotoplak
Collaborator

@callumgassner, I have merged my "test" branch with your latest commits and I put them here. I think everything should be here. I rewrote history a bit so that the starting commits are better organized. I also made sure that the first version of test data is nowhere in the commits.

We seem to work in compatible (non-overlapping) timezones, so working on the same branch might be easier. :) The easiest way to check it out is to delete (or rename) the branch on your computer and then check it out anew. And no panic, there is a backup of the old branch in https://github.com/markotoplak/orange-spectroscopy-fork/tree/callum-polar-20220825.

I refactored the owhyper additions a bit. I tried to make the code simpler, because that widget is already complex and I try to keep it as simple as possible. Could you double-check my commits, please, so that we are sure I did not remove any important functionality or introduce new bugs?

Regarding the bug with context application: I believe introducing something analogous to the "owhyper: properly initialize vector feature combos" commit might help, so that no other additions are needed. That approach ensures valid starting values. I have not checked proper cancelling yet, but I do get the warnings in the new test.

@markotoplak
Collaborator

I tried solving some new signal issues. The test does not warn anymore, so it is probably a step in the right direction. The issue I saw in the code was that these lists would be edited in place when new data was connected, and the computation thread used a reference to them, so it would get confused when they changed. Copying them solves some issues, but there might be outstanding ones.

In particular this does nothing for memory leaks. Or perhaps, if they were caused by crashes, it might...

markotoplak and others added 23 commits September 1, 2022 16:23
This is just a column selection and is fast, so there is no need to
perform it in a separate thread. Also, if users now select different
vector features, the image data is not recomputed.
Potentially fixes an issue which persisted where contexts would be loaded when variables did not exist in the domain
Prevents updating data in the widget while tasks are being run.
sharedmemory blocks were previously being erased, leaving processes with no data
NB: Test does not fail, but ValueError does appear in output
Some lists are edited in place, therefore copy them before sending them
into the computation thread.
@stuart-cls
Member

I finally had a chance to try this out in the last few weeks with some data. Very nice!

I have a few general comments:

  • I agree with @markotoplak that the multi-input handling stuff is very slick!
  • I don't love the vector plotting GUI being always visible; it's a lot of controls for a niche application. Could we put the vector plot checkbox in the first plot control box and then show/hide the vector controls when it's enabled?
  • Do we have to add the Jet colormap? So far we have avoided the Bruker rainbow :D I'm also not quite sure how it relates to this PR; maybe we could break it out another time.
  • Regarding the average output checkbox, I think we should instead have a warning in HyperSpectra when there are ambiguous map data (overlapping points) and point users to the "SNR" widget
  • Alpha value: Useful but seems unlikely that it would be the same for every vibrational mode you choose in "Features"
  • Visible image stacking is excellent, let's steal this for the Multi-file widget?

Some bugs:

  1. Vector plot crashes (see the note after this list):
Traceback (most recent call last):
  File "C:\Users\reads\Miniconda3\envs\orange-spectroscopy\lib\site-packages\orangewidget\gui.py", line 2258, in __call__
    self.func(**kwds)
  File "\\canopus\staff\reads\src\orange-spectroscopy\orangecontrib\spectroscopy\widgets\owhyper.py", line 1266, in update_vector_scale
    self.imageplot.update_vectors()
  File "\\canopus\staff\reads\src\orange-spectroscopy\orangecontrib\spectroscopy\widgets\owhyper.py", line 900, in update_vectors
    dispx = amp*wx/2*np.cos(np.radians(th))
TypeError: loop of ufunc does not support argument 0 of type float which has no callable radians method
  2. The map variables always choose "map_x"/"map_x"
  3. The "Select Feature" warning for selecting angles is too vague; it's not clear what needs to be done
  4. Error/warning setting/clearing is inconsistent in general: not always properly set or cleared
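
A note on the crash in item 1: that ufunc TypeError usually means np.radians was applied to an object-dtype array, so numpy looks for a .radians() method on each element instead of using its numeric loop. A tiny illustration (not the widget's code) of the failure and the cast that avoids it:

```python
import numpy as np

th = np.array([0.0, 45.0, 90.0], dtype=object)  # object dtype, e.g. from a mixed column
# np.radians(th)  # TypeError: loop of ufunc does not support argument 0 of type float ...
th = np.asarray(th, dtype=float)                # cast to a numeric dtype first
disp = np.cos(np.radians(th))                   # now works
```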

Sorry, I ran out of time to track these down further, but I wanted to capture my notes before the end of the year. I can help in the new year if needed.

In general, I think this is better in than out so let's get it merged! We can always fix stuff later as @markotoplak says :D

🎄 Stuart

@markotoplak
Collaborator

Closed in favor of #765
