[ENH] 4+ Angle Polarisation widget #615
Conversation
@callumgassner, thanks a lot. I did not check it yet. :) Could you perhaps describe how we could most easily test the functionality of this widget manually: which data files we can use, how we should set the widget, and what we should see. The datasets you added to the PR could work, of course. Regarding the data for automated tests and demos: could you make them smaller? Now they take 4 MB for this feature alone, which would double the distribution size. I believe they could always be made smaller by taking a subset of wavenumbers, could they not? If I understood the method, wavenumbers are processed independently, so this should not influence results in any way. Also, please store them in some format other than pickle, preferably in Orange's tab format. The pickle file format is the least stable (in terms of how often it changes).
The 4+ Angle Polarisation (polar) widget can be manually tested with the datasets in the PR (the 4 files named 4-angle-ftir_multiin1.tab, ..., 4.tab). To test the resulting data from this widget: in the features column select the features 1214.0 - 1258.0 and 1500.0 - 1580.0, and set the X and Y axes to map_x and map_y, respectively. The alpha value should be 0 and the invert angles checkbox should be selected. Click apply and connect the "Polar Data" output of the polar widget to a hyperspectral widget. @markotoplak I have added a new dataset for this widget with your concerns (hopefully) addressed.
Hm, for me the tests work locally, so I do not quite understand why they complain about a missing file here.
I have finally been looking into this PR. Works well, thanks! This is how I used the widget. I like how flexible it is in terms of inputs. It would probably be better if the "Filename" column coming out of the Multifile widget were made Discrete in the future, right? That would save me one widget when used with Multifile. I think we could easily change this later; I cannot think of anything else that would break because of the change. This widget outputs all original data plus the new fitted polarization columns. Therefore, the hyperspectral image itself is output 4 times, each time slightly different, once per angle. In terms of how informative the output is, this is perfect. Do you think an option to have intensities averaged in the new widget would help users? That would not influence the resulting angles and intensities in any way, but it might give a friendlier output. Right now even I do not know exactly what HyperSpectra shows when given 4 variants of an image (I know I could look it up, but normal users would not :) ). I could control HyperSpectra by using Select Rows with a specific angle criterion.
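The averaging suggestion can be sketched Orange-independently with plain NumPy: if each pixel appears once per polarisation angle, a per-pixel mean collapses the 4 variants into a single image. The `pixel_id`/`intensity` arrays below are made-up illustration data, not from the PR.

```python
import numpy as np

# Hypothetical illustration data: rows = pixels, repeated once per angle.
pixel_id = np.array([0, 0, 0, 0, 1, 1, 1, 1])       # 2 pixels x 4 angles
intensity = np.array([1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0])

# Average the 4 angle variants per pixel so a viewer gets one image.
sums = np.bincount(pixel_id, weights=intensity)
counts = np.bincount(pixel_id)
averaged = sums / counts    # one mean intensity per pixel
```

The fitted angles and amplitudes would be untouched; only the displayed intensity image is collapsed.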
Some other bugs/ideas:
Can you verify whether the settings issues are also happening for you? Were the things I noted as settings bugs perhaps intentional? I do understand that Orange settings are difficult to get right (I, at least, always struggle), so I can try fixing them myself. Did you notice any other issues regarding settings that I missed? I attach the workflow I played with. The data I used were your 4 sample files. untitled.ows.zip
doc/widgets/polar.md (outdated)
```
-------
MultiFile Input

![](images/Polar-Example1.png)
```
Please check that your images use the exact filename: lower- and uppercase letters matter on some platforms (Linux). Two of the committed images have an ".PNG" (uppercase) extension, so these references need to be fixed.
Also, before making any new changes it would be best to get rid of the merge commits in this branch: update your master branch to the latest version and then do
@callumgassner, I fixed the "Features" setting saving in my "Test" PR. The other one, single-input initialization, remains; I'll fix that next. All Orange widgets are meant to be written canvas-independently: as long as they get the inputs, that is fine. But this widget also asks the schema who is sending data to its input. I understand why this is the case: it is so that you can label the angles of inputs so that they do not get mixed up. This is a no-go, as it breaks assumptions about what widgets are allowed to do (and might also hide bugs - what if the order is different and the labels are thus invalid?). Instead, we could:
Codecov Report
```
@@            Coverage Diff            @@
##           master     #615     +/-  ##
=========================================
- Coverage   89.06%   88.46%   -0.60%
=========================================
  Files          71       72       +1
  Lines       11767    12483     +716
=========================================
+ Hits        10480    11043     +563
- Misses       1287     1440     +153
```
@markotoplak, thank you for having a look at this PR. First off, your fix for the features setting seems to work well - I have made these changes in the latest commit. I worked today on getting the single-input initialization to save to settings. Changing the "angles" variable to a ContextSetting and adding a call to _change_angles() at line 720 appears to almost fix this; however, if "Filename" changes from a discrete variable to a text variable, then when a new context is opened the widget tries to use this text variable (which is not in the combobox's model) and throws an error. I could not rectify this today but will give it a shot late next week. And to answer your questions in previous posts: I tried merging the polar branch in my fork into my master branch using git rebase master, but I'm pretty sure I am missing something, as it does not appear to have made any difference to this PR.
I'll look into this. Context should (theoretically) handle the different types if used properly. I do not know, perhaps they have parameters?
I saw how much care went into testing this... Sorry for not telling you about this earlier. :) Having widget names on the input in some easy, consistent way would indeed be useful in this case; much more so than that
Yea, let's change it.
Introducing the average option or not is your call. I agree it is imperative to have the option to see the data. The thing is that what HyperSpectra shows is currently undefined. I think it shows the last datapoint for that pixel, so it might even be misleading.
No idea. It could even draw a different shape, perhaps a circle? Anyway, this is optional, I was just thinking about reusing the same interface for other types of data. Perhaps @borondics has an idea?
Hm, did you then push? Anyway, do not worry about it.
Regarding your last commit: you misapplied my suggestion, so saving features does not work (yet). In 99% of cases, any setting that is connected to the variables needs to be a context setting; context settings ensure that settings are never loaded if the needed features do not exist. To explain my other changes: settings that should never become a general default are best saved as Then, why are you using some settings with
Also, SharedMemory blocks should not be given names when creating them. As it is now, we can get conflicts when multiple widgets are running or when something did not exit cleanly. The construction of memory blocks should leave the name empty: this guarantees that each receives a unique ID. These IDs then have to be passed to the child processes. Now I'll have to restart my computer because some lingering process is still using the memory and I cannot test the widget. :)
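The unnamed-creation pattern looks roughly like this - a minimal sketch using the stdlib `multiprocessing.shared_memory` module, not the widget's actual code:

```python
from multiprocessing import shared_memory

# Create a block WITHOUT a name: the OS picks a unique one, so two
# widget instances (or a stale block left by a crash) can never collide.
shm = shared_memory.SharedMemory(create=True, size=16)
try:
    shm.buf[:5] = b"hello"
    # A child process would attach via the generated name passed to it:
    child_view = shared_memory.SharedMemory(name=shm.name)
    received = bytes(child_view.buf[:5])
    child_view.close()
finally:
    shm.close()
    shm.unlink()   # free the block even if something above raised
```

In the widget, `shm.name` would be handed to the worker processes instead of a hard-coded name.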
I fixed the last-mentioned (SharedMemory naming) issue in my test branch because it was seriously bothering my testing. Please cherry-pick my commit onto this branch (do not copy-paste code, please). If you'd like, I could edit this branch directly, but then you'll have to be careful with synchronization. The issue partly remains: if something crashes within the computation, the allocated memory is not always unlinked (as I saw while the error was bothering me). Therefore we now have a known memory leak.
I also tried fitting by passing the Jacobian, which avoids some approximation within the fit. It is very slightly faster. It does do fewer iterations, though... See the commit in the test PR. I'd suggest we use it because it should theoretically be more stable.
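The `jac=` idea can be illustrated with a toy model. `model` and `model_jac` below are made up for illustration (a generic sinusoid standing in for the widget's `azimuth` function); with `jac=` given, `curve_fit` uses the analytic derivatives instead of estimating them by finite differences.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    # stand-in for the widget's azimuth function
    return a * np.cos(x) + b

def model_jac(x, a, b):
    # analytic Jacobian: one column per parameter (d/da, d/db)
    return np.stack([np.cos(x), np.ones_like(x)], axis=1)

x = np.linspace(0.0, 2.0 * np.pi, 50)
y = model(x, 2.0, 1.0)

# jac= replaces curve_fit's finite-difference derivative estimate
params, _ = curve_fit(model, x, y, jac=model_jac)
```

Exact derivatives remove one source of numerical noise, which is why the fit can be both slightly faster and more stable.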
Whoops, sorry about that! I had been copy-pasting code, as you figured out, and must have missed a line. I'm pretty sure I have git set up properly now, so I will use cherry-pick from now on.
Good point. I reviewed these settings and I am not sure why I made them this way. I have now made them normal lists. I manually tested the widget, and making these variables plain lists does not appear to break anything. As far as I can tell, I made these variables settings with
Thanks for fixing this, and sorry for the inconvenience. This was an initial concern of mine; however, no matter how hard I tried, I could not get two widgets to write to the same memory, and so I had assumed the array in SharedMemory relied on the buffer rather than the name. Also, when any process had an error or crashed, the memory would be cleared when force-closing Quasar. This is absolutely an issue, though, and I will try to come up with a solution; my initial thought is to use a manager to monitor the shared memory/processes. I will need to be able to recreate the issue first - any suggestions on how to recreate it? Or will it simply be a difference between Unix and Windows?
Happy to keep this in :) I am not very familiar with how this works but looks like it produces similar results and if it is more stable that is definitely a plus. |
Now it seems to work properly. Isn't it cool? :)
We only need something to be a Setting if saving/loading it makes sense.
Hm, yes, these things use system internals, and there could easily be a difference between Windows and Linux. I think the only sure way to guarantee that some cleanup code runs is to put it into a finally block. I'll play with this.
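The `finally` idea, sketched for shared memory (a hypothetical helper, not the widget's code): the cleanup runs whether the computation succeeds, raises, or is cancelled.

```python
from multiprocessing import shared_memory

def compute(payload: bytes) -> bytes:
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    n = len(payload)
    try:
        shm.buf[:n] = payload
        # ... hand shm.name to worker processes, collect results ...
        return bytes(shm.buf[:n])
    finally:
        # reached on success, exception, or cancellation alike
        shm.close()
        shm.unlink()

result = compute(b"spectra")
```

Note that `shm.size` may be rounded up to a page boundary on some platforms, which is why the sketch slices with the payload length rather than using the whole buffer.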
Most optimization techniques compute approximate derivatives so that they know where to move. But if we can compute the derivatives exactly, it makes their lives easier.
In my test PR I changed the shared-memory initialization to something that should be a bit more robust. Could you perhaps now also update the tests, please?
```
@@ -838,11 +838,16 @@ def commit(self):
        else:
            sorted_data = self.sorted_data

        self.setInvalidated(True)
```
This is not the way to do it. Just calling `self.cancel()` in the `commit` function (the one that calls `start`) would be sufficient. Please try that instead.
Hi Marko,
I originally tried:
```python
def cancel(self):
    super().__cancel_task(wait=True)
```
in hopes that it would wait for the tasks to complete. The issue I was having with that is that if the inputs were changed while the code was running in the SharedMemoryManager block, the code would continue but with empty data, and so the curve-fit procedure would try to fit an empty array. I defined the function beneath `onDeleteWidget`.
I would have thought simply calling `self.cancel()` in `commit` would always try to stop the process?
If you reimplement cancel, nothing happens unless you call it, and likely this is not needed. Adding `cancel()` to `commit` should ensure that the old process is stopped before running a new one, if that was the only issue.
Theoretically you could call cancel in the input-handling functions, but since you pass all needed data into `start_compute` and do everything within it (which is the correct way to do it), the actual objects are unchanged and that should not be needed. If, on the other hand, we were passing lists that are only edited when a new input arrives and never reinitialized, the particular list within the computation (the object itself) would be the same, thus edited, thus problems. So we would actually have to avoid that.
So not adding `cancel()` to the input handlers should not affect crashes if the rest is written well.
For this kind of issue I'd advise first writing a test case in `test_owpolar` that showcases the error, and then fixing the code so that it disappears. That would also allow me to easily replicate your issue and debug it. And then it is good to have a record of situations we know were problematic if we edit the code at any later time.
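The aliasing problem described above can be shown deterministically with a tiny sketch (illustrative names only; an `Event` stands in for the timing of a new input arriving mid-computation):

```python
import threading

def run(copy_input: bool):
    angles = [0.0, 45.0, 90.0, 135.0]
    work = list(angles) if copy_input else angles   # snapshot vs. alias
    new_input = threading.Event()
    seen = []

    def compute():
        new_input.wait()     # the "fit" only reads the list after new input
        seen.extend(work)

    t = threading.Thread(target=compute)
    t.start()
    angles.clear()           # "new input arrives": old list edited in place
    new_input.set()
    t.join()
    return seen
```

With an aliased input the computation sees the cleared list; with a copy it still sees the original angles.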
I gave this another shot today. Unfortunately, calling `cancel()` doesn't resolve the issue :(
I did some testing and what I found was:
- If the input is disconnected before the computation processes have started, the code appears to continue until the `with` block is finished, but a ValueError ("operands could not be broadcast together with shapes (0,) (4,)") is eventually thrown, with the traceback originating from line 240: `params = curve_fit(azimuth, x, temp, jac=azimuth_jac)[0]`. So it looks like the data within the shared-memory arrays are being erased... however, the allocated memory is not being cleared, as seen by steadily increasing memory usage with repeated tries.
- If the input is disconnected after the computation processes have started, no error is given, but the allocated memory is still not released.
- In both of the above cases, no code is executed after the `with` block.
- The SharedMemoryManager is created within a new child process, so it seems logical to me that code within the `with` block continues to execute.

From this, it would seem to me that this is what is occurring, but I am not entirely sure how...
> If, on the other hand, we were passing lists that are only edited when a new input arrives and never reinitialized, the particular list within the computation (the object itself) would be the same, thus edited, thus problems. So we would actually have to avoid that.
I have added a "kind-of" test for this in test_owpolar. It does not make the test fail, but it does show the ValueError mentioned above. As the error occurs in a child process, it is not so easy to catch it in a way that fails the test.
I will continue to try to find another workaround for this, but any suggestions would be greatly appreciated!
@callumgassner, I have merged my "test" branch with your latest commits and put them here. I think everything should be here. I rewrote history a bit so that the starting commits are better organized, and I made sure that the first version of the test data appears nowhere in the commits. We seem to work in compatible (non-overlapping) timezones, so working on the same branch might be easier. :) The easiest way to check out is to delete (or rename) the branch on your computer and then check it out anew. And no panic: there is a backup of the old branch at https://github.com/markotoplak/orange-spectroscopy-fork/tree/callum-polar-20220825. I refactored the owhyper additions a bit. I tried to make the code simpler because that widget is already complex, so I try to keep it as simple as possible. Could you double-check my commits, please, so that we are sure I did not remove any important functionality or introduce new bugs? Regarding the bug with context application: I believe introducing something analogous
I tried solving some new signal issues. The test does not warn anymore, so it is probably a step in the right direction. The issue I saw in the code was that these lists would be edited in place when new data was connected, and the computation thread held a reference to them, so it would get confused when they changed. Copying them solves some issues, but there might be outstanding ones. In particular, this does nothing for memory leaks. Or perhaps, if they were caused by crashes, it might...
This is just a column selection and is fast, so there is no need to perform it in a separate thread. Also, if users now select different vector features, the image data is not recomputed.
Potentially fixes a persisting issue where contexts would be loaded when variables did not exist in the domain
Prevents updating data in the widget while tasks are running. SharedMemory blocks were previously being erased, leaving processes with no data
NB: Test does not fail, but ValueError does appear in output
Some lists are edited in place, therefore copy them before sending them into the computation thread.
I finally had a chance to try this out in the last few weeks with some data. Very nice! I have a few general comments:
Some bugs:
Sorry, I ran out of time to better track these down, but I wanted to capture my notes before the end of the year. I can help in the new year if needed. In general, I think this is better in than out, so let's get it merged! We can always fix stuff later, as @markotoplak says :D 🎄 Stuart
Closed in favor of #765
Introduces a widget ("4+ Angle Polarisation") to calculate molecular orientation information from polarised vibrational spectroscopy methods collected at 4 or more polarisation angles.
Adds features to hyperspectra which complement results from the 4+ Angle Polarisation widget: