-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Catalogs plugin: support loading catalog from ecsv #1707
Conversation
72bf9ee
to
c3bf2d8
Compare
Codecov ReportBase: 87.27% // Head: 87.27% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #1707 +/- ##
=======================================
Coverage 87.27% 87.27%
=======================================
Files 95 95
Lines 10082 10125 +43
=======================================
+ Hits 8799 8837 +38
- Misses 1283 1288 +5
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
51abaec
to
d30072f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoding of requiring an arbitrary column named sky_centroid
that must contain astropy.coordinates.SkyCoord
object can be a deal breaker for some users. Traditionally, sky information in a catalog is encoded in at least 2 columns: RA (float) and Dec (float). Sometimes the frame is encoded in the column names, sometimes in yet another column. Column names may vary greatly depending on which service provider generated it.
So, if I were a user, and I got a catalog from some telescope that does not have this sky_centroid
column, how am I supposed to use this feature? Is it reasonable to expect users to rewrite the catalogs themselves?
self.catalog = SelectPluginComponent(self, | ||
items='catalog_items', | ||
selected='catalog_selected', | ||
manual_options=['SDSS']) | ||
manual_options=['SDSS', 'From File...']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the ...
really necessary?
manual_options=['SDSS', 'From File...']) | |
manual_options=['SDSS', 'From File']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included the ...
because I believe this is a generally accepted UX pattern to expect a popup dialog ("Save" vs "Save As..."). Granted this then gets a little confusing when setting from the API which does not invoke the popup, but still does require that you set another traitlet. We could have a separate label (shown in the dropdown) and value (actual traitlet string), but I'm not sure that is worth the overhead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did #1707 (comment) not happen? I was under the impression that you were going to change "From File" to "Load".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The button in the dialog was changed from "Select" to "Load". @Jenneh agreed that the ...
is a common pattern that we should use. We can discuss whether to separate the API string from the UI label when we expose the API for the plugin.
self.from_file_message = 'Could not parse file with astropy.table.QTable.read' | ||
return | ||
|
||
if 'sky_centroid' not in table.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the column name can be changed by the user but maybe that can be future work.
# since we loaded the file already to check if its valid, we might as well cache the table | ||
# so we don't have to re-load it when clicking search. We'll only keep the latest entry | ||
# though, but store in a dict so we can catch if the file path was changed from the API | ||
self._cached_table_from_file = {path: table} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way for user to invalidate cache? What if user modified the table on disk after loading it into the plugin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had considered that... but from a UX perspective I'm not really sure what would be expected in this scenario (the button says "search", not load, so to me it makes sense that the file is already loaded and parsed once selected and so future changes to the file would require re-loading it). @Jenneh - any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One option to consider would be to change the text in the file dialog from "Select" to "Load" to make this more obvious to the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Load" makes more sense for me but maybe PO has the final say in this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some context: I had originally gone against "Load" to avoid any relation to the load data dialog (which has a "Load" button).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussing with @Jenneh - I'll change the text on the button to "Load" and we can always revisit the behavior if it causes any confusion (but we don't watch for file changes anywhere else in the app - once you load a fits file, for example, changes on disk are not picked up).
The scope of this work was specifically to support the ecsv files from JWST. If you have other catalog files to test against, we can either make the logic more flexible here or down the road (this PR does document the requirements and expose messages in the UI). If we really want this to be flexible, we could have all columns parsed after selecting the file and two dropdowns appear to select the columns to correspond to RA and DEC, respectively. But I don't want to overengineer this unless we have a request to do so. |
Oh, I didn't know you were asked to only support one kind of input file (JWST ECSV). In that case, yes, the text should be made clear. Even better, link to STScI documentation of this JWST ECSV so we don't have to explain the format to users ourselves. |
@@ -99,3 +101,47 @@ def test_plugin_image_with_result(self, imviz_helper): | |||
catalogs_plugin.vue_do_clear() | |||
|
|||
assert not catalogs_plugin.results_available | |||
|
|||
# test loading from file | |||
table = imviz_helper.app._catalog_source_table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice if we can turn this test into a test without remote data. Either create this table on-the-fly or generate a small ECSV test data file in a new test outside of this remote data test class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ecsv file is created on the fly, but I used the existing catalogs test to do so (to make sure that it gives the same results as the SDSS query used to create the ecsv file). It looks like this particular test doesn't actually use remote data, but the entire test class is marked as requiring remote data though... maybe we should just mark the individual methods or split the class into those that need remote data and those that don't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just make sure the test is not set up that it downloads remote data regardless of what you are doing. But yes, if a test does not really need remote data, should not mark it as such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, if you are building the table from a query that needs internet, it still counts as remote data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then this should remain marked as needing remote data since it uses and compares against the SDSS query. Since all other tests in the catalogs plugin will require remote calls, I think that makes sense. The alternative I guess would be to use the local imviz fixture and create an entirely fake ecsv on-the-fly - is that worth it just to remove one part of the catalogs testing from requiring remote data that already has to be fetched for the rest of the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A separate offline test would be nice, IMHO, to test that:
- the plugin does not access internet when it does not have to, and
- it can read something other than reformatted SDSS query (get a current real JWST ESCV file, trim it down to 2 rows, modify the SkyCoord to match image if you have to)
docs/imviz/plugins.rst
Outdated
@@ -223,6 +223,10 @@ catalog dropdown menu. | |||
This plugin is still under active development. As a result, the search only uses the SDSS DR17 catalog | |||
and works best when you only have a single image loaded in a viewer. | |||
|
|||
To load a catalog from an ecsv file, choose "From File..." and choose a valid file. The file must be able | |||
to be parsed by `astropy.table.Table.read` and contain a column labeled 'sky_centroid'. Clicking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still not convinced that "sky_centroid" is a universal convention for column name containing SkyCoord objects. Especially your test seems to be parsing a cached SDSS query, so that column must have been added internally by the plugin itself. Just where is this JWST ECSV file format defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the example data file I grabbed from MAST (jw02731-o001_t017_nircam_f444w-f470n_cat.ecsv
) to write this did... but if you find any that don't, send them my way and I'll try to make the logic more flexible to handle them as well.
The only thing I found in docs online about the format of the files does not describe individual columns: https://jwst-docs.stsci.edu/understanding-jwst-data-files/jwst-data-formats#JWSTDataFormats-ecsvECSVformat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is where the format is described: https://jwst-pipeline.readthedocs.io/en/latest/jwst/source_catalog/main.html#output-products
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks! We should link to that doc, if not already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it a permanent link that won't change? If so - I can add it to the plugin docs. But otherwise, I think the feedback in the UI should be sufficient for now and we can generalize as needed to other file formats down the road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is jwst
doc, so we could use intersphinx. RTD would fail if they move the :ref:
and I can go bug Nadia or Howard. No link is permanent. But a link is better than no link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I added an external link for now (don't see any refs in their source rst file for that section).
I confirm this is only to support the JWST catalog file that gets spit out by the pipeline along with the i2d file. A broader generalization of this plugin is out of scope now. |
table = imviz_helper.app._catalog_source_table | ||
skycoord_table = SkyCoord(table['ra'], | ||
table['dec'], | ||
unit='deg') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if anyone (other than me) would benefit from an extra comment here, but I'll suggest it anyway. We could say that the table
you're manipulating here is the default result from an SDSS catalog query, via astroquery. I was confused because you specify the unit='deg'
, but in the title of the PR you say "from ecsv", which supports units. So I spent a while in the rabbit hole of checking where table
stems from. In the end, it looks good. The ecsv source does not have units (for now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, for context: this test was just an attempt to mimic the same format as the JWST ecsv files (at least as far as the one column that is actually parsed) without requiring a new remote data download in the tests. So I used the SDSS query and write to an ecsv file using the same column structure and use that for the tests to ensure the same results are found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works well for me. The only thing I noticed is that the markers disappear if I load a new image.
That is something I have seen before, hence out of scope here to fix. Probably same underlying problem as #773 . |
Thinking about this a little more, maybe the option in the menu could be "Load JWST ecsv file" instead of a generic "From file...". We will eventually get to the more generic loader where the user can specify the columns for RA/dec. |
My hesitation against that is then we'll have to break API when we do want to generalize the logic (although the API for catalogs isn't quite public yet... but if we make it public before generalizing, it could become a problem). So I guess that decision might depend on when you think that effort might be scheduled/prioritized. |
Understood. Then I am ok with "From file..." if the needed format is well explained in the docs. I would advocate for writing an explicit sentence that says that the tool will need a single skycoord column. The link to the pipeline readthedocs is an add-on to this. |
Plugin docs have been updated to clarify the current requirements and point to the JWST docs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a reminder for myself to add that test in the original post.
self.catalog = SelectPluginComponent(self, | ||
items='catalog_items', | ||
selected='catalog_selected', | ||
manual_options=['SDSS']) | ||
manual_options=['SDSS', 'From File...']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did #1707 (comment) not happen? I was under the impression that you were going to change "From File" to "Load".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My remaining question is the "From File ..." vs "Load" but you are settled on the former, then fine by me. Just want to make sure it was not overlooked.
Maybe easier to open a follow up issue to add an offline test as I suggested and assign it to me. I will write it when I can. Thanks!
* currently API access only to set file path
* to make it more clear that the file is loaded at that point rather than when clicking "Search"
Co-authored-by: P. L. Lim <[email protected]>
Description
This pull request adds the capability to search and display markers from a catalog in an ecsv selected from the local machine.
Updated plugin docs
Screen.Recording.2022-10-07.at.2.40.29.PM.mov
TODO:
Out-of-scope (deferred for later):
Change log entry
CHANGES.rst
? If you want to avoid merge conflicts,list the proposed change log here for review and add to
CHANGES.rst
before merge. If no, maintainershould add a
no-changelog-entry-needed
label.Checklist for package maintainer(s)
This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.
trivial
label.