Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catalogs plugin: support loading catalog from ecsv #1707

Merged
merged 8 commits into from
Oct 14, 2022

Conversation

kecnry
Copy link
Member

@kecnry kecnry commented Oct 7, 2022

Description

This pull request adds the capability to search and display markers from a catalog in an ecsv selected from the local machine.

Updated plugin docs

Screen.Recording.2022-10-07.at.2.40.29.PM.mov

TODO:

Out-of-scope (deferred for later):

  • spinner while searching: pressing search/clear can take a few seconds depending on the size of the catalog so should really have some UI feedback. Since we're intending a redesign in the near future to support loading multiple catalogs and changing marker colors similar to line lists, I propose we defer this to be a part of that effort.
  • styling of file dialog (this is the same dialog used for importing data... so we should consider restyling all instances at once)

Change log entry

  • Is a change log needed? If yes, is it added to CHANGES.rst? If you want to avoid merge conflicts,
    list the proposed change log here for review and add to CHANGES.rst before merge. If no, maintainer
    should add a no-changelog-entry-needed label.

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

  • Are two approvals required? Branch protection rule does not check for the second approval. If a second approval is not necessary, please apply the trivial label.
  • Do the proposed changes actually accomplish desired goals? Also manually run the affected example notebooks, if necessary.
  • Do the proposed changes follow the STScI Style Guides?
  • Are tests added/updated as required? If so, do they follow the STScI Style Guides?
  • Are docs added/updated as required? If so, do they follow the STScI Style Guides?
  • Did the CI pass? If not, are the failures related?
  • Is a milestone set?
  • After merge, any internal documentations need updating (e.g., JIRA, Innerspace)?

@kecnry kecnry added imviz plugin Label for plugins common to multiple configurations labels Oct 7, 2022
@kecnry kecnry added this to the 2.11 milestone Oct 7, 2022
@kecnry kecnry force-pushed the catalogs-ecsv branch 5 times, most recently from 72bf9ee to c3bf2d8 Compare October 7, 2022 17:29
@codecov
Copy link

codecov bot commented Oct 7, 2022

Codecov Report

Base: 87.27% // Head: 87.27% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (a1e9f60) compared to base (22559cf).
Patch coverage: 90.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1707   +/-   ##
=======================================
  Coverage   87.27%   87.27%           
=======================================
  Files          95       95           
  Lines       10082    10125   +43     
=======================================
+ Hits         8799     8837   +38     
- Misses       1283     1288    +5     
Impacted Files Coverage Δ
...z/configs/default/plugins/data_tools/data_tools.py 64.51% <ø> (+3.91%) ⬆️
jdaviz/configs/imviz/plugins/catalogs/catalogs.py 91.22% <90.00%> (-4.43%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@kecnry kecnry force-pushed the catalogs-ecsv branch 3 times, most recently from 51abaec to d30072f Compare October 7, 2022 19:26
@kecnry kecnry marked this pull request as ready for review October 7, 2022 19:40
@kecnry kecnry requested a review from camipacifici October 7, 2022 19:40
@rosteen rosteen modified the milestones: 2.11, 3.1 Oct 10, 2022
Copy link
Contributor

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoding of requiring an arbitrary column named sky_centroid that must contain astropy.coordinates.SkyCoord object can be a deal breaker for some users. Traditionally, sky information in a catalog is encoded in at least 2 columns: RA (float) and Dec (float). Sometimes the frame is encoded in the column names, sometimes in yet another column. Column names may vary greatly depending on which service provider generated it.

So, if I were a user, and I got a catalog from some telescope that does not have this sky_centroid column, how am I supposed to use this feature? Is it reasonable to expect users to rewrite the catalogs themselves?

docs/imviz/plugins.rst Outdated Show resolved Hide resolved
self.catalog = SelectPluginComponent(self,
items='catalog_items',
selected='catalog_selected',
manual_options=['SDSS'])
manual_options=['SDSS', 'From File...'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the ... really necessary?

Suggested change
manual_options=['SDSS', 'From File...'])
manual_options=['SDSS', 'From File'])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included the ... because I believe this is a generally accepted UX pattern to expect a popup dialog ("Save" vs "Save As..."). Granted this then gets a little confusing when setting from the API which does not invoke the popup, but still does require that you set another traitlet. We could have a separate label (shown in the dropdown) and value (actual traitlet string), but I'm not sure that is worth the overhead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did #1707 (comment) not happen? I was under the impression that you were going to change "From File" to "Load".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The button in the dialog was changed from "Select" to "Load". @Jenneh agreed that the ... is a common pattern that we should use. We can discuss whether to separate the API string from the UI label when we expose the API for the plugin.

self.from_file_message = 'Could not parse file with astropy.table.QTable.read'
return

if 'sky_centroid' not in table.keys():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the column name can be changed by the user but maybe that can be future work.

# since we loaded the file already to check if its valid, we might as well cache the table
# so we don't have to re-load it when clicking search. We'll only keep the latest entry
# though, but store in a dict so we can catch if the file path was changed from the API
self._cached_table_from_file = {path: table}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way for user to invalidate cache? What if user modified the table on disk after loading it into the plugin?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had considered that... but from a UX perspective I'm not really sure what would be expected in this scenario (the button says "search", not load, so to me it makes sense that the file is already loaded and parsed once selected and so future changes to the file would require re-loading it). @Jenneh - any thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option to consider would be to change the text in the file dialog from "Select" to "Load" to make this more obvious to the user?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Load" makes more sense for me but maybe PO has the final say in this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some context: I had originally gone against "Load" to avoid any relation to the load data dialog (which has a "Load" button).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with @Jenneh - I'll change the text on the button to "Load" and we can always revisit the behavior if it causes any confusion (but we don't watch for file changes anywhere else in the app - once you load a fits file, for example, changes on disk are not picked up).

@kecnry
Copy link
Member Author

kecnry commented Oct 11, 2022

The hardcoding of requiring an arbitrary column named sky_centroid that must contain astropy.coordinates.SkyCoord object can be a deal breaker for some users.

The scope of this work was specifically to support the ecsv files from JWST. If you have other catalog files to test against, we can either make the logic more flexible here or down the road (this PR does document the requirements and expose messages in the UI).

If we really want this to be flexible, we could have all columns parsed after selecting the file and two dropdowns appear to select the columns to correspond to RA and DEC, respectively. But I don't want to overengineer this unless we have a request to do so.

@pllim
Copy link
Contributor

pllim commented Oct 11, 2022

Oh, I didn't know you were asked to only support one kind of input file (JWST ECSV). In that case, yes, the text should be made clear. Even better, link to STScI documentation of this JWST ECSV so we don't have to explain the format to users ourselves.

@@ -99,3 +101,47 @@ def test_plugin_image_with_result(self, imviz_helper):
catalogs_plugin.vue_do_clear()

assert not catalogs_plugin.results_available

# test loading from file
table = imviz_helper.app._catalog_source_table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice if we can turn this test into a test without remote data. Either create this table on-the-fly or generate a small ECSV test data file in a new test outside of this remote data test class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ecsv file is created on the fly, but I used the existing catalogs test to do so (to make sure that it gives the same results as the SDSS query used to create the ecsv file). It looks like this particular test doesn't actually use remote data, but the entire test class is marked as requiring remote data though... maybe we should just mark the individual methods or split the class into those that need remote data and those that don't?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make sure the test is not set up that it downloads remote data regardless of what you are doing. But yes, if a test does not really need remote data, should not mark it as such.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, if you are building the table from a query that needs internet, it still counts as remote data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then this should remain marked as needing remote data since it uses and compares against the SDSS query. Since all other tests in the catalogs plugin will require remote calls, I think that makes sense. The alternative I guess would be to use the local imviz fixture and create an entirely fake ecsv on-the-fly - is that worth it just to remove one part of the catalogs testing from requiring remote data that already has to be fetched for the rest of the tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate offline test would be nice, IMHO, to test that:

  • the plugin does not access internet when it does not have to, and
  • it can read something other than reformatted SDSS query (get a current real JWST ESCV file, trim it down to 2 rows, modify the SkyCoord to match image if you have to)

@@ -223,6 +223,10 @@ catalog dropdown menu.
This plugin is still under active development. As a result, the search only uses the SDSS DR17 catalog
and works best when you only have a single image loaded in a viewer.

To load a catalog from an ecsv file, choose "From File..." and choose a valid file. The file must be able
to be parsed by `astropy.table.Table.read` and contain a column labeled 'sky_centroid'. Clicking
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not convinced that "sky_centroid" is a universal convention for column name containing SkyCoord objects. Especially your test seems to be parsing a cached SDSS query, so that column must have been added internally by the plugin itself. Just where is this JWST ECSV file format defined?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the example data file I grabbed from MAST (jw02731-o001_t017_nircam_f444w-f470n_cat.ecsv) to write this did... but if you find any that don't, send them my way and I'll try to make the logic more flexible to handle them as well.

The only thing I found in docs online about the format of the files does not describe individual columns: https://jwst-docs.stsci.edu/understanding-jwst-data-files/jwst-data-formats#JWSTDataFormats-ecsvECSVformat

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! We should link to that doc, if not already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it a permanent link that won't change? If so - I can add it to the plugin docs. But otherwise, I think the feedback in the UI should be sufficient for now and we can generalize as needed to other file formats down the road.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is jwst doc, so we could use intersphinx. RTD would fail if they move the :ref: and I can go bug Nadia or Howard. No link is permanent. But a link is better than no link.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I added an external link for now (don't see any refs in their source rst file for that section).

@camipacifici
Copy link
Contributor

I confirm this is only to support the JWST catalog file that gets spit out by the pipeline along with the i2d file. A broader generalization of this plugin is out of scope now.

table = imviz_helper.app._catalog_source_table
skycoord_table = SkyCoord(table['ra'],
table['dec'],
unit='deg')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if anyone (other than me) would benefit from an extra comment here, but I'll suggest it anyway. We could say that the table you're manipulating here is the default result from an SDSS catalog query, via astroquery. I was confused because you specify the unit='deg', but in the title of the PR you say "from ecsv", which supports units. So I spent a while in the rabbit hole of checking where table stems from. In the end, it looks good. The ecsv source does not have units (for now).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, for context: this test was just an attempt to mimic the same format as the JWST ecsv files (at least as far as the one column that is actually parsed) without requiring a new remote data download in the tests. So I used the SDSS query and write to an ecsv file using the same column structure and use that for the tests to ensure the same results are found.

Copy link
Contributor

@camipacifici camipacifici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works well for me. The only thing I noticed is that the markers disappear if I load a new image.

@pllim
Copy link
Contributor

pllim commented Oct 12, 2022

the markers disappear if I load a new image

That is something I have seen before, hence out of scope here to fix. Probably same underlying problem as #773 .

@camipacifici
Copy link
Contributor

Thinking about this a little more, maybe the option in the menu could be "Load JWST ecsv file" instead of a generic "From file...". We will eventually get to the more generic loader where the user can specify the columns for RA/dec.

@kecnry
Copy link
Member Author

kecnry commented Oct 12, 2022

maybe the option in the menu could be "Load JWST ecsv file" instead of a generic "From file...". We will eventually get to the more generic loader where the user can specify the columns for RA/dec.

My hesitation against that is then we'll have to break API when we do want to generalize the logic (although the API for catalogs isn't quite public yet... but if we make it public before generalizing, it could become a problem). So I guess that decision might depend on when you think that effort might be scheduled/prioritized.

@camipacifici
Copy link
Contributor

Understood. Then I am ok with "From file..." if the needed format is well explained in the docs. I would advocate for writing an explicit sentence that says that the tool will need a single skycoord column. The link to the pipeline readthedocs is an add-on to this.

@kecnry kecnry marked this pull request as draft October 12, 2022 15:36
@kecnry
Copy link
Member Author

kecnry commented Oct 12, 2022

Plugin docs have been updated to clarify the current requirements and point to the JWST docs.

@kecnry kecnry marked this pull request as ready for review October 12, 2022 19:04
Copy link
Contributor

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a reminder for myself to add that test in the original post.

docs/conf.py Outdated Show resolved Hide resolved
CHANGES.rst Outdated Show resolved Hide resolved
docs/imviz/plugins.rst Outdated Show resolved Hide resolved
self.catalog = SelectPluginComponent(self,
items='catalog_items',
selected='catalog_selected',
manual_options=['SDSS'])
manual_options=['SDSS', 'From File...'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did #1707 (comment) not happen? I was under the impression that you were going to change "From File" to "Load".

jdaviz/configs/imviz/plugins/catalogs/catalogs.py Outdated Show resolved Hide resolved
jdaviz/configs/imviz/tests/test_catalogs.py Outdated Show resolved Hide resolved
Copy link
Contributor

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My remaining question is the "From File ..." vs "Load" but you are settled on the former, then fine by me. Just want to make sure it was not overlooked.

Maybe easier to open a follow up issue to add an offline test as I suggested and assign it to me. I will write it when I can. Thanks!

@kecnry kecnry merged commit 65c57af into spacetelescope:main Oct 14, 2022
@kecnry kecnry deleted the catalogs-ecsv branch October 14, 2022 15:32
@pllim pllim mentioned this pull request Oct 20, 2022
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imviz plugin Label for plugins common to multiple configurations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants