Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add catalog queries to lksearch #23

Merged
merged 32 commits into from
Dec 6, 2024
Merged

Add catalog queries to lksearch #23

merged 32 commits into from
Dec 6, 2024

Conversation

christinahedges
Copy link
Collaborator

@christinahedges christinahedges commented Jul 29, 2024

lksearch is our package for searching and finding data. These queries all require the internet to work. It seems that the logical place for catalog queries to live is inside lksearch.

Catalog queries are useful for users, and can be agnostic of any particular file. @rebekah9969 has already opened a PR against lightkurve with this functionality, I have fixed the remaining issues with the PR and changed it to output a pandas.DataFrame object, to be more inline with the rest of lksearch.

Here's an example of the functional call with output

Screenshot 2024-07-29 at 2 41 24 PM

There are convenience functions to query each individual catalog by name and it will let you query by pixel distance.

from lksearch import query_EPIC
catalog = query_EPIC(SkyCoord.from_name('Kepler-10'),
              Time.now(),
              radius=2*u.pixel,
              magnitude_limit=18)

@christinahedges christinahedges added the enhancement New feature or request label Jul 29, 2024
src/lksearch/__init__.py Show resolved Hide resolved
src/lksearch/catalog.py Outdated Show resolved Hide resolved
src/lksearch/catalog.py Outdated Show resolved Hide resolved
src/lksearch/catalog.py Outdated Show resolved Hide resolved
src/lksearch/catalog.py Outdated Show resolved Hide resolved
src/lksearch/catalog.py Outdated Show resolved Hide resolved
src/lksearch/catalog.py Outdated Show resolved Hide resolved
@christinahedges
Copy link
Collaborator Author

christinahedges commented Jul 30, 2024

@Nschanche thanks for a great review, I've updated this PR based on your comments, and a few more tweaks.

  • I updated epoch to have a sane default
  • I changed the outputs to remove the "motion" columns. This was on purpose because the returned table has already been updated for motion, so I didn't want users to be confused over which RA/Dec to use
  • I added the ability to get the skycoord returned instead of a table, which will be helpful for some workflows.
  • I added the name resolve so if users put in "Au mic" it will resolve inside the query.
  • I added a tutorial. To do this I had to make a new landing page so all the tutorials are accessible.

@tylerapritchard could you give this a review?

@Nschanche and @tylerapritchard are you happy with this API? An alternate would be:

from lksearch.catalogs import TICSearch
c = SkyCoord.from_name('Kepler-10')
sr = TICSearch(c)
sr.table
sr.skycoords

But I think this might be confusing because there's no e.g. download function, and no need to filter anything. I think this API is simpler for people, but what do you think?

@tylerapritchard
Copy link
Collaborator

tylerapritchard commented Jul 31, 2024

This looks great - I'll review the code - but from a high level what if we renamed catalog to CatalogSearch to keep the naming theme but not change the API? I think this is still fairly consistent. Overall I think I like the simplicity of the current setup and building out the same API might be over-engineering this.

However, if we did want to change the API for consistency across the package I could imagine a .download() function saving a csv file of the table (which could be a simple pass-through to dataframe.to_csv()) & a .filter() for somewhat who wanted to cut the table down based on say proper motion or magnitude if they realized that they grabbed too many sources in their initial query.

@christinahedges
Copy link
Collaborator Author

Based on comments at TSC3 it seems like people want to be able to put in a TIC/KIC/EPIC ID and get likely matches. We might do something like not giving people a cross-match exactly, but returning the likely hits from the catalog with a separation and relative flux so they could make their own cut.

We might need a facility to do this that can take in multiple TICIDs (see TessProposalTool...)

I think folks will want to get out extra keywords which we could add e.g. the stellar parameters.

@christinahedges
Copy link
Collaborator Author

christinahedges commented Aug 5, 2024

I updated this PR to include stellar parameters as outputs and updated the tutorials.

@tylerapritchard has pointed out that we could have an API like the one I described above e.g.

from lksearch.catalogs import TICSearch
c = SkyCoord.from_name('Kepler-10')
sr = TICSearch(c)
sr.table
sr.skycoords

But I think this is overkill given what we want unless we add in some crossmatching functionality from the TESS proposal tool stuff. Let's keep these as simple functions query_CATALOGNAME until the action items below are taken care of, then we can revisit this question.

I think the piece missing to make this work is a more flexible interface like in lksearch.MASTSearch so people can input e.g. a string of coordinates or a tuple of coordinates, e.g.

query_TIC(f"{ra}, {dec}")
query_TIC((ra, dec))

and a way to resolve TIC, KIC, and EPIC names so we can do:

query_TIC("TIC 3777809370")
query_gaia("TIC 3777809370")
query_EPIC("TIC 3777809370")
query_KIC("TIC 3777809370")

To do this we are going to need a way to query a given catalog by the ID and then return a SkyCoord object e.g.

def TIC_to_SkyCoord(ids:List[str]) -> SkyCoord:
    ...
def KIC_to_SkyCoord(ids:List[str]) -> SkyCoord:
    ...
def EPIC_to_SkyCoord(ids:List[str]) -> SkyCoord:
    ...
def gaiadr3_to_SkyCoord(ids:List[str]) -> SkyCoord:
    ...

This should be a fast query that isn't a radius query to each catalog, because the unique ID name should give us a unique row without a radius query. We need to dig in to figure out how to do this in astroquery.

People will inevitably want a crossmatch, not a radius query. For now, let's keep this as a radius query. That means we want to be able to do the first two of these, but not the last.

  • Radius query: single Coordinate and radius -> many possible catalog rows
  • ID query: one or many catalog IDs -> exact match catalog rows
  • Crossmatch: one or many Coordinates -> same number of "best match" catalog rows

Once the below are addressed we can talk about a) whether we add crossmatching b) whether we change the API.

TODO:

  • Update interface so that query_X can accept a tuple of RA/Dec, a string of RA/Dec
  • Add ID queries for each catalog (TIC_to_SkyCoord, EPIC_to_SkyCoord, etc). Each function should have a check that the input string has the right format e.g. "TIC XXXX", "gaiadr3 XXXXX" etc. To enable this to be future proof, this should take in a list of TICs and return a SkyCoord with multiple entries. Users should not (usually) be using these functions directly, instead they will use query_TIC, query_EPIC etc.
  • Add interface for query_X that, when passed an ID from that catalog, will do an ID match instead of a radius query.
  • Add tests for ID queries e.g. query_TIC("TIC XXXXX") should return one result and it should be an exact match. query_KIC("TIC XXXXX") should not return a single result as there is not a one-to-one match between the two catalogs.
  • Update the tutorial to add an example of querying by ID.

Copy link
Collaborator Author

@christinahedges christinahedges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tylerapritchard This is a really awesome PR! There's lots of great functionality here. Other than the bug fixes we identified the main comments here are

  1. You can remove the CatalogResult class and then have everything be in pandas dataframes. This will be important to keep the outputs consistent for users

  2. The names need to follow Python style of having functions be lower case. I suggest we follow verb_noun(noun) style for function names and make the API something like this:

from lksearch import MASTSearch, TESSSearch, KeplerSearch, K2Search
from lksearch.catalogs import query_region, query_ID, find_alternate_names, match_names_to_catalogs
query_region(region params)
query_id(ID)
find_alternate_names(name)
match_names_to_catalogs(name, match_strings=["tic", 'kic'])

input_catalog: str = None,
max_results: int = None,
return_skycoord: bool = False,
epoch: Union[str, Time] = None,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update epoch across this PR to output_epoch so that it's clear this isn't changing the input?


def QueryID(
search_object: Union[str, int, list[str, int]],
catalog: str = None,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have an input catalog, should be called output catalog?


# use simbad to get name/ID crossmatches
def IDLookup(search_input: Union[str, list[str]], match: Union[str, list[str]] = None):
"""Uses the Simbad name resolver and ids to disambiguate the search_input string or list.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functionality is excellent, can we break it into two functions so that we don't have two different types of output

NameLookup -> Series/Table of all matching names
IDLookup -> Table of all matching IDs

@@ -0,0 +1,616 @@
"""Catalog class to search various catalogs for missions"""
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module should be lowercase not upper case to meet Python standards, uppercase like this would indicate a class. CatalogResult can stay!

for cat in np.atleast_1d(match):
mcat = cat.strip().replace(" ", "").lower()
cmatch = None
for sid in result[i]["id"]:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to break with astroquery versions that are less recent as the column name is "ID", you could do a check earlier to get the name of the first column and set it to "id".

return c


class CatalogResult(Table):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove this, we can remove the functionality to turn a table into a sky coordinate, and then everything can be pandas dataframes. I think this is the best trade, (we still have the functionality with return_skycoord=True) and we get to use pandas throughout the lksearch package outputs.

@tylerapritchard tylerapritchard merged commit 868fc11 into main Dec 6, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants