-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add catalog queries to lksearch
#23
Conversation
@Nschanche thanks for a great review, I've updated this PR based on your comments, and a few more tweaks.
@tylerapritchard could you give this a review? @Nschanche and @tylerapritchard are you happy with this API? An alternate would be: from lksearch.catalogs import TICSearch
c = SkyCoord.from_name('Kepler-10')
sr = TICSearch(c)
sr.table
sr.skycoords But I think this might be confusing because there's no e.g. |
This looks great - I'll review the code - but from a high level what if we renamed catalog to CatalogSearch to keep the naming theme but not change the API? I think this is still fairly consistent. Overall I think I like the simplicity of the current setup and building out the same API might be over-engineering this. However, if we did want to change the API for consistency across the package I could imagine a .download() function saving a csv file of the table (which could be a simple pass-through to dataframe.to_csv()) & a .filter() for somewhat who wanted to cut the table down based on say proper motion or magnitude if they realized that they grabbed too many sources in their initial query. |
Based on comments at TSC3 it seems like people want to be able to put in a TIC/KIC/EPIC ID and get likely matches. We might do something like not giving people a cross-match exactly, but returning the likely hits from the catalog with a separation and relative flux so they could make their own cut. We might need a facility to do this that can take in multiple TICIDs (see TessProposalTool...) I think folks will want to get out extra keywords which we could add e.g. the stellar parameters. |
I updated this PR to include stellar parameters as outputs and updated the tutorials. @tylerapritchard has pointed out that we could have an API like the one I described above e.g. from lksearch.catalogs import TICSearch
c = SkyCoord.from_name('Kepler-10')
sr = TICSearch(c)
sr.table
sr.skycoords But I think this is overkill given what we want unless we add in some crossmatching functionality from the TESS proposal tool stuff. Let's keep these as simple functions I think the piece missing to make this work is a more flexible interface like in query_TIC(f"{ra}, {dec}")
query_TIC((ra, dec)) and a way to resolve TIC, KIC, and EPIC names so we can do: query_TIC("TIC 3777809370")
query_gaia("TIC 3777809370")
query_EPIC("TIC 3777809370")
query_KIC("TIC 3777809370") To do this we are going to need a way to query a given catalog by the ID and then return a SkyCoord object e.g. def TIC_to_SkyCoord(ids:List[str]) -> SkyCoord:
...
def KIC_to_SkyCoord(ids:List[str]) -> SkyCoord:
...
def EPIC_to_SkyCoord(ids:List[str]) -> SkyCoord:
...
def gaiadr3_to_SkyCoord(ids:List[str]) -> SkyCoord:
... This should be a fast query that isn't a radius query to each catalog, because the unique ID name should give us a unique row without a radius query. We need to dig in to figure out how to do this in astroquery. People will inevitably want a crossmatch, not a radius query. For now, let's keep this as a radius query. That means we want to be able to do the first two of these, but not the last.
Once the below are addressed we can talk about a) whether we add crossmatching b) whether we change the API. TODO:
|
…lity? Separate PR probably
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tylerapritchard This is a really awesome PR! There's lots of great functionality here. Other than the bug fixes we identified the main comments here are
-
You can remove the CatalogResult class and then have everything be in
pandas
dataframes. This will be important to keep the outputs consistent for users -
The names need to follow Python style of having functions be lower case. I suggest we follow
verb_noun(noun)
style for function names and make the API something like this:
from lksearch import MASTSearch, TESSSearch, KeplerSearch, K2Search
from lksearch.catalogs import query_region, query_ID, find_alternate_names, match_names_to_catalogs
query_region(region params)
query_id(ID)
find_alternate_names(name)
match_names_to_catalogs(name, match_strings=["tic", 'kic'])
src/lksearch/CatalogSearch.py
Outdated
input_catalog: str = None, | ||
max_results: int = None, | ||
return_skycoord: bool = False, | ||
epoch: Union[str, Time] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update epoch
across this PR to output_epoch
so that it's clear this isn't changing the input?
src/lksearch/CatalogSearch.py
Outdated
|
||
def QueryID( | ||
search_object: Union[str, int, list[str, int]], | ||
catalog: str = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have an input catalog, should be called output catalog?
src/lksearch/CatalogSearch.py
Outdated
|
||
# use simbad to get name/ID crossmatches | ||
def IDLookup(search_input: Union[str, list[str]], match: Union[str, list[str]] = None): | ||
"""Uses the Simbad name resolver and ids to disambiguate the search_input string or list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This functionality is excellent, can we break it into two functions so that we don't have two different types of output
NameLookup -> Series/Table of all matching names
IDLookup -> Table of all matching IDs
src/lksearch/CatalogSearch.py
Outdated
@@ -0,0 +1,616 @@ | |||
"""Catalog class to search various catalogs for missions""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module should be lowercase not upper case to meet Python standards, uppercase like this would indicate a class. CatalogResult can stay!
src/lksearch/CatalogSearch.py
Outdated
for cat in np.atleast_1d(match): | ||
mcat = cat.strip().replace(" ", "").lower() | ||
cmatch = None | ||
for sid in result[i]["id"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to break with astroquery versions that are less recent as the column name is "ID"
, you could do a check earlier to get the name of the first column and set it to "id"
.
src/lksearch/CatalogSearch.py
Outdated
return c | ||
|
||
|
||
class CatalogResult(Table): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we remove this, we can remove the functionality to turn a table into a sky coordinate, and then everything can be pandas
dataframes. I think this is the best trade, (we still have the functionality with return_skycoord=True) and we get to use pandas throughout the lksearch
package outputs.
…ed CatalogResult class, returned outputs to DataFrame
lksearch
is our package for searching and finding data. These queries all require the internet to work. It seems that the logical place for catalog queries to live is insidelksearch
.Catalog queries are useful for users, and can be agnostic of any particular file. @rebekah9969 has already opened a PR against
lightkurve
with this functionality, I have fixed the remaining issues with the PR and changed it to output apandas.DataFrame
object, to be more inline with the rest oflksearch
.Here's an example of the functional call with output
There are convenience functions to query each individual catalog by name and it will let you query by pixel distance.