Datasets and data sources that need API #1574

Open
5 tasks
henrykironde opened this issue Apr 4, 2021 · 6 comments

Comments

@henrykironde
Contributor

henrykironde commented Apr 4, 2021

@Aakash3101
Contributor

@henrykironde How should I proceed with this? In the issue mentioned, the dataset is uploaded as an R package on CRAN.

@henrykironde
Contributor Author

This issue is related to the GSoC project https://github.com/weecology/retriever/wiki/GSoC-2021-Project-Ideas#data-retriever-support-for-loginapi.
In this case the data can be downloaded using the R tool.

The first step would be getting to know how the R tool downloads the data.
In the documentation they demonstrate how you can access the data.
Once you can run the tool from an R script that downloads the data, the next step would be to call retriever to install this data.
Since there are two languages (Retriever being Python), we can use rpy2 to call the R script that downloads the data. Retriever will then take it from there.
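
A rough sketch of that hand-off, under stated assumptions: it launches the script with the `Rscript` command through `subprocess` rather than rpy2, so it only needs R on the PATH. The script name `download_data.R`, the convention that the script takes the output directory as its only argument, and both function names are hypothetical and not part of retriever.

```python
import subprocess
from pathlib import Path


def build_r_command(script, out_dir):
    """Return the argv list for running an R download script with Rscript."""
    # --vanilla skips the user's saved workspace and profile files.
    return ["Rscript", "--vanilla", str(script), str(out_dir)]


def download_with_r(script, out_dir, runner=subprocess.run):
    """Run the R script, then return the files it left in out_dir.

    `runner` is injectable so the function can be exercised without R.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the R script exits non-zero.
    runner(build_r_command(script, out_dir), check=True)
    return sorted(p for p in out_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    # Hypothetical script that writes CSV files into the directory it is given.
    for path in download_with_r("download_data.R", "raw_data"):
        print(path)
```

The returned file list is what a retriever install step would then clean up and load.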

There are many data sources that provide APIs or need a login. Retriever can be extended to support them. A good example is the Kaggle API that we added.

Having said that, this issue would best be solved as part of the summer work. But feel free to check it out and let me know.

@Aakash3101
Contributor

I would like to take the GSoC project Data Retriever: Support for Login/API. Can you please send me more information about the API support in retriever? This would help me create a better proposal. Also, can you share an example of how the Kaggle API is called/used in retriever install? Should I discuss my proposal with you privately on Gitter?

@henrykironde
Contributor Author

Yes, you can communicate privately on Gitter.

How the Kaggle API works: https://technowhisp.com/kaggle-api-python-documentation/
Kaggle is a Python package that lets users download Kaggle datasets programmatically, basically like a small Data Retriever. So we can use the Data Retriever to call the package and download the data. Retriever will then clean it up and standardize it.

def download_from_kaggle(
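
The definition above is cut off in this thread, so the following is not retriever's implementation. It is a separate minimal sketch of calling the `kaggle` package directly, assuming the package is installed and a `kaggle.json` credentials file is configured. The function name `fetch_kaggle_dataset` and the dataset slug in the usage line are hypothetical.

```python
from pathlib import Path


def fetch_kaggle_dataset(dataset, dest, api=None):
    """Download and unzip a Kaggle dataset into dest; return the file names.

    `dataset` is a slug of the form "owner/dataset-name". `api` is
    injectable so the function can be exercised without credentials.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    if api is None:
        # Imported lazily: the kaggle package looks for credentials on import.
        from kaggle.api.kaggle_api_extended import KaggleApi

        api = KaggleApi()
        api.authenticate()
    # unzip=True extracts the downloaded archive into dest.
    api.dataset_download_files(dataset, path=str(dest), unzip=True)
    return sorted(p.name for p in dest.iterdir())


if __name__ == "__main__":
    # Hypothetical slug; replace with a real "owner/dataset-name".
    print(fetch_kaggle_dataset("owner/dataset-name", "kaggle_raw"))
```

Once the files are on disk, cleaning and standardizing them is retriever's job, as described above.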

@henrykironde
Contributor Author

henrykironde commented May 11, 2021

CDC data uses an API:
https://data.cdc.gov/Disability-Health/DHDS-Prevalence-of-Disability-Status-and-Types-by-/qjg3-6acf

The API client has a way to list the datasets: https://github.com/xmunoz/sodapy. That means we can add all their datasets with little effort.
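
A minimal sketch of that listing step with sodapy, assuming the package is installed and that unauthenticated (throttled) access is acceptable. The helper names are hypothetical, and the record layout read in `list_datasets` (a `resource` dict with `id` and `name`) is an assumption about the catalog response, so the code reads it defensively.

```python
def list_datasets(domain, limit=10, client=None):
    """Return (id, name) pairs for datasets published on a Socrata domain."""
    if client is None:
        from sodapy import Socrata

        # None means no app token, so requests are rate limited.
        client = Socrata(domain, None)
    found = []
    for entry in client.datasets(limit=limit):
        # Assumed layout: each catalog entry carries a "resource" dict.
        resource = entry.get("resource", {})
        found.append((resource.get("id"), resource.get("name")))
    return found


def preview_rows(domain, dataset_id, limit=5, client=None):
    """Fetch the first few rows of one dataset as a list of dicts."""
    if client is None:
        from sodapy import Socrata

        client = Socrata(domain, None)
    return client.get(dataset_id, limit=limit)


if __name__ == "__main__":
    for dataset_id, name in list_datasets("data.cdc.gov", limit=5):
        print(dataset_id, name)
    # Dataset id taken from the CDC link above.
    print(preview_rows("data.cdc.gov", "qjg3-6acf", limit=2))
```

Because every Socrata portal exposes the same interface, only the domain string would change between data.cdc.gov and other portals.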

@Aakash3101
Contributor

It uses the same API as opendata.utah.gov, which is the Socrata API (accessed through sodapy). So it would not add any new API, as I have already covered the Socrata API in my proposal for the project.
