Rich Context API integrations for federating discovery services and metadata exchange across multiple scholarly infrastructure providers.
Development of the Rich Context knowledge graph uses this library to:
- identify dataset links to research publications
- locate open access publications
- reconcile journal references
- reconcile author profiles
- reconcile keyword taxonomy
This library has been guided by collaborative work on community building and metadata exchange to improve Scholarly Infrastructure, held at the 2019 Rich Context Workshop.
Prerequisites:
- Python 3.x
- Beautiful Soup
- Biopython.Entrez
- Crossref Commons
- Dimensions CLI
- Requests
- Requests-Cache
- Selenium
- xmltodict
To install from PyPI:
pip install richcontext.scholapi
If you install directly from this Git repo, be sure to install the dependencies as well:
pip install -r requirements.txt
Then copy the configuration file template rc_template.cfg to rc.cfg and populate it with your credentials.
NB: be careful not to commit the rc.cfg file in Git since by definition it will contain sensitive data, e.g., your passwords.
Parameters used in the configuration file include:
parameter | value |
---|---|
chrome_exe_path | path/to/chrome.exe |
core_apikey | CORE API key |
dimensions_password | Dimensions API password |
elsevier_api_key | Elsevier API key |
email | personal email address |
orcid_secret | ORCID API key |
repec_token | RePEc API token |
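Before running anything, it can help to check which of these parameters are still unset. The following is a minimal sketch using only the Python standard library. The helper name `missing_parameters` and the sample file content are hypothetical, and the section name `[DEFAULT]` is an assumption about how rc.cfg is laid out; the helper looks across all sections so it does not depend on that choice.

```python
import configparser

# Parameter names taken from the table above.
EXPECTED_KEYS = [
    "chrome_exe_path",
    "core_apikey",
    "dimensions_password",
    "elsevier_api_key",
    "email",
    "orcid_secret",
    "repec_token",
]


def missing_parameters(config_text):
    """Return the expected parameters that are absent or empty."""
    # Disable interpolation so a "%" inside a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    # Collect values from the default section and from every named section.
    found = dict(parser.defaults())
    for section in parser.sections():
        found.update(parser.items(section))

    return [key for key in EXPECTED_KEYS if not found.get(key, "").strip()]


# Hypothetical file content, for illustration only.
sample = """
[DEFAULT]
email = someone@example.com
core_apikey = abc123
repec_token =
"""

for key in missing_parameters(sample):
    print("not configured:", key)
```

To check a real file, pass its text instead of the sample, for example `missing_parameters(open("rc.cfg").read())`. A missing key is not necessarily an error, since you only need credentials for the APIs you plan to call.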
Download the ChromeDriver webdriver for the Chrome browser to enable use of Selenium.
This will be run in a "headless" mode.
For a good (though slightly dated) tutorial for installing and testing Selenium on Ubuntu Linux, see: https://christopher.su/2015/selenium-chromedriver-ubuntu/
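To confirm that Selenium and ChromeDriver work together before using the library, a quick standalone check can be sketched as follows. This is not how the library itself starts the browser; the helper names `build_chrome_flags` and `make_headless_driver` are hypothetical, and the sketch assumes that the `chromedriver` executable is on your `PATH`.

```python
def build_chrome_flags(headless=True, window_size=(1280, 1024)):
    """Build the command-line flags passed to Chrome."""
    flags = []
    if headless:
        # Run Chrome without opening a visible window.
        flags.append("--headless")
    flags.append("--window-size={},{}".format(*window_size))
    return flags


def make_headless_driver():
    """Start Chrome through Selenium using the flags above."""
    # Imported here so the flag helper can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for flag in build_chrome_flags():
        options.add_argument(flag)

    # Assumes chromedriver can be found on PATH.
    return webdriver.Chrome(options=options)


print(build_chrome_flags())
```

If the setup is working, calling `driver = make_headless_driver()`, then `driver.get("https://example.com")` and `print(driver.title)`, should print the page title; call `driver.quit()` afterwards to stop the browser process.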
from richcontext import scholapi as rc_scholapi

# initialize the federated API access
schol = rc_scholapi.ScholInfraAPI(config_file="rc.cfg", logger=None)
source = schol.openaire

# search parameters for example publications
title = "Deal or no deal? The prevalence and nutritional quality of price promotions among U.S. food and beverage purchases."

# run it...
if source.has_credentials():
    response = source.title_search(title)

    # report results
    if response.message:
        # error case
        print(response.message)
    else:
        print(response.meta)
        source.report_perf(response.timing)
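If you run the same search in several places, the credential check and error handling above can be wrapped in a small helper. The sketch below relies only on the names shown in the example (`has_credentials`, `title_search`, `response.message`, `response.meta`). The helper `search_title` and the `StubSource` class are hypothetical and are not part of the library; the stub exists only so that the sketch runs without network access or credentials.

```python
from types import SimpleNamespace


def search_title(source, title):
    """Return a (meta, error) pair for a title search on one source."""
    if not source.has_credentials():
        return None, "no credentials configured for this source"

    response = source.title_search(title)

    if response.message:
        # error case: the source reported a problem
        return None, response.message

    return response.meta, None


class StubSource:
    """Hypothetical stand-in for an API source, so the sketch runs offline."""

    def __init__(self, credentials=True, message=None, meta=None):
        self._credentials = credentials
        self._message = message
        self._meta = meta

    def has_credentials(self):
        return self._credentials

    def title_search(self, title):
        return SimpleNamespace(message=self._message, meta=self._meta, timing=0.0)


stub = StubSource(meta={"title": "Deal or no deal?"})
meta, error = search_title(stub, "Deal or no deal?")
print(meta, error)
```

With a configured `ScholInfraAPI` instance, the same helper could be called as `search_title(schol.openaire, title)`. Returning the error alongside the metadata keeps the caller in control of whether a failed lookup should be logged or skipped.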
First, be sure that you're testing the source code in this repo and not an installed copy of the library.
Then run unit tests on the APIs for which you have credentials and generate a coverage report:
coverage run -m unittest discover
Then create GitHub issues among the submodules for any failed tests.
You can also print the coverage report and upload it via:
coverage report
bash <(curl -s https://codecov.io/bash) -t @.cc_token
Test coverage reports can be viewed at https://codecov.io/gh/Coleridge-Initiative/RCApi
APIs used to retrieve metadata:
- PubMed family
- Scholix family
- OA family
- Misc.
See the coding examples in the test.py unit test for usage patterns per supported API.
ChromeDriver
If you encounter an exception about the ChromeDriver version, for example:
selenium.common.exceptions.SessionNotCreatedException: Message: session not created:
This version of ChromeDriver only supports Chrome version 78
Then check your instance of the Chrome browser to find its release number, then go to https://chromedriver.chromium.org/downloads to download the corresponding required version of ChromeDriver.
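The exception above is raised when the major release numbers of Chrome and ChromeDriver differ. A minimal sketch of that comparison follows; the helper names and the sample version strings are hypothetical, chosen to resemble what `chromedriver --version` and the Chrome browser typically print.

```python
import re


def major_version(version_output):
    """Extract the major release number from a version string."""
    match = re.search(r"(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError("no version number found in: " + repr(version_output))
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """True when Chrome and ChromeDriver share the same major release."""
    return major_version(chrome_output) == major_version(driver_output)


# Hypothetical sample output, for illustration only.
chrome = "Google Chrome 79.0.3945.88"
driver = "ChromeDriver 78.0.3904.70"

if not versions_match(chrome, driver):
    print("download ChromeDriver", major_version(chrome))
```

The real strings can be captured with `subprocess.run(["chromedriver", "--version"], capture_output=True, text=True).stdout`; the name of the Chrome executable varies by platform, so that part is left to the reader.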
For more background about open access publications see:
Piwowar H, Priem J, Larivière V, Alperin JP, Matthias L, Norlander B, Farley A, West J, Haustein S. 2017. The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles. PeerJ Preprints 5:e3119v1. https://doi.org/10.7287/peerj.preprints.3119v1
If you'd like to contribute, please see our listings of good first issues.
For info about joining the AI team working on Rich Context, see https://github.com/Coleridge-Initiative/RCGraph/blob/master/SKILLS.md
Contributors: @ceteri, @IanMulvany, @srand525, @ernestogimeno, @lobodemonte, plus many thanks for the inspiring 2019 Rich Context Workshop notes by @metasj, and guidance from @claytonrsh, @Juliaingridlane.