getpapers TUTORIAL for EUPMC search
getpapers is a simple, powerful tool for querying repositories of scholarly articles with a one-line command.
Full instructions for installation and use are given at getpapers
OVERVIEW
Please download and install getpapers first.
Our first search query will be "viral epidemics".
Typing getpapers in a terminal without any options prints the following usage information:
Usage: getpapers [options]
Options:
-h, --help output usage information
-V, --version output the version number
-q, --query <query> search query (required)
-o, --outdir <path> output directory (required - will be created if not found)
--api <name> API to search [eupmc, crossref, ieee, arxiv] (default: eupmc)
-x, --xml download fulltext XMLs if available
-p, --pdf download fulltext PDFs if available
-s, --supp download supplementary files if available
-t, --minedterms download text-mined terms if available
-l, --loglevel <level> amount of information to log (silent, verbose, info*, data, warn, error, or debug)
-a, --all search all papers, not just open access
-n, --noexecute report how many results match the query, but don't actually download anything
-f, --logfile <filename> save log to specified file in output directory as well as printing to terminal
-k, --limit <int> limit the number of hits and downloads
--filter <filter object> filter by key value pair, passed straight to the crossref api only
-r, --restart restart file downloads after failure
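Use the command getpapers -q "viral epidemics" -n to see how many results match. -n is no-execute mode: it reports the number of matching results without downloading anything. By default only open access results are counted.
Let's first download 100 papers in XML format into the directory test by issuing the command getpapers -q "viral epidemics" -x -k 100 --outdir test
The -o (--outdir) option names the output directory for the articles (here test), which is created if it does not exist.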
The -x downloads XML copies of the articles (these can be turned into HTML later using ami).
The -p downloads PDF copies. These have the same text but in a different format.
It will look something like this:
C:\Users\Kareena\Desktop\openVirus\cmder
λ getpapers --query "viral epidemics" -k -10 --outdir test
info: Searching using eupmc API
info: Found 16918 open access results
warn: This version of getpapers wasn't built with this version of the EuPMC api in mind
warn: getpapers EuPMCVersion: 5.3.2 vs. 6.2 reported by api
info: Limiting to -10 hits
Retrieving results [------------------------------] 0% (eta 0.0s)
info: Done collecting results
info: limiting hits
info: Saving result metadata
info: Full EUPMC result metadata written to eupmc_results.json
info: Individual EUPMC result metadata records written
info: Extracting fulltext HTML URL list (may not be available for all articles)
info: Fulltext HTML URL list written to eupmc_fulltext_html_urls.txt
The run shown above was issued without -x, so it only writes the result metadata and the file eupmc_fulltext_html_urls.txt, which lists the EuPMC full-text HTML links.
Now let's download the PDF versions of the same 100 papers with the command getpapers -q "viral epidemics" -p -k 100 --outdir test_pdf
These results will all be PDFs, stored in the test_pdf folder. Now we have the papers in two formats: PDF and XML.
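To check how many full-text files actually arrived, the output folders can be counted from the shell. The sketch below is only an illustration: count_fulltext is a hypothetical helper, not part of getpapers, and it assumes that each paper is stored in its own subfolder of the output directory as fulltext.xml or fulltext.pdf. Adjust the file names if your getpapers version lays files out differently.

```shell
#!/bin/sh
# count_fulltext is a hypothetical helper, not part of getpapers.
# Assumption: each paper is saved as <outdir>/<paper id>/fulltext.<ext>.
# Usage: count_fulltext <outdir> <xml|pdf>
count_fulltext() {
    outdir=$1
    ext=$2
    if [ -d "$outdir" ]; then
        # Count matching files; tr strips the padding some wc versions add.
        find "$outdir" -type f -name "fulltext.$ext" | wc -l | tr -d ' '
    else
        # A missing directory simply means nothing was downloaded there.
        echo 0
    fi
}

echo "XML files in test:     $(count_fulltext test xml)"
echo "PDF files in test_pdf: $(count_fulltext test_pdf pdf)"
```

The counts can be lower than the -k limit, since full text is downloaded only "if available".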
Let us first see how many open access results are found for the query n95 face masks.
Type:
getpapers -q "n95 face masks" -n
Always enclose the query in double quotes; otherwise the shell passes each word as a separate argument instead of one query.
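The effect of the quotes can be seen without running getpapers at all. The following is a minimal sketch for a POSIX-style shell; count_args is a hypothetical helper that only prints how many arguments it was given. Quoting details differ slightly in the Windows command prompt, but double quotes group words there too.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

# Unquoted: -q, n95, face and masks arrive as four separate arguments.
unquoted=$(count_args -q n95 face masks)
# Quoted: -q and the whole query arrive as two arguments.
quoted=$(count_args -q "n95 face masks")

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted arguments"
```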
The result of the getpapers command will look like this:
info: Searching using eupmc API
info: Running in no-execute mode, so nothing will be downloaded
info: Found 869 open access results
warn: This version of getpapers wasn't built with this version of the EuPMC api in mind
warn: getpapers EuPMCVersion: 5.3.2 vs. 6.2 reported by api
To retrieve all papers on n95 face masks, not only the open access ones, use the option
-a, --all search all papers, not just open access
For example, to fetch up to 100 of them:
getpapers -q "n95 face masks" -a -k 100 -o n95
The result will look like this:
info: Searching using eupmc API
info: Found 1074 results
warn: This version of getpapers wasn't built with this version of the EuPMC api in mind
warn: getpapers EuPMCVersion: 5.3.2 vs. 6.2 reported by api
info: Limiting to 100 hits
Retrieving results [=====================] 100% (eta 0.0s)
info: Done collecting results
info: limiting hits
info: Saving result metadata
info: Full EUPMC result metadata written to eupmc_results.json
info: Individual EUPMC result metadata records written
info: Extracting fulltext HTML URL list (may not be available for all articles)
info: Fulltext HTML URL list written to eupmc_fulltext_html_urls.txt
As issued, this writes the result metadata and the fulltext HTML URL list to the directory n95, as named by -o; add -x to also download the .xml files.
To download the open access PDF files, use
getpapers -q "n95 face masks" -p -k 100 -o n95
The result will look like this:
https://drive.google.com/file/d/1JMNgzJTajFqNg1XWWgpX3HLIh4ruZkrm/view?usp=sharing
A query can also be restricted by publication year. For example, for viral epidemics:
getpapers -q "viral epidemics PUB_YEAR:[2018 TO 2019]" -k 100 -p -x -o viral_epidemics
PUB_YEAR:[2018 TO 2019] restricts the open access results to papers published in the given range of years.
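When the same search is repeated for several year ranges, it can help to build the query string in one place. The sketch below assumes a POSIX-style shell; year_query is a hypothetical helper, not part of getpapers, and the script only prints the command it would run.

```shell
#!/bin/sh
# year_query is a hypothetical helper, not part of getpapers.
# Usage: year_query "<search terms>" <first year> <last year>
year_query() {
    printf '%s PUB_YEAR:[%s TO %s]\n' "$1" "$2" "$3"
}

query=$(year_query "viral epidemics" 2018 2019)

# Print the command for inspection; the query stays inside double quotes
# so that getpapers receives it as a single argument.
echo "getpapers -q \"$query\" -k 100 -p -x -o viral_epidemics"
```

Running the printed command with -n in place of the download options first is a cheap way to check the hit count before fetching anything.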
The results of the getpapers search will look like this:
https://drive.google.com/file/d/1Nc3UtBkIUTmG6VB2N4UaALpHQnC7ya3F/view?usp=sharing
Beta Tester: Ambreen Hamadani
Specific journals can be searched by adding a Journal: term to the getpapers --query string.
E.g.: getpapers --query "viral epidemics Journal:medrxiv" --limit 100 --outdir jun_14_new_3
The output will contain only papers from medRxiv.
Command-line output:
C:\Users\xxx>getpapers --query "viral epidemics Journal:medrxiv" --limit 100 --outdir jun_14_new_3
info: Searching using eupmc API
info: Found 10 open access results
warn: This version of getpapers wasn't built with this version of the EuPMC api in mind
warn: getpapers EuPMCVersion: 5.3.2 vs. 6.3 reported by api
Retrieving results [==============================] 100% (eta 0.0s)
info: Done collecting results
info: Saving result metadata
info: Full EUPMC result metadata written to eupmc_results.json
info: Individual EUPMC result metadata records written
info: Extracting fulltext HTML URL list (may not be available for all articles)
info: Fulltext HTML URL list written to eupmc_fulltext_html_urls.txt