
getpapers TUTORIAL for EUPMC search

kareenasingh edited this page May 26, 2020 · 9 revisions

Retrieving papers from EUPMC using getpapers

getpapers is a simple but powerful tool for querying repositories of scholarly articles with a one-line command. Full instructions for installation and use are given at getpapers OVERVIEW; please download and install it before continuing. Our first search query will be 'viral epidemics'.

Step 1

When we type getpapers in the terminal without any options, we get the following usage information:

Usage: getpapers [options]

Options:

-h, --help                output usage information
-V, --version             output the version number
-q, --query <query>       search query (required)
-o, --outdir <path>       output directory (required - will be created if not found)
--api <name>              API to search [eupmc, crossref, ieee, arxiv] (default: eupmc)
-x, --xml                 download fulltext XMLs if available
-p, --pdf                 download fulltext PDFs if available
-s, --supp                download supplementary files if available
-t, --minedterms          download text-mined terms if available
-l, --loglevel <level>    amount of information to log (silent, verbose, info*, data, warn, error, or debug)
-a, --all                 search all papers, not just open access
-n, --noexecute           report how many results match the query, but don't actually download anything
-f, --logfile <filename>  save log to specified file in output directory as well as printing to terminal
-k, --limit <int>         limit the number of hits and downloads
--filter <filter object>  filter by key value pair, passed straight to the crossref api only
-r, --restart             restart file downloads after failure

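To see how many results match the query, use the command

getpapers -q "viral epidemics" -n

The query is quoted so that both words are passed as a single search term. -n is no-execute mode: it reports how many results match the query but does not download anything. Because -a is not given, only open access papers are counted.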
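
As a rough sketch of how the count could be picked out of the log in a script, the helper below reads getpapers output on standard input and prints the number from the "Found ... results" line. The function name hit_count is hypothetical, and the line format is assumed to match the sample log shown in this tutorial ("info: Found 16918 open access results"); other versions of getpapers may word it differently. It also assumes a POSIX shell with sed, such as Git Bash.

```shell
# hit_count: hypothetical helper, not part of getpapers.
# Reads log text on stdin and prints the first number that appears in a
# "Found <N> ... results" line (format assumed from the sample log).
hit_count() {
  sed -n 's/.*Found \([0-9][0-9]*\) .*results.*/\1/p' | head -n 1
}

# Demonstration on the log line quoted in this tutorial:
printf 'info: Found 16918 open access results\n' | hit_count   # prints 16918

# With getpapers installed, the same helper could be fed the real log:
#   getpapers -q "viral epidemics" -n 2>&1 | hit_count
```

The 2>&1 in the commented usage line is there only because it is not assumed here which stream getpapers logs to.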

Let's first download 100 papers in XML format by issuing the command

getpapers -q "viral epidemics" -x -k 100 --outdir test

The --outdir (-o) option creates the directory test to hold the articles. The -x flag downloads XML copies of the articles (these can be turned into HTML later using ami). The -p flag downloads PDF copies. These have the same text but in a different format. The -k 100 option limits the number of hits and downloads to 100.
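
To make the relationship between format and flag explicit, here is a minimal sketch of a helper that prints (but does not run) the getpapers command for a given query, output directory, limit, and format. The name getpapers_cmd and the format names xml and pdf are illustrative assumptions; only the flags -q, -o, -k, -x, and -p come from the usage text above.

```shell
# getpapers_cmd: hypothetical helper that only prints a command line.
# Arguments: QUERY OUTDIR LIMIT FORMAT (FORMAT is "xml" or "pdf").
getpapers_cmd() {
  query=$1
  outdir=$2
  limit=$3
  format=$4
  case $format in
    xml) flag=-x ;;   # fulltext XML
    pdf) flag=-p ;;   # fulltext PDF
    *) echo "unknown format: $format" >&2; return 1 ;;
  esac
  # The query is wrapped in double quotes so it stays a single argument.
  printf 'getpapers -q "%s" -o %s -k %s %s\n' "$query" "$outdir" "$limit" "$flag"
}

getpapers_cmd "viral epidemics" test 100 xml
getpapers_cmd "viral epidemics" test_pdf 100 pdf
```

Printing the command first makes it easy to check the quoting and the flags before starting a download.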

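The terminal output will look something like this (this sample run was made with -k -10 and without -x, so no full texts are downloaded):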

C:\Users\Kareena\Desktop\openVirus\cmder
λ getpapers --query "viral epidemics" -k -10 --outdir test
info: Searching using eupmc API
info: Found 16918 open access results
warn: This version of getpapers wasn't built with this version of the EuPMC api in mind
warn: getpapers EuPMCVersion: 5.3.2 vs. 6.2 reported by api
info: Limiting to -10 hits
Retrieving results [------------------------------] 0% (eta 0.0s)
info: Done collecting results
info: limiting hits
info: Saving result metadata
info: Full EUPMC result metadata written to eupmc_results.json
info: Individual EUPMC result metadata records written
info: Extracting fulltext HTML URL list (may not be available for all articles)
info: Fulltext HTML URL list written to eupmc_fulltext_html_urls.txt
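
The log above reports "Limiting to -10 hits" and a progress bar stuck at 0%, because the limit was typed as -10. A small guard such as the sketch below could catch that before getpapers is started. The function name is_positive_int is hypothetical and the check is plain POSIX shell; it is not something getpapers itself provides.

```shell
# is_positive_int: hypothetical helper.
# Succeeds only if its argument is a whole number greater than zero.
is_positive_int() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit such as "-"
  esac
  [ "$1" -gt 0 ]
}

limit=-10
if is_positive_int "$limit"; then
  echo "limit $limit looks fine"
else
  echo "refusing limit $limit: use a positive number such as 100"
fi
```

With limit=-10 the else branch is taken; with limit=100 the check passes and the value could be handed to -k.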

Step 2

So far we only have the XML full-text files, together with the list of HTML URL links to the articles on EUPMC (eupmc_fulltext_html_urls.txt).

Let's download the above 100 papers in PDF format by giving the command

getpapers -q "viral epidemics" -p -k 100 --outdir test_pdf

These results will all be in PDF format and stored in the test_pdf folder named by --outdir. We now have the papers in two formats: PDF and XML.
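
To check how many full texts actually arrived in each format, something like the following sketch could be used. It assumes that getpapers stores each paper in its own subfolder of the output directory with the files named fulltext.xml and fulltext.pdf; if your version uses different names, adjust the pattern. The function name count_fulltext is hypothetical.

```shell
# count_fulltext: hypothetical helper.
# Arguments: DIRECTORY EXTENSION (for example "xml" or "pdf").
# Prints how many files named fulltext.<EXTENSION> exist under DIRECTORY.
count_fulltext() {
  dir=$1
  ext=$2
  find "$dir" -type f -name "fulltext.$ext" | wc -l | tr -d ' '
}

# Usage, with the directories from this tutorial:
#   count_fulltext test xml
#   count_fulltext test_pdf pdf
```

A count lower than the -k limit is not necessarily an error, since the usage text says full texts are downloaded only "if available".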
