saifisana edited this page Jun 27, 2020 · 40 revisions

getpapers

primary URL

https://github.com/ContentMine/getpapers

purpose

queries a repository through its RESTful API and downloads content in bulk

documentation

overview and installation: https://github.com/ContentMine/getpapers/blob/master/README.md

For example: https://github.com/petermr/tigr2ess/blob/master/getpapers/OVERVIEW.md

installation

Installation of getpapers involves different steps for each operating system.

Instructions followed: https://github.com/ContentMine/getpapers/blob/master/README.md

See also: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md

example of use

Simple: https://github.com/ContentMine/getpapers/blob/master/README.md

For a full example: https://github.com/petermr/tigr2ess/tree/master/getpapers

comments

  • getpapers uses a headless browser (PhantomJS), which still works but is no longer maintained. It is customised for EPMC, IEEE, Crossref and (possibly) arXiv. It needs a RESTful API.
  • the query syntax differs between sites, and so does the handling of quote/escape characters (" or ')
  • the default query format is EPMC
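The quoting comment above can be illustrated with Python's standard `shlex` module, which splits a command line the way a POSIX shell does. This is a sketch, not part of getpapers; Windows `cmd` follows different rules in detail (notably, it does not treat single quotes as quoting), but it also keeps double-quoted words together.

```python
import shlex

# Without quotes, "epidemics" becomes a separate argument, so -q receives only "viral".
unquoted = shlex.split("getpapers -q viral epidemics -k 50")

# Double quotes keep the phrase together as a single -q argument.
quoted = shlex.split('getpapers -q "viral epidemics" -k 50')

# In a POSIX shell, single quotes pass inner double quotes through unchanged,
# e.g. to send an exact-phrase query on to EPMC.
phrase = shlex.split("""getpapers -q '"viral epidemics" AND masks' -k 50""")

for argv in (unquoted, quoted, phrase):
    print(argv)
```

The first list shows why an unquoted multi-word query silently searches for only its first word.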

usefulness

helps download large numbers of full-text papers in bulk in a short time.

Installation problems

Users may run into errors while installing getpapers. Follow the instructions; if an installation problem persists, check the issue section for an existing issue that matches it, and otherwise post a new issue describing it.

Usage problems

If you run into a problem while using getpapers, check the existing issues or create a new one describing it.

Windows

More examples

Some EPMC queries

(please add some queries involving DATE, OR, AND, NOT)
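As a starting point for the queries requested above, here is a small hypothetical Python helper (not part of getpapers) that composes EPMC query strings with AND, OR, NOT and a date range. AND, OR and NOT are Europe PMC's boolean operators; the date field `FIRST_PDATE` and its `[from TO to]` range form are an assumption based on the Europe PMC search syntax and should be checked against its documentation. Test any generated query with `-n` first to see how many hits it returns.

```python
def term(text):
    """Wrap multi-word terms in double quotes so EPMC searches them as a phrase."""
    return f'"{text}"' if " " in text else text


def epmc_query(all_of=(), any_of=(), none_of=(), date_from=None, date_to=None):
    """Compose an EPMC query string (hypothetical helper, not a getpapers API)."""
    parts = [term(t) for t in all_of]  # every term must match (AND)
    if any_of:
        # at least one of these must match (OR), grouped in parentheses
        parts.append("(" + " OR ".join(term(t) for t in any_of) + ")")
    if date_from and date_to:
        # assumed EPMC field: first publication date, inclusive range
        parts.append(f"FIRST_PDATE:[{date_from} TO {date_to}]")
    query = " AND ".join(parts)
    for t in none_of:
        query += f" NOT {term(t)}"  # exclude records matching this term
    return query


q = epmc_query(
    all_of=["COVID-19", "masks"],
    any_of=["children", "school"],
    none_of=["influenza"],
    date_from="2020-01-01",
    date_to="2020-06-30",
)
print(q)
```

This prints `COVID-19 AND masks AND (children OR school) AND FIRST_PDATE:[2020-01-01 TO 2020-06-30] NOT influenza`. Because phrase terms contain double quotes, how the whole string must be escaped on the command line depends on your shell.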

tester experiences:

tester 1

Kareena Singh

operating system

Windows 10

INSTALLATION PROCESS

(1) Installation of nvm-windows

source of instructions

https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md

steps of installation

Go to the downloads page and download the latest version of nvm-setup.zip.

Unzip the downloaded file and run the included installer.

installation

successfully installed and run

(2) Installation of node

source of instructions

https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md

steps of installation

Open your command prompt, and run the following commands one after the other.

nvm install 7
nvm use 7.10.1

installation of node

successful

installation problems

The following installation problem occurred when I ran the node installation command at the command line:

Error: Access to the registry path is denied
  • reason: insufficient privileges to install (requires administrator ("root") permission on Windows)

  • solution: run the command prompt as administrator and repeat the command

test of installation

successful

node --version
version 11.11.0

(3) Installation of getpapers

source of instructions

https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md

installation steps

Run the following command at command prompt:

npm install --global getpapers

Then run the command getpapers at the command prompt; it should print its usage information and options.

installation problems

none reported

test of installation

To test the installation, run the command getpapers --version. If it prints the following, the installation was successful:

0.4.17
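To check node, npm and getpapers in one go, a small Python sketch can look each tool up on PATH with `shutil.which` and print what its `--version` flag reports. The helper name `tool_version` is hypothetical, and the versions printed depend on your installation.

```python
import shutil
import subprocess


def tool_version(name, flag="--version"):
    """Return what `name flag` prints, or None if `name` is not found on PATH."""
    # shutil.which honours PATHEXT on Windows, so npm's getpapers.cmd shim is found too.
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path, flag], capture_output=True, text=True)
    return (result.stdout or result.stderr).strip()


for tool in ("node", "npm", "getpapers"):
    version = tool_version(tool)
    print(f"{tool}: {version if version is not None else 'not found on PATH'}")
```

A "not found on PATH" line usually means the install step for that tool did not complete, or the command prompt was opened before PATH was updated.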

Tester 2

Lakshmi Devi Priya

Operating System

Windows 10

Installation of node

source of instruction

Instructions from: https://github.com/ContentMine/blob/master/README.md

installation of node

Successful

test of installation

Successful

C:\Users>node -v
v12.16.3

Installation of getpapers

source of instruction

Instructions from: https://github.com/ContentMine/blob/master/README.md

installation of getpapers

Successful

installation problems

C:\Users>$ npm install --global getpapers

'$' is not recognized as an internal or external command,
operable program or batch file.
  • '$' is not part of the command (it is the UNIX shell prompt).

  • So just run:

npm install --global getpapers
  • getpapers is installed.
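Because README examples often show the prompt in front of each command, a small Python sketch can strip a copied UNIX `$ ` prompt, or a Windows `C:\...>` prompt, before the command is pasted. The function `strip_prompt` and its pattern are illustrative and only handle these two prompt styles.

```python
import re

# A copied UNIX prompt ("$ ") or Windows prompt ("C:\Users>"), possibly both, at line start.
PROMPT = re.compile(r"^\s*(?:\$\s+|[A-Za-z]:\\[^>]*>\s*)+")


def strip_prompt(line):
    """Return the command with any leading prompt removed."""
    return PROMPT.sub("", line, count=1)


print(strip_prompt("$ npm install --global getpapers"))
print(strip_prompt(r"C:\Users>$ npm install --global getpapers"))
```

Requiring whitespace after `$` keeps commands that legitimately start with a variable such as `$HOME` intact.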

test of installation

Run:

C:\Users>getpapers --help
  • The command options for getpapers are displayed.

  • getpapers is installed.

Use of getpapers

Followed example from:

https://github.com/petermr/tigr2ess/tree/master/getpapers

To run a search query, use the following syntax:
getpapers -q <query> -n -k 100

-q, --query : search query (required)

-n, --noexecute : only reports how many results match the query, but doesn't actually download anything

For example, for the query COVID-19, use:

getpapers -q COVID-19 -n -k <int>

The results will be shown as below:

https://drive.google.com/file/d/1DP0_xcjC5GMQ2CflM7TQUoyIO3J3MyCW/view?usp=drivesdk

Output: Found 46887 open access results. Downloading this many results is impractical, so the number of downloads should be limited.

-k, --limit : limits the number of hits and downloads

<int> is an integer: the maximum number of papers to download.
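To pick a value for -k from a dry run, a Python sketch can read the hit count out of the `-n` output. It assumes the output contains a line of the form reported above ("Found N open access results"); where getpapers prints that line (stdout or stderr) is not established here, so the function simply takes the captured text as a string. `hit_count`, `choose_limit` and the cap of 100 are hypothetical.

```python
import re


def hit_count(output):
    """Return N from a 'Found N open access results' line, or None if no such line."""
    match = re.search(r"Found (\d+) open access results", output)
    return int(match.group(1)) if match else None


def choose_limit(hits, cap=100):
    """Download at most `cap` papers, and never more than the query matched."""
    return min(hits, cap)


hits = hit_count("Found 46887 open access results")
print(hits, choose_limit(hits))  # 46887 100
```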

To download the files, use the following syntax
getpapers -q <query> -k <int> -o <path> -x -p

-o, --outdir : output directory (required; will be created if not found).

This option gives the path of the directory in which the downloaded files are saved.

-p, --pdf : downloads fulltext PDFs if available.

-x, --xml : downloads fulltext XMLs if available.

Thus, for the query COVID-19 the syntax

getpapers -q COVID-19 -k 100 -o covid -x -p

gives the result as follows.

https://drive.google.com/file/d/1H5k8ZooTFD1dHnMOK6-eTxckc95iJ6lG/view?usp=drivesdk

.xml files in the resultant folder are both machine-readable and human-readable.

As expected, 100 .xml files were downloaded, but only 76 .pdf files, presumably because -p only downloads PDFs that are available.
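To see which papers are missing a PDF after a run like this, a Python sketch can walk the output directory. It assumes a per-paper layout of one PMC... subfolder per paper containing `fulltext.xml` and/or `fulltext.pdf` when those were downloaded; these file names are an assumption, so adjust them if your output differs. `count_fulltexts` is a hypothetical helper.

```python
from pathlib import Path


def count_fulltexts(outdir):
    """Summarise downloaded full texts in a getpapers output directory (assumed layout)."""
    papers = sorted(
        d for d in Path(outdir).iterdir() if d.is_dir() and d.name.startswith("PMC")
    )
    has_xml = [d for d in papers if (d / "fulltext.xml").is_file()]
    has_pdf = [d for d in papers if (d / "fulltext.pdf").is_file()]
    return {
        "papers": len(papers),
        "xml": len(has_xml),
        "pdf": len(has_pdf),
        # folders with no PDF, e.g. because none was available for that paper
        "missing_pdf": [d.name for d in papers if d not in has_pdf],
    }
```

For the run above, `count_fulltexts("covid")` would report the XML and PDF totals and list the PMC folders that have no PDF.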

Tester 3

Pruthiv rajan

Operating System

Windows 10

Installation of node

source of instruction

Instructions from: https://github.com/ContentMine/blob/master/README.md

installation of node

Successful

test of installation

Successful

C:\Users>node -v
v12.16.3

Installation of getpapers

source of instruction

Instructions from: https://github.com/ContentMine/blob/master/README.md

installation of getpapers

Successful

installation problems

No problems.

test of installation

Run:

C:\Users>getpapers --help
  • The command options for getpapers are displayed.

  • getpapers is installed.

Use of getpapers

Followed example from:

https://github.com/petermr/tigr2ess/tree/master/getpapers

To run a search query, use the following syntax:
getpapers -q <query> -n -k 100

-q, --query : search query (required)

-n, --noexecute : only reports how many results match the query, but doesn't actually download anything

For example, for the query COVID-19, use:

getpapers -q COVID-19 -n -k <int>

The results will be shown as below:

https://drive.google.com/file/d/1DP0_xcjC5GMQ2CflM7TQUoyIO3J3MyCW/view?usp=drivesdk

Output: Found 46887 open access results. Downloading this many results is impractical, so the number of downloads should be limited.

-k, --limit : limits the number of hits and downloads

<int> is an integer: the maximum number of papers to download.

To download the files, use the following syntax
getpapers -q <query> -k <int> -o <path> -x -p

-o, --outdir : output directory (required; will be created if not found).

This option gives the path of the directory in which the downloaded files are saved.

-p, --pdf : downloads fulltext PDFs if available.

-x, --xml : downloads fulltext XMLs if available.

Thus, for the query human genome project, the syntax

getpapers -q "human genome project" -k 100 -o covid -x -p

gives the result as follows.

As expected, 100 .xml files were downloaded, but only 84 .pdf files.


Tester 4:

Name: Ambreen Hamadani

Operating System: Windows 10

INSTALLATION PROCESS

Installation of Node.js

Preinstalled on the system

Installation of getpapers

Source of Instruction: ContentMine / getpapers

Steps in the Installation:

  1. Open Command Prompt
  2. Run the command npm install --global getpapers (as described at https://github.com/ContentMine/getpapers)

Installation: Successful

Test of the Installation:

  1. Type getpapers in Command Prompt
  2. Usage and options displayed

Successful installation

Usage of getpapers: Test 1

The tool was used to retrieve 100 papers on the topic 'masks', with the output directory specified as 'test1'.

Command used: getpapers --query 'masks ' --limit 100 --outdir test1

Results

  1. A new directory (test1) created within the home directory
  2. 100 folders (PMC###) created within 'test1' each containing a JSON file (eupmc_result)
  3. 1 text file (eupmc_fulltext_html_urls) containing the URLs of all downloaded documents
  4. 1 JSON file (eupmc_results) created

Command line output:
  5. 0 error messages
  6. No warnings

Query results for getting papers on masks (limit 100)
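To inspect what was retrieved without opening each folder, a Python sketch can read the results file listed above. It assumes the file is named `eupmc_results.json` and holds a JSON array of EPMC records with keys such as `pmcid` and `title`; because values may be stored wrapped in one-element lists, the sketch unwraps them. Both the layout and the function names are assumptions.

```python
import json
from pathlib import Path


def unwrap(value):
    """Return the single item of a one-element list; leave other values unchanged."""
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    return value


def list_results(outdir, fields=("pmcid", "title")):
    """Return one tuple of selected fields per record in <outdir>/eupmc_results.json."""
    text = Path(outdir, "eupmc_results.json").read_text(encoding="utf-8")
    return [tuple(unwrap(record.get(f)) for f in fields) for record in json.loads(text)]
```

For the test above, `list_results("test1")` would return one (pmcid, title) pair per retrieved paper.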

Usage of getpapers: Test 2

The tool was used to retrieve 200 papers on the topic 'viral epidemics', with the output directory specified as 'test3'.

Command used: getpapers --query 'viral epidemics' --limit 200 --outdir test3

Results

  1. A new directory (test3) created within the home directory
  2. 200 folders (PMC###) created within 'test3' each containing a JSON file (eupmc_result)
  3. 1 text file (eupmc_fulltext_html_urls) containing the URLs of all downloaded documents
  4. 1 JSON file (eupmc_results) created

Command line output:
  5. 0 error messages
  6. 2 warnings (warn: This version of getpapers wasn't built with this version of the EuPMC api in mind; warn: getpapers EuPMCVersion: 5.3.2 vs. 6.3 reported by api)

Query results for getting papers on viral epidemics (limit 200)

Tester 5:

Name: Vaishali Arora

Operating System: Windows 10

INSTALLATION STEPS

  1. Installation of Node.js. Reference: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md

  2. Installation of getpapers. Reference: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md

Test of installation:

Successful. Test: type getpapers in Command Prompt.

Usage of getpapers

1. Downloaded 100 papers on the topic 'COVID-19' (PDF files)

Commands Used:

getpapers -q "COVID-19" -p -k 100 -o covid_19

Successfully downloaded 100 papers, with 1 .json file and 1 .txt file.

Command Line Output:

  1. 0 error messages
  2. 2 Warnings

RESULTS: https://drive.google.com/file/d/1rKgNGojNacMPLeViSPykpXgGsJg0zFUk/view?usp=sharing

2. Downloaded 100 .xml files on 'COVID deaths' into the directory cdeaths

Commands Used:

getpapers -q "COVID deaths" -o cdeaths -x -k 100

This also produced 1 .json file and 1 .txt file.

Command Line Output

  1. 0 error messages
  2. 2 Warnings

Reference: https://github.com/petermr/tigr2ess/blob/master/getpapers/TUTORIAL.md

Tester 6:

Vanisha Arora

Operating system:

Windows 10

Installation of node:

Source of instructions: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md

Installation of getpapers:

Instructions from:

https://github.com/ContentMine/blob/master/README.md

Installation of getpapers:

Successful

Test of installation:

Run the command getpapers --version at the command prompt.

Getting 0.4.17 confirms installation.

To find how many articles match a query:

getpapers -q "query" -n -k 50 (-n only reports the number of matches; drop -n to download up to 50 articles)

For example, for the query viral epidemics, use:

getpapers -q "viral epidemics" -n -k 50

-p, --pdf : downloads PDFs; -x, --xml : downloads .xml files

Thus, for the query viral epidemics, the syntax is

getpapers -q "viral epidemics" -k 50 -o "viral epidemics" -x -p

Downloaded 50 papers (PDF and XML files) on viral epidemics into the directory viral epidemics.

Tester 7

NAME

SANA SAIFI

OPERATING SYSTEM

WINDOWS 10

INSTALLATION PROCESS

1. Installation of nvm-windows

SOURCE: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md

A. Scroll to the Software Installation section and click the link for your operating system.

B. Go to the download page (https://github.com/coreybutler/nvm-windows/releases) and download the latest version of nvm-setup.zip.

C. Run the installer from the downloaded file to install nvm-windows.

2. Installation of getpapers

SOURCE: https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md

3. Test of Installation

A. Run the command getpapers in the command prompt.
B. Various usage options are displayed with their meanings.

4. Installation of getpapers

Successful

4 warnings and 2 errors were reported.

GETPAPERS

  1. Why use it?

To download n research papers from an open-access source.

  2. How to use it?

TEST

To download 100 PDF/.xml files on viral epidemics,

open the command prompt and

type getpapers -q "viral epidemics" -k 100 -o "viral epidemics" -x -p

Downloaded 77 of the 100 files from the open-access source into the directory Viral Epidemics.
