This repository contains a Python package for the automated scraping of the LexisNexis web service.
Why use this package? A small collection of alternatives exists; however, they are either outdated, inefficient, require access to the official API, or are not sufficiently customisable. These gaps led to the development of this package.
Currently, this package only works for academic users, but the functionality can easily be extended to regular and organisational users. Please let us know if you would like these features incorporated! This can (hopefully) be done quickly and easily; the only reason it has not been completed is that we have no way of testing it.
Whilst we have made every effort to ensure that the package is robust, web scraping is inherently fragile: connection drop-outs and delays in loading pages may cause issues. Likewise, if the website structure changes, we will do our best to update the package as soon as an issue is raised.
DISCLAIMER: This code was developed for academic purposes. Using this software package may be a violation of the terms of use set out by LexisNexis. By downloading and using this package you do so at your own legal risk.
- LexisNexis subscription (API access not required)
- Python 3
- ChromeDriver
  - Used to simulate a real web browser.
  - Available to download at: https://chromedriver.chromium.org/downloads
- Access to a MySQL database (optional)
```
pip install lnscraper
```
The `credentials.json` file is a template for your specific authentication details. It is a quick and convenient way to store and utilise credentials, as it can be edited in any text editor.
- Open the `credentials.json` file and replace changemeUsername and changemePassword with your academic login details.
- Save the file.
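As a quick sanity check after editing the file, you can verify that no placeholders remain. The snippet below is an illustrative standalone sketch, not part of the lnscraper API; it makes no assumption about the template's key names and simply flags any value still starting with "changeme":

```python
import json

def load_credentials(path="credentials.json"):
    """Load credentials.json and fail loudly if a placeholder was left in."""
    with open(path) as f:
        creds = json.load(f)
    for key, value in creds.items():
        # The template's placeholders all begin with "changeme".
        if isinstance(value, str) and value.startswith("changeme"):
            raise ValueError(f"Placeholder still present for '{key}'")
    return creds
```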
This bot automatically completes the authentication details for an academic sign-in at regular intervals. To do this, you need to get your institution's login link.
- Access the Nexis service by visiting https://nexis.com
- Click on the academic sign-in link.
- Find your institution and click "copy link".
- Copy the link from the pop-up box and paste it into the `credentials.json` file, replacing changemeAuthURL.
- Save the file.
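After both steps, `credentials.json` might look something like the following. This is only an illustration: the values are placeholders, and the key names are assumptions based on the template's changemeUsername, changemePassword, and changemeAuthURL placeholders — use whatever keys appear in the shipped template.

```json
{
  "username": "your.name@university.ac.uk",
  "password": "your-password",
  "authURL": "https://example.org/your-institution-login-link"
}
```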
Whilst the pre-configured options will work in the majority of cases, you may need to adjust the following settings to ensure that the correct authentication fields are completed.
- Click the link.
- Right-click on the username field and click "Inspect".
- If the id is 'username', do nothing; otherwise, save the id value for later.
- Repeat the inspection for the password field.
- If the id is 'password', do nothing; otherwise, save the id value for later.
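The inspection steps above can also be sketched programmatically. The standalone snippet below (not part of lnscraper, and not how the bot itself works) uses Python's built-in HTML parser to list the id attributes of a page's input fields, so you can check whether they match the defaults 'username' and 'password'; the form markup and id values are made up for illustration:

```python
from html.parser import HTMLParser

class InputIdCollector(HTMLParser):
    """Collect the id attribute of every <input> element on a page."""
    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            if "id" in attrs:
                self.ids.append(attrs["id"])

# Hypothetical login form; a real page would come from your institution's link.
page = """
<form>
  <input id="inst_user" type="text">
  <input id="inst_pass" type="password">
</form>
"""
collector = InputIdCollector()
collector.feed(page)
print(collector.ids)  # ['inst_user', 'inst_pass'] -> not the defaults, so note them
```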
The data can be saved into either a MySQL database or a pandas DataFrame. These need to be set up, as follows, before the web scraping can be initiated.
```python
# For the MySQL version:
from lnscraper import mysql_version, authentication

# Or, for the pandas version:
from lnscraper import pandas_version, authentication
```
Please credit this package by citing the following in your references.
```bibtex
@phdthesis{hammocks_2019,
  title  = {Identifying Weak Signals of Future Change: Detecting and Analysing Trends in Modus Operandi Through Topic Modelling},
  author = {Hammocks, Daniel},
  year   = {2019}
}
```