This repository contains a Python package for the automated scraping of the LexisNexis web service.
Why use this package? A small collection of alternatives exists; however, they are either outdated, inefficient, require access to the official API, or are not sufficiently customisable. These gaps led to the development of this package.
Currently, this package only works for academic users, but the functionality can easily be extended to regular and organisational users. Please let us know if you would like these features incorporated! This can (hopefully) be done quickly and easily; the only reason it has not been completed is that we have no way of testing it.
Whilst we have made every effort to ensure that the package is robust, web scraping is inherently fragile: connection drop-outs and delays in loading pages may cause issues. Likewise, if the website structure changes, we will do our best to update the package as soon as an issue is raised.
DISCLAIMER: This code was developed for academic purposes. Using this software package may be a violation of the terms of use set out by LexisNexis. By downloading and using this package you do so at your own legal risk.
- LexisNexis subscription (API access not required)
- Python 3
- ChromeDriver
  - Used to simulate a real web browser.
  - Available to download at: https://chromedriver.chromium.org/downloads
- Access to a MySQL database (optional)
```
pip install lnscraper
```
The `credentials.json` file is a template for your specific authentication details. It is a quick and convenient way to store and utilise credentials, as it can be edited in any text editor.
- Open the `credentials.json` file and replace changemeUsername and changemePassword with your academic login details.
- Save the file.
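As a quick sanity check after editing the file, you can verify that no placeholders remain. The snippet below is an illustrative standalone sketch, not part of the lnscraper API; it makes no assumption about the template's key names and simply flags any value still starting with "changeme":

```python
import json

def load_credentials(path="credentials.json"):
    """Load credentials.json and fail loudly if a placeholder was left in."""
    with open(path) as f:
        creds = json.load(f)
    for key, value in creds.items():
        # The template's placeholders all begin with "changeme".
        if isinstance(value, str) and value.startswith("changeme"):
            raise ValueError(f"Placeholder still present for '{key}'")
    return creds
```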
This bot automatically completes the authentication details for an academic sign-in at regular intervals. To do this, you need to get your institution's login link.
- Access the Nexis service by visiting https://nexis.com
- Click on the academic sign-in link.
- Find your institution and click "copy link".
- Copy the link from the pop-up box and paste it into the `credentials.json` file, replacing changemeAuthURL.
- Save the file.
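After both steps, `credentials.json` might look something like the following. This is only an illustration: the values are placeholders, and the key names are assumptions based on the template's changemeUsername, changemePassword, and changemeAuthURL placeholders — use whatever keys appear in the shipped template.

```json
{
  "username": "your.name@university.ac.uk",
  "password": "your-password",
  "authURL": "https://example.org/your-institution-login-link"
}
```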
Whilst the pre-configured options will work in the majority of cases, you may need to adjust the following settings to ensure that the correct authentication fields are completed.
- Click the link.
- Right-click on the username field and click "Inspect".
- If the id is 'username', do nothing; otherwise, save the id value for later.
- Repeat the inspection for the password field.
- If the id is 'password', do nothing; otherwise, save the id value for later.
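The inspection steps above can also be sketched programmatically. The standalone snippet below (not part of lnscraper, and not how the bot itself works) uses Python's built-in HTML parser to list the id attributes of a page's input fields, so you can check whether they match the defaults 'username' and 'password'; the form markup and id values are made up for illustration:

```python
from html.parser import HTMLParser

class InputIdCollector(HTMLParser):
    """Collect the id attribute of every <input> element on a page."""
    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            if "id" in attrs:
                self.ids.append(attrs["id"])

# Hypothetical login form; a real page would come from your institution's link.
page = """
<form>
  <input id="inst_user" type="text">
  <input id="inst_pass" type="password">
</form>
"""
collector = InputIdCollector()
collector.feed(page)
print(collector.ids)  # ['inst_user', 'inst_pass'] -> not the defaults, so note them
```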
The data can be saved into either a MySQL database or a pandas DataFrame. These need to be set up, as follows, before the web scraping can be initiated.
```python
# For the MySQL version:
from lnscraper import mysql_version, authentication

# Or, for the pandas version:
from lnscraper import pandas_version, authentication
```
Please credit this package by citing the following in your references.
```bibtex
@phdthesis{hammocks_2019,
  title  = {Identifying Weak Signals of Future Change: Detecting and Analysing Trends in Modus Operandi Through Topic Modelling},
  author = {Hammocks, Daniel},
  year   = {2019}
}
```