
Extracting noun/verb phrases from documents

Requirements:

Input is either a stdin pipe or a filename.
The input can be PDF, Unicode, or ASCII text; input that is not plain text is first converted to plain text.
Output is a histogram of the noun and verb phrases in the input, complete with their adjective and adverbial modifiers.
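The input rules above can be sketched in Python 3 as follows. This is a minimal illustration, not the repository's actual code. The helper names `read_input` and `classify` are hypothetical. PDF detection here relies on the `%PDF-` file signature rather than libmagic, and treating "Unicode" as UTF-8 is an assumption. The PDF-to-text conversion itself is omitted, because it needs a library such as poppler.

```python
import sys


def classify(data):
    """Classify raw input bytes as 'pdf', 'ascii', 'unicode', or 'binary'."""
    # PDF files begin with the "%PDF-" signature.
    if data.startswith(b"%PDF-"):
        return "pdf"
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Assumption: "Unicode" input means UTF-8 encoded text.
        data.decode("utf-8")
        return "unicode"
    except UnicodeDecodeError:
        return "binary"


def read_input(path=None):
    """Return (kind, raw bytes) read from a filename, or from stdin when path is None."""
    if path is None:
        data = sys.stdin.buffer.read()
    else:
        with open(path, "rb") as f:
            data = f.read()
    return classify(data), data
```

A caller would hand "pdf" input to a converter before phrase extraction, and would pass "ascii" and "unicode" input straight through.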

This script uses TextBlob for the heavy lifting.
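Once phrases have been extracted, building the histogram is a simple count. The Python 3 sketch below assumes the extractor yields phrase strings. With TextBlob installed, `TextBlob(text).noun_phrases` is one such source of noun phrases. The function name `phrase_histogram` and the case-folding step are illustrative choices, not necessarily what the scripts do.

```python
from collections import Counter


def phrase_histogram(phrases):
    """Count phrases case-insensitively; return (phrase, count) pairs, most frequent first."""
    # Normalize whitespace and case, and skip empty strings.
    counts = Counter(p.strip().lower() for p in phrases if p.strip())
    # Sort by descending count, then alphabetically so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For example, `phrase_histogram(["noun phrases", "Noun phrases", "verb phrases"])` gives `[("noun phrases", 2), ("verb phrases", 1)]`.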

Installation

This tool depends on Python 2 and a few C and Python libraries. The first installation step below installs the C libraries.

Take care on macOS, which no longer ships with Python 2. We recommend installing pyenv through Homebrew (these instructions assume a working Homebrew installation).

Install distribution-level dependencies

Ubuntu/Debian: $ sudo apt install build-essential libpoppler-cpp-dev libmagic-dev pkg-config python3-venv
macOS: $ brew install poppler libmagic

$ brew install pyenv pyenv-virtualenv (pyenv v2.4.10 was the latest release as of this writing)
$ pyenv install 3.12.5 (v3.12.5 was the latest Python 3 release as of this writing)
$ pyenv install 2.7.18 (v2.7.18 is the final release of Python 2)
$ pyenv global system 3.12.5 2.7.18 (makes both versions available globally, alongside the system Python)
$ eval "$(pyenv init -)" (consider adding this line to your shell startup file, such as ~/.zshrc)
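To confirm that these steps worked, you can check whether the native libraries are visible and whether both interpreters resolve. This Python 3 sketch uses only the standard library (`ctypes.util.find_library`, `shutil.which`, `subprocess.run`). The library names `magic` and `poppler-cpp` are assumptions inferred from the package names above. `find_library` does not search every location (for example, Homebrew's lib directory on macOS may be missed), so a "not found" result is only a hint.

```python
import ctypes.util
import shutil
import subprocess


def find_native_libs(names=("magic", "poppler-cpp")):
    """Map each library name to True if ctypes can locate it, else False."""
    return {name: ctypes.util.find_library(name) is not None for name in names}


def interpreter_versions(names=("python2", "python3")):
    """Map each interpreter name to its version string, or None if unavailable."""
    versions = {}
    for name in names:
        path = shutil.which(name)
        if path is None:
            versions[name] = None
            continue
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        if result.returncode != 0:
            # e.g. a pyenv shim exists, but no selected version provides the command.
            versions[name] = None
            continue
        # Python 2 prints its version to stderr; Python 3 prints it to stdout.
        versions[name] = (result.stdout or result.stderr).strip()
    return versions


for lib, found in find_native_libs().items():
    print("lib%-12s %s" % (lib, "found" if found else "not found"))
for name, version in interpreter_versions().items():
    print("%-14s %s" % (name, version or "not found"))
```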
Create a Python virtual environment

$ python3 -m venv env (creates an environment named env)
$ source env/bin/activate (activates that environment)
$ deactivate (returns you to your normal environment)
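You can also create the same environment from Python itself with the standard library's `venv` module, which is what `python3 -m venv` runs. The sketch also shows a common way to tell whether the running interpreter is inside a venv: `sys.prefix` differs from `sys.base_prefix` there. The helper names `make_env` and `in_virtualenv` are hypothetical.

```python
import os
import sys
import venv


def in_virtualenv():
    """True when running inside a venv-created environment."""
    return sys.prefix != sys.base_prefix


def make_env(path="env", with_pip=True):
    """Create a virtual environment at path, like `python3 -m venv path`."""
    # Symlink the interpreter on POSIX, as `python3 -m venv` does by default.
    venv.create(path, with_pip=with_pip, symlinks=(os.name != "nt"))
    # Every venv gets a pyvenv.cfg file at its root.
    return os.path.join(path, "pyvenv.cfg")
```

Passing `with_pip=False` skips bootstrapping pip, which is faster but leaves the environment without `pip`.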

Install the Python package dependencies, making sure you use Python 2's pip:

$ pip2 install -r requirements.txt

Install Pattern locally

$ pip2 install Pattern==2.6

Download the necessary NLTK data

$ python2 -c 'import nltk; nltk.download("brown"); nltk.download("punkt")'
$ python2 -m textblob.download_corpora
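To confirm the downloads landed, NLTK's own `nltk.data.find()` raises `LookupError` for a missing resource. The stdlib-only Python 3 sketch below instead looks for the resource directories or zip files directly. The search list is an assumption: it covers `NLTK_DATA` and a few of NLTK's default locations, not NLTK's full search path.

```python
import os


def default_nltk_dirs():
    """A few places NLTK looks for data (an assumed, incomplete list)."""
    dirs = os.environ.get("NLTK_DATA", "").split(os.pathsep)
    dirs += [os.path.expanduser("~/nltk_data"), "/usr/share/nltk_data",
             "/usr/local/share/nltk_data"]
    return [d for d in dirs if d]


def missing_resources(resources=("corpora/brown", "tokenizers/punkt"), dirs=None):
    """Return the resources that appear in none of the given data directories."""
    dirs = default_nltk_dirs() if dirs is None else dirs
    missing = []
    for res in resources:
        # The downloader stores a .zip and usually an unpacked directory; accept either.
        found = any(os.path.exists(os.path.join(d, res)) or
                    os.path.exists(os.path.join(d, res + ".zip")) for d in dirs)
        if not found:
            missing.append(res)
    return missing
```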

Testing the Installation

The provided Makefile has two rules that run the extraction commands on this README. If those commands print nothing beyond the self-test commands themselves, the installation is working.
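One plausible way such a self-test could work is to compare freshly generated output against the checked-in README-nouns.csv.out and README-verbs.csv.out files and stay silent when they match. That mechanism is an inference from the file names, not something this README states. The Python 3 sketch below illustrates the idea; `check_output` is a hypothetical helper.

```python
import difflib
import sys


def check_output(actual_text, expected_path):
    """Return True if actual_text matches the expected file; print a unified diff otherwise."""
    with open(expected_path, encoding="utf-8") as f:
        expected = f.read()
    if actual_text == expected:
        return True  # Silent on success, like a passing self-test.
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual_text.splitlines(keepends=True),
        fromfile=expected_path,
        tofile="actual",
    )
    sys.stdout.writelines(diff)
    return False
```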
