cetem-publico is a Python wrapper for the CETEMPublico corpus. It takes care of downloading, storing, and importing the corpus into NLTK.
THIS IS STILL A WORK IN PROGRESS; THE API MIGHT BREAK WITHOUT WARNING.
Install and update using pip:
pip install [--user] cetem-publico
import CETEMPublico
cp = CETEMPublico.load() # loads a small 10KB sample
# or
cp = CETEMPublico.load(full=True) # loads the full 12GB
print(cp.tagged_sents())
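As a rough sketch of what you might do with the tagged sentences, the snippet below counts how often each tag occurs. It assumes that `cp.tagged_sents()` yields NLTK-style tagged sentences, i.e. lists of `(word, tag)` tuples; this has not been checked against the package. The `tag_frequencies` helper, the `sample` data, and the tag names in it are hypothetical stand-ins so the example runs without downloading anything.

```python
from collections import Counter


def tag_frequencies(tagged_sents):
    """Count tag occurrences across an iterable of tagged sentences.

    Each sentence is expected to be an iterable of (word, tag) pairs,
    which is the shape NLTK corpus readers conventionally return.
    """
    counts = Counter()
    for sent in tagged_sents:
        for _word, tag in sent:
            counts[tag] += 1
    return counts


# Hypothetical stand-in for cp.tagged_sents(); the words and tag names
# are invented for illustration and are not taken from the corpus.
sample = [
    [("O", "ART"), ("jornal", "N"), ("chegou", "V"), (".", "PU")],
    [("A", "ART"), ("equipa", "N"), ("trabalha", "V"), (".", "PU")],
]

freqs = tag_frequencies(sample)
print(freqs.most_common())
```

With the real corpus you would pass `cp.tagged_sents()` instead of `sample`. The helper consumes sentences one at a time, so it does not need to build its own copy of the data, which matters if the full 12GB corpus is loaded.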
This module only exists thanks to the Publico newspaper and the team responsible for the CETEMPublico corpus.
Open a GitHub issue or, preferably, send me a pull request.
Released under the MIT license.