Uses Natural Language Processing and Wikipedia content to try to generate Mad Libs-style game questions.
Built for the TrackMaven Monthly Challenge meetup in December 2014.
I also made a short presentation about the project. See this YouTube video for an idea of the kind of game these questions are meant to support.
In April 2017 basic Spanish support was added.
wikipedia-question-generator is open source under the MIT License.
Running the command:
$ wikitrivia --help
yields:
Usage: wikitrivia [OPTIONS] [TITLES]...
Generates trivia questions from wikipedia articles. If no titles are
supplied, pulls from these sample articles:
'Tony Bennett', 'Gauls', 'Scabbling', 'Henry V, Duke of Carinthia',
'Ukrainian Women's Volleyball Super League'
Options:
--lang TEXT Wikipedia language: en, es
--output FILENAME Output to JSON file
--help Show this message and exit.
Running the command with a single title:
$ wikitrivia 'Tony Bennett'
yields:
[
{
"question": "Bennett is also an accomplished __________, having created works\u2014under the name Anthony Benedetto\u2014that are on permanent public display in several institutions.",
"answer": "painter", "title": "Tony Bennett",
"similar_words": ["classic", "classicist", "constructivist", "decorator", "draftsman", "etcher", "expressionist", "illustrator"]
},
{
"question": "He is the __________ of the Frank Sinatra School of the Arts in ..."
}
]
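Each entry pairs a sentence from the article with one word blanked out. As a rough illustration of that output shape, and not the project's actual code, the hypothetical make_question helper below blanks the first whole-word occurrence of a chosen answer with ten underscores (the blank width used in the sample) and returns a dict with the same keys. Deciding which word to blank is where the NLP comes in; in this sketch the answer is simply passed in.

```python
import re

BLANK = "_" * 10  # same width as the blanks in the sample output


def make_question(sentence, answer, title, similar_words=()):
    """Hypothetical helper: blank out `answer` in `sentence`.

    Returns None if the answer does not appear as a whole word.
    """
    # \b stops us from blanking substrings (e.g. "painter" inside "painters")
    pattern = re.compile(r"\b" + re.escape(answer) + r"\b")
    if not pattern.search(sentence):
        return None
    question = pattern.sub(BLANK, sentence, count=1)
    return {
        "question": question,
        "answer": answer,
        "title": title,
        "similar_words": list(similar_words),
    }


q = make_question(
    "Bennett is also an accomplished painter.",
    "painter",
    "Tony Bennett",
)
print(q["question"])  # Bennett is also an accomplished __________.
```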
wikipedia-question-generator is a Python 3 project that uses the fantastic click package to expose itself as a shell command.
You can use the project locally (and quickly) through Docker or a local installation of Python 3.4.
If you just want to run the tool and don't plan to modify it, pull the latest image from Docker Hub:
$ [sudo] docker pull beevaenriqueotero/wikipedia-question-generator
Then, run the image with:
$ [sudo] docker run beevaenriqueotero/wikipedia-question-generator --help
Usage: wikitrivia [OPTIONS] [TITLES]...
Generates trivia questions from wikipedia articles. If no titles are
supplied, pulls from these sample articles:
'Tony Bennett', 'Python (programming language)', 'Scabbling', 'Ukrainian
Women's Volleyball Super League'
Options:
--output FILENAME Output to JSON file
--help Show this message and exit.
To make running the container less cumbersome, you can alias the docker run command:
$ alias wikitrivia='[sudo] docker run beevaenriqueotero/wikipedia-question-generator'
$ wikitrivia --help
Usage: wikitrivia [OPTIONS] [TITLES]...
If you want to contribute to the tool, you can clone the repo and use Fig (since superseded by Docker Compose) to get started quickly.
Alternatively, clone the repo and create a new virtual environment with pyvenv-3.4, virtualenv, or python3 -m venv (the pyvenv script was removed in Python 3.8). Then, install the requirements and the NLTK corpora:
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python -m textblob.download_corpora
$ python -c "import nltk; nltk.download(['cess_esp', 'omw'])"
Then install the package itself so the command-line tool is available in your environment:
$ pip install -e .
Now you can run the tool with the wikitrivia command.
By default, the tool scrapes the hard-coded sample articles listed in the --help output and prints its results to stdout.
You can point the tool to a specific Wikipedia page by specifying its title:
$ wikitrivia 'William Shatner'
Be sure to wrap multi-word titles in quotes; otherwise the shell passes each word to the tool as a separate title.
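The quoting matters because the shell splits the command line on whitespace before wikitrivia ever sees it. Python's standard shlex.split, which follows POSIX shell splitting rules, makes the difference visible:

```python
import shlex

# Unquoted: the shell hands wikitrivia two separate titles
print(shlex.split("wikitrivia William Shatner"))
# ['wikitrivia', 'William', 'Shatner']

# Quoted: a single title that contains a space
print(shlex.split("wikitrivia 'William Shatner'"))
# ['wikitrivia', 'William Shatner']
```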
You can scrape multiple articles at once by providing multiple titles:
$ wikitrivia 'Leonard Nimoy' 'George Takei' 'Nichelle Nichols'
If you want to take this data elsewhere, you can output the results to a JSON file:
$ wikitrivia --output scotty.json 'James Doohan'
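As one example of taking the data elsewhere, a short script can turn the saved questions into multiple-choice items, using similar_words as distractors. This is a sketch that assumes the file holds a JSON array of objects shaped like the sample output above (question, answer, title, similar_words); the helper names and the four-choice default are illustrative, not part of the project.

```python
import json
import random


def load_questions(path):
    """Read the JSON array written by wikitrivia --output (assumed shape)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def to_multiple_choice(entry, n_choices=4, rng=random):
    """Hypothetical helper: mix the answer with up to n_choices - 1 similar words."""
    distractors = [w for w in entry.get("similar_words", []) if w != entry["answer"]]
    picks = rng.sample(distractors, min(n_choices - 1, len(distractors)))
    choices = picks + [entry["answer"]]
    rng.shuffle(choices)  # so the answer is not always last
    return entry["question"], choices
```

For example, looping over load_questions("scotty.json") and printing each question followed by its shuffled choices gives a simple quiz. Passing rng=random.Random(seed) makes the choice order reproducible.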
If you're using docker run, by default this will save scotty.json inside the container. Either mount the current directory with the -v option or just use fig instead, which mounts the directory as a volume automatically.
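One way to keep the file when using docker run is to bind-mount the current directory and give --output an absolute path inside the container. The sketch below only assembles the command as an argument list. The mount point /data is an arbitrary choice (an assumption, not something the image defines), and the sketch assumes the image's entrypoint is wikitrivia, as the docker run ... --help example above suggests.

```python
import os

IMAGE = "beevaenriqueotero/wikipedia-question-generator"
MOUNT_POINT = "/data"  # arbitrary path inside the container (assumption)


def docker_command(titles, output_name, host_dir=None):
    """Hypothetical helper: build a docker run argv that writes into host_dir."""
    host_dir = os.path.abspath(host_dir or os.getcwd())
    return [
        "docker", "run", "--rm",  # --rm removes the stopped container afterwards
        "-v", f"{host_dir}:{MOUNT_POINT}",  # bind-mount the host dir into the container
        IMAGE,
        "--output", f"{MOUNT_POINT}/{output_name}",  # write onto the mounted volume
        *titles,
    ]
```

With Docker installed, subprocess.run(docker_command(["James Doohan"], "scotty.json"), check=True) should leave scotty.json in the current directory (prefix the command with sudo if your setup requires it). Because each title is its own list item, no shell quoting is involved.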
More information about the original methodology is available at https://github.com/atbaker/wikipedia-question-generator#methodology
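The similar_words in the sample output (classicist, etcher, illustrator and so on, for the answer painter) look like siblings in WordNet: other kinds of the answer's more general category, since a painter is a kind of artist. The sketch below illustrates that idea with a tiny hand-written taxonomy built from the sample output, standing in for WordNet; it is an interpretation of the output, not the project's code. With NLTK, the corresponding lookups would go through nltk.corpus.wordnet.synsets() and the hypernyms() and hyponyms() methods of a Synset.

```python
# Toy stand-in for WordNet, hand-written from the sample output (partial, illustrative):
# word -> its more general category (hypernym)
HYPERNYMS = {"painter": "artist"}
# category -> its more specific kinds (hyponyms)
HYPONYMS = {
    "artist": [
        "classic", "classicist", "constructivist", "decorator",
        "draftsman", "etcher", "expressionist", "illustrator", "painter",
    ],
}


def similar_words(word, limit=8):
    """Return up to `limit` siblings of `word` (other hyponyms of its hypernym)."""
    parent = HYPERNYMS.get(word)
    if parent is None:
        return []
    siblings = [w for w in HYPONYMS.get(parent, []) if w != word]
    return siblings[:limit]


print(similar_words("painter"))
```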