wikipedia-question-generator

This project is no longer maintained. It is MIT licensed, so you're welcome to take the code and use it yourself.

Uses Natural Language Processing and Wikipedia content to try to generate Mad Libs-style game questions. Powers the web app at http://wikitrivia.atbaker.me.

Built for the TrackMaven Monthly Challenge meetup in December 2014.

I also made a short presentation about the project. See this YouTube video for an idea of the kind of game these questions are meant to support.

wikipedia-question-generator is open source under the MIT License.

Sample usage

Running the command:

$ wikitrivia 'Tony Bennett'

yields:

[
  {
    "question": "Bennett is also an accomplished __________, having created works\u2014under the name Anthony Benedetto\u2014that are on permanent public display in several institutions.",
    "answer": "painter", "title": "Tony Bennett",
    "similar_words": ["classic", "classicist", "constructivist", "decorator", "draftsman", "etcher", "expressionist", "illustrator"]
  },
  {
    "question": "He is the __________ of the Frank Sinatra School of the Arts in ..."
  }
]

Quickstart

wikipedia-question-generator is a Python 3 project that uses the fantastic click package to expose itself as a shell command.

You can use the project locally (and quickly) through Docker or a local installation of Python 3.4.

Installing with Docker

If you only want to run the tool and don't need to modify it, pull the latest image from Docker Hub:

$ docker pull atbaker/wikipedia-question-generator

Then, run the image with:

$ docker run atbaker/wikipedia-question-generator --help
Usage: wikitrivia [OPTIONS] [TITLES]...

  Generates trivia questions from wikipedia articles. If no titles are
  supplied, pulls from these sample articles:

  'Tony Bennett', 'Python (programming language)', 'Scabbling', 'Ukrainian
  Women's Volleyball Super League'

Options:
  --output FILENAME  Output to JSON file
  --help             Show this message and exit.

To make running the container less cumbersome, you can alias the docker run command:

$ alias wikitrivia='docker run atbaker/wikipedia-question-generator'
$ wikitrivia --help
Usage: wikitrivia [OPTIONS] [TITLES]...

If you want to contribute to the tool, you can clone the repo and use Fig to get started quickly.

Installing with Python 3.4

Clone the repo, and then use pyvenv-3.4 (or virtualenv) to create a new virtual environment. Then, install the requirements and the NLTK corpora:

$ pyvenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python -m textblob.download_corpora

Install the command line tool so you can use the tool easily:

$ pip install -e .

Now you can run the tool with the command wikitrivia.

Advanced usage

By default, the tool scrapes the hard-coded sample articles listed in the --help output and prints its results to stdout.

Scraping a specific article

You can point the tool to a specific Wikipedia page by specifying its title:

$ wikitrivia 'William Shatner'

Be sure to enclose multi-word titles in quotes, or the tool will treat each word as a separate title.

Scraping multiple articles

You can scrape multiple articles at once by providing multiple titles:

$ wikitrivia 'Leonard Nimoy' 'George Takei' 'Nichelle Nichols'

Outputting to JSON

If you want to take this data elsewhere, you can output the results to a JSON file:

$ wikitrivia --output scotty.json 'James Doohan'

If you're using docker run, by default this will save scotty.json inside the container. Either mount the current directory with the -v option or just use fig instead, which mounts the directory as a volume automatically.
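Once the results are in a file, they can be read back with a few lines of Python. The sketch below assumes the file has the same structure as the sample output above (a list of objects with "question", "answer" and, when available, "similar_words"). The helper names load_questions and answer_choices are hypothetical and not part of the tool.

```python
import json
import random
from typing import Dict, List


def load_questions(path: str) -> List[Dict]:
    """Read a file written by `wikitrivia --output` (assumed to be a JSON list)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def answer_choices(entry: Dict, rng: random.Random) -> List[str]:
    """Mix the correct answer with its decoys in random order.

    Assumption: "similar_words" may be missing or empty for some entries,
    in which case the only choice returned is the answer itself.
    """
    choices = [entry["answer"]] + list(entry.get("similar_words", []))
    rng.shuffle(choices)
    return choices
```

For example, answer_choices(entry, random.Random()) on the Tony Bennett entry would return "painter" and its eight decoys in shuffled order.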

Methodology

Though I tried a few different approaches when developing this tool, in the end I had the most success with a rather simple methodology.

Finding the right ___________'s

Only consider sentences in the summary section of an article. Sentences from the body often didn't make sense out of context.
Never use the first sentence of the summary. It's usually too straightforward to make interesting trivia.
Don't use a sentence that starts with an adverb. Such sentences usually depend too heavily on the idea of the previous sentence to make sense out of context.
Blank out the first common noun in the sentence (e.g. 'painter', 'infantryman'). Proper nouns (e.g. 'Frank Sinatra', 'The White House') usually seemed too easy to guess when given the title of the article and the other words in the sentence.
If that noun is part of a noun phrase, blank out the last two words of the phrase. Blanking out just one word seemed too easy if the phrase was recognizable.
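The rules above can be sketched roughly as follows. This is a simplified illustration, not the tool's actual code: it assumes each sentence has already been split into (word, tag) pairs with Penn Treebank part-of-speech tags (the kind a tagger such as TextBlob's produces), it leaves out the noun-phrase rule, and it rebuilds the sentence by joining tokens with spaces. The function names and the hand-tagged example are hypothetical.

```python
from typing import List, Optional, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word, Penn Treebank tag) pairs

BLANK = "__________"


def blank_first_common_noun(sentence: TaggedSentence) -> Optional[Tuple[str, str]]:
    """Return (question, answer), or None if the sentence is rejected."""
    if not sentence:
        return None
    # Rule: skip sentences that open with an adverb (tags RB, RBR, RBS).
    if sentence[0][1].startswith("RB"):
        return None
    for index, (word, tag) in enumerate(sentence):
        # Rule: blank the first common noun (NN, NNS), never a proper noun (NNP, NNPS).
        if tag in ("NN", "NNS"):
            words = [w for w, _ in sentence]
            words[index] = BLANK
            return " ".join(words), word
    return None


def candidate_questions(summary: List[TaggedSentence]) -> List[Tuple[str, str]]:
    """Apply the rules to every summary sentence except the first."""
    results = []
    for sentence in summary[1:]:  # Rule: never use the first sentence.
        result = blank_first_common_noun(sentence)
        if result is not None:
            results.append(result)
    return results


# Hand-tagged example sentences, for illustration only.
summary = [
    [("Tony", "NNP"), ("Bennett", "NNP"), ("is", "VBZ"), ("an", "DT"),
     ("American", "JJ"), ("singer", "NN"), (".", ".")],
    [("Bennett", "NNP"), ("is", "VBZ"), ("also", "RB"), ("an", "DT"),
     ("accomplished", "JJ"), ("painter", "NN"), (".", ".")],
    [("Later", "RB"), ("he", "PRP"), ("toured", "VBD"), (".", ".")],
]
print(candidate_questions(summary))
```

Only the second sentence survives here: the first is skipped because of its position and the third because it opens with an adverb.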

Creating decoy answers

For sentences where just one word was blanked out, I also used WordNet to find similar words to the answer (the blanked out word). These words provide decoy answers during the trivia game.

My approach is to find the hypernym of the answer, and then select other hyponyms of that hypernym.

In the example in the "Sample usage" section, the correct answer is painter. The hypernym of painter is artist. The hyponyms I found for artist appear in the similar_words array in the output: "classic", "classicist", "constructivist", "decorator", "draftsman", "etcher", "expressionist", "illustrator".
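To make the hypernym/hyponym step concrete, here is a minimal sketch that uses a tiny hand-written taxonomy in place of WordNet. The HYPERNYM_OF table, the function names, and the cap of eight decoys are assumptions made for illustration. In a WordNet-backed version, the two lookups would presumably go through NLTK's synset.hypernyms() and synset.hyponyms() instead.

```python
from typing import Dict, List

# Hypothetical stand-in for WordNet: maps each word to its hypernym (parent).
HYPERNYM_OF: Dict[str, str] = {
    "painter": "artist",
    "etcher": "artist",
    "illustrator": "artist",
    "draftsman": "artist",
    "artist": "creator",
}


def hyponyms_of(word: str) -> List[str]:
    """Return every word whose hypernym is `word`, in alphabetical order."""
    return sorted(child for child, parent in HYPERNYM_OF.items() if parent == word)


def decoy_answers(answer: str, limit: int = 8) -> List[str]:
    """Return siblings of `answer`: other hyponyms of the answer's hypernym."""
    hypernym = HYPERNYM_OF.get(answer)
    if hypernym is None:
        return []  # No known hypernym, so no decoys can be offered.
    siblings = [w for w in hyponyms_of(hypernym) if w != answer]
    return siblings[:limit]


print(decoy_answers("painter"))
```

Excluding the answer itself from its siblings matters, since the correct word is also a hyponym of its own hypernym.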

Clearly there's still much room for improvement in all respects of the methodology, but overall I was impressed with how far I could get with TextBlob, NLTK, and an introductory understanding of NLP.
