Michael Beswetherick, Joe Kohlmann {mbesweth,kohlmann}@uw.edu
Nested Year Timelines is a hierarchical timeline visualization that supports exploring large sets of news articles retrieved from the New York Times Article Search API.
You should definitely go play with Nested Year Timelines at keynotesavant.com. (Be gentle and patient with that server, please!)
- Design Concept: Michael and Joe
- Data: The New York Times
- Prototype Implementation: Michael
- New York Times API Wrangling and Python-Whispering: Joe
- Spit-Shining: Joe
- Late Nights: Michael and Joe
Again, check out the live version of NYT at keynotesavant.com if you haven't already.
If you're about to clone this repository and try to get this working on your own, keep reading.
This is a Python job, so you'll need the following software to run Nested Year Timelines:
- Python 2.7.5 (probably doesn't work with Python 3)
- Flask 0.10
- Requests: HTTP for Humans
- Maybe some other fourth thing (Python will tell you, I promise)
Once you've cloned the repository, check out the flask
folder. Inside you'll find config_example.py
, which you should duplicate and rename config.py
. Open it up and you'll see something like this:
class Config(object):
DEBUG = False
# NYT Article Search API Key
# Request a key at http://developer.nytimes.com
SEARCH_API_KEY = "PASTE_API_KEY_HERE"
CACHE_ROOT = os.path.join(
os.path.dirname(os.path.realpath(__file__)),
"static",
"cache"
)
ARTICLE_SEARCH_ROOT = os.path.join(
CACHE_ROOT,
"ArticleSearch"
)
QUERY_CACHE_PATH = os.path.join(
CACHE_ROOT,
"queries.json"
)
class DevelopmentConfig(Config):
DEBUG = True
Visit developer.nytimes.com, create a user account, and register your application for use with the Article Search API.
NYT will give you an API key, which you should paste into config.py
to set the value of SEARCH_API_KEY
to a valid API key string.
If you're just on your local machine, invoke python
on the command line to start the Flask web server on http://localhost:5000:
$ cd path/to/repo
$ cd flask
$ python app.py
Loaded cached queries from disk.
* Running on http://127.0.0.1:5000/
Check out flask/static/cache/
for three (count 'em, three) types of cache files:
queries.json
, which lists recent queries (for which there may exist cached data)- In the
ArticleSearch
folder,61c5cce1ba03475e.json
(or something like that) is a cached JSON file containing all the NYT Article Search API results for a certain query.- Which query does a certain file represent, you ask? Good question. There's a rudimentary caching scheme in place that uses the search query value to generate a file name, so your guess is as good as mine. Check out the function
nyt_api.ArticleSearch.get_hash_for_query
for more on how those file names are generated. - I can tell you that
61c5cce1ba03475e.json
contains the cached results for the queryfq=persons:("Obama, Barack")
, however. - These cache files take a long time to generate since we have to retrieve however many thousands of article results from the NYT Article Search API to create them. I'd let them stick around if you can spare the megabytes.
- Which query does a certain file represent, you ask? Good question. There's a rudimentary caching scheme in place that uses the search query value to generate a file name, so your guess is as good as mine. Check out the function
- In the
ArticleSearch
folder,64d2ae7086249faf_reduce_by_year_and_month_with_multimedia.json
is a cache file containing the reduced data set that actually gets loaded into Nested Year Timelines. These are both smaller and easier to regenerate, so get rid of them whenever you want.
Things should be up and running. Don't forget to have fun!