OpenWayback Replay API

Patrick T. Rourke edited this page Feb 3, 2016 · 9 revisions

Introduction

The OpenWayback URL for a single archived web page for a specific date and time looks like this:

http://webarchive.archivedomain.tld/all/20000101000000/subjectdomain.tld
http://[wayback server hostname]/[access point]/[yyyymmddhhmmss]/[access_url]


Access points are indicated by strings in the first field of the OpenWayback URL after the hostname; this access point name is configured in the OpenWayback configuration file.

Dates are represented in the second field of the OpenWayback URL after the hostname as fourteen-character integers in the format yyyymmddhhmmss; on requests, they may be truncated.

The access URL - the URL of the archived site - is represented in the third and last field of the OpenWayback URL after the hostname. Because an access URL may itself include a path, the fields of the OpenWayback URL should always be counted from the left; everything after the fifth slash is part of the access URL.
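
The field-counting rule above can be sketched in Python. This is only an illustration: the `ReplayUrl` type and `parse_replay_url` function are hypothetical names, not part of OpenWayback, and the sketch assumes a full URL that includes a date field.

```python
from typing import NamedTuple


class ReplayUrl(NamedTuple):
    """Fields of an OpenWayback URL (hypothetical helper type)."""
    hostname: str
    access_point: str
    date: str
    access_url: str


def parse_replay_url(url: str) -> ReplayUrl:
    """Split an OpenWayback URL into its fields, counting slashes from the left."""
    # maxsplit=5 leaves everything after the fifth slash untouched, so any
    # slashes inside the access URL are preserved.
    parts = url.split("/", 5)
    if len(parts) != 6 or parts[1] != "":
        raise ValueError(f"not a full OpenWayback URL: {url!r}")
    _scheme, _empty, hostname, access_point, date, access_url = parts
    return ReplayUrl(hostname, access_point, date, access_url)
```

For example, `parse_replay_url("http://webarchive.archivedomain.tld/all/20000101000000/subjectdomain.tld/a/b")` keeps `subjectdomain.tld/a/b` together as the access URL.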

Direct Page Requests

The simplest requests are for a specific access URL for a specific date.

If there is an archive of the requested access URL for the requested date, that archive is returned to the browser by OpenWayback with an HTTP 200 response.

http://webarchive.archivedomain.tld/all/200101092140/subjectdomain.tld 
    Archived page displayed
    URL in location bar: http://webarchive.archivedomain.tld/all/200101092140/subjectdomain.tld 
    HTTP response: 200

If there is no archive of the requested access URL for the requested date, the archive whose date is closest to the requested date (whether earlier or later) is returned to the browser by OpenWayback with an HTTP 302 response:

http://webarchive.archivedomain.tld/all/200101081200/subjectdomain.tld 
    Archived page displayed: returns the page whose date most closely matches 2001-01-08 12:00
    URL in location bar: http://webarchive.archivedomain.tld/all/200101092140/subjectdomain.tld 
    HTTP response: 302
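
The "closest date, whether earlier or later" rule can be illustrated with a small Python sketch. It is not OpenWayback's implementation: `closest_capture` is a hypothetical function, the capture list is made up for the example, and the tie-breaking behaviour (the first of two equally distant captures wins) is an assumption.

```python
from datetime import datetime
from typing import List, Optional

TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # full 14-digit yyyymmddhhmmss


def closest_capture(requested: str, captures: List[str]) -> Optional[str]:
    """Return the capture timestamp nearest to the requested one, or None."""
    if not captures:
        return None  # no capture for any date: the 404 case
    target = datetime.strptime(requested, TIMESTAMP_FORMAT)
    # Distance is measured in both directions, so an earlier or a later
    # capture can win.
    return min(
        captures,
        key=lambda ts: abs(datetime.strptime(ts, TIMESTAMP_FORMAT) - target),
    )
```

With captures at `20001215080000`, `20010109214000` and `20010301000000`, a request for `20010108120000` selects `20010109214000`.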

If there is no archive of the requested access URL for any date, OpenWayback will return an HTTP 404 response with a page indicating that the site is not found in the archive:

http://webarchive.archivedomain.tld/all/200101081200/nonexistentdomain.tld 
    Error page displayed
    URL in location bar: http://webarchive.archivedomain.tld/all/200101081200/nonexistentdomain.tld 
    HTTP response: 404

If the date part of the URL is truncated, OpenWayback matches the capture whose date is closest to the middle of the range implied by the request and redirects the request to the matched page with an HTTP 302 response:

http://webarchive.archivedomain.tld/all/2000/subjectdomain.tld 
    Archived page displayed: Returns the capture of the page whose archival date most closely matches 2000-07-01
    URL in location bar: http://webarchive.archivedomain.tld/all/200101092140/subjectdomain.tld 
    HTTP response: 302

http://webarchive.archivedomain.tld/all/200010/subjectdomain.tld 
    Archived page displayed: Returns the capture of the page whose archival date most closely matches 2000-10-15
    URL in location bar: http://webarchive.archivedomain.tld/all/200101092140/subjectdomain.tld 
    HTTP response: 302

http://webarchive.archivedomain.tld/all/subjectdomain.tld 
    Archived page displayed: Returns the most recent capture of the page
    URL in location bar: http://webarchive.archivedomain.tld/all/200101092140/subjectdomain.tld 
    HTTP response: 302
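
To make "the range implied by the request" concrete, the following Python sketch expands a truncated date into the earliest and latest full timestamps it could stand for. It is an assumption-laden illustration, not OpenWayback code: `implied_range` is a hypothetical name, it only handles dates truncated at a field boundary (4, 6, 8, 10, 12 or 14 digits), and it does not attempt to reproduce how OpenWayback chooses the middle of the range.

```python
import calendar
from typing import Tuple

_EARLIEST_PAD = "00000101000000"  # lowest value of each yyyymmddhhmmss field
_LATEST_TIME = "235959"           # highest value of the hhmmss fields


def implied_range(partial: str) -> Tuple[str, str]:
    """Return the (earliest, latest) 14-digit timestamps a truncated date covers."""
    if not partial.isdigit() or len(partial) not in (4, 6, 8, 10, 12, 14):
        raise ValueError(f"unsupported truncated date: {partial!r}")
    earliest = partial + _EARLIEST_PAD[len(partial):]
    if len(partial) >= 8:
        # The day is known, so only the time of day needs padding.
        latest = partial + _LATEST_TIME[len(partial) - 8:]
    else:
        year = int(partial[:4])
        month = int(partial[4:6]) if len(partial) == 6 else 12
        # monthrange accounts for month lengths and leap years.
        last_day = calendar.monthrange(year, month)[1]
        latest = f"{year:04d}{month:02d}{last_day:02d}{_LATEST_TIME}"
    return earliest, latest
```

Under these assumptions, `implied_range("200010")` spans `20001001000000` to `20001031235959`.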

First and Last Capture Requests

There are special requests that will return either the first or the last capture from the archive.

http://webarchive.archivedomain.tld/all/1/subjectdomain.tld
    Archived page displayed: Returns the first capture of the requested page
    URL in location bar: http://webarchive.archivedomain.tld/all/200101092140/subjectdomain.tld 
    HTTP response: 302

http://webarchive.archivedomain.tld/all/2/subjectdomain.tld 
    Archived page displayed: Returns the most recent capture of the page
    URL in location bar: http://webarchive.archivedomain.tld/all/200101092140/subjectdomain.tld 
    HTTP response: 302

If there is no archive of the page for any date, the response to a request with a truncated date will be the same as one for a specific date: an error page and an HTTP 404 response.

Fuzzy Date Requests

Fuzzy date requests ask for the capture dates of a specific access URL across a range of dates.

Requesting a specific access URL with an asterisk as the sole character in the date part of the OpenWayback URL will return a page showing the capture dates for the requested URL (different configurations of OpenWayback will return calendar pages for a single year - the year of the latest capture - or for multiple years, or return a table of capture dates):

http://webarchive.archivedomain.tld/all/*/subjectdomain.tld
    Capture date page displayed: Returns list of capture dates
    URL in location bar: http://webarchive.archivedomain.tld/all/*/subjectdomain.tld 
    HTTP response: 200

Adding the asterisk wildcard character after a year will return a list of capture dates for that year:

http://webarchive.archivedomain.tld/all/2000*/subjectdomain.tld 
    Capture date page displayed: Returns list of capture dates in the year 2000
    URL in location bar: http://webarchive.archivedomain.tld/all/2000*/subjectdomain.tld 
    HTTP response: 200

An ordered pair of dates, separated by a hyphen, and concluded by an asterisk represents a date range; a request with a date range will return a list of capture dates for that range:

http://webarchive.archivedomain.tld/all/2000-2012*/subjectdomain.tld
    Capture date page displayed: Returns list of capture dates in the years 2000 to 2012
    URL in location bar: http://webarchive.archivedomain.tld/all/2000-2012*/subjectdomain.tld
    HTTP response: 200
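
The three date-field forms shown in this section (`*`, a date followed by `*`, and a hyphenated pair followed by `*`) can be assembled with a small Python helper. This is a sketch only: `capture_list_url` and its parameters are hypothetical names, and the base URL is assumed to already include the access point.

```python
from typing import Optional


def capture_list_url(base: str, access_url: str,
                     start: Optional[str] = None,
                     end: Optional[str] = None) -> str:
    """Build a capture-date listing URL for all dates, one date prefix, or a range."""
    for value in (start, end):
        if value is not None and not value.isdigit():
            raise ValueError(f"dates must be digits only: {value!r}")
    if start is None and end is not None:
        raise ValueError("a range needs a start date")
    if start is None:
        date_field = "*"                 # every capture date
    elif end is None:
        date_field = f"{start}*"         # e.g. 2000*
    else:
        date_field = f"{start}-{end}*"   # e.g. 2000-2012*
    return f"{base.rstrip('/')}/{date_field}/{access_url}"
```

For instance, `capture_list_url("http://webarchive.archivedomain.tld/all", "subjectdomain.tld", "2000", "2012")` produces the range request shown above.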

Captured Page List Requests

If an asterisk wildcard is appended to the access URL, all captured URLs whose original URL begins with the string in the access URL field will be listed, with the number of capture dates for each URL, the total count of captured pages, and the date range of captures.

http://webarchive.archivedomain.tld/all/*/subjectdomain.tld*
    List page displayed: Returns a list of all captures of pages with the prefix `subjectdomain.tld` for all dates.

        Showing 1 to 6,609 of 6,609 results for subjectdomain.tld
        subjectdomain.tld/ 475 versions 
        2,961 pages between Jun 16, 1997 and Jun 5, 2013 

        subjectdomain.tld/%22 3 versions 
        7 pages between Mar 23, 2003 and Jan 22, 2009 

        subjectdomain.tld/2009/12/07/today_in_history 1 version 
        3 pages between Aug 27, 2010 and Nov 27, 2010 

    URL in location bar: http://webarchive.archivedomain.tld/all/*/subjectdomain.tld*
    HTTP response: 200

As with capture list responses, page list responses can be limited by date ranges:

http://webarchive.loc.gov/all/2008*/subjectdomain.tld*
    List page displayed: Returns a list of all captures of pages with the prefix `subjectdomain.tld` for dates in 2008.

        Showing 1 to 2,012 of 2,012 results for subjectdomain.tld
        subjectdomain.tld/ 80 versions 
        500 pages between Jan 1, 2008 and Dec 31, 2008

    URL in location bar: http://webarchive.loc.gov/all/2008*/subjectdomain.tld*
    HTTP response: 200

A wildcarded access URL request with a specific date will fail and return an error page with an HTTP response code of 400:

http://webarchive.loc.gov/all/200804051200/subjectdomain.tld*
    Error page displayed: `The request is missing information, or is not understood by this server. Bad URL(subjectdomain.tld*)`
    URL in location bar: http://webarchive.loc.gov/all/200804051200/subjectdomain.tld*
    HTTP response: 400
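
The request types described on this page can be told apart from the date field and the access URL alone. The Python sketch below summarizes that decision. The function name and the returned labels are hypothetical, chosen for illustration, and the logic reflects only the behaviour documented here.

```python
def classify_request(date_field: str, access_url: str) -> str:
    """Label a request by where its asterisk wildcards appear."""
    date_wildcard = date_field.endswith("*")
    url_wildcard = access_url.endswith("*")
    if url_wildcard and not date_wildcard:
        # A wildcarded access URL with a specific date is rejected (HTTP 400).
        return "bad request"
    if url_wildcard:
        return "page list"      # captured URLs sharing the prefix
    if date_wildcard:
        return "capture list"   # capture dates for one access URL
    return "replay"             # direct page request, possibly redirected
```

For example, `classify_request("200804051200", "subjectdomain.tld*")` yields `"bad request"`, matching the 400 response above.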