This repository has been archived by the owner on Jan 25, 2024. It is now read-only.

Online Resources And Data Sets

Mike Caprio edited this page Jun 28, 2017 · 80 revisions

AMNH Library Systems

These systems are the central focus of the AMNH API Portal challenge and are key resources for many of the challenges - see each challenge for details. Teams working on challenges that use these systems should collaborate with teams working on the AMNH API Portal and determine what calls / API wrappers they need!


Sierra is the online catalog for all analog library media. It contains descriptions of books, serials, archives, art, videos, and special collections. Some records have links to DSpace (the digital library) and the Biodiversity Heritage Library. It's the best place to begin a search because it spans formats and locations. Downside: for collections-based material, the descriptions may be too general, and relevant content may not come up in a search.

Sierra example: American Museum novitates.


Omeka contains digitized images and catalog numbers from the library's vast photo negative and slide collection, primarily used in web publishing. It also includes Rare Books and some archival materials. Use it to pull image-based results and metadata (location, content, and identities may be included).


DSpace is a digital repository for AMNH publications such as scientific publications, Annual Reports, and other documents. It also includes some research data sets, manuscripts, and dissertations from the Richard Gilder Graduate School. Some metadata is duplicated from Sierra, the Library's catalog. It could be used for keyword search of OCR text to find relevant hits. This could borrow from the results display made for Snippet Search to highlight the best matches. Publication covers may be used as graphic representations. Images may be searched based on their captions.
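As a rough illustration of the keyword-highlighting idea, here is a minimal Python sketch. It is not the Snippet Search implementation and does not call DSpace; it assumes the OCR text has already been retrieved as a plain string. The function name `highlight_snippet` and the `**` markers are hypothetical choices made for this example.

```python
import re


def highlight_snippet(text, keyword, context=30):
    """Return a short excerpt around the first match of `keyword`.

    The match is wrapped in ** markers. Returns None when the keyword
    does not occur. Matching is case-insensitive and literal (no regex
    syntax is interpreted in `keyword`).
    """
    match = re.search(re.escape(keyword), text, flags=re.IGNORECASE)
    if match is None:
        return None

    # Clamp the excerpt window to the bounds of the text.
    start = max(0, match.start() - context)
    end = min(len(text), match.end() + context)

    excerpt = (
        text[start:match.start()]
        + "**" + match.group(0) + "**"
        + text[match.end():end]
    )

    # Mark truncation on either side with an ellipsis.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + excerpt + suffix
```

A results page could call this once per hit and render the marked span in bold. Ranking the "best" match among several hits is left out of this sketch.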


ArchivesSpace holds descriptions of archives at the collection and container levels, the latter specific to series or folders within a collection. An abbreviated collection description is also available in the Library catalog, but ArchivesSpace goes deeper in describing the materials found in a collection. It can be used to pull relevant data from folder-level descriptions, offering better potential for discovery.


An outreach tool used to highlight archival material housed in our collections and describe our grant project goals. Posts feature unique artifacts and general backgrounds written by student interns. We published lists of entities in spreadsheets and other useful resources as a way of documenting our process of building out our descriptive metadata. This site can be mined for images relating to search queries. Not all images are captioned, but the content is described in the blog posts.


EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families) is an XML schema. It provides a grammar for encoding the names of creators of archival materials and related information. xEAC is an open-source, XForms-based application for creating and managing EAC-CPF collections. The AMNH implementation is a database of museum-related entities: people, departments (some), permanent halls, and expeditions. It provides general information about the entities, with some entries more detailed than others. It includes links to related entities and related materials, providing a very rich resource for entity networks, or identity constellations.
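To give a feel for working with EAC-CPF, the following Python sketch reads the identifier, entity type, names, and related entities out of a record using only the standard library. The sample record is invented for illustration and is not taken from the AMNH database. The sketch assumes records use the EAC-CPF 2010 namespace (`urn:isbn:1-931666-33-4`) and XLink `href` attributes on relations; real records may differ and should be checked first.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for this sketch (EAC-CPF 2010 schema and XLink).
NS = {"eac": "urn:isbn:1-931666-33-4"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical record, made up for illustration only.
SAMPLE_RECORD = """<eac-cpf xmlns="urn:isbn:1-931666-33-4"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <control><recordId>example-0001</recordId></control>
  <cpfDescription>
    <identity>
      <entityType>corporateBody</entityType>
      <nameEntry><part>Example Expedition</part></nameEntry>
    </identity>
    <relations>
      <cpfRelation cpfRelationType="associative" xlink:href="example-0002">
        <relationEntry>Example Person</relationEntry>
      </cpfRelation>
    </relations>
  </cpfDescription>
</eac-cpf>"""


def summarize_record(xml_text):
    """Extract a few commonly used fields from one EAC-CPF record."""
    root = ET.fromstring(xml_text)
    identity = "eac:cpfDescription/eac:identity"
    return {
        "id": root.findtext("eac:control/eac:recordId", namespaces=NS),
        "type": root.findtext(identity + "/eac:entityType", namespaces=NS),
        # An entity can carry several name entries (controlled and local forms).
        "names": [
            part.text
            for part in root.findall(identity + "/eac:nameEntry/eac:part", NS)
        ],
        "relations": [
            {
                "type": rel.get("cpfRelationType"),
                "target": rel.get(XLINK_HREF),
                "label": rel.findtext("eac:relationEntry", namespaces=NS),
            }
            for rel in root.findall(".//eac:cpfRelation", NS)
        ],
    }
```

Calling `summarize_record(SAMPLE_RECORD)` returns a plain dictionary, which is easy to serialize as JSON for an API wrapper.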

See the Whitney South Sea Expedition. The names and resources have all been hard-coded into the record, but we anticipate a future where the relationships may be pulled together dynamically. Very useful to note: controlled and local versions of the names are identified; geographic locations are listed separately and linked to external databases; and records include structured data such as timelines, associated dates, and roles. There is huge potential for visualization using this metadata. Unlike the above resources, xEAC provides information on the who, what, when, where, and why of content creators. All identities in the database have unique IDs.
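Because every identity has a unique ID, relationships between records can in principle be assembled into a network rather than hard-coded. The Python sketch below shows one way this might look, assuming the relationships have already been extracted as pairs of IDs. The IDs in the usage note and the function names are hypothetical and do not reflect the actual xEAC data.

```python
from collections import defaultdict


def build_constellation(relations):
    """Build an undirected adjacency map from (source_id, target_id) pairs."""
    graph = defaultdict(set)
    for source_id, target_id in relations:
        graph[source_id].add(target_id)
        graph[target_id].add(source_id)
    return graph


def neighbors_within(graph, start_id, max_hops):
    """Return the IDs reachable from `start_id` in at most `max_hops` steps."""
    seen = {start_id}
    frontier = {start_id}
    for _ in range(max_hops):
        # Expand one hop outward, skipping entities already visited.
        frontier = {
            neighbor
            for node in frontier
            for neighbor in graph.get(node, ())
        } - seen
        seen |= frontier
    return seen - {start_id}
```

For example, with the made-up pairs `[("exp-1", "person-1"), ("person-1", "dept-1")]`, asking for neighbors of `"exp-1"` within two hops yields both `"person-1"` and `"dept-1"`. The resulting sets could feed a visualization library to draw an identity constellation.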


SNAC is also a database for entities using EAC-CPF. It has been called the "Facebook for dead people" because of the relational networks. The current beta prototype includes hundreds of thousands of records, mostly derived from catalog and finding aid (archives) descriptions. Because there is no automated way to pull records, this site may not be helpful in particular challenges, but we are planning to contribute many of our records to their database. Below is a rough visualization of their Whitney South Sea Expedition identity constellation.

Whitney South Sea Expedition identity constellation by SNAC


AMNH is a participating member of the BHL consortium of natural history and botanical libraries. We contribute digitized publications and field work to be accessed openly in a global "biodiversity commons." There is some overlap between DSpace and BHL. Like DSpace, this is useful for searching content within a resource and pulling graphic elements or images for the results page.


Computer Vision / Text Reading Implementations


Miscellaneous Resources and Other External Systems