From a3bdadfca32a7f5fef6c18d6acee3bbab0a5a00a Mon Sep 17 00:00:00 2001 From: Mark Jacobson <52427991+marksparkza@users.noreply.github.com> Date: Thu, 24 Aug 2023 11:42:22 +0200 Subject: [PATCH] Add a description of the ODP publishing method --- odp/catalog/README.md | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) create mode 100644 odp/catalog/README.md diff --git a/odp/catalog/README.md b/odp/catalog/README.md new file mode 100644 index 0000000..3899a24 --- /dev/null +++ b/odp/catalog/README.md @@ -0,0 +1,38 @@ +# ODP Catalog + +*Twice a day, at 2pm and 2am, the ODP publishing service is kicked into +action by cron, in a container running on a Linux host somewhere in +the uLwazi outback.* + +## The ODP publishing method + +The publishing method is applied to each implemented ODP catalog +(currently SAEON, MIMS and DataCite) in turn. Its function is to +ensure that the set of published records accessible at a catalog's +API endpoint is up to date with respect to the state of all ODP +collections, records and tags. + +First, a temporary snapshot is created, consisting of record API +output for all ODP records with recently updated timestamps. Record +timestamps are updated whenever any changes are made to digital object +metadata or identifiers, and also every time any referencing object such +as a tag or a child record is added, updated or removed. The record API +output model consists of digital object metadata and identifiers, parent +and child record references (if any), and associated record and collection +tags. To ensure consistency of lookup information across all of a catalog's +published records, the database transaction isolation level is set to +'REPEATABLE READ' while taking the snapshot. + +Next, each record API output object in the snapshot is evaluated against +the catalog's rules for deciding whether or not the record should be +published to that catalog. For records that should be published, a +catalog-specific representation of the record's digital object metadata, +identifiers and auxiliary information is created (or updated), and indexing +data are updated to facilitate full-text, spatial, temporal and faceted search. +Records that should not be published are made accessible as retracted record +stubs at the catalog's API endpoint, to facilitate local deletion by +external catalog client systems. + +Finally, in the case of a catalog service - such as DataCite - to which the +ODP is a client, the published records for the catalog are synchronized with +the catalog service via its API.