The DiSCO deposit service consumes DiSCOs from message queues and attempts to deposit them in RMap. Records whose deposit fails (for any reason, including transient network errors) are sent to an error queue.
The deposit service maintains a database of records it deposited. If messages in the queue contain RecordInfo headers, this information will be used to determine if a DiSCO is new, or represents an update to an existing DiSCO. Absent this information, all DiSCOs are assumed to be new.
Physically, the deposit service is an executable jar file that can be configured via system properties or environment variables. It can run continuously in the background, monitoring queues and depositing DiSCOs immediately as they become available, or it can be run on demand or on a schedule (e.g. by using cron).
The DiSCO deposit service blindly deposits the body of all messages consumed from queues. By default, it uses the wildcard queue:
rmap.harvest.disco.>
If the message has a Content-Type header, it will use that media type when uploading to RMap. Otherwise, it uses application/vnd.rmap-project.disco+rdf+xml. Errors go into an error queue whose name is based upon the name of the queue a particular message came from. So if a message was consumed from
rmap.harvest.disco.a.b.c.d
then an error will go to
rmap.harvest.error.a.b.c.d
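The queue-name mapping above can be sketched as a simple prefix swap. This is only an illustration, not the service's actual code: the class and method names are hypothetical, the two prefixes are inferred from the example queue names, and rejecting queues outside rmap.harvest.disco. is an assumption.

```java
/** Illustrative sketch: derive an error queue name from a source queue name. */
final class ErrorQueueNames {
    // Prefixes inferred from the example queue names in this document.
    static final String DISCO_PREFIX = "rmap.harvest.disco.";
    static final String ERROR_PREFIX = "rmap.harvest.error.";

    /** Swaps the DiSCO prefix for the error prefix, keeping the rest of the name. */
    static String errorQueueFor(String sourceQueue) {
        if (!sourceQueue.startsWith(DISCO_PREFIX)) {
            // Assumption: names outside the DiSCO hierarchy are rejected.
            throw new IllegalArgumentException("Not a DiSCO queue: " + sourceQueue);
        }
        return ERROR_PREFIX + sourceQueue.substring(DISCO_PREFIX.length());
    }
}
```

Calling ErrorQueueNames.errorQueueFor("rmap.harvest.disco.a.b.c.d") yields the error queue name shown above.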
Message headers relevant for DiSCO accounting are listed in JmsHeaders. If using the JMSClient with the HarvestRecord abstraction, mapping to or from JMS headers occurs automatically.
The DiSCO loader keeps each record's ID, date, and DiSCO identifier in a relational database.
A record's ID is found in the JMS header rmap.harvest.record.id, or is accessible by harvestRecord.getRecordInfo().getId(). It is a URI, and is presumed to be globally unique. The date is found in rmap.harvest.record.date, or harvestRecord.getRecordInfo().getDate(), and represents the logical date of a record.
When the RMap deposit service receives a record, it checks its database for a matching ID. If one is found, it compares dates. If the provided date is newer than the one in the database, it updates the corresponding DiSCO (the DiSCO ID is stored in the database). If the provided date is older or equal (i.e. the record is already in RMap), it skips the record and discards it. If no matching ID is found, it deposits a new DiSCO to RMap and creates a database entry.
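The create/update/skip decision described above can be sketched as follows. This is a minimal illustration under stated assumptions: an in-memory map stands in for the relational database, and the names DepositDecision, Action, Entry, decide and remember are hypothetical rather than taken from the service's code.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of the new / update / skip decision. */
final class DepositDecision {
    enum Action { CREATE, UPDATE, SKIP }

    /** Hypothetical stand-in for a database row. */
    static final class Entry {
        final Instant date;
        final String discoId;
        Entry(Instant date, String discoId) { this.date = date; this.discoId = discoId; }
    }

    // Assumption: a map keyed by record ID replaces the real database.
    private final Map<String, Entry> deposited = new HashMap<>();

    /** Decides what to do with an incoming record ID and logical date. */
    Action decide(String recordId, Instant recordDate) {
        Entry existing = deposited.get(recordId);
        if (existing == null) {
            return Action.CREATE;              // unknown ID: deposit a new DiSCO
        }
        return recordDate.isAfter(existing.date)
                ? Action.UPDATE                // newer date: update the stored DiSCO
                : Action.SKIP;                 // older or equal: already in RMap
    }

    /** Records the outcome of a successful deposit or update. */
    void remember(String recordId, Instant recordDate, String discoId) {
        deposited.put(recordId, new Entry(recordDate, discoId));
    }
}
```

A caller would invoke decide() for each consumed message and call remember() only after the deposit to RMap succeeds, so that a failed deposit can be retried later.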
The DiSCO deposit service is distributed as an executable jar: the jar artifact for this module can be run directly with java -jar.
Configuration is provided by using environment variables, or system properties (it doesn't matter which).
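For example (JVM system properties such as -Djdbc.username must appear before -jar, and env is used here because shells such as bash reject export for variable names containing dots):
env jdbc.password=myPassword java -Djdbc.username=user -jar target/rmap-loader-deposit-disco-0.0.1-SNAPSHOT.jar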
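One way a setting could be resolved from either source is sketched below. The Settings class and its get method are hypothetical, and checking system properties before environment variables is an assumption; the service's real lookup order is not stated here.

```java
/** Illustrative sketch: resolve a setting from system properties or the environment. */
final class Settings {
    /**
     * Returns the value for the given key, or defaultValue if it is not set.
     * Assumption: system properties are consulted first, then environment variables.
     */
    static String get(String key, String defaultValue) {
        String value = System.getProperty(key);
        if (value == null) {
            value = System.getenv(key);
        }
        return value != null ? value : defaultValue;
    }
}
```

With this sketch, Settings.get("jdbc.username", null) would return the value passed with -Djdbc.username, or the value of an environment variable of the same name if no system property is set.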
JDBC URL, e.g. jdbc:postgresql://localhost/test. By default, it uses a non-durable in-memory sqlite database.
Supported JDBC drivers include:
- sqlite. This is especially useful for simply persisting to a file, without installing an RDBMS, e.g.
jdbc:sqlite:/path/to/rmap.db
- postgresql
To use other JDBC drivers, add their jar to the classpath when running the deposit service jar, and specify an appropriate jdbc URI.
RDBMS username
RDBMS password.
ActiveMQ broker URL. Default is tcp://localhost:61616
JMS username. Leave undefined if authentication is not used.
JMS password. Leave undefined if authentication is not used.
Maximum number of JMS connections. Default is 10.
Queue to consume DiSCOs from. By default, it's the wildcard rmap.harvest.disco.>
RMap authentication token.
RMap API base URI. The default is https://test.rmap-hub.org/api/
The number of worker threads for taking DiSCOs off the queue and depositing them to RMap.
Any environment variable or system property that begins with LOG. can be used to specify the logging level of the logger whose name appears after the LOG. characters. For example, setting the environment variable:
LOG.info.rmapproject=DEBUG
will set the logger called info.rmapproject to the DEBUG level.
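The LOG. convention can be sketched as a scan over the available settings. This is an illustration only: the LogLevels class is hypothetical, and how the resulting levels are applied to a logging framework is left out because it is not described here.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch: collect logger levels from settings whose keys start with "LOG.". */
final class LogLevels {
    static final String PREFIX = "LOG.";

    /** Returns a map of logger name to level name, e.g. info.rmapproject -> DEBUG. */
    static Map<String, String> fromSettings(Map<String, String> settings) {
        Map<String, String> levels = new TreeMap<>();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                // The logger name is whatever follows the "LOG." prefix.
                levels.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return levels;
    }
}
```

For instance, LogLevels.fromSettings(System.getenv()) would pick up LOG.info.rmapproject=DEBUG from the environment and map the logger name info.rmapproject to DEBUG.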