This is a convenience tool used by the EHRI frontend to synchronise the search engine with the EHRI backend, and for doing the same easily from the command line. The basic idea is to read JSON from one web service (EHRI REST), convert it to another format (Solr documents), and POST it to another web service (Solr).
The traditional way to do this would be something like:
curl <WS-URL> | convert-json | curl -X POST -H "Content-type: application/json" <SOLR-UPDATE-URL> --data @-
This tool simply bundles the downloading, converting, and uploading steps behind some shortcut syntax. The shell-pipeline approach is still possible using the options detailed below.
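For example, fetching, converting, and indexing all documentary units collapses into a single command (a minimal sketch using the default service URLs and a type taken from the examples below):
java -jar index-data-converter.jar --index DocumentaryUnit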
usage: index-data-converter [OPTIONS] <spec> ... <specN>
-c,--clear-id <arg> Clear an individual id. Can be used multiple
times.
-C,--clear-type <arg> Clear an item type. Can be used multiple
times.
-D,--clear-all Clear entire index first (use with caution.)
-f,--file <arg> Read input from a file instead of the REST
service. Use '-' for stdin.
-H <header=value> Set a header for the REST service.
-h,--help Print this message.
-i,--index Index the data. This is NOT the default for
safety reasons.
-k,--clear-key-value <arg> Clear items with a given key=value pair. Can
be used multiple times.
-n,--noconvert Don't convert data to index format.
-P,--pretty Pretty print out JSON given by --print
(implies --print).
-p,--print Print converted JSON to stdout. The default
action when --index is omitted.
-r,--rest <arg> Base URL for EHRI REST service.
-s,--solr <arg> Base URL for Solr service (minus the action
segment.)
-S,--stats Print indexing stats.
-v,--verbose Print individual item ids to show progress.
-version Print the version number and exit.
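Several of these options can be combined. As a sketch (the header name and value here are illustrative assumptions, not something the REST service necessarily expects), the following clears existing repository documents, re-indexes all repositories while passing a header to the REST service, and prints stats at the end:
java -jar index-data-converter.jar -H "X-User=admin" --clear-type Repository --index --stats Repository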
Each <spec> should be one of:
* an item type (all items of that type)
* an item id prefixed with '@' (individual items)
* a pipe-separated type|id (all children of an item)
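For example, to index all children of a single repository (a sketch assuming us-005248 is a repository id; quote the spec so the shell does not treat '|' as a pipe):
java -jar index-data-converter.jar --index "Repository|us-005248"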
The default URIs for the REST and Solr services are:
* http://localhost:7474/ehri (EHRI REST)
* http://localhost:8983/solr/portal (Solr)
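Both can be overridden with --rest and --solr if your services run elsewhere (the host names below are illustrative):
java -jar index-data-converter.jar --rest http://example.com:7474/ehri --solr http://example.com:8983/solr/portal --index Repository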
Index documentary unit and repository types from default service endpoints:
java -jar index-data-converter.jar --index DocumentaryUnit Repository
Index the individual item us-005578:
java -jar index-data-converter.jar --index @us-005578
Pretty print (to stdout) the converted JSON output for all documentary units, but don't index:
java -jar index-data-converter.jar --pretty DocumentaryUnit
Pretty print (to stdout) the raw REST service output:
java -jar index-data-converter.jar --pretty --noconvert DocumentaryUnit
Clear the entire index:
java -jar index-data-converter.jar --clear-all
Clear items with holderId 'us-005248':
java -jar index-data-converter.jar --clear-key-value holderId=us-005248
Index data read from a JSON file instead of the REST service, with verbose progress output:
java -jar index-data-converter.jar --index -f data.json -v
Same as above, but piping the data through stdin (use '-' as the file name):
cat data.json | java -jar index-data-converter.jar --index -f - -v
Read data from stdin, convert it, and pipe it to a Curl upload for indexing:
cat orig.json | java -jar index-data-converter.jar -f - | \
    curl -X POST -H "Content-type: application/json" "http://localhost:8983/solr/ehri/update?commit=true" --data @-
TODO:
- Add proper logging
- Add proper error handling
- Ensure all resources are properly cleaned up
- Add more tests!