This is a getting-started guide for developers.
- PostgreSQL
- OpenSearch >= 1.x
  - The ICU Analysis Plugin is also required.
- Cantaloupe 4.1.x (image server)
  - Required for thumbnails but otherwise optional.
  - You can install and configure this yourself, but it will be easier to run a metaslurp-cantaloupe container in Docker.
- metaslurper
$ brew install rbenv
$ brew install ruby-build
$ brew install rbenv-gemset --HEAD
$ rbenv init
$ rbenv rehash
$ git clone --recursive https://github.com/medusa-project/metaslurp.git
$ cd metaslurp
$ rbenv install "$(< .ruby-version)"
$ gem install bundler
$ bundle install
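To confirm that rbenv selected the Ruby version the project expects (it reads .ruby-version in the project directory), a quick check:

$ ruby -v
$ cat .ruby-version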
$ cp config/credentials/template.yml config/credentials/development.yml
$ cp config/credentials/template.yml config/credentials/test.yml
Fill in the new files and do not commit them to version control.
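To make sure the filled-in files won't be committed accidentally, you can ask git whether they are ignored (this assumes the repository's .gitignore already covers them):

$ git check-ignore -v config/credentials/development.yml config/credentials/test.yml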
$ bin/rails db:setup
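If you want to verify that the databases were created, one option is to list them with psql and look for the application's databases (the metaslurp name pattern below is an assumption; the actual names are in config/database.yml):

$ psql -l | grep -i metaslurp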
Uncomment discovery.type: single-node in config/opensearch.yml. Also add the following lines:
plugins.security.disabled: true
plugins.index_state_management.enabled: false
reindex.remote.whitelist: "localhost:*"
$ bin/opensearch-plugin install analysis-icu
$ bin/opensearch
To confirm that it's running, try to access http://localhost:9200.
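For example, from another terminal:

$ curl http://localhost:9200

A JSON response containing the cluster name and version number means OpenSearch is up.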
$ bin/rails opensearch:indexes:create[my_index]
$ bin/rails opensearch:indexes:create_alias[my_index,my_index_alias]
(my_index_alias is the value of the opensearch_index configuration key.)
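To verify that the index and alias were created, OpenSearch's _cat APIs can be queried (the names below are the example names from above):

$ curl 'http://localhost:9200/_cat/indices?v'
$ curl 'http://localhost:9200/_cat/aliases?v'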
Cantaloupe has several dependencies of its own and requires particular configuration and delegate method implementations to work with the application. Rather than documenting all of that here, see the README in the metaslurp-cantaloupe repository. It is recommended to clone that and run it locally using Docker.
Note that Cantaloupe plays a relatively minor role in the application (only rendering thumbnails) and it is perfectly possible to do 99% of development on Metaslurp without it running.
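If you do want thumbnails locally, a rough sketch of the Docker route, assuming the metaslurp-cantaloupe repository lives alongside metaslurp in the medusa-project organization, provides a Dockerfile, and exposes Cantaloupe's default port 8182 (the authoritative steps are in that repository's README):

$ git clone https://github.com/medusa-project/metaslurp-cantaloupe.git
$ cd metaslurp-cantaloupe
$ docker build -t metaslurp-cantaloupe .
$ docker run -p 8182:8182 metaslurp-cantaloupe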
$ bin/rails server
N.B.: On macOS, if you get an error like "+[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called", try setting this environment variable:
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
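You can either export it for the whole shell session or set it inline for a single run:

$ export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
$ bin/rails server

or:

$ OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES bin/rails server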
$ bin/rails db:migrate
For the most part, once created, index schemas can't be modified. To migrate to an incompatible schema, the procedure would be something like:
- Update the index schema in app/search/index_schema.yml
- Create an index with the new schema:
bin/rails opensearch:indexes:create[my_new_index]
- Populate the new index with documents. There are a couple of ways to do this:
  - If the schema change was backwards-compatible with the source documents added to the index, invoke bin/rails opensearch:indexes:reindex[my_current_index,my_new_index]. This will reindex all source documents from the current index into the new index.
  - Otherwise, reharvest everything into the new index. This can be accomplished by invoking the harvester with the SERVICE_SINK_METASLURP_INDEX environment variable set to the name of the new index.
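Once the new index is populated, the opensearch_index alias needs to point at it before the application will use it. A sketch using OpenSearch's standard _aliases API, with the example index and alias names from above (there may also be a rake task that accomplishes the same thing):

$ curl -X POST 'http://localhost:9200/_aliases' \
    -H 'Content-Type: application/json' \
    -d '{"actions": [
          {"remove": {"index": "my_current_index", "alias": "my_index_alias"}},
          {"add": {"index": "my_new_index", "alias": "my_index_alias"}}
        ]}'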
Because all of the above can be a huge pain, an effort has been made to design the index schema to be flexible enough to require migration as infrequently as possible.
In production, the various web-based buttons for initiating harvests trigger calls to the ECS API to start new harvesting tasks. This won't work in development. Instead, metaslurper should be invoked manually. Here is an example that will harvest the DLS into a local Metaslurp instance:
export SERVICE_SOURCE_DLS_KEY=dls
export SERVICE_SOURCE_DLS_ENDPOINT=https://digital.library.illinois.edu
# your NetID
export SERVICE_SOURCE_DLS_USERNAME=...
# your API key; see https://digital.library.illinois.edu/admin/users/{NetID}
export SERVICE_SOURCE_DLS_SECRET=...
export SERVICE_SINK_METASLURP_KEY=metaslurp
export SERVICE_SINK_METASLURP_ENDPOINT=http://localhost:3000
# username of a "non-human user"; see http://localhost:3000/admin/users
export SERVICE_SINK_METASLURP_USERNAME=...
# the above user's API key
export SERVICE_SINK_METASLURP_SECRET=...
java -jar target/metaslurper-VERSION.jar \
-source $SERVICE_SOURCE_DLS_KEY \
-sink $SERVICE_SINK_METASLURP_KEY \
-threads 2
(These environment variable values are just examples. The variables used in production are stored in Metaslurp's ECS task definition, which is Terraformed.)
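Building the jar itself is covered in the metaslurper README; assuming a standard Maven layout (suggested by the target/ path above), the build is typically something like:

$ cd /path/to/metaslurper
$ mvn clean package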
See the metaslurper README for more information about using metaslurper.
Once a harvest is running, you can monitor it from the harvests page just like any other harvest.
Sign in as admin with password [email protected].