Docker image of a Solr instance ready to be used as an AkSearch Solr backend and MARC data import environment.
As in solr:7
the Solr is installed in /opt/solr/server/solr
.
AkSearch repository is cloned into /opt/aksearch
.
/opt/aksearch/import/*properties
files have Solr port fixed (AkSearch assumes Solr works on port 8080 while by default it works on 8983).
Run with with e.g.:
docker run --name aksearch-solr -d -p 8983:8983 -v aksearch-solrdata:/opt/solr/server/solr/mycores acdhch/aksearch-solr
Use SOLR_JAVA_MEM
, GC_TUNE
and OOM
environment variables to tune Solr memory management. See default values defined in the Dockerfile.
Enter container, e.g. with docker exec -ti aksearch-solr bash
and:
- If you want to import a single file:
cd /opt/aksearch && ./import-marc.sh pathToTheMarcFile
. - If you want to import all files in a directory:
cd /opt/aksearch && harvest/batch-import-marc.sh -m -d pathToTheMarcDir
.- Be aware the
batch-import-marc.sh
script writes logs intopathToTheMarcDir/log
directory by default and fails if it's unable to create/write to this dir. If you run into such trouble consider using the-z
switch to turn off logging to files (and you will still get log on the console and you can redirect it to a file). - The
/opt/aksearch/harvest/batch-import-marc-single.sh
wrapper assures the import isn't run until the previous one finished.
- Be aware the
Similarly you can use other import scripts shipped with AkSearch (/opt/aksearch/harvest/batch-import-marc-auth.sh
, /opt/aksearch/import-marc-auth.sh
and others).
- Login into running container, e.g.
docker exec -ti aksearch-solr bash
. - Download, decompress and import sample data:
curl -L 'https://github.com/acdh-oeaw/AkSearchSolr/blob/main/.github/workflows/marc.xml.gz?raw=true' > /tmp/marc.xml.gz gunzip /tmp/marc.xml.gz cd /opt/aksearch && ./import-marc.sh /tmp/marc.xml rm /tmp/marc.xml
- Exit the container with
exit
.
- Log into the running container as root, e.g.
docker exec -u root -ti aksearch-solr bash
. - Download the dataset over OAI-PMH using the
VuFindMinimal
set and make it look like the VuFind download script results:curl 'https://eu02.alma.exlibrisgroup.com/view/oai/43ACC_OEAW/request?metadataPrefix=marc21&verb=ListRecords&set=VuFindMinimal' | tail -n +5 | grep -v '^</metadata>' > /tmp/records.xml echo "<collection>" > /tmp/records2.xml cat /tmp/records.xml >> /tmp/records2.xml echo "</collection>" >> /tmp/records2.xml mv /tmp/records2.xml /tmp/records.xml
- Import the data with
/opt/aksearch/import-marc.sh /tmp/records.xml
- Alternatively you can harvest the set via OAI-PMH (in the AkSearchWeb Container)
/var/www/vufind/harvest/harvest_oai.sh bib_almamin
- And subsequently import it with the batch import script (back here in the AkSearchSolr):
cd /opt/aksearch && harvest/batch-import-marc.sh -m -d -p /opt/local/import/import_bib_almamin.properties /opt/harvest/bib_almamin # note that the -m flag set here will not move the data after indexing
- The full_reindex.sh script will clear the core and trigger an import with the options set above
/opt/local/full_reindex.sh bib_almamin
- Find Alma record's MMS_ID
- Log into the running container as root, e.g.
docker exec -u root -ti aksearch-solr bash
. - Download the record over OAI-PMH skipping the OAI-PMH envelope:
e.g.
curl 'https://eu02.alma.exlibrisgroup.com/view/oai/43ACC_OEAW/request?metadataPrefix=marc21&verb=GetRecord&identifier=oai:alma.43ACC_OEAW:{MMS_ID}' | tail -n 2 | head -n 1 > /tmp/record.xml
curl 'https://eu02.alma.exlibrisgroup.com/view/oai/43ACC_OEAW/request?metadataPrefix=marc21&verb=GetRecord&identifier=oai:alma.43ACC_OEAW:993516214704498' | tail -n 2 | head -n 1 > /tmp/record.xml
- Import the record with
/opt/aksearch/import-marc.sh /tmp/record.xml
Just mount it into the container and set the VUFIND_LOCAL_DIR
environment variable in a way $VUFIND_LOCAL_DIR/import/import.properties
points to your MARC import properties file, e.g.:
docker run --name aksearch-solr -d -p 8983:8983 -v aksearch-solrdata:/opt/solr/server/solr/mycores \
-v pathToMyLocalImport.properties:/opt/localcfg/import/import.properties -e VUFIND_LOCAL_DIR=/opt/localcfg \
acdhch/aksearch-solr
Act accordingly for any other configuration file.
If you want all data to be removed and reingested, run inside the docker container
/opt/harvest/oeaw_reindex.sh
- This image runs Solr on port 8983. If you are using your own
import.properties
file, you may need to adjust thesolr.hosturl
configuration property. - If
batch-import-marc-auth.sh
gives youit means it messed up with paths.FATAL [main] (Boot.java:215) - ERROR: Error while invoking main method in specified class: org.solrmarc.driver.IndexDriver java.lang.LinkageError: loader org.solrmarc.driver.Boot @3fee733d attempted duplicate interface definition for org.ini4j.Persistable. (org.ini4j.Persistable is in unnamed module of loader org.solrmarc.driver.Boot @3fee733d, parent loader 'app')
Runningbatch-import-marc.sh
from the directory whereimport-marc.sh
should solve the issue (see examples above -cd /opt/aksearch
and thenharvest/batch-import-marc.sh -m -d pathToTheMarcDir
).- The precise reason is stated in logs a few lines above and looks more or less like that:
The issue here is that the third homeDirStrs (which isn't printed in the log) is the
DEBUG [main] (Boot.java:645) - Number of homeDirStrs: 3 DEBUG [main] (Boot.java:648) - homeDirStrs[0]: /usr/local/vufind/import DEBUG [main] (Boot.java:652) - homeDirStrs[1]: /opt/aksearch/harvest/../import
import-marc.sh
directory and it duplicates with/opt/aksearch/harvest/../import
leading to duplicated class imports.
- The precise reason is stated in logs a few lines above and looks more or less like that: