The Solr index that powers the Calisphere website is hosted on the AWS Elastic Beanstalk platform.
The CNAME solr.calisphere.org points to https://ucldc-solr.us-west-2.elasticbeanstalk.com; whichever Beanstalk environment is at this address serves our search requests.
The Beanstalk application is hosted in the Oregon (us-west-2) AWS region. The application name is ucldc-solr. Currently it runs on a single micro EC2 instance.
The process to create a new production index is as follows:
1. Optimize the Solr index
2. Push the index to S3
3. Clone the existing environment
4. In the cloned environment, set the env var INDEX_PATH to the new index sub-path in S3
5. Rebuild the cloned environment
6. Check that the cloned environment is serving up the new index
7. Swap URLs from the existing environment to the new cloned environment
This puts the new index in place.
Generally, I then rebuild the original environment and swap back so that the environment's name remains ucldc-solr. This is not strictly necessary, but it makes it easier to remember what's what.
Optimize the Solr index:
- Go to the core admin page in Solr production: https://harvest-prd.cdlib.org/solr/#/~cores/dc-collection
- Hit the optimize button. This process will take a while; keep refreshing until the index reports being optimized and current.
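If you prefer the command line to the admin UI, Solr's update handler also accepts an optimize=true request parameter. The sketch below is hypothetical: the function names are made up, and it assumes the dc-collection core is reachable at https://harvest-prd.cdlib.org/solr/dc-collection/update (inferred from the admin page URL) and that you are allowed to send update requests to it. Setting DRY_RUN prints the curl command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: trigger a Solr optimize over HTTP instead of using
# the admin UI button. Assumes the core's update handler lives at
# <solr base>/<core>/update.

# Build the optimize URL for a core; a trailing slash on the base is dropped.
solr_optimize_url() {
  local base="${1%/}" core="$2"
  printf '%s/%s/update?optimize=true\n' "$base" "$core"
}

# Send the optimize request, or only print the curl command if DRY_RUN is set.
solr_optimize() {
  local url
  url="$(solr_optimize_url "$1" "$2")"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "curl --fail --silent --show-error $url"
  else
    curl --fail --silent --show-error "$url"
  fi
}
```

For example, DRY_RUN=1 solr_optimize https://harvest-prd.cdlib.org/solr dc-collection shows the request without sending it. Either way, confirm on the core admin page that the index reports being optimized and current before moving on.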
To push a new index to S3:
- Log into blackstar and run sudo su - hrv-prd
- Run
snsatnow solr-index-to-s3.sh
The DATA_BRANCH is set to production in this environment. This pushes the most recently built Solr index to S3 at the location solr.ucldc/indexes//YYYY/MM/solr-index.YYYY-MM-DD-HH_MM_SS.tar.bz2. The process will take a while (it takes some time for the new index to be packaged and compressed on S3), but the snsatnow wrapper sends a message to the dsc_harvesting_report Slack channel when it finishes.
- Look at the message sent to the dsc_harvesting_report Slack channel and find the s3_file_path it reports. It will be something like:
"s3_file_path": "s3://solr.ucldc/indexes/production/2016/06/solr-index.2016-06-21-19_53_40.tar.bz2"
- This is the value to pass to the clone and update environment commands.
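Copying the path out of Slack by hand is easy to get wrong. A minimal sketch of two hypothetical helpers (not part of the existing scripts): one pulls the s3_file_path value out of pasted message text, the other checks it against the naming pattern in the example above. Treating the branch segment as lowercase letters only is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the existing harvesting scripts.

# Print the first "s3_file_path" value found in the text on stdin.
extract_s3_file_path() {
  sed -n 's/.*"s3_file_path": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only if $1 matches the index naming pattern seen in the example:
# s3://solr.ucldc/indexes/<branch>/YYYY/MM/solr-index.YYYY-MM-DD-HH_MM_SS.tar.bz2
# (<branch> is assumed to be lowercase letters, e.g. "production").
is_valid_index_path() {
  local re='^s3://solr\.ucldc/indexes/[a-z]+/[0-9]{4}/[0-9]{2}/solr-index\.[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{2}_[0-9]{2}_[0-9]{2}\.tar\.bz2$'
  [[ "$1" =~ $re ]]
}
```

Usage: paste the Slack message into a shell variable, then run printf '%s\n' "$msg" | extract_s3_file_path and pass the result to is_valid_index_path before using it as <new index path>.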
The script clone-with-new-s3-index.sh performs steps 3 to 5 above (clone the environment, set INDEX_PATH, and rebuild).
- First, check which environments are running. Run this from your home directory (e.g., /home/ec2-user or /home/hrv-prd):
eb list
- Now run the following, where <new index path> is the s3_file_path value reported when you pushed the index to S3 (e.g., s3://solr.ucldc/indexes/production/2016/06/solr-index.2016-06-21-19_53_40.tar.bz2). This process will take a while. By convention, the existing environment (<old env name>) is ucldc-solr, and we have been naming the new environment (<new env name>) ucldc-solr-clone.
snsatnow clone-with-new-s3-index.sh <old env name> <new env name> <new index path>
- It will send a message to the dsc_harvesting_report Slack channel when finished.
- When it finishes, you should be able to run the following and see that INDEX_PATH has been updated to the value passed to the script.
eb printenv <new env name>
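The printenv check can be scripted. This sketch assumes eb printenv prints each variable on its own line as NAME = value (the format we expect from the EB CLI, but verify it against your version); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Assumes `eb printenv` prints lines like
#     INDEX_PATH = s3://...

# Print the INDEX_PATH value from `eb printenv` output on stdin.
index_path_from_printenv() {
  awk -F' = ' '$1 ~ /^[ \t]*INDEX_PATH$/ { print $2; exit }'
}

# Compare an environment's INDEX_PATH with the path given to the clone script.
check_index_path() {
  local env_name="$1" expected="$2" actual
  actual="$(eb printenv "$env_name" | index_path_from_printenv)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $env_name INDEX_PATH = $actual"
  else
    echo "MISMATCH: $env_name INDEX_PATH is '$actual', expected '$expected'" >&2
    return 1
  fi
}
```

Usage: check_index_path ucldc-solr-clone <new index path> returns non-zero if the clone is still pointing at a different index.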
Check the new environment's URL for the proper search results:
- Run the following to confirm the URL that is associated with the environment:
cname_for_env.sh <new env name>
- You can check that the URL is up by running:
check_solr_api_for_env.sh <new env name>
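The contents of check_solr_api_for_env.sh are not reproduced here. As a rough, hypothetical equivalent, the sketch below sends a zero-row query to the /solr/query path on an environment's CNAME and requires numFound to be greater than zero. It assumes that path accepts Solr's standard q and rows parameters and returns Solr's JSON response, and it uses https as in the URL at the top of this page; switch to http if the environment does not serve TLS.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for check_solr_api_for_env.sh (contents not shown
# here). Assumes <cname>/solr/query accepts Solr's q and rows parameters
# and returns Solr's JSON response.

# Build a zero-row "match all documents" query URL for a CNAME.
solr_query_url() {
  printf 'https://%s/solr/query?q=*:*&rows=0\n' "$1"
}

# Print response.numFound from a Solr JSON response on stdin.
num_found() {
  sed -n 's/.*"numFound": *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed only if the environment answers and reports at least one document.
check_solr_cname() {
  local n
  n="$(curl --fail --silent --show-error "$(solr_query_url "$1")" | num_found)"
  if [ -n "$n" ] && [ "$n" -gt 0 ]; then
    echo "OK: $1 reports numFound=$n"
  else
    echo "FAIL: $1 returned no documents or no valid response" >&2
    return 1
  fi
}
```

Pass it the hostname reported by cname_for_env.sh, e.g. check_solr_cname ucldc-solr-clone.us-west-2.elasticbeanstalk.com (a hypothetical hostname). Comparing numFound against the current production environment is a quick sanity check that the new index is complete.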
Swap URLs from the existing environment to the new cloned environment running the updated Solr index:
- First, check which environment has the ucldc-solr.us-west-2.elasticbeanstalk.com CNAME:
eb status <new env name>
Also, check the status and health of the environment. Here's an example of a happy environment:
Environment details for: ucldc-solr
Application name: ucldc-solr
Region: us-west-2
Deployed Version: new-nginx-index-html
Environment ID: e-dmmzpvb2vj
Platform: 64bit Amazon Linux 2016.03 v2.1.3 running Docker 1.11.1
Tier: WebServer-Standard
CNAME: ucldc-solr.us-west-2.elasticbeanstalk.com
Updated: 2016-09-10 02:09:01.062000+00:00
Status: Ready
Health: Green
- If both look right, swap the URLs and the new index will be live. The general form is:
eb swap -n <new env name> <old env name>
With the conventional environment names, that is:
eb swap -n ucldc-solr-clone ucldc-solr
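The status check before the swap can also be scripted against the text of eb status, using the field labels from the example output above (CNAME, Status, Health). This is a hypothetical sketch; it relies on each field appearing as Label: value on its own line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read `eb status` output on stdin. Assumes each
# field is printed as "Label: value" on its own line, as in the example.

# Print the value of one field, e.g. eb_status_field CNAME
eb_status_field() {
  awk -v key="$1" '{
    line = $0
    sub(/^[ \t]+/, "", line)              # ignore leading indentation
    i = index(line, ": ")
    if (i > 0 && substr(line, 1, i - 1) == key) {
      print substr(line, i + 2)           # everything after the first ": "
      exit
    }
  }'
}

# Succeed only if the status text reports Status: Ready and Health: Green.
env_is_healthy() {
  local text
  text="$(cat)"
  [ "$(printf '%s\n' "$text" | eb_status_field Status)" = "Ready" ] &&
    [ "$(printf '%s\n' "$text" | eb_status_field Health)" = "Green" ]
}
```

Usage: eb status ucldc-solr-clone | env_is_healthy && echo "safe to swap", and eb status ucldc-solr | eb_status_field CNAME to see which environment currently holds the production CNAME.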
I have then been updating the ucldc-solr environment and swapping the URL back, so that the live environment is always named ucldc-solr, but this is not required. The important thing is that the ucldc-solr.us-west-2.elasticbeanstalk.com/solr/query URL works.
The update-env-with-new-s3-index.sh command updates an existing Beanstalk environment to use a new index path, e.g.:
update-env-with-new-s3-index.sh ucldc-solr s3://solr.ucldc/indexes/production/2016/09/solr-index.2016-09-21-22_26_55.tar.bz2
Once that has finished, swap the CNAMEs back to the updated environment:
eb swap -n ucldc-solr-clone ucldc-solr
The ucldc-solr environment is then once again the production environment, with the CNAME ucldc-solr.us-west-2.elasticbeanstalk.com attached to it.
As a very last step, delete the cloned environment:
eb terminate ucldc-solr-clone
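Before starting the swap-back, it can help to print the whole sequence with the real names filled in and review it. A hypothetical dry-run helper that only prints the commands described in this section and runs nothing:

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print (but do not run) the swap, update,
# swap back and terminate commands described above.
print_swap_back_plan() {
  if [ "$#" -ne 3 ]; then
    echo "usage: print_swap_back_plan <old env name> <new env name> <new index path>" >&2
    return 2
  fi
  local old_env="$1" new_env="$2" index_path="$3"
  cat <<EOF
eb swap -n $new_env $old_env
update-env-with-new-s3-index.sh $old_env $index_path
eb swap -n $new_env $old_env
eb terminate $new_env
EOF
}
```

Run the printed commands one at a time, checking eb status after each before moving on; eb terminate asks you to confirm the environment name before it deletes anything.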