Remove eye
jcoyne committed May 10, 2023
1 parent 2c273d8 commit e872cda
Showing 4 changed files with 44 additions and 74 deletions.
4 changes: 2 additions & 2 deletions Procfile.stage
@@ -1,5 +1,5 @@
marc_bodoni_dev_indexer: JRUBY_OPTS=-J-Xmx8192m /usr/local/rvm/bin/rvm jruby-9.4.1.0 do bundle exec traject -c ./lib/traject/config/sirsi_config.rb -s solr_writer.max_skipped=-1 -s log.level=debug -s log.file=log/traject_marc_bodoni_dev_indexer.log -s processing_thread_pool=2 -s kafka.topic=marc_bodoni -s kafka.consumer_group_id=traject_marc_bodoni_dev -s solr.url=http://sul-solr.stanford.edu/solr/searchworks-dev -s reserves_path=/data/sirsi/bodoni/crez
marc_morison_dev_indexer: JRUBY_OPTS=-J-Xmx8192m /usr/local/rvm/bin/rvm jruby-9.4.1.0 do bundle exec traject -c ./lib/traject/config/sirsi_config.rb -s solr_writer.max_skipped=-1 -s log.level=debug -s log.file=log/traject_marc_morison_dev_indexer.log -s processing_thread_pool=2 -s kafka.topic=marc_morison -s kafka.consumer_group_id=traject_marc_morison_dev -s solr.url=http://sul-solr.stanford.edu/solr/searchworks-morison-dev -s reserves_path=/data/sirsi/morison/crez
marc_bodoni_dev_indexer: /usr/local/rvm/bin/rvm jruby-9.4.1.0 do bundle exec traject -c ./lib/traject/config/sirsi_config.rb -s solr_writer.max_skipped=-1 -s log.level=debug -s log.file=log/traject_marc_bodoni_dev_indexer.log -s processing_thread_pool=2 -s kafka.topic=marc_bodoni -s kafka.consumer_group_id=traject_marc_bodoni_dev -s solr.url=http://sul-solr.stanford.edu/solr/searchworks-dev -s reserves_path=/data/sirsi/bodoni/crez
marc_morison_dev_indexer: /usr/local/rvm/bin/rvm jruby-9.4.1.0 do bundle exec traject -c ./lib/traject/config/sirsi_config.rb -s solr_writer.max_skipped=-1 -s log.level=debug -s log.file=log/traject_marc_morison_dev_indexer.log -s processing_thread_pool=2 -s kafka.topic=marc_morison -s kafka.consumer_group_id=traject_marc_morison_dev -s solr.url=http://sul-solr.stanford.edu/solr/searchworks-morison-dev -s reserves_path=/data/sirsi/morison/crez
sw_dev_indexer: /usr/local/rvm/bin/rvm ruby-3.1.2 do bundle exec traject -c ./lib/traject/config/sdr_config.rb -s solr_writer.max_skipped=-1 -s log.level=debug -s log.file=log/traject_sw_dev_indexer.log -s kafka.topic=purl_fetcher_prod -s solr.url=http://sul-solr.stanford.edu/solr/searchworks-dev
sw_preview_stage_indexer: /usr/local/rvm/bin/rvm ruby-3.1.2 do bundle exec traject -c ./lib/traject/config/sdr_config.rb -s solr_writer.max_skipped=-1 -s log.level=debug -s log.file=log/traject_sw_preview_stage_indexer.log -s kafka.topic=purl_fetcher_stage -s kafka.consumer_group_id=traject_purl_fetcher_stage_sw_preview -s purl_fetcher.target=SearchWorksPreview -s purl_fetcher.skip_catkey=false -s purl.url=https://sul-purl-test.stanford.edu -s solr.url=http://sul-solr.stanford.edu/solr/sw-preview-stage
earthworks_stage_indexer: /usr/local/rvm/bin/rvm ruby-3.1.2 do bundle exec traject -c ./lib/traject/config/geo_config.rb -s solr_writer.max_skipped=-1 -s log.level=debug -s log.file=log/traject_earthworks-stage-indexer.log -s kafka.topic=purl_fetcher_stage -s kafka.consumer_group_id=earthworks-stage-indexer -s purl.url=https://sul-purl-stage.stanford.edu -s stacks.url=https://sul-stacks-stage.stanford.edu -s geoserver.pub_url=https://earthworks-geoserver-stage-b.stanford.edu/geoserver -s geoserver.stan_url=https://earthworks-geoserver-stage-a.stanford.edu/geoserver -s solr.url=http://sul-solr.stanford.edu/solr/earthworks-stage
56 changes: 39 additions & 17 deletions README.md
@@ -32,13 +32,24 @@ note that some integration tests may hit a live server, for which you may need t
## Building services
For development we can use Foreman to run a procfile, but on a deployed machine, we export the rules to systemd:
```
foreman export -a traject -f Procfile.stage --formation marc_bodoni_dev_indexer=1,marc_morison_dev_indexer=1,folio_dev_indexer=8,sw_dev_indexer=2,sw_preview_stage_indexer=2,earthworks_stage_indexer=1 systemd ~/service_templates
# disable/remove old rules
sudo systemctl stop traject.target
sudo systemctl disable traject.target
# Ensure the .env file exists with JRUBY_OPTS=-J-Xmx8192m LANG=en_US.UTF-8 and then:
foreman export -a traject -u indexer -f Procfile.stage --formation marc_bodoni_dev_indexer=1,marc_morison_dev_indexer=1,folio_dev_indexer=8,sw_dev_indexer=2,sw_preview_stage_indexer=2,earthworks_stage_indexer=1 systemd ~/service_templates
sudo cp /opt/app/indexer/service_templates/* /usr/lib/systemd/system/
sudo systemctl enable traject.target
sudo systemctl start traject.target
```
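The export step above depends on a `.env` file that sets `JRUBY_OPTS` and `LANG`. A minimal sketch of a pre-flight check follows. The function name `check_env_file` is hypothetical and not part of this repository, and the check assumes each variable is assigned on its own line in the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): verify that the env file
# assigns every variable the exported services are expected to need.
# Assumption: one KEY=value assignment per line.
check_env_file() {
  env_file="${1:-.env}"
  if [ ! -f "$env_file" ]; then
    echo "missing $env_file" >&2
    return 1
  fi
  status=0
  for key in JRUBY_OPTS LANG; do
    # Look for a line that starts with KEY=
    if ! grep -q "^${key}=" "$env_file"; then
      echo "$env_file does not set $key" >&2
      status=1
    fi
  done
  return "$status"
}
```

It could be run as `check_env_file .env` before `foreman export`, so that a missing variable is reported before any unit files are written.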

## Monitor logs
```
ksu
journalctl -u traject-marc_bodoni_dev_indexer.1.service -e
```

## indexing data
indexing is a multi-step process:
1. an extractor process publishes data to be indexed to a [kafka](https://kafka.apache.org/) topic
@@ -66,7 +77,7 @@ another useful operation is resetting the messages published in a particular top
```
some tools offer the option to "plan" execution by default, and actually execute using the `--execute` flag. for more, try passing `--help`.
### consuming data from kafka
the daemon processes are managed by a common [eye configuration](./traject.eye). it reads information from the `config/settings.yml` (using the `config` gem) to set up the indexing daemons.
the daemon processes are managed by systemd.

the `config/settings.yml` file is configured as a capistrano shared file, allowing each deployment environment to have separate configuration. note that most settings are not checked into GitHub and are not available in `shared_configs`.

@@ -84,22 +95,32 @@ processes:
config:
start_command: '/usr/local/rvm/bin/rvm jruby-9.3.2.0 do bundle exec honeybadger exec traject -c ./lib/traject/config/sirsi_config.rb -s solr_writer.max_skipped=-1 -s log.level=debug -s log.file=log/traject_marc_bodoni_prod_indexer.log'
```
daemon processes run continuously. you can use `eye info` in an ssh session to view status information:
```sh
$ eye info
traject
workers
earthworks-stage-indexer_0 .... up (12:07, 0%, 73Mb, <1218016>)
folio_dev_indexer_0 ........... up (14:34, 6%, 137Mb, <1386323>)
marc_bodoni_dev_indexer_0 ..... up (12:07, 0%, 2731Mb, <1223573>)
marc_morison_dev_indexer_0 .... up (12:07, 0%, 1253Mb, <1223963>)
sw_dev_indexer_0 .............. up (12:07, 0%, 75Mb, <1224054>)
sw_dev_indexer_1 .............. up (12:07, 0%, 73Mb, <1224303>)
sw_preview_stage_indexer_0 .... up (12:07, 0%, 73Mb, <1224638>)
sw_preview_stage_indexer_1 .... up (12:07, 0%, 73Mb, <1224861>)
You can use `sudo systemctl list-dependencies traject.target` to view status information:
```
● ├─traject-earthworks_stage_indexer.1.service
● ├─traject-folio_dev_indexer.1.service
● ├─traject-folio_dev_indexer.2.service
● ├─traject-folio_dev_indexer.3.service
● ├─traject-folio_dev_indexer.4.service
● ├─traject-folio_dev_indexer.5.service
● ├─traject-folio_dev_indexer.6.service
● ├─traject-folio_dev_indexer.7.service
● ├─traject-folio_dev_indexer.8.service
● ├─traject-marc_bodoni_dev_indexer.1.service
● ├─traject-marc_morison_dev_indexer.1.service
● ├─traject-sw_dev_indexer.1.service
● ├─traject-sw_dev_indexer.2.service
● ├─traject-sw_preview_stage_indexer.1.service
● └─traject-sw_preview_stage_indexer.2.service
```
you can stop and start daemons with e.g. `eye stop sw_dev_indexer`. note that it may take some time for all the processes to start and stop. for more information on eye, use `eye help` or see the [eye wiki](https://github.com/kostya/eye/wiki).
Then look at the logs of any service by doing:
```
ksu
journalctl -u traject-marc_bodoni_dev_indexer.1.service -e
```
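The unit names in the listing follow the pattern `traject-<process>.<n>.service`, with one unit per instance requested in the `--formation` argument. As a rough sketch under that assumption, the hypothetical helper below (`formation_units` is not part of this repository) expands a formation string into the unit names it would be expected to produce.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the systemd unit
# names expected for a foreman formation string such as "a=2,b=1".
# Assumption: units are named <app>-<process>.<n>.service, as in the listing.
formation_units() {
  app="$1"
  formation="$2"
  # Split the formation on commas, then split each entry on "=".
  echo "$formation" | tr ',' '\n' | while IFS='=' read -r name count; do
    n=1
    while [ "$n" -le "$count" ]; do
      echo "${app}-${name}.${n}.service"
      n=$((n + 1))
    done
  done
}
```

For example, `formation_units traject "sw_dev_indexer=2"` prints the two `sw_dev_indexer` unit names, each of which can then be passed to `journalctl -u` or `systemctl is-active`.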
### indexing the data into solr
traject configurations specific to each target environment are responsible for transforming the data into a format that can be indexed into solr. you can view the configuration files in `lib/traject/config/`, which often include traject commands like:
@@ -110,7 +131,8 @@ which extracts information from the 008 field of a MARC record and puts it into

each traject configuration specifies a reader class located in `lib/traject/readers/` that can read the data from the kafka topic and hand it off to be transformed into solr JSON.

other configuration values, like the URL of the solr instance, are usually set at the top of the configuration file using traject's `provide`. many can be set by environment variables; some of these in turn are set by the eye configuration:
other configuration values, like the URL of the solr instance, are usually set at the top of the configuration file using traject's `provide`. many can be set by environment variables

```yaml
processes:
- name: my_indexer
15 changes: 3 additions & 12 deletions config/deploy.rb
@@ -61,19 +61,10 @@
end

namespace :deploy do
desc "stop/start eye, config for monitoring the deployment's traject workers"
before :cleanup, :load_eye_config do
desc "config for monitoring the deployment's traject workers"
before :cleanup, :start_workers do
on roles(:app) do
execute '/usr/local/rvm/bin/rvm-exec default gem list -i -e eye --silent || /usr/local/rvm/bin/rvm-exec default gem install eye'
execute '/usr/local/rvm/bin/rvm-exec default gem list -i -e config --silent || /usr/local/rvm/bin/rvm-exec default gem install config'

execute '/usr/local/rvm/bin/rvm-exec default eye info'
execute '/usr/local/rvm/bin/rvm-exec default eye stop traject'
execute '/usr/local/rvm/bin/rvm-exec default eye quit'
sleep 1
execute '/usr/local/rvm/bin/rvm-exec default eye load /opt/app/indexer/searchworks_traject_indexer/current/traject.eye &> /dev/null'
sleep 1
execute '/usr/local/rvm/bin/rvm-exec default eye info'
sudo :systemctl, 'restart', 'traject.target', raise_on_non_zero_exit: false
end
end
end
43 changes: 0 additions & 43 deletions traject.eye

This file was deleted.
