Nouveau Franklin

Installation:

Check out this repo.
Make sure you have Ruby 2.3.1 installed. rbenv is recommended, but rvm may be quicker and easier to get running.
Run bundle install to install all gem dependencies.
Run npm install to install JavaScript libraries.
Edit the local_dev_env file, populate its variables with appropriate values, and then source it in your shell:
```
source local_dev_env
```
Run bundle exec rake db:migrate to initialize the database. You'll also have to run this again whenever you pull code that includes new migrations; if you forget, Rails will raise an exception when serving requests because there are pending migrations.
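If you're not sure whether you've missed a migration, Rails' db:migrate:status task lists every migration with a status of up or down. The function below is a hypothetical helper (pending_migrations is not part of this repo) that assumes the standard status table layout and prints the IDs of migrations that are still down:

```shell
# Hypothetical helper: read "bundle exec rake db:migrate:status" output on
# stdin and print the ID of every migration whose status column is "down",
# i.e. every migration that has not been run yet.
pending_migrations() {
  awk '$1 == "down" { print $2 }'
}

# Usage, from the repo root:
#   bundle exec rake db:migrate:status | pending_migrations
```

If it prints anything, run bundle exec rake db:migrate again.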

If there isn't a Solr instance you can use, you'll need to install Solr and add the solrplugins extensions to it. The following line should be added to the file solr-x.x.x/server/contexts/solr-jetty-context.xml inside the 'Configure' tag:

```
<Set name="extraClasspath">/path/to/solrplugins-0.1-SNAPSHOT.jar</Set>
```

Add the Solr core from the library-solr-schema repo by copying the core's directory into solr-x.x.x/server/solr.

Load some test MARC data into Solr:

```
bundle exec rake solr:marc:index_test_data
```

This pulls 30 sample records from the Blacklight-Data repository.

If the test data is successfully indexed, you should see output similar to this:

```
2016-03-03T12:29:40-05:00  INFO    Traject::SolrJsonWriter writing to 'http://127.0.0.1:8983/solr/blacklight-core/update/json' in batches of 100 with 1 bg threads
2016-03-03T12:29:40-05:00  INFO    Indexer with 1 processing threads, reader: Traject::MarcReader and writer: Traject::SolrJsonWriter
2016-03-03T12:29:41-05:00  INFO Traject::SolrJsonWriter sending commit to solr at url http://127.0.0.1:8983/solr/blacklight-core/update/json...
2016-03-03T12:29:41-05:00  INFO finished Indexer#process: 30 records in 0.471 seconds; 63.8 records/second overall.
```
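To double-check the result without a browser, you can ask Solr directly how many documents the core holds. This is a minimal sketch: solr_num_found and solr_count are hypothetical helpers, the default core URL is taken from the log output above, and the sed-based extraction of numFound assumes Solr's JSON response format rather than using a real JSON parser:

```shell
# Hypothetical helper: print the numFound value from a Solr JSON select
# response read on stdin.
solr_num_found() {
  sed -n 's/.*"numFound":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: ask a Solr core how many documents it holds.
# rows=0 returns only the count, not the documents themselves.
solr_count() {
  local core_url="${1:-http://127.0.0.1:8983/solr/blacklight-core}"
  curl -s "${core_url}/select?q=*:*&rows=0&wt=json" | solr_num_found
}
```

If the core was empty before you loaded the test data, solr_count should print 30.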

Start the rails server:
```
bundle exec rails s
```
Open http://localhost:3000 in a browser. If everything went well, you should see the generic Blacklight homepage with 30 faceted records to search.

Solr Indexing

This repository also contains the Traject code for indexing MARC records into Solr. It lives here rather than in a separate repository so that the MARC parsing logic stays in one place: some of that logic is also used to generate display values on the fly at page render time.

We handle two types of data exports from Alma: full exports and incremental updates via OAI.

The commands in this section can be run either directly or inside an application container. See the run_in_container.sh wrapper script in the ansible repository.

Full exports

Transfer the *.tar.gz files created by the Alma publishing job to the directory where they will be preprocessed and indexed. Run these commands:

```
./preprocess.sh /var/solr_input_data/alma_prod_sandbox/20170412_full allTitles
./index_solr.sh /var/solr_input_data/alma_prod_sandbox/20170412_full/processed
```
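The two steps can be wrapped so that indexing only runs after preprocessing succeeds. In this sketch, index_full_export and the PREPROCESS/INDEX override variables are hypothetical. Following the commands above, it assumes that preprocess.sh takes the export directory plus a second argument (allTitles) and that index_solr.sh is then pointed at the processed subdirectory:

```shell
# Hypothetical wrapper around the two full-export commands.
#   $1  directory holding the *.tar.gz files from the Alma publishing job
#   $2  passed through unchanged to preprocess.sh (allTitles in the README)
# PREPROCESS and INDEX can point at other commands, e.g. for a dry run.
index_full_export() {
  local dir="$1"
  local label="${2:-allTitles}"
  local preprocess="${PREPROCESS:-./preprocess.sh}"
  local index="${INDEX:-./index_solr.sh}"

  # Refuse to start if the export files haven't been transferred yet.
  if ! ls "$dir"/*.tar.gz >/dev/null 2>&1; then
    echo "no *.tar.gz files found in $dir" >&2
    return 1
  fi

  # Index the processed/ subdirectory only if preprocessing succeeded.
  "$preprocess" "$dir" "$label" && "$index" "$dir/processed"
}
```

For example: index_full_export /var/solr_input_data/alma_prod_sandbox/20170412_full allTitles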

Incremental updates (OAI)

This runs as a cron job that fetches the updates available via OAI since the job last ran:

```
./fetch_and_process_oai.sh /var/solr_input_data/alma_prod_sandbox/oai
```

If you do a full index from an older full export and want to manually apply a set of OAI updates that have already been fetched and processed, you can do so like this:

```
# run this for each dated directory
./index_and_deletions.sh /var/solr_input_data/alma_prod_sandbox/oai/allTitles/2017_04_10_00_00 allTitles
```
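Because the dated directory names are fixed-width (for example 2017_04_10_00_00), sorting them as strings also sorts them chronologically, so looping over a glob applies them in order. The helper below is a hypothetical bash sketch (apply_oai_updates, its optional start argument, and the INDEX_AND_DELETIONS override are not part of this repo). It stops at the first failure so that later updates are not applied on top of a missing one:

```shell
# Hypothetical helper: apply already-processed OAI update directories under
# $1, oldest first, by calling index_and_deletions.sh on each one.
#   $2  passed through unchanged as the second argument (allTitles above)
#   $3  optional: skip directories whose names sort before this value
apply_oai_updates() {
  local base="$1"
  local label="${2:-allTitles}"
  local since="${3:-}"
  local script="${INDEX_AND_DELETIONS:-./index_and_deletions.sh}"
  local dir name

  # Bash expands globs in sorted order, so fixed-width dated names come
  # out oldest first.
  for dir in "$base"/[0-9]*/; do
    [ -d "$dir" ] || continue          # the glob matched nothing
    dir="${dir%/}"                     # drop the trailing slash
    name="${dir##*/}"
    if [[ -n "$since" && "$name" < "$since" ]]; then
      continue                         # older than the requested start
    fi
    echo "applying $dir" >&2
    # Stop at the first failure so later updates aren't applied out of order.
    "$script" "$dir" "$label" || return 1
  done
}
```

For example, to apply only the updates dated on or after the export date (adjust the third argument to match your export): apply_oai_updates /var/solr_input_data/alma_prod_sandbox/oai/allTitles allTitles 2017_04_12_00_00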

JRuby and Traject

See the jruby-traject.md file for details on how to use JRuby with Traject. Note that running Traject under JRuby is currently broken.

Docker

There is a build_docker_image.sh script you can use to build Docker images from specific branches that have been freshly pulled from origin. It's intended to be run from a repository clone used only for builds, so that the images aren't polluted with miscellaneous files you may have lying around. Run it with the branch name:

```
./build_docker_image.sh master
# remember to push to the registry afterwards! see the output of the script.
```
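Since the point of a dedicated build clone is to keep stray files out of the images, it can help to check the clone before building. This hypothetical pre-flight check (require_clean_clone is not part of build_docker_image.sh) relies on git status --porcelain, which prints nothing when there are no modified, staged, or untracked files. Files matched by .gitignore are not reported by this check:

```shell
# Hypothetical pre-flight check: succeed only if the clone at $1 (default:
# the current directory) has no modified, staged, or untracked files.
require_clean_clone() {
  local repo="${1:-.}"
  local changes
  changes="$(git -C "$repo" status --porcelain)" || return 1
  if [ -n "$changes" ]; then
    echo "refusing to build: $repo has local changes:" >&2
    echo "$changes" >&2
    return 1
  fi
}

# Usage:
#   require_clean_clone && ./build_docker_image.sh master
```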

See the deploy-docker repository for Ansible scripts that build Docker images and deploy containers to the test and production environments.

Auditing Secrets

You can use Gitleaks to check the repository for unencrypted secrets that have been committed.

```
docker run --rm --name=gitleaks -v $PWD:/code quay.io/upennlibraries/gitleaks:v1.23.0 -v --repo-path=/code --repo-config
```

Any leaks will be logged to stdout. You can add the --redact flag if you do not want to log the offending secrets.

upenn-libraries/discovery-app

Folders and files

Latest commit

History

Repository files navigation

Nouveau Franklin

Solr Indexing

Full exports

Incremental updates (OAI)

JRuby and Traject

Docker

Auditing Secrets

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 12

Languages

Packages