This repository serves as a collection of scripts, library classes, and other projects developed to help improve the digitization workflow at the Florida Museum of Natural History at UF.
$ pip3 install --user -r requirements.txt
There is a selection of scripts available in the scripts
directory. They all have unique CLI structures, so be sure to run whichever is needed with the --help
flag to get started. The below table provides a brief overview for each script:
Script | Description |
---|---|
dynaiello.py |
A version of the Aiello script with less column restrictions. Copy and rename entries based on a CSV file. |
gene_copy.py |
Removes divergent consensus sequences (IBA pipeline) from .fas/.fasta files |
gene_parser.py |
Parses .fa/.fasta files to extract accession numbers and gene names |
mgcl_tracker.py |
Tracks the used catalog numbers in the filesystem against a range/csv of numbers |
protein_combine.py |
Combines separated protein/nucleotide files into one combined file |
relocate.py |
(deprecated) Relocates 'troublesome' images based on the log output of other scripts |
suspect_numbers.py |
Agreggates 'suspect' catalog numbers in a filesystem |
unique_values.py |
Outputs all the unique values in the columns of a CSV or XLSX file |
wls.py |
(deprecated) Generates CSV of specimen at current working directory |
wrangler.py |
Assigns BOMBID numbers to collection specimen |
Although a little dated, a few major workflow tasks for were implemented as libraries used by the wrapper digitization.py
file.
A text-based list of programs to load will show on launch. Each program has a unique help prompt once loaded that will be available once selected.
This is not an exhaustive list:
Name | Description | Example(s) |
---|---|---|
Rename | Replaces the existing bash scripts previously for renaming images that have been named via the physical barcode scanner: | MGCL 0123456 (2).CR2 --> MGCL 0123456_V.CR2 |
Legacy Upgrade | We have ran Legacy Upgrade on server data already, if you are a current volunteer you most likely do not need to run this script. Legacy Upgrade is to be used to upgrade data to the new standards. | MGCL__0123456__V__M.CR2 --> MGCL_0123456_V.CR2 |
Aiello | This program is designed for a specific task, and should only be used accordingly. It will parse out a specifically formatted excel sheet to locate file references in the museum server and copy these files to another location with a different naming scheme to be used in a particular project. | |
Rescale | This will rescale images in a given directory (and subdirectories, if required by user) by a user provided proportion. This is to aid in the upload of images, as the hi-res images previously took up a large amount of space. This will allow for smaller file sizes without losing a noticeable amount of image quality. | |
Zipper | This will analyze a directory (and subdirectories, if required by user) and determine how many 1GB zip archives should be created. This is to aid in the upload of images, as well. Important notes: this script only looks inside folders entitled LOW-RES at all levels. This is because only the downscaled images will be grouped together for upload, as a means of saving cloud storage. |