v1.3.0

**Release v1.3.0:**

* **`VEBA` Modules:**
	* Added the `profile-pathway.py` module and associated scripts for building `HUMAnN` databases from *de novo* genomes and annotations. In essence, this is a reads-based functional profiling method that uses `HUMAnN` with binned genomes as the database.
	* Added the `marker_gene_clustering.py` script, which identifies core marker proteins that are present in all genomes within a genome cluster (i.e., pangenome) and unique to that genome cluster. Clustering can be performed in either protein or nucleotide space.
	* Added the `module_completion_ratios.py` script, which calculates KEGG module completion ratios for genomes and pangenomes. It runs automatically in the backend of `annotate.py`.
	* Updated `annotate.py` and `merge_annotations.py` to provide better annotations for clustered proteins.
	* Added `merge_genome_quality.py` and `merge_taxonomy_classifications.py`, which compile genome quality and taxonomy, respectively, for all organisms.
	* Added BGC clustering in protein and nucleotide space to `biosynthetic.py`, which also produces prevalence tables that can be used for further clustering of BGCs.
	* Added `pangenome_core_sequences` to `cluster.py`, which writes both protein and CDS sequences for each genome cluster.
	* Added PDF visualization of Newick trees in `phylogeny.py`.
* **`VEBA` Database (`VDB_v5.2`)**:
	* Added `CAZy`
	* Added `MicrobeAnnotator-KEGG`

<details>
<summary>**Release v1.3.0 Details**</summary>

* Updated `annotate.py` and `merge_annotations.py` to handle `CAZy`. They now also properly handle clustered protein annotations.
* Added the `module_completion_ratio.py` script, a fork of `MicrobeAnnotator` [`ko_mapper.py`](https://github.com/cruizperez/MicrobeAnnotator/blob/master/microbeannotator/pipeline/ko_mapper.py). Also included a database ([Zenodo: 10020074](https://zenodo.org/records/10020074)), which will be included in `VDB_v5.2`.
* Added a checkpoint for `tRNAscan-SE` in `binning-prokaryotic.py` and `eukaryotic_gene_modeling_wrapper.py`.
* Added the `profile-pathway.py` module and the `VEBA-profile_env` environment. The module is a wrapper around `HUMAnN` that uses the custom database created from `annotate.py` and `compile_custom_humann_database_from_annotations.py`.
* Added `GenoPype version` to the log output.
* Added `merge_genome_quality.py`, which combines `CheckV`, `CheckM2`, and `BUSCO` results.
* Added `compile_custom_humann_database_from_annotations.py`, which compiles a `HUMAnN` protein database table from the output of `annotate.py` and taxonomy classifications.
* Added `--no_domain` and `--no_header` to `merge_taxonomy_classifications.py` so that its output can serve as input to `compile_custom_humann_database_from_annotations.py`.
* Added the `marker_gene_clustering.py` script, which gets core marker genes unique to each SLC (i.e., pangenome). Also added `average_number_of_copies_per_genome` to protein clusters.
* Added `--minimum_core_prevalence` to `global_clustering.py`, `local_clustering.py`, and `cluster.py`, which sets the prevalence ratio at which protein clusters in an SLC are considered core. Also removed `--no_singletons` from `cluster.py` to avoid complications with marker genes. Relabeled `--input` to `--genomes_table` in the clustering scripts/module.
* Added a check in `coverage.py` for existing `mapped.sorted.bam` files; if they have already been created, they are skipped. Not yet implemented for the GNU parallel option.
* Changed the default representative sequence format in `mmseqs2_wrapper.py` from table to fasta.
* Added `--nucleotide_fasta_output` to `antismash_genbank_to_table.py`, which outputs the actual BGC DNA sequence. Changed `--fasta_output` to `--protein_fasta_output` and added this output to `biosynthetic.py`. Changed BGC component identifiers to `[bgc_id]_[position_in_bgc]|[start]:[end]([strand])` to match `MetaEuk` identifiers. Changed `bgc_type` to `protocluster_type`. `biosynthetic.py` now supports GFF files from `MetaEuk` (exon and gene features are not supported by `antiSMASH`). Fixed an error where `antiSMASH` added CDS (i.e., `allorf_[start]_[end]`) that are not in the GFF, which caused `antismash_genbank_to_table.py` to fail in those cases.
* Added `ete3` to `VEBA-phylogeny_env.yml`; trees are now automatically rendered to PDF.
* Added presets for `MEGAHIT` via the `--megahit_preset` option.
* Using `--mash_db` with `GTDB-Tk` violated the assumption that all prokaryotic classifications have an `msa_percent` field, which caused the cluster-level taxonomy to fail; as a result, all cluster-level prokaryotic classifications were reported as unclassified. `compile_prokaryotic_genome_cluster_classification_scores_table.py` fixes this by using `fastani_ani` as the weight when genomes were classified using ANI and `msa_percent` for everything else.
* Fixed a small error where empty GFF files with an asterisk in the name were created for samples that didn't have any prokaryotic MAGs.
* Fixed a critical error where descriptions in headers were not removed in `eukaryota.scaffolds.list`, so `seqkit grep` did not remove eukaryotic scaffolds and `DAS_Tool` output eukaryotic MAGs in `identifier_mapping.tsv` and `__DASTool_scaffolds2bin.no_eukaryota.txt`.
* Fixed `krona.html` in `biosynthetic.py`, which was being created incorrectly by the `compile_krona.py` script.
* Created `pangenome_core_sequences` in `global_clustering.py` and `local_clustering.py`, which writes both protein and CDS sequences for each SLC. Also changed `cluster.py` to skip local clustering by default, replacing `--no_local_clustering` with `--local_clustering`.
* Fixed `pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects` in `biosynthetic.py`, raised when `Diamond` finds multiple matching regions in one hit. Added `--sort_by` and `--ascending` to `concatenate_dataframes.py`, along with automatic detection and removal of duplicate indices. Also added `--sort_by bitscore` in `biosynthetic.py`.
* Added core pangenome and singleton hits to the clustering output.
* Updated the `--megahit_memory` default from 0.9 to 0.99.
* Fixed an error in `genomad_taxonomy_wrapper.py` where `viral_taxonomy.tsv` should have been `taxonomy.tsv`.
* Fixed a minor error in `assembly.py`, caused by incorrect string formatting, that prevented users from running `SPAdes` programs other than `spades.py`, `metaspades.py`, or `rnaspades.py`.
* Updated `bowtie2` in the preprocess, assembly, and mapping modules. Updated `fastp` and `fastq_preprocessor` in the preprocess module.

</details>
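
The marker-gene criterion described in these notes (a protein cluster present in every genome of one genome cluster and absent from every other genome cluster) can be illustrated with a short `awk` sketch. This is a hypothetical helper, not `marker_gene_clustering.py` itself: it assumes protein clusters have already been assigned and that membership is stored in a header-less TSV with the columns `genome_cluster`, `genome_id`, `protein_cluster`.

```bash
# Hypothetical sketch, not VEBA code. Input: header-less TSV with columns
#   genome_cluster <TAB> genome_id <TAB> protein_cluster
# Output: genome_cluster <TAB> protein_cluster for every protein cluster that is
#   (1) core: found in all genomes of that genome cluster, and
#   (2) unique: found in no other genome cluster.
find_marker_protein_clusters() {
    awk -F'\t' '
    {
        gc = $1; genome = $2; pc = $3
        if (!((gc, genome) in seen_genome)) {      # count genomes per genome cluster
            seen_genome[gc, genome] = 1
            n_genomes[gc]++
        }
        if (!((gc, pc, genome) in seen_hit)) {     # count genomes carrying pc within gc
            seen_hit[gc, pc, genome] = 1
            n_carriers[gc, pc]++
        }
        if (!((pc, gc) in seen_pc_gc)) {           # count genome clusters containing pc
            seen_pc_gc[pc, gc] = 1
            n_gcs[pc]++
        }
    }
    END {
        for (key in n_carriers) {
            split(key, parts, SUBSEP)
            gc = parts[1]; pc = parts[2]
            if (n_carriers[key] == n_genomes[gc] && n_gcs[pc] == 1)
                print gc "\t" pc
        }
    }' "$1" | sort
}
```

Usage: `find_marker_protein_clusters membership.tsv > markers.tsv` (both file names are placeholders). Setting a core threshold below 100% prevalence, as `--minimum_core_prevalence` does for core clusters, would replace the equality test with a ratio comparison.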
jolespin · Oct 27, 2023 · 57b9a34
1 parent 55ae7dc
commit 57b9a34

Showing 30 changed files with 1,620 additions and 249 deletions.
diff --git a/.DS_Store b/.DS_Store
diff --git a/CHANGELOG.md b/CHANGELOG.md
diff --git a/CITATIONS.md b/CITATIONS.md
diff --git a/DEPENDENCIES.xlsx b/DEPENDENCIES.xlsx
diff --git a/README.md b/README.md
@@ -42,26 +42,22 @@ ___________________________________________________________________
 
 ### Announcements
 
-* **`VEBA v1.2.0` is now available!**
+* **What's new in `VEBA v1.3.0`?**
 
 * **`VEBA` Modules:**
-	* Updated `GTDB-Tk` now uses `Mash` for ANI screening to speed up classification (now provided in `VDB_v5.1` database)
-	* rRNA and tRNA are identified for prokaryotic and eukaryotic genomes via `BARRNAP` and `tRNAscan-SE`
-	* Eukaryotic genes (CDS, rRNA, tRNA) are analyzed separately for nuclear, mitochondrion, and plastid sequences
-	* Genome GFF files include contigs, CDS, rRNA, and tRNA with tags for mitochondrion and plastids when applicable
-	* Clustering automatically generates pangenome protein prevalence tables for each genome cluster
-	* Ratios of singletons in each genome are now calculated
-	* [Virulence factor database](http://www.mgc.ac.cn/VFs/main.htm) (`VFDB`) is now included in annotations
-	* [UniRef50/90](https://www.uniprot.org/help/uniref) is now included in annotations
-	* `Krona` plots are generated for taxonomy classifications and biosynthetic gene cluster detection
-	* Fixed a minor issue in `biosynthetic.py` where the fasta and genbank files were not properly symlinked.  Also added virulence factor results to synopsis.
-
+	* Added `profile-pathway.py` module and associated scripts for building `HUMAnN` databases from *de novo* genomes and annotations. In essence, this is a reads-based functional profiling method that uses `HUMAnN` with binned genomes as the database.
+	* Added `marker_gene_clustering.py` script, which identifies core marker proteins that are present in all genomes within a genome cluster (i.e., pangenome) and unique to that genome cluster. Clustering can be performed in either protein or nucleotide space.
+	* Added `module_completion_ratios.py` script, which calculates KEGG module completion ratios for genomes and pangenomes. It runs automatically in the backend of `annotate.py`.
+	* Updated `annotate.py` and `merge_annotations.py` to provide better annotations for clustered proteins.
+	* Added `merge_genome_quality.py` and `merge_taxonomy_classifications.py`, which compile genome quality and taxonomy, respectively, for all organisms.
+	* Added BGC clustering in protein and nucleotide space to `biosynthetic.py`, which also produces prevalence tables that can be used for further clustering of BGCs.
+	* Added `pangenome_core_sequences` to `cluster.py`, which writes both protein and CDS sequences for each genome cluster.
+	* Added PDF visualization of Newick trees in `phylogeny.py`.
+
 
-* **`VEBA` Database**:
-	* Added `VFDB`
-	* Updated `GTDB v207_v2 → v214.1`
-	* Changed `NR  → UniRef50/90` 
-	* Deprecated [`RefSeq non-redundant`](https://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/) in place of `UniRef`
+* **`VEBA` Database (`VDB_v5.2`)**:
+	* Added `CAZy`
+	* Added `MicrobeAnnotator-KEGG`
 
 Check out the [*VEBA* Change Log](CHANGELOG.md) for insight into what is being implemented in the upcoming version.
 
@@ -72,7 +68,9 @@ ___________________________________________________________________
 
 ### Installation and databases
 
-**Current Stable Version:** [`v1.2.0`](https://github.com/jolespin/veba/releases/tag/v1.2.0)
+**Current Stable Version:** [`v1.3.0`](https://github.com/jolespin/veba/releases/tag/v1.3.0)
+
+**Current Database Version:** `VDB_v5.2`
 
 Please refer to the [*Installation and Database Configuration Guide*](install/README.md) for software installation and database configuration.
 

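Because `VDB_v5.2` adds `CAZy` and `MicrobeAnnotator-KEGG`, an existing database directory can be checked for the new files before running the updated modules. A minimal sketch, assuming the layout shown in the `install/DATABASE.md` tree in this commit; the function name is hypothetical and not part of VEBA.

```bash
# Hypothetical sanity check (not part of VEBA): confirms the database root is
# known and that the paths added in VDB_v5.2 exist (e.g., after symlinking).
check_vdb_v5_2_additions() {
    local db="${1:-${VEBA_DATABASE:-}}" missing=0 rel
    if [ -z "$db" ]; then
        echo "VEBA_DATABASE is not set and no path was given" >&2
        return 2
    fi
    for rel in \
        Annotate/CAZy/CAZyDB.07262023.dmnd \
        Annotate/MicrobeAnnotator-KEGG/KEGG_Module_Information.txt \
        Annotate/MicrobeAnnotator-KEGG/KEGG_Regular_Module_Information.pkl \
        Annotate/MicrobeAnnotator-KEGG/KEGG_Bifurcating_Module_Information.pkl \
        Annotate/MicrobeAnnotator-KEGG/KEGG_Structural_Module_Information.pkl
    do
        # -e follows symlinks, so a dangling symlink is reported as missing
        if [ ! -e "$db/$rel" ]; then
            echo "missing: $rel" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_vdb_v5_2_additions` (reads `VEBA_DATABASE`) or `check_vdb_v5_2_additions /path/to/veba_database`. A non-zero exit status means at least one file is missing.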
diff --git a/SOURCES.xlsx b/SOURCES.xlsx
diff --git a/VERSION b/VERSION
@@ -1,2 +1,2 @@
-1.3.0b
-VDB_v5.1
+1.3.0
+VDB_v5.2
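
The `VERSION` file pairs the software release with the database release it expects (`1.3.0` with `VDB_v5.2` after this commit). A small sketch for reading both values in a script, assuming the two-line layout shown in the diff; the helper is hypothetical and not part of VEBA.

```bash
# Hypothetical helper, not part of VEBA. Assumes line 1 = VEBA version and
# line 2 = VEBA database version, as in the VERSION file above.
read_veba_versions() {
    local version_file="$1"
    # Sets the globals VEBA_VERSION and VEBA_DATABASE_VERSION as a side effect
    {
        IFS= read -r VEBA_VERSION
        IFS= read -r VEBA_DATABASE_VERSION
    } < "$version_file"
    printf 'VEBA=%s VDB=%s\n' "$VEBA_VERSION" "$VEBA_DATABASE_VERSION"
}
```

For the `VERSION` file in this commit, `read_veba_versions veba/VERSION` prints `VEBA=1.3.0 VDB=VDB_v5.2`. Call it directly rather than inside `$(...)` if the globals are needed, since command substitution runs in a subshell.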
diff --git a/install/DATABASE.md b/install/DATABASE.md
@@ -34,7 +34,170 @@ A protein database is required not only for eukaryotic gene calls using MetaEuk
 #### Database Structure:
 
 **Current:**
-*VEBA Database* version: `VDB_v5.1`
+*VEBA Database* version: `VDB_v5.2` (243 GB)
+
+*  Added `MicrobeAnnotator-KEGG` [Zenodo: 10020074](https://zenodo.org/records/10020074) which includes KEGG module pathway information from [`MicrobeAnnotator`](https://doi.org/10.1186/s12859-020-03940-5).
+*  Added `CAZy` protein sequences from [`dbCAN2`](https://academic.oup.com/nar/article/46/W1/W95/4996582)
+
+```
+tree -L 3 .
+.
+├── ACCESS_DATE
+├── Annotate
+│   ├── CAZy
+│   │   └── CAZyDB.07262023.dmnd
+│   ├── KOFAM
+│   │   ├── ko_list
+│   │   └── profiles
+│   ├── MIBiG
+│   │   └── mibig_v3.1.dmnd
+│   ├── MicrobeAnnotator-KEGG
+│   │   ├── KEGG_Bifurcating_Module_Information.pkl
+│   │   ├── KEGG_Bifurcating_Module_Information.pkl.md5
+│   │   ├── KEGG_Module_Information.txt
+│   │   ├── KEGG_Module_Information.txt.md5
+│   │   ├── KEGG_Regular_Module_Information.pkl
+│   │   ├── KEGG_Regular_Module_Information.pkl.md5
+│   │   ├── KEGG_Structural_Module_Information.pkl
+│   │   └── KEGG_Structural_Module_Information.pkl.md5
+│   ├── MicrobeAnnotator-KEGG.tar.gz
+│   ├── NCBIfam-AMRFinder
+│   │   ├── NCBIfam-AMRFinder.changelog.txt
+│   │   ├── NCBIfam-AMRFinder.hmm.gz
+│   │   └── NCBIfam-AMRFinder.tsv
+│   ├── Pfam
+│   │   ├── Pfam-A.hmm.gz
+│   │   └── relnotes.txt
+│   ├── UniRef
+│   │   ├── uniref50.dmnd
+│   │   ├── uniref50.release_note
+│   │   ├── uniref90.dmnd
+│   │   └── uniref90.release_note
+│   └── VFDB
+│       └── VFDB_setA_pro.dmnd
+├── Classify
+│   ├── CheckM2
+│   │   └── uniref100.KO.1.dmnd
+│   ├── CheckV
+│   │   ├── genome_db
+│   │   ├── hmm_db
+│   │   └── README.txt
+│   ├── geNomad
+│   │   ├── genomad_db
+│   │   ├── genomad_db.dbtype
+│   │   ├── genomad_db_h
+│   │   ├── genomad_db_h.dbtype
+│   │   ├── genomad_db_h.index
+│   │   ├── genomad_db.index
+│   │   ├── genomad_db.lookup
+│   │   ├── genomad_db_mapping
+│   │   ├── genomad_db.source
+│   │   ├── genomad_db_taxonomy
+│   │   ├── genomad_integrase_db
+│   │   ├── genomad_integrase_db.dbtype
+│   │   ├── genomad_integrase_db_h
+│   │   ├── genomad_integrase_db_h.dbtype
+│   │   ├── genomad_integrase_db_h.index
+│   │   ├── genomad_integrase_db.index
+│   │   ├── genomad_integrase_db.lookup
+│   │   ├── genomad_integrase_db.source
+│   │   ├── genomad_marker_metadata.tsv
+│   │   ├── genomad_mini_db -> genomad_db
+│   │   ├── genomad_mini_db.dbtype
+│   │   ├── genomad_mini_db_h -> genomad_db_h
+│   │   ├── genomad_mini_db_h.dbtype -> genomad_db_h.dbtype
+│   │   ├── genomad_mini_db_h.index -> genomad_db_h.index
+│   │   ├── genomad_mini_db.index
+│   │   ├── genomad_mini_db.lookup -> genomad_db.lookup
+│   │   ├── genomad_mini_db_mapping -> genomad_db_mapping
+│   │   ├── genomad_mini_db.source -> genomad_db.source
+│   │   ├── genomad_mini_db_taxonomy -> genomad_db_taxonomy
+│   │   ├── mini_set_ids
+│   │   ├── names.dmp
+│   │   ├── nodes.dmp
+│   │   ├── plasmid_hallmark_annotation.txt
+│   │   ├── version.txt
+│   │   └── virus_hallmark_annotation.txt
+│   ├── GTDB
+│   │   ├── fastani
+│   │   ├── markers
+│   │   ├── mash
+│   │   ├── masks
+│   │   ├── metadata
+│   │   ├── mrca_red
+│   │   ├── msa
+│   │   ├── pplacer
+│   │   ├── radii
+│   │   ├── split
+│   │   ├── taxonomy
+│   │   └── temp
+│   ├── Microeukaryotic
+│   │   ├── clean_ftp.sh
+│   │   ├── humann_uniref50_annotations.tsv.gz
+│   │   ├── md5_checksums
+│   │   ├── microeukaryotic
+│   │   ├── microeukaryotic.dbtype
+│   │   ├── microeukaryotic.eukaryota_odb10
+│   │   ├── microeukaryotic.eukaryota_odb10.dbtype
+│   │   ├── microeukaryotic.eukaryota_odb10_h
+│   │   ├── microeukaryotic.eukaryota_odb10_h.dbtype
+│   │   ├── microeukaryotic.eukaryota_odb10_h.index
+│   │   ├── microeukaryotic.eukaryota_odb10.index
+│   │   ├── microeukaryotic.eukaryota_odb10.lookup
+│   │   ├── microeukaryotic.eukaryota_odb10.source
+│   │   ├── microeukaryotic_h
+│   │   ├── microeukaryotic_h.dbtype
+│   │   ├── microeukaryotic_h.index
+│   │   ├── microeukaryotic.index
+│   │   ├── microeukaryotic.lookup
+│   │   ├── microeukaryotic.source
+│   │   ├── reference.eukaryota_odb10.list
+│   │   ├── RELEASE_NOTES
+│   │   ├── source_taxonomy.tsv.gz
+│   │   ├── source_to_lineage.dict.pkl.gz
+│   │   └── target_to_source.dict.pkl.gz
+│   └── NCBITaxonomy
+│       ├── citations.dmp
+│       ├── delnodes.dmp
+│       ├── division.dmp
+│       ├── gc.prt
+│       ├── gencode.dmp
+│       ├── merged.dmp
+│       ├── names.dmp
+│       ├── nodes.dmp
+│       ├── prot.accession2taxid.FULL.gz
+│       └── readme.txt
+├── Contamination
+│   ├── AntiFam
+│   │   ├── AntiFam.hmm.gz
+│   │   ├── relnotes
+│   │   └── version
+│   ├── chm13v2.0
+│   │   ├── chm13v2.0.1.bt2
+│   │   ├── chm13v2.0.2.bt2
+│   │   ├── chm13v2.0.3.bt2
+│   │   ├── chm13v2.0.4.bt2
+│   │   ├── chm13v2.0.rev.1.bt2
+│   │   └── chm13v2.0.rev.2.bt2
+│   └── kmers
+│       └── ribokmers.fa.gz
+└── MarkerSets
+    ├── Archaea_76.hmm.gz
+    ├── Bacteria_71.hmm.gz
+    ├── CPR_43.hmm.gz
+    ├── eukaryota_odb10.hmm.gz
+    ├── eukaryota_odb10.scores_cutoff.tsv.gz
+    ├── Fungi_593.hmm.gz
+    ├── Protista_83.hmm.gz
+    └── README
+
+37 directories, 112 files
+```
+
+**Deprecated:**
+
+<details>
+	<summary> *VEBA Database* version: `VDB_v5.1` </summary>
 
 * `VDB_v5` → `VDB_v5.1` updates `GTDB` database from `r207_v2` → `r214`.  
 * Changes `${VEBA_DATABASE}/Classify/GTDBTk` → `${VEBA_DATABASE}/Classify/GTDB`.
@@ -177,8 +340,7 @@ tree -L 3 .
     ├── Protista_83.hmm.gz
     └── README
 ```
-
-**Deprecated:**
+</details>
 
 <details>
 	<summary> *VEBA Database* version: `VDB_v5` </summary>
@@ -464,7 +626,7 @@ tree -L 3 .
 
 
 <details>
-	<summary>*VEBA Database* version: VDB_v3.1</summary>
+	<summary>*VEBA Database* version: `VDB_v3.1`</summary>
 
 The same as `VDB_v3` but updates `VDB-Microeukaryotic_v2` to `VDB-Microeukaryotic_v2.1` which has a `reference.eukaryota_odb10.list` containing only the subset of identifiers that are core eukaryotic markers (useful for classification).
 
@@ -573,7 +735,7 @@ tree -L 3 .
 
 
 <details>
-	<summary>*VEBA Database* version: VDB_v3</summary>
+	<summary>*VEBA Database* version: `VDB_v3`</summary>
 
 ```
 tree -L 3 .
@@ -671,7 +833,7 @@ tree -L 3 .
 
 
 <details>
-	<summary>*VEBA Database* version: VDB_v2</summary>
+	<summary>*VEBA Database* version: `VDB_v2`</summary>
 
 * Compatible with *VEBA* version: `v1.0.2a+`
 
@@ -769,7 +931,7 @@ tree -L 3 .
 
 
 <details>
-	<summary>*VEBA Database* version: VDB_v1</summary>
+	<summary>*VEBA Database* version: `VDB_v1`</summary>
 
 
 * Compatible with *VEBA* version: `v1.0.0`, `v1.0.1`

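In the `VDB_v5.2` tree, each file in `Annotate/MicrobeAnnotator-KEGG` has a `.md5` companion. The sketch below compares each companion against a freshly computed digest. It is a hypothetical check, not shipped with VEBA, and it assumes GNU coreutils `md5sum` and that the first whitespace-delimited field of each `.md5` file is the hex digest.

```bash
# Hypothetical check (not part of VEBA): verifies every *.md5 sidecar in a directory.
# Assumes the first field of each .md5 file is the expected hex digest.
verify_md5_sidecars() {
    local dir="$1" status=0 sidecar target expected actual
    for sidecar in "$dir"/*.md5; do
        [ -e "$sidecar" ] || continue                 # glob matched nothing
        target="${sidecar%.md5}"                      # X.pkl.md5 -> X.pkl
        expected=$(awk 'NR == 1 { print $1 }' "$sidecar")
        actual=$(md5sum "$target" | awk '{ print $1 }')
        if [ "$expected" = "$actual" ]; then
            echo "OK       ${target##*/}"
        else
            echo "MISMATCH ${target##*/}"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `verify_md5_sidecars "${VEBA_DATABASE}/Annotate/MicrobeAnnotator-KEGG"`. On macOS, `md5 -q FILE` would stand in for the `md5sum | awk` pipeline.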
diff --git a/install/README.md b/install/README.md
@@ -7,7 +7,7 @@ The basis for these environments is creating a separate environment for each mod
 
 The majority of the time taken to build database is downloading/decompressing large archives, `Diamond` database creation of `UniRef`, and `MMSEQS2` database creation of microeukaryotic protein database.
 
-Total size is `214 GB` but if you have certain databases installed already then you can just symlink them so the `VEBA_DATABASE` path has the correct structure.  Note, the exact size may vary as Pfam and UniRef are updated regularly.
+Total size is `243 GB`, but if you already have certain databases installed, you can symlink them so that the `VEBA_DATABASE` path has the correct structure.  Note that the exact size may vary as Pfam and UniRef are updated regularly.
 
 Each major version will be packaged as a [release](https://github.com/jolespin/veba/releases) which will include a log of module and script versions. 
 
@@ -83,7 +83,7 @@ The `VEBA` installation is going to configure some `conda` environments for you
 ```
 # For stable version, download and decompress the tarball:
 
-VERSION="1.2.0"
+VERSION="1.3.0"
 wget https://github.com/jolespin/veba/archive/refs/tags/v${VERSION}.tar.gz
 tar -xvf v${VERSION}.tar.gz && mv veba-${VERSION} veba
 
@@ -181,6 +181,7 @@ VEBA-database_env
 VEBA-mapping_env
 VEBA-phylogeny_env
 VEBA-preprocess_env
+VEBA-profile_env
 ```
 All the environments should have the `VEBA_DATABASE` environment variable set. If not, then add it manually to ~/.bash_profile: `export VEBA_DATABASE=/path/to/veba_database`.
 

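Given the `243 GB` total, it can help to confirm free space before starting the download. A minimal pre-flight sketch (hypothetical, not part of VEBA) that uses POSIX `df -P -k`, whose fourth column is the available space in 1024-byte blocks. Here "GB" is treated as 1024^3 bytes, and because archives are downloaded and decompressed before being removed, the required figure should be read as a lower bound.

```bash
# Hypothetical pre-flight check (not part of VEBA): is there room for the database?
# $1 = target directory (must exist), $2 = required size in GB (default 243)
has_space_for_database() {
    local target_dir="$1" required_gb="${2:-243}" available_kb
    available_kb=$(df -P -k "$target_dir" | awk 'NR == 2 { print $4 }')
    if [ -z "$available_kb" ]; then
        echo "Could not read free space for ${target_dir}" >&2
        return 2
    fi
    if [ "$available_kb" -ge $(( required_gb * 1024 * 1024 )) ]; then
        echo "OK: $(( available_kb / 1024 / 1024 )) GB available in ${target_dir}"
    else
        echo "Insufficient space: $(( available_kb / 1024 / 1024 )) GB available, ~${required_gb} GB needed" >&2
        return 1
    fi
}
```

Usage: `has_space_for_database "$(dirname "${VEBA_DATABASE}")"` before running `download_databases.sh`.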
diff --git a/install/download_databases.sh b/install/download_databases.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
-# __version__ = "2023.6.20"
-# VEBA_DATABASE_VERSION = "VDB_v5.1"
+# __version__ = "2023.10.23"
+# VEBA_DATABASE_VERSION = "VDB_v5.2"
 # MICROEUKAYROTIC_DATABASE_VERSION = "VDB-Microeukaryotic_v2.1"
 
 # Create database
@@ -110,13 +110,18 @@ rm -rf ${DATABASE_DIRECTORY}/MarkerSets.tar.gz
 
 # KOFAMSCAN
 echo ". .. ... ..... ........ ............."
-echo "vii * Processing KOFAMSCAN profile HMM marker sets"
+echo "vii * Processing KEGG profile HMM marker sets"
 echo ". .. ... ..... ........ ............."
 mkdir -v -p ${DATABASE_DIRECTORY}/Annotate/KOFAM
 wget -v -O - ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz | gzip -d > ${DATABASE_DIRECTORY}/Annotate/KOFAM/ko_list
 wget -v -c ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz -O - |  tar -xz
 mv profiles ${DATABASE_DIRECTORY}/Annotate/KOFAM/
 
+wget -v -O ${DATABASE_DIRECTORY}/MicrobeAnnotator-KEGG.tar.gz https://zenodo.org/records/10020074/files/MicrobeAnnotator-KEGG.tar.gz?download=1
+tar xvzf ${DATABASE_DIRECTORY}/MicrobeAnnotator-KEGG.tar.gz -C ${DATABASE_DIRECTORY}/Annotate --no-xattrs
+rm -rf ${DATABASE_DIRECTORY}/Annotate/._MicrobeAnnotator-KEGG
+rm -rf ${DATABASE_DIRECTORY}/MicrobeAnnotator-KEGG.tar.gz
+
 # Pfam
 echo ". .. ... ..... ........ ............."
 echo "viii * Processing Pfam profile HMM marker sets"
@@ -183,6 +188,12 @@ wget -v -P ${DATABASE_DIRECTORY} http://www.mgc.ac.cn/VFs/Down/VFDB_setA_pro.fas
 diamond makedb --in ${DATABASE_DIRECTORY}/VFDB_setA_pro.fas.gz --db ${DATABASE_DIRECTORY}/Annotate/VFDB/VFDB_setA_pro.dmnd
 rm -rf ${DATABASE_DIRECTORY}/VFDB_setA_pro.fas.gz
 
+# CAZy
+mkdir -v -p ${DATABASE_DIRECTORY}/Annotate/CAZy
+wget -v -P ${DATABASE_DIRECTORY} https://bcb.unl.edu/dbCAN2/download/CAZyDB.07262023.fa
+diamond makedb --in ${DATABASE_DIRECTORY}/CAZyDB.07262023.fa --db ${DATABASE_DIRECTORY}/Annotate/CAZy/CAZyDB.07262023.dmnd
+rm -rf ${DATABASE_DIRECTORY}/CAZyDB.07262023.fa
+
 # Contamination
 echo ". .. ... ..... ........ ............."
 echo "xi * Processing contamination databases"

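The new `MicrobeAnnotator-KEGG` steps download an archive from Zenodo, extract it into `Annotate`, and delete the macOS `._MicrobeAnnotator-KEGG` AppleDouble entry. The helper below is a hypothetical variant (not part of `download_databases.sh`) that stops at the first failed step instead of continuing. It assumes GNU `tar` and a `find` that supports `-maxdepth` (GNU and BSD `find` both do). In the usage lines, the URL is quoted so the shell never treats `?` as a glob character.

```bash
# Hypothetical helper (not in download_databases.sh): extract a .tar.gz into a
# destination, drop top-level macOS "._*" AppleDouble entries, then delete the archive.
install_tarball_into() {
    local tarball="$1" dest="$2"
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    # Only the top level is cleaned, mirroring the original "rm -rf .../._MicrobeAnnotator-KEGG"
    find "$dest" -maxdepth 1 -name '._*' -exec rm -rf {} +
    rm -f "$tarball"
}

# Usage (requires network access):
# url='https://zenodo.org/records/10020074/files/MicrobeAnnotator-KEGG.tar.gz?download=1'
# wget -O "${DATABASE_DIRECTORY}/MicrobeAnnotator-KEGG.tar.gz" "$url" &&
#     install_tarball_into "${DATABASE_DIRECTORY}/MicrobeAnnotator-KEGG.tar.gz" "${DATABASE_DIRECTORY}/Annotate"
```

Chaining with `&&` means a failed download never reaches the extraction step, so a partial archive is not unpacked into the database directory.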
diff --git a/install/uninstall_veba.sh b/install/uninstall_veba.sh
@@ -1,5 +1,5 @@
 #!/bin/bash
-# __version__ = "2023.5.15"
+# __version__ = "2023.10.18"
 
 CONDA_BASE=$(conda run -n base bash -c "echo \${CONDA_PREFIX}")
 
@@ -12,4 +12,4 @@ echo -e " _    _ _______ ______  _______\n  \  /  |______ |_____] |_____|\n   \/
 echo -e "..............................."
 echo -e "     Uninstall Complete     "
 echo -e "..............................."
-echo -e "Don't forget to remove the VEBA database directory."
+echo -e "Don't forget to remove the VEBA database directory if you don't need it anymore.  If you're doing a reinstall, then think twice about this."
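
The uninstall message now asks users to think twice before deleting the database directory. One cautious way to act on that advice is to report the directory's size and require an explicit confirmation before removing it. This is a hypothetical helper, not part of `uninstall_veba.sh`, and it relies on bash's `read -p`.

```bash
# Hypothetical helper (not part of uninstall_veba.sh): shows the database size and
# deletes it only if the user types exactly "yes".
remove_veba_database() {
    local db="${1:-${VEBA_DATABASE:-}}" reply
    if [ -z "$db" ] || [ ! -d "$db" ]; then
        echo "No database directory found at '${db}'" >&2
        return 1
    fi
    du -sh "$db"                                   # VDB_v5.2 is roughly 243 GB
    read -r -p "Permanently delete ${db}? Type 'yes' to confirm: " reply
    if [ "$reply" = "yes" ]; then
        rm -rf -- "$db"
    else
        echo "Kept ${db}"
    fi
}
```

Answering anything other than `yes` keeps the directory, which is the safer default when a reinstall is planned.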