-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #18 from aryakaul/snakemake
chore: flesh out readme
- Loading branch information
Showing
1 changed file
with
239 additions
and
2 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,239 @@ | ||
# call me mobile | ||
here's my heuristic, call (genetic elements) mobile | ||
# callmemobile – Annotation of putatively mobile bacterial genomic regions | ||
|
||
|
||
Workflow for identification of putatively mobile genetic regions from genome | ||
assemblies and bed files. | ||
callmemobile first runs various prediction tools to classify mobile features | ||
of the input genomes and then compiles these results alongside user options | ||
and a set of genetic regions to identify which subset of these regions are | ||
predicted to be mobile (maybe). | ||
|
||
If you use this in your own work, please cite the papers whose tools the workflow uses: | ||
workflow uses: | ||
- | ||
- | ||
- | ||
|
||
As well as, the paper for which the workflow was written for: [link preprint] | ||
|
||
[![GitHub release](https://img.shields.io/github/release/baymlab/callmemobile.svg)](https://github.com/baymlab/callmemobile/releases/) | ||
[![callmemobile test](https://github.com/baymlab/callmemobile/actions/workflows/main.yaml/badge.svg)](https://github.com/baymlab/callmemobile/actions/) | ||
|
||
|
||
<h2>Contents</h2> | ||
|
||
<!-- vim-markdown-toc GFM --> | ||
|
||
* [1. Introduction](#1-introduction) | ||
* [2. Dependencies](#2-dependencies) | ||
* [2a. Essential dependencies](#2a-essential-dependencies) | ||
* [2b. Protocol-specific dependencies](#2b-protocol-specific-dependencies) | ||
* [3. Installation](#3-installation) | ||
* [4. Usage](#4-usage) | ||
* [4a. Basic example](#4a-basic-example) | ||
* [4b. Adjusting configuration](#4b-adjusting-configuration) | ||
* [4c. List of implemented protocols](#4c-list-of-implemented-protocols) | ||
* [4d. List of workflow commands](#4d-list-of-workflow-commands) | ||
* [4e. Running on a cluster](#4e-running-on-a-cluster) | ||
* [4f. Troubleshooting](#4f-troubleshooting) | ||
* [5. Citation](#5-citation) | ||
* [6. Issues](#6-issues) | ||
* [7. Changelog](#7-changelog) | ||
* [8. License](#8-license) | ||
* [9. Contacts](#9-contacts) | ||
|
||
<!-- vim-markdown-toc --> | ||
|
||
|
||
## 1. Introduction | ||
|
||
The user provides 2 files in the `input/` directory. The first is a file of | ||
files containing each genomic assembly to be analyzed and their respective | ||
paths. This file ends with a `.txt`. The second is a file of files containing | ||
the bed files of genomic regions to be queried. This file ends with a `.beds`. | ||
They shoudl share the same basename (i.e. `nctc3k.txt` and `nctc3k.beds`) and | ||
should have the same number of lines. Order of the inputs is how we match the | ||
bed file to the genomic assembly, so please ensure they are matched correctly. | ||
|
||
|
||
## 2. Dependencies | ||
|
||
|
||
* [Conda](https://docs.conda.io/en/latest/miniconda.html) (unless the use of Conda is switched off in the configuration) and ideally also [Mamba](https://mamba.readthedocs.io/) (>= 0.20.0) | ||
* [GNU Make](https://www.gnu.org/software/make/) | ||
* [Python](https://www.python.org/) (>=3.7) | ||
* [Snakemake](https://snakemake.github.io) (>=8.0.0) | ||
|
||
and can be installed by Conda by | ||
```bash | ||
conda install -c conda-forge -c bioconda -c defaults \ | ||
make "python>=3.7" "snakemake-minimal>=8.0.0" "mamba>=0.20.0" | ||
``` | ||
|
||
|
||
The rest of the dependencies are installed automatically by Snakemake | ||
when they are requested. The specifications of individual environments | ||
can be found in [`workflow/envs/`](workflow/envs/), | ||
and they contain: | ||
- bedops | ||
- phigaro | ||
- Biopython | ||
- blast | ||
- cgecore | ||
- csvtk | ||
- seqkit | ||
- plasmidfinder | ||
- MOB-suite | ||
- mobileelementfinder | ||
- pandas | ||
|
||
All dependencies across all rules can also be | ||
installed at once by `make conda`. | ||
|
||
|
||
## 3. Installation | ||
|
||
Clone and enter the repository by | ||
|
||
```bash | ||
git clone https://github.com/baymlab/callmemobile | ||
cd callmemobile | ||
``` | ||
|
||
Alternatively, the repository can also be installed using cURL by | ||
```bash | ||
mkdir callmemobile | ||
cd callmemobile | ||
curl -L https://github.com/baymlab/callmemobile/tarball/main \ | ||
| tar xvf - --strip-components=1 | ||
``` | ||
|
||
|
||
## 4. Usage | ||
|
||
### 4a. Basic example | ||
|
||
* ***Step 1: Provide lists of input files.*** \ | ||
For every input, create a txt list of input assemblies in the `input/` | ||
directory (i.e., as `input/{batch_name}.txt`. Use either absolute paths (recommended), | ||
or paths relative to the root of the Github repository (not relative to the txt files). | ||
|
||
Such a list can be generated, for instance, by `find` by | ||
```bash | ||
find ~/dir_with_my_genomes -name '*.fa' > input/my_first_batch.txt | ||
``` | ||
The supported input files should be in FASTA format. | ||
|
||
Next, create another txt list of input beds in the `input/` directory. | ||
```bash | ||
find ~/dir_with_my_beds -name '*.bed' > input/my_first_batch.beds | ||
``` | ||
|
||
These should follow the format of typical bed files: | ||
``` | ||
➜ head NCTC10036-assembly_abr-search.abr.bed | ||
ENA|LR134493|LR134493.1_388 406996 410142 adeF | ||
ENA|LR134493|LR134493.1_694 718039 718224 rsmA | ||
ENA|LR134493|LR134493.1_1759 1957864 1958496 CRP | ||
``` | ||
The first column corresponds to the contig, the second column is the position | ||
start coordinate and the third column is the end coordinate. The fourth column | ||
can be any string description of the region (in this case, the gene name). | ||
|
||
* ***Step 2 (optional): Adjust configuration.*** \ | ||
By editing [`config.yaml`](config.yaml) it is possible to specify | ||
parameters of the run, both in terms of options for | ||
individual tools and the heuristics employed to deem | ||
a region as potentially mobile. | ||
|
||
* ***Step 3: Run the pipeline.*** \ | ||
Run the pipeline by running `make all`; this is run | ||
Snakemake with the corresponding parameters. | ||
|
||
* ***Step 4: Retrieve the output files.*** \ | ||
All output files will be located in `output/`. | ||
|
||
|
||
### 4b. Adjusting configuration | ||
|
||
The workflow can be configured via the [`config.yaml`](./config.yaml) file, and | ||
all options are documented directly there. The configurable functionality includes: | ||
|
||
- prophage_maxdist: base pair distance from a predicted prophage to classify it as potentially mobile | ||
- integron_pctolap: percent overlap of given region with an integron to classify it as potentially mobile | ||
- mobileelement_maxdist: distance between two mobile elements of the same type to classify it as potentially mobile | ||
- plasmidfinder_mincov: minimum coverage for plasmidfinder to classify a contig as a possible plasmid | ||
- plasmidfinder_threshold: minimum threshold for plasmidfinder to classify a contig as a possible plasmid | ||
|
||
### 4c. List of workflow commands | ||
|
||
callmemobile is executed via [GNU Make](https://www.gnu.org/software/make/), which handles all parameters and passes them to Snakemake. | ||
Here's a list of all implemented commands (to be executed as `make {command}`): | ||
|
||
|
||
```yaml | ||
###################### | ||
## General commands ## | ||
###################### | ||
all Run everything (the default subcommand) | ||
help Print help messages | ||
conda Create the conda environments | ||
clean Clean all output archives and files with statistics | ||
cleanall Clean everything but Conda, Snakemake, and input files | ||
cleanallall Clean completely everything | ||
############### | ||
## Reporting ## | ||
############### | ||
viewconf View configuration without comments | ||
reports Create html report | ||
#################### | ||
## For developers ## | ||
#################### | ||
test Run the workflow on test data (P1) | ||
bigtest Run the workflow on test data (P1, P2, P3) | ||
format Reformat all source code | ||
checkformat Check source code format | ||
``` | ||
|
||
*Note:* `make format` and `make checkformat` require | ||
[YAPF](https://github.com/google/yapf) and | ||
[Snakefmt](https://github.com/snakemake/snakefmt), which can be installed by | ||
`conda install -c conda-forge -bioconda yapf snakefmt`. | ||
|
||
|
||
### 4d. Troubleshooting | ||
|
||
Tests can be run by `make test`. | ||
|
||
|
||
## 5. Citation | ||
|
||
[todo] | ||
|
||
```bibtex | ||
todo | ||
``` | ||
|
||
|
||
## 6. Issues | ||
|
||
Please use [Github issues](https://github.com/karel-brinda/miniphy/issues). | ||
|
||
|
||
|
||
## 7. Changelog | ||
|
||
See [Releases](https://github.com/karel-brinda/miniphy/releases). | ||
|
||
|
||
|
||
## 8. License | ||
|
||
[MIT](https://github.com/karel-brinda/miniphy/blob/master/LICENSE) | ||
|
||
|
||
|
||
## 9. Contacts | ||
|
||
* [Arya Kaul](https://arya.casa) \<[email protected]\> | ||
|