Add Kallisto index (#149)
emmarousseau authored Sep 13, 2024
1 parent 3f6a1b5 commit 80aaf33
Showing 8 changed files with 217 additions and 1 deletion.
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -149,7 +149,11 @@
* `sortmerna`: Local sequence alignment tool for mapping, clustering, and filtering rRNA from metatranscriptomic
data. (PR #146)

* `fq_subsample`: Sample a subset of records from single or paired FASTQ files (PR #147).

* `kallisto`:
- `kallisto_index`: Create a kallisto index (PR #149).


## MINOR CHANGES

Binary file added src/kallisto/kallisto_index/Kallisto
Binary file not shown.
94 changes: 94 additions & 0 deletions src/kallisto/kallisto_index/config.vsh.yaml
@@ -0,0 +1,94 @@
name: kallisto_index
namespace: kallisto
description: |
  Build a Kallisto index for the transcriptome to use Kallisto in the mapping-based mode.
keywords: [kallisto, index]
links:
  homepage: https://pachterlab.github.io/kallisto/about
  documentation: https://pachterlab.github.io/kallisto/manual
  repository: https://github.com/pachterlab/kallisto
  issue_tracker: https://github.com/pachterlab/kallisto/issues
references:
  doi: https://doi.org/10.1038/nbt.3519
license: BSD 2-Clause License

argument_groups:
  - name: "Input"
    arguments:
      - name: "--input"
        type: file
        description: |
          Path to a FASTA-file containing the transcriptome sequences, either in plain text or
          compressed (.gz) format.
        required: true
      - name: "--d_list"
        type: file
        description: |
          Path to a FASTA-file containing sequences to mask from quantification.
  - name: "Output"
    arguments:
      - name: "--index"
        type: file
        direction: output
        example: Kallisto_index

  - name: "Options"
    arguments:
      - name: "--kmer_size"
        type: integer
        description: |
          K-mer length used when building the index; must be odd and at most 31 (default: 31).
        example: 31
      - name: "--make_unique"
        type: boolean_true
        description: |
          Replace repeated target names with unique names.
      - name: "--aa"
        type: boolean_true
        description: |
          Generate index from a FASTA-file containing amino acid sequences.
      - name: "--distinguish"
        type: boolean_true
        description: |
          Generate index where sequences are distinguished by the sequence names.
      - name: "--min_size"
        alternatives: ["-m"]
        type: integer
        description: |
          Length of minimizers (default: automatically chosen).
      - name: "--ec_max_size"
        alternatives: ["-e"]
        type: integer
        description: |
          Maximum number of targets in an equivalence class (default: no maximum).
      - name: "--tmp"
        alternatives: ["-T"]
        type: string
        description: |
          Path to a directory for temporary files.
        example: "tmp"

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: test_data

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: docker
        run: |
          apt-get update && \
          apt-get install -y --no-install-recommends wget && \
          wget --no-check-certificate https://github.com/pachterlab/kallisto/releases/download/v0.50.1/kallisto_linux-v0.50.1.tar.gz && \
          tar -xzf kallisto_linux-v0.50.1.tar.gz && \
          mv kallisto/kallisto /usr/local/bin/
runners:
  - type: executable
  - type: nextflow
21 changes: 21 additions & 0 deletions src/kallisto/kallisto_index/help.txt
@@ -0,0 +1,21 @@
```
kallisto index
```
kallisto 0.50.1
Builds a kallisto index

Usage: kallisto index [arguments] FASTA-files

Required argument:
-i, --index=STRING          Filename for the kallisto index to be constructed

Optional argument:
-k, --kmer-size=INT         k-mer (odd) length (default: 31, max value: 31)
-t, --threads=INT           Number of threads to use (default: 1)
-d, --d-list=STRING         Path to a FASTA-file containing sequences to mask from quantification
    --make-unique           Replace repeated target names with unique names
    --aa                    Generate index from a FASTA-file containing amino acid sequences
    --distinguish           Generate index where sequences are distinguished by the sequence name
-T, --tmp=STRING            Temporary directory (default: tmp)
-m, --min-size=INT          Length of minimizers (default: automatically chosen)
-e, --ec-max-size=INT       Maximum number of targets in an equivalence class (default: no maximum)
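
The help text constrains `--kmer-size` to odd values no larger than 31. A minimal sketch of a pre-flight check for that constraint, assuming a hypothetical helper named `is_valid_kmer_size` (not part of kallisto or of this commit):

```bash
#!/bin/bash

# Hypothetical helper: succeeds only for odd integers between 1 and 31,
# mirroring the "k-mer (odd) length ... max value: 31" line in the help text.
is_valid_kmer_size() {
  local k="$1"
  # Reject anything that is not a plain non-negative integer.
  [[ "$k" =~ ^[0-9]+$ ]] || return 1
  # Force base 10 so values with leading zeros (e.g. "021") are not read as octal.
  k=$((10#$k))
  (( k >= 1 && k <= 31 && k % 2 == 1 ))
}

# Try a valid value, an even value, and a value above the maximum.
for k in 21 30 33; do
  if is_valid_kmer_size "$k"; then
    echo "k=$k: ok"
  else
    echo "k=$k: rejected"
  fi
done
```

Rejecting non-numeric input up front keeps the arithmetic test from failing with a syntax error on values such as `abc`.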
34 changes: 34 additions & 0 deletions src/kallisto/kallisto_index/script.sh
@@ -0,0 +1,34 @@
#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

unset_if_false=( par_make_unique par_aa par_distinguish )

for var in "${unset_if_false[@]}"; do
  temp_var="${!var}"
  [[ "$temp_var" == "false" ]] && unset $var
done

if [ -n "$par_kmer_size" ]; then
  if [[ "$par_kmer_size" -lt 1 || "$par_kmer_size" -gt 31 || $(( par_kmer_size % 2 )) -eq 0 ]]; then
    echo "Error: k-mer size must be an odd number between 1 and 31." >&2
    exit 1
  fi
fi

kallisto index \
  -i "${par_index}" \
  ${par_kmer_size:+--kmer-size "${par_kmer_size}"} \
  ${par_make_unique:+--make-unique} \
  ${par_aa:+--aa} \
  ${par_distinguish:+--distinguish} \
  ${par_min_size:+--min-size "${par_min_size}"} \
  ${par_ec_max_size:+--ec-max-size "${par_ec_max_size}"} \
  ${par_d_list:+--d-list "${par_d_list}"} \
  ${meta_cpus:+--threads "${meta_cpus}"} \
  ${par_tmp:+--tmp "${par_tmp}"} \
  "${par_input}"

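The script relies on two Bash idioms: indirect expansion (`${!var}`) to unset boolean parameters that arrive as the string `false`, and `${var:+...}` to emit a flag only when its variable is set and non-empty. A minimal sketch of both, using a hypothetical `build_flags` function and made-up values in place of the real `par_*` variables and the `kallisto` call:

```bash
#!/bin/bash

# Hypothetical demo function; it only prints the flags that would be passed on.
build_flags() {
  # Stand-ins for the par_* variables that the component would normally receive.
  local par_aa="$1" par_make_unique="$2" par_kmer_size="$3"

  # Unset every boolean whose value is the literal string "false".
  local var
  for var in par_aa par_make_unique; do
    if [[ "${!var}" == "false" ]]; then
      unset "$var"
    fi
  done

  # ${x:+word} expands to word only if x is set and non-empty.
  local flags=(
    ${par_kmer_size:+--kmer-size "${par_kmer_size}"}
    ${par_aa:+--aa}
    ${par_make_unique:+--make-unique}
  )
  echo "${flags[*]}"
}

build_flags true false 21   # prints: --kmer-size 21 --aa
build_flags false false ""  # prints an empty line
```

Leaving the outer expansion unquoted is deliberate: an unset variable then contributes no argument at all, rather than an empty string.
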
35 changes: 35 additions & 0 deletions src/kallisto/kallisto_index/test.sh
@@ -0,0 +1,35 @@
#!/bin/bash

echo ">>> Test 1: Testing $meta_functionality_name with non-default k-mer size"

"$meta_executable" \
  --input "$meta_resources_dir/test_data/transcriptome.fasta" \
  --index Kallisto \
  --kmer_size 21


echo ">>> Checking whether output exists and is correct"
[ ! -f "Kallisto" ] && echo "Kallisto index does not exist!" && exit 1
[ ! -s "Kallisto" ] && echo "Kallisto index is empty!" && exit 1

kallisto inspect Kallisto 2> test.txt
grep "number of k-mers: 989" test.txt || { echo "The content of the index seems to be incorrect." && exit 1; }

################################################################################

echo ">>> Test 2: Testing $meta_functionality_name with d_list argument"

"$meta_executable" \
  --input "$meta_resources_dir/test_data/transcriptome.fasta" \
  --index Kallisto \
  --d_list "$meta_resources_dir/test_data/d_list.fasta"

echo ">>> Checking whether output exists and is correct"
[ ! -f "Kallisto" ] && echo "Kallisto index does not exist!" && exit 1
[ ! -s "Kallisto" ] && echo "Kallisto index is empty!" && exit 1

kallisto inspect Kallisto 2> test.txt
grep "number of k-mers: 959" test.txt || { echo "The content of the index seems to be incorrect." && exit 1; }

echo "All tests succeeded!"
exit 0
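
Both tests repeat the same existence and non-emptiness checks on the index file. A minimal sketch of how those checks could be factored into one function, assuming a hypothetical helper named `assert_file_nonempty` (not part of this commit or of Viash):

```bash
#!/bin/bash

# Hypothetical helper: returns 0 if the path is a regular, non-empty file;
# otherwise prints a message to stderr and returns 1.
assert_file_nonempty() {
  local path="$1"
  if [ ! -f "$path" ]; then
    echo "$path does not exist!" >&2
    return 1
  fi
  if [ ! -s "$path" ]; then
    echo "$path is empty!" >&2
    return 1
  fi
}

# Demonstration on temporary files rather than a real kallisto index.
demo_dir="$(mktemp -d)"
printf 'data' > "$demo_dir/filled"
: > "$demo_dir/empty"

assert_file_nonempty "$demo_dir/filled" && echo "filled: ok"
assert_file_nonempty "$demo_dir/empty" || echo "empty: rejected"
assert_file_nonempty "$demo_dir/missing" || echo "missing: rejected"
rm -rf "$demo_dir"
```

In a test script, a call such as `assert_file_nonempty Kallisto || exit 1` would then stand in for the two separate checks.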
5 changes: 5 additions & 0 deletions src/kallisto/kallisto_index/test_data/d_list.fasta
@@ -0,0 +1,5 @@
>YAL067W-A CDS=1-228
ATGCCAATTATAGGGGTGCCGAGGTGCCTTATAAAACCCTTTTCTGTGCCTGTGACATTTCCTTTTTCGG
TCAAAAAGAATATCCGAATTTTAGATTTGGACCCTCGTACAGAAGCTTATTGTCTAAGCCTGAATTCAGT
CTGCTTTAAACGGCTTCCGCGGAGGAAATATTTCCATCTCTTGAATTCGTACAACATTAAACGTGTGTTG
GGAGTCGTATACTGTTAG
23 changes: 23 additions & 0 deletions src/kallisto/kallisto_index/test_data/transcriptome.fasta
@@ -0,0 +1,23 @@
>YAL069W CDS=1-315
ATGATCGTAAATAACACACACGTGCTTACCCTACCACTTTATACCACCACCACATGCCATACTCACCCTC
ACTTGTATACTGATTTTACGTACGCACACGGATGCTACAGTATATACCATCTCAAACTTACCCTACTCTC
AGATTCCACTTCACTCCATGGCCCATCTCTCACTGAATCAGTACCAAATGCACTCACATCATTATGCACG
GCACTTGCCTCAGCGGTCTATACCCTGTGCCATTTACCCATAACGCCCATCATTATCCACATTTTGATAT
CTATATCTCATTCGGCGGTCCCAAATATTGTATAA
>YAL068W-A CDS=1-255
ATGCACGGCACTTGCCTCAGCGGTCTATACCCTGTGCCATTTACCCATAACGCCCATCATTATCCACATT
TTGATATCTATATCTCATTCGGCGGTCCCAAATATTGTATAACTGCCCTTAATACATACGTTATACCACT
TTTGCACCATATACTTACCACTCCATTTATATACACTTATGTCAATATTACAGAAAAATCCCCACAAAAA
TCACCTAAACATAAAAATATTCTACTTTTCAACAATAATACATAA
>YAL068C CDS=1-363
ATGGTCAAATTAACTTCAATCGCCGCTGGTGTCGCTGCCATCGCTGCTACTGCTTCTGCAACCACCACTC
TAGCTCAATCTGACGAAAGAGTCAACTTGGTGGAATTGGGTGTCTACGTCTCTGATATCAGAGCTCACTT
AGCCCAATACTACATGTTCCAAGCCGCCCACCCAACTGAAACCTACCCAGTCGAAGTTGCTGAAGCCGTT
TTCAACTACGGTGACTTCACCACCATGTTGACCGGTATTGCTCCAGACCAAGTGACCAGAATGATCACCG
GTGTTCCATGGTACTCCAGCAGATTAAAGCCAGCCATCTCCAGTGCTCTATCCAAGGACGGTATCTACAC
TATCGCAAACTAG
>YAL067W-A CDS=1-228
ATGCCAATTATAGGGGTGCCGAGGTGCCTTATAAAACCCTTTTCTGTGCCTGTGACATTTCCTTTTTCGG
TCAAAAAGAATATCCGAATTTTAGATTTGGACCCTCGTACAGAAGCTTATTGTCTAAGCCTGAATTCAGT
CTGCTTTAAACGGCTTCCGCGGAGGAAATATTTCCATCTCTTGAATTCGTACAACATTAAACGTGTGTTG
GGAGTCGTATACTGTTAG
