Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rsem-calculate-expression #93

Merged
merged 25 commits into from
Sep 18, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
38f586b
initial commit dedup
emmarousseau Apr 11, 2024
271108c
Merge branch 'viash-hub:main' into main
emmarousseau Apr 11, 2024
2c26968
Revert "initial commit dedup"
emmarousseau Apr 11, 2024
5ea8c78
Merge branch 'viash-hub:main' into main
emmarousseau Apr 13, 2024
897cd89
Merge branch 'viash-hub:main' into main
emmarousseau May 5, 2024
ea0383c
Merge branch 'viash-hub:main' into main
emmarousseau May 23, 2024
44b3fcc
Merge branch 'viash-hub:main' into main
emmarousseau Jul 1, 2024
6cc4f94
Merge branch 'viash-hub:main' into main
emmarousseau Jul 2, 2024
c9613d1
Merge branch 'viash-hub:main' into main
emmarousseau Jul 8, 2024
1679c59
Merge branch 'viash-hub:main' into main
emmarousseau Jul 9, 2024
dc275da
three rsem components initial commit
emmarousseau Jul 18, 2024
9745e4a
update container setup
emmarousseau Jul 23, 2024
9f9602a
Simplified container configuration
emmarousseau Jul 24, 2024
84fe94d
temporarily remove version recording from config
emmarousseau Jul 24, 2024
7a8772f
Complete config file
emmarousseau Aug 11, 2024
aef65a1
add tests and complete config file
emmarousseau Aug 12, 2024
2918fb4
change test dataset
emmarousseau Aug 20, 2024
c584b93
functional test, adjustements to scripts
emmarousseau Aug 21, 2024
e055c97
Update changelog
emmarousseau Aug 21, 2024
d23ee7b
Simplified test data and help.txt contents
emmarousseau Aug 25, 2024
bd1075e
suggested changes, typos
emmarousseau Sep 5, 2024
de23085
simplify, get rid of test_data folder
emmarousseau Sep 18, 2024
601518c
Update CHANGELOG.md
rcannood Sep 18, 2024
cf46beb
Merge branch 'main' into rsem
rcannood Sep 18, 2024
41598f0
Update CHANGELOG.md
rcannood Sep 18, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1,002 changes: 1,002 additions & 0 deletions src/rsem/rsem_calculate_expression/help.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be empty

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion src/rsem/rsem_calculate_expression/script.sh
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ rsem-calculate-expression \
${par_counts_gene:+--counts-gene "$par_counts_gene"} \
${par_counts_transcripts:+--counts-transcripts "$par_counts_transcripts"} \
${par_stat:+--stat "$par_stat"} \
${par_bam_star:+--bam-star "\$par_bam_star"} \
${par_bam_star:+--bam-star "$par_bam_star"} \
${par_bam_genome:+--bam-genome "$par_bam_genome"} \
${par_bam_transcript:+--bam-transcript "$par_bam_transcript"} \
${par_fai:+--fai "$par_fai"} \
Expand Down
97 changes: 77 additions & 20 deletions src/rsem/rsem_calculate_expression/test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,34 +4,91 @@ echo ">>> Testing $meta_executable"

test_dir="${meta_resources_dir}/test_data"

wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/rsem.tar.gz
gunzip -k rsem.tar.gz
tar -xf rsem.tar
# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/rsem.tar.gz
# gunzip -k rsem.tar.gz
# tar -xf rsem.tar
# mv $test_dir/rsem $meta_resources_dir

mv $test_dir/rsem $meta_resources_dir
echo "> Prepare test data"

cat > reads_R1.fastq <<'EOF'
@SEQ_ID1
ACGCTGCCTCATAAGCCTCACACAT
+
IIIIIIIIIIIIIIIIIIIIIIIII
@SEQ_ID2
ACCCGCAAGATTAGGCTCCGTACAC
+
!!!!!!!!!!!!!!!!!!!!!!!!!
EOF

cat > reads_R2.fastq <<'EOF'
@SEQ_ID1
ATGTGTGAGGCTTATGAGGCAGCGT
+
IIIIIIIIIIIIIIIIIIIIIIIII
@SEQ_ID2
GTGTACGGAGCCTAATCTTGCAGGG
+
!!!!!!!!!!!!!!!!!!!!!!!!!
EOF

cat > genome.fasta <<'EOF'
>chr1
TGGCATGAGCCAACGAACGCTGCCTCATAAGCCTCACACATCCGCGCCTATGTTGTGACTCTCTGTGAGCGTTCGTGGG
GCTCGTCACCACTATGGTTGGCCGGTTAGTAGTGTGACTCCTGGTTTTCTGGAGCTTCTTTAAACCGTAGTCCAGTCAA
TGCGAATGGCACTTCACGACGGACTGTCCTTAGGTGTGAGGCTTATGAGGCACTCAGGGGA
EOF

cat > genes.gtf <<'EOF'
chr1 example_source gene 0 50 . + . gene_id "gene1"; transcript_id "transcript1";
chr1 example_source exon 20 40 . + . gene_id "gene1"; transcript_id "transcript1";
chr1 example_source gene 100 219 . + . gene_id "gene2"; transcript_id "transcript2";
chr1 example_source exon 191 210 . + . gene_id "gene2"; transcript_id "transcript2";
EOF


echo "> Generate index"

rsem-prepare-reference \
--gtf "genes.gtf" \
"genome.fasta" \
"index"\
emmarousseau marked this conversation as resolved.
Show resolved Hide resolved

mkdir index
mv index.* index/

STAR \
${meta_cpus:+--runThreadN $meta_cpus} \
--runMode genomeGenerate \
--genomeDir "index/" \
--genomeFastaFiles "genome.fasta" \
--sjdbGTFfile "genes.gtf" \
--genomeSAindexNbases 2

#########################################################################################

echo ">>> Test 1: Paired-end reads using STAR to align reads"
"$meta_executable" \
--star \
--star_gzipped_read_file \
--paired \
--input "$test_dir/SRR6357070_1.fastq.gz;$test_dir/SRR6357070_2.fastq.gz" \
--index rsem \
--id WT_REP1 \
--seed 1 \
--quiet
--star \
--paired \
--input "reads_R1.fastq;reads_R2.fastq" \
--index index \
--id test \
--seed 1 \
--quiet

echo ">>> Checking whether output exists"
[ ! -f "WT_REP1.genes.results" ] && echo "Gene level expression counts file does not exist!" && exit 1
[ ! -s "WT_REP1.genes.results" ] && echo "Gene level expression counts file is empty!" && exit 1
[ ! -f "WT_REP1.isoforms.results" ] && echo "Transcript level expression counts file does not exist!" && exit 1
[ ! -s "WT_REP1.isoforms.results" ] && echo "Transcript level expression counts file is empty!" && exit 1
[ ! -d "WT_REP1.stat" ] && echo "Stats file does not exist!" && exit 1
[ ! -f "test.genes.results" ] && echo "Gene level expression counts file does not exist!" && exit 1
[ ! -s "test.genes.results" ] && echo "Gene level expression counts file is empty!" && exit 1
[ ! -f "test.isoforms.results" ] && echo "Transcript level expression counts file does not exist!" && exit 1
[ ! -s "test.isoforms.results" ] && echo "Transcript level expression counts file is empty!" && exit 1
[ ! -d "test.stat" ] && echo "Stats file does not exist!" && exit 1

echo ">>> Check wheter output is correct"
diff $test_dir/ref.genes.results WT_REP1.genes.results || { echo "Gene level expression counts file is incorrect!"; exit 1; }
diff $test_dir/ref.isoforms.results WT_REP1.isoforms.results || { echo "Transcript level expression counts file is incorrect!"; exit 1; }
diff $test_dir/ref.cnt WT_REP1.stat/WT_REP1.cnt || { echo "Stats file is incorrect!"; exit 1; }
diff $test_dir/output/ref.genes.results test.genes.results || { echo "Gene level expression counts file is incorrect!"; exit 1; }
diff $test_dir/output/ref.isoforms.results test.isoforms.results || { echo "Transcript level expression counts file is incorrect!"; exit 1; }
diff $test_dir/output/ref.cnt test.stat/test.cnt || { echo "Stats file is incorrect!"; exit 1; }

#####################################################################################################

Expand Down
Binary file not shown.
Binary file not shown.
5 changes: 5 additions & 0 deletions src/rsem/rsem_calculate_expression/test_data/output/ref.cnt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the fastq files are inside test.sh, perhaps store the expected output in test.sh as well? ☺️

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep makes sense haha

Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
1 0 0 1
0 0 0
0 3
0 1
Inf 0
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
gene_id transcript_id(s) length effective_length expected_count TPM FPKM
gene1 transcript1 21.00 21.00 0.00 0.00 0.00
gene2 transcript2 20.00 20.00 0.00 0.00 0.00
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct
transcript1 gene1 21 21.00 0.00 0.00 0.00 0.00
transcript2 gene2 20 20.00 0.00 0.00 0.00 0.00
7 changes: 0 additions & 7 deletions src/rsem/rsem_calculate_expression/test_data/ref.cnt

This file was deleted.

126 changes: 0 additions & 126 deletions src/rsem/rsem_calculate_expression/test_data/ref.genes.results

This file was deleted.

126 changes: 0 additions & 126 deletions src/rsem/rsem_calculate_expression/test_data/ref.isoforms.results

This file was deleted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this is indeed too big 😅

In other components, I just store the fasta and gtf in the test data and build the index inside the unit test. Examples:

This way, we don't need to store a large indexed reference. Could you get this to work here as well?

Binary file not shown.