Thank you for testing and reporting back!
You are right about exons being reported as separate SAM alignments. This was an intentional approach for the proof of concept. However, we are working on implementing a solution which would produce alignments similar to STAR, but it might take some time to finalize this.
Regarding the missing primary alignments - that should definitely not happen. Could you by any chance share a sample of your data which reproduces this issue, so I can take a look?
I ran the following command:

/home/aechchik/graphmap/bin/graphmap-not_release align -x rnaseq -t 16 -r reference-genome.fa -d reference-transcripts.fasta -o graphmap_ref.sam

where:
reference-genome.fa is a concatenation of the chromosome FASTA files at ftp://ftp.ensembl.org/pub/release-89/fasta/drosophila_melanogaster/dna/ using: cat Drosophila_melanogaster.BDGP6.dna.chromosome.*.fa.gz
reference-transcripts.fasta contains the transcript sequences extracted from the reference annotation at ftp://ftp.ensembl.org/pub/release-89/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.89.gtf.gz with the gffread utility, using the command: gffread -w reference_transcripts.fasta -g reference_genome.fa reference_transcripts.gtf
You should be able to reproduce exactly what I got.
(I edited the comment because, for some reason, the links to the Ensembl FTP site were not displayed.)
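On its own, cat on the .fa.gz files leaves the output gzip-compressed. If an uncompressed reference-genome.fa is wanted, the files also have to be decompressed as they are concatenated. Below is a minimal Python sketch of that step, assuming the chromosome files from the Ensembl release-89 directory are already downloaded into the working directory. The function name concat_gz_fasta is hypothetical and not part of GraphMap or gffread.

```python
import glob
import gzip


def concat_gz_fasta(pattern: str, out_path: str) -> int:
    """Decompress every gzipped FASTA matching `pattern` (sorted by path)
    and write them, one after another, to the plain-text file `out_path`.
    Returns the number of input files merged."""
    paths = sorted(glob.glob(pattern))
    with open(out_path, "wb") as out:
        for path in paths:
            last = b"\n"
            with gzip.open(path, "rb") as src:
                # Stream in 1 MiB chunks so a whole chromosome never sits in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            # Make sure the next file's FASTA header starts on a fresh line.
            if last != b"\n":
                out.write(b"\n")
    return len(paths)
```

Calling concat_gz_fasta("Drosophila_melanogaster.BDGP6.dna.chromosome.*.fa.gz", "reference-genome.fa") would write the merged genome. The sketch appends a newline whenever a file does not end with one, so records from adjacent files never run together.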
Hi Ivan,
We tested GraphMap for aligning RNA-seq reads to the reference genome. For benchmarking purposes, we tried to align the reference transcripts (in FASTA format) to the reference genome, both from Ensembl.
I tried to install and run GraphMap as described in the docs for spliced alignments. I compiled the rna-alpha branch, as suggested, and ran this command:

/home/aechchik/graphmap/bin/graphmap-not_release align -x rnaseq -t 16 -r reference-genome.fa -d reference-transcripts.fasta -o graphmap_ref.sam
The main problem is that the local alignments of a transcript's sub-features are not reconciled: if a transcript has two exons, both are reported, but their local alignments are not merged into a single spliced alignment; instead, they are reported separately as multiple alignments. For example, transcript FBtr0344900 has two annotated exons:

In the GraphMap alignment file, it appears twice as a local alignment: once as a mapping on the forward strand and once as a secondary mapping on the forward strand (FLAG field, $2):
What we would rather see instead is something like the output of STAR:
Am I doing something wrong, or is this feature simply not implemented yet?
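To illustrate what merging the per-exon records into one spliced record could look like, here is a minimal Python sketch. It is not GraphMap's implementation. It assumes two local hits of the same read on the same reference and strand, with hit a leftmost on the reference, soft clips only, and no overlap between the hits. The intron is encoded with the SAM N (skipped reference region) CIGAR operation, which is how spliced aligners such as STAR report introns. The names LocalHit and merge_spliced are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

Op = Tuple[int, str]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class LocalHit:
    ref_start: int  # SAM POS: 1-based leftmost reference position
    cigar: str      # SAM CIGAR of this local (per-exon) alignment


def parse_cigar(cigar: str) -> List[Op]:
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def strip_soft_clips(ops: List[Op]) -> Tuple[int, List[Op], int]:
    """Split ops into (leading soft clip, aligned core, trailing soft clip)."""
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    core = [o for o in ops if o[1] != "S"]
    return lead, core, trail


def join_ops(ops: List[Op]) -> str:
    """Render ops as a CIGAR string, merging adjacent identical operations."""
    out: List[Op] = []
    for n, op in ops:
        if out and out[-1][1] == op:
            out[-1] = (out[-1][0] + n, op)
        else:
            out.append((n, op))
    return "".join(f"{n}{op}" for n, op in out)


def merge_spliced(a: LocalHit, b: LocalHit) -> LocalHit:
    """Merge two local hits of one read into a single spliced alignment."""
    lead_a, core_a, _ = strip_soft_clips(parse_cigar(a.cigar))
    lead_b, core_b, trail_b = strip_soft_clips(parse_cigar(b.cigar))
    # Query offset just past hit a (bases consumed by M, I, =, X).
    q_end_a = lead_a + sum(n for n, op in core_a if op in "MI=X")
    # First reference position after hit a (bases consumed by M, D, N, =, X).
    r_end_a = a.ref_start + sum(n for n, op in core_a if op in "MDN=X")
    query_gap = lead_b - q_end_a    # read bases aligned by neither hit
    intron = b.ref_start - r_end_a  # reference bases skipped between the hits
    if query_gap < 0 or intron < 0:
        raise ValueError("overlapping hits are not handled in this sketch")
    ops: List[Op] = [(lead_a, "S")] if lead_a else []
    ops += core_a
    if query_gap:
        ops.append((query_gap, "I"))
    if intron:
        ops.append((intron, "N"))
    ops += core_b
    if trail_b:
        ops.append((trail_b, "S"))
    return LocalHit(a.ref_start, join_ops(ops))
```

Consider a hypothetical 300 bp transcript whose first 200 bases align at POS 1001 (200M100S) and whose last 100 bases align at POS 2001 (200S100M). For this input, merge_spliced returns a single record at POS 1001 with CIGAR 200M800N100M.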
Another strange behaviour in the output alignment file is that some read IDs appear only once, as a secondary alignment, with no other record of the same read ID carrying the primary alignment flag. For example,

cat graphmap_ref.sam | grep -v '^@' | awk '$2==256 {print $0}' | head -1 | cut -f1

outputs FBtr0300689. If we grep this ID in the mapping file, only one hit is returned:

In my opinion, this read ID should have an alignment flag of 0 instead.
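The awk filter above matches only records whose FLAG is exactly 256, so it misses secondary alignments on the reverse strand (FLAG 272 = 256 + 16). Below is a minimal Python sketch, assuming an uncompressed SAM file, that tests the 0x100 (secondary) and 0x800 (supplementary) bits instead. It lists every read name that has secondary records but no primary record. The function name orphan_secondaries is hypothetical.

```python
from typing import Iterable, List

SECONDARY = 0x100      # FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # FLAG bit: supplementary (chimeric) alignment


def orphan_secondaries(sam_lines: Iterable[str]) -> List[str]:
    """Return read names that have secondary records but no primary record.

    A record counts as primary when neither the 0x100 nor the 0x800 bit is set,
    so reverse-strand secondaries (e.g. FLAG 272) are caught as well as FLAG 256.
    """
    primary, secondary = set(), set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        qname, flag_str = line.split("\t", 2)[:2]
        flag = int(flag_str)
        if flag & SECONDARY:
            secondary.add(qname)
        elif not (flag & SUPPLEMENTARY):
            primary.add(qname)
    return sorted(secondary - primary)
```

For example, running `with open("graphmap_ref.sam") as fh: print(orphan_secondaries(fh))` would list every affected read ID, not just the first one that head -1 shows.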
I compiled the rna-alpha branch on CentOS 6.9:

And graphmap-not_release is the only executable in the execution folder:

Cheers,
Amina