This is a tool to create gene-level summaries of transcript expression estimates. You provide the tool with an annotation file (in GTF/GFF format) and a set of transcript-level estimates, and it aggregates these expression estimates to the gene level.
Building genesum requires CMake and a C++11-compatible compiler. The build process is fairly simple. Check out the repository or download the source tarball and decompress it. In the top-level directory, create a subdirectory to perform the build, e.g.:
[path/to/genesum]$ mkdir build && cd build
then invoke cmake and make:
[path/to/genesum/build]$ cmake .. && make && make install
The "install" command installs genesum locally to a bin directory under the top-level directory, so you won't need admin privileges to do this. Finally, create some data and you can test it out. You can check the usage with the -h flag.
[path/to/genesum/build]$ cd ..
[path/to/genesum/]$ bin/genesum -h
A usage example is given below. Say you have a file annotations.gtf and a set of expression estimates expressions.sf (e.g. generated by Sailfish). The tool can be invoked as follows:
$ genesum -e expressions.sf -g annotations.gtf -o expressions_genes.sf
This will produce a file, expressions_genes.sf, where the expression estimates from expressions.sf have been aggregated to the gene level according to the transcript-to-gene mapping encoded by annotations.gtf. For simplicity, the length assigned to each gene in the output file is simply the length of the longest transcript present in the input file that mapped to that gene. By default, transcripts are grouped together based on the gene_name field of the GTF file. However, the -k argument supports grouping transcripts based on other fields like gene_id or locus_id.
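The grouping and aggregation described above can be sketched in Python. This is an illustration only, not genesum's actual C++ implementation. It assumes that "aggregated" means summing the expression values of all transcripts mapped to a gene, that unmapped transcripts are skipped, and that the estimates have already been parsed into (transcript, length, expression) tuples. The function names transcript_to_gene and aggregate, and the sample records, are hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_name "ABC";
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def transcript_to_gene(gtf_lines, key="gene_name"):
    """Map each transcript_id to the value of `key` in the GTF attribute column."""
    mapping = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(_ATTR_RE.findall(fields[8]))
        if "transcript_id" in attrs and key in attrs:
            mapping[attrs["transcript_id"]] = attrs[key]
    return mapping


def aggregate(estimates, mapping):
    """Return {gene: (length of longest transcript, summed expression)}."""
    totals = defaultdict(float)
    longest = defaultdict(int)
    for name, length, expr in estimates:
        gene = mapping.get(name)
        if gene is None:
            continue  # assumption: transcripts without a gene are skipped
        totals[gene] += expr
        longest[gene] = max(longest[gene], length)
    return {gene: (longest[gene], totals[gene]) for gene in totals}


# Hypothetical sample data: two transcripts of one gene, one of another.
gtf = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "ABC";',
    'chr1\tsrc\texon\t1\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; gene_name "ABC";',
    'chr2\tsrc\texon\t1\t50\t.\t-\t.\tgene_id "G2"; transcript_id "T3"; gene_name "XYZ";',
]
estimates = [("T1", 100, 1.5), ("T2", 200, 2.5), ("T3", 50, 4.0)]

genes = aggregate(estimates, transcript_to_gene(gtf))
print(genes)  # {'ABC': (200, 4.0), 'XYZ': (50, 4.0)}
```

Passing key="gene_id" to transcript_to_gene mirrors the effect described for the -k argument: the same transcripts are grouped under G1 and G2 instead of ABC and XYZ.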