AMALGKIT (/əm`ælgkit/) is a toolkit that integrates RNA-seq data from the NCBI SRA database and from private fastq files to generate unbiased cross-species transcript abundance datasets for large-scale evolutionary gene expression analysis.
```shell
# Installation with pip
pip install git+https://github.com/kfuku52/amalgkit

# This should show complete options
amalgkit -h
```
See Wiki for details.
- amalgkit metadata: NCBI SRA metadata retrieval
- amalgkit integrate: Appending local fastq info to a metadata table
- amalgkit config: Creating a series of config files for the metadata selection
- amalgkit select: Selecting SRA entries for analysis
- amalgkit getfastq: Generating fastq files
- amalgkit quant: Transcript abundance estimation
- amalgkit merge: Generating transcript abundance tables
- amalgkit cstmm: Cross-species TMM normalization using single-copy genes
- amalgkit curate: Automatic removal of outlier samples and unwanted biases
- amalgkit csca: Generating plots with cross-species correlation analysis
- amalgkit sanity: Checking the integrity of AMALGKIT input and output files
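
To explore the subcommands listed above, a small shell loop can print the help text for each one. This is only a sketch: the subcommand names are copied from the list, but the helper name `show_subcommand_help` is hypothetical, and it assumes that each subcommand accepts `-h` in the same way as the top-level `amalgkit -h`. If `amalgkit` is not on the `PATH`, only the header lines are printed.

```shell
#!/bin/sh
# Subcommand names copied from the list above, in the order listed.
AMALGKIT_SUBCOMMANDS="metadata integrate config select getfastq quant merge cstmm curate csca sanity"

# Hypothetical helper: print a header for each subcommand and, when amalgkit
# is installed, that subcommand's help text.
show_subcommand_help() {
    for sub in $AMALGKIT_SUBCOMMANDS; do
        echo "== amalgkit ${sub} -h =="
        if command -v amalgkit >/dev/null 2>&1; then
            # Assumption: subcommands accept -h like the top-level command does.
            amalgkit "${sub}" -h || echo "(no help available for ${sub})"
        fi
    done
}

show_subcommand_help
```

The Wiki remains the place to look for the options each subcommand actually takes.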
AMALGKIT includes functions that have not yet been published, but several of its components, including metadata curation, expression level quantification, and further curation steps, are described in the following paper, in which we reported the transcriptome amalgamation of 21 vertebrate species.
Fukushima K*, Pollock DD*. 2020. Amalgamated cross-species transcriptomes reveal organ-specific propensity in gene expression evolution. Nature Communications 11: 4459 (DOI: 10.1038/s41467-020-18090-8) open access
amalgkit is BSD-licensed (3 clause). See LICENSE for details.