The MaizeGDB Python 3 version of James Schnable's RNA-seq processing pipeline and modular web interface.
For examples of MaizeGDB's qTeller web interfaces, visit https://qteller.maizegdb.org/
qTeller data types | Description |
---|---|
Genes in an Interval | Select a chromosome coordinate interval for a given genome to retrieve RNA/protein abundances. |
Genes by Name | Paste a list of gene models of interest to retrieve their RNA/protein abundances. |
Visualize Expression | Visualize RNA/protein abundances for a single gene, or compare abundances for two genes. |
Data types are optimized for single-genome RNA-seq data, single-genome RNA-seq and protein abundance data, or multi-genome RNA-seq data.
Directory Name | Description |
---|---|
build_db | Scripts for constructing the SQLite DB. |
web_interface | Public-facing files served by the Apache server. |
qteller_python2.7 | MaizeGDB Python 2.7 instance. |
- PHP: No additional libraries are required.
- Python 3: See python3_requirements.txt and the Python modules for a list of dependencies.
- Apache: No special customization is needed.
See an example of installing Apache on CentOS 8.
CentOS 8 comes with Python 3, which includes pip.
To install additional libraries:
$ pip install -r python3_requirements.txt
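After pip finishes, you may want to confirm that every package listed in python3_requirements.txt is visible to the interpreter that will run qTeller's scripts. The helper below is a hypothetical sketch, not part of qTeller: it assumes the requirements file uses ordinary name-based pip lines (for example `name==1.0`), skips comments and pip options such as `-r`, and only checks that each distribution is installed, not which version.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Distribution name at the start of a requirement line, e.g. "numpy" in "numpy>=1.20".
NAME_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def missing_requirements(lines):
    """Return requirement names that are not installed in the current interpreter."""
    missing = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options (-r, -e, ...)
            continue
        match = NAME_RE.match(line)
        if match is None:
            continue
        name = match.group(1)
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, opening python3_requirements.txt and passing the file object to `missing_requirements` should return an empty list once the install has succeeded.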
Upon successful installation of Python, PHP, and Apache, you can git clone this project into your Apache directory. The public-facing files are located in the web_interface directory. Assuming a default Apache installation, the DocumentRoot in httpd.conf would look like this:
DocumentRoot "/var/www/html/qTeller/web_interface"
See Adding new data for the final steps of generating the DB.
(1) The chromosome ID drop-down menus must be edited manually in index_singlegenome.php, index_multigenome.php, and Protein_index.php so that they list the chromosome IDs of your target genome(s). For instance, maize has ten chromosomes named chr1, chr2, and so on; sorghum also has ten chromosomes, but they are named Chr01, Chr02, and so on.
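Rather than typing each entry by hand, the drop-down entries can be generated and pasted into the PHP files. This is a hypothetical sketch: it assumes each chromosome appears as a plain `<option value="...">...</option>` element, so compare its output with the markup already in the index*.php files before pasting. The `width` parameter covers the two naming styles mentioned above (unpadded chr1 versus zero-padded Chr01).

```python
def chromosome_options(prefix, count, width=1):
    """Build <option> lines for chromosomes prefix+1 .. prefix+count.

    width controls zero-padding: width=1 gives chr1 ... chr10,
    width=2 gives Chr01 ... Chr10.
    """
    lines = []
    for number in range(1, count + 1):
        chrom_id = f"{prefix}{number:0{width}d}"
        lines.append(f'<option value="{chrom_id}">{chrom_id}</option>')
    return lines


# Maize-style IDs: chr1 ... chr10
print("\n".join(chromosome_options("chr", 10)))
# Sorghum-style IDs: Chr01 ... Chr10
print("\n".join(chromosome_options("Chr", 10, width=2)))
```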
(2) For index_multigenome.php only, the Genome Version drop-down menu must also be edited to list the genomes in the multi-genome bed file. The value attribute of each <option> in this menu must match a genome ID from column 5 of the bed file. To see more in-depth examples of file formatting, click here.
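Because each Genome Version option must match a genome ID from column 5 of the bed file, the IDs can be read straight from that file. A minimal sketch, assuming a tab-delimited bed file with the genome ID in the fifth column and optional `#`, `track`, or `browser` header lines; the option label simply repeats the ID, which you may want to replace with a friendlier name.

```python
def genome_ids_from_bed(lines):
    """Return the distinct genome IDs in column 5 of a bed file, in first-seen order."""
    ids = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and bed header lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5:
            continue  # too few columns to contain a genome ID
        genome_id = fields[4].strip()
        if genome_id and genome_id not in ids:
            ids.append(genome_id)
    return ids


def genome_options(genome_ids):
    """Format one <option> element per genome ID for the Genome Version menu."""
    return [f'<option value="{gid}">{gid}</option>' for gid in genome_ids]
```

For example, open test_multigenome_NAM_merged_IDs.bed from build_db, pass the file object to `genome_ids_from_bed`, and print the result of `genome_options` to get one line per genome.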
The qTeller database generation script requires the following three inputs:
- (1) RNA-seq and/or protein abundance files
  - If it doesn't exist already, create the build_db/abundance directory (you can give the abundance directory any name you want):
    $ mkdir build_db/abundance
  - Place your abundance files in the build_db/abundance directory. Cufflinks output files must keep the .fpkm_tracking extension; files that contain only the gene model ID (column 1) and the RNA-seq or protein abundance value (column 2) must use the .txt extension.
- (2) GFF or bed file
- (3) CSV file
  - Create a metadata file in CSV format so the script knows how to interpret the abundance files. Here is an example.
  - NOTE: The File_handle column specifies the name of each abundance file to load, without the .fpkm_tracking or .txt extension.
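Before running the build script, it can save time to confirm that every File_handle in the metadata CSV has a matching abundance file. The helpers below are hypothetical (not part of build_db); they assume the metadata CSV has a header row with a column named exactly File_handle, and they accept the .fpkm_tracking and .txt extensions described above.

```python
import csv
from pathlib import Path

EXTENSIONS = (".fpkm_tracking", ".txt")


def read_file_handles(lines):
    """Return the File_handle values from metadata CSV lines (header row required)."""
    handles = []
    for row in csv.DictReader(lines):
        name = (row.get("File_handle") or "").strip()
        if name:
            handles.append(name)
    return handles


def unmatched_handles(handles, filenames):
    """Return handles with no matching <handle>.fpkm_tracking or <handle>.txt filename."""
    available = set(filenames)
    return [h for h in handles if not any(h + ext in available for ext in EXTENSIONS)]


def check_abundance_dir(metadata_csv, abundance_dir):
    """Compare the metadata CSV's File_handle values with the files in abundance_dir."""
    with open(metadata_csv, newline="") as handle:
        handles = read_file_handles(handle)
    filenames = [p.name for p in Path(abundance_dir).iterdir() if p.is_file()]
    return unmatched_handles(handles, filenames)
```

Calling `check_abundance_dir` with your metadata CSV and build_db/abundance returns an empty list when every handle has a file; any names it returns point to a typo or a missing file.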
Assuming you have the required files, you can create the SQLite DB for RNA and/or protein abundance data using the following command:
$ cd build_db
$ python multigenome_build_qt_db.py <METADATA.CSV> --bed_file <BED.bed> --info_dir ./<ABUNDANCE> --dbname userdb # creates userdb
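where <METADATA.CSV> is the CSV file (3), <BED.bed> is the bed file (2), and <ABUNDANCE> is the directory where the abundance files are kept (1), as described above. This creates a SQLite file named userdb. The single-genome examples below use build_qt_db_gene_protein.py with --gff_file instead of --bed_file.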
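If you rebuild the DB often, the command can be scripted. The sketch below is a hypothetical wrapper that only mirrors the example commands in this section: a bed file goes to multigenome_build_qt_db.py with --bed_file, a GFF file goes to build_qt_db_gene_protein.py with --gff_file, and both take --info_dir and --dbname. It does not know about any other options the scripts may accept.

```python
import shlex
import subprocess
import sys


def build_command(metadata_csv, info_dir, dbname, bed_file=None, gff_file=None):
    """Assemble a build_db command for a bed (multi-genome) or GFF (single-genome) input."""
    if (bed_file is None) == (gff_file is None):
        raise ValueError("pass exactly one of bed_file or gff_file")
    if bed_file is not None:
        script, flag, annotation = "multigenome_build_qt_db.py", "--bed_file", bed_file
    else:
        script, flag, annotation = "build_qt_db_gene_protein.py", "--gff_file", gff_file
    return [sys.executable, script, metadata_csv, flag, annotation,
            "--info_dir", info_dir, "--dbname", dbname]


def run_build(command, build_dir="build_db"):
    """Print the command in shell form, then run it from inside build_dir."""
    print(shlex.join(command))
    subprocess.run(command, cwd=build_dir, check=True)
```

Passing the list to subprocess (rather than a single shell string) avoids quoting problems with file names; `check=True` makes a failed build raise instead of silently leaving no DB.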
To build the SQLite DB for single-genome data (with no protein abundances) from the included test data, download and uncompress this gff3 file, move it into build_db, and then run:
$ cd build_db
$ python build_qt_db_gene_protein.py test_singlegenome_metadata.csv --gff_file Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3 --info_dir ./test_singlegenome_fpkm --dbname singledb # creates singledb
To build the SQLite DB for single-genome data with both RNA and protein abundances from the included test data, download and uncompress this gff3 file, move it into build_db, and then run:
$ cd build_db
$ python build_qt_db_gene_protein.py test_singlegenome_metadata.csv --gff_file Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3 --info_dir ./test_protein_abundance --dbname proteindb # creates proteindb
To create the SQLite DB for multi-genome data from the included test data, run this command:
$ cd build_db
$ python multigenome_build_qt_db.py test_multigenome_metadata.csv --bed_file test_multigenome_NAM_merged_IDs.bed --info_dir ./test_multigenome_fpkm --dbname multidb # creates multidb
To see more in-depth examples of file formatting, click here.
- Fun fact: you can use the SQLite Viewer to easily look inside the DB and experiment with queries.
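Python's standard-library sqlite3 module offers a similar first look without a separate viewer. This sketch makes no assumption about qTeller's schema: it reads table names from sqlite_master and counts the rows in each. `open_read_only` uses an SQLite URI with `mode=ro`, so a mistyped path raises an error instead of creating a new, empty DB.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path):
    """Open an existing SQLite file read-only (fails instead of creating an empty DB)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def table_summary(conn):
    """Return {table_name: row_count} for every table in the connected database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    summary = {}
    for name in names:
        quoted = '"' + name.replace('"', '""') + '"'  # quote as an SQL identifier
        summary[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return summary
```

For example, `table_summary(open_read_only("singledb"))` run from build_db lists the tables the build script created; from there you can try your own queries with `conn.execute`.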
Finally, move the generated singledb, proteindb, or multidb file into the web_interface directory (the example below moves singledb; substitute the name of the DB you built):
$ mv singledb ../web_interface/
You should now be able to access qTeller through your browser.
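If the interface loads but shows no data, one quick check is whether the file you moved is really a SQLite database. Every SQLite 3 database file begins with the 16-byte string "SQLite format 3" followed by a NUL byte, so the sketch below compares the first bytes of each candidate file. The loop assumes it is run from the repository root and only looks for the three DB names used in this section.

```python
from pathlib import Path

# Every SQLite 3 database file begins with this 16-byte header string.
SQLITE_HEADER = b"SQLite format 3\x00"


def is_sqlite_db(path):
    """Return True if path is a file whose first 16 bytes are the SQLite header."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(len(SQLITE_HEADER)) == SQLITE_HEADER


# Assumes the current directory is the repository root.
for name in ("singledb", "proteindb", "multidb"):
    candidate = Path("web_interface") / name
    if candidate.exists():
        print(candidate, "OK" if is_sqlite_db(candidate) else "is not a SQLite DB")
```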