-name | description | naming_convention | file format | example |
---|---|---|---|---|
-.fastq | raw sequencing reads | nan | nan | sampleID_run_read1.fastq |
-.fastqc | quality control from fastqc | nan | nan | sampleID_run_read1.fastqc |
-.bam | aligned reads | nan | nan | sampleID_run_read1.bam |
-GTF | sequence annotation | nan | nan | one of https://www.gencodegenes.org/ |
-GFF | sequence annotation | nan | nan | one of https://www.gencodegenes.org/ |
-.bed | genome locations | nan | nan | nan |
-.bigwig | genome coverage | nan | nan | nan |
-.fasta | sequence data (nucleotide/aminoacid) | nan | nan | one of https://www.gencodegenes.org/ |
-Multiqc report | QC aggregated report | <assayID\>_YYYYMMDD.multiqc | multiqc | RNA_20200101.multiqc |
-Count matrix | final count matrix | <assayID\>_cm_aligner_YYYYMMDD.tsv | tsv | RNA_cm_salmon_20200101.tsv |
-DEA | differential expression analysis results | DEA_<condition1-condition2\>_LFC<absolute_threshold\>_p<pvalue decimals\>_YYYYMMDD.tsv | tsv | DEA_treat-untreat_LFC1_p01_20200101.tsv |
-DBA | differential binding analysis results | DBA_<condition1-condition2\>_LFC<absolute_threshold\>_p<pvalue decimals\>_YYYYMMDD.tsv | tsv | DBA_treat-untreat_LFC1_p01_20200101.tsv |
-MAplot | MA plot | MAplot_<condition1-condition2\>_YYYYMMDD.jpeg | jpeg | MAplot_treat-untreat_20200101.jpeg |
-Heatmap plot | Heatmap plot of anything | heatmap_<type\>_YYYYMMDD.jpeg | jpeg | heatmap_sampleCor_20200101.jpeg |
-Volcano plot | Volcano plot | volcano_<condition1-condition2\>_YYYYMMDD.jpeg | jpeg | volcano_treat-untreat_20200101.jpeg |
-Venn diagram | Venn diagram | venn_<type\>_YYYYMMDD.jpeg | jpeg | venn_consensus_20200101.jpeg |
-Enrichment table | Enrichment results | nan | tsv | nan |
NGS data strategies
Effective RDM Practices in NGS Analysis
In the data life cycle for Next Generation Sequencing (NGS) data, processing and analysis are critical phases that transform raw sequencing data into meaningful biological insights. Researchers apply computational methods and bioinformatics tools to extract information from the large volumes of sequencing data generated in NGS experiments. We’ll first explore the primary data types generated before and after processing and the importance of detailed documentation. We will then focus on good practices for data analysis and software development.
+File naming convention examples
+name | description | naming_convention | file format | example |
---|---|---|---|---|
+.fastq | raw sequencing reads | nan | nan | sampleID_run_read1.fastq |
+.fastqc | quality control from fastqc | nan | nan | sampleID_run_read1.fastqc |
+.bam | aligned reads | nan | nan | sampleID_run_read1.bam |
+GTF | sequence annotation | nan | nan | one of https://www.gencodegenes.org/ |
+GFF | sequence annotation | nan | nan | one of https://www.gencodegenes.org/ |
+.bed | genome locations | nan | nan | nan |
+.bigwig | genome coverage | nan | nan | nan |
+.fasta | sequence data (nucleotide/aminoacid) | nan | nan | one of https://www.gencodegenes.org/ |
+Multiqc report | QC aggregated report | <assayID\>_YYYYMMDD.multiqc | multiqc | RNA_20200101.multiqc |
+Count matrix | final count matrix | <assayID\>_cm_aligner_YYYYMMDD.tsv | tsv | RNA_cm_salmon_20200101.tsv |
+DEA | differential expression analysis results | DEA_<condition1-condition2\>_LFC<absolute_threshold\>_p<pvalue decimals\>_YYYYMMDD.tsv | tsv | DEA_treat-untreat_LFC1_p01_20200101.tsv |
+DBA | differential binding analysis results | DBA_<condition1-condition2\>_LFC<absolute_threshold\>_p<pvalue decimals\>_YYYYMMDD.tsv | tsv | DBA_treat-untreat_LFC1_p01_20200101.tsv |
+MAplot | MA plot | MAplot_<condition1-condition2\>_YYYYMMDD.jpeg | jpeg | MAplot_treat-untreat_20200101.jpeg |
+Heatmap plot | Heatmap plot of anything | heatmap_<type\>_YYYYMMDD.jpeg | jpeg | heatmap_sampleCor_20200101.jpeg |
+Volcano plot | Volcano plot | volcano_<condition1-condition2\>_YYYYMMDD.jpeg | jpeg | volcano_treat-untreat_20200101.jpeg |
+Venn diagram | Venn diagram | venn_<type\>_YYYYMMDD.jpeg | jpeg | venn_consensus_20200101.jpeg |
+Enrichment table | Enrichment results | nan | tsv | nan |
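The naming patterns in the table above could be generated by small helper functions rather than typed by hand, which reduces the chance of inconsistent names. The following is a minimal Python sketch, not part of the original lesson. The function names `dea_filename` and `count_matrix_filename` are hypothetical, and it assumes that `p<pvalue decimals>` means the digits after the decimal point of the p-value cutoff (so `01` for 0.01), as the `p01` example suggests.

```python
from datetime import date


def dea_filename(cond1: str, cond2: str, lfc: int, pvalue_decimals: str, when: date) -> str:
    """Build a DEA results file name following the DEA pattern in the table.

    pvalue_decimals is passed as a string (e.g. "01") so leading zeros survive.
    Assumption: it holds the digits after the decimal point of the p-value cutoff.
    """
    return f"DEA_{cond1}-{cond2}_LFC{lfc}_p{pvalue_decimals}_{when:%Y%m%d}.tsv"


def count_matrix_filename(assay_id: str, aligner: str, when: date) -> str:
    """Build a count matrix file name: <assayID>_cm_<aligner>_YYYYMMDD.tsv."""
    return f"{assay_id}_cm_{aligner}_{when:%Y%m%d}.tsv"


# Reproduce the two example names given in the table.
print(dea_filename("treat", "untreat", 1, "01", date(2020, 1, 1)))
print(count_matrix_filename("RNA", "salmon", date(2020, 1, 1)))
```

Passing the date as a `date` object and formatting it with `%Y%m%d` keeps the `YYYYMMDD` stamp consistent across files, so names sort chronologically.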
Click below to access a list of the most common file formats used when working with NGS data.
@@ -521,8 +1108,6 @@ 4. Pipelin
Explore more data types at the UCSC webpage. Check out this tutorial for more detailed explanations.
Wrap up
In this lesson, we have taken a look at the vast and diverse landscape of bioinformatics data.
diff --git a/develop/examples/NGS_metadata.html b/develop/examples/NGS_metadata.html index 9a9fdb35..bba83638 100644 --- a/develop/examples/NGS_metadata.html +++ b/develop/examples/NGS_metadata.html @@ -177,7 +177,7 @@