JSON output file

This page describes the contents of the gzipped JSON file log.json.gz, made when running viridian run_one_sample.

The file contains a dictionary of run details. The main entries (keys) are:

run_summary - has high-level details on the run
stages_completed - the progress of each main stage in the pipeline
reads - high-level summary of read counts
read_depth - genome coverage and read depth information
amplicon_scheme_name - name of the identified amplicon scheme
scheme_choice - details of the amplicon scheme scoring
amplicons - details of the amplicon scheme that was used
self_qc - details of read pileup information at each masked position
sequences - consensus sequence and variations (for MSAs and tree building)
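The file is ordinary gzip-compressed JSON, so the Python standard library can load it. The sketch below is one way to do that. The function name load_viridian_log is hypothetical and is not part of Viridian. The example path OUT/log.json.gz assumes the file is in the output directory (outdir in the run_summary options).

```python
import gzip
import json


def load_viridian_log(path):
    """Load a Viridian log.json.gz file into a Python dict.

    Hypothetical helper for illustration; not part of Viridian itself.
    """
    # gzip.open in text mode ("rt") decompresses and decodes in one step.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

For example, log = load_viridian_log("OUT/log.json.gz") followed by list(log) gives the top-level keys listed above.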

Please read on for more details about the contents of each of those entries.

run_summary

This section is a dictionary with a basic summary of the run. Here is an example (most of the key/value pairs in the options dictionary are omitted for brevity):

"run_summary": {
  "last_stage_completed": "Finished",
  "command": "viridian run_one_sample ... full command line used",
  "options": {
    "debug": false,
    "outdir": "OUT",
    "force": false
  },
  "cwd": "/foo/bar/",
  "version": "1.1.0",
  "finished_running": true,
  "start_time": "2023-09-08T13:37:59+00:00",
  "end_time": "2023-09-08T13:39:28+00:00",
  "hostname": "thehoff",
  "result": "Success",
  "errors": [],
  "temp_processing_dir": "/tmp/viridian.rxs2ttki",
  "total_amplicons": 98,
  "successful_amplicons": 98,
  "consensus_length": 29836,
  "consensus_N_count": 96,
  "consensus_N_percent": 0.32,
  "consensus_ACGT_count": 29740,
  "consensus_ACGT_percent": 99.68,
  "consensus_het_count": 0,
  "consensus_het_percent": 0.0,
  "run_time": "0:01:29.384867"
}

The most important thing to check is:

"result": "Success"

This means that the run finished successfully. If it instead says "Fail", then something went wrong, and the details will be in the stages_completed section. The other entries should be self-explanatory.
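A check on the result could look like the sketch below. It also shows where a run stopped and recomputes the consensus percentages. The helper names are hypothetical and not part of Viridian. Each *_percent value is assumed to be 100 × count / consensus_length, rounded to two decimal places. That formula is inferred from the example values (96 / 29836 gives 0.32%), not taken from Viridian's code.

```python
def run_succeeded(log):
    """Return True if run_summary reports "result": "Success"."""
    # Hypothetical helper; not part of Viridian.
    return log.get("run_summary", {}).get("result") == "Success"


def last_stage(log):
    """Return the last entry of stages_completed, i.e. where the run stopped."""
    stages = log.get("stages_completed", [])
    return stages[-1] if stages else None


def consensus_percent(summary, count_key):
    """Recompute a consensus percentage from its count."""
    # Assumption: *_percent = 100 * count / consensus_length, rounded to 2 d.p.,
    # inferred from the example values rather than from Viridian's code.
    return round(100 * summary[count_key] / summary["consensus_length"], 2)
```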

stages_completed

This is a list of the stages that were completed. The JSON file is written each time a stage finishes, so if Viridian crashes or is killed, you can see the last stage that was run.

A successful run looks like this:

"stages_completed": [
  "1/10 Start pipeline (0.0s)",
  "2/10 Process amplicon scheme files (0.1s)",
  "3/10 Map reads to reference (36.8s)",
  "4/10 Detect amplicon scheme (2.7s)",
  "5/10 Sample reads (23.3s)",
  "6/10 Initial consensus sequence (6.1s)",
  "7/10 Initial VCF and MSA of consensus/reference (0.4s)",
  "8/10 QC using reads vs consensus sequence (17.9s)",
  "9/10 Final QC checks (0.1s)",
  "10/10 Tidy up final files and log (1.0s)",
  "Finished"
]

The entries can vary depending on the command line options. For example, if a BAM file of mapped reads was provided, then the "Map reads to reference" stage would not be present. However, the final entry for a successful run is always "Finished".
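Each numbered entry has the form "N/TOTAL stage name (time)". The list can therefore be parsed, for example to find the slowest stage. This is a sketch only. The regular expression assumes the layout of the example entries above, and parse_stages is a hypothetical name.

```python
import re

# Matches entries such as "3/10 Map reads to reference (36.8s)".
# Assumption: layout inferred from the example above, not from Viridian's code.
STAGE_RE = re.compile(r"^(\d+)/(\d+) (.+) \(([\d.]+)s\)$")


def parse_stages(stages):
    """Return (stage_number, name, seconds) tuples and whether the run finished."""
    parsed = []
    for entry in stages:
        m = STAGE_RE.match(entry)
        if m:  # "Finished" does not match and is handled separately below
            parsed.append((int(m.group(1)), m.group(3), float(m.group(4))))
    finished = bool(stages) and stages[-1] == "Finished"
    return parsed, finished
```

With the successful run shown above, max(parsed, key=lambda t: t[2]) gives (3, "Map reads to reference", 36.8).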

reads

The "reads" section is a dictionary of summary statistics of the reads. Here is an example for paired Illumina reads:

"reads": {
  "unpaired_reads": 0,
  "reads1": 337637,
  "reads2": 337637,
  "total_reads": 675274,
  "mapped": 667394,
  "match_any_amplicon": 328507,
  "read_lengths": {
    "250": 20,
    "251": 675254
  }
}

The meaning of these should be clear, except for match_any_amplicon. For unpaired reads, this is simply the number of reads that matched any amplicon in the chosen amplicon scheme (not in all the schemes that were considered). For paired reads, it is the number of read pairs that matched, because both reads in a pair must be considered together when matching to an amplicon: their order and orientation are important.

The "read_lengths" dictionary is a count of the number of reads of each given read length. In that example, there were 20 reads of length 250, and the remaining 675254 reads all had length 251.
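Because match_any_amplicon counts pairs for paired reads, the fraction of the data that matched an amplicon has to be divided by the number of pairs rather than by total_reads. The sketch below shows this and also computes a mean read length from read_lengths. The function names are hypothetical. It assumes that a sample has either unpaired or paired reads (not both), and that the number of pairs equals reads1.

```python
def amplicon_match_fraction(reads):
    """Fraction of reads (unpaired) or read pairs (paired) matching any amplicon."""
    # Assumption: a sample has either unpaired reads or paired reads, not both.
    # For paired reads match_any_amplicon counts pairs, so divide by reads1
    # (the number of pairs) rather than by total_reads.
    if reads["unpaired_reads"] > 0:
        denominator = reads["unpaired_reads"]
    else:
        denominator = reads["reads1"]
    return reads["match_any_amplicon"] / denominator


def mean_read_length(reads):
    """Mean read length computed from the read_lengths histogram."""
    # JSON object keys are strings, so convert each length to int.
    hist = {int(length): count for length, count in reads["read_lengths"].items()}
    return sum(length * count for length, count in hist.items()) / sum(hist.values())
```

For the example above, about 97.3% of read pairs (328507 / 337637) matched an amplicon, and the mean read length is just under 251.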

read_depth

This has a summary of the read depth and genome coverage. Here is an example:

"read_depth": {
  "depth_at_least": {
    "1": 29865,
    "2": 29862,
    "5": 29862,
    "10": 29836,
    "15": 29836,
    "20": 29794,
    "50": 29600,
    "100": 29600
  },
  "percent_at_least_x_depth": {
    "1": 99.87,
    "2": 99.86,
    "5": 99.86,
    "10": 99.78,
    "15": 99.78,
    "20": 99.64,
    "50": 98.99,
    "100": 98.99
  },
  "mean_depth": 5470.33,
  "mode_depth": 7393,
  "median_depth": 5051
}

These are all based on mapping reads to the genome, without using any information on amplicon schemes. The mean, mode and median depths are calculated with respect to the entire genome (amplicon schemes do not cover the whole genome). In that example, 99.64% of the genome (29794bp) had at least 20X read depth. This is the value used during QC (the options --coverage_min_x and --coverage_min_pc): by default, Viridian requires at least 50 percent of the genome to have at least 20X read depth.
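The coverage rule described above can be sketched as a check on percent_at_least_x_depth. The defaults min_x=20 and min_pc=50 mirror the documented defaults of --coverage_min_x and --coverage_min_pc. passes_coverage_qc is a hypothetical name, and this is not Viridian's own code. It assumes the dictionary has an entry keyed by the chosen depth. The example only lists 1, 2, 5, 10, 15, 20, 50 and 100, so other thresholds may be missing.

```python
def passes_coverage_qc(read_depth, min_x=20, min_pc=50.0):
    """Return True if at least min_pc percent of the genome has >= min_x depth.

    Hypothetical helper; defaults mirror --coverage_min_x and --coverage_min_pc.
    """
    # JSON keys are strings, e.g. "20"; assumption: the chosen depth is listed.
    pc = read_depth["percent_at_least_x_depth"].get(str(min_x))
    if pc is None:
        raise KeyError(f"no percent_at_least_x_depth entry for {min_x}X")
    return pc >= min_pc
```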

amplicon_scheme_name

This is simply a key/value pair with the chosen amplicon scheme, for example:

"amplicon_scheme_name": "COVID-ARTIC-V3"

scheme_choice

This section has details of the amplicon scheme scores, and which scheme was chosen as best matching the reads. Example:

"scheme_choice": {
  "scores": {
    "COVID-ARTIC-V3": 4902,
    "COVID-ARTIC-V4.1": 808,
    "COVID-ARTIC-V5.0-5.3.2_400": 293,
    "COVID-ARTIC-V5.0-5.2.0_1200": 184,
    "COVID-MIDNIGHT-1200": 320,
    "COVID-AMPLISEQ-V1": -193,
    "COVID-VARSKIP-V1a-2b": 59
  },
  "best_schemes": [
    "COVID-ARTIC-V3"
  ],
  "best_score": 4902,
  "best_scheme": "COVID-ARTIC-V3",
  "score_ratio": 0.16
}

In that example, the best scheme was COVID-ARTIC-V3, with a score of 4902. The second-best scheme was COVID-ARTIC-V4.1 with a score of 808. The ratio of these (score_ratio) was 808/4902 = 0.16.

By default, the best score needs to be at least 250, and the ratio no more than 0.5 (options --min_scheme_score and --max_scheme_ratio).

The best_schemes entry is a list to allow for the extremely unlikely (and never seen!) case that two schemes score equally well. If this happened, then the score ratio would be 1 and the run would be halted.
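The fields in scheme_choice can be recomputed from the scores dictionary, as in the sketch below. The ratio is the second-best score divided by the best, so a tie gives 1. The pass/fail rule uses the documented defaults of --min_scheme_score (250) and --max_scheme_ratio (0.5). The function name is hypothetical. Some behaviour is assumed rather than taken from Viridian's code: a single scheme gives ratio 0, and tied best schemes are sorted alphabetically.

```python
def scheme_choice_summary(scores, min_score=250, max_ratio=0.5):
    """Return (best_schemes, best_score, score_ratio, passed) from scheme scores.

    Hypothetical helper; defaults mirror --min_scheme_score and --max_scheme_ratio.
    """
    ranked = sorted(scores.values(), reverse=True)
    best = ranked[0]
    # Every scheme sharing the top score is listed (a tie gives ratio 1).
    best_schemes = sorted(name for name, s in scores.items() if s == best)
    # Assumption: with only one scheme, treat the second-best score as 0.
    second = ranked[1] if len(ranked) > 1 else 0
    ratio = second / best if best > 0 else float("inf")
    passed = best >= min_score and ratio <= max_ratio
    return best_schemes, best, round(ratio, 2), passed
```

For the example scores above, this returns (["COVID-ARTIC-V3"], 4902, 0.16, True).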
