Instructors: Oscar A. Pérez Escobar (Royal Botanic Gardens, Kew) & Sidonie Bellot (Royal Botanic Gardens, Kew)
Organizers: Liam Trethowan (Royal Botanic Gardens, Kew), Himmah Rustiami (Herbarium Bogoriense-BRIN), Oscar A. Pérez Escobar, Sidonie Bellot.
This workshop is funded and supported by: The Darwin Initiative UK - Royal Botanic Gardens, Kew (RBG Kew) - Herbarium Bogoriense-BRIN
This repository contains a tutorial guide to the analysis of raw data derived from Oxford Nanopore Technologies (ONT) sequencing and the initial steps for conducting a genome assembly. Additionally, it includes a real example of how ONT data can be used, e.g., conducting sequence searches against a predetermined database using NCBI BLAST to determine the identity of an organism sequenced using ONT.
This tutorial is intended for users with a basic knowledge of programming and is designed to run in UNIX environments. The participant should ideally have experience using the shell and manipulating text files (e.g., using awk, sed, grep, among others). The workshop will be run on pre-configured laptops (Ubuntu 22.04). A basic introduction to the UNIX environment with some useful commands is available here.
This tutorial requires the following programs/dependencies (it is highly recommended to have these installed before starting the tutorial). Please make sure that the dependencies on which these programs run are also available:
- NCBI blast: This program builds BLAST databases, which are required for DNA/AA sequence searches.
- NCBI magicblast: This program searches DNA/RNA reads from Illumina/Nanopore sequencing experiments (in FASTA or FASTQ format) against BLAST databases. IMPORTANT: please create a free NCBI account to then freely access an NCBI API KEY here. This is needed to perform remote (online) sequence queries.
- CANU: this program allows the correction and filtering of ONT/PacBio sequences.
- SMARTdenovo: this program assembles de-novo corrected and trimmed ONT/PacBio sequences.
- minimap2: this program conducts pairwise genome alignments, as well as short-to-short and short-to-long read mapping. This is required for genome polishing.
- racon: this program corrects long reads or scaffolds using reads that have been mapped against those scaffolds/genome. This is required for genome polishing.
- NanoPlot: An online executable version is available here; this program produces plots with information associated with sequencing experiments conducted on ONT technologies.
- Guppy: This program (now legacy) calls bases from FAST5 files generated by ONT. It is only available for ONT users (this part of the tutorial, although explained, will not be executed).
- dorado: This program (now the official ONT basecaller) calls bases from POD5 files generated by ONT. It is Open Access and can perform a wide range of functions.
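Before starting, it can help to confirm that the programs listed above are visible in the current terminal session. The sketch below is one way to do this. The helper name check_tools is hypothetical, and the executable names passed to it (e.g., smartdenovo.pl, NanoPlot) are assumptions about how each program is installed; they may differ on your system.

```shell
# Report which of the given executables can be found on the current PATH.
# check_tools is a hypothetical helper written for this example.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=$((missing + 1))
        fi
    done
    # The return status is the number of missing programs (0 means all found).
    return "$missing"
}

# Executable names below are assumptions; adjust them to match your installation.
check_tools makeblastdb blastn magicblast canu smartdenovo.pl \
    minimap2 racon samtools NanoPlot dorado || true
```

Any program reported as MISSING either is not installed or lives in a directory that has not yet been added to PATH.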
This tutorial is divided into four main steps:
A. Base Calling
C. Data trimming, correction and genome assembly
D. Genome search and/or annotation operations
Figure 1: Simplified view of tutorial/pipeline
Important
The base data needed to run this tutorial is available in the different subfolders of this repo (e.g., NGS and NanoPlot). Some files need to be downloaded from a Google Drive folder. The link to these files is provided in the README.md file of each subfolder.
In any bioinformatics pipeline, it is essential to know which programs the pipeline depends on and where the input files and other resources are located. To run this tutorial, you must copy this repository to a directory of your choice (ideally /home/ontasia*/Documents). To do this, please execute:
git clone https://github.com/siriusb-nox/ONT-workshop-Oct-2023.git
For users who have installed the programs in a UNIX environment on a personal computer, the programs can be made available in the current session (terminal) by adding their directories to PATH, for example:
PATH=$PATH:/directory/of/the/folder/programx
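As a minimal sketch of what this command does, the example below creates a throwaway directory containing a hypothetical executable called programx, appends that directory to PATH, and then calls the program by name. The directory and the script are made up for illustration only.

```shell
# Create a throwaway directory holding a hypothetical executable called programx.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "programx is running"\n' > "$demo_dir/programx"
chmod +x "$demo_dir/programx"

# Append the directory to PATH (this only affects the current session).
PATH=$PATH:$demo_dir
export PATH

# The shell can now locate the program by name.
command -v programx
programx
```

The change is lost when the terminal is closed, so the PATH line has to be run again in each new session.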
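For this particular workshop, users with Dell laptops should run the following lines to add the dependencies to PATH. Replace ontasia* with the user name of your laptop, because the shell does not expand wildcards inside variable assignments: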
# Canu
PATH=$PATH:/home/ontasia*/Softwares/canu/canu-1.9/Linux-amd64/bin/
# Racon
PATH=$PATH:/home/ontasia*/softwares/genomics/racon/build/bin
# Minimap2
PATH=$PATH:/home/ontasia*/softwares/genomics/minimap2-2.17_x64-linux/
# samtools
PATH=$PATH:/home/ontasia*/softwares/genomics/samtools-1.10
# magicblast
PATH=$PATH:/home/ontasia*/softwares/genomics/ncbi-magicblast-1.5.0/bin/
# ncbi blast
PATH=$PATH:/home/ontasia*/softwares/genomics/ncbi-blast-2.10.0+/bin/
# SMARTdenovo
PATH=$PATH:/home/ontasia*/softwares/genomics/
export PATH
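A mistyped directory in the lines above goes unnoticed until a program fails to start. The sketch below shows one possible safeguard: a hypothetical helper, add_to_path, that appends a directory only if it exists and is not already on PATH. The example call uses $HOME in place of the workshop home directory, which is an assumption about where the software is installed.

```shell
# Append a directory to PATH only if it exists and is not already listed.
# add_to_path is a hypothetical helper written for this example.
add_to_path() {
    dir=${1%/}   # drop one trailing slash so duplicates are detected
    if [ ! -d "$dir" ]; then
        echo "skipped (no such directory): $dir" >&2
        return 1
    fi
    case ":$PATH:" in
        *":$dir:"*) ;;            # already on PATH, nothing to do
        *) PATH=$PATH:$dir ;;
    esac
    export PATH
}

# Example call; the location under $HOME is an assumption, adjust as needed.
add_to_path "$HOME/softwares/genomics/minimap2-2.17_x64-linux/" || true
```

A "skipped" message indicates that the directory name should be checked against the actual installation before continuing.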