diff --git a/README.md b/README.md
index 3fe3364..17bcb1d 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@ This project was funded in FY23 by the [NOAA High Performance Computing and Comm
 With the [NOAA ‘Omics Strategic Plan](https://sciencecouncil.noaa.gov/wp-content/uploads/2022/08/Omics-Strategic-Plan_Final-Signed.pdf), the generation and analysis of large molecular (DNA, RNA) and chemical (metabolites, proteins) data is recognized as a NOAA mission priority. These types of data are rapidly advancing the field of fisheries and biological oceanography and are a crucial component of systems-level understanding of marine habitats, species diversity, and population dynamics. ‘Omics data sets consist of large raw and processed data files that require substantial storage space and computational processing power. Importantly, the requirements for ‘Omics workflows are distinct from those of mathematical simulations such as weather modeling. ‘Omics computational resources ideally have a flexible architecture that can accommodate both highly-parallel, low-memory processes as well as low-node count, high memory processes. Local on-premise servers built for bioinformatics demands have been important upgrades in NOAA computational capacity, but as ‘Omics projects expand in scope, individual servers may not meet the expanding scientific needs. Cloud computing could overcome resource challenges and represents a potential long-term solution to meet some of the scientific needs defined in the ‘Omics Strategic Plan. This project will tested the feasibility of running ‘Omics analyses in a cloud environment, and compared the cost and effort with those of on-premise HPC.
 ‘Omics bioinformatic workflows were conducted in parallel (i.e, in the cloud and on-premise), representing major areas of NOAA ‘Omics research: (1) DNA metabarcoding, (2) shotgun metagenome-assembled genome (MAG) binning, (3) transcriptome assembly and annotation, and (4) whole genome assembly, alignment, and variant calling.
-Detailed documentation of required storage space, processing power, and time spent on installation and troubleshooting will provide a roadmap for evaluating cloud vs. on-premise computing, which in the long-term plan will identify the most effective means for storage and analysis of ‘Omics datasets. The final report from this project may be found **here**, and repositories used for analyses are linked below.
+Detailed documentation of required storage space, processing power, and time spent on installation and troubleshooting will provide a roadmap for evaluating cloud vs. on-premise computing, which in the long-term plan will identify the most effective means for storage and analysis of ‘Omics datasets. The final report from this project may be found [here](https://github.com/noaa-nwfsc/FY24-HPCC-Incubator-testing-for-bioinformatics/blob/main/docs/NOAA%20Genomics%20Report%20(6.28.24).pdf), and repositories used for analyses are linked below.
 ## Project Repositories