Needle in a Haystack: Finding and summarizing data from colossal files

| Audience | Computational skills required | Duration |
|---|---|---|
| Biologists | Beginner bash | 2-3 hour workshop (~2-3 hours of trainer-led time) |

Description

This repository has teaching materials for a 3-hour, hands-on intermediate bash workshop led at a relaxed pace. Many tools for the analysis of big data require knowledge of the command line. This workshop builds on the basic skills taught in The Foundation - Basic Shell workshop and teaches users to apply command-line tools such as grep, sed, and awk to find and summarize information from large files.

Learning Objectives

  • Recognize basic regex
  • Utilize regex to cast a wider net with grep, sed, and awk
  • Differentiate between best use cases for grep, sed, and awk
  • Implement proper syntax for grep, sed, and awk commands
  • Observe the wide range of options for sed to perform various tasks
  • Identify bioinformatic applications for grep, sed, and awk
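
As a rough illustration of how the three tools differ, the sketch below runs each one on a tiny tab-separated table. The table contents, the column layout (chromosome, feature, start, end), and the variable names are assumptions made up for this example; they are not taken from the workshop dataset or lessons.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical tab-separated annotation table: chromosome, feature, start, end.
demo_file="$(mktemp)"
printf 'chr1\tgene\t100\t500\n' >  "$demo_file"
printf 'chr1\texon\t100\t200\n' >> "$demo_file"
printf 'chr2\tgene\t300\t900\n' >> "$demo_file"
printf 'chrX\texon\t50\t150\n'  >> "$demo_file"

# grep: find lines matching a regex (here, "chr" followed by a digit at line start).
grep_out="$(grep -E '^chr[0-9]' "$demo_file")"

# sed: edit text as it streams past (here, strip the leading "chr" prefix).
sed_out="$(sed 's/^chr//' "$demo_file")"

# awk: work column by column (here, sum end - start over the "exon" rows).
awk_out="$(awk -F '\t' '$2 == "exon" { total += $4 - $3 } END { print total }' "$demo_file")"

printf 'grep:\n%s\n\nsed:\n%s\n\nawk (total exon length): %s\n' \
  "$grep_out" "$sed_out" "$awk_out"

rm -f "$demo_file"
```

In this sketch grep only selects lines, sed rewrites the text of each line, and awk treats each line as a set of columns it can compare and do arithmetic on, which is one way to think about the best use case for each tool.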

These materials are developed for a trainer-led workshop, but are also amenable to self-guided learning.

Contents

| Lessons | Estimated Duration |
|---|---|
| Setting up | 15 min |
| Regular Expressions | 45 min |
| Sed | 45 min |
| Awk | 75 min |

Dataset

Installation Requirements

Windows users: GitBash R