
Finding the most common 7-mer in a FASTA file

Your task

Write a script that prints the most common 7-mer across all the sequences in data/records.fa, together with that 7-mer's GC percentage. You are free to reuse your existing toolbox.
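Here is a minimal sketch of the basic task. It assumes the script is written in Python and that "most common" means counting every overlapping 7-mer across all records, with header lines ignored and lower case folded to upper case. The helper names `read_fasta`, `count_kmers`, `gc_percentage` and `most_common_kmer` are hypothetical stand-ins for whatever your toolbox already provides.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Wrapped sequence lines are joined and upper-cased; blank lines are skipped.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def count_kmers(sequences, k=7):
    """Count every overlapping k-mer across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def gc_percentage(seq):
    """Percentage of G and C bases in seq."""
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def most_common_kmer(lines, k=7):
    """Return (kmer, count, gc_percent) for the most frequent k-mer in FASTA lines."""
    sequences = (seq for _, seq in read_fasta(lines))
    kmer, count = count_kmers(sequences, k).most_common(1)[0]
    return kmer, count, gc_percentage(kmer)
```

For example: `with open("data/records.fa") as handle: print(most_common_kmer(handle))`. Note that `Counter.most_common` keeps first-seen order among equal counts, so if several 7-mers tie for first place only one of them is reported.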

  • The example FASTA file was adapted from the Genome Biology DNA60 Bioinformatics Challenge.

Hints

Challenges

  • Find out how to change your script so that it can read data/challenge.fa.gz without unzipping the file first (hint: check the standard library).
  • Can you add a command-line argument parser so that you can specify the path to the input file from the command line?
  • Can you extend the parser with an option flag that tells the program whether the input file is gzipped?
  • Can you change your script so that it works for N-mers of any length instead of only 7-mers?
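One possible way to combine all four challenges is sketched below. `gzip.open` from the standard library decompresses text on the fly, and `argparse` handles the input path, a gzip flag and the k-mer length. The flag names `-z/--gzip` and `-k`, and the helper names, are hypothetical choices, not part of the assignment.

```python
import argparse
import gzip
from collections import Counter


def open_input(path, gzipped):
    """Open path as text, decompressing on the fly when gzipped is True."""
    if gzipped:
        return gzip.open(path, "rt")  # "rt" = read as decoded text, not bytes
    return open(path)


def iter_sequences(lines):
    """Yield each FASTA sequence: headers dropped, wrapped lines joined, upper-cased."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line.upper())
    if chunks:
        yield "".join(chunks)


def count_kmers(sequences, k):
    """Count every overlapping k-mer of length k across all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    return counts


def build_parser():
    """Command-line interface: input path, hypothetical --gzip flag and -k length."""
    parser = argparse.ArgumentParser(
        description="Report the most common k-mer in a FASTA file.")
    parser.add_argument("path", help="input FASTA file")
    parser.add_argument("-z", "--gzip", action="store_true",
                        help="treat the input file as gzip-compressed")
    parser.add_argument("-k", type=int, default=7,
                        help="k-mer length (default: 7)")
    return parser


def main(argv=None):
    """Parse arguments (sys.argv when argv is None) and print kmer, count, GC%."""
    args = build_parser().parse_args(argv)
    if args.k < 1:
        raise SystemExit("k must be a positive integer")
    with open_input(args.path, args.gzip) as handle:
        counts = count_kmers(iter_sequences(handle), args.k)
    kmer, n = counts.most_common(1)[0]
    gc = 100 * (kmer.count("G") + kmer.count("C")) / len(kmer)
    print(f"{kmer}\t{n}\t{gc:.2f}%")
```

To run it as a script, end the file with `if __name__ == "__main__": main()` and call it as, e.g., `python kmers.py data/challenge.fa.gz --gzip -k 7` (the file name `kmers.py` is hypothetical). Detecting compression from the `.gz` suffix would avoid the flag altogether, but the challenge explicitly asks for an option.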