Finding the most common 7-mer in a FASTA file

Your task

Write a script to print out the most common 7-mer and its GC percentage from all the sequences in data/records.fa. You are free to reuse your existing toolbox.
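To make the two quantities concrete, here is a minimal sketch of counting overlapping 7-mers and computing a GC percentage. It runs on toy strings rather than on data/records.fa. The helper names `count_kmers` and `gc_percentage` are hypothetical, and the sketch assumes that GC percentage means the share of G and C bases within the 7-mer itself.

```python
from collections import Counter


def count_kmers(sequence, k=7):
    """Count every overlapping k-mer in one sequence string."""
    counts = Counter()
    # The last valid start position is len(sequence) - k.
    for start in range(len(sequence) - k + 1):
        counts[sequence[start:start + k]] += 1
    return counts


def gc_percentage(kmer):
    """Return the percentage of G and C bases in a k-mer (assumed definition)."""
    if not kmer:
        return 0.0
    upper = kmer.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)


# Toy input, not taken from data/records.fa.
sequences = ["ACGTACGTAC", "TTACGTACGG"]

totals = Counter()
for seq in sequences:
    totals.update(count_kmers(seq.upper()))

# most_common(1) returns a list with one (k-mer, count) pair.
kmer, occurrences = totals.most_common(1)[0]
print(kmer, occurrences, f"{gc_percentage(kmer):.1f}%")
```

If several 7-mers share the highest count, `most_common(1)` reports only one of them, so you may want to decide how your script should handle ties.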

The example FASTA file was adapted from: Genome Biology DNA60 Bioinformatics Challenge.

Hints

FASTA files have two types of lines: header lines, which start with a ">" character, and sequence lines. We are only concerned with the sequence lines.
Read the string functions documentation.
Read the documentation for built-in functions.
Read up on parsing command-line arguments.
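As one possible starting point for the first hint, the sketch below separates header lines from sequence lines using only string methods. The function name `read_sequences` is hypothetical. The sketch assumes that a record's sequence may span several lines, which are joined before use, and that blank lines can be skipped.

```python
import io


def read_sequences(handle):
    """Yield one upper-cased sequence string per FASTA record."""
    parts = []
    for line in handle:
        line = line.strip()
        if not line:
            # Assumption: blank lines carry no information.
            continue
        if line.startswith(">"):
            # A header closes the previous record, if there was one.
            if parts:
                yield "".join(parts)
            parts = []
        else:
            parts.append(line.upper())
    # Emit the final record, which has no header after it.
    if parts:
        yield "".join(parts)


# Toy FASTA text standing in for a real file opened with open().
example = io.StringIO(">first\nACGT\nacgt\n>second\nTTTT\n")
for sequence in read_sequences(example):
    print(sequence)
```

Because the function takes an open handle rather than a path, the same code works with anything that yields lines of text.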

Challenges

Find out how to change your script so that it can read from data/challenge.fa.gz without unzipping the file first (hint: check the standard library).
Can you add a command-line argument parser so that you can specify the path to the input file from the command line?
Can you change the parser so that there is an option flag to tell the program whether the input file is gzipped or not?
Can you change your script so that it works for any N-mer rather than only 7-mers?
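The challenges above could be approached along these lines, using `argparse` and `gzip` from the standard library. This is a sketch, not a reference solution: the flag names `--gzipped` and `--kmer-size` and the helper names `build_parser` and `open_fasta` are illustrative choices, not part of the assignment.

```python
import argparse
import gzip


def build_parser():
    """Build a parser with an input path, a gzip flag and a k-mer size."""
    parser = argparse.ArgumentParser(
        description="Report the most common k-mer in a FASTA file."
    )
    parser.add_argument("fasta", help="path to the input FASTA file")
    parser.add_argument(
        "-z", "--gzipped", action="store_true",
        help="treat the input file as gzip-compressed",
    )
    parser.add_argument(
        "-k", "--kmer-size", type=int, default=7,
        help="length of the k-mers to count (default: 7)",
    )
    return parser


def open_fasta(path, gzipped=False):
    """Open a FASTA file in text mode, decompressing on the fly if asked."""
    if gzipped:
        # Mode "rt" makes gzip.open return text lines instead of bytes.
        return gzip.open(path, "rt")
    return open(path, "r")


# Passing an explicit list keeps the example runnable without a real
# command line; a script would call parse_args() with no arguments.
args = build_parser().parse_args(["data/challenge.fa.gz", "--gzipped", "-k", "5"])
print(args.fasta, args.gzipped, args.kmer_size)
```

The handle returned by `open_fasta` yields lines of text in both cases, so the rest of the script does not need to know whether the input was compressed.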

lumc-python/day3_assignments

Folders and files

Latest commit

History

Repository files navigation

Finding the most common 7-mer in a FASTA file

Your task

Hints

Challenges

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages