extra-programming/shotgun

# Shotgun Sequencing!


### Description

A project to sequence DNA.


### Goals

#### Completed

  • Start project
  • Create a README
  • Create random DNA to sequence
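How the project actually generates its DNA isn't shown in this README. A minimal sketch, assuming every base is drawn independently and uniformly from A, C, G and T (the function name `random_genome` is hypothetical), might look like this:

```python
import random
from typing import Optional

BASES = "ACGT"


def random_genome(length: int, seed: Optional[int] = None) -> str:
    """Return a DNA string whose bases are drawn independently and uniformly.

    Assumption: each position is A, C, G or T with probability 1/4,
    independent of its neighbours. Real genomes are not like this.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    rng = random.Random(seed)  # a seeded generator makes runs reproducible
    return "".join(rng.choices(BASES, k=length))
```

Calling `random_genome(1000, seed=42)` twice returns the same string, which makes later sequencing and assembly experiments repeatable.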

#### In Progress

  • Fake the process of cutting up DNA into short segments and sequencing them
    • The sequencing error rate should be significantly higher near the ends of the segments
  • Make tests
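The error model for the fake sequencer isn't specified above. One possible sketch, assuming only substitution errors (no insertions or deletions) and an error rate that rises linearly from the middle of each segment toward both ends; `simulate_reads`, `error_rate` and the default rates are illustrative choices, not measured values:

```python
import random
from typing import List, Optional, Tuple

BASES = "ACGT"


def error_rate(pos: int, read_len: int,
               base_rate: float = 0.01, end_rate: float = 0.10) -> float:
    """Hypothetical per-base error probability for position `pos` of a read.

    The rate is `end_rate` at both ends and falls linearly to `base_rate`
    at the middle, so errors cluster near the ends of each segment.
    """
    if read_len <= 1:
        return end_rate
    half = (read_len - 1) / 2
    # 0.0 at either end of the read, 1.0 at its exact middle
    depth = min(pos, read_len - 1 - pos) / half
    return end_rate + (base_rate - end_rate) * depth


def simulate_reads(genome: str, n_reads: int, read_len: int,
                   seed: Optional[int] = None,
                   base_rate: float = 0.01,
                   end_rate: float = 0.10) -> List[Tuple[int, str]]:
    """Cut random segments out of `genome` and add substitution errors.

    Returns (start, read) pairs; the true start position is kept so that
    an assembler's output can be checked against the ground truth.
    """
    if not 0 < read_len <= len(genome):
        raise ValueError("read_len must be between 1 and len(genome)")
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        # Uniform random start, so segments overlap by chance.
        start = rng.randrange(len(genome) - read_len + 1)
        noisy = []
        for i, base in enumerate(genome[start:start + read_len]):
            if rng.random() < error_rate(i, read_len, base_rate, end_rate):
                # Substitution error: swap in one of the other three bases.
                noisy.append(rng.choice([b for b in BASES if b != base]))
            else:
                noisy.append(base)
        reads.append((start, "".join(noisy)))
    return reads
```

Setting both rates to `0.0` gives error-free reads, which is handy for testing an assembler before errors are switched on.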

#### Short-Term Plans

  • Begin to sequence the genome
    • Find matching (or close-enough) ends of segments to assemble into subsequences
    • Keep matching subsequences until the entire genome is sequenced
      • What do we do about spurious matches?
    • Maybe we should try this first with sequencing errors disabled?
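A common starting point for the overlap-based assembly described above is the greedy approach: repeatedly merge the pair of segments with the longest suffix/prefix overlap. The sketch below assumes error-free reads and exact matches only (close-enough matching is not handled); `min_overlap` is a hypothetical knob that rejects very short overlaps, which are the most likely to be spurious. Repeats longer than a read can still cause wrong merges, and greedy merging is not guaranteed to recover the original sequence.

```python
from typing import List


def overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap is at least `min_overlap` bases long.
    """
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if k > 0 and a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Greedily merge exact suffix/prefix overlaps; returns the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # A read contained inside another read adds no new information.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no sufficiently long overlap is left
        merged = contigs[best_i] + contigs[best_j][best_k:]
        # Replace the pair with the merge, and drop anything the merge now contains.
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j) and c not in merged]
        contigs.append(merged)
```

For example, the reads `ATGCGTAC`, `GTACGTTA` and `CGTTAGC` assemble into the single contig `ATGCGTACGTTAGC`. If coverage has gaps, more than one contig is returned.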

#### Long-Term Plans

  • Fully sequence the genome
  • Think about animating (at least the recombining of long sequences), probably color-coded. Long matching sequences could be highlighted before being dragged together. Maybe we can learn from the animated sorting-algorithm visualizations. Note: a web search for "shotgun sequencing animation" finds many potential videos and tutorials, including a basic shotgun sequencing video from the DNA Learning Center (requires Flash) with links to other videos.

#### Recommended Reading

#### Questions for Further Research

  • Is our random DNA really random in the right way? How do we test that our fake DNA has a typical distribution of the standard bases (A, C, G, T)?
  • Standards for measuring differences and counting errors between reads of a sequence. From the Nature article "Finishing the euchromatic sequence of the human genome" (accessed 18 Feb 2015, http://www.nature.com/nature/journal/v431/n7011/full/nature03001.html): "Figure 2: Assessment of potential errors by analysis of BAC overlaps. a, Single-base differences between overlapping finished BAC clones (with ≥5 kb overlap). The number of single-base differences in overlaps for clones from the same library and from different libraries is plotted. The results are consistent with half of the clones from the same library representing identical underlying DNA sequence with low error rate, and half representing different haplotypes as expected. b, Insertion/deletion (indel) differences between overlapping clones. The number of indels per Mb for a given size range is compared for clones with no single-base mismatches (presumed to be derived from the same haploid source) and >3 single-base mismatches (presumed to be derived from different haploid sources). Indels in the former class primarily represent errors in finished sequence; they occur at ~20-fold lower frequency (inset) than indels in the latter class, which primarily represent polymorphic differences."
  • Do "restriction sites" break at particular motifs?
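For the first two questions, a few simple measurements could serve as a starting point. A Pearson chi-square test compares base counts against equal 25% frequencies; with four categories there are 3 degrees of freedom, and the 5% critical value is about 7.815. This only checks overall composition, not whether neighbouring bases are independent, and real genomes are generally not 25% of each base, so "typical" depends on the organism being compared against. For overlapping reads, Hamming distance counts single-base differences between equal-length aligned strings, while Levenshtein (edit) distance also counts insertions and deletions, though it does not report them separately. All function and constant names here are hypothetical.

```python
from collections import Counter
from typing import Dict

BASES = "ACGT"

# Chi-square critical value for 3 degrees of freedom at the 5% level.
CHI2_CRIT_DF3_P05 = 7.815


def base_counts(seq: str) -> Dict[str, int]:
    """Count A, C, G and T in `seq` (other characters are ignored)."""
    counts = Counter(seq)
    return {b: counts[b] for b in BASES}


def chi_square_uniform(seq: str) -> float:
    """Pearson chi-square statistic of base counts vs. 25% of each base."""
    counts = base_counts(seq)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    expected = n / 4
    return sum((c - expected) ** 2 / expected for c in counts.values())


def hamming(a: str, b: str) -> int:
    """Number of single-base differences between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]
```

A statistic above `CHI2_CRIT_DF3_P05` would suggest the base composition is unlikely under the uniform assumption; for a long uniformly generated sequence it should usually fall below it.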

#### Related Topics

  • DARPA's "Shredder Challenge" in 2011 gave competitors pictures of shreds from sliced-up documents and asked teams to reassemble the original documents. The winning team, "All Your Shreds are belong to U.S.", used some AI and visual processing, a problem much like reassembling broken and overlapping sequences of DNA. Here's an essay about a [griefer who scrambled](https://medium.com/backchannel/how-a-lone-hacker-shredded-the-myth-of-crowdsourcing-d9d0534f1731) and vandalized the publicly available work-in-progress of the UCSD crowdsourced team. The article includes an interview with the hacker, who was tracked down three years later. It suggests this act shows that some crowdsourcing might not stand up to even small teams of attackers, though long-term efforts with hard-won reputation scores, like Wikipedia, might hold up better against vandalism. The winning team denies being the griefer(s). Fun fact from DARPA's shred page: the documents make reference to the Mad Magazine "Spy vs Spy" cartoons of Antonio Prohías.
