pashadag/Pdb
Code for the paired de Bruijn graph, based on the RECOMB 2011 paper. There has been some development since the paper, but you can still find the original code in Recomb2011version.zip.

MAIN FILES:
singleGenerateReads.cpp   Generates non-paired reads from a genome.
generateReads.cpp         Generates paired reads from a genome.
ab.cpp                    Assembles non-paired reads.
pab.cpp                   Assembles paired reads (exact d version).
pabNL.cpp                 Assembles paired reads (no limit on delta).
pabNLpar.cpp              Threaded version of pabNL.
pabFull.cpp               Adds support for gaps in the coverage to pabNL. This is done by effectively throwing away any matepairs whose right mate doesn't have a place in the DB (because the DB is implemented just for the left reads). This version should supersede all the other ones: use it in place of pab, pabNL, and pabNLpar. There are probably bug fixes in this one that are not in the other ones.

VALIDATION:
plot.py                   Plots the contig lengths.
qcheck.sh                 Uses blat to check the accuracy of contigs.
qcheck2.cpp               Uses a suffix array to check the accuracy of contigs.

OTHER:
generateKs.cpp            Takes a readl file (with reads) and generates a readk file (with kmers).
check_loops.cpp           Reports any loops found in the dbcollapsed output file.
union.h                   Implements the union-find data structure.
buildsa.cpp               Builds the suffix array for a genome.
filter_genome.cpp         Removes all characters other than [atcgATGC] from the genome and uppercases everything.
run                       Collection of commands used to generate the figures in the paper.

THE FOLLOWING ARE FILES STILL IN DEVELOPMENT:
short.cpp                 A replacement for singleGenerateReads that saves memory by using the genome sequence in sorting.
abshort.cpp               A replacement for ab.cpp that saves memory by using indices into the genome sequence in place of the read strings.
validate.cpp              A tool that takes partition info and, using the source genome sequence as a cheat, checks whether the partitioning is safe.
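To illustrate the kind of cleanup described for filter_genome.cpp, here is a minimal sketch. It is not taken from the repository: the function name filterGenome and its string-in, string-out interface are assumptions made for illustration, and the real tool presumably works on files.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not the code in filter_genome.cpp.
// Keeps only a/c/g/t in either case and returns them uppercased;
// every other character (N, digits, whitespace, newlines) is dropped.
std::string filterGenome(const std::string& raw) {
    std::string cleaned;
    cleaned.reserve(raw.size());
    for (unsigned char c : raw) {
        const char upper = static_cast<char>(std::toupper(c));
        if (upper == 'A' || upper == 'C' || upper == 'G' || upper == 'T') {
            cleaned.push_back(upper);
        }
    }
    return cleaned;
}
```

Note that a filter this simple would also keep stray a/c/g/t letters from a FASTA header line, so under this sketch any header would need to be stripped beforehand.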
FILE FORMATS:
readl                     Each line is a matepair: left read followed by right read, with no space in between.
readk                     Each line is a left kmer in the reads and its position in readl.
readks                    readk sorted by the kmers, with only the position stored.

The workflow from a readl file is to first run generateKs to create readk, then sort it to create readks, then run pabFull to perform the assembly.
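The readl, readk, readks chain above can be sketched in memory as follows. This is only an illustration under several assumptions that the README does not confirm: both mates are taken to have equal length (so the left read is the first half of a readl line), every k-mer of the left read is emitted, and "position" is encoded as a (line index, offset) pair. The names KmerRecord, makeReadk, and makeReadks are hypothetical and do not come from generateKs.cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// One readk-style record. The (line, offset) encoding of the position
// is an assumption; the README only says it is a position in readl.
struct KmerRecord {
    std::string kmer;
    std::size_t line;    // zero-based line index in readl
    std::size_t offset;  // zero-based offset inside the left read
};

// Builds readk-style records from readl lines.
// Assumes equal-length mates: the left read is the first half of the line.
std::vector<KmerRecord> makeReadk(const std::vector<std::string>& readl,
                                  std::size_t k) {
    std::vector<KmerRecord> records;
    if (k == 0) return records;
    for (std::size_t line = 0; line < readl.size(); ++line) {
        const std::string left = readl[line].substr(0, readl[line].size() / 2);
        for (std::size_t off = 0; off + k <= left.size(); ++off) {
            records.push_back({left.substr(off, k), line, off});
        }
    }
    return records;
}

// Sorts by k-mer (ties broken by position so the result is deterministic)
// and keeps only the positions, mirroring the readks description.
std::vector<std::pair<std::size_t, std::size_t>> makeReadks(
        std::vector<KmerRecord> records) {
    std::sort(records.begin(), records.end(),
              [](const KmerRecord& a, const KmerRecord& b) {
                  return std::tie(a.kmer, a.line, a.offset) <
                         std::tie(b.kmer, b.line, b.offset);
              });
    std::vector<std::pair<std::size_t, std::size_t>> positions;
    positions.reserve(records.size());
    for (const KmerRecord& r : records) {
        positions.emplace_back(r.line, r.offset);
    }
    return positions;
}
```

Storing only positions in the sorted output keeps it compact, since each k-mer can be recovered from readl when needed. The actual on-disk layout used by the repository's tools may differ from this sketch.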