Data Model #7

averagehat · 2019-10-09T17:26:39Z

No description provided.

averagehat · 2019-10-09T18:40:47Z

@startuml
skinparam objectFontSize 22

object "Bam File" as bf
object "Alt Position" as ap
object "extract reads" as er
object "match primers" as mp
object "Primer File" as pf
object "Primer Set" as ps
object "Alt Base" as ab
object "Max Primer Length" as mpl 


object BamRecords {
 1	QNAME	String	Query template NAME
 2	FLAG	Int	bitwise FLAG
 3	RNAME	String	Reference sequence NAME
 4	POS	Int	1-based leftmost mapping POSition
 5	MAPQ	Int	MAPping Quality
 6	CIGAR	String	CIGAR String
 7	RNEXT	String	Ref. name of the mate/next read
 8	PNEXT	Int	Position of the mate/next read
 9	TLEN	Int	observed Template LENgth
 10	SEQ	String	segment SEQuence
 11	QUAL	String	ASCII of Phred-scaled base QUALity+33

 }


ps : type: `set`
mp : action: Filter Match Primers Function
lofreqRecord --o ap
lofreqRecord --o ab
pf --o ps
ps --o mpl
ab --o er
ap --o er
mpl --o er
bf --o BamRecords
BamRecords --o er
er --o mp
ps --o mp

object "Match Primers Function" as mpf
note top of mpf
  are all primers the same size?
  should a primer match if there are any mismatches or indels?
end note

object lofreqRecord {
 alternates : [Base]
 position : Int
 alleleFreq : Double
 totalDepth : Int
 strandBias : Int
 fwdRefDepth : Int
 revRefDepth : Int
 fwdAltDepth : Int
 revAltDepth : Int
 isIndel : Bool
 consVariant : Bool
 hpolyLength : Int
}

er : action: https://github.com/averagehat/ngs-doit/blob/master/ngs_doit/plotting.py#L44
er : extract reads where the contributing base is within [max primer length] of either end of the read
"Bam File" .. "extract reads"
"Alt Position" .. "extract reads"
"extract reads" .. "match primers"
mpf : args @ BamRecord, Primer Set, Alt Base, Alt Position
mpf : returns @ Boolean
mpf *-- match
mpf *-- "The matched Primer contributes to the Alt"
match : the read starts or ends with a primer
@enduml

https://github.com/VDBWRAIR/vartable/blob/40b5e095538a5b05e634757db5f5409a34f5bc9d/diagram.png
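
A rough Python sketch of the two actions in the diagram, "extract reads" and "Match Primers Function", might help pin down the open questions in the note. Everything named here is hypothetical: the `Read` tuple, `near_read_end`, and `matches_primer` are not existing code. It assumes exact primer matches only (no mismatches or indels, which is just one possible answer to the question above), assumes the alt position has already been converted to a 0-based offset within the read, and ignores reverse complements.

```python
from typing import NamedTuple, Set


class Read(NamedTuple):
    """Hypothetical stand-in for a BAM record: only the fields used below."""
    seq: str         # SEQ column
    alt_offset: int  # 0-based offset of the alt position within seq (assumed precomputed)


def near_read_end(read: Read, max_primer_length: int) -> bool:
    """'extract reads': keep reads whose alt base lies within
    max_primer_length bases of either end of the read."""
    from_start = read.alt_offset
    from_end = len(read.seq) - 1 - read.alt_offset
    return min(from_start, from_end) < max_primer_length


def matches_primer(read: Read, primers: Set[str], alt_base: str) -> bool:
    """'Match Primers Function': True when the read starts or ends with a
    primer (exact match only) and that primer covers the alt base."""
    if read.seq[read.alt_offset] != alt_base:
        return False  # this read does not carry the alt at all
    for primer in primers:
        n = len(primer)
        if read.seq.startswith(primer) and read.alt_offset < n:
            return True
        if read.seq.endswith(primer) and read.alt_offset >= len(read.seq) - n:
            return True
    return False


# Example with made-up sequences.
primers = {"ACGT", "TTGA"}
max_primer_length = max(len(p) for p in primers)
read = Read(seq="ACGTCCCCGGGG", alt_offset=2)
flagged = near_read_end(read, max_primer_length) and matches_primer(read, primers, "G")
```

Taking `max_primer_length` from the primer set mirrors the `ps --o mpl` edge in the diagram; if all primers turn out to be the same size, it collapses to a constant.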

averagehat · 2019-10-09T18:42:14Z

Generated with https://www.planttext.com
See http://plantuml.com/object-diagram

averagehat · 2019-10-25T14:23:45Z

After discussion with @pirekupcode, we need a better data model.
We don't interact with a BAM file through the BamRecords object described above, but through pysam. We use pysam because we don't want to parse and apply the CIGAR string or the flags ourselves. Here's another alternative:

https://github.com/simon-anders/htseq/blob/89f12460eb5bcf7b13478d6e056acd011508106b/python2/src/HTSeq/_HTSeq.pyx#L217

https://samtools.github.io/hts-specs/SAMv1.pdf

https://pysam.readthedocs.io/en/latest/glossary.html#term-soft-clipped

https://pysam.readthedocs.io/en/latest/api.html#pysam.AlignedSegment
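
As a sketch of what the pysam-based model could look like: `AlignedSegment.get_aligned_pairs()` returns `(query_pos, ref_pos)` pairs with the CIGAR already applied, so the alt position can be mapped to an offset in the read without parsing anything by hand. The helper names `alt_offset_in_read` and `within_primer_range` are hypothetical, and `FakeSegment` is only a stub that imitates the two `AlignedSegment` members used here so the snippet runs without a BAM file. Note that pysam coordinates are 0-based, unlike the 1-based POS column.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def alt_offset_in_read(segment, alt_ref_pos: int) -> Optional[int]:
    """Return the 0-based read offset aligned to alt_ref_pos (0-based),
    or None if the read does not cover it or has a deletion there."""
    for query_pos, ref_pos in segment.get_aligned_pairs():
        if ref_pos == alt_ref_pos:
            return query_pos  # None when the reference base is deleted in the read
    return None


def within_primer_range(segment, alt_ref_pos: int, max_primer_length: int) -> bool:
    """True when the alt base sits within max_primer_length of either read end."""
    offset = alt_offset_in_read(segment, alt_ref_pos)
    if offset is None:
        return False
    read_len = len(segment.query_sequence)  # includes soft-clipped bases
    return min(offset, read_len - 1 - offset) < max_primer_length


class FakeSegment:
    """Hypothetical stub imitating the parts of pysam.AlignedSegment used above."""

    def __init__(self, query_sequence: str, pairs: List[Pair]):
        self.query_sequence = query_sequence
        self._pairs = pairs

    def get_aligned_pairs(self) -> List[Pair]:
        return list(self._pairs)


# A 6-base read aligned as 2S4M starting at reference position 100 (0-based).
seg = FakeSegment(
    "ACGTAC",
    [(0, None), (1, None), (2, 100), (3, 101), (4, 102), (5, 103)],
)
offset = alt_offset_in_read(seg, 101)
```

With a real BAM file the same functions would presumably be called on the segments yielded by `pysam.AlignmentFile.fetch()`. Whether soft-clipped bases should count toward the distance from the read end is an open design choice; this sketch counts them.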