Data Model #7

Open
averagehat opened this issue Oct 9, 2019 · 3 comments

@averagehat

No description provided.

@averagehat averagehat self-assigned this Oct 9, 2019
@averagehat

averagehat commented Oct 9, 2019

@startuml
skinparam objectFontSize 22

object "Bam File" as bf
object "Alt Position" as ap
object "extract reads" as er
object "match primers" as mp
object "Primer File" as pf
object "Primer Set" as ps
object "Alt Base" as ab
object "Max Primer Length" as mpl 


object BamRecords {
 1	QNAME	String	Query template NAME
 2	FLAG	Int	bitwise FLAG
 3	RNAME	String	Reference sequence NAME
 4	POS	Int	1-based leftmost mapping POSition
 5	MAPQ	Int	MAPping Quality
 6	CIGAR	String	CIGAR String
 7	RNEXT	String	Ref. name of the mate/next read
 8	PNEXT	Int	Position of the mate/next read
 9	TLEN	Int	observed Template LENgth
 10	SEQ	String	segment SEQuence
 11	QUAL	String	ASCII of Phred-scaled base QUALity+33

 }


ps : type: `set`
mp : action: Filter Match Primers Function
lofreqRecord --o ap
lofreqRecord --o ab
pf --o ps
ps --o mpl
ab --o er
ap --o er
mpl --o er
bf --o BamRecords
BamRecords --o er
er --o mp
ps --o mp

object "Match Primers Function" as mpf
note top of mpf
  Are all primers the same size?
  Should a primer match if there are any mismatches or indels?
end note

object lofreqRecord {
 alternates : [Base]
 position : Int
 alleleFreq : Double
 totalDepth : Int
 strandBias : Int
 fwdRefDepth : Int
 revRefDepth : Int
 fwdAltDepth : Int
 revAltDepth : Int
 isIndel : Bool
 consVariant : Bool
 hpolyLength : Int
}

er : action: https://github.com/averagehat/ngs-doit/blob/master/ngs_doit/plotting.py#L44
er : extract reads where the contributing base is within [max primer length] bases of either end of the read
"Bam File" .. "extract reads"
"Alt Position" .. "extract reads"
"extract reads" .. "match primers"
mpf : args @ BamRecord, Primer Set, Alt Base, Alt Position
mpf : returns @ Boolean
mpf *-- match
mpf *-- "The matched Primer contributes to the Alt"
match : the read starts or ends with a primer
@enduml

https://github.com/VDBWRAIR/vartable/blob/40b5e095538a5b05e634757db5f5409a34f5bc9d/diagram.png
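To make the diagram concrete, here is a minimal Python sketch of the `extract reads` and `match primers` steps. It assumes each read is a plain sequence string paired with the query index of the base aligned to the alt position (the pysam-based approach discussed below would supply that index), and it counts only exact primer matches, since the note's questions about primer sizes, mismatches and indels are still open. `LofreqRecord`, `max_primer_length`, `extract_reads` and `primer_contributes` are hypothetical names, not code from the project.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class LofreqRecord:
    """Hypothetical Python mirror of the lofreqRecord object (snake_case fields)."""
    alternates: List[str]
    position: int            # 1-based position as reported in the VCF
    allele_freq: float
    total_depth: int
    strand_bias: int
    fwd_ref_depth: int
    rev_ref_depth: int
    fwd_alt_depth: int
    rev_alt_depth: int
    is_indel: bool
    cons_variant: bool
    hpoly_length: int


def max_primer_length(primers: Set[str]) -> int:
    """'Max Primer Length' is derived from the primer set."""
    return max(len(p) for p in primers)


def near_read_end(qpos: int, read_len: int, max_len: int) -> bool:
    """True if query index qpos lies within max_len bases of either read end."""
    return qpos < max_len or qpos >= read_len - max_len


def extract_reads(reads: Iterable[Tuple[str, int]], alt_base: str,
                  max_len: int) -> List[Tuple[str, int]]:
    """Keep (sequence, qpos) pairs whose base at qpos is the alt and lies near an end."""
    return [(seq, q) for seq, q in reads
            if seq[q] == alt_base and near_read_end(q, len(seq), max_len)]


def primer_contributes(seq: str, qpos: int, primers: Set[str]) -> Optional[str]:
    """Return a primer the read starts or ends with that also covers qpos, else None.

    Exact matching only: mismatches, indels and strand orientation are not handled.
    """
    for p in sorted(primers):  # sorted for a deterministic result
        if seq.startswith(p) and qpos < len(p):
            return p
        if seq.endswith(p) and qpos >= len(seq) - len(p):
            return p
    return None
```

For example, with primers `{"ACGTAC", "TTGCA"}` and alt base `G`, the read `ACGTACGGATTT` with the alt at query index 2 passes `extract_reads`, and `primer_contributes` returns `"ACGTAC"` because that primer covers index 2.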

@averagehat

After discussion with @pirekupcode, we need a better data model.
We don't interface with the BAM file through the BamRecord described above but through pysam, which we use so that we don't have to parse and apply the CIGAR string or flags ourselves. Here's another alternative:

https://github.com/simon-anders/htseq/blob/89f12460eb5bcf7b13478d6e056acd011508106b/python2/src/HTSeq/_HTSeq.pyx#L217

https://samtools.github.io/hts-specs/SAMv1.pdf

https://pysam.readthedocs.io/en/latest/glossary.html#term-soft-clipped

https://pysam.readthedocs.io/en/latest/api.html#pysam.AlignedSegment
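The pysam route could look roughly like the sketch below: pysam maps the 1-based LoFreq position to a position within each read, so the CIGAR string and flags never have to be parsed by hand. It assumes an indexed BAM opened with `pysam.AlignmentFile`; `fetch`, `get_aligned_pairs`, `is_unmapped`, `is_secondary`, `is_supplementary`, `query_alignment_start`, `query_alignment_end` and `query_length` are pysam members, while `query_pos_for_ref`, `reads_at_alt` and `soft_clip_lengths` are hypothetical helpers. The code only touches those attributes, so it needs no import and can be exercised with stand-in objects.

```python
from typing import Iterator, Optional, Tuple


def query_pos_for_ref(read, ref_pos0: int) -> Optional[int]:
    """Query index aligned to a 0-based reference position, or None.

    get_aligned_pairs(matches_only=True) yields (query_pos, ref_pos) pairs for
    aligned bases only, so pysam handles the CIGAR (insertions, deletions,
    soft clips) for us.
    """
    for qpos, rpos in read.get_aligned_pairs(matches_only=True):
        if rpos == ref_pos0:
            return qpos
    return None


def reads_at_alt(bam, contig: str, alt_pos1: int) -> Iterator[Tuple[object, int]]:
    """Yield (read, query_pos) for primary alignments covering a 1-based alt position."""
    ref_pos0 = alt_pos1 - 1  # LoFreq/VCF positions are 1-based; pysam is 0-based
    for read in bam.fetch(contig, ref_pos0, ref_pos0 + 1):
        # Skipping secondary/supplementary alignments is a choice of this sketch.
        if read.is_unmapped or read.is_secondary or read.is_supplementary:
            continue
        qpos = query_pos_for_ref(read, ref_pos0)
        if qpos is not None:  # None: the position falls in a deletion or ref skip
            yield read, qpos


def soft_clip_lengths(read) -> Tuple[int, int]:
    """(left, right) soft-clip lengths, read off pysam attributes instead of the CIGAR."""
    left = read.query_alignment_start  # index of the first non-soft-clipped base
    right = read.query_length - read.query_alignment_end  # bases after the aligned part
    return left, right
```

Usage, with a hypothetical indexed `sample.bam` and contig `chr1`: open it with `pysam.AlignmentFile("sample.bam", "rb")`, iterate `reads_at_alt(bam, "chr1", pos)`, and take `read.query_sequence[qpos]` as the base contributing to the alt. Whether the `extract reads` distance check should count soft-clipped bases is not settled in this thread; `soft_clip_lengths` gives what is needed to measure it either way.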
