Replies: 6 comments
-
There is a step missing in the provided solution: symmetrising the final interaction matrix (saddle).
# transposing 2nd and 3rd dimensions, leaving 1st dimension alone
_sX = _sumX + _sumX.transpose(0,2,1)
_cX = _countX + _countX.transpose(0,2,1)
It is unclear, though, why that symmetrising is there in the first place and when it comes into play. It is potentially useful for a future fully sparse implementation of saddle.
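A minimal sketch of what that symmetrisation does to stacked arrays, assuming the layout S[diagonals, saddle_bins, saddle_bins] described in the original post. The array names, sizes and random contents are made up for illustration and are not taken from the prototype.

```python
import numpy as np

# Hypothetical stacked arrays: one (n_bins x n_bins) matrix per diagonal.
rng = np.random.default_rng(0)
n_diags, n_bins = 4, 3
sum_x = rng.random((n_diags, n_bins, n_bins))
count_x = rng.integers(1, 10, size=(n_diags, n_bins, n_bins)).astype(float)


def symmetrise(stack):
    """Add every per-diagonal matrix to its own transpose.

    transpose(0, 2, 1) swaps the two saddle-bin axes and leaves the
    diagonal axis (axis 0) untouched.
    """
    return stack + stack.transpose(0, 2, 1)


sym_sum = symmetrise(sum_x)
sym_count = symmetrise(count_x)

# Mean signal per (diagonal, bin_i, bin_j); counts are >= 2 here, so no 0/0.
saddle = sym_sum / sym_count
```

Sums and counts are symmetrised separately before dividing, so each cell of the result is a weighted mean over both (i, j) and (j, i) pixels rather than an average of two means.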
-
Related PR: #484
-
To me, this seems like a very useful extension of the saddle functionality! A simple back-of-the-envelope calculation:
Are these estimates correct? I can think of two solutions to mitigate these issues:
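For a rough sense of scale, the footprint of a dense per-diagonal stack S[diagonals, saddle_bins, saddle_bins] can be estimated as below. The region length, bin size and number of saddle bins are illustrative assumptions only, not the figures from the comment above, and `stack_bytes` is a hypothetical helper.

```python
def stack_bytes(region_bp, binsize_bp, n_saddle_bins, itemsize=8):
    """Bytes needed for one dense float64 stack covering every diagonal."""
    n_diags = region_bp // binsize_bp  # one layer per diagonal of the region
    return n_diags * n_saddle_bins * n_saddle_bins * itemsize


# Assumed example: a 100 Mb region at 25 kb resolution with 50 saddle bins.
per_array = stack_bytes(100_000_000, 25_000, 50)

# Two stacks are kept (sums and counts), per region.
total_mb = 2 * per_array / 1e6
print(f"{total_mb:.0f} MB per region for sums + counts")
```

The size grows linearly with the number of diagonals and quadratically with the number of saddle bins, so finer bin sizes or more saddle bins inflate it quickly.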
-
100%, yes! This would explode in memory very quickly. And yes, this could become another way of shooting yourself in the foot, but not the first one:
But enough of that. Practical solutions:
-
Such an ultimately flexible saddle_by_dist is also a flashback to existing (in sandbox now)
-
Some examples, just in case. Here is what a saddle by distance looks like for WT-like and deltaRad21-like samples. Biology aside, the short range does look quite different (saddles are computed using each sample's own eigenvectors at 25 kb bin size, intra-arm only). So yes, we don't need all of those individual diagonals per se, but they are nice to have so that they can be aggregated into distance bins after the fact. I personally haven't even checked different chromosomes separately; that was a purely theoretical exercise, but I still think it would be useful.
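Aggregating individual diagonals into distance bins after the fact could look roughly like this. It is a sketch that assumes per-diagonal sum and count stacks shaped [diagonals, saddle_bins, saddle_bins]; `aggregate_by_distance`, the band edges and the toy data are hypothetical.

```python
import numpy as np


def aggregate_by_distance(sum_stack, count_stack, diag_edges):
    """Collapse per-diagonal stacks into one saddle per distance band.

    diag_edges is an increasing list of diagonal indices; band k covers
    diagonals diag_edges[k] .. diag_edges[k + 1] - 1.
    """
    bands = []
    for lo, hi in zip(diag_edges[:-1], diag_edges[1:]):
        s = sum_stack[lo:hi].sum(axis=0)
        c = count_stack[lo:hi].sum(axis=0)
        # Cells with no contributing pixels become NaN instead of warning.
        with np.errstate(divide="ignore", invalid="ignore"):
            bands.append(s / c)
    return np.stack(bands)


# Toy data: 400 diagonals, 5 saddle bins, every cell has sum 2 and count 1.
n_diags, n_bins = 400, 5
sums = np.full((n_diags, n_bins, n_bins), 2.0)
counts = np.ones((n_diags, n_bins, n_bins))

# At 25 kb bins, diagonal 40 is 1 Mb and diagonal 200 is 5 Mb.
edges = [0, 40, 200, 400]
saddles = aggregate_by_distance(sums, counts, edges)
```

Because sums and counts are added up before dividing, each band is weighted by the number of pixels per diagonal, which is usually what one wants when the longer diagonals are sparser.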
-
Intro: saddle plots (discrete or continuous) by distance are useful and informative.
"By distance" means taking into account only interactions within a certain distance range, e.g. 1MB<D<5MB, when calculating "the saddle". Currently, the "by distance" part is done by filling anything that does not belong to the selected distance band, i.e. outside of min_diag and max_diag, with NaNs:
Instead, one can generate saddles for every diagonal without a significant performance penalty:
So, instead of the existing _accumulate, one could use something like that:
This way one would accumulate saddles (sums and counts first) into 3D-stacked arrays, à la pileups/snipping, S[diagonals, saddle_bins, saddle_bins], such that it would be easy to do by-distance saddles like so:
Full prototype solution below:
https://gist.github.com/sergpolly/ee39a452c1e30f12d5100b28f35f4ee0
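For readers who do not open the gist, the idea can be sketched as follows. This is not the prototype's code: `accumulate_by_diagonal`, its arguments and the toy inputs are hypothetical, and a dense NumPy matrix is assumed in place of whatever the real implementation reads.

```python
import numpy as np


def accumulate_by_diagonal(matrix, digitized, n_bins):
    """Accumulate sums and counts into [diagonal, bin_i, bin_j] stacks.

    matrix    : square array of observed/expected values, NaN = missing
    digitized : integer saddle-bin label per genomic bin, -1 = excluded
    """
    n = matrix.shape[0]
    sums = np.zeros((n, n_bins, n_bins))
    counts = np.zeros((n, n_bins, n_bins))
    for d in range(n):
        vals = np.diagonal(matrix, offset=d)  # pixels (i, i + d)
        bi = digitized[: n - d]               # saddle bin of row i
        bj = digitized[d:]                    # saddle bin of column i + d
        ok = np.isfinite(vals) & (bi >= 0) & (bj >= 0)
        # np.add.at handles repeated (bi, bj) pairs correctly.
        np.add.at(sums[d], (bi[ok], bj[ok]), vals[ok])
        np.add.at(counts[d], (bi[ok], bj[ok]), 1)
    return sums, counts


# Toy input: a 6 x 6 matrix of ones and two saddle bins.
matrix = np.ones((6, 6))
digitized = np.array([0, 0, 1, 1, 0, 1])
sums, counts = accumulate_by_diagonal(matrix, digitized, n_bins=2)

# A by-distance saddle is then a slice over the diagonal axis.
min_diag, max_diag = 1, 4
band_sum = sums[min_diag:max_diag].sum(axis=0)
band_count = counts[min_diag:max_diag].sum(axis=0)
band_sum = band_sum + band_sum.T        # symmetrise, as noted in the replies
band_count = band_count + band_count.T
with np.errstate(divide="ignore", invalid="ignore"):
    saddle = band_sum / band_count
```

Once the stacks exist, any distance band is a cheap slice-and-sum, so no matrix has to be re-masked with NaNs for each choice of min_diag and max_diag.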