ultimate saddle by dist, remove deprecated [WIP] #484
Ultimate saddle by distance: instead of specifying `min_diag` and `max_diag`, one would receive 3D arrays of saddle data (3D arrays of saddle sums and counts), where the 1st dimension is `index_of_diag`, the index of the distance diagonal it corresponds to, same as in #469. The effect of `min_diag`, `max_diag` can then be achieved by slicing the stack along that first dimension and summing the slice.

Potential use case: by-distance saddles seem to be useful overall, since short-range and longer-range interactions can tell slightly different stories. `min_diag/max_diag` is OK for the purpose, but often the choice of distance ranges isn't obvious at the time of running the function, so one has to run `saddle` multiple times, which can get annoying. With this new functionality, one would instead receive a 3D stack of saddles and slice it however they desire in an agony of exploratory data analysis!

Also remove the plotting part from saddles; it hasn't been maintained anyway (#313).
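To make the slicing idea concrete, here is a minimal sketch of how a `min_diag`/`max_diag` saddle could be recovered from a by-distance stack. It is not the code in this PR: `collapse_by_distance` is a hypothetical helper, the `(n_diags, n_bins, n_bins)` layout follows the description above, the random arrays are placeholders for real sums and counts, and treating both bounds as inclusive is an assumption.

```python
import numpy as np

def collapse_by_distance(S, C, min_diag=0, max_diag=None):
    """Hypothetical helper: sum a (n_diags, n_bins, n_bins) stack of saddle
    sums S and counts C over an inclusive range of diagonals."""
    stop = S.shape[0] if max_diag is None else max_diag + 1
    return S[min_diag:stop].sum(axis=0), C[min_diag:stop].sum(axis=0)

# Placeholder data standing in for the by-distance output of saddle.
rng = np.random.default_rng(0)
n_diags, n_bins = 6, 4
C = rng.integers(1, 5, size=(n_diags, n_bins, n_bins)).astype(float)
S = C * rng.random((n_diags, n_bins, n_bins))

# Short-range vs longer-range saddles from the same stack, no re-run needed.
S_short, C_short = collapse_by_distance(S, C, min_diag=0, max_diag=2)
S_long, C_long = collapse_by_distance(S, C, min_diag=3)
saddle_short = S_short / C_short
saddle_long = S_long / C_long
```

Because sums and counts are kept separately, any distance range can be averaged correctly after the fact; dividing before slicing would weight diagonals incorrectly.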
Update: in the same vein of generalization, one can imagine splitting saddles even further, by regions. The public API would gain two aggregation flags; default behavior would not change (maybe retire `min_diag/max_diag`, but that's it), and the flags would work as follows:

- `aggregate_by_region = False` -> would make saddle return a dictionary of 2D (or 3D, depending on `aggregate_by_distance`) `ndarrays`, with `(region1, region2)` keys
- `aggregate_by_distance = False` -> would make saddle return 3D `ndarrays` of sums and counts, where the 1st index corresponds to the `distance` (aka diagonal) in case of cis data, and potentially just a fake 1-dimension for trans

Potential use cases: similarly to the by-distance case, one might want to spot-check some chromosomes individually, typical advanced exploratory data analysis. One could also separate out inter-arm interactions (still not supported by saddles, which is a shame); that can be closer to mainstream once inter-arm is supported.
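A rough sketch of how the two flags could shape the return value, assuming per-region stacks are already available. `finalize_saddles` is a hypothetical name, both flags defaulting to `True` is inferred from "default behavior would not change", and all regions are assumed to share the same `(n_diags, n_bins, n_bins)` shape; the actual signature may differ.

```python
import numpy as np

def finalize_saddles(per_region, aggregate_by_region=True, aggregate_by_distance=True):
    """Hypothetical post-processing step.

    per_region maps (region1, region2) -> (S, C), where S and C are sums and
    counts of shape (n_diags, n_bins, n_bins).
    """
    out = per_region
    if aggregate_by_distance:
        # Collapse the distance axis: 3D stacks become 2D saddles.
        out = {key: (S.sum(axis=0), C.sum(axis=0)) for key, (S, C) in out.items()}
    if aggregate_by_region:
        # Collapse regions: the dictionary becomes a single (S, C) pair.
        S_total = sum(S for S, _ in out.values())
        C_total = sum(C for _, C in out.values())
        return S_total, C_total
    return out

# Two toy regions with 3 diagonals and 2 digitized bins each.
per_region = {
    ("chr1", "chr1"): (np.ones((3, 2, 2)), np.ones((3, 2, 2))),
    ("chr2", "chr2"): (np.ones((3, 2, 2)), np.ones((3, 2, 2))),
}

S_default, C_default = finalize_saddles(per_region)
S_by_dist, C_by_dist = finalize_saddles(per_region, aggregate_by_distance=False)
by_region = finalize_saddles(per_region, aggregate_by_region=False)
```

With both flags left at their defaults the result is the familiar pair of 2D arrays; switching either flag off only skips the corresponding reduction.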
This wouldn't be hard to implement in the current dense-matrix-based framework: simply reuse the existing `_accumulate`-type functions that keep modifying/accumulating into the `S`, `C` `ndarrays`, and make them return `S`, `C` per region instead, then finish the aggregation outside (if requested).

A potential sparse-saddle implementation should be straightforward as well: just one more `groupby` ...
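For the dense case, the per-region idea could look roughly like the toy below. This is only a sketch under stated assumptions, not the existing `_accumulate` code: `accumulate_region` is a hypothetical stand-in that takes a dense observed/expected matrix and a digitized track for one region, uses only the upper triangle, and pads every region to a common `n_diags` so that stacks can be summed afterwards.

```python
import numpy as np

def accumulate_region(oe, digitized, n_bins, n_diags):
    """Toy stand-in for an `_accumulate`-type function that returns fresh
    per-region (S, C) stacks of shape (n_diags, n_bins, n_bins) instead of
    modifying shared arrays in place."""
    S = np.zeros((n_diags, n_bins, n_bins))
    C = np.zeros((n_diags, n_bins, n_bins))
    i, j = np.triu_indices(oe.shape[0])      # upper triangle, diagonal included
    vals = oe[i, j]
    dist = j - i                             # diagonal index of each pixel
    keep = np.isfinite(vals) & (dist < n_diags)
    idx = (dist[keep], digitized[i[keep]], digitized[j[keep]])
    np.add.at(S, idx, vals[keep])            # unbuffered add handles repeated indices
    np.add.at(C, idx, 1)
    return S, C

# Hypothetical inputs: (obs/exp matrix, digitized track) per region.
regions = {
    ("chr1", "chr1"): (np.ones((3, 3)), np.array([0, 1, 0])),
    ("chr2", "chr2"): (np.ones((2, 2)), np.array([1, 1])),
}
per_region = {
    key: accumulate_region(oe, dig, n_bins=2, n_diags=3)
    for key, (oe, dig) in regions.items()
}

# Aggregation finished outside, only if requested.
S_total = sum(S for S, _ in per_region.values())
C_total = sum(C for _, C in per_region.values())
```

Returning fresh arrays per region keeps the accumulation step free of shared state, so the caller decides whether to keep the dictionary or reduce it.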