ultimate saddle by dist, remove deprecated [WIP] #484

Open: wants to merge 1 commit into base: master

@sergpolly (Member) commented Jan 19, 2024

Ultimate saddle by distance: instead of specifying min_diag/max_diag, one would receive 3D arrays of saddle data (saddle sums and counts), where the 1st dimension is index_of_diag, i.e. the index of the distance diagonal each slice corresponds to, same as in #469.
The min_diag/max_diag behavior can then be recovered as:

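```python
import numpy as np

# pool sums and counts over the chosen diagonals first, then divide
saddle = np.divide(
    np.nansum(saddle_sum_stack[min_diag:max_diag], axis=0),
    np.nansum(saddle_count_stack[min_diag:max_diag], axis=0),
)
```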
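
To make the proposed 3D output concrete, here is a minimal NumPy sketch of how per-diagonal stacks could be built from a dense cis matrix. It assumes a symmetric observed/expected matrix and a track already digitized into integer categories (with -1 marking excluded bins); the helper name saddle_by_diag is hypothetical and is not the cooltools implementation.

```python
import numpy as np


def saddle_by_diag(obs_exp, digitized, n_bins):
    """Hypothetical helper: return (sum_stack, count_stack), each shaped
    (n_diags, n_bins, n_bins), where slice d only uses pixels on diagonal d."""
    n = obs_exp.shape[0]
    sum_stack = np.zeros((n, n_bins, n_bins))
    count_stack = np.zeros((n, n_bins, n_bins))
    for d in range(n):
        i = np.arange(n - d)
        j = i + d
        vals = obs_exp[i, j]
        a, b = digitized[i], digitized[j]
        # keep finite pixels whose two anchors both fall into a valid category
        ok = np.isfinite(vals) & (a >= 0) & (b >= 0)
        vals, a, b = vals[ok], a[ok], b[ok]
        # np.add.at is unbuffered, so repeated (a, b) pairs all accumulate
        np.add.at(sum_stack[d], (a, b), vals)
        np.add.at(count_stack[d], (a, b), 1)
        if d > 0:
            # mirror off-diagonal pixels so every slice is a symmetric saddle
            np.add.at(sum_stack[d], (b, a), vals)
            np.add.at(count_stack[d], (b, a), 1)
    return sum_stack, count_stack


# toy data: random symmetric "obs/exp" matrix and a 4-category track
rng = np.random.default_rng(0)
n, n_bins = 30, 4
m = rng.random((n, n))
obs_exp = (m + m.T) / 2
digitized = rng.integers(0, n_bins, size=n)

saddle_sum_stack, saddle_count_stack = saddle_by_diag(obs_exp, digitized, n_bins)
min_diag, max_diag = 2, 10
saddle = np.divide(
    np.nansum(saddle_sum_stack[min_diag:max_diag], axis=0),
    np.nansum(saddle_count_stack[min_diag:max_diag], axis=0),
)
```

Pooling sums and counts before dividing, rather than averaging per-diagonal saddles, weights every pixel equally, which matches the sum-then-divide recipe above.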

Potential use case: by-distance saddles seem useful overall, since short-range and longer-range interactions can tell slightly different stories. min_diag/max_diag is OK for this purpose, but the right distance ranges often aren't obvious at the time of running the function, so one has to run saddle multiple times, which can get annoying. With this new functionality, one would instead receive a 3D stack of saddles once and slice it however they desire, in the agony of exploratory data analysis!
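
The exploratory workflow described above could then look like the following sketch: one stack, several distance bands, no re-runs. The helper saddles_by_band and the band edges (expressed in diagonals, not base pairs) are illustrative assumptions, and the random stacks only stand in for real saddle output.

```python
import numpy as np


def saddles_by_band(sum_stack, count_stack, diag_edges):
    """Hypothetical helper: one saddle per distance band, shaped
    (n_bands, n_bins, n_bins); band k pools diagonals
    diag_edges[k] <= d < diag_edges[k + 1]."""
    bands = list(zip(diag_edges[:-1], diag_edges[1:]))
    # pool sums and counts per band first, divide last
    sums = np.stack([np.nansum(sum_stack[lo:hi], axis=0) for lo, hi in bands])
    counts = np.stack([np.nansum(count_stack[lo:hi], axis=0) for lo, hi in bands])
    # categories with no pixels in a band become NaN instead of warning
    with np.errstate(divide="ignore", invalid="ignore"):
        return sums / counts


# stand-in stacks: 100 diagonals, 5 track categories
rng = np.random.default_rng(1)
sum_stack = rng.random((100, 5, 5))
count_stack = rng.integers(1, 10, size=(100, 5, 5)).astype(float)

# e.g. short-, intermediate- and long-range bands, roughly log-spaced
per_band = saddles_by_band(sum_stack, count_stack, [1, 4, 16, 64])
```

Trying a different set of bands is then a cheap re-slice of the same stack rather than another saddle call.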

Also remove the plotting part from saddles; it hasn't been maintained anyway (#313).

Update: in the same vein of generalization, one can imagine splitting saddles even further, by regions. The public API would look as follows:

```python
saddle(
    # ... existing parameters with no or little change
    aggregate_by_region=True,
    aggregate_by_distance=True,
)
```

So the default behavior would not change (except perhaps retiring min_diag/max_diag), and the aggregate flags would work as follows:

  • aggregate_by_region=False would make saddle return a dictionary of 2D (or 3D, depending on aggregate_by_distance) ndarrays keyed by (region1, region2)
  • aggregate_by_distance=False would make saddle return 3D ndarrays of sums and counts, where the 1st index corresponds to the distance (aka diagonal) for cis data, and potentially just a dummy singleton dimension for trans
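
One way the two flags could shape the return value is sketched below, assuming the per-region, per-diagonal sums and counts have already been accumulated into a dict keyed by (region1, region2). The function name finalize_saddle and the zero-padding of cis stacks with different numbers of diagonals are assumptions for illustration; the sketch covers cis input only, since pooling a trans singleton slice into diagonal 0 would be meaningless.

```python
import numpy as np


def finalize_saddle(per_region, aggregate_by_region=True, aggregate_by_distance=True):
    """Hypothetical post-processing for cis data.

    per_region maps (region1, region2) -> (S, C), each shaped
    (n_diag, n_bins, n_bins). Returns (S, C), or a dict of them when
    aggregate_by_region=False.
    """
    if aggregate_by_distance:
        # pooling all diagonals gives the classic 2D saddle per region pair
        per_region = {k: (S.sum(axis=0), C.sum(axis=0)) for k, (S, C) in per_region.items()}
    if not aggregate_by_region:
        return per_region
    if not aggregate_by_distance:
        # regions differ in length: zero-pad every stack to the longest one
        depth = max(S.shape[0] for S, _ in per_region.values())
        per_region = {
            k: tuple(np.pad(A, [(0, depth - A.shape[0]), (0, 0), (0, 0)]) for A in (S, C))
            for k, (S, C) in per_region.items()
        }
    S_total = sum(S for S, _ in per_region.values())
    C_total = sum(C for _, C in per_region.values())
    return S_total, C_total


rng = np.random.default_rng(2)
per_region = {
    ("chr1", "chr1"): (rng.random((6, 3, 3)), np.ones((6, 3, 3))),
    ("chr2", "chr2"): (rng.random((4, 3, 3)), np.ones((4, 3, 3))),
}
S2d, C2d = finalize_saddle(per_region)  # default: both aggregations on
S3d, C3d = finalize_saddle(per_region, aggregate_by_distance=False)
by_region = finalize_saddle(per_region, aggregate_by_region=False)
```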

Potential use cases, similar to the by-distance case: one might want to spot-check some chromosomes individually (typical advanced exploratory data analysis), or separate out inter-arm interactions (still not supported by saddles, which is a shame), which could become more mainstream once inter-arm support exists.

This wouldn't be hard to implement in the current dense-matrix-based framework: reuse the existing _accumulate-type functions, which keep accumulating into the S, C ndarrays, but make them return S, C per region instead, then finish the aggregation outside (if requested).
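
A sketch of that refactoring for the dense case: instead of mutating shared S, C arrays, a per-block accumulator returns its own S, C, and aggregation happens afterwards only if requested. The name accumulate_region and the (start, end) bin spans used for regions are hypothetical; this is not a drop-in for the existing _accumulate functions, and the by-distance axis is omitted for brevity.

```python
import numpy as np


def accumulate_region(obs_exp, digitized, span1, span2, n_bins):
    """Hypothetical accumulator: return (S, C) for one (region1, region2)
    block of a dense genome-wide matrix instead of mutating shared arrays.

    span1/span2 are (start, end) bin ranges; digitized uses -1 for excluded bins.
    """
    block = obs_exp[span1[0]:span1[1], span2[0]:span2[1]]
    # category of each pixel's row bin and column bin, broadcast to the block
    a, b = np.broadcast_arrays(
        digitized[span1[0]:span1[1]][:, None],
        digitized[span2[0]:span2[1]][None, :],
    )
    ok = np.isfinite(block) & (a >= 0) & (b >= 0)
    S = np.zeros((n_bins, n_bins))
    C = np.zeros((n_bins, n_bins))
    np.add.at(S, (a[ok], b[ok]), block[ok])
    np.add.at(C, (a[ok], b[ok]), 1)
    return S, C


rng = np.random.default_rng(3)
n, n_bins = 40, 4
m = rng.random((n, n))
obs_exp = (m + m.T) / 2
digitized = rng.integers(-1, n_bins, size=n)  # -1 marks excluded bins
regions = {"chrA": (0, 25), "chrB": (25, 40)}

# per-region results are kept separately...
per_region = {
    (r1, r2): accumulate_region(obs_exp, digitized, regions[r1], regions[r2], n_bins)
    for r1 in regions
    for r2 in regions
}
# ...and aggregation happens outside, only when requested
S_total = sum(S for S, _ in per_region.values())
C_total = sum(C for _, C in per_region.values())
```

Because the regions here tile the whole matrix, summing all region pairs reproduces the genome-wide saddle, so the default aggregated behavior is preserved.
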
A potential sparse saddle implementation should be straightforward as well: just one more groupby.
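
For the sparse route, a pandas sketch over a pixel table shows where the extra grouping key would go. It assumes a hypothetical upper-triangle pixel table with bin1_id, bin2_id and an observed/expected column oe, plus a bins table whose positional index matches the bin ids and which carries region and digitized-category columns; symmetrization and trans-specific handling of diag are left out.

```python
import pandas as pd


def sparse_saddle(pixels, bins, value_col="oe"):
    """Hypothetical sparse saddle: sums and counts of value_col grouped by
    region pair, diagonal and track-category pair.

    pixels: upper-triangle table with bin1_id, bin2_id and value_col.
    bins:   table whose positional index matches the bin ids, with 'region'
            and 'cat' columns ('cat' = digitized track, -1 for excluded bins).
    """
    i = pixels["bin1_id"].to_numpy()
    j = pixels["bin2_id"].to_numpy()
    region = bins["region"].to_numpy()
    cat = bins["cat"].to_numpy()
    df = pixels.assign(
        region1=region[i], region2=region[j],
        cat1=cat[i], cat2=cat[j],
        diag=j - i,  # only meaningful for cis pairs
    )
    df = df[(df["cat1"] >= 0) & (df["cat2"] >= 0)]
    # the region pair is the "one more" grouping key on top of the by-distance one
    keys = ["region1", "region2", "diag", "cat1", "cat2"]
    return df.groupby(keys)[value_col].agg(["sum", "count"])


bins = pd.DataFrame({"region": ["chr1"] * 3 + ["chr2"] * 2, "cat": [0, 1, -1, 1, 0]})
pixels = pd.DataFrame({
    "bin1_id": [0, 0, 1, 3, 0],
    "bin2_id": [1, 2, 1, 4, 3],
    "oe": [2.0, 5.0, 1.0, 3.0, 0.5],
})
result = sparse_saddle(pixels, bins)
```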
