Support grouping over splits in rgdr #58

Open
BSchilperoort opened this issue Jul 27, 2022 · 1 comment
Labels
RDGR Issues relating to the RGDR module

Comments

@BSchilperoort
Contributor

BSchilperoort commented Jul 27, 2022

Because of computational limits (applying DBSCAN to every individual train/test split might not be viable), we want to allow users to group splits in RGDR before calculating the DBSCAN clusters.

To do this we need to go through the following steps:

  1. Calculate the correlation coefficient and p-value for every split (see "Implemented dbscan for RGDR", #57)
  2. Determine the p-value mask for every individual split (training data only)
  3. Reduce this mask over the split dimension with np.any
  4. Apply DBSCAN to the reduced mask
  5. Recombine the DBSCAN clusters with each split's mask (e.g. for each split's cluster labels: cluster_labels[~split_mask] = 0.0)

This way we end up with clusters for each split, with cluster labels that are aligned across the splits.
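
Steps 2 to 5 above could be sketched roughly as follows. This is only an illustration, not the RGDR implementation: the function names (`dbscan_labels`, `group_clusters_over_splits`), the `(n_splits, n_lat, n_lon)` array layout, the `alpha` threshold, and clustering on plain grid indices with a Euclidean metric are all assumptions made for the example. The p-values from step 1 are assumed to be computed already.

```python
import numpy as np


def dbscan_labels(mask, eps=1.5, min_samples=2):
    """Cluster the True cells of a 2D mask with DBSCAN; label 0 means 'no cluster'.

    Hypothetical helper: it clusters on grid indices, which is a simplification.
    """
    # Imported here so the rest of the sketch only needs NumPy.
    from sklearn.cluster import DBSCAN

    labels = np.zeros(mask.shape, dtype=int)
    coords = np.argwhere(mask)  # (n_points, 2) grid indices of significant cells
    if len(coords) > 0:
        # DBSCAN marks noise as -1 and clusters as 0, 1, ...; shifting by one
        # turns noise into 0 and lets clusters start at 1.
        found = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(coords)
        labels[mask] = found + 1
    return labels


def group_clusters_over_splits(p_values, alpha=0.05, cluster_fn=dbscan_labels):
    """Return cluster labels per split, shape (n_splits, n_lat, n_lon).

    p_values is assumed to hold the training-data p-values of every split.
    """
    # Step 2: p-value mask for every individual split.
    split_masks = p_values < alpha
    # Step 3: reduce the mask over the split dimension.
    reduced_mask = np.any(split_masks, axis=0)
    # Step 4: cluster only once, on the reduced mask.
    labels = cluster_fn(reduced_mask)
    # Step 5: give every split a copy of the labels and blank out the cells
    # that are not significant in that particular split.
    cluster_labels = np.broadcast_to(labels, split_masks.shape).copy()
    cluster_labels[~split_masks] = 0
    return cluster_labels
```

Because the clustering runs once on the reduced mask, a given label number refers to the same cluster in every split, and the cost of DBSCAN no longer scales with the number of splits.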

@BSchilperoort BSchilperoort changed the title Support grouping over folds in rgdr Support grouping over splits in rgdr Aug 2, 2022
@geek-yang
Member

Based on the discussion in issue #71, we will only provide an iterator that lets the user walk through all the splits. This gives them the flexibility to perform RGDR (or even a complete ML workflow) on each split. We can discuss further whether we need a function for "grouping over splits", but at the very least we can provide a notebook that shows this as a use case.

@BSchilperoort BSchilperoort added the RDGR Issues relating to the RGDR module label Aug 12, 2022