Split a dataset into k folds so that each fold has a balanced label distribution (stratified) and each group falls entirely within a single fold (non-overlapping groups).
The StratifiedGroupKFold class is compatible with sklearn.model_selection.KFold and is used through the same split() method.
Reference: Stratified Group k-Fold Cross-Validation (Kaggle)
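For intuition, one way to build such a split is a greedy heuristic in the spirit of the Kaggle reference. This is a plain-Python sketch and an assumption for illustration, not this package's implementation. The function name stratified_group_kfold and its parameters are hypothetical. Each group is assigned whole to one fold, so groups never overlap. Groups with the most skewed label counts are placed first, and each one goes into the fold that keeps per-label fractions across folds most even.

```python
import random
from collections import Counter, defaultdict
from statistics import pstdev


def stratified_group_kfold(y, groups, n_splits=5, shuffle=False, seed=None):
    """Hypothetical greedy sketch (not this package's code).

    Yields (train_index, test_index) lists. Every group lands in exactly
    one test fold, and groups are placed to keep label fractions even.
    """
    labels = sorted(set(y))
    label_total = Counter(y)

    # Label counts per group.
    group_counts = defaultdict(Counter)
    for label, g in zip(y, groups):
        group_counts[g][label] += 1

    group_ids = list(group_counts)
    if shuffle:
        random.Random(seed).shuffle(group_ids)
    # Most skewed groups first; the stable sort keeps shuffled order for ties.
    group_ids.sort(key=lambda g: pstdev([group_counts[g][l] for l in labels]),
                   reverse=True)

    fold_counts = [Counter() for _ in range(n_splits)]
    fold_of_group = {}

    def imbalance_if_added(g, k):
        # Mean over labels of the spread of per-fold label fractions,
        # assuming group g were added to fold k.
        spreads = []
        for l in labels:
            fracs = [
                (fold_counts[i][l] + (group_counts[g][l] if i == k else 0))
                / label_total[l]
                for i in range(n_splits)
            ]
            spreads.append(pstdev(fracs))
        return sum(spreads) / len(spreads)

    for g in group_ids:
        best = min(range(n_splits), key=lambda k: imbalance_if_added(g, k))
        fold_counts[best].update(group_counts[g])
        fold_of_group[g] = best

    for k in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of_group[g] == k]
        train = [i for i, g in enumerate(groups) if fold_of_group[g] != k]
        yield train, test
```

Sorting by label skew first mirrors the usual greedy bin-packing idea: hard-to-place groups are assigned while the folds are still flexible.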
pip install git+https://github.com/yk-szk/stratified_group_kfold
from stratified_group_kfold import StratifiedGroupKFold
X, y, groups = load_dataset()  # placeholder: features, labels, and group ids
sgkf = StratifiedGroupKFold(n_splits=5, shuffle=True)
for train_index, test_index in sgkf.split(X, y, groups):
    do_stuff(train_index, test_index)  # placeholder: train/evaluate on this fold
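To sanity-check the folds, you can use a small helper like the one below. It confirms that no group appears in both the training and test indices of a fold, and it reports the label fractions of each test fold. The name summarize_folds is hypothetical and is not part of the package. It accepts any iterable of (train_index, test_index) pairs, such as the output of sgkf.split(X, y, groups).

```python
from collections import Counter


def summarize_folds(splits, y, groups):
    """Hypothetical helper: check group disjointness per fold and
    return one {label: fraction} dict per test fold."""
    summary = []
    for train_index, test_index in splits:
        train_groups = {groups[i] for i in train_index}
        test_groups = {groups[i] for i in test_index}
        overlap = train_groups & test_groups
        if overlap:
            raise ValueError(f"groups in both train and test: {sorted(overlap)}")
        # Label fractions within this test fold.
        counts = Counter(y[i] for i in test_index)
        n = len(test_index)
        summary.append({label: counts[label] / n for label in sorted(counts)})
    return summary
```

For example, summarize_folds(sgkf.split(X, y, groups), y, groups) should raise no error and return fractions close to the overall label distribution for every fold.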