
the validity of employing ml model for the covariate shift calculation in the example [Arangopipe_Feature_ext2_output.ipynb] #167

Open
tomaszek0 opened this issue Dec 22, 2021 · 3 comments

Comments

tomaszek0 commented Dec 22, 2021

Hi,
The approach in the example is a simple way to demonstrate the covariate shift. Thank you for an informative description of the covariate shift detection problem and your work. However, I am a bit confused. What is the sense to engage a machine learning model to solve a problem that is solved at a start by "human learning", ie. predetermined dividing of data on two groups according to generated histogram? This histogram showed us already the covariate shift in the dataset. For me, dividing the dataset into a reference group (a group with lat values less than -119) and a group representing the whole dataset makes more sense. I know that there is demonstrated a simple example, but the addition of a dataset example with a hided covariate shift would be helpful (the breast cancer dataset is a classic and very easy binary classification dataset from sklearn.datasets).

@rajivsam (Contributor)

Hi @tomaszek0 , thanks for the question. The idea is this: in real-world models, dataset shift and covariate shift occur because business conditions change, for example customer tastes shift, market forces move, and so on. In this example we simulate two such datasets drawn from different conditions; in the real world this would happen organically. Does that help? Your request for a real-world example is noted.
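To make the simulation idea concrete, here is a minimal sketch (synthetic data, not the notebook's dataset) of a classifier two-sample test: label each row with the dataset it came from and check whether a model can tell the two apart. An AUC near 0.5 means no detectable shift; well above 0.5 means shift:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Simulate "before" and "after" datasets whose covariate
# distributions differ; only the first covariate is shifted.
rng = np.random.default_rng(42)
before = rng.normal(0.0, 1.0, size=(500, 3))
after = rng.normal(0.0, 1.0, size=(500, 3))
after[:, 0] += 1.5

# Classifier two-sample test: can a model identify each row's source?
X = np.vstack([before, after])
y = np.concatenate([np.zeros(500), np.ones(500)])
auc = cross_val_score(LogisticRegression(), X, y,
                      cv=5, scoring="roc_auc").mean()

print(f"cross-validated AUC: {auc:.2f}")  # well above 0.5 => shift detected
```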

@tomaszek0 (Author)

Hi @rajivsam , simpler = better (https://towardsdatascience.com/the-limitations-of-machine-learning-a00e0c3040c6). Du Phan shows a similar approach to covariate shift detection (https://medium.com/data-from-the-trenches/a-primer-on-data-drift-18789ef252a6). I feel that in the case of "Arangopipe-feature..." some additional explanation of the feature drift should be given, by computing per-feature drift values as an equivalent of feature importance: we should be able to identify the feature that discriminates the corrupted (shifted) samples from the reference (stationary) samples (see for example https://docs.seldon.io/projects/alibi-detect/en/latest/examples/cd_spot_the_diff_mnist_wine.html).
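One simple way to get such per-feature drift scores (in the spirit of the linked primer, though this is my own minimal sketch on synthetic data) is a two-sample Kolmogorov-Smirnov test on each covariate separately, so the drifting feature stands out:

```python
import numpy as np
from scipy.stats import ks_2samp

# Synthetic reference and current samples; only feature 0 drifts.
rng = np.random.default_rng(7)
reference = rng.normal(0.0, 1.0, size=(400, 3))
current = rng.normal(0.0, 1.0, size=(400, 3))
current[:, 0] += 1.0

# Two-sample KS test per covariate: the drifting feature should show
# the largest statistic (and the smallest p-value).
results = [ks_2samp(reference[:, j], current[:, j]) for j in range(3)]
for j, r in enumerate(results):
    print(f"feature {j}: KS statistic {r.statistic:.3f}, "
          f"p-value {r.pvalue:.3g}")
```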

rajivsam (Contributor) commented Dec 29, 2021

@tomaszek0 noted. In fact, if you choose to implement drift detection with logistic regression, then you get what you are referring to. The choice of classifier is really a design and application preference. Interest in this feature is noted.
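A sketch of what this looks like in practice (synthetic data, standardized features, and my own illustrative code rather than the notebook's): when the drift detector is a logistic regression, the absolute values of its coefficients double as a per-feature drift importance:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic reference and current samples; only feature 1 drifts.
rng = np.random.default_rng(3)
reference = rng.normal(0.0, 1.0, size=(600, 4))
current = rng.normal(0.0, 1.0, size=(600, 4))
current[:, 1] += 1.2

# Standardize so coefficient magnitudes are comparable across features.
X = StandardScaler().fit_transform(np.vstack([reference, current]))
y = np.concatenate([np.zeros(600), np.ones(600)])

clf = LogisticRegression().fit(X, y)
importance = np.abs(clf.coef_[0])
print("per-feature drift importance:", np.round(importance, 2))
```

The drifting feature dominates the coefficient vector, which is the "drift as feature importance" view requested above.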
