Hi,
The approach in the example is a simple way to demonstrate covariate shift. Thank you for the informative description of the covariate shift detection problem and for your work. However, I am a bit confused. What is the point of using a machine learning model to solve a problem that was already solved at the start by "human learning", i.e. by splitting the data into two groups in advance based on the generated histogram? That histogram already showed us the covariate shift in the dataset. To me, dividing the dataset into a reference group (a group with lat values less than -119) and a group representing the whole dataset makes more sense. I understand that this is meant to be a simple example, but adding an example on a dataset with a hidden covariate shift would be helpful (the breast cancer dataset from sklearn.datasets is a classic and very easy binary classification dataset).
Hi @tomaszek0, thanks for the question. The idea is this: in real-world models, dataset shift and covariate shift occur because business conditions change, for example customer tastes or market forces. In this example we simulate two such datasets drawn from different conditions; in the real world, this would happen organically. Does that help? Your request for a real-world example is noted.
@tomaszek0 noted. In fact, if you choose to implement drift detection with logistic regression, you get what you are referring to. The choice of classifier is really a design and application preference. Interest in this feature is noted.
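For anyone following along, here is a minimal sketch of what classifier-based shift detection with logistic regression could look like on the breast cancer dataset. It is not the library's implementation, only an illustration built on scikit-learn. The helper name `domain_classifier_auc`, the random 50/50 split, and the 10% rescaling of `mean perimeter` are all assumptions made up for this example. The rescaling is meant to mimic a "hidden" shift: the marginal histogram of that one feature changes only slightly, but its relationship to the other size features breaks, which a classifier can pick up.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def domain_classifier_auc(reference, current, seed=0):
    """Hypothetical helper: train a classifier to tell `reference` rows
    from `current` rows and return its cross-validated ROC AUC.
    An AUC near 0.5 suggests no detectable shift; an AUC well above 0.5
    suggests the two samples differ in their feature distribution."""
    X = np.vstack([reference, current])
    # Label 0 = reference sample, label 1 = current sample.
    y = np.concatenate([np.zeros(len(reference)), np.ones(len(current))])
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    # Out-of-fold probabilities, so the AUC is not inflated by overfitting.
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)


data = load_breast_cancer()
rng = np.random.default_rng(0)
order = rng.permutation(len(data.data))
half = len(order) // 2
reference = data.data[order[:half]]
current = data.data[order[half:]]

# Baseline: two random halves of the same data, so no shift is expected.
auc_no_shift = domain_classifier_auc(reference, current)

# Assumed synthetic shift: rescale a single feature by 10% in the current sample.
col = list(data.feature_names).index("mean perimeter")
shifted = current.copy()
shifted[:, col] *= 1.10
auc_shift = domain_classifier_auc(reference, shifted)

print(f"AUC, random halves:   {auc_no_shift:.3f}")
print(f"AUC, rescaled column: {auc_shift:.3f}")
```

The two AUC values are only meaningful relative to each other: the random-halves run gives a baseline for "no shift", and the rescaled run shows how far above that baseline the classifier gets. Any classifier could be swapped in for the logistic regression here.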