This repository has been archived by the owner on May 31, 2023. It is now read-only.
Reweighting takes a dataset D and assigns a weight to each observation using conditional probabilities based on target labels and protected class membership.
Notation: s1 - disadvantaged group, s0 - advantaged group, + - positive label, - - negative label.
Large weights (> 1) are assigned to X_s1_y+ and X_s0_y-:
weight for s1 | +: (p(s1) * p(+)) / p(s1 and +)
weight for s0 | -: (p(s0) * p(-)) / p(s0 and -)
Small weights (< 1) are assigned to X_s1_y- and X_s0_y+:
weight for s1 | -: (p(s1) * p(-)) / p(s1 and -)
weight for s0 | +: (p(s0) * p(+)) / p(s0 and +)
The weights are then passed as per-observation weights to any model type that supports weighted observations.
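A minimal sketch of the weight computation, assuming the protected attribute and the labels are available as two plain Python sequences of equal length. The function name `reweighing_weights` and the toy data are illustrative and not part of this repository.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per observation: p(s) * p(y) / p(s and y)."""
    n = len(labels)
    group_counts = Counter(groups)               # counts for p(s)
    label_counts = Counter(labels)               # counts for p(y)
    joint_counts = Counter(zip(groups, labels))  # counts for p(s and y)

    weights = []
    for s, y in zip(groups, labels):
        # Probability expected if group and label were independent.
        expected = (group_counts[s] / n) * (label_counts[y] / n)
        # Probability actually observed in the data.
        observed = joint_counts[(s, y)] / n
        weights.append(expected / observed)
    return weights


# Hypothetical toy data: s1 is rarely labelled "+", s0 often is.
groups = ["s1", "s1", "s1", "s1", "s0", "s0", "s0", "s0", "s0", "s0"]
labels = ["+", "-", "-", "-", "+", "+", "+", "+", "-", "-"]
weights = reweighing_weights(groups, labels)
```

The resulting list can be handed to whatever weighting hook the chosen model exposes, for example a `sample_weight` argument where the estimator provides one. The weights always sum to the number of observations, so the effective dataset size is unchanged.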
NOTE: The above weighting scheme works because, for example, the numerator p(s1) * p(+) is the expected probability of an observation being both disadvantaged and positively labelled if the two variables were independent, while the denominator p(s1 and +) is the actual probability. In a dataset that discriminates against s1, the term (p(s1) * p(+)) / p(s1 and +) therefore evaluates to > 1, since the actual probability of being s1 and + is less than the expected probability under the independence assumption.
Conversely, (p(s1) * p(-)) / p(s1 and -) evaluates to < 1, since the actual probability of being s1 and - is greater than the expected probability under the independence assumption.
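As a worked illustration with made-up numbers (not taken from any real dataset), suppose p(s1) = 0.5, p(+) = 0.4, p(-) = 0.6, p(s1 and +) = 0.1 and p(s1 and -) = 0.4. The two weights for the disadvantaged group are then:

```latex
\begin{align*}
w(s_1, +) &= \frac{p(s_1)\,p(+)}{p(s_1 \text{ and } +)} = \frac{0.5 \times 0.4}{0.1} = 2.0 > 1 \\
w(s_1, -) &= \frac{p(s_1)\,p(-)}{p(s_1 \text{ and } -)} = \frac{0.5 \times 0.6}{0.4} = 0.75 < 1
\end{align*}
```

Positively labelled s1 observations are under-represented relative to independence (0.1 observed versus 0.2 expected), so they are weighted up; negatively labelled s1 observations are over-represented (0.4 versus 0.3), so they are weighted down.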