BiasBusterDPGen (B₂DPG)

Detailed Project Proposal: Key Steps and Sub-Steps

User Data Upload Interface: Provide a platform for users to securely upload tabular or natural-language text data.
Bias Type Specification: Allow users to specify the kind of binary bias they are concerned about in their data.
Initial Data Observation: Use LLMs to observe a subset of the data to understand its structure and content.
Regex Query Generation by LLMs: Leverage LLMs to craft two regex queries aimed at detecting the specified binary biases.
Bias Detection Using Regex: Apply the generated regex queries to the dataset to identify instances of bias.
Bias Visualization: Create pie chart visualizations to represent the proportion of detected biases within the dataset.
Dataset Segmentation: Split the dataset into three distinct subsets:
- Majority class bias
- Minority class bias
- Neutral or undetermined bias
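The detection and segmentation steps above can be sketched as follows. This is a minimal illustration, not the project's code: the two patterns are hypothetical stand-ins for the LLM-generated regex queries, the bias shown is a binary gender bias chosen only as an example, and the rule "more matches wins, ties are neutral" is an assumption.

```python
import re

# Hypothetical patterns; in the proposed pipeline these would be produced by an LLM.
# Which group ends up as the "majority" depends on the data; here it is assumed.
MAJORITY_PATTERN = re.compile(r"\b(he|him|his|man|men)\b", re.IGNORECASE)
MINORITY_PATTERN = re.compile(r"\b(she|her|hers|woman|women)\b", re.IGNORECASE)


def segment(texts):
    """Split texts into majority, minority, and neutral subsets by match counts."""
    subsets = {"majority": [], "minority": [], "neutral": []}
    for text in texts:
        hits_major = len(MAJORITY_PATTERN.findall(text))
        hits_minor = len(MINORITY_PATTERN.findall(text))
        if hits_major > hits_minor:
            subsets["majority"].append(text)
        elif hits_minor > hits_major:
            subsets["minority"].append(text)
        else:
            # No matches, or an equal number for both groups.
            subsets["neutral"].append(text)
    return subsets


sample = [
    "He fixed the engine himself.",
    "She reviewed the contract.",
    "The nurse arrived early.",
    "He said his manager thanked her.",
]
subsets = segment(sample)
# Proportions like these could feed the pie chart from the visualization step.
proportions = {name: len(rows) / len(sample) for name, rows in subsets.items()}
```

For tabular data, the same function could be applied to a single text column, or to each row serialized as a string.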
Sentence Embedding with Sentence-BERT: Embed samples from the majority and minority class bias subsets using Sentence-BERT.
Cosine Similarity Analysis: Compute the cosine similarity between the embeddings of majority-class and minority-class samples to measure how far apart the two subsets are.
Identification of Seeds for Synthetic Counterfactuals: Determine seeds by selecting samples from the majority class that are most dissimilar to the minority class based on cosine similarity.
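The similarity analysis and seed selection above could look roughly like this. The sketch uses small hand-written vectors in place of real Sentence-BERT embeddings so that it runs without extra dependencies. Scoring each majority sample by its highest similarity to any minority sample is an assumption; the proposal does not say how similarities are aggregated. The names `cosine` and `select_seeds` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def select_seeds(majority_emb, minority_emb, k):
    """Return indices of the k majority samples least similar to the minority set."""
    scores = []
    for i, u in enumerate(majority_emb):
        # Assumed aggregation: closest minority neighbour decides the score.
        best = max(cosine(u, v) for v in minority_emb)
        scores.append((best, i))
    scores.sort()  # lowest similarity first
    return [i for _, i in scores[:k]]


# Toy 2-D vectors standing in for sentence embeddings.
majority_emb = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
minority_emb = [[0.0, 1.0], [0.1, 0.9]]
seeds = select_seeds(majority_emb, minority_emb, k=2)
```

With real data, the vectors would come from a Sentence-BERT model, and a vectorized library would be a better fit than nested Python loops for large subsets.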
Synthetic Counterfactual Generation with LLMs: Use the seeds to prompt LLMs for counterfactuals that rewrite majority-class samples as minority-class samples, with the aim of balancing the dataset.
Dataset Augmentation with Synthetic Counterfactuals: Integrate the synthetic counterfactuals into the original dataset to mitigate identified biases.
Differential Privacy Synthetic Data Generator Development:
- Private Dataset Consideration: Treat the uploaded dataset as private, applying differential privacy principles.
- Epsilon Setting: Allow users to set an epsilon value for differential privacy guarantees.
- Random Subset Sampling: Sample a random subset from the private dataset as a basis for synthetic data generation.
- LLM-Powered Synthetic Data Generation:
  - Employ in-context learning or few-shot examples to guide LLMs.
  - Generate new synthetic data by prompting LLMs, adding Gaussian noise during next-token prediction to provide differential privacy.
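One way the noisy next-token step could work is sketched below. It is an illustration under stated assumptions, not the project's implementation: each sampled private example is assumed to yield its own next-token probability vector, the vectors are clipped and summed, and Gaussian noise is calibrated with the classic Gaussian-mechanism bound, which holds for 0 < epsilon < 1. The `delta` parameter, the clipping norm, and all function names are assumptions; the proposal only mentions a user-set epsilon. The sketch covers a single token, and a full generator would also have to account for the privacy cost accumulating over every generated token.

```python
import math
import random


def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale from the classic Gaussian mechanism (valid for 0 < epsilon < 1)."""
    if not 0 < epsilon < 1:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon


def clip_l2(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def noisy_next_token(per_example_probs, epsilon, delta, clip_norm=1.0, rng=None):
    """Pick a token index from clipped, summed, and noised per-example distributions."""
    rng = rng or random.Random()
    clipped = [clip_l2(p, clip_norm) for p in per_example_probs]
    summed = [sum(col) for col in zip(*clipped)]
    # Adding or removing one example changes the sum by at most clip_norm in L2.
    sigma = gaussian_sigma(epsilon, delta, sensitivity=clip_norm)
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return max(range(len(noisy)), key=noisy.__getitem__)


# Toy vocabulary of 3 tokens and 4 private examples (hypothetical values).
probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1], [0.5, 0.4, 0.1]]
token = noisy_next_token(probs, epsilon=0.5, delta=1e-5, rng=random.Random(0))
```

With only a handful of examples the noise dominates the signal, so in practice the sampled subset would need to be large relative to the noise scale for the output to remain useful.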

Impact and Utility

This project proposes a framework for detecting, analyzing, and mitigating bias in datasets using LLMs and NLP techniques. It helps uncover subtle biases in data and balances datasets through synthetic counterfactuals, with the goal of improving the fairness of machine learning models trained on that data. By incorporating differential privacy into synthetic data generation, the project also addresses data privacy concerns. The intended audience is researchers, data scientists, and organizations that need both equity and privacy in their analytical and predictive models.

Datasets Experimented With:

Wino Dataset
Wiki Subset
Adult
Credit
COMPAS - Correctional Offender Management Profiling for Alternative Sanctions


yashmaurya01/BiasBusterDPGen
