diff --git a/img/benjamin.jpg b/img/benjamin.jpg
new file mode 100644
index 0000000..622070a
Binary files /dev/null and b/img/benjamin.jpg differ
diff --git a/index.html b/index.html
index 511a5d3..d54a199 100644
--- a/index.html
+++ b/index.html
@@ -168,41 +168,64 @@
Iowa State University
-Machine Learning (ML) software has been widely adopted in modern society, with reported fairness implications for minority groups based on race, sex, age, etc.
- Many recent works have proposed methods to measure and mitigate algorithmic bias in ML models. The existing approaches focus on single classifier-based ML models.
- However, real-world ML models are often composed of multiple independent or dependent learners in an ensemble (e.g., Random Forest), where fairness composes in a non-trivial way.
- How does fairness compose in ensembles? What are the fairness impacts of the learners on the ultimate fairness of the ensemble? Can fair learners result in an unfair ensemble?
- Furthermore, studies have shown that hyperparameters influence the fairness of ML models. Ensemble hyperparameters are more complex since they affect how learners are combined in
- different categories of ensembles. In this paper, we comprehensively study popular real-world ensembles: Bagging, Boosting, Stacking, and Voting. We have developed a benchmark
- of 168 ensemble models collected from Kaggle on four popular fairness datasets. We use existing fairness metrics to understand the composition of fairness. Our results show that
- ensembles can be designed to be fairer without using mitigation techniques. We also identify the interplay between fairness composition and data characteristics to guide fair
- ensemble design.
-
-Presentation Date: Monday, November 20, 2023 at 4:00 PM CET
+Iowa State University
+Machine Learning (ML) software has been widely adopted in modern society, with reported fairness implications for minority groups based on race, sex, age, etc.
+ Many recent works have proposed methods to measure and mitigate algorithmic bias in ML models. The existing approaches focus on single classifier-based ML models.
+ However, real-world ML models are often composed of multiple independent or dependent learners in an ensemble (e.g., Random Forest), where fairness composes in a non-trivial way.
+ How does fairness compose in ensembles? What are the fairness impacts of the learners on the ultimate fairness of the ensemble? Can fair learners result in an unfair ensemble?
+ Furthermore, studies have shown that hyperparameters influence the fairness of ML models. Ensemble hyperparameters are more complex since they affect how learners are combined in
+ different categories of ensembles. In this paper, we comprehensively study popular real-world ensembles: Bagging, Boosting, Stacking, and Voting. We have developed a benchmark
+ of 168 ensemble models collected from Kaggle on four popular fairness datasets. We use existing fairness metrics to understand the composition of fairness. Our results show that
+ ensembles can be designed to be fairer without using mitigation techniques. We also identify the interplay between fairness composition and data characteristics to guide fair
+ ensemble design.
+
+Presentation Date: Monday, November 20, 2023 at 4:00 PM CET
-Iowa State University
+Deep learning-based vulnerability detection has shown great performance and, in some studies, outperformed static analysis tools. However, the highest-performing
+ approaches use token-based transformer models, which are not the most efficient at capturing the code semantics required for vulnerability detection. Classical program
+ analysis techniques such as dataflow analysis can detect many types of bugs based on their root causes. In this paper, we propose to combine such causal-based
+ vulnerability detection algorithms with deep learning, aiming to achieve more efficient and effective vulnerability detection. Specifically, we designed DeepDFA,
+ a dataflow analysis-inspired graph learning framework, and an embedding technique that enables graph learning to simulate dataflow computation. We show that DeepDFA
+ is both performant and efficient. DeepDFA outperformed all non-transformer baselines. It was trained in 9 minutes, 75x faster than the highest-performing baseline model.
+ When using only 50+ vulnerable examples and several hundred total examples as training data, the model retained the same performance as when trained on 100% of the dataset.
+ DeepDFA also generalized to real-world vulnerabilities in DbgBench; it detected 8.7 out of 17 vulnerabilities on average across folds and was able to distinguish between
+ patched and buggy versions. By combining DeepDFA with a large language model, we surpassed the state-of-the-art vulnerability detection performance on the Big-Vul dataset,
+ with an F1 score of 96.46, precision of 97.82, and recall of 95.14.
+
+Presentation Date: Monday, December 4, 2023 at 3:00 PM CET
+
+
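To make the first talk's central question concrete ("can fair learners result in an unfair ensemble?"), here is a minimal, hypothetical Python sketch. It is not the paper's benchmark or code: it measures statistical parity difference (one standard group-fairness metric) for three individual learners and for a hard-voting ensemble built from them. The synthetic data, the choice of models, and the metric are all assumptions made for illustration.

import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
sex = rng.integers(0, 2, size=n)                    # binary sensitive attribute
x = rng.normal(size=(n, 3)) + 0.5 * sex[:, None]    # features correlated with sex
y = (x.sum(axis=1) + rng.normal(scale=0.5, size=n) > 0.75).astype(int)
X = np.column_stack([x, sex])

def spd(model, X, sex):
    # Statistical parity difference: P(yhat=1 | sex=1) - P(yhat=1 | sex=0).
    # Zero means both groups receive positive predictions at equal rates.
    yhat = model.predict(X)
    return yhat[sex == 1].mean() - yhat[sex == 0].mean()

learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(max_depth=4, random_state=0)),
    ("nb", GaussianNB()),
]
ensemble = VotingClassifier(estimators=learners, voting="hard").fit(X, y)

for name, est in learners:
    print(name, round(spd(est.fit(X, y), X, sex), 3))
print("voting", round(spd(ensemble, X, sex), 3))

Because hard voting takes a majority over discrete predictions, the ensemble's parity gap is generally not a simple average of its learners' gaps; majority voting can amplify or dampen individual biases, which is one way fairness composes non-trivially.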
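For the second talk, the sketch below shows the classical reaching-definitions dataflow analysis that, per the abstract, DeepDFA's embedding enables graph learning to simulate. The toy control-flow graph and the GEN/KILL sets are invented for illustration and are not from the DeepDFA implementation.

# Toy control-flow graph: node -> successor nodes.
succ = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: []}
# Definition IDs generated / killed at each node (e.g., node 2 kills the
# definition that node 3 generates, and vice versa).
gen  = {0: {0}, 1: set(), 2: {1}, 3: {2}, 4: set()}
kill = {0: set(), 1: set(), 2: {2}, 3: {1}, 4: set()}

# Classic forward dataflow fixpoint:
#   IN[n]  = union of OUT[p] over predecessors p of n
#   OUT[n] = GEN[n] | (IN[n] - KILL[n])
pred = {n: [m for m in succ if n in succ[m]] for n in succ}
out = {n: set() for n in succ}
changed = True
while changed:
    changed = False
    for n in succ:
        in_n = set().union(*(out[p] for p in pred[n]))
        new_out = gen[n] | (in_n - kill[n])
        if new_out != out[n]:
            out[n], changed = new_out, True

print(out)  # {0: {0}, 1: {0}, 2: {0, 1}, 3: {0, 2}, 4: {0, 1, 2}}

A graph learner that aggregates predecessor states with union-like pooling (elementwise max/OR over bit-vector embeddings) can approximate exactly this propagation; that correspondence is the intuition behind dataflow analysis-inspired graph learning.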