diff --git a/img/Xin-Cheng_Wen.jpg b/img/Xin-Cheng_Wen.jpg
new file mode 100644
index 0000000..309da66
Binary files /dev/null and b/img/Xin-Cheng_Wen.jpg differ
diff --git a/index.html b/index.html
index b476842..6c1e23f 100644
--- a/index.html
+++ b/index.html
@@ -177,23 +177,25 @@
Nanjing University
+HIT
-With the rapid development of Deep Learning, deep predictive models have been widely applied to improve Software Engineering tasks, such as defect prediction and issue
-classification, and have achieved remarkable success. They are mostly trained in a supervised manner, which heavily relies on high-quality datasets. Unfortunately, due to
-the nature and source of software engineering data, the real-world datasets often suffer from the issues of sample mislabelling and class imbalance, thus undermining the
-effectiveness of deep predictive models in practice. This problem has become a major obstacle for deep learning-based Software Engineering. In this paper, we propose
-RobustTrainer, the first approach to learning deep predictive models on raw training datasets where the mislabelled samples and the imbalanced classes coexist.
-RobustTrainer consists of a two-stage training scheme, where the first learns feature representations robust to sample mislabelling and the second builds a classifier robust
-to class imbalance based on the learned representations in the first stage. We apply RobustTrainer to two popular Software Engineering tasks, i.e., Bug Report Classification
-and Software Defect Prediction. Evaluation results show that RobustTrainer effectively tackles the mislabelling and class imbalance issues and produces significantly better
-deep predictive models compared to the other six comparison approaches.
+Automated code vulnerability detection has gained increasing attention in recent years. Deep learning (DL)-based methods, which implicitly learn vulnerable code patterns, have proven
+effective in vulnerability detection. The performance of DL-based methods usually relies on the quantity and quality of labeled data. However, the current labeled data are generally
+collected automatically, for example by crawling human-generated commits, making it hard to ensure the quality of the labels. Prior studies have demonstrated that the non-vulnerable code
+(i.e., negative labels) tends to be unreliable in commonly used datasets, while the vulnerable code (i.e., positive labels) is more reliable. Considering the large amount of unlabeled data
+available in practice, it is necessary and worthwhile to explore leveraging the positive data together with the unlabeled data for more accurate vulnerability detection. In this paper, we
+focus on the Positive and Unlabeled (PU) learning problem for vulnerability detection and propose a novel model named PILOT, i.e., Positive and unlabeled Learning mOdel for vulnerability
+deTection. PILOT learns only from positive and unlabeled data for vulnerability detection. It mainly contains two modules: (1) a distance-aware label selection module, which generates
+pseudo-labels for selected unlabeled data via an inter-class distance prototype and progressive fine-tuning; (2) a mixed-supervision representation learning module, which further alleviates
+the influence of label noise and enhances the discriminability of the learned representations. The experimental results show that PILOT outperforms the popular weakly supervised methods by
+2.78%-18.93% in the PU learning setting. Compared with the state-of-the-art methods, PILOT also improves the F1 score by 1.34%-12.46% in the supervised setting. In addition, based on manual
+checking, PILOT identifies 23 mislabeled samples in the FFMPeg+Qemu dataset under the PU learning setting.
-Presentation Date: Monday, December 18, 2023 at 10:30 AM CET
+Presentation Date: Monday, January 29, 2024 at 10:30 AM CET
Nanjing University
+With the rapid development of Deep Learning, deep predictive models have been widely applied to improve Software Engineering tasks, such as defect prediction and issue
+classification, and have achieved remarkable success. They are mostly trained in a supervised manner, which heavily relies on high-quality datasets. Unfortunately, due to
+the nature and source of software engineering data, the real-world datasets often suffer from the issues of sample mislabelling and class imbalance, thus undermining the
+effectiveness of deep predictive models in practice. This problem has become a major obstacle for deep learning-based Software Engineering. In this paper, we propose
+RobustTrainer, the first approach to learning deep predictive models on raw training datasets where the mislabelled samples and the imbalanced classes coexist.
+RobustTrainer consists of a two-stage training scheme, where the first learns feature representations robust to sample mislabelling and the second builds a classifier robust
+to class imbalance based on the learned representations in the first stage. We apply RobustTrainer to two popular Software Engineering tasks, i.e., Bug Report Classification
+and Software Defect Prediction. Evaluation results show that RobustTrainer effectively tackles the mislabelling and class imbalance issues and produces significantly better
+deep predictive models compared to the other six comparison approaches.
+
+Presentation Date: Monday, December 18, 2023 at 10:30 AM CET
+Presentation Date: Monday, December 4, 2023 at 3:00 PM CET
-Presentation Date: Monday, November 20, 2023 at 4:00 PM CET
-
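
Note on the PILOT abstract added above: the distance-aware label selection module it describes is a form of prototype-based pseudo-labeling for Positive and Unlabeled (PU) learning. The sketch below is a minimal illustration of that general idea only, not PILOT's implementation; the function name select_pseudo_labels, the margin parameter, and the use of the unlabeled-data centroid as a stand-in for the negative class are assumptions made here for demonstration.

# Illustrative sketch only (not PILOT's code): distance-prototype pseudo-labeling
# for Positive and Unlabeled (PU) learning. Names and thresholds are assumptions.
import numpy as np

def select_pseudo_labels(pos_emb, unl_emb, margin=0.2):
    # pos_emb: (n_pos, d) embeddings of labeled vulnerable (positive) samples
    # unl_emb: (n_unl, d) embeddings of unlabeled samples
    # returns per-sample labels: 1 = pseudo-positive, 0 = pseudo-negative,
    # -1 = undecided (kept unlabeled for a later round)
    pos_proto = pos_emb.mean(axis=0)   # prototype of the positive class
    unl_proto = unl_emb.mean(axis=0)   # rough stand-in for the negative class
    d_pos = np.linalg.norm(unl_emb - pos_proto, axis=1)
    d_unl = np.linalg.norm(unl_emb - unl_proto, axis=1)
    # relative gap between the two distances measures how confident the assignment is
    gap = (d_unl - d_pos) / (d_pos + d_unl + 1e-8)
    labels = np.full(len(unl_emb), -1)
    labels[gap > margin] = 1           # confidently closer to the positive prototype
    labels[gap < -margin] = 0          # confidently closer to the unlabeled centroid
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos = rng.normal(loc=1.0, size=(50, 16))                 # toy positive embeddings
    unl = np.vstack([rng.normal(loc=1.0, size=(30, 16)),     # hidden positives
                     rng.normal(loc=-1.0, size=(70, 16))])   # hidden negatives
    labels = select_pseudo_labels(pos, unl)
    print("pseudo-positive:", int((labels == 1).sum()),
          "pseudo-negative:", int((labels == 0).sum()),
          "undecided:", int((labels == -1).sum()))

Under these assumptions, unlabeled samples whose embeddings sit clearly closer to the positive prototype receive a positive pseudo-label, those clearly closer to the bulk of the unlabeled data receive a negative pseudo-label, and the rest stay unlabeled for a later round, loosely mirroring the progressive refinement mentioned in the abstract.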