Evaluating_Immunogenicity_Classifier

Evaluating performance of immunogenicity classifiers

Here, we provide data and scripts to generate all main figures in the paper.

Figure 1 - Data curation

Data_curation_visualisations.Rmd generates the panels of Figure 1, using the standard dataset for Fig 1B, the Pan HLA dataset for Fig 1C-E and the GBM dataset for Fig 1F.

Figure 2 - Model benchmarking

Re-training each model and testing it with k-fold cross-validation takes a considerable amount of time, mainly because of the feature computation component of Repitope. For the benchmarking experiments below, each markdown file therefore reads in the data saved after cross-validation and generates the figures from it. 10Fold_CV_Example.Rmd shows how each model is re-trained and tested in a cross-validation context: it runs 10-fold cross-validation for each model in the pathogenic HLA-specific scenario. Repitope is the exception because it takes too long; an example script for Repitope is provided, and its results are simply read in.
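As an illustration of the 10-fold cross-validation scheme described above, the following is a minimal sketch in Python rather than the R Markdown used in this repository. It is not the code from 10Fold_CV_Example.Rmd: the function names `stratified_kfold` and `cross_validation_splits` are hypothetical, the labels are made-up toy data, and stratifying folds by class is an assumption about a sensible setup rather than a statement of what the paper did.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, preserving class balance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a near-equal share of the class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    folds = stratified_kfold(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


# Toy labels (assumption): 1 = immunogenic, 0 = non-immunogenic.
labels = [1] * 30 + [0] * 70
splits = list(cross_validation_splits(labels, k=10))

for fold_number, (train, test) in enumerate(splits, start=1):
    positives = sum(labels[i] for i in test)
    print(f"fold {fold_number}: train={len(train)} test={len(test)} positives={positives}")
```

In each round a model would be re-trained on the `train` indices and scored on the `test` indices; pooling the held-out predictions from all ten rounds gives one prediction per peptide, which is the kind of combined result the benchmarking files read in.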
PanHLA_Pathogenic_CV.rmd generates Fig 2A and corresponding supplementary confusion matrices / PR-AUC. This uses the data generated after the cross-validation experiment (PanHLA_combinedData.rds). A .txt version of this data is included.
HLASpecific_Pathogenic_CV.rmd generates Fig 2B and corresponding supplementary confusion matrices / PR-AUC. This uses the data generated after the cross-validation experiment (HLASpecific_combinedData.rds). A .txt version of this data is included.
Benchmark_GBM_PanHLA.rmd generates Fig 2C and corresponding supplementary confusion matrices / ROC-AUC. This uses the data generated after training the models on the Pan-HLA dataset and testing against the GBM dataset (GBM_PAN_HLA_combinedData.rds). A .txt version of this data is included.
Benchmark_GBM_A201.rmd generates Fig 2D and corresponding supplementary confusion matrices / ROC-AUC. This uses the data generated after training the models on the HLA-specific dataset and testing against the GBM dataset (HLA_Specific_GBM_combinedData.rds). A .txt version of this data is included.
Bjerregaard_PanHLA.rmd generates Fig 2E and corresponding supplementary confusion matrices / ROC-AUC. This uses the data generated after training the models on the Pan HLA dataset and testing against the 291 Bjerregaard 9mers dataset (PanHLA_Bjerregaard_combinedData.rds). A .txt version of this data is included.
Bjerregaard_HLA_Specific.rmd generates Fig 2F and corresponding supplementary confusion matrices / ROC-AUC. This uses the data generated after training the models on the HLA Specific dataset and testing against the 291 Bjerregaard 9mers dataset (HLASpecific_Bjerregaard_combinedData.rds). A .txt version of this data is included.
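The confusion matrices, ROC-AUC and PR-AUC mentioned above can be computed from a table of true labels and model scores. The sketch below is a hedged illustration in plain Python, not the code in these markdown files: the function names, the 0.5 threshold and the toy scores are all assumptions, and average precision is used here as one common estimate of PR-AUC.

```python
def confusion_matrix(y_true, scores, threshold=0.5):
    """Count TP, FP, TN, FN when scores >= threshold are called positive."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1:
            counts["TP" if truth == 1 else "FP"] += 1
        else:
            counts["FN" if truth == 1 else "TN"] += 1
    return counts


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive (tied scores are not treated specially)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, truth) in enumerate(ranked, start=1):
        if truth == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("average precision needs at least one positive")
    return sum(precisions) / len(precisions)


# Toy data (assumption): 1 = immunogenic, 0 = non-immunogenic.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

cm = confusion_matrix(y_true, scores)
auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
print(cm, round(auc, 3), round(ap, 3))
```

ROC-AUC is insensitive to class balance, whereas precision-based summaries shift with the proportion of positives, which is one reason to report both when the datasets differ in how many immunogenic peptides they contain.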

Figure 3 - Models are unreliable for predicting immunogenic neoantigens

Models_unreliable_neoantigens.Rmd generates all panels of Figure 3.

Figure 4 - Evaluating the effects of HLA skewness on model performance

Evaluating_HLA_Imbalance.Rmd generates all panels of Figure 4.

Figure 5 - Further data associated complexities

Further_data_associated_complexities.Rmd generates all panels of Figure 5.

Key dataset location

Standard dataset (Additional file 2) - "Datasets_csv/standardDataset_200903_PB_Inc_ImmunogenicityEvidence.csv"
Pan HLA dataset (Additional file 3) - "Analysis_Pan_HLA_Pathogenic/PanHLA_FullDataset"
HLA Specific dataset (Additional file 4) - "Analysis_HLA_specific_Pathogenic/A201_standardData_forAnalysis_BALANCED.rds" and /HLASpecific_FullDataset
GBM dataset (Additional file 5; produced by Margardia Rei and Rui Ma) - "GBM_Benchmark_PanHLA_Train/GBM_Peptides.tsv"

Third-party code

NetTepi - https://services.healthtech.dtu.dk/service.php?NetTepi-1.0 (note that for re-training, we pass a new .MOD file that we generated ourselves to NetTepi's Python script)
iPred - https://github.com/antigenomics/ipred
Repitope - https://github.com/masato-ogishi/Repitope
NetMHCpan 4.0 - https://services.healthtech.dtu.dk/service.php?NetMHCpan-4.1
