hkoohy/Immunogenicity_Classifier

Evaluating_Immunogenicity_Classifier

Evaluating performance of immunogenicity classifiers

Here, we provide data and scripts to generate all main figures in the paper.

Figure 1 - Data curation

  • Data_curation_visualisations.rmd generates the panels of Figure 1. It uses the standard dataset for Fig 1B, the Pan HLA dataset for Fig 1C-E, and the GBM dataset for Fig 1F.

Figure 2 - Model benchmarking

  • Re-training each model and testing it by k-fold cross-validation takes considerable time because of the feature computation component of Repitope. For the following benchmarking experiments, each markdown file therefore reads in the data produced by the cross-validation and generates the figures from it. 10Fold_CV_Example.rmd shows how each model is re-trained and tested in a cross-validation context: it runs 10-fold cross-validation for each model in the pathogenic HLA-specific scenario. The exception is Repitope, which takes too long to run; an example script for Repitope is provided, and its results are simply read in.
  • PanHLA_Pathogenic_CV.rmd generates Fig 2A and corresponding supplementary confusion matrices / PR-AUC. This uses the data generated after the cross-validation experiment (PanHLA_combinedData.rds). A .txt version of this data is included.
  • HLASpecific_Pathogenic_CV.rmd generates Fig 2B and corresponding supplementary confusion matrices / PR-AUC. This uses the data generated after the cross-validation experiment (HLASpecific_combinedData.rds). A .txt version of this data is included.
  • Benchmark_GBM_PanHLA.rmd generates Fig 2C and corresponding supplementary confusion matrices / ROC-AUC. This uses the data generated after training the models on the Pan-HLA dataset and testing against the GBM dataset (GBM_PAN_HLA_combinedData.rds). A .txt version of this data is included.
  • Benchmark_GBM_A201.rmd generates Fig 2D and corresponding supplementary confusion matrices / ROC-AUC. This uses the data generated after training the models on the HLA-specific dataset and testing against the GBM dataset (HLA_Specific_GBM_combinedData.rds). A .txt version of this data is included.
  • Bjerregaard_PanHLA.rmd generates Fig 2E and corresponding supplementary confusion matrices / ROC-AUC. This uses the data generated after training the models on the Pan HLA dataset and testing against the 291 Bjerregaard 9mers dataset (PanHLA_Bjerregaard_combinedData.rds). A .txt version of this data is included.
  • Bjerregaard_HLA_Specific.rmd generates Fig 2F and corresponding supplementary confusion matrices / ROC-AUC. This uses the data generated after training the models on the HLA Specific dataset and testing against the 291 Bjerregaard 9mers dataset (HLASpecific_Bjerregaard_combinedData.rds). A .txt version of this data is included.
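
For readers unfamiliar with the cross-validation workflow described above, the following is a minimal, language-neutral sketch of stratified 10-fold cross-validation followed by a confusion matrix. It is not the code in 10Fold_CV_Example.rmd (the repository's scripts are R Markdown), and it does not reproduce any of the benchmarked models. The function names (`stratified_folds`, `cross_validate`, `confusion_matrix`), the toy threshold classifier, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign every sample index to one of k folds, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [0] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal shuffled indices round-robin so each fold keeps the class ratio.
        for position, idx in enumerate(indices):
            fold_of[idx] = position % k
    return fold_of


def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Return one out-of-fold prediction per sample."""
    fold_of = stratified_folds(labels, k, seed)
    predictions = [None] * len(labels)
    for fold in range(k):
        train = [i for i, f in enumerate(fold_of) if f != fold]
        test = [i for i, f in enumerate(fold_of) if f == fold]
        # The model only ever sees the k-1 training folds.
        model = fit([features[i] for i in train], [labels[i] for i in train])
        for i in test:
            predictions[i] = predict(model, features[i])
    return predictions


def confusion_matrix(labels, predictions, positive=1):
    """Count true/false positives and negatives for a binary task."""
    pairs = list(zip(labels, predictions))
    return {
        "TP": sum(1 for y, p in pairs if y == positive and p == positive),
        "FP": sum(1 for y, p in pairs if y != positive and p == positive),
        "FN": sum(1 for y, p in pairs if y == positive and p != positive),
        "TN": sum(1 for y, p in pairs if y != positive and p != positive),
    }


# Hypothetical stand-in for a real classifier: a threshold halfway
# between the mean feature value of each class.
def fit_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Synthetic, well-separated data: 50 negatives and 50 positives.
features = [i / 100 for i in range(50)] + [1 + i / 100 for i in range(50)]
labels = [0] * 50 + [1] * 50

predictions = cross_validate(features, labels, fit_threshold, predict_threshold)
matrix = confusion_matrix(labels, predictions)
print(matrix)
```

Because every sample receives exactly one prediction from a model that never saw it during training, the out-of-fold predictions can be pooled and summarised once, which is the same idea behind reading a combined results file into each figure script rather than re-running the models.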

Figure 3 - Models are unreliable for predicting immunogenic neoantigens

  • Models_unreliable_neoantigens.rmd generates all panels of Figure 3.

Figure 4 - Evaluating the effects of HLA skewness on model performance

  • Evaluating_HLA_imbalance.rmd generates all panels of Figure 4.

Figure 5 - Further data associated complexities

  • Further_data_associated_complexities.rmd generates all panels of Figure 5.

Key dataset location

  • Standard dataset (Additional file 2) - "Datasets_csv/standardDataset_200903_PB_Inc_ImmunogenicityEvidence.csv"
  • Pan HLA dataset (Additional file 3) - "Analysis_Pan_HLA_Pathogenic/PanHLA_FullDataset"
  • HLA Specific dataset (Additional file 4) - "Analysis_HLA_specific_Pathogenic/A201_standardData_forAnalysis_BALANCED.rds" and /HLASpecific_FullDataset
  • GBM dataset (Additional file 5), produced by Margardia Rei and Rui Ma - "GBM_Benchmark_PanHLA_Train/GBM_Peptides.tsv"

Third-party code
