
Understanding Prediction Results #11

Open
bennyruss opened this issue Jun 10, 2020 · 3 comments
@bennyruss
hello,

I am trying to make sense of some results I got from the model, but it seems like they are on different scales. Here is a sample of the output:

observed | predicted
0.06345 | 0.72494
0.07636 | 0.37529
0.082 | 0.66338
0.08482 | 0.5969
0.08264 | 0.46543
0.07091 | 0.43927
0.067 | 0.26192
0.07445 | 0.25262
0.06682 | 0.26192
0.05955 | 0.40488
I have a dataset of 78 peptides I would like to test on, so I pointed test_path at that dataset but left all the other parameters the same. The RT values in the test data are in seconds.
train_path = 'data/mod_train_2.txt'
test_path = 'data/DeepRTtest.txt'
result_path = 'work/mod_pred_test.txt'
log_path = 'work/mod_test.log'
save_prefix = 'work/mod/2/3'
pretrain_path = ''
dict_path = ''

conv1_kernel = 12
conv2_kernel = 12
min_rt = 0
max_rt = 110
time_scale = 60
max_length = 50
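For context, here is a minimal sketch of how these three parameters could interact, assuming the model divides raw RT by time_scale and then scales into [0, 1] using min_rt/max_rt. The function names are illustrative only, not DeepRTplus's actual API, but this kind of scaling would explain why observed and predicted values can appear to be on different scales:

```python
# Hypothetical sketch of min_rt / max_rt / time_scale interaction.
# These function names are illustrative, not from DeepRTplus itself.

def normalize_rt(rt_raw, time_scale=60, min_rt=0, max_rt=110):
    """Convert a raw RT (e.g. seconds) to minutes, then scale to [0, 1]."""
    rt_min = rt_raw / time_scale          # seconds -> minutes
    return (rt_min - min_rt) / (max_rt - min_rt)

def denormalize_rt(rt_scaled, time_scale=60, min_rt=0, max_rt=110):
    """Invert the normalization back to the original time unit."""
    return (rt_scaled * (max_rt - min_rt) + min_rt) * time_scale

print(normalize_rt(3300))   # 3300 s = 55 min -> 0.5
print(denormalize_rt(0.5))  # -> 3300.0 s
```

If the predictions in the table above are still on the normalized [0, 1] scale while the observed column is not (or vice versa), a mismatch like the one shown would be expected.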

Thank you for the help!

@horsepurve
Owner

horsepurve commented Jun 10, 2020

Hi,

If you only want to predict RT values for a handful of testing data (without training), I would suggest using:

python prediction_emb_cpu.py max_rt param_cpu/dia_all_epo20_dim24_conv12/dia_all_epo20_dim24_conv12_filled.pt 12 data/DeepRTtest.txt

where max_rt is the maximum RT value among your 78 items (see here for details). In "config.py" the only change would be:

max_length = 66 # since we are using the "DIA" data for prediction, we change it to be the max peptide length of it

Best,
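To get the max_rt argument for the command above, one could simply scan the test file. A minimal sketch, assuming a tab-separated "peptide&lt;TAB&gt;RT" layout with the RT in the second column (that layout is an assumption and may not match your file exactly):

```python
# Sketch: find the maximum RT in a tab-separated "peptide<TAB>RT" file.
# Column layout is assumed; adjust the index if your file differs.

def max_rt_from_file(path):
    max_rt = float("-inf")
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            try:
                rt = float(fields[1])
            except (IndexError, ValueError):
                continue  # skip header or malformed lines
            max_rt = max(max_rt, rt)
    return max_rt
```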

@bennyruss
Author

bennyruss commented Jun 11, 2020

Hi there, thank you for the response!

So in the config.py file should I specify the max_rt of my data in minutes and then the scale of 60 converts that to seconds? Like so:
train_path = 'data/mod_train_2.txt'
test_path = 'data/mod_test_2.txt'
result_path = 'work/mod_test_2.pred.txt'
log_path = 'work/mod_test_2.log'
save_prefix = 'work/mod/2/3'
pretrain_path = ''
dict_path = ''

conv1_kernel = 12
conv2_kernel = 12
min_rt = 0
max_rt = 11.2
time_scale = 60
max_length = 66

And then, when I run the script you suggested, I pass max_rt in seconds?

python prediction_emb_cpu.py 672 param_cpu/dia_all_epo20_dim24_conv12/dia_all_epo20_dim24_conv12_filled.pt 12 data/DeepRTtest.txt

I will also note that the data I am trying to predict is from an RPLC system. Am I using the right training files?

Thanks,
Ben

@horsepurve
Owner

Hi, sorry for the delayed response.
Yes, generally all the RT values here are in minutes, so "max_rt = 11.2" (minutes), and "time_scale = 60" converts the RT values in the data file from seconds to minutes.
When running the "prediction_emb_cpu.py" script, if max_rt is given in minutes (i.e. 11.2) then the predicted values are written in minutes, and if it is given in seconds (i.e. 672) then the output file is in seconds too.

Yes, the model provided here is also from an RPLC system [1]. However, this model usually cannot be directly applied to another RPLC dataset, because the gradients are usually different. So directly running "prediction_emb_cpu.py" would only give estimated RTs for the peptides, rather than their precise retention times under your chromatographic conditions. To obtain a more accurate prediction, a set of calibration peptides is typically needed (i.e. transfer learning).

[1] A Repository of Assays to Quantify 10,000 Human Proteins by SWATH-MS. Sci. Data 2014, 1, 140031, DOI: 10.1038/sdata.2014.31
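The calibration idea mentioned above can be as simple as a linear fit between observed RTs and the model's predictions for a handful of calibration peptides, then applying that fit to new predictions. A minimal sketch (ordinary least squares in pure Python, illustrative only; real transfer learning as described in the paper retrains the network rather than fitting a line):

```python
# Sketch: simple linear RT calibration from a few calibration peptides.
# Fit observed ~= a * predicted + b by ordinary least squares,
# then apply the fit to new predictions.

def fit_linear(predicted, observed):
    n = len(predicted)
    mx = sum(predicted) / n
    my = sum(observed) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))
    a = sxy / sxx
    b = my - a * mx
    return a, b

def calibrate(new_predictions, a, b):
    return [a * p + b for p in new_predictions]

# Example: observed RTs are twice the predictions plus an offset of 1.
a, b = fit_linear([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
print(a, b)  # 2.0 1.0
```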
