
Understanding Prediction Results #11

Open
bennyruss opened this issue Jun 10, 2020 · 3 comments
@bennyruss
hello,

I am trying to make sense of some results I got from the model, but it seems like they are on different scales. Here is a sample of the output:

observed | predicted
0.06345 | 0.72494
0.07636 | 0.37529
0.082 | 0.66338
0.08482 | 0.5969
0.08264 | 0.46543
0.07091 | 0.43927
0.067 | 0.26192
0.07445 | 0.25262
0.06682 | 0.26192
0.05955 | 0.40488
I have a dataset of 78 peptides I would like to test on, so I pointed test_path at that dataset but left all the other parameters the same. The RT values in the test data are in seconds.
train_path = 'data/mod_train_2.txt'
test_path = 'data/DeepRTtest.txt'
result_path = 'work/mod_pred_test.txt'
log_path = 'work/mod_test.log'
save_prefix = 'work/mod/2/3'
pretrain_path = ''
dict_path = ''

conv1_kernel = 12
conv2_kernel = 12
min_rt = 0
max_rt = 110
time_scale = 60
max_length = 50
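For context, here is a minimal sketch of how these three parameters could interact, assuming the model divides raw RT by time_scale and then scales into [0, 1] using min_rt/max_rt. The function names are illustrative only, not DeepRTplus's actual API, but this kind of scaling would explain why observed and predicted values can appear to be on different scales:

```python
# Hypothetical sketch of min_rt / max_rt / time_scale interaction.
# These function names are illustrative, not from DeepRTplus itself.

def normalize_rt(rt_raw, time_scale=60, min_rt=0, max_rt=110):
    """Convert a raw RT (e.g. seconds) to minutes, then scale to [0, 1]."""
    rt_min = rt_raw / time_scale          # seconds -> minutes
    return (rt_min - min_rt) / (max_rt - min_rt)

def denormalize_rt(rt_scaled, time_scale=60, min_rt=0, max_rt=110):
    """Invert the normalization back to the original time unit."""
    return (rt_scaled * (max_rt - min_rt) + min_rt) * time_scale

print(normalize_rt(3300))   # 3300 s = 55 min -> 0.5
print(denormalize_rt(0.5))  # -> 3300.0 s
```

If the predictions in the table above are still on the normalized [0, 1] scale while the observed column is not (or vice versa), a mismatch like the one shown would be expected.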

Thank you for the help!

@horsepurve
Owner

horsepurve commented Jun 10, 2020

Hi,

If you only want to predict RT values for a handful of testing data (without training), I would suggest using:

python prediction_emb_cpu.py max_rt param_cpu/dia_all_epo20_dim24_conv12/dia_all_epo20_dim24_conv12_filled.pt 12 data/DeepRTtest.txt

where max_rt is the maximum RT value among your 78 items (see here for details). In "config.py" the only change would be:

max_length = 66 # since we are using the "DIA" data for prediction, we change it to be the max peptide length of it

Best,
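To get the max_rt argument for the command above, one could simply scan the test file. A minimal sketch, assuming a tab-separated "peptide&lt;TAB&gt;RT" layout with the RT in the second column (that layout is an assumption and may not match your file exactly):

```python
# Sketch: find the maximum RT in a tab-separated "peptide<TAB>RT" file.
# Column layout is assumed; adjust the index if your file differs.

def max_rt_from_file(path):
    max_rt = float("-inf")
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            try:
                rt = float(fields[1])
            except (IndexError, ValueError):
                continue  # skip header or malformed lines
            max_rt = max(max_rt, rt)
    return max_rt
```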

@bennyruss
Author

bennyruss commented Jun 11, 2020

Hi there, thank you for the response!

So in the config.py file should I specify the max_rt of my data in minutes and then the scale of 60 converts that to seconds? Like so:
train_path = 'data/mod_train_2.txt'
test_path = 'data/mod_test_2.txt'
result_path = 'work/mod_test_2.pred.txt'
log_path = 'work/mod_test_2.log'
save_prefix = 'work/mod/2/3'
pretrain_path = ''
dict_path = ''

conv1_kernel = 12
conv2_kernel = 12
min_rt = 0
max_rt = 11.2
time_scale = 60
max_length = 66

And then, when I run the script you suggested, I pass max_rt in seconds?

python prediction_emb_cpu.py 672 param_cpu/dia_all_epo20_dim24_conv12/dia_all_epo20_dim24_conv12_filled.pt 12 data/DeepRTtest.txt

I will also note that the data I am trying to predict is from an RPLC system. Am I using the right training files?

Thanks,
Ben

@horsepurve
Owner

Hi, sorry for the delayed response.
Yes, generally all the RT values here are in minutes, so "max_rt = 11.2" (minutes), and "time_scale = 60" converts the RT values in the data file from seconds to minutes.
When running the "prediction_emb_cpu.py" script, if max_rt is given in minutes (i.e. 11.2) then the predicted values are written in minutes, and if it is given in seconds (i.e. 672) then the output file is in seconds too.

Yes, the model provided here is also from an RPLC system [1]. However, this model usually cannot be directly applied to another RPLC dataset, because the gradients are usually different. So directly running "prediction_emb_cpu.py" would only give estimated RTs for the peptides, rather than their precise retention times under your chromatographic conditions. To obtain a more accurate prediction, a set of calibration peptides is typically needed (i.e. transfer learning).

[1] A Repository of Assays to Quantify 10,000 Human Proteins by SWATH-MS. Sci. Data 2014, 1, 140031, DOI: 10.1038/sdata.2014.31
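The calibration idea mentioned above can be as simple as a linear fit between observed RTs and the model's predictions for a handful of calibration peptides, then applying that fit to new predictions. A minimal sketch (ordinary least squares in pure Python, illustrative only; real transfer learning as described in the paper retrains the network rather than fitting a line):

```python
# Sketch: simple linear RT calibration from a few calibration peptides.
# Fit observed ~= a * predicted + b by ordinary least squares,
# then apply the fit to new predictions.

def fit_linear(predicted, observed):
    n = len(predicted)
    mx = sum(predicted) / n
    my = sum(observed) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))
    a = sxy / sxx
    b = my - a * mx
    return a, b

def calibrate(new_predictions, a, b):
    return [a * p + b for p in new_predictions]

# Example: observed RTs are twice the predictions plus an offset of 1.
a, b = fit_linear([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
print(a, b)  # 2.0 1.0
```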
