Formatting of MD Simulated Training Data #4

osession · 2023-07-26T20:39:41Z

Hello!

I have a question about how the training data is formatted. I saw in your paper that 100 frames of MD simulation are gathered for each of the 3218 structures from PDBBind, and from this post I understand that the 100 frames are stored as 100 pdb and 100 sdf files per structure, which gives 321800 pdb/sdf pairs altogether.

I am curious how it affects the model when the 100 different pdb/sdf pairs for one structure are all fed in with the same pk value. Is that the correct way to input the data for preprocessing and training?
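To make the question concrete, here is a minimal sketch of the layout I have in mind. It is only an illustration: the file naming scheme, the `FrameSample` type, and the `build_manifest` helper are hypothetical and not taken from the repository, and the pk values in the usage example are made up.

```python
from typing import Dict, List, NamedTuple


class FrameSample(NamedTuple):
    """One MD frame treated as one training sample (hypothetical layout)."""
    pdb_id: str
    frame: int
    protein_path: str  # assumed naming scheme, not the repository's
    ligand_path: str   # assumed naming scheme, not the repository's
    pk: float          # same label for every frame of a structure


def build_manifest(labels: Dict[str, float], n_frames: int = 100) -> List[FrameSample]:
    """Expand each structure into n_frames samples that share one pk label."""
    samples = []
    for pdb_id, pk in labels.items():
        for frame in range(n_frames):
            stem = f"{pdb_id}/frame_{frame:03d}"
            samples.append(
                FrameSample(pdb_id, frame, f"{stem}.pdb", f"{stem}.sdf", pk)
            )
    return samples


# Two placeholder structures with made-up pk values.
manifest = build_manifest({"1abc": 6.2, "2xyz": 4.8}, n_frames=100)
print(len(manifest))
```

With 3218 structures and 100 frames each, a manifest built this way would hold 321800 samples, with each group of 100 sharing a single label.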

I was also wondering about this statement in the README: "Unlike PDBBind, for custom input files, we pick pockets on the fly rather than as input." Is this only true for test data that is run through run_cusom_input.sh, or is it also true for data that is passed in as training data for a new model?

Thanks so much for all your great work!!
