
about preprocess_raw_data.py #9

Open
lijiashan2020 opened this issue Apr 12, 2022 · 5 comments

Comments

@lijiashan2020

When I run the command as follows:

python preprocess_raw_data.py -n_jobs 60 -data dips -graph_nodes residues -graph_cutoff 30 -graph_max_neighbor 10 -graph_residue_loc_is_alphaC -pocket_cutoff 8 -data_fraction 1.0

it generates six files in the directory /extendplus/jiashan/equidock_public/src/cache/dips_residues_maxneighbor_10_cutoff_30.0_pocketCut_8.0/cv_0:

label_test.pkl  ligand_graph_test.bin  receptor_graph_test.bin
label_val.pkl   ligand_graph_val.bin   receptor_graph_val.bin

However, the remaining three files could not be generated, and the run fails with the following error:

Processing  ./cache/dips_residues_maxneighbor_10_cutoff_30.0_pocketCut_8.0/cv_0/label_frac_1.0_train.pkl
Num of pairs in  train  =  39901
Killed

Could you help me solve this problem?
Thanks!

@octavian-ganea
Owner

Generating the full DIPS training data takes a lot of time, and you have to check whether you have enough resources for it. Can you try generating just a fraction of it first, e.g., -data_fraction 0.1?
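Concretely, that would be the original command with only the -data_fraction flag lowered (all other flags unchanged):

python preprocess_raw_data.py -n_jobs 60 -data dips -graph_nodes residues -graph_cutoff 30 -graph_max_neighbor 10 -graph_residue_loc_is_alphaC -pocket_cutoff 8 -data_fraction 0.1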

@lijiashan2020
Author

Thank you for your reply! I can now run the command successfully after modifying the parameters. Thank you very much for your help!

@lizhenping

> Thank you for your reply! I can now run the command successfully after modifying the parameters. Thank you very much for your help!

I ran it with 160 GB of RAM for five hours and it still failed with the same error. It really needs a huge amount of resources.
Marking this here in the hope it is useful for others.

@lizhenping

Marking this: I used 25 CPUs and 400 GB of RAM, and the processing took 15 hours.

@Octopus125

I had the same problem. The main cause is insufficient memory: preprocessing the training data of the DIPS dataset requires a large amount of memory, and I could not complete it in one pass even on a server with 256 GB of memory.

One workaround is to process the data in batches. /DIPS/data/DIPS/interim/pairs-pruned/pairs-postprocessed-train.txt stores all the PDB files waiting to be preprocessed, so you can divide this txt file into several parts, preprocess each part separately, and then merge the generated files together (see the sketch below). I divided the training data into two parts and finished the preprocessing successfully on a server with 256 GB of memory.
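A rough sketch of the split-and-merge approach follows. The part file names, the per-part output directory names, and the assumption that the label .pkl files are plain pickled lists that can simply be concatenated are mine, not from the repo; adjust the paths and names to whatever preprocess_raw_data.py actually produces on your machine.

# split_and_merge.py -- rough sketch, not from the repo; adjust paths/names to your setup
import pickle
from pathlib import Path

import dgl  # the *.bin graph files are DGL files; merging assumes dgl.load_graphs / dgl.save_graphs

# 1) Split the list of PDB pairs into two halves (path as given in the comment above).
pairs_file = Path("DIPS/data/DIPS/interim/pairs-pruned/pairs-postprocessed-train.txt")
lines = pairs_file.read_text().splitlines()
half = len(lines) // 2
for i, part in enumerate([lines[:half], lines[half:]]):
    pairs_file.with_name(f"pairs-postprocessed-train-part{i}.txt").write_text("\n".join(part) + "\n")

# Run preprocess_raw_data.py once per part (for example, by temporarily swapping each
# part file in as pairs-postprocessed-train.txt) and keep each run's output in its own
# directory, e.g. cv_0_part0/ and cv_0_part1/ (hypothetical names).

# 2) Merge the per-part train outputs back into a single set of files.
part_dirs = [Path("cv_0_part0"), Path("cv_0_part1")]
out_dir = Path("cv_0")

# Labels: assuming each label pickle holds a list that can be concatenated directly.
labels = []
for d in part_dirs:
    with open(d / "label_frac_1.0_train.pkl", "rb") as f:
        labels.extend(pickle.load(f))
with open(out_dir / "label_frac_1.0_train.pkl", "wb") as f:
    pickle.dump(labels, f)

# Graphs: dgl.load_graphs returns (list_of_graphs, label_dict); concatenate and re-save.
for name in ["ligand_graph_train.bin", "receptor_graph_train.bin"]:
    graphs = []
    for d in part_dirs:
        g, _ = dgl.load_graphs(str(d / name))
        graphs.extend(g)
    dgl.save_graphs(str(out_dir / name), graphs)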
