How long is run_unbiased function in ravefuncs.py expected to take? #2

Open
ackbar03 opened this issue Nov 29, 2022 · 4 comments

@ackbar03

ackbar03 commented Nov 29, 2022

Hi,

I am running run_unbiased from ravefuncs.py via the line rave.run_unbiased(on_gpu,plumedfile,dt,temp,freq,nstep,index) in the "Unbiased simulations" code block. However, it is taking a long time.

I added some debugging statements, and it seems to be stuck at line 98, the call

modeller.addSolvent(forcefield, padding=0.5*nanometers, model='tip3p', neutralize=True, positiveIon='Na+', negativeIon='Cl-')

for a very long time, upwards of 25 minutes.

Is this normal? How long is this function expected to take?

Thanks
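
A minimal sketch of one way to pin down which step is slow, as an alternative to scattered print statements: wrap each call in a small timing context manager. The helper name `timed` and the `timings` dictionary are hypothetical, and `time.sleep` only stands in for the real `modeller.addHydrogens(...)` and `modeller.addSolvent(...)` calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print (and optionally record) the wall-clock time spent in a block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        # flush=True so the line shows up immediately, even in a notebook
        print(f"{label}: {elapsed:.2f} s", flush=True)


# Hypothetical usage: the sleeps are placeholders for the real OpenMM calls.
timings = {}
with timed("addHydrogens", timings):
    time.sleep(0.01)
with timed("addSolvent", timings):
    time.sleep(0.02)
```

Because the timing is printed in a `finally` clause, the elapsed time is reported even if the wrapped step raises an exception.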

@bodhivani
Contributor

Hi ackbar03,

That's really strange. That specific step should not take more than a couple of seconds, if that. One question: are you using GPUs?

@ackbar03
Author

ackbar03 commented Dec 2, 2022

Hi,

I'm using a GPU, and I can see that some GPU memory is occupied. However, the addSolvent step runs before the GPU check (if on_gpu:), so I'm not sure if it's related.

I've added some print statements for debugging to both ravefuncs.py and the relevant code section in the heavydemo notebook.

Notebook:

# Set the variable in this process; os.system('export ...') only affects a short-lived subshell
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

t1 = time.perf_counter()
if not os.path.isdir("Structures"):
  print("You have not run prior AF2 predictions, copying them from github")
  os.system("unzip alphafold2rave/CSP_data/structures.zip -d .")
  num_samples=128
  listindices=[1, 224, 627, 533]
  
#@markdown Check the below box to run on GPU
on_gpu=True #@param {type:"boolean"}
# on_gpu=False #@param {type:"boolean"}
#@markdown MD parameters

#@markdown Integration Timestep (ps)
dt =0.004 #@param{type:"number"}
#@markdown Temperature (K)
temp=300 #@param{type:"number"}
freq=1 #param{type:"number"}
#@markdown Number of steps 
nstep=250000 #@param{type:"number"}
#markdown Plumed file
plumedfile="plumed_unb.dat" #param{type:"string"}
if not os.path.isdir("unbiased"):
  os.mkdir("unbiased")
os.chdir("./unbiased")
cpath=os.getcwd()
print(cpath)


# os.system('cp /content/alphafold2rave/CSP_data/plumed_unb.dat .')
os.system('cp /home/[redacted]/alphafold2rave_new/alphafold2rave/CSP_data/plumed_unb.dat .')
plumedfile=os.path.join(cpath,plumedfile)

print("check 1")
# If you dont want a plumed file in unbiased 
# Then specify "None" as your plumedfile variable
for index in listindices:
  
  print(cpath)  
  if not os.path.isdir(f'{index}'):
    os.mkdir(f'{index}')
  os.chdir(f'./{index}')
  # os.system(f"cp /content/Structures/pred_{index}.pdb .")
  os.system(f"cp /home/[redacted]/alphafold2rave_new/alphafold2rave/content/Structures/pred_{index}.pdb .")
  print("check2")  
  rave.run_unbiased(on_gpu,plumedfile,dt,temp,freq,nstep,index)
  print(cpath)
  print("check3")  
  os.chdir("..")

for index in listindices:
  # os.chdir(f'/content/unbiased/{index}')
  os.chdir(f'/home/[redacted]/alphafold2rave_new/alphafold2rave/content/unbiased/{index}')
  if os.path.isfile('bck.0.COLVAR_unb.dat'):
    os.remove('COLVAR_unb.dat')
    os.replace('bck.0.COLVAR_unb.dat','COLVAR_unb.dat')
# os.chdir('/content')
os.chdir('/home/[redacted]/alphafold2rave_new/alphafold2rave/content')

t2 = time.perf_counter()
print(f'time taken to run:{(t2-t1)/60:.2f} mins')

ravefuncs.py:


def run_unbiased(on_gpu,plumedfile,dt,temp,freq,nstep,index):
  """
  Runs an unbiased simulation on the cluster center using openMM.
  The MD engine also uses plumed for on the fly calculations
  input : raw pdb from colabfold
  forcefields : amber03 and tip3p
  output : fixed_{index}.pdb, unb_{index}.pdb, COLVAR_unb
  """
  # Define use_plumed on both branches so it is never left unbound
  use_plumed = plumedfile != "None"
  
  outfreq = 0
  chkpt_freq=0
  save_chkpt_file=False
  
  print(f'We are at {os.getcwd()}')
  
  #fixing PDBs to avoid missing residue or terminal issues
  fix_pdb(index)
  pdb_fixed=f'fixed_{index}.pdb'
  print(pdb_fixed)
  #Get the structure and assign force field
  print(f'Loading PDB file')
  print(os.path.isfile(pdb_fixed))
  
  pdb = PDBFile(pdb_fixed) 
  print(f'Loading forcefield')
  forcefield = ForceField('amber03.xml', 'tip3p.xml')
  
  # Placing in a box and adding hydrogens, ions and water
  print(f'Loading modeller')
  modeller = Modeller(pdb.topology, pdb.positions)
  print(f'Adding stuff')
  modeller.addHydrogens(forcefield)
  print(f'Adding solvent')
  modeller.addSolvent(forcefield, padding=0.5*nanometers, model='tip3p', neutralize=True, positiveIon='Na+', negativeIon='Cl-')

The output is:
/home/[redacted]/alphafold2rave_new/alphafold2rave/content/unbiased
check 1
/home/[redacted]/alphafold2rave_new/alphafold2rave/content/unbiased
check2
We are at /home/[redacted]/alphafold2rave_new/alphafold2rave/content/unbiased/1
fixed_1.pdb
Loading PDB file
True
Loading forcefield
Loading modeller
Adding stuff
Adding solvent

@bodhivani
Contributor

This is really weird! We have run this multiple times on Colab without issues. To clarify, you are running this on your own system with OpenMM 7.5.1? The only other thing I can think of is a version compatibility issue. What is your OpenMM forcefield package version? I don't think it's a CUDA issue, because I don't think the Modeller actually uses the GPU.
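
A minimal sketch of how the installed versions could be collected for a report like this, using only the standard library. The distribution names below are assumptions (they depend on how the packages were installed, for example via pip or conda), and the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed distribution names; adjust them to match the local environment.
report = installed_versions(["openmm", "openmmforcefields", "pdbfixer"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in both the notebook kernel and a plain Python shell would also show whether the two are using the same environment.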

I think the next step would be to run this on your local machine or server using whatever MD package you know works on it. I (or my co-author Akash) can help you with this. Any MD engine should be fine as long as it is PLUMED-compatible.

The other option is to run this on Colab; while that will work fine for CSP, it will not be enough for your actual system of interest.

@ackbar03
Author

ackbar03 commented Dec 3, 2022

Hi,

It seems that I am able to run it directly from a .py file instead of from a Jupyter notebook.

I have no idea why. It might be because I had to rerun the cells out of order since the initial run stopped due to erroneous directories and file paths and I wanted to skip the AF2 inference stage.
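
If rerunning cells out of order left the kernel in an unexpected working directory, one possible mitigation is to scope every directory change so that it is always undone. This is only a sketch under that assumption, not a confirmed diagnosis of the slowdown; the helper name `working_directory` is hypothetical, and a temporary directory stands in for the real project layout.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it on exit."""
    previous = os.getcwd()
    os.makedirs(path, exist_ok=True)
    os.chdir(path)
    try:
        yield os.getcwd()
    finally:
        # Runs even if the body raises, so a failed cell cannot strand the kernel
        os.chdir(previous)


# Hypothetical usage mirroring the per-index loop in the notebook cell.
start = os.getcwd()
with tempfile.TemporaryDirectory() as root:
    for index in [1, 224]:
        run_dir = os.path.join(root, "unbiased", str(index))
        with working_directory(run_dir) as here:
            print(f"the simulation for index {index} would run in {here}")
```

With this pattern, a cell that fails halfway through leaves the working directory where it started, so rerunning it does not depend on which cells ran before.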

In any case, I am able to run the rest of the pipeline successfully. I'm not completely sure yet what it is I am running and have yet to look at it in detail, but thanks a lot for your prompt help!
