
Ram fix #134

Closed
wants to merge 12 commits into from

Collaborator

@hoelzerC hoelzerC commented Jul 18, 2023

The following changes have been made:

  • Refactor SCF logic into semi-pure functions
  • Detach SCF initial guess from autograd graph

With these changes, the provided RAM tests pass and the existing memory leak is counteracted.

Please review the changes and provide feedback.
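
As a rough illustration of what a RAM test for a leak of this kind could look like, the sketch below uses only the standard library's tracemalloc. The helper `memory_growth`, the two toy workloads, and the repeat count are hypothetical stand-ins and are not taken from the test suite of this repository. The stateful variant keeps every result alive, while the pure variant releases its buffer when the call returns.

```python
import gc
import tracemalloc


def memory_growth(func, repeats=50):
    """Return the net growth in traced memory (bytes) over repeated calls."""
    gc.collect()
    tracemalloc.start()
    func()  # warm-up call so one-time allocations are not counted
    gc.collect()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(repeats):
        func()
    gc.collect()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


_history = []  # module-level state that keeps results alive (the "leak")


def leaky_step():
    # Every call appends a new buffer that is never released.
    _history.append([0.0] * 10_000)


def pure_step():
    # Same amount of work, but the buffer is local and freed on return.
    buffer = [0.0] * 10_000
    return sum(buffer)


leak = memory_growth(leaky_step)
clean = memory_growth(pure_step)
print(f"stateful: {leak} bytes, pure: {clean} bytes")
```

A test would then assert that the growth of the refactored code path stays below a chosen threshold.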

Review thread on src/dxtb/scf/iterator.py (outdated, resolved)
@hoelzerC added the autograd label (Related to PyTorch's autograd engine) on Jul 19, 2023
@hoelzerC
Collaborator Author

With the updated test suite, all tests pass; only minor tolerance adjustments were required. In particular, the tests for positional gradients run unchanged.

Hence, ready to merge.

@marvinfriede
Member

> With the updated test suite, all tests pass; only minor tolerance adjustments were required. In particular, the tests for positional gradients run unchanged.
>
> Hence, ready to merge.

I will take a look at it.

@hoelzerC
Collaborator Author

Fixed the test failures caused by the update.

@hoelzerC
Collaborator Author

hoelzerC commented Jul 20, 2023

Logs for the tests whose tolerances had to be adapted.

For each test, the first block is from the previous implementation and the second block is from the new one.

Output
test/test_scf/test_elements.py::test_element[dtype0-25] 
iter  energy                   energy change  P norm change   charge change  
-----------------------------------------------------------------------------
  1   -1.7577071189880371E+00   1.757707E+00  3.130482E+00    2.190885E+00
  2    3.3726801872253418E+00   5.130387E+00  4.704489E+00    4.704489E+00
  3   -1.7577036619186401E+00   5.130384E+00  4.704491E+00    4.704491E+00
  4   -2.0400154590606689E+00   2.823118E-01  1.909379E+00    1.909378E+00
  5   -1.7704943418502808E+00   2.695211E-01  1.869487E+00    1.869487E+00
  6   -1.8900899887084961E+00   1.195956E-01  4.315020E-01    4.315020E-01
  7   -2.0247464179992676E+00   1.346564E-01  1.717674E+00    1.717674E+00
  8   -2.0258700847625732E+00   1.123667E-03  1.419320E-02    1.419320E-02
  9   -2.0246031284332275E+00   1.266956E-03  1.600673E-02    1.600673E-02
 10    3.3721199035644531E+00   5.396723E+00  3.651462E+00    3.651462E+00
 11   -2.0246026515960693E+00   5.396723E+00  3.651461E+00    3.651461E+00
 12   -2.0246074199676514E+00   4.768372E-06  6.548779E-05    6.548779E-05
 13   -2.0244750976562500E+00   1.323223E-04  6.203901E-04    6.203901E-04
 14    3.3247368335723877E+00   5.349212E+00  3.633554E+00    3.633554E+00
 15   -2.0245735645294189E+00   5.349310E+00  3.634022E+00    3.634022E+00
 16   -2.0245983600616455E+00   2.479553E-05  1.276079E-04    1.276079E-04
 17   -1.7576973438262939E+00   2.669010E-01  2.190892E+00    2.190892E+00
 18   -1.7820525169372559E+00   2.435517E-02  7.675599E-02    7.675594E-02
 19    3.8559269905090332E+00   5.637980E+00  4.732750E+00    4.732750E+00
 20    3.3720984458923340E+00   4.838285E-01  1.115793E+00    1.115793E+00
 21   -1.7577024698257446E+00   5.129801E+00  4.704573E+00    4.704573E+00
-----------------------------------------------------------------------------
**********************Energy**********************

Contribution                 Energy in a.u.    
--------------------------------------------------
DispersionD3                 0.0000000000000000
Repulsion                    0.0000000000000000
Halogen                      0.0000000000000000
Electronic free energy      -0.0071254242211580
Electronic Energy (SCF)     -2.0400154590606689
--------------------------------------------------
Total Energy                -2.0471408367156982


************************Timings************************

Objective                 Time in s          Time in %   
-------------------------------------------------------
setup calculator          0.003                 0.91   
Overlap                   0.003                 0.74   
Core Hamiltonian          0.002                 0.60   
SCF                       0.122                35.21   
DispersionD3              0.215                62.13   
Repulsion                 0.001                 0.25   
Halogen                   0.000                 0.05   
-------------------------------------------------------
sum                       0.345                99.89  
total                     0.346               100.00  

test/test_scf/test_elements.py::test_element[dtype0-25] 
iter  energy                   energy change  P norm change   charge change  
-----------------------------------------------------------------------------
  1   -1.7576905488967896E+00   1.757691E+00  3.605551E+00    2.828427E+00
  2    3.3720154762268066E+00   5.129706E+00  5.099020E+00    5.099020E+00
  3   -1.7576905488967896E+00   5.129706E+00  5.099020E+00    5.099020E+00
  4   -2.0246009826660156E+00   2.669104E-01  2.828427E+00    2.828427E+00
  5   -1.7576905488967896E+00   2.669104E-01  2.828427E+00    2.828427E+00
  6   -1.7576905488967896E+00   0.000000E+00  0.000000E+00    0.000000E+00
  7    3.8731698989868164E+00   5.630860E+00  5.099020E+00    5.099020E+00
  8    3.3720154762268066E+00   5.011544E-01  1.414214E+00    1.414214E+00
  9   -1.7576905488967896E+00   5.129706E+00  5.099020E+00    5.099020E+00
 10   -1.7576905488967896E+00   0.000000E+00  0.000000E+00    0.000000E+00
 11    3.8731698989868164E+00   5.630860E+00  5.099020E+00    5.099020E+00
 12   -1.7576905488967896E+00   5.630860E+00  5.099020E+00    5.099020E+00
 13   -1.7576907873153687E+00   2.384186E-07  1.414214E+00    1.414214E+00
 14    3.8731698989868164E+00   5.630861E+00  5.099020E+00    5.099020E+00
 15   -2.0246009826660156E+00   5.897771E+00  4.690416E+00    4.690416E+00
 16   -2.0246009826660156E+00   0.000000E+00  1.414214E+00    1.414214E+00
 17   -2.0246009826660156E+00   0.000000E+00  0.000000E+00    0.000000E+00
 18    3.3720154762268066E+00   5.396616E+00  4.242640E+00    4.242640E+00
 19   -1.7576907873153687E+00   5.129706E+00  5.099020E+00    5.099020E+00
 20   -2.0246009826660156E+00   2.669102E-01  2.828427E+00    2.828427E+00
 21   -2.0246009826660156E+00   0.000000E+00  2.449490E+00    2.449490E+00
-----------------------------------------------------------------------------
**********************Energy**********************

Contribution                 Energy in a.u.    
--------------------------------------------------
Halogen                      0.0000000000000000
Repulsion                    0.0000000000000000
DispersionD3                 0.0000000000000000
Electronic free energy      -0.0000000326171552
Electronic Energy (SCF)     -2.0246009826660156
--------------------------------------------------
Total Energy                -2.0246009826660156


************************Timings************************

Objective                 Time in s          Time in %   
-------------------------------------------------------
setup calculator          0.003                 0.92   
Overlap                   0.002                 0.75   
Core Hamiltonian          0.002                 0.55   
SCF                       0.115                34.51   
Halogen                   0.000                 0.06   
Repulsion                 0.001                 0.22   
DispersionD3              0.210                62.88   
-------------------------------------------------------
sum                       0.334                99.88  
total                     0.335               100.00  





test/test_scf/test_elements.py::test_element[dtype0-42] 
iter  energy                   energy change  P norm change   charge change  
-----------------------------------------------------------------------------
  1   -1.7483208179473877E+00   1.748321E+00  2.683266E+00    2.190884E+00
  2   -1.7639018297195435E+00   1.558101E-02  2.186955E+00    2.186955E+00
  3   -1.7604176998138428E+00   3.484130E-03  1.736691E+00    1.736691E+00
  4   -1.7676620483398438E+00   7.244349E-03  1.482926E+00    1.482926E+00
  5   -1.7707145214080811E+00   3.052473E-03  4.330818E-01    4.330818E-01
  6   -1.7707682847976685E+00   5.376339E-05  1.470735E-01    1.470735E-01
  7   -1.7707974910736084E+00   2.920628E-05  2.863076E-02    2.863076E-02
  8   -1.7708067893981934E+00   9.298325E-06  7.955647E-04    7.955647E-04
  9   -1.7708148956298828E+00   8.106232E-06  3.638642E-04    3.638643E-04
 10   -1.7708051204681396E+00   9.775162E-06  2.920628E-05    2.920628E-05
 11   -1.7708200216293335E+00   1.490116E-05  2.168389E-04    2.168389E-04
 12   -1.7708151340484619E+00   4.887581E-06  1.466274E-05    1.466274E-05
 13   -1.7708052396774292E+00   9.894371E-06  2.920628E-05    2.920628E-05
 14   -1.7708101272583008E+00   4.887581E-06  1.454353E-05    1.454353E-05
 15   -1.7708052396774292E+00   4.887581E-06  1.454353E-05    1.454353E-05
 16   -1.7708101272583008E+00   4.887581E-06  1.454353E-05    1.454353E-05
 17   -1.7708052396774292E+00   4.887581E-06  1.454353E-05    1.454353E-05
 18   -1.7708101272583008E+00   4.887581E-06  1.454353E-05    1.454353E-05
 19   -1.7708101272583008E+00   0.000000E+00  0.000000E+00    0.000000E+00
 20   -1.7708052396774292E+00   4.887581E-06  1.454353E-05    1.454353E-05
 21   -1.7708052396774292E+00   0.000000E+00  0.000000E+00    0.000000E+00
-----------------------------------------------------------------------------
**********************Energy**********************

Contribution                 Energy in a.u.    
--------------------------------------------------
Repulsion                    0.0000000000000000
DispersionD3                 0.0000000000000000
Halogen                      0.0000000000000000
Electronic free energy      -0.0078231580555439
Electronic Energy (SCF)     -1.7708052396774292
--------------------------------------------------
Total Energy                -1.7786283493041992








test/test_scf/test_elements.py::test_element[dtype0-42] 
iter  energy                   energy change  P norm change   charge change  
-----------------------------------------------------------------------------
  1   -1.7483351230621338E+00   1.748335E+00  3.464102E+00    3.098387E+00
  2   -1.7638363838195801E+00   1.550126E-02  2.828427E+00    2.828427E+00
  3   -1.7483351230621338E+00   1.550126E-02  2.828427E+00    2.828427E+00
  4   -1.7638363838195801E+00   1.550126E-02  2.828427E+00    2.828427E+00
  5   -1.7638363838195801E+00   0.000000E+00  0.000000E+00    0.000000E+00
  6   -1.7483351230621338E+00   1.550126E-02  2.828427E+00    2.828427E+00
  7   -1.7483351230621338E+00   0.000000E+00  0.000000E+00    0.000000E+00
  8   -1.7638363838195801E+00   1.550126E-02  2.828427E+00    2.828427E+00
  9   -1.7638363838195801E+00   0.000000E+00  0.000000E+00    0.000000E+00
 10   -1.7483351230621338E+00   1.550126E-02  2.828427E+00    2.828427E+00
 11   -1.7638363838195801E+00   1.550126E-02  4.000000E+00    4.000000E+00
 12   -1.7483351230621338E+00   1.550126E-02  4.000000E+00    4.000000E+00
 13   -1.7483351230621338E+00   0.000000E+00  2.828427E+00    2.828427E+00
 14   -1.7638363838195801E+00   1.550126E-02  4.000000E+00    4.000000E+00
 15   -1.7638363838195801E+00   0.000000E+00  4.000000E+00    4.000000E+00
 16   -1.7483351230621338E+00   1.550126E-02  2.828427E+00    2.828427E+00
 17   -1.7638363838195801E+00   1.550126E-02  2.828427E+00    2.828427E+00
 18   -1.7483351230621338E+00   1.550126E-02  4.000000E+00    4.000000E+00
 19   -1.7483351230621338E+00   0.000000E+00  0.000000E+00    0.000000E+00
 20   -1.7638363838195801E+00   1.550126E-02  2.828427E+00    2.828427E+00
 21   -1.7638363838195801E+00   0.000000E+00  2.828427E+00    2.828427E+00
-----------------------------------------------------------------------------
**********************Energy**********************

Contribution                 Energy in a.u.    
--------------------------------------------------
Repulsion                    0.0000000000000000
Halogen                      0.0000000000000000
DispersionD3                 0.0000000000000000
Electronic free energy      -0.0000000326171552
Electronic Energy (SCF)     -1.7638363838195801
--------------------------------------------------
Total Energy                -1.7638363838195801


************************Timings************************

Objective                 Time in s          Time in %   
-------------------------------------------------------
setup calculator          0.003                 0.93   
Overlap                   0.003                 0.78   
Core Hamiltonian          0.002                 0.57   
SCF                       0.115                34.78   
Repulsion                 0.001                 0.29   
Halogen                   0.000                 0.06   
DispersionD3              0.206                62.45   
-------------------------------------------------------
sum                       0.330                99.86  
total                     0.331               100.00 



test/test_scf/test_elements.py::test_element_anion[dtype0-75] 
iter  energy                   energy change  P norm change   charge change  
-----------------------------------------------------------------------------
  1   -1.9723117351531982E+00   1.972312E+00  3.577675E+00    2.408270E+00
  2   -1.4416183233261108E+00   5.306934E-01  4.851630E+00    4.851630E+00
  3   -1.9723764657974243E+00   5.307581E-01  4.849980E+00    4.849980E+00
  4   -1.9737358093261719E+00   1.359344E-03  3.178834E-02    3.178838E-02
  5   -1.3477317094802856E+00   6.260041E-01  5.325831E+00    5.325831E+00
  6   -1.9365401268005371E+00   5.888084E-01  3.454920E+00    3.454920E+00
  7   -1.9725799560546875E+00   3.603983E-02  2.061879E+00    2.061879E+00
  8   -1.9769752025604248E+00   4.395247E-03  9.465238E-02    9.465239E-02
  9   -1.4092333316802979E+00   5.677419E-01  4.916202E+00    4.916202E+00
 10   -1.9920414686203003E+00   5.828081E-01  4.097344E+00    4.097344E+00
 11   -1.9928549528121948E+00   8.134842E-04  1.428874E-01    1.428874E-01
 12   -1.9927724599838257E+00   8.249283E-05  3.805069E-02    3.805069E-02
 13   -1.9927732944488525E+00   8.344650E-07  4.760911E-04    4.760929E-04
 14   -1.9928119182586670E+00   3.862381E-05  1.249174E-02    1.249174E-02
 15   -1.9927893877029419E+00   2.253056E-05  8.196258E-03    8.196261E-03
 16   -1.9927859306335449E+00   3.457069E-06  3.738760E-04    3.738748E-04
 17   -1.9927880764007568E+00   2.145767E-06  1.732923E-04    1.732916E-04
 18   -1.9927861690521240E+00   1.907349E-06  4.248439E-05    4.248302E-05
 19   -1.9927885532379150E+00   2.384186E-06  5.208785E-04    5.208699E-04
 20   -1.9927865266799927E+00   2.026558E-06  4.485868E-04    4.485798E-04
 21   -1.9927865266799927E+00   0.000000E+00  0.000000E+00    0.000000E+00
-----------------------------------------------------------------------------
**********************Energy**********************

Contribution                 Energy in a.u.    
--------------------------------------------------
Halogen                      0.0000000000000000
Repulsion                    0.0000000000000000
DispersionD3                 0.0000000000000000
Electronic free energy      -0.0090627735480666
Electronic Energy (SCF)     -1.9927861690521240
--------------------------------------------------
Total Energy                -2.0018489360809326


************************Timings************************

Objective                 Time in s          Time in %   
-------------------------------------------------------
setup calculator          0.003                 0.84   
Overlap                   0.003                 0.73   
Core Hamiltonian          0.002                 0.54   
SCF                       0.134                37.55   
Halogen                   0.000                 0.05   
Repulsion                 0.001                 0.21   
DispersionD3              0.215                59.99   
-------------------------------------------------------
sum                       0.358                99.90  
total                     0.358               100.00 



test/test_scf/test_elements.py::test_element_anion[dtype0-75] 
iter  energy                   energy change  P norm change   charge change  
-----------------------------------------------------------------------------
  1   -1.9723020792007446E+00   1.972302E+00  4.000000E+00    3.000000E+00
  2   -1.3463304042816162E+00   6.259717E-01  5.656854E+00    5.656854E+00
  3   -1.9723020792007446E+00   6.259717E-01  5.656854E+00    5.656854E+00
  4   -1.9723020792007446E+00   0.000000E+00  0.000000E+00    0.000000E+00
  5   -1.3463304042816162E+00   6.259717E-01  5.656854E+00    5.656854E+00
  6   -1.5578867197036743E+00   2.115563E-01  2.828427E+00    2.828427E+00
  7   -1.9723020792007446E+00   4.144154E-01  4.898980E+00    4.898980E+00
  8   -1.9723020792007446E+00   0.000000E+00  0.000000E+00    0.000000E+00
  9   -1.3463304042816162E+00   6.259717E-01  5.656854E+00    5.656854E+00
 10   -1.9723020792007446E+00   6.259717E-01  5.656854E+00    5.656854E+00
 11   -1.9723020792007446E+00   0.000000E+00  0.000000E+00    0.000000E+00
 12   -1.3463304042816162E+00   6.259717E-01  5.656854E+00    5.656854E+00
 13   -1.9723020792007446E+00   6.259717E-01  5.656854E+00    5.656854E+00
 14   -1.5578867197036743E+00   4.144154E-01  4.898980E+00    4.898980E+00
 15   -1.9723020792007446E+00   4.144154E-01  4.898980E+00    4.898980E+00
 16   -1.9723020792007446E+00   0.000000E+00  0.000000E+00    0.000000E+00
 17   -1.3463304042816162E+00   6.259717E-01  5.656854E+00    5.656854E+00
 18   -1.3463304042816162E+00   0.000000E+00  0.000000E+00    0.000000E+00
 19   -1.9723020792007446E+00   6.259717E-01  5.656854E+00    5.656854E+00
 20   -1.9723020792007446E+00   0.000000E+00  0.000000E+00    0.000000E+00
 21   -1.3463304042816162E+00   6.259717E-01  5.656854E+00    5.656854E+00
-----------------------------------------------------------------------------
**********************Energy**********************

Contribution                 Energy in a.u.    
--------------------------------------------------
Repulsion                    0.0000000000000000
DispersionD3                 0.0000000000000000
Halogen                      0.0000000000000000
Electronic free energy      -0.0000000326171552
Electronic Energy (SCF)     -1.9723020792007446
--------------------------------------------------
Total Energy                -1.9723020792007446


************************Timings************************

Objective                 Time in s          Time in %   
-------------------------------------------------------
setup calculator          0.003                 0.86   
Overlap                   0.003                 0.73   
Core Hamiltonian          0.002                 0.54   
SCF                       0.119                34.46   
Repulsion                 0.001                 0.26   
DispersionD3              0.218                62.99   
Halogen                   0.000                 0.05   
-------------------------------------------------------
sum                       0.346                99.89  
total                     0.347               100.00 



test/test_scf/test_elements.py::test_element_batch[dtype0-SiH4-25]      
     0: |dx|=1.035e+00, |f|=2.069e+00
     1: |dx|=7.874e-01, |f|=7.904e+00
     2: |dx|=1.420e-01, |f|=1.821e+00
     3: |dx|=7.621e-02, |f|=6.642e-01
     4: |dx|=2.879e-02, |f|=4.294e-01
     5: |dx|=6.634e-02, |f|=3.048e-01
     6: |dx|=5.332e-02, |f|=1.705e+00
     7: |dx|=1.674e-01, |f|=1.310e+00
     8: |dx|=1.311e-01, |f|=6.123e+00
     9: |dx|=1.325e-02, |f|=4.248e-01
    10: |dx|=1.917e-02, |f|=3.282e-01


test/test_scf/test_elements.py::test_element_batch[dtype0-SiH4-25]      
     0: |dx|=1.035e+00, |f|=2.069e+00
     1: |dx|=7.874e-01, |f|=7.904e+00
     2: |dx|=1.421e-01, |f|=1.822e+00
     3: |dx|=1.375e+00, |f|=1.687e+00
     4: |dx|=1.230e+00, |f|=2.589e+01
     5: |dx|=1.033e+00, |f|=7.433e+00
     6: |dx|=3.671e-01, |f|=2.104e+00
     7: |dx|=8.488e-01, |f|=1.952e+00
     8: |dx|=5.816e-01, |f|=7.673e+00
     9: |dx|=1.183e-01, |f|=1.887e+00
    10: |dx|=4.813e-01, |f|=1.849e+00
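
To make the difference between the two runs easier to see, here is a small sketch that checks whether the energy has settled over the last iterations. The energies are copied from iterations 17 to 21 of the two test_element[dtype0-42] logs above. The helper names and the threshold of 1e-5 are hypothetical and are not the convergence criteria used by the SCF code.

```python
def energy_changes(energies):
    """Absolute energy difference between consecutive SCF iterations."""
    return [abs(b - a) for a, b in zip(energies, energies[1:])]


def has_settled(energies, tol=1e-5):
    """True if every consecutive change in the given window is below tol."""
    return all(change < tol for change in energy_changes(energies))


# Energies (a.u.) of iterations 17-21, previous implementation.
previous_tail = [
    -1.7708052396774292,
    -1.7708101272583008,
    -1.7708101272583008,
    -1.7708052396774292,
    -1.7708052396774292,
]

# Energies (a.u.) of iterations 17-21, new implementation.
new_tail = [
    -1.7638363838195801,
    -1.7483351230621338,
    -1.7483351230621338,
    -1.7638363838195801,
    -1.7638363838195801,
]

print(has_settled(previous_tail))  # True
print(has_settled(new_tail))       # False
```

In this window the previous run fluctuates by about 5e-6 a.u., whereas the new run alternates between two values that are about 1.6e-2 a.u. apart.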

@hoelzerC
Collaborator Author

hoelzerC commented Jul 20, 2023

It seems that the electronic free energy (-0.0000000326171552) is identical for all isolated atoms.

This value should come from get_electronic_free_energy, which is applied after the SCF iterations. Hence, only different values in the data object could be the reason for that.

@marvinfriede
Member

> It seems that the electronic free energy (-0.0000000326171552) is identical for all isolated atoms.

I don't quite understand. If I look at the electronic and the electronic free energy, they are both different.

> This value should come from get_electronic_free_energy, which is applied after the SCF iterations. Hence, only different values in the data object could be the reason for that.

Doesn't that mean that some settings are not passed down properly anymore?

@marvinfriede
Member

> Fixed the test failures caused by the update.

I reverted them because these are intended changes (as discussed yesterday). We required pydantic>=2.0.0. Please do a git pull --rebase, since I force-pushed.

@hoelzerC
Collaborator Author

hoelzerC commented Jul 21, 2023

> It seems that the electronic free energy (-0.0000000326171552) is identical for all isolated atoms.

Added a test that checks that the electronic energy is unique for each isolated element. Apparently, different energies are calculated.

@hoelzerC
Collaborator Author

Easy fix: the Fermi energy was not correctly added to the _Data object. I included a test that compares the free energies of the old and new implementations for all elements:

https://github.com/grimme-lab/xtbML/blob/8d3e73c8e89977dac6130a2b3217f75f50bca977/test/test_scf/test_energy.py#L35
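
For context, a minimal sketch of why a wrong or missing Fermi level shows up in the electronic free energy. It uses the textbook Fermi-Dirac entropy term, -TS = kT * sum(f ln f + (1 - f) ln(1 - f)), and is an assumption-laden illustration, not the code of get_electronic_free_energy. The function names, the toy orbital energies, and the temperature of 300 K are hypothetical.

```python
import math

KB_HARTREE_PER_KELVIN = 3.166811563e-6  # Boltzmann constant in Hartree/K


def fermi_occupations(orbital_energies, fermi_level, kt):
    """Fermi-Dirac occupation (0..1 per spin orbital) for each orbital energy."""
    return [1.0 / (1.0 + math.exp((e - fermi_level) / kt)) for e in orbital_energies]


def electronic_free_energy(occupations, kt):
    """Entropy term -T*S = kT * sum(f ln f + (1 - f) ln(1 - f)); never positive."""
    total = 0.0
    for f in occupations:
        for p in (f, 1.0 - f):
            if p > 0.0:  # p * ln(p) tends to 0 as p tends to 0
                total += p * math.log(p)
    return kt * total


kt = KB_HARTREE_PER_KELVIN * 300.0  # hypothetical electronic temperature
levels = [-0.002, 0.0, 0.002]       # toy orbital energies in Hartree

g_a = electronic_free_energy(fermi_occupations(levels, 0.0, kt), kt)
g_b = electronic_free_energy(fermi_occupations(levels, 0.001, kt), kt)
g_int = electronic_free_energy([1.0, 1.0, 0.0], kt)  # integer occupations

print(g_a, g_b, g_int)
```

In this toy model the result changes with the Fermi level and vanishes for integer occupations, so a value that is the same for every element is a hint that an input such as the Fermi level is not reaching the function.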

@marvinfriede previously approved these changes on Jul 24, 2023
@marvinfriede dismissed their stale review on July 24, 2023, 06:54:

The Hessian test (test/test_scf/test_hess.py) still fails with the old tolerances.

# The initial guess is an "arbitrary" tensor, and hence not part of AD computational graph.
# NOTE: This leads to not entering xitorch._RootFinder.backward() at all during a
# loss.backward() call. However, then the position tensor does receive gradient.
guess = guess.detach()
Member

Maybe this causes the Hessian test to fail, since it discards the gradient of the EEQ guess?
Besides, if I comment this out, the test also throws an error: AttributeError: '_Data' object has no attribute 'occupation'

Collaborator Author

If you comment out guess.detach(), you can remedy the attribute error by also commenting out https://github.com/grimme-lab/xtbML/blob/8af902d5a50c9ad054f047e1ffbbbbb69491e2dc/src/dxtb/scf/data.py#L183

This leads to the behavior described in the code comment, i.e., no Tensor enters xitorch._RootFinder.backward(). This is expected, as there is currently no support for using autograd (AG) w.r.t. positions in the implicit SCF implementation.
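
The usual argument for detaching an initial guess can be sketched with a scalar toy problem. For a converged fixed point x* = g(x*, theta), both x* and its derivative from the implicit function theorem depend only on theta, not on where the iteration started. The mapping `g` and all names below are hypothetical; this is not the xitorch implementation, and the argument only holds to the extent that the iteration actually converges, so it does not settle whether the Hessian test failure is related.

```python
import math


def g(x, theta):
    """Toy contraction mapping standing in for one SCF step (hypothetical)."""
    return 0.5 * math.cos(x) + theta


def solve_fixed_point(theta, guess, tol=1e-14, max_iter=500):
    """Iterate x <- g(x, theta) until the update is smaller than tol."""
    x = guess
    for _ in range(max_iter):
        x_new = g(x, theta)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")


def implicit_derivative(x_star):
    """dx*/dtheta from the implicit function theorem.

    Differentiating x* = g(x*, theta) gives
    dx*/dtheta = (dg/dtheta) / (1 - dg/dx) = 1 / (1 + 0.5 * sin(x*)).
    """
    return 1.0 / (1.0 + 0.5 * math.sin(x_star))


theta = 0.3
x_a = solve_fixed_point(theta, guess=0.0)
x_b = solve_fixed_point(theta, guess=3.0)

# Central finite difference as an independent check of the implicit derivative.
h = 1e-6
fd = (solve_fixed_point(theta + h, 0.0) - solve_fixed_point(theta - h, 0.0)) / (2 * h)

print(x_a, x_b)
print(implicit_derivative(x_a), fd)
```

Both guesses reach the same fixed point, and the implicit derivative agrees with the finite-difference estimate without ever differentiating through the guess.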

@marvinfriede
Member

Manually merged into #132.

@marvinfriede deleted the RAM-fix branch on April 23, 2024, 07:37