Commit
deploy: db47311

RandomDefaultUser committed Nov 26, 2024
1 parent 065f0b5 commit 61ae5e9
Showing 10 changed files with 259 additions and 41 deletions.
48 changes: 38 additions & 10 deletions _modules/mala/common/parameters.html
@@ -328,7 +328,6 @@ Source code for mala.common.parameters

    ----------
    nn_type : string
        Type of the neural network that will be used. Currently supported are

        - "feed_forward" (default)
        - "transformer"
        - "lstm"
@@ -382,12 +381,12 @@ Source code for mala.common.parameters

        self.layer_activations = ["Sigmoid"]
        self.loss_function_type = "mse"

        # for LSTM/Gru + Transformer
        self.num_hidden_layers = 1

        # for LSTM/Gru
        self.no_hidden_state = False
        self.bidirection = False

        # for LSTM/Gru + Transformer
        self.num_hidden_layers = 1

        # for transformer net
        self.dropout = 0.1
@@ -815,12 +814,15 @@ Source code for mala.common.parameters

        a "by snapshot" basis.

    checkpoints_each_epoch : int
        If not 0, checkpoint files will be saved after eac
        If not 0, checkpoint files will be saved after each
        checkpoints_each_epoch epoch.

    checkpoint_name : string
        Name used for the checkpoints. Using this, multiple runs
        can be performed in the same directory.

    run_name : string
        Name of the run used for logging.

<span class="sd"> logging_dir : string</span>
<span class="sd"> Name of the folder that logging files will be saved to.</span>
@@ -829,6 +831,34 @@ Source code for mala.common.parameters

        If True, then upon creating logging files, these will be saved
        in a subfolder of logging_dir labelled with the starting date
        of the logging, to avoid having to change input scripts often.

    logger : string
        Name of the logger to be used. Currently supported are:

        - "tensorboard": Tensorboard logger.
        - "wandb": Weights and Biases logger.

    validation_metrics : list
        List of metrics to be used for validation. Default is ["ldos"].
        Possible options are:

        - "ldos": MSE of the LDOS.
        - "band_energy": Band energy.
        - "band_energy_actual_fe": Band energy computed with ground truth Fermi energy.
        - "total_energy": Total energy.
        - "total_energy_actual_fe": Total energy computed with ground truth Fermi energy.
        - "fermi_energy": Fermi energy.
        - "density": Electron density.
        - "density_relative": Electron density (MAPE).
        - "dos": Density of states.
        - "dos_relative": Density of states (MAPE).

    validate_on_training_data : bool
        Whether to validate on the training data as well. Default is False.

    validate_every_n_epochs : int
        Determines how often validation is performed. Default is 1.

    inference_data_grid : list
        List holding the grid to be used for inference in the form of
@@ -843,19 +873,18 @@ Source code for mala.common.parameters

    profiler_range : list
        List with two entries determining with which batch/iteration number
        the CUDA profiler will start and stop profiling. Please note that
        this option only holds significance if the nsys profiler is used.
    """

<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="nb">super</span><span class="p">(</span><span class="n">ParametersRunning</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">optimizer</span> <span class="o">=</span> <span class="s2">&quot;Adam&quot;</span>
<span class="bp">self</span><span class="o">.</span><span class="n">learning_rate</span> <span class="o">=</span> <span class="mi">10</span> <span class="o">**</span> <span class="p">(</span><span class="o">-</span><span class="mi">5</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">learning_rate</span> <span class="o">=</span> <span class="mf">0.5</span>
<span class="bp">self</span><span class="o">.</span><span class="n">learning_rate_embedding</span> <span class="o">=</span> <span class="mi">10</span> <span class="o">**</span> <span class="p">(</span><span class="o">-</span><span class="mi">4</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">max_number_epochs</span> <span class="o">=</span> <span class="mi">100</span>
<span class="bp">self</span><span class="o">.</span><span class="n">verbosity</span> <span class="o">=</span> <span class="kc">True</span>
<span class="bp">self</span><span class="o">.</span><span class="n">mini_batch_size</span> <span class="o">=</span> <span class="mi">10</span>
<span class="bp">self</span><span class="o">.</span><span class="n">snapshots_per_epoch</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span>

<span class="bp">self</span><span class="o">.</span><span class="n">l1_regularization</span> <span class="o">=</span> <span class="mf">0.0</span>
<span class="bp">self</span><span class="o">.</span><span class="n">l2_regularization</span> <span class="o">=</span> <span class="mf">0.0</span>
@@ -874,7 +903,6 @@ Source code for mala.common.parameters

        self.num_workers = 0
        self.use_shuffling_for_samplers = True
        self.checkpoints_each_epoch = 0
        self.checkpoint_best_so_far = False
        self.checkpoint_name = "checkpoint_mala"
        self.run_name = ""
        self.logging_dir = "./mala_logging"
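Taken together, a checkpointing and logging setup using the fields shown in this diff might look like the following sketch; it assumes the usual "mala.Parameters" entry point with these attributes under "parameters.running", and all values are placeholders:

    import mala

    parameters = mala.Parameters()
    # Write a checkpoint every 5 epochs; a distinct name lets several
    # runs coexist in one directory.
    parameters.running.checkpoints_each_epoch = 5
    parameters.running.checkpoint_name = "checkpoint_run1"
    # Name under which this run appears in the logs.
    parameters.running.run_name = "ldos_ffn_test"
    parameters.running.logging_dir = "./mala_logging"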
61 changes: 52 additions & 9 deletions _sources/advanced_usage/trainingmodel.rst.txt
@@ -194,22 +194,64 @@ keyword, you can fine-tune the number of new snapshots being created.
By default, the same number of snapshots as had been provided will be created
(if possible).

Using tensorboard
******************

Training routines in MALA can be visualized via tensorboard, as also shown
in the file ``advanced/ex03_tensor_board``. Simply enable tensorboard
visualization prior to training via

.. code-block:: python

    # 0: No visualization, 1: loss and learning rate, 2: like 1,
    # but additionally weights and biases are saved
    parameters.running.logging = 1

Logging metrics during training
*******************************

Training progress in MALA can be visualized via tensorboard or wandb, as also shown
in the file ``advanced/ex03_tensor_board``. Simply select a logger prior to training as

.. code-block:: python

    parameters.running.logger = "tensorboard"
    parameters.running.logging_dir = "mala_vis"

or

.. code-block:: python

    import wandb
    wandb.init(
        project="mala_training",
        entity="your_wandb_entity"
    )
    parameters.running.logger = "wandb"
    parameters.running.logging_dir = "mala_vis"
where ``logging_dir`` specifies some directory in which to save the
MALA logging data. Afterwards, you can run the training without any
MALA logging data. You can also select which metrics to record via

.. code-block:: python

    parameters.running.validation_metrics = ["ldos", "dos", "density", "total_energy"]
Full list of available metrics:

- "ldos": MSE of the LDOS.
- "band_energy": Band energy.
- "band_energy_actual_fe": Band energy computed with ground truth Fermi energy.
- "total_energy": Total energy.
- "total_energy_actual_fe": Total energy computed with ground truth Fermi energy.
- "fermi_energy": Fermi energy.
- "density": Electron density.
- "density_relative": Electron density (Mean Absolute Percentage Error; see the sketch below).
- "dos": Density of states.
- "dos_relative": Density of states (Mean Absolute Percentage Error).

To save time and resources you can specify the logging interval via

.. code-block:: python

    parameters.running.validate_every_n_epochs = 10
If you want to monitor the degree to which the model overfits to the training data,
you can use the option

.. code-block:: python

    parameters.running.validate_on_training_data = True

MALA will evaluate the validation metrics on the training set as well as the validation set.
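Putting these options together, a complete logging configuration might look like the following sketch; parameter names are taken from the options above, shown under ``parameters.running`` on the assumption that all of these fields live on the running parameters, and the values are placeholders:

.. code-block:: python

    parameters.running.logger = "tensorboard"
    parameters.running.logging_dir = "mala_vis"
    # Record a subset of the available metrics.
    parameters.running.validation_metrics = ["ldos", "band_energy"]
    # Validate only every 10th epoch, on training and validation data.
    parameters.running.validate_every_n_epochs = 10
    parameters.running.validate_on_training_data = True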

Afterwards, you can run the training without any
other modifications. Once training is finished (or during training, in case
you want to use tensorboard to monitor progress), you can launch tensorboard
via
@@ -221,6 +263,7 @@
The full path for ``path_to_log_directory`` can be accessed via
``trainer.full_logging_path``.
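For example, one way to launch tensorboard from a script; a sketch, assuming a trained ``mala.Trainer`` instance named ``trainer`` and an installed tensorboard:

.. code-block:: python

    import subprocess

    # Point tensorboard at the directory MALA logged to for this run.
    subprocess.run(["tensorboard", "--logdir", trainer.full_logging_path])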

If you're using wandb, you can monitor the training progress on the wandb website.

Training in parallel
********************
Expand Down
65 changes: 55 additions & 10 deletions advanced_usage/trainingmodel.html
@@ -59,7 +59,7 @@
<li class="toctree-l3"><a class="reference internal" href="#advanced-training-metrics">Advanced training metrics</a></li>
<li class="toctree-l3"><a class="reference internal" href="#checkpointing-a-training-run">Checkpointing a training run</a></li>
<li class="toctree-l3"><a class="reference internal" href="#using-lazy-loading">Using lazy loading</a></li>
<li class="toctree-l3"><a class="reference internal" href="#using-tensorboard">Using tensorboard</a></li>
<li class="toctree-l3"><a class="reference internal" href="#logging-metrics-during-training">Logging metrics during training</a></li>
<li class="toctree-l3"><a class="reference internal" href="#training-in-parallel">Training in parallel</a></li>
</ul>
</li>
@@ -280,21 +280,65 @@ Using lazy loading

By default, the same number of snapshots as had been provided will be created
(if possible).
Using tensorboard

Training routines in MALA can be visualized via tensorboard, as also shown
in the file advanced/ex03_tensor_board. Simply enable tensorboard
visualization prior to training via

    # 0: No visualization, 1: loss and learning rate, 2: like 1,
    # but additionally weights and biases are saved
    parameters.running.logging = 1

Logging metrics during training

Training progress in MALA can be visualized via tensorboard or wandb, as also shown
in the file advanced/ex03_tensor_board. Simply select a logger prior to training as

    parameters.running.logger = "tensorboard"
    parameters.running.logging_dir = "mala_vis"

or

    import wandb
    wandb.init(
        project="mala_training",
        entity="your_wandb_entity"
    )
    parameters.running.logger = "wandb"
    parameters.running.logging_dir = "mala_vis"

where logging_dir specifies some directory in which to save the
MALA logging data. Afterwards, you can run the training without any
MALA logging data. You can also select which metrics to record via
    parameters.running.validation_metrics = ["ldos", "dos", "density", "total_energy"]
<dl class="simple">
<dt>Full list of available metrics:</dt><dd><ul class="simple">
<li><p>“ldos”: MSE of the LDOS.</p></li>
<li><p>“band_energy”: Band energy.</p></li>
<li><p>“band_energy_actual_fe”: Band energy computed with ground truth Fermi energy.</p></li>
<li><p>“total_energy”: Total energy.</p></li>
<li><p>“total_energy_actual_fe”: Total energy computed with ground truth Fermi energy.</p></li>
<li><p>“fermi_energy”: Fermi energy.</p></li>
<li><p>“density”: Electron density.</p></li>
<li><p>“density_relative”: Rlectron density (Mean Absolute Percentage Error).</p></li>
<li><p>“dos”: Density of states.</p></li>
<li><p>“dos_relative”: Density of states (Mean Absolute Percentage Error).</p></li>
</ul>
</dd>
</dl>
To save time and resources you can specify the logging interval via

    parameters.running.validate_every_n_epochs = 10
If you want to monitor the degree to which the model overfits to the training data,
you can use the option

    parameters.running.validate_on_training_data = True

MALA will evaluate the validation metrics on the training set as well as the validation set.
Afterwards, you can run the training without any
other modifications. Once training is finished (or during training, in case
you want to use tensorboard to monitor progress), you can launch tensorboard
via
@@ -305,6 +349,7 @@

The full path for path_to_log_directory can be accessed via
trainer.full_logging_path.

If you're using wandb, you can monitor the training progress on the wandb website.
<section id="training-in-parallel">
<h2>Training in parallel<a class="headerlink" href="#training-in-parallel" title="Link to this heading"></a></h2>
Expand Down