Skip to content

Commit

Permalink
Docs: Debugging Workflow (#1967)
Browse files Browse the repository at this point in the history
* Docs: Debugging Workflow

Start a debugging workflow section.
Intentionally placed in the user-documentation to explain a
systematic way to check output files.

* Fix typos & improve details
* Fix wording about CUDA_LAUNCH_BLOCKING

Co-authored-by: Edoardo Zoni <[email protected]>
Co-authored-by: Andrew Myers <[email protected]>
  • Loading branch information
3 people authored May 19, 2021
1 parent 7a1a10f commit b6b0aa3
Show file tree
Hide file tree
Showing 2 changed files with 51 additions and 0 deletions.
1 change: 1 addition & 0 deletions Docs/source/usage/workflows.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,5 +10,6 @@ This section collects typical user workflows and best practices for WarpX.

workflows/parallelization
workflows/profiling
workflows/debugging
workflows/libensemble
workflows/plot_distribution_mapping
50 changes: 50 additions & 0 deletions Docs/source/usage/workflows/debugging.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
.. _debugging_warpx:

Debugging the code
==================

Sometimes, the code does not give you the result that you are expecting.
This can be due to a variety of reasons, from misunderstandings or changes in the :ref:`input parameters <running-cpp-parameters>`, system specific quirks, or bugs.
You might also want to debug your code as you implement new features in WarpX during development.

This section gives a step-by-step guidance on how to systematically check what might be going wrong.

Debugging Workflow
------------------

Try the following steps to debug a simulation:

#. Check the output text file, usually called ``output.txt``: are there warnings or errors present?
#. On an HPC system, look for the job output and error files, usually called ``WarpX.e...`` and ``WarpX.o...``.
Read long messages from the top and follow potential guidance.
#. If your simulation already created output data files:
Check if they look reasonable before the problem occurred; are the initial conditions of the simulation as you expected?
Do you spot numerical artifacts or instabilities that could point to missing resolution or unexpected/incompatible numerical parameters?
#. Did the job output files indicate a crash? Check the ``Backtrace.<mpirank>`` files for the location of the code that triggered the crash.
Backtraces are read from bottom (high-level) to top (most specific line that crashed).
#. In case of a crash, Backtraces can be more detailed if you :ref:`re-compile <install-developers>` with debug flags: for example, try compiling with ``-DCMAKE_BUILD_TYPE=Debug`` (this will make the simulation slower) and rerun.
#. If debug builds are too costly, try instead compiling with ``-DAMReX_ASSERTIONS=ON`` to activate more checks and rerun.
#. If the problem looks like a memory violation, this could be from an invalid field or particle index access.
Try compiling with ``-DAMReX_BOUND_CHECK=ON`` (this will make the simulation very slow), and rerun.
#. If the problem looks like a random memory might be used, try initializing memory with signaling Not-a-Number (NaN) values through the runtime option ``fab.init_snan = 1``.
Further useful runtime options are ``amrex.fpe_trap_invalid``, ``amrex.fpe_trap_zero`` and ``amrex.fpe_trap_overflow`` (see details in the AMReX link below).
#. On Nvidia GPUs, if you suspect the problem might be a race condition due to a missing host / device synchronization, set the environment variable ``export CUDA_LAUNCH_BLOCKING=1`` and rerun.
#. Consider simplifying your input options and re-adding more options after having found a working baseline.

Fore more information, see also the `AMReX Debugging Manual <https://amrex-codes.github.io/amrex/docs_html/Basics.html#debugging>`__.

Last but not least: the community of WarpX developers and users can help if you get stuck.
Collect your above findings, describe where and what you are running and how you installed the code, describe the issue you are seeing with details and input files used and what you already tried.
Can you reproduce the problem with a smaller setup (less parallelism and/or less resolution)?
Report these details in a :ref:`WarpX GitHub issue <contact>`.

Debuggers
---------

See the `AMReX debugger section <https://amrex-codes.github.io/amrex/docs_html/Basics.html#breaking-into-debuggers>`__ on additional runtime parameters to

* disable backtraces
* rethrow exceptions
* avoid AMReX-level signal handling

You will need to set those runtime options to work directly with debuggers.

0 comments on commit b6b0aa3

Please sign in to comment.