diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json
new file mode 100644
index 00000000..9341352d
--- /dev/null
+++ b/.devcontainer/devcontainer.json
@@ -0,0 +1,40 @@
+// devcontainer.json
+    {
+      "name": "athenapk-dev",
+      "image": "ghcr.io/parthenon-hpc-lab/cuda11.6-mpi-hdf5-ascent",
+      // disable Dockerfile for now
+      //"build": {
+      //  // Path is relative to the devcontainer.json file.
+      //  "dockerfile": "Dockerfile"
+      //},
+      "hostRequirements": {
+        "cpus": 4
+      },
+      "customizations": {
+        "vscode": {
+          "settings": {},
+          "extensions": [
+            "-ms-vscode.cpptools",
+            "llvm-vs-code-extensions.vscode-clangd",
+            "github.vscode-pull-request-github",
+            "ms-python.python",
+            "ms-toolsai.jupyter",
+            "ms-vscode.live-server",
+            "ms-azuretools.vscode-docker",
+            "swyddfa.esbonio",
+            "tomoki1207.pdf",
+            "ms-vscode.cmake-tools",
+            "ms-vsliveshare.vsliveshare"
+        ]
+        }
+      },
+      "remoteEnv": {
+        "PATH": "${containerEnv:PATH}:/usr/local/hdf5/parallel/bin",
+        "OMPI_MCA_opal_warn_on_missing_libcuda": "0"
+      },
+      //"remoteUser": "ubuntu",
+      // we need to manually checkout the submodules,
+      // but VSCode may try to configure CMake before they are fully checked-out.
+      // workaround TBD
+      "postCreateCommand": "git submodule update --init"
+    }
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index e24e197b..05049ba7 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -53,6 +53,7 @@ jobs:
             build/tst/regression/outputs/cluster_hse/analytic_comparison.png
             build/tst/regression/outputs/cluster_tabular_cooling/convergence.png
             build/tst/regression/outputs/aniso_therm_cond_ring_conv/ring_convergence.png
+            build/tst/regression/outputs/aniso_therm_cond_gauss_conv/cond.png
             build/tst/regression/outputs/field_loop/field_loop.png
             build/tst/regression/outputs/riemann_hydro/shock_tube.png
             build/tst/regression/outputs/turbulence/parthenon.hst
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 00000000..877ed247
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,26 @@
+# Changelog
+
+## Current develop (i.e., `main` branch)
+
+### Added (new features/APIs/variables/...)
+- [[PR 1]](https://github.com/parthenon-hpc-lab/athenapk/pull/1) Add isotropic thermal conduction and RKL2 supertimestepping
+
+### Changed (changing behavior/API/variables/...)
+- [[PR 84]](https://github.com/parthenon-hpc-lab/athenapk/pull/84) Bump Parthenon to latest develop (2024-02-15)
+
+### Fixed (not changing behavior/API/variables/...)
+
+### Infrastructure
+- [[PR 112]](https://github.com/parthenon-hpc-lab/athenapk/pull/112) Add dev container configuration
+- [[PR 105]](https://github.com/parthenon-hpc-lab/athenapk/pull/105) Bump Parthenon to latest develop (2024-03-13)
+- [[PR 84]](https://github.com/parthenon-hpc-lab/athenapk/pull/84) Added `CHANGELOG.md`
+
+### Removed (removing behavior/API/varaibles/...)
+
+### Incompatibilities (i.e. breaking changes)
+- [[PR 84]](https://github.com/parthenon-hpc-lab/athenapk/pull/84) Bump Parthenon to latest develop (2024-02-15)
+  - Updated access to block dimension: `pmb->block_size.nx1` -> `pmb->block_size.nx(X1DIR)` (and similarly x2 and x3)
+  - Update access to mesh size: `pmesh->mesh_size.x1max` -> `pmesh->mesh_size.xmax(X1DIR)` (and similarly x2, x3, and min)
+  - Updated Parthenon `GradMinMod` signature for custom prolongation ops
+  - `GetBlockPointer` returns a raw pointer not a shared one (and updated interfaces to use raw pointers rather than shared ones)
+
diff --git a/README.md b/README.md
index 04881cd0..13d7280a 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,8 @@ Current features include
   - HLLE (hydro and MHD), HLLC (hydro), and HLLD (MHD) Riemann solvers
   - adiabatic equation of state
   - MHD based on hyperbolic divergence cleaning following Dedner+ 2002
-  - anisotropic thermal conduction
+  - isotropic and anisotropic thermal conduction
+  - operator-split, second-order RKL2 supertimestepping for diffusive terms
   - optically thin cooling based on tabulated cooling tables with either Townsend 2009 exact integration or operator-split subcycling
 - static and adaptive mesh refinement
 - problem generators for
diff --git a/docs/cluster.md b/docs/cluster.md
index b4d83983..11864b6c 100644
--- a/docs/cluster.md
+++ b/docs/cluster.md
@@ -76,14 +76,14 @@ $$
     \frac{3 H_0^2}{8 \pi G}.
 $$
 
-Parameters for the HERNQUIST BCG are controlled via:
+Parameters for the `HERNQUIST` BCG are controlled via:
 ```
 <problem/cluster/gravity>
 
 m_bcg_s = 0.001 # in code_mass
 r_bcg_s = 0.004 # in code_length
 ```
-where a HERNQUIST profile adds a gravitational acceleration defined by
+where a `HERNQUIST` profile adds a gravitational acceleration defined by
 
 $$
  g_{BCG}(r) = G \frac{ M_{BCG} }{R_{BCG}^2} \frac{1}{\left( 1 + \frac{r}{R_{BCG}}\right)^2}
@@ -216,7 +216,7 @@ triggering_mode = COLD_GAS # or NONE, BOOSTED_BONDI, BONDI_SCHAYE
 ```
 where `triggering_mode=NONE` will disable AGN triggering. 
 
-With BOOSTED_BONDI accretion, the mass rate of accretion follows
+With `BOOSTED_BONDI` accretion, the mass rate of accretion follows
 
 $$
 \dot{M} = \alpha \frac { 2 \pi G^2 M^2_{SMBH} \hat {\rho} } {
@@ -235,7 +235,7 @@ m_smbh = 1.0e-06 # in code_mass
 accretion_radius = 0.001 # in code_length
 bondi_alpha= 100.0 # unitless
 ```
-With BONDI_SCHAYE accretion, the `$\alpha$` used for BOOSTED_BONDI accretion is modified to depend on the number density following:
+With `BONDI_SCHAYE` accretion, the $\alpha$ used for `BOOSTED_BONDI` accretion is modified to depend on the number density following:
 
 $$
 \alpha =
@@ -252,7 +252,7 @@ bondi_n0= 2.9379989445851786e+72 # in 1/code_length**3
 bondi_beta= 2.0 # unitless
 ```
 
-With both BOOSTED_BONDI and BONDI_SCHAYE accretion, mass is removed from each
+With both `BOOSTED_BONDI` and `BONDI_SCHAYE` accretion, mass is removed from each
 cell within the accretion zone at a mass weighted rate. E.g. the mass in each
 cell within the accretion region changes by
 ```
@@ -264,7 +264,7 @@ unchanged. Thus velocities and temperatures will increase where mass is
 removed.
 
 
-With COLD_GAS accretion, the accretion rate becomes the total mass within the accretion zone equal to or
+With `COLD_GAS` accretion, the accretion rate becomes the total mass within the accretion zone equal to or
 below a defined cold temperature threshold divided by a defined accretion
 timescale. The temperature threshold and accretion timescale are defined by
 ```
@@ -360,13 +360,17 @@ velocity $v_{jet}$ can be set via
 kinetic_jet_temperature = 1e7 # K
 ```
 However, $T_{jet}$ and $v_{jet}$ must be non-negative and fulfill
+
 $$
-v_{jet} = \sqrt{ 2 \left ( \epsilon c^2 - (1 - \epsilon) \frac{k_B T_{jet}}{ \mu m_h \left( \gamma - 1 \right} \right ) }
+v_{jet} = \sqrt{ 2 \left ( \epsilon c^2 - (1 - \epsilon) \frac{k_B T_{jet}}{ \mu m_h \left( \gamma - 1 \right) } \right ) }
 $$
+
 to ensure that the sum of rest mass energy, thermal energy, and kinetic energy of the new gas sums to $\dot{M} c^2$. Note that these equations places limits  on $T_{jet}$ and $v_{jet}$, specifically
+
 $$
-v_{jet} \leq c \sqrt{ 2 \epsilon } \qquad \text{and} \qquad \frac{k_B T_{jet}}{ \mu m_h \left( \gamma - 1 \right} \leq c^2 \frac{ \epsilon}{1 - \epsilon}
+v_{jet} \leq c \sqrt{ 2 \epsilon } \qquad \text{and} \qquad \frac{k_B T_{jet}}{ \mu m_h \left( \gamma - 1 \right) } \leq c^2 \frac{ \epsilon}{1 - \epsilon}
 $$
+
 If the above equations are not satified then an exception will be thrown at
 initialization. If neither $T_{jet}$ nor $v_{jet}$ are specified, then
 $v_{jet}$ will be computed assuming $T_{jet}=0$ and a warning will be given
@@ -429,9 +433,9 @@ where the injected magnetic field follows
 
 $$
 \begin{align}
-\mathcal{B}_r      &=\mathcal{B}_0 2 \frac{h r}{\ell^2} \exp{ \left ( \frac{-r^2 - h^2}{\ell^2} \right )} \\\\
-\mathcal{B}_\theta &=\mathcal{B}_0 \alpha \frac{r}{\ell} \exp{ \left ( \frac{-r^2 - h^2}{\ell^2} \right ) } \\\\
-\mathcal{B}_h      &=\mathcal{B}_0 2 \left( 1 - \frac{r^2}{\ell^2} \right ) \exp{ \left ( \frac{-r^2 - h^2}{\ell^2} \right )} \\\\
+\mathcal{B}\_r      &= \mathcal{B}\_0 2 \frac{h r}{\ell^2} \exp{ \left ( \frac{-r^2 - h^2}{\ell^2} \right )} \\
+\mathcal{B}\_\theta &= \mathcal{B}\_0 \alpha \frac{r}{\ell} \exp{ \left ( \frac{-r^2 - h^2}{\ell^2} \right ) } \\
+\mathcal{B}\_h      &= \mathcal{B}\_0 2 \left( 1 - \frac{r^2}{\ell^2} \right ) \exp{ \left ( \frac{-r^2 - h^2}{\ell^2} \right )}
 \end{align}
 $$
 
@@ -439,9 +443,9 @@ which has  the corresponding vector potential field
 
 $$
 \begin{align}
-\mathcal{A}_r &= 0 \\\\
-\mathcal{A}_{\theta} &= \mathcal{B}_0 \ell \frac{r}{\ell} \exp{ \left ( \frac{-r^2 - h^2}{\ell^2} \right )} \\\\
-\mathcal{A}_h &= \mathcal{B}_0 \ell \frac{\alpha}{2}\exp{ \left ( \frac{-r^2 - h^2}{\ell^2} \right )}
+\mathcal{A}\_r &= 0 \\
+\mathcal{A}\_{\theta} &= \mathcal{B}\_0 \ell \frac{r}{\ell} \exp{ \left ( \frac{-r^2 - h^2}{\ell^2} \right )} \\
+\mathcal{A}\_h &= \mathcal{B}\_0 \ell \frac{\alpha}{2}\exp{ \left ( \frac{-r^2 - h^2}{\ell^2} \right )}
 \end{align}
 $$
 
@@ -463,7 +467,7 @@ $$
 where $\dot{\rho}_B$ is set to
 
 $$
-\dot{\rho}_B = \frac{3 \pi}{2} \frac{\dot{M} \left ( 1 - \epsilon \right ) f_{magnetic}}{\ell^3}
+\dot{\rho}_B = \frac{3 \pi}{2} \frac{\dot{M} \left ( 1 - \epsilon \right ) f\_\mathrm{magnetic}}{\ell^3}
 $$
 
 so that the total mass injected matches the accreted mass propotioned to magnetic feedback.
@@ -485,13 +489,17 @@ according to their ratio.
 #### Simple field loop (donut) feedback
 
 Magnetic energy is injected according to the following simple potential
+
 $$
-A_h(r, \theta, h) = B_0 L \exp^\left ( -r^2/L^2 \right)$ for $h_\mathrm{offset} \leq |h| \leq h_\mathrm{offset} + h_\mathrm{thickness}
+A_h(r, \theta, h) = B_0 L \exp^\left ( -r^2/L^2 \right) \qquad \mathrm{for} \qquad h_\mathrm{offset} \leq |h| \leq h_\mathrm{offset} + h_\mathrm{thickness}
 $$
+
 resultig in a magnetic field configuration of
+
 $$
-B_\theta(r, \theta, h) = 2 B_0 r /L \exp^\left ( -r^2/L^2 \right)$ for $h_\mathrm{offset} \leq |h| \leq h_\mathrm{offset} + h_\mathrm{thickness}
+B_\theta(r, \theta, h) = 2 B_0 r /L \exp^\left ( -r^2/L^2 \right) \qquad \mathrm{for} \qquad h_\mathrm{offset} \leq |h| \leq h_\mathrm{offset} + h_\mathrm{thickness}
 $$
+
 with all other components being zero.
 
 
@@ -526,7 +534,7 @@ fixed_mass_rate = 1.0
 
 ## SNIA Feedback
 
-Following [Prasad 2020](doi.org/10.1093/mnras/112.2.195), AthenaPK can inject
+Following [Prasad 2020](https://doi.org/10.3847/1538-4357/abc33c), AthenaPK can inject
 mass and energy from type Ia supernovae following the mass profile of the BCG.
 This SNIA feedback can be configured with
 ```
@@ -562,4 +570,4 @@ temperature_threshold = 2e4 # in K
 ```
 
 Note that all parameters need to be specified explicitly for the feedback to work
-(i.e., no hidden default values).
\ No newline at end of file
+(i.e., no hidden default values).
diff --git a/docs/development.md b/docs/development.md
index 8133ef6b..dcb1885d 100644
--- a/docs/development.md
+++ b/docs/development.md
@@ -7,3 +7,9 @@ All pull requests are checked automatically if the code is properly formatted.
 
 The build target `format-athenapk` calls both formatters and automatically format all changes, i.e., before committing changes simply call `make format-athenapk` (or similar).
 Alternatively, leave a comment with `@par-hermes format` in the open pull request to format the code directly on GitHub.
+
+## Dev container
+
+You can open a [GitHub Codespace](https://docs.github.com/en/codespaces) or use VSCode to automatically open local Docker container using the CUDA CI container image.
+
+If you have the [VSCode Dev Container extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) installed, on opening this repository in VSCode, it will automatically prompt to ask if you want to re-open this repository in a container.
diff --git a/docs/input.md b/docs/input.md
index f7023463..11dae8c5 100644
--- a/docs/input.md
+++ b/docs/input.md
@@ -69,15 +69,22 @@ conserved to primitive conversion if both are defined.
 
 #### Diffusive processes
 
-##### Anisotropic thermal conduction (required MHD)
+##### Isotropic (hydro and MHD) and anisotropic thermal conduction (only MHD)
 In the presence of magnetic fields thermal conduction is becoming anisotropic with the flux along
 the local magnetic field direction typically being much stronger than the flux perpendicular to the magnetic field.
 
 From a theoretical point of view, thermal conduction is included in the system of MHD equations by an additional
 term in the total energy equation:
 ```math
-\delta_t E + \nabla \cdot (... + \mathbf{F}) \quad \mathrm{with}\\
-\mathbf{F} = - \kappa \mathbf{\hat b} (\mathbf{\hat b \cdot \nabla T})
+\delta_t E + \nabla \cdot (... + \mathbf{F}_\mathrm{c})
+```
+where the full thermal conduction flux $`\mathbf{F}_\mathrm{c}`$ contains both the classic thermal conduction
+```math
+\mathbf{F}_\mathrm{classic} = - \kappa \mathbf{\hat b} (\mathbf{\hat b \cdot \nabla T})
+```
+as well as the saturated flux (as introduced by [^CM77])
+```math
+\mathbf{F}_\mathrm{sat} = - 5 \phi \rho^{-1/2} p^{3/2} \mathrm{sgn}(\mathbf{\hat b \cdot \nabla T}) \mathbf{\hat b}
 ```
 
 From an implementation point of view, two options implemented and can be configured within a `<diffusion>` block in the input file.
@@ -86,23 +93,52 @@ the integration step (before flux correction in case of AMR, and calculating the
 Moreover, they are implemented explicitly, i.e., they add a (potentially very restrictive) constraint to the timestep due to the scaling with $`\propto \Delta_x^2`$.
 Finally, we employ limiters for calculating the temperature gradients following Sharma & Hammett (2007)[^SH07].
 This prevents unphysical conduction against the gradient, which may be introduced because the off-axis gradients are not centered on the interfaces.
+Similarly, to account for the different nature of classic and saturated fluxes (parabolic and hyperbolic, respectively),
+we follow [^M+12] and use a smooth transition
+```math
+\mathbf{F}_\mathrm{c} = \frac{q}{q + F_\mathrm{classic}} \mathbf{F}_\mathrm{classic} \quad \mathrm{with} \quad q = 5 \phi \rho^{-1/2} p^{3/2}
+```
+and upwinding of the hyperbolic, saturated fluxes.
 
-To enable conduction, set
+To enable thermal conduction, set
 
 Parameter: `conduction` (string)
 - `none` : No thermal conduction
+- `isotropic` : Isotropic thermal conduction
+- `anisotropic` : Anisotropic thermal conduction
+
+In addition the coefficient (or diffusivity) needs to be set
+
+Parameter: `conduction_coeff` (string)
 - `spitzer` : Anisotropic thermal conduction with a temperature dependent classic Spitzer thermal conductivity
   $`\kappa (T) = c_\kappa T^{5/2} \mathrm{erg/s/K/cm}`$ and
-  $`c_\kappa`$ being constant prefactor (set via `diffusion/spitzer_cond_in_erg_by_s_K_cm` with a default value of $`4.6\times10^{-7}`$). Note, as indicated by the units in the input parameter name, this kind of thermal conductivity requires a full set of units
+  $`c_\kappa`$ being constant prefactor (set via the additional `diffusion/spitzer_cond_in_erg_by_s_K_cm` parameter with a default value of $`4.6\times10^{-7}`$ which assumes a fully ionized hydrogen plasma [^CM77] with $`\ln \lambda = 40`$ approximating ICM conditions). Note, as indicated by the units in the input parameter name, this kind of thermal conductivity requires a full set of units
   to be defined for the simulation.
-- `thermal_diff` : Contrary to a temperature dependent conductivity, a simple thermal diffusivity can be used instead for which
+- `fixed` : Contrary to a temperature dependent conductivity, a simple thermal diffusivity can be used instead for which
 the conduction flux is $`\mathbf{F} = - \chi \rho \mathbf{\hat b} (\mathbf{\hat b \cdot \nabla \frac{p_\mathrm{th}}{\rho}})`$
-Here, the strength, $`\chi`$, is controlled via the `thermal_diff_coeff_code` parameter in code units.
+Here, the strength, $`\chi`$, is controlled via the additional `thermal_diff_coeff_code` parameter in code units.
 Given the dimensions of $`L^2/T`$ it is referred to a thermal diffusivity rather than thermal conductivity.
 
+Parameter: `conduction_sat_phi` (float)
+- Default value 0.3\
+Factor to account for the uncertainty in the estimated of saturated fluxes, see [^CM77].
+Default value corresponds to the typical value used in literature and goes back to [^MMM80] and [^BM82].
+
+
 [^SH07]:
     P. Sharma and G. W. Hammett, "Preserving monotonicity in anisotropic diffusion," Journal of Computational Physics, vol. 227, no. 1, Art. no. 1, 2007, doi: https://doi.org/10.1016/j.jcp.2007.07.026.
 
+[^M+12]:
+    A. Mignone, C. Zanni, P. Tzeferacos, B. van Straalen, P. Colella, and G. Bodo, “THE PLUTO CODE FOR ADAPTIVE MESH COMPUTATIONS IN ASTROPHYSICAL FLUID DYNAMICS,” The Astrophysical Journal Supplement Series, vol. 198, Art. no. 1, Dec. 2011, doi: https://doi.org/10.1088/0067-0049/198/1/7
+
+[^CM77]:
+    L. Cowie and C. F. McKee, “The evaporation of spherical clouds in a hot gas. I. Classical and saturated mass loss rates.,” , vol. 211, pp. 135–146, Jan. 1977, doi: https://doi.org/10.1086/154911
+
+[^MMM80]:
+    C. E. Max, C. F. McKee, and W. C. Mead, “A model for laser driven ablative implosions,” The Physics of Fluids, vol. 23, Art. no. 8, 1980, doi: https://doi.org/10.1063/1.863183
+
+[^BM82]:
+    S. A. Balbus and C. F. McKee, “The evaporation of spherical clouds in a hot gas. III - Suprathermal evaporation,” , vol. 252, pp. 529–552, Jan. 1982, doi: https://doi.org/10.1086/159581
 
 ### Additional MHD options in `<hydro>` block
 
diff --git a/external/parthenon b/external/parthenon
index 5b6bb619..a66670bb 160000
--- a/external/parthenon
+++ b/external/parthenon
@@ -1 +1 @@
-Subproject commit 5b6bb61906f7c278f9724ee9f38e79dee8707098
+Subproject commit a66670bbc767fe03ef48a33794c6cd288ee517ed
diff --git a/inputs/diffusion.in b/inputs/diffusion.in
index a1a3f723..44d02bcb 100644
--- a/inputs/diffusion.in
+++ b/inputs/diffusion.in
@@ -14,7 +14,8 @@ Bx = 1.0     # Bx for x1 step function (permutated for iprobs in other direction
 By = 0.0     # By for x1 step function (permutated for iprobs in other directions)
 
 #iprob = 10   # Diffusion of Gaussian profile in x1 direction
-sigma = 0.1  # standard deviation of Gaussian for iprob=10
+t0 = 0.5      # Temporal offset for initial Gaussian profile
+amp = 1e-6    # Amplitude of Gaussian profile
 
 iprob = 20   # ring diffusion in x1-x2 plane; 21 for x2-x3 plane; 22 for x3-x1 plane
 
@@ -59,8 +60,11 @@ reconstruction = dc
 gamma = 2.0
 
 <diffusion>
-conduction = thermal_diff
+integrator = unsplit
+conduction = anisotropic
+conduction_coeff = fixed
 thermal_diff_coeff_code = 0.01
+rkl2_max_dt_ratio = 400.0
 
 <parthenon/output0>
 file_type = hdf5
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index ab8556f7..090f1a42 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -8,6 +8,7 @@ add_executable(
         eos/adiabatic_glmmhd.cpp
         units.hpp
         eos/adiabatic_hydro.cpp
+        hydro/diffusion/diffusion.cpp
         hydro/diffusion/diffusion.hpp
         hydro/diffusion/conduction.cpp
         hydro/hydro_driver.cpp
diff --git a/src/hydro/diffusion/conduction.cpp b/src/hydro/diffusion/conduction.cpp
index 1e2bf0d6..d6e57d30 100644
--- a/src/hydro/diffusion/conduction.cpp
+++ b/src/hydro/diffusion/conduction.cpp
@@ -1,6 +1,6 @@
 //========================================================================================
 // AthenaPK - a performance portable block structured AMR astrophysical MHD code.
-// Copyright (c) 2021, Athena-Parthenon Collaboration. All rights reserved.
+// Copyright (c) 2021-2023, Athena-Parthenon Collaboration. All rights reserved.
 // Licensed under the 3-clause BSD License, see LICENSE file for details
 //========================================================================================
 // Athena++ astrophysical MHD code
@@ -12,31 +12,29 @@
 //! \brief
 
 // Parthenon headers
+#include <cmath>
 #include <parthenon/package.hpp>
 
 // AthenaPK headers
 #include "../../main.hpp"
+#include "config.hpp"
 #include "diffusion.hpp"
+#include "utils/error_checking.hpp"
 
 using namespace parthenon::package::prelude;
 
+// Calculate the thermal *diffusivity*, \chi, in code units as the energy flux itself
+// is calculated from -\chi \rho \nabla (p/\rho).
 KOKKOS_INLINE_FUNCTION
-Real ThermalDiffusivity::Get(const Real pres, const Real rho, const Real gradTmag) const {
-  if (conduction_ == Conduction::thermal_diff) {
+Real ThermalDiffusivity::Get(const Real pres, const Real rho) const {
+  if (conduction_coeff_type_ == ConductionCoeff::fixed) {
     return coeff_;
-  } else if (conduction_ == Conduction::spitzer) {
-    const Real T = mbar_over_kb_ * pres / rho;
-    const Real kappa = coeff_ * std::pow(T, 5. / 2.); // Full spitzer
-    const Real chi_spitzer = kappa * mbar_over_kb_ / rho;
-
-    // Saturated total flux: fac * \rho * c_{s,isoth}^3
-    // In practice: fac * \rho * c_{s,isoth}^3 * (gradT / gradTmag)
-    // where T is calculated based on p/rho in the code.
-    // Thus, everything is in code units and no conversion is required.
-    // The \rho above is cancelled as we convert the condution above to a diffusvity here.
-    const Real chi_sat =
-        0.34 * std::pow(pres / rho, 3.0 / 2.0) / (gradTmag + TINY_NUMBER);
-    return std::min(chi_spitzer, chi_sat);
+  } else if (conduction_coeff_type_ == ConductionCoeff::spitzer) {
+    const Real T_cgs = mbar_ / kb_ * pres / rho;
+    const Real kappa_spitzer = coeff_ * std::pow(T_cgs, 5. / 2.); // Full spitzer
+
+    // Convert conductivity to diffusivity
+    return kappa_spitzer * mbar_ / kb_ / rho;
 
   } else {
     return 0.0;
@@ -64,74 +62,207 @@ Real EstimateConductionTimestep(MeshData<Real> *md) {
 
   const auto gm1 = hydro_pkg->Param<Real>("AdiabaticIndex");
   const auto &thermal_diff = hydro_pkg->Param<ThermalDiffusivity>("thermal_diff");
+  const auto &flux_sat_prefac = hydro_pkg->Param<Real>("conduction_sat_prefac");
+
+  if (thermal_diff.GetType() == Conduction::isotropic &&
+      thermal_diff.GetCoeffType() == ConductionCoeff::fixed) {
+    // TODO(pgrete): once mindx is properly calculated before this loop, we can get rid of
+    // it entirely.
+    // Using 0.0 as parameters rho and p as they're not used anyway for a fixed coeff.
+    const auto thermal_diff_coeff = thermal_diff.Get(0.0, 0.0);
+    Kokkos::parallel_reduce(
+        "EstimateConductionTimestep (iso fixed)",
+        Kokkos::MDRangePolicy<Kokkos::Rank<4>>(
+            DevExecSpace(), {0, kb.s, jb.s, ib.s},
+            {prim_pack.GetDim(5), kb.e + 1, jb.e + 1, ib.e + 1},
+            {1, 1, 1, ib.e + 1 - ib.s}),
+        KOKKOS_LAMBDA(const int b, const int k, const int j, const int i, Real &min_dt) {
+          const auto &coords = prim_pack.GetCoords(b);
+          min_dt = fmin(min_dt,
+                        SQR(coords.Dxc<1>(k, j, i)) / (thermal_diff_coeff + TINY_NUMBER));
+          if (ndim >= 2) {
+            min_dt = fmin(min_dt, SQR(coords.Dxc<2>(k, j, i)) /
+                                      (thermal_diff_coeff + TINY_NUMBER));
+          }
+          if (ndim >= 3) {
+            min_dt = fmin(min_dt, SQR(coords.Dxc<3>(k, j, i)) /
+                                      (thermal_diff_coeff + TINY_NUMBER));
+          }
+        },
+        Kokkos::Min<Real>(min_dt_cond));
+  } else {
+    Kokkos::parallel_reduce(
+        "EstimateConductionTimestep (general)",
+        Kokkos::MDRangePolicy<Kokkos::Rank<4>>(
+            DevExecSpace(), {0, kb.s, jb.s, ib.s},
+            {prim_pack.GetDim(5), kb.e + 1, jb.e + 1, ib.e + 1},
+            {1, 1, 1, ib.e + 1 - ib.s}),
+        KOKKOS_LAMBDA(const int b, const int k, const int j, const int i, Real &min_dt) {
+          const auto &coords = prim_pack.GetCoords(b);
+          const auto &prim = prim_pack(b);
+          const auto &rho = prim(IDN, k, j, i);
+          const auto &p = prim(IPR, k, j, i);
+
+          const auto dTdx = 0.5 *
+                            (prim(IPR, k, j, i + 1) / prim(IDN, k, j, i + 1) -
+                             prim(IPR, k, j, i - 1) / prim(IDN, k, j, i - 1)) /
+                            coords.Dxc<1>(i);
+
+          const auto dTdy = ndim >= 2
+                                ? 0.5 *
+                                      (prim(IPR, k, j + 1, i) / prim(IDN, k, j + 1, i) -
+                                       prim(IPR, k, j - 1, i) / prim(IDN, k, j - 1, i)) /
+                                      coords.Dxc<2>(j)
+                                : 0.0;
+
+          const auto dTdz = ndim >= 3
+                                ? 0.5 *
+                                      (prim(IPR, k + 1, j, i) / prim(IDN, k + 1, j, i) -
+                                       prim(IPR, k - 1, j, i) / prim(IDN, k - 1, j, i)) /
+                                      coords.Dxc<3>(k)
+                                : 0.0;
+          const auto gradTmag = sqrt(SQR(dTdx) + SQR(dTdy) + SQR(dTdz));
+
+          // No temperature gradient -> no thermal conduction-> no timestep restriction
+          if (gradTmag == 0.0) {
+            return;
+          }
+          auto thermal_diff_coeff = thermal_diff.Get(p, rho);
+
+          if (thermal_diff.GetType() == Conduction::isotropic) {
+            min_dt = fmin(min_dt, SQR(coords.Dxc<1>(k, j, i)) / thermal_diff_coeff);
+            if (ndim >= 2) {
+              min_dt = fmin(min_dt, SQR(coords.Dxc<2>(k, j, i)) / thermal_diff_coeff);
+            }
+            if (ndim >= 3) {
+              min_dt = fmin(min_dt, SQR(coords.Dxc<3>(k, j, i)) / thermal_diff_coeff);
+            }
+            return;
+          }
+          const auto &Bx = prim(IB1, k, j, i);
+          const auto &By = prim(IB2, k, j, i);
+          const auto &Bz = prim(IB3, k, j, i);
+          const auto Bmag = sqrt(SQR(Bx) + SQR(By) + SQR(Bz));
+          // Need to have some local field for anisotropic conduction
+          if (Bmag == 0.0) {
+            return;
+          }
+
+          // In the saturated regime, i.e., when the ratio of classic to saturated fluxes
+          // is large, the equation becomes hyperbolic with the signal speed of the
+          // conduction front being comparable to the sound speed, see [Balsara, Tilley,
+          // and Howk MANRAS 2008]. Therefore, we don't need to contrain the "parabolic"
+          // timestep here (and the hyperbolic one is constrained automatically by the
+          // fluid EstimateTimestep call).
+          auto const flux_sat = flux_sat_prefac * std::sqrt(p / rho) * p;
+          auto const flux_classic = thermal_diff_coeff * rho * gradTmag;
+          if (flux_classic / flux_sat > 100.) {
+            return;
+          }
+
+          const auto costheta =
+              fabs(Bx * dTdx + By * dTdy + Bz * dTdz) / (Bmag * gradTmag);
+
+          min_dt = fmin(min_dt, SQR(coords.Dxc<1>(k, j, i)) /
+                                    (thermal_diff_coeff * fabs(Bx) / Bmag * costheta +
+                                     TINY_NUMBER));
+          if (ndim >= 2) {
+            min_dt = fmin(min_dt, SQR(coords.Dxc<2>(k, j, i)) /
+                                      (thermal_diff_coeff * fabs(By) / Bmag * costheta +
+                                       TINY_NUMBER));
+          }
+          if (ndim >= 3) {
+            min_dt = fmin(min_dt, SQR(coords.Dxc<3>(k, j, i)) /
+                                      (thermal_diff_coeff * fabs(Bz) / Bmag * costheta +
+                                       TINY_NUMBER));
+          }
+        },
+        Kokkos::Min<Real>(min_dt_cond));
+  }
+
+  return fac * min_dt_cond;
+}
+
+//---------------------------------------------------------------------------------------
+//! Calculate isotropic thermal conduction with fixed coefficient
+
+void ThermalFluxIsoFixed(MeshData<Real> *md) {
+  auto pmb = md->GetBlockData(0)->GetBlockPointer();
+  IndexRange ib = pmb->cellbounds.GetBoundsI(IndexDomain::interior);
+  IndexRange jb = pmb->cellbounds.GetBoundsJ(IndexDomain::interior);
+  IndexRange kb = pmb->cellbounds.GetBoundsK(IndexDomain::interior);
+
+  std::vector<parthenon::MetadataFlag> flags_ind({Metadata::Independent});
+  auto cons_pack = md->PackVariablesAndFluxes(flags_ind);
+  auto hydro_pkg = pmb->packages.Get("Hydro");
+
+  auto const &prim_pack = md->PackVariables(std::vector<std::string>{"prim"});
+
+  const int ndim = pmb->pmy_mesh->ndim;
+
+  const auto &thermal_diff = hydro_pkg->Param<ThermalDiffusivity>("thermal_diff");
+  // Using fixed and uniform coefficient so it's safe to get it outside the kernel.
+  // Using 0.0 as parameters rho and p as they're not used anyway for a fixed coeff.
+  const auto thermal_diff_coeff = thermal_diff.Get(0.0, 0.0);
 
-  Kokkos::parallel_reduce(
-      "EstimateConductionTimestep",
-      Kokkos::MDRangePolicy<Kokkos::Rank<4>>(
-          DevExecSpace(), {0, kb.s, jb.s, ib.s},
-          {prim_pack.GetDim(5), kb.e + 1, jb.e + 1, ib.e + 1},
-          {1, 1, 1, ib.e + 1 - ib.s}),
-      KOKKOS_LAMBDA(const int b, const int k, const int j, const int i, Real &min_dt) {
+  parthenon::par_for(
+      DEFAULT_LOOP_PATTERN, "Thermal conduction X1 fluxes (iso)",
+      parthenon::DevExecSpace(), 0, cons_pack.GetDim(5) - 1, kb.s, kb.e, jb.s, jb.e, ib.s,
+      ib.e + 1, KOKKOS_LAMBDA(const int b, const int k, const int j, const int i) {
         const auto &coords = prim_pack.GetCoords(b);
+        auto &cons = cons_pack(b);
         const auto &prim = prim_pack(b);
-        const auto &rho = prim(IDN, k, j, i);
-        const auto &p = prim(IPR, k, j, i);
-        // TODO(pgrete) when we introduce isotropic thermal conduction a lot of the
-        // following machinery should be hidden behind conditionals
-        const auto &Bx = prim(IB1, k, j, i);
-        const auto &By = prim(IB2, k, j, i);
-        const auto &Bz = prim(IB3, k, j, i);
-        const auto Bmag = sqrt(SQR(Bx) + SQR(By) + SQR(Bz));
-
-        const auto dTdx = 0.5 *
-                          (prim(IPR, k, j, i + 1) / prim(IDN, k, j, i + 1) -
-                           prim(IPR, k, j, i - 1) / prim(IDN, k, j, i - 1)) /
-                          coords.Dxc<1>(i);
-
-        const auto dTdy = 0.5 *
-                          (prim(IPR, k, j + 1, i) / prim(IDN, k, j + 1, i) -
-                           prim(IPR, k, j - 1, i) / prim(IDN, k, j - 1, i)) /
-                          coords.Dxc<2>(j);
-
-        const auto dTdz = ndim >= 3
-                              ? 0.5 *
-                                    (prim(IPR, k + 1, j, i) / prim(IDN, k + 1, j, i) -
-                                     prim(IPR, k - 1, j, i) / prim(IDN, k - 1, j, i)) /
-                                    coords.Dxc<3>(k)
-                              : 0.0;
-        const auto gradTmag = sqrt(SQR(dTdx) + SQR(dTdy) + SQR(dTdz));
-        auto thermal_diff_coeff = thermal_diff.Get(p, rho, gradTmag);
+        const auto T_i = prim(IPR, k, j, i) / prim(IDN, k, j, i);
+        const auto T_im1 = prim(IPR, k, j, i - 1) / prim(IDN, k, j, i - 1);
+        const auto dTdx = (T_i - T_im1) / coords.Dxc<1>(k, j, i);
+        const auto denf = 0.5 * (prim(IDN, k, j, i) + prim(IDN, k, j, i - 1));
+        cons.flux(X1DIR, IEN, k, j, i) -= thermal_diff_coeff * denf * dTdx;
+      });
 
-        const auto denom = Bmag * gradTmag;
-        // if either Bmag or gradTmag are 0, no anisotropic thermal conduction
-        if (denom == 0.0) {
-          return;
-        }
-        const auto costheta = fabs(Bx * dTdx + By * dTdy + Bz * dTdz) / denom;
-
-        min_dt = fmin(
-            min_dt, SQR(coords.Dxc<1>(k, j, i)) /
-                        (thermal_diff_coeff * fabs(Bx) / Bmag * costheta + TINY_NUMBER));
-        if (ndim >= 2) {
-          min_dt = fmin(min_dt, SQR(coords.Dxc<2>(k, j, i)) /
-                                    (thermal_diff_coeff * fabs(By) / Bmag * costheta +
-                                     TINY_NUMBER));
-        }
-        if (ndim >= 3) {
-          min_dt = fmin(min_dt, SQR(coords.Dxc<3>(k, j, i)) /
-                                    (thermal_diff_coeff * fabs(Bz) / Bmag * costheta +
-                                     TINY_NUMBER));
-        }
-      },
-      Kokkos::Min<Real>(min_dt_cond));
+  if (ndim < 2) {
+    return;
+  }
+  /* Compute heat fluxes in 2-direction  --------------------------------------*/
+  parthenon::par_for(
+      DEFAULT_LOOP_PATTERN, "Thermal conduction X2 fluxes (iso)",
+      parthenon::DevExecSpace(), 0, cons_pack.GetDim(5) - 1, kb.s, kb.e, jb.s, jb.e + 1,
+      ib.s, ib.e, KOKKOS_LAMBDA(const int b, const int k, const int j, const int i) {
+        const auto &coords = prim_pack.GetCoords(b);
+        auto &cons = cons_pack(b);
+        const auto &prim = prim_pack(b);
 
-  return fac * min_dt_cond;
+        const auto T_j = prim(IPR, k, j, i) / prim(IDN, k, j, i);
+        const auto T_jm1 = prim(IPR, k, j - 1, i) / prim(IDN, k, j - 1, i);
+        const auto dTdy = (T_j - T_jm1) / coords.Dxc<2>(k, j, i);
+        const auto denf = 0.5 * (prim(IDN, k, j, i) + prim(IDN, k, j - 1, i));
+        cons.flux(X2DIR, IEN, k, j, i) -= thermal_diff_coeff * denf * dTdy;
+      });
+  /* Compute heat fluxes in 3-direction, 3D problem ONLY  ---------------------*/
+  if (ndim < 3) {
+    return;
+  }
+
+  parthenon::par_for(
+      DEFAULT_LOOP_PATTERN, "Thermal conduction X3 fluxes (iso)",
+      parthenon::DevExecSpace(), 0, cons_pack.GetDim(5) - 1, kb.s, kb.e + 1, jb.s, jb.e,
+      ib.s, ib.e, KOKKOS_LAMBDA(const int b, const int k, const int j, const int i) {
+        const auto &coords = prim_pack.GetCoords(b);
+        auto &cons = cons_pack(b);
+        const auto &prim = prim_pack(b);
+
+        const auto T_k = prim(IPR, k, j, i) / prim(IDN, k, j, i);
+        const auto T_km1 = prim(IPR, k - 1, j, i) / prim(IDN, k - 1, j, i);
+        const auto dTdz = (T_k - T_km1) / coords.Dxc<3>(k, j, i);
+        const auto denf = 0.5 * (prim(IDN, k, j, i) + prim(IDN, k - 1, j, i));
+        cons.flux(X3DIR, IEN, k, j, i) -= thermal_diff_coeff * denf * dTdz;
+      });
 }
 
 //---------------------------------------------------------------------------------------
-//! Calculate anisotropic thermal conduction
+//! Calculate thermal conduction, general case, i.e., anisotropic and/or with varying
+//! (incl. saturated) coefficient
 
-void ThermalFluxAniso(MeshData<Real> *md) {
+void ThermalFluxGeneral(MeshData<Real> *md) {
   auto pmb = md->GetBlockData(0)->GetBlockPointer();
   IndexRange ib = pmb->cellbounds.GetBoundsI(IndexDomain::interior);
   IndexRange jb = pmb->cellbounds.GetBoundsJ(IndexDomain::interior);
@@ -146,18 +277,18 @@ void ThermalFluxAniso(MeshData<Real> *md) {
   const int ndim = pmb->pmy_mesh->ndim;
 
   const auto &thermal_diff = hydro_pkg->Param<ThermalDiffusivity>("thermal_diff");
+  const auto &flux_sat_prefac = hydro_pkg->Param<Real>("conduction_sat_prefac");
 
   parthenon::par_for(
-      DEFAULT_LOOP_PATTERN, "Thermal conduction X1 fluxes", parthenon::DevExecSpace(), 0,
-      cons_pack.GetDim(5) - 1, kb.s, kb.e, jb.s, jb.e, ib.s, ib.e + 1,
-      KOKKOS_LAMBDA(const int b, const int k, const int j, const int i) {
+      DEFAULT_LOOP_PATTERN, "Thermal conduction X1 fluxes (general)",
+      parthenon::DevExecSpace(), 0, cons_pack.GetDim(5) - 1, kb.s, kb.e, jb.s, jb.e, ib.s,
+      ib.e + 1, KOKKOS_LAMBDA(const int b, const int k, const int j, const int i) {
         const auto &coords = prim_pack.GetCoords(b);
         auto &cons = cons_pack(b);
         const auto &prim = prim_pack(b);
 
         // Variables only required in 3D case
         Real dTdz = 0.0;
-        Real Bz = 0.0;
 
         // clang-format off
         /* Monotonized temperature difference dT/dy */
@@ -170,7 +301,7 @@ void ThermalFluxAniso(MeshData<Real> *md) {
                            prim(IPR, k, j    , i - 1) / prim(IDN, k, j    , i - 1),
                            prim(IPR, k, j    , i - 1) / prim(IDN, k, j    , i - 1) -
                            prim(IPR, k, j - 1, i - 1) / prim(IDN, k, j - 1, i - 1)) /
-            coords.Dxc<2>(k, j, i);
+            coords.Dxc<2>( k, j, i);
 
         if (ndim >= 3) {
           /* Monotonized temperature difference dT/dz, 3D problem ONLY */
@@ -182,8 +313,7 @@ void ThermalFluxAniso(MeshData<Real> *md) {
                                 prim(IPR, k    , j, i - 1) / prim(IDN, k    , j, i - 1),
                                 prim(IPR, k    , j, i - 1) / prim(IDN, k    , j, i - 1) -
                                 prim(IPR, k - 1, j, i - 1) / prim(IDN, k - 1, j, i - 1)) /
-                 coords.Dxc<3>(k, j, i);
-          Bz = 0.5 * (prim(IB3, k, j, i - 1) + prim(IB3, k, j, i));
+                 coords.Dxc<3>( k, j, i);
         }
         // clang-format on
 
@@ -191,34 +321,69 @@ void ThermalFluxAniso(MeshData<Real> *md) {
         const auto T_im1 = prim(IPR, k, j, i - 1) / prim(IDN, k, j, i - 1);
         const auto dTdx = (T_i - T_im1) / coords.Dxc<1>(k, j, i);
 
-        // Calc interface values
-        const auto Bx = 0.5 * (prim(IB1, k, j, i - 1) + prim(IB1, k, j, i));
-        const auto By = 0.5 * (prim(IB2, k, j, i - 1) + prim(IB2, k, j, i));
-        auto B02 = SQR(Bx) + SQR(By) + SQR(Bz);
-        B02 = std::max(B02, TINY_NUMBER); /* limit in case B=0 */
-        const auto bDotGradT = Bx * dTdx + By * dTdy + Bz * dTdz;
-
         const auto denf = 0.5 * (prim(IDN, k, j, i) + prim(IDN, k, j, i - 1));
-        const auto gradTmag = sqrt(SQR(dTdx) + SQR(dTdy) + SQR(dTdz));
         const auto thermal_diff_f =
-            0.5 *
-            (thermal_diff.Get(prim(IPR, k, j, i), prim(IDN, k, j, i), gradTmag) +
-             thermal_diff.Get(prim(IPR, k, j, i - 1), prim(IDN, k, j, i - 1), gradTmag));
-        cons.flux(X1DIR, IEN, k, j, i) -= thermal_diff_f * denf * (Bx * bDotGradT) / B02;
+            0.5 * (thermal_diff.Get(prim(IPR, k, j, i), prim(IDN, k, j, i)) +
+                   thermal_diff.Get(prim(IPR, k, j, i - 1), prim(IDN, k, j, i - 1)));
+        const auto gradTmag = std::sqrt(SQR(dTdx) + SQR(dTdy) + SQR(dTdz));
+
+        // Calculate "classic" fluxes
+        Real flux_classic = 0.0;
+        Real flux_classic_mag = 0.0;
+        if (thermal_diff.GetType() == Conduction::anisotropic) {
+          const auto Bx = 0.5 * (prim(IB1, k, j, i - 1) + prim(IB1, k, j, i));
+          const auto By = 0.5 * (prim(IB2, k, j, i - 1) + prim(IB2, k, j, i));
+          const auto Bz =
+              ndim >= 3 ? 0.5 * (prim(IB3, k, j, i - 1) + prim(IB3, k, j, i)) : 0.0;
+          auto Bmag = std::sqrt(SQR(Bx) + SQR(By) + SQR(Bz));
+          Bmag = std::max(Bmag, TINY_NUMBER); /* limit in case B=0 */
+          const auto bx = Bx / Bmag;          // unit vector component
+          const auto bDotGradT = (Bx * dTdx + By * dTdy + Bz * dTdz) / Bmag;
+          flux_classic = -thermal_diff_f * denf * bDotGradT * bx;
+          flux_classic_mag = std::abs(thermal_diff_f * denf * bDotGradT);
+        } else if (thermal_diff.GetType() == Conduction::isotropic) {
+          flux_classic = -thermal_diff_f * denf * dTdx;
+          flux_classic_mag = thermal_diff_f * denf * gradTmag;
+        } else {
+          PARTHENON_FAIL("Unknown thermal diffusion flux.");
+        }
+
+        // Calculate saturated fluxes using upwinding, see (A3) in Mignone+12.
+        // Note that we are not concerned about the sign of flux_sat here. The way it is
+        // calculated it's always positive because we use it in the harmonic mean with
+        // the flux_classic_mag below. The correct sign is eventually picked up again from
+        // flux_classic.
+        Real flux_sat;
+        // Use first order limiting for now.
+        if (flux_classic > 0.0) {
+          flux_sat = flux_sat_prefac * std::sqrt(prim(IPR, k, j, i - 1) / denf) *
+                     prim(IPR, k, j, i - 1);
+        } else if (flux_classic < 0.0) {
+          flux_sat =
+              flux_sat_prefac * std::sqrt(prim(IPR, k, j, i) / denf) * prim(IPR, k, j, i);
+        } else {
+          const auto presf = 0.5 * (prim(IPR, k, j, i) + prim(IPR, k, j, i - 1));
+          flux_sat = flux_sat_prefac * std::sqrt(presf / denf) * presf;
+        }
+
+        cons.flux(X1DIR, IEN, k, j, i) +=
+            (flux_sat / (flux_sat + flux_classic_mag)) * flux_classic;
       });
 
+  if (ndim < 2) {
+    return;
+  }
   /* Compute heat fluxes in 2-direction  --------------------------------------*/
   parthenon::par_for(
-      DEFAULT_LOOP_PATTERN, "Thermal conduction X2 fluxes", parthenon::DevExecSpace(), 0,
-      cons_pack.GetDim(5) - 1, kb.s, kb.e, jb.s, jb.e + 1, ib.s, ib.e,
-      KOKKOS_LAMBDA(const int b, const int k, const int j, const int i) {
+      DEFAULT_LOOP_PATTERN, "Thermal conduction X2 fluxes (general)",
+      parthenon::DevExecSpace(), 0, cons_pack.GetDim(5) - 1, kb.s, kb.e, jb.s, jb.e + 1,
+      ib.s, ib.e, KOKKOS_LAMBDA(const int b, const int k, const int j, const int i) {
         const auto &coords = prim_pack.GetCoords(b);
         auto &cons = cons_pack(b);
         const auto &prim = prim_pack(b);
 
         // Variables only required in 3D case
         Real dTdz = 0.0;
-        Real Bz = 0.0;
 
         // clang-format off
         /* Monotonized temperature difference dT/dx */
@@ -245,7 +410,6 @@ void ThermalFluxAniso(MeshData<Real> *md) {
                                 prim(IPR, k - 1, j - 1, i) / prim(IDN, k - 1, j - 1, i)) /
                  coords.Dxc<3>(k, j, i);
 
-          Bz = 0.5 * (prim(IB3, k, j - 1, i) + prim(IB3, k, j, i));
         }
         // clang-format on
 
@@ -253,20 +417,50 @@ void ThermalFluxAniso(MeshData<Real> *md) {
         const auto T_jm1 = prim(IPR, k, j - 1, i) / prim(IDN, k, j - 1, i);
         const auto dTdy = (T_j - T_jm1) / coords.Dxc<2>(k, j, i);
 
-        // Calc interface values
-        const auto Bx = 0.5 * (prim(IB1, k, j - 1, i) + prim(IB1, k, j, i));
-        const auto By = 0.5 * (prim(IB2, k, j - 1, i) + prim(IB2, k, j, i));
-        Real B02 = SQR(Bx) + SQR(By) + SQR(Bz);
-        B02 = std::max(B02, TINY_NUMBER); /* limit in case B=0 */
-        const auto bDotGradT = Bx * dTdx + By * dTdy + Bz * dTdz;
-
         const auto denf = 0.5 * (prim(IDN, k, j, i) + prim(IDN, k, j - 1, i));
         const auto gradTmag = sqrt(SQR(dTdx) + SQR(dTdy) + SQR(dTdz));
         const auto thermal_diff_f =
-            0.5 *
-            (thermal_diff.Get(prim(IPR, k, j, i), prim(IDN, k, j, i), gradTmag) +
-             thermal_diff.Get(prim(IPR, k, j - 1, i), prim(IDN, k, j - 1, i), gradTmag));
-        cons.flux(X2DIR, IEN, k, j, i) -= thermal_diff_f * denf * (By * bDotGradT) / B02;
+            0.5 * (thermal_diff.Get(prim(IPR, k, j, i), prim(IDN, k, j, i)) +
+                   thermal_diff.Get(prim(IPR, k, j - 1, i), prim(IDN, k, j - 1, i)));
+
+        // Calculate "classic" fluxes
+        Real flux_classic = 0.0;
+        Real flux_classic_mag = 0.0;
+        if (thermal_diff.GetType() == Conduction::anisotropic) {
+          const auto Bx = 0.5 * (prim(IB1, k, j - 1, i) + prim(IB1, k, j, i));
+          const auto By = 0.5 * (prim(IB2, k, j - 1, i) + prim(IB2, k, j, i));
+          const auto Bz =
+              ndim >= 3 ? 0.5 * (prim(IB3, k, j - 1, i) + prim(IB3, k, j, i)) : 0.0;
+          auto Bmag = std::sqrt(SQR(Bx) + SQR(By) + SQR(Bz));
+          Bmag = std::max(Bmag, TINY_NUMBER); /* limit in case B=0 */
+          const auto by = By / Bmag;          // unit vector component
+          const auto bDotGradT = (Bx * dTdx + By * dTdy + Bz * dTdz) / Bmag;
+          flux_classic = -thermal_diff_f * denf * bDotGradT * by;
+          flux_classic_mag = std::abs(thermal_diff_f * denf * bDotGradT);
+        } else if (thermal_diff.GetType() == Conduction::isotropic) {
+          flux_classic = -thermal_diff_f * denf * dTdy;
+          flux_classic_mag = thermal_diff_f * denf * gradTmag;
+        } else {
+          PARTHENON_FAIL("Unknown thermal diffusion flux.");
+        }
+
+        // Calculate saturated fluxes,see comment above.
+        Real flux_sat;
+        // Use first order limiting for now.
+        if (flux_classic > 0.0) {
+          flux_sat = flux_sat_prefac * std::sqrt(prim(IPR, k, j - 1, i) / denf) *
+                     prim(IPR, k, j - 1, i);
+        } else if (flux_classic < 0.0) {
+          flux_sat =
+              flux_sat_prefac * std::sqrt(prim(IPR, k, j, i) / denf) * prim(IPR, k, j, i);
+        } else {
+          const auto presf = 0.5 * (prim(IPR, k, j, i) + prim(IPR, k, j - 1, i));
+          flux_sat = flux_sat_prefac * std::sqrt(presf / denf) * presf;
+        }
+
+        // Calc interface values
+        cons.flux(X2DIR, IEN, k, j, i) +=
+            (flux_sat / (flux_sat + flux_classic_mag)) * flux_classic;
       });
   /* Compute heat fluxes in 3-direction, 3D problem ONLY  ---------------------*/
   if (ndim < 3) {
@@ -274,9 +468,9 @@ void ThermalFluxAniso(MeshData<Real> *md) {
   }
 
   parthenon::par_for(
-      DEFAULT_LOOP_PATTERN, "Thermal conduction X3 fluxes", parthenon::DevExecSpace(), 0,
-      cons_pack.GetDim(5) - 1, kb.s, kb.e + 1, jb.s, jb.e, ib.s, ib.e,
-      KOKKOS_LAMBDA(const int b, const int k, const int j, const int i) {
+      DEFAULT_LOOP_PATTERN, "Thermal conduction X3 fluxes (general)",
+      parthenon::DevExecSpace(), 0, cons_pack.GetDim(5) - 1, kb.s, kb.e + 1, jb.s, jb.e,
+      ib.s, ib.e, KOKKOS_LAMBDA(const int b, const int k, const int j, const int i) {
         const auto &coords = prim_pack.GetCoords(b);
         auto &cons = cons_pack(b);
         const auto &prim = prim_pack(b);
@@ -311,20 +505,46 @@ void ThermalFluxAniso(MeshData<Real> *md) {
         const auto T_km1 = prim(IPR, k - 1, j, i) / prim(IDN, k - 1, j, i);
         const auto dTdz = (T_k - T_km1) / coords.Dxc<3>(k, j, i);
 
-        const auto Bx = 0.5 * (prim(IB1, k - 1, j, i) + prim(IB1, k, j, i));
-        const auto By = 0.5 * (prim(IB2, k - 1, j, i) + prim(IB2, k, j, i));
-        const auto Bz = 0.5 * (prim(IB3, k - 1, j, i) + prim(IB3, k, j, i));
-        Real B02 = SQR(Bx) + SQR(By) + SQR(Bz);
-        B02 = std::max(B02, TINY_NUMBER); /* limit in case B=0 */
-        const auto bDotGradT = Bx * dTdx + By * dTdy + Bz * dTdz;
-
         const auto denf = 0.5 * (prim(IDN, k, j, i) + prim(IDN, k - 1, j, i));
         const auto gradTmag = sqrt(SQR(dTdx) + SQR(dTdy) + SQR(dTdz));
         const auto thermal_diff_f =
-            0.5 *
-            (thermal_diff.Get(prim(IPR, k, j, i), prim(IDN, k, j, i), gradTmag) +
-             thermal_diff.Get(prim(IPR, k - 1, j, i), prim(IDN, k - 1, j, i), gradTmag));
+            0.5 * (thermal_diff.Get(prim(IPR, k, j, i), prim(IDN, k, j, i)) +
+                   thermal_diff.Get(prim(IPR, k - 1, j, i), prim(IDN, k - 1, j, i)));
+
+        // Calculate "classic" fluxes
+        Real flux_classic = 0.0;
+        Real flux_classic_mag = 0.0;
+        if (thermal_diff.GetType() == Conduction::anisotropic) {
+          const auto Bx = 0.5 * (prim(IB1, k - 1, j, i) + prim(IB1, k, j, i));
+          const auto By = 0.5 * (prim(IB2, k - 1, j, i) + prim(IB2, k, j, i));
+          const auto Bz = 0.5 * (prim(IB3, k - 1, j, i) + prim(IB3, k, j, i));
+          auto Bmag = std::sqrt(SQR(Bx) + SQR(By) + SQR(Bz));
+          Bmag = std::max(Bmag, TINY_NUMBER); /* limit in case B=0 */
+          const auto bz = Bz / Bmag;          // unit vector component
+          const auto bDotGradT = (Bx * dTdx + By * dTdy + Bz * dTdz) / Bmag;
+          flux_classic = -thermal_diff_f * denf * bDotGradT * bz;
+          flux_classic_mag = std::abs(thermal_diff_f * denf * bDotGradT);
+        } else if (thermal_diff.GetType() == Conduction::isotropic) {
+          flux_classic = -thermal_diff_f * denf * dTdz;
+          flux_classic_mag = thermal_diff_f * denf * gradTmag;
+        } else {
+          PARTHENON_FAIL("Unknown thermal diffusion flux.");
+        }
+        // Calculate saturated fluxes,see comment above.
+        Real flux_sat;
+        // Use first order limiting for now.
+        if (flux_classic > 0.0) {
+          flux_sat = flux_sat_prefac * std::sqrt(prim(IPR, k - 1, j, i) / denf) *
+                     prim(IPR, k - 1, j, i);
+        } else if (flux_classic < 0.0) {
+          flux_sat =
+              flux_sat_prefac * std::sqrt(prim(IPR, k, j, i) / denf) * prim(IPR, k, j, i);
+        } else {
+          const auto presf = 0.5 * (prim(IPR, k, j, i) + prim(IPR, k - 1, j, i));
+          flux_sat = flux_sat_prefac * std::sqrt(presf / denf) * presf;
+        }
 
-        cons.flux(X3DIR, IEN, k, j, i) -= thermal_diff_f * denf * (Bz * bDotGradT) / B02;
+        cons.flux(X3DIR, IEN, k, j, i) +=
+            (flux_sat / (flux_sat + flux_classic_mag)) * flux_classic;
       });
 }
diff --git a/src/hydro/diffusion/diffusion.cpp b/src/hydro/diffusion/diffusion.cpp
new file mode 100644
index 00000000..20617ca2
--- /dev/null
+++ b/src/hydro/diffusion/diffusion.cpp
@@ -0,0 +1,31 @@
+//========================================================================================
+// AthenaPK - a performance portable block structured AMR astrophysical MHD code.
+// Copyright (c) 2021, Athena-Parthenon Collaboration. All rights reserved.
+// Licensed under the 3-clause BSD License, see LICENSE file for details
+//========================================================================================
+//! \file diffusion.cpp
+//! \brief
+
+// Parthenon headers
+#include <parthenon/package.hpp>
+
+// AthenaPK headers
+#include "../../main.hpp"
+#include "diffusion.hpp"
+
+using namespace parthenon::package::prelude;
+
+TaskStatus CalcDiffFluxes(StateDescriptor *hydro_pkg, MeshData<Real> *md) {
+  const auto &conduction = hydro_pkg->Param<Conduction>("conduction");
+  if (conduction != Conduction::none) {
+    const auto &thermal_diff = hydro_pkg->Param<ThermalDiffusivity>("thermal_diff");
+
+    if (conduction == Conduction::isotropic &&
+        thermal_diff.GetCoeffType() == ConductionCoeff::fixed) {
+      ThermalFluxIsoFixed(md);
+    } else {
+      ThermalFluxGeneral(md);
+    }
+  }
+  return TaskStatus::complete;
+}
diff --git a/src/hydro/diffusion/diffusion.hpp b/src/hydro/diffusion/diffusion.hpp
index 273464d7..7d522902 100644
--- a/src/hydro/diffusion/diffusion.hpp
+++ b/src/hydro/diffusion/diffusion.hpp
@@ -69,23 +69,37 @@ KOKKOS_INLINE_FUNCTION Real lim4(const Real A, const Real B, const Real C, const
 
 struct ThermalDiffusivity {
  private:
-  Real mbar_over_kb_;
+  Real mbar_, me_, kb_;
   Conduction conduction_;
+  ConductionCoeff conduction_coeff_type_;
   // "free" coefficient/prefactor. Value depends on conduction is set in the constructor.
   Real coeff_;
 
  public:
   KOKKOS_INLINE_FUNCTION
-  ThermalDiffusivity(Conduction conduction, Real coeff, Real mbar_over_kb)
-      : coeff_(coeff), conduction_(conduction), mbar_over_kb_(mbar_over_kb) {}
+  ThermalDiffusivity(Conduction conduction, ConductionCoeff conduction_coeff_type,
+                     Real coeff, Real mbar, Real me, Real kb)
+      : conduction_(conduction), conduction_coeff_type_(conduction_coeff_type),
+        coeff_(coeff), mbar_(mbar), me_(me), kb_(kb) {}
 
   KOKKOS_INLINE_FUNCTION
-  Real Get(const Real pres, const Real rho, const Real gradTmag) const;
+  Real Get(const Real pres, const Real rho) const;
+
+  KOKKOS_INLINE_FUNCTION
+  Conduction GetType() const { return conduction_; }
+
+  KOKKOS_INLINE_FUNCTION
+  ConductionCoeff GetCoeffType() const { return conduction_coeff_type_; }
 };
 
 Real EstimateConductionTimestep(MeshData<Real> *md);
 
-//! Calculate anisotropic thermal conduction
-void ThermalFluxAniso(MeshData<Real> *md);
+//! Calculate isotropic thermal conduction with fixed coefficient
+void ThermalFluxIsoFixed(MeshData<Real> *md);
+//! Calculate thermal conduction (general case incl. anisotropic and saturated)
+void ThermalFluxGeneral(MeshData<Real> *md);
+
+// Calculate all diffusion fluxes, i.e., update the .flux views in md
+TaskStatus CalcDiffFluxes(StateDescriptor *hydro_pkg, MeshData<Real> *md);
 
 #endif //  HYDRO_DIFFUSION_DIFFUSION_HPP_
diff --git a/src/hydro/hydro.cpp b/src/hydro/hydro.cpp
index 6d0e087c..54a97b7f 100644
--- a/src/hydro/hydro.cpp
+++ b/src/hydro/hydro.cpp
@@ -55,6 +55,35 @@ parthenon::Packages_t ProcessPackages(std::unique_ptr<ParameterInput> &pin) {
   return packages;
 }
 
+// Using this per cycle function to populate various variables in
+// Params that require global reduction *and* need to be set/known when
+// the task list is constructed (versus when the task list is being executed).
+// TODO(next person touching this function): If more/separate feature are required
+// please separate concerns.
+void PreStepMeshUserWorkInLoop(Mesh *pmesh, ParameterInput *pin, SimTime &tm) {
+  auto hydro_pkg = pmesh->block_list[0]->packages.Get("Hydro");
+  const auto num_partitions = pmesh->DefaultNumPartitions();
+
+  if ((hydro_pkg->Param<DiffInt>("diffint") == DiffInt::rkl2) &&
+      (hydro_pkg->Param<Conduction>("conduction") != Conduction::none)) {
+    auto dt_diff = std::numeric_limits<Real>::max();
+    for (auto i = 0; i < num_partitions; i++) {
+      auto &md = pmesh->mesh_data.GetOrAdd("base", i);
+
+      dt_diff = std::min(dt_diff, EstimateConductionTimestep(md.get()));
+    }
+#ifdef MPI_PARALLEL
+    PARTHENON_MPI_CHECK(MPI_Allreduce(MPI_IN_PLACE, &dt_diff, 1, MPI_PARTHENON_REAL,
+                                      MPI_MIN, MPI_COMM_WORLD));
+#endif
+    hydro_pkg->UpdateParam("dt_diff", dt_diff);
+    const auto max_dt_ratio = hydro_pkg->Param<Real>("rkl2_max_dt_ratio");
+    if (max_dt_ratio > 0.0 && tm.dt / dt_diff > max_dt_ratio) {
+      tm.dt = max_dt_ratio * dt_diff;
+    }
+  }
+}
+
 template <Hst hst, int idx = -1>
 Real HydroHst(MeshData<Real> *md) {
   auto hydro_pkg = md->GetBlockData(0)->GetBlockPointer()->packages.Get("Hydro");
@@ -404,6 +433,7 @@ std::shared_ptr<StateDescriptor> Initialize(ParameterInput *pin) {
       pkg->AddParam<>("mu", mu);
       pkg->AddParam<>("mu_e", mu_e);
       pkg->AddParam<>("He_mass_fraction", He_mass_fraction);
+      pkg->AddParam<>("mbar", mu * units.atomic_mass_unit());
       // Following convention in the astro community, we're using mh as unit for the mean
       // molecular weight
       pkg->AddParam<>("mbar_over_kb", mu * units.mh() / units.k_boltzmann());
@@ -444,36 +474,93 @@ std::shared_ptr<StateDescriptor> Initialize(ParameterInput *pin) {
 
     auto conduction = Conduction::none;
     auto conduction_str = pin->GetOrAddString("diffusion", "conduction", "none");
-    if (conduction_str == "spitzer") {
-      if (!pkg->AllParams().hasKey("mbar_over_kb")) {
-        PARTHENON_FAIL("Spitzer thermal conduction requires units and gas composition. "
-                       "Please set a 'units' block and the 'hydro/He_mass_fraction' in "
-                       "the input file.");
-      }
-      conduction = Conduction::spitzer;
-
-      Real spitzer_coeff =
-          pin->GetOrAddReal("diffusion", "spitzer_cond_in_erg_by_s_K_cm", 4.6e-7);
-      // Convert to code units. No temp conversion as [T_phys] = [T_code].
-      auto units = pkg->Param<Units>("units");
-      spitzer_coeff *= units.erg() / (units.s() * units.cm());
-
-      auto mbar_over_kb = pkg->Param<Real>("mbar_over_kb");
-      auto thermal_diff = ThermalDiffusivity(conduction, spitzer_coeff, mbar_over_kb);
-      pkg->AddParam<>("thermal_diff", thermal_diff);
-
-    } else if (conduction_str == "thermal_diff") {
-      conduction = Conduction::thermal_diff;
-      Real thermal_diff_coeff_code = pin->GetReal("diffusion", "thermal_diff_coeff_code");
-      auto thermal_diff = ThermalDiffusivity(conduction, thermal_diff_coeff_code, 0.0);
-      pkg->AddParam<>("thermal_diff", thermal_diff);
-
+    if (conduction_str == "isotropic") {
+      conduction = Conduction::isotropic;
+    } else if (conduction_str == "anisotropic") {
+      conduction = Conduction::anisotropic;
     } else if (conduction_str != "none") {
       PARTHENON_FAIL(
-          "AthenaPK unknown conduction method. Options are: spitzer, thermal_diff");
+          "Unknown conduction method. Options are: none, isotropic, anisotropic");
+    }
+    // If conduction is enabled, process supported coefficients
+    if (conduction != Conduction::none) {
+      auto conduction_coeff_str =
+          pin->GetOrAddString("diffusion", "conduction_coeff", "none");
+      auto conduction_coeff = ConductionCoeff::none;
+
+      // Saturated conduction factor to account for "uncertainty", see
+      // Cowie & McKee 77 and a value of 0.3 is typical chosen (though using "weak
+      // evidence", see Balbus & MacKee 1982 and Max, McKee, and Mead 1980).
+      const auto conduction_sat_phi =
+          pin->GetOrAddReal("diffusion", "conduction_sat_phi", 0.3);
+      Real conduction_sat_prefac = 0.0;
+
+      if (conduction_coeff_str == "spitzer") {
+        if (!pkg->AllParams().hasKey("mbar")) {
+          PARTHENON_FAIL("Spitzer thermal conduction requires units and gas composition. "
+                         "Please set a 'units' block and the 'hydro/He_mass_fraction' in "
+                         "the input file.");
+        }
+        conduction_coeff = ConductionCoeff::spitzer;
+
+        // Default value assume fully ionized hydrogen plasma with Coulomb logarithm of 40
+        // to approximate ICM conditions, i.e., 1.84e-5/ln Lambda = 4.6e-7.
+        Real spitzer_coeff =
+            pin->GetOrAddReal("diffusion", "spitzer_cond_in_erg_by_s_K_cm", 4.6e-7);
+        // Convert to code units. No temp conversion as [T_phys] = [T_code].
+        auto units = pkg->Param<Units>("units");
+        spitzer_coeff *= units.erg() / (units.s() * units.cm());
+
+        const auto mbar = pkg->Param<Real>("mbar");
+        auto thermal_diff =
+            ThermalDiffusivity(conduction, conduction_coeff, spitzer_coeff, mbar,
+                               units.electron_mass(), units.k_boltzmann());
+        pkg->AddParam<>("thermal_diff", thermal_diff);
+
+        const auto mu = pkg->Param<Real>("mu");
+        // 6.86 again assumes a fully ionized hydrogen plasma in agreement with
+        // the assumptions above (technically this means mu = 0.5) and can be derived
+        // from eq (7) in CM77 assuming T_e = T_i.
+        conduction_sat_prefac = 6.86 * std::sqrt(mu) * conduction_sat_phi;
+
+      } else if (conduction_coeff_str == "fixed") {
+        conduction_coeff = ConductionCoeff::fixed;
+        Real thermal_diff_coeff_code =
+            pin->GetReal("diffusion", "thermal_diff_coeff_code");
+        auto thermal_diff = ThermalDiffusivity(conduction, conduction_coeff,
+                                               thermal_diff_coeff_code, 0.0, 0.0, 0.0);
+        pkg->AddParam<>("thermal_diff", thermal_diff);
+        // 5.0 prefactor comes from eq (8) in Cowie & McKee 1977
+        // https://doi.org/10.1086/154911
+        conduction_sat_prefac = 5.0 * conduction_sat_phi;
+
+      } else {
+        PARTHENON_FAIL("Thermal conduction is enabled but no coefficient is set. Please "
+                       "set diffusion/conduction_coeff to either 'spitzer' or 'fixed'");
+      }
+      PARTHENON_REQUIRE(conduction_sat_prefac != 0.0,
+                        "Saturated thermal conduction prefactor uninitialized.");
+      pkg->AddParam<>("conduction_sat_prefac", conduction_sat_prefac);
     }
     pkg->AddParam<>("conduction", conduction);
 
+    auto diffint_str = pin->GetOrAddString("diffusion", "integrator", "none");
+    auto diffint = DiffInt::none;
+    if (diffint_str == "unsplit") {
+      diffint = DiffInt::unsplit;
+    } else if (diffint_str == "rkl2") {
+      diffint = DiffInt::rkl2;
+      auto rkl2_dt_ratio = pin->GetOrAddReal("diffusion", "rkl2_max_dt_ratio", -1.0);
+      pkg->AddParam<>("rkl2_max_dt_ratio", rkl2_dt_ratio);
+    } else if (diffint_str != "none") {
+      PARTHENON_FAIL("AthenaPK unknown integration method for diffusion processes. "
+                     "Options are: none, unsplit, rkl2");
+    }
+    if (diffint != DiffInt::none) {
+      pkg->AddParam<Real>("dt_diff", 0.0, true); // diffusive timestep constraint
+    }
+    pkg->AddParam<>("diffint", diffint);
+
     if (fluid == Fluid::euler) {
       AdiabaticHydroEOS eos(pfloor, dfloor, efloor, vceil, eceil, gamma);
       pkg->AddParam<>("eos", eos);
@@ -702,7 +789,9 @@ Real EstimateTimestep(MeshData<Real> *md) {
     min_dt = std::min(min_dt, tabular_cooling.EstimateTimeStep(md));
   }
 
-  if (hydro_pkg->Param<Conduction>("conduction") != Conduction::none) {
+  // For RKL2 STS, the diffusive timestep is calculated separately in the driver
+  if ((hydro_pkg->Param<DiffInt>("diffint") == DiffInt::unsplit) &&
+      (hydro_pkg->Param<Conduction>("conduction") != Conduction::none)) {
     min_dt = std::min(min_dt, EstimateConductionTimestep(md));
   }
 
@@ -775,8 +864,8 @@ TaskStatus CalculateFluxes(std::shared_ptr<MeshData<Real>> &md) {
   int il, iu, jl, ju, kl, ku;
   jl = jb.s, ju = jb.e, kl = kb.s, ku = kb.e;
   // TODO(pgrete): are these looop limits are likely too large for 2nd order
-  if (pmb->block_size.nx2 > 1) {
-    if (pmb->block_size.nx3 == 1) // 2D
+  if (pmb->block_size.nx(X2DIR) > 1) {
+    if (pmb->block_size.nx(X3DIR) == 1) // 2D
       jl = jb.s - 1, ju = jb.e + 1, kl = kb.s, ku = kb.e;
     else // 3D
       jl = jb.s - 1, ju = jb.e + 1, kl = kb.s - 1, ku = kb.e + 1;
@@ -848,7 +937,7 @@ TaskStatus CalculateFluxes(std::shared_ptr<MeshData<Real>> &md) {
         parthenon::ScratchPad2D<Real>::shmem_size(num_scratch_vars, nx1) * 3;
     // set the loop limits
     il = ib.s - 1, iu = ib.e + 1, kl = kb.s, ku = kb.e;
-    if (pmb->block_size.nx3 == 1) // 2D
+    if (pmb->block_size.nx(X3DIR) == 1) // 2D
       kl = kb.s, ku = kb.e;
     else // 3D
       kl = kb.s - 1, ku = kb.e + 1;
@@ -943,9 +1032,9 @@ TaskStatus CalculateFluxes(std::shared_ptr<MeshData<Real>> &md) {
         });
   }
 
-  const auto &conduction = pkg->Param<Conduction>("conduction");
-  if (conduction != Conduction::none) {
-    ThermalFluxAniso(md.get());
+  const auto &diffint = pkg->Param<DiffInt>("diffint");
+  if (diffint == DiffInt::unsplit) {
+    CalcDiffFluxes(pkg.get(), md.get());
   }
 
   return TaskStatus::complete;
diff --git a/src/hydro/hydro.hpp b/src/hydro/hydro.hpp
index 0967d12e..d54e96b1 100644
--- a/src/hydro/hydro.hpp
+++ b/src/hydro/hydro.hpp
@@ -16,6 +16,7 @@ using namespace parthenon::package::prelude;
 namespace Hydro {
 
 parthenon::Packages_t ProcessPackages(std::unique_ptr<ParameterInput> &pin);
+void PreStepMeshUserWorkInLoop(Mesh *pmesh, ParameterInput *pin, parthenon::SimTime &tm);
 std::shared_ptr<StateDescriptor> Initialize(ParameterInput *pin);
 
 template <Fluid fluid>
diff --git a/src/hydro/hydro_driver.cpp b/src/hydro/hydro_driver.cpp
index e33b1b0d..36bdce91 100644
--- a/src/hydro/hydro_driver.cpp
+++ b/src/hydro/hydro_driver.cpp
@@ -1,6 +1,6 @@
 //========================================================================================
 // AthenaPK - a performance portable block structured AMR astrophysical MHD code.
-// Copyright (c) 2020-2022, Athena-Parthenon Collaboration. All rights reserved.
+// Copyright (c) 2020-2023, Athena-Parthenon Collaboration. All rights reserved.
 // Licensed under the BSD 3-Clause License (the "LICENSE").
 //========================================================================================
 
@@ -19,6 +19,7 @@
 #include "../eos/adiabatic_hydro.hpp"
 #include "../pgen/cluster/agn_triggering.hpp"
 #include "../pgen/cluster/magnetic_tower.hpp"
+#include "diffusion/diffusion.hpp"
 #include "glmmhd/glmmhd.hpp"
 #include "hydro.hpp"
 #include "hydro_driver.hpp"
@@ -76,6 +77,310 @@ TaskStatus CalculateGlobalMinDx(MeshData<Real> *md) {
   return TaskStatus::complete;
 }
 
+// Sets all fluxes to 0
+TaskStatus ResetFluxes(MeshData<Real> *md) {
+  auto pmb = md->GetBlockData(0)->GetBlockPointer();
+  IndexRange ib = pmb->cellbounds.GetBoundsI(IndexDomain::interior);
+  IndexRange jb = pmb->cellbounds.GetBoundsJ(IndexDomain::interior);
+  IndexRange kb = pmb->cellbounds.GetBoundsK(IndexDomain::interior);
+
+  // In principle, we'd only need to pack Metadata::WithFluxes here, but
+  // choosing to mirror other use in the code so that the packs are already cached.
+  std::vector<parthenon::MetadataFlag> flags_ind({Metadata::Independent});
+  auto cons_pack = md->PackVariablesAndFluxes(flags_ind);
+
+  const int ndim = pmb->pmy_mesh->ndim;
+  // Using separate loops for each dim as the launch overhead should be hidden
+  // by enough work over the entire pack and it allows to not use any conditionals.
+  parthenon::par_for(
+      DEFAULT_LOOP_PATTERN, "ResetFluxes X1", parthenon::DevExecSpace(), 0,
+      cons_pack.GetDim(5) - 1, 0, cons_pack.GetDim(4) - 1, kb.s, kb.e, jb.s, jb.e, ib.s,
+      ib.e + 1,
+      KOKKOS_LAMBDA(const int b, const int v, const int k, const int j, const int i) {
+        auto &cons = cons_pack(b);
+        cons.flux(X1DIR, v, k, j, i) = 0.0;
+      });
+
+  if (ndim < 2) {
+    return TaskStatus::complete;
+  }
+  parthenon::par_for(
+      DEFAULT_LOOP_PATTERN, "ResetFluxes X2", parthenon::DevExecSpace(), 0,
+      cons_pack.GetDim(5) - 1, 0, cons_pack.GetDim(4) - 1, kb.s, kb.e, jb.s, jb.e + 1,
+      ib.s, ib.e,
+      KOKKOS_LAMBDA(const int b, const int v, const int k, const int j, const int i) {
+        auto &cons = cons_pack(b);
+        cons.flux(X2DIR, v, k, j, i) = 0.0;
+      });
+
+  if (ndim < 3) {
+    return TaskStatus::complete;
+  }
+  parthenon::par_for(
+      DEFAULT_LOOP_PATTERN, "ResetFluxes X3", parthenon::DevExecSpace(), 0,
+      cons_pack.GetDim(5) - 1, 0, cons_pack.GetDim(4) - 1, kb.s, kb.e + 1, jb.s, jb.e,
+      ib.s, ib.e,
+      KOKKOS_LAMBDA(const int b, const int v, const int k, const int j, const int i) {
+        auto &cons = cons_pack(b);
+        cons.flux(X3DIR, v, k, j, i) = 0.0;
+      });
+  return TaskStatus::complete;
+}
+
+TaskStatus RKL2StepFirst(MeshData<Real> *md_Y0, MeshData<Real> *md_Yjm1,
+                         MeshData<Real> *md_Yjm2, MeshData<Real> *md_MY0, const int s_rkl,
+                         const Real tau) {
+  auto pmb = md_Y0->GetBlockData(0)->GetBlockPointer();
+  IndexRange ib = pmb->cellbounds.GetBoundsI(IndexDomain::interior);
+  IndexRange jb = pmb->cellbounds.GetBoundsJ(IndexDomain::interior);
+  IndexRange kb = pmb->cellbounds.GetBoundsK(IndexDomain::interior);
+
+  // Compute coefficients. Meyer+2014 eq. (18)
+  Real mu_tilde_1 = 4. / 3. /
+                    (static_cast<Real>(s_rkl) * static_cast<Real>(s_rkl) +
+                     static_cast<Real>(s_rkl) - 2.);
+
+  // In principle, we'd only need to pack Metadata::WithFluxes here, but
+  // choosing to mirror other use in the code so that the packs are already cached.
+  std::vector<parthenon::MetadataFlag> flags_ind({Metadata::Independent});
+  auto Y0 = md_Y0->PackVariablesAndFluxes(flags_ind);
+  auto Yjm1 = md_Yjm1->PackVariablesAndFluxes(flags_ind);
+  auto Yjm2 = md_Yjm2->PackVariablesAndFluxes(flags_ind);
+  auto MY0 = md_MY0->PackVariablesAndFluxes(flags_ind);
+
+  const int ndim = pmb->pmy_mesh->ndim;
+  // Using separate loops for each dim as the launch overhead should be hidden
+  // by enough work over the entire pack and it allows to not use any conditionals.
+  parthenon::par_for(
+      DEFAULT_LOOP_PATTERN, "RKL first step", parthenon::DevExecSpace(), 0,
+      Y0.GetDim(5) - 1, 0, Y0.GetDim(4) - 1, kb.s, kb.e, jb.s, jb.e, ib.s, ib.e,
+      KOKKOS_LAMBDA(const int b, const int v, const int k, const int j, const int i) {
+        Yjm1(b, v, k, j, i) =
+            Y0(b, v, k, j, i) + mu_tilde_1 * tau * MY0(b, v, k, j, i); // Y_1
+        Yjm2(b, v, k, j, i) = Y0(b, v, k, j, i);                       // Y_0
+      });
+
+  return TaskStatus::complete;
+}
+
+TaskStatus RKL2StepOther(MeshData<Real> *md_Y0, MeshData<Real> *md_Yjm1,
+                         MeshData<Real> *md_Yjm2, MeshData<Real> *md_MY0, const Real mu_j,
+                         const Real nu_j, const Real mu_tilde_j, const Real gamma_tilde_j,
+                         const Real tau) {
+  auto pmb = md_Y0->GetBlockData(0)->GetBlockPointer();
+  IndexRange ib = pmb->cellbounds.GetBoundsI(IndexDomain::interior);
+  IndexRange jb = pmb->cellbounds.GetBoundsJ(IndexDomain::interior);
+  IndexRange kb = pmb->cellbounds.GetBoundsK(IndexDomain::interior);
+
+  // In principle, we'd only need to pack Metadata::WithFluxes here, but
+  // choosing to mirror other use in the code so that the packs are already cached.
+  std::vector<parthenon::MetadataFlag> flags_ind({Metadata::Independent});
+  auto Y0 = md_Y0->PackVariablesAndFluxes(flags_ind);
+  auto Yjm1 = md_Yjm1->PackVariablesAndFluxes(flags_ind);
+  auto Yjm2 = md_Yjm2->PackVariablesAndFluxes(flags_ind);
+  auto MY0 = md_MY0->PackVariablesAndFluxes(flags_ind);
+
+  const int ndim = pmb->pmy_mesh->ndim;
+  // Using separate loops for each dim as the launch overhead should be hidden
+  // by enough work over the entire pack and it allows to not use any conditionals.
+  parthenon::par_for(
+      DEFAULT_LOOP_PATTERN, "RKL other step", parthenon::DevExecSpace(), 0,
+      Y0.GetDim(5) - 1, 0, Y0.GetDim(4) - 1, kb.s, kb.e, jb.s, jb.e, ib.s, ib.e,
+      KOKKOS_LAMBDA(const int b, const int v, const int k, const int j, const int i) {
+        // First calc this step
+        const auto &coords = Yjm1.GetCoords(b);
+        const Real MYjm1 =
+            parthenon::Update::FluxDivHelper(v, k, j, i, ndim, coords, Yjm1(b));
+        const Real Yj = mu_j * Yjm1(b, v, k, j, i) + nu_j * Yjm2(b, v, k, j, i) +
+                        (1.0 - mu_j - nu_j) * Y0(b, v, k, j, i) +
+                        mu_tilde_j * tau * MYjm1 +
+                        gamma_tilde_j * tau * MY0(b, v, k, j, i);
+        // Then shuffle vars for next step
+        Yjm2(b, v, k, j, i) = Yjm1(b, v, k, j, i);
+        Yjm1(b, v, k, j, i) = Yj;
+      });
+
+  return TaskStatus::complete;
+}
+
+// Assumes that prim and cons are in sync initially.
+// Guarantees that prim and cons are in sync at the end.
+void AddSTSTasks(TaskCollection *ptask_coll, Mesh *pmesh, BlockList_t &blocks,
+                 const Real tau) {
+
+  auto hydro_pkg = blocks[0]->packages.Get("Hydro");
+  auto mindt_diff = hydro_pkg->Param<Real>("dt_diff");
+
+  // get number of RKL steps
+  // eq (21) using half hyperbolic timestep due to Strang split
+  int s_rkl =
+      static_cast<int>(0.5 * (std::sqrt(9.0 + 16.0 * tau / mindt_diff) - 1.0)) + 1;
+  // ensure odd number of stages
+  if (s_rkl % 2 == 0) s_rkl += 1;
+
+  if (parthenon::Globals::my_rank == 0) {
+    const auto ratio = 2.0 * tau / mindt_diff;
+    std::cout << "STS ratio: " << ratio << " Taking " << s_rkl << " steps." << std::endl;
+    if (ratio > 400.1) {
+      std::cout << "WARNING: ratio is > 400. Proceed at own risk." << std::endl;
+    }
+  }
+
+  TaskID none(0);
+
+  // Store initial u0 in u1 as "base" will continuously be updated but initial state Y0 is
+  // required for each stage.
+  TaskRegion &region_copy_out = ptask_coll->AddRegion(blocks.size());
+  for (int i = 0; i < blocks.size(); i++) {
+    auto &tl = region_copy_out[i];
+    auto &Y0 = blocks[i]->meshblock_data.Get("u1");
+    auto &base = blocks[i]->meshblock_data.Get();
+    tl.AddTask(
+        none,
+        [](MeshBlockData<Real> *dst, MeshBlockData<Real> *src) {
+          dst->Get("cons").data.DeepCopy(src->Get("cons").data);
+          dst->Get("prim").data.DeepCopy(src->Get("prim").data);
+          return TaskStatus::complete;
+        },
+        Y0.get(), base.get());
+  }
+
+  TaskRegion &region_init = ptask_coll->AddRegion(blocks.size());
+  for (int i = 0; i < blocks.size(); i++) {
+    auto &pmb = blocks[i];
+    auto &tl = region_init[i];
+    auto &base = pmb->meshblock_data.Get();
+
+    // Add extra registers. No-op for existing variables so it's safe to call every
+    // time.
+    // TODO(pgrete) this allocates all Variables, i.e., prim and cons vector, but only a
+    // subset is actually needed. Streamline to allocate only required vars.
+    pmb->meshblock_data.Add("MY0", base);
+    pmb->meshblock_data.Add("Yjm2", base);
+  }
+
+  const int num_partitions = pmesh->DefaultNumPartitions();
+  TaskRegion &region_calc_fluxes_step_init = ptask_coll->AddRegion(num_partitions);
+  for (int i = 0; i < num_partitions; i++) {
+    auto &tl = region_calc_fluxes_step_init[i];
+    auto &base = pmesh->mesh_data.GetOrAdd("base", i);
+    const auto any = parthenon::BoundaryType::any;
+    auto start_bnd = tl.AddTask(none, parthenon::StartReceiveBoundBufs<any>, base);
+    auto start_flxcor_recv =
+        tl.AddTask(none, parthenon::StartReceiveFluxCorrections, base);
+
+    // Reset flux arrays (not guaranteed to be zero)
+    auto reset_fluxes = tl.AddTask(none, ResetFluxes, base.get());
+
+    // Calculate the diffusive fluxes for Y0 (here still "base" as nothing has been
+    // updated yet) so that we can store the result as MY0 and reuse later
+    // (in every subsetp).
+    auto hydro_diff_fluxes =
+        tl.AddTask(reset_fluxes, CalcDiffFluxes, hydro_pkg.get(), base.get());
+
+    auto send_flx =
+        tl.AddTask(hydro_diff_fluxes, parthenon::LoadAndSendFluxCorrections, base);
+    auto recv_flx =
+        tl.AddTask(start_flxcor_recv, parthenon::ReceiveFluxCorrections, base);
+    auto set_flx =
+        tl.AddTask(recv_flx | hydro_diff_fluxes, parthenon::SetFluxCorrections, base);
+
+    auto &Y0 = pmesh->mesh_data.GetOrAdd("u1", i);
+    auto &MY0 = pmesh->mesh_data.GetOrAdd("MY0", i);
+    auto &Yjm2 = pmesh->mesh_data.GetOrAdd("Yjm2", i);
+
+    auto init_MY0 = tl.AddTask(set_flx, parthenon::Update::FluxDivergence<MeshData<Real>>,
+                               base.get(), MY0.get());
+
+    // Initialize Y0 and Y1 and the recursion relation starting with j = 2 needs data from
+    // the two preceeding stages.
+    auto rkl2_step_first = tl.AddTask(init_MY0, RKL2StepFirst, Y0.get(), base.get(),
+                                      Yjm2.get(), MY0.get(), s_rkl, tau);
+
+    // Update ghost cells of Y1 (as MY1 is calculated for each Y_j).
+    // Y1 stored in "base", see rkl2_step_first task.
+    // Update ghost cells (local and non local), prolongate and apply bound cond.
+    // TODO(someone) experiment with split (local/nonlocal) comms with respect to
+    // performance for various tests (static, amr, block sizes) and then decide on the
+    // best impl. Go with default call (split local/nonlocal) for now.
+    // TODO(pgrete) optimize (in parthenon) to only send subset of updated vars
+    auto bounds_exchange = parthenon::AddBoundaryExchangeTasks(
+        rkl2_step_first | start_bnd, tl, base, pmesh->multilevel);
+
+    tl.AddTask(bounds_exchange, parthenon::Update::FillDerived<MeshData<Real>>,
+               base.get());
+  }
+
+  // Compute coefficients. Meyer+2012 eq. (16)
+  Real b_j = 1. / 3.;
+  Real b_jm1 = 1. / 3.;
+  Real b_jm2 = 1. / 3.;
+  Real w1 = 4. / (static_cast<Real>(s_rkl) * static_cast<Real>(s_rkl) +
+                  static_cast<Real>(s_rkl) - 2.);
+  Real mu_j, nu_j, j, mu_tilde_j, gamma_tilde_j;
+
+  // RKL loop
+  for (int jj = 2; jj <= s_rkl; jj++) {
+    j = static_cast<Real>(jj);
+    b_j = (j * j + j - 2.0) / (2 * j * (j + 1.0));
+    mu_j = (2.0 * j - 1.0) / j * b_j / b_jm1;
+    nu_j = -(j - 1.0) / j * b_j / b_jm2;
+    mu_tilde_j = mu_j * w1;
+    gamma_tilde_j = -(1.0 - b_jm1) * mu_tilde_j; // -a_jm1*mu_tilde_j
+
+    TaskRegion &region_calc_fluxes_step_other = ptask_coll->AddRegion(num_partitions);
+    for (int i = 0; i < num_partitions; i++) {
+      auto &tl = region_calc_fluxes_step_other[i];
+      auto &base = pmesh->mesh_data.GetOrAdd("base", i);
+
+      // Only need boundaries for base as it's the only "active" container exchanging
+      // data/fluxes with neighbors. All other containers are passive (i.e., data is only
+      // used but not exchanged).
+      const auto any = parthenon::BoundaryType::any;
+      auto start_bnd = tl.AddTask(none, parthenon::StartReceiveBoundBufs<any>, base);
+      auto start_flxcor_recv =
+          tl.AddTask(none, parthenon::StartReceiveFluxCorrections, base);
+
+      // Reset flux arrays (not guaranteed to be zero)
+      auto reset_fluxes = tl.AddTask(none, ResetFluxes, base.get());
+
+      // Calculate the diffusive fluxes for Yjm1 (here u1)
+      auto hydro_diff_fluxes =
+          tl.AddTask(reset_fluxes, CalcDiffFluxes, hydro_pkg.get(), base.get());
+
+      auto send_flx =
+          tl.AddTask(hydro_diff_fluxes, parthenon::LoadAndSendFluxCorrections, base);
+      auto recv_flx =
+          tl.AddTask(start_flxcor_recv, parthenon::ReceiveFluxCorrections, base);
+      auto set_flx =
+          tl.AddTask(recv_flx | hydro_diff_fluxes, parthenon::SetFluxCorrections, base);
+
+      auto &Y0 = pmesh->mesh_data.GetOrAdd("u1", i);
+      auto &MY0 = pmesh->mesh_data.GetOrAdd("MY0", i);
+      auto &Yjm2 = pmesh->mesh_data.GetOrAdd("Yjm2", i);
+
+      auto rkl2_step_other =
+          tl.AddTask(set_flx, RKL2StepOther, Y0.get(), base.get(), Yjm2.get(), MY0.get(),
+                     mu_j, nu_j, mu_tilde_j, gamma_tilde_j, tau);
+
+      // update ghost cells of base (currently storing Yj)
+      // Update ghost cells (local and non local), prolongate and apply bound cond.
+      // TODO(someone) experiment with split (local/nonlocal) comms with respect to
+      // performance for various tests (static, amr, block sizes) and then decide on the
+      // best impl. Go with default call (split local/nonlocal) for now.
+      // TODO(pgrete) optimize (in parthenon) to only send subset of updated vars
+      auto bounds_exchange = parthenon::AddBoundaryExchangeTasks(
+          rkl2_step_other | start_bnd, tl, base, pmesh->multilevel);
+
+      tl.AddTask(bounds_exchange, parthenon::Update::FillDerived<MeshData<Real>>,
+                 base.get());
+    }
+
+    b_jm2 = b_jm1;
+    b_jm1 = b_j;
+  }
+}
+
 // See the advection.hpp declaration for a description of how this function gets called.
 TaskCollection HydroDriver::MakeTaskCollection(BlockList_t &blocks, int stage) {
   TaskCollection tc;
@@ -232,6 +537,12 @@ TaskCollection HydroDriver::MakeTaskCollection(BlockList_t &blocks, int stage) {
 
   // First add split sources before the main time integration
   if (stage == 1) {
+    // If any tasks modify the conserved variables before this place, then
+    // the STS tasks should be updated to not assume prim and cons are in sync.
+    const auto &diffint = hydro_pkg->Param<DiffInt>("diffint");
+    if (diffint == DiffInt::rkl2) {
+      AddSTSTasks(&tc, pmesh, blocks, 0.5 * tm.dt);
+    }
     TaskRegion &strang_init_region = tc.AddRegion(num_partitions);
     for (int i = 0; i < num_partitions; i++) {
       auto &tl = strang_init_region[i];
@@ -277,7 +588,11 @@ TaskCollection HydroDriver::MakeTaskCollection(BlockList_t &blocks, int stage) {
     auto &tl = single_tasklist_per_pack_region[i];
     auto &mu0 = pmesh->mesh_data.GetOrAdd("base", i);
     auto &mu1 = pmesh->mesh_data.GetOrAdd("u1", i);
-    tl.AddTask(none, parthenon::StartReceiveFluxCorrections, mu0);
+
+    const auto any = parthenon::BoundaryType::any;
+    auto start_bnd = tl.AddTask(none, parthenon::StartReceiveBoundBufs<any>, mu0);
+    auto start_flxcor_recv =
+        tl.AddTask(none, parthenon::StartReceiveFluxCorrections, mu0);
 
     const auto flux_str = (stage == 1) ? "flux_first_stage" : "flux_other_stage";
     FluxFun_t *calc_flux_fun = hydro_pkg->Param<FluxFun_t *>(flux_str);
@@ -298,9 +613,9 @@ TaskCollection HydroDriver::MakeTaskCollection(BlockList_t &blocks, int stage) {
 
     auto send_flx =
         tl.AddTask(first_order_flux_correct, parthenon::LoadAndSendFluxCorrections, mu0);
-    auto recv_flx =
-        tl.AddTask(first_order_flux_correct, parthenon::ReceiveFluxCorrections, mu0);
-    auto set_flx = tl.AddTask(recv_flx, parthenon::SetFluxCorrections, mu0);
+    auto recv_flx = tl.AddTask(start_flxcor_recv, parthenon::ReceiveFluxCorrections, mu0);
+    auto set_flx = tl.AddTask(recv_flx | first_order_flux_correct,
+                              parthenon::SetFluxCorrections, mu0);
 
     // compute the divergence of fluxes of conserved variables
     auto update = tl.AddTask(
@@ -332,29 +647,14 @@ TaskCollection HydroDriver::MakeTaskCollection(BlockList_t &blocks, int stage) {
           tl.AddTask(source_split_strang_final, AddSplitSourcesFirstOrder, mu0.get(), tm);
     }
 
-    // Update ghost cells (local and non local)
+    // Update ghost cells (local and non local), prolongate and apply bound cond.
     // TODO(someone) experiment with split (local/nonlocal) comms with respect to
     // performance for various tests (static, amr, block sizes) and then decide on the
     // best impl. Go with default call (split local/nonlocal) for now.
-    parthenon::AddBoundaryExchangeTasks(source_split_first_order, tl, mu0,
+    parthenon::AddBoundaryExchangeTasks(source_split_first_order | start_bnd, tl, mu0,
                                         pmesh->multilevel);
   }
 
-  TaskRegion &async_region_3 = tc.AddRegion(num_task_lists_executed_independently);
-  for (int i = 0; i < blocks.size(); i++) {
-    auto &tl = async_region_3[i];
-    auto &u0 = blocks[i]->meshblock_data.Get("base");
-    auto prolongBound = none;
-    // Currently taken care of by AddBoundaryExchangeTasks above.
-    // Needs to be reintroduced once we reintroduce split (local/nonlocal) communication.
-    // if (pmesh->multilevel) {
-    //  prolongBound = tl.AddTask(none, parthenon::ProlongateBoundaries, u0);
-    //}
-
-    // set physical boundaries
-    auto set_bc = tl.AddTask(prolongBound, parthenon::ApplyBoundaryConditions, u0);
-  }
-
   // Single task in single (serial) region to reset global vars used in reductions in the
   // first stage.
   if (stage == integrator->nstages && hydro_pkg->Param<bool>("calc_c_h")) {
@@ -376,10 +676,21 @@ TaskCollection HydroDriver::MakeTaskCollection(BlockList_t &blocks, int stage) {
     auto &mu0 = pmesh->mesh_data.GetOrAdd("base", i);
     auto fill_derived =
         tl.AddTask(none, parthenon::Update::FillDerived<MeshData<Real>>, mu0.get());
+  }
+  const auto &diffint = hydro_pkg->Param<DiffInt>("diffint");
+  // If any tasks modify the conserved variables before this place and after FillDerived,
+  // then the STS tasks should be updated to not assume prim and cons are in sync.
+  if (diffint == DiffInt::rkl2 && stage == integrator->nstages) {
+    AddSTSTasks(&tc, pmesh, blocks, 0.5 * tm.dt);
+  }
 
-    if (stage == integrator->nstages) {
-      auto new_dt = tl.AddTask(
-          fill_derived, parthenon::Update::EstimateTimestep<MeshData<Real>>, mu0.get());
+  if (stage == integrator->nstages) {
+    TaskRegion &tr = tc.AddRegion(num_partitions);
+    for (int i = 0; i < num_partitions; i++) {
+      auto &tl = tr[i];
+      auto &mu0 = pmesh->mesh_data.GetOrAdd("base", i);
+      auto new_dt = tl.AddTask(none, parthenon::Update::EstimateTimestep<MeshData<Real>>,
+                               mu0.get());
     }
   }
 
diff --git a/src/hydro/prolongation/custom_ops.hpp b/src/hydro/prolongation/custom_ops.hpp
index 02d2b15f..4693e755 100644
--- a/src/hydro/prolongation/custom_ops.hpp
+++ b/src/hydro/prolongation/custom_ops.hpp
@@ -81,34 +81,40 @@ struct ProlongateCellMinModMultiD {
 
     Real dx1fm = 0;
     Real dx1fp = 0;
+    [[maybe_unused]] Real gx1m = 0, gx1p = 0;
     Real gx1c = 0;
     if constexpr (INCLUDE_X1) {
       Real dx1m, dx1p;
       GetGridSpacings<1, el>(coords, coarse_coords, cib, ib, i, fi, &dx1m, &dx1p, &dx1fm,
                              &dx1fp);
-      gx1c = GradMinMod(fc, coarse(element_idx, l, m, n, k, j, i - 1),
-                        coarse(element_idx, l, m, n, k, j, i + 1), dx1m, dx1p);
+      gx1c =
+          GradMinMod(fc, coarse(element_idx, l, m, n, k, j, i - 1),
+                     coarse(element_idx, l, m, n, k, j, i + 1), dx1m, dx1p, gx1m, gx1p);
     }
 
     Real dx2fm = 0;
     Real dx2fp = 0;
+    [[maybe_unused]] Real gx2m = 0, gx2p = 0;
     Real gx2c = 0;
     if constexpr (INCLUDE_X2) {
       Real dx2m, dx2p;
       GetGridSpacings<2, el>(coords, coarse_coords, cjb, jb, j, fj, &dx2m, &dx2p, &dx2fm,
                              &dx2fp);
-      gx2c = GradMinMod(fc, coarse(element_idx, l, m, n, k, j - 1, i),
-                        coarse(element_idx, l, m, n, k, j + 1, i), dx2m, dx2p);
+      gx2c =
+          GradMinMod(fc, coarse(element_idx, l, m, n, k, j - 1, i),
+                     coarse(element_idx, l, m, n, k, j + 1, i), dx2m, dx2p, gx2m, gx2p);
     }
     Real dx3fm = 0;
     Real dx3fp = 0;
+    [[maybe_unused]] Real gx3m = 0, gx3p = 0;
     Real gx3c = 0;
     if constexpr (INCLUDE_X3) {
       Real dx3m, dx3p;
       GetGridSpacings<3, el>(coords, coarse_coords, ckb, kb, k, fk, &dx3m, &dx3p, &dx3fm,
                              &dx3fp);
-      gx3c = GradMinMod(fc, coarse(element_idx, l, m, n, k - 1, j, i),
-                        coarse(element_idx, l, m, n, k + 1, j, i), dx3m, dx3p);
+      gx3c =
+          GradMinMod(fc, coarse(element_idx, l, m, n, k - 1, j, i),
+                     coarse(element_idx, l, m, n, k + 1, j, i), dx3m, dx3p, gx3m, dx3p);
     }
 
     // Max. expected total difference. (dx#fm/p are positive by construction)
diff --git a/src/hydro/rsolvers/hydro_hlle.hpp b/src/hydro/rsolvers/hydro_hlle.hpp
index 814acc63..922a4d80 100644
--- a/src/hydro/rsolvers/hydro_hlle.hpp
+++ b/src/hydro/rsolvers/hydro_hlle.hpp
@@ -50,7 +50,7 @@ struct Riemann<Fluid::euler, RiemannSolver::hlle> {
     Real gm1 = gamma - 1.0;
     Real igm1 = 1.0 / gm1;
     parthenon::par_for_inner(member, il, iu, [&](const int i) {
-      Real wli[(NHYDRO)], wri[(NHYDRO)];
+      Real wli[(NHYDRO)], wri[(NHYDRO)], wroe[(NHYDRO)];
       Real fl[(NHYDRO)], fr[(NHYDRO)], flxi[(NHYDRO)];
       //--- Step 1.  Load L/R states into local variables
       wli[IDN] = wl(IDN, i);
@@ -65,34 +65,37 @@ struct Riemann<Fluid::euler, RiemannSolver::hlle> {
       wri[IV3] = wr(ivz, i);
       wri[IPR] = wr(IPR, i);
 
-      //--- Step 2.  Compute middle state estimates with PVRS (Toro 10.5.2)
-      Real al, ar, el, er;
-      Real cl = eos.SoundSpeed(wli);
-      Real cr = eos.SoundSpeed(wri);
-      el = wli[IPR] * igm1 +
-           0.5 * wli[IDN] * (SQR(wli[IV1]) + SQR(wli[IV2]) + SQR(wli[IV3]));
-      er = wri[IPR] * igm1 +
-           0.5 * wri[IDN] * (SQR(wri[IV1]) + SQR(wri[IV2]) + SQR(wri[IV3]));
-      Real rhoa = .5 * (wli[IDN] + wri[IDN]); // average density
-      Real ca = .5 * (cl + cr);               // average sound speed
-      Real pmid = .5 * (wli[IPR] + wri[IPR] + (wli[IV1] - wri[IV1]) * rhoa * ca);
-
-      //--- Step 3.  Compute sound speed in L,R
-      Real ql, qr;
-      ql = (pmid <= wli[IPR])
-               ? 1.0
-               : (1.0 + (gamma + 1) / std::sqrt(2 * gamma) * (pmid / wli[IPR] - 1.0));
-      qr = (pmid <= wri[IPR])
-               ? 1.0
-               : (1.0 + (gamma + 1) / std::sqrt(2 * gamma) * (pmid / wri[IPR] - 1.0));
-
-      //--- Step 4. Compute the max/min wave speeds based on L/R states
-
-      al = wli[IV1] - cl * ql;
-      ar = wri[IV1] + cr * qr;
-
-      Real bp = ar > 0.0 ? ar : 0.0;
-      Real bm = al < 0.0 ? al : 0.0;
+      //--- Step 2.  Compute Roe-averaged state
+      Real sqrtdl = std::sqrt(wli[IDN]);
+      Real sqrtdr = std::sqrt(wri[IDN]);
+      Real isdlpdr = 1.0 / (sqrtdl + sqrtdr);
+
+      wroe[IDN] = sqrtdl * sqrtdr;
+      wroe[IV1] = (sqrtdl * wli[IV1] + sqrtdr * wri[IV1]) * isdlpdr;
+      wroe[IV2] = (sqrtdl * wli[IV2] + sqrtdr * wri[IV2]) * isdlpdr;
+      wroe[IV3] = (sqrtdl * wli[IV3] + sqrtdr * wri[IV3]) * isdlpdr;
+
+      // Following Roe(1981), the enthalpy H=(E+P)/d is averaged for adiabatic flows,
+      // rather than E or P directly.  sqrtdl*hl = sqrtdl*(el+pl)/dl = (el+pl)/sqrtdl
+      const Real el = wli[IPR] * igm1 +
+                      0.5 * wli[IDN] * (SQR(wli[IV1]) + SQR(wli[IV2]) + SQR(wli[IV3]));
+      const Real er = wri[IPR] * igm1 +
+                      0.5 * wri[IDN] * (SQR(wri[IV1]) + SQR(wri[IV2]) + SQR(wri[IV3]));
+      const Real hroe = ((el + wli[IPR]) / sqrtdl + (er + wri[IPR]) / sqrtdr) * isdlpdr;
+
+      //--- Step 3.  Compute sound speed in L,R, and Roe-averaged states
+
+      const Real cl = eos.SoundSpeed(wli);
+      const Real cr = eos.SoundSpeed(wri);
+      Real q = hroe - 0.5 * (SQR(wroe[IV1]) + SQR(wroe[IV2]) + SQR(wroe[IV3]));
+      const Real a = (q < 0.0) ? 0.0 : std::sqrt(gm1 * q);
+
+      //--- Step 4. Compute the max/min wave speeds based on L/R and Roe-averaged values
+      const Real al = std::min((wroe[IV1] - a), (wli[IV1] - cl));
+      const Real ar = std::max((wroe[IV1] + a), (wri[IV1] + cr));
+
+      Real bp = ar > 0.0 ? ar : TINY_NUMBER;
+      Real bm = al < 0.0 ? al : TINY_NUMBER;
 
       //-- Step 5. Compute L/R fluxes along lines bm/bp: F_L - (S_L)U_L; F_R - (S_R)U_R
       Real vxl = wli[IV1] - bm;
diff --git a/src/main.cpp b/src/main.cpp
index 87d82021..93f984bb 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -44,6 +44,7 @@ int main(int argc, char *argv[]) {
 
   // Redefine defaults
   pman.app_input->ProcessPackages = Hydro::ProcessPackages;
+  pman.app_input->PreStepMeshUserWorkInLoop = Hydro::PreStepMeshUserWorkInLoop;
   const auto problem = pman.pinput->GetOrAddString("job", "problem_id", "unset");
 
   if (problem == "linear_wave") {
diff --git a/src/main.hpp b/src/main.hpp
index bbcd0786..033118ad 100644
--- a/src/main.hpp
+++ b/src/main.hpp
@@ -35,7 +35,9 @@ enum class Reconstruction { undefined, dc, plm, ppm, wenoz, weno3, limo3 };
 enum class Integrator { undefined, rk1, rk2, vl2, rk3 };
 enum class Fluid { undefined, euler, glmmhd };
 enum class Cooling { none, tabular };
-enum class Conduction { none, spitzer, thermal_diff };
+enum class Conduction { none, isotropic, anisotropic };
+enum class ConductionCoeff { none, fixed, spitzer };
+enum class DiffInt { none, unsplit, rkl2 };
 
 enum class Hst { idx, ekin, emag, divb };
 
diff --git a/src/pgen/cloud.cpp b/src/pgen/cloud.cpp
index bc9fed25..db53990a 100644
--- a/src/pgen/cloud.cpp
+++ b/src/pgen/cloud.cpp
@@ -32,6 +32,7 @@ using namespace parthenon::driver::prelude;
 Real rho_wind, mom_wind, rhoe_wind, r_cloud, rho_cloud;
 Real Bx = 0.0;
 Real By = 0.0;
+Real Bz = 0.0;
 
 //========================================================================================
 //! \fn void InitUserMeshData(Mesh *mesh, ParameterInput *pin)
@@ -77,9 +78,13 @@ void InitUserMeshData(Mesh *mesh, ParameterInput *pin) {
       By = std::sqrt(2.0 * pressure / plasma_beta);
     } else if (mag_field_angle_str == "transverse") {
       Bx = std::sqrt(2.0 * pressure / plasma_beta);
+    } else if (mag_field_angle_str == "oblique") {
+      const auto B = std::sqrt(2.0 * pressure / plasma_beta);
+      Bx = B / std::sqrt(5.0);
+      Bz = 2 * Bx;
     } else {
       PARTHENON_FAIL("Unsupported problem/cloud/mag_field_angle. Please use either "
-                     "'aligned' or 'transverse'.");
+                     "'aligned', 'transverse', or 'oblique'.");
     }
   }
 
@@ -152,7 +157,7 @@ void ProblemGenerator(MeshBlock *pmb, ParameterInput *pin) {
   const auto nscalars = hydro_pkg->Param<int>("nscalars");
 
   const bool mhd_enabled = hydro_pkg->Param<Fluid>("fluid") == Fluid::glmmhd;
-  if (((Bx != 0.0) || (By != 0.0)) && !mhd_enabled) {
+  if (((Bx != 0.0) || (By != 0.0) || (Bz != 0.0)) && !mhd_enabled) {
     PARTHENON_FAIL("Requested to initialize magnetic fields by `cloud/plasma_beta > 0`, "
                    "but `hydro/fluid` is not supporting MHD.");
   }
@@ -195,7 +200,8 @@ void ProblemGenerator(MeshBlock *pmb, ParameterInput *pin) {
         if (mhd_enabled) {
           u(IB1, k, j, i) = Bx;
           u(IB2, k, j, i) = By;
-          u(IEN, k, j, i) += 0.5 * (Bx * Bx + By * By);
+          u(IB3, k, j, i) = Bz;
+          u(IEN, k, j, i) += 0.5 * (Bx * Bx + By * By + Bz * Bz);
         }
 
         // Init passive scalars
@@ -213,7 +219,7 @@ void ProblemGenerator(MeshBlock *pmb, ParameterInput *pin) {
 }
 
 void InflowWindX2(std::shared_ptr<MeshBlockData<Real>> &mbd, bool coarse) {
-  std::shared_ptr<MeshBlock> pmb = mbd->GetBlockPointer();
+  auto pmb = mbd->GetBlockPointer();
   auto cons = mbd->PackVariables(std::vector<std::string>{"cons"}, coarse);
   // TODO(pgrete) Add par_for_bndry to Parthenon without requiring nb
   const auto nb = IndexRange{0, 0};
@@ -222,6 +228,7 @@ void InflowWindX2(std::shared_ptr<MeshBlockData<Real>> &mbd, bool coarse) {
   const auto rhoe_wind_ = rhoe_wind;
   const auto Bx_ = Bx;
   const auto By_ = By;
+  const auto Bz_ = Bz;
   pmb->par_for_bndry(
       "InflowWindX2", nb, IndexDomain::inner_x2, parthenon::TopologicalElement::CC,
       coarse, KOKKOS_LAMBDA(const int, const int &k, const int &j, const int &i) {
@@ -236,6 +243,10 @@ void InflowWindX2(std::shared_ptr<MeshBlockData<Real>> &mbd, bool coarse) {
           cons(IB2, k, j, i) = By_;
           cons(IEN, k, j, i) += 0.5 * By_ * By_;
         }
+        if (Bz_ != 0.0) {
+          cons(IB3, k, j, i) = Bz_;
+          cons(IEN, k, j, i) += 0.5 * Bz_ * Bz_;
+        }
       });
 }
 
diff --git a/src/pgen/cluster.cpp b/src/pgen/cluster.cpp
index a2b10f7e..1e73d8b5 100644
--- a/src/pgen/cluster.cpp
+++ b/src/pgen/cluster.cpp
@@ -25,6 +25,7 @@
 #include <string>    // c_str()
 
 // Parthenon headers
+#include "Kokkos_MathematicalFunctions.hpp"
 #include "kokkos_abstraction.hpp"
 #include "mesh/domain.hpp"
 #include "mesh/mesh.hpp"
@@ -347,6 +348,11 @@ void ProblemInitPackageData(ParameterInput *pin, parthenon::StateDescriptor *hyd
   hydro_pkg->AddField("mach_sonic", m);
   // temperature
   hydro_pkg->AddField("temperature", m);
+  // radial velocity
+  hydro_pkg->AddField("v_r", m);
+
+  // spherical theta
+  hydro_pkg->AddField("theta_sph", m);
 
   if (hydro_pkg->Param<Cooling>("enable_cooling") == Cooling::tabular) {
     // cooling time
@@ -359,6 +365,9 @@ void ProblemInitPackageData(ParameterInput *pin, parthenon::StateDescriptor *hyd
 
     // plasma beta
     hydro_pkg->AddField("plasma_beta", m);
+
+    // plasma beta
+    hydro_pkg->AddField("B_mag", m);
   }
 
   /************************************************************
@@ -563,7 +572,7 @@ void ProblemGenerator(Mesh *pmesh, ParameterInput *pin, MeshData<Real> *md) {
        ************************************************************/
       const auto &magnetic_tower = hydro_pkg->Param<MagneticTower>("magnetic_tower");
 
-      magnetic_tower.AddInitialFieldToPotential(pmb.get(), a_kb, a_jb, a_ib, A);
+      magnetic_tower.AddInitialFieldToPotential(pmb, a_kb, a_jb, a_ib, A);
 
       /************************************************************
        * Add dipole magnetic field to the magnetic potential
@@ -666,7 +675,7 @@ void ProblemGenerator(Mesh *pmesh, ParameterInput *pin, MeshData<Real> *md) {
     // Init phases on all blocks
     for (int b = 0; b < md->NumBlocks(); b++) {
       auto pmb = md->GetBlockData(b)->GetBlockPointer();
-      few_modes_ft.SetPhases(pmb.get(), pin);
+      few_modes_ft.SetPhases(pmb, pin);
     }
     // As for t_corr in few_modes_ft, the choice for dt is
     // in principle arbitrary because the inital v_hat is 0 and the v_hat_new will contain
@@ -705,9 +714,9 @@ void ProblemGenerator(Mesh *pmesh, ParameterInput *pin, MeshData<Real> *md) {
                                       MPI_SUM, MPI_COMM_WORLD));
 #endif // MPI_PARALLEL
 
-    const auto Lx = pmesh->mesh_size.x1max - pmesh->mesh_size.x1min;
-    const auto Ly = pmesh->mesh_size.x2max - pmesh->mesh_size.x2min;
-    const auto Lz = pmesh->mesh_size.x3max - pmesh->mesh_size.x3min;
+    const auto Lx = pmesh->mesh_size.xmax(X1DIR) - pmesh->mesh_size.xmin(X1DIR);
+    const auto Ly = pmesh->mesh_size.xmax(X2DIR) - pmesh->mesh_size.xmin(X2DIR);
+    const auto Lz = pmesh->mesh_size.xmax(X3DIR) - pmesh->mesh_size.xmin(X3DIR);
     auto v_norm = std::sqrt(v2_sum / (Lx * Ly * Lz) / (SQR(sigma_v)));
 
     pmb->par_for(
@@ -734,7 +743,7 @@ void ProblemGenerator(Mesh *pmesh, ParameterInput *pin, MeshData<Real> *md) {
     // Init phases on all blocks
     for (int b = 0; b < md->NumBlocks(); b++) {
       auto pmb = md->GetBlockData(b)->GetBlockPointer();
-      few_modes_ft.SetPhases(pmb.get(), pin);
+      few_modes_ft.SetPhases(pmb, pin);
     }
     // As for t_corr in few_modes_ft, the choice for dt is
     // in principle arbitrary because the inital b_hat is 0 and the b_hat_new will contain
@@ -783,9 +792,9 @@ void ProblemGenerator(Mesh *pmesh, ParameterInput *pin, MeshData<Real> *md) {
                                       MPI_SUM, MPI_COMM_WORLD));
 #endif // MPI_PARALLEL
 
-    const auto Lx = pmesh->mesh_size.x1max - pmesh->mesh_size.x1min;
-    const auto Ly = pmesh->mesh_size.x2max - pmesh->mesh_size.x2min;
-    const auto Lz = pmesh->mesh_size.x3max - pmesh->mesh_size.x3min;
+    const auto Lx = pmesh->mesh_size.xmax(X1DIR) - pmesh->mesh_size.xmin(X1DIR);
+    const auto Ly = pmesh->mesh_size.xmax(X2DIR) - pmesh->mesh_size.xmin(X2DIR);
+    const auto Lz = pmesh->mesh_size.xmax(X3DIR) - pmesh->mesh_size.xmin(X3DIR);
     auto b_norm = std::sqrt(b2_sum / (Lx * Ly * Lz) / (SQR(sigma_b)));
 
     pmb->par_for(
@@ -818,6 +827,8 @@ void UserWorkBeforeOutput(MeshBlock *pmb, ParameterInput *pin) {
   auto &entropy = data->Get("entropy").data;
   auto &mach_sonic = data->Get("mach_sonic").data;
   auto &temperature = data->Get("temperature").data;
+  auto &v_r = data->Get("v_r").data;
+  auto &theta_sph = data->Get("theta_sph").data;
 
   // for computing temperature from primitives
   auto units = pkg->Param<Units>("units");
@@ -844,8 +855,12 @@ void UserWorkBeforeOutput(MeshBlock *pmb, ParameterInput *pin) {
         const Real x = coords.Xc<1>(i);
         const Real y = coords.Xc<2>(j);
         const Real z = coords.Xc<3>(k);
-        const Real r2 = SQR(x) + SQR(y) + SQR(z);
-        log10_radius(k, j, i) = 0.5 * std::log10(r2);
+        const Real r = std::sqrt(SQR(x) + SQR(y) + SQR(z));
+        log10_radius(k, j, i) = std::log10(r);
+
+        v_r(k, j, i) = ((v1 * x) + (v2 * y) + (v3 * z)) / r;
+
+        theta_sph(k, j, i) = std::acos(z / r);
 
         // compute entropy
         const Real K = P / std::pow(rho / mbar, gam);
@@ -884,6 +899,7 @@ void UserWorkBeforeOutput(MeshBlock *pmb, ParameterInput *pin) {
   if (pkg->Param<Fluid>("fluid") == Fluid::glmmhd) {
     auto &plasma_beta = data->Get("plasma_beta").data;
     auto &mach_alfven = data->Get("mach_alfven").data;
+    auto &b_mag = data->Get("B_mag").data;
 
     pmb->par_for(
         "Cluster::UserWorkBeforeOutput::MHD", kb.s, kb.e, jb.s, jb.e, ib.s, ib.e,
@@ -896,6 +912,8 @@ void UserWorkBeforeOutput(MeshBlock *pmb, ParameterInput *pin) {
           const Real Bz = prim(IB3, k, j, i);
           const Real B2 = (SQR(Bx) + SQR(By) + SQR(Bz));
 
+          b_mag(k, j, i) = Kokkos::sqrt(B2);
+
           // compute Alfven mach number
           const Real v_A = std::sqrt(B2 / rho);
           const Real c_s = std::sqrt(gam * P / rho); // ideal gas EOS
diff --git a/src/pgen/cluster/magnetic_tower.hpp b/src/pgen/cluster/magnetic_tower.hpp
index ebcc1d44..7199ba51 100644
--- a/src/pgen/cluster/magnetic_tower.hpp
+++ b/src/pgen/cluster/magnetic_tower.hpp
@@ -67,7 +67,7 @@ class MagneticTowerObj {
       // Compute the potential in jet cylindrical coordinates
       a_r = 0.0;
       a_theta = 0.0;
-      if (fabs(h) >= 0.001 && fabs(h) <= offset_ + thickness_) {
+      if (fabs(h) >= offset_ && fabs(h) <= offset_ + thickness_) {
         a_h = field_ * l_scale_ * exp_r2_h2;
       } else {
         a_h = 0.0;
@@ -110,7 +110,7 @@ class MagneticTowerObj {
       const parthenon::Real exp_r2_h2 = exp(-pow(r / l_scale_, 2));
       // Compute the field in jet cylindrical coordinates
       b_r = 0.0;
-      if (fabs(h) >= 0.001 && fabs(h) <= offset_ + thickness_) {
+      if (fabs(h) >= offset_ && fabs(h) <= offset_ + thickness_) {
         b_theta = 2.0 * field_ * r / l_scale_ * exp_r2_h2;
       } else {
         b_theta = 0.0;
diff --git a/src/pgen/cpaw.cpp b/src/pgen/cpaw.cpp
index c269c378..0dcefacb 100644
--- a/src/pgen/cpaw.cpp
+++ b/src/pgen/cpaw.cpp
@@ -34,6 +34,8 @@
 
 namespace cpaw {
 using namespace parthenon::driver::prelude;
+using namespace parthenon::package::prelude;
+
 // Parameters which define initial solution -- made global so that they can be shared
 // with functions A1,2,3 which compute vector potentials
 Real den, pres, gm1, b_par, b_perp, v_perp, v_par;
@@ -213,8 +215,8 @@ void UserWorkAfterLoop(Mesh *mesh, ParameterInput *pin, parthenon::SimTime &tm)
   }
 
   // write errors
-  std::fprintf(pfile, "%d  %d", mesh->mesh_size.nx1, mesh->mesh_size.nx2);
-  std::fprintf(pfile, "  %d  %d  %e", mesh->mesh_size.nx3, tm.ncycle, rms_err);
+  std::fprintf(pfile, "%d  %d", mesh->mesh_size.nx(X1DIR), mesh->mesh_size.nx(X2DIR));
+  std::fprintf(pfile, "  %d  %d  %e", mesh->mesh_size.nx(X3DIR), tm.ncycle, rms_err);
   std::fprintf(pfile, "  %e  %e  %e  %e", err[IDN], err[IM1], err[IM2], err[IM3]);
   std::fprintf(pfile, "  %e", err[IEN]);
   std::fprintf(pfile, "  %e  %e  %e", err[IB1], err[IB2], err[IB3]);
diff --git a/src/pgen/diffusion.cpp b/src/pgen/diffusion.cpp
index 85ac913a..24ae60f2 100644
--- a/src/pgen/diffusion.cpp
+++ b/src/pgen/diffusion.cpp
@@ -1,6 +1,6 @@
 
 // AthenaPK - a performance portable block structured AMR MHD code
-// Copyright (c) 2021, Athena Parthenon Collaboration. All rights reserved.
+// Copyright (c) 2021-2023, Athena Parthenon Collaboration. All rights reserved.
 // Licensed under the 3-Clause License (the "LICENSE")
 
 // Parthenon headers
@@ -27,7 +27,15 @@ void ProblemGenerator(MeshBlock *pmb, ParameterInput *pin) {
   const auto By = pin->GetOrAddReal("problem/diffusion", "By", 0.0);
 
   const auto iprob = pin->GetInteger("problem/diffusion", "iprob");
-  const auto sigma = pin->GetOrAddReal("problem/diffusion", "sigma", 0.1);
+  Real t0 = 0.5;
+  Real diff_coeff = 0.0;
+  Real amp = 1e-6;
+  // Get parameters for Gaussian profile
+  if (iprob == 10) {
+    diff_coeff = pin->GetReal("diffusion", "thermal_diff_coeff_code");
+    t0 = pin->GetOrAddReal("problem/diffusion", "t0", t0);
+    amp = pin->GetOrAddReal("problem/diffusion", "amp", amp);
+  }
 
   auto &coords = pmb->coords;
 
@@ -64,7 +72,15 @@ void ProblemGenerator(MeshBlock *pmb, ParameterInput *pin) {
         } else if (iprob == 10) {
           u(IB1, k, j, i) = Bx;
           u(IB2, k, j, i) = By;
-          eint = 1 + std::exp(-SQR(coords.Xc<1>(i) / sigma) / 2.0);
+          // Adjust for anisotropic thermal conduction.
+          // If there's no conduction for the setup (because the field is perp.)
+          // treat as 1 (also in analysis) to prevent division by 0.
+          // Note, this is very constructed and needs to be updated/adjusted for isotropic
+          // conduction, other directions, and Bfield configs with |B| != 1
+          Real eff_diff_coeff = Bx == 0.0 ? diff_coeff : diff_coeff * Bx * Bx;
+          eint = 1 + amp / std::sqrt(4. * M_PI * eff_diff_coeff * t0) *
+                         std::exp(-(std::pow(coords.Xc<1>(i), 2.)) /
+                                  (4. * eff_diff_coeff * t0));
           // Ring diffusion in x1-x2 plane
         } else if (iprob == 20) {
           const auto x = coords.Xc<1>(i);
diff --git a/src/pgen/field_loop.cpp b/src/pgen/field_loop.cpp
index 66940d56..45eeb60e 100644
--- a/src/pgen/field_loop.cpp
+++ b/src/pgen/field_loop.cpp
@@ -135,13 +135,16 @@ void ProblemGenerator(MeshBlock *pmb, ParameterInput *pin) {
   int iprob = pin->GetInteger("problem/field_loop", "iprob");
   Real ang_2, cos_a2(0.0), sin_a2(0.0), lambda(0.0);
 
-  Real x1size = pmb->pmy_mesh->mesh_size.x1max - pmb->pmy_mesh->mesh_size.x1min;
-  Real x2size = pmb->pmy_mesh->mesh_size.x2max - pmb->pmy_mesh->mesh_size.x2min;
+  Real x1size =
+      pmb->pmy_mesh->mesh_size.xmax(X1DIR) - pmb->pmy_mesh->mesh_size.xmin(X1DIR);
+  Real x2size =
+      pmb->pmy_mesh->mesh_size.xmax(X2DIR) - pmb->pmy_mesh->mesh_size.xmin(X2DIR);
 
   const bool two_d = pmb->pmy_mesh->ndim < 3;
   // for 2D sim set x3size to zero so that v_z is 0 below
   Real x3size =
-      two_d ? 0 : pmb->pmy_mesh->mesh_size.x3max - pmb->pmy_mesh->mesh_size.x3min;
+      two_d ? 0
+            : pmb->pmy_mesh->mesh_size.xmax(X3DIR) - pmb->pmy_mesh->mesh_size.xmin(X3DIR);
 
   // For (iprob=4) -- rotated cylinder in 3D -- set up rotation angle and wavelength
   if (iprob == 4) {
diff --git a/src/pgen/linear_wave.cpp b/src/pgen/linear_wave.cpp
index bc504b06..5480875e 100644
--- a/src/pgen/linear_wave.cpp
+++ b/src/pgen/linear_wave.cpp
@@ -32,6 +32,7 @@
 
 namespace linear_wave {
 using namespace parthenon::driver::prelude;
+using namespace parthenon::package::prelude;
 
 // TODO(pgrete) temp fix to address removal in Parthenon. Update when merging with MHD
 constexpr int NWAVE = 5;
@@ -280,9 +281,9 @@ void UserWorkAfterLoop(Mesh *mesh, ParameterInput *pin, parthenon::SimTime &tm)
   if (parthenon::Globals::my_rank == 0) {
     // normalize errors by number of cells
     const auto mesh_size = mesh->mesh_size;
-    const auto vol = (mesh_size.x1max - mesh_size.x1min) *
-                     (mesh_size.x2max - mesh_size.x2min) *
-                     (mesh_size.x3max - mesh_size.x3min);
+    const auto vol = (mesh_size.xmax(X1DIR) - mesh_size.xmin(X1DIR)) *
+                     (mesh_size.xmax(X2DIR) - mesh_size.xmin(X2DIR)) *
+                     (mesh_size.xmax(X3DIR) - mesh_size.xmin(X3DIR));
     for (int i = 0; i < (NHYDRO + NFIELD); ++i)
       l1_err[i] = l1_err[i] / vol;
     // compute rms error
@@ -320,8 +321,8 @@ void UserWorkAfterLoop(Mesh *mesh, ParameterInput *pin, parthenon::SimTime &tm)
     }
 
     // write errors
-    std::fprintf(pfile, "%d  %d", mesh_size.nx1, mesh_size.nx2);
-    std::fprintf(pfile, "  %d  %d", mesh_size.nx3, tm.ncycle);
+    std::fprintf(pfile, "%d  %d", mesh_size.nx(X1DIR), mesh_size.nx(X2DIR));
+    std::fprintf(pfile, "  %d  %d", mesh_size.nx(X2DIR), tm.ncycle);
     std::fprintf(pfile, "  %e  %e", rms_err, l1_err[IDN]);
     std::fprintf(pfile, "  %e  %e  %e", l1_err[IM1], l1_err[IM2], l1_err[IM3]);
     std::fprintf(pfile, "  %e", l1_err[IEN]);
diff --git a/src/pgen/linear_wave_mhd.cpp b/src/pgen/linear_wave_mhd.cpp
index 7f1b549b..67affbe7 100644
--- a/src/pgen/linear_wave_mhd.cpp
+++ b/src/pgen/linear_wave_mhd.cpp
@@ -32,6 +32,7 @@
 
 namespace linear_wave_mhd {
 using namespace parthenon::driver::prelude;
+using namespace parthenon::package::prelude;
 
 constexpr int NMHDWAVE = 7;
 // Parameters which define initial solution -- made global so that they can be shared
@@ -300,9 +301,9 @@ void UserWorkAfterLoop(Mesh *mesh, ParameterInput *pin, parthenon::SimTime &tm)
   if (parthenon::Globals::my_rank == 0) {
     // normalize errors by number of cells
     const auto mesh_size = mesh->mesh_size;
-    const auto vol = (mesh_size.x1max - mesh_size.x1min) *
-                     (mesh_size.x2max - mesh_size.x2min) *
-                     (mesh_size.x3max - mesh_size.x3min);
+    const auto vol = (mesh_size.xmax(X1DIR) - mesh_size.xmin(X1DIR)) *
+                     (mesh_size.xmax(X2DIR) - mesh_size.xmin(X2DIR)) *
+                     (mesh_size.xmax(X3DIR) - mesh_size.xmin(X3DIR));
     for (int i = 0; i < (NGLMMHD); ++i)
       l1_err[i] = l1_err[i] / vol;
     // compute rms error
@@ -342,8 +343,8 @@ void UserWorkAfterLoop(Mesh *mesh, ParameterInput *pin, parthenon::SimTime &tm)
     }
 
     // write errors
-    std::fprintf(pfile, "%d  %d", mesh_size.nx1, mesh_size.nx2);
-    std::fprintf(pfile, "  %d  %d", mesh_size.nx3, tm.ncycle);
+    std::fprintf(pfile, "%d  %d", mesh_size.nx(X1DIR), mesh_size.nx(X2DIR));
+    std::fprintf(pfile, "  %d  %d", mesh_size.nx(X3DIR), tm.ncycle);
     std::fprintf(pfile, "  %e  %e", rms_err, l1_err[IDN]);
     std::fprintf(pfile, "  %e  %e  %e", l1_err[IM1], l1_err[IM2], l1_err[IM3]);
     std::fprintf(pfile, "  %e", l1_err[IEN]);
diff --git a/src/pgen/turbulence.cpp b/src/pgen/turbulence.cpp
index 58882f52..2c9ff9a7 100644
--- a/src/pgen/turbulence.cpp
+++ b/src/pgen/turbulence.cpp
@@ -217,10 +217,10 @@ void ProblemGenerator(Mesh *pmesh, ParameterInput *pin, MeshData<Real> *md) {
   const auto gm1 = pin->GetReal("hydro", "gamma") - 1.0;
   const auto p0 = pin->GetReal("problem/turbulence", "p0");
   const auto rho0 = pin->GetReal("problem/turbulence", "rho0");
-  const auto x3min = pmesh->mesh_size.x3min;
-  const auto Lx = pmesh->mesh_size.x1max - pmesh->mesh_size.x1min;
-  const auto Ly = pmesh->mesh_size.x2max - pmesh->mesh_size.x2min;
-  const auto Lz = pmesh->mesh_size.x3max - pmesh->mesh_size.x3min;
+  const auto x3min = pmesh->mesh_size.xmin(X3DIR);
+  const auto Lx = pmesh->mesh_size.xmax(X1DIR) - pmesh->mesh_size.xmin(X1DIR);
+  const auto Ly = pmesh->mesh_size.xmax(X2DIR) - pmesh->mesh_size.xmin(X2DIR);
+  const auto Lz = pmesh->mesh_size.xmax(X3DIR) - pmesh->mesh_size.xmin(X3DIR);
   const auto kz = 2.0 * M_PI / Lz;
 
   // already pack data here to get easy access to coords in kernels
@@ -402,9 +402,12 @@ void Perturb(MeshData<Real> *md, const Real dt) {
                                     MPI_SUM, MPI_COMM_WORLD));
 #endif // MPI_PARALLEL
 
-  const auto Lx = pmb->pmy_mesh->mesh_size.x1max - pmb->pmy_mesh->mesh_size.x1min;
-  const auto Ly = pmb->pmy_mesh->mesh_size.x2max - pmb->pmy_mesh->mesh_size.x2min;
-  const auto Lz = pmb->pmy_mesh->mesh_size.x3max - pmb->pmy_mesh->mesh_size.x3min;
+  const auto Lx =
+      pmb->pmy_mesh->mesh_size.xmax(X1DIR) - pmb->pmy_mesh->mesh_size.xmin(X1DIR);
+  const auto Ly =
+      pmb->pmy_mesh->mesh_size.xmax(X2DIR) - pmb->pmy_mesh->mesh_size.xmin(X2DIR);
+  const auto Lz =
+      pmb->pmy_mesh->mesh_size.xmax(X3DIR) - pmb->pmy_mesh->mesh_size.xmin(X3DIR);
   const auto accel_rms = hydro_pkg->Param<Real>("turbulence/accel_rms");
   auto norm = accel_rms / std::sqrt(sums[0] / (Lx * Ly * Lz));
 
diff --git a/src/units.hpp b/src/units.hpp
index a089e632..3290e87e 100644
--- a/src/units.hpp
+++ b/src/units.hpp
@@ -30,6 +30,7 @@ class Units {
   static constexpr parthenon::Real dyne_cm2_cgs = 1.0;                     // dyne/cm^2
   static constexpr parthenon::Real msun_cgs = 1.98841586e+33;              // g
   static constexpr parthenon::Real atomic_mass_unit_cgs = 1.660538921e-24; // g
+  static constexpr parthenon::Real electron_mass_cgs = 9.1093837015e-28;   // g
   static constexpr parthenon::Real g_cm3_cgs = 1.0;                        // gcm**3
   static constexpr parthenon::Real erg_cgs = 1;                            // erg
   static constexpr parthenon::Real gauss_cgs = 1;                          // gauss
@@ -124,6 +125,7 @@ class Units {
   parthenon::Real atomic_mass_unit() const {
     return atomic_mass_unit_cgs / code_mass_cgs();
   }
+  parthenon::Real electron_mass() const { return electron_mass_cgs / code_mass_cgs(); }
   parthenon::Real mh() const { return mh_cgs / code_mass_cgs(); }
   parthenon::Real erg() const { return erg_cgs / code_energy_cgs(); }
   parthenon::Real gauss() const { return gauss_cgs / code_magnetic_cgs(); }
diff --git a/src/utils/few_modes_ft.cpp b/src/utils/few_modes_ft.cpp
index f477b211..e42edde2 100644
--- a/src/utils/few_modes_ft.cpp
+++ b/src/utils/few_modes_ft.cpp
@@ -106,18 +106,18 @@ void FewModesFT::SetPhases(MeshBlock *pmb, ParameterInput *pin) {
                            "Few modes FT currently needs parthenon/mesh/pack_size=-1 "
                            "to work because of global reductions.")
 
-  auto Lx1 = pm->mesh_size.x1max - pm->mesh_size.x1min;
-  auto Lx2 = pm->mesh_size.x2max - pm->mesh_size.x2min;
-  auto Lx3 = pm->mesh_size.x3max - pm->mesh_size.x3min;
+  const auto Lx1 = pm->mesh_size.xmax(X1DIR) - pm->mesh_size.xmin(X1DIR);
+  const auto Lx2 = pm->mesh_size.xmax(X2DIR) - pm->mesh_size.xmin(X2DIR);
+  const auto Lx3 = pm->mesh_size.xmax(X3DIR) - pm->mesh_size.xmin(X3DIR);
 
   // Adjust (logical) grid size at levels other than the root level.
   // This is required for simulation with mesh refinement so that the phases calculated
   // below take the logical grid size into account. For example, the local phases at level
   // 1 should be calculated assuming a grid that is twice as large as the root grid.
   const auto root_level = pm->GetRootLevel();
-  auto gnx1 = pm->mesh_size.nx1 * std::pow(2, pmb->loc.level() - root_level);
-  auto gnx2 = pm->mesh_size.nx2 * std::pow(2, pmb->loc.level() - root_level);
-  auto gnx3 = pm->mesh_size.nx3 * std::pow(2, pmb->loc.level() - root_level);
+  auto gnx1 = pm->mesh_size.nx(X1DIR) * std::pow(2, pmb->loc.level() - root_level);
+  auto gnx2 = pm->mesh_size.nx(X2DIR) * std::pow(2, pmb->loc.level() - root_level);
+  auto gnx3 = pm->mesh_size.nx(X3DIR) * std::pow(2, pmb->loc.level() - root_level);
 
   // Restriction should also be easily fixed, just need to double check transforms and
   // volume weighting everywhere
@@ -126,13 +126,13 @@ void FewModesFT::SetPhases(MeshBlock *pmb, ParameterInput *pin) {
                            "FMFT has only been tested with cubic meshes and constant "
                            "dx/dy/dz. Remove this warning at your own risk.")
 
-  const auto nx1 = pmb->block_size.nx1;
-  const auto nx2 = pmb->block_size.nx2;
-  const auto nx3 = pmb->block_size.nx3;
+  const auto nx1 = pmb->block_size.nx(X1DIR);
+  const auto nx2 = pmb->block_size.nx(X2DIR);
+  const auto nx3 = pmb->block_size.nx(X3DIR);
 
-  const auto gis = pmb->loc.lx1() * pmb->block_size.nx1;
-  const auto gjs = pmb->loc.lx2() * pmb->block_size.nx2;
-  const auto gks = pmb->loc.lx3() * pmb->block_size.nx3;
+  const auto gis = pmb->loc.lx1() * pmb->block_size.nx(X1DIR);
+  const auto gjs = pmb->loc.lx2() * pmb->block_size.nx(X2DIR);
+  const auto gks = pmb->loc.lx3() * pmb->block_size.nx(X3DIR);
 
   // make local ref to capure in lambda
   const auto num_modes = num_modes_;
diff --git a/tst/regression/CMakeLists.txt b/tst/regression/CMakeLists.txt
index 3461a817..c43b4de5 100644
--- a/tst/regression/CMakeLists.txt
+++ b/tst/regression/CMakeLists.txt
@@ -45,6 +45,9 @@ setup_test_both("aniso_therm_cond_ring_conv" "--driver ${PROJECT_BINARY_DIR}/bin
   
 setup_test_both("aniso_therm_cond_ring_multid" "--driver ${PROJECT_BINARY_DIR}/bin/athenaPK \
   --driver_input ${PROJECT_SOURCE_DIR}/inputs/diffusion.in --num_steps 4" "convergence")
+ 
+setup_test_both("aniso_therm_cond_gauss_conv" "--driver ${PROJECT_BINARY_DIR}/bin/athenaPK \
+  --driver_input ${PROJECT_SOURCE_DIR}/inputs/diffusion.in --num_steps 24" "convergence")
 
 setup_test_both("field_loop" "--driver ${PROJECT_BINARY_DIR}/bin/athenaPK \
   --driver_input ${PROJECT_SOURCE_DIR}/inputs/field_loop.in --num_steps 12" "convergence")
diff --git a/tst/regression/test_suites/aniso_therm_cond_gauss_conv/__init__.py b/tst/regression/test_suites/aniso_therm_cond_gauss_conv/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tst/regression/test_suites/aniso_therm_cond_gauss_conv/aniso_therm_cond_gauss_conv.py b/tst/regression/test_suites/aniso_therm_cond_gauss_conv/aniso_therm_cond_gauss_conv.py
new file mode 100644
index 00000000..9f002723
--- /dev/null
+++ b/tst/regression/test_suites/aniso_therm_cond_gauss_conv/aniso_therm_cond_gauss_conv.py
@@ -0,0 +1,221 @@
+# ========================================================================================
+# AthenaPK - a performance portable block structured AMR MHD code
+# Copyright (c) 2020-2021, Athena Parthenon Collaboration. All rights reserved.
+# Licensed under the 3-clause BSD License, see LICENSE file for details
+# ========================================================================================
+# (C) (or copyright) 2020. Triad National Security, LLC. All rights reserved.
+#
+# This program was produced under U.S. Government contract 89233218CNA000001 for Los
+# Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC
+# for the U.S. Department of Energy/National Nuclear Security Administration. All rights
+# in the program are reserved by Triad National Security, LLC, and the U.S. Department
+# of Energy/National Nuclear Security Administration. The Government is granted for
+# itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide
+# license in this material to reproduce, prepare derivative works, distribute copies to
+# the public, perform publicly and display publicly, and to permit others to do so.
+# ========================================================================================
+
+# Modules
+import math
+import numpy as np
+import matplotlib
+
+matplotlib.use("agg")
+import matplotlib.pylab as plt
+import sys
+import os
+import itertools
+import utils.test_case
+from scipy.optimize import curve_fit
+
+# To prevent littering up imported folders with .pyc files or __pycache_ folder
+sys.dont_write_bytecode = True
+
+int_cfgs = ["unsplit", "rkl2"]
+res_cfgs = [128, 256, 512]
+field_cfgs = ["none", "aligned", "angle", "perp"]
+tlim = 2.0
+
+all_cfgs = list(itertools.product(res_cfgs, field_cfgs, int_cfgs))
+
+
+def get_outname(all_cfg):
+    res, field_cfg, int_cfg = all_cfg
+    return f"{res}_{field_cfg}_{int_cfg}"
+
+
+def get_B(field_cfg):
+    if field_cfg == "aligned":
+        Bx = 1.0
+        By = 0.0
+    elif field_cfg == "perp":
+        Bx = 0.0
+        By = 1.0
+    elif field_cfg == "angle":
+        Bx = 1 / np.sqrt(2)
+        By = 1 / np.sqrt(2)
+    # isotropic case
+    elif field_cfg == "none":
+        Bx = 0.0
+        By = 0.0
+    else:
+        raise "Unknown field_cfg: %s" % field_cfg
+
+    return Bx, By
+
+
+class TestCase(utils.test_case.TestCaseAbs):
+    def Prepare(self, parameters, step):
+        assert parameters.num_ranks <= 4, "Use <= 4 ranks for diffusion test."
+
+        res, field_cfg, int_cfg = all_cfgs[step - 1]
+
+        Bx, By = get_B(field_cfg)
+
+        outname = get_outname(all_cfgs[step - 1])
+
+        if field_cfg == "none":
+            conduction = "isotropic"
+        else:
+            conduction = "anisotropic"
+
+        parameters.driver_cmd_line_args = [
+            "parthenon/mesh/nx1=%d" % res,
+            "parthenon/meshblock/nx1=64",
+            "parthenon/mesh/x1min=-6.0",
+            "parthenon/mesh/x1max=6.0",
+            "parthenon/mesh/nx2=32",
+            "parthenon/meshblock/nx2=32",
+            "parthenon/mesh/x2min=-1.0",
+            "parthenon/mesh/x2max=1.0",
+            "parthenon/mesh/nx3=1",
+            "parthenon/meshblock/nx3=1",
+            "parthenon/time/integrator=%s"
+            % ("rk2" if (int_cfg == "unsplit") else "rk1"),
+            "problem/diffusion/Bx=%f" % Bx,
+            "problem/diffusion/By=%f" % By,
+            "problem/diffusion/iprob=10",
+            "parthenon/output0/id=%s" % outname,
+            "hydro/gamma=2.0",
+            "parthenon/time/tlim=%f" % tlim,
+            "diffusion/conduction=%s" % conduction,
+            "diffusion/thermal_diff_coeff_code=0.25",
+            "diffusion/integrator=%s" % int_cfg,
+        ]
+
+        return parameters
+
+    def Analyse(self, parameters):
+        sys.path.insert(
+            1,
+            parameters.parthenon_path
+            + "/scripts/python/packages/parthenon_tools/parthenon_tools",
+        )
+
+        try:
+            import phdf
+        except ModuleNotFoundError:
+            print("Couldn't find module to read Parthenon hdf5 files.")
+            return False
+
+        tests_passed = True
+
+        def get_ref(x, Bx, field_cfg):
+            eff_diff_coeff = 0.25 if Bx == 0.0 else 0.25 * Bx * Bx
+            tlim_ = 0.0 if field_cfg == "perp" else tlim
+            return 1.0 + 1e-6 / (
+                np.sqrt(4 * np.pi * eff_diff_coeff * (0.5 + tlim_))
+                / np.exp(-(x**2) / (4.0 * eff_diff_coeff * (0.5 + tlim_)))
+            )
+
+        num_rows = len(res_cfgs)
+        num_cols = len(int_cfgs)
+        fig, p = plt.subplots(num_rows + 1, 2, sharey="row", sharex="row")
+
+        l1_err = np.zeros((len(field_cfgs), len(int_cfgs), len(res_cfgs)))
+        for step in range(len(all_cfgs)):
+            outname = get_outname(all_cfgs[step])
+            data_filename = f"{parameters.output_path}/parthenon.{outname}.final.phdf"
+            data_file = phdf.phdf(data_filename)
+            prim = data_file.Get("prim")
+            zz, yy, xx = data_file.GetVolumeLocations()
+            mask = yy == yy[0]
+            temp = prim[4][mask]
+            x = xx[mask]
+            res, field_cfg, int_cfg = all_cfgs[step]
+            row = res_cfgs.index(res)
+            col = int_cfgs.index(int_cfg)
+
+            Bx, By = get_B(field_cfg)
+            temp_ref = get_ref(x, Bx, field_cfg)
+            l1 = np.average(np.abs(temp - temp_ref))
+            l1_err[
+                field_cfgs.index(field_cfg),
+                int_cfgs.index(int_cfg),
+                res_cfgs.index(res),
+            ] = l1
+            p[row, col].plot(x, temp, label=f"{field_cfg} N={res} L$_1$={l1:.2g}")
+
+        # Plot convergence
+        for i, field_cfg in enumerate(field_cfgs):
+            for j, int_cfg in enumerate(int_cfgs):
+                p[0, j].set_title(f"Integrator: {int_cfg}")
+                if field_cfg == "perp":
+                    continue
+
+                p[-1, j].plot(
+                    res_cfgs,
+                    l1_err[i, j, :],
+                    label=f"{field_cfg} data",
+                )
+
+                # Simple convergence estimator
+                conv_model = lambda log_n, log_a, conv_rate: conv_rate * log_n + log_a
+                popt, pconv = curve_fit(
+                    conv_model, np.log(res_cfgs), np.log(l1_err[i, j, :])
+                )
+                conv_a, conv_measured = popt
+                # Note that the RKL2 convergence on the plots is currently significantly better
+                # than expected (<-3) though the L1 errors themself are larger than the unsplit
+                # integrator (as expected).
+                # For a more reasonable test (which would take longer), reduce the RKL2 ratio to,
+                # say, 200 and extend the resolution grid to 1024 (as the first data point at N=128
+                # is comparatively worse than at N>128).
+                if conv_measured > -1.98:
+                    print(
+                        f"!!!\nConvergence for {field_cfg} test with {int_cfg} integrator "
+                        f"is worse ({conv_measured}) than expected (-1.98).\n!!!"
+                    )
+                    tests_passed = False
+                p[-1, j].plot(
+                    res_cfgs,
+                    np.exp(conv_a) * res_cfgs**conv_measured,
+                    ":",
+                    lw=0.75,
+                    label=f"{field_cfg} Measured conv: {conv_measured:.2f}",
+                )
+
+        p[-1, 0].set_xscale("log")
+        p[-1, 0].set_yscale("log")
+        p[-1, 0].legend(fontsize=6)
+        p[-1, 1].legend(fontsize=6)
+
+        # Plot reference lines
+        x = np.linspace(-6, 6, 400)
+        for field_cfg in field_cfgs:
+            Bx, By = get_B(field_cfg)
+            for i in range(num_rows):
+                for j in range(num_cols):
+                    y = get_ref(x, Bx, field_cfg)
+                    p[i, j].plot(x, y, "-", lw=0.5, color="black", alpha=0.8)
+                    p[i, j].grid()
+                    p[i, j].legend(fontsize=6)
+
+        fig.tight_layout()
+        fig.savefig(
+            os.path.join(parameters.output_path, "cond.png"),
+            bbox_inches="tight",
+            dpi=300,
+        )
+
+        return tests_passed
diff --git a/tst/regression/test_suites/aniso_therm_cond_ring_conv/README.md b/tst/regression/test_suites/aniso_therm_cond_ring_conv/README.md
index bfd8226d..8222148a 100644
--- a/tst/regression/test_suites/aniso_therm_cond_ring_conv/README.md
+++ b/tst/regression/test_suites/aniso_therm_cond_ring_conv/README.md
@@ -2,5 +2,5 @@
 
 Executes 2D ring diffusion problem following Sharma & Hammett (2007) and calculates convergence rate.
 Errors are calculated based on comparison to steady state solution.
-Convergence for this problem is not great, but matches other numbers reported in literate, e.g., Balsara, Tilley & Howk MNRAS (2008) doi:10.1111/j.1365-2966.2008.13085.x .
+Convergence for this problem is not great, but matches other numbers reported in literature, e.g., Balsara, Tilley & Howk MNRAS (2008) doi:10.1111/j.1365-2966.2008.13085.x .
 Also the minium temperature is checked to ensure that limiting is working (i.e., the temperature is nowhere below the initial background temperature).