Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add comprehensive checks for problem cells #631

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

wilfonba
Copy link
Collaborator

@wilfonba wilfonba commented Sep 19, 2024

Description

comp_debug enables comprehensive error checking. At each Runge-Kutta sub-step, all conservative variables are checked for NaNs. The volume fractions are checked to ensure they are in the range [0, 1]. Negative densities are also checked for. If any of these checks find problems, the file comp_debug.txt will be written to the case directory with information about what problems were found and the simulation state will be saved for visualization.

Type of change

  • New feature (non-breaking change which adds functionality)

Scope

  • This PR comprises a set of related changes with a common goal

How Has This Been Tested?

I tested this by inserting problem cells after a certain number of time steps and seeing if the problems were identified and a save file dumped. I then ensured that post_process found this additional dump file and puts it in the silo database. I performed these tests on both CPUs and MI250x GPUs.

@sbryngelson
Copy link
Member

This seems obviously useful and helpful, the concern is with costs. What does the change in cost look like for a typical 3D problem on (a) 1 CPU and (b) 1 GPU?

Do you perform any allreduce calls? Like for getting maxs/mins over the entire domain? If so, that would be prohibitively expensive for a large simulation.

Copy link

codecov bot commented Sep 19, 2024

Codecov Report

Attention: Patch coverage is 1.38889% with 71 lines in your changes missing coverage. Please review.

Project coverage is 54.10%. Comparing base (75f5e3b) to head (7c48428).
Report is 55 commits behind head on master.

Files with missing lines Patch % Lines
src/simulation/m_sim_helpers.f90 0.00% 40 Missing ⚠️
src/post_process/m_start_up.f90 0.00% 19 Missing ⚠️
src/simulation/m_time_steppers.fpp 0.00% 12 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #631      +/-   ##
==========================================
- Coverage   54.38%   54.10%   -0.28%     
==========================================
  Files          61       61              
  Lines       13751    13821      +70     
  Branches     1720     1731      +11     
==========================================
  Hits         7478     7478              
- Misses       5817     5887      +70     
  Partials      456      456              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wilfonba
Copy link
Collaborator Author

I moved the comprehensive debug routine to m_sim_helpers but as a result I had to move the runtime info file subroutines to m_sim_helpers as well because m_sim_helpers now needs s_write_data to perform the data dump of the problematic state.

@@ -411,6 +415,8 @@ contains

if (model_eqns == 3) call s_pressure_relaxation_procedure(q_cons_ts(1)%vf)

if (comp_debug) call s_comprehensive_debug(q_cons_ts(1)%vf, t_step, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're going to add this call after every time step, why not put it outside the specific time stepper loop and instead in p_main? (or whatever calls the specific time stepper)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This called after each stage of a time-step, so it's called three times for one RK3 timestep

Copy link
Member

@sbryngelson sbryngelson Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then shouldn't it be in s_compute_rhs? (or, shouldn't there be a compute_timestep_stage subroutine that does compute_rhs + s_pressure_relaxation + whatever_else)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring the time stepping routines completely would seem to violate "This PR comprises a set of related changes with a common goal" in my opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps. Then we can do that one first.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced this change is possible since the update step for each substep is different.
Step 1

q_cons_ts(2)%vf(i)%sf(j, k, l) = &
                            q_cons_ts(1)%vf(i)%sf(j, k, l) &
                            + dt*rhs_vf(i)%sf(j, k, l)

Step 2

q_cons_ts(2)%vf(i)%sf(j, k, l) = &
                            (3d0*q_cons_ts(1)%vf(i)%sf(j, k, l) &
                             + q_cons_ts(2)%vf(i)%sf(j, k, l) &
                             + dt*rhs_vf(i)%sf(j, k, l))/4d0

Step 3

q_cons_ts(1)%vf(i)%sf(j, k, l) = &
                            (q_cons_ts(1)%vf(i)%sf(j, k, l) &
                             + 2d0*q_cons_ts(2)%vf(i)%sf(j, k, l) &
                             + 2d0*dt*rhs_vf(i)%sf(j, k, l))/3d0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Well, at the moment, each time stepper stage is followed by 80 lines of (at least) copy/pasting. The beginning and end of each time step (not stage) has 10 or so lines of copy/pasting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        call s_compute_rhs(q_cons_ts(1)%vf, q_prim_vf, rhs_vf, pb_ts(1)%sf, rhs_pb, mv_ts(1)%sf, rhs_mv, t_step, time_avg)

        if (run_time_info) then
            call s_write_run_time_information(q_prim_vf, t_step)
        end if

        if (probe_wrt) then
            call s_time_step_cycling(t_step)
        end if

        if (cfl_dt) then
            if (mytime >= t_stop) return
        else
            if (t_step == t_step_stop) return
        end if

        !$acc parallel loop collapse(4) gang vector default(present)
        do i = 1, sys_size
            do l = 0, p
                do k = 0, n
                    do j = 0, m
                        q_cons_ts(2)%vf(i)%sf(j, k, l) = &
                            q_cons_ts(1)%vf(i)%sf(j, k, l) &
                            + dt*rhs_vf(i)%sf(j, k, l)
                    end do
                end do
            end do
        end do

        !Evolve pb and mv for non-polytropic qbmm
        if (qbmm .and. (.not. polytropic)) then
            !$acc parallel loop collapse(5) gang vector default(present)
            do i = 1, nb
                do l = 0, p
                    do k = 0, n
                        do j = 0, m
                            do q = 1, nnode
                                pb_ts(2)%sf(j, k, l, q, i) = &
                                    pb_ts(1)%sf(j, k, l, q, i) &
                                    + dt*rhs_pb(j, k, l, q, i)
                            end do
                        end do
                    end do
                end do
            end do
        end if

        if (qbmm .and. (.not. polytropic)) then
            !$acc parallel loop collapse(5) gang vector default(present)
            do i = 1, nb
                do l = 0, p
                    do k = 0, n
                        do j = 0, m
                            do q = 1, nnode
                                mv_ts(2)%sf(j, k, l, q, i) = &
                                    mv_ts(1)%sf(j, k, l, q, i) &
                                    + dt*rhs_mv(j, k, l, q, i)
                            end do
                        end do
                    end do
                end do
            end do
        end if

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll condense some of this then

@sbryngelson
Copy link
Member

Useful feature but needs more attention when time is available.

@sbryngelson sbryngelson added the enhancement New feature or request label Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

2 participants