Running MONC with DEPHY forcings on ARCHER2 #51

Open · leifdenby opened this issue May 20, 2021 · 10 comments

@leifdenby
Collaborator

leifdenby commented May 20, 2021

I'm creating this issue to track progress in getting the EUREC4A cases defined through DEPHY forcings running on ARCHER2 (@sjboeing is doing this work primarily, not me).

@leifdenby
Collaborator Author

@sjboeing wrote 11/5/2021:

Just a quick heads up to let you know that the current version of MONC with CASIM and changes for DEPHY seems to be running successfully on ARCHER2. Thanks all for the hard work: I will do some more testing and then bring this onto the git repository.

In terms of the setup, I am currently using 108 compute cores and 18 IO cores per node on ARCHER2. This is a ratio of 6:1, and allows us to make use of almost the entire node. Grids that have a multiple of 54 grid points in each dimension (factors of 3*3*3*2*2 for the FFTs) should be a good match. I was running a 108*108 domain with 200 m grid spacing yesterday, and will be looking to upgrade this to 540*540 @ 150 m grid spacing soon. Eventually, we will want something more like 2160*2160 @ 100 m (after initial validation, decisions on aerosols/CDNC, integration of SOCRATES, and resolution of a pending issue on surface pressure).

@leifdenby
Collaborator Author

@MarkUoLeeds wrote 11/5/2021:

I find it really odd that there is no multiple of "NUMA regions" in your calculations. On ARCHER2 there are 8 NUMA regions, each with 16 cores; 15 MONCs per IO would fit that well. I understand there is a requirement for the grids to match the MPI decomposition, and 120 MONC procs might not work. Perhaps 8 IOs per node is also too low for your resolution? Your factors seem to relate to 9s.
Did you try the recommended (by me) 15 MONCs per IO?
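
To make the arithmetic in the two comments above concrete, here is a small standalone sketch (illustrative only, not MONC or run-script code; the 128-core node size is ARCHER2's standard compute node) showing how many MONC and IO cores each MONC:IO ratio gives per node:

```fortran
! Illustrative sketch (not MONC code): for a 128-core ARCHER2 node, show how
! many compute (MONC) and IO cores a given MONC:IO ratio yields, and how many
! cores are left idle.
program core_split
  implicit none
  integer, parameter :: cores_per_node = 128
  integer, parameter :: ratios(3) = (/ 6, 15, 16 /)
  integer :: i, ratio, groups, moncs, ios

  do i = 1, size(ratios)
    ratio  = ratios(i)
    groups = cores_per_node / (ratio + 1)   ! each group is <ratio> MONCs plus 1 IO core
    moncs  = groups * ratio
    ios    = groups
    write(*, '(i3, a, i4, a, i3, a, i3, a)') ratio, ':1 ratio -> ', moncs, &
         ' MONC cores, ', ios, ' IO cores, ', cores_per_node - moncs - ios, ' idle'
  end do
end program core_split
```

With a 6:1 ratio this reproduces the 108 MONCs + 18 IO cores above (2 cores idle), while 15:1 gives 120 MONCs + 8 IO cores, i.e. one IO server per 16-core NUMA region and no idle cores.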

@leifdenby
Collaborator Author

@sjboeing wrote 11/5/2021:

I thought I would keep a small number of MONCs per IO at first, but possibly keeping things in the same NUMA region is more important, as you say (I find it hard to predict things with the IO server). I will give it a go; in that case we can use domain sizes that are a multiple of 120, which would be nice.

@leifdenby
Collaborator Author

leifdenby commented May 20, 2021

Having just spoken to @sjboeing about this, there are currently some issues with MPI-related crashes in the IO server ("Pthreads error in IO server, error code=-2"). (@sjboeing could you add a few more details below?)

@sjboeing here's what I wrote to Nick Brown back in 2017 on getting more useful errors back from MPI in Fortran:

As regards getting better error messages when MONC fails with an MPI-related error, I looked into what you suggested (I also found the lecture notes from a course I attended during my MPhil, http://people.ds.cam.ac.uk/nmm1/MPI/Notes/notes_06.pdf (archive: http://web.archive.org/web/20181008171031/http://people.ds.cam.ac.uk/nmm1/MPI/Notes/notes_06.pdf), which were really helpful). Because the default behaviour of MPI is to die on any error, and because the Cray Fortran compiler doesn't support producing tracebacks (both gfortran and ifort do…), it is particularly hard to work out exactly where MONC went wrong.

Have you considered changing the default error handling from MPI_ERRORS_ARE_FATAL to MPI_ERRORS_RETURN and then using the value of ierr in each MPI call in io/src/mpicommunication.F90? This would make it much clearer exactly which MPI call caused the issue, and each call to the subroutines in io/src/mpicommunication.F90 could provide a string indicating which underlying IO-server operation was using MPI at the time (in effect providing a poor man's stack trace). I already did the latter for the NetCDF error checker in model_core/src/utils/netcdf_misc.F90, which makes it much easier to work out what goes wrong with NetCDF files.

The change I suggested was never made, but I still think this would be very useful. What do you think? @cemac-ccs I am wondering what you think about adding this (i.e. changing the error handling to MPI_ERRORS_RETURN and checking the return code from every call; we could then include the name of the module and subroutine in the error message we display before MONC dies).
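
As a concrete illustration of that suggestion, here is a minimal standalone sketch (not MONC code; the program and the check_mpi helper are made up for the example) that switches the communicator to MPI_ERRORS_RETURN and wraps an MPI call with a check that reports the call site before aborting:

```fortran
! Minimal sketch: return MPI error codes instead of aborting, then report
! the failing call site and the MPI error text (a "poor man's stack trace").
program mpi_error_demo
  use mpi
  implicit none
  integer :: ierr, rank

  call MPI_Init(ierr)
  ! Return error codes to the caller rather than dying inside MPI
  call MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN, ierr)

  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call check_mpi(ierr, "mpi_error_demo::MPI_Comm_rank")

  call MPI_Finalize(ierr)

contains

  ! Print the calling context and the MPI error string, then abort cleanly
  subroutine check_mpi(code, context)
    integer, intent(in) :: code
    character(len=*), intent(in) :: context
    character(len=MPI_MAX_ERROR_STRING) :: msg
    integer :: msg_len, ierr_local

    if (code /= MPI_SUCCESS) then
      call MPI_Error_string(code, msg, msg_len, ierr_local)
      write(*, '(a)') 'MPI error in '//trim(context)//': '//msg(1:msg_len)
      call MPI_Abort(MPI_COMM_WORLD, code, ierr_local)
    end if
  end subroutine check_mpi

end program mpi_error_demo
```

The Cray compiler still won't produce a traceback, but a message like this at least pins down which MPI call failed and in which routine.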

@sjboeing
Contributor

sjboeing commented May 20, 2021

Here is a snapshot from a 108*108 grid point run at 200 m, running on a single ARCHER2 node:
[image: cloud top height field]

@sjboeing
Contributor

sjboeing commented May 21, 2021

Update:
With 16 MONCs per IO on 15 nodes, the simulation crashed at the very start with a pthreads error ("Pthreads error in IO server, error code=-2").
With 6 MONCs per IO, it got about halfway through the simulation before crashing with an MPI error; see below.


*** Error in `/lus/cls01095/work/n02/n02/sboeing/monc/./build/bin/monc_driver.exe': malloc(): memory corruption: 0x0000000001be1b30 ***

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x2b9bdb12659f in ???
#1  0x2b9bdb126520 in ???
#2  0x2b9bdb127b00 in ???
#3  0x2b9bdb169956 in ???
#4  0x2b9bdb170172 in ???
#5  0x2b9bdb173548 in ???
#6  0x2b9bdb174fd6 in ???
#7  0x4d71c9 in ???
#8  0x4cc24b in ???
#9  0x488b14 in ???
#10  0x406480 in ???
#11  0x2b9bdb111349 in ???
#12  0x4064b9 in ???
        at ../sysdeps/x86_64/start.S:120
#13  0xffffffffffffffff in ???
mlx5: nid001878: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000010 00000000 00000000 00000000
00000000 00008a12 0a00abe2 a48f5cd3
...
MPICH ERROR [Rank 126] [job id 277588.0] [Fri May 21 05:04:11 2021] [unknown] [nid001513] - Abort(136471695) (rank 126 in comm 0): Fatal error in PMPI_Test: Other MPI error, error stack:
PMPI_Test(205)................: MPI_Test(request=0x888e14, flag=0x7fffb184b42c, status=0x7fffb184b460) failed
MPIR_Test(85).................: 
MPIR_Test_impl(39)............: 
MPIDI_Progress_test(72).......: 
MPIDI_OFI_handle_cq_error(902): OFI poll failed (ofi_events.h:904:MPIDI_OFI_handle_cq_error:Input/output error)


Program received signal SIGABRT: Process abort signal.

@leifdenby
Collaborator Author

leifdenby commented May 23, 2021

That's frustrating @sjboeing. The clouds are looking good though!

Did you run this with OpenMPI or MVAPICH? I was just wondering because I think @cemac-ccs said MVAPICH works better on ARC4.

edit: actually, I think that is a different issue that Chris was addressing there: #44

@cemac-ccs
Collaborator

cemac-ccs commented May 25, 2021 via email

@sjboeing
Contributor

This is using Chris' scripts with minor modifications on ARCHER2 (so cray-mpich). One parameter which may need changing is the thread_pool number in the IO configuration, which is currently set to 500.

@eers1
Collaborator

eers1 commented Oct 7, 2021

I've also had this MPICH error a few times now; did you happen to make any progress with it?
