Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IEEE invalid operation when checking for central body collision #20

Open
tobiscode opened this issue Feb 8, 2020 · 8 comments
Open

Comments

@tobiscode
Copy link

tobiscode commented Feb 8, 2020

Hi all,

I sometimes stumbled across the following warning message when running Mercury:

Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG

With not much more to go on, I added the gfortran compiler flag -ffpe-trap=invalid (and removed -O2 as otherwise the debugger didn't shoe the correct line) to find the problem: calling acos() for an undefined value. Here is the part (line 963 in mercury6_2.for):

u0 = sign (acos((1.d0 - r0/a )/e), rv0)

I then added some debug output to check the values of r0, a, e and rv0 right before the error:

parameter value
r0 8699.8554879043295
a 4349.9294907593367
e 0.99999919683544536
inside acos -1.0000000000218023
rv0 -8.0957151629424393E-006

I run large suites of simulations, keeping big.in the same but having randomized particles in small.in, so every run is slightly different, and it doesn't happen in most runs. I attached the initial conditions for a sample run where the bug does occur.

  • System: Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-76-generic x86_64)
  • gfortran: GNU Fortran (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 using the provided Makefile
  • Mercury: most up-to-date version of this repository, except for three parameters I changed in mercury.inc before compiling (CMAX = 1000 and NFILES = 200)

A possible solution is referenced in #18, but it's probably not the best way as it masks possible other bugs as was pointed out.

Cheers

@texadactyl
Copy link
Contributor

I reproduced this using the supplied initialization files. My system:

Description:	Ubuntu Focal Fossa (development branch)
Release:	20.04
Codename:	focal
----- OS kernel name/release/version -----
5.4.0-12-generic #15-Ubuntu SMP Tue Jan 21 15:12:29 UTC 2020
----- Machine hardware platform -----
x86_64
----- gfortran
9.2.1-28ubuntu1

mercury 6 stdout/stderr:

   Integrating massive bodies and particles up to the same epoch.
   Beginning the main integration.
 Time:       1368.925 years   dE/E:  4.69510E-08   dL/L:  3.48461E-14
 Time:       2737.851 years   dE/E: -7.21165E-07   dL/L:  4.09303E-14
 Time:       4106.776 years   dE/E: -6.46237E-07   dL/L:  5.47581E-14
 Time:       5475.702 years   dE/E: -1.01207E-06   dL/L:  5.27300E-14
 Time:       6844.627 years   dE/E: -6.24822E-07   dL/L:  2.06495E-14
 Time:       8213.552 years   dE/E: -7.97592E-07   dL/L:  8.11232E-15
 Time:       9582.478 years   dE/E: -5.56704E-07   dL/L:  1.56346E-13
   Integration complete.
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG

@texadactyl
Copy link
Contributor

Added flag -ffpe-trap=invalid:

   Integrating massive bodies and particles up to the same epoch.
   Beginning the main integration.
 Time:       1368.925 years   dE/E:  4.69510E-08   dL/L:  3.48461E-14
 Time:       2737.851 years   dE/E: -7.21165E-07   dL/L:  4.09303E-14
 Time:       4106.776 years   dE/E: -6.46237E-07   dL/L:  5.47581E-14
 Time:       5475.702 years   dE/E: -1.01207E-06   dL/L:  5.27300E-14
 Time:       6844.627 years   dE/E: -6.24822E-07   dL/L:  2.06495E-14
 Time:       8213.552 years   dE/E: -7.97592E-07   dL/L:  8.11232E-15

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f6e489aaca1 in ???
#1  0x7f6e489a9e75 in ???
#2  0x7f6e4867646f in ???
#3  0x7f6e4884db90 in ???
#4  0x5617e1d4b79e in mce_cent_
	at ./mercury6_2.for:963
#5  0x5617e1d44608 in mal_hcon_
	at ./mercury6_2.for:697
#6  0x5617e1d5f3a8 in MAIN__
	at ./mercury6_2.for:189
#7  0x5617e1d5fb3e in main
	at ./mercury6_2.for:221

@texadactyl
Copy link
Contributor

texadactyl commented Feb 9, 2020

Ran again with the custom parameter files supplied by @tobiscode.
Note subroutine mce_cent

Using 2-body approximation.
Collision with central body (q <= rcen).
Time of impact relative to the end of the timestep (e < 0)

Note that algor = 10 (HYBRID) is specified in the custom param.in.
If you change just the algor value back to BS like in the sample,
no collision with central body occurs and, therefore, no program crash occurs.

Something is wrong with the collision with central body logic.
It certainly needs protective code to avoid floating pt exceptions.
This will probably need an astrophysicist to figure out.

One other thing although not related to this issue. The calculations specific to algor=11 look suspicious. Uninitialized arrays are used in calculations. Assumed to be zero? Bad practice.

My customized source files used for tracing are attached - ZIP of 2 files: mercury6_2.for and mercury.inc. I added a tracing flag which controls whether or not to trace. Once this bug is fixed, the customization isn't really needed.

xx.tar.gz

@texadactyl
Copy link
Contributor

texadactyl commented Feb 10, 2020

Ran again with the custom parameter files supplied by @tobiscode.
Changed algor=BS2 in param.in and got another central body collision:
&MCE_CENT_E_LT_1 (output from line 979)
P= 1.6726537648361467E-003,
Q= 8.3632696282470525E-004,
RCEN= 4.6547587699999997E-003,
A= 4349.4098229833053 ,
E= 0.99999980771484021 ,
H= 58.974682251497846 ,
R0= 8698.8187078549345 ,
RV0= -2.4711172542430239E-004,
/

but mercury6 continued to the end without issues.

Note that when algor=HYBRID, this display appears before the program crash:
&MCE_CENT_E_LT_1
P= 6.9874155583106654E-003,
Q= 3.4937091821670221E-003,
RCEN= 4.6547587699999997E-003,
A= 4349.9294907593367 ,
E= 0.99999919683544536 ,
H= 100.00000000000000 ,
R0= 8699.8554879043295 ,
RV0= -8.0957151629424393E-006,
/

In summary,

  • Both BS2 and HYBRID algorithms produced a central body collision.
  • All of the algorithms ran without crashing for me except HYBRID.

It could be that @tobiscode tried a combination of parameters with algor=HYBRID that makes no sense (I am not qualified to judge). If that is true, then, optimally, the initialization code should be amended to diagnose such combinations. Needs an astrophysicist, I believe.

Clearly this early 1990s Fortran code was written to be used by scientists who would first check their parameters should a crash occur.

@idovgalyov-4xxi
Copy link
Collaborator

@tobiscode, @texadactyl, Hi guys!
Sorry for a great response delay, it turned out that I got notifications turned off all the time for this repo.
Thanks for your bug report. I have briefly looked over the discussion to understand the problem and it seems that I faced this problem several times before. I will try to come up with some ideas how to fix it when I get free of my job stuff.

This will probably need an astrophysicist to figure out.

Celestial mechanics specialist, to be precise ;-)

@texadactyl
Copy link
Contributor

@idovgalyov-4xxi See? I didn't event know who the right fixer would be! (=:

@texadactyl
Copy link
Contributor

@tobiscode
Is this still an issue?
If so, can figure out where the conflict is?
If not, please close this issue.

@tobiscode
Copy link
Author

Hi,
I've moved on to different projects and am not working with mercury anymore. I ended up being fine with the "fix" I made in the PR, and then assumed that @idovgalyov-4xxi would eventually tackle the problem. If that hasn't happened, then I believe the problem is still there. I'm sorry I can't help anymore with this.
Cheers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants