parameter set causing "Segmentation fault" in python wrapper #419

Open
lukashergt opened this issue May 1, 2021 · 5 comments · May be fixed by #423

lukashergt commented May 1, 2021

The following parameter set causes a segmentation fault in the 'background' module, both through the Python wrapper and when calling CLASS from the command line.

Example Python script to reproduce the error:

from classy import Class

paramdict = {'A_s': 1.9277426921445484e-09, 
             'n_s': 0.9451570080189712, 
             '100*theta_s': 1.045577375117725, 
             'omega_b': 0.017612175306787197, 
             'omega_cdm': 0.30057249803499375, 
             'm_ncdm': 0.06, 
             'tau_reio': 0.3055598544108618, 
             'N_ncdm': 1, 
             'N_ur': 2.0328, 
             'output': 'tCl'}

cosmo = Class()
cosmo.set(paramdict)
cosmo.compute(['background'])
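Because a segmentation fault kills the whole interpreter, it can help to run a reproduction like the one above in a child process and inspect its exit status. The following is only a sketch: `run_isolated` and `describe_exit` are hypothetical helper names, and the child snippet here is a harmless stand-in rather than the actual classy script.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=600):
    """Run a Python snippet in a child interpreter and return its CompletedProcess."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe_exit(returncode):
    """Translate a child return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with code %d" % returncode


if __name__ == "__main__":
    # Stand-in child code; in practice one would pass the reproduction script as a string.
    result = run_isolated("print('stand-in for the reproduction script')")
    print(describe_exit(result.returncode), result.stdout.strip())
```

On a POSIX system, a child that dies from a segmentation fault should be reported as "killed by SIGSEGV", while the parent process keeps running.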

I've chased it down to line 1355 in tools/evolver_ndf15.c:

  1339	  else{
  1340	    /*Normal case:*/
  1341	    for(j=1;j<=neq;j++){
  1342	      Fdiff_new = 0.0;
  1343	      Fdiff_absrm = 0.0;
  1344	      for(i=1;i<=neq;i++){
  1345	        Fdiff_absrm = MAX(fabs(Fdiff_new),Fdiff_absrm);
  1346	        Fdiff_new = nj_ws->ydel_Fdel[i][j] - fval[i];
  1347	        dFdy[i][j] = Fdiff_new/nj_ws->del[j];
  1348	        /*Find row maximums:*/
  1349	        if(fabs(Fdiff_new)>=Fdiff_absrm){
  1350	          /* Found new max location in column */
  1351	          nj_ws->Rowmax[j] = i;
  1352	          nj_ws->Difmax[j] = fabs(Fdiff_new);
  1353	        }
  1354	      }
  1355	      nj_ws->absFdelRm[j] = fabs(nj_ws->ydel_Fdel[nj_ws->Rowmax[j]][j]);
  1356	    }
  1357	  }

nj_ws->Rowmax[j] returns some random number, so I guess j overshoots the Rowmax array somehow.
I'm a bit lost with all these indices. Can anyone else reproduce this? Does anybody have an idea what might be going wrong here?

Versions:

Python 3.8.2
classy v3.0.1

pstoecker commented May 5, 2021

Hi @lukashergt ,

This is interesting. I have seen a similar issue in the context of energy injection with CLASS v3.0.0, and I know exactly what goes wrong here. For your parameter combination, some, if not all, entries of the DE system evolved by ndf15 contain a NaN. The calculation that determines nj_ws->Rowmax[j] then fails and leaves nj_ws->Rowmax[j] uninitialised, so the lookup goes out of bounds of the nj_ws->ydel_Fdel array and reads random data.

This can be fixed by explicitly checking the derivatives and throwing an error if they are ill-defined.
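To illustrate the mechanism, here is a small Python sketch of the inner loop quoted above for a single column j. It is not the CLASS code: the function names are made up, indices are 0-based, and `c_max` assumes that MAX is a plain comparison macro. Since every comparison with NaN is false, the branch that assigns the row index is never taken, and an explicit finiteness check turns that silent failure into an error.

```python
import math


def c_max(a, b):
    """Mimic a comparison-based C macro MAX(a,b) = ((a)>(b) ? (a) : (b)) (assumed definition)."""
    return a if a > b else b


def find_rowmax(ydel_fdel_col, fval):
    """Return the row index selected by the loop, or None if it is never assigned."""
    rowmax = None  # stands in for the never-assigned nj_ws->Rowmax[j]
    fdiff_new = 0.0
    fdiff_absrm = 0.0
    for i, (fdel, f) in enumerate(zip(ydel_fdel_col, fval)):
        fdiff_absrm = c_max(abs(fdiff_new), fdiff_absrm)
        fdiff_new = fdel - f
        # Any comparison involving NaN is False, so this branch is skipped for NaN input.
        if abs(fdiff_new) >= fdiff_absrm:
            rowmax = i
    return rowmax


def find_rowmax_checked(ydel_fdel_col, fval):
    """Same search, but refuse to continue when a difference is NaN or infinite."""
    for i, (fdel, f) in enumerate(zip(ydel_fdel_col, fval)):
        if not math.isfinite(fdel - f):
            raise ValueError("non-finite derivative difference in row %d" % i)
    return find_rowmax(ydel_fdel_col, fval)


nan = float("nan")
print(find_rowmax([1.0, 5.0, 2.0], [0.0, 0.0, 0.0]))  # prints 1
print(find_rowmax([nan, nan, nan], [0.0, 0.0, 0.0]))  # prints None
```

In C there is no `None`: the array slot simply keeps whatever value it had before, which is why the subsequent array access can land anywhere in memory.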

Best,
Patrick


fruzsinaagocs commented May 7, 2021

Hi @pstoecker and @lukashergt,

I tried reproducing this issue to see if my fix over at https://github.com/GambitBSM/gambit/issues/278 worked here. I couldn't reproduce the issue with any of these:

  • OS: Ubuntu 20.04 (personal machine), CentOS 8 (the Gadi cluster)
  • compiler: gcc 8.3.1, 9.3.0
  • Python: 3.7.4 (but also tried command line CLASS, without the Python wrapper)
  • classy 3.0.1

In a discussion with Lukas, we pinned down the following:

  • The issue was coming from the initial run of the 'shooting' method in CLASS, when it tries to determine H_0 from 100*theta_s.
  • During shooting, some background evolutions with ndf15 are run. For these parameters, we get a shrinking H, and it trickles down from some enormous value all the way to nearly 0.
  • On Lukas' system, this initial background evolution fails to stop in time and produces NaNs for all background variables. On my system(s), it doesn't, although I'm doubtful the H I'm getting is correct.
  • For Lukas, the shooting produces a "shooting failed" error message, but the error is delayed until later:

/* If shooting fails, postpone error to background module to play nice with MontePython. */

  • The "shooting failed" error message is only visible with my fix implemented, otherwise we get a segmentation fault.
  • We tried various compiler flag options, most notably we switched off optimization flags, but Lukas still gets the error.

I'm going to look into

  • why the background evolution during shooting fails to terminate properly for him (although this is hard, as I can't reproduce the issue),
  • why the initial H CLASS starts shooting from is unnecessarily large.

But this discussion makes me more certain that the patch will not rule out valid points in parameter space - we definitely want to avoid that happening.

@pstoecker

OK, I see.

I think there are two issues to fix here. The first is that ndf15 has to throw an error whenever the derivatives are ill-defined. This holds regardless of whether the parameter point is actually fine and the problem only arises from some weird (system-dependent) computational issue. Therefore it would be good if we could try your fix from GambitBSM/gambit#278 here.

The second is the particular issue that is happening here. On my system (Ubuntu 18.04, gcc 7.5), this particular parameter point does not crash either. I haven't checked the results, but I guess that I will see a similarly doubtful value of H. Given your explanation, I agree that this needs some investigation.

@fruzsinaagocs

Ah, I should've been more clear - we tried the fix from https://github.com/GambitBSM/gambit/issues/278 and it successfully avoids the seg fault, so CLASS fails in a more elegant way and doesn't disrupt the MCMC chains.
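As a rough illustration of why a raised error is friendlier to a sampler than a seg fault, a likelihood wrapper can map a failed computation to zero likelihood and carry on. This is a minimal sketch with made-up names: `ComputationFailed` is a hypothetical stand-in for whatever exception the wrapper raises (for classy that would presumably be its `CosmoComputationError`), and `toy_loglike` is not a real likelihood.

```python
import math


class ComputationFailed(Exception):
    """Hypothetical stand-in for the exception a Boltzmann-code wrapper raises."""


def safe_loglike(loglike, params, recoverable=(ComputationFailed,)):
    """Evaluate loglike(params); map recoverable failures and NaNs to -inf."""
    try:
        value = loglike(params)
    except recoverable:
        # The point is rejected, but the chain itself survives.
        return -math.inf
    return -math.inf if math.isnan(value) else value


def toy_loglike(params):
    """Toy likelihood that fails for negative x, mimicking a failed computation."""
    if params["x"] < 0:
        raise ComputationFailed("shooting failed")
    return -0.5 * params["x"] ** 2


print(safe_loglike(toy_loglike, {"x": 2.0}))   # prints -2.0
print(safe_loglike(toy_loglike, {"x": -1.0}))  # prints -inf
```

A segmentation fault never reaches the `except` clause, since the process is gone before Python can react, which is why the check has to happen inside the solver.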

fruzsinaagocs added a commit to fruzsinaagocs/class_public that referenced this issue May 8, 2021
@fruzsinaagocs fruzsinaagocs linked a pull request May 8, 2021 that will close this issue
@lukashergt

Hi @fruzsinaagocs and @pstoecker, thanks to both of you for your input, that was very helpful. The ndf15 fix definitely proved useful in some cases.

I found another issue, though, that relates to this particular case and to curvature runs in general. The following lines in source/background.c have been commented out:

  /** - control that cosmological parameter values make sense, otherwise inform user */

  /* H0 in Mpc^{-1} */
  /* Many users asked for this test to be supressed. It is commented out. */
  /*class_test((pba->H0 < _H0_SMALL_)||(pba->H0 > _H0_BIG_),
             pba->error_message,
             "H0=%g out of bounds (%g<H0<%g) \n",pba->H0,_H0_SMALL_,_H0_BIG_);*/

This test is important to filter out cases such as collapsing universes. I don't know why "many users asked for this test to be suppressed", but there are definitely situations where it is relevant. Maybe an approach similar to CAMB would be useful, where you are allowed to set a viable H0 range, maybe with the additional option of leaving it unspecified?
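To make the suggestion concrete, here is a sketch of what an optional range test could look like, written in Python for brevity. The function name, the parameter names `h0_min` and `h0_max`, and the example bounds are all hypothetical and not part of CLASS or CAMB; the values are taken to be in km/s/Mpc, whereas pba->H0 is in Mpc^{-1}.

```python
import math


def check_h0(h0, h0_min=None, h0_max=None):
    """Raise ValueError if h0 is not finite or lies outside the optional bounds."""
    if not math.isfinite(h0):
        raise ValueError("H0=%g is not finite" % h0)
    # A bound left as None means that side of the range is unspecified.
    if h0_min is not None and h0 < h0_min:
        raise ValueError("H0=%g out of bounds (H0<%g)" % (h0, h0_min))
    if h0_max is not None and h0 > h0_max:
        raise ValueError("H0=%g out of bounds (H0>%g)" % (h0, h0_max))
    return h0


check_h0(67.0, h0_min=20.0, h0_max=100.0)  # passes
check_h0(1e-3)                             # passes: no range specified
```

With a range set, a near-zero H0 like the one the shooting ends up with here would be rejected up front, while users who do not want the test simply leave both bounds unset.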
