Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] NaNs in brightness_temp #400

Open
Kojobu opened this issue Jun 13, 2024 · 5 comments
Open

[BUG] NaNs in brightness_temp #400

Kojobu opened this issue Jun 13, 2024 · 5 comments
Labels

Comments

@Kojobu
Copy link

Kojobu commented Jun 13, 2024

Describe the bug:
When running the simulator with a specific set of parameters, there are singular NaN values. When slightly alternating one of the parameters, the NaNs disappear. Furthermore, the NaN values are exactly the HII_DIM size away from each other in the z dimension.
E.g. the minimal working example from below yields NaNs at

x = array([26, 26, 43, 43, 43]) 
y = array([50, 50,  0,  0,  0])
z = array([ 88, 208,  38, 158, 278]))

Note, that this issue does not occur if USE_TS_FLUCT is set to false.

To Reproduce:

import py21cmfast as p21c

user_params = p21c.UserParams(
    HII_DIM=120, BOX_LEN=200, NO_RNG=True
)
flag_options = p21c.FlagOptions(
    INHOMO_RECO=True, USE_TS_FLUCT=True
)

astro_params = p21c.AstroParams(
    **{'ION_Tvir_MIN': 4.798687324224536, 
    'HII_EFF_FACTOR': 153.93554078107047, 
    'L_X': 41.07914816577325, 
    'NU_X_THRESH': 122.31064697905447}
)

lightcone = p21c.run_lightcone(
    direc='_cache',
    user_params=user_params,
    flag_options=flag_options,
    astro_params=astro_params,
    redshift = 5.5
)

print(np.where(np.isnan(lightcone.brightness_temp)==True))

Please note that the brightness_temp does not look physical due to NO_RNG = True. It is only set to True for the minimal working example given above. This is a necessary condition to reproduce the error. Although setting the random_seed argument, the brightness_map will differ from run to run if NO_RNG = False.
Furthermore, the crash parameters are not unique. There are other combinations for which the error occurs. They may differ radically and show no obvious coherence.

Expected behavior:
A brightness map without NaNs or at least a warning that this particular choice of parameters causes trouble.

Details:
OS: Linux (behavior is reproducible on local hardware and cluster)
Python version: 3.12.2 and 3.11.2
Package version: pip version from pypi

Additional context
The error occurs during MCMC inference. Each step, the four astro parameters are sampled uniformly and the simulator is called as a sample from the likelihood. The ranges from which the parameters were sampled are:

ON_Tvir_MIN: [4,5.3]
HII_EFF_FACTOR: [10, 250]
L_X : [38,42]
NU_X_THRESH: [100, 1500]
@Kojobu Kojobu added the bug label Jun 13, 2024
@steven-murray
Copy link
Member

Thanks @Kojobu for the detailed report. I will try to reproduce.

@steven-murray
Copy link
Member

I have reproduced the issue with the given script, thank you! Actually, with those parameters I do get NaNs, but only two of them (not 5).

I turned on LOG_LEVEL=DEBUG when building the package, and I found that the NaN's are only in the TsBox.Ts_box, and only enter between redshifts 6.7681003887088105 and 6.6157848372239005.

I have not dug any deeper at this point. I wonder if @daviesje might be able to help here as well.

@daviesje
Copy link
Contributor

daviesje commented Jul 1, 2024

@steven-murray I'm not 100% sure if this is the issue, but something I've encountered before is that the density field can contain values of exactly -1. See PerturbField.c

if (*((float *)LOWRES_density_perturb + HII_R_FFT_INDEX(i,j,k)) < -1) {
*((float *)LOWRES_density_perturb + HII_R_FFT_INDEX(i,j,k)) = -1.+FRACT_FLOAT_ERR;
}

Since there's a strict inequality in the conditional, it can miss some values. Also later on in the same file:

if (*((float *)LOWRES_density_perturb + HII_R_FFT_INDEX(i,j,k)) < -1.0) { // shouldn't happen
if(bad_count<5) LOG_WARNING("LOWRES_density_perturb is <-1 for index %d %d %d (value=%f)", i,j,k, *((float *)LOWRES_density_perturb + HII_R_FFT_INDEX(i,j,k)));
if(bad_count==5) LOG_WARNING("Skipping further warnings for LOWRES_density_perturb.");
*((float *)LOWRES_density_perturb + HII_R_FFT_INDEX(i,j,k)) = -1+FRACT_FLOAT_ERR;
bad_count++;
}

Density of -1 results in kinetic temperature being zero, and spin temperature being NaN. In v4-prep I have a fix in SpinTemperature.c, but I think if this is what causes the issue the best fix is to change the two above srict inequalities to <=.

@steven-murray
Copy link
Member

Sounds good, thanks @daviesje. I'll go and implement those changes, and see if it fixes the problem

@steven-murray
Copy link
Member

Hmmm, this didn't seem to fix the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants