Modified CLASS not working with parallelization #590

Open
subhajitghosh-phy opened this issue Aug 28, 2024 · 1 comment

Comments

@subhajitghosh-phy

Hi,

I am modifying the latest version of CLASS to incorporate massive neutrino self-interaction. I implemented a new tight coupling approximation (TCA) for the ncdm species.

My code works fine on a single core. However, when I turn on parallelization (via export OMP_NUM_THREADS=n, with n>1), the code fails randomly.

I say randomly because, for the same parameter values, the code sometimes runs and sometimes fails. The error typically looks like the one below, which I am guessing comes from the new TCA for ncdm. It seems that the code is not going through the TCA conditions properly (this error is expected if there is no TCA).

Error in perturbations_init
=>operator()(L:954) :error in perturbations_solve(ppr, pba, pth, ppt, index_md, index_ic, index_k, &pw);
=>perturbations_solve(L:3264) :error in generic_evolver(perturbations_derivs, interval_limit[index_interval], interval_limit[index_interval+1], ppw->pv->y, ppw->pv->used_in_sources, ppw->pv->pt_size, &ppaw, ppr->tol_perturbations_integration, ppr->smallest_allowed_variation, perturbations_timescale, ppr->perturbations_integration_stepsize, ppt->tau_sampling, tau_actual_size, perturbations_sources, perhaps_print_variables, ppt->error_message);
=>evolver_ndf15(L:468) :condition (absh <= hmin) is true; Step size too small: step:9.0382e-14, minimum:9.0382e-14, in interval: [0.279812:56.4887]

I tried to debug this by increasing the perturbations verbosity. With parallelization on, perturbations_solve fails for some of the k values in the index_k loop, seemingly at random: sometimes for just one k, sometimes for several. To reiterate, the code works fine on a single core when I set 'export OMP_NUM_THREADS=1'.
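
As a hedged illustration of one thing to check (a common cause of failures that appear only with more than one thread, not a confirmed diagnosis of this code): any flag or buffer added for a new approximation has to be owned by a single k mode rather than shared between threads. The C sketch below uses hypothetical names (`struct workspace`, `tca_on`, `solve_one_k`, `solve_all_k`) that are not CLASS identifiers, and the OpenMP pragma is only a stand-in for whatever threading the real code uses. Each loop iteration builds its own workspace, so the threaded result matches a plain serial loop.

```c
#include <assert.h>

#define N_K 64

/* Hypothetical per-mode workspace: stands in for the data one k mode needs. */
struct workspace {
  double k;
  int tca_on; /* hypothetical approximation flag, valid for this mode only */
};

/* Toy "solve" for one wavenumber: everything it reads or writes lives in *w. */
static double solve_one_k(struct workspace *w) {
  w->tca_on = (w->k < 0.1); /* decided per mode and stored per mode */
  return w->tca_on ? w->k : w->k * w->k;
}

/* Loop over modes. The workspace is declared inside the loop body, so it is
   private to each iteration and no two threads ever share the flag. */
static void solve_all_k(const double *k, double *out, int n_k) {
  assert(n_k > 0);
#pragma omp parallel for schedule(dynamic)
  for (int i = 0; i < n_k; i++) {
    struct workspace w = { k[i], 0 };
    out[i] = solve_one_k(&w);
  }
}

/* Returns 1 if the (possibly threaded) loop reproduces a serial loop exactly. */
int check_reproducible(void) {
  double k[N_K], par[N_K];
  for (int i = 0; i < N_K; i++) k[i] = 0.01 * (i + 1);
  solve_all_k(k, par, N_K);
  for (int i = 0; i < N_K; i++) {
    struct workspace w = { k[i], 0 };
    if (par[i] != solve_one_k(&w)) return 0;
  }
  return 1;
}
```

If `tca_on` were instead a global or `static` variable, two threads could overwrite each other's decision in the middle of a solve, which would give failures that change from run to run and disappear with one thread. Compiled without `-fopenmp`, the pragma is ignored and the loop simply runs serially.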

Any idea what the source of this may be, or any suggestions on how to debug it? To give more detail: in the TCA for ncdm I am just restricting l_max for ncdm.

Thanks in advance.

Best,
Subhajit

@subhajitghosh-phy
Author

I have an update regarding this. If I choose the rk evolver (setting evolver = 0), the code runs fine.

However, this is not a practical solution for me: with the TCA switched off, the equations are still a bit stiff for the rk evolver in some relevant regions of parameter space (for most of the parameter space it works fine).
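
To make the stiffness remark concrete, here is a toy C sketch that is unrelated to the actual CLASS evolvers. It integrates y' = -lambda*y with a fixed step using forward (explicit) Euler and backward (implicit) Euler. The function names and the test equation are assumptions chosen for illustration. For lambda*h > 2 the explicit update is unstable even though the true solution decays smoothly, which is the kind of behavior that forces explicit schemes to take very small steps on stiff systems.

```c
#include <assert.h>

/* Forward (explicit) Euler for y' = -lambda*y with y(0) = 1.
   Each step multiplies y by (1 - lambda*h). */
double explicit_euler(double lambda, double h, int n_steps) {
  assert(h > 0.0 && n_steps >= 0);
  double y = 1.0;
  for (int i = 0; i < n_steps; i++) y += h * (-lambda * y);
  return y;
}

/* Backward (implicit) Euler: y_new = y + h*(-lambda*y_new),
   which rearranges to y_new = y / (1 + lambda*h). */
double implicit_euler(double lambda, double h, int n_steps) {
  assert(h > 0.0 && n_steps >= 0);
  double y = 1.0;
  for (int i = 0; i < n_steps; i++) y /= (1.0 + lambda * h);
  return y;
}
```

With lambda = 1000 and h = 0.01, each explicit step multiplies y by -9, so the solution grows in magnitude, while each implicit step divides it by 11 and it decays as it should.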

Now this is confusing: why does the rk evolver work with parallelization while ndf15 doesn't in my modified code?

I noticed that in the last few commits the OpenMP parallelization has been modified (actually removed). Could that be related to the error? Any help would be very much appreciated.

Best,
Subhajit
