Inverse table improvements #388

Merged: 15 commits, May 17, 2024
Changes from 10 commits
79 changes: 39 additions & 40 deletions src/py21cmfast/inputs.py
@@ -290,31 +290,6 @@ class GlobalParams(StructInstanceWrapper):
The peak gas temperatures behind the supersonic ionization fronts during reionization.
VAVG:
Avg value of the DM-b relative velocity [in km/s], ~0.9*SIGMAVCB (=25.86 km/s) normally.
STOC_MASS_TOL:
Mass tolerance for the stochastic halo sampling, a sample will be rejected if its mass sum falls
outside of the expected mass * (1 +- STOC_MASS_TOL)
SAMPLER_MIN_MASS:
Sets the minimum mass used by the halo sampler; halos below this mass will have the average contribution
based on the integral of the given CMF. Greatly affects performance and memory usage.
MAXHALO_FACTOR:
Safety factor to multiply halo array sizes, total (all threads) array size will be UMF integral * MAXHALO_FACTOR
N_MASS_INTERP:
Number of mass bins for the sampler interpolation tables.
N_DELTA_INTERP:
Number of delta bins for the sampler interpolation tables.
N_PROB_INTERP:
Number of probability bins for the sampler interpolation tables.
MIN_LOGPROB:
Lower limit (in log probability) of the inverse CDF table.
SAMPLE_METHOD:
Sampling method to use for stochastic halos:
0: Mass sampling from CMF
1: N_halo sampling from CMF
2: Sheth & Lemson 99 Fcoll sampling
3: Darkforest (Qiu+20) binary splitting
AVG_BELOW_SAMPLER:
int flag to add the average halo properties in each cell between the source mass limit
and the mass sampled by the halo sampler
"""

def __init__(self, wrapped, ffi):
@@ -510,11 +485,6 @@ class UserParams(StructWithDefaults):
0: GSL QAG adaptive integration,
1: Gauss-Legendre integration, previously forced in the interpolation tables,
2: Approximate integration, assuming sharp cutoffs and a triple power-law for sigma(M) based on EPS
INTEGRATION_METHOD_HALOS: int, optional
The integration method to use for all conditional MF integrals for halos from the sampler catalogues:
0: GSL QAG adaptive integration,
1: Gauss-Legendre integration, previously forced in the interpolation tables,
2: Approximate integration, assuming sharp cutoffs and a triple power-law for sigma(M) based on EPS
USE_2LPT: bool, optional
Whether to use second-order Lagrangian perturbation theory (2LPT).
Set this to True if the density field or the halo positions are extrapolated to
@@ -532,6 +502,36 @@ class UserParams(StructWithDefaults):
If STOC_MINIMUM_Z is not provided, we simply sample at the given redshift, unless
USE_TS_FLUCT is given or we want a lightcone, in which case the minimum redshift is set
to the minimum redshift of those fields.
SAMPLER_MIN_MASS: float, optional
The minimum mass to sample in the halo sampler when USE_HALO_FIELD and HALO_STOCHASTICITY are true;
decreasing this can drastically increase both compute time and memory usage.
SAMPLER_BUFFER_FACTOR: float, optional
The arrays for the halo sampler will have a size of SAMPLER_BUFFER_FACTOR multiplied by the expected
number of halos in the box. Ideally this should be close to unity, but one may wish to increase it to
test alternative scenarios.
N_COND_INTERP: int, optional
The number of condition bins in the inverse CMF tables.
N_PROB_INTERP: int, optional
The number of probability bins in the inverse CMF tables.
MIN_LOGPROB: float, optional
The minimum log-probability of the inverse CMF tables.
SAMPLE_METHOD: int, optional
The sampling method to use in the halo sampler when calculating progenitor populations:
0: Mass-limited CMF sampling, where samples are drawn until the expected mass is reached
1: Number-limited CMF sampling, where we select a number of halos from the Poisson distribution
and then sample the CMF that many times
2: Sheth et al 1999 partition sampling, where the EPS collapsed fraction is sampled (Gaussian tail)
and then the condition is updated using the conservation of mass.
3: Parkinson et al 2008 binary-split model as in DarkForest (Qiu et al 2021), where the EPS merger rate
is sampled on small internal timesteps such that only binary splits can occur.
NOTE: Sampling from the density grid will ALWAYS use number-limited sampling (method 1).
Member (comment on lines +518 to +527):
Can we make the python-side parameter either a string or an enum? This is more explicit. It's harder to read code that just has SAMPLE_METHOD=2 rather than SAMPLE_METHOD=samplers.partition or SAMPLE_METHOD='partition'
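A minimal sketch of what that suggestion could look like — the enum name, member names, and helper are hypothetical, not part of this PR:

```python
from enum import IntEnum


class SampleMethod(IntEnum):
    """Hypothetical enum mirroring the integer SAMPLE_METHOD codes."""

    MASS_LIMITED = 0    # mass-limited CMF sampling
    NUMBER_LIMITED = 1  # number-limited CMF sampling
    PARTITION = 2       # Sheth et al 1999 partition sampling
    BINARY_SPLIT = 3    # Parkinson et al 2008 binary splits


def coerce_sample_method(value):
    """Accept an enum member, a (case-insensitive) name, or the raw int."""
    if isinstance(value, str):
        return SampleMethod[value.upper()]
    return SampleMethod(value)
```

This way `SAMPLE_METHOD='partition'` and `SAMPLE_METHOD=2` would both be accepted, while an `IntEnum` still serializes to the plain integer the C struct expects.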

AVG_BELOW_SAMPLER: bool, optional
When switched on, an integral is performed in each cell between the minimum source mass and SAMPLER_MIN_MASS,
effectively placing the average halo population in each HaloBox cell below the sampler resolution.
Member:
What happens if it's not true? Those masses are just ignored totally?

Contributor Author:

Yes, I would imagine that most of the time SAMPLER_MIN_MASS would be set low enough so that the averaging is unnecessary, and that this would be for saving memory on larger boxes.

Member:
Cool -- I think a sentence describing that would be helpful
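The behaviour under discussion can be sketched numerically: the sub-resolution contribution is an integral of m * dn/dlnm over the unresolved mass range. The toy mass function and trapezoid rule below are illustrative only, not the code's actual conditional-MF integrator:

```python
import math


def avg_below_sampler(dndlnm, m_min, m_sampler_min, n_steps=1000):
    """Toy sub-sampler average: integrate m * dn/dlnm from the
    source mass limit up to the sampler resolution, in ln(m)."""
    lo, hi = math.log(m_min), math.log(m_sampler_min)
    xs = [lo + (hi - lo) * i / n_steps for i in range(n_steps + 1)]
    f = [math.exp(x) * dndlnm(math.exp(x)) for x in xs]  # integrand m * dn/dlnm
    # trapezoid rule on the uniform ln(m) grid
    return sum(0.5 * (fa + fb) * (hi - lo) / n_steps
               for fa, fb in zip(f[:-1], f[1:]))
```

With the toy choice dndlnm(m) = 1/m the integrand is constant and the result reduces to ln(SAMPLER_MIN_MASS / M_min), which makes the sketch easy to check by hand.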

HALOMASS_CORRECTION: float, optional
A corrective factor to the mass-limited (SAMPLE_METHOD==0) sampling, which multiplies the
expected mass from a condition by this number.
It is also used in the partition (SAMPLE_METHOD==2) sampler, multiplying sigma(M) of each sample drawn.
Member:
What's the usefulness of this option? Why is its default 0.9?

Contributor Author (@daviesje, May 15, 2024):

The mass-limited sampler has a slight bias toward producing too many halos; this bias is independent of delta or halo mass, so this factor corrects for it. The value of 0.9, multiplying the expected collapsed mass from each descendant, works well for the default timestep factor of 1.02. Since the partition method also uses a single correction factor (McQuinn+ 2007 lowers sigma_8 slightly; I've implemented a correction in nu), we re-use the parameter.

Member:

OK. Then I think this should be documented here (i.e. probably don't touch this unless you have good reason, and these are the values it should take in these circumstances)
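To show where the factor enters, mass-limited sampling (SAMPLE_METHOD == 0) draws halos until the bias-corrected expected mass is exhausted. This is a sketch only; `draw_mass` is a stand-in for the real inverse-CMF draw, and the stopping rule is simplified:

```python
def sample_mass_limited(draw_mass, expected_mass, halomass_correction=0.9):
    """Draw halo masses until their sum reaches the corrected target."""
    target = expected_mass * halomass_correction  # HALOMASS_CORRECTION
    halos, total = [], 0.0
    while total < target:
        m = draw_mass()
        halos.append(m)
        total += m
    return halos
```

Lowering `halomass_correction` lowers the target mass and hence the mean number of halos drawn, which is exactly the over-production bias the 0.9 default is compensating.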

"""

_ffi = ffi
@@ -551,11 +551,19 @@ class UserParams(StructWithDefaults):
"USE_INTERPOLATION_TABLES": None,
"INTEGRATION_METHOD_ATOMIC": 1,
"INTEGRATION_METHOD_MINI": 1,
"INTEGRATION_METHOD_HALOS": 0,
"USE_2LPT": True,
"MINIMIZE_MEMORY": False,
"STOC_MINIMUM_Z": None,
"KEEP_3D_VELOCITIES": False,
"SAMPLER_MIN_MASS": 5e7,
"SAMPLER_BUFFER_FACTOR": 2.0,
"MAXHALO_FACTOR": 2.0,
"N_COND_INTERP": 200,
"N_PROB_INTERP": 400,
"MIN_LOGPROB": -12,
"SAMPLE_METHOD": 0,
"AVG_BELOW_SAMPLER": False,
"HALOMASS_CORRECTION": 0.9,
}
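A rough illustration of how defaults-plus-overrides merging of these new parameters behaves — a simplified stand-in, not py21cmfast's actual `StructWithDefaults` machinery:

```python
SAMPLER_DEFAULTS = {
    "SAMPLER_MIN_MASS": 5e7,
    "SAMPLER_BUFFER_FACTOR": 2.0,
    "N_COND_INTERP": 200,
    "N_PROB_INTERP": 400,
    "MIN_LOGPROB": -12,
    "SAMPLE_METHOD": 0,
    "AVG_BELOW_SAMPLER": False,
    "HALOMASS_CORRECTION": 0.9,
}


def make_sampler_params(**overrides):
    """Merge user overrides onto the defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(SAMPLER_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {**SAMPLER_DEFAULTS, **overrides}
```

For example, `make_sampler_params(SAMPLE_METHOD=2)` changes only the sampling method while every other default above is preserved.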

_hmf_models = ["PS", "ST", "WATSON", "WATSON-Z", "DELOS"]
@@ -667,15 +675,6 @@ def power_spectrum_model(self):
"""String representation of the power spectrum model used."""
return self._power_models[self.POWER_SPECTRUM]

@property
def INTEGRATION_METHOD_HALOS(self):
"""The integration methods other than QAG do not yet work for halos."""
if self._INTEGRATION_METHOD_HALOS != 0:
warnings.warn(
"Only the QAG integrator currently works for the halo sampler, setting to 1 or 2 is for testing only"
)
return self._INTEGRATION_METHOD_HALOS

@property
def cell_size(self) -> un.Quantity[un.Mpc]:
"""The resolution of a low-res cell."""
15 changes: 13 additions & 2 deletions src/py21cmfast/src/21cmFAST.h
@@ -31,10 +31,20 @@ struct UserParams{
bool USE_INTERPOLATION_TABLES;
int INTEGRATION_METHOD_ATOMIC;
int INTEGRATION_METHOD_MINI;
int INTEGRATION_METHOD_HALOS;
bool USE_2LPT;
bool MINIMIZE_MEMORY;
bool KEEP_3D_VELOCITIES;

//Halo Sampler Options
float SAMPLER_MIN_MASS;
double SAMPLER_BUFFER_FACTOR;
float MAXHALO_FACTOR;
int N_COND_INTERP;
int N_PROB_INTERP;
double MIN_LOGPROB;
int SAMPLE_METHOD;
bool AVG_BELOW_SAMPLER;
double HALOMASS_CORRECTION;
};

struct AstroParams{
@@ -132,7 +142,7 @@ struct HaloBox{
float *n_ion; //weighted by F_ESC*PopN_ion
float *halo_sfr; //for x-rays and Ts stuff
float *halo_sfr_mini; //for x-rays and Ts stuff
float *whalo_sfr;
float *whalo_sfr; //SFR weighted by PopN_ion and F_ESC, used for Gamma12

double log10_Mcrit_LW_ave;
};
@@ -301,6 +311,7 @@ void initialise_SFRD_Conditional_table(double min_density, double max_density, d
float Fstar10, float Fstar7_MINI, int method, int method_mini, bool minihalos);

void initialise_dNdM_tables(double xmin, double xmax, double ymin, double ymax, double growth1, double param, bool from_catalog);
void initialise_dNdM_inverse_table(double xmin, double xmax, double lnM_min, double growth1, double param, bool from_catalog);
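The new inverse table maps a (log-)probability back to a halo mass. A pure-Python sketch of that kind of lookup follows; the grid sizes, clamping behaviour, and linear interpolation here are illustrative only, not the C implementation:

```python
import math


def build_inverse_cmf_table(cdf_pairs, n_prob=400, min_logprob=-12.0):
    """Tabulate mass as a function of log-probability.

    cdf_pairs: (mass, P(>mass)) points with P decreasing from 1 to 0.
    Probabilities below exp(min_logprob) clamp to the last table entry.
    """
    table = []
    for i in range(n_prob):
        # log-probabilities run from 0 down to min_logprob
        p = math.exp(min_logprob * i / (n_prob - 1))
        mass = cdf_pairs[-1][0]  # clamp if no bracketing interval is found
        for (m0, p0), (m1, p1) in zip(cdf_pairs, cdf_pairs[1:]):
            if p1 <= p <= p0:
                t = (p - p0) / (p1 - p0)
                mass = m0 + t * (m1 - m0)  # linear interpolation in P
                break
        table.append(mass)
    return table
```

A uniform deviate u then maps to a mass by indexing the table at log(u), which is why MIN_LOGPROB and N_PROB_INTERP control the table's reach and resolution.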

//evaluation of tables
double EvaluateNionTs(double redshift, double Mlim_Fstar, double Mlim_Fesc);
8 changes: 2 additions & 6 deletions src/py21cmfast/src/FindHaloes.c
@@ -106,11 +106,7 @@ LOG_DEBUG("redshift=%f", redshift);
.HMF = user_params->HMF,
};

if(user_params->INTEGRATION_METHOD_HALOS == 1){
initialise_GL(NGL_INT,log(M_MIN),log(Mmax_debug));
}

double nhalo_debug = IntegratedNdM(log(M_MIN), log(Mmax_debug), params, 1, user_params->INTEGRATION_METHOD_HALOS) * VOLUME;
double nhalo_debug = IntegratedNdM(log(M_MIN), log(Mmax_debug), params, 1, 0) * VOLUME;
//expected halos above minimum filter mass
LOG_DEBUG("DexM: We expect %.2f Halos between Masses [%.2e,%.2e]",nhalo_debug,M_MIN,Mmax_debug);
}
@@ -166,7 +162,7 @@ LOG_DEBUG("redshift=%f", redshift);

//Pending a serious deep-dive into this algorithm, I will force DexM to use the fitted parameters to the
// Sheth-Tormen mass function (as of right now, We do not even reproduce EPS results)
Member (comment on lines 163 to 164):

is this comment still true?

Contributor Author:

Yes, I think it would take some time / code archaeology to fully understand why the excursion set halo finder gets the results it does, and find suitable corrections for any given CMF.

delta_crit = growth_factor*sheth_delc(Deltac/growth_factor, sigma_z0(M));
delta_crit = growth_factor*sheth_delc_dexm(Deltac/growth_factor, sigma_z0(M));

// if(global_params.DELTA_CRIT_MODE == 1){
// //This algorithm does not use the sheth tormen OR Jenkins parameters,
27 changes: 2 additions & 25 deletions src/py21cmfast/src/Globals.h
@@ -81,17 +81,6 @@ struct GlobalParams{

float VAVG;

//Stochasticity Options
float STOC_MASS_TOL;
float SAMPLER_MIN_MASS;
float MAXHALO_FACTOR;
int N_MASS_INTERP;
int N_COND_INTERP;
int N_PROB_INTERP;
double MIN_LOGPROB;
int SAMPLE_METHOD;
int AVG_BELOW_SAMPLER;

bool USE_ADIABATIC_FLUCTUATIONS;
};

@@ -153,10 +142,8 @@ extern struct GlobalParams global_params = {
.OMtot = 1.0,
.Y_He = 0.245,
.wl = -1.0,
// .SHETH_b = 0.485, //In the literature this is 0.485 (RM08) or 0.5 (SMT01) Master 21cmFAST currently has 0.15
// .SHETH_c = 0.615, //In the literature this is 0.615 (RM08) or 0.6 (SMT01) Master 21cmFAST currently has 0.05
.SHETH_b = 0.15,
.SHETH_c = 0.05,
.SHETH_b = 0.15, //In the literature this is 0.485 (RM08) or 0.5 (SMT01) or 0.34 (Barkana+01) Master 21cmFAST currently has 0.15
.SHETH_c = 0.05, //In the literature this is 0.615 (RM08) or 0.6 (SMT01) or 0.81 (Barkana+01) Master 21cmFAST currently has 0.05
.Zreion_HeII = 3.0,
.FILTER = 0,
.R_BUBBLE_MIN = 0.620350491,
@@ -167,15 +154,5 @@

.VAVG=25.86,

.STOC_MASS_TOL = 5.0, //effectively infinite, mass tolerance semi-deprecated
.SAMPLER_MIN_MASS = 5e7,
.MAXHALO_FACTOR = 2,
.N_MASS_INTERP = 200,
.N_COND_INTERP = 200,
.N_PROB_INTERP = 200,
.MIN_LOGPROB = -16,
.SAMPLE_METHOD = 0,
.AVG_BELOW_SAMPLER = 0,

.USE_ADIABATIC_FLUCTUATIONS = 1,
};
17 changes: 8 additions & 9 deletions src/py21cmfast/src/HaloBox.c
@@ -172,7 +172,6 @@ int set_fixed_grids(double redshift, double norm_esc, double alpha_esc, double M
user_params_stoc->INTEGRATION_METHOD_MINI,
flag_options_stoc->USE_MINI_HALOS, false);

//TODO: disable inverse table generation here with a flag or split up the functions
initialise_dNdM_tables(min_density, max_density, lnMmin, lnMmax, growth_z, lnMcell, false);
}

@@ -439,8 +438,8 @@ int ComputeHaloBox(double redshift, struct UserParams *user_params, struct Cosmo
}
else{
//set below-resolution properties
if(global_params.AVG_BELOW_SAMPLER && M_min < global_params.SAMPLER_MIN_MASS){
set_fixed_grids(redshift, norm_esc, alpha_esc, M_min, global_params.SAMPLER_MIN_MASS, ini_boxes,
if(user_params->AVG_BELOW_SAMPLER && M_min < user_params->SAMPLER_MIN_MASS){
set_fixed_grids(redshift, norm_esc, alpha_esc, M_min, user_params->SAMPLER_MIN_MASS, ini_boxes,
perturbed_field, previous_spin_temp, previous_ionize_box, grids, averages_box);
//This is pretty redundant, but since the fixed grids have density units (X Mpc-3) I have to re-multiply before adding the halos.
// I should instead have a flag to output the summed values in cell. (2*N_pixel > N_halo so generally i don't want to do it in the halo loop)
@@ -451,7 +450,7 @@ int ComputeHaloBox(double redshift, struct UserParams *user_params, struct Cosmo
grids->halo_sfr_mini[idx] *= cell_volume;
grids->whalo_sfr[idx] *= cell_volume;
}
LOG_DEBUG("finished subsampler M[%.2e %.2e]",M_min,global_params.SAMPLER_MIN_MASS);
LOG_DEBUG("finished subsampler M[%.2e %.2e]",M_min,user_params->SAMPLER_MIN_MASS);
}
#pragma omp parallel num_threads(user_params->N_THREADS) private(idx)
{
@@ -585,10 +584,10 @@ int ComputeHaloBox(double redshift, struct UserParams *user_params, struct Cosmo
M_turn_a_avg /= halos->n_halos;
}

get_box_averages(redshift, norm_esc, alpha_esc, global_params.SAMPLER_MIN_MASS, global_params.M_MAX_INTEGRAL, M_turn_a_avg, M_turn_m_avg, averages_global);
get_box_averages(redshift, norm_esc, alpha_esc, M_min, global_params.SAMPLER_MIN_MASS, M_turn_a_avg, M_turn_m_avg, averages_subsampler);
get_box_averages(redshift, norm_esc, alpha_esc, global_params.SAMPLER_MIN_MASS, global_params.M_MAX_INTEGRAL, M_turn_a_nofb, M_turn_m_nofb_avg, averages_nofb);
get_box_averages(redshift, norm_esc, alpha_esc, M_min, global_params.SAMPLER_MIN_MASS, M_turn_a_nofb, M_turn_m_nofb_avg, averages_nofb_sub);
get_box_averages(redshift, norm_esc, alpha_esc, user_params->SAMPLER_MIN_MASS, global_params.M_MAX_INTEGRAL, M_turn_a_avg, M_turn_m_avg, averages_global);
get_box_averages(redshift, norm_esc, alpha_esc, M_min, user_params->SAMPLER_MIN_MASS, M_turn_a_avg, M_turn_m_avg, averages_subsampler);
get_box_averages(redshift, norm_esc, alpha_esc, user_params->SAMPLER_MIN_MASS, global_params.M_MAX_INTEGRAL, M_turn_a_nofb, M_turn_m_nofb_avg, averages_nofb);
get_box_averages(redshift, norm_esc, alpha_esc, M_min, user_params->SAMPLER_MIN_MASS, M_turn_a_nofb, M_turn_m_nofb_avg, averages_nofb_sub);
}
}

@@ -605,7 +604,7 @@ int ComputeHaloBox(double redshift, struct UserParams *user_params, struct Cosmo
LOG_DEBUG("Ratio: (HM %11.3e, NION %11.3e, SFR %11.3e, SFR_MINI %11.3e, WSFR %11.3e)",hm_avg/averages_global[0],nion_avg/averages_global[3],
sfr_avg/averages_global[1],sfr_avg_mini/averages_global[2],
wsfr_avg/averages_global[4]);
if(global_params.AVG_BELOW_SAMPLER && M_min < global_params.SAMPLER_MIN_MASS){
if(user_params->AVG_BELOW_SAMPLER && M_min < user_params->SAMPLER_MIN_MASS){
LOG_DEBUG("SUB-SAMPLER",redshift);
LOG_DEBUG("Exp. averages: (HM %11.3e, NION %11.3e, SFR %11.3e, SFR_MINI %11.3e, WSFR %11.3e)",averages_subsampler[0],averages_subsampler[3],averages_subsampler[1],
averages_subsampler[2],averages_subsampler[4]);
2 changes: 1 addition & 1 deletion src/py21cmfast/src/SpinTemperatureBox.c
@@ -1255,7 +1255,7 @@ void ts_main(float redshift, float prev_redshift, struct UserParams *user_params
zpp_for_evolve_list[R_ct],M_min_R[R_ct],M_max_R[R_ct],sigma_min[R_ct],sigma_max[R_ct]);
}

//As far as I can tell, the only thing used from this is the X_e array
//Initialize heating interpolation arrays
init_heat();
if(redshift >= global_params.Z_HEAT_MAX){
LOG_DEBUG("redshift greater than Z_HEAT_MAX");