Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inverse table improvements #388

Merged
merged 15 commits into from
May 17, 2024
Merged

Inverse table improvements #388

merged 15 commits into from
May 17, 2024

Conversation

daviesje
Copy link
Contributor

@daviesje daviesje commented May 15, 2024

This PR mainly contains improvements to the HMF tables used in the halo sampler.

For the inverse CDF tables used in the Sampling:

  • I replaced the interpolation on the CDF with a rootfind to get the inverse, this is more accurate so we require fewer bins.
  • The output is no longer log-mass, but the ratio of mass to condtition mass, this is slightly faster and makes the table interpolation a little more accurate.
  • Both tables for grid and halo conditions are now in log-probability, this should allow longer timesteps in the halo sampler (see Issues with halo sampler snapshot cadence #376).
  • Below the minimum log-probability I have added an extrapolation function, which assumes exponential decay in p(M)

The Sheth-Tormen conditional mass function has been fixed and optimized (fixing #370), The Delos+24 conditional mass function is available for the default case but does not perform well in the sampler due to the small timestep.

Alternate sampling methods (Sheth+1999 Partition and Parkinson+08 binary split) have been fixed (#374), they are less conforming to the given CMF but as far as I can tell they are working.

Old global parameters have been removed and many sampler related global parameters have been moved to the input structures.

Copy link
Member

@steven-murray steven-murray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, @daviesje. Just a few small comments and I think we'll be gtm.

Comment on lines +518 to +527
SAMPLE_METHOD: int, optional
The sampling method to use in the halo sampler when calculating progenitor populations:
0: Mass-limited CMF sampling, where samples are drawn until the expected mass is reached
1: Number-limited CMF sampling, where we select a number of halos from the Poisson distribution
and then sample the CMF that many times
2: Sheth et al 1999 Partition sampling, where the EPS collapsed fraction is sampled (gaussian tail)
and then the condition is updated using the conservation of mass.
3: Parkinsson et al 2008 Binary split model as in DarkForest (Qiu et al 2021) where the EPS merger rate
is sampled on small internal timesteps such that only binary splits can occur.
NOTE: Sampling from the density grid will ALWAYS use number-limited sampling (method 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the python-side parameter either a string or an enum? This is more explicit. It's harder to read code that just has SAMPLE_METHOD=2 rather than SAMPLE_METHOD=samplers.partition or SAMPLE_METHOD='partition'

Comment on lines 528 to 530
AVG_BELOW_SAMPLER: bool, optional
When switched on, an integral is performed in each cell between the minimum source mass and SAMPLER_MIN_MASS,
effectively placing the average halo population in each HaloBox cell below the sampler resolution
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if it's not true? Those masses are just ignored totally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would imagine that most of the time SAMPLER_MIN_MASS would be set low enough so that the averaging is unnecessary, and that this would be for saving memory on larger boxes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool -- I think a sentence describing that would be helpful

Comment on lines 531 to 534
HALOMASS_CORRECTION: float, optional
This provides a corrective factor to the mass-limited (SAMPLE_METHOD==0) sampling, which multiplies the
expected mass from a condition by this number.
This also is used in the partition (SAMPLE_METHOD==2) sampler, multiplying sigma(M) of each sample drawn.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the usefulness of this option? Why is its default 0.9?

Copy link
Contributor Author

@daviesje daviesje May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mass-limited sampler has a slight bias toward having too many halos, this bias is independent of delta or halo mass, so this factor is a correction to that. The value of 0.9, multiplying the expected collapsed mass from each descendant works well for the default timestep factor of 1.02. Since the partition method also uses a single correction factor (Mcquinn+ 2007 lowers sigma_8 slightly, I've implemented a correction in nu) we re-use the parameter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Then I think this should be documented here (i.e. probably don't touch this unless you have good reason, and these are the values it should take in these circumstances)

Comment on lines 163 to 164
//Pending a serious deep-dive into this algorithm, I will force DexM to use the fitted parameters to the
// Sheth-Tormen mass function (as of right now, We do not even reproduce EPS results)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this comment still true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it would take some time / code archaeology to fully understand why the excursion set halo finder gets the results it does, and find suitable corrections for any given CMF.

Comment on lines 363 to 364
//fudge factor for assuming that internal lagrangian volumes are independent
exp_M *= user_params_stoc->HALOMASS_CORRECTION;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment and code are unclear to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the implementation of the above correction in the mass-limited sampling. One of the approximations in our sample is that we sample the final CMF in an uncorrelated way, assuming each sampled halo is independent of the others. This means that the CMF doesn't change after each sample, like it does in the partition or binary split methods. This results in a bias where the last halo sampled is on average larger.

I'll change the comment to something more generic and expand the explanation in inputs.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

src/py21cmfast/src/Stochasticity.c Outdated Show resolved Hide resolved
src/py21cmfast/src/UsefulFunctions.c Show resolved Hide resolved
Comment on lines +620 to +651
def _get_enum_property(self, prop, enum_list, propname=""):
"""
Retrieve a value for a property with defined enum list (see UserParams._power_models etc.).

Arguments
---------
prop: the hidden attribute to find in the enum list
enum_list: the list of parameter value strings
propname: the name of the property (for error messages)

Returns
-------
The index of prop within the parameter list, corresponding to it's integer value in
the C backend
"""
# if it's a string we grab the index of the list
if isinstance(prop, str):
val = enum_list.index(prop.upper())
# otherwise it's a number so we leave it alone
else:
val = prop

try:
val = int(val)
except (ValueError, TypeError) as e:
raise ValueError(f"{val} is an invalid value for {propname}") from e

if not 0 <= val < len(enum_list):
raise ValueError(f"HMF must be an int between 0 and {len(enum_list) - 1}")

return val

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the merge, do you think this is a good way to implement the enums? I found trying to work with the python enum.IntEnum a bit more cumbersome

@daviesje daviesje merged commit 62dff49 into v4-prep May 17, 2024
3 of 4 checks passed
@daviesje daviesje deleted the inverse_table_improvements branch May 17, 2024 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Alternative Conditional Mass Functions Finalize Alternate Halo Sampling Methods
2 participants