Inverse table improvements #388

daviesje · 2024-05-15T08:39:12Z

This PR mainly contains improvements to the HMF tables used in the halo sampler.

For the inverse CDF tables used in the sampling:

I replaced the interpolation on the CDF with a root-find to get the inverse. This is more accurate, so we require fewer bins.
The output is no longer log-mass but the ratio of mass to condition mass. This is slightly faster and makes the table interpolation a little more accurate.
Both tables, for grid and halo conditions, are now in log-probability. This should allow longer timesteps in the halo sampler (see Issues with halo sampler snapshot cadence #376).
Below the minimum log-probability I have added an extrapolation function, which assumes exponential decay in p(M).
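As a rough illustration of the table construction described above, here is a minimal Python sketch that builds an inverse table by root-finding rather than by interpolating the CDF. It is not the C implementation from this PR. The power-law density, the constants `X_MIN` and `ALPHA`, the choice to index the table by the log of the probability of exceeding a mass ratio, and all function names are assumptions made for the example. The extrapolation below the minimum log-probability is not attempted; the lookup simply clamps.

```python
import math

X_MIN = 1e-3   # hypothetical minimum mass ratio M_min / M_cond
ALPHA = 2.0    # hypothetical power-law slope of the toy density


def survival(x):
    """P(>x) for the toy density p(x) proportional to x**(-ALPHA) on [X_MIN, 1]."""
    a = 1.0 - ALPHA
    return (1.0 - x**a) / (1.0 - X_MIN**a)


def solve_ratio(ln_p, tol=1e-12):
    """Bisect in ln(x) for the mass ratio x at which survival(x) = exp(ln_p)."""
    target = math.exp(ln_p)
    lo, hi = math.log(X_MIN), 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if survival(math.exp(mid)) > target:
            lo = mid  # survival still too high, so the root lies at larger x
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


def build_table(ln_p_min=-10.0, n_bins=64):
    """Tabulate the mass ratio on a uniform grid in ln(p) from ln_p_min to 0."""
    step = -ln_p_min / n_bins
    ln_p = [ln_p_min + i * step for i in range(n_bins + 1)]
    return ln_p, [solve_ratio(u) for u in ln_p]


def lookup(ln_p_grid, ratios, ln_p):
    """Linear interpolation in ln(p); clamps outside the tabulated range."""
    u = min(max(ln_p, ln_p_grid[0]), ln_p_grid[-1])
    step = ln_p_grid[1] - ln_p_grid[0]
    i = min(int((u - ln_p_grid[0]) / step), len(ratios) - 2)
    frac = (u - ln_p_grid[i]) / step
    return ratios[i] + frac * (ratios[i + 1] - ratios[i])


LN_P_GRID, RATIOS = build_table()
```

Because each table entry is solved directly, its accuracy is set by the root-finder tolerance rather than by the resolution of a forward CDF table, which is one way to read the "fewer bins" remark.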

The Sheth-Tormen conditional mass function has been fixed and optimized (fixing #370). The Delos+24 conditional mass function is available for the default case but does not perform well in the sampler due to the small timestep.

Alternate sampling methods (Sheth+1999 Partition and Parkinson+08 binary split) have been fixed (#374). They conform less closely to the given CMF, but as far as I can tell they are working.

Old global parameters have been removed, and many sampler-related global parameters have been moved to the input structures.


steven-murray

This looks great, @daviesje. Just a few small comments and I think we'll be gtm.

steven-murray · 2024-05-15T10:26:36Z

src/py21cmfast/inputs.py

+ SAMPLE_METHOD: int, optional
+ The sampling method to use in the halo sampler when calculating progenitor populations:
+ 0: Mass-limited CMF sampling, where samples are drawn until the expected mass is reached
+ 1: Number-limited CMF sampling, where we select a number of halos from the Poisson distribution
+ and then sample the CMF that many times
+ 2: Sheth et al 1999 Partition sampling, where the EPS collapsed fraction is sampled (Gaussian tail)
+ and then the condition is updated using the conservation of mass.
+ 3: Parkinson et al 2008 Binary split model as in DarkForest (Qiu et al 2021) where the EPS merger rate
+ is sampled on small internal timesteps such that only binary splits can occur.
+ NOTE: Sampling from the density grid will ALWAYS use number-limited sampling (method 1)


Can we make the python-side parameter either a string or an enum? This is more explicit. It's harder to read code that just has SAMPLE_METHOD=2 rather than SAMPLE_METHOD=samplers.partition or SAMPLE_METHOD='partition'

steven-murray · 2024-05-15T10:27:23Z

src/py21cmfast/inputs.py

+ AVG_BELOW_SAMPLER: bool, optional
+ When switched on, an integral is performed in each cell between the minimum source mass and SAMPLER_MIN_MASS,
+ effectively placing the average halo population in each HaloBox cell below the sampler resolution


What happens if it's not true? Those masses are just ignored totally?

Yes, I would imagine that most of the time SAMPLER_MIN_MASS would be set low enough so that the averaging is unnecessary, and that this would be for saving memory on larger boxes.

Cool -- I think a sentence describing that would be helpful
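To make the averaging concrete, here is a small Python sketch of the kind of integral the AVG_BELOW_SAMPLER docstring describes: the mass in halos between the minimum source mass and SAMPLER_MIN_MASS. The power-law mass function, the numerical values, and the function names are all hypothetical. The real integral in the C backend is presumably over the conditional mass function of each cell, which this toy does not reproduce.

```python
import math


def toy_dndm(mass, amplitude=1.0, slope=-1.9):
    """Hypothetical power-law mass function dn/dM (arbitrary units)."""
    return amplitude * mass**slope


def mass_density_between(m_lo, m_hi, n_steps=200):
    """Trapezoidal integral of M * dn/dM over [m_lo, m_hi], evaluated in ln(M)."""
    ln_lo, ln_hi = math.log(m_lo), math.log(m_hi)
    h = (ln_hi - ln_lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        m = math.exp(ln_lo + i * h)
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * m * m * toy_dndm(m)  # M * dn/dM * dM, with dM = M dlnM
    return total * h


M_MIN_SOURCE = 1e6      # hypothetical minimum source mass
SAMPLER_MIN_MASS = 1e8  # hypothetical sampler resolution
unresolved = mass_density_between(M_MIN_SOURCE, SAMPLER_MIN_MASS)
```

With the flag off, a contribution like `unresolved` would simply be missing from each cell, which matches the answer given in the thread.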

steven-murray · 2024-05-15T10:28:10Z

src/py21cmfast/inputs.py

+ HALOMASS_CORRECTION: float, optional
+ This provides a corrective factor to the mass-limited (SAMPLE_METHOD==0) sampling, which multiplies the
+ expected mass from a condition by this number.
+ This also is used in the partition (SAMPLE_METHOD==2) sampler, multiplying sigma(M) of each sample drawn.


What's the usefulness of this option? Why is its default 0.9?

The mass-limited sampler has a slight bias toward having too many halos. This bias is independent of delta or halo mass, so this factor is a correction to that. The value of 0.9, multiplying the expected collapsed mass from each descendant, works well for the default timestep factor of 1.02. Since the partition method also uses a single correction factor (McQuinn+ 2007 lowers sigma_8 slightly; I've implemented a correction in nu), we re-use the parameter.

OK. Then I think this should be documented here (i.e. probably don't touch this unless you have good reason, and these are the values it should take in these circumstances)

steven-murray · 2024-05-15T10:29:08Z

src/py21cmfast/src/FindHaloes.c

 //Pending a serious deep-dive into this algorithm, I will force DexM to use the fitted parameters to the
 // Sheth-Tormen mass function (as of right now, We do not even reproduce EPS results)


is this comment still true?

Yes, I think it would take some time / code archaeology to fully understand why the excursion set halo finder gets the results it does, and find suitable corrections for any given CMF.

steven-murray · 2024-05-15T10:33:50Z

src/py21cmfast/src/Stochasticity.c

+ //fudge factor for assuming that internal lagrangian volumes are independent
+ exp_M *= user_params_stoc->HALOMASS_CORRECTION;


this comment and code are unclear to me

This is the implementation of the above correction in the mass-limited sampling. One of the approximations in our sample is that we sample the final CMF in an uncorrelated way, assuming each sampled halo is independent of the others. This means that the CMF doesn't change after each sample, like it does in the partition or binary split methods. This results in a bias where the last halo sampled is on average larger.

I'll change the comment to something more generic and expand the explanation in inputs.py
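The bias discussed in this thread can be reproduced with a toy model. The Python sketch below draws independent halos until a target mass is reached and shows that the mean total overshoots the expected mass, and that scaling the target by a correction factor lowers it. The power-law mass distribution, the mass range, and the function names are assumptions for illustration only. This is not the logic in Stochasticity.c, and the factor that removes the bias in this toy will differ from the 0.9 used in the PR.

```python
import random


def sample_mass(rng, m_min=1.0, m_max=100.0):
    """Draw from a toy p(M) proportional to M**-2 on [m_min, m_max] via its inverse CDF."""
    u = rng.random()
    return 1.0 / (1.0 / m_min - u * (1.0 / m_min - 1.0 / m_max))


def mass_limited_draw(rng, expected_mass, correction=1.0):
    """Draw independent halos until the total reaches correction * expected_mass."""
    target = correction * expected_mass
    total = 0.0
    while total < target:
        total += sample_mass(rng)
    return total


def mean_total(correction, expected_mass=500.0, n_trials=2000, seed=1):
    """Average sampled mass over many independent conditions."""
    rng = random.Random(seed)
    return sum(
        mass_limited_draw(rng, expected_mass, correction) for _ in range(n_trials)
    ) / n_trials


uncorrected = mean_total(1.0)  # always exceeds expected_mass: the last draw overshoots
corrected = mean_total(0.9)    # a lower target pulls the mean total down
```

Every realisation stops only after crossing the target, so the uncorrected mean can never fall below the expected mass. The draw that crosses the target is also preferentially a large one, consistent with the remark that the last halo sampled is larger on average.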


daviesje · 2024-05-16T15:08:48Z

src/py21cmfast/inputs.py

+     def _get_enum_property(self, prop, enum_list, propname=""):
+         """
+         Retrieve a value for a property with defined enum list (see UserParams._power_models etc.).
+
+         Arguments
+         ---------
+         prop: the hidden attribute to find in the enum list
+         enum_list: the list of parameter value strings
+         propname: the name of the property (for error messages)
+
+         Returns
+         -------
+         The index of prop within the parameter list, corresponding to its integer value in
+         the C backend
+         """
+         # if it's a string we grab the index of the list
+         if isinstance(prop, str):
+             val = enum_list.index(prop.upper())
+         # otherwise it's a number so we leave it alone
+         else:
+             val = prop
+
+         try:
+             val = int(val)
+         except (ValueError, TypeError) as e:
+             raise ValueError(f"{val} is an invalid value for {propname}") from e
+
+         # report the property actually being checked, not a hard-coded "HMF"
+         if not 0 <= val < len(enum_list):
+             raise ValueError(f"{propname} must be an int between 0 and {len(enum_list) - 1}")
+
+         return val
+


Before the merge, do you think this is a good way to implement the enums? I found trying to work with Python's enum.IntEnum a bit more cumbersome.
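For comparison, here is a minimal sketch of what the `enum.IntEnum` route mentioned above could look like. The class name, member names, and helper function are hypothetical and are not part of py21cmfast. Only the integer values follow the SAMPLE_METHOD docstring.

```python
from enum import IntEnum


class SampleMethod(IntEnum):
    """Hypothetical enum; member names are illustrative, values follow the docstring."""

    MASS_LIMITED = 0
    NUMBER_LIMITED = 1
    PARTITION = 2
    BINARY_SPLIT = 3


def coerce_sample_method(value):
    """Accept a member, a member name (any case), or an integer index."""
    if isinstance(value, str):
        try:
            return SampleMethod[value.upper()]
        except KeyError:
            valid = ", ".join(m.name for m in SampleMethod)
            raise ValueError(
                f"{value!r} is not a valid SAMPLE_METHOD; choose one of {valid}"
            ) from None
    return SampleMethod(value)  # raises ValueError for out-of-range integers
```

Since `IntEnum` members are integers, `int(SampleMethod.PARTITION)` can be passed to a C struct unchanged. The trade-off against the list-based helper is one extra class per option in exchange for named members and built-in range checking.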

daviesje added 10 commits April 16, 2024 09:15

make test plots for sampler and massfunc tests, WIP interpolation tab…

f09233e

…le improvements

cleanup dndm table init

47f3cba

WIP put linear tables back for testing

7070d59

WIP testing lower limit buffer and sampler edge cases

e7618c9

split dNdM table functions, add rootfind WIP

e0e5344

binary split works, improve table extrapolation

3074d59

switch to mass ratio tables, add extrapolation

b7f071e

remove mass tolerance and non-rf inverse table construction, move glo…

6affee5

…bals to user

move fudge factor to userparams, fix partition method

283048a

add alternate CMFs to non-sampler method

232852c

daviesje requested a review from steven-murray May 15, 2024 08:44

This was linked to issues May 15, 2024

Alternative Conditional Mass Functions #370

Open

Finalize Alternate Halo Sampling Methods #374

Closed

steven-murray reviewed May 15, 2024


daviesje and others added 5 commits May 15, 2024 17:33

improve descriptions and UserParams enum flags

ea1811c

expand halomass correction docstring

a2cccd4

Merge branch 'v4-prep' into inverse_table_improvements

c0f2a64

[pre-commit.ci] auto fixes from pre-commit.com hooks

bebe59e

for more information, see https://pre-commit.ci

pre-commit fixes

d518903

daviesje commented May 16, 2024


steven-murray approved these changes May 17, 2024


daviesje merged commit 62dff49 into v4-prep May 17, 2024
3 of 4 checks passed

daviesje deleted the inverse_table_improvements branch May 17, 2024 15:40
