-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate code against CCL predictions #61
Comments
Hi @damonge, Following the last discussion, we have been working on the tests for the Cls using theoretical predictions made with CCL. The script used for this (with some modifications) is cl_test_ccl.py. We have some doubts/problems that we were not able to solve by ourselves and we wanted to share them with you in case you have any idea of what is happening. The first thing that we noted is in the line 122 of the previous script: cl_dd_t = np.array([ccl.angular_cl(cosmo, tr_d[p1], tr_d[p2], larr, p_of_k_a=pk2d_mm)
for p1, p2 in pairs]) Here it is used as power spectrum to project the We understand that we should match the Another doubt related to the script is located in the line 114: tr_d = [ccl.NumberCountsTracer(cosmo, False, (z_nz, nz_tot[i]), bias=(z_nz, np.ones_like(z_nz)))
for i in range(nbins)] Here it is used as bias a one_like array. We understand that this happens because we are using directly the galaxies as the objects and they already include the bias. Is it right? Moving to the other quantities (dm,mm); we modified the script to allow computing them. However, the results do not look convincent to us: Here 0,1 are the two redshift bins considered ((0,0.15) and (0.15,inf)). The results for the galaxy-galaxy part are good (as they were in the original script), even with the line 122 changed (it does not affect this plot by much). To compute the other quantities (dm,mm) the script was modified as follows: DataIn the script cl_test_ccl.py data is computed as: cl_dd_d = np.array([hp.anafast(dmap[p1], dmap[p2]) for p1, p2 in pairs]) We used healpy anafast in a similar way as it was used in the scrip cl_test.py as it allows us to compute all the needed quantities at the same time. For a given pair (p1,p2) (pair of bins): d_values =
hp.anafast(np.asarray([dmap[p1],e1map[p1],e2map[p1]]),np.asarray([dmap[p2],e1map[p2],e2map[p2]]), pol=True) This numpy array For the plots we considered E to be the lensing part. That is we used the equivalence:
Theoretical predictionsIn cl_test_ccl.py for the dd case it is computed as: cl_dd_t = np.array([ccl.angular_cl(cosmo, tr_d[p1], tr_d[p2], larr, p_of_k_a=pk2d_mm)
for p1, p2 in pairs]) For our theoretical predictions we used: cl_dd_t = np.array([ccl.angular_cl(cosmo, tr_d[p1], tr_d[p2], larr, p_of_k_a=pk2d_dd)
for p1, p2 in pairs])
cl_dm_t = np.array([ccl.angular_cl(cosmo, tr_d[p1], tr_l[p2], larr, p_of_k_a=pk2d_dm)
for p1, p2 in pairs])
cl_mm_t = np.array([ccl.angular_cl(cosmo, tr_l[p1], tr_l[p2], larr, p_of_k_a=pk2d_mm)
for p1, p2 in pairs]) Note again the change Other computationsDifferent sizeWe tried for a larger simulation to check if a different behavior could tell us what is happening. The quantities that were changed are:
These quantities are similar to the ones used in the LSST simulations (except for the This simulation was run with the New shear method and the result is the following: The output was downsampled evenly across the sky and redshifts to save memory (10% downsampling). The results are quite confusing and I don't think they give much information. However, maybe they are showing something relevant that we are not understanding. Thanks! |
Hi @cramirezpe! Thanks a lot for the detailed analysis. I think that you are right in that you should use |
Hi @fjaviersanchez , You are totally right, that's a silly error from my side. I corrected it while keeping the value of Now the results look more sensible: I'm playing with other values of Thanks for your help! |
Thanks a lot! I'm still puzzled by the large difference between in the panels on the right. Yes, please keep us updated :) |
Hi @damonge , @fjaviersanchez I continued the analysis with the big test adding more features. I use the following parameters:
I also included error bars estimated from the standard deviation in the rebinning (is there a way to obtain errors from pyccl?). But the problem with the More specifically when computing the ratio between the data and the theoretical prediction: There is a bias 2 affecting the lensing part (therefore 2 for the It can be corrected by hardcoding: cl_mm_d = cl_mm_d/4
cl_md_d = cl_md_d/2 and then everything looks reasonable: The problem is to know where is this bias coming from. I've change the input bias from CoLoRe and the ratio stills the same, so I'm quite sure the problem is not coming from there. Summarizing we have two questions:
|
Hi @damonge - we had a factor of 2 issue in CoLoRe's lensing a few month ago, might this be related? |
Hi @cramirezpe! This looks beautiful! Great agreement (modulo the typical factors of 2 that we have to figure out).
|
Hi @cramirezpe - Could you redo the measurement using a larger fraction of galaxies? 10% should be enough for now, to reduce a bit the noise in the plots. Is the limitation speed, or memory? It would also be good to try to use the mode-counting errorbar that @fjaviersanchez suggested, instead of the ones you have now. @damonge - Would you be able to dig into the code to chase the missing factor of 2? Is there a test that we could run to help you identify where it is? |
Hi @andreufont. I've cut the number of galaxies two times.
In this plot the error bars are computed following Javier's suggestion. |
Thinking about it, the equation from @fjaviersanchez is probably only valid in the absence of shot noise, i.e., when we are dominated by cosmic variance. Is this true, Javi? The density of galaxies does not appear in his equation. They clearly underpredict the scatter in some plots, like in d1-d2, don't you think? @cramirezpe , could you redo the plot using again your old errorbars? Also, do you mind correcting for the factors of 2,4 that we discussed? |
Yes, that's right you have to add the shot-noise term, that you already calculated here: https://github.com/damonge/CoLoRe/blob/master/examples/cl_test/cl_test_ccl.py#L131. So the total error would be sqrt((Delta C_ell)^2 + n_l) |
mmm... the cross-terms should not have shot-noise contributions, right? How does the low density of tracers affect the errorbars in a cross-correlation of galaxy densities at different bins? And how does it affect the lensing correlations, since these should not have shot-noise perse either, right? Sorry if these are basic questions, I'm not used to thinking about these tomographic measurements. |
It looks to me that the shot-noise should not affect the signal in cross-bins like d1-d2, or d0-d2, but it should affect its errorbars... We could always get the errorbars by looking at the scatter in 10 boxes or so... |
I'm pretty sure the error bars you're computing are fine though. |
Hi @damonge , @fjaviersanchez Ignoring for a moment the factor 2 (that could be solved with the convention thing). We still have some bad behavior at some plots (specially in plots d1d1 and d2d2 and for l ≈ 200, but the problem is in the same place for other plots). Do you think the results are good enough to continue, or we should try to find a solution for it? In case we are good to go, we may need the help of @fjaviersanchez to run the analysis in bigger boxes. |
I think this is good to go, actually. Would you agree @damonge? The large scale "problems" aren't necessarily significant so, I think now it's on my plate to run larger sims. I'll report here as soon as I have results. @cramirezpe, would you be able to run the same tests once the sims are done? Also, just a suggestion for the ratio plots, for the next round could you subtract 1 to the ratio so they are centered at 0, please? (That way it's easier to spot biases by eye). |
Yes, @fjaviersanchez. I do have everything set up to do the analysis, but the CoLoRe boxes are quite large to be run from scratches by me. Just save them in a place where I can access them. And you're right, it is better to set the ideal value to 0. Thanks for the suggestion! |
Regarding the ell>200 stuff: these should be smoothing effects (due to pixelization, 3D cell size, and interpolation) that are completely ignored in the script I gave you originally. I'm pretty sure we can correct for this accurately enough. I can confirm that the factor 2 is fine, it's just convention. Just multiply cross-correlations by 2 and lensing auto-correlations by 4. When you say "bigger boxes" do you mean the current box but with 100% of the sources? Because I thought for the previous plots you had already been using the LSST setup (but with fewer sources). |
Yes. The first plot of the thread correspond to the test example (which is smaller and with less sources). The following plots correspond to the LSST-like simulation bit with less sources. |
Right, so, to correct for all the different smoothings, @cramirezpe can you send me the param file and the command-line output of one of the simulations? Just wanna come up with a prescription for you to apply the right beams in the right places. |
OK, I just remembered that most of the smoothing should already be taken into account by the P(k) you're using, so we need to understand why the galaxy-galaxy plots seem overpredicted. I might worry that the lognormal transformation prediction coded by Anze could be numerically unstable, so let's check that: Can you take one of the Pk files (e.g. Just to clarify: I think you said you're not using all the ~5-billion objects generated in the sims for these plots, right? If that's the case, what's the limitation? Memory? |
Here it is the plot you asked for: Regarding the limitation on the sim. I was hitting memory limit when trying to use all the galaxies, that's why I used 10% of them (and I also set the nside to 1024). That is strange because for the other issue I was able to run at least one for 100% of galaxies and nside set to 2048. Now I'm trying to run it again and apparently it works (but to show the results I need to wait for the analysis script which will take quite long). Related to this, the script I'm using (a modification of the one in https://github.com/damonge/CoLoRe/blob/master/examples/cl_test/cl_test_ccl.py) is very slow when computing the power spectrum using anafast. What I'm doing is the following: d_values = np.array([hp.anafast(np.asarray([dmap[p1],e1map[p1],e2map[p1]]),np.asarray([dmap[p2],e1map[p2],e2map[p2]]), pol=True) for p1,p2 in pairs]) and then: cl_dd_d = d_values[:,0]
cl_dm_d = d_values[:,3]
cl_mm_d = d_values[:,1] I know that doing this I compute components that I don't need. Is there a faster way of doing it? Or maybe the possibility of calling it using multiple cores? |
Okay there is an error in my analysis that made me think I was using downsampling when not. The last plot I showed is with all the objects from the CoLoRe output, that is the |
Sorry, slowly getting back to this. @cramirezpe can you remind me whether you were using any of the Line 11 in 9e357a7
|
Also, I have one realization of 4096 full-LSST number density (with one population only though) here: |
@damonge Both lines with -D_BIAS_MODEL are commented out. A part from this I will take a better look to the last error bars as we believe that I applied wrongly the error formulae (and that's why it looks so bad). |
Sorry @cramirezpe (I had the automatic mode while changing the group). Now it's readable by desi, please let me know if you still are unable to access the files. Thanks! |
Hi @cramirezpe OK, I have a couple of ideas of what could be going wrong, but I'd like to try them out rather than go back-and-forth here ad infinitum. To speed things up, if you send me the following, I could then try these out:
I think I'd be able to take it from there. You can send these by email. |
I was able to run the ccl analysis for nside 1024 both on my simulations and the one by Javier (always considering the n_grid=4096 version). I added to the plot the My simulations where run with the new shear method and options: Now, the plot of the cls look like this. For my simulation: Apparently now the differences that we could see have been reduced but they still there. I sent you the things you asked for. There are two files corresponding to the simulations run by me and by javier). They are composed by:
For Javier's simulation I don't have much information because the N(z) and P(k) input files where missing but I assume he used the same ones as me (due to the naming that appears in the |
@cramirezpe Thanks a lot for running these diagnostics. I'm also using |
This is great @cramirezpe I think I know what is happening, but to make sure I'd need the following:
We're getting very close now! |
Hi @damonge I currently don't have the B-modes because I didn't save them. However they are given by anafast so I will send them to you once I compute them again. (It shouldn't take long). However I don't know how to compute the theoretical values. With the others I used: cl_dd_t = np.array([ccl.angular_cl(cosmo, tr_d[p1], tr_d[p2], larr, p_of_k_a=pk2d_dd)
for p1, p2 in pairs])
cl_dm_t = np.array([ccl.angular_cl(cosmo, tr_d[p1], tr_l[p2], larr, p_of_k_a=pk2d_dm)
for p1, p2 in pairs])
cl_mm_t = np.array([ccl.angular_cl(cosmo, tr_l[p1], tr_l[p2], larr, p_of_k_a=pk2d_mm)
for p1, p2 in pairs])
cl_md_t = np.array([ccl.angular_cl(cosmo, tr_d[p1], tr_l[p2], larr, p_of_k_a=pk2d_dm)
for p2, p1 in pairs]) Which tracer should I use now? (If any). Regarding bins. I'm worried that the redshift limits can be affecting the results. In the example In the LSST example, the |
No theory for the B-modes (they're only generated by numerical noise - and shape noise, which we don't have in these sims), so don't worry about that. About binning: the redshift limit shouldn't affect the results as long as the theory is calculated with the right corresponding N(z)s. Where exactly do you set redshift limits in the script? |
Well, there are two places. For the data: The usual split of galaxies in two (here). This is like a [0, inf] range (?) My question was whether having these two ranges with different values could be problematic.. |
Oh, I see! Right, yes, the first bit (the data part) is agnostic to the outer edges, just the middle one. For building the nz, yes, just use [0, 3]. That should be enough to capture both redshift bins. Any parts of the nz arrays that are zero (or very close to zero) are automatically taken care of by CCL. |
Hi @cramirezpe - what is the status of the tests above? Did you send the b-modes power to David? |
Hi @damonge I was having problems with the binning and I wanted to check that everything was running ok. I've computed the b-modes and they look okay for the crosscorrelation b-m b-d but not for b-b at low ls. (Plots below). The bmodes are obtained from the anafast function (It returns TT, EE, BB, TE, EB, TB which I identified as dd, mm, bb, dm, mb, db). I also set the split point at The ratio between the power spectrum from data and the theoretical power spectrum: |
@cramirezpe thanks a lot for this. I've got this mostly working, but there are still a couple of things it'd be good to pin down. Essentially, what is happening is that we need to account for two extra sources of smoothing: One associated problem is the non-flat B-modes you're seeing. These are due to numerical errors, and can come from two places:
So, to make sure we understand what's happening, here are a few questions:
|
One extra question (probably for both @cramirezpe and @fjaviersanchez ): could we run one version of these simulations with a smoothing scale of 5 Mpc/h? The problem with the current sim is that the smoothing (2 Mpc/h) is practically the same as the cell size, and accounting for the smoothing of the cell size is a bit tricky. If we compared with a 5 Mpc/h smoothing we'd have a much more solid theory prediction. |
Sure, I'll run with a larger smoothing. Do you also want one with the full LSST density with the "slow" shear? |
@fjaviersanchez if possible, having one slow LSST sim would be very useful (unless you can confirm the sim you gave @cramirezpe used the slow method already) |
@cramirezpe |
Hi @damonge, @cramirezpe! The realization I sent around is for the "fast_shear" method. I just generated another one (with a larger r_s=5) here: |
Awesome, thanks so much @fjaviersanchez ! |
Hi @damonge , I tried to use the script to produce the power spectrum results. I needed to change some lines to make it work but it does not give sensible results. The cls computed using it look like this: Apparently the modifications I made to the Could you help me with that? I've created a new branch to push the changes into master, then it will be easier for you to see what has been changed. Also @fjaviersanchez , I cannot access the r_smooth=5 simulation due to permissions (they are not set to desi group 😅) |
Hi everyone! Good news! I managed to finish one "slow" simulation with the full LSST density. It took 36 hours and 55 minutes 🤣 running on 16 Haswell nodes. The data lives here: |
More info about the charges:
And from the log:
|
Hi @fjaviersanchez , @damonge - is it possible to change the settings of this folder, such that everyone can access the simulation? This way, both @cramirezpe and @damonge would be able to run their analysis script and compare results. |
I just changed the permissions so it's public for everyone (I did the same for the previous "fast" version as well). Please let me know if this works. If not, I can try putting the files at some other location. |
Nice to have the huge simulation available, I'll see how it compares with the faster shear simulation. @damonge , you were asking for the d,e1,e2 maps. I've shared the folder where I have all the analysis. In concrete here: A part from that, I continued the analysis of what I had until now. I compared the The first thing to note here is that the Comparing both simulations at the same time we obtain (Sim 5 is the sim with And the ratio sim/prediction (I restricted the y axis to make easy to see low ell regions): In general the new where we see again that the new I also had a comparison between a faster shear and slow shear (for less galaxies than the sim by Javier). But I'm observing non-zero B modes at high ell that I don't see in other sims. This can be an error from my side so I decided to share the plots once I'm confident about them. |
@cramirezpe there's no reason it's 4095 except that I don't know how to write... 🤦 I can rerun if needed. Apologies! |
Another option is to make the fast version with n_grid=4095 too :p |
@fjaviersanchez - I wasn't aware you could do a FFT with an odd number of cells. Is it possible that the code is secretely rounding it up to 4096 or 4094? |
I think that this is not the case anymore. You may lose a bit of performance (I think it might not even be the case with the newer processors) but it will still compute it with an odd number of cells (as far as I know). |
Anyway, I'm rerunning with 4096 (I got some extra hours at NERSC so let's just use them). |
We should validate the code in a realistic user scenario
The text was updated successfully, but these errors were encountered: