Inferring the Mass Map of the Observable Universe from 10 Billion Galaxies
Mapping the Universe is an activity of fundamental interest, linking as it does some of the biggest questions in modern astrophysics and cosmology: What is the Universe made of, and why is it accelerating? How do the initial seeds of structure form and grow to produce our own Galaxy? Wide-field astronomical surveys, such as that planned with the Large Synoptic Survey Telescope (LSST), will provide measurements of billions of galaxies over half of the sky; we want to analyze these datasets with sophisticated statistical methods that allow us to create the most accurate map of the distribution of mass in the Universe to date. The sky locations, colors and brightnesses of the galaxies allow us to infer their approximate 3D positions and stellar masses; the distorted apparent shapes of galaxies contain information about the gravitational effects of mass in other galaxies along the line of sight.
We are currently taking the first step in using all of this information in a giant hierarchical inference of our Universe's cosmological and galaxy population model hyper-parameters, after explicit marginalization of the parameters describing millions (and perhaps billions) of individual galaxies. We need to develop the statistical machinery to perform this inference and to implement it at the appropriate computational scale. Training and testing will require large cosmological simulations to generate plausible mock galaxy catalogs; we plan to make all of our data public to enable further investigations of this type.
- Halo Inference: Following Spencer Everett's thesis work, the Pangloss code now computes the weak lensing log likelihood for a given set of halos. The next step is to plug that likelihood into a Bayesian inference and sample the posterior PDF for both the halo masses and the model hyper-parameters. We'll start by looking at just the halo masses, at fixed model assumptions: the mass mapping problem. Some investigation of optimization and probabilistic catalogs will be needed.
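As a concrete starting point, here is a minimal sketch of what sampling the halo masses at fixed hyper-parameters could look like, assuming a hypothetical `pangloss_log_likelihood(log_masses)` wrapper around the existing Pangloss weak lensing likelihood; the Gaussian prior on each log mass is purely illustrative:

```python
import numpy as np
import emcee  # affine-invariant ensemble MCMC sampler

def log_prior(log_masses, mu=12.0, sigma=1.0):
    # Illustrative independent Gaussian prior on log10(M/Msun) for each halo.
    return -0.5 * np.sum(((log_masses - mu) / sigma) ** 2)

def log_posterior(log_masses, log_likelihood_fn):
    return log_prior(log_masses) + log_likelihood_fn(log_masses)

def sample_halo_masses(log_likelihood_fn, n_halos, n_walkers=64, n_steps=2000):
    # Start the walkers in a small ball around a plausible mass scale.
    p0 = 12.0 + 0.1 * np.random.randn(n_walkers, n_halos)
    sampler = emcee.EnsembleSampler(n_walkers, n_halos, log_posterior,
                                    args=(log_likelihood_fn,))
    sampler.run_mcmc(p0, n_steps)
    # Discard the first half as burn-in and flatten across walkers.
    return sampler.get_chain(discard=n_steps // 2, flat=True)
```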
- Working with Databases: Scaling the calculation up even to 40 sq arcmin fields will require us to reduce our memory footprint and make use of multithreading / multiple cores. This means switching to storing data in databases. An opportunity here is to design the tables and their relational mappings to make the calculation efficient.
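As a rough illustration of the kind of relational layout that could make the likelihood evaluation efficient (table and column names are hypothetical, and SQLite stands in for whatever database we eventually choose), halos and background galaxies could live in separate tables, with halo-galaxy sightline pairings precomputed so that each likelihood call only touches the relevant rows:

```python
import sqlite3

conn = sqlite3.connect("pangloss_field.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS halos (
    halo_id   INTEGER PRIMARY KEY,
    ra        REAL,   -- degrees
    dec       REAL,   -- degrees
    redshift  REAL,
    log_mass  REAL    -- current sample of log10(M/Msun)
);

CREATE TABLE IF NOT EXISTS galaxies (
    galaxy_id INTEGER PRIMARY KEY,
    ra        REAL,
    dec       REAL,
    redshift  REAL,
    e1        REAL,   -- measured ellipticity components
    e2        REAL
);

-- Precomputed halo-galaxy pairs within an impact parameter cut.
CREATE TABLE IF NOT EXISTS sightlines (
    halo_id       INTEGER REFERENCES halos(halo_id),
    galaxy_id     INTEGER REFERENCES galaxies(galaxy_id),
    impact_arcmin REAL
);
""")
conn.commit()
```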
Spencer investigated the halo model's accuracy and computational feasibility in his senior undergraduate thesis project. You can read his thesis here.
- Weak Lensing Model Checking: Spencer Everett led the software engineering work needed for us to start analyzing weak lensing data. You can read his final report here, and browse the following demos of his code:
- Mock Weak Lensing Catalogs: We have Hilbert et al's ray-traced shear and convergence maps for the Millennium Simulation; Spencer sampled these at random galaxy positions to make mock catalogs of lensed background galaxy ellipticity measurements (see this demo). These catalogs will be the inputs to our mass model inferences.
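The sampling step itself is conceptually simple; the sketch below (array names, map format, and the Gaussian shape noise model are all assumptions, not a description of Spencer's actual code) draws random pixel positions from the ray-traced maps and attaches noisy "observed" ellipticities:

```python
import numpy as np

def make_mock_catalog(kappa_map, gamma1_map, gamma2_map, n_galaxies,
                      shape_noise=0.25, seed=42):
    """Sample ray-traced maps at random positions to build a mock lensed catalog."""
    rng = np.random.default_rng(seed)
    ny, nx = kappa_map.shape
    ix = rng.integers(0, nx, size=n_galaxies)
    iy = rng.integers(0, ny, size=n_galaxies)

    kappa = kappa_map[iy, ix]
    g1 = gamma1_map[iy, ix] / (1.0 - kappa)   # reduced shear
    g2 = gamma2_map[iy, ix] / (1.0 - kappa)

    # Add Gaussian intrinsic shape noise to each component (illustrative).
    e1 = g1 + shape_noise * rng.standard_normal(n_galaxies)
    e2 = g2 + shape_noise * rng.standard_normal(n_galaxies)

    return dict(x=ix, y=iy, kappa=kappa, e1=e1, e2=e2)
```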
- Pangloss Shear Prediction: Our simple mass model has been used in the past to predict convergence, but now we need it to predict the reduced shear. Spencer implemented this upgrade, still in the weak lensing regime: see how well he was able to predict the observed shapes so far in this demo.
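For reference, the quantity being predicted is the standard reduced shear, g = gamma / (1 - kappa); in the weak limit (kappa << 1) the reduced shear is approximately the shear itself. A short helper makes the relation explicit:

```python
import numpy as np

def reduced_shear(gamma1, gamma2, kappa):
    # g = gamma / (1 - kappa); for kappa << 1, g ~ gamma.
    denom = 1.0 - np.asarray(kappa)
    return np.asarray(gamma1) / denom, np.asarray(gamma2) / denom
```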
- Giant Inference Options: We have two inference methodologies in mind: importance sampling (as in the MBI weak lensing analysis, Schneider et al 2014) and Approximate Bayesian Computation (which may lend itself well to this problem). Some thinking and derivation is needed here: Tom, Matt and Phil did some work and made some initial notes.
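To make the importance sampling option concrete, here is a minimal sketch of the interim-sampling trick (in the spirit of Schneider et al 2014, but with hypothetical function names and a generic population density): the hyper-parameter likelihood is approximated by reweighting per-halo interim posterior samples by the ratio of the population density to the interim prior:

```python
import numpy as np

def hyper_log_likelihood(alpha, interim_samples, interim_log_prior, population_log_pdf):
    """Approximate log L(alpha) ~ sum_i log[ (1/K) sum_k p(theta_ik | alpha) / p_int(theta_ik) ].

    interim_samples:    list of (K,) arrays, one per halo, drawn under the interim prior
    interim_log_prior:  function theta -> log p_int(theta)
    population_log_pdf: function (theta, alpha) -> log p(theta | alpha)
    """
    logL = 0.0
    for theta in interim_samples:
        log_weights = population_log_pdf(theta, alpha) - interim_log_prior(theta)
        # Stable log of the mean of the importance weights.
        logL += np.logaddexp.reduce(log_weights) - np.log(len(theta))
    return logL
```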
This project was proposed to, and subsequently funded by, the Stanford Data Science Initiative as a small student project. You can read the proposal text here. The local proposal team consists of Risa Wechsler (PI), Phil Marshall, Matt Becker and Sam Skillman, but we are continuing to work with our other Pangloss collaborators (undergraduate research student Spencer Everett and the code's original developer Tom Collett), and in future hope to collaborate with members of the Stanford statistics and computer science departments, as well as computational scientists in the LSST Data Management team at SLAC. Check out our write-up in the 2015 SDSI Research Brochure!
- Distributed Computation: We are undoubtedly going to need large numbers of cores just to evaluate the likelihood for the hyper-parameters (since this involves a huge integral over the properties of all the individual halos in the field). There are some brief notes on this here, but we need advice from CS colleagues about how to perform such a high-throughput, massively parallel calculation. The SLAC CS and LSST DM database groups could help a lot here!
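Before any cluster-scale solution, the embarrassingly parallel structure of the problem can already be exploited on a single node; this sketch (the `field_log_likelihood` stub is hypothetical) simply spreads independent per-field likelihood terms over local cores with Python's multiprocessing:

```python
from functools import partial
from multiprocessing import Pool

def field_log_likelihood(field_id, hyperparams):
    # Placeholder: in practice this would load the field's halos and galaxies
    # and call the Pangloss likelihood machinery for the given hyper-parameters.
    return 0.0

def total_log_likelihood(hyperparams, field_ids, n_workers=8):
    # The total log likelihood is a sum of independent per-field terms,
    # so the calculation is embarrassingly parallel across fields.
    work = partial(field_log_likelihood, hyperparams=hyperparams)
    with Pool(n_workers) as pool:
        per_field = pool.map(work, field_ids)
    return sum(per_field)
```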
- Including Groups and Clusters: We could start from the output of an optical group and cluster finder, such as redMaPPer (Rozo, Rykoff et al), to improve the assignment of dark mass to the most massive collapsed objects. Repeating the simple tests (assuming known halo mass, known stellar mass, known richness, etc.) will be important here.
- Hierarchical Modeling: If the halo properties can be sampled efficiently, we will be able to move on to implementing the hierarchical model. Components of this could (and should) include the Galaxy M*-Mhalo Relation, the Cluster Richness-Mhalo Relation, the Halo Mass Function, the Concentration-Mass Relation, and so on. A good start would be to implement a state-of-the-art halo model.
- Strong Lens Line of Sight Contamination: Assign hyperpriors based on published halo models (fitted to summary statistics in, e.g., CFHTLS) and infer the halo masses in each of the H0LiCOW lens fields. This approach would need testing in some mock lens fields.
- Mass Mapping in DES: Apply the methodology to DES weak lensing and galaxy data. A precursor to this could be to make a mass map in a smaller, well-studied field, like CFHTLS.