We need to get data into scarlet, and the problem is that it's often lying around in many different files. That's hard to manage and a massive hit on distributed file systems. So, we should package up everything we need to run scarlet into one or a small number of files and write a data loader to provide it to scarlet. I'm thinking of the following contents:
- Observation, i.e. fluxes
- Weight maps (inverse variance of flux)
- PSF model
- WCS, band, and observation/instrument definitions
This brings up an interesting question: What's the size of these? We could say "full image", but that's a weird concept in the age of survey-sized coadds and ill-defined in multi-observation cases. So, it's probably better if we think of the fundamental unit as the *scene*, i.e. one or multiple sources (components) that are supposed to be modeled together. That does mean we need code to run beforehand to chop up larger images into scene-sized bites. LSST will do something like that and have PSF models for these small "cells", but other data sources may not. At this point we might as well utilize the detection code that must have run to define these cells, so I'm going to add:
- Detection coordinates and (possibly) source type
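To make this concrete, here's a sketch of a container holding one scene's worth of these contents. Field names and shapes are illustrative, not a proposed scarlet2 API:

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class SceneData:
    # one scene's worth of data, as enumerated above (names illustrative)
    images: np.ndarray        # (bands, ny, nx) observed fluxes
    weights: np.ndarray       # inverse-variance maps, same shape as images
    psf: np.ndarray           # PSF model, e.g. (bands, py, px)
    wcs: object               # WCS mapping, e.g. an astropy.wcs.WCS
    channels: list            # band and observation/instrument labels
    detections: np.ndarray    # (n_src, 2) detection coordinates
    source_types: Optional[np.ndarray] = None  # optional per-source type
```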
What I have in mind is a UI like this:
```python
loader = scarlet2.make_dataloader(path)
for data in loader:
    obs = Observation.from_loader(data)
    model_frame = Frame.from_observations(obs)
    detections = data.detections  # detections packaged with the scene data
    with Scene(model_frame) as scene:
        for detection in detections:
            center = detection.center
            # initialize source components and create sources
            source = Source(...)
    # define parameters and fit
    scene.fit(obs)
```
In the multi-observation case, it would look like this:
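A sketch of what that could look like; the `key` argument for selecting an observation set is illustrative, not an existing scarlet2 call:

```python
loader = scarlet2.make_dataloader(path)
for data in loader:
    # one loader yields both observation sets, synchronized per scene
    obs1 = Observation.from_loader(data, key="obs1")  # key is hypothetical
    obs2 = Observation.from_loader(data, key="obs2")
    model_frame = Frame.from_observations([obs1, obs2])
    detections = data.detections  # a single set of joint detections
    with Scene(model_frame) as scene:
        for detection in detections:
            center = detection.center
            # initialize source components and create sources
            source = Source(...)
    # define parameters and fit both observations jointly
    scene.fit(obs1, obs2)
```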
Note a few things here: there are multiple sets of observations but only one loader. A single loader ensures that the observations are synchronized: it's the same scene in `obs1` and `obs2`. There is also only one set of detections: joint detections.
So, there's a fair bit of code that needs to run before we can create a data loader like this. The main point is that we can split the creation of the scene-level data products from the fitting of those products. The results of these prior steps should be stored in a file format like HDF5 or Parquet for fast access. Alternatively, we can stick to existing ML loader formats and use an interface like jax-dataloader. One thing to note is that we require fast row-level access, while most ML dataloaders are optimized for fast column-level access to create batches of the same structure.
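With HDF5, for instance, fast row-level access could map onto one group per scene, so a single scene can be read without touching the rest of the file. A minimal sketch, assuming a group-per-scene layout (group and dataset names are made up):

```python
import h5py

# read one scene ("row") directly; file layout and names are hypothetical
with h5py.File("scenes.h5", "r") as f:
    grp = f["scenes/00042"]
    images = grp["images"][:]          # (bands, ny, nx) fluxes
    weights = grp["weights"][:]        # inverse-variance maps
    psf = grp["psf"][:]                # PSF model
    detections = grp["detections"][:]  # (n_src, 2) coordinates
```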
It's conceivable that we have preprocessed data with some form of scene-level synchronization. Then one could make this call in a more modular fashion:
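(A sketch; the per-dataset paths and the `joint_detect` helper are hypothetical.)

```python
# one preprocessed, scene-synchronized loader per dataset
loader1 = scarlet2.make_dataloader(path1)
loader2 = scarlet2.make_dataloader(path2)
for data1, data2 in zip(loader1, loader2):
    obs1 = Observation.from_loader(data1)
    obs2 = Observation.from_loader(data2)
    model_frame = Frame.from_observations([obs1, obs2])
    # detections computed on the fly rather than read from file
    detections = joint_detect(obs1, obs2)  # hypothetical helper
```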
and possibly do some operations (like joint detection) on the fly. That would be my preference, but it may be impractically slow.