JP-3690: Switch from ModelContainer to ModelLibrary for image3 pipeline #8683

Merged on Sep 5, 2024 (64 commits)
5a82f06
add ModelLibrary
braingram Jun 18, 2024
98fb07f
update tweakreg to use ModelLibrary
braingram Jun 20, 2024
395298d
remove minimize_memory option for skymatch
braingram Jun 20, 2024
4b2ce13
update skymatch to use ModelLibrary
braingram Jun 20, 2024
018e3ce
temporary ModelContainer to and from ModelLibrary converter, assign_m…
emolter Jul 18, 2024
74b6607
replaced container with library in resample
emolter Jul 22, 2024
9233dd2
Revert "temporary ModelContainer to and from ModelLibrary converter, …
emolter Jul 29, 2024
bd6207b
put in nogroupid try except statements
emolter Jul 29, 2024
5c3cfc3
ModelLibrary for resample imaging, ModelContainer for resample spec
emolter Jul 29, 2024
c1615dd
Merge remote-tracking branch 'braingram/outlier_detection_steps' into…
emolter Jul 30, 2024
b4a7ce1
fixing problems from merge of outlier detection changes
emolter Jul 30, 2024
36c2b40
outlier detection to ModelLibrary for imaging modes
emolter Jul 30, 2024
c78841a
small fix to resample_spec
emolter Jul 30, 2024
9cdf2b7
Merge remote-tracking branch 'upstream/master' into JP-3690
emolter Jul 30, 2024
3103f82
convert to ModelLibrary and back again in resample_spec_step to avoid…
emolter Jul 30, 2024
2f85218
bugfixes for outlier detection unit tests
emolter Jul 31, 2024
570dc26
expose on_disk for all steps, some cleanup to passing libraries betwe…
emolter Jul 31, 2024
ad9902d
using map_function where applicable, more unit test bug fixes
emolter Jul 31, 2024
3d4fa0d
fix typo in pyproject.toml
emolter Jul 31, 2024
3db69d3
bump version of stpipe
emolter Jul 31, 2024
5ca774e
mark ModelLibrary as not part of stdatamodels
emolter Jul 31, 2024
d694178
added changelog entry
emolter Jul 31, 2024
6066f2b
first draft of docs changes
emolter Jul 31, 2024
48b1050
fixing regtest failures for spec3 pipeline, adding library to mtwcs
emolter Aug 1, 2024
99ef2f6
integrate assign_mtwcs changes with spec3 pipeline
emolter Aug 1, 2024
55262c0
debug coron3 pipeline
emolter Aug 2, 2024
f3ae92e
emptying data arrays in input to model_blender inside resample
emolter Aug 2, 2024
e27acc2
decreasing memory usage of outlier step using profiler
emolter Aug 2, 2024
d6c1642
revert refactor that introduced a bug in resample
emolter Aug 2, 2024
189b375
update skymatch input spec
emolter Aug 12, 2024
a2dd947
bugfix for failed asdf load of recursive wcs transform
emolter Aug 12, 2024
4307f22
Merge branch 'master' into JP-3690
emolter Aug 13, 2024
1ca5c9b
small changes from memory profiling and review
emolter Aug 13, 2024
6f08b61
bugfix for failed to load area extension and other metadata
emolter Aug 14, 2024
b124a8b
handle asn_table and asn_pool metadata properly
emolter Aug 14, 2024
62bacbd
make ind_asn_type case-insensitive
emolter Aug 14, 2024
30dcb3f
fix output filenames from outlier_detection
emolter Aug 14, 2024
5e9e672
bugfix for HDRTAB association info in i2d files
emolter Aug 15, 2024
f87809e
re-add python 3.13 pin that was accidentally clobbered
emolter Aug 15, 2024
ca62baf
reverting accidental clobber of stcal pin dependency
emolter Aug 15, 2024
d47ef4b
updates after reviews by myself and by @braingram
emolter Aug 16, 2024
26e5436
ruff style check
emolter Aug 16, 2024
8f902b7
call img.area
emolter Aug 19, 2024
33d66c9
bugfixes for remove s2d files and for mtimage regtest
emolter Aug 19, 2024
b357edc
Merge branch 'master' into JP-3690
emolter Aug 20, 2024
f35725a
attempted fix for resample and source_catalog result filenames
emolter Aug 20, 2024
4ed0892
remove setting of asn pool and table name in resample
emolter Aug 20, 2024
79f91c6
pushing bad things to remote to diagnose regtest
emolter Aug 20, 2024
3fd25fc
yet another attempt to fix filename issue
emolter Aug 20, 2024
28fa217
attempt to propagate fix also into source_catalog
emolter Aug 20, 2024
1863ef1
bugfix for updating table and pool name in library._assign_member_to_…
emolter Aug 21, 2024
cb5594e
fixes based on @braingram review
emolter Aug 21, 2024
099580c
fix ruff style check and remove unnecessary comment
emolter Aug 21, 2024
6ff67d3
fix unit test and revert changes to file naming
emolter Aug 22, 2024
6cad768
new attempted fix of output filenames
emolter Aug 23, 2024
91b1b19
removed one more manual change to output file naming
emolter Aug 23, 2024
bf069ed
changed has_groups conditional to reflect master branch
emolter Aug 23, 2024
5a44905
attempted fix single regtest failure for miri image3 crf files
emolter Aug 23, 2024
e828ebc
Merge branch master into JP-3690
emolter Aug 27, 2024
5272ac9
fixes per @melanieclarke comments
emolter Aug 27, 2024
2afe670
merge master into JP-3690
emolter Aug 30, 2024
04e2fa5
fixed failure to raise NoGroupID for in-memory models with None for o…
emolter Aug 30, 2024
59387a2
Merge branch 'master' into JP-3690
emolter Sep 4, 2024
364cd60
Merge branch 'master' into JP-3690
tapastro Sep 5, 2024
37 changes: 37 additions & 0 deletions CHANGES.rst
@@ -14,6 +14,12 @@ ami_average

- Fix error in step spec that prevents step creation. [#8677]

assign_mtwcs
------------

- Step now uses `ModelLibrary` to handle accessing models consistently
whether they are in memory or on disk. [#8683]

assign_wcs
----------

@@ -34,6 +40,12 @@ cube_build

- Fixed a bug when ``cube_build`` was called from the ``mrs_imatch`` step. [#8728]

datamodels
----------

- Added `ModelLibrary` class to allow passing on-disk models between steps in the
image3 pipeline. [#8683]

documentation
-------------

@@ -84,11 +96,27 @@ outlier_detection
images. Intermediate files now have suffix ``outlier_s2d`` and are saved to
the output directory alongside final products. [#8735]

- For imaging modes, step now uses `ModelLibrary` to handle accessing models consistently
whether they are in memory or on disk. [#8683]

set_telescope_pointing
----------------------

- replace usage of ``copy_arrays=True`` with ``memmap=False`` [#8660]

pipeline
--------

- Updated `calwebb_image3` to use `ModelLibrary` instead of `ModelContainer`, and added
an optional `on_disk` parameter to govern whether models in the library are stored
in memory or on disk. [#8683]

resample
--------

- Step now uses `ModelLibrary` to handle accessing models consistently
whether they are in memory or on disk. [#8683]

resample_spec
-------------

@@ -123,6 +151,12 @@ scripts
- Removed many non-working and out-dated scripts. Including
many scripts that were replaced by ``strun``. [#8619]

skymatch
--------

- Step now uses `ModelLibrary` to handle accessing models consistently
whether they are in memory or on disk. [#8683]

stpipe
------

@@ -150,6 +184,9 @@ tweakreg
- Removed direct setting of the ``self.skip`` attribute from within the step
itself. [#8600]

- Step now uses `ModelLibrary` to handle accessing models consistently
whether they are in memory or on disk. [#8683]


1.15.1 (2024-07-08)
===================
60 changes: 12 additions & 48 deletions docs/jwst/outlier_detection/outlier_detection_imaging.rst
@@ -15,11 +15,11 @@ Specifically, this routine performs the following operations:

#. Convert input data, as needed, to make sure it is in a format that can be processed.

* A :py:class:`~jwst.datamodels.ModelContainer` serves as the basic format for
* A :py:class:`~jwst.datamodels.ModelLibrary` serves as the basic format for
all processing performed by
this step, as each entry will be treated as an element of a stack of images
to be processed to identify bad-pixels/cosmic-rays and other artifacts.
* If the input data is a :py:class:`~jwst.datamodels.CubeModel`, convert it into a ModelContainer.
* If the input data is a :py:class:`~jwst.datamodels.CubeModel`, convert it into a ModelLibrary.
This allows each plane of the cube to be treated as a separate 2D image
for resampling (if done) and for combining into a median image.

@@ -62,13 +62,13 @@ Specifically, this routine performs the following operations:
if the input model container has an <asn_id>, otherwise the suffix will be ``_outlier_i2d.fits``
by default.
* **If resampling is turned off** through the use of the ``resample_data`` parameter,
a copy of the unrectified input images (as a ModelContainer)
a copy of the unrectified input images (as a ModelLibrary)
will be used for subsequent processing.

#. Create a median image from all grouped observation mosaics.

* The median image is created by combining all grouped mosaic images or
non-resampled input data (as planes in a ModelContainer) pixel-by-pixel.
non-resampled input data (as planes in a ModelLibrary) pixel-by-pixel.
* The ``maskpt`` parameter sets the percentage of the weight image values to
use, and any pixel with a weight below this value gets flagged as "bad" and
ignored when resampled.
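
The median step above can be pictured with a small pure-Python sketch. It is not the pipeline's implementation: the function name `masked_median_image`, the nested-list image layout, and the rule "a pixel is bad when its weight is below `maskpt` times the mean weight of its own image" are all assumptions made for illustration.

```python
from statistics import median


def masked_median_image(images, weights, maskpt=0.7):
    """Pixel-by-pixel median of a stack of 2D images, skipping low-weight pixels.

    Hypothetical helper: the threshold rule (maskpt times the mean weight of
    each image) is an assumption for this sketch, not the pipeline's definition.
    """
    n_rows, n_cols = len(images[0]), len(images[0][0])

    # One weight threshold per input image.
    thresholds = []
    for weight in weights:
        flat = [value for row in weight for value in row]
        thresholds.append(maskpt * sum(flat) / len(flat))

    result = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            # Keep only the planes whose weight at this pixel passes the threshold.
            good = [
                img[r][c]
                for img, weight, limit in zip(images, weights, thresholds)
                if weight[r][c] >= limit
            ]
            out_row.append(median(good) if good else float("nan"))
        result.append(out_row)
    return result


# Three 1x2 "images"; the first pixel of the third image has zero weight.
images = [[[1.0, 10.0]], [[2.0, 20.0]], [[100.0, 30.0]]]
weights = [[[1.0, 1.0]], [[1.0, 1.0]], [[0.0, 2.0]]]
median_image = masked_median_image(images, weights)
```

In this toy stack the outlying value 100.0 is excluded from the first pixel because its weight falls below the threshold, so only the two remaining planes contribute to that median.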
@@ -129,7 +129,7 @@ The outlier detection algorithm can end up using massive amounts of memory
depending on the number of inputs, the size of each input, and the size of the
final output product. Specifically,

#. The input :py:class:`~jwst.datamodels.ModelContainer` or
#. The input :py:class:`~jwst.datamodels.ModelLibrary` or
:py:class:`~jwst.datamodels.CubeModel`
for IFU data, by default, all input exposures would have been kept open in memory to make
processing more efficient.
@@ -152,56 +152,20 @@ memory usage at the expense of file I/O. The control over this memory model hap
with the use of the ``in_memory`` parameter. The full impact of this parameter
during processing includes:

#. The ``save_open`` parameter gets set to `False`
when opening the input :py:class:`~jwst.datamodels.ModelContainer` object.
This forces all input models in the input :py:class:`~jwst.datamodels.ModelContainer` or
:py:class:`~jwst.datamodels.CubeModel` to get written out to disk. The ModelContainer
then uses the filename of the input model during subsequent processing.
#. The input :py:class:`~jwst.datamodels.ModelLibrary` object is loaded with `on_disk=True`.
This ensures that input models are loaded into memory one at a time,
and saved to a temporary file when not in use; these read-write operations are handled by
the :py:class:`~jwst.datamodels.ModelLibrary` object.

#. The ``in_memory`` parameter gets passed to the :py:class:`~jwst.resample.ResampleStep`
to set whether or not to keep the resampled images in memory or not. By default,
the outlier detection processing sets this parameter to `False` so that each resampled
image gets written out to disk.
#. The ``on_disk`` status of the :py:class:`~jwst.datamodels.ModelLibrary` gets passed to the
:py:class:`~jwst.resample.ResampleStep` as well, to set whether to keep the
resampled images in memory.

#. Computing the median image works section-by-section by only keeping 1Mb of each input
in memory at a time. As a result, only the final output product array for the final
median image along with a stack of 1Mb image sections are kept in memory.

#. The final resampling step also avoids keeping all inputs in memory by only reading
each input into memory 1 at a time as it gets resampled onto the final output product.

These changes result in a minimum amount of memory usage during processing at the obvious
expense of reading and writing the products from disk.
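
The memory-for-I/O trade described above can be illustrated with a toy store that keeps each model in a temporary file and loads only one at a time. This is a sketch under stated assumptions, not the `ModelLibrary` implementation: the class `ToyOnDiskStore`, the JSON file format, and the dictionary "models" are all hypothetical; only the method names `borrow` and `shelve` and the `modify` flag mirror the calls that appear in this pull request.

```python
import json
import os
import tempfile


class ToyOnDiskStore:
    """Hypothetical stand-in for an on-disk model store (not the jwst API)."""

    def __init__(self, models):
        self._tmpdir = tempfile.TemporaryDirectory()
        self._paths = []
        self._borrowed = set()
        # Write every model to its own temporary file up front.
        for i, model in enumerate(models):
            path = os.path.join(self._tmpdir.name, f"model_{i}.json")
            with open(path, "w") as fh:
                json.dump(model, fh)
            self._paths.append(path)

    def __len__(self):
        return len(self._paths)

    def borrow(self, index):
        """Read one model from disk; it stays checked out until shelved."""
        if index in self._borrowed:
            raise RuntimeError(f"model {index} is already borrowed")
        self._borrowed.add(index)
        with open(self._paths[index]) as fh:
            return json.load(fh)

    def shelve(self, model, index, modify=True):
        """Return a model; rewrite its temporary file only if it was modified."""
        if index not in self._borrowed:
            raise RuntimeError(f"model {index} was not borrowed")
        if modify:
            with open(self._paths[index], "w") as fh:
                json.dump(model, fh)
        self._borrowed.discard(index)

    def close(self):
        self._tmpdir.cleanup()


store = ToyOnDiskStore([{"name": "a", "sky": 0.0}, {"name": "b", "sky": 0.0}])

# Only one model is held in memory at any point in this loop.
for i in range(len(store)):
    model = store.borrow(i)
    model["sky"] = 1.5 * (i + 1)
    store.shelve(model, i, modify=True)

# A read-only visit skips the write back to disk.
check = store.borrow(1)
store.shelve(check, 1, modify=False)
store.close()
```

Passing `modify=False` on read-only visits avoids an unnecessary write, which is the part of the pattern that keeps the I/O cost down.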


Outlier Detection for Coronagraphic Data
----------------------------------------
Coronagraphic data is processed in a near-identical manner to direct imaging data, but
no resampling occurs.


Outlier Detection for TSO data
-------------------------------
Normal imaging data benefit from combining all integrations into a
single image. TSO data's value, however, comes from looking for variations from one
integration to the next. The outlier detection algorithm, therefore, gets run with
a few variations to accommodate the nature of these 3D data. See the
:ref:`TSO outlier detection <outlier-detection-tso>` documentation for details.


Outlier Detection for IFU data
------------------------------
Integral Field Unit (IFU) data is handled as 2D images, similar to direct
imaging modes. The nature of the detection algorithm, however, is quite
different and involves measuring the differences between neighboring pixels
in the spatial (cross-dispersion) direction within the IFU slice images.
See the :ref:`IFU outlier detection <outlier-detection-ifu>` documentation for
all the details.


Outlier Detection for Slit data
-------------------------------
See the :ref:`slit outlier detection <outlier-detection-spec>` documentation for
details.

.. automodapi:: jwst.outlier_detection.imaging
4 changes: 3 additions & 1 deletion docs/jwst/pipeline/calwebb_image3.rst
@@ -34,7 +34,9 @@ processed using the :ref:`calwebb_tso3 <calwebb_tso3>` pipeline.
Arguments
---------

The ``calwebb_image3`` pipeline does not have any optional arguments.
``--in_memory``
Boolean governing whether to load all models in the input association into memory at once (faster)
or to save them to temporary files when not in use (slower, but uses less memory). Default is True.

Inputs
------
6 changes: 6 additions & 0 deletions docs/jwst/skymatch/arguments.rst
@@ -67,3 +67,9 @@ The ``skymatch`` step uses the following optional arguments:
Bin width, in sigma, used to sample the distribution of pixel
values in order to compute the sky background using statistics
that require binning, such as `mode` and `midpt`.

**Memory management parameters:**

``in_memory`` (boolean, default=True)
If False, preserve memory using temporary files
at the expense of having to run many I/O operations.
18 changes: 13 additions & 5 deletions docs/jwst/tweakreg/README.rst
@@ -86,7 +86,7 @@ models to the custom catalog file name, the ``tweakreg_step`` also supports two
other ways of supplying custom source catalogs to the step:

1. Adding ``tweakreg_catalog`` attribute to the ``members`` of the input ASN
table - see `~jwst.datamodels.ModelContainer` for more details.
table - see `~jwst.datamodels.ModelLibrary` for more details.
Catalog file names are relative to ASN file path.

2. Providing a simple two-column text file, specified via step's parameter
@@ -165,17 +165,17 @@ telescope pointing will be identical in all these images and it is assumed
that the relative positions of (e.g., NIRCam) detectors do not change.
Identification of images that belong to the same "exposure" and therefore
can be grouped together is based on several attributes described in
`~jwst.datamodels.ModelContainer`. This grouping is performed automatically
`~jwst.datamodels.ModelLibrary`. This grouping is performed automatically
in the ``tweakreg`` step using the
`~jwst.datamodels.ModelContainer.models_grouped` property, which assigns
a group ID to each input image model in ``meta.group_id``.
`~jwst.datamodels.ModelLibrary.group_names` property.


However, when detector calibrations are not accurate, alignment of groups
of images may fail (or result in poor alignment). In this case, it may be
desirable to align each image independently. This can be achieved either by
setting the ``image_model.meta.group_id`` attribute to a unique string or integer
value for each image, or by adding the ``group_id`` attribute to the ``members`` of the input ASN
table - see `~jwst.datamodels.ModelContainer` for more details.
table - see `~jwst.datamodels.ModelLibrary` for more details.
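
As a rough illustration of the grouping described above, the sketch below collects image indices by their group ID. The helper `group_indices` and the example ID strings are hypothetical; the real grouping is performed by the library, and this only shows why giving every image a unique ID makes each image its own group.

```python
from collections import defaultdict


def group_indices(group_ids):
    """Map each group ID to the list of image indices that carry it."""
    groups = defaultdict(list)
    for index, group_id in enumerate(group_ids):
        groups[group_id].append(index)
    return dict(groups)


# Four detector images from two exposures: aligned as two groups.
shared = group_indices(["exp1", "exp1", "exp2", "exp2"])

# The same four images with unique IDs: each one is aligned independently.
unique = group_indices([f"image{i}" for i in range(4)])
```

With shared IDs the four images collapse into two groups, while unique IDs yield four single-image groups.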

.. note::
Group ID (``group_id``) is used by both ``tweakreg`` and ``skymatch`` steps
@@ -428,6 +428,14 @@ in the ``assign_wcs`` step.

* ``sip_npoints``: Number of points for the SIP fit. (Default=12).

**stpipe general options:**

* ``output_use_model``: A boolean indicating whether to use `DataModel.meta.filename`
when saving the results. (Default=True)

* ``in_memory``: A boolean indicating whether to keep models in memory, or to save
temporary files on disk while not in use to save memory. (Default=True)

Further Documentation
---------------------
The underlying algorithms as well as formats of source catalogs are described
24 changes: 10 additions & 14 deletions jwst/assign_mtwcs/assign_mtwcs_step.py
@@ -1,9 +1,8 @@
#! /usr/bin/env python
import logging

from stdatamodels.jwst import datamodels

from jwst.datamodels import ModelContainer
from jwst.datamodels import ModelLibrary
from jwst.stpipe.utilities import record_step_status

from ..stpipe import Step
from .moving_target_wcs import assign_moving_target_wcs
@@ -32,17 +31,14 @@ class AssignMTWcsStep(Step):
"""

def process(self, input):
if isinstance(input, str):
input = datamodels.open(input)

# Can't apply the step if we aren't given a ModelContainer as input
if not isinstance(input, ModelContainer):
log.warning("Input data type is not supported.")
# raise ValueError("Expected input to be an association file name or a ModelContainer.")
input.meta.cal_step.assign_mtwcs = 'SKIPPED'
return input

# Apply the step
if not isinstance(input, ModelLibrary):
try:
input = ModelLibrary(input)
except Exception:
log.warning("Input data type is not supported.")
record_step_status(input, "assign_mtwcs", False)
return input

result = assign_moving_target_wcs(input)

return result
80 changes: 43 additions & 37 deletions jwst/assign_mtwcs/moving_target_wcs.py
@@ -16,57 +16,63 @@

from stdatamodels.jwst import datamodels

from jwst.datamodels import ModelContainer
from jwst.datamodels import ModelLibrary
from jwst.stpipe.utilities import record_step_status

log = logging.getLogger(__name__)
log.setLevel(logging.DEBUG)

__all__ = ["assign_moving_target_wcs"]


def assign_moving_target_wcs(input_model):
def assign_moving_target_wcs(input_models):

if not isinstance(input_model, ModelContainer):
raise ValueError("Expected a ModelContainer object")
if not isinstance(input_models, ModelLibrary):
raise ValueError("Expected a ModelLibrary object")

# get the indices of the science exposures in the ModelContainer
ind = input_model.ind_asn_type('science')
sci_models = np.asarray(input_model._models)[ind]
# Get the MT RA/Dec values from all the input exposures
mt_ra = np.array([model.meta.wcsinfo.mt_ra for model in sci_models])
mt_dec = np.array([model.meta.wcsinfo.mt_dec for model in sci_models])
# loop over only science exposures in the ModelLibrary
ind = input_models.indices_for_exptype("science")
mt_ra = np.empty(len(ind))
mt_dec = np.empty(len(ind))
with input_models:
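# fill the arrays by position in ``ind`` rather than by library index,
# since science exposures need not occupy indices 0..len(ind)-1
for j, i in enumerate(ind):
model = input_models.borrow(i)
mt_ra[j] = model.meta.wcsinfo.mt_ra
mt_dec[j] = model.meta.wcsinfo.mt_dec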
input_models.shelve(model, i, modify=False)

# Compute the mean MT RA/Dec over all exposures
if None in mt_ra or None in mt_dec:
log.warning("One or more MT RA/Dec values missing in input images")
log.warning("Step will be skipped, resulting in target misalignment")
for model in sci_models:
model.meta.cal_step.assign_mtwcs = 'SKIPPED'
return input_model
else:
mt_avra = mt_ra.mean()
mt_avdec = mt_dec.mean()

for model in sci_models:
model.meta.wcsinfo.mt_avra = mt_avra
model.meta.wcsinfo.mt_avdec = mt_avdec
if isinstance(model, datamodels.MultiSlitModel):
for ind, slit in enumerate(model.slits):
new_wcs = add_mt_frame(slit.meta.wcs,
mt_avra, mt_avdec,
slit.meta.wcsinfo.mt_ra, slit.meta.wcsinfo.mt_dec)
del model.slits[ind].meta.wcs
model.slits[ind].meta.wcs = new_wcs
else:

new_wcs = add_mt_frame(model.meta.wcs, mt_avra, mt_avdec,
model.meta.wcsinfo.mt_ra, model.meta.wcsinfo.mt_dec)
del model.meta.wcs
model.meta.wcs = new_wcs

model.meta.cal_step.assign_mtwcs = 'COMPLETE'

return input_model
record_step_status(input_models, "assign_mtwcs", False)
return input_models

mt_avra = mt_ra.mean()
mt_avdec = mt_dec.mean()

with input_models:
for i in ind:
model = input_models.borrow(i)
model.meta.wcsinfo.mt_avra = mt_avra
model.meta.wcsinfo.mt_avdec = mt_avdec
if isinstance(model, datamodels.MultiSlitModel):
# use a separate name so the list of science indices ``ind`` is not shadowed
for slit_idx, slit in enumerate(model.slits):
new_wcs = add_mt_frame(slit.meta.wcs,
mt_avra, mt_avdec,
slit.meta.wcsinfo.mt_ra, slit.meta.wcsinfo.mt_dec)
del model.slits[slit_idx].meta.wcs
model.slits[slit_idx].meta.wcs = new_wcs
else:

new_wcs = add_mt_frame(model.meta.wcs, mt_avra, mt_avdec,
model.meta.wcsinfo.mt_ra, model.meta.wcsinfo.mt_dec)
del model.meta.wcs
model.meta.wcs = new_wcs
record_step_status(model, "assign_mtwcs", True)
input_models.shelve(model, i, modify=True)

return input_models
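
The two-pass structure of the function above (a read-only pass to gather values, then a modifying pass to write the average back) can be sketched without the pipeline. Everything here is illustrative: `ToyLibrary` is a hypothetical in-memory stand-in that only mimics the `borrow`, `shelve`, and `indices_for_exptype` calls seen in the diff, models are plain dictionaries, and values are collected into lists so they do not depend on where the science exposures sit in the library.

```python
class ToyLibrary:
    """Hypothetical in-memory stand-in; not the jwst ModelLibrary."""

    def __init__(self, models):
        self._models = list(models)
        self._borrowed = set()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the block with models still checked out is an error.
        if exc_type is None and self._borrowed:
            raise RuntimeError("every borrowed model must be shelved")
        return False

    def indices_for_exptype(self, exptype):
        return [i for i, m in enumerate(self._models) if m["exptype"] == exptype]

    def borrow(self, index):
        self._borrowed.add(index)
        return dict(self._models[index])  # hand out a copy

    def shelve(self, model, index, modify=True):
        self._borrowed.discard(index)
        if modify:
            self._models[index] = model


def assign_mean_position(library):
    """Write the mean RA/Dec of the science members back to each of them."""
    ind = library.indices_for_exptype("science")

    # Pass 1: read-only, gather values in the order of ``ind``.
    ra, dec = [], []
    with library:
        for i in ind:
            model = library.borrow(i)
            ra.append(model["mt_ra"])
            dec.append(model["mt_dec"])
            library.shelve(model, i, modify=False)

    if not ra or any(value is None for value in ra + dec):
        return False  # nothing to average, or a value is missing

    avra = sum(ra) / len(ra)
    avdec = sum(dec) / len(dec)

    # Pass 2: modify each science member and store the change.
    with library:
        for i in ind:
            model = library.borrow(i)
            model["mt_avra"] = avra
            model["mt_avdec"] = avdec
            library.shelve(model, i, modify=True)
    return True


library = ToyLibrary([
    {"exptype": "background", "mt_ra": None, "mt_dec": None},
    {"exptype": "science", "mt_ra": 10.0, "mt_dec": -5.0},
    {"exptype": "science", "mt_ra": 12.0, "mt_dec": -7.0},
])
ok = assign_mean_position(library)

with library:
    sci = library.borrow(1)
    library.shelve(sci, 1, modify=False)
```

The example deliberately places a background member first, so the science indices are 1 and 2 rather than 0 and 1; collecting into lists keeps the averaging independent of that offset.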


def add_mt_frame(wcs, ra_average, dec_average, mt_ra, mt_dec):