Added solar aware MAE, solar limb masking in datamodule, and adaption of benchmarking tools
dead-water committed Apr 16, 2024
1 parent 47ce577 commit 77fd6ad
Showing 11 changed files with 1,111 additions and 59 deletions.
35 changes: 20 additions & 15 deletions README.md
@@ -25,6 +25,26 @@ SDO-FM is envisioned as a ‘multi-modal’ foundation model, integrating instru
└── └── pretraining # modules for pretraining
```

## Datasets

| Name | Description | Granularity & Source |
|--- |--- |--- |
| NASA’s Solar Dynamics Observatory (SDO) [Pesnell et al. 2012](https://ui.adsabs.harvard.edu/link_gateway/2012SoPh..275....3P/doi:10.1007/s11207-011-9841-3) | Three instruments:<br><ul><li>Atmospheric Imaging Assembly (AIA) 2 ultraviolet, 1600 & 1700 Å 7 extreme ultraviolet, 94, 131, 171, 193, 211, 304, and 335 Å.</li><li>Helioseismic and Magnetic Imager (HMI) - visible filtergrams processed into: photospheric Dopplergrams line-of-sight magnetograms vector magnetograms.</li><li>EUV Variability Experiment (EVE) - EUV spectral irradiance from 1 to 1050 Å. MEGS disperse EUV light from full disk of the Sun and corona onto a charge coupled device.</li></ul> | 4096x4096 12 second cadence:<br>AIA - [Lemen et al. 2012](https://ui.adsabs.harvard.edu/link_gateway/2012SoPh..275...17L/doi:10.1007/s11207-011-9776-8).<br>HMI - [Hoeksema et al. 2014](https://ui.adsabs.harvard.edu/link_gateway/2014SoPh..289.3483H/doi:10.1007/s11207-014-0516-8).<br>1024 x 2048: <br>EUV - [Woods et al. 2012](https://ui.adsabs.harvard.edu/link_gateway/2012SoPh..275..115W/doi:10.1007/s11207-009-9487-6).<br>Downsampled 512x512/0.6, 512x512/0.5 arcsec 6 (AIA) 12 (HMI) minute cadence for machine learning: [Galvez et al. 2019](https://iopscience.iop.org/article/10.3847/1538-4365/ab1005) via [sdoml.org](sdoml.org). |

## Models
### Backbones
| Name | Paper |
|--- |--- |
| Masked Autoencoders Are Scalable Vision Learners | He, Kaiming, et al. "Masked autoencoders are scalable vision learners." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022 ([link](https://openaccess.thecvf.com/content/CVPR2022/papers/He_Masked_Autoencoders_Are_Scalable_Vision_Learners_CVPR_2022_paper.pdf)) |
| Foundation Models for Generalist Geospatial Artificial Intelligence (Prithvi) | Jakubik, Johannes, et al. "Foundation models for generalist geospatial artificial intelligence." arXiv preprint arXiv:2310.18660 (2023) ([link](https://arxiv.org/pdf/2310.18660.pdf)) |
| NVAE: A Deep Hierarchical Variational Autoencoder | Vahdat, Arash, and Jan Kautz. "NVAE: A deep hierarchical variational autoencoder." Advances in neural information processing systems 33 (2020): 19667-19679 ([link](https://arxiv.org/abs/2007.03898)) |
| StyleSwin: Transformer-Based GAN for High-Resolution Image Generation | Zhang, Bowen, et al. "Styleswin: Transformer-based gan for high-resolution image generation." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022 ([link](https://openaccess.thecvf.com/content/CVPR2022/html/Zhang_StyleSwin_Transformer-Based_GAN_for_High-Resolution_Image_Generation_CVPR_2022_paper.html)) |

### Heads
| Name | Paper |
|--- |--- |
| Multichannel autocalibration for the Atmospheric Imaging Assembly using machine learning | Dos Santos, Luiz FG, et al. "Multichannel autocalibration for the Atmospheric Imaging Assembly using machine learning." Astronomy & Astrophysics 648 (2021): A53 ([link](https://www.aanda.org/articles/aa/full_html/2021/04/aa40051-20/aa40051-20.html)) |

## Setup
### Installation
SDO-FM can be installed locally by directly installing the package in this repository.
@@ -42,21 +62,6 @@ CLI overrides are still possible with this selection but be aware of some shells
python scripts/main.py --config-name=default experiment.seed=37
```

### Datasets

| Name | Description | Granularity & Source |
|--- |--- |--- |
| NASA’s Solar Dynamics Observatory (SDO) [Pesnell et al. 2012](https://ui.adsabs.harvard.edu/link_gateway/2012SoPh..275....3P/doi:10.1007/s11207-011-9841-3) | Three instruments:<br><ul><li>Atmospheric Imaging Assembly (AIA) 2 ultraviolet, 1600 & 1700 Å 7 extreme ultraviolet, 94, 131, 171, 193, 211, 304, and 335 Å.</li><li>Helioseismic and Magnetic Imager (HMI) - visible filtergrams processed into: photospheric Dopplergrams line-of-sight magnetograms vector magnetograms.</li><li>EUV Variability Experiment (EVE) - EUV spectral irradiance from 1 to 1050 Å. MEGS disperse EUV light from full disk of the Sun and corona onto a charge coupled device.</li></ul> | 4096x4096 12 second cadence:<br>AIA - [Lemen et al. 2012](https://ui.adsabs.harvard.edu/link_gateway/2012SoPh..275...17L/doi:10.1007/s11207-011-9776-8).<br>HMI - [Hoeksema et al. 2014](https://ui.adsabs.harvard.edu/link_gateway/2014SoPh..289.3483H/doi:10.1007/s11207-014-0516-8).<br>1024 x 2048: <br>EUV - [Woods et al. 2012](https://ui.adsabs.harvard.edu/link_gateway/2012SoPh..275..115W/doi:10.1007/s11207-009-9487-6).<br>Downsampled 512x512/0.6, 512x512/0.5 arcsec 6 (AIA) 12 (HMI) minute cadence for machine learning: [Galvez et al. 2019](https://iopscience.iop.org/article/10.3847/1538-4365/ab1005) via [sdoml.org](sdoml.org). |

### Models
| Name | Paper |
|--- |--- |
| Masked Autoencoders Are Scalable Vision Learners | He, Kaiming, et al. "Masked autoencoders are scalable vision learners." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022 ([link](https://openaccess.thecvf.com/content/CVPR2022/papers/He_Masked_Autoencoders_Are_Scalable_Vision_Learners_CVPR_2022_paper.pdf)) |
| Foundation Models for Generalist Geospatial Artificial Intelligence (Prithvi) | Jakubik, Johannes, et al. "Foundation models for generalist geospatial artificial intelligence." arXiv preprint arXiv:2310.18660 (2023) ([link](https://arxiv.org/pdf/2310.18660.pdf)) |
| Multichannel autocalibration for the Atmospheric Imaging Assembly using machine learning | Dos Santos, Luiz FG, et al. "Multichannel autocalibration for the Atmospheric Imaging Assembly using machine learning." Astronomy & Astrophysics 648 (2021): A53 ([link](https://www.aanda.org/articles/aa/full_html/2021/04/aa40051-20/aa40051-20.html)) |
| NVAE: A Deep Hierarchical Variational Autoencoder | Vahdat, Arash, and Jan Kautz. "NVAE: A deep hierarchical variational autoencoder." Advances in neural information processing systems 33 (2020): 19667-19679 ([link](https://arxiv.org/abs/2007.03898)) |
| StyleSwin: Transformer-Based GAN for High-Resolution Image Generation | Zhang, Bowen, et al. "Styleswin: Transformer-based gan for high-resolution image generation." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022 ([link](https://openaccess.thecvf.com/content/CVPR2022/html/Zhang_StyleSwin_Transformer-Based_GAN_for_High-Resolution_Image_Generation_CVPR_2022_paper.html)) |

## Pre-training
```bash
python scripts/main.py --config-name=pretrain_tiny
18 changes: 13 additions & 5 deletions experiments/default.yaml
@@ -72,12 +72,12 @@ data:
model:
# PRETRAINERS
mae:
-img_size: 224
+img_size: 512
patch_size: 16
-num_frames: 3
-tubelet_size: 1
-in_chans: 3
-embed_dim: 1024
+num_frames: 5
+tubelet_size: 5
+in_chans: 9
+embed_dim: 4096
depth: 24
num_heads: 16
decoder_embed_dim: 512
@@ -86,6 +86,14 @@ model:
mlp_ratio: 4.0
# norm_layer: defaults to nn.LayerNorm
norm_pix_loss: False
+samae:
+# uses all parameters as in mae plus these
+masking_type: "solar_aware" # 'random' or 'solar_aware'
+active_region_mu_degs: 15.73
+active_region_std_degs: 6.14
+active_region_scale: 1.0
+active_region_abs_lon_max_degs: 60
+active_region_abs_lat_max_degs: 60
nvae:
use_se: true
res_dist: true
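
To get a sense of scale for the `mae` changes in this file, the old and new values can be turned into token counts. This is a rough sketch that assumes a standard ViT-style 3D patch embedding, where each token covers `tubelet_size` frames and one `patch_size` x `patch_size` pixel tile without overlap. `token_count` and `patch_dim` are hypothetical helpers written for this illustration, not functions from the repository.

```python
def token_count(img_size: int, patch_size: int, num_frames: int, tubelet_size: int) -> int:
    """Number of tokens, assuming non-overlapping spatio-temporal patches."""
    patches_per_side = img_size // patch_size
    temporal_groups = num_frames // tubelet_size
    return temporal_groups * patches_per_side ** 2


def patch_dim(in_chans: int, patch_size: int, tubelet_size: int) -> int:
    """Number of raw input values flattened into each token."""
    return in_chans * tubelet_size * patch_size ** 2


# Values before and after this hunk in experiments/default.yaml.
old = dict(img_size=224, patch_size=16, num_frames=3, tubelet_size=1, in_chans=3)
new = dict(img_size=512, patch_size=16, num_frames=5, tubelet_size=5, in_chans=9)

for name, cfg in (("old", old), ("new", new)):
    tokens = token_count(cfg["img_size"], cfg["patch_size"], cfg["num_frames"], cfg["tubelet_size"])
    values = patch_dim(cfg["in_chans"], cfg["patch_size"], cfg["tubelet_size"])
    print(f"{name}: {tokens} tokens, {values} values per token")
```

Under that assumption the sequence grows from 588 to 1024 tokens, while each token flattens 11520 input values instead of 768.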
Original file line number Diff line number Diff line change
@@ -75,7 +75,7 @@ model:
mae:
img_size: 512
patch_size: 16
-num_frames: 3
+num_frames: 1
tubelet_size: 1
in_chans: 9
embed_dim: 128
@@ -87,6 +87,14 @@ model:
mlp_ratio: 4.0
# norm_layer: defaults to nn.LayerNorm
norm_pix_loss: False
+samae:
+# uses all parameters as in mae plus these
+masking_type: "solar_aware" # 'random' or 'solar_aware'
+active_region_mu_degs: 15.73
+active_region_std_degs: 6.14
+active_region_scale: 1.0
+active_region_abs_lon_max_degs: 60
+active_region_abs_lat_max_degs: 60
# FINE-TUNERS
dimming:
num_neck_filters: 32
@@ -100,7 +108,7 @@ model:
loss: "mse" # options: "mae", "mse", "mape"
scheduler: "constant" #other options: "cosine", "plateau", "exp"
scheduler_warmup: 0
-batch_size: 8
+batch_size: 1
learning_rate: 0.0001
weight_decay: 0.0
optimiser: "adam"
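
The `samae` parameters above are not explained in the diff itself. One plausible reading, and it is only an assumption based on the parameter names, is that `active_region_mu_degs` and `active_region_std_degs` describe a Gaussian over absolute heliographic latitude, while the `abs_lon_max` and `abs_lat_max` values bound the region where masks may be placed. The sketch below turns that reading into per-patch weights. It is not the repository's implementation: the function name, the orthographic projection with the solar equator along the image x-axis, and `disk_radius_px` are all hypothetical.

```python
import math


def active_region_patch_weights(
    img_size=512,
    patch_size=16,
    disk_radius_px=200.0,  # hypothetical solar radius in pixels, not from the commit
    mu_degs=15.73,
    std_degs=6.14,
    scale=1.0,
    abs_lon_max_degs=60.0,
    abs_lat_max_degs=60.0,
):
    """Per-patch weights peaking at |latitude| = mu_degs (assumed interpretation)."""
    n = img_size // patch_size
    centre = img_size / 2.0
    grid = []
    for row in range(n):
        weights_row = []
        for col in range(n):
            # Patch centre in units of the solar radius; y grows towards the top row.
            x = ((col + 0.5) * patch_size - centre) / disk_radius_px
            y = (centre - (row + 0.5) * patch_size) / disk_radius_px
            if x * x + y * y >= 1.0:
                weights_row.append(0.0)  # patch centre is off the solar disk
                continue
            # Orthographic projection with the solar equator on the image x-axis.
            lat = math.degrees(math.asin(y))
            ratio = x / math.sqrt(1.0 - y * y)
            lon = math.degrees(math.asin(max(-1.0, min(1.0, ratio))))
            if abs(lat) > abs_lat_max_degs or abs(lon) > abs_lon_max_degs:
                weights_row.append(0.0)  # outside the allowed lon/lat window
                continue
            z = (abs(lat) - mu_degs) / std_degs
            weights_row.append(scale * math.exp(-0.5 * z * z))
        grid.append(weights_row)
    return grid


weights = active_region_patch_weights()
print(len(weights), len(weights[0]))      # number of patch rows and columns
print(weights[12][16] > weights[15][16])  # compares a mid-latitude patch with a near-equator one
```

Weights like these could then drive a weighted choice of which patches to mask, so that masks concentrate in the latitude bands where active regions tend to appear rather than being spread uniformly as with `masking_type: "random"`.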
200 changes: 200 additions & 0 deletions notebooks/test_pretraining_samae.ipynb
@@ -0,0 +1,200 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The autoreload extension is already loaded. To reload it, use:\n",
" %reload_ext autoreload\n"
]
}
],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from pathlib import Path\n",
"\n",
"import pytorch_lightning as pl\n",
"import torch\n",
"import wandb\n",
"from sdofm import utils\n",
"from sdofm.datasets import SDOMLDataModule, DimmedSDOMLDataModule\n",
"from sdofm.pretraining import SAMAE"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"import omegaconf\n",
"cfg = omegaconf.OmegaConf.load(\"../experiments/pretrain_tiny_mae.yaml\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[* CACHE SYSTEM *] Found cached index data in /mnt/sdoml/cache/aligndata_AIA_FULL_12min.csv.\n",
"[* CACHE SYSTEM *] Found cached normalization data in /mnt/sdoml/cache/normalizations_AIA_FULL_12min.json.\n",
"[* CACHE SYSTEM *] Found cached HMI mask data in /mnt/sdoml/cache/hmi_mask_512x512.npy.\n"
]
}
],
"source": [
"data_module = SDOMLDataModule(\n",
" hmi_path=None,\n",
" aia_path=os.path.join(\n",
" cfg.data.sdoml.base_directory, cfg.data.sdoml.sub_directory.aia\n",
" ),\n",
" eve_path=None,\n",
" components=cfg.data.sdoml.components,\n",
" wavelengths=cfg.data.sdoml.wavelengths,\n",
" ions=cfg.data.sdoml.ions,\n",
" frequency=cfg.data.sdoml.frequency,\n",
" batch_size=cfg.model.opt.batch_size,\n",
" num_workers=cfg.data.num_workers,\n",
" val_months=cfg.data.month_splits.val,\n",
" test_months=cfg.data.month_splits.test,\n",
" holdout_months=cfg.data.month_splits.holdout,\n",
" cache_dir=os.path.join(\n",
" cfg.data.sdoml.base_directory, cfg.data.sdoml.sub_directory.cache\n",
" ),\n",
")\n",
"data_module.setup()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"model = SAMAE(\n",
" **cfg.model.mae,\n",
" **cfg.model.samae,\n",
" optimiser=cfg.model.opt.optimiser,\n",
" lr=cfg.model.opt.learning_rate,\n",
" weight_decay=cfg.model.opt.weight_decay,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"GPU available: True (cuda), used: True\n",
"TPU available: False, using: 0 TPU cores\n",
"IPU available: False, using: 0 IPUs\n",
"HPU available: False, using: 0 HPUs\n",
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]\n",
"\n",
" | Name | Type | Params\n",
"-----------------------------------------------------------------\n",
"0 | autoencoder | SolarAwareMaskedAutoencoderViT3D | 3.3 M \n",
"-----------------------------------------------------------------\n",
"3.0 M Trainable params\n",
"262 K Non-trainable params\n",
"3.3 M Total params\n",
"13.005 Total estimated model params size (MB)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "2e5a818f36c54a2ab890c11acfd02d01",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Sanity Checking: | | 0/? [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "37f9ef68d4334ae098f777aa1c20c99c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Training: | | 0/? [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/opt/conda/envs/sdofm/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:54: Detected KeyboardInterrupt, attempting graceful shutdown...\n"
]
}
],
"source": [
"trainer = pl.Trainer(\n",
" devices=1, accelerator=cfg.experiment.accelerator, max_epochs=cfg.model.opt.epochs\n",
")\n",
"trainer.fit(model=model, datamodule=data_module)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "sdofm",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
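
The `SAMAE(**cfg.model.mae, **cfg.model.samae, ...)` call in the notebook only works if the two config sections have disjoint keys, which matches the `samae` comment "uses all parameters as in mae plus these". A minimal sketch with plain dictionaries and a hypothetical `build_model` stand-in (not the repository's `SAMAE` class) shows what Python does when that holds and when it does not:

```python
def build_model(**kwargs):
    """Hypothetical stand-in for a constructor that accepts keyword arguments."""
    return dict(kwargs)


mae = {"img_size": 512, "patch_size": 16, "num_frames": 1}
samae = {"masking_type": "solar_aware", "active_region_mu_degs": 15.73}

# Disjoint keys: both sections are forwarded as one flat set of keyword arguments.
merged = build_model(**mae, **samae)
print(sorted(merged))

# A key repeated across the two unpacked mappings is rejected at call time.
try:
    build_model(**mae, **{"img_size": 224})
    duplicate_rejected = False
except TypeError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Keeping the `samae` section limited to the extra masking parameters therefore avoids a `TypeError` from duplicated keyword arguments when both sections are unpacked into the same call.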