
Commit

Clean production run (#22)
* update readme

* update flowchart

* re-export env.yml, set 1980 in config

* ignore FutureWarning, clean mamba env, rerun nbs

* black formatting

* fixed count_ars mean calc, rerun QC nb

* change rounding for ratio_m field of events shapefile

* suppress warnings

---------

Co-authored-by: jdpaul3 <[email protected]>
Co-authored-by: kyleredilla <[email protected]>
Co-authored-by: jdpaul3 <[email protected]>
4 people authored Sep 16, 2023
1 parent 19292fa commit 1d55a45
Showing 12 changed files with 1,403 additions and 656 deletions.
94 changes: 51 additions & 43 deletions AR_QC.ipynb

Large diffs are not rendered by default.

562 changes: 460 additions & 102 deletions AR_avalanche_exploration.ipynb

Large diffs are not rendered by default.

262 changes: 176 additions & 86 deletions AR_detection.ipynb

Large diffs are not rendered by default.

46 changes: 40 additions & 6 deletions README.md
@@ -1,9 +1,9 @@
# Atmospheric Rivers and Avalanches

## Background
This codebase will download [ERA5 6 hourly pressure level data](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-pressure-levels?tab=overview) in the vicinity of Alaska and apply an atmospheric river (AR) detection algorithm. Outputs will include a datacube of AR objects and multiple attributed shapefiles for future work correlating avalanche events to ARs.
This codebase will download [ERA5 6 hourly pressure level data](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-pressure-levels?tab=overview) in the vicinity of Alaska and apply an atmospheric river (AR) detection algorithm. Outputs include multiple attributed shapefiles for future work correlating avalanche events (or other phenomena) to ARs.

The AR detection algorithm used here is adapted from [Guan & Waliser (2015)](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2015JD024257) and uses a combination of vertically integrated water vapor transport (IVT), geometric shape, and directional criteria to define ARs. See the annotated bibilography document for more detail and other references. Users of this codebase should know the following information about atmospheric river criteria:
The AR detection algorithm used here is adapted from [Guan & Waliser (2015)](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2015JD024257) and uses a combination of vertically integrated water vapor transport (IVT), geometric shape, and directional criteria to define ARs. See the annotated bibliography document for more detail and other references. Users of this codebase should know the following information about AR criteria:

All of the AR criteria proposed by Guan and Waliser (2015) either judge the geometry of the candidate AR object and/or measure some parameter of integrated water vapor transport (magnitude, direction, and poleward component). Each candidate AR object represents a specific time (six hour resolution) slice of the IVT data where contiguous groups of grid cells in which the magnitude exceeds the target quantile are labeled with unique integers >=1. The 0 value is reserved to mask the region that did not exceed the IVT quantile threshold.
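
A minimal sketch of this threshold-and-label step is below, assuming a 2D IVT magnitude array and a precomputed quantile grid. The array names, placeholder values, and connectivity choice are illustrative only, not copied from `ar_detection.py`.

```python
# Sketch only: the threshold-and-label step described above, on placeholder data.
import numpy as np
from scipy.ndimage import label, generate_binary_structure

ivt_mag = np.random.rand(50, 80) * 600.0   # hypothetical IVT magnitude, one 6-hour timestep
ivt_q90 = np.full((50, 80), 250.0)         # hypothetical 90th-percentile grid (all years)

# Cells at or below the quantile threshold stay 0; exceeding cells form candidate regions
exceeds = ivt_mag > ivt_q90

# Label contiguous groups of exceeding cells with unique integers >= 1
structure = generate_binary_structure(2, 2)  # 8-connectivity, chosen here for illustration
labeled, n_candidates = label(exceeds, structure=structure)

print(f"{n_candidates} candidate AR objects at this timestep")
```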

@@ -27,12 +27,46 @@ The overall orientation of the object (i.e., the direction of the shape elongati
- The `ar_detection.py` module contains a collection of AR detection functions that will filter AR candidates based on the criteria and create shapefile outputs of objects classified as ARs. These functions may be orchestrated from a notebook (see `AR_detection.ipynb`) or run as a script.
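
As a rough illustration of that per-candidate filtering, a hedged sketch using `skimage.measure.regionprops` follows; the threshold values and the particular geometric checks shown are placeholders, not the criteria defined in `config.py`.

```python
# Sketch only: filter labeled candidate regions by simple geometric criteria.
# The thresholds below are illustrative; the project's real criteria live in config.py.
import numpy as np
from scipy.ndimage import label
from skimage.measure import regionprops

ivt_mag = np.random.rand(50, 80) * 600.0        # hypothetical IVT magnitude field
labeled, _ = label(ivt_mag > 250.0)              # hypothetical fixed threshold for this demo

MIN_AREA_CELLS = 40     # discard very small blobs (placeholder value)
MIN_AXIS_RATIO = 2.0    # keep only elongated shapes (placeholder value)

ar_labels = []
for blob in regionprops(labeled, intensity_image=ivt_mag):
    ratio = blob.major_axis_length / max(blob.minor_axis_length, 1e-6)
    if blob.area >= MIN_AREA_CELLS and ratio >= MIN_AXIS_RATIO:
        ar_labels.append(blob.label)

print("labels passing the illustrative geometry checks:", ar_labels)
```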

## Usage
1. Register for a Climate Data Store (CDS) account and install the CDS API client according to the instructions [here].(https://cds.climate.copernicus.eu/api-how-to). Be sure to accept the user agreement.
2. Create a new conda environment using the `environment.yml` file in this codebase.
3. Set a local environment variable defining the data directory. If the directory does not yet exist, `config.py` will create it, e.g. `export AR_DATA_DIR=path/to/store/ar/data`
1. Register for a Climate Data Store (CDS) account and install the CDS API client according to the instructions [here](https://cds.climate.copernicus.eu/api-how-to). Be sure to accept the user agreement.
2. Create a new conda environment using the `environment.yml` file in this codebase, or install packages manually from the list below. (If you run into problems building the environment using conda, or experience errors when running `ar_detection.py`, we recommend trying [mamba](https://github.com/mamba-org/mamba) to build your environment before attempting to debug any code.)
3. Set a local environment variable defining the data directory (e.g. `export AR_DATA_DIR=path/to/store/ar/data`).
5. Review parameters in `config.py` and adjust if desired. Note that there is a download request limit of 120,000 items, so adjusting the timestep or date range may overload the request and break the script.
6. Execute `download.py`
7. Execute `compute_ivt.py`
8. Execute `ar_detection.py` or use `AR_detection.ipynb` to orchestrate the detection.
9. Examine the results using the `AR_QC.ipynb` notebook.
10. Explore spatiotemporal relationships between avalanches and AR events using the `AR_avalanche_exploration.ipynb` notebook.
10. Explore spatiotemporal relationships between avalanches and AR events using the `AR_avalanche_exploration.ipynb` notebook.
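
Step 3 above sets `AR_DATA_DIR`; the following is a hypothetical sketch of how a config module might read that variable and create the directory if it does not exist. This is an assumption for illustration, not the contents of `config.py`.

```python
# Hypothetical sketch: reading AR_DATA_DIR and creating the directory if needed.
# See the actual config.py in this repository for its real behavior.
import os
from pathlib import Path

data_dir = Path(os.environ["AR_DATA_DIR"])    # raises KeyError if the variable is not set
data_dir.mkdir(parents=True, exist_ok=True)   # create the directory tree if it does not yet exist
```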

## Packages
- rasterio
- rioxarray
- xarray
- pyproj
- geopandas
- pandas
- tqdm
- scipy
- haversine
- shapely
- cdsapi
- matplotlib
- seaborn
- jupyterlab
- dask
- scikit-image

## Figures

Figure 1. Subsetting candidate regions before applying AR criteria.
Upper left: IVT magnitude of an individual timestep calculated from 6hr ERA5 data.
Upper right: 90th percentile of IVT magnitude, calculated across all years in the dataset.
Lower left: IVT magnitude after applying a threshold of 90th percentile IVT magnitude.
Lower right: Contiguous regions identified and labeled after applying threshold. Each individual region will be evaluated against the AR criteria defined in `config.py`.

![AR thresholding figure](ar_thresholding.png)


Figure 2. Flowchart of AR detection processing.


![Flowchart figure](flowchart.png)
171 changes: 103 additions & 68 deletions ar_detection.py
@@ -1,11 +1,16 @@
"""This module will threshold the IVT Dataset and identify AR candidates. Each candidate is tested to determine if it meets the various criteria for AR classification. Candidate areas are converted to polygons and exported as a shapefile."""
import math
# ignore future warnings, or they will print tens of thousands of times in the terminal
import warnings

warnings.simplefilter(action="ignore", category=FutureWarning)

import math
import xarray as xr
import numpy as np
import pyproj
import geopandas as gpd
import pandas as pd

from tqdm import tqdm
from scipy.ndimage import label, generate_binary_structure
from skimage.measure import regionprops
@@ -31,7 +36,7 @@
landfall_events_csv,
coastal_impact_shp,
coastal_impact_csv,
log_fp
log_fp,
)


@@ -337,6 +342,7 @@ def get_directional_coherence(blob):
pct_coherent = round((pixel_count / total_pixels) * 100)
return blob.label, pct_coherent, mean_reference_deg


def get_total_ivt_strength(blob):
"""
Computes the total strength of a labeled region as the regional sum of IVT.
@@ -368,7 +374,7 @@ def get_relative_ivt_strength(blob):
tuple
(label of the region, relative IVT strength)
"""
return blob.label, round(blob.image_intensity.sum()/blob.area)
return blob.label, round(blob.image_intensity.sum() / blob.area)


def get_total_ivt_strength(blob):
@@ -733,7 +739,7 @@ def landfall_ars_export(
landfall_events_shp,
landfall_events_csv,
coastal_impact_shp,
coastal_impact_csv
coastal_impact_csv,
):
"""Filter the raw AR detection shapefile output to include only ARs making landfall in Alaska, and export the result to a new shapefile. Process the landfall ARs to condense adjacent dates into event multipolygons, and export the result to a second new shapefile. For both outputs, include a CSV with column names and descriptions.
@@ -859,7 +865,7 @@ def circ_mean(x):
# round results
res = res.round(
{
"ratio_m": 0,
"ratio_m": 1,
"len_km_m": 0,
"orient_m": 0,
"poleward_m": 0,
@@ -910,83 +916,107 @@ def circ_mean(x):
landfall_events_csv, header=["shp_col", "desc"], index=False
)

del(cols, desc, csv_dict)
del (cols, desc, csv_dict)

#### FIND COASTAL IMPACT POINTS FROM AR IVT AXIS LINE / AK COASTLINE INTERSECTION:
#create simple line to represent southern AK coast boundary, extended to Seattle capture Canadian coastline
lonlats = {173:52.5, 178.5:51.5, -176:52, -173:52, -164:54, -153:57, -151:59, -147:60, -141:60, -137:58, -133:55, -131:52, -125:48}
df = pd.DataFrame.from_dict(lonlats, orient='index').reset_index().rename(columns={'index':'X', 0:'Y'})
#zip the coords into a point object and convert to gdf
# create simple line to represent southern AK coast boundary, extended to Seattle to capture the Canadian coastline
lonlats = {
173: 52.5,
178.5: 51.5,
-176: 52,
-173: 52,
-164: 54,
-153: 57,
-151: 59,
-147: 60,
-141: 60,
-137: 58,
-133: 55,
-131: 52,
-125: 48,
}
df = (
pd.DataFrame.from_dict(lonlats, orient="index")
.reset_index()
.rename(columns={"index": "X", 0: "Y"})
)
# zip the coords into a point object and convert to gdf
geometry = [Point(xy) for xy in zip(df.X, df.Y)]
geo_df = gpd.GeoDataFrame(df, geometry=geometry)
geo_df['id'] = 'ak_coast'
#use the points in order to create a linestring
geo_df2 = geo_df.groupby('id')['geometry'].apply(lambda x: LineString(x.tolist()))
#convert to geodataframe and add CRS, then convert CRS to 3338; the planar projection is required here to handle coastline overlapping the meridian!
ak_coast = gpd.GeoDataFrame(geo_df2, geometry='geometry', crs=events.crs)
ak_coast = ak_coast.to_crs('epsg:3338')

#copy events to new gdf, resetting geomtery to centroid
#round-trip thru 3338 to use planar centroid (more accurate)
geo_df["id"] = "ak_coast"
# use the points in order to create a linestring
geo_df2 = geo_df.groupby("id")["geometry"].apply(lambda x: LineString(x.tolist()))
# convert to geodataframe and add CRS, then convert CRS to 3338; the planar projection is required here to handle coastline overlapping the meridian!
ak_coast = gpd.GeoDataFrame(geo_df2, geometry="geometry", crs=events.crs)
ak_coast = ak_coast.to_crs("epsg:3338")

# copy events to new gdf, resetting geometry to centroid
# round-trip thru 3338 to use planar centroid (more accurate)
e = events.copy()
e['geom_cent'] = e.to_crs('epsg:3338').geometry.centroid.to_crs(e.crs)
e = e.set_geometry('geom_cent')
e.drop('geometry', axis=1, inplace=True)
e["geom_cent"] = e.to_crs("epsg:3338").geometry.centroid.to_crs(e.crs)
e = e.set_geometry("geom_cent")
e.drop("geometry", axis=1, inplace=True)

#create new standalone x/y columns for zipping
e['xo'], e['yo'] = e.geom_cent.x, e.geom_cent.y
# create new standalone x/y columns for zipping
e["xo"], e["yo"] = e.geom_cent.x, e.geom_cent.y

#set distance for axis line extension from centroid (m)
# set distance for axis line extension from centroid (m)
dist = 3000000

#get mean direction of IVT
# get mean direction of IVT
o = e.mean_dir_m
#or option to use mean orientation of shape
#o = e.orient_m
# or option to use mean orientation of shape
# o = e.orient_m

#create geoid for great circle distance
#compute endpoints using forward azimuth and distance, and write to new gdf columns
g = pyproj.Geod(ellps='WGS84')
# create geoid for great circle distance
# compute endpoints using forward azimuth and distance, and write to new gdf columns
g = pyproj.Geod(ellps="WGS84")
f = [g.fwd(xo, yo, o, dist) for xo, yo, o in zip(e.xo, e.yo, o)]
e['x1'] = [lon[0] for lon in f]
e['y1'] = [lat[1] for lat in f]
#compute WGS84 endpoints using backward azimuth and distance, and write to new gdf columns
e["x1"] = [lon[0] for lon in f]
e["y1"] = [lat[1] for lat in f]
# compute WGS84 endpoints using backward azimuth and distance, and write to new gdf columns
b = [g.fwd(xo, yo, o, dist) for xo, yo, o in zip(e.xo, e.yo, ((o - 180) % 360))]
e['x2'] = [lon[0] for lon in b]
e['y2'] = [lat[1] for lat in b]
#create line feature running between endpoints and thru centroid
e['geom_line'] = [LineString([[x1, y1], [xo, yo], [x2, y2]]) for x1, y1, xo, yo, x2, y2 in zip(e.x1, e.y1, e.xo, e.yo, e.x2, e.y2)]
#create a new gdf and set line feature as geometry, then convert to 3338 for intersection function
line_gdf = gpd.GeoDataFrame(e, geometry='geom_line', crs='epsg:4326')
line_gdf = line_gdf.to_crs('epsg:3338')
e["x2"] = [lon[0] for lon in b]
e["y2"] = [lat[1] for lat in b]
# create line feature running between endpoints and thru centroid
e["geom_line"] = [
LineString([[x1, y1], [xo, yo], [x2, y2]])
for x1, y1, xo, yo, x2, y2 in zip(e.x1, e.y1, e.xo, e.yo, e.x2, e.y2)
]
# create a new gdf and set line feature as geometry, then convert to 3338 for intersection function
line_gdf = gpd.GeoDataFrame(e, geometry="geom_line", crs="epsg:4326")
line_gdf = line_gdf.to_crs("epsg:3338")

pt_dfs = []

for index, row in line_gdf.iterrows():
#get row event id
i = row['event_id']
#get GeoSeries of intersections of row AK coast
r = line_gdf.iloc[index]['geom_line'].intersection(ak_coast)
#keep only points and convert to a gdf, default 'geometry' column is created by gdf.intersection()
rpts_gdf = gpd.GeoDataFrame(r[r.geom_type == 'Point'])
rpts_gdf.set_geometry(col='geometry', crs=line_gdf.crs, inplace=True)
#add event id and merge with original axis line gdf attributes, and add result to list of dataframes
rpts_gdf['event_id'] = i
rpts_gdf = rpts_gdf.merge(line_gdf, how='left', left_on='event_id', right_on='event_id')
# get row event id
i = row["event_id"]
# get GeoSeries of intersections of the row's axis line with the AK coast
r = line_gdf.iloc[index]["geom_line"].intersection(ak_coast)
# keep only points and convert to a gdf, default 'geometry' column is created by gdf.intersection()
rpts_gdf = gpd.GeoDataFrame(r[r.geom_type == "Point"])
rpts_gdf.set_geometry(col="geometry", crs=line_gdf.crs, inplace=True)
# add event id and merge with original axis line gdf attributes, and add result to list of dataframes
rpts_gdf["event_id"] = i
rpts_gdf = rpts_gdf.merge(
line_gdf, how="left", left_on="event_id", right_on="event_id"
)
pt_dfs.append(rpts_gdf)

#concatenate event id point intersections and convert back to 4326 for export
# concatenate event id point intersections and convert back to 4326 for export
gdf_ak_int = pd.concat(pt_dfs)
gdf_ak_int = gdf_ak_int.to_crs('epsg:4326')
#drop all columns except event id and intersection point geometry
gdf_ak_int = gdf_ak_int[['event_id', 'geometry']]
gdf_ak_int = gdf_ak_int.to_crs("epsg:4326")
# drop all columns except event id and intersection point geometry
gdf_ak_int = gdf_ak_int[["event_id", "geometry"]]
# export coastal intersection point geodataframe to shp
gdf_ak_int.to_file(coastal_impact_shp, index=False)
# set up AR events column description table
cols = gdf_ak_int.columns.to_list()
desc = [
"unique AR event ID",
"geometry string for AR event coastal intersection point"]
"geometry string for AR event coastal intersection point",
]
csv_dict = dict(zip(cols, desc))
# export event AR column description table to csv
pd.DataFrame.from_dict(data=csv_dict, orient="index").reset_index().to_csv(
@@ -1016,16 +1046,21 @@ def create_log(start_year, end_year, bbox, ar_params, log_fp):
None
"""
now = datetime.now()
d = {'processing_date_time' : str(now),
'start year' : start_year,
'end year' : end_year,
'lon_1' : bbox[1],
'lon_2' : bbox[3],
'lat_1' : bbox[0],
'lat_2' : bbox[2]
}
d = {
"processing_date_time": str(now),
"start year": start_year,
"end year": end_year,
"lon_1": bbox[1],
"lon_2": bbox[3],
"lat_1": bbox[0],
"lat_2": bbox[2],
}
d.update(ar_params)
df = pd.DataFrame.from_dict(d, orient='index').reset_index().rename(columns={'index':'config_property', 0:'value'})
df = (
pd.DataFrame.from_dict(d, orient="index")
.reset_index()
.rename(columns={"index": "config_property", 0: "value"})
)
df.to_csv(log_fp, index=False)


@@ -1041,7 +1076,7 @@ def detect_all_ars(
landfall_events_csv,
coastal_impact_shp,
coastal_impact_csv,
log_fp
log_fp,
):
"""Run the entire AR detection pipeline and generate shapefile output.
@@ -1099,7 +1134,7 @@ def detect_all_ars(
landfall_events_shp,
landfall_events_csv,
coastal_impact_shp,
coastal_impact_csv
coastal_impact_csv,
)
create_log(start_year, end_year, bbox, ar_params, log_fp)

@@ -1201,7 +1236,7 @@ def detect_all_ars(
landfall_events_csv=args.output_events_csv,
coastal_impact_shp=args.output_coastal_shapefile,
coastal_impact_csv=args.output_coastal_csv,
log_fp=args.output_log_csv
log_fp=args.output_log_csv,
)

print(
Binary file added ar_thresholding.png
