
Commit

Clean production run (#22)
* update readme

* update flowchart

* re-export env.yml, set 1980 in config

* ignore FutureWarning, clean mamba env, rerun nbs

* black formatting

* fixed count_ars mean calc, rerun QC nb

* change rounding for ratio_m field of events shapefile

* suppress warnings

---------

Co-authored-by: jdpaul3 <[email protected]>
Co-authored-by: kyleredilla <[email protected]>
Co-authored-by: jdpaul3 <[email protected]>
4 people authored Sep 16, 2023
1 parent 19292fa commit 1d55a45
Showing 12 changed files with 1,403 additions and 656 deletions.
94 changes: 51 additions & 43 deletions AR_QC.ipynb

Large diffs are not rendered by default.

562 changes: 460 additions & 102 deletions AR_avalanche_exploration.ipynb

Large diffs are not rendered by default.

262 changes: 176 additions & 86 deletions AR_detection.ipynb

Large diffs are not rendered by default.

46 changes: 40 additions & 6 deletions README.md
@@ -1,9 +1,9 @@
# Atmospheric Rivers and Avalanches

## Background
This codebase will download [ERA5 6 hourly pressure level data](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-pressure-levels?tab=overview) in the vicinity of Alaska and apply an atmospheric river (AR) detection algorithm. Outputs will include a datacube of AR objects and multiple attributed shapefiles for future work correlating avalanche events to ARs.
This codebase will download [ERA5 6 hourly pressure level data](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-pressure-levels?tab=overview) in the vicinity of Alaska and apply an atmospheric river (AR) detection algorithm. Outputs include multiple attributed shapefiles for future work correlating avalanche events (or other phenomena) to ARs.

The AR detection algorithm used here is adapted from [Guan & Waliser (2015)](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2015JD024257) and uses a combination of vertically integrated water vapor transport (IVT), geometric shape, and directional criteria to define ARs. See the annotated bibilography document for more detail and other references. Users of this codebase should know the following information about atmospheric river criteria:
The AR detection algorithm used here is adapted from [Guan & Waliser (2015)](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2015JD024257) and uses a combination of vertically integrated water vapor transport (IVT), geometric shape, and directional criteria to define ARs. See the annotated bibliography document for more detail and other references. Users of this codebase should know the following information about AR criteria:

All of the AR criteria proposed by Guan and Waliser (2015) either judge the geometry of the candidate AR object and/or measure some parameter of integrated water vapor transport (magnitude, direction, and poleward component). Each candidate AR object represents a specific time (six hour resolution) slice of the IVT data where contiguous groups of grid cells in which the magnitude exceeds the target quantile are labeled with unique integers >=1. The 0 value is reserved to mask the region that did not exceed the IVT quantile threshold.
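
A minimal sketch of this threshold-and-label step is below, assuming a 2D IVT magnitude array and a precomputed quantile grid. The array names, placeholder values, and connectivity choice are illustrative only, not copied from `ar_detection.py`.

```python
# Sketch only: the threshold-and-label step described above, on placeholder data.
import numpy as np
from scipy.ndimage import label, generate_binary_structure

ivt_mag = np.random.rand(50, 80) * 600.0   # hypothetical IVT magnitude, one 6-hour timestep
ivt_q90 = np.full((50, 80), 250.0)         # hypothetical 90th-percentile grid (all years)

# Cells at or below the quantile threshold stay 0; exceeding cells form candidate regions
exceeds = ivt_mag > ivt_q90

# Label contiguous groups of exceeding cells with unique integers >= 1
structure = generate_binary_structure(2, 2)  # 8-connectivity, chosen here for illustration
labeled, n_candidates = label(exceeds, structure=structure)

print(f"{n_candidates} candidate AR objects at this timestep")
```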

@@ -27,12 +27,46 @@ The overall orientation of the object (i.e., the direction of the shape elongati
- The `ar_detection.py` module contains a collection of AR detection functions that will filter AR candidates based on the criteria and create shapefile outputs of objects classified as ARs. These functions may be orchestrated from a notebook (see `AR_detection.ipynb`) or run as a script.
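
As a rough illustration of that per-candidate filtering, a hedged sketch using `skimage.measure.regionprops` follows; the threshold values and the particular geometric checks shown are placeholders, not the criteria defined in `config.py`.

```python
# Sketch only: filter labeled candidate regions by simple geometric criteria.
# The thresholds below are illustrative; the project's real criteria live in config.py.
import numpy as np
from scipy.ndimage import label
from skimage.measure import regionprops

ivt_mag = np.random.rand(50, 80) * 600.0        # hypothetical IVT magnitude field
labeled, _ = label(ivt_mag > 250.0)              # hypothetical fixed threshold for this demo

MIN_AREA_CELLS = 40     # discard very small blobs (placeholder value)
MIN_AXIS_RATIO = 2.0    # keep only elongated shapes (placeholder value)

ar_labels = []
for blob in regionprops(labeled, intensity_image=ivt_mag):
    ratio = blob.major_axis_length / max(blob.minor_axis_length, 1e-6)
    if blob.area >= MIN_AREA_CELLS and ratio >= MIN_AXIS_RATIO:
        ar_labels.append(blob.label)

print("labels passing the illustrative geometry checks:", ar_labels)
```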

## Usage
1. Register for a Climate Data Store (CDS) account and install the CDS API client according to the instructions [here].(https://cds.climate.copernicus.eu/api-how-to). Be sure to accept the user agreement.
2. Create a new conda environment using the `environment.yml` file in this codebase.
3. Set a local environment variable defining the data directory. If the directory does not yet exist, `config.py` will create it, e.g. `export AR_DATA_DIR=path/to/store/ar/data`
1. Register for a Climate Data Store (CDS) account and install the CDS API client according to the instructions [here](https://cds.climate.copernicus.eu/api-how-to). Be sure to accept the user agreement.
2. Create a new conda environment using the `environment.yml` file in this codebase, or install packages manually from the list below. (If you run into problems building the environment using conda, or experience errors when running `ar_detection.py`, we recommend trying [mamba](https://github.com/mamba-org/mamba) to build your environment before attempting to debug any code.)
3. Set a local environment variable defining the data directory (e.g. `export AR_DATA_DIR=path/to/store/ar/data`).
5. Review parameters in `config.py` and adjust if desired. Note that there is a download request limit of 120,000 items, so adjusting the timestep or date range may overload the request and break the script.
6. Execute `download.py`
7. Execute `compute_ivt.py`
8. Execute `ar_detection.py` or use `AR_detection.ipynb` to orchestrate the detection.
9. Examine the results using the `AR_QC.ipynb` notebook.
10. Explore spatiotemporal relationships between avalanches and AR events using the `AR_avalanche_exploration.ipynb` notebook.
10. Explore spatiotemporal relationships between avalanches and AR events using the `AR_avalanche_exploration.ipynb` notebook.
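
Step 3 above sets `AR_DATA_DIR`; the following is a hypothetical sketch of how a config module might read that variable and create the directory if it does not exist. This is an assumption for illustration, not the contents of `config.py`.

```python
# Hypothetical sketch: reading AR_DATA_DIR and creating the directory if needed.
# See the actual config.py in this repository for its real behavior.
import os
from pathlib import Path

data_dir = Path(os.environ["AR_DATA_DIR"])    # raises KeyError if the variable is not set
data_dir.mkdir(parents=True, exist_ok=True)   # create the directory tree if it does not yet exist
```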

## Packages
- rasterio
- rioxarray
- xarray
- pyproj
- geopandas
- pandas
- tqdm
- scipy
- haversine
- shapely
- cdsapi
- matplotlib
- seaborn
- jupyterlab
- dask
- scikit-image

## Figures

Figure 1. Subsetting candidate regions before applying AR criteria.
Upper left: IVT magnitude of an individual timestep calculated from 6hr ERA5 data.
Upper right: 90th percentile of IVT magnitude, calculated across all years in the dataset.
Lower left: IVT magnitude after applying a threshold of 90th percentile IVT magnitude.
Lower right: Contiguous regions identified and labeled after applying threshold. Each individual region will be evaluated against the AR criteria defined in `config.py`.

![AR thresholding figure](ar_thresholding.png)


Figure 2. Flowchart of AR detection processing.


![Flowchart figure](flowchart.png)
171 changes: 103 additions & 68 deletions ar_detection.py
@@ -1,11 +1,16 @@
"""This module will threshold the IVT Dataset and identify AR candidates. Each candidate is tested to determine if it meets the various criteria for AR classification. Candidate areas are converted to polygons and exported as a shapefile."""
import math
# ignore future warnings, or they will print tens of thousands of times in the terminal
import warnings

warnings.simplefilter(action="ignore", category=FutureWarning)

import math
import xarray as xr
import numpy as np
import pyproj
import geopandas as gpd
import pandas as pd

from tqdm import tqdm
from scipy.ndimage import label, generate_binary_structure
from skimage.measure import regionprops
@@ -31,7 +36,7 @@
landfall_events_csv,
coastal_impact_shp,
coastal_impact_csv,
log_fp
log_fp,
)


@@ -337,6 +342,7 @@ def get_directional_coherence(blob):
pct_coherent = round((pixel_count / total_pixels) * 100)
return blob.label, pct_coherent, mean_reference_deg


def get_total_ivt_strength(blob):
"""
Computes the total strength of a labeled region as the regional sum of IVT.
@@ -368,7 +374,7 @@ def get_relative_ivt_strength(blob):
tuple
(label of the region, relative IVT strength)
"""
return blob.label, round(blob.image_intensity.sum()/blob.area)
return blob.label, round(blob.image_intensity.sum() / blob.area)


def get_total_ivt_strength(blob):
@@ -733,7 +739,7 @@ def landfall_ars_export(
landfall_events_shp,
landfall_events_csv,
coastal_impact_shp,
coastal_impact_csv
coastal_impact_csv,
):
"""Filter the raw AR detection shapefile output to include only ARs making landfall in Alaska, and export the result to a new shapefile. Process the landfall ARs to condense adjacent dates into event multipolygons, and export the result to a second new shapefile. For both outputs, include a CSV with column names and descriptions.
@@ -859,7 +865,7 @@ def circ_mean(x):
# round results
res = res.round(
{
"ratio_m": 0,
"ratio_m": 1,
"len_km_m": 0,
"orient_m": 0,
"poleward_m": 0,
@@ -910,83 +916,107 @@ def circ_mean(x):
landfall_events_csv, header=["shp_col", "desc"], index=False
)

del(cols, desc, csv_dict)
del (cols, desc, csv_dict)

#### FIND COASTAL IMPACT POINTS FROM AR IVT AXIS LINE / AK COASTLINE INTERSECTION:
#create simple line to represent southern AK coast boundary, extended to Seattle capture Canadian coastline
lonlats = {173:52.5, 178.5:51.5, -176:52, -173:52, -164:54, -153:57, -151:59, -147:60, -141:60, -137:58, -133:55, -131:52, -125:48}
df = pd.DataFrame.from_dict(lonlats, orient='index').reset_index().rename(columns={'index':'X', 0:'Y'})
#zip the coords into a point object and convert to gdf
# create simple line to represent southern AK coast boundary, extended to Seattle to capture the Canadian coastline
lonlats = {
173: 52.5,
178.5: 51.5,
-176: 52,
-173: 52,
-164: 54,
-153: 57,
-151: 59,
-147: 60,
-141: 60,
-137: 58,
-133: 55,
-131: 52,
-125: 48,
}
df = (
pd.DataFrame.from_dict(lonlats, orient="index")
.reset_index()
.rename(columns={"index": "X", 0: "Y"})
)
# zip the coords into a point object and convert to gdf
geometry = [Point(xy) for xy in zip(df.X, df.Y)]
geo_df = gpd.GeoDataFrame(df, geometry=geometry)
geo_df['id'] = 'ak_coast'
#use the points in order to create a linestring
geo_df2 = geo_df.groupby('id')['geometry'].apply(lambda x: LineString(x.tolist()))
#convert to geodataframe and add CRS, then convert CRS to 3338; the planar projection is required here to handle coastline overlapping the meridian!
ak_coast = gpd.GeoDataFrame(geo_df2, geometry='geometry', crs=events.crs)
ak_coast = ak_coast.to_crs('epsg:3338')

#copy events to new gdf, resetting geomtery to centroid
#round-trip thru 3338 to use planar centroid (more accurate)
geo_df["id"] = "ak_coast"
# use the points in order to create a linestring
geo_df2 = geo_df.groupby("id")["geometry"].apply(lambda x: LineString(x.tolist()))
# convert to geodataframe and add CRS, then convert CRS to 3338; the planar projection is required here to handle coastline overlapping the meridian!
ak_coast = gpd.GeoDataFrame(geo_df2, geometry="geometry", crs=events.crs)
ak_coast = ak_coast.to_crs("epsg:3338")

# copy events to new gdf, resetting geometry to centroid
# round-trip thru 3338 to use planar centroid (more accurate)
e = events.copy()
e['geom_cent'] = e.to_crs('epsg:3338').geometry.centroid.to_crs(e.crs)
e = e.set_geometry('geom_cent')
e.drop('geometry', axis=1, inplace=True)
e["geom_cent"] = e.to_crs("epsg:3338").geometry.centroid.to_crs(e.crs)
e = e.set_geometry("geom_cent")
e.drop("geometry", axis=1, inplace=True)

#create new standalone x/y columns for zipping
e['xo'], e['yo'] = e.geom_cent.x, e.geom_cent.y
# create new standalone x/y columns for zipping
e["xo"], e["yo"] = e.geom_cent.x, e.geom_cent.y

#set distance for axis line extension from centroid (m)
# set distance for axis line extension from centroid (m)
dist = 3000000

#get mean direction of IVT
# get mean direction of IVT
o = e.mean_dir_m
#or option to use mean orientation of shape
#o = e.orient_m
# or option to use mean orientation of shape
# o = e.orient_m

#create geoid for great circle distance
#compute endpoints using forward azimuth and distance, and write to new gdf columns
g = pyproj.Geod(ellps='WGS84')
# create geoid for great circle distance
# compute endpoints using forward azimuth and distance, and write to new gdf columns
g = pyproj.Geod(ellps="WGS84")
f = [g.fwd(xo, yo, o, dist) for xo, yo, o in zip(e.xo, e.yo, o)]
e['x1'] = [lon[0] for lon in f]
e['y1'] = [lat[1] for lat in f]
#compute WGS84 endpoints using backward azimuth and distance, and write to new gdf columns
e["x1"] = [lon[0] for lon in f]
e["y1"] = [lat[1] for lat in f]
# compute WGS84 endpoints using backward azimuth and distance, and write to new gdf columns
b = [g.fwd(xo, yo, o, dist) for xo, yo, o in zip(e.xo, e.yo, ((o - 180) % 360))]
e['x2'] = [lon[0] for lon in b]
e['y2'] = [lat[1] for lat in b]
#create line feature running between endpoints and thru centroid
e['geom_line'] = [LineString([[x1, y1], [xo, yo], [x2, y2]]) for x1, y1, xo, yo, x2, y2 in zip(e.x1, e.y1, e.xo, e.yo, e.x2, e.y2)]
#create a new gdf and set line feature as geometry, then convert to 3338 for intersection function
line_gdf = gpd.GeoDataFrame(e, geometry='geom_line', crs='epsg:4326')
line_gdf = line_gdf.to_crs('epsg:3338')
e["x2"] = [lon[0] for lon in b]
e["y2"] = [lat[1] for lat in b]
# create line feature running between endpoints and thru centroid
e["geom_line"] = [
LineString([[x1, y1], [xo, yo], [x2, y2]])
for x1, y1, xo, yo, x2, y2 in zip(e.x1, e.y1, e.xo, e.yo, e.x2, e.y2)
]
# create a new gdf and set line feature as geometry, then convert to 3338 for intersection function
line_gdf = gpd.GeoDataFrame(e, geometry="geom_line", crs="epsg:4326")
line_gdf = line_gdf.to_crs("epsg:3338")

pt_dfs = []

for index, row in line_gdf.iterrows():
#get row event id
i = row['event_id']
#get GeoSeries of intersections of row AK coast
r = line_gdf.iloc[index]['geom_line'].intersection(ak_coast)
#keep only points and convert to a gdf, default 'geometry' column is created by gdf.intersection()
rpts_gdf = gpd.GeoDataFrame(r[r.geom_type == 'Point'])
rpts_gdf.set_geometry(col='geometry', crs=line_gdf.crs, inplace=True)
#add event id and merge with original axis line gdf attributes, and add result to list of dataframes
rpts_gdf['event_id'] = i
rpts_gdf = rpts_gdf.merge(line_gdf, how='left', left_on='event_id', right_on='event_id')
# get row event id
i = row["event_id"]
# get GeoSeries of intersections of the row's axis line with the AK coast
r = line_gdf.iloc[index]["geom_line"].intersection(ak_coast)
# keep only points and convert to a gdf, default 'geometry' column is created by gdf.intersection()
rpts_gdf = gpd.GeoDataFrame(r[r.geom_type == "Point"])
rpts_gdf.set_geometry(col="geometry", crs=line_gdf.crs, inplace=True)
# add event id and merge with original axis line gdf attributes, and add result to list of dataframes
rpts_gdf["event_id"] = i
rpts_gdf = rpts_gdf.merge(
line_gdf, how="left", left_on="event_id", right_on="event_id"
)
pt_dfs.append(rpts_gdf)

#concatenate event id point intersections and convert back to 4326 for export
# concatenate event id point intersections and convert back to 4326 for export
gdf_ak_int = pd.concat(pt_dfs)
gdf_ak_int = gdf_ak_int.to_crs('epsg:4326')
#drop all columns except event id and intersection point geometry
gdf_ak_int = gdf_ak_int[['event_id', 'geometry']]
gdf_ak_int = gdf_ak_int.to_crs("epsg:4326")
# drop all columns except event id and intersection point geometry
gdf_ak_int = gdf_ak_int[["event_id", "geometry"]]
# export coastal intersection point geodataframe to shp
gdf_ak_int.to_file(coastal_impact_shp, index=False)
# set up AR events column description table
cols = gdf_ak_int.columns.to_list()
desc = [
"unique AR event ID",
"geometry string for AR event coastal intersection point"]
"geometry string for AR event coastal intersection point",
]
csv_dict = dict(zip(cols, desc))
# export event AR column description table to csv
pd.DataFrame.from_dict(data=csv_dict, orient="index").reset_index().to_csv(
@@ -1016,16 +1046,21 @@ def create_log(start_year, end_year, bbox, ar_params, log_fp):
None
"""
now = datetime.now()
d = {'processing_date_time' : str(now),
'start year' : start_year,
'end year' : end_year,
'lon_1' : bbox[1],
'lon_2' : bbox[3],
'lat_1' : bbox[0],
'lat_2' : bbox[2]
}
d = {
"processing_date_time": str(now),
"start year": start_year,
"end year": end_year,
"lon_1": bbox[1],
"lon_2": bbox[3],
"lat_1": bbox[0],
"lat_2": bbox[2],
}
d.update(ar_params)
df = pd.DataFrame.from_dict(d, orient='index').reset_index().rename(columns={'index':'config_property', 0:'value'})
df = (
pd.DataFrame.from_dict(d, orient="index")
.reset_index()
.rename(columns={"index": "config_property", 0: "value"})
)
df.to_csv(log_fp, index=False)


@@ -1041,7 +1076,7 @@ def detect_all_ars(
landfall_events_csv,
coastal_impact_shp,
coastal_impact_csv,
log_fp
log_fp,
):
"""Run the entire AR detection pipeline and generate shapefile output.
@@ -1099,7 +1134,7 @@ def detect_all_ars(
landfall_events_shp,
landfall_events_csv,
coastal_impact_shp,
coastal_impact_csv
coastal_impact_csv,
)
create_log(start_year, end_year, bbox, ar_params, log_fp)

@@ -1201,7 +1236,7 @@ def detect_all_ars(
landfall_events_csv=args.output_events_csv,
coastal_impact_shp=args.output_coastal_shapefile,
coastal_impact_csv=args.output_coastal_csv,
log_fp=args.output_log_csv
log_fp=args.output_log_csv,
)

print(
Binary file added ar_thresholding.png
