Issue 44 #101

Merged: 12 commits, Dec 12, 2024
91 changes: 79 additions & 12 deletions README.md
@@ -42,18 +42,85 @@ this can be accomplished are detailed in the **AWS Credentials** section below.

## Assumptions

- Checksums are all SHA256
- In the data files to be ingested:
- The global attribute "date_modified" exists and will be used to represent
the production date and time.
- Global attributes "time_coverage_start" and "time_coverage_end" exist and
will be used for the time range metadata values.
- Only one coordinate system is used by all variables (i.e. only one grid mapping variable is present in a file)
- (x[0],y[0]) represents the upper left corner of the spatial coverage.
- x,y coordinates represent the center of the pixel
- The grid mapping variable contains a GeoTransform attribute (which defines the pixel size), and
can be used to determine the padding added to x and y values.
- Date/time strings can be parsed using `datetime.fromisoformat`
* Checksums are all SHA256
* NetCDF files have an extension of `.nc` (required by CF conventions)
* (x[0],y[0]) represents the upper left corner of the spatial coverage.
* x and y coordinate values represent the center of the pixel
* Date/time strings can be parsed using `datetime.fromisoformat`
* Only one coordinate system is used by all data variables (i.e. only one grid
mapping variable is present in a file)
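
The `datetime.fromisoformat` assumption can be checked before ingest. The following is a minimal sketch, not MetGenC's own code: `parse_timestamp` is a hypothetical helper, and treating a value without an offset as UTC is an assumption made here for illustration only.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string with datetime.fromisoformat.

    Hypothetical helper: values without a UTC offset are assumed to be UTC.
    """
    # Python versions before 3.11 reject a trailing "Z", so rewrite it
    # as an explicit offset before parsing.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption for this sketch only: naive timestamps are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

A string that raises `ValueError` here would violate the assumption and would need to be normalized before it is used for `date_modified`, `time_coverage_start`, or `time_coverage_end`.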

### Reference links

* https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3
* https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html

### NetCDF Attributes Used to Populate UMM-G

- **Required**: required
- **RequiredC**: conditionally required
- **R+**: highly or strongly recommended
- **R**: recommended
- **S**: suggested

| Attribute in use (location) | ACDD | CF Conventions | NSIDC Guidelines | Note |
| ----------------------------- | ---- | -------------- | ---------------- | ------- |
| date_modified (global) | S | | R | 1 |
| time_coverage_start (global) | R | | R | 2 |
| time_coverage_end (global) | R | | R | 2 |
| crs_wkt (`crs` variable) | | | R | 3 |
| GeoTransform (`crs` variable) | | | R | 4 |
| data (`x` variable) | | | R | 5 |
| data (`y` variable) | | | R | 6 |


| Attributes not currently used | ACDD | CF Conventions | NSIDC Guidelines | Note     |
| ----------------------------- | ---- | -------------- | ---------------- | -------- |
| Conventions (global) | R+ | Required | R | |
| standard_name (variable) | R+ | R+ | | |
| grid_mapping (data variable) | | RequiredC | R+ | 7 |
| grid_mapping_name (variable) | | RequiredC | R+ | 7 |
| `projection_x_coordinate` standard name (variable) | | RequiredC | | 8 |
| `projection_y_coordinate` standard name (variable) | | RequiredC | | 9 |
| axis (variable) | | R | | 8, 9 |
| geospatial_bounds (global) | R | | R | |
| geospatial_bounds_crs (global)| R | | R | |
| geospatial_lat_min (global) | R | | R | |
| geospatial_lat_max (global) | R | | R | |
| geospatial_lat_units (global) | R | | R | |
| geospatial_lon_min (global) | R | | R | |
| geospatial_lon_max (global) | R | | R | |
| geospatial_lon_units (global) | R | | R | |

Notes:
1. Used to populate the production date and time values in UMM-G output.
2. Used to populate the time begin and end UMM-G values.
3. The `crs_wkt` ("well known text") value is handed to the
`CRS` and `Transformer` classes in `pyproj` to conveniently deal
with the reprojection of (y,x) values to EPSG 4326 (lon, lat) values.
4. The `GeoTransform` value provides the pixel size per data value, which is then used
to calculate the padding added to x and y values to create a GPolygon enclosing all
of the data.
5. The `x` coordinate variable values are reprojected and thinned to create a GPolygon.
6. The `y` coordinate variable values are reprojected and thinned to create a GPolygon.
7. A grid mapping variable is required if the horizontal spatial coordinates are not
longitude and latitude and the intent of the data provider is to geolocate
the data. `grid_mapping` and `grid_mapping_name` allow programmatic identification of
the variable holding information about the horizontal coordinate reference system.
`metgenc` code currently assumes a variable named `crs` exists with grid
information. **TODO:** Identify the coordinate reference system variable by
looking for the `grid_mapping_name` or `grid_mapping` attribute.
8. `metgenc` code currently assumes a coordinate variable `x` exists whose
data values represent spatial information in meters.
**TODO:** Identify the x-axis coordinate variable by looking for the `standard_name`
attribute with a value of `projection_x_coordinate`, or an `axis` attribute with
the value `X`, rather than assuming the variable is named `x`.
9. `metgenc` code currently assumes a coordinate variable `y` exists whose
data values represent spatial information in meters.
**TODO:** Identify the y-axis coordinate variable by looking for the `standard_name`
attribute with a value of `projection_y_coordinate`, or an `axis` attribute with
the value `Y`, rather than assuming the variable is named `y`.
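
The padding described in note 4 can be sketched as follows. This is an illustration under stated assumptions, not the `metgenc` implementation: it assumes the `GeoTransform` attribute is a space-separated string in GDAL order (x origin, pixel width, row rotation, y origin, column rotation, pixel height), and `pixel_size` and `padded_extent` are hypothetical names.

```python
from typing import Sequence, Tuple


def pixel_size(geotransform: str) -> Tuple[float, float]:
    """Return (width, height) of a pixel from a GDAL-style GeoTransform string."""
    values = [float(v) for v in geotransform.split()]
    if len(values) != 6:
        raise ValueError("GeoTransform must contain six numeric values")
    # Pixel height is negative for north-up grids, so take absolute values.
    return abs(values[1]), abs(values[5])


def padded_extent(
    x: Sequence[float], y: Sequence[float], geotransform: str
) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) of the outer pixel edges.

    x and y hold pixel-center coordinates, so each side is padded
    by half a pixel to enclose all of the data.
    """
    width, height = pixel_size(geotransform)
    half_w, half_h = width / 2, height / 2
    return (min(x) - half_w, min(y) - half_h, max(x) + half_w, max(y) + half_h)
```

For a hypothetical 25 km grid with centers at `x = [-3837500, -3812500, -3787500]` and `y = [5837500, 5812500]`, the padded extent reaches the outer pixel edges at -3850000 and 5850000, matching the upper-left corner given in the `GeoTransform` origin.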


## Installing MetGenC

6 changes: 3 additions & 3 deletions src/nsidc/metgen/cli.py
@@ -4,7 +4,7 @@

from nsidc.metgen import config, constants, metgen

-LOGGER = logging.getLogger("metgenc")
+LOGGER = logging.getLogger(constants.ROOT_LOGGER)


@click.group(epilog="For detailed help on each command, run: metgenc COMMAND --help")
@@ -133,13 +133,13 @@ def process(config_filename, dry_run, env, number, write_cnm, overwrite):
         config.validate(configuration)
         metgen.process(configuration)
     except config.ValidationError as e:
-        logger = logging.getLogger("metgenc")
+        logger = logging.getLogger(constants.ROOT_LOGGER)
         logger.error("\nThe configuration is invalid:")
         for error in e.errors:
             logger.error(f" * {error}")
         exit(1)
     except Exception as e:
-        logger = logging.getLogger("metgenc")
+        logger = logging.getLogger(constants.ROOT_LOGGER)
         logger.error("\nUnable to process data: " + str(e))
         exit(1)
     click.echo("Processing complete")
2 changes: 1 addition & 1 deletion src/nsidc/metgen/config.py
@@ -34,7 +34,7 @@ class Config:
     def show(self):
         # TODO: add section headings in the right spot
         # (if we think we need them in the output)
-        LOGGER = logging.getLogger("metgenc")
+        LOGGER = logging.getLogger(constants.ROOT_LOGGER)
         LOGGER.info("")
         LOGGER.info("Using configuration:")
         for k, v in self.__dict__.items():
5 changes: 5 additions & 0 deletions src/nsidc/metgen/constants.py
@@ -8,9 +8,14 @@
DEFAULT_NUMBER = 1000000
DEFAULT_DRY_RUN = False

# Logging
ROOT_LOGGER = "metgenc"

# JSON schema locations and versions
CNM_JSON_SCHEMA = ("nsidc.metgen.json-schema", "cumulus_sns_schema.json")
CNM_JSON_SCHEMA_VERSION = "1.6.1"
UMMG_JSON_SCHEMA = ("nsidc.metgen.json-schema", "umm-g-json-schema.json")
UMMG_JSON_SCHEMA_VERSION = "1.6.6"

# Configuration sections
SOURCE_SECTION_NAME = "Source"
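
The hunks above replace the repeated `"metgenc"` literal with `constants.ROOT_LOGGER`. A minimal sketch of what a single named root allows, using only the standard `logging` module: `get_logger` is a hypothetical helper that is not part of this pull request, and `ROOT_LOGGER` is redefined locally so the example is self-contained.

```python
import logging
from typing import Optional

# Mirrors constants.ROOT_LOGGER from the diff; redefined here so the
# sketch runs on its own.
ROOT_LOGGER = "metgenc"


def get_logger(child: Optional[str] = None) -> logging.Logger:
    """Return the shared root logger, or a dotted child of it.

    Hypothetical helper: child loggers such as "metgenc.config" propagate
    to the root, so handlers and levels are configured in one place.
    """
    name = ROOT_LOGGER if child is None else f"{ROOT_LOGGER}.{child}"
    return logging.getLogger(name)
```

Because `logging.getLogger` returns the same object for the same name, every module that uses the constant shares one logger, and a rename only touches `constants.py`.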