Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NLDAS Met data Download and Data Model #34

Closed
rburghol opened this issue May 23, 2022 · 2 comments
Closed

NLDAS Met data Download and Data Model #34

rburghol opened this issue May 23, 2022 · 2 comments
Assignees

Comments

@rburghol
Copy link
Contributor

rburghol commented May 23, 2022

  • Download and import process: Updated Scripts to Add Meteorology Data set #29
  • Drought model scenario development:https://github.com/HARPgroup/vahydro/issues/565
  • see QA data model/workflow: Data Model for NLDAS2 QA/Process Log Data #31
  • running model in drought scenario mode
    • we gotta make this up! so far:
      • take the most recent 3 years (to allow for extra extra warmup)
      • append data from the year with the lowest 90day precip-evap for each land segment... this has the downside that a given river segment might have a mish-mash of land segments running off into it... might have to figure out a way around this?
  • Reference:
    • function dh_weather_get_noaa_gridded_precip(): imports daily NOAA drought analysis dataset into dh_timeseries_weather
    • raster2pgsql: takes raster files and creates SQL code to import to database.
      • -R: use this switch to make an "out-of-db" raster, keeping it in the file system (we have not used this, but might explore the performance and storage tradeoffs that may be involved here)
      • -a: append into a table
      • -c: creates a new table for this raster data
      • -F Add a column with the filename of the raster. This could be useful to handle tracking import into a temporary storage table, then using an SQL query to migrate those data to the dh_timeseries_weather table
    • gdalwarp: command line utility to reproject rasters
      • Ex: gdalwarp $file_path -t_srs EPSG:4326 \"${file_path}.conus-4326.tif\"
    • GRIB file format: https://gdal.org/drivers/raster/grib.html

Data Model

  • NLDAS2 data files are hourly
    • file name
      • format: NLDAS_FORA0125_H.AYYYYMMDD.HH00.002.grb
      • Ex: NLDAS_FORA0125_H.A20200101.2300.002.grb
    • Timezone: GMT
    • Hour in file name is END of recorded period.
    • Band 10 is precip
  • extent of entire CBP land area to clip raster for storage
    • Any CBP type feature: select st_astext(st_multi(st_extent(dh_geofield_geom))) from dh_feature as a left outer join field_data_dh_geofield as b on (a.hydroid = b.entity_id and b.entity_type = 'dh_feature') where a.ftype like 'cbp%';
      • MULTIPOLYGON(((-83.6754134197279 35.8433695994306,-83.6754134197279 44.0969678458828,-74.1646796092988 44.0969678458828,-74.1646796092988 35.8433695994306,-83.6754134197279 35.8433695994306)))
      • Feature: ows-watershed-dash-info/617850
      • Create feature importable:
copy (
    select 'CBP6 Meteorology Coverage' as name, 'cbp6_met_coverage' as hydrocode,
    st_astext(st_multi(st_extent(dh_geofield_geom))) as wkt, 'landunit' as bundle, 'cbp_met_grid' as ftype
    from dh_feature as a left outer join field_data_dh_geofield as b 
    on (
      a.hydroid = b.entity_id 
      and b.entity_type = 'dh_feature'
    ) 
    where a.ftype like 'cbp%'
) to '/tmp/cbp6_met_coverage.wkt' WITH HEADER CSV DELIMITER AS E'\t';

# stash in vahydro dir for import
cd ../files/vahydro/
sftp dbase2
get /tmp/cbp6_met_coverage.wkt
bye
cd /var/www/html/d.dh
 drush scr modules/dh/src/import_features.php /var/www/html/files/vahydro/cbp6_met_coverage.wkt
  • Other geom options:
    • p5 land segs: select st_astext(st_multi(st_extent(dh_geofield_geom))) from dh_feature as a left outer join field_data_dh_geofield as b on (a.hydroid = b.entity_id and b.entity_type = 'dh_feature') where a.ftype = 'cbp532_landseg';
      • MULTIPOLYGON(((-83.6754134197279 35.8433695994306,-83.6754134197279 44.0969588250423,-74.1646796092988 44.0969588250423,-74.1646796092988 35.8433695994306,-83.6754134197279 35.8433695994306)))
    • p6 land segs: select st_astext(st_multi(st_extent(dh_geofield_geom))) from dh_feature as a left outer join field_data_dh_geofield as b on (a.hydroid = b.entity_id and b.entity_type = 'dh_feature') where a.ftype = 'cbp6_landseg';
      • MULTIPOLYGON(((-81.0144924130837 36.5503513152315,-81.0144924130837 44.0969678458827,-74.1646796294801 44.0969678458827,-74.1646796294801 36.5503513152315,-81.0144924130837 36.5503513152315)))
    • p6 watersheds: select st_astext(st_multi(st_extent(b.dh_geofield_geom))) from dh_feature as a left outer join field_data_dh_geofield as b on (a.hydroid = b.entity_id and b.entity_type = 'dh_feature') where a.bundle = 'watershed' and a.ftype like 'cbp%';
      • MULTIPOLYGON(((-83.6754131797072 36.1183484992513,-83.6754131797072 44.0969678458828,-74.1646796294801 44.0969678458828,-74.1646796294801 36.1183484992513,-83.6754131797072 36.1183484992513)))
    • Importing coverages from file:
  • code/workflow to import grid data
  • Storage requirements of raster data: select relname, round(sum(8.0 * relpages / (1024.0 * 1024.0))) as size_gb, round(sum(8.0 * relpages / 1024)) as size_mb FROM pg_class where relname = 'dh_timeseries_weather' and relkind in ('r','t') group by relname order by relname;
    • What is storage increase when we use clipped versus unclipped?
    • Try with temp table, insert many copies of a single file, or just with a single file?

Raw data Files

  • uses gdalinfo
. hspf_config # will load the NLDAS_ROOT directory var
gdalinfo $NLDAS_ROOT/1984/001/NLDAS_FORA0125_H.A19840101.0000.002.grb
Driver: GRIB/GRIdded Binary (.grb, .grb2)
Files: /backup/meteorology/1984/001/NLDAS_FORA0125_H.A19840101.0000.002.grb
Size is 464, 224
Coordinate System is:
GEOGCRS["Coordinate System imported from GRIB file",
    DATUM["unnamed",
        ELLIPSOID["Sphere",6371200,0,
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433,
            ID["EPSG",9122]]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]]
Data axis to CRS axis mapping: 2,1
Origin = (-125.000500000000002,53.000500000000002)
Pixel Size = (0.125000000000000,-0.125000000000000)
Corner Coordinates:
Upper Left  (-125.0005000,  53.0005000) (125d 0' 1.80"W, 53d 0' 1.80"N)
Lower Left  (-125.0005000,  25.0005000) (125d 0' 1.80"W, 25d 0' 1.80"N)
Upper Right ( -67.0005000,  53.0005000) ( 67d 0' 1.80"W, 53d 0' 1.80"N)
Lower Right ( -67.0005000,  25.0005000) ( 67d 0' 1.80"W, 25d 0' 1.80"N)
Center      ( -96.0005000,  39.0005000) ( 96d 0' 1.80"W, 39d 0' 1.80"N)
Band 1 Block=464x1 Type=Float64, ColorInterp=Undefined
  Description = 2[m] HTGL (Specified height level above ground)
  NoData Value=9999
  Metadata:
    GRIB_COMMENT=Temperature [C]
    GRIB_ELEMENT=TMP
    GRIB_FORECAST_SECONDS=0 sec
    GRIB_REF_TIME=   441763200 sec UTC
    GRIB_SHORT_NAME=2-HTGL
    GRIB_UNIT=[C]
    GRIB_VALID_TIME=   441763200 sec UTC
Band 2 Block=464x1 Type=Float64, ColorInterp=Undefined
  Description = 2[m] HTGL (Specified height level above ground)
  NoData Value=9999
  Metadata:
    GRIB_COMMENT=Specific humidity [kg/kg]
    GRIB_ELEMENT=SPFH
    GRIB_FORECAST_SECONDS=0 sec
    GRIB_REF_TIME=   441763200 sec UTC
    GRIB_SHORT_NAME=2-HTGL
    GRIB_UNIT=[kg/kg]
    GRIB_VALID_TIME=   441763200 sec UTC
Band 3 Block=464x1 Type=Float64, ColorInterp=Undefined
  Description = 0[-] SFC (Ground or water surface)
  NoData Value=9999
  Metadata:
    GRIB_COMMENT=Pressure [Pa]
    GRIB_ELEMENT=PRES
    GRIB_FORECAST_SECONDS=0 sec
    GRIB_REF_TIME=   441763200 sec UTC
    GRIB_SHORT_NAME=0-SFC
    GRIB_UNIT=[Pa]
    GRIB_VALID_TIME=   441763200 sec UTC
Band 4 Block=464x1 Type=Float64, ColorInterp=Undefined
  Description = 10[m] HTGL (Specified height level above ground)
  NoData Value=9999
  Metadata:
    GRIB_COMMENT=u-component of wind [m/s]
    GRIB_ELEMENT=UGRD
    GRIB_FORECAST_SECONDS=0 sec
    GRIB_REF_TIME=   441763200 sec UTC
    GRIB_SHORT_NAME=10-HTGL
    GRIB_UNIT=[m/s]
    GRIB_VALID_TIME=   441763200 sec UTC
Band 5 Block=464x1 Type=Float64, ColorInterp=Undefined
  Description = 10[m] HTGL (Specified height level above ground)
  NoData Value=9999
  Metadata:
    GRIB_COMMENT=v-component of wind [m/s]
    GRIB_ELEMENT=VGRD
    GRIB_FORECAST_SECONDS=0 sec
    GRIB_REF_TIME=   441763200 sec UTC
    GRIB_SHORT_NAME=10-HTGL
    GRIB_UNIT=[m/s]
    GRIB_VALID_TIME=   441763200 sec UTC
Band 6 Block=464x1 Type=Float64, ColorInterp=Undefined
  Description = 0[-] SFC (Ground or water surface)
  NoData Value=9999
  Metadata:
    GRIB_COMMENT=Downward longwave radiation flux [W/m^2]
    GRIB_ELEMENT=DLWRF
    GRIB_FORECAST_SECONDS=0 sec
    GRIB_REF_TIME=   441763200 sec UTC
    GRIB_SHORT_NAME=0-SFC
    GRIB_UNIT=[W/m^2]
    GRIB_VALID_TIME=   441763200 sec UTC
Band 7 Block=464x1 Type=Float64, ColorInterp=Undefined
  Description = 0[-] SFC (Ground or water surface)
  NoData Value=9999
  Metadata:
    GRIB_COMMENT=undefined [-]
    GRIB_ELEMENT=var153
    GRIB_FORECAST_SECONDS=3600 sec
    GRIB_REF_TIME=   441759600 sec UTC
    GRIB_SHORT_NAME=0-SFC
    GRIB_UNIT=[-]
    GRIB_VALID_TIME=   441763200 sec UTC
Band 8 Block=464x1 Type=Float64, ColorInterp=Undefined
  Description = 180-0[hPa] SPDY (Level between 2 levels at specified pressure difference from ground to level)
  NoData Value=9999
  Metadata:
    GRIB_COMMENT=Convective available potential energy [J/kg]
    GRIB_ELEMENT=CAPE
    GRIB_FORECAST_SECONDS=0 sec
    GRIB_REF_TIME=   441763200 sec UTC
    GRIB_SHORT_NAME=180-0-SPDY
    GRIB_UNIT=[J/kg]
    GRIB_VALID_TIME=   441763200 sec UTC
Band 9 Block=464x1 Type=Float64, ColorInterp=Undefined
  Description = 0[-] SFC (Ground or water surface)
  NoData Value=9999
  Metadata:
    GRIB_COMMENT=Potential evaporation [kg/m^2]
    GRIB_ELEMENT=PEVAP
    GRIB_FORECAST_SECONDS=3600 sec
    GRIB_REF_TIME=   441759600 sec UTC
    GRIB_SHORT_NAME=0-SFC
    GRIB_UNIT=[kg/m^2]
    GRIB_VALID_TIME=   441763200 sec UTC
Band 10 Block=464x1 Type=Float64, ColorInterp=Undefined
  Description = 0[-] SFC (Ground or water surface)
  NoData Value=9999
  Metadata:
    GRIB_COMMENT=Total precipitation [kg/m^2]
    GRIB_ELEMENT=APCP
    GRIB_FORECAST_SECONDS=3600 sec
    GRIB_REF_TIME=   441759600 sec UTC
    GRIB_SHORT_NAME=0-SFC
    GRIB_UNIT=[kg/m^2]
    GRIB_VALID_TIME=   441763200 sec UTC
Band 11 Block=464x1 Type=Float64, ColorInterp=Undefined
  Description = 0[-] SFC (Ground or water surface)
  NoData Value=9999
  Metadata:
    GRIB_COMMENT=Downward shortwave radiation flux [W/m^2]
    GRIB_ELEMENT=DSWRF
    GRIB_FORECAST_SECONDS=0 sec
    GRIB_REF_TIME=   441763200 sec UTC
    GRIB_SHORT_NAME=0-SFC
    GRIB_UNIT=[W/m^2]
    GRIB_VALID_TIME=   441763200 sec UTC

Setting up raster import from NLDAS2

cd /backup/meteorology/out/sql
. hspf_config
fname=$NLDAS_ROOT/1984/001/NLDAS_FORA0125_H.A19840101.0000.002.grb
tifname="${fname}-4326.tif"
# use -a to append, omit -a and it will create the table anew
# create
raster2pgsql -t 1000x1000 $tifname tmp_nldas2 > tmp_nldas2.sql
# append 24 copies
raster2pgsql -a -t 1000x1000 $tifname tmp_nldas2 > tmp_nldas2-test.sql
# set year
thisdate="2023-01-01"
coverage_hydrocode='cbp6_met_coverage'
yr=`date -d "$thisdate" +%Y`
mo=`date -d "$thisdate" +%m`
da=`date -d "$thisdate" +%d`
maskExtent="/backup/meteorology/cbp_extent.csv"
jday=`date -d "$thisdate" +%j`
ymd="$yr$mo$da"
for i in 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 23; do 
   hr2digit=`printf %02d $i`
   hr4digit="${hr2digit}00"
   fname="$NLDAS_ROOT/$yr/$jday/NLDAS_FORA0125_H.A${ymd}.${hr4digit}.002.grb"
   tifname="${fname}-4326.tif"
   tifname_clip="/tmp/nldas2_clip.tif"
   tstime=`TZ="America/New_York" date -d "$thisdate ${hr2digit}:00:00" +'%s'`
   # Reproject to 4326
   #gdalinfo gdalinfo PRISM_ppt_stable_4kmD2_20090407_bil.bil
   gdalwarp "$fname" -t_srs EPSG:4326 "$tifname"
   rm /tmp/nldas2_clip.tif
   #Clipping the raster: Use gdalwarp to crop to the cutline maskExtent.csv, which is a csv of the CBP regions 
   gdalwarp -cutline $maskExtent -crop_to_cutline $tifname $tifname_clip
  # create
  # use -a to append, use -t and it try to drop an existing table then will create the table anew
   raster2pgsql -d -t 1000x1000 $tifname_clip tmp_nldas2 > /tmp/tmp_nldas2.sql
   # import this raster into a temp table
   cat /tmp/tmp_nldas2.sql | psql -h dbase2 drupal.dh03 
   # now insert the raster into the timeseries table, with feature and variable information linked
   inquery="insert into dh_timeseries_weather(tstime, varid, featureid, entity_type, rast)"
   inquery="$inquery select '$tstime', v.hydroid as varid, f.hydroid as featureid, 'dh_feature', met.rast"
   inquery="$inquery from dh_feature as f "
   inquery="$inquery left outer join dh_variabledefinition as v"
   inquery="$inquery on (v.varkey = 'nldas2_obs_hourly')"
   inquery="$inquery left outer join dh_timeseries_weather as w"
   inquery="$inquery on (f.hydroid = w.featureid and w.tstime = '${tstime}' and w.varid = v.hydroid) "
   inquery="$inquery left outer join tmp_nldas2 as met"
   inquery="$inquery on (1 = 1)"
   inquery="$inquery WHERE w.tid is null"
   inquery="$inquery AND f.hydrocode = '$coverage_hydrocode' "
   echo $inquery |psql -h dbase2 drupal.dh03
done

Using SLURM and meta-model to download and import

metsrc="nldas2"
yr=2023
doy=`date -d "${yr}-12-31" +%j`
i=0
while [ $i -lt $doy ]; do
  thisdate=`date -d "${yr}-01-01 +$i days" +%Y-%m-%d`
  echo "Running: sbatch /opt/model/meta_model/run_model raster_met \"$thisdate\" $metsrc auto met"
  sbatch /opt/model/meta_model/run_model raster_met "$thisdate" $metsrc auto met
  i=$((i + 1))
done

Inventory Met data

select extract(year from to_timestamp(met.tstime)) as year,
  min(to_timestamp(met.tstime)) as start_date,
  max(to_timestamp(met.tstime)) as end_date,
  count(*)
from (
  select met.tstime,
    (ST_SummaryStatsAgg(met.rast, 1, TRUE)).mean as precip_in
  from dh_feature as mcov
  left outer join dh_variabledefinition as v
  on (
    v.varkey = 'nldas2_obs_hourly'
  )
  left outer join dh_timeseries_weather as met
  on (
    mcov.hydroid = met.featureid and met.varid = v.hydroid
    and met.entity_type = 'dh_feature'
  )
  where mcov.hydrocode = 'cbp6_met_coverage'
  met.rast is not null
  group by met.tstime
) as met
group by extract(year from to_timestamp(met.tstime))
order by extract(year from to_timestamp(met.tstime));

select extract(year from to_timestamp(met.tstime)) as year,
  min(to_timestamp(met.tstime)) as start_date,
  max(to_timestamp(met.tstime)) as end_date,
  round(0.0393701 * sum(precip_in)::numeric,1) as precip_in
from (
  select met.tstime,
    (ST_SummaryStatsAgg(met.rast, 1, TRUE)).mean as precip_in
  from dh_feature as mcov
  left outer join dh_variabledefinition as v
  on (
    v.varkey = 'nldas2_obs_hourly'
  )
  left outer join dh_timeseries_weather as met
  on (
    mcov.hydroid = met.featureid and met.varid = v.hydroid
    and met.entity_type = 'dh_feature'
  )
  where mcov.hydrocode = 'cbp6_met_coverage'
  group by met.tstime
) as met
group by extract(year from to_timestamp(met.tstime))
order by extract(year from to_timestamp(met.tstime));

Verify Raster Data

  • Summary of precip over the entire raster
select c.featureid, to_timestamp(c.tstime) as obs_date,
        extract(month from to_timestamp(c.tstime)) as mo,
        (ST_summarystats(c.rast, 10, TRUE)).mean as precip_kgm3,
        0.0393701 * (ST_summarystats(c.rast, 10, TRUE)).mean as precip_in
from dh_feature as f
 left outer join dh_variabledefinition as v
on (v.varkey = 'nldas2_obs_hourly')
left outer join dh_timeseries_weather as c
on ( f.hydroid = c.featureid and c.varid = v.hydroid)
where f.hydrocode = 'cbp6_met_coverage'
order by c.tstime;
  • summary of precip over a specific feature
  • Shenandoah gage at Strasburg hydrocode = 'usgs_ws_01634000'
select met.featureid, to_timestamp(met.tstime) as obs_date,
  extract(month from to_timestamp(met.tstime)) as mo,
  (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), 10, TRUE)).mean as precip_kgm3,
  0.0393701 * (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), 10, TRUE)).mean as precip_in
from dh_feature as f
left outer join field_data_dh_geofield as fgeo
on (
  fgeo.entity_id = f.hydroid
  and fgeo.entity_type = 'dh_feature'
)
left outer join dh_variabledefinition as v
on (
  v.varkey = 'nldas2_obs_hourly'
)
left outer join dh_feature as mcov 
on (
  mcov.hydrocode = 'cbp6_met_coverage'
)
left outer join dh_timeseries_weather as met
on (
  mcov.hydroid = met.featureid and met.varid = v.hydroid
)
where f.hydrocode = 'usgs_ws_01634000'
order by met.tstime;

Estimate storage needs (not sue if this is very illustrative

select met.featureid, to_timestamp(met.tstime) as obs_date,
        extract(month from to_timestamp(met.tstime)) as mo, (ST_MemSize(st_clip(met.rast, fgeo.dh_geofield_geom))/1024)/1024 as size_mb,
        (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), 10, TRUE)).mean as precip_kgm3,
        0.0393701 * (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), 10, TRUE)).mean as precip_in
from dh_feature as f
left outer join field_data_dh_geofield as fgeo
on (
  fgeo.entity_id = f.hydroid
      and fgeo.entity_type = 'dh_feature'
)
left outer join dh_variabledefinition as v
on (v.varkey = 'nldas2_obs_hourly')
left outer join dh_feature as mcov
on (
  mcov.hydrocode = 'cbp6_met_coverage'
)
left outer join dh_timeseries_weather as met
on ( mcov.hydroid = met.featureid and met.varid = v.hydroid)
where f.hydrocode = 'usgs_ws_01634000'
order by met.tstime;

@rburghol rburghol changed the title 2022/05/24 Met data Download and Data Model NLDAS Met data Download and Data Model Nov 3, 2023
@rburghol rburghol transferred this issue from another repository May 15, 2024
@rburghol
Copy link
Contributor Author

New import script:

  • Use: nohup ./import_nldas_rasters YEAR JDAY all &
  • Ex: nohup ./import_nldas_rasters 2022 1 all &

@rburghol
Copy link
Contributor Author

Test import from HARPgroup/meta_model#50

select met.featureid, to_timestamp(met.tstime) as obs_date,
  extract(month from to_timestamp(met.tstime)) as mo,
  (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), 10, TRUE)).mean as precip_kgm3,
  0.0393701 * (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), 10, TRUE)).mean as precip_in
from dh_feature as f
left outer join field_data_dh_geofield as fgeo
on (
  fgeo.entity_id = f.hydroid
  and fgeo.entity_type = 'dh_feature'
)
left outer join dh_variabledefinition as v
on (
  v.varkey = 'nldas2_obs_hourly'
)
left outer join dh_feature as mcov 
on (
  mcov.hydrocode = 'cbp6_met_coverage'
)
left outer join dh_timeseries_weather as met
on (
  mcov.hydroid = met.featureid and met.varid = v.hydroid 
  and extract(year from to_timestamp(met.tstime)) = 2021
)
where f.hydrocode = 'usgs_ws_01634000'
order by met.tstime;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants