Add PAVICS homepage notebooks to Jenkins testsuite #29

Merged
merged 9 commits into from
Jul 6, 2021

@tlvu tlvu commented Jul 5, 2021

Various changes so the notebooks pass Jenkins:

  • Stable interface to set up the layout required by the notebooks (unzipping output.zip; the .geojson file paths are hardcoded). With this stable interface, future layout changes will be transparent to Jenkins.
  • Make all prod Thredds access go through HTTPS so a Jenkins test server outside of Ouranos (CRIM) can still run the notebooks.
  • Add special markup to all Thredds URLs so data comes directly from prod Pavics, since there are too many .ncml (and .nc) files to replicate to all the test servers.
  • Disable output checking for failing outputs that are not critical to the notebooks, so we do not need to refresh the notebook output too often.

See each commit description for more detailed info.

Not regenerating all the .html files since the code change is really minimal (adding comments and a one-line switch of the Thredds URL from the direct URL to the one behind Twitcher).

Matching PR that actually adds the homepage notebooks to Jenkins: Ouranosinc/PAVICS-e2e-workflow-tests#79

tlvu added 9 commits July 5, 2021 12:52
To fix this error in Jenkins:

```
  _ PAVICS-landing-add-pavics-homepage-notebooks-to-jenkins-testsuite/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-3Climate-Indicators.ipynb::Cell 1 _
  Notebook cell execution failed
  Cell 1: Cell outputs differ

  Input:
  freq = 'YS'
  print(f'calculating yearly output using freq="{freq}"')
  with ProgressBar():
      out = atmos.tx_days_above(tasmax=ds.tasmax, thresh='27 degC', freq=freq) # Yearly frequency

      # Average over spatial domain and plot time-series
      fig1 = plt.figure(figsize=(20,4))
      plt.subplot(1,2,1)
      out.mean(dim=['lon','lat'], keep_attrs=True).plot()
      plt.title('spatial mean')

      # Calculate a 30 year climatology and plot a map
      plt.subplot(1,2,2)
      subset.subset_time(out, start_date='1981', end_date='2010').mean(dim='time', keep_attrs=True).plot()
      plt.title('1981-2010 mean')
      display()

  Traceback:
   mismatch 'stdout'

   assert reference_output == test_output failed:

    'calculating ...one |  1.0s\n' == 'calculating ...one |  1.0s\n'
    Skipping 94 identical leading characters in diff, use -v to show
    - mpleted | 18.0s
    + mpleted |  1min 47.7s
      [########################################] | 100% Completed |100% Done |  1.0s
```
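The mismatch above comes only from the elapsed time printed by the progress bar (`18.0s` versus `1min 47.7s`), which varies from run to run. This PR deals with such non-critical output by disabling output checking. Purely as an illustration of why the comparison fails, here is a minimal sketch that masks the timing before comparing. The `mask_elapsed` helper and its regex are hypothetical, are not part of the test suite, and only cover the two time formats seen in the log.

```python
import re

# Elapsed-time suffix as it appears in the log above,
# e.g. "| 18.0s" or "|  1min 47.7s". Hypothetical pattern, not exhaustive.
_ELAPSED = re.compile(r"\|\s*(?:\d+(?:hr|min)\s+)*\d+(?:\.\d+)?s")


def mask_elapsed(text: str) -> str:
    """Replace run-dependent elapsed times with a fixed placeholder."""
    return _ELAPSED.sub("| <ELAPSED>", text)


reference = "[####] | 100% Completed | 18.0s\n"
observed = "[####] | 100% Completed |  1min 47.7s\n"

# The raw strings differ, but they are equal once the timing is masked.
raw_equal = reference == observed
masked_equal = mask_elapsed(reference) == mask_elapsed(observed)
print(raw_equal, masked_equal)
```

Masking keeps the rest of the cell output under test, whereas disabling output checking for the whole cell is simpler and needs no per-format pattern.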
This output is of type `application/javascript`, which should already be ignored
by default
(https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/3806e67e7ad46c915797f483a373780c36f8a8f6/conftest.py#L3)

Not sure why we have to force the ignore here, but I don't have time to
investigate.
For error:
```
  _ PAVICS-landing-add-pavics-homepage-notebooks-to-jenkins-testsuite/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-1DataAccess.ipynb::Cell 1 _
  Notebook cell execution failed
  Cell 1: Cell outputs differ

  Input:
  import xarray as xr

  # This does not download the entire dataset, just the metadata and attributes describing the content.
  ds = xr.open_dataset(cds.access_urls["OPENDAP"], chunks='auto')

  # What we see here is an in-memory representation of the full content, the actual data is still on the server.
  ds

  Traceback:
   mismatch 'text/plain'

   assert reference_output == test_output failed:

    '<xarray.Data...id:       NCC' == '<xarray.Data...id:       NCC'
    Skipping 357 identical leading characters in diff, use -v to show
    Skipping 792 identical trailing characters in diff, use -v to show
    - hunksize=(7693, 40, 109), meta=np.ndarray>
    ?           ^^^   ^   ^^^
    + hunksize=(323, 320, 323), meta=np.ndarray>
    ?           ^^   ^^   ^^^
    -     tasmax   (time, lat, lon) float32 dask.array<chunksize=(7693, 40, 109), meta=np.ndarray>
    ?                                                             ^^^   ^   ^^^
    +     tasmax   (time, lat, lon) float32 dask.array<chunksize=(323, 320, 323), meta=np.ndarray>
    ?                                                             ^^   ^^   ^^^
    -     pr       (time, lat, lon) float32 dask.array<chunksize=(7693, 40, 109), meta=n
    ?                                                             ^^^   ^   ^^^
    +     pr       (time, lat, lon) float32 dask.array<chunksize=(323, 320, 323), meta=n
    ?                                                             ^^   ^^   ^^^
```
For error:
```
  _ PAVICS-landing-add-pavics-homepage-notebooks-to-jenkins-testsuite/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-3Climate-Indicators.ipynb::Cell 3 _
  Notebook cell execution failed
  Cell 3: Cell outputs differ

  Input:
  with xclim.set_options(metadata_locales=['fr']):
      out_fr = atmos.tx_days_above(tasmax=ds.tasmax,
                                  thresh = '27 degC',
                                  freq='YS')
  out_fr

  Traceback:
   mismatch 'text/plain'

   assert reference_output == test_output failed:

    '<xarray.Data...ent_fr:      ' == '<xarray.Data...ent_fr:      '
    Skipping 537 identical leading characters in diff, use -v to show
    Skipping 370 identical trailing characters in diff, use -v to show
    - ys_above: TX_DAYS_ABOVE(tasma...
    + ys_above: tx_days_above(tasma...
```
For errors:

```
  _ PAVICS-landing-add-pavics-homepage-notebooks-to-jenkins-testsuite/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-4Ensembles.ipynb::Cell 0 _
  Notebook cell execution failed
  Cell 0: Cell outputs differ

  Input:
  from xclim import ensembles as xens
  from clisops.core import subset
  from pathlib import Path
  import matplotlib.pyplot as plt
  import warnings
  import logging
  logging.getLogger().disabled = True
  warnings.simplefilter("ignore")
  output = '/notebook_dir/writable-workspace/tmp/tutorial3/output'

  infolder = Path(output)

  # Create a list of rcp 4.5 files  (n=11)
  ncfiles = [d for d in infolder.glob('tx_days_above*QS-DEC*rcp45*.nc')]

  #Create an ensemble dataset from the 11 simulations
  ds_ens = xens.create_ensemble(ncfiles)
  display(ds_ens)

  # Plot time series of single grid point
  lon = -66
  lat = 48.5

  ds1 = subset.subset_gridpoint(ds_ens, lon=lon, lat=lat)
  #  plot summer season
  ds1.tx_days_above.sel(time=ds1['time.season']=='JJA').plot.line(figsize=(10,5), x='time' ,linewidth=0.5)
  #plt.title('RCP 4.5 individuals runs : ')
  display()

  Traceback:
   mismatch 'text/plain'

   assert reference_output == test_output failed:

    '<xarray.Data...    EPSG:4326' == '<xarray.Data...    EPSG:4326'
    Skipping 678 identical leading characters in diff, use -v to show
    -           DATE_TIME_TZ CMOR rewrote data to com...
    +           MOHC pp to CMOR/NetCDF convertor (version 1.1...
          institution:                Ouranos Consortium on Regional Climatology an...
    -     source:                     NorESM1-M 2011  atmosphere: CAM-Oslo (CAM4-Os...
    +     source:                     HadGEM2-CC (2011) atmosphere: HadGAM2(N96L60)...
    -     driving_model:              NorESM1-M
    ?                                 ^^^ - ^ ^
    +     driving_model:              HadGEM2-CC
    ?                                 ^^^^  ^ ^^
          ...                         ...
          modeling_realm:             atmos
          target_dataset:             CANADA : ANUSPLIN interpolated Canada daily 3...
          target_dataset_references:  CANADA : https://doi.org/10.1175/2011BAMS3132...
    -     driving_institution:        Norwegian Climate Centre
    ?                                 ^^^^ ^  ^^^ ----
    +     driving_institution:        Met Office Hadley Centre
    ?                                 ^ ^^^^^ ++++ ^  +
    -     driving_institute_id:       NCC
    ?                                 ^^
    +     driving_institute_id:       MOHC
    ?                                 ^^^
          crs:                        EPSG:4326

  _ PAVICS-landing-add-pavics-homepage-notebooks-to-jenkins-testsuite/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-4Ensembles.ipynb::Cell 1 _
  Notebook cell execution failed
  Cell 1: Cell outputs differ

  Input:
  # Calculate percentiles
  ds_perc = xens.ensemble_percentiles(ds_ens, values=[10, 50, 90], split=False)
  display(ds_perc)

  # compare with indiviudual runs
  sel1 = ds1.tx_days_above.sel(time=ds1['time.season']=='JJA')
  sel1.plot.line(label='individual runs',color=[0.6, .6 , .6],figsize=(10,5),x='time' ,linewidth=0.5)

  perc1 = subset.subset_gridpoint(ds_perc,
          lon=lon, lat=lat).tx_days_above.sel(time=ds_perc['time.season']=='JJA')

  # plot uncertainty bounds
  plt.fill_between(color='b',label="RCP 4.5 : 10th - 90th percentile",
                   x=perc1.time.values, y1=perc1.sel(percentiles=10),
                   y2=perc1.sel(percentiles=90), alpha=0.2)
  # plot median
  perc1.sel(percentiles=50).plot(label="RCP 4.5 : 50th percentile",
                                 color='b', linewidth=.85, )

  # combine legend entries for individual runs
  handles, labels = plt.gca().get_legend_handles_labels()
  by_label = dict(zip(labels, handles))
  plt.legend(by_label.values(), by_label.keys())
  plt.title('Summer tx_days_above : RCP 4.5 ensembles percentiles vs individual runs')
  display()

  Traceback:
   mismatch 'text/plain'

   assert reference_output == test_output failed:

    '<xarray.Data...ersion 1.1...' == '<xarray.Data...    EPSG:4326'
    Skipping 507 identical leading characters in diff, use -v to show
    - utes: (12/33)
    ?           ^^
    + utes: (12/28)
    ?           ^^
    -     units:                      days
    -     cell_methods:                time: maximum within days time: sum over days
    -     xclim_history:              DATE_TIME_TZ CMOR rewrote data to com...
    -     standard_name:              number_of_days_with_air_temperature_above_thr...
    -     long_name:                  Number of days with tmax > 27 degc
    -     description:                Seasonal number of days where daily maximum t...
    +     Conventions:                CF-1.5
    +     title:                      Ouranos standard ensemble of bias-adjusted cl...
    +     history:                    MOHC pp to CMOR/NetCDF convertor (version 1.1...
    +     institution:                Ouranos Consortium on Regional Climatology an...
    +     source:                     HadGEM2-CC (2011) atmosphere: HadGAM2(N96L60)...
    +     driving_model:              HadGEM2-CC
          ...                         ...
    -     modeling_realm:             atmos
          target_dataset:             CANADA : ANUSPLIN interpolated Canada daily 3...
          target_dataset_references:  CANADA : https://doi.org/10.1175/2011BAMS3132...
    -     driving_institution:        Norwegian Climate Centre
    ?                                 ^^^^ ^  ^^^ ----
    +     driving_institution:        Met Office Hadley Centre
    ?                                 ^ ^^^^^ ++++ ^  +
    -     driving_institute_id:       NCC
    ?                                 ^^
    +     driving_institute_id:       MOHC
    ?                                 ^^^
    -     crs:                        EPSG:4326
    +     crs:                        EPSG:4326
    ?                                          +
    +     xclim_history:              MOHC pp to CMOR/NetCDF convertor (version 1.1...
```
For errors:
```
  _ PAVICS-landing-add-pavics-homepage-notebooks-to-jenkins-testsuite/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-5Visualization.ipynb::Cell 1 _
  Notebook cell execution failed
  Cell 1: Cell outputs differ

  Input:
  # 30 year means and delta calculations
  window = 30
  d30yAvg = ds_ens.rolling(year=window).mean()
  d30yAvg = d30yAvg.isel(year=slice(window-1,None)) # Select from the first full windowed mean

  # Select every horizons in 10 y intervals
  d30yAvg = d30yAvg.sel(year=(d30yAvg.year.values%10==0))
  horizons = xr.DataArray([f'{yr - 29}-{yr}' for yr in d30yAvg.year.values], dims=dict(year=d30yAvg.year))
  d30yAvg = d30yAvg.assign_coords(horizon=horizons)

  # Calculate deltas
  ref = d30yAvg.sel(year=(d30yAvg.horizon=='1981-2010')).squeeze()
  for v in d30yAvg.data_vars:
      with xr.set_options(keep_attrs=True):
          d30yAvg[f"{v}_delta"]= d30yAvg[v] - ref[v]
          for a in ['description', 'long_name']:
              d30yAvg[f"{v}_delta"].attrs[a] = f"{d30yAvg[f'{v}_delta'].attrs[a]} : delta vs 1981-2010"

  # Calculate percentiles on 30y normals
  d30yAvg_ens = xens.ensemble_percentiles(d30yAvg, split=False).load()
  display(d30yAvg_ens)
  map1 = d30yAvg_ens.tx_mean.hvplot.quadmesh(cmap='Spectral_r', geo=True,tiles='EsriImagery', framewise=False, frame_width=400)
  map1

  Traceback:
   mismatch 'text/plain'

   assert reference_output == test_output failed:

    '<xarray.Data...entiles on...' == '<xarray.Data...    EPSG:4326'
    Skipping 956 identical leading characters in diff, use -v to show
    + ttributes:
    +     xclim_history:  [DATE_TIME] : Computation of the percentiles on...
    - ttributes: (12/35)
    -     long_name:                  Mean daily maximum temperature
    -     standard_name:              air_temperature
    -     units:                      °C
    -     _ChunkSizes:                [256  16  16]
    -     grid_mapping:               crs
    -     cell_methods:                time: maximum within days time: mean over days
    -     ...                         ...
    -     modeling_realm:             atmos
    -     target_dataset:             CANADA : ANUSPLIN interpolated Canada daily 3...
    -     target_dataset_references:  CANADA : https://doi.org/10.1175/2011BAMS3132...
    -     driving_institution:        Norwegian Climate Centre
    -     driving_institute_id:       NCC
    -     crs:                        EPSG:4326

  _ PAVICS-landing-add-pavics-homepage-notebooks-to-jenkins-testsuite/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-5Visualization.ipynb::Cell 5 _
  Notebook cell execution failed
  Cell 5: Cell outputs differ

  Input:
  rolling = pn.widgets.FloatInput(value=30, step=10, start=10, end=30, width=50)
  rolling1 = pn.Column(pn.pane.Markdown('Smoothing'),rolling)

  ## Time-series plot
  @pn.depends(vars.param.value, regions.param.value, seasons.param.value, rolling.param.value)
  def plot_region_ts(v=vars.param.value, reg=regions.param.value, s=seasons.param.value,
                     wind=rolling.param.value):
      colors = dict(rcp45="#0000FF",rcp85="#FF0000")
      plt1 = None
      wind = max(wind,1)
      for r in reg_ts.rcp.values:

          if plt1 is None:
              plt1 = reg_ts.rolling(year=wind, center=True, min_periods=1).mean(dim='year')\
                      .sel(geom=reg, season=s, rcp=r).hvplot.area(title=f"{reg} {s}", width=800,
                      height=300, x='year', y= f"{v}_p10", y2= f"{v}_p90",
                      color=colors[r], alpha=0.3, line_alpha=.1,
                      label=f"RCP {r[-2:].replace('5','.5')}")
              plt1 = plt1 * reg_ts.rolling(year=wind, center=True, min_periods=1).mean(dim='year')\
                      .sel(geom=reg, season=s, rcp=r).hvplot.line(x='year', y= f"{v}_p50", color=colors[r],
                      alpha=0.7, label=f"RCP {r[-2:].replace('5','.5')}")

          else:
              plt1 = plt1 * reg_ts.rolling(year=wind, center=True, min_periods=1).mean(dim='year')\
                      .sel(geom=reg, season=s, rcp=r).hvplot.area(x='year', y=v + '_p10', y2= v + '_p90',
                      color=colors[r], alpha=0.3, line_alpha=.1, label=f"RCP {r[-2:].replace('5','.5')}")

              plt1 = plt1 * reg_ts.rolling(year=wind, center=True, min_periods=1).mean(dim='year')\
                      .sel(geom=reg, season=s, rcp=r).hvplot.line(x='year', y= f"{v}_p50", color=colors[r],
                      alpha=0.7, label=f"RCP {r[-2:].replace('5','.5')}")

          for vv in ['_p10','_p90']:
              plt1 = plt1 * reg_ts.rolling(year=wind, center=True, min_periods=1).mean(dim='year')\
                      .sel(geom=reg, season=s, rcp=r).hvplot.line(x='year', y= f"{v}{vv}", color=colors[r],
                      line_width=0.1, alpha=0.3, label=f"RCP {r[-2:].replace('5','.5')}")

      title = pn.pane.Markdown(f"### {s} {reg_ts[f'{v}_p50'].attrs['long_name'].lower()} \
                          ({reg_ts[f'{v}_p50'].attrs['units'].lower()})<br/><br/>")

      return pn.Column(pn.Row(plt1.opts(legend_position='top_left'), rolling1))

  ## Table summary
  @pn.depends(vars.param.value, regions.param.value, hors.param.value, delta.param.value)
  def create_table(v = vars.param.value, r=regions.param.value, h=hors.param.value, delta_flag=delta.param.value):
      title1 = f"Summary : {r} {h}"
      var_cols = var_cols =  [vv for vv in df.columns if v in vv]
      if delta_flag:
          var_cols = [vv for vv in var_cols if 'delta' in vv]
          title1 = f"{title1} (delta vs 1981-2010)"
      else:
          var_cols = [vv for vv in var_cols if 'delta' not in vv]
      out = df[var_cols].iloc[(df.index.get_level_values('horizon') == h)&(df.index.get_level_values('region') == r)]
      return(out.sort_values(['season']).round(decimals=1).hvplot.table(title=title1, width=800, dynamic=True))

  pn.Column(pn.pane.Markdown('# A simple PAVICS dashboard'), map1, regions1, plot_region_ts, create_table)

  Traceback:
   mismatch 'text/plain'

   assert reference_output == test_output failed:

    'Column\n    ...ion(function)' == 'Column\n    ...ion(function)'
    Skipping 806 identical leading characters in diff, use -v to show
    - 41-2070', width=200)
    + 41-2070', value_throttled='2041-2070', width=200)
                  [5] Column
                      [0] Markdown(str)
                      [1] Checkbox()
          [2] Column
              [0] Markdown(str)
              [1] Select(options=['Avignon', 'Bonaventure',...], value='Avignon')
          [3] ParamFunction(function)
          [4] ParamFunction(function)
```
They use too many .ncml files, which means too many .nc files to replicate to
test servers.
…utside of Ouranos

Since we use data from prod, test servers outside of Ouranos do not have
access to port 8083, so they will fail.

All prod Thredds access must go through port 443.
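
As a rough illustration of the port change described above, the sketch below rewrites a direct URL on port 8083 into an HTTPS URL on the default port 443. The host name, the dataset path and the `force_https` helper are all hypothetical. The sketch also assumes the path stays the same, which may not hold for the real URLs served behind Twitcher.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Return `url` with the https scheme and no explicit port (so 443 is used)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname drops the ":8083" port suffix
    return urlunsplit(("https", host, parts.path, parts.query, parts.fragment))


# Hypothetical direct URL, for illustration only.
direct = "http://thredds.example.org:8083/thredds/dodsC/some/dataset.ncml"
rewritten = force_https(direct)
print(rewritten)
```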
@tlvu tlvu merged commit e545b17 into master Jul 6, 2021
@tlvu tlvu deleted the add-pavics-homepage-notebooks-to-jenkins-testsuite branch July 6, 2021 20:51
tlvu added a commit to Ouranosinc/PAVICS-e2e-workflow-tests that referenced this pull request Jul 6, 2021
…to-jenkins-testsuite

Add PAVICS homepage notebooks to Jenkins testsuite and new Jupyter env build

# Overview

- Add PAVICS homepage notebooks to the Jenkins testsuite and enable those notebooks by default.

- Also release a new Jupyter build for the homepage notebooks.  Only `unzip` is newly added to the Jupyter env.  No other update.

Matching required notebook changes PR Ouranosinc/PAVICS-landing#29

Matching PAVICS PR deployment change to test this PR bird-house/birdhouse-deploy#180

Matching PR to update pavics-landing deployment config bird-house/birdhouse-deploy-ouranos#12

## Other Changes

- Add new ability to force links to stay on `pavics.ouranos.ca` in notebooks.  Pavics homepage notebooks use a lot of .ncml files, so there were too many .nc files to replicate to all the test servers.

## Related Issue / Discussion

- Homepage notebooks are also deployed on Binder under `tutorial-notebooks`.  They were previously deployed to the Jupyter env outside of `tutorial-notebooks`, so that is kept as-is.

- There is a stable interface between the homepage notebooks and Jenkins for setting up the layout the notebooks require.  The stable interface means future layout changes will be transparent to Jenkins.
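
The actual interface is not shown in this PR text, so the following is only a sketch of the idea: a single entry point owns the layout details (here, unzipping an archive into an output directory), and the caller never needs to know them. The `setup_layout` function, the directory names and the archive content are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def setup_layout(archive: Path, workdir: Path) -> Path:
    """Extract `archive` into `workdir/output` and return that directory.

    Callers rely only on this function, not on the layout it creates.
    """
    target = workdir / "output"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target


# Demo with a throwaway archive standing in for the real output.zip.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    archive = tmp_path / "output.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("example.nc", b"placeholder")
    out_dir = setup_layout(archive, tmp_path / "workspace")
    extracted = sorted(p.name for p in out_dir.iterdir())
    print(extracted)
```

If the layout changes later, only the body of the entry point has to change, which is what makes the change transparent to the caller.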

## Additional Information

- Test passing against prod Pavics: http://jenkins.ouranos.ca/job/PAVICS-e2e-workflow-tests/job/add-pavics-homepage-notebooks-to-jenkins-testsuite/13/console

- Test passing against my dev server: http://jenkins.ouranos.ca/job/PAVICS-e2e-workflow-tests/job/add-pavics-homepage-notebooks-to-jenkins-testsuite/14/console

- Test passing against CRIM server to test remote server and ensure the IAC pipeline will also work: https://daccs-jenkins.crim.ca/job/PAVICS-e2e-workflow-tests/job/add-pavics-homepage-notebooks-to-jenkins-testsuite/3/console (one failure unrelated).