update examples and docs for RdTools 3 #429

Merged
merged 39 commits into from
Oct 17, 2024
Changes from 38 commits
Commits
9188488
Delete legacy notebook
mdeceglie Sep 10, 2024
2b4748f
update the nanoseconds pandas alias
mdeceglie Sep 10, 2024
ff5ee23
fix deprecations
mdeceglie Sep 11, 2024
df4e176
Fix mistake in filtering docstrings
mdeceglie Sep 11, 2024
ab42b19
Call out aggregated filters in docstrings
mdeceglie Sep 11, 2024
9857ef9
Add allowed filter keys to TrendAnalysis docstring
mdeceglie Sep 11, 2024
e7fa479
Fix deprecations
mdeceglie Sep 11, 2024
a60cfbe
Overhaul trend analysis notebook
mdeceglie Sep 11, 2024
97e5599
Update README.md
mdeceglie Sep 17, 2024
521cfaf
Docs overview page update
mdeceglie Sep 17, 2024
a256ee1
Deal with pandas future warnings
mdeceglie Sep 17, 2024
9a6546e
Notebook overhaul
mdeceglie Sep 17, 2024
cc8c100
Merge branch 'new_notebooks' of https://github.com/NREL/rdtools into …
mdeceglie Sep 17, 2024
3be8231
Update sphinx docs for notebooks
mdeceglie Sep 17, 2024
f1a9170
update nbval action
mdeceglie Sep 17, 2024
e2f4295
update example data url
mdeceglie Sep 20, 2024
ec6cc81
Update NSRDB notebook to use online file
mdeceglie Sep 20, 2024
34c1c1e
update minimum pandas version
mdeceglie Sep 20, 2024
397e513
update setup.py to agree with requirements-min.txt
mdeceglie Sep 20, 2024
7e66423
fix long lines
mdeceglie Sep 20, 2024
c849ca9
fix typo from last commit
mdeceglie Sep 20, 2024
7530443
add comma to nbval workflow
martin-springer Sep 25, 2024
66e6dad
update actions/upload-artifact@v4
martin-springer Sep 25, 2024
665a524
Merge branch 'aggregated_filters_for_trials' into new_notebooks
mdeceglie Oct 14, 2024
bf53a17
Change version name in availability example
mdeceglie Oct 14, 2024
11639e6
remove experimental warning from availability
mdeceglie Oct 15, 2024
e4cfddf
remove availability is experimental warning from plotting module
mdeceglie Oct 15, 2024
97b2b1e
Remove extra line (style fix)
mdeceglie Oct 15, 2024
4b30122
add availability warning removal to change log
mdeceglie Oct 15, 2024
ad95657
Run and polish notebooks
mdeceglie Oct 15, 2024
a37ff1b
add ability to seed CircularBlockBootstrap
mdeceglie Oct 16, 2024
69e6798
add missing comma
mdeceglie Oct 16, 2024
892dd53
Run notebooks with bootstrap seed
mdeceglie Oct 16, 2024
232fb55
change minimum version of arch to 5.0
mdeceglie Oct 16, 2024
b00d6ff
change approach to pandas future warning in CODS
mdeceglie Oct 16, 2024
db3ccdd
Merge branch 'aggregated_filters_for_trials' into new_notebooks
mdeceglie Oct 16, 2024
2ddfed6
run notebooks
mdeceglie Oct 16, 2024
d6025f8
remove verbose cods output
mdeceglie Oct 16, 2024
e916cd3
Review changes
mdeceglie Oct 17, 2024
6 changes: 3 additions & 3 deletions .github/workflows/nbval.yaml
Expand Up @@ -10,10 +10,10 @@ jobs:
fail-fast: false # don't cancel other matrix jobs when one fails
matrix:
notebook-file: [
'TrendAnalysis_example_pvdaq4.ipynb',
'degradation_and_soiling_example_pvdaq_4.ipynb',
'TrendAnalysis_example.ipynb',
'TrendAnalysis_example_NSRDB.ipynb',
'degradation_and_soiling_example.ipynb',
'system_availability_example.ipynb'
# can't run the DKASC notebook here because it requires pre-downloaded data
]

steps:
Expand Down
12 changes: 6 additions & 6 deletions README.md
Expand Up @@ -12,11 +12,11 @@ Code coverage:
RdTools is an open-source library to support reproducible technical analysis of
time series data from photovoltaic energy systems. The library aims to provide
best practice analysis routines along with the building blocks for users to
tailor their own analyses.
Current applications include the evaluation of PV production over several years to obtain
rates of performance degradation and soiling loss. RdTools can handle
both high frequency (hourly or better) or low frequency (daily, weekly,
etc.) datasets. Best results are obtained with higher frequency data.
tailor their own analyses. Current applications include the evaluation of PV
production over several years to obtain rates of performance degradation and
soiling loss. RdTools can handle both high frequency (hourly or better) and low
frequency (daily, weekly, etc.) datasets. Best results are obtained with higher
frequency data.

RdTools can be installed automatically into Python from PyPI using the
command line:
Expand All @@ -27,7 +27,7 @@ pip install rdtools

For API documentation and full examples, please see the [documentation](https://rdtools.readthedocs.io).

RdTools currently is tested on Python 3.7+.
RdTools currently is tested on Python 3.9+.

## Citing RdTools

Expand Down

Large diffs are not rendered by default.

336 changes: 336 additions & 0 deletions docs/TrendAnalysis_example_NSRDB.ipynb

Large diffs are not rendered by default.

937 changes: 0 additions & 937 deletions docs/cods_example.ipynb

This file was deleted.


Large diffs are not rendered by default.

8 changes: 5 additions & 3 deletions docs/sphinx/source/changelog/pending.rst
Expand Up @@ -20,6 +20,8 @@ Enhancements
* Added a new wrapper function for clearsky filters (:pull:`412`)
* Improve test coverage, especially for the newly added filter capabilities (:pull:`413`)
* Added codecov.yml configuration file (:pull:`420`)
* Availability module no longer considered experimental (:pull:`429`)
* Add capability to seed the CircularBlockBootstrap (:pull:`429`)

Bug fixes
---------
Expand All @@ -43,7 +45,7 @@ Documentation
Requirements
------------
* Specified versions in ``requirements.txt``, ``requirements_min.txt`` and ``docs/notebook_requirements.txt``
have been updated (:pull:`412`, :pull:`428`)
have been updated (:pull:`412`, :pull:`428`, :pull:`429`)

* Updated certifi==2024.7.4 in ``requirements.txt`` (:pull:`428`)
* Updated chardet==5.2.0 in ``requirements.txt`` (:pull:`428`)
Expand Down Expand Up @@ -86,7 +88,7 @@ have been updated (:pull:`412`, :pull:`428`)
* Updated h5py==3.7.0 in ``requirements_min.txt`` (:pull:`428`)
* Updated pvlib==0.11.0 in ``requirements_min.txt`` (:pull:`428`)
* Updated scikit-learn==1.1.3 in ``requirements_min.txt`` (:pull:`428`)
* Updated arch==4.11 in ``requirements_min.txt`` (:pull:`428`)
* Updated arch==5.0 in ``requirements_min.txt`` (:pull:`429`)
* Updated filterpy==1.4.5 in ``requirements_min.txt`` (:pull:`428`)
* Updated xgboost==1.6.0 in ``requirements_min.txt`` (:pull:`431`)

Expand Down Expand Up @@ -143,7 +145,7 @@ have been updated (:pull:`412`, :pull:`428`)
* Increase maximum version of pvlib to <0.12 (:pull:`423`)
* Updated classifiers to accommodate new python versions (:pull:`428`)
* Add pytest-cov to TESTS_REQUIRE (:pull:`420`)
* Add arch >= 4.11 to INSTALL_REQUIRES (:pull:`428`)
* Add arch >= 5.0 to INSTALL_REQUIRES (:pull:`429`)
* Add filterpy >= 1.4.2 to INSTALL_REQUIRES (:pull:`428`)
* Updated matplotlib >= 3.5.3 in INSTALL_REQUIRES (:pull:`428`)
* Updated numpy >= 1.22.4 in INSTALL_REQUIRES (:pull:`428`)
Expand Down
6 changes: 3 additions & 3 deletions docs/sphinx/source/examples.rst
Expand Up @@ -22,7 +22,7 @@ This page shows example usage of the RdTools analysis functions.

.. nbgallery::

examples/degradation_and_soiling_example_pvdaq_4
examples/TrendAnalysis_example_pvdaq4
examples/degradation_and_soiling_example
examples/TrendAnalysis_example
examples/TrendAnalysis_example_NSRDB
examples/system_availability_example
examples/cods_example
3 changes: 3 additions & 0 deletions docs/sphinx/source/examples/TrendAnalysis_example.nblink
@@ -0,0 +1,3 @@
{
"path": "../../../TrendAnalysis_example.ipynb"
}
@@ -0,0 +1,3 @@
{
"path": "../../../TrendAnalysis_example_NSRDB.ipynb"
}

This file was deleted.

3 changes: 0 additions & 3 deletions docs/sphinx/source/examples/cods_example.nblink

This file was deleted.

@@ -0,0 +1,3 @@
{
"path": "../../../degradation_and_soiling_example.ipynb"
}

This file was deleted.

56 changes: 31 additions & 25 deletions docs/sphinx/source/index.rst
Expand Up @@ -36,17 +36,15 @@ supported. A typical analysis of soiling and degradation contains the following:

0. Import and preliminary calculations
1. Normalize data using a performance metric
2. Filter data that creates bias
2. Filter data to reduce error
3. Aggregate data
4. Analyze aggregated data to estimate the degradation rate and/or
4. Filter aggregated data to remove anomalies
5. Analyze aggregated data to estimate the degradation rate and/or
soiling loss

Steps 1 and 2 may be accomplished with the clearsky workflow (see the
:ref:`examples`) which can help eliminate problems from irradiance sensor
drift.

.. image:: _images/RdTools_workflows.png
:alt: RdTools workflow diagram
It can be helpful to repeat the above steps with both ground-based measurements of weather
and satellite weather to check for drift in the ground-based measurements. This is illustrated
in the TrendAnalysis with NSRDB example.

Degradation
^^^^^^^^^^^
Expand All @@ -63,28 +61,26 @@ the uncertainty in the estimate via a bootstrap calculation. The
.. image:: _images/Clearsky_result_updated.png
:alt: RdTools degradation results plot

Two workflows are available for system performance ratio calculation,
and illustrated in an example notebook. The sensor-based approach
assumes that site irradiance and temperature sensors are calibrated and
in good repair. Since this is not always the case, a 'clear-sky'
workflow is provided that is based on modeled temperature and
irradiance. Note that site irradiance data is still required to identify
clear-sky conditions to be analyzed. In many cases, the 'clear-sky'
analysis can identify conditions of instrument errors or irradiance
sensor drift, such as in the above analysis.

The clear-sky analysis tends to provide less stable results than sensor-based
Drift of weather sensors over time (particularly irradiance) can bias the results
of this workflow. The preferred way to check for this is to also run the workflow using
satellite-derived weather data such as the National Solar Radiation Database (NSRDB) and
compare results to the sensor-based analysis. If satellite data is not available,
a 'clear-sky' workflow is also available in RdTools. This workflow is based on modeled
temperature and irradiance. Note that site irradiance data is still required to identify
clear-sky conditions to be analyzed.

Satellite and clear-sky analyses tend to provide less stable results than sensor-based
analysis when details such as filtering are changed. We generally recommend
that the clear-sky analysis be used as a check on the sensor-based results,
that these be used only as a check on the sensor-based results,
rather than as a stand-alone analysis.

Soiling
^^^^^^^

Soiling can be estimated with the stochastic rate and recovery (SRR)
method (Deceglie 2018). This method works well when soiling patterns
follow a "sawtooth" pattern, a linear decline followed by a sharp
recovery associated with natural or manual cleaning.
RdTools provides two methods for soiling analysis. The first is the
stochastic rate and recovery (SRR) method (Deceglie 2018). This method works
well when soiling patterns follow a "sawtooth" pattern, a linear decline followed
by a sharp recovery associated with natural or manual cleaning.
:py:func:`.soiling.soiling_srr` performs the calculation and returns the P50
insolation-weighted soiling ratio, confidence interval, and additional
information (``soiling_info``) which includes a summary of the soiling
Expand All @@ -97,6 +93,12 @@ identified soiling rates for the dataset.
:width: 320
:height: 216

The combined estimation of degradation and soiling (CODS) method (Skomedal 2020) is also available
in RdTools. CODS self-consistently extracts degradation, soiling, and seasonality
of the daily-aggregated normalized performance signal. It is particularly useful
when soiling trends are biasing degradation results. Its use is shown in both the TrendAnalysis
example notebook and the functional API example notebook for degradation and soiling.

TrendAnalysis
^^^^^^^^^^^^^
An object-oriented API for complete soiling and degradation analysis including
Expand Down Expand Up @@ -151,7 +153,7 @@ Usage and examples
------------------

Full workflow examples are found in the notebooks in :ref:`examples`.
The examples are designed to work with python 3.10. For a consistent
The examples are designed to work with python 3.12. For a consistent
experience, we recommend installing the packages and versions documented
in ``docs/notebook_requirements.txt``. This can be achieved in your
environment by first installing RdTools as described above, then running
Expand Down Expand Up @@ -259,6 +261,10 @@ appropriate:
Directly From PV Yield," in IEEE Journal of Photovoltaics, 8(2),
pp. 547-551, 2018 DOI: `10.1109/JPHOTOV.2017.2784682 <https://doi.org/10.1109/JPHOTOV.2017.2784682>`_

- Åsmund Skomedal and Michael G. Deceglie, "Combined Estimation of Degradation and Soiling Losses in
Photovoltaic Systems," in IEEE Journal of Photovoltaics, 10(6) pp. 1788-1796, 2020.
DOI: `10.1109/JPHOTOV.2020.3018219 <https://doi.org/10.1109/JPHOTOV.2020.3018219>`_

- Kevin Anderson and Ryan Blumenthal, "Overcoming Communications Outages in
Inverter Downtime Analysis", 2020 IEEE 47th Photovoltaic Specialists
Conference (PVSC). DOI: `10.1109/PVSC45281.2020.9300635 <https://doi.org/10.1109/PVSC45281.2020.9300635>`_
Expand Down
35 changes: 8 additions & 27 deletions docs/system_availability_example.ipynb

Large diffs are not rendered by default.

29 changes: 18 additions & 11 deletions rdtools/analysis_chains.py
Expand Up @@ -69,21 +69,28 @@ class TrendAnalysis:
----------
(not all attributes documented here)
filter_params: dict
parameters to be passed to rdtools.filtering functions. Keys are the
Parameters to be passed to rdtools.filtering functions. Keys are the
names of the rdtools.filtering functions. Values are dicts of parameters
to be passed to those functions. Also has a special key `ad_hoc_filter`
the associated value is a boolean mask joined with the rest of the filters.
filter_params defaults to empty dicts for each function in rdtools.filtering,
in which case those functions use default parameter values, `ad_hoc_filter`
defaults to None. See examples for more information.
to be passed to those functions. Allowed keys are `normalized_filter`*,
`poa_filter`*, `tcell_filter`*, `clip_filter`*, `hour_angle_filter`,
`clearsky_filter`* (only used in a clear sky analysis), and
`sensor_clearsky_filter` (only used in a sensor analysis). (* indicates a
filter included by default). To invoke `clearsky_filter` for a sensor
analysis, use the special key `sensor_clearsky_filter`. Also has a special
key `ad_hoc_filter`; the associated value is a boolean mask joined with the
rest of the filters. Defaults to empty dicts for each function as described
above, in which case those functions use default parameter values.
`ad_hoc_filter` defaults to None. See examples for more information.
filter_params_aggregated: dict
parameters to be passed to rdtools.filtering functions that specifically handle
aggregated data (dily filters, etc). Keys are the names of the rdtools.filtering functions.
Values are dicts of parameters to be passed to those functions. To invoke `clearsky_filter`
for a sensor analysis, use the special key `sensor_clearsky_filter`. Also has a special key
aggregated data (daily filters, etc). Keys are the names of rdtools.filtering
functions. Allowed keys are `two_way_window_filter`*, `insolation_filter`,
`hampel_filter`, and `directional_tukey_filter` (* indicates filters included by
default). Values are dicts of parameters to be passed to those functions (empty
dict calls the function with its default parameters). Also has a special key
`ad_hoc_filter`; this filter is a boolean mask joined with the rest of the filters.
filter_params_aggregated defaults to empty dicts for each function in rdtools.filtering,
in which case those functions use default parameter values, `ad_hoc_filter`
filter_params_aggregated defaults to an empty dict for two_way_window_filter,
in which case the filter is run with its default parameter values. `ad_hoc_filter`
defaults to None. See examples for more information.
results : dict
Nested dict used to store the results of methods ending with `_analysis`
Expand Down
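The allowed keys listed in the docstring above can be written out as plain dictionaries. The following is a minimal sketch that builds the dicts only and does not call RdTools; the choice of which optional filters to enable is illustrative, and `ALLOWED_FILTER_KEYS`, `ALLOWED_AGGREGATED_KEYS`, and `unknown_keys` are hypothetical helpers that are not part of RdTools.

```python
# Illustrative only: plain dicts using the keys named in the docstring.
# An empty dict means "run this filter with its default parameters".
filter_params = {
    "normalized_filter": {},
    "poa_filter": {},
    "tcell_filter": {},
    "clip_filter": {},
    "hour_angle_filter": {},       # optional: not included by default
    "sensor_clearsky_filter": {},  # optional: clear-sky filter in a sensor analysis
    "ad_hoc_filter": None,         # or a boolean mask joined with the other filters
}

filter_params_aggregated = {
    "two_way_window_filter": {},   # included by default
    "hampel_filter": {},           # optional
    "ad_hoc_filter": None,
}

# Hypothetical helpers (not part of RdTools): the key sets copy the docstring.
ALLOWED_FILTER_KEYS = {
    "normalized_filter", "poa_filter", "tcell_filter", "clip_filter",
    "hour_angle_filter", "clearsky_filter", "sensor_clearsky_filter",
    "ad_hoc_filter",
}
ALLOWED_AGGREGATED_KEYS = {
    "two_way_window_filter", "insolation_filter", "hampel_filter",
    "directional_tukey_filter", "ad_hoc_filter",
}


def unknown_keys(params, allowed):
    """Return the keys in params that the docstring does not list."""
    return sorted(set(params) - allowed)


print(unknown_keys(filter_params, ALLOWED_FILTER_KEYS))
print(unknown_keys(filter_params_aggregated, ALLOWED_AGGREGATED_KEYS))
```

Since the docstring documents both dictionaries as attributes of `TrendAnalysis`, dicts like these would presumably be assigned to the attributes of the same names before running an analysis; the example notebooks referenced in the docstring are the authoritative usage.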
6 changes: 0 additions & 6 deletions rdtools/availability.py
Expand Up @@ -14,12 +14,6 @@
from scipy.interpolate import interp1d
import warnings

warnings.warn(
'The availability module is currently experimental. The API, results, '
'and default behaviors may change in future releases (including MINOR '
'and PATCH releases) as the code matures.'
)


class AvailabilityAnalysis:
"""
Expand Down
9 changes: 7 additions & 2 deletions rdtools/bootstrap.py
Expand Up @@ -15,7 +15,8 @@


def _make_time_series_bootstrap_samples(
signal, model_fit, sample_nr=1000, block_length=90, decomposition_type='multiplicative'
signal, model_fit, sample_nr=1000, block_length=90,
decomposition_type='multiplicative', bootstrap_seed=None
):
'''
Generate bootstrap samples based on a time series signal and its model fit
Expand All @@ -34,6 +35,10 @@ def _make_time_series_bootstrap_samples(
decomposition_type : string, default 'multiplicative'
The type of decomposition to use with the model,
either 'multiplicative' or 'additive'
bootstrap_seed: {Generator, RandomState, int}, default None
Seed passed to CircularBlockBootstrap to ensure reproducible results.
If an int, passes the value to ``np.random.default_rng``.
If None (default), a fresh Generator is constructed with system-provided entropy.

Returns
-------
Expand All @@ -54,7 +59,7 @@ def _make_time_series_bootstrap_samples(
index=signal.index, columns=range(sample_nr))

# Create circular blocks of bootstrap samples
bs = CircularBlockBootstrap(block_length, residuals)
bs = CircularBlockBootstrap(block_length, residuals, seed=bootstrap_seed)
for b, bootstrapped_residuals in enumerate(bs.bootstrap(sample_nr)):
if decomposition_type == 'multiplicative':
bootstrap_samples.loc[:, b] = \
Expand Down
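The effect of the new `bootstrap_seed` argument can be illustrated without the `arch` dependency. The sketch below is a simplified, standard-library stand-in for a circular block bootstrap, not the `arch` implementation: `circular_block_bootstrap` is a hypothetical function, `random.Random` stands in for the NumPy generator, and the residual values are made up for the demonstration.

```python
import random


def circular_block_bootstrap(values, block_length, n_samples, seed=None):
    """Return n_samples resampled lists built from wrapped (circular) blocks."""
    rng = random.Random(seed)  # seed=None falls back to system entropy
    n = len(values)
    samples = []
    for _ in range(n_samples):
        resampled = []
        while len(resampled) < n:
            start = rng.randrange(n)
            # Indices wrap past the end of the series, hence "circular".
            resampled.extend(values[(start + i) % n] for i in range(block_length))
        samples.append(resampled[:n])  # trim to the original length
    return samples


# Made-up residuals, for illustration only.
residuals = [0.1, -0.2, 0.05, 0.3, -0.15, 0.0, 0.2, -0.1]

first = circular_block_bootstrap(residuals, block_length=3, n_samples=4, seed=42)
second = circular_block_bootstrap(residuals, block_length=3, n_samples=4, seed=42)
print(first == second)  # the same seed reproduces the same draws
```

With a fixed seed the resampled residuals, and therefore any confidence interval derived from them, repeat from run to run, which is what lets the example notebooks produce stable output.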
14 changes: 10 additions & 4 deletions rdtools/filtering.py
Expand Up @@ -742,6 +742,7 @@ def _calculate_xgboost_model_features(df, sampling_frequency):
# Get the max value for the day and see how each value compares
df["date"] = list(pd.to_datetime(pd.Series(df.index)).dt.date)
df["daily_max"] = df.groupby(["date"])["scaled_value"].transform("max")

# Get percentage of daily max
df["percent_daily_max"] = df["scaled_value"] / (df["daily_max"] + 0.00001)
# Get the standard deviation, median and mean of the first order
Expand Down Expand Up @@ -871,6 +872,7 @@ def xgboost_clip_filter(power_ac, mounting_type="fixed"):
# data frequency.
xgb_predictions = xgb_predictions.reindex(index=power_ac.index, method="ffill")
xgb_predictions.loc[xgb_predictions.isnull()] = False

# Regenerate the features with the original sampling frequency
# (pre-resampling or interpolation).
power_ac_df = power_ac.to_frame()
Expand Down Expand Up @@ -924,6 +926,7 @@ def two_way_window_filter(
"""
Removes anomalies based on forward and backward window of the rolling median. Points beyond
outlier_threshold from both the forward and backward-looking median are excluded by the filter.
Designed for use after the aggregation step in the RdTools trend analysis workflows.

Parameters
----------
Expand Down Expand Up @@ -964,7 +967,8 @@ def two_way_window_filter(
def insolation_filter(insolation, quantile=0.1):
"""
A simple quantile filter. Primary application in RdTools is to exclude
low insolation points after the aggregation step.
low insolation points after the aggregation step in the trend analysis
workflows.

Parameters
----------
Expand All @@ -986,7 +990,8 @@ def insolation_filter(insolation, quantile=0.1):

def hampel_filter(series, k="14d", t0=3):
"""
Hampel outlier filter primarily applied after aggregation step, but broadly
Hampel outlier filter designed for use after the aggregation step
in the RdTools trend analysis workflows, but broadly
applicable.

Parameters
Expand Down Expand Up @@ -1026,8 +1031,9 @@ def _tukey_fence(series, k=1.5):
def directional_tukey_filter(series, roll_period=pd.to_timedelta("7 Days"), k=1.5):
"""
Performs a forward and backward looking rolling Tukey filter. Points more than k*IQR
above the third quartile or below the first quartile are classified as outliers.Points
must only pass one of either the forward or backward looking filters to be kept.
above the third quartile or below the first quartile are classified as outliers. Points
must only pass one of either the forward or backward looking filters to be kept. Designed
for use after the aggregation step in the RdTools trend analysis workflows.


Parameters
Expand Down
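The rule described in the `directional_tukey_filter` docstring (a fence of k*IQR around the quartiles, with a point kept if it passes either the forward- or backward-looking window) can be sketched with the standard library. This is a simplified illustration rather than the RdTools implementation: it assumes count-based windows instead of a time-based `roll_period`, the inclusive quartile method, and a pass for windows too short to evaluate. The names `inside_fence` and `directional_tukey_sketch` and the sample data are hypothetical.

```python
from statistics import quantiles


def inside_fence(value, window, k=1.5):
    """True if value lies within the Tukey fence computed from window."""
    if len(window) < 2:
        return True  # assumption: too few points to judge, so do not flag
    q1, _, q3 = quantiles(window, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr <= value <= q3 + k * iqr


def directional_tukey_sketch(values, window=5, k=1.5):
    """Keep a point if it passes the backward- OR forward-looking fence."""
    keep = []
    for i, value in enumerate(values):
        backward = values[max(0, i - window + 1): i + 1]  # ends at the point
        forward = values[i: i + window]                   # starts at the point
        keep.append(
            inside_fence(value, backward, k) or inside_fence(value, forward, k)
        )
    return keep


# Made-up daily values with a single spike at index 6.
daily = [1.0, 1.01, 0.99, 1.02, 0.98, 1.0, 2.0, 1.01, 0.99, 1.0, 1.02, 0.98, 1.0]
mask = directional_tukey_sketch(daily)
print(mask[6])  # the spike fails both directions, so it is excluded
```

One plausible reading of the "pass either direction" rule is that it avoids discarding points right after a genuine step change, since those points still agree with the window that looks forward from them.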
10 changes: 0 additions & 10 deletions rdtools/plotting.py
Expand Up @@ -347,11 +347,6 @@ def availability_summary_plots(power_system, power_subsystem, loss_total,
:py:meth:`.availability.AvailabilityAnalysis.plot` instead of running
this function manually.

.. warning::
The availability module is currently experimental. The API, results,
and default behaviors may change in future releases (including MINOR
and PATCH releases) as the code matures.

Parameters
----------
power_system : pandas.Series
Expand Down Expand Up @@ -389,11 +384,6 @@ def availability_summary_plots(power_system, power_subsystem, loss_total,
... aa.power_subsystem, aa.loss_total, aa.energy_cumulative,
... aa.energy_expected_rescaled, aa.outage_info)
"""
warnings.warn(
'The availability module is currently experimental. The API, results, '
'and default behaviors may change in future releases (including MINOR '
'and PATCH releases) as the code matures.'
)

fig = plt.figure(figsize=(16, 8))
gs = fig.add_gridspec(3, 2)
Expand Down