Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add timeseries section to model schema #99

Merged
merged 19 commits into from
Mar 26, 2024
Merged

feat: add timeseries section to model schema #99

merged 19 commits into from
Mar 26, 2024

Conversation

Batch21
Copy link
Contributor

@Batch21 Batch21 commented Feb 7, 2024

No description provided.

@Batch21
Copy link
Contributor Author

Batch21 commented Feb 7, 2024

@jetuk there is quite a bit left to do here but it would be good to get your opinion on the approach.

I have a few queries:

  • Should Timeseries be a variant of MetricFloatValue? Originally I made it a variant of MetricFloatReference but as the load method of this enum is called in the model build method when setting up the inter-network transfers, it required creating an empty timeseries collection, which was a bit ugly.
  • Should the timeseries inputs be able to handle index timeseries? The load_column and load_df methods could probably take a generic type arg for loading either floats or ints.
  • Should timeseries and tables be combined into a single inputs section?

Copy link
Member

@jetuk jetuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a quick read through. I've left a couple of comments. I think this is what I was thinking it would look like.

pywr-schema/src/test_models/timeseries.json Outdated Show resolved Hide resolved
pywr-schema/src/timeseries/mod.rs Outdated Show resolved Hide resolved
pywr-schema/src/timeseries/mod.rs Outdated Show resolved Hide resolved
pywr-schema/src/timeseries/mod.rs Outdated Show resolved Hide resolved
pywr-schema/src/timeseries/mod.rs Show resolved Hide resolved
pywr-schema/src/timeseries/polars_dataset.rs Outdated Show resolved Hide resolved
pywr-schema/src/timeseries/polars_dataset.rs Outdated Show resolved Hide resolved
@Batch21
Copy link
Contributor Author

Batch21 commented Feb 11, 2024

@jetuk I've had a go at some resampling using polars. It still needs some work but one change that might simplify the code a bit would be to use chrono across the project instead of time-rs. This would also open up the possibility of using polars duration strings to specify model timestep frequency and the polars date_range function to generate the timesteps. What do you think?

@jetuk
Copy link
Member

jetuk commented Feb 12, 2024

I think the polars integration and the duration strings are a good argument for switching to chrono. How should we do that? Should we do a separate PR to swap, and the rebase this against it?

@Batch21
Copy link
Contributor Author

Batch21 commented Mar 3, 2024

@jetuk could you take another look at this. I responded to most of your comments thought there are still some issues to fix:

  • Resampling to or from timesteps with non-uniform durations does not currently work:
    • If the input data has a non-uniform datetime column the code will currently hit a todo!(). One issue with implementing something here is that polars does not have a direct way to infer frequency from a datetime column. I can't find any other rust crates with this functionality but looking at the Pandas code for it, it might not be too hard to implement it for a few frequencies (maybe we only need monthly initially?).
    • For the model timesteps, currently the TimeDomain derives step duration using only the first timestep so will not account for non-uniform timestep durations. This needs fixing alongside the last part of Time domain improvements #89, which I think you said you wanted to have a look at?
  • Its probably unnecessary to use nanosecond precision when comparing the durations. Seconds is probably fine?
  • I've fixed an issue where a ParameterNameAlreadyExist error occurred if a dataframe or dataframe column was referenced more than once. However there is still the issue that a user could define a parameter with a name that matches with one that would be created for a timeseries input. In this case, it would now use the values from the user-defined parameter if this was loaded first and not load the timeseries data. I wonder if parameters should have some kind of namespacing so that we can easily separate user-defined parameters from those created internally.

At least the first of these is probably worth doing in another PR?

@jetuk
Copy link
Member

jetuk commented Mar 3, 2024

@jetuk could you take another look at this. I responded to most of your comments thought there are still some issues to fix:

* Resampling to or from timesteps with non-uniform durations does not currently work:
  
  * If the input data has a non-uniform datetime column the code will currently hit a `todo!()`. One issue with implementing something here is that polars does not have a direct way to infer frequency from a datetime column. I can't find any other rust crates with this functionality but looking at the Pandas code for it, it might not be too hard to implement it for a few frequencies (maybe we only need monthly initially?).

That's probably OK. There are other TODOs and unfinished items at the moment. We can get this framework in and work on it.

  * For the model timesteps, currently the `TimeDomain` derives step duration using only the first timestep so will not account for non-uniform timestep durations. This needs fixing alongside the last part of [Time domain improvements #89](https://github.com/pywr/pywr-next/issues/89), which I think you said you wanted to have a look at?

Yes, I had some ideas, but nothing implemented. Can we merge this with a limitation? That method could do a check that all time-steps have the same duration and panic if not?

* Its probably unnecessary to use nanosecond precision when comparing the durations. Seconds is probably fine?

Yes!

* I've fixed an issue where a `ParameterNameAlreadyExist` error occurred if a dataframe or dataframe column was referenced more than once. However there is still the issue that a user could define a parameter with a name that matches with one that would be created for a timeseries input. In this case, it would now use the values from the user-defined parameter if this was loaded first and not load the timeseries data. I wonder if parameters should have some kind of namespacing so that we can easily separate user-defined parameters from those created internally.

This could mirror the "sub-names" for the nodes? I hadn't anticipated a schema parameter creating many core parameters.

At least the first of these is probably worth doing in another PR?

Probably all of them can be addressed with additional updates?

@Batch21
Copy link
Contributor Author

Batch21 commented Mar 7, 2024

@jetuk this is ready for another look. All your original comments should be addressed in some fashion.

One update that's probably necessary is adding the parameters_mut method to all the nodes. However, I wanted to check first if you though there was a better approach?

@jetuk
Copy link
Member

jetuk commented Mar 8, 2024

I think there are other approaches, but they will delay this PR further.

Two ideas:

  1. Remove the manually writing of parameters and parameters_mut using a derive macro (see v1 schema).
  2. Move to a builder pattern for mutating nodes (I'm not sure about this).
  3. Use a visitor pattern instead. Here there would be a visit or visit_mut method which you would give a closure that's called with each field or parameter. I am more used to (1), but thought about this after reading some of the tracing stuff.

So, probably (1) would just remove this boiler plate. I guess it might be more efficient to do (1) in a separate PR and then update this one? Assuming we can do that in the short-term to not delay this further.

@Batch21
Copy link
Contributor Author

Batch21 commented Mar 9, 2024

I forgot that the proc macros you created in the v1 schema did this. Looking at them, we should be able to copy them across without too many modifications. I can have a go in a separate PR this evening.

Copy link
Member

@jetuk jetuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I've left a few, but minor, comments to improve docstrings and tidy things up etc.

I think this might need a cargo fmt too.

@@ -58,6 +58,10 @@ impl PywrDuration {
pub fn fractional_days(&self) -> f64 {
self.0.num_seconds() as f64 / SECS_IN_DAY as f64
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add docstring. Would be good to make the one above a proper docstring as well.

@@ -412,8 +445,9 @@ impl PywrModel {
Timestepper::default()
});

let nodes = v1
let nodes_and_ts: Vec<NodeAndTimeseries> = v1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worthwhile adding some comments here about this process.

(None, None)
};

ts_data.extend(param_ts_data.into_iter().flatten());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can move this up into the above if let Some(v1_parameters) ... block to avoid wrapping in an Option?


if durations.len() > 1 {
// Non-uniform timestep are not yet supported
todo!();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put the comment in the todo!() macro.

.time()
.step_duration()
.whole_nanoseconds()
.expect("Nano seconds could not be extracted from model step duration");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a TimesreiesError variant?

use crate::outputs::Output;
use crate::parameters::{MetricFloatReference, TryIntoV2Parameter};
use crate::parameters::{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some unused imports to resolve here.

@@ -5,6 +5,7 @@ use crate::nodes::NodeAttribute;
use crate::parameters::{
DynamicFloatValue, IntoV2Parameter, NodeReference, ParameterMeta, TryFromV1Parameter, TryIntoV2Parameter,
};
use crate::timeseries::{self, LoadedTimeseriesCollection};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import self

@@ -1,5 +1,6 @@
use crate::data_tables::LoadedTableCollection;
use crate::parameters::{ConstantValue, DynamicFloatValue, DynamicFloatValueType, ParameterMeta};
use crate::parameters::{ConstantValue, DynamicFloatValue, DynamicFloatValueType, ParameterMeta, VariableSettings};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import VariableSettings

@@ -0,0 +1,245 @@
use chrono::TimeDelta;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import

use pywr_core::parameters::{Array1Parameter, Array2Parameter, ParameterIndex};
use pywr_core::PywrError;
use pywr_v1_schema::tables::TableVec;
use std::sync::Arc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import

@Batch21
Copy link
Contributor Author

Batch21 commented Mar 23, 2024

@jetuk thanks for the review. I think I've addressed all your comments apart from the one I reply to above.

@jetuk
Copy link
Member

jetuk commented Mar 24, 2024

Should we remove DataframeParameter in this PR? Or mark it deprecated in some fashion?

@Batch21
Copy link
Contributor Author

Batch21 commented Mar 24, 2024

Should we remove DataframeParameter in this PR? Or mark it deprecated in some fashion?

yeah, I did think of doing this earlier on. At this stage of development it could probably just be removed?

@jetuk
Copy link
Member

jetuk commented Mar 25, 2024

@Batch21 Are you finished with this now?

@Batch21
Copy link
Contributor Author

Batch21 commented Mar 25, 2024

@Batch21 Are you finished with this now?

Yes, though I did spot another issue when removing the df parameter. The polars timeseries backend does not load gzip files, which will be needed when the test that loads the simple-wasm test model is enabled. I can add an issue for this.

@jetuk jetuk changed the title WIP: feat: add timeseries section to model schema feat: add timeseries section to model schema Mar 26, 2024
@jetuk jetuk merged commit 2e38724 into main Mar 26, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants