Fixes #634 - Implemented data entry date option for TS data retrieval #927
base: develop
Conversation
…on bug in progress
@@ -84,7 +83,7 @@
public class TimeSeriesController implements CrudHandler {
private static final Logger logger = Logger.getLogger(TimeSeriesController.class.getName());

private static final String INCLUDE_ENTRY_DATE = "include-entry-date";
From a purely pedantic standpoint this should really be a parameter of the content-type, but I may have to accept the reality of this being easier for everyone.
@krowvin, we were just discussing this conceptually.
Path parameters are the "what" I am retrieving, query parameters are the "how": am I adding to or filtering that data, and content-type is the "shape" that is returned. This change affects the how, and given the way the column names and data array are paired together to already give this flexibility, I don't see a change in shape.
Adding a column is definitely a change in shape. The what is a time series, the query parameters specify exactly which time series, or at least which portion of a given time series (I guess if we're being really pedantic begin and end should be in the fragment... but I digress).
While the flexibility is there, it's flexibility to change the shape. I don't totally disagree with you but given we haven't communicated that portion of the contract very well we are introducing a breaking change. We already have more than one downstream library dependent on these types.
I'm going to type up something on the wiki, or maybe a discussion, for the philosophy I'm going for with these; hopefully my argument makes more sense in regards to query vs content-type, especially as it relates to some of the challenges we're currently seeing.
Yeah, it is not conveyed very well that one should use the columns array to determine which index to grab from the data array. I don't think that adding information to the Swagger docs to communicate that should trigger a content-type change though (maybe version=2.5, but that seems like a huge headache), especially since not including the parameter to retrieve entry date keeps the array intact for backwards compatibility.
Also, I've never seen an API where the content-type changes how much extra (or less) data gets returned to the client. I'd like to see some examples. I also don't see the reason (pedantic or not) for adding begin/end as path parameters as those are filters on the time series. Everything for the identifier of the time series encompasses the time series (which is also why the date version shouldn't be in the path and is a query parameter).
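Since the column/index contract keeps coming up, here is a minimal client-side sketch of what "use the columns array" means in practice. The JSON field names ("value-columns", "name", "ordinal", "values") and the standalone class are illustrative assumptions, not the exact CDA payload; the point is only that a reader which resolves indices from the descriptor keeps working when an optional column is appended.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.HashMap;
import java.util.Map;

public class ValueColumnLookup {
    public static void main(String[] args) throws Exception {
        // Hypothetical response fragment; real CDA field names may differ.
        String body = "{\"value-columns\":["
                + "{\"name\":\"date-time\",\"ordinal\":1},"
                + "{\"name\":\"value\",\"ordinal\":2},"
                + "{\"name\":\"quality-code\",\"ordinal\":3}],"
                + "\"values\":[[1700000000000,1.5,0]]}";

        JsonNode root = new ObjectMapper().readTree(body);

        // Resolve column name -> zero-based array index from the descriptor.
        Map<String, Integer> index = new HashMap<>();
        for (JsonNode col : root.get("value-columns")) {
            index.put(col.get("name").asText(), col.get("ordinal").asInt() - 1);
        }

        // Read rows by resolved index; an extra trailing column is ignored harmlessly.
        for (JsonNode row : root.get("values")) {
            double value = row.get(index.get("value")).asDouble();
            int quality = row.get(index.get("quality-code")).asInt();
            System.out.println(value + " / " + quality);
        }
    }
}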
NOTE: not arguing for it, but an explanation of my logic on the fragment.
a time series is:
/timeseries/Alder Springs.Precip-INC.Total.1Hour.1Hour.Calc?version_date=unversioned
identifies a specific time series, which is a mathematical series of data - and technically all of it. The entire series could be considered a "document"; a fragment is a section within a document. Traditionally we would think of lines in a file, but it applies to the time series as well. A file, after all, is just a series of lines.
As I said, extremely pedantic.... admittedly almost to the point of being useless because literally no one does it that way, nor would they even if it could be proven objectively correct.
Technically the units are also representation and not identification, but given how limited the use of content-type features is, it would be incredibly difficult to get people to use it; I don't even think the Swagger-UI has a mechanism to slightly tweak the content-type.
But back on the topic of what's correct for us, it seems we're all in agreement that @DanielTOsborne's initial design is already sufficiently flexible in the current scheme and our failure was in how we documented that for the general end user.
So we leave the inclusion as a query parameter unless a better way actually comes along.
@@ -248,6 +256,11 @@ public static class Record {
@JsonProperty(value = "quality-code", index = 2)
int qualityCode;
This might be better as a subclass. I know that adds some complexity, but eventually we may also want to include the version date and text notes that may be attached, and that would be a lot of logic in this one class.
Changed to a subclass. I added a custom deserializer to handle the different classes.
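For readers following along, a rough sketch of the subclass shape being discussed; the field and constructor details are assumptions based on the snippets in this thread, not the exact PR code.

import java.sql.Timestamp;

// Base record: the original three-field Time/Value/Quality entry.
class Record {
    final Timestamp dateTime;
    final Double value;
    final int qualityCode;

    Record(Timestamp dateTime, Double value, int qualityCode) {
        this.dateTime = dateTime;
        this.value = value;
        this.qualityCode = qualityCode;
    }
}

// Variant used only when the entry date was requested; existing callers never see it.
class TimeSeriesRecordWithDate extends Record {
    final Timestamp dataEntryDate;

    TimeSeriesRecordWithDate(Timestamp dateTime, Double value, int qualityCode, Timestamp dataEntryDate) {
        super(dateTime, value, qualityCode);
        this.dataEntryDate = dataEntryDate;
    }
}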
Okay, so I said this, Adam said this, but after reading the code and our other discussions above about timeseries, does it make more sense to just make TimeSeries more generic?
e.g.
A builder where you manually add column names, index, and type, and functions in the row builder to set such?
something like
withColumn(int index, String name, String description, Class<T> type) {
    // logic
}
... Record:
<T> setColumn(int index, T value, Class<T> type) {
    // logic
}
Or something like that; it would prevent the need for TimeSeriesDaoImpl to have two different loops doing almost 90% of the same work, just a check for "I have this column requested, let's also add it."
Basically, instead of hard-coding the columns at all (okay, maybe time... it is a time series), the user of the given TimeSeries object (after it is set by the builder) can define them at run-time.
Sorry, I know you did a lot here, that just came to me now looking through the current PR.
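To make the idea above concrete, here is a loose sketch of a run-time column registry; every name is hypothetical and this is not a proposal for the actual CDA API, just an illustration of "register the columns via the builder instead of hard-coding them".

import java.util.ArrayList;
import java.util.List;

public class GenericTimeSeries {

    // Column metadata registered at run-time instead of being hard-coded on the class.
    public static final class Column {
        final int index;
        final String name;
        final Class<?> type;

        Column(int index, String name, Class<?> type) {
            this.index = index;
            this.name = name;
            this.type = type;
        }
    }

    // Rows are just positional cells; the column list says what each position means.
    public static final class Row {
        private final Object[] cells;

        Row(int width) {
            this.cells = new Object[width];
        }

        public <T> Row setColumn(int index, T value) {
            cells[index] = value;
            return this;
        }
    }

    public static final class Builder {
        private final List<Column> columns = new ArrayList<>();

        public <T> Builder withColumn(int index, String name, Class<T> type) {
            columns.add(new Column(index, name, type));
            return this;
        }

        public GenericTimeSeries build() {
            return new GenericTimeSeries(new ArrayList<>(columns));
        }
    }

    private final List<Column> columns;
    private final List<Row> rows = new ArrayList<>();

    private GenericTimeSeries(List<Column> columns) {
        this.columns = columns;
    }

    // A single DAO loop can add the optional entry-date cell only when that column was registered.
    public Row newRow() {
        Row row = new Row(columns.size());
        rows.add(row);
        return row;
    }
}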
…dle custom output, adds data entry date support
@jbkolze what are your thoughts on this? At least how it's done. Some of it appears it may be a breaking change, which we want to avoid.
@@ -623,7 +622,7 @@ private static TimeSeries buildTimeSeries(ILocationLevelRef levelRef, Interval i
if (qualityCode != null) {
quality = qualityCode.intValue();
}
timeSeries.addValue(dateTime, value, quality);
timeSeries.addValue(dateTime, value, quality, null);
Are we sure this isn't a breaking change?
I'll double check with additional test cases, but this change should not break any existing functionality. A null data entry date parameter is treated as if a standard Time-Value-Quality data entry was provided. The implementation of addValue will use a TimeSeries data record with only three input fields under normal circumstances:

if (dataEntryDate != null) {
    values.add(new TimeSeriesRecordWithDate(dateTime, value, qualityCode, dataEntryDate));
} else {
    values.add(new Record(dateTime, value, qualityCode));
}
The existing use cases of TimeSeries should be unaffected, as they will be handled exactly as they were before these changes.
I recommend subclassing TimeSeries instead
I think it would break CWMS.js. The JavaScript OpenAPI generator already has trouble with our TimeSeries class given some specific assumptions the generator chose to make.
Conceptually, I don't personally have any qualms with it. You had mentioned in the typing discussion that you all were trying to leave flexibility for dynamically adjusting the time series values array, and this seems like a prime use case for that. That being said, I am a little concerned about the part you marked as a breaking change. I don't know the CDA source that well, but is this indicating that the response would include a fourth null value even if the include_entry_date is false? Because that definitely would not be ideal -- we've written a lot of CDA code already (and I think other districts have as well) that would have to be updated. Not as big of a deal if an API version were included in the path.
My understanding from previous conversations is that you'd get the normal 3-value array if this were set to false, but receive a 4-value array if include_entry_date is true, and the "value-columns" object would be updated to match. That'd be my "ideal" implementation.
You are correct that setting the include_entry_date parameter to true would result in a four-value array, whereas setting it to false would return a three-value array. Your "ideal" implementation is what I was aiming for to retain backwards compatibility and avoid breaking any of the other endpoints that rely on the TimeSeries implementation.
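To illustrate the 3- vs 4-value contract just described, a small, self-contained Jackson sketch (hypothetical classes, not the CDA TimeSeries) showing how a positional-array record gains a fourth element only when the entry-date-bearing type is used. The "value-columns" descriptor would be extended in the same conditional way, which is what keeps the default response shape unchanged.

import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ArrayShapeDemo {

    // Default row: serialized as a 3-element array.
    @JsonFormat(shape = JsonFormat.Shape.ARRAY)
    @JsonPropertyOrder({"dateTime", "value", "qualityCode"})
    static class Row {
        public long dateTime = 1700000000000L;
        public double value = 1.5;
        public int qualityCode = 0;
    }

    // Row used only when the entry date is requested: serialized as a 4-element array.
    @JsonFormat(shape = JsonFormat.Shape.ARRAY)
    @JsonPropertyOrder({"dateTime", "value", "qualityCode", "dataEntryDate"})
    static class RowWithEntryDate {
        public long dateTime = 1700000000000L;
        public double value = 1.5;
        public int qualityCode = 0;
        public long dataEntryDate = 1700000123000L;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        System.out.println(mapper.writeValueAsString(new Row()));
        // [1700000000000,1.5,0]
        System.out.println(mapper.writeValueAsString(new RowWithEntryDate()));
        // [1700000000000,1.5,0,1700000123000]
    }
}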
tsRecord.getValue(qualityNormCol).intValue()
)
);
if (includeEntryDate) {
This query is doubling the time it takes to retrieve time series. Can this replace the retrieve_ts_out_tab calls?
While it could replace the retrieve_ts_out_tab call above, doing so would require implementing trim support into the query, as that is currently handled by the retrieve_ts_out_tab call. I haven't quite figured out the best way to do so, so maybe we can discuss this in more detail.
@@ -159,7 +164,8 @@ public ZonedDateTime getEnd() {
}

// Use the array shape to optimize data transfer to client
@JsonFormat(shape=JsonFormat.Shape.ARRAY)
@JsonFormat(shape = JsonFormat.Shape.ARRAY)
@JsonDeserialize(contentUsing = TimeSeriesRecordDeserializer.class)
This method is overridden for XML using a Mixin; did you verify that behavior still works as intended?
I added a Mixin test that verifies that the XML tags for the value records have the appropriate labels. There are also a couple of serialization/deserialization tests that verify that this works as intended.
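For context on the mixin override mentioned above, this is not the project's actual mixin, only a minimal illustration (hypothetical names) of the mechanism: a mixin re-annotates the target class for one mapper without touching the class itself.

import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;

public class MixinDemo {

    @JsonPropertyOrder({"dateTime", "value", "qualityCode"})
    static class Row {
        public long dateTime = 0L;
        public double value = 1.5;
        public int qualityCode = 0;
    }

    // Mixin: only the JSON mapper gets the compact positional-array shape.
    @JsonFormat(shape = JsonFormat.Shape.ARRAY)
    abstract static class RowAsArrayMixin {
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper json = new ObjectMapper();
        json.addMixIn(Row.class, RowAsArrayMixin.class);
        System.out.println(json.writeValueAsString(new Row()));
        // [0,1.5,0]

        XmlMapper xml = new XmlMapper();   // no mixin: XML keeps named elements
        System.out.println(xml.writeValueAsString(new Row()));
        // <Row><dateTime>0</dateTime><value>1.5</value><qualityCode>0</qualityCode></Row>
    }
}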
@JsonProperty(value = "value-columns")
@Schema(name = "value-columns", accessMode = AccessMode.READ_ONLY)
public List<Column> getValueColumnsJSON() {
return getColumnDescriptor();
return getColumnDescriptor((values != null && !values.isEmpty())
This seems like behavior for a subclass
Added to TimeSeries subclass
@@ -218,7 +228,16 @@ private List<Column> getColumnDescriptor() {
columns.add(new TimeSeries.Column(fieldName, fieldIndex + 1, f.getType()));
}
}

if (includeDataEntryDate) {
This could also be accomplished better with a subclass
Moved into subclass
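A rough sketch of the "moved into subclass" resolution (column names and descriptor fields are assumptions, not the exact PR code): only the with-date variant appends the extra column, so the base class's descriptor, and therefore the default response shape, is untouched.

import java.util.ArrayList;
import java.util.List;

class ColumnDescriptorSketch {

    static class Column {
        final String name;
        final int ordinal;
        final Class<?> datatype;

        Column(String name, int ordinal, Class<?> datatype) {
            this.name = name;
            this.ordinal = ordinal;
            this.datatype = datatype;
        }
    }

    static class TimeSeries {
        protected List<Column> getColumnDescriptor() {
            List<Column> columns = new ArrayList<>();
            columns.add(new Column("date-time", 1, Long.class));
            columns.add(new Column("value", 2, Double.class));
            columns.add(new Column("quality-code", 3, Integer.class));
            return columns;
        }
    }

    // Subclass used only for include-entry-date requests.
    static class TimeSeriesWithDate extends TimeSeries {
        @Override
        protected List<Column> getColumnDescriptor() {
            List<Column> columns = super.getColumnDescriptor();
            columns.add(new Column("data-entry-date", columns.size() + 1, Long.class));
            return columns;
        }
    }
}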
// This class is used to deserialize the time-series data JSON into an object
// Solves the issue of the deserializer getting stuck in a loop
// and throwing a StackOverflowError when trying to handle the Record class directly
This seems sketchy to me, why is your custom deserializer causing this?
Removed custom deserializer
return jsonParser.getCodec().treeToValue(node, TimeSeriesRecordWithDate.class);
}
String nodeString = node.toString();
if (nodeString.startsWith("[")) {
A mixin doesn't solve the need for this custom parsing? All this logic looks like we're circumventing Jackson too much.
Removed custom serializer
Timestamp dateTime = Timestamp.from(Instant.ofEpochMilli(Long.parseLong(valList[0])));
double value = Double.parseDouble(valList[1]);
int quality = Integer.parseInt(valList[2]);
Timestamp entryDate = Timestamp.from(Instant.ofEpochMilli(Long.parseLong(valList[3])));
No need to convert from Instant to Timestamp; Timestamp's constructor takes in epoch millis.
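For illustration (valList here is a stand-in for the local variable in the snippet above): the Instant round trip can be dropped because java.sql.Timestamp has a millisecond constructor.

import java.sql.Timestamp;

public class TimestampFromMillis {
    public static void main(String[] args) {
        String[] valList = {"1700000000000", "1.5", "0", "1700000123000"};
        // Equivalent to Timestamp.from(Instant.ofEpochMilli(...)) without the intermediate Instant.
        Timestamp entryDate = new Timestamp(Long.parseLong(valList[3]));
        System.out.println(entryDate);
    }
}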
@Override
public TimeSeries.Record deserialize(JsonParser jsonParser, DeserializationContext deserializationContext) throws IOException {
JsonNode node = jsonParser.readValueAsTree();
if (node.get("data-entry-date") != null) {
This should be a constant
It should probably also be ignored; data-entry-date is always set by the database itself, and an external system/user isn't allowed to change it.
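One hedged way to get that behavior (not claiming this is what the PR does): mark the property read-only so Jackson serializes it on output but ignores any value a client sends.

import com.fasterxml.jackson.annotation.JsonProperty;

import java.sql.Timestamp;

class EntryDateReadOnlySketch {
    // Serialized in responses, silently dropped when present in request bodies.
    @JsonProperty(value = "data-entry-date", access = JsonProperty.Access.READ_ONLY)
    Timestamp dataEntryDate;
}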
@@ -623,7 +622,7 @@ private static TimeSeries buildTimeSeries(ILocationLevelRef levelRef, Interval i
if (qualityCode != null) {
quality = qualityCode.intValue();
}
timeSeries.addValue(dateTime, value, quality);
timeSeries.addValue(dateTime, value, quality, null);
I recommend subclassing TimeSeries instead
That is correct for the location indicated, which is location levels backed by a time series. I don't know how much it would affect what you're currently using, but it's definitely not ideal. At the least it definitely shouldn't be null; may as well provide the data, but it seems like a parameter should be added to match on the location level endpoint. But it sounds like you're on the same page with Adam about the shape being already explicitly flexible, so I'm okay with that section now; not "ideal", but what is. Definitely something to better document though.
});
logger.fine(() -> query2.getSQL(ParamType.INLINED));
final TimeSeriesWithDate timeSeries = new TimeSeriesWithDate(timeseries);
query2.forEach(tsRecord -> timeSeries.addValue(
We don't need to solve this now, but definitely before we add requesting any text notes as well. There has to be a better way to handle this with the builders.
if (pageSize != 0) {
if (versionDate != null) {
whereCond = whereCond.and(AV_TSV_DQU.AV_TSV_DQU.VERSION_DATE.eq(versionDate == null ? null :
Does this logic handle max version? Or will AV_TSV_DQU always return every version, or only the specifically requested version?
Fixes #634 - Implements data entry date as option for TimeSeries data retrieval - Serialization bug in progress