
Do not check for monotonic increase of timestamps in validation by default #503

Open · wants to merge 3 commits into master
Conversation

@steffenhauf (Member) commented Mar 21, 2024

While train ids should always be monotonically increasing, there is no such requirement for timestamps. A device is free to choose any timestamp for a given train id - the two are technically not coupled.

The evolved DAQ in fact introduces non-monotonically increasing timestamps by design whenever "late" data is applied, i.e. data for a train that has already left its finite buffer. Such data is applied on a best-effort basis from the earliest train still in the buffer onwards, and that train may already carry data with a newer timestamp than the late update. As a result, a train that left the buffer earlier can hold a newer timestamp than a later train that received the late data.

Consider the following scenario, where "train id" is the train id of the data and "reference train id" is the train id at which it was registered by the DAQ. The buffer length of the DAQ is 5:

| idx | train id | reference train id | p1 | p2 | t1 | t2 |
|-----|----------|--------------------|----|----|----|----|
| 0 | 42 | 42 | 1 | a | 100 | 100 |
| 1 | 46 | 47 | 3 | no update | 500 | 600 |
| 2 | 47 | 51 | 4 | no update | 600 | no update |
| 3 | 48 | 53 | 5 | c | 700 | 700 |
| 4 | 45 | 56 | 2 | b | 400 | 400 |
| 5 | 49 | 53 | 5 | e | 700 | 800 |

Here, the data with idx:=4 arrived "late". The DAQ applies it best effort, i.e. to the first train still within the buffer, which in this case is the train with idx:=2, since the train with idx:=1 has already left the buffer at this point. The file data will thus look like this:

| idx | train id | reference train id | p1 | p2 | t1 | t2 |
|-----|----------|--------------------|----|----|----|----|
| 0 | 42 | 42 | 1 | a | 100 | 100 |
| 1 | 46 | 47 | 3 | a | 500 | 600 |
| 2 | 47 | 51 | 4 | b | 600 | 400 |
| 3 | 48 | 53 | 5 | c | 700 | 700 |
| 4 | 49 | 54 | 5 | e | 700 | 800 |

Note how the timestamps now do not monotonically increase.
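
For illustration, here is a minimal sketch of that best-effort rule in Python. This is not the actual DAQ implementation; the buffer representation and the carry-forward/supersede logic are assumptions made for the sake of the example.

```python
def apply_best_effort(buffer, train_id, prop, value, timestamp):
    """Apply one update to `buffer`, a dict {train_id: {prop: (value, timestamp)}}
    of the trains currently held by the DAQ (assumed non-empty).

    A "late" update, i.e. one for a train that has already left the buffer,
    falls back to the earliest train still buffered. From there the value is
    carried forward until a train already holds a newer value for the same
    property - which is exactly where non-monotonic timestamps can appear.
    """
    oldest_buffered = min(buffer)
    target = max(train_id, oldest_buffered)
    for tid in sorted(buffer):
        if tid < target:
            continue
        current = buffer[tid].get(prop)
        if current is not None and current[1] > timestamp:
            break  # a newer update already supersedes this one
        buffer[tid][prop] = (value, timestamp)
```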

@steffenhauf changed the title from "Do not check for monitonic increase of timestamps in validation by default" to "Do not check for monotonic increase of timestamps in validation by default" on Mar 21, 2024
@steffenhauf (Member, Author)

I'm not sure what to do regarding the failing coverage check - as far as I can tell, coverage decreased because of line wrapping.

@takluyver (Member)

I understand that we can get this while the DAQ itself is behaving as expected. But do we still consider that the data recording system as a whole has malfunctioned? It seems like either the stored value at idx 1 is wrong because a valid update didn't make it to the DAQ in time, or the timestamp associated with idx 2 is wrong because something incorrectly backdated a change that really happened later.

So it seems reasonable to check for this and mark it as a problem, even if it's not a bug in the DAQ. If there are contexts where this occurs a lot and we want to check for other issues, perhaps the check should be opt-out rather than opt-in?
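
For reference, the kind of check under discussion could look roughly like the sketch below. This is not the existing extra-data-validate code, just a plain numpy illustration.

```python
import numpy as np

def non_monotonic_indices(timestamps):
    """Return the indices at which a timestamp is smaller than its predecessor."""
    ts = np.asarray(timestamps, dtype=np.int64)
    return np.flatnonzero(np.diff(ts) < 0) + 1

# t2 column from the file table above: the drop from 600 to 400 is flagged.
print(non_monotonic_indices([100, 600, 400, 700, 800]))  # -> [2]
```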

@takluyver (Member)

(And don't worry about the coverage measurement - it's useful to have the measurements and the reminder, but we don't generally insist that every changed line must be covered, and codecov can give odd results on PRs in any case)

@steffenhauf (Member, Author)

If the data is late, then the data recording system as a whole is misbehaving. But I would argue we should track this e.g. through the new ERRATA section, whose purpose is to inform that data could not be stored fully, or normally, or both. In my view there is no implied coupling between monotonically increasing train ids and timestamps (at least for Karabo).

Consider for instance a scenario where a device connects to hardware which has an internal buffer, to which it writes its measurement data non-sequentially and at large intervals. The device pulls all entries in the buffer every so often, and they are returned in a readout order that is not necessarily the write order. It sends out one buffer entry per train, and because the developer knows that the data might be quite old (by DAQ standards), they attach the train id of when the data was read out to the train, and keep the timestamp as recorded by the hardware. In my view, while hypothetical (I'm not aware of such a device existing), this is a completely valid readout approach:

  • the developer took care that data can be digested by a DAQ with a finite train buffer length
  • the developer preserved as much of the time information as possible, i.e. the timestamps
  • the DAQ will ingest the data and also preserve the information given to it.

I don't see why this would be an error. Timestamps and train ids are not coupled in Karabo exactly for this reason.

@takluyver (Member)

> because the developer knows that the data might be quite old (by DAQ standards), they attach the train id of when the data was read out to the train, and keep the timestamp as recorded by the hardware...

You're obviously the expert here, but this sounds wrong to me? Or if that kind of thing is really expected, we might need to substantially rethink how people access data 😞. We assume that data saved with a train ID represents values physically measured at the time of that train, not when one piece of hardware/software sends it to another. This assumption is fundamental to correlating data from different sources; it's baked into the design of EXtra-data and into what we've taught people over the last few years. If it's not reliable, figuring out what the value really was for a given train becomes much more complicated.

So what I'd expect to happen in that scenario is that something (™️) takes the hardware timestamps and translates them to train IDs, so the DAQ can at least make an effort to write the correct values for the trains they correspond with.
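
As an illustration of that translation step, a sketch is below. The function name and the availability of a per-train timestamp axis are assumptions, not an existing Karabo or DAQ API.

```python
import numpy as np

def timestamps_to_train_ids(hw_timestamps, train_ids, train_timestamps):
    """Map each hardware timestamp onto the last train whose timestamp is not
    later than it, so values can be recorded against the trains they belong to.
    `train_timestamps` must be sorted and aligned with `train_ids`.
    """
    idx = np.searchsorted(train_timestamps, hw_timestamps, side="right") - 1
    return np.asarray(train_ids)[np.clip(idx, 0, len(train_ids) - 1)]
```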

Even with that, though, I don't think it's viable for the update interval to be arbitrarily long, because that means we can miss an arbitrary amount of data at the end of a run. And worse, I don't think we can even represent missing CONTROL data at present, so there would just be wrong data.

@steffenhauf (Member, Author) commented Mar 21, 2024

Don't worry, you don't have to scrap all (or any) assumptions. The something (™️) here is the Karabo devices, and we take great care that the train id matches the point in time when the data is valid. I chose this hypothetical example to illustrate why there is no enforced coupling of timestamps and train ids on the control system side. Any real-world device would either have had reason to treat the train ids at the time of readout as valid for the data, e.g. because the developer had reason to believe the timestamps are inconsistent, or would have aligned the train ids with the timestamps.

The unique time correlator is the train id, and a device developer will take care that data assigned one train id will correlate in time with data with the same train id. The timestamps usually will as well, but are "just" metadata, and there's nothing that technically enforces a correlation. In other words, one should still be able to correlate data by train id, even if no timestamp was assigned to it.

You are correct that the acceptance interval of the DAQ is finite in length, so there is always a chance that data generated after the interval, but with a train id inside a run's train id range, was not accepted. You would indeed need an infinite interval to handle all cases.

In practice, with a combination of ERRATA, best-effort application of late values as they come, and InfluxDB, we are pretty well set up though.

The whole motivation of this suggestion is not to question our data correlation model or cast it in doubt. In my view it covers all cases that are coverable with reasonable (and even more) effort, and we go to quite some lengths to at least log when the DAQ technically had to resort to fallbacks.

What I want to motivate is that there are better indicators of late or disordered data than non-monotonically increasing timestamps, such as evaluating the ERRATA section. Thinking about it, maybe one should strive for a combination of both indicators: a non-monotonic timestamp that cannot be explained by ERRATA logs is a validation error, while one that can be explained warrants more of an informational output?
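
A rough sketch of that combined heuristic is below; the ERRATA layout is assumed here to be a simple set of train IDs for which the DAQ logged a fallback write, which may well differ from the real section.

```python
import numpy as np

def classify_timestamp_drops(train_ids, timestamps, errata_train_ids):
    """Yield (train_id, severity) for each non-monotonic timestamp:
    'info' if an ERRATA entry explains it, otherwise 'error'."""
    ts = np.asarray(timestamps, dtype=np.int64)
    for i in np.flatnonzero(np.diff(ts) < 0) + 1:
        tid = train_ids[i]
        yield tid, ("info" if tid in errata_train_ids else "error")
```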

@takluyver (Member)

Thanks! So in the scenario you initially described, with an out-of-order update arriving too late to be fully backdated, there would be some information in ERRATA about this? What would that look like?

For extra-data-validate, I'd still be inclined to say that it should flag this scenario as a problem by default - it's a problem in the saved data that we can detect, even if it's not a DAQ bug.

@philsmt (Contributor) commented Mar 25, 2024

I can definitely see both sides of the argument. Even if we accept that there is no fixed correlation between train ID and timestamp, as illustrated by Steffen (and thus do not consider it an error), we're left with the fact that it's still unexpected or out of the ordinary, and thus extra-data-validate would be a tool to draw attention to it.

Does it make sense now to introduce two tiers of validation problems, something akin to an error (critical to properly accessing the data) and a warning? It seems like extra-data-validate is increasingly used in a CI sense, requiring a boolean result on whether we consider a file acceptable, as well as in the historical sense of running it manually to ask "is something up in the files?". The check for monotonic timestamps would naturally fall into the latter category.
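
A sketch of what two tiers could look like; the class and field names are hypothetical, not extra-data's current API.

```python
from dataclasses import dataclass

@dataclass
class ValidationProblem:
    check_id: str   # e.g. "control_timestamps_order"
    message: str
    severity: str   # "error" or "warning"

def exit_code(problems):
    """CI-style boolean result: fail only on errors, report warnings without failing."""
    return 1 if any(p.severity == "error" for p in problems) else 0
```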

@takluyver (Member)

Thanks! That may well make sense, although you could argue that problems which produce valid files with potentially incorrect data are more severe than problems which produce files we can't read. They're certainly different kinds of problem, though, so it makes sense to allow separating them.

Maybe, like code linter tools, we should have IDs for the different checks we do, so we can have a standard CLI for enabling/disabling each check, or enabling a check but not counting it as a failure for the exit code. Or maybe that's overkill. 🤔

@takluyver (Member)

From our discussion a few weeks ago, I've had a go in #522 at making specific checks skippable with an option like:

```
extra-data-validate --skip control_timestamps_order
```
