Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: Ensure that events are aggregated only once #144

Merged
merged 1 commit into from
Nov 9, 2023

Conversation

psaiz
Copy link
Contributor

@psaiz psaiz commented Nov 8, 2023

❤️ Thank you for your contribution!

Description

Ensures that a second aggregation does not reprocess any aggregation already performed and that has not changed,

It addresses https://github.com/zenodo/rdm-project/issues/523

@psaiz psaiz assigned slint and ntarocco and unassigned slint and ntarocco Nov 8, 2023
@psaiz psaiz requested review from slint and ntarocco November 8, 2023 07:22
@@ -225,10 +225,9 @@ def agg_iter(self, dt, previous_bookmark):
)
if previous_bookmark:
# Let's make sure that both objects are of the same type. This is important for the test
previous = float(previous_bookmark.strftime("%s"))
previous = previous_bookmark.isoformat()
Copy link
Contributor

@ntarocco ntarocco Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@psaiz this is not now a string, but down there it is compared like a number/date with <. Are you sure that this is correct?
Are the tests catching this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. It is a string, isn't it? Note that the comparison is with strings.

I've added these prints to make sure:

                 ).replace(**dict.fromkeys(INTERVAL_ROUNDING[self.interval], 0))
+                print("Comparing")
+                print(aggregation["last_update"]["value_as_string"])
+                print(previous)
                 if aggregation["last_update"]["value_as_string"] and previous:
                     if aggregation["last_update"]["value_as_string"] < previous:
+                        print("SKIPPING EVENT")
                         continue

And the test seems to be doing it properly:

Comparing
2017-06-02T11:00:00.000Z
None
Comparing
2017-06-02T11:00:00.000Z
None
Comparing
2017-07-01T05:00:00.000Z
2017-06-02T11:59:00
Comparing
2017-07-01T05:00:00.000Z
2017-06-02T11:59:00
Comparing
2017-07-01T05:00:00.000Z
2017-07-01T05:59:00
SKIPPING EVENT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I wanted to write it is now a string, but I wrote it is not a string, I am sorry :)
Do we really want to compare strings in this way?
https://docs.python.org/3/tutorial/datastructures.html#comparing-sequences-and-other-types

I would feel much safer if we convert everything first with datetime.fromisoformat(), and then we use the Python date comparison. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have no strong opinions about comparing strings or dates. The added value of comparing the dates is that there is no need to modify the incoming object.

Do you prefer this approach?

@psaiz psaiz force-pushed the reaggregate branch 2 times, most recently from f501160 to dcc08aa Compare November 8, 2023 10:25
@kpsherva kpsherva merged commit 76a8303 into inveniosoftware:master Nov 9, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants