Add a repetition check for release date #78

Open
jpmckinney opened this issue Jan 20, 2023 · 3 comments
Labels
dataset checks (relating to dataset-level checks), new check

Comments

@jpmckinney
Member

Instead of the chart on the Overview page, we can have a dataset-level check for release date repetition.

Should the granularity for repetition be day, week, or month?

A common issue is to use the request time as the release date. If a spider is slow, a crawl might take several days. In that case, grouping by week might be better. Of course, the crawl can still bridge two weeks depending on when it started, but that problem also occurs if we group by month, though less frequently.

If a lot of spiders take more than a week, we might want to group by month. I don't know whether a monthly granularity carries a real risk of false positives.

Another common issue is to use the creation time as the release date: for example, when creating historical releases. If the publisher's export process is slow, this could also take several days.

cc @yolile for input on the methodology to use.
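
To make the granularity trade-off concrete, here is a minimal sketch, not a proposed implementation. It assumes release dates arrive as ISO 8601 strings and that "week" means ISO week; the names `bucket` and `most_common_share` are hypothetical, and no pass/fail threshold is implied because none has been agreed in this issue.

```python
from collections import Counter
from datetime import date


def bucket(d: date, granularity: str) -> tuple:
    """Truncate a date to a day, ISO week, or month bucket."""
    if granularity == "day":
        return (d.year, d.month, d.day)
    if granularity == "week":
        iso = d.isocalendar()
        return (iso[0], iso[1])  # ISO year and ISO week number
    if granularity == "month":
        return (d.year, d.month)
    raise ValueError(f"unknown granularity: {granularity}")


def most_common_share(release_dates, granularity="week"):
    """Return (bucket, share) for the most frequent bucket, or None if empty.

    Assumption: each value starts with YYYY-MM-DD; the time and timezone
    parts are ignored in this sketch.
    """
    counts = Counter(
        bucket(date.fromisoformat(value[:10]), granularity)
        for value in release_dates
    )
    if not counts:
        return None
    top, count = counts.most_common(1)[0]
    return top, count / sum(counts.values())


# A slow crawl that starts on Friday 2023-01-20 and ends on Monday 2023-01-23.
dates = [
    "2023-01-20T09:00:00Z",
    "2023-01-21T09:00:00Z",
    "2023-01-22T09:00:00Z",
    "2023-01-23T09:00:00Z",
]
for granularity in ("day", "week", "month"):
    print(granularity, most_common_share(dates, granularity))
```

In this example the daily share is 0.25, the weekly share is 0.75 because the crawl bridges two ISO weeks, and the monthly share is 1.0, which illustrates why a coarser granularity catches slow crawls more reliably.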

@jpmckinney added the "dataset checks" and "new check" labels on Jan 20, 2023
@yolile
Member

yolile commented Jan 20, 2023

For the first case, we have two real examples: Argentina Vialidad (they have only one bulk file, with the same date for all the releases) and Paraguay Hacienda (a slow API, but still only two different days: https://data.open-contracting.org/es/publication/62).

I think we could use weeks for now.

@jpmckinney
Member Author

Should we implement this check in isolation, or combine it with checks for repetition of the release date in other dates: contracting process timeline, milestone dates, amendment dates, document dates?

@yolile
Member

yolile commented Jun 24, 2024

I think it is fine to implement this one in isolation, as we won't be using other dates for what is described in this issue.
