feat(preprocessing): parse dates into date ranges to correctly represent incomplete dates #3263

Merged
merged 17 commits into main from date-ranges
Dec 2, 2024

Conversation

corneliusroemer
Contributor

@corneliusroemer corneliusroemer commented Nov 20, 2024

Discussed on Slack: https://loculus.slack.com/archives/C05G172HL6L/p1732110859357449

preview URL: https://date-ranges.loculus.org

Summary

  • Replace Ebola Zaire with Ebola Sudan to have a small, fast-loading, real organism (Zaire has 3k sequences, Sudan 150; we already have large ones with WNV and CCHF). This shortens the time until a preview is useful.
  • Change date processing so that sampleCollectionDate is of type string and of the form YYYY, YYYY-MM, or YYYY-MM-DD (instead of type date in the form YYYY-MM-DD, with MM and DD forced to 01 if missing).
  • Introduce new processed data fields of type date: sampleCollectionDateRangeLower and sampleCollectionDateRangeUpper.
  • The upper bound of the date range is constrained by the submission date (sequences cannot come from samples collected after they were submitted).
  • Currently no lower bound is chosen, though one might be useful for ranged searches. We might decide to introduce an artificial min date for search purposes only; otherwise this might require a LAPIS/SILO feature. See GenSpectrum/LAPIS#1012 ("Allow including null fields in filter query with addition of a new query param flag") for thoughts.
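
To illustrate the range semantics listed above, here is a minimal Python sketch. It is not the code from this PR: the function name `date_range`, its signature, and the error handling are hypothetical. It also assumes that "no lower bound" refers to records with no collection date at all, in which case only the upper bound (the submission date) is known.

```python
import calendar
import re
from datetime import date
from typing import Optional, Tuple

# Accepts YYYY, YYYY-MM, or YYYY-MM-DD (the three forms named in the summary).
_DATE_PATTERN = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def date_range(raw: str, submission_date: date) -> Tuple[Optional[date], date]:
    """Hypothetical helper: return (lower, upper) for a possibly incomplete date."""
    raw = raw.strip()
    if not raw:
        # Assumption: a missing date has no lower bound; the upper bound
        # is still capped by the submission date.
        return None, submission_date

    match = _DATE_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"Unsupported date format: {raw!r}")

    year = int(match.group(1))
    month, day = match.group(2), match.group(3)

    if month is None:
        # Only the year is known: the range spans the whole year.
        lower, upper = date(year, 1, 1), date(year, 12, 31)
    elif day is None:
        # Year and month are known: the range spans the whole month.
        last_day = calendar.monthrange(year, int(month))[1]
        lower, upper = date(year, int(month), 1), date(year, int(month), last_day)
    else:
        # Complete date: lower and upper bound coincide.
        lower = upper = date(year, int(month), int(day))

    if lower > submission_date:
        # Assumption: treat a collection date after submission as an error.
        raise ValueError("Collection date lies after the submission date")

    # A sample cannot have been collected after it was submitted.
    return lower, min(upper, submission_date)
```

For a submission on 2024-11-20, an input of `2024` would give the range 2024-01-01 to 2024-11-20 under these assumptions, and `2024-02` would give 2024-02-01 to 2024-02-29.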

Screenshot

[Screenshot: Brave Browser, 2024-11-25 16:38:29]

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by an appropriate test.

@corneliusroemer corneliusroemer added the "preview" label (Triggers a deployment to argocd) Nov 20, 2024
@corneliusroemer corneliusroemer changed the title from "Parse dates into ranges" to "feat(preprocessing): parse dates into date ranges to correctly represent incomplete dates" Nov 25, 2024
Contributor

@anna-parker anna-parker left a comment

I added some test fixes - could you take a look and merge? Then I'm happy to approve :-)

@corneliusroemer corneliusroemer added the "review please" label (PR waiting for final review) Dec 2, 2024
@corneliusroemer
Contributor Author

Thanks for the reviews @fhennig and @anna-parker - have a look again, should be better now!

@fhennig
Contributor

fhennig commented Dec 2, 2024

thx for the changes! Having a look now 👀

Contributor

@fhennig fhennig left a comment

Thx, good work!

Contributor

@anna-parker anna-parker left a comment

Looks great to me - it would be great to also add the option to parse input ranges, but we can add that in a follow-up PR :-)

@corneliusroemer corneliusroemer merged commit 0fdac51 into main Dec 2, 2024
17 checks passed
@corneliusroemer corneliusroemer deleted the date-ranges branch December 2, 2024 11:47
Labels
preview (Triggers a deployment to argocd), review please (PR waiting for final review)
3 participants