Currently, the only standard ZIM metadata holding a date is `Date`, and the documentation specifically states that it is the ZIM creation date.
There is no standard metadata recording when the ZIM content itself was captured / fetched / crawled / scraped / ...
Since we regularly rebuild ZIMs (see ZIM Update v2 at #35 and https://wiki.openzim.org/wiki/ZIM_Updates), and we increasingly process content that was harvested at a different time than the ZIM creation (all StackExchange ZIMs, some zimit ZIMs built from reprocessed WARCs), it would be useful to add a new standard metadata to store this information.
Since content (e.g. with zimit) can be scraped over several days, the date should in fact be a from-to range.
As with the current `Date` metadata, I think we should keep this metadata easy to understand and to consume by storing only a day, not a day and time.
Since some content comes with a precision coarser than a day (e.g. a content provider saying "this is the content for April 2023, never mind which day I published it"), I think we also need to allow only a month or only a year in this metadata.
I therefore propose to introduce this new standard ZIM metadata:

- Name: `ContentDate`
- Mandatory: No
- Description: Date of the content, i.e. when the content was fetched to create the ZIM. Preferably a day (ISO format `YYYY-MM-DD`), but can be a month (`YYYY-MM`) or a year (`YYYY`) when daily precision makes no sense. Can be a single value or a range from start to end, with format `from,to`.
- Examples: `2012-11` or `2023-01-12,2023-01-15`
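To make the proposed format concrete, here is a minimal Python sketch of how a reader could turn a `ContentDate` value into an inclusive range of days, and how a scraper could build a day-precision value from its first and last fetch timestamps. The function names `parse_content_date` and `format_content_date` are hypothetical (not part of libzim or any openZIM tool). The sketch also assumes that the two ends of a range may use different precisions and that no spaces surround the comma; the proposal does not settle either point.

```python
import calendar
import re
from datetime import date, datetime

# One value: YYYY, YYYY-MM or YYYY-MM-DD (precisions allowed by the proposal).
_VALUE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def _bounds(value: str) -> tuple[date, date]:
    """Return the first and last day covered by a single ContentDate value."""
    m = _VALUE.fullmatch(value)
    if m is None:
        raise ValueError(f"invalid ContentDate value: {value!r}")
    year = int(m.group(1))
    month = int(m.group(2)) if m.group(2) else None
    day = int(m.group(3)) if m.group(3) else None
    if month is None:  # year precision: the whole year
        return date(year, 1, 1), date(year, 12, 31)
    if day is None:  # month precision; date() rejects invalid months
        first = date(year, month, 1)
        last_day = calendar.monthrange(year, month)[1]
        return first, date(year, month, last_day)
    d = date(year, month, day)  # day precision; date() rejects invalid days
    return d, d


def parse_content_date(raw: str) -> tuple[date, date]:
    """Parse a 'value' or 'from,to' string into an inclusive (first, last) day range."""
    parts = raw.split(",")
    if len(parts) == 1:
        return _bounds(parts[0])
    if len(parts) != 2:
        raise ValueError(f"expected 'value' or 'from,to', got {raw!r}")
    first, _ = _bounds(parts[0])  # earliest day of the start value
    _, last = _bounds(parts[1])  # latest day of the end value
    if first > last:
        raise ValueError(f"range start is after range end: {raw!r}")
    return first, last


def format_content_date(first_fetch: date, last_fetch: date) -> str:
    """Build a day-precision ContentDate value from the first/last fetch times."""
    # Drop the time of day: the proposal keeps only the day, like Date.
    start = first_fetch.date() if isinstance(first_fetch, datetime) else first_fetch
    end = last_fetch.date() if isinstance(last_fetch, datetime) else last_fetch
    if start > end:
        raise ValueError("first fetch is after last fetch")
    if start == end:
        return start.isoformat()
    return f"{start.isoformat()},{end.isoformat()}"
```

For example, `parse_content_date("2012-11")` gives the range 2012-11-01 to 2012-11-30, and a crawl running from `datetime(2023, 1, 12, 8, 30)` to `datetime(2023, 1, 15, 23, 59)` is formatted as `2023-01-12,2023-01-15`. Resolving every value to an inclusive day range lets consumers compare `ContentDate` values of different precisions without special cases.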
WDYT?