Remove lastmod tag in Sitemap for reindexing #7801
- Removed lastmod tag for search.gov reindexing
@klin2020 Thanks for the detailed explanation. Some questions:
- Which meta fields will search.gov use when we re-index after we remove the `lastmod` field?
- We don't set a `lastmod` field in the markdown for each page. Is this something we should consider adding in the future if we want to improve our sitemap?
- Could we use the `.Params.date` field as the next best option for setting the `lastmod` field?
<lastmod>{{ safeHTML ( .Params.date "2006-01-02T15:04:05-07:00" ) }}</lastmod>{{ end }}{{ with .Sitemap.ChangeFreq }}
I was wondering this as well. I was expecting the last modified date to default to the date published.
To be clear, I'm not sure we need to remove
Unless I'm misunderstanding something, updating the Please explain the proper procedure if I am incorrect. If I am not, please update the
@nick-mon1 @RileySeaburg Re-introduced lastmod tag with page date.
@klin2020 requested changes.
@@ -2,8 +2,8 @@
{{ range .Data.Pages }}
<url>
<loc>https://digital.gov{{ .Permalink | relURL }}</loc>{{ if not .Lastmod.IsZero }}
<lastmod>{{ safeHTML ( .Lastmod.Format "2006-01-02T15:04:05-07:00" ) }}</lastmod>{{ end }}{{ with .Sitemap.ChangeFreq }}
<changefreq>{{ . }}</changefreq>{{ end }}{{ if ge .Sitemap.Priority 0.0 }}
<lastmod>{{ safeHTML ( .Date | time.Format "2006-01-02T15:04:05-07:00" ) }}</lastmod>{{ end }}
Can you filter the date to YYYY-MM-DD?
.Lastmod.Format "2006-01-02"
Thank you and great job!
@klin2020 This looks great, thanks for following up with the search team.
Note
Just a note: pages with no date will display:
<url>
<loc>https://digital.gov/authors/alicia-rouault/</loc>
<lastmod>0001-01-01</lastmod>
<changefreq>daily</changefreq>
<priority>1</priority>
</url>
Removed lastmod tag for review again @nick-mon1 @RileySeaburg
I'm going to merge this so we can test the re-index today. @mejiaj there will be another PR where the tag is added back in.
- Please note that pages with no date, such as authors, still display 0001-01-01 as the date
187ce99
What I Tested
- ran hugo build
- checked the `public/sitemap.xml` and `lastmod` uses the `date` field as the value
Summary
Search on DG currently surfaces outdated articles that bury more recent ones.
Search.gov search results are based on a ranking algorithm that looks at the `<lastmod>` tag in a website's sitemap or a page's date, whichever is most recent. Our sitemap currently sets the `<lastmod>` tag to the current date, leading the ranking algorithm to weigh every page on DG equally, rather than by its proper publish date.
Solution
Remove the `<lastmod>` tag in the DG sitemap build, so that when we re-index DG, the re-index will use the page metadata for its proper date.
Once the re-index occurs, we can edit the `<lastmod>` tag to reflect the page's publish date, rather than the current date.
Screenshots
Current sitemap (including `<lastmod>`). Every entry shows the same date, causing issues with the ranking algorithm.
Proposed change to sitemap (temporarily remove `<lastmod>` for Search.gov re-indexing)