More diagnostics and ASCII refstrings #4

Merged
merged 6 commits into from
Mar 2, 2022

Conversation

pkgw
Contributor

@pkgw pkgw commented Mar 2, 2022

Based on my investigations, it looks like we get better resolver results, on average, when we emit refstrings in ASCII. That isn't ideal; cf. adsabs/reference_service#52. This PR also adds some more diagnostics for reference resolution.

…utions

This allows us to do analytics on the metric that counts, without having
to wait hours and hours to resolve refstrings for every single last
item.
As described in the comment, at the moment we get the greatest success
*on average* by reducing our refstrings down to ASCII. That's
disappointing, but for the time being, let's do what works. This imports
an MIT-licensed file that provides a useful, if naive, routine for
normalizing out various pieces of common Unicode punctuation.
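
For readers who want a feel for what "reducing refstrings down to ASCII" can look like, here is a minimal sketch in Python. It is not the MIT-licensed file imported by this PR: the function name `to_ascii_refstring` and the punctuation table are hypothetical, and the real routine may cover different characters or handle them differently. The sketch maps a few common Unicode punctuation marks to ASCII equivalents, then strips accents via NFKD decomposition and drops whatever non-ASCII characters remain.

```python
import unicodedata

# Hypothetical table of common Unicode punctuation and ASCII stand-ins.
# The file imported by this PR may use a different (likely larger) set.
_PUNCTUATION = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
}
_TABLE = str.maketrans(_PUNCTUATION)


def to_ascii_refstring(text: str) -> str:
    """Return an ASCII-only approximation of a reference string."""
    # Step 1: replace known punctuation with ASCII equivalents.
    text = text.translate(_TABLE)
    # Step 2: decompose accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Step 3: drop every character that still is not ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Under these assumptions, `to_ascii_refstring("Müller, J. 2020, A&A, 1–10")` yields `"Muller, J. 2020, A&A, 1-10"`. The approach is naive in the same sense as the commit message: letters with no Unicode decomposition, such as `ø`, are dropped outright rather than transliterated.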
@pkgw pkgw merged commit 740c401 into adsabs:main Mar 2, 2022
@pkgw pkgw deleted the iterate branch March 2, 2022 02:31