Hey! Thanks for the kind words @venthur. Sorry about the late reply: I had these messages filtered out.
It's been a while since I looked at this project, but you're right: it's not a SHA256 hash. I dug into where I generate this, and for some reason I chose to use `Oid::hash_object` here, from libgit2.
That was... an unfortunate decision, and I can't see why I chose to do it that way. Parts of this project were pretty experimental, and I was pretty much learning Rust at the time.
So it's going to be a SHA1 hash of `blob ${length}\0${content}`, which is actually so inconvenient.
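For anyone who wants to check a file against the dataset, here is a minimal Python sketch of that calculation, assuming the value is the standard git blob ID: SHA1 over the header `blob `, the content length in bytes as a decimal number, a NUL byte, and then the raw content. The function name `git_blob_sha1` is made up for illustration and is not part of this project.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the git blob object ID (hex SHA1) for raw file content.

    Hypothetical helper: hashes b"blob <length>\\0" followed by the content,
    where <length> is the byte length written as a decimal number.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same value that `git hash-object` reports for an empty file.
    print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The content has to be hashed as bytes exactly as stored, so read files in binary mode (`open(path, "rb")`) before passing them in.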
According to the docs, the metadata dataset about every file uploaded to PyPI, i.e. the parquet files listed in https://github.com/pypi-data/data/raw/main/links/dataset.txt, contains a SHA256 hash. However, the docs do not describe how the hash is calculated.
When trying to verify that you calculate the SHA256 over the respective file itself, I encountered some issues:
Can you explain which hash you are using, and whether you are hashing the contents of the file linked to via the `path`?
Thank you very much for the awesome dataset!