Hello @orf, I absolutely love https://py-code.org! Thank you for creating this service.

I manually maintain my own dataset about Python packages available on PyPI (more focused on dependency metadata and PyPI-specific information like maintainers). Do you have any interest in supporting these use cases? I'd happily stop maintaining my own dataset and point to py-code if this information were made available (your dataset is much more automated and has a nice frontend ✨)
Let me know what you think, and thanks again!
sethmlarson changed the title from "Capture more PyPI-specific and metadata about packages" to "Capture more PyPI-specific and dependency metadata about packages" on Sep 5, 2023.
Hey! I absolutely do; something like this is the next phase of the "pypi-data cinematic universe". I have some of this raw data already captured from PyPI, but it seems you have enriched it a bit.
Right now we have a few disconnected pieces that we can jam together to do cool things:
1. Find the unique git OIDs of all `some-interesting-file-name.py` files, or other files matching a specific pattern
2. Fetch and parse the contents of those files to extract some interesting metrics, producing a mapping of `{git_oid: stats}`
3. Turn the mapping of `{git_oid: stats}` into `{(project_name, project_version): stats}` using the git OID and the datasets in this repo
4. Turn `{(project_name, project_version): stats}` into anything, by joining `(project_name, project_version)` on another dataset (like yours)
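The joins in steps 3 and 4 could be sketched roughly like this. The dataset shapes, keys, and values below are hypothetical placeholders for illustration, not the actual formats used by this repo:

```python
# Step 3: map git OIDs to (project_name, project_version) using an
# OID -> (name, version) index, like the datasets in this repo provide.
oid_stats = {"abc123": {"classes": 4}}          # output of step 2 (hypothetical)
oid_index = {"abc123": ("requests", "2.31.0")}  # hypothetical repo dataset

project_stats = {
    oid_index[oid]: stats
    for oid, stats in oid_stats.items()
    if oid in oid_index
}

# Step 4: join on another dataset keyed by (project_name, project_version),
# e.g. dependency metadata (hypothetical values).
dependency_data = {("requests", "2.31.0"): {"requires": ["urllib3"]}}

combined = {
    key: {**stats, **dependency_data.get(key, {})}
    for key, stats in project_stats.items()
}
```

At scale these would presumably be columnar joins over Parquet or similar rather than in-memory dicts, but the key structure is the same.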
So with this we could parse all .py files, count the number of classes, and plot "classes written over time, segmented by PyPI trove classifier/other pypi metadata/number of downloads/maintainer/whatever".
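As a concrete example of the per-file metric that step 2 could extract, counting classes in a source file might look like this (the function name and metric choice are just an illustration, not part of any existing tool):

```python
import ast

def count_classes(source: str) -> int:
    """Count class definitions anywhere in a Python source string."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.ClassDef) for node in ast.walk(tree))
```

Running this over every `.py` blob would produce the per-OID `stats` values fed into step 3.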
The problem is that this is all disconnected and a bit shit. I want this to be relatively seamless because I'm sick of doing it manually 😂.
I'm working on a CLI tool to handle steps 1, 2, and 3 for users, but step 4 is pretty interesting.
Perhaps we could take the pypi-json-data dataset, enrich it a bit and provide it in some format that can be used as part of this workflow?
That data could also be explorable via py-code.org; I've been thinking of adding some info from pypi-json-data to the site, though I'm not sure what format it should be in.