BLD: need PyPI package eventually #15

Open
tylerjereddy opened this issue Jan 7, 2022 · 3 comments
Comments

@tylerjereddy
Collaborator

We'll eventually need/want a PyPI package for the project so it can be leveraged easily in, e.g., asv benchmarks. Jakob mentioned that we might want to consider a "lazy" approach where the PyPI package doesn't contain the data files, but rather can be used to download them. Not sure about the complexity/usage tradeoffs there.
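
To make the "lazy" idea concrete, here is a minimal standard-library sketch of a package that ships no data and fetches each file on first use. The function names, the cache layout, and the SHA-256 check are all assumptions for illustration, not anything that exists in the project today.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_log(name: str, base_url: str, cache_dir: Path, expected_sha256: str) -> Path:
    """Return a local path to `name`, downloading it only when it is not cached.

    Hypothetical helper: `base_url` must end with "/" and `expected_sha256`
    would come from a small registry shipped inside the PyPI package.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if target.exists() and sha256_of(target) == expected_sha256:
        return target  # cache hit: nothing is downloaded

    # Download to a temporary name so a failed transfer never looks cached.
    tmp = target.with_name(target.name + ".part")
    with urllib.request.urlopen(base_url + name) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)

    if sha256_of(tmp) != expected_sha256:
        tmp.unlink()
        raise ValueError(f"checksum mismatch for {name}")
    tmp.replace(target)
    return target
```

The tradeoff this illustrates: the installed package stays tiny, but the first use of each file needs network access, and the package has to carry a registry of file names and hashes.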

@tylerjereddy
Collaborator Author

Some options I saw mentioned upstream recently (because SciPy is considering a datasets subpackage--the discussion is quite interesting: scipy/scipy#8707) are:

I wonder how much reinventing the wheel we've done with get_log_path() in the main repo--probably not much, since that function is small and has a pytest-specific purpose, though the design may be refined over time by studying the above options.

@nawtrey
Contributor

nawtrey commented Jan 26, 2022

Taking a brief look at each, I think I would lean towards pooch. It looks like it has more example use cases and the documentation seems friendly enough. I would have to do some prototyping to see how these things would work for our different use cases (testing, asv, etc.), but it seems like it offers more flexibility for fetching logs.
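
A rough sketch of what a pooch-based prototype might look like, for discussion only. The cache name, base URL, and file name below are placeholders rather than real project values, and the registry uses `None` as the hash, which tells pooch to skip checksum verification; a real registry would pin a hash per file.

```python
# Hypothetical registry: file name -> hash (None means "do not verify").
REGISTRY = {
    "example.log": None,
}


def make_fetcher():
    """Build a pooch fetcher for the log files.

    pooch is imported here rather than at module level so the package
    stays importable when the optional dependency is not installed.
    """
    import pooch  # third-party, optional

    return pooch.create(
        path=pooch.os_cache("example-logs"),   # placeholder cache directory name
        base_url="https://example.org/logs/",  # placeholder download location
        registry=REGISTRY,
    )


# Usage (needs pooch installed and network access on first call):
# local_path = make_fetcher().fetch("example.log")
```

The same fetcher could in principle back both the pytest helpers and the asv benchmarks, since `fetch` returns a local file path after the first download.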

@tylerjereddy
Collaborator Author

scikit-image is using pooch, so mimicking their approach may be viable. It sounds like SciPy wants to go in that direction as well, based on the community call this morning. One point of deviation from us is that they don't see much value in having a minimal vs. full test suite (i.e., just make the datasets a mandatory component of running the test suite, since test-time dependencies are not much extra burden). That would probably make our lives easier as well from an engineering standpoint, although I think the Argonne folks were in favor of maintaining the minimal suite.

On top of avoiding slow uploads/downloads of a PyPI package, we should also try to help the community reduce bandwidth where reasonable--PyPI's annual bandwidth bill is approaching US $25 million: https://dustingram.com/articles/2021/04/14/powering-the-python-package-index-in-2021/ .

We can probably also use CI "caching" features to avoid pulling the logs repo/data in fresh on every single CI run of the main repo, though this currently runs pretty quickly anyway.
