BLD: need PyPI package eventually #15

tylerjereddy · 2022-01-07T15:02:55Z

We'll eventually need/want a PyPI package for the project so it can be leveraged easily in, e.g., asv benchmarks. Jakob mentioned that we might want to consider a "lazy" approach where the PyPI package doesn't contain the data files, but rather can be used to download them. Not sure about the complexity/usage tradeoffs there.
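For illustration, here is a minimal stdlib-only sketch of what the "lazy" approach could look like: the package ships no data, and a file is downloaded into a local cache the first time it is requested. The function name `fetch_log`, its parameters, and the file name in the usage note are hypothetical and not part of any existing package.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch_log(name, base_url, cache_dir, sha256=None):
    """Return a local path to `name`, downloading it only if it is not cached.

    Hypothetical helper: `base_url` is wherever the data files are hosted and
    `cache_dir` is any writable directory.
    """
    target = Path(cache_dir) / name
    if target.exists():
        # Cache hit: no network access needed.
        return target

    # Cache miss: download the file once.
    url = base_url.rstrip("/") + "/" + name
    with urllib.request.urlopen(url) as response:
        data = response.read()

    # Optionally verify the download against a known SHA-256 checksum.
    if sha256 is not None and hashlib.sha256(data).hexdigest() != sha256:
        raise ValueError(f"checksum mismatch for {name}")

    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

A caller would then do something like `fetch_log("example.darshan", base_url, cache_dir)` and open the returned path as usual; the tradeoff is that the first use needs network access.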

tylerjereddy · 2022-01-15T22:46:03Z

Some options I saw mentioned upstream recently (because SciPy is considering a datasets subpackage--the discussion is quite interesting: scipy/scipy#8707) are:

I wonder how much reinventing the wheel we've done with get_log_path() in the main repo--probably not much, since it is small and has a pytest-specific purpose, though the design may be further refined over time by studying the above options.

nawtrey · 2022-01-26T17:36:31Z

Taking a brief look at each, I think I would lean towards pooch. It looks like it has more example use cases and the documentation seems friendly enough. I would have to do some prototyping to see how these things would work for our different use cases (testing, asv, etc.), but it seems like it offers more flexibility for fetching logs.
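As a starting point for that prototyping, a rough sketch of how pooch could be wired up is below. It only uses `pooch.create`, `pooch.os_cache`, and `Pooch.fetch`; the wrapper names, the cache name, the base URL, and the registry contents are all placeholders (assumptions), not the project's actual layout.

```python
def make_fetcher(cache_dir, base_url, registry):
    """Build a pooch fetcher for the log files (hypothetical wrapper).

    `registry` maps file names to their expected hashes, for example
    {"example.darshan": "sha256:<hex digest>"}.
    """
    # Imported lazily so that pooch stays an optional, test-time dependency.
    import pooch

    return pooch.create(path=cache_dir, base_url=base_url, registry=registry)


def fetch_log(fetcher, name):
    """Download `name` on first use and return its local path as a string."""
    return fetcher.fetch(name)


def default_fetcher(base_url, registry):
    """Same as make_fetcher, but cache in the per-user OS cache directory."""
    import pooch

    # "darshan-logs" is just an assumed cache folder name.
    return make_fetcher(pooch.os_cache("darshan-logs"), base_url, registry)
```

The same fetcher object could in principle be shared by the pytest fixtures and the asv benchmarks, so both only pay the download cost once per machine.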

tylerjereddy · 2022-03-02T16:43:52Z

scikit-image is using pooch, so mimicking their approach may be viable. It sounds like SciPy wants to go in that direction as well, based on the community call this morning. One point of deviation from us is that they don't see much value in having a minimal vs. full testsuite (i.e., just make the datasets a mandatory component of running the testsuite, since test-time dependencies are not much extra burden). That would probably make our lives easier as well from an engineering standpoint, although I think the Argonne folks were in favor of maintaining the minimal suite as well.

On top of avoiding slow uploads/downloads of a PyPI package ourselves, we should also try to help the community reduce bandwidth where reasonable--PyPI's annual bandwidth bill is approaching US $25 million: https://dustingram.com/articles/2021/04/14/powering-the-python-package-index-in-2021/ .

We can probably also use CI "caching" features to avoid pulling the logs repo/data in fresh on every single CI flush of the main repo, though this currently runs pretty quickly anyway.

tylerjereddy mentioned this issue Feb 18, 2022

BENCH: peakmem benchmarks for heatmap darshan-hpc/darshan#655

Merged

tylerjereddy mentioned this issue Mar 4, 2022

BENCH, ENH: runtime heatmap performance on Python side darshan-hpc/darshan#683

Open

tylerjereddy mentioned this issue May 16, 2022

MAINT, CI: cibuildwheel support darshan-hpc/darshan#741

Merged

shanedsnyder mentioned this issue May 17, 2022

MAINT, CI: use darshan-logs repo in cibuildwheel tests darshan-hpc/darshan#744

Closed

tylerjereddy mentioned this issue Jun 15, 2022

TST: test infrastructure for pulling in log test data darshan-hpc/darshan#420

Closed
