About the mission and structure of alchemtest #56

xiki-tempula · 2021-08-28T16:39:17Z

From a user's perspective, I think the mission of alchemtest is to provide some "real" datasets that users can analyse with alchemlyb, so they can see what kind of data such simulations generate and what kind of results to expect.

From the developer perspective, some datasets are required to test edge cases, such as the dataset with restart #55.

However, these edge-case datasets might not be very useful for normal users, and their presence might distract users from the datasets that they are more interested in.

So I think there could be two ways of doing it.

- We could have two sections in the docs: one for the datasets that users might be interested in, and another for the datasets that developers upload to improve test coverage.

- Or we could move the test datasets that developers are interested in to alchemlyb?

Opinions are welcome. (Or should I move this to the alchemlyb discussion?)

orbeckst · 2021-10-06T20:56:40Z

At the moment the docs are API docs. If we had a more narrative part then we could highlight the datasets for users (e.g., for teaching the use of alchemlyb).

The problem with real datasets is that they can be big. In MDAnalysisData we solve this by not hosting the datasets ourselves and instead delegating to archive-grade repositories. I think that's the right approach for any real data. We could, in principle, use the MDAnalysisData approach here, too.

richardjgowers · 2022-05-16T15:26:31Z

I'd also support a shift to the MDAnalysisData pattern, where this package just knows where to download datasets from.

orbeckst · 2022-05-16T15:53:42Z

For running the test suite for alchemlyb I'd want to keep the test data bundled to avoid further slow-down by having to download them every time in CI.

I think the question is whether alchemtest should also cater to the "teaching alchemlyb" angle. If so, we could add MDAnalysisData-style code (we can actually copy it from MDAnalysisData because it's all BSD-3). Or we create a new package alchemdata (or whatever works) and then have a clean separation between use cases.

orbeckst · 2022-07-28T05:06:55Z

To bump this old issue: I'd be happy to also have alchemtest serve as a teaching tool that gives access to bigger datasets using accessor functions similar to what scikit-learn and MDAnalysisData have. I'd just want to ensure that the data needed for running the alchemlyb tests remain part of the repository itself.

If anyone wants to get started adding tooling (and tests) for external datasets then please do and submit a PR!!
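As a possible starting point, here is a rough sketch of what a download-and-cache accessor could look like. It is not the MDAnalysisData or scikit-learn code; the names `RemoteFile` and `fetch_file` and the `data_home` cache directory are made up for illustration, and it uses only the standard library. The idea is that the package stores just a URL and a checksum per file and downloads on first use.

```python
import hashlib
import urllib.request
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RemoteFile:
    """Metadata for one externally hosted file (hypothetical record type)."""
    filename: str  # name to use in the local cache
    url: str       # where the archive repository serves the file
    sha256: str    # expected checksum of the file contents


def _sha256(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_file(remote: RemoteFile, data_home: Path) -> Path:
    """Return the local path of `remote`, downloading it only if needed.

    A cached copy with the right checksum is reused; otherwise the file
    is downloaded and verified, and a bad download is deleted again.
    """
    data_home.mkdir(parents=True, exist_ok=True)
    target = data_home / remote.filename
    if not (target.exists() and _sha256(target) == remote.sha256):
        with urllib.request.urlopen(remote.url) as response:
            target.write_bytes(response.read())
        if _sha256(target) != remote.sha256:
            target.unlink()
            raise IOError(f"checksum mismatch for {remote.filename}")
    return target
```

A dataset-specific accessor would then call `fetch_file` for each of its files and return the local paths, while the small files needed by the alchemlyb test suite stay bundled in the repository and never go through this code path.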

orbeckst added the enhancement label Jul 28, 2022

xiki-tempula mentioned this issue Sep 25, 2022

AMBER tests could be clearer #65

Closed
