About the mission and structure of alchemtest #56
At the moment the docs are API docs. If we had a more narrative part, we could highlight the datasets for users (e.g., for teaching the use of alchemlyb). The problem with real datasets is that they can be large. In MDAnalysisData we solve this by not hosting the datasets ourselves and instead delegating to archive-grade repositories. I think that's the right approach for any real data. We could, in principle, use the MDAnalysisData approach here, too.
I'd also support a shift to the MDAnalysisData pattern, where this package only knows where to download the datasets from.
For running the alchemlyb test suite, I'd want to keep the test data bundled to avoid the further slow-down of downloading it on every CI run. I think the question is whether alchemtest should also cater to the "teaching alchemlyb" angle. If so, we could add MDAnalysisData-style code (we can actually copy it from MDAnalysisData because it's all BSD-3). Or we create a new package, alchemdata (or whatever name works), and then have a clean separation between the use cases.
To bump this old issue: I'd be happy to also have alchemtest serve as a teaching tool that gives access to bigger datasets through accessor functions similar to those in scikit-learn and MDAnalysisData. I'd just want to ensure that the data needed for running the alchemlyb tests remain part of the repository itself. If anyone wants to get started on adding tooling (and tests) for external datasets, please do so and submit a PR!
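A minimal sketch of what such an accessor could look like, assuming a download-once-then-cache design with checksum verification, loosely modeled on the metadata-plus-fetcher pattern used by scikit-learn's dataset fetchers. The names `RemoteFileMetadata`, `fetch_remote_file` and `data_home` are hypothetical and are not existing alchemtest or MDAnalysisData API. Caching the file locally is also what would keep repeated runs (e.g. in CI) from downloading it every time.

```python
import hashlib
import os
import urllib.request
from collections import namedtuple

# Hypothetical metadata record: file name, where it lives, expected SHA-256.
RemoteFileMetadata = namedtuple("RemoteFileMetadata", ["filename", "url", "checksum"])


def _sha256(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of the file at *path*."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_remote_file(remote, data_home):
    """Hypothetical accessor: download *remote* into *data_home* once.

    Returns the local path. If the file is already cached, no download
    happens; otherwise it is fetched to a temporary name, verified against
    the expected checksum, and only then moved into place.
    """
    os.makedirs(data_home, exist_ok=True)
    local_path = os.path.join(data_home, remote.filename)
    if not os.path.exists(local_path):
        tmp_path = local_path + ".part"
        urllib.request.urlretrieve(remote.url, tmp_path)
        if _sha256(tmp_path) != remote.checksum:
            os.remove(tmp_path)  # never leave a corrupt file in the cache
            raise OSError(f"checksum mismatch for {remote.url}")
        os.replace(tmp_path, local_path)
    return local_path
```

In a real package the `url` would point at an archive-grade repository and `data_home` would default to a per-user cache directory; both are left to the caller here so the sketch stays self-contained.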
From a user's perspective, I would think that the mission of alchemtest is to provide "real" datasets that users can analyse with alchemlyb, to see what kind of data they will generate and what kind of results they will get.
From a developer's perspective, some datasets are required to test edge cases, such as the dataset with restarts (#55).
However, these edge-case datasets might not be very useful to normal users, and their presence might distract users from the datasets they are more likely to be interested in.
So I think there are two ways of handling this:
- We could have two sections in the docs: one for the datasets that users are likely to be interested in, and another for the datasets that developers add to test edge cases.
- Or we could move the developer-oriented test datasets to alchemlyb.
Opinions are welcome. (Or should I move this to the alchemlyb discussions?)