demes-python and demes-spec both carry example yaml files #203
Comments
Picking up this thread again. @molpopgen and I were just putting together some tests for What do we think about this? |
I think having the examples as data files is a good idea. There's the testing angle, and then |
While I'm thinking about this: is there a "version" field in our spec somewhere? That'd be quite useful to track changes in models over time. |
Having some set of example models available as data files might be useful, yeah. Here are a few additional considerations.
We don't have a version field for the models, no. What would a simulator do with this field, though? If a model has a bug, it should be fixed and a new software release (e.g. stdpopsim release) will include the fix. If a model is otherwise changed, it should be given a different filename (unless maybe it's a toy/testing model in |
Right, these would be different.
Wouldn't the edge cases belong in
Why prefer this package? I think this may be problematic for simulation authors whose work is not supported by stdpopsim.
I guess I was thinking of distinguishing testing models from published models like the Gutenkunst, for example
Ultimately, this would end up in the metadata in a tree sequence file. Changing the file name breaks the API for downstream packages. Changing the version number does not, and it means the metadata no longer compares equal, which may be useful. |
Collecting demographic models is a core goal of the stdpopsim package. It shouldn't matter whether fwdpy11 is a supported backend simulation engine in stdpopsim or not, because the YAML files will be there for anyone to use. Making these available as data files in a standard location seems like a good idea and should be fairly easy. Then they could be accessed with a simple wrapper function like this:

    import pkgutil  # core python lib
    import demes

    def get_stdpopsim_model(species_id: str, model_id: str) -> demes.Graph:
        # pkgutil.get_data() returns the file contents as bytes
        # (note the f-string, so that the ids are substituted into the path)
        yaml_bytes = pkgutil.get_data(
            "stdpopsim", f"demographic_models/{species_id}/{model_id}.yaml"
        )
        yaml_str = yaml_bytes.decode()
        return demes.loads(yaml_str)

    graph = get_stdpopsim_model("HomSap", "OutOfAfricaArchaicAdmixture_5R19")

We could probably even separate out the data files into a
Hmm... ok. But wouldn't that be just as easily dealt with by including the YAML contents in the tree sequence provenance? Or a checksum? One could do something slightly more sophisticated like this:

    import hashlib  # core python lib
    import demes

    def graph_checksum(graph: demes.Graph) -> str:
        # We'll probably make a guarantee that the asdict() result will be stable
        # across `demes` versions, barring bugs (but not for asdict_simplified()).
        data = graph.asdict()
        # delete stuff we don't want in the checksum
        for key in ("description", "doi"):
            data.pop(key, None)
        for deme in data["demes"]:
            deme.pop("description", None)
            # skip demes with no ancestors: unpacking an empty zip() would fail
            if deme.get("ancestors"):
                # sort ancestors/proportions lists together
                deme["ancestors"], deme["proportions"] = zip(
                    *sorted(zip(deme["ancestors"], deme["proportions"]))
                )
        m = hashlib.sha256()
        m.update(repr(data).encode())
        return m.hexdigest()
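As a rough, stdlib-only illustration of the normalisation idea in the checksum comment above (this is not the demes API): the sketch below hashes a plain dict shaped loosely like a model, dropping descriptive fields first. The name `dict_checksum`, the example field values, and the swap from `repr()` to `json.dumps(sort_keys=True)` are all assumptions made for illustration. Sorted-key JSON is used here only so that dict key order cannot change the digest.

```python
import copy
import hashlib
import json


def dict_checksum(model: dict, ignore=("description", "doi")) -> str:
    """Hash a plain dict, ignoring descriptive top-level and per-deme fields."""
    data = copy.deepcopy(model)  # leave the caller's dict untouched
    for key in ignore:
        data.pop(key, None)
    for deme in data.get("demes", []):
        deme.pop("description", None)
    # sort_keys=True makes the serialisation independent of dict key order
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


# Hypothetical inputs: a and b differ only in descriptions and key order.
a = {"description": "v1", "time_units": "generations",
     "demes": [{"name": "A", "description": "x"}]}
b = {"demes": [{"description": "y", "name": "A"}],
     "time_units": "generations", "description": "v2"}
c = {"time_units": "years", "demes": [{"name": "A"}]}
```

With these inputs, `a` and `b` give the same digest, while `c` gives a different one because a non-descriptive field changed.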
I think that we'd want to keep the |
This is my concern as well. A |
A related concern is licensing. |
To clarify, I'm not suggesting that demes-python would depend on stdpopsim, nor that demes-python would need to access the stdpopsim models at all. |
I'm with @grahamgower here: stdpopsim will have a large number of curated demes models, and we can easily add a supported interface for accessing them programmatically. This is entirely separate from the demes package. I don't think it's a good idea to ship example models as part of the package. We would be taking on the responsibility of ensuring that they are correct, which I don't want to do. (Even if we say they're just examples and not to rely on them, people will assume they are correct anyway.) It's just a maintenance burden for no real benefit, IMO. |
I'd like to avoid having the demographic models ship with that package. It brings in a bunch of dependencies that users of other simulators just don't need. And it would also force other developers to keep up with its requirements, such as the minimum Python version: that package has dropped support for 3.6 ahead of its end of life. That kind of stuff just poses complications for other tools. I propose an entirely separate repository for the curated models. That keeps it as minimal as possible, and it doesn't even need to be Python. It could just be a bunch of directories organized in some logical way, and consumers could deal with it however they see fit. |
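To make the "bunch of directories" idea concrete, here is a minimal sketch of how a consumer might index such a repository using only the standard library. The layout `<root>/<species_id>/<model_id>.yaml`, the function name `index_models`, and the model name `ExampleModel` are assumptions for illustration. Nothing in this thread fixes a layout.

```python
import tempfile
from pathlib import Path


def index_models(root: Path) -> dict:
    """Map (species_id, model_id) to the YAML file path under root."""
    index = {}
    for path in sorted(root.glob("*/*.yaml")):
        index[(path.parent.name, path.stem)] = path
    return index


# Build a throwaway catalog to show the assumed layout; contents are placeholders.
tmp = tempfile.TemporaryDirectory()
catalog = Path(tmp.name)
(catalog / "HomSap").mkdir()
(catalog / "HomSap" / "ExampleModel.yaml").write_text("description: placeholder\n")
models = index_models(catalog)
```

A consumer in any language could do the equivalent walk, since nothing here depends on a Python package being installed alongside the models.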
The more I think about it, the more it makes sense that the models shouldn't be associated with a Python package at all. If someone re-implements the specification in another language, then it's weird for them to do something like include a Python package as a submodule. |
Sure, I agree with your points here @molpopgen, but it's going to be a lot of work to make all this happen. We need a lot of infrastructure to make stdpopsim happen and it's going to need to be refactored quite a lot to keep it updated over time. It's a lot of work, and I'm not convinced there's really that much benefit in breaking out the demes specifications. If we just want a handful of examples for documentation purposes, then I think we already have that. If we want a set of carefully chosen examples that illustrate various aspects of the spec, and that form a set of positive and negative test cases for evaluating parsers/implementations, then models from the literature are useless there anyway and we should make up artificial ones. P.S. Are there that many deps with stdpopsim? It's just msprime and a few standard Python things at the moment AFAIK, pretty lightweight in the scheme of things. |
This is actually really annoying to discover. I ended up installing
|
Maybe we punt this for now then. Personally, I think there are quite a lot of extra dependencies of I agree with @molpopgen that breaking out the (future) catalog of |
I think there are some crossed wires here: nobody is suggesting that stdpopsim become a dependency of demes. Graham and I are just saying that if someone wants a set of curated demes models from the literature, they can get them from stdpopsim. If you have a Python package that needs these, then a dependency on stdpopsim seems a reasonable price. I guess we could break out the stdpopsim-catalog as a separate minimal package at some point, but there's a lot of work involved and not that many developers. |
Ok - yeah, I think there were some crossed wires, and I don't think I understood the suggestions. Sorry about that! That's all quite reasonable, and I agree it's a lot of work and few developers, but if there is good reason to, I think it's still worth considering when we get to the point of converting the stdpopsim models. More simply, what are we doing with the YAML files in the examples dir? Are they used in the docs, and if not, should they be removed so that they don't need to be maintained? Jerome, you make a good point that we don't want to guarantee their correctness here, and maintaining them can be a hassle. |
They served a purpose during the initial stages of development, when we wanted to exchange our thoughts about how the yaml format should look, but that's probably coming to an end. We do still want to include some yaml files for the tests, so probably we should just move them somewhere under the |
I think there's a fairly clear case for using a separate license (or dual license) for the yaml files that will be included in stdpopsim. While there is some intellectual property associated with implementing and curating the models, these also derive from details in publications. Certainly these models are "free for all to use" in spirit, and distributing stdpopsim under the GPL because the python code does |
It's useful to have examples in each repository for testing, and probably awkward to force devs to clone both repositories just to run the tests. So if we keep both copies, we should have a way to synchronise them, or at least validate the demes-python repository examples against the spec.
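One way the synchronisation check could look is a byte-level comparison of the two example directories, sketched below with the standard library only. The function names `file_digests` and `compare_examples`, the flat `*.yaml` layout, and the demo file names are assumptions for illustration. A real check might instead parse each file and compare the resulting models, since two YAML files can differ in formatting while describing the same model.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digests(directory: Path) -> dict:
    """Map each YAML file name in directory to the SHA-256 of its bytes."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(directory.glob("*.yaml"))
    }


def compare_examples(dir_a: Path, dir_b: Path) -> dict:
    """Report files missing from either side and files whose contents differ."""
    a, b = file_digests(dir_a), file_digests(dir_b)
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "differ": sorted(n for n in a.keys() & b.keys() if a[n] != b[n]),
    }


# Demo with two throwaway directories standing in for the two checkouts.
tmp = tempfile.TemporaryDirectory()
spec_dir, py_dir = Path(tmp.name, "spec"), Path(tmp.name, "python")
spec_dir.mkdir()
py_dir.mkdir()
(spec_dir / "same.yaml").write_text("time_units: generations\n")
(py_dir / "same.yaml").write_text("time_units: generations\n")
(spec_dir / "changed.yaml").write_text("time_units: generations\n")
(py_dir / "changed.yaml").write_text("time_units: years\n")
(spec_dir / "extra.yaml").write_text("time_units: generations\n")
report = compare_examples(spec_dir, py_dir)
```

A CI job in either repository could run a check like this against a checkout of the other and fail when any of the three lists is non-empty.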