Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate "extra" dependencies #389

Open
jsstevenson opened this issue Apr 10, 2024 · 5 comments
Open

Eliminate "extra" dependencies #389

jsstevenson opened this issue Apr 10, 2024 · 5 comments
Labels
2.0-alpha Issues related to VRS 2.0-alpha branch technical debt A feature/requirement implemented in a sub-optimal way & must be re-written. Contrast to "cleanup"

Comments

@jsstevenson
Copy link
Contributor

jsstevenson commented Apr 10, 2024

I'm not sure if a firm decision has been made on this but I've heard a few thoughts expressed on the matter:

  • Rename "extras" to something more expressive ("tools" or something?)
  • Merge the extra dependencies into the main dependencies
  • Separate the models/serialization business and the extra tooling into separately installable packages ("core" vs "tools" ?)

edit: had another thought to toss out: move extras to a different module level, a la

src
└── ga4gh
    ├── core
    ├── vrs    
    └── extras. # or "tools" or w/e
        ├── __init__.py
        ├── decorators.py
        ├── object_store.py
        ├── translator.py
        └── vcf_annotation.py
        └── utils
            └── hgvs_tools.py
@jsstevenson jsstevenson added 2.0-alpha Issues related to VRS 2.0-alpha branch technical debt A feature/requirement implemented in a sub-optimal way & must be re-written. Contrast to "cleanup" labels Apr 10, 2024
@korikuzma
Copy link
Contributor

@ga4gh/vrs-python-maintainers @ga4gh/vr-maintainers should provide thoughts

@korikuzma
Copy link
Contributor

  • Rename "extras" to something more expressive ("tools" or something?)

I am pro more meaningful name. Tools is fine to me

  • Merge the extra dependencies into the main dependencies
  • Separate the models/serialization business and the extra tooling into separately installable packages ("core" vs "tools" ?)

I would prefer to keep them separate since in gene-normalizer, we only use the Pydantic models and would like to keep deps lightweight

@jsstevenson
Copy link
Contributor Author

jsstevenson commented Apr 10, 2024

I would prefer to keep them separate since in gene-normalizer, we only use the Pydantic models and would like to keep deps lightweight

Yeah I agree that simply merging the dependencies would cause some unnecessary difficulties for some niche but not infrequent use cases. I guess the question is

  1. if we keep in same installable, is it weird to have the requirements for the most common use cases live under an optional dependency group?

  2. if we separate out into different packages, will it suck to maintain a models/core/serialization package separately from a tools (translator, annotator) package?

final note: I noticed a few days ago that NetworkX is doing something like this -- you can theoretically install it without Numpy, etc but will hit import errors very quickly, so they recommend you just install with the optional [default] dependency group: https://networkx.org/documentation/stable/install.html#install-the-released-version

@theferrit32
Copy link
Contributor

I think it's okay to have a frequently used optional dependencies target. The core library functionality of the models and serialization stuff should be usable without those. Not being able to normalize alleles without it is a bit of a limitation.

If there wasn't such a size difference in the installed size between them I'd say it wouldn't be that big of a deal to merge them. Adding the extras adds a lot (though I would expect most users to add them).

docker run -it --rm python:3.11 bash -c 'du -hs /usr/local/lib/python3.11/site-packages'

13M	/usr/local/lib/python3.11/site-packages


docker run -it --rm python:3.11 bash -c 'pip install "ga4gh.vrs @ git+https://github.com/ga4gh/vrs-python" && du -hs /usr/local/lib/python3.11/site-packages'

34M	/usr/local/lib/python3.11/site-packages


docker run -it --rm python:3.11 bash -c 'pip install "ga4gh.vrs[extras] @ git+https://github.com/ga4gh/vrs-python" && du -hs /usr/local/lib/python3.11/site-packages'

182M	/usr/local/lib/python3.11/site-packages

@jsstevenson
Copy link
Contributor Author

@theferrit32 ~40MB of that is PySam (by way of HGVS -> SeqRepo -> PySam), which maybe explains why the DataProxy stuff is reimplemented here instead of being imported from SeqRepo...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.0-alpha Issues related to VRS 2.0-alpha branch technical debt A feature/requirement implemented in a sub-optimal way & must be re-written. Contrast to "cleanup"
Projects
None yet
Development

No branches or pull requests

3 participants