Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release sdssdb 1.0 #267

Open
12 tasks
albireox opened this issue Sep 3, 2024 · 3 comments
Open
12 tasks

Release sdssdb 1.0 #267

albireox opened this issue Sep 3, 2024 · 3 comments

Comments

@albireox
Copy link
Member

albireox commented Sep 3, 2024

sdssdb is almost 6 years old and it has been "stable" (or at least not breaking) for a long while. I think it's time to give ourselves the satisfaction of tagging a 1.0 release.

I think we could simply tag 1.0 now —well, after merging #266— and that would be ok, but there are a few things I'd like to discuss that we may want to include in 1.0 (or not):

  • Agree on a package management strategy. It may be that we don't need one and just maintaining dependencies the way we have been doing is fine, but we may want to consider settling on something like poetry, uv, or hatchling that allows to specify lock files.
  • We have never enforced a project-wide format. We enforce flake8 rules, which include some formatting requirements like line length, but in general there is nothing about code style (single vs double quotes, empty lines, etc.) Here the question is whether we use (and enforce) black/ruff or not —I don't think there are any other formatters that are really worth considering.
  • Import times are bad. connection.py imports a lot of things from both peewee and sqlalchemy when usually only one of the libraries will be used. We could split connection.py into two different files and avoid importing them in __init__.py.
  • We should probably review the state of the SQLAlchemy models and make sure it's reasonable.
  • The documentation is quite out of date (although essentially still works).
  • We have some testing, but it's old and unclear how comprehensive. We should, at least, try to get some decent coverage of the connection classes and main models.
  • The package is not compatible with SQLAlchemy 2.x (I think). Do we want to fix that?
  • There are a few longstanding bugs and features (e.g. support peewee connection pooling #236) that we may want to fix or close.
  • The package includes a lot of tools (e.g., things in the ingest module) that I'm not sure are relevant anymore. They account for a lot of the [all] dependencies.
  • Clear unnecessary dependencies and update the rest.
  • Maybe drop support for some old Python versions?
  • A significant part of the package is the schema directory. There is an argument to be made for moving it to its own repository, but also arguments against it (for example it may be difficult to maintain the relationship between sdssdb and sdssdb-schema, although right now nothing enforces schema/ to be up to date with the model classes).

There are a few larger features we have discussed, like SQLAlchemy async support. I think we should not try to implement those in 1.0 unless we set a clear timeframe for them.

Anything else?

@havok2063 @joelbrownstein @andycasey @johndonor3 @astronomygupta

@andycasey
Copy link
Contributor

I don't have strong opinions here, but just a few comments:

  1. Great idea.
  2. ruff has black-compatible formatting now. It would be good to add this, and not much downside to enforcing it.
  3. Improving import times would be a very good improvement.
  4. Is SQLAlchemy used with sdssdb? I use the peewee models, but if there's a need to review SQLAlchemy models just to make sure it is reasonable, then maybe nobody is using them?
  5. I think keep the schema here in this repository. I could add the astra schema here.
  6. Drop support for anything no longer user/relevant.

The audience for documentation is basically "SDSS software writers", right? Only a few people have pipelines accounts, and anyone else in SDSS who might want to use operations should already have a Utah account. If that's true, maybe we should just add some documentation pages for the kinds of queries that non-experts might want to make (e.g., "will this star ever be observed"). Here, "non-experts" are people who have Utah accounts but don't necessarily use sdssdb very often.

@havok2063
Copy link
Collaborator

I like the idea of a roadmap to 1.0. Some comments:

  1. Another alternative would be to use pip-compile to generate a resolved requirements file to act as a "lock" file, if we didn't want to switch entirely to a new management tool. If we do move to a lock file, one issue to keep in mind is this may create challenges in dependency resolution for other packages that use sdssdb, especially when mixing packages with and without lock files.
  2. I'm in favor of ruff. I have a small list of grievances with black but I'll cope if we decide to enforce it.
  3. I think 1.0 should support sqlalchemy > 2.0. I think a small number of changes need to be made to enable this. I agree that we don't need to support async for 1.0. It'd be nice but I think requires some more exploratory work. SQLAlchemy is still the de-facto ORM software out there, has the most support, and has many software integrations with other libraries. I don't recommend dropping it. They've improved their syntax and API quite a bit since 1.x.
  4. I think we can safely drop support for Python <= 3.9, and make >=3.10 the minimum version.
  5. I agree we should revisit the test suite. It would be nice to have some tests in place. Another ticket to consider with that is Move tests to GitHub Actions #67.

@albireox
Copy link
Member Author

albireox commented Sep 3, 2024

Thanks @andycasey @havok2063. Those are all good comments.

  • I do like ruff format and have been using it instead of black, but apart from being faster there are not many differences. The main problem with formatters, unless everybody uses them, is that files may be changing under your feet continuously in ways that don't indicate any change in behaviour, just depending who saves them. That muddies the Git commits but otherwise it may be fine.
  • I think we want to keep SQLA, and 2.x has some nice features that are worth exploring, especially async support.
  • There is a discussion to be had about who the sdssdb users are, and yes, that is likely to be known people with access to Utah writing software. Whether that changes something in how we want to continue developing sdssdb, I'm not sure.
  • I like the idea of pip-compile, it may be a bit less of an entry level barrier than other ones. Although in the end one always has to install something, but most of these tools can be easily installed with pipx.
  • I think Move tests to GitHub Actions #67 is now taken care of by Support PEP 621 #266 but we should also have a look at the workflows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants