Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce typehints in controller #881

Merged

Conversation

jonded94
Copy link
Contributor

@jonded94 jonded94 commented Apr 3, 2024

General code quality improvements for the controller part of this repository.

  • Make sure exported symbols are correctly set in __init__.py with __all__.
  • Renamed pytest fixture for creating a k8s namespace from ns to namespace since ns is a bit overloaded, especially in test_controller.py
  • Introduce typehints in controller.py and test_controller.py to massively improve developer experience (previously, one just really had to guess what functions are for and which types are expected)
  • Removed unused function parameters
  • Made function headers (i.e. parameter [type] defintion) compatible with what `kopf.on.[create/change/...]' expects
  • Introduce mypy (and missing stubs of external packages if available)
  • Introduce mypy ignores for (still) untyped parts of the codebase for now

@jonded94 jonded94 requested a review from jacobtomlinson as a code owner April 3, 2024 10:55
@jonded94 jonded94 marked this pull request as draft April 6, 2024 19:58
@jonded94
Copy link
Contributor Author

jonded94 commented Apr 6, 2024

@jacobtomlinson I converged to a somewhat rounded midpoint now; there still are a handful of mypy errors in controller.py (test_controller.py now shows no errors even with mypy --strict), but solving these isn't too easy right now.

Let me know what you think about my PR in general; it is unfortunately quite large, but I did it to improve developer experience (I honestly found contributing to this repository a bit hard since it is almost impossible to know which types are used where). I would prefer not increasing the scope of this PR even further for now but to do only clean up.

Also, I did not include mypy in CI for now since there are still hundreds of errors because of other parts of the codebase of this repo. Let me know what you think about that.

Maybe I can find some time to also add type hints to other parts of this repo in the distant future.

Copy link
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time here.

Generally our policy in Dask projects is to do type hinting on a best efforts basis. It is by no means mandatory and I don't think we ever want to enforce it here as it will likely obstruct novice contributors.

I'm more than happy to accept PRs like this that improve hinting provided the code becomes more readable and continues to function the same. If you want to invest more time on this in the future that would be great too, but accepting this PR isn't conditional on you following it through to mypy being 100% happy.

Unfortunately the CI is failing so I'm guessing these changes have broken something. Would you be able to investigate the errors and fix things up?

@@ -410,18 +472,18 @@ async def retire_workers(
)
# Dask version mismatches between the operator and scheduler may cause this to fail in any number of unexpected ways
with suppress(Exception):
comm_address = await get_scheduler_address(
comm_address = await get_scheduler_address( # type: ignore[no-untyped-call]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of these ignore throughout. Perhaps it makes sense to ignore this globally via the project config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm personally more on the "the more explicit and correct, the better"-side (because ignoring it globally could potentially mask bugs, depending on which error you want to ignore) but I'm happy to use the way that works for everybody. Note that these untyped calls would actually vanish as soon as the entire repo would be properly typed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the thing I would say here is that Dask strives to be a project that is welcoming to novice developers. My concern with type hinting in general is it can be very daunting for junior developers, even if it helps them avoid shooting their foot off.

Removing inline ignores I feel is a good way to remove some visual clutter that may be daunting for some folks. So even though it may mask a bug here and there I would prefer a global ignore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, if you'd prefer a global ignore for no-untyped-call & import-untyped, we can integrate this into the pyproject.toml. I'll do it when I find time for it.


import dask.config
import pytest
import yaml
from dask.distributed import Client
from kr8s.asyncio.objects import Deployment, Pod, Service
from kr8s.asyncio.objects import Deployment, Pod, Service # type: ignore[import-untyped]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with this ignore, should we disable this for the whole project?

@jacobtomlinson jacobtomlinson marked this pull request as ready for review April 11, 2024 13:09
@jonded94
Copy link
Contributor Author

@jacobtomlinson could you help me out with this piece of code here:

In the top-level __init__.py the KubeCluster is defined under __all__, but it is actually not imported "properly" but instead a __getattr__ is defined that is doing a manual importlib.import_module pointing to dask_kubernetes.classic.KubeCluster:
https://github.com/dask/dask-kubernetes/blob/main/dask_kubernetes/__init__.py#L22

This class actually is very much different from the KubeCluster class defined here which I used in all my projects.

Why is there a differentation, why is the default pointing to the "classic" KubeCluster and why is it done in this way? This smells very anti-pattern, but maybe I'm just not seeing something important here.

The CI errors actually stemmed from me trying to importing KubeCluster from the place which I thought to be "right", e.g. the dask_kubernetes.operator.kubecluster one.

@jonded94 jonded94 force-pushed the introduce-typehints-in-controller-tests branch 2 times, most recently from 141c291 to 9855fd4 Compare April 19, 2024 13:48
@jacobtomlinson
Copy link
Member

@jonded94 it was done this way when we deprecated the classic KubeCluster implementation.

In the past users would do from dask_kubernetes import KubeCluster. But when we rewrote things with the operator we pushed things down into submodules so you would import KubeCluster from either dask_kubernetes.classic or dask_kubernetes.operator. The getattr was added so we could tell people to migrate over.

Enough time has passed now that we can do away with all of this. The classic codebase can be totally removed. Then we need to make a decision about should we allow from dask_kubernetes import KubeCluster to import the operator version of the class, or should we enforce everyone to do from dask_kubernetes.operator import KubeCluster. This is mainly to avoid very old codebases breaking.

@jonded94
Copy link
Contributor Author

Okay, so the idea would be to include a DeprecationWarning in the future then as soon somebody is importing that classic thing, got it (because right now it simply imports it with no further warning or output). This is something separate then from this PR :) Thanks for your clarification!

@jacobtomlinson
Copy link
Member

Yeah, it was actually there previously, but we moved it again because there were some usage patterns that meant it didn't show up.

I basically consider the classic code overdue for removal. I just haven't had the time lately to do the work. Hoping to get to it very soon.

@jacobtomlinson
Copy link
Member

I took the plunge and removed the classic codebase in #890. Would you mind rebasing this PR?

…ository.

Make sure exported symbols are correctly set in __init__.py with __all__.
Renamed pytest fixture for creating a k8s namespace from ns to namespace since ns is a bit overloaded, especially in test_controller.py
Introduce typehints in controller.py and test_controller.py to massively improve developer experience (previously, one just really had to guess what functions are for and which types are expected)
Removed unused function parameters
Made function headers (i.e. parameter [type] defintion) compatible with what `kopf.on.[create/change/...]' expects
Introduce mypy (and missing stubs of external packages if available)
Introduce mypy ignores for (still) untyped parts of the codebase for now
@jonded94 jonded94 force-pushed the introduce-typehints-in-controller-tests branch from 9855fd4 to 82c1d93 Compare April 30, 2024 16:37
@jonded94
Copy link
Contributor Author

Very nice to hear! :) Rebased my PR.

Copy link
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've given it another pass and have a few more comments.

@@ -410,18 +472,18 @@ async def retire_workers(
)
# Dask version mismatches between the operator and scheduler may cause this to fail in any number of unexpected ways
with suppress(Exception):
comm_address = await get_scheduler_address(
comm_address = await get_scheduler_address( # type: ignore[no-untyped-call]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the thing I would say here is that Dask strives to be a project that is welcoming to novice developers. My concern with type hinting in general is it can be very daunting for junior developers, even if it helps them avoid shooting their foot off.

Removing inline ignores I feel is a good way to remove some visual clutter that may be daunting for some folks. So even though it may mask a bug here and there I would prefer a global ignore.

dask_kubernetes/operator/controller/controller.py Outdated Show resolved Hide resolved
dask_kubernetes/operator/controller/controller.py Outdated Show resolved Hide resolved
@jonded94
Copy link
Contributor Author

I now fixed the remarks, i.e.:

  • removed #type ignore: [...] comments to exchange them with global ignores in pyproject.toml
  • fixed a wrong type annotation of me regarding idle_since - that thing is a float not a str, sorry
  • removed visually taxing tests for None-ness and manual Exception raises and replaced them with simple assert's
  • added type annotations for functionality that was introduced in the mean time

After merging this MR, the next step surely would be to introduce mypy in the CI, at least for dask_kubernetes.operator.controller? (since that now shows 0 errrors)

Copy link
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me.

Very happy to add mypy to the pre-commit checks just for the controller for now. Let's do that in a follow up PR.

@jacobtomlinson jacobtomlinson merged commit 3ff7989 into dask:main Jun 3, 2024
11 checks passed
@jacobtomlinson jacobtomlinson changed the title Draft: Introduce typehints in controller Introduce typehints in controller Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants