Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YTEP-0038: Type Annotations #17

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

matthewturk
Copy link
Member

This is a draft of a type hinting YTEP. I may have missed some things or not gone into enough detail.

I particularly want to ask @neutrinoceros and @cphyc to take a look, and to feel free to push changes to this PR (and to add their names as authors.)

@chrishavlin
Copy link

This is great! One general question -- at what point should PR's that introduce new methods be type hinted? At some point it would make sense to encourage (and at some later point require?) PR authors to use type hinting so that those methods don't have to be subsequently converted, but I'm not sure what point that is.

@matthewturk
Copy link
Member Author

That's a good question! I think ideally this would not happen for some time, but eventually we would encourage it or even mandate it if it's user-facing.

@neutrinoceros
Copy link
Member

neutrinoceros commented Mar 1, 2021

Disclaimer: Obviously my opinion is biased because I have a couple PRs pending already that have type hints, but I think that we should start encouraging them or at least letting them in ASAP.

The reason for this is that it takes a lot of time to add them as a standalone task, but it's more or less marginal to add types to functions when you're already working on them. In particular, I've found that when I'm working on a fix for a part of the code that I'm not very familiar with, part of my process involves understanding what types fit e.g. a function signature. At this point, I'm spending time thinking typing no matter what, but this time is wasted in the long run if I can't add the hints to the main branch.

@cphyc
Copy link
Member

cphyc commented Mar 1, 2021

Thanks for doing this! I just skimmed through it and here are a handful of questions:

  1. I don't think the higher-level user-facing part requires any Python 3.7 specific code (in particular Postponed Evaluation of Annotations, see https://www.python.org/dev/peps/pep-0563/). I think the first two points are probably doable with our current set supported minimal Python version.
  2. Should we write in the PEP that we can start doing type annotation (except postponed evaluation of annotations) until Python 3.7 or later is supported?
  3. Still regarding postponed evaluation, we should probably get rid of all the from __future__ import annotations that will pop all over the place once our minimal python version is 3.10. Would this YTEP be a good place to keep track of this?
  4. For many of the developers (including past me a couple of months ago), type hints are somewhat wizardry, even though the gist of it is really simple. Would it be worth investing a bit of time to develop what our type hints should look like? I have compiled a list of simple rules below to facilitate that, of course they're open to the discussion!
  5. When picking type names, should they reflect their Python type or their meaning? Let's discuss an example: field names. Should they be typed as
from typing import Tuple
Field = Tuple[str, str]
StrTuple = Tuple[str, str]

def get_data(field: FieldType):   # type conveys what the object represents
    ...
   
def get_data(field: StrTuple):    # type reflects how the object is internally stored
    ...
  1. The amount of things we can type is limitless: where do we draw the boundary? Should type hints be limited to function signatures? Or do we allow/encourage/discourage type hinting within functions?
def doSomethingComplex(ds: Dataset) -> List[str]:
    def _comment_field(field):  # nice and compact, and obvious from reading it
        ...
    # or 
    def _comment_field(field: Tuple[str, str]) -> str:  # explicit but tedious to write
        if field[0] in ds.particle_types:
            return "Great, I love particles"
        else:
            return "Blimey, where are all my particles?"

    comments = []
    # or
    comments: List[str] = []
    for field in ds.derived_field_list:
        comments.append(_comment_field(field))
    
    return comments

Tentative list of rules for type hinting or the Zen of Typing Yt

  • DRY: All internal typing should be imported either from typing or from yt.typing, except classes (that should be imported from their respective files directly) and functions that return or take as inputs objects from foreign libraries (e.g. that export pandas dataframes). We should avoid as much as possible code repetition in function signatures, except where it helps understanding the signature itself (i.e. Tuple[str, str] may be clearer than StrTuple?)
  • type for users then for the machine: Typing should help users and developers first, and be machine readable then. This means that complex types should be defined as their own meaningful type and stored in yt/typing.py (or whatever file we settle on)
  • favour clean signature over history: Inevitably we will hit a function with very complex signatures/variable return types. We should favour refactoring it if that can contribute to simplifying the signature.
# yt/some_objects.py
class SlicePlot:
   ...
class ProjectionPlot:
   ...

# yt/typing.py
from yt.some_objects import SlicePlot, ProjectionPlot
from typing import Union, TypeVar
T = TypeVar("T")
Plot = Union[SlicePlot, ProjectionPlot]
ListOrSingle = Union[T, List[T]]
PlotsOrPlot = ListOrSingle[Plot]

# yt/another_file B.py
from yt.some_objects import SlicePlot, ProjectionPlot
from yt.typing import ListOrSingle, PlotsOrPlot, Plot

# This is fine
def returnSomething() -> SlicePlot:
    ...

# The return type is overly complex. The Union[SlicePlot, ProjectionPlot] should be defined as a meaningful
# type in `yt/units.py`, e.g.  imported from there
def returnComplexThing() -> Optional[Union[SlicePlot, ProjectionPlot, List[SlicePlot] List[ProjectionPlot]]]:
    ...
    
def returnComplexThing() -> PlotsOrPlot:  # ok I guess? Though we have to inspect yt/units.py to figure out what PlotsOrPlot is
    ...
   
def returnComplexThing() -> Union[Plot, List[Plot]]:  # not ok if ListOrSingle is defined
    ...
    
def returnComplexThing() -> ListOrSingle[Plot]:   # best?
    ...

@matthewturk
Copy link
Member Author

The reason for this is that it takes a lot of time to add them as a standalone task, but it's more or less marginal to add types to function when you're already working on them.

This is a great point. How about in an update, I change it to say that we're immediately going to open up to adding type hinting?

Copy link
Member

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for boostraping this. I want to flesh this out a little but I can't do it for the time being, so here are some comments in the mean time. If you guys want to reuse some of the material as part of the YTEP I'm fine with it, otherwise I'll do it myself soon !

Comment on lines +47 to +48
However, there are some features of type-hinting that would be extremely
useful to yt that only started appearing in Python 3.7. Part of this
Copy link
Member

@neutrinoceros neutrinoceros Mar 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Corentin said, it actually started with Python 3.6
the builtin typing and types (as well as collections from Python 3.9 on) modules have been getting more and more support each version, and it's true that dropping Python 3.6 makes the task a little less difficult, but it's not a requirement to get started.

For completeness, the specific improvement in Python 3.7 over 3.6 is that type hints evaluation is postponed, instead of immediate. This makes the following snippet invalid in Python 3.6 but valid in 3.7 and above

def baconify(spam) -> Bacon:  # error, Bacon type isn't defined yet !
    return Bacon(spam)

class Bacon:
    def __init__(self, spam):
        ...

Note that is order to make this Python 3.6 compliant, it is sufficient to quote the problematic annotation as

def baconify(spam) -> "Bacon":

The main downside is that it creates unnecessary churn to write Python 3.6 compliant annotation if we're going to drop it immediately after, which is why those two problems should indeed be discussed together.

Comment on lines +63 to +65
* Identify any type hint `Union` types that need to be made (to account for,
for instance, supplying either a `unyt_quantity` or a tuple of (value,
unit))
Copy link
Member

@neutrinoceros neutrinoceros Mar 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alas, typing.Union isn't the magical beast I once thought it was, and there's much more to typing than defining well thought instances of this. For instance typing.TypeVar is slightly different and close to templating in C++, and one needs to know both and think about which one applies in a given situation.

For instance, let's say we have the following function, whose design exploits duck typing

def sum_strings(strings, joiner):
    return joiner.join(strings) 

(For the sake of the argument, I'm going to willingly ignore the fact that typing.AnyStr exists)

What input types are relevant here ? str and bytes work, so we might simply express this as a typing.Union

from typing import Sequence, Union
MyStrT = Union[str, bytes]
def sum_strings(strings:Sequence[MyStrT], joiner:MyStrT) -> MyStrT:
    ...

The problem is that then it's not clear what the output type will be in case the user code mixes allowed types as

sum_strings(["spam", "bacon"], b"eggs")

but even more importantly... it's not clear from the signature that this call won't even work !
The solution here is to use typing.TypeVar instead:

from typing import Sequence, TypeVar
MyStrT = TypeVar("MyStrT", str, bytes)
# MyStrT can be str or bytes (and nothing else), but two MyStrT objects defined in the same signature can't have mixed types
def sum_strings(strings:Sequence[MyStrT], joiner:MyStrT) -> MyStrT:
    ...

Now a type checker will be able to tell that my user code is violating the signature.

Let's take another example where Union isn't the right tool

def usum(it):
    ret = it[0]
    for elem in it:
        ret += elem
    ret -= it[0]
    return ret

Now if we want to add type hints to this signature, the easy way out is

from typing import Any, Sequence
def usum(it:Sequence[Any]) -> Any:
    ...

But that's not exactly relevant is it ? The return type will be the same that the element type in the input sequence,
and the signature should reflect that. This can't be expressed by typing.Union, but it can be expressed with typing.TypeVar

from typing import Sequence, TypeVar
T = TypeVar("T")  # this can be anything, but two "T"s in a single signature _have_ to be the same type
def usum(it:Sequence[T]) -> T:
    ...

* Type-hint constructors for all top-level objects (`SlicePlot`, `Sphere`, etc)
* Type-hint all visualization function calls (`p.set_width`)
* Type-hint public methods on top-level objects (`.profile`)
* Utilize annotators for internal (underscore) methods.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the feasibility of this last part remains to be demonstrated afaic.

Alternatives
------------

None that I can identify.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one: do nothing and spend our dev time elsewhere 🙈
But as I mentioned in a comment already, doing this while we're working on the code for different reasons adds a marginal cost, at least for people who want to do it.

@matthewturk
Copy link
Member Author

I'll go ahead and update the ytep with all of these comments. Thank you!

@neutrinoceros
Copy link
Member

neutrinoceros commented Mar 1, 2021

I want to answer some of the points in @cphyc 's comment.

  • Still regarding postponed evaluation, we should probably get rid of all the from future import annotations that will pop all over the place once our minimal python version is 3.10. Would this YTEP be a good place to keep track of this?

Good news : I have faith that this will eventually be automated with pyupgrade, which we're already using as part of our pre-commit tool belt. This is a few years down the road but that's what the tool is for, and I trust his developer (who's also the person behind pre-commit and pre-commit.ci) will have our back by then.

For many of the developers (including past me a couple of months ago), type hints are somewhat wizardry, even though the gist of it is really simple.

I'm sorry Corentin, there's no easy way to say this but your example that follows this quote is simply wrong. (wizardry takes time :/). Specifically
Union[str, str] is exactly the same as plain str (in this context), and it doesn't represent a size 2 tuple (or sequence). I believe you meant to write Tuple[str, str].
I'll add that #3064 already contains my own proposal for field-related types and is open for discussion :)

All internal typing should be imported either from typing or from yt.typing

We might need to rely on numpy.typing too, eventually.

About the name of the typing module, I think there are only two serious contenders if we want to fit the ecosystem:

  • yt._typing or the pandas model: we write type hints for our code base only, they shouldn't be used downstream because they are unstable and prone to change
  • yt.typing or rhe numpy model: our custom types would be useful to downstream projects, and we want to expose them as close to the surface as reasonable (not top level).

I think we should start with a _typing module to make it clear that we're allowing ourselves to make mistakes, and users should not depend on it. Once the module is mature enough to be considered stable, it could easily be renamed to drop the underscore.

Otherwise, I love the "Zen" you wrote, and I support each of the rules you're proposing. Thanks a lot !

@neutrinoceros
Copy link
Member

A question that should be adressed in the YTEP is when we should start adding typechecking to our CI. mypy can be used through pre-commit, which means the required additions to our infrastructure is minimal, but I'm not sure about perfs (specifically, I don't know whether the hook runs on every files or just staged ones).
There may be important difficulties as to how this should be done too.

@matthewturk
Copy link
Member Author

matthewturk commented Mar 2, 2021 via email

@neutrinoceros
Copy link
Member

neutrinoceros commented Mar 2, 2021

Dataclasses were introduced in Python 3.7
This is indeed related in that using them would be further motivation to drop Python 3.6 support, it also affects how we define and expose type hints for the affected structures.
Their usage is close to that of collections.namedtuple and [types.simplenamespace] (https://docs.python.org/3/library/types.html?highlight=simplenamespace#types.SimpleNamespace) (most recently changed in Python 3.7) (most recently changed in Python 3.9), so I would advise we carefully evaluate if we need any, and which one is best, because I anticipate having a little of each could make the code base confusing and difficult to maintain. I personally favour the latest iteration (data classes), but I'm stating this in the void, and I don't have a good idea of where and how they would fit in yt.

@cphyc
Copy link
Member

cphyc commented Mar 2, 2021

I'm sorry Corentin, there's no easy way to say this but your example that follows this quote is simply wrong. (wizardry takes time :/). Specifically

Oops, my bad you're right. I intended to write Tuple[str, str] instead of a Union[str, str]! Should have typed that in a proper code editor first ;) Anyway #17 (comment) is probably more interesting than my two cents on that.

I have updated my example.

@matthewturk
Copy link
Member Author

So I've read the ZoT proposed by @cphyc and I love it. I will add it (and @cphyc as an author) to this YTEP.

@matthewturk
Copy link
Member Author

@cphyc One thing you brought up in your comments was the idea of, should we type for clarity, or purpose. I am inclined to think that we might want to think of this in combination with #8 , which lets us add on validation in some circumstances. I think that we might want to use this for very specific end-user facing things, and utilize traitlets for validation. But that's not an incredibly well-thought out response, I confess. I'm also kind of inclined to think of the contents of the typing module as a set of fundamental operators in yt; so I'm a bit torn about whether we want to provide a "field" object, or a str tuple instead.

@neutrinoceros
Copy link
Member

so I'm a bit torn about whether we want to provide a "field" object, or a str tuple instead.

I'm just now realising

@neutrinoceros
Copy link
Member

oops sorry, misclicked

@neutrinoceros neutrinoceros changed the title Initial draft of type hinting YTEP YTEP-0038: type hints Mar 8, 2021
@neutrinoceros neutrinoceros changed the title YTEP-0038: type hints YTEP-0038: Type Annotations Mar 8, 2021
@neutrinoceros
Copy link
Member

I am inclined to think that we might want to think of this in combination with #8 , which lets us add on validation in some circumstances. I think that we might want to use this for very specific end-user facing things, and utilize traitlets for validation

For the record I love this idea, though it seems hard to find good (or any) resource on how to make both work together. Anything helps !

Copy link
Member

@cphyc cphyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I just spotted a few typos on the way ;)
Let's type then!

source/YTEPs/YTEP-0038.rst Outdated Show resolved Hide resolved
source/YTEPs/YTEP-0038.rst Outdated Show resolved Hide resolved
Co-authored-by: Corentin Cadiou <[email protected]>
Co-authored-by: Corentin Cadiou <[email protected]>
@neutrinoceros
Copy link
Member

Quick note since it hasn't been mentioned yet: although typing adds no performance gain yet, it opens the way to use mypyc (or similar tools ?) that allows to compile normally typed-annotated python (no additional micro language or framework).
https://mypyc.readthedocs.io

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants