Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[uss_qualifier] Fix participant_id access #310

Merged

Conversation

BenjaminPelletier
Copy link
Member

This PR should resolve #309 and a few similar locations

@BenjaminPelletier BenjaminPelletier marked this pull request as ready for review November 2, 2023 21:56
@Shastick
Copy link
Contributor

Shastick commented Nov 3, 2023

Thanks for taking care of this. I might have gotten something wrong about ImplicitDict and taken type hints slightly too literally.

To check if I got it right, here's my updated understanding:

ImplicitDict lets us treat a dictionary like an object, and vice-versa, on the condition that fields have been set, including Optional fields if they are None.

More specifically: if a field has not been set, a direct access will result in a KeyError. If the intention is to represent an object with an Optional field that is not set, we should explicitly set that field to None.

More general question: what would a good strategy for avoiding such bugs look like?

  • Should we make sure that any Optional field is initialized to None in all our classes?
  • Or should we defensively check for the existence of any field that might not be set?

@BenjaminPelletier
Copy link
Member Author

Thanks for taking care of this.

No problem -- bugs like this are normal and expected I think.

I might have gotten something wrong about ImplicitDict and taken type hints slightly too literally.

Yeah, fortunately and unfortunately Python does not have strict typing and the type hints + ImplicitDict approximation is imperfect in a number of different ways.

To check if I got it right, here's my updated understanding:

ImplicitDict lets us treat a dictionary like an object, and vice-versa,

Just the vice-versa -- a dict can't be treated as an ImplicitDict (along with its type-hinted attributes); a dict must be parsed into an ImplicitDict (ImplicitDict.parse). An ImplicitDict actually is a dict subclass, so it's a full dict (ImplicitDict can be treated like dicts because they are dicts -- an ImplicitDict subclass is implicitly a dict). The special property of ImplicitDicts is that the keys expected in the dict are defined using type-hinted class fields/attributes, and the values can be accessed via these type-hinted class fields/attributes.

on the condition that fields have been set, including Optional fields if they are None.

The ImplicitDict constructor has logic that throws a ValueError if mandatory keys (expected keys are indicated by defining class fields/attributes) weren't provided. The "Optional" generic type annotation is used by ImplicitDict in two different ways which can certainly be confusing, but I've tried to make it as close to intuition as possible. The first use is that Optional[T] is really just code for Union[T, None] -- that means the field may be populated by either a T instance or None. The second use is that Optional[T] means "this field/key doesn't need to be provided when an instance of the ImplicitDict subclass is created". Keys/fields that aren't provided upon construction (or parsing) simply aren't included in the dict aspect of an ImplicitDict subclass...except if they have a default value. If a field has a default value, it can always be omitted, and if/when it's omitted, the instance receives the default value for that field. ImplicitDict tests establishing expected behavior around Optional edge cases can be found here.

Side note: there is no runtime type checking in the ImplicitDict constructor -- if the value for a field does not match the declared type of that field, no exception will be raised. This is one reason to use ImplicitDict.parse to construct something because parse does perform type checking and casting. But, parse is much heavier-weight than simply constructing an instance. So, if you know you have the right types, better (performance-wise) to just construct an instance.

Additional side note: The default value used for each instance of an ImplicitDict subclass is the one default value specified for the class. Usually this doesn't matter, but it does matter if default values are objects (like an empty list, dict, or set). In that case, the same object will be used as the default value for all instances -- not a new blank list/dict/set for each instance as one might expect. So, it's good practice not to use objects for default values.

More specifically: if a field has not been set, a direct access will result in a KeyError. If the intention is to represent an object with an Optional field that is not set, we should explicitly set that field to None.

The two possible solutions to avoid KeyError are 1)

class Foo(ImplicitDict):
    bar: Optional[str] = None

foo = Foo()
print(foo.bar)

and 2)

class Foo(ImplicitDict):
    bar: Optional[str]

foo = Foo()
if "bar" in foo and foo.bar:
    print(foo.bar)
# -- OR ---
if foo.has_field_with_value("bar"):
    print(foo.bar)

Option 1 is nice because you know the key is always populated so you don't have to do any fiddly checks like Option 2. The down side with Option 1 is that you lose the ability to omit the field. This means when an instance is dumped to JSON/YAML, all of those optional fields will always be set to null explicitly, which can be a lot less pretty than omitting irrelevant fields when only one or a few fields may be relevant or when exactly one field is ever relevant. If you're sure the object will never be dumped to JSON/YAML, then Option 1 is probably a decent option. But, if you're sure the object will never be dumped to JSON/YAML, it may not need to be an ImplicitDict since that's half the purpose of ImplicitDict (the other half being parsing from JSON/YAML) -- instead, @dataclass is a good option when the object will never be serialized or deserialized.

Option 2 is what I use by default as that retains the ability to omit actually-optional fields. Moreover, I tend to favor the first if statement of Option 2 because doing that makes sure you can easily find usages of a particular field using a standard IDE (like IntelliJ). With the second if statement of Option 2, the field name is hidden in a magic string. I don't particularly like either solution in Option 2 for these reasons, but I don't have any better ideas :)

More general question: what would a good strategy for avoiding such bugs look like?

I think just knowing "if I'm accessing an Optional field of an ImplicitDict with no default value, I must check for its existence first" is pretty good. I'm open to better strategies, but that one seems sufficient to me. We can't relieve developers of all their burdens to know how to do things :) Ideally, reviewers could also help catch this issue by being on the lookout for it with newer contributors who aren't aware of all the practices yet, but I don't think we should expect reviewers to always catch things like this either. Ultimately, I don't think we can fully avoid bugs, so our best strategy is to make identification and resolution easy. I think we're doing that relatively well so far, and fixing bugs like this is normal and expected.

Making sure developers know everything they need to know to work on the codebase is always a challenge because there are potentially such a large range of backgrounds for new developers. It seems like it would likely be useful to update this documentation to include an explanation like the one in this comment.

  • Should we make sure that any Optional field is initialized to None in all our classes?
  • Or should we defensively check for the existence of any field that might not be set?

I'd say probably no and no. No to the first because it removes the ability to omit fields as mentioned above. No to the second because checking is awkward even when we need to, and I think we should be able to rely on mandatoriness in order to avoid this awkwardness in many situations. But I'm open to suggestions and discussion :)

@BenjaminPelletier BenjaminPelletier merged commit ad41f13 into interuss:main Nov 3, 2023
8 checks passed
@BenjaminPelletier BenjaminPelletier deleted the fix-participant_id-access branch November 3, 2023 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unattributed queries result in execution error
3 participants