Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds ISL model #251

Merged
merged 1 commit into from
Apr 22, 2023
Merged

Adds ISL model #251

merged 1 commit into from
Apr 22, 2023

Conversation

popematt
Copy link
Contributor

Issue #, if available:

#250

Also, partial work on #109 in that this PR has new range implementations that will eventually replace the existing range implementations.

Description of changes:

Adds a model for Ion Schema. Currently the model is opt-in only—gated by the @ExperimentalIonSchemaModel annotation—and subject to breaking changes.

A few questions/comments for reviewers:

  • I have decided to use the term "Type Argument" instead of "type reference" to refer to the thing that you provide to a constraint (such as element, all_of, etc.). This is a departure from the spec, but I believe it will eliminate the confusion between a "reference" and a so-called "type reference" (that could be a "reference", an inline import, or an inline type). I plan to update the spec to use the new terminology.
  • A large chunk of code from the original Regex constraint impl is moved to a separate regex.kt utility file so that the regex validation code can be shared (for now) between the model and the existing constraint impl.
  • Marking classes as data classes means that we can't hide implementation details as well. data classes always indirectly expose the primary constructor through the auto-generated copy() implementation, and data classes always have destructuring componentN() functions. I would like to eventually make all of the constructors internal and provide a version-safe builder for constructing the ISL model. The benefit of data is that we also get autogenerated equals() and hashCode() functions. Should we not use data classes so as not to leak implementation through the copy, etc. functions?
  • Right now the constraint classes are nested in the Constraint interface. Thoughts? Should they be moved to a ...ionschema.model.constraints packages?
  • I plan to have an internal sealed interface SealedConstraint (once we upgrade the language level). Is it worth making SealedConstraints public? If it is public, it would have to be gated by an opt-in annotation because every time we add a new constraint, it would be a breaking change. (See "TODO" comment in Constraint.kt for more discussion.)

Related PRs in ion-schema, ion-schema-tests, ion-schema-schemas:

N/A

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov
Copy link

codecov bot commented Apr 17, 2023

Codecov Report

Patch coverage: 60.28% and project coverage change: -1.91 ⚠️

Comparison is base (a69f84b) 85.47% compared to head (ce3deb1) 83.56%.

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #251      +/-   ##
============================================
- Coverage     85.47%   83.56%   -1.91%     
- Complexity      686      704      +18     
============================================
  Files            78       93      +15     
  Lines          2340     2531     +191     
  Branches        463      509      +46     
============================================
+ Hits           2000     2115     +115     
- Misses          225      299      +74     
- Partials        115      117       +2     
Impacted Files Coverage Δ
...in/kotlin/com/amazon/ionschema/model/Constraint.kt 0.00% <0.00%> (ø)
.../kotlin/com/amazon/ionschema/model/HeaderImport.kt 0.00% <0.00%> (ø)
.../kotlin/com/amazon/ionschema/model/TypeArgument.kt 0.00% <0.00%> (ø)
...in/kotlin/com/amazon/ionschema/model/ValidValue.kt 0.00% <0.00%> (ø)
...n/ionschema/model/VariablyOccurringTypeArgument.kt 0.00% <0.00%> (ø)
...otlin/com/amazon/ionschema/model/TypeDefinition.kt 33.33% <33.33%> (ø)
...kotlin/com/amazon/ionschema/internal/util/regex.kt 58.24% <58.24%> (ø)
...lin/com/amazon/ionschema/model/DiscreteIntRange.kt 60.00% <60.00%> (ø)
...otlin/com/amazon/ionschema/model/SchemaDocument.kt 89.47% <89.47%> (ø)
.../com/amazon/ionschema/internal/constraint/Regex.kt 96.00% <100.00%> (+28.45%) ⬆️
... and 6 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@desaikd desaikd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Marking classes as data classes means that we can't hide implementation details as well. data classes always indirectly expose the primary constructor through the auto-generated copy() implementation, and data classes always have destructuring componentN() functions. I would like to eventually make all of the constructors internal and provide a version-safe builder for constructing the ISL model. The benefit of data is that we also get autogenerated equals() and hashCode() functions. Should we not use data classes so as not to leak implementation through the copy, etc. functions?

Sorry I am not that much familiar with Kotlin standard way for implementing a model but I think we should not expose any implementation detail. You mentioned making constructors internal for version-safe builders, if that showcases the right implementation detail based on version or if that hides the implementation detail then we should use that.

  • Right now the constraint classes are nested in the Constraint interface. Thoughts? Should they be moved to a ...ionschema.model.constraints packages?

I think it should be okay to have them in the Constraint interface since we are just defining the model here which doesn't do any validation.

  • I plan to have an internal sealed interface SealedConstraint (once we upgrade the language level). Is it worth making SealedConstraints public? If it is public, it would have to be gated by an opt-in annotation because every time we add a new constraint, it would be a breaking change. (See "TODO" comment in Constraint.kt for more discussion.)

Did you mean we can provide an opt-in for whether to consider it as breaking change or not? I think it depends on which ISL version is used by the user to model schema. If they are using an older version they don't need the build failures. So if we are trying to provide something like that with the opt-in then we can make it public.

package com.amazon.ionschema.model

/**
* A TypeArgument represents (defines or references) a Type, allowing the Type to be used as an argument for a constraint.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a grammar explaining what are these TypeArguments or maybe add a link to type reference section of the specification?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will, once the link exists.

* Convenience alias for open content in schema headers, footers, and type definitions.
* Called 'OtherFields' to make it clear that this is for struct fields.
*/
typealias OtherFields = List<Pair<String, IonValue>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Question) Is this referring to open contents as per ISL 1.0? Maybe we should specify what ISL version will use this and maybe an example of what OtherFields would look like for a schema?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is independent of the schema version. It's just a container to hold extra struct fields.

* See relevant section in [ISL 1.0 spec](https://amazon-ion.github.io/ion-schema/docs/isl-1-0/spec#fields) and
* [ISL 2.0 spec](https://amazon-ion.github.io/ion-schema/docs/isl-2-0/spec#fields).
*/
data class Fields internal constructor(val types: Map<String, VariablyOccurringTypeArgument>, val closed: Boolean) : SealedConstraint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Suggestion) Since we already have constraint construction methods such as AnnotationsV* maybe we should provide different methods where fields,element as well? This would be less confusing and would be similar to how we handle other constraints like Annotations. Alternatively, we could also provide something like DistinctFields where it will be used for distinct fields and for non distinct fields Fields can be used.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we could either model v1 in it's own interface or not at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to have only one model that is used for all versions, and I will create version-specific builders so that they can only be constructed in valid ways.

The reason there are two "annotations" constraints is because the old annotations has some weird behavior (see #150). The differences between the two are so great, and it didn't seem worth it to try to figure out all of the weird edge cases, so I've just tried to avoid doing anything with the old annotations implementation so that I don't break anything.

None of the other ISL 1.0 constraints are as bad as that, so I haven't given up on them in the same way.

If you feel particularly strongly about this, I can try to emulate the ISL 1.0 annotations using the ISL 2.0 annotations constraint in order to unify the model.

Copy link

@rmarrowstone rmarrowstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff! I don't see any major blockers, would like to understand if we can descope modeling the ISL 1 stuff.


/**
* This internal only interface allows us to use an exhaustive `when` within this library.
* It will become `sealed` when we update to kotlin language version = 1.5 or higher.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.5 was released almost two years ago, which is a blink in java time but pretty significant for Kotlin. What is keeping us on 1.4?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just because we haven't determined whether an upgrade would require a new major version, and we haven't had a particularly strong impetus before now.

Comment on lines 16 to 17
* TODO—some library users might prefer for their builds to break when they are not handling a new constraint. Should
* we make this public with the caveat that it requires opt in, and is subject to breaking changes?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In looking at what you have here my sense is that not many if at all.. and that making it sealed would prevent folks from implementing their own... which I think we want to allow. So I would not make it sealed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do not want to allow people to implement their own (for now). There was discussion about this as part of the ISL 2.0 spec work. We declined to introduce custom constraints (for now) because there was no concrete use case with which to justify the cost in complexity.

However, one thing we did decide is that if we do add user-defined constraints, we need to have support for it in the Ion Schema Specification, or else we'll start running into problems with schemas not being portable (i.e. different behavior for the same inputs on differently configured systems). For example, a custom constraint using the "is_prime" field name—if Ion Schema Kotlin on one system has that constraint installed, and on another system, Ion Schema Kotlin does not have that constraint installed, then the same schema and the same data could have different validation results on two different machines.

* See relevant section in [ISL 1.0 spec](https://amazon-ion.github.io/ion-schema/docs/isl-1-0/spec#annotations).
*/
@Deprecated("The annotations constraint in Ion Schema 1.0 has poorly defined behavior.")
data class AnnotationsV1 internal constructor(val annotations: List<Annotation>, val closed: Boolean, val ordered: Boolean) : SealedConstraint {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it valuable to model the deprecated constraint here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All Ion Schema 2.0 implementations are required to be able to import/operate on ISL 1.0. I probably should not actually mark it as @Deprecated.

* See relevant section in [ISL 1.0 spec](https://amazon-ion.github.io/ion-schema/docs/isl-1-0/spec#fields) and
* [ISL 2.0 spec](https://amazon-ion.github.io/ion-schema/docs/isl-2-0/spec#fields).
*/
data class Fields internal constructor(val types: Map<String, VariablyOccurringTypeArgument>, val closed: Boolean) : SealedConstraint

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we could either model v1 in it's own interface or not at all?

* A `DiscreteRange` is allowed to be a _degenerate interval_ (i.e. `start == endInclusive`), but it may not be an
* empty interval (i.e. `start > endInclusive`).
*/
data class DiscreteRange<T : Comparable<T>>(val start: T?, val endInclusive: T?) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why model this as it's own class? It seems like it would be simpler to just merge it with the above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discrete Ranges don't require us to store whether the boundary value is exclusive because we can "translate" exclusive to +/-1 of the boundary value.

More importantly, however, is the fact that the logic for determining emptiness is different. With a discrete range, it is true that ∀n∈Z, ¬∃x∈Z, n<x & x<n+1, but for a range over continuous values, it would not be empty. If n and x are both in the set of all rational numbers rather than the set of all integers, then we can trivially prove that there does exist at least one x, such as x = n + 0.5.

Finally, my admittedly subjective opinion is that it helps the reader/user understand how the ranges are to be used by keeping them distinct.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But whether a range is continuous or discrete depends on the type of arguments, right? I don't see any constraints that model that... so it looks like I can define a Discrete range of a Continuous (but Comparable) type and vice versa.

Further I don't see any definition in the spec about emptiness or degenerate-ness so I don't see where the requirement for the validation (and the need to split it) even comes from.

What I do see is what appears to be duplicated code. 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test suite requires that ranges (at least for countable things like number of codepoints) are non-empty, and I believe that requirement has been part of the reference implementation since the initial release. https://github.com/popematt/ion-schema-tests/blob/master/ion_schema_2_0/constraints/codepoint_length.isl#L79

At this point, the non-empty range requirement is de facto part of the specification, and the specification document should be updated to reflect that.

Degenerate ranges are fine. I could remove that bit of the sentence if it's too confusing to have it, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have found a clean way to avoid the duplicated code, though, so I will remove the DiscreteRange<T> implementation.

*/
None,
/**
* Ion Schema 1.0 nullability. See https://amazon-ion.github.io/ion-schema/docs/isl-1-0/spec#core-types

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty inclined to make modeling ISL 1 a non-goal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's an option because of the requirement for ISL 2.0 implementations to also support ISL 1.0

Comment on lines +8 to +10
val type: List<String> = emptyList(),
val header: List<String> = emptyList(),
val footer: List<String> = emptyList(),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrrm. The problem is that Kotlin's List is read-only but not technically immutable. IIRC, like internal this is something that can be circumvented with java code. Not sure how protective we want to get, just making sure you're aware of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am aware. If we're concerned about this, I could bring in the kotlinx.collections.immutable library.

Copy link

@rmarrowstone rmarrowstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only comment on the data class question is: it looks like (with few exceptions) all of the fields in the model are public vals. I don't really see what there is to hide! If we need to add indirection later we can always redefine them as a property.

@popematt
Copy link
Contributor Author

My only comment on the data class question is: it looks like (with few exceptions) all of the fields in the model are public vals. I don't really see what there is to hide! If we need to add indirection later we can always redefine them as a property.

I agree. I don't think there is anything to hide.

* A `DiscreteRange` is allowed to be a _degenerate interval_ (i.e. `start == endInclusive`), but it may not be an
* empty interval (i.e. `start > endInclusive`).
*/
class DiscreteIntRange private constructor(private val delegate: ContinuousRange<Int>) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I was sort of thinking that the only type we really needed this for was Int...

Rename file to match class?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is sort of a shame we need to delegate here... I get why Kotlin disallows extension of data classes. Personally I would consider a Range sealed base class that each specialization extended. 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename file to match class?

Good catch!

interface Bounded<T> { val value: T }
data class Closed<T : Comparable<T>>(override val value: T) : Limit<T>(), Bounded<T>
data class Open<T : Comparable<T>>(override val value: T) : Limit<T>(), Bounded<T>
class Unbounded<T : Comparable<T>> : Limit<T>() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be object and avoid the overriding of equals and hashcode? or does that not play well with java?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it doesn't play well with generics. object Unbounded: Limit<Nothing>() was giving me trouble getting the variances right, but you are right that objects are not as nice in Java.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dang. Oh well.

@popematt popematt merged commit 519fea2 into amazon-ion:master Apr 22, 2023
@popematt popematt deleted the model-2023 branch May 5, 2023 22:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants