Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIP-67 - Improve strictEquality feature for better compatibility with existing code bases #97

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

mberndt123
Copy link

Hi there,

I'd like to use the strictEquality feature for the improved type safety it provides, but currently find it too inconvenient to use due to an unfortunate interaction with pattern matching. This SIP is my attempt to fix that.
There have been no comments in the Pre-SIP thread for the past two weeks, and it's a very small (though impactful) change to the language, so I felt it was time to submit it.

Best regards
Matthias

@kyouko-taiga kyouko-taiga changed the title improve strictEquality feature for better compatibility with existing code bases SIP-67 - Improve strictEquality feature for better compatibility with existing code bases Nov 15, 2024
@soronpo
Copy link
Contributor

soronpo commented Nov 25, 2024

Thank you for the proposal. Here is the feedback from the SIP Committee:

  • We would like to see a different proposal that special-cases enumerations in both pattern matching and equality operations so that enumerations are considered to have a CanEqual derivation and the compiler would pick up the default equality if no explicit derives CanEqual is given. This will take care of old libraries that do not have CanEqual derivation.
  • In general, we do not think its good to have a different behavior between == and pattern match equality. Do you have such a use-case?

@mberndt123
Copy link
Author

mberndt123 commented Nov 25, 2024

Hey @soronpo,

I'm sorry, I'm afraid I can't do that.

Please consider the following enum:

enum Foo:
  case Bar
  case Baz(f: Int => Int)

I'm going to contend that it shouldn't ever be allowed to compare two Foo values with == because that would require us to determine whether two functions are the same, which isn't possible in a useful way.
Scala agrees with me on this already, because this type is isomorphic to Option[Int => Int], and that type can't be compared with ==, despite the fact that a CanEqual instance for Option is available by default:

scala> import scala.language.strictEquality

scala> Option.empty[Int => Int] == Option.empty[Int => Int]
-- [E172] Type Error: ----------------------------------------------------------
1 |Option.empty[Int => Int] == Option.empty[Int => Int]
  |^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |Values of types Option[Int => Int] and Option[Int => Int] cannot be compared with == or !=.

This is not a bug, this is a feature.

Now you might say, OK, then maybe we can make the CanEqual instance available only when all the fields in the enum have a CanEqual instance? I. e. make CanEqual[Foo, Foo] available only when CanEqual[Int => Int, Int => Int] is available? We can simulate this:

scala> enum Foo:
     |   case Bar
     |   case Baz(f: Int => Int)
     | object Foo:
     |   given (using CanEqual[Int => Int, Int => Int]): CanEqual[Foo, Foo] =
     |     CanEqual.derived
     | 
// defined class Foo
// defined object Foo

Alas, while this does prevent nonsensical comparisons with ==, it also fails to achieve the whole point of this proposal, which is to allow pattern matching:

scala> def bla(f: Foo) match
     |   case Foo.Bar => 0
     |   case Foo.Baz(f) => f(0)
     | 
-- [E172] Type Error: ----------------------------------------------------------
2 |  case Foo.Bar => 0
  |       ^^^^^^^
  |Values of types Foo and Foo cannot be compared with == or !=.

(insert sad trombone sound here)

Now you might ask: How come it works just right for Option? You cannot compare two Option[Int => Int] objects with ==, but it is possible to perform pattern matching on an Option[Int => Int]:

scala> Option.empty[Int => Int] match
     |   case None => 0
     |   case Some(f) => f(0)
     | 
val res0: Int = 0

That seems like the sweet spot! The reason this works is that None is of type Option[Nothing], hence this pattern match doesn't require a CanEqual[Option[Int => Int], Option[Int => Int]] but a CanEqual[Option[Int => Int], Option[Nothing]], which is available.
However this doesn't carry over to enum Foo: it doesn't have any type parameters, hence it's not possible to differentiate the empty Bar case from the non-empty Baz one on the type level.

The conclusion from all of this is that it's impossible to make this work correctly by providing a magic CanEqual instance. enum Foo is the proof for this: you either provide the instance, which makes pattern matching work but allows for == comparisons that don't make sense, or you don't, in which case you can't do pattern matching. Whatever the solution to this problem is, this is not it.

@soronpo
Copy link
Contributor

soronpo commented Nov 25, 2024

Please consider the following enum:

enum Foo:
  case Bar
  case Baz(f: Int => Int)

I would not consider this to be a valid use of enum. Have you seen this type of code in the wild?

@jducoeur
Copy link

Have you seen this type of code in the wild?

Not sure offhand, but it seems like it would be unsurprising when constructing a DSL interpreter. (Where the enum is a kind of expression, and the leaf is an instantiation of one expression type.)

The interpreter for my own QL language is still on Scala 2, but I could see myself trying to build it along those lines if it was Scala 3, so I don't think that's just a strawman.

@soronpo
Copy link
Contributor

soronpo commented Nov 25, 2024

Not sure offhand, but it seems like it would be unsurprising when constructing a DSL interpreter.

Still I'm not sure enum is the correct way to do it in Scala 3. It's not meant as a replacement for all kinds of case class hierarchies.

@eugengarkusha
Copy link

eugengarkusha commented Nov 25, 2024

I would not consider this to be a valid use of enum. Have you seen this type of code in the wild?

I have seen ADTs with functions inside
(Hope its ok to consider enum as a nicer syntax for ADT)

@KristianAN
Copy link

I would not consider this to be a valid use of enum. Have you seen this type of code in the wild?

I have seen ADTs with functions inside (Hope its ok to consider enum as a nicer syntax for ADT)

From the Scala 3 book.

Algebraic Data Types (ADTs) can be created with the enum construct, so we’ll briefly review enumerations before looking at ADTs.

Personally I have almost exclusively written Scala 3 and would use enums for this kind of ADT (with a function). If that is not the correct way to do it, then the correct way must be elusive.

@jducoeur
Copy link

Yeah, agreed. My understanding of the enum feature was that it was more or less exactly to reify the pattern that the community has settled on for ADTs over the years. (With enumerations per se being sort of the degenerate case, but far from the whole story.)

Saying that only some ADTs count, and other reasonably well-formed ones don't seems kind of un-Scala to me. A key element of Scala is that functions are values; intuitively, I would expect them to work here like any other value type.

@SystemFw
Copy link

SystemFw commented Nov 26, 2024

I have to agree with @jducoeur , what's the actual argument against functions in enum?
enum already goes quite some way into trying to capture some of the more complex ADT patterns that have little to do with enums, for example allowing extends to encode GADTs, having a value in there that happens to be a function seems like a lot more "normal" in comparison.

The GADT mention is not accidental, the main use case for that is encoding commands:

enum Cmd[A] { 
  case Read() extends Cmd[String]
  case Write extends Cmd[Unit]
}

and command-like GADTs will often have functions in them to allow sequencing:

enum Cmd[A] { 
  case Read() extends Cmd[String]
  case Write extends Cmd[Unit]
  case FlatMap[O, A](fa: Cmd[O], f: O => Cmd[A]) extends Cmd[A]
}

I feel like the argument that enum isn't meant for this would be a lot more stronger if extends wasn't allowed at all

@soronpo
Copy link
Contributor

soronpo commented Nov 26, 2024

Let me clarify. If you want function arguments that should not be part of the pattern match, then these should come as a second argument block of the case, IMO.

enum Cmd[A]:
  case Read() extends Cmd[String]
  case Write extends Cmd[Unit]
  case FlatMap[O, A](fa: Cmd[O])(val f: O => Cmd[A]) extends Cmd[A]

If you want them to be part of the pattern match, you put them in the first block where they have same equality treatment like all the other arguments.

@JD557
Copy link

JD557 commented Nov 26, 2024

I admit that I never played with strict equality, so this might not make much sense, but:

Now you might say, OK, then maybe we can make the CanEqual instance available only when all the fields in the enum have a CanEqual instance? I. e. make CanEqual[Foo, Foo] available only when CanEqual[Int => Int, Int => Int] is available? We can simulate this:

Do we need a CanEqual[Foo, Foo] for the pattern match? Wouldn't a CanEqual[Foo, Foo.Bar.type] suffice?

Now, from what I can tell based on some quick tests, the compiler really wants a CanEqual[Foo, Foo] for pattern matching, but maybe this could be changed?

@mberndt123
Copy link
Author

mberndt123 commented Nov 26, 2024

Hi @JD557,

Do we need a CanEqual[Foo, Foo] for the pattern match? Wouldn't a CanEqual[Foo, Foo.Bar.type] suffice?

Now, from what I can tell based on some quick tests, the compiler really wants a CanEqual[Foo, Foo] for pattern matching, but maybe this could be changed?

Yes, I had this in mind and I was going to propose it – you beat me to it. It can't be done with just a magic CanEqual instance, but if we slightly tweak the behaviour pattern matching under strictEquality I think it should work.

I. e.

def bla(foo: Foo) =
  foo match
    case Foo.Bar => 0 // requires CanEqual[Foo.Bar.type, Foo]
    case Foo.Baz(f) => f(0) // uses unapply

Then all we would need is a magic CanEqual instance that spawns the required CanEqual. That seems like a workable approach.

@mberndt123
Copy link
Author

mberndt123 commented Nov 26, 2024

Let me clarify. If you want function arguments that should not be part of the pattern match, then these should come as a second argument block of the case, IMO.

[…]

If you want them to be part of the pattern match, you put them in the first block where they have same equality treatment like all the other arguments.

First of all, I'd like to address the question of "in the wild" examples of functions within ADTs. This is definitely something that people do, e. g.
https://github.com/typelevel/cats/blob/1cc04eca9f2bc934c869a7c5054b15f6702866fb/free/src/main/scala/cats/free/Free.scala#L219
https://github.com/typelevel/cats-effect/blob/eb918fa59f85543278eae3506fda84ccea68ad7c/core/shared/src/main/scala/cats/effect/IO.scala#L2235

I think this is perfectly good code. It should be possible to use enum for these types, and it is today – strictEquality shouldn't break that. I also see no reason to compel people to rewrite this to a style where the function goes in the second parameter list. In fact, I think it is worse that way, because == is supposed to tell you whether two things are the same – but if you just ignore the function, you're simply going to get a broken equals method that says that two things are the same even though they aren't. This is something that we shouldn't encourage. The better option is to allow the subset of equality comparisons that people are likely to need and that we know we can perform correctly – i. e. comparisons to the singleton cases – and prevent all other equality tests at compile time.

@mberndt123
Copy link
Author

I've pushed a new revision that is based on a minor change to enum pattern matching and a magic CanEqual instance. @soronpo please let me know what you think

@soronpo
Copy link
Contributor

soronpo commented Dec 20, 2024

@mberndt123 Hey Mathias, thank you again for the amendments. We discussed the SIP today and we strive for a more general approach: A case class or enum case will be considered to have CanEqual derivation when all arguments (recursively) have CanEqual derivation (singletons therefore have CanEqual). What do you think?

@mberndt123
Copy link
Author

mberndt123 commented Jan 3, 2025

Hi Oron,

I have considered the committee's proposal and would like to present my conclusions.

  1. When applied to the motivating Nat example in the SIP, the committee's proposal would make CanEqual[Nat.Zero.type, Nat.Zero.type] available (it would not make CanEqual[Nat.Succ, Nat.Succ] available since its only field is of type Nat and a CanEqual[Nat, Nat] is not available). However, def + from that example requires a CanEqual[Nat, Nat] instance to compile. Therefore, the committee's proposal does not solve the problem which this SIP is intended to solve. If Nat had been specified as a sealed trait hierarchy, the pattern match in def + would require a CanEqual[Nat.Zero.type, Nat], which is also not available, so it doesn't work for this case either.
  2. The committee's proposal also does not make the == operator work for enum types, because comparing these with == again requires a CanEqual instance for the enum type, not for the individual cases. The only thing it does make work for enums is contrived code like Some[Nat.Zero.type](Nat.Zero) == Some[Nat.Zero.type](Nat.Zero).
  3. A modified version of the committee's proposal that makes CanEqual available for enum types iff all of their cases' fields' types have a CanEqual will also not fix pattern matching as it cannot work with the enum Foo example from above.

What the committee's proposal does do is make == work for case classes. However, I have some observations about this too:

  1. For deeply nested data structures, the recursive search for CanEqual instances would lead to an increase in compile times. This can not happen in my own proposal as it only provides a magic CanEqual instance for case objects and singleton enum cases – the fields never enter the picture. That is not a bug, it is a feature.
  2. It is not immediately obvious if and how it would work for recursive definitions such as case class Nat(n: Option[Nat])
  3. I am also concerned about maintainability. Imagine you use a library that defines some case class. You use == all over the place to compare it, and it works due to the magic CanEqual instance for case classes. One day the library author decides to change the case class. Maybe he changes one of the fields' types to an opaque type for better type safety. Now there is no longer a CanEqual available for that field and your code no longer compiles. And if this happens in some complex, deeply nested data structure, figuring out where that breakage is coming from is going to be a nightmare.
  4. Making the "magic" CanEqual instance available only when CanEqual is available for the fields' types makes this a safer option than simply adding derives CanEqual to the case class, because it won't make CanEqual available for types that shouldn't have it, whereas derives CanEqual will. This encourages developers to never use derives CanEqual, making all of the above problems even worse.

For all of these reasons, I think "fully automatic" typeclass derivation (i. e. typeclass derivation that works without an explicit derives Something clause) is a bad idea in general, and the language/standard library shouldn't set a bad example in that regard.

To sum up: the committee's proposal doesn't solve the problem that the SIP is designed to solve. It doesn't make the == operator work for enum types either. It does make == work for case classes, but with many downsides that should be avoided. I therefore do not consider the committee's proposal viable. I had already pointed out in my first comment that “it's impossible to make this work correctly by providing a magic CanEqual instance”. This continues to be true and is the reason why my revised proposal also includes language regarding the behaviour of pattern matching, which is required to enable a solution based on a CanEqual instance.

Thinking about this issue once more made me realize that my own revised proposal has a safety problem. It mentions a "union or intersection of one or more sealed or enum types". This was meant to allow pattern matching against unions of different sealed or enum types without nesting matches, but it can also be used to perform equality tests that don't make sense and should be prevented. Since the revised proposal does not have any other upsides over the original proposal and also failed at its other goal of being acceptable to the committee, I've decided to stop pretending and reverted the SIP proposal to its original state.

ADT pattern matching and equality tests are separate things, and the fact that Scala has conflated them is a historical mistake. The best way to move forward now is to untangle them as far as possible: simply don't require a CanEqual instance for those cases that correspond to traditional ADT pattern matching, and require it for those cases where it makes sense, like matching a Vector against Nil or an Int against a named constant.

Regarding CanEqual for case classes and enums, I think that improvements are desirable. derives CanEqual can be added to any type, and there are no guardrails to prevent it from compiling when one of the fields doesn't have a CanEqual instance. Safer alternatives should be provided. However they are neither required nor useful to fix pattern matching, which is the goal of this SIP, and hence they should be explored in a separate SIP, which I will be happy to author once this one is accepted.

Best Regards
Matthias Berndt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants