title | document | date | audience | author | toc | toc-depth | ||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Pattern Matching |
D1371R4 |
2021-07-19 |
Evolution |
|
true |
4 |
\pagebreak
- R4
- Drafting. expr
match
pattern,let
, etc.
- Drafting. expr
- R3
- Updated [Design Overview] and other sections:
inspect
is always an expression. - Clarified that extractor pattern samples are not proposed for standardisation.
- Forbid
break
insideinspect
expression. - Removed [Binding Pattern] and simplified [Case Pattern].
- Updated [Design Overview] and other sections:
- R2
- Modified [Dereference Pattern] to
(*!) pattern
and(*?) pattern
- Modified [Extractor Pattern] to
(extractor!) pattern
and(extractor?) pattern
. - Added reasons for the choice of [
let
rather thanauto
]. - Allowed using [Statements in
inspect
expression].
- Modified [Dereference Pattern] to
- R1
- Modified [Wildcard Pattern] to use
__
(double underscore). - Added new patterns [Case Pattern] and [Binding Pattern].
- Removed
^
from [Expression Pattern]. - Modified [Dereference Pattern] to
*!
and*?
. - Added [Structured Binding Pattern] usage in variable declaration.
- Modified [Wildcard Pattern] to use
- R0
- Merged [@P1260R0] and [@P1308R0]
This iteration of the paper presents an approach for pattern matching using
match
and let
. It present a match
expression, expr match
pattern
expression which simply answers the question "Does expr match pattern?"
and yields a boolean result.
We extend this syntax for multiple patterns and corresponding statements:
expr match {
pattern1 => statement1
pattern2 => statement2
...
}
let
is used to indicate that an identifier is a new name rather than an
existing name.
expr match {
x => ... // match against existing `x`
let x => ... // introduce new x.
}
This is also in response and collaboration with Herb Sutter and his [@P2392R0].
One of the most contentious and difficult design problem to solve in introducing
pattern matching has been the issue of dealing with identifiers. Specifically,
whether an identifier in a pattern introduces a new name or refers to an
existing name. This paper aims to address this concern by making names refer to
existing names by default, and use let
as the introducer for new names.
Another key goal that Herb Sutter aims for in [@P2392R0] is the ability for
pattern syntax to be used outside of inspect
. More specifically, the goal is
to be able to perform pattern matching easily in boolean-expecting contexts such
as if
and requires
. This paper aims to address this concern with
the expr match
pattern expression.
[@P1371R0] proposed id
to introduce a new name, and for ^id
to refer to
an existing name. [@P1371R1] removed the ^
due to feedback about it being
a Microsoft C++ extension for smart pointers, as well as its effect on code
performing simple enumeration matching.
enum Color { Red, Green, Blue };
inspect (e) {
^Red => // I'm just trying to
^Green => // match simple enums...
^Blue => // What's with these carets?
}
[@P1371R1] and [@P1371R2] kept id
introducing a new name, but replaced ^id
with case id
to refer to an existing name and let id
to explicitly introduce
a new name.
One of the motivations for this was to make simple code such as enumeration matching familiar again:
enum Color { Red, Green, Blue };
inspect (e) {
case Red => // Huh, just looks like `switch` now.
case Green => // ...
case Blue => // ...
}
The case
and let
applied recursively to identifiers to allow patterns
such as case [a, b]
to match against existing a
and b
, and
let [x, y]
to introduce new names x
and y
.
This recursive behavior received positive feedback from EWG in Cologne 2019:
"We support the authors' direction of let
and case
modes applying to
subpatterns.": SF: 7, F: 7, N: 5, A: 4, SA: 0
The problem with this approach appeared to be due to the overriding behavior
of the case
and let
. For example, constructs such as case [a, b, let x]
were allowed to match against existing a
and b
for the first two elements
and bind x
to the third.
In more complex examples however, it becomes more difficult to parse through which names are new and which refer to existing.
For example in let [a, case [let b, c, d], [e]]
,
a
, b
, e
are new names, c
, d
refer to an existing names.
In response, [@P1371R3] chose to keep case
and drop let
. Because the
top-level already had let
behavior in that it introduced new names, we
already effectively had "recursive let
". We really only needed to provide
a way to specify an existing name. This led to case
becoming non-recursive.
At this point, there were several push backs:
-
"We shouldn't to bifurcate expressions like this."
Some expressions didn't need
case
(e.g.0
), but some did (e.g.id
). Ifcase expr
is pattern matching syntax for expressions, that would at least be consistent.This paper aims to address this concern by not requiring
case
for expressions at all. -
"Declarations of new names should have an introducer like most other places."
The comment is in regard to code like this:
inspect (e) { x => ... }
Some people found it surprising that that would introduce a new name, as most identifiers are introduced with an introducer. Notable exceptions being lambda captures and structured bindings. Even then,
[x, y]() {}
doesn't introducex
andy
out of thin-air, it captures existingx
andy
at the same time. Structured bindings at least has theauto
in front, like:auto [x, y]
which doesn't apply to thex
andy
at all, but does indicate that we're in a declaration context.This paper aims to address this concern by introducing a recursive
let
introducer for new names.
::: cmptable
switch (x) {
case 0: std::cout << "got zero"; break;
case 1: std::cout << "got one"; break;
default: std::cout << "don't care";
}
inspect (x) {
0 => { std::cout << "got zero"; }
1 => { std::cout << "got one"; }
__ => { std::cout << "don't care"; }
};
:::
::: cmptable
x match {
0 => { std::cout << "got zero"; }
1 => { std::cout << "got one"; }
_ => { std::cout << "don't care"; }
};
inspect (x) {
is 0 => { std::cout << "got zero"; }
is 1 => { std::cout << "got one"; }
is _ => { std::cout << "don't care"; }
};
:::
::: cmptable
if (s == "foo") {
std::cout << "got foo";
} else if (s == "bar") {
std::cout << "got bar";
} else {
std::cout << "don't care";
}
inspect (s) {
"foo" => { std::cout << "got foo"; }
"bar" => { std::cout << "got bar"; }
__ => { std::cout << "don't care"; }
};
:::
::: cmptable
c match {
"foo" => { std::cout << "got foo"; }
"bar" => { std::cout << "got bar"; }
_ => { std::cout << "don't care"; }
};
inspect (s) {
is "foo" => { std::cout << "got foo"; }
is "bar" => { std::cout << "got bar"; }
is _ => { std::cout << "don't care"; }
};
:::
::: cmptable
auto&& [x, y] = p;
if (x == 0 && y == 0) {
std::cout << "on origin";
} else if (x == 0) {
std::cout << "on y-axis";
} else if (y == 0) {
std::cout << "on x-axis";
} else {
std::cout << x << ',' << y;
}
inspect (p) {
[0, 0] => {
std::cout << "on origin";
}
[0, y] => {
std::cout << "on y-axis at " << y;
}
[x, 0] => {
std::cout << "on x-axis at " << x;
}
[x, y] => {
std::cout << x << ',' << y;
}
};
:::
::: cmptable
p match {
[0, 0] => {
std::cout << "on origin";
}
[0, let y] => {
std::cout << "on y-axis at " << y;
}
[let x, 0] => {
std::cout << "on x-axis at " << x;
}
let [x, y] => {
std::cout << x << ',' << y;
}
};
inspect (p) {
is [0, 0] => {
std::cout << "on origin";
}
[_, y] is [0, _] => {
std::cout << "on y-axis at " << y;
}
[x, _] is [_, 0] => {
std::cout << "on x-axis at " << x;
}
[x, y] is _ => {
std::cout << x << ',' << y;
}
};
:::
\pagebreak
::: cmptable
struct visitor {
void operator()(int i) const {
os << "got int: " << i;
}
void operator()(float f) const {
os << "got float: " << f;
}
std::ostream& os;
};
std::visit(visitor{strm}, v);
inspect (v) {
<int> i => {
strm << "got int: " << i;
}
<float> f => {
strm << "got float: " << f;
}
};
:::
::: cmptable
v match {
int: let i => {
strm << "got int: " << i;
}
float: let f => {
strm << "got float: " << f;
}
};
inspect (v) {
i as int => {
strm << "got int: " << i;
}
f as float => {
strm << "got float: " << f;
}
};
:::
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape { int radius; };
struct Rectangle : Shape { int width, height; };
::: cmptable
virtual int Shape::get_area() const = 0;
int Circle::get_area() const override {
return 3.14 * radius * radius;
}
int Rectangle::get_area() const override {
return width * height;
}
int get_area(const Shape& shape) {
return inspect (shape) {
<Circle> [r] => 3.14 * r * r;
<Rectangle> [w, h] => w * h;
};
}
:::
::: cmptable
int get_area(const Shape& shape) {
return shape match {
Circle: let [r] => 3.14 * r * r;
Rectangle: let [w, h] => w * h;
};
}
int get_area(const Shape& shape) {
return inspect (shape) {
[r] as Circle => 3.14 * r * r;
[w, h] as Rectangle => w * h;
};
}
:::
\pagebreak
struct Expr;
struct Neg {
std::shared_ptr<Expr> expr;
};
struct Add {
std::shared_ptr<Expr> lhs, rhs;
};
struct Mul {
std::shared_ptr<Expr> lhs, rhs;
};
struct Expr : std::variant<int, Neg, Add, Mul> {
using variant::variant;
};
namespace std {
template <>
struct variant_size<Expr> : variant_size<Expr::variant> {};
template <std::size_t I>
struct variant_alternative<I, Expr> : variant_alternative<I, Expr::variant> {};
}
::: cmptable
int eval(const Expr& expr) {
struct visitor {
int operator()(int i) const {
return i;
}
int operator()(const Neg& n) const {
return -eval(*n.expr);
}
int operator()(const Add& a) const {
return eval(*a.lhs) + eval(*a.rhs);
}
int operator()(const Mul& m) const {
// Optimize multiplication by 0.
if (int* i = std::get_if<int>(m.lhs.get()); i && *i == 0) {
return 0;
}
if (int* i = std::get_if<int>(m.rhs.get()); i && *i == 0) {
return 0;
}
return eval(*m.lhs) * eval(*m.rhs);
}
};
return std::visit(visitor{}, expr);
}
:::
::: cmptable
int eval(const Expr& expr) {
return inspect (expr) {
<int> i => i;
<Neg> [(*!) e] => -eval(e);
<Add> [(*!) l, (*!) r] => eval(l) + eval(r);
// Optimize multiplication by 0.
<Mul> [(*!) <int> 0, __] => 0;
<Mul> [__, (*!) <int> 0] => 0;
<Mul> [(*!) l, (*!) r] => eval(l) * eval(r);
};
}
:::
::: cmptable
int eval(const Expr& expr) {
return expr match {
int: let i => i;
Neg: [*: let e] => -eval(e);
Add: let [*: l, *: r] => eval(l) + eval(r);
// Optimize multiplication by 0.
Mul: ([*: int: 0, _] || [_, *: int: 0]) => 0;
Mul: let [*: l, *: r] => eval(l) * eval(r);
_ => 0;
};
}
:::
::: cmptable
int eval(const Expr& expr) {
return inspect (expr) {
i as int => i;
[*e] as Neg => -eval(e);
[*l, *r] as Add => eval(l) + eval(r);
[*l, *r] as Mul {
// Optimize multiplication by 0.
if (l as int == 0 || r as int == 0) => 0;
is _ => eval(l) * eval(r);
}
is _ => 0; // default case is required
};
}
:::
\pagebreak
::: cmptable
auto const& [topLeft, unused] = getBoundaryRectangle();
auto const& [topBoundary, leftBoundary] = topLeft;
auto const& [[topBoundary, leftBoundary], unused] = getBoundaryRectangle();
:::
::: cmptable
if (auto p = f(); p.first == 0) { // C++20
auto&& y = p.second;
// ...
}
if (auto p match [0, let y] = f()) { // R4
// ...
}
if (auto [_, y] is [0, _] = f()) { // P2392
// ...
}
:::
\pagebreak
| expression
match
pattern guardopt
| expression
match
trailing-return-typeopt{
| pattern guardopt=>
statement | pattern guardopt=>
!
opt{
statement-seq}
| ... |}
| guard: |
if
expression
match
is an expression in all contexts. Depending on the enclosed statements it
may either yield a void
result or a value, the type of which will be statically
deduced from the statements themselves or specified by a trailing return type. The
deduction is analogous to that performed when determining the return type of a
lambda expression. A pattern that passes control to a compound statement yields
a void
result. The return types of all patterns must match. If a trailing return
type is provided, all patterns must result in an expression returning a type that
is implicitly convertible to the trailing return type.
If !
prefix is used before compound statement - the statement would not contribute
to return type deduction for match
expression. Such a statement is not expected to
yield a value and should stop the execution either by returning from the enclosing function,
throwing an exception or terminating the program. This allows users to express desired
no-match behaviour or to act upon broken invariant, without affecting return type of the
whole of match
expression. If execution reaches end of the compound statement
std::terminate
is called.
When match
is executed, the expression to the left is evaluated and matched
in order (first match semantics) against each pattern. If a pattern successfully
matches the value of the condition and the boolean expression in the guard
evaluates to true
(or if there is no guard at all), then the value of the
resulting expression is yielded or control is passed to the compound statement,
depending on whether the inspect yields a value. If the guard
expression evaluates to false
, control flows to the subsequent pattern.
If no pattern matches, none of the expressions or compound statements specified
are executed. In that case if the match
expression yields void
, control is
passed to the next statement. If the match
expression does not yield void
,
std::terminate
will be called.
\pagebreak
The wildcard pattern has the form:
|
_
and matches any value v
.
int v = /* ... */;
inspect (v) {
_ => { std::cout << "ignored"; }
// ^ wildcard pattern
};
Refer to the discussion for choice of _
in [Wildcard Pattern].
The expression pattern has the form:
| cast-expression
An expression pattern e
matches value v
- If
e(v)
is valid,e
matchesv
ife(v)
is contextually convertible tobool
and evaluates totrue
. - Otherwise, if
e == v
is valid,e
matchesv
ife == v
is contextually convertible tobool
and evaluates totrue
.
int v = /* ... */;
v match {
0 => { std::cout << "got zero"; }
1 => { std::cout << "got one"; }
// ^ expression pattern
};
enum Color { Red, Green, Blue };
Color color = /* ... */;
color match {
Red => // ...
Green => // ...
Blue => // ...
// ^^^^^ expression pattern
};
The let pattern has the form:
|
let
let-pattern
The let-pattern allows all patterns and identifier, without any expressions.
identifier within let-pattern match any value and are bindings to the corresponding component as defined by its parent pattern. It is in scope from the pattern guard if one exists, otherwise from the start of the statement following the pattern label.
int v = /* ... */;
inspect (v) {
let x: std::cout << x;
// ^^^^^ let pattern
}
static constexpr int zero = 0, one = 1;
std::pair<int, std::pair<int, int>> p = /* ... */
inspect (p) {
[zero, let [x, y]]: std::cout << "got zero" << ' ' << x << ' ' << y;
// ^ ^ identifier pattern
[one, let [x, y]]: std::cout << "got one" << ' ' << x << ' ' << y;
// ^^^^^^^^^^ binding pattern
}
The structured binding pattern has the following two forms:
|
[
_pattern_0,
_pattern_1,
...,
_pattern_N]
|[
_designator_0:
_pattern_0,
_designator_1:
_pattern_1,
...,
_designator_N: _pattern_N]
The first form matches value v
if each patterni matches the i^th^
component of v
. The components of v
are given by the structured binding
declaration: auto&& [__e
0, __e
1,
..., __e
N] = v;
where each
__e
i are unique exposition-only identifiers.
std::pair<int, int> p = /* ... */;
inspect (p) {
[0, 0] => { std::cout << "on origin"; }
[0, let y] => { std::cout << "on y-axis"; }
// ^^^^^ let pattern
[x, 0] => { std::cout << "on x-axis"; }
// ^ expression pattern
let [x, y] => { std::cout << x << ',' << y; }
// ^^^^^^ structured binding pattern within let pattern
};
The second form matches value v
if each patterni matches the direct
non-static data member of v
named identifier from each designatori.
If an identifier from any designatori does not refer to a direct
non-static data member of v
, the program is ill-formed.
struct Player { std::string name; int hitpoints; int coins; };
void get_hint(const Player& p) {
inspect (p) {
[.hitpoints: 1] => { std::cout << "You're almost destroyed. Give up!\n"; }
[.hitpoints: 10, .coins: 10] => { std::cout << "I need the hints from you!\n"; }
[.coins: 10] => { std::cout << "Get more hitpoints!\n"; }
[.hitpoints: 10] => { std::cout << "Get more ammo!\n"; }
[.name: let n] => {
if (n != "The Bruce Dickenson") {
std::cout << "Get more hitpoints and ammo!\n";
} else {
std::cout << "More cowbell!\n";
}
}
};
}
[Unlike designated initializers, the order of the designators need not be the same as the declaration order of the members of the class.]{.note}
// TODO: Think through this further...
The alternative pattern has the following forms:
|
auto :
pattern | concept:
pattern | type:
pattern | cast-expression:
pattern
*: let foo
e match {
std::integral => ...
int => ...
int: let i => ...
std::integral: let i => ...
};
s match {
ctre::match<"[a-z]+([0-9]+)"> => // ...
ctre::match<"[a-z]+([0-9]+)">: let [m, n] => ...
};
|
< auto >
pattern |<
concept>
pattern |<
type>
pattern |<
constant-expression>
pattern
Let v
be the value being matched and V
be std::remove_cvref_t<decltype(v)>
.\newline
Let Alt
be the entity inside the angle brackets.
Case 1: std::variant
-like
If std::variant_size_v<V>
is well-formed and evaluates to an integral,
the alternative pattern matches v
if Alt
is compatible with the current
index of v
and pattern matches the active alternative of v
.
Let I
be the current index of v
given by a member v.index()
or else
a non-member ADL-only index(v)
. The active alternative of v
is given by
std::variant_alternative_t<I, V>&
initialized by a member v.get<I>()
or
else a non-member ADL-only get<I>(v)
.
Alt
is compatible with I
if one of the following four cases is true:
Alt
isauto
Alt
is a concept andstd::variant_alternative_t<I, V>
satisfies the concept.Alt
is a type andstd::is_same_v<Alt, std::variant_alternative_t<I, V>>
istrue
Alt
is a constant-expression that can be used in aswitch
and is the same value asI
.
::: cmptable
std::visit(
[&](auto&& x) {
strm << "got auto: " << x;
},
v);
inspect (v) {
<auto> x => {
strm << "got auto: " << x;
}
};
std::visit([&](auto&& x) {
using X = std::remove_cvref_t<decltype(x)>;
if constexpr (C1<X>()) {
strm << "got C1: " << x;
} else if constexpr (C2<X>()) {
strm << "got C2: " << x;
}
}, v);
inspect (v) {
<C1> c1 => {
strm << "got C1: " << c1;
}
<C2> c2 => {
strm << "got C2: " << c2;
}
};
std::visit([&](auto&& x) {
using X = std::remove_cvref_t<decltype(x)>;
if constexpr (std::is_same_v<int, X>) {
strm << "got int: " << x;
} else if constexpr (
std::is_same_v<float, X>) {
strm << "got float: " << x;
}
}, v);
inspect (v) {
<int> i => {
strm << "got int: " << i;
}
<float> f => {
strm << "got float: " << f;
}
};
std::variant<int, int> v = /* ... */;
std::visit(
[&](int x) {
strm << "got int: " << x;
},
v);
std::variant<int, int> v = /* ... */;
inspect (v) {
<int> x => {
strm << "got int: " << x;
}
};
std::variant<int, int> v = /* ... */;
std::visit([&](auto&& x) {
switch (v.index()) {
case 0: {
strm << "got first: " << x; break;
}
case 1: {
strm << "got second: " << x; break;
}
}
}, v);
std::variant<int, int> v = /* ... */;
inspect (v) {
<0> x => {
strm << "got first: " << x;
}
<1> x => {
strm << "got second: " << x;
}
};
:::
\pagebreak
Case 2: std::any
-like
|
<
type>
pattern
If Alt
is a type and there exists a valid non-member ADL-only
any_cast<Alt>(&v)
, let p
be its result. The alternative pattern
matches if p
contextually converted to bool
evaluates to true
,
and pattern matches *p
.
::: cmptable
std::any a = 42;
if (int* i = any_cast<int>(&a)) {
std::cout << "got int: " << *i;
} else if (float* f = any_cast<float>(&a)) {
std::cout << "got float: " << *f;
}
std::any a = 42;
inspect (a) {
<int> i => {
std::cout << "got int: " << i;
}
<float> f => {
std::cout << "got float: " << f;
}
};
:::
Case 3: Polymorphic Types
|
<
type>
pattern
If Alt
is a type and std::is_polymorphic_v<V>
is true
, let p
be
dynamic_cast<Alt'*>(&v)
where Alt'
has the same cv-qualifications as
decltype(&v)
. The alternative pattern matches if p
contextually converted
to bool
evaluates to true
, and pattern matches *p
.
While the semantics of the pattern is specified in terms of dynamic_cast
,
[@N3449] describes techniques involving vtable pointer caching and hash conflict
minimization that are implemented in the [@Mach7] library, as well as mentions
of further opportunities available for a compiler intrinsic.
Given the following definition of a Shape
class hierarchy:
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape { int radius; };
struct Rectangle : Shape { int width, height; };
::: cmptable
virtual int Shape::get_area() const = 0;
int Circle::get_area() const override {
return 3.14 * radius * radius;
}
int Rectangle::get_area() const override {
return width * height;
}
int get_area(const Shape& shape) {
return inspect (shape) {
<Circle> [r] => 3.14 * r * r;
<Rectangle> [w, h] => w * h;
};
}
:::
::: cmptable
int get_area(const Shape& shape) {
return shape match {
Circle: let [r] => 3.14 * r * r;
Rectangle: let [w, h] => w * h;
};
}
int get_area(const Shape& shape) {
return inspect (shape) {
[r] as Circle => 3.14 * r * r;
[w, h] as Rectangle => w * h;
};
}
:::
\pagebreak
The parenthesized pattern has the form:
|
(
pattern)
and matches value v
if pattern matches v
.
std::variant<Point, /* ... */> v = /* ... */;
inspect (v) {
Point: ([let x, 0] | [0, let x]) => // ...
// ^^^^^^^^^^^^^^^^^^^^^^^^^ parenthesized pattern
};
\pagebreak
The dereference pattern has the form:
|
* :
pattern
and matches value v
if v
is contextually convertible to
bool
, evaluates to true
, and pattern matches *v
.
struct Node {
int value;
std::unique_ptr<Node> lhs, rhs;
};
void print_leftmost(const Node& node) {
inspect (node) {
[.value: v, .lhs: nullptr] => { std::cout << v << '\n'; }
[.lhs: *: l] => { print_leftmost(l); }
// ^^^^ dereference pattern
};
}
[Refer to [Red-black Tree Rebalancing] for a more complex example.]{.note}
\pagebreak
The extractor pattern has the following two forms:
| cast-expression : pattern
Let c
be the cast-expression and e
be the result of c(v)
.
The extractor pattern matches value v
if e
is contextually convertible to
bool
, evaluates to true
and pattern matches *e
.
std::optional<std::array<std::string_view, 2>> email(std::string_view sv);
std::optional<std::array<std::string_view, 3>> phone_number(std::string_view sv);
s match {
email: let [address, domain] => { std::cout << "got an email"; }
phone_number: ["415", _, _] => { std::cout << "got a San Francisco phone number"; }
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ extractor pattern
};
The pattern guard has the form:
|
if
expression
Let e
be the result of expression contextually converted to bool
.
If e
is true
, control is passed to the corresponding statement.
Otherwise, control flows to the subsequent pattern.
The pattern guard allows to perform complex tests that cannot be performed within the pattern. For example, performing tests across multiple bindings:
p match {
let [x, y] if test(x, y) => { std::cout << x << ',' << y << " passed"; }
// ^^^^^^^^^^^^^ pattern guard
};
This also diminishes the desire for fall-through semantics within the
statements, an unpopular feature even in switch
statements.
// TODO: check if this actually does what we want...
Every pattern is able to determine whether it matches value v
as a boolean
expression in isolation. Ignoring any potential optimization opportunities,
we're able to perform the following transformation:
::: cmptable
v match {
pattern1 if (cond1) => { stmt1 }
pattern2 => { stmt2 }
// ...
};
if (v match pattern1 && cond1) stmt1
else if (v match pattern2) stmt2
// ...
:::
match constexpr
is then formulated by applying constexpr
to every if
branch.
::: cmptable
v match constexpr {
pattern1 if (cond1) => { stmt1 }
pattern2 => { stmt2 }
// ...
};
if constexpr (v match pattern1 && cond1) stmt1
else if constexpr (v match pattern2) stmt2
// ...
:::
\pagebreak
inspect
can be declared [[strict]]
for implementation-defined exhaustiveness
and usefulness checking.
Exhaustiveness means that all values of the type of the value being matched
is handled by at least one of the cases. For example, having a __:
case makes
any inspect
statement exhaustive.
Usefulness means that every case handles at least one value of the type of
the value being matched. For example, any case that comes after a __:
case
would be useless.
Warnings for pattern matching [@Warnings] discusses and outlines an algorithm for exhaustiveness and usefulness for OCaml, and is the algorithm used by Rust.
Patterns that cannot fail to match are said to be irrefutable in contrast to refutable patterns which can fail to match. For example, the identifier pattern is irrefutable whereas the expression pattern is refutable.
The distinction is useful in reasoning about which patterns should be allowed
in which contexts. For example, the structured bindings declaration is
conceptually a restricted form of pattern matching. With the introduction of
expression pattern in this paper, some may question whether structured bindings
declaration should be extended for examples such as auto [0, x] = f();
.
This is ultimately a question of whether structured bindings declaration supports refutable patterns or if it is restricted to irrefutable patterns.
\pagebreak
The following is the beginning of an attempt at a syntactic structure.
Add to §8.4 [stmt.select] of ...
\pnum{1}Selection statements choose one of several flows of control.
| selection-statement: |
if constexpr
opt(
init-statementoptcondition)
statement |if constexpr
opt(
init-statementoptcondition)
statementelse
statement |switch (
init-statementoptcondition)
statement | [inspect constexpr
opt(
init-statementoptcondition)
trailing-return-typeopt{
inspect-case-seq}
]{.add}
::: add | inspect-case-seq: | inspect-statement-case-seq | inspect-expression-case-seq
| inspect-statement-case-seq: | inspect-statement-case | inspect-statement-case-seq inspect-statement-case
| inspect-expression-case-seq: | inspect-expression-case | inspect-expression-case-seq
,
inspect-expression-case| inspect-statement-case: | inspect-pattern inspect-guard
opt=>
statement| inspect-expression-case: | inspect-pattern inspect-guard
opt=>
assignment-expression| inspect-pattern: | alternative-pattern | case-pattern | dereference-pattern | expression-pattern | extractor-pattern | identifier-pattern | structured-binding-pattern | wildcard-pattern
| inspect-guard: |
if (
expression)
:::
Change §9.1 [dcl.dcl]
| simple-declaration: | decl-specifier-seq init-declarator-list
opt;
| attribute-specifier-seq decl-specifier-seq init-declarator-list;
| [attribute-specifier-seqoptdecl-specifier-seq ref-qualifieropt[
identifier-list]
initializer;
]{.rm} | [attribute-specifier-seqoptdecl-specifier-seq ref-qualifieroptstructured-binding-pattern initializer;
]{.add}
\pagebreak
This paper proposes that _
be used as the wildcard pattern.
// TODO
The design is intended to be consistent and to naturally extend the notions introduced by structured bindings. That is, The subobjects are referred to rather than being assigned into new variables.
We propose any irrefutable pattern to be allowed in structured binding declaration, as it does not introduce any new behaviour. A separate paper will explore possibility of allowing refutable patterns to be used in declarations.
This proposal introduces a new inspect
statement rather than trying to extend
the switch
statement. [@P0095R0] had proposed extending switch
and received
feedback to "leave switch
alone" in Kona 2015.
The following are some of the reasons considered:
switch
allows thecase
labels to appear anywhere, which hinders the goal of pattern matching in providing structured inspection.- The fall-through semantics of
switch
generally results inbreak
being attached to every case, and is known to be error-prone. switch
is purposely restricted to integrals for guaranteed efficiency. The primary goal of pattern matching in this paper is expressiveness while being at least as efficient as the naively hand-written code.
The proposed matching algorithm has first match semantics. The choice of first match is mainly due to complexity. Our overload resolution rules for function declarations are extremely complex and is often a mystery.
Best match via overload resolution for function declarations are absolutely
necessary due to the non-local and unordered nature of declarations.
That is, function declarations live in different files and get pulled in
via mechanisms such as #include
and using
declarations, and there is no
defined order of declarations like Haskell does, for example. If function
dispatching depended on the order of #include
and/or using
declarations
being pulled in from hundreds of files, it would be a complete disaster.
Pattern matching on the other hand do not have this problem because
the construct is local and ordered in nature. That is, all of the candidate
patterns appear locally within inspect (x) { /* ... */ }
which cannot span
across multiple files, and appear in a specified order. This is consistent with
try
/catch
for the same reasons: locality and order.
Consider also the amount of limitations we face in overload resolution due
to the opacity of user-defined types. T*
is related to unique_ptr<T>
as
it is to vector<T>
as far as the type system is concerned. This limitation
will likely be even bigger in a pattern matching context with the amount of
customization points available for user-defined behavior.
We considered the possibility of restricting side-effects within patterns. Specifically whether modifying the value currently being matched in the middle of evaluation should have defined behavior.
The consideration was due to potential optimization opportunities.
bool f(int &); // defined in a different translation unit.
int x = 1;
inspect (x) {
0 => { std::cout << 0; }
1 if (f(x)) => { std::cout << 1; }
2 => { std::cout << 2; }
};
If modifying the value currently being matched has undefined behavior,
a compiler can assume that f
(defined in a different translation unit)
will not change the value of x
. This means that the compiler can generate
code that uses a jump table to determine which of the patterns match.
If on the other hand f
may change the value of x
, the compiler would be
forced to generated code checks the patterns in sequence, since a subsequent
pattern may match the updated value of x
.
The following are illustrations of the two approaches written in C++:
::: cmptable
bool f(int &);
int x = 1;
switch (x) {
case 0: std::cout << 0; break;
case 1: if (f(x)) { std::cout << 1; } break;
case 2: std::cout << 2; break;
}
bool f(int &);
int x = 1;
if (x == 0) std::cout << 0;
else if (x == 1 && f(x)) std::cout << 1;
else if (x == 2) std::cout << 2;
:::
However, we consider this opportunity too niche. Suppose we have a slightly more
complex case: struct S { int x; };
and bool operator==(const S&, const S&);
.
Even if modifying the value being matched has undefined behavior, if the
operator==
is defined in a different translation unit, a compiler cannot do
much more than generate code that checks the patterns in sequence anyway.
There are three popular pattern matching libraries for C++ today: [@Mach7], [@Patterns], and [@SimpleMatch].
While the libraries have been useful for gaining experience with interfaces and implementation, the issue of introducing identifiers, syntactic overhead of the patterns, and the reduced optimization opportunities justify support as a language feature from a usability standpoint.
Many languages provide a wide array of patterns through various syntactic forms. While this is a potential direction for C++, it would mean that every new type of matching requires new syntax to be added to the language. This would result in a narrow set of types being supported through limited customization points.
Matchers and extractors are supported in order to minimize the number of patterns with special syntax. The following are example matchers and extractors that commonly have special syntax in other languages.
+----------------------------+---------------------+
| Matchers / Extractors | Other Languages |
+============================+=====================+
| any_of{1, 2, 3}
| 1 | 2 | 3
|
+----------------------------+---------------------+
| within{1, 10}
| 1..10
|
+----------------------------+---------------------+
| (both!) [[x, 0], [0, y]]
| [x, 0] & [0, y]
|
+----------------------------+---------------------+
| (at!) [p, [x, y]]
| p @ [x, y]
|
+----------------------------+---------------------+
Each of the matchers and extractors can be found in the [Examples] section. The example extractors and matchers are not proposed for standardisation in this paper, and presented just for demonstration.
[@P1371R0] had proposed a unary ^
as an "expression introducer". The main
motivation was to leave the design space open for patterns that look like
expressions. For example, many languages spell the alternation pattern with |
,
resulting in a pattern such as 1 | 2
which means "match 1
or 2
".
However, to allow such a pattern a disambiguation mechanism would be required
since 1 | 2
is already a valid expression today.
That paper also included what is called a dereference pattern with the syntax of
*
pattern. There was clear guidance from EWG to change the syntax of this
pattern due to confusion with the existing dereference operator.
As such, the design direction proposed in this paper is to allow expressions
in patterns without an introducer, and to require that new patterns be
syntactically unambiguous with an expression in general.
The following is a flow graph of decisions that need to be made:
Since inspect
is always an expression we decided to forbid using break
keyword inside
inspect
expressions.
The problem lies with two possible use cases where inspect
would be used.
for (const auto& el: some_vec) {
// If-else-if chain
if (el.type() == "NotInteresting") {
break;
} else if (el.type() == "SomeOther") {
break;
}
// Switch statement
switch (el.value()) {
case 1: /* ... */
break; // no fallthrough
case 2: /* ... */
break; // no fallthrough
default:
/* ... */
}
}
In the example above, if we're replacing existing switch
statement, break
is used
to terminate current statement sequence and jump to first statement after the switch
.
In particular it is required to prevent fallthrough.
If the code being replaced is a sequence of if-else branches, break
there would indicate iteration stop
for the enclosing loop.
Both use cases are interesting and valid for inspect
, but resulting break
behaviour differs. If we
were to adopt one, the other use case would be prone to error. So for now we decided to forbid
using break
statements inside inspect
expression branches.
Note, it is generally desirable to be able to yield from inspect
expression branch early, but currently
there is no syntax that would allow specifying yield value with break
statement (i.e. break 2;
).
We think this behaviour is valuable, but not crucial for this proposal.
The following are few of the optimizations that are worth noting.
Structured binding patterns can be optimized by performing switch
over
the columns with the duplicates removed, rather than the naive approach of
performing a comparison per element. This removes unnecessary duplicate
comparisons that would be performed otherwise. This would likely require some
wording around "comparison elision" in order to enable such optimizations.
The sequence of alternative patterns can be executed in a switch
.
[@N3449] describes techniques involving vtable pointer caching and hash conflict minimization that are implemented in the [@Mach7] library, but also mentions further opportunities available for a compiler solution.
Short-string optimization using a predicate as a discriminator rather than an explicitly stored value. Adapted from Bjarne Stroustrup's pattern matching presentation at Urbana-Champaign 2014 [@PatMatPres].
struct String {
enum Storage { Local, Remote };
int size;
union {
char local[32];
struct { char *ptr; int unused_allocated_space; } remote;
};
// Predicate-based discriminator derived from `size`.
Storage index() const { return size > sizeof(local) ? Remote : Local; }
// Opt into Variant-Like protocol.
template <Storage S>
auto &&get() {
if constexpr (S == Local) return local;
else if constexpr (S == Remote) return remote;
}
char *data();
};
namespace std {
// Opt into Variant-Like protocol.
template <>
struct variant_size<String> : std::integral_constant<std::size_t, 2> {};
template <>
struct variant_alternative<String::Local, String> {
using type = decltype(String::local);
};
template <>
struct variant_alternative<String::Remote, String> {
using type = decltype(String::remote);
};
}
char* String::data() {
return inspect (*this) {
<Local> l => l;
<Remote> r => r.ptr;
};
// switch (index()) {
// case Local: {
// std::variant_alternative_t<Local, String>& l = get<Local>();
// return l;
// }
// case Remote: {
// std::variant_alternative_t<Remote, String>& r = get<Remote>();
// return r.ptr;
// }
// }
}
A class hierarchy can effectively be closed with an enum
that maintains
the list of its members, and provide efficient dispatching by opting into
the Variant-Like protocol.
A generalized mechanism of pattern is used extensively in LLVM;
llvm/Support/YAMLParser.h
[@YAMLParser] is an example.
struct Shape { enum Kind { Circle, Rectangle } kind; };
struct Circle : Shape {
Circle(int radius) : Shape{Shape::Kind::Circle}, radius(radius) {}
int radius;
};
struct Rectangle : Shape {
Rectangle(int width, int height)
: Shape{Shape::Kind::Rectangle}, width(width), height(height) {}
int width, height;
};
namespace std {
template <>
struct variant_size<Shape> : std::integral_constant<std::size_t, 2> {};
template <>
struct variant_alternative<Shape::Circle, Shape> { using type = Circle; };
template <>
struct variant_alternative<Shape::Rectangle, Shape> { using type = Rectangle; };
}
Shape::Kind index(const Shape& shape) { return shape.kind; }
template <Kind K>
auto&& get(const Shape& shape) {
return static_cast<const std::variant_alternative_t<K, Shape>&>(shape);
}
int get_area(const Shape& shape) {
return inspect (shape) {
<Circle> c => 3.14 * c.radius * c.radius;
<Rectangle> r => r.width * r.height;
};
// switch (index(shape)) {
// case Shape::Circle: {
// const std::variant_alternative_t<Shape::Circle, Shape>& c =
// get<Shape::Circle>(shape);
// return 3.14 * c.radius * c.radius;
// }
// case Shape::Rectangle: {
// const std::variant_alternative_t<Shape::Rectangle, Shape>& r =
// get<Shape::Rectangle>(shape);
// return r.width * r.height;
// }
// }
}
\pagebreak
The logical-or pattern in other languages is typically spelled
_pattern_0 |
_pattern_1 | ... |
_pattern_N, and matches
value v
if any patterni matches v
.
This provides a restricted form (constant-only) of the logical-or pattern.
template <typename... Ts>
struct any_of : std::tuple<Ts...> {
using tuple::tuple;
template <typename U>
bool match(const U& u) const {
return std::apply([&](const auto&... xs) { return (... || xs == u); }, *this);
}
};
int fib(int n) {
return inspect (n) {
x if (x < 0) => 0;
any_of{1, 2} => n; // 1 | 2
x => fib(x - 1) + fib(x - 2);
};
}
The range pattern in other languages is typically spelled first..last
,
and matches v
if v
[first, last]
.
struct within {
int first, last;
bool match(int n) const { return first <= n && n <= last; }
};
inspect (n) {
within{1, 10} => { // 1..10
std::cout << n << " is in [1, 10].";
}
__ => {
std::cout << n << " is not in [1, 10].";
}
};
\pagebreak
The logical-and pattern in other languages is typically spelled
_pattern_0 &
_pattern_1 & ... &
_pattern_N, and matches v
if all of _pattern_i matches v
.
This extractor emulates binary logical-and with a std::pair
where
both elements are references to value v
.
struct Both {
template <typename U>
std::pair<U&&, U&&> extract(U&& u) const {
return {std::forward<U>(u), std::forward<U>(u)};
}
};
inline constexpr Both both;
inspect (v) {
(both!) [[x, 0], [0, y]] => // ...
};
The binding pattern in other languages is typically spelled
identifier @
pattern, binds identifier to v
and matches if pattern
matches v
. This is a special case of the logical-and pattern
(_pattern_0 &
_pattern_1) where _pattern_0 is an identifier.
That is, identifier &
pattern has the same semantics as
identifier @
pattern, which means we get at
for free from both
above.
inline constexpr at = both;
inspect (v) {
<Point> (at!) [p, [x, y]] => // ...
// ...
};
Dereference patterns frequently come into play with complex patterns using recursive variant types. An example of such a problem is the rebalance operation for red-black trees. Using pattern matching this can be expressed succinctly and in a way that is easily verified visually as having the correct algorithm.
Given the following red-black tree definition:
enum Color { Red, Black };
template <typename T>
struct Node {
void balance();
Color color;
std::shared_ptr<Node> lhs;
T value;
std::shared_ptr<Node> rhs;
};
\pagebreak
The following is what we can write with pattern matching:
template <typename T>
void Node<T>::balance() {
*this = inspect (*this) {
// left-left case
//
// (Black) z (Red) y
// / \ / \
// (Red) y d (Black) x (Black) z
// / \ -> / \ / \
// (Red) x c a b c d
// / \
// a b
[case Black, (*?) [case Red, (*?) [case Red, a, x, b], y, c], z, d]
=> Node{Red, std::make_shared<Node>(Black, a, x, b),
y,
std::make_shared<Node>(Black, c, z, d)};
[case Black, (*?) [case Red, a, x, (*?) [case Red, b, y, c]], z, d] // left-right case
=> Node{Red, std::make_shared<Node>(Black, a, x, b),
y,
std::make_shared<Node>(Black, c, z, d)};
[case Black, a, x, (*?) [case Red, (*?) [case Red, b, y, c], z, d]] // right-left case
=> Node{Red, std::make_shared<Node>(Black, a, x, b),
y,
std::make_shared<Node>(Black, c, z, d)};
[case Black, a, x, (*?) [case Red, b, y, (*?) [case Red, c, z, d]]] // right-right case
=> Node{Red, std::make_shared<Node>(Black, a, x, b),
y,
std::make_shared<Node>(Black, c, z, d)};
self => self; // do nothing
};
}
\pagebreak
The following is what we currently need to write:
template <typename T>
void Node<T>::balance() {
if (color != Black) return;
if (lhs && lhs->color == Red) {
if (const auto& lhs_lhs = lhs->lhs; lhs_lhs && lhs_lhs->color == Red) {
// left-left case
//
// (Black) z (Red) y
// / \ / \
// (Red) y d (Black) x (Black) z
// / \ -> / \ / \
// (Red) x c a b c d
// / \
// a b
*this = Node{
Red,
std::make_shared<Node>(Black, lhs_lhs->lhs, lhs_lhs->value, lhs_lhs->rhs),
lhs->value,
std::make_shared<Node>(Black, lhs->rhs, value, rhs)};
return;
}
if (const auto& lhs_rhs = lhs->rhs; lhs_rhs && lhs_rhs->color == Red) {
*this = Node{ // left-right case
Red,
std::make_shared<Node>(Black, lhs->lhs, lhs->value, lhs_rhs->lhs),
lhs_rhs->value,
std::make_shared<Node>(Black, lhs_rhs->rhs, value, rhs)};
return;
}
}
if (rhs && rhs->color == Red) {
if (const auto& rhs_lhs = rhs->lhs; rhs_lhs && rhs_lhs->color == Red) {
*this = Node{ // right-left case
Red,
std::make_shared<Node>(Black, lhs, value, rhs_lhs->lhs),
rhs_lhs->value,
std::make_shared<Node>(Black, rhs_lhs->rhs, rhs->value, rhs->rhs)};
return;
}
if (const auto& rhs_rhs = rhs->rhs; rhs_rhs && rhs_rhs->color == Red) {
*this = Node{ // right-right case
Red,
std::make_shared<Node>(Black, lhs, value, rhs->lhs),
rhs->value,
std::make_shared<Node>(Black, rhs_rhs->lhs, rhs_rhs->value, rhs_rhs->rhs)};
return;
}
}
}
The design of this proposal also accounts for a potential language support
for variant. It achieves this by keeping the alternative pattern flexible
for new extensions via <
new_entity >
pattern.
Consider an extension to union
that allows it to be tagged by an integral,
and has proper lifetime management such that the active alternative need not
be destroyed manually.
// `: type` specifies the type of the underlying tag value.
union U : int { char small[32]; std::vector<char> big; };
We could then allow <
qualified-id >
that refers to a union
alternative to support pattern matching.
U u = /* ... */;
inspect (u) {
<U::small> s => { std::cout << s; }
<U::big> b => { std::cout << b; }
};
The main point is that whatever entity is introduced as the discriminator, the presented form of alternative pattern should be extendable to support it.
The benefit of pattern matching for ranges is unclear. While it's possible to
come up with a ranges pattern, e.g., {x, y, z}
to match against a fixed-size
range, it's not clear whether there is a worthwhile benefit.
The typical pattern found in functional languages of matching a range on head and tail doesn't seem to be all that common or useful in C++ since ranges are generally handled via loops rather than recursion.
Ranges likely will be best served by the range adaptors / algorithms, but further investigation is needed.
Thanks to all of the following:
- Yuriy Solodkyy, Gabriel Dos Reis, Bjarne Stroustrup for their prior work on [@N3449], Open Pattern Matching for C++ [@OpenPM], and the [@Mach7] library.
- Pattern matching presentation by Bjarne Stroustrup at Urbana-Champaign 2014. [@PatMatPres]
- Jeffrey Yasskin/JF Bastien for their work on [@P1110R0].
- (In alphabetical order by last name) Dave Abrahams, John Bandela, Agustín Bergé, Ori Bernstein, Matt Calabrese, Alexander Chow, Louis Dionne, Michał Dominiak, Vicente Botet Escribá, Eric Fiselier, Bengt Gustafsson, Zach Laine, Jason Lucas, Barry Revzin, John Skaller, Bjarne Stroustrup, Tony Van Eerd, and everyone else who contributed to the discussions.
\pagebreak
references:
- id: OpenPM
citation-label: OpenPM
title: "Open Pattern Matching for C++"
author:
- family: Solodkyy given: Yuriy
- family: Reis given: [Gabriel, Dos]
- family: Stroustrup given: Bjarne URL: http://www.stroustrup.com/OpenPatternMatching.pdf
- id: Mach7
citation-label: Mach7
title: "Mach7: Pattern Matching for C++"
author:
- family: Solodkyy given: Yuriy
- family: Reis given: [Gabriel, Dos]
- family: Stroustrup given: Bjarne URL: https://github.com/solodon4/Mach7
- id: PatMatPres
citation-label: PatMatPres
title: ""Pattern Matching for C++" presentation at Urbana-Champaign 2014"
author:
- family: Solodkyy given: Yuriy
- family: Reis given: [Gabriel, Dos]
- family: Stroustrup given: Bjarne
- id: SimpleMatch
citation-label: SimpleMatch
title: "Simple, Extensible C++ Pattern Matching Library"
author:
- family: Bandela given: John URL: https://github.com/jbandela/simple_match
- id: Patterns
citation-label: Patterns
title: "Pattern Matching in C++"
author:
- family: Park given: Michael URL: https://github.com/mpark/patterns
- id: Warnings
citation-label: Warnings
title: "Warnings for pattern matching"
author:
- family: Maranget given: Luc URL: http://moscova.inria.fr/~maranget/papers/warn/index.html
- id : YAMLParser citation-label : YAMLParser URL: http://llvm.org/doxygen/YAMLParser_8h_source.html
- id : SwiftPatterns citation-label : Swift Patterns title : "Swift Reference Manual - Patterns" URL: https://docs.swift.org/swift-book/ReferenceManual/Patterns.html