diff --git a/designs/0003-ragged-containers.md b/designs/0003-ragged-containers.md
new file mode 100644
index 0000000..482d123
--- /dev/null
+++ b/designs/0003-ragged-containers.md
@@ -0,0 +1,575 @@
+- *Feature Name:* **Ragged Containers for Stan Language**
+- *Start Date:* 2019-07-28
+- *RFC PR:*
+- *Stan Issue:*
+
+# Summary
+[summary]: #summary
+
+Add sized ragged containers to the Stan language so that we can
+support non-rectangular structures of any type with natural indexing.
+The term "ragged container" is used because the resulting structures
+might be arrays of vectors or matrices.
+
+
+# Motivation
+[motivation]: #motivation
+
+Ragged data structures are ubiquitous. Use cases include any data set
+where there are differing numbers of observations. Typical examples
+include hierarchical models, which might contain different numbers of
+items per group. For example, the number of survey respondents may be
+different in different groups, such as men and women, or European
+vs. Asian residents.
+
+The motivation for this proposal is that *ragged containers allow
+ragged data to be represented and modeled using the same idioms as
+rectangular containers.* In particular, we want to support vectorized
+statements without slicing (which is hard to read and hence error
+prone).
+
+As of Stan 2.19, there is no built-in support for ragged containers.
+See the [ragged array section of the user's
+guide](https://mc-stan.org/docs/2_19/stan-users-guide/ragged-data-structs-section.html)
+for an explanation of how ragged structures may be coded now through
+slicing and/or long-form index fiddling.
+
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+A *ragged container* is nothing more than a container whose elements
+may vary in size.
+
+### Ragged arrays of reals
+
+The simplest ragged container is a ragged array. A simple non-trivial
+example is:
+
+```
+real[{3, 4}] x = {{a, b, c}, {d, e, f, g}};
+```
+
+Here we are assuming that the standard array expression will be
+generalized to work for ragged arrays. For example, `{1, 2}` is a
+one-dimensional array of two integers, and `{{1, 2}, {3, 4}, {5, 6}}`
+denotes a three by two array of integers.
+
+The variable `x` is a ragged two-dimensional array of size two.
+Indexing works as usual, with
+
+```
+x[1] == {a, b, c}
+```
+
+and
+
+```
+x[2] == {d, e, f, g}
+```
+
+Sizing also works as expected, with
+
+```
+size(x) == 2
+```
+
+and
+
+```
+size(x[1]) == 3
+size(x[2]) == 4
+```
+
+This array is ragged because the sizes of `x[1]` and `x[2]` are
+different. If the size of `x[i1, ..., iN]` is the same for all valid
+indexes, then `x` is said to be *rectangular*.
+
+We write the unsized type of `x` the same way as for rectangular
+two-dimensional arrays,
+
+```
+x : real[ , ]
+```
+
+Declarations for ragged arrays are made up of the sizes of their
+elements. Declaring `x` of type and size `real[{3, 4}]` declares `x`
+to be a two-dimensional array with `x[1]` having size three and `x[2]`
+size four.
+
+Providing an out-of-bounds index will raise the same exceptions (and
+thus cause the same behavior in interfaces) as other out-of-bounds
+errors.
+
+A ragged three-dimensional array may be defined similarly; each
+element in the sizing is the sizing of a ragged two-dimensional array.
+
+```
+real[{ {3, 4}, {1, 2, 3} }] y
+  = { {{a, b, c}, {d, e, f, g}},
+      {{h}, {i, j}, {k, l, m}} };
+```
+
+The size of `y` is the number of elements in its top-level
+declaration, here `size(y) == 2`.
+The first element of `y` is a ragged two-dimensional array,
+
+```
+y[1] == {{a, b, c}, {d, e, f, g}}
+```
+
+and the second is
+
+```
+y[2] == {{h}, {i, j}, {k, l, m}}
+```
+
+The type of both `y[1]` and `y[2]` is `real[ , ]`, the type of a
+real-valued two-dimensional array. This is a requirement of ragged
+structures.
+
+This brings up an important principle of ragged structures: each
+element in a ragged structure must be of the same unsized type. That
+is, it's not possible to have a ragged array whose first element is
+two dimensional and whose second element is three dimensional.
+
+### Ragged arrays of vectors
+
+In addition to arrays, the values of ragged structures may be vectors
+or matrices. For example, a ragged array of vectors may be declared
+and defined as
+
+```
+vector[{3, 2}] v = { [a, b, c]', [e, f]' };
+```
+
+The variable `v` is typed as an array of vectors, `v : vector[]`. The
+value of `v[1]` is the size three vector `[a, b, c]'`, whereas the
+value of `v[2]` is the size two vector `[e, f]'`. Both `v[1]` and
+`v[2]` are of type `vector`.
+
+All size functions (`rows()`, `cols()`, etc.) work as expected.
+
+Row vectors are handled identically, e.g.,
+
+```
+row_vector[{3, 2}] w = { [a, b, c], [e, f] };
+```
+
+### Ragged arrays of matrices
+
+
+Ragged arrays of matrices are more challenging to declare because
+their base elements are two dimensional. For example,
+
+```
+matrix[2, 3] u = [[a, b, c], [d, e, f]];
+```
+
+declares `u` to be a two by three matrix. In order to deal with
+ragged matrix declarations in general, the following declaration is
+identical to the previous, for a two by three matrix,
+
+
+```
+matrix[{2, 3}] u = [[a, b, c], [d, e, f]];
+```
+
+This is chosen to match the result of the `dims` function, which
+returns a matrix's dimensions,
+
+```
+dims(u) == {2, 3}
+```
+
+A ragged array of matrices is declared using a sequence of sizes. For
+example,
+
+```
+matrix[{ {2, 3}, {3, 2}, {1, 2} }] v
+  = { [[a, b, c], [d, e, f]],
+      [[g, h], [i, j], [k, l]],
+      [[m, n]] };
+```
+
+declares a one-dimensional array of matrices of size three, so
+
+```
+size(v) == 3
+```
+
+the dimensions of which are
+
+```
+dims(v[1]) == {2, 3}
+dims(v[2]) == {3, 2}
+dims(v[3]) == {1, 2}
+```
+
+Multidimensional arrays of matrices are handled in the same way.
+
+
+### Constrained type declarations
+
+Because all constrained types resolve at run time to real, integer,
+vector, row vector, or matrix types, ragged containers of constrained
+types may be declared in exactly the same way.
+
+For instance, to declare a ragged two-dimensional array of
+probabilities, we can use
+
+```
+real<lower = 0, upper = 1>[{3, 4}] x = {{a, b, c}, {d, e, f, g}};
+```
+
+We can declare a ragged array of simplexes as
+
+```
+simplex[{3, 2, 2}] theta
+  = { [0.2, 0.7, 0.1]', [0.3, 0.7]', [0.0018, 0.9982]' };
+```
+
+Indexing and sizing is the same as for ragged arrays of vectors.
+
+We can declare a one-dimensional array `Sigma` of size three,
+containing a 3 by 3, a 2 by 2, and another 2 by 2 covariance matrix,
+
+```
+cov_matrix[{ {3, 3}, {2, 2}, {2, 2} }] Sigma
+  = { [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
+      [[1, 0.5], [0.5, 1]],
+      [[2.3, 0.8], [0.8, 3.5]] };
+```
+
+Correlation matrices, as well as Cholesky factors of covariance and
+correlation matrices, may be declared the same way.
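+
+For example, a ragged array of two Cholesky factors of correlation
+matrices might be declared as follows (a minimal sketch of the
+proposed syntax; the sizes are purely illustrative):
+
+```
+cholesky_factor_corr[{ {3, 3}, {2, 2} }] L;
+```
+
+Indexing and sizing again work as for ragged arrays of matrices, so
+`dims(L[1]) == {3, 3}` and `dims(L[2]) == {2, 2}`.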
+
+#### Square correlation and covariance matrices
+
+*Optional*: Because correlation matrices and covariance matrices are
+ typically square, a one-dimensional array of three 2 by 2 covariance
+ matrices may be declared as
+
+```
+cov_matrix[2][3] x
+  = { [[1, 0], [0, 1]],
+      [[3.1, -0.2], [-0.2, 3.1]],
+      [[15.2, 0.1], [0.1, 15.2]] };
+```
+
+### Adoption note
+
+Nothing changes in the language for declaring rectangular containers,
+so this should be straightforward to teach to new Stan users.
+Rectangular arrays are just a special case of ragged arrays in their
+usage.
+
+
+# Developer notes
+
+This is covered in the reference-level explanation, but for the sake
+of completeness, the main issues are
+
+1. *Error checking*: Ragged structures can use exactly the same error
+   checking for sizing and assignment as rectangular containers, as
+   these have the same C++ runtime types.
+
+2. *I/O*: The base `var_context` type and all implementations will need
+   to be updated to deal with ragged structures. Otherwise, this should
+   not be a problem for either the R dump or JSON formats, as neither of
+   them requires rectangular structures.
+
+3. *Type checking*: The parser and code generator will need to be
+   enhanced to allow non-rectangular containers.
+
+
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+## Type and size inference
+
+The typing can be explained with a sequence of simple sequent rules.
+We currently have the array declaration base case
+
+* `T[i1, ..., iN]`: `N`-dimensional array of `T`
+
+and propose to add:
+
+* `matrix[{i, j}]` as a base case with the same meaning as `matrix[i, j]`
+
+The recursive case for an arbitrary type `T` is then just
+
+* if `x : T[arr]` then `x[i] : T[arr[i]]`
+
+#### Examples
+
+For example, if
+
+```
+x : real[{3, 2, 4}]
+```
+
+then
+
+```
+x[1] : real[3]
+x[2] : real[2]
+x[3] : real[4]
+```
+
+That is, `x` is a two-dimensional ragged array containing
+one-dimensional arrays of sizes three, two, and four. The variable
+`x` itself will be represented as unsized type `real[ , ]`, which we
+write as
+
+```
+x : real[ , ]
+```
+
+As such, nothing distinguishes ragged from rectangular types in their
+runtime types.
+
+Now suppose
+
+```
+y : real[{{{3, 2}, {4, 5}}, {{3, 1}}}]
+```
+
+The unsized type of `y` is
+
+```
+y : real[ , , ]
+```
+
+a three-dimensional ragged array. We have
+
+```
+size(y) == 2
+```
+
+and
+
+```
+y[1] : real[{{3, 2}, {4, 5}}]
+y[2] : real[{{3, 1}}]
+```
+
+and, indexing one level deeper,
+
+```
+y[1, 1] : real[{3, 2}]
+y[1, 2] : real[{4, 5}]
+y[2, 1] : real[{3, 1}]
+```
+
+It would be illegal to declare
+
+```
+real[{ {{3, 2}, {4, 5}}, {1, 2} }] z;
+```
+
+because `z[1]` would be of type `real[ , ]`, whereas `z[2]`
+would be of type `real[]`.
+
+Suppose we have a ragged array of matrices of type
+
+```
+u : matrix[{{2, 3}, {4, 5}}]
+```
+
+The type of `u` is just an array of matrices,
+
+```
+u : matrix[]
+```
+
+The elements are of type
+
+```
+u[1] : matrix[2, 3]
+u[2] : matrix[4, 5]
+```
+
+## Static vs. dynamic sizing
+
+The proposal is to wait for unsized local variables for dynamically
+sized ragged containers. Until then, ragged structures will behave
+like Stan's other types, in that sizes are fixed at the point they are
+declared and immutable afterward. This will *not* require additional
+code as the assignment function is already defined to do size
+matching.
+
+The one place where we have dynamic sizes is in function arguments,
+and those will work just like assignment and should not require any
+additional code generation or type checking.
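+
+For example, a user-defined function could take a ragged
+two-dimensional array through an ordinary unsized argument type. The
+following is a minimal sketch under the proposed syntax; the function
+name `ragged_sum` is hypothetical:
+
+```
+real ragged_sum(real[ , ] x) {
+  real s = 0;
+  for (i in 1:size(x))
+    for (j in 1:size(x[i]))
+      s += x[i, j];
+  return s;
+}
+...
+real[{3, 4}] x = {{a, b, c}, {d, e, f, g}};
+real total = ragged_sum(x);
+```
+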
+The function argument syntax is the one we have been using all along,
+in which a type is written as a contiguous sequence of characters
+(e.g., `real[ , ]`).
+
+
+## Runtime types
+
+The runtime type of a ragged container is the same as that of a
+rectangular one. This does not require any changes, as the
+`std::vector<std::vector<int>>` used for two-dimensional arrays of
+integers can support ragged arrays; rectangular arrays are just a
+special case where each element is the same size (all the way down
+through the base types integer, real, vector, row vector, and matrix).
+
+
+## Array expressions
+
+We will need to generalize array expressions to permit ragged array
+expressions, such as `{{1, 2}, {3, 4, 5}}`. The underlying type will
+not change (`std::vector<std::vector<int>>` in C++). This will be
+straightforward and existing assignment statements will be able to
+handle all size checking.
+
+## I/O format
+
+#### JSON
+
+The JSON format does not change. Nor do the return types in the
+interfaces. The only change is that loops over the data will need to
+be size-bounded rather than constant-bounded.
+
+#### RDump
+
+This is being deprecated in interfaces other than R, which reads it
+natively. Optionally, we could define a data format to allow ragged
+containers in R. The approach would be to use lists. That is,
+if a ragged container `x` is of size `N`, we'd represent it as
+`list(x[1]', ..., x[N]')` where `x[n]'` is the encoding of `x[n]`.
+
+## Cross assignment
+
+We will be able to support the assignment of a ragged structure to a
+rectangular structure and vice-versa. Either way, the sizes will have
+to match if the variable being assigned is a sized block variable.
+
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+It will take developer effort to code, test, document, and maintain.
+
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+A [preliminary
+design](https://github.com/stan-dev/stan/wiki/Ragged-array-spec) was
+refined online and motivated this design. I don't consider it a
+serious alternative to this proposal.
+
+# Prior art
+[prior-art]: #prior-art
+
+For now, what we do is code things in one of two long forms. Suppose
+we have a sequence of sequences, each of which we want to model as a
+time series, say
+
+```
+int K;          // num items
+int N;          // num observations
+real[N] y;      // observations
+int[K] start;   // start of each series
+int[K] end;     // end of each series (exclusive)
+...
+for (k in 1:K)
+  y[start[k]:(end[k] - 1)] ~ foo(theta);
+```
+
+For example, we might have data
+
+```
+N = 15;
+K = 4;
+start = {1, 5, 7, 12};
+end = {5, 7, 12, 16};
+y = {1, 2, 3, 0, -1, 0.2, 3, 5,
+     7, 9, 11, 15, 13, 11, 9};
+```
+
+With the new proposal, this will look like:
+
+```
+real[{4, 2, 5, 4}] yr
+  = {{1, 2, 3, 0}, {-1, 0.2},
+     {3, 5, 7, 9, 11}, {15, 13, 11, 9}};
+...
+for (k in 1:K)
+  yr[k] ~ foo(theta);
+```
+
+The new data structure will also be more efficient as a reference can
+be returned from `yr[k]` rather than requiring a copy.
+
+In some situations, long form can be used if there's no inherent
+structure in the subarrays. For example, we might have long-form data
+
+```
+real[N] y
+  = {1, 2, 3, 0, -1, 0.2, 3, 5,
+     7, 9, 11, 15, 13, 11, 9};
+int[N] kk = {1, 1, 1, 1, 2, 2, 3, 3, 3,
+             3, 3, 4, 4, 4, 4};
+...
+for (n in 1:N)
+  y[n] ~ foo(theta[kk[n]]);
+```
+
+whereas the ragged structure would either be
+
+```
+for (k in 1:K)
+  yr[k] ~ foo(theta[k]);
+```
+
+or even
+
+```
+yr ~ foo(theta);
+```
+
+if `foo` is vectorized.
+
+### Other languages
+
+In R, ragged structures are represented with lists, which allow
+heterogeneous containers. The ragged proposal here is more strongly
+typed. MATLAB does not directly support ragged arrays.
+
+A language like C++ supports ragged arrays in the sense of allowing
+containers to hold containers. For example, you could have
+
+```
+#include <vector>
+using std::vector;
+vector<vector<double>> y;
+for (size_t i = 0; i < 100; ++i)
+  y.push_back(vector<double>(i, 0.0));  // y[i].size() == i
+```
+
+This C++ idiom will be supported when we have unsized local variables.
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+Whether this gets implemented for the deprecated RDump format of I/O.
+(The JSON format should not need to change other than that loops need
+to be size-bounded rather than constant-bounded.)
+
+Whether ragged containers will play nicely with unsized local
+variables and function arguments.
+
+This proposal as written depends on allowing type declarations like:
+
+```
+real[...] x;
+```
+
+rather than
+
+```
+real x[...];
+```
+
+This is *not* critical for this proposal, though it does specify what
+the function argument type language will look like (same thing without
+sizes).
diff --git a/designs/0004-closures-fun-types.md b/designs/0004-closures-fun-types.md
new file mode 100644
index 0000000..de54668
--- /dev/null
+++ b/designs/0004-closures-fun-types.md
@@ -0,0 +1,608 @@
+- *Feature Name:* closures-fun-types
+- *Start Date:* 2019-08-31
+- *RFC PR(S):*
+- *Stan Issue(s):*
+
+# Summary
+[summary]: #summary
+
+Add functional types and closures to the Stan language; generalize
+the scope of function definitions.
+
+# Motivation
+[motivation]: #motivation
+
+1. Extend the sized and unsized object type language to functional
+   types.
+
+2. Allow general expressions, variables, and function arguments of
+   function types to support functional programming.
+
+3. Allow functions to be defined in any scope.
+
+4. Allow references to variables in static lexical scope to be
+   captured via closures without being passed as arguments.
+
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+## Functional Types
+
+#### First-order functions
+
+Consider the multiplication function which, among other signatures,
+takes a row vector and a vector argument and returns their product, a
+real value. The function can be coded in Stan (without input
+size validation) as
+
+```
+real mult(row_vector x, vector y) {
+  real prod = 0;
+  for (n in 1:cols(x))
+    prod += x[n] * y[n];
+  return prod;
+}
+```
+
+This defines an identifier `mult` that is of type `real(row_vector,
+vector)`. In the type, the arguments are presented in order between
+parentheses and the result is on the outside.
+
+The types `real`, `int`, `vector`, `row_vector`, and `matrix` are all
+first-order types. Arrays of first-order types are also first-order
+types, so that includes `real[]`, `int[ , ]`, and `row_vector[]`. A
+function is *first-order* if its arguments and result are first-order
+types. The current version of Stan only allows users to define
+first-order functions.
+
+#### Higher-order functions
+
+A function that takes function arguments or returns a function is said
+to be *higher-order* (we don't need to assign integer orders here;
+this definition is just for convenience).
+
+The arguments to a function might themselves be functions. For
+instance, consider a function that composes two other functions and
+applies the result to a third argument,
+
+```
+real compose_apply(real(real) f, real(real) g, real x) {
+  return f(g(x));
+}
+```
+
+The argument `f` is of type `real(real)`, i.e., a function from reals
+to reals. The argument `g` is of the same type. The final argument
+`x` is of type `real`. The function simply applies `g` to `x`, then
+applies `f` to the result.
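+
+For example, under this proposal `compose_apply` could be passed any
+previously defined functions with matching signatures. The following
+is a minimal usage sketch; the helper functions `square` and `halve`
+are hypothetical:
+
+```
+real square(real u) {
+  return u^2;
+}
+real halve(real u) {
+  return u / 2;
+}
+...
+real y = compose_apply(square, halve, 6);  // square(halve(6)) == 9
+```
+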
+The type of `compose_apply` is
+`real(real(real), real(real), real)`.
+
+Higher-order functions can also return functions as results. For
+example, consider the following version of multiplication that takes
+its arguments one at a time (such functions are said to be *curried*
+after Haskell Curry, one of the pioneers of higher-order logic and
+computation).
+
+```
+real(vector) curry_mult(row_vector x) {
+  return
+    (vector y) {
+      return mult(x, y);
+    };
+}
+```
+
+This definition introduces two new concepts, lambdas for anonymous
+functions and closures, which we will unpack in the next two
+sections. For now, we will concentrate on implicit function typing.
+
+The function `curry_mult` is of type `(real(vector))(row_vector)`,
+meaning that it takes a single argument of type `row_vector` and
+returns a result of type `real(vector)`, i.e., a function from
+`vector` to `real`. To avoid a pileup of parentheses on the left of
+type expressions, we assume they are left associative, so that, for
+example,
+
+```
+real(vector)(row_vector) == (real(vector))(row_vector)
+```
+
+Stan's types are what are known as *simple*, meaning they are
+recursively composed of simpler types down to first-order types. In
+contrast, programming languages like Lisp involve much more general
+recursive typing, with which it is possible to define a function that
+can be applied to itself. Languages like OCaml go even further in
+allowing mutually recursive templated type definitions. This proposal
+sticks to simple types.
+
+#### Lambdas
+
+The term "lambda" is derived from the lambda-calculus, the logic
+behind functional programming languages from Lisp to OCaml.
+Currently, a Stan function is defined with a name, like the definition
+of `mult` above. With lambdas, we can define *anonymous functions*
+that do not have names.
+
+For example, consider the expression
+
+```
+(row_vector x, vector y) {
+  return x * y;
+}
+```
+
+This is a lambda defining a function that takes two arguments, a row
+vector `x` and a vector `y`, and returns their product. The return
+type, `real`, is implicitly defined by the type of `x * y`. Thus the
+type of the whole expression is `real(row_vector, vector)`.
+
+Using Stan's type language, we can declare a function variable and assign
+a function expression to it. For example
+
+```
+real(row_vector, vector) f;
+...
+f = (row_vector x, vector y) { return x * y; };
+...
+row_vector a = ...;
+vector b = ...;
+real c = f(a, b);
+```
+
+After this program executes, `c` will be equal to `a * b`.
+
+Using the declare-define syntax, this provides an alternative
+approach to defining functions,
+
+```
+real(row_vector, vector) mult
+  = (row_vector x, vector y) {
+      return x * y;
+    };
+```
+
+The right-hand side of the expression defining `mult` is a lambda
+defining an anonymous function; the left-hand side declares the
+variable `mult`, which is assigned the anonymous function as its
+value. The resulting value of `mult` is identical to the more
+standard definition form,
+
+```
+real mult(row_vector x, vector y) {
+  return x * y;
+}
+```
+
+This standard form may be thought of as syntactic sugar for the lambda
+and assignment.
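+
+Either way `mult` is defined, it is applied like any other function.
+A minimal usage sketch with illustrative values:
+
+```
+row_vector[3] a = [1, 2, 3];
+vector[3] b = [4, 5, 6]';
+real c = mult(a, b);  // c == 32 under either definition
+```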
+
+
+#### Closures
+
+Recall the definition of the curried form of row-vector/vector
+multiplication,
+
+```
+real(vector) curry_mult(row_vector x) {
+  return (vector y) { return mult(x, y); };
+}
+```
+
+The value returned is an anonymous function, defined by
+
+```
+(vector y) { return mult(x, y); }
+```
+
+More specifically, the value is a *closure* because the variable `x`
+takes its value from the value of `x` in an enclosing scope, here the
+function argument `row_vector x`. Stan employs *static lexical
+scoping*, meaning that closures defined by lambdas such as the one
+above capture references to variables in enclosing scopes, including
+earlier in the Stan program.
+
+For example, we can capture data variables by defining a function in
+the transformed data block,
+
+```
+data {
+  real x;
+  real y;
+}
+transformed data {
+  real foo(real z) {
+    return x * y + z;
+  }
+}
+```
+
+The function `foo` defined in the transformed data block uses
+variables `x` and `y`, which act as references to the variables `x`
+and `y` defined in the data block. The same approach may be used to
+capture parameters by defining a function in the transformed
+parameters block. The type of `foo` is simply `real(real)`, as it
+takes a real argument and returns a real result; under the hood, the
+function stores references to the variables `x` and `y` in the
+enclosing scope for use when the function is evaluated.
+
+Lambdas may also use closures. For example, if the following appeared
+in a block allowing statements,
+
+```
+real x = 12;
+real(real) h = (real u) { return u + x; };
+print(h(5));
+```
+
+the value 17 would be printed. On the other hand, if we redefine `x`
+before calling the function, the current value will be used. That
+means that the following program fragment,
+
+```
+real x = 12;
+real(real) h = (real u) { return u + x; };
+x = 1;
+print(h(5));
+```
+
+will print 6, because the value of `x` used in `h(5)` will be 1.
+
+This style of scoping for closures captures variables by reference,
+resolving which variable's value to use by static lexical scoping.
+This latter term just means the variable to be used is known at
+compile time and scoping is to the (lexical) environment in which the
+lambda is defined. Any variable in scope (that is, available to be
+used or printed) may be used in a lambda.
+
+Now we can put everything together with an example to see how
+composition might work.
+
+```
+real(real)(real(real))(real(real)) compose
+  = (real(real) f) {
+      return (real(real) g) {
+        return (real x) {
+          return f(g(x));
+        };
+      };
+    };
+real(real) sq = (real x) { return x^2; };
+real(real) p1 = (real u) { return u + 1; };
+real(real) sq_p1 = compose(sq)(p1);
+real y = sq_p1(5);  // y is 36, i.e., sq(p1(5))
+```
+
+The composition function is written out directly in curried form.
+Then the `sq` and `p1` variables are set to functions and passed into
+`compose` one at a time. We could repeat the process and evaluate
+`compose(sq_p1)(sq_p1)(3)`. We could also pass the lambdas in
+directly as arguments without ever giving them names,
+
+```
+real(real) sq_p1
+  = compose((real x) { return x^2; })
+           ((real u) { return u + 1; });
+```
+
+#### Dangling references
+
+Closures capturing local variables bring the risk of dangling
+references.
+Consider the following example:
+
+```
+transformed data {
+  real(real) f;
+  {
+    real y = 3;
+    f = (real u) { return u + y; };
+    real a = f(5);  // OK: y = 3, so a = 8
+  }
+  real b = f(7);  // ERROR: y out of scope
+}
+```
+
+Recall that the inner braces define a local scope; as soon as the last
+statement executes, local variables go out of scope and have undefined
+values. The risk in using closures to capture local variables is that
+the closure may survive beyond the variable to which it refers if the
+closure is saved by assignment or returned from a function.
+
+In the above example, when `f(5)` is evaluated in the local scope, the
+value of `y` is available, with value 3. This allows the inner term
+`u + y` to be evaluated in the function, resulting in a value for `a`
+of 8. In stark contrast, the evaluation of `f(7)` outside of the
+scope for `y` results in undefined behavior---the value of `y` is no
+longer available from the inner scope as we have returned from that
+scope. The value of `f` persists outside of the local scope, but it
+contains a reference to a variable `y` that no longer exists. This
+kind of dangling reference needs to be avoided.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+## Type system
+
+The underlying type system for Stan relies on a notion of runtime
+typing of all objects. The runtime type does *not* include any
+constraints. The constraints are used for bounds checking when values
+are read or constructed and for transforms for parameters.
+
+Sized types are used for block declarations and unsized types are used
+for function arguments. Local variables are currently sized, but in
+the future will be allowed to be unsized.
+
+
+#### Unsized runtime types
+
+The set of unsized runtime types is the least set of types such that
+
+* *unsized primitive type*: `int` and `real` are unsized types,
+* *unsized vector type*: `vector` and `row_vector` are unsized types,
+* *unsized matrix type*: `matrix` is an unsized type,
+* *unsized array type*: `T[]` is an unsized type if `T` is an unsized type, and
+* *unsized function type*: `T0(T1,...,TN)` is an unsized type if `T0`, `T1`, ..., `TN`
+are unsized types.
+
+
+
+#### Covariance and contravariance
+
+If we wanted to get fancy, we could use the polarity of type
+occurrences in function types to extend assignability. For the base
+case, we have `int` as a subtype of `real` in the sense that we can
+assign an `int` expression to a `real` variable but not vice-versa.
+Covariant typing would extend this as follows.
+
+* `int` is a subtype of `real`,
+* `T[]` is a subtype of `U[]` if `T` is a subtype of `U`, and
+* `T0(T1, ..., TN)` is a subtype of `U0(U1, ..., UN)` if
+  `T0` is a subtype of `U0`, `U1` is a subtype of `T1`, ..., `UN` is a
+  subtype of `TN`.
+
+The type `T` in the array type `T[]` is covariant in that it preserves
+subtyping. The type `T` in `T(U)` is similarly covariant, but `U` is
+contravariant, in that it reverses the subtyping relationship. As a
+concrete example, consider legal types for `a` (the lvalue) and `b`
+(the rvalue) in `a = b`,
+
+| lvalue type  | legal rvalue types                                  |
+|:------------:|:---------------------------------------------------:|
+| `int(int)`   | `int(int)`, `int(real)`                             |
+| `real(real)` | `real(real)`, `int(real)`                           |
+| `int(real)`  | `int(real)`                                         |
+| `real(int)`  | `real(int)`, `real(real)`, `int(real)`, `int(int)`  |
+
+
+#### Sized types
+
+The set of sized runtime types is the least set of types such that
+
+* *sized primitive type*: `int` and `real` are sized types,
+* *sized vector type*: `vector[K]` and `row_vector[K]` are sized
+types if `K` is a non-negative integer,
+* *sized matrix type*: `matrix[M, N]` is a sized type if `M` and `N` are
+  non-negative integers,
+* *sized array type*: `T[N]` is a sized type if `T` is a sized type
+  and `N` is a non-negative integer, and
+* *sized function type*: `T0(T1,...,TN)` is a sized type if `T0`, `T1`, ..., `TN`
+are sized types.
+
+
+## Lambda Expression Syntax
+
+Syntactically, a lambda expression is represented as a function argument
+list followed by a function body. For example, in `(real u) { return
+1 / (1 + exp(-u)); }`, the function argument list is `(real u)` and
+the function body is `{ return 1 / (1 + exp(-u)); }`.
+
+As a shorthand, a function argument list followed by an expression is
+taken to be shorthand for the function argument list followed by a
+function body returning that expression. For example, the lambda
+expression
+
+```
+(real u) { return 1 / (1 + exp(-u)); }
+```
+
+may be replaced with the equivalent
+
+```
+(real u) 1 / (1 + exp(-u))
+```
+
+Lambda expressions behave like other expressions such as `2 + 2`, only
+they denote functions rather than values and have function types.
+
+
+## Changes to Standard Functions
+
+Allow functions to be defined in any scope. Allow function
+definitions and lambda expressions to
+capture variables in scope by reference.
+
+Deprecate the functions block; functions declared there can be
+declared in the transformed data block.
+
+Standard function definitions of the form
+
+```
+T0 f(T1 x1, ..., TN xN) { ... }
+```
+
+may be replaced with
+
+```
+T0(T1, ..., TN) f = (T1 x1, ..., TN xN) { ... };
+```
+
+## Variable declarations
+
+Variable declarations for functions follow the usual syntax, with a
+sized type required for block variables, an unsized type for function
+arguments, and either a sized (as of 2.20) or unsized (future) type
+for a local variable declaration.
+
+
+## Assignment
+
+Nothing changes conceptually about assignment. The right-hand side
+type must still be assignable to the left-hand side type, which, for
+functions, means they have the same function type. The only question
+is whether to require strict matching of types or support full
+covariance and contravariance.
+
+## Implementation
+
+Lambdas can be mapped directly to C++ lambdas with default capture by
+reference (`[&]`). This will work because the scope of the compiled
+C++ is the same as that of the Stan program.
+
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+Like all new features, this requires more coding, testing,
+documentation, and ongoing maintenance.
+
+Higher-order programming is hard. If we use too much of it, we risk
+alienating users who are looking for something simpler.
+
+There is no I/O for types with functions in them, as there is no way
+to save a function. This will complicate various test programs and
+I/O programs involving typed variables.
+
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+This is really two proposals, one for closures (for capturing
+variables in scope) and one for lambdas (for anonymous functions).
+Although they naturally go together, it would be possible to adopt
+closures without lambdas or vice-versa.
+
+Disallowing lambdas would complicate standard functional idioms like
+maps, which rely on simple inline anonymous functions.
+
+Disallowing closures requires passing all values to higher-order
+functions as arrays, along with the packing and unpacking that
+entails.
+
+The main unresolved design issues are whether to capture everything by
+reference or by value, and whether to support proper covariant and
+contravariant typing for containers and functions.
+
+# Prior art
+[prior-art]: #prior-art
+
+Most major languages support both lambdas and closures. I'll survey
+the ones that are most relevant to our project in that they'll be the
+most familiar to our users.
+
+#### C++
+
+The proposal here follows the
+[C++11 style of lambdas and closures](https://en.cppreference.com/w/cpp/language/lambda)
+both syntactically and semantically. The sublanguage for specifying
+unsized types directly mirrors the type syntax of C++11.
+
+C++ allows an explicit specification of whether variables are captured
+by reference or by value; the proposal here is equivalent to having
+the capture specification at the front of the lambda expression be
+`[&]`, which specifies that all automatic variables used in the body
+of the lambda be captured by reference. Automatic variables are those
+without explicit capture declarations; all variables in this proposal
+for Stan will behave as automatic variables.
+
+#### R
+
+In R, the expression `function(u) { return(1 / (1 + exp(-u))) }`
+defines the inverse logit function. It can be assigned to a variable,
+e.g.,
+
+```
+inv_logit <- function(u) { return(1 / (1 + exp(-u))); }
+theta <- inv_logit(0.2)
+```
+
+Unlike C++ or Stan, R requires the outer parentheses for returns. R
+provides a convenient shortcut that the body of a function returns its
+last evaluated expression if there is no return statement, so the
+inverse logit function can be rewritten as
+
+```
+inv_logit <- function(u) { 1 / (1 + exp(-u)) }
+```
+
+We can go one step further and drop the braces, as they are only there
+to allow a sequence of statements to be grouped,
+
+```
+inv_logit <- function(u) 1 / (1 + exp(-u))
+```
+
+This is the abbreviated syntax we recommend here for Stan lambdas,
+where the equivalent would be
+
+```
+real(real) ilogit = (real u) { return 1 / (1 + exp(-u)); };
+```
+
+or in abbreviated form,
+
+```
+real(real) ilogit = (real u) 1 / (1 + exp(-u));
+```
+
+R uses the unusual mechanism of dynamic lexical scoping, meaning that
+a lambda will capture whichever variable exists in its environment
+when executed. This can lead to non-deterministic behavior of
+variable scopes at runtime. This proposal for Stan is to use the more
+traditional approach of static lexical scoping, which is what is used
+in C++.
+
+#### Python
+
+Python allows lambdas such as `lambda u : 1 / (1 + exp(-u))`.
+Multiple-argument functions may be written as
+`lambda x, y : (x**2 + y**2)**0.5`.
+Python captures variables by reference in lambdas, as the following
+example demonstrates.
+
+```
+>>> x = 3
+>>> f = lambda y : (x**2 + y**2)**0.5
+>>> f(4)
+5.0
+>>> x = 10
+>>> f(4)
+10.770329614269007
+```
+
+This is the same capture-by-reference that is used by R and this
+proposal for Stan, as well as the behavior for C++ if the capture
+specification takes all implicit captures by reference (`[&]`).
+
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+1. Will Stan permit potentially risky capture of local variables and
+   function arguments? If it is allowed (as it presumably will be,
+   as it's very limiting not to), is there a way to flag cases
+   that might be potentially dangerous and provide warnings?
+
+2. What should the syntax look like? This proposal just follows the
+   C++ syntax for both types and closures.
+
+3. Should variables be captured by value instead of reference? That
+   avoids them changing later and avoids dangling references, but it
+   also requires copies and is counterintuitive for those used to
+   R or Python.
+
+4. Should full covariance and contravariance for function and array
+   types be required for assignment (including function calls)?