Specific optimizations
Many useful optimizations are "specific" in the sense that they are tied to particular standard operators. The original plan for these was to inline them aggressively and then cull branches based on type inference etc. Unfortunately experience has shown that the resultant code size increase is nontrivial and degrades both compile and run time performance. Also, some optimizations do not make sense in this framework, e.g. if there are infinitely many possible "branches"; this is the case with constant-folding-related optimizations, for example.
Cleavir should probably have a way of enabling clients to define and use these kinds of optimizations, while of course not requiring anything. I don't know what way that is right now. In SBCL, the general operator is deftransform; this rewrites a function call by returning a lambda expression, which is compiled into a function, and then that function replaces the call's original callee. This would violate HIR's environment-independence and result in AST compilation after HIR has already been produced, so I would like to avoid it if at all possible, but I'm not sure what could replace it in general, short of writing code to produce HIR manually, which is unpleasant to do. In some (most?) cases the function is essentially static, though, in which case it could just be defined in the environment (examples below).
If coerce is given a constant type, which is almost certainly the usual case, it can avoid doing any kind of parsing or dispatch of the type at runtime. (This mostly precludes it being sensitive to type redefinitions, but most types used with coerce are standard and immutable anyway.)
For function, float etc., complex etc., and character, the coercion is "static" in the above sense, e.g. you could rewrite (coerce x 'character) as (character x) rather than generating custom code.
This is also true of sequence types, except that "An error of type type-error should be signaled if result-type specifies the number of elements and object is of a different length." which rather complicates things. This could possibly be dealt with by rewriting into a call that takes the type's sequence length as an argument rather than generating custom code.
Like most functions taking a type specifier as an argument, this kind of optimization can mostly be done by a compiler macro, except that that makes it difficult to use constant propagation rather than just looking for literal constants. It might be permissible to rely on only dealing with literals, but it doesn't sit well with me, as macro code can sometimes produce forms that look like (let ((#:gensym '...)) (coerce ... #:gensym)), and I don't like the idea of programmers having to keep a quirk of the compiler in mind.
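For concreteness, here is a minimal sketch of the compiler-macro approach, with exactly that limitation: only a few "static" result types are handled, and only when the type is a quoted literal. (Only an implementation, not a portable program, can legitimately install a compiler macro on a standard function.)

```lisp
(define-compiler-macro coerce (&whole form object result-type)
  (if (and (consp result-type) (eq (first result-type) 'quote))
      (case (second result-type)
        (character `(character ,object))
        ;; (float x) with no prototype returns x unchanged if it is already a
        ;; float, which matches COERCE's behavior for the FLOAT result type.
        (float `(float ,object))
        (single-float `(float ,object 1.0f0))
        (double-float `(float ,object 1.0d0))
        (otherwise form))
      form))
```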
Another general problem with dealing with type specifier arguments is that they're necessarily dependent on the global environment, and we'd like to avoid having to deal with the global environment in HIR. It might be worthwhile to support a model where clients rewrite functions with type specifier arguments to call a parse-type or whatever, so that that kind of processing can be localized to one function.
It is possible that clients will have an internal representation of types to facilitate analysis of them, so that (subtypep x y) is along the lines of (%subtypep (parse-type x) (parse-type y)). In that case it could be nice to constant-fold the parse-type operation when one of the arguments to subtypep is constant, which I think is reasonably common. Type redefinition raises concerns here, though.
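Here is a hedged sketch of that, with stand-in definitions for the hypothetical parse-type and %subtypep internals; when an argument is a constant type specifier, its parse is hoisted to load time rather than redone on every call.

```lisp
;; Stand-ins for the hypothetical client internals discussed above.
(defun parse-type (specifier)
  specifier)                     ; a real client would build its internal representation here
(defun %subtypep (ct1 ct2)
  (declare (notinline subtypep)) ; make sure this calls the real SUBTYPEP
  (subtypep ct1 ct2))

;; When an argument to SUBTYPEP is a constant type specifier, parse it once at
;; load time. Only the two-argument case is handled; an explicit environment
;; argument makes us punt.
(define-compiler-macro subtypep (&whole form type1 type2 &optional (env nil env-p))
  (declare (ignore env))
  (flet ((maybe-parse (type)
           (if (constantp type)
               `(load-time-value (parse-type ,type) t)
               `(parse-type ,type))))
    (if env-p
        form
        `(%subtypep ,(maybe-parse type1) ,(maybe-parse type2)))))
```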
If the type given to typep is a constant, which is pretty usual, it could be parsed at compile time, as with subtypep, and the operation to test the type computed at compile time.
typep is obviously closely related to type inference in two senses: first, type inference can be made aware of branches on typep and use that information accordingly; second, inferred type information can simplify type tests. For example, if x is known to be a fixnum, (typep x '(integer M N)) can just be a range test without actually looking at the type. This latter optimization might still be doable with upfront inlining. SBCL seems to delay typep code generation until very late in the compilation process in order to make use of type information, but again, I would like to avoid compiling any more Lisp code after the AST has been produced.
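For illustration (hand-written target code, not something Cleavir emits today): when x is already known to be a fixnum, the whole test collapses to a bounds check.

```lisp
;; What (typep x '(integer 0 255)) can become when X is known to be a fixnum:
;; no type dispatch at all, just a range test.
(defun in-byte-range-p (x)
  (declare (type fixnum x))
  (<= 0 x 255))
```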
A client will likely rewrite (funcall x ...) into (cleavir-primop:funcall (resolve-function-designator x) ...) via simple means like compiler macros, so optimizations of funcall mostly come down to optimizations of whatever resolve-function-designator operator a client has. Possibly this will be defined along the lines of (defun resolve-function-designator (fdesignator) (etypecase fdesignator (function fdesignator) (symbol (fdefinition fdesignator)))). Type inference can be relevant, or just constant propagation, e.g. for (funcall #'(setf ...) ...), which is quite common. Upfront inlining could work here.
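A minimal sketch of that rewrite as a compiler macro, assuming the cleavir-primop package is loaded and resolve-function-designator is defined as above; the #'... case skips designator resolution entirely, which is what makes the (funcall #'(setf ...) ...) pattern cheap.

```lisp
(define-compiler-macro funcall (fdesignator &rest arguments)
  (if (and (consp fdesignator) (eq (first fdesignator) 'function))
      ;; (funcall #'f ...) or (funcall #'(setf f) ...): the designator is
      ;; already a function form, so no resolution is needed at run time.
      `(cleavir-primop:funcall ,fdesignator ,@arguments)
      `(cleavir-primop:funcall (resolve-function-designator ,fdesignator)
                               ,@arguments)))
```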
It may be possible to use a specialized entry point, since the number of arguments is known.
If the two arguments can have their types inferred and are disjoint, the test can be eliminated. This is a case where inlining is probably not the way to go, as even with standard, non-dependent types there is obviously a combinatorial explosion of possibilities.
Cleavir might be able to propagate information about equality constraints in flow analysis, which could also eliminate calls.
A client may wish to issue a style warning for undefined behavior if one or both arguments to a human-written call to eq are numbers or characters.
eql is similar to eq, plus:
A call to eql where one or both arguments are known to be of a type for which eq and eql have the same results could be rewritten as a call to eq. The type of "eq-comparable" arguments is at least (not (or character number)), but might include some numbers or characters in different clients. (For example, in Clasp, fixnums, single floats, and characters are also "eq-comparable".)
A call to eql where both arguments are characters could be rewritten as char=, and so on and so forth with other types.
Number boxing might also be relevant here, e.g. a call where both arguments are double floats, in a client where doubles are boxed, could be rewritten to unbox and do a direct comparison, possibly allowing some box operations to be elided (of course, equality comparisons on floats aren't the most hygienic thing to begin with).
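A sketch of how a client might express this kind of strength reduction once type inference has established the argument types; *eq-comparable-type* and simplify-eql are hypothetical names, and the default eq-comparable type is the conservative one from above.

```lisp
(defparameter *eq-comparable-type* '(not (or character number))
  "Types for which EQL and EQ agree; a client like Clasp would widen this.")

(defun simplify-eql (type-x type-y)
  "Return the name of a predicate equivalent to EQL for arguments of the given
inferred types, or EQL if no simplification applies."
  (cond ((or (subtypep type-x *eq-comparable-type*)
             (subtypep type-y *eq-comparable-type*))
         'eq)
        ((and (subtypep type-x 'character) (subtypep type-y 'character))
         'char=)
        (t 'eql)))

;; (simplify-eql 'symbol 't)            => EQ
;; (simplify-eql 'character 'character) => CHAR=
```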
equal is similar to eql, plus:
SBCL rewrites (equal x "") as (and (stringp x) (zerop (length x))) (plus consideration for constant propagation and multiple evaluation); this might be useful.
More type-specialized predicates are possible, e.g. bit vector or pathname equality.
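A sketch of the (equal x "") rewrite above as a compiler macro an implementation could install; a gensym keeps the non-constant argument from being evaluated more than once, and only the literal empty-string case is handled.

```lisp
(define-compiler-macro equal (&whole form x y)
  (flet ((empty-string-test (arg)
           (let ((g (gensym "ARG")))
             `(let ((,g ,arg))
                (and (stringp ,g) (zerop (length ,g)))))))
    (cond ((equal y "") (empty-string-test x))
          ((equal x "") (empty-string-test y))
          (t form))))
```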
equalp is similar to equal, plus (equalp x #()) can be rewritten similarly, and there are more specialized tests possible (e.g. string-equal, or using = for numbers).
Can be inlined at source level. Probably doesn't require Cleavir support beyond general optimizations relating to avoiding consing of closures.
See sequence functions, below.
if-if elimination would be good.
Similar to typep. Some processing may be done by the client (e.g. to recognize disjoint types), and it would be good to allow them to use the same processing here as for typep.
Doesn't need to cons or involve a function call. This can be done currently using cleavir-primop:multiple-value-setq.
multiple-value-bind with one variable can be rewritten as let. This comes up a lot in macroexpansions of setf.
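As a sketch, the rewrite is simple enough to state as a source-to-source function (simplify-single-value-mvb is just an illustrative name):

```lisp
;; A MULTIPLE-VALUE-BIND of exactly one variable becomes a LET; anything that
;; doesn't match is returned unchanged.
(defun simplify-single-value-mvb (form)
  (if (and (consp form)
           (eq (first form) 'multiple-value-bind)
           (= (length (second form)) 1))
      (destructuring-bind (op (var) values-form &body body) form
        (declare (ignore op))
        `(let ((,var ,values-form)) ,@body))
      form))

;; (simplify-single-value-mvb
;;  '(multiple-value-bind (new) (compute x) (setq x new)))
;; => (LET ((NEW (COMPUTE X))) (SETQ X NEW))
```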
Similar to funcall, plus: may be rewritable as a more conventional call if the value counts from the argument forms can be inferred.
In cases where the result of multiple-value-prog1 only needs a fixed number of values, there is no need to cons a variable amount of space.
Can be rewritten in terms of cleavir-primop:values to avoid a function call or consing a &rest list.
SBCL rewrites (values-list (list ...)) as (values ...). I'm not sure how often that would be useful. Presumably it can arise in computer-written code?
When n is constant, this can use multiple-value-bind instead of multiple-value-call or the like.
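Assuming this refers to nth-value, a sketch of the rewrite as a compiler macro: for a small constant n, bind n+1 variables and return the last, avoiding multiple-value-call or a runtime values list.

```lisp
(define-compiler-macro nth-value (&whole whole n form)
  (if (and (integerp n) (<= 0 n 10))   ; only bother for small literal N
      (let ((vars (loop repeat (1+ n) collect (gensym "V"))))
        `(multiple-value-bind ,vars ,form
           (declare (ignore ,@(butlast vars)))
           ,(car (last vars))))
      whole))
```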
Type inference is obviously helpful in arithmetic, as is box/unbox elimination. I'm not going to explain every possible use of those.
Also, speaking for myself, I sometimes get a bit lost thinking of all the possible arithmetic transformations. You can get to "sufficiently smart"/uncomputable territory pretty quickly. So this is a bit short.
Upfront inlining can lead to really poor performance in arithmetic due to the combinatorial explosion effect, plus the simple ubiquity of arithmetic operations.
If, for example, all types of the arguments of an arithmetic operation can be inferred tightly enough to know that the contagion rules mean the result will be a double float, conversions could be inserted and then unboxed double float arithmetic used. Alternately, operands of similar type could be grouped, e.g. rational computations could be done first, and then the result made into a float and combined for the other operations.
An n-ary operation with multiple constants in it could have the arithmetic on those constants carried out at compile time.
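A sketch of what that folding could look like for +, working on the argument forms (fold-constant-addends is an illustrative name); note that it ignores float contagion and assumes reordering operands is acceptable, so a real client would need to be more careful.

```lisp
;; Fold the literal-number operands of an n-ary + at compile time, leaving the
;; other operands alone, e.g. (x 1 y 2) => (+ 3 X Y).
(defun fold-constant-addends (argument-forms)
  (let* ((constants (remove-if-not #'numberp argument-forms))
         (others (remove-if #'numberp argument-forms))
         (sum (apply #'+ constants)))
    (cond ((null others) sum)
          ((eql sum 0) `(+ ,@others))
          (t `(+ ,sum ,@others)))))

;; (fold-constant-addends '(x 1 y 2)) => (+ 3 X Y)
```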
There's a question of whether it would be better to rewrite multiple-argument calls to n-ary functions into associative pairs at source level, or to wait until this kind of analysis can be done. SBCL seems to do the former except that it looks for constants early. The latter might be easier to support but I'm not sure.
The decision of what to do, if anything, should of course be the client's.
Rewriting IR for this might not be too bad.
Complicated integer arithmetic operations may be compilable more simply if only low bits are needed, as in e.g. (mod ... power-of-two) or (logand ... power-of-two-minus-one). Specifically, there may be no need to cons bignums, or to check for overflows.
SBCL's manual describes its optimization here as follows: «If the compiler sees an expression of form (logand exp mask), where exp is a tree of such "good" functions and mask is known to be of type (unsigned-byte w), where w is a "good" width, all intermediate results will be cut to w bits». This doesn't appear to be totally accurate, though: it's more like it looks at the result type of the logand (using its general type inference facilities), and if it's a bounded integer type, rewrites the arguments to logand. It may be interesting to see if Cleavir can implement something like this in a less specific way, for example by figuring out which bits of the result of an arithmetic expression are actually used.
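For illustration, the sort of source this targets (a hand-written example, not an actual Cleavir transformation): since the result is masked to 16 bits, none of the intermediate results need to be bignums or checked for fixnum overflow.

```lisp
;; Because the result is cut to 16 bits by the LOGAND, the multiplication and
;; addition can be done in word-sized modular arithmetic with no bignum
;; consing and no overflow checks.
(defun checksum-step (acc byte)
  (declare (type (unsigned-byte 16) acc)
           (type (unsigned-byte 8) byte))
  (logand #xFFFF (+ (* acc 31) byte)))
```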
It might also be possible to do some more exotic things, like noticing and optimizing modular exponentiation.
Multiplication and sometimes division by constants can be profitably rewritten as shifts and adds. This may require introducing new constants in IR, although probably just immediate fixnums.
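For example, multiplication by 10 decomposes as below; a compiler would do this on IR rather than in source, but the arithmetic is the same.

```lisp
;; (* x 10) as shifts and adds: 10 = 8 + 2, so x*10 = (ash x 3) + (ash x 1).
(defun times-ten (x)
  (+ (ash x 3) (ash x 1)))
```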
Beyond the basic type stuff, constants can be used to rewrite calls, e.g. (expt x 3) as (* x x x) or (expt x 4) as (let ((y (* x x))) (* y y)). IR rewriting could get difficult.
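A source-level sketch of the constant-exponent case, expanding small positive integer exponents by repeated squaring; as said, doing the equivalent on IR is the harder part.

```lisp
(define-compiler-macro expt (&whole form base power)
  (if (and (integerp power) (<= 2 power 16))
      (let ((x (gensym "BASE")))
        (labels ((build (n)
                   (cond ((= n 1) x)
                         ((evenp n)
                          (let ((y (gensym "SQUARE")))
                            `(let ((,y ,(build (/ n 2)))) (* ,y ,y))))
                         (t `(* ,x ,(build (1- n)))))))
          ;; e.g. (expt x 4) expands to nested squarings equivalent to
          ;; (let ((y (* x x))) (* y y)).
          `(let ((,x ,base)) ,(build power))))
      form))
```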
Several of the cons functions have similar keyword arguments to sequence functions so there are similar opportunities there.
Upfront inlining is probably fine for a lot of functions in this chapter, since all that needs to be tested is null versus cons versus other, most of the time. But it might sometimes be useful to track list lengths during type inference, and use that information to e.g. decide whether to inline and unroll the loop of an assoc call or something.
Upfront inlining runs into problems here because a client may have a dozen or more specialized array types.
Calls to make-array can be rewritten into more specialized client-specific calls not using keyword arguments. This can probably mostly be done with compiler macros, but there are some constant propagation issues, and type inference for the first argument could be very important (e.g. if it's known to be an integer, a call can be rewritten into an internal make-vector or the like). Also, it can be good to be able to process the :element-type specifier, if provided, at compile time.
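A hedged sketch of the compiler-macro side of this; make-vector% is a hypothetical stand-in for whatever specialized constructor a client actually has, and only the simplest (make-array n :element-type 'TYPE) pattern with a quoted literal type is recognized.

```lisp
;; MAKE-VECTOR% stands in for a hypothetical client-internal constructor that
;; is handed an already-upgraded element type.
(defun make-vector% (length element-type)
  (make-array length :element-type element-type))

(define-compiler-macro make-array (&whole form dimensions &rest keys)
  ;; Rewrite only (make-array n :element-type 'TYPE); everything else, and in
  ;; particular any other keyword arguments, is left to the full MAKE-ARRAY.
  (if (and (= (length keys) 2)
           (eq (first keys) :element-type)
           (consp (second keys))
           (eq (first (second keys)) 'quote))
      `(make-vector% ,dimensions
                     ',(upgraded-array-element-type (second (second keys))))
      form))
```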
Clients may want to optimize the case where the old and new array dimensions are each a single integer, and stuff like that. Depending on how arrays are implemented, it's possible that shrinking an array can be done very quickly by simply reducing the length value stored in the array, and it could be possible, if somewhat exotic, to notice this case with interval arithmetic during type inference.
Using type inference gets a bit complicated here, since the inferred type of the array, in concert with the number and nature of the indices, determines what can be done. For example, if the array's dimensions are known at compile time, code to compute the row-major index quickly can be generated: for an array A of type (array * (x y)), (aref A a b) = (row-major-aref A (+ b (* a y))). When you include insertion of bounds checking code this could end up as a pretty complex rewrite, and this is of course on top of the possibility of using an inferred array element type.
They can be rewritten to not use keyword arguments, and in somewhat fancier ways if one argument is a constant (e.g. check length first, possibly using type inference information).
SBCL rewrites (if (string/= ...) ...) so that the mismatch index doesn't need to be computed. This could be nice in concert with constants, where the check might not be the obvious loop.
A lot of sequence functions take stereotypical keyword arguments parsed in nontrivial ways, which could be dealt with at compile time.
Takes a type specifier argument, so there are some optimizations similar to those on coerce. Actually, make-sequence with :initial-contents may be somewhat rewritable as coerce, or vice versa.
Ditto on the type specifier argument.
If the sequence lengths can be inferred ahead of time, runtime checks on them could be skipped. And of course their types could be used to determine how to iterate. It would be very interesting if Cleavir could facilitate this kind of transformation in a way that made extensible sequences more efficient.
It might be possible to avoid hash recomputation for serial calls with the same arguments. For example, there's the reasonably common idiom (or (gethash object table) (setf (gethash object table) ...)). This might be better done at runtime, though, since there are some really tricky possibilities, like the hash depending on the size of the table, which could change in the interim.