
\documentclass{mtmtcl}

\newcommand{\cs}[1]{\texttt{\PrintChar{92}#1}}
% \newcommand{\tsplode}{\texttt{%
%   \PrintChar{123}\kern-0.1em*\kern-0.1em\PrintChar{125}%
% }}

\theoremstyle{plain}
\newtheorem{policy}{Policy}

\theoremstyle{remark}
\newtheorem*{example}{Example}

% 
% \usepackage{amsmath,amsfonts,amssymb}
% \usepackage{amsthm}
% 
% \usepackage{longtable}
% 
% \theoremstyle{definition}
% \newtheorem*{definition}{Definition}
% \theoremstyle{remark}
% \newtheorem*{remark}{Remark}
% 
% 
% \makeatletter
% \NewDescribeCommand{\defining}{%
%    \XD@grab@oarg\XD@grab@sarg{*}\XD@grab@oarg\XD@grab@marg
% }{4}{%
%    \IndexEntry{%
%       \ifx \NoValue#1%
%          \LevelSame{\ifx\NoValue#3#4\else#3\fi}%
%       \else
%          \LevelSorted{#1}{\ifx\NoValue#3#4\else#3\fi}%
%       \fi
%    }{main}{\thepage}%
%    \textbf{#4}%
%    \@gobble % Eats \ignorespaces
% }
% \makeatother
% 
% \PageIndex
% \CodelineNumbered
% \setcounter{IndexColumns}{2}
% 
% \newenvironment{procmethod}{%
%    \tclsubcommand{method}{submethod}%
% }{\endtclsubcommand}
% 
% 
% % \newcommand{\Z}{\ensuremath{\mathbb{Z}}}
% \newcommand{\Z}{\texttt{Z}}
% 
% \newcommand*{\mw}[1]{\word{$#1$}}
% 
% \newcommand{\mtl}{\texttt{mathematcl}}
% \newcommand{\Tcl}{\Tcllogo}
% \newcommand{\TclObj}{\Tcllogo\_Obj}
% \newcommand{\TclObjs}{\TclObj s}
% 
% \providecommand{\Ldash}{---}
% \providecommand{\Rdash}{---}
% \providecommand{\Dash}{---}

\begin{document}

\title{\mtl~interfaces}
\author{Lars Hellstr\"om}
\date{2006-06-21--}
\maketitle


\begin{abstract}
  A key part of the \mtl\ system is the use of \emph{interfaces} 
  for structures as a way of making them work together. This 
  document is the primary collection of specifications of these 
  interfaces.
\end{abstract}

\tableofcontents


\section{Introduction}

The \mtl\ system for algebraic computations is designed to meet the 
following goals:
\begin{itemize}
  \item
    It should be reasonably easy to program. 
    \iffalse
    and in particular not 
    constantly burden the mathematician's mind with issues of how 
    data is being stored.
    \fi
  \item
    It should be open to new concepts and new implementations\Dash 
    built-in operations should not receive preferential treatment or 
    display behaviour that is otherwise unattainable.
  \item
    It should be able to manage the many levels of abstraction that 
    occur in algebraic constructions.
  \item
    The system should be portable.
%   \item
%     \dots
\end{itemize}
To meet these goals, the following design decisions have been made:

\begin{policy} \label{Pol:Strukturer}
  Focus is on mathematical \emph{structures} and the \emph{interfaces} 
  these support. Generic algorithms should be expressed not in terms 
  of concrete implementations, but in terms of abstract structures 
  provided as parameters. Structures expose their capabilities and 
  useful properties by declaring support for particular interfaces.
\end{policy}

Basic examples of interfaces are `ring', `group', `lattice', and so 
on. Examples of structures are $\mathbb{Z}$~(ring, additive group, 
lattice under $\max$ and $\min$), $\mathbb{C}$~(field, ring, additive 
group), $\mathbb{N}$~(semiring, additive monoid, multiplicative 
monoid, divisor lattice, max/min lattice), $\mathbb{Z}[x]$~(ring etc.), 
$\mathbb{Z}[x]\big/ \langle x^2 - 2\rangle$~(ditto), $\mathbb{Z} + 
i\mathbb{Z}$~(Gaussian integers: Euclidean domain, ring etc.), 
$\mathrm{M}_n(\mathbb{R})$~(real $n \times n$ matrices: ring etc.), 
$\mathrm{GL}_n(\mathbb{R})$~(invertible real $n \times n$ matrices: a 
group, subset of $\mathrm{M}_n(\mathbb{R})$, etc.), and so on. 
What should be apparent from these examples is 
that many structures naturally support more than one interface; it is 
seldom useful to ask for a name describing all that a structure is, 
but generally quite sufficient to pick an interface and ask whether 
the structure supports it. The case of the natural numbers should 
also make it clear that there is often more than one way in which 
a set can be turned into a structure supporting a particular 
interface; the ways in which one can reinterpret the basic 
mathematical structures are surprisingly numerous.

What this means in practice is that in order to multiply two 
elements $a$ and $b$ of some algebra $\mathcal{C}$ one does not 
write `$a$ times $b$', but rather `$\mathcal{C}$-multiply $a$ and 
$b$'; cf.~the notation \(a \cdot_{\mathcal{C}} b\) that is 
sometimes used when one needs to clarify that this is the 
multiplication operation of~$\mathcal{C}$. Provided that (references 
to) structures such as this `$\mathcal{C}$' can be passed around as 
easily as elements of structures\Ldash which is essentially the 
`provided as parameters' in the policy statement, and explained in 
more detail below\Rdash this is a cheap and effective way of 
explaining to the computer what operation is desired. Equally 
important is the fact that it scales well: that it can handle 
hundreds of structures as easily as one. Both are necessary to meet 
the goal of handling multiple levels of abstraction, since a level 
$n$ structure is usually constructed from one or several level $n-1$ 
structures, which in turn are constructed from level $n-2$ 
structures, and so on; the implementation of a basic operation in 
a high level structure can easily exercise dozens of operations in 
lower level structures.

\begin{example}
  The field of rational numbers $\mathbb{Q}$ is typically implemented 
  as the field of fractions of the ring of integers $\mathbb{Z}$. 
  This means every rational number is a pair of integers. Even for 
  such an ``easy'' operation as addition in $\mathbb{Q}$, it is 
  necessary to compute a common multiple of the denominators of the 
  terms and extend all the fractions accordingly, which requires at 
  least multiplication in $\mathbb{Z}$. One furthermore typically 
  wants to use as small numbers as possible in these calculations, 
  and then it is also necessary to compute quotients and GCDs in 
  $\mathbb{Z}$.
  
  Consider next the ring of $n \times n$ matrices over $\mathbb{Q}$. 
  One addition in this ring is $n^2$ separate additions in $\mathbb{Q}$. 
  Multiplication in the matrix ring requires both the multiplication 
  and addition operations in $\mathbb{Q}$. Going further, one can 
  consider the subgroup $\mathrm{SL}_n(\mathbb{Q})$, and even the 
  corresponding group algebra $\mathbb{Q}\bigl[ 
  \mathrm{SL}_n(\mathbb{Q}) \bigr]$. There are now four different 
  multiplication operations involved in this definition, so the 
  computer would have a lot to choose from if we didn't tell it which 
  one of these we mean in each case.
\end{example}
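
To make the operation count for addition in $\mathbb{Q}$ concrete 
(the symbols $p$, $q$, $r$, $s$, and $g$ are introduced here for 
illustration only), let \(g = \gcd(q,s)\). One common way of arranging 
the computation is
\[
  \frac{p}{q} + \frac{r}{s} =
  \frac{p \cdot (s/g) + r \cdot (q/g)}{q \cdot (s/g)} ,
\]
which uses one GCD, two exact quotients, three multiplications, and 
one addition in $\mathbb{Z}$. For instance, \(\frac{1}{6} + 
\frac{1}{3}\) has \(g = 3\) and yields \(\frac{1 \cdot 1 + 1 \cdot 
2}{6 \cdot 1} = \frac{3}{6}\), so a further GCD and two further 
quotients are needed to reach the reduced form $\frac{1}{2}$.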

In ordinary mathematical writing the structure name 
is usually omitted\Ldash e.g.~when each particular argument only 
involves one multiplication operation anyway\Rdash but in higher 
algebra the normal state of affairs is rather that there are 
several multiplication operations that have to be distinguished. 
A system for algebraic computation must deal with e.g.~the fact 
that the multiplication in a semigroup algebra is defined in 
terms of the multiplications of the underlying semigroup and 
coefficient ring.

\begin{policy}
  There are no ``blessed'' structures, only standard implementations 
  provided for convenience, and anyone is free to provide 
  alternatives. Constructions should not rely on details of a 
  structure that are not publicly declared through an interface.
\end{policy}

This is part of the \emph{openness} goal. Policy~\ref{Pol:Strukturer} 
relates more to the goal of managing many levels of abstraction, 
since the very act of relying on declared interfaces implies that 
all lower levels are ignored. Portability and ease of programming are 
the subject of the next policy.


\begin{policy} \label{Policy:Tcl}
  The main computing environment shall be a \Tcl~(Tool Command 
  Language) interpreter.
\end{policy}

Formally, the \emph{main computing environment} is merely that in 
which independent pieces of code are combined. It \emph{may} be 
thought of strictly as a generic dispatch mechanism (that just happens 
to contain a complete programming language too) for interfacing with 
other people's code (possibly written in some quite different language 
than what you're using), but you may also find that it serves your 
actual computing needs quite well.

\Tcl\ is both syntactically and semantically an extremely simple 
language\Ldash yet expressive and highly flexible\Rdash which makes 
it easy to use in this context. Unlike the majority of contemporary 
programming languages, the syntax of \Tcl\ is not descended from that 
of traditional mathematical formulae (hence it will probably seem 
unfamiliar the first time it is encountered), but when it comes to 
programming this is mostly an improvement; the syntax of mathematical 
formulae is actually very complicated and it is often a matter of 
convention rather than grammar how a nontrivial formula like $\sin 
2x$ or $\sin 2\ln x$ should be understood. In addition to this 
simplicity, the \Tcl\ syntax also provides an elegant solution to 
the problem of distinguishing between several similarly-named 
operations that can be a major headache when describing complex 
algebraic constructions.

Some may find it a concern that \Tcl\ is not a heavily optimised or 
optimising language, in the sense that a procedure call in \Tcl\ is 
not nearly as fast as a corresponding function call in C, but that 
is less important than one might think. The main reason is the above 
observation that one level~$n$ operation usually boils down to 
several level~$n-1$ operations, each of which in turn boils down to 
several level~$n-2$ operations, and so on: the level~$n$ 
operation steps are \emph{massively} outnumbered by the level~$0$ 
operation steps. 
This means dispatch (or even execution) efficiency at level $n$ has a 
rather small impact on overall performance; what matters is 
efficiency at level $0$ and $1$. And the nice thing about \Tcl\ here 
is that it can just as well use implementations in some other (more 
optimised) language for those low level operations. The idea is not 
to use the same \Tcl\ source code for implementing the level $0$ and 
$1$ structures while somehow swapping in other implementations of 
primitives, but to replace the entire implementation of selected 
time-critical level $0$ and $1$ structures by an optimised one in 
e.g.~C, and reap the performance benefits of this.


%     
% \begin{itemize}
%     
%   \item
%     
%     
%     
%     
%     ---
%     
%     
%     
%   \item
%     The system should \emph{not} employ a type system to interpret 
%     operations. Instead, it is required that every operation is fully 
%     identified independently of the data it is to operate on.
%     
%     The main use for type systems in computer languages has been to 
%     reserve memory for data storage, but modern high level languages 
%     generally allocate memory dynamically when needed, freeing the 
%     programmer from having to manage allocation explicitly. 
%     A secondary use for a type system that has grown more common in 
%     the last decades is to resolve overloaded (polymorphic) 
%     operations, where the same symbol is used for several operations 
%     and the language environment picks one whose type-labelling 
%     matches that of the operands. It is primarily the latter practice 
%     that this decision rejects.
%     
%     The runtime rationale for this decision is that type-controlled 
%     polymorphism inserts an impractical detour into the very heart of 
%     the computing environment. The programmer implementing an 
%     algorithm always knows \emph{exactly} (up to parameters defining 
%     the context for the computation) which operation is intended at 
%     every point in the program, and this is also what the computer 
%     needs to know to execute the program. 
%     Relying upon type-controlled polymorphism when instructing the 
%     computer will however mean that a lot of this information is 
%     omitted or provided only indirectly, which requires constant 
%     detours into type-matching to resolve the ambiguity created by 
%     the overloading.
%     
%     The coding time rationale is that the system becomes much simpler 
%     if the administrative overhead of assigning formalised types to 
%     everything can be dropped. There is however also a coding time 
%     cost in that polymorphism is often exploited to produce more 
%     compact source code; the explicit identification of operations 
%     must not become tedious.
% \end{itemize}


\section{Remarks on the computational environment}

This section collects some introductory notes on \mtl\ from a 
programming perspective. Readers who (for the moment) are more 
interested in the mathematical perspective may prefer to skip ahead. 
Readers who are interested in the philosophical considerations 
underlying \mtl\ should however pay attention.


\subsection{A \Tcl\ primer}

At its conceptual core, a \Tcl\ interpreter is a dispatch engine: it 
processes a sequence of calls, each of which has the form of a list of 
values, by interpreting the first value of each call as the name of a 
command and then handing the call over to (the function implementing) 
that command, getting back a result that may become one of the values 
in a later call. Commands may be implemented natively in \Tcl\ 
(procedures, ensembles, aliases, etc.), in C (as is the case with many 
core commands, for speed or because they need to interface more 
closely with the OS), or in a variety of other languages such as C++ 
and Fortran (if you can link it with a reasonably modern C, then you can 
use it with \Tcl). A classical development paradigm is that one 
tries to implement commands in \Tcl\ first (which typically allows for 
rapid prototyping, particularly of interfaces) and maybe later 
reimplements critical parts in a lower level language if the \Tcl\ 
implementation turned out to not be performant enough. (Another 
time-tested approach, which is appropriate when the wanted operations 
are already available in some code library, is to provide a thin 
wrapper for that library which makes its functionality available as 
\Tcl\ commands.)

The \Tcl\ \emph{language} syntax describes how a \Tcl\ program piece 
(or \emph{script}) is translated to a sequence of calls. Technically, a 
script consists of a sequence of \emph{sentences}, which are 
separated by newlines (or semicolons), and each sentence consists of 
a sequence of \emph{words}, which are separated by ordinary 
whitespace. The basic idea is that each sentence codes one command 
call, and each word of that sentence contributes one element to the 
list of values making up that call, but there is (as explained below) 
some room for bending those rules.
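
As a minimal sketch of the basic rule (the strings written are 
arbitrary and chosen for illustration only), the script
\begin{quote}
  |puts stdout hello|\\
  |puts stdout world; puts stdout again|
\end{quote}
consists of three sentences of three words each. Every sentence 
becomes one call whose first value, |puts|, names the command; the 
remaining two values tell that command to write the given string to 
standard output.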

Words in \Tcl\ programs tend to fall in five functional categories:
\begin{description}
  \item[Variable references]
    A word like `|$foo|', where the first character is a dollar sign 
    `\texttt{\$}' and the rest is the name of a variable, is a 
    reference to that variable. Its contribution to the command call 
    is the current value of that variable.
    
  \item[Barewords]
    A word simply denotes itself if it does not contain any of the 
    characters |{}[]$\"#;| (left and right brace, left and right 
    bracket, dollar, backslash, quotation mark, hash sign, and 
    semicolon) 
%     `|{|' (left brace), `|}|' (right brace), `|[|' (left 
%     bracket), `|]|' (right bracket), `|$|' (dollar), `|\|' 
%     (backslash), `|"|' (quotation mark), `|#|' (hash sign), 
%     `|;|'~(semicolon), 
    or whitespace; such \emph{barewords} typically serve as 
    identifiers\slash symbols or explicit constants. Note that the 
    characters allowed in barewords include `|+|', `|-|', 
    `|*|', `|/|', `|=|', `|<|', `|>|', `|!|', and many more, in 
    particular all the non-ASCII characters in Unicode! This often 
    makes it possible to use the ``proper'' mathematical symbol when 
    denoting something, whereas in more traditional languages the 
    wanted character would be unsupported or appropriated for 
    some built-in functionality of the language.
    
  \item[Recursive calls]
    A word like `|[bar baz $foo]|'\Ldash where the first character is a 
    left bracket `\texttt{[}', the last character is the matching 
    right bracket `\texttt{]}', and what is between them is a 
    complete sentence of its own\Rdash is a recursive command call. 
    This first causes the call coded by the inner sentence to be 
    performed, and then the value it returns will be used as the 
    contribution from this word to the call of the outer sentence.
    
    Hence, whereas a function call in traditional mathematical 
    formulae looks like `$f(x,y,z)$'\Ldash the main pattern being 
    function--left parenthesis--arguments--right parenthesis\Rdash 
    in \Tcl\ it rather becomes `\texttt{[$f$ $x$ $y$ $z$]}', i.e., 
    left bracket--function--arguments--right bracket.
    
  \item[Strings]
    A word like `|"My hovercraft is full of eels."|'\Ldash where the 
    first and last characters are (straight ASCII) quotation marks\Rdash 
    typically denotes a string; the effect of the initial quote is that 
    the word does not end until another quote is encountered, so 
    e.g.~spaces and semicolons temporarily lose their roles as 
    separators. `|$|', `|[|', and `|\|' are still special however, 
    triggering variable, command, and backslash substitution 
    respectively, in which the substitution character sequence is 
    replaced by some other piece of text in the string being 
    constructed. The first two behave as explained above (variable 
    references and recursive calls) whereas backslash substitution is 
    like character escapes in strings in~C: literal quotes, left 
    brackets, etc.~in the string can be written as `|\"|', `|\[|', 
    `|\$|', and `|\\|'.
    
  \item[Inlined verbatim material]
    A word that begins with a left brace `|{|' and ends with the 
    matching right brace `|}|' can be used to inline material that is 
    passed on verbatim in the call; the value for this word is the 
    exact string of characters between (but not including) the 
    delimiting braces. The target command is then free to apply any 
    interpretation it sees fit to the word in question. This 
    mechanism can be used to embed code following syntactic 
    principles other than those of \Tcl\Ldash such as ordinary infix 
    form mathematical expressions, regular expressions, or SQL 
    statements\Rdash into \Tcl\ scripts. It is part of the \Tcl\ 
    philosophy that one should not force everything into one 
    syntactic mold, but instead let each thing be expressed in the 
    manner which suits it best.
    
    Brace-delimited words are also used to embed \Tcl\ scripts into 
    \Tcl\ scripts; most \Tcl\ commands implementing basic control 
    structures, such as |if|, |for|, |while|, and |proc|, expect one or 
    several arguments to be \Tcl\ scripts that they recursively call 
    the \Tcl\ interpreter to have evaluated when the functionality 
    they implement requires this! That way, the overall language 
    syntax need not have special cases for control structures built 
    in, and it is possible for user-defined commands to implement 
    custom control structures. Visually, the effect can often be that 
    braces in \Tcl\ appear to be block statement constructors, 
    similarly to braces in C and \textbf{begin}--\textbf{end} in 
    Pascal. In
    \begin{quote}
      |set sum 0.0|\\
      |foreach row $matrix {|\\
      |   foreach cell $row {|\\
      |      set sum [expr {$sum + $cell}]|\\
      |   }|\\
      |}|
    \end{quote}
    the two outer levels of braces delimit inner scripts (bodies for 
    |foreach| loops), whereas the innermost brace level delimits an 
    arithmetic expression in infix form, this latter interpretation 
    being imposed by the |expr| command that receives the expression 
    as argument.
\end{description}
(Technically, the parsing of words is a bit more unified than the 
above may have suggested, but for the exact story we refer to the 
Tcl.n manpage.)
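
The five categories can be seen side by side in the following small 
sketch, in which the variable names |foo| and |bar| are arbitrary:
\begin{quote}
  |set foo 3|\\
  |set bar [expr {$foo * 2}]|\\
  |puts $bar|\\
  |puts "foo=$foo, bar=$bar"|\\
  |puts {foo=$foo, bar=$bar}|
\end{quote}
Here |set|, |foo|, and |3| are barewords, |[expr {$foo * 2}]| is a 
recursive call whose single argument is a brace-delimited expression, 
and |$bar| is a variable reference. The last two sentences differ only 
in the delimiters of their final word: the quoted word undergoes 
substitution and should be printed as |foo=3, bar=6|, whereas the 
braced word is passed on verbatim and should be printed as 
|foo=$foo, bar=$bar|.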

The slight bending of the rule that one sentence corresponds to one 
call is that brackets may embed recursive calls, as explained above. 
The slight bending of the rule that one word contributes one element 
to the call is due to a mechanism known as \emph{expansion}. If a 
word begins with the three-character sequence `|{*}|' (brace, 
asterisk, brace), then the rest of that word first gives rise to a 
value as explained above, but then that value is interpreted as a 
list and each element of that list is contributed as a separate 
element to the call being constructed. A word such as `|{*}$list|' or 
`|{*}[foo $bar]|' thus contributes zero or more values to the 
surrounding sentence. This has several applications in common \mtl\ 
idioms.
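
A small sketch of expansion, using only core commands (the variable 
names |nums| and |prefix| are arbitrary):
\begin{quote}
  |set nums {3 1 2}|\\
  |puts [llength [list $nums]]|\\
  |puts [llength [list {*}$nums]]|\\
  |set prefix {string repeat}|\\
  |puts [{*}$prefix ab 3]|
\end{quote}
Without expansion the word |$nums| contributes a single value, so the 
first |llength| reports |1|; with expansion it contributes three 
values, so the second reports |3|. The last sentence shows a pattern 
in which a list stored in a variable supplies both the command name 
and its leading arguments: it performs the call |string repeat ab 3| 
and should print |ababab|.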

In order to parse a \Tcl\ program, one must in addition to the 
overall language syntax also be aware of the syntaxes of the 
particular commands used in the program, which by convention are 
documented in separate manpages (section \texttt{n} in the Unix man 
system). Some important commands, and some commands whose existence 
may be hard to guess, are:
\begin{longtable}{l p{0.7\linewidth}}
  \texttt{set}& Assign new value to a variable.\\
  \texttt{proc}& Define a procedure\Dash a command that executes a 
    subroutine whose body is a \Tcl\ script.\\
  \texttt{if}& Conditional controlled by boolean expression.\\
  \texttt{switch}& Multiway-choice conditional.\\
  \texttt{foreach}& Loop over the elements of a list, 
    or synchronously over several lists.\\
  \texttt{expr}& Expressions in C-like infix notation.\\
  \texttt{incr}& Increment (or decrement) integer variable.\\
  \texttt{info}& Introspection into the \Tcl\ interpreter.
\end{longtable}
Some notable functional areas and commands related to these are:
\begin{longtable}{l p{0.7\linewidth}}
  Flow control&
    \texttt{break}, \texttt{catch}, \texttt{continue}, 
    \texttt{error}, \texttt{eval}, \texttt{for}, \texttt{foreach}, 
    \texttt{if}, \texttt{return}, \texttt{switch}, \texttt{tailcall}, 
    \texttt{throw}, \texttt{try}, \texttt{uplevel}, \texttt{while}
    \\
  Variable scoping&
    \texttt{global}, \texttt{upvar}, \texttt{variable}\\
  Files and I/O&
    \texttt{chan}, \texttt{close}, \texttt{eof}, \texttt{exec}, 
    \texttt{fblocked}, \texttt{fconfigure}, \texttt{fcopy}, 
    \texttt{file}, \texttt{fileevent}, \texttt{flush}, \texttt{gets}, 
    \texttt{glob}, \texttt{open}, \texttt{pid}, \texttt{puts}, 
    \texttt{read}, \texttt{seek}, \texttt{socket}, \texttt{tell}
    \\
  Lists&
    \texttt{concat}, \texttt{join}, \texttt{lappend}, 
    \texttt{lassign}, \texttt{lindex}, \texttt{list}, 
    \texttt{llength}, \texttt{lmap}, \texttt{lrange}, 
    \texttt{lrepeat}, \texttt{lreplace}, \texttt{lreverse}, 
    \texttt{lsearch}, \texttt{lset}, \texttt{lsort}, \texttt{split}
    \\
  Dictionaries&
    \texttt{dict}, \texttt{array}
    \\
  Strings&
    \texttt{append}, \texttt{format}, \texttt{join}, \texttt{regexp}, 
    \texttt{regsub}, \texttt{scan}, \texttt{split}, \texttt{string}, 
    \texttt{subst}
    \\
  Binary data&
    \texttt{binary}, \texttt{encoding}, \texttt{string}, 
    \texttt{zlib}
\end{longtable}
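
To illustrate how some of these commands combine, here is a minimal 
sketch of Euclid's algorithm as a procedure; the name |gcd| is chosen 
for this example and is not a core command.
\begin{quote}
  |proc gcd {a b} {|\\
  |   while {$b != 0} {|\\
  |      lassign [list $b [expr {$a % $b}]] a b|\\
  |   }|\\
  |   return [expr {abs($a)}]|\\
  |}|\\
  |puts [gcd 12 18]|\\
  |puts [info args gcd]|
\end{quote}
The two |puts| sentences should print |6| and |a b| respectively, the 
latter being an instance of introspection: |info args| reports the 
parameter names of a procedure.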

Finally, it should be observed that \Tcl\ has a concept of 
\emph{namespaces} which affect resolution of command (and nonlocal 
variable) names. The namespaces form a hierarchic structure similar 
to a file system, but with the two-character sequence `|::|' as 
separator rather than forward or backward slash. The position 
of a command or variable in the namespace hierarchy generally 
reflects to what package (or subpackage) it belongs. The global (i.e., 
root) namespace is |::|. Command or variable names that begin with 
|::| are interpreted relative to the global namespace, whereas names 
that do not are interpreted relative to the current namespace. All the core 
commands listed above reside in the global namespace, but by 
tradition their names are usually written without the leading |::| in 
scripts since \Tcl\ will try to resolve a name relative to the global 
namespace if no definition of it could be found relative to the 
current namespace. Control of and introspection into namespaces is 
provided by the |namespace| command.
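
As a brief sketch of these resolution rules (the namespace |::demo| 
and the procedure |hello| are hypothetical names used only here):
\begin{quote}
  |namespace eval ::demo {|\\
  |   proc hello {} {|\\
  |      return "hello from [namespace current]"|\\
  |   }|\\
  |}|\\
  |puts [::demo::hello]|\\
  |puts [namespace eval ::demo {hello}]|
\end{quote}
Both |puts| sentences should print |hello from ::demo|: the first 
names the procedure by its fully qualified name, whereas the second 
resolves the relative name |hello| within |::demo|. The unqualified 
|proc|, |return|, and |namespace| inside the namespace are not defined 
there and are therefore found in the global namespace.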

% The namespaces form a tree-like hierarchy, where 
% The  is call

% Informally, a \Tcl\ program (or \emph{script}) is a sequence of 
% \emph{command sentences}\Ldash typically one per line, although there 
% is variance in both directions\Dash and a sentence consists of 
% \emph{words} which are separated by whitespace. The style of these 
% sentences can range from the cryptic:
% \begin{quote}
%   |while {$b} {set b [expr {$a % [set a $b]}]}|
% \end{quote}
% via shell-like:
% \begin{quote}
%   |fconfigure $socket -encoding utf-8 -eofchar "\x19"|
% \end{quote}
% to almost pseudocode:
% \begin{quote}
%   |pick z maximizing realpart from points|
% \end{quote}
% but as far as the language is concerned, all of these are merely 
% lists of words, the first of which is the name of a command. Beyond 
% that, it is up to that command to interpret and use the remaining 
% words as it sees fit. This allows e.g.~`|*|' to be both a binary 
% infix operation (times) in numeric expressions and a unary postfix 
% operation (zero or more) in regular expressions; neither is part of 
% the overall language syntax, so it is entirely up to the command 
% whether to make such an interpretation of the `|*|' character. 
% \Tcl\ was designed to \emph{not get in the way} of programmers 
% needing to express some domain-specific concept, and will happily 
% allow several ``little languages'' (of which numerical expressions 
% and regexps are two built-in examples) to coexist within a single 
% interpreter.
% 
% What the language syntax \emph{does} control is how a script is split 
% up into sentences, how these are split up into words, and what the 
% contents of these words will be. Within each sentence, there are 
% three interacting processes: grouping, substitution, and expansion.
% 
% \emph{Grouping} collects text into a word, even if it contains whitespace 
% characters which would otherwise act as word or sentence separators. 
% The characters which can trigger grouping are `|"|', `|{|', and 
% `|}|', and they appear as delimiters at the beginning and end of the 
% word, with the contents of the word consisting of all characters in 
% between. A word begun by a quote ends with the next quote (that has 
% not been escaped), whereas a word begun by a left brace ends at the 
% matching right brace; the contents of a brace-delimited word have to 
% be balanced with respect to (unescaped) left and right braces. In 
% practice, quotes are often used to delimit ``ordinary'' strings, 
% whereas braces are used to delimit blocks of code or data, but once a 
% word has been parsed, it will have no memory of whether it was 
% quote-, brace-, or just plain whitespace-delimited.
% 
% The difference between the two lies instead in their relation to 
% \emph{substitution}, which is inhibited in brace-delimited words but 
% carried out otherwise. The three types of substitution are:
% \begin{description}
%   \item[Variable substitution,]
%     which is triggered by the dollar sign `|$|'.
%     , which should be 
%     followed by the name of a variable. 
% \end{description}
% 
% ---
% 
% syntactically a sequence of 
% \emph{command sentences} and comments, separated by newlines. Each 
% sentence is a sequence of \emph{words}, separated by whitespace 
% (typically one or several spaces, but tabs are fine too). Words can 
% be arbitrary strings. Semantically, the first word of each sentence 
% is interpreted as the name of the command to execute, whereas the 
% other words are passed to that command as arguments to interpret 
% and process in whatever way it sees fit. Each command produces 
% a result, and the result of the last command it the result of the 
% script as a whole.
% 
% What makes this a Turing-complete programming language is that there 
% are commands for all the usual elementary operations\Dash in 
% particular there are commands for the basic control structures. 
% Unbounded repetition can for example be achieved through the |while| 
% command, which has the syntax
% \begin{displaysyntax}
%   while \word{expression} \word{script}
% \end{displaysyntax}
% When this command is executed\Ldash or \emph{evaluated}, as is the 
% official term in \Tcl\Rdash the \word{expression} and \word{script} 
% are alternatingly evaluated, with the command completing as soon as 
% the \word{expression} value is not boolean true. Other control 
% structures available as core commands include |if|~(if--then--else 
% choice), |for|~(loop with control variable), |foreach|~(loop over 
% list elements), |switch|~(multiway choice), |break|~(loop abortion), 
% and |proc|~(subroutine creation).
% For such \word{script} arguments to be interesting, it must however 
% be possible to embed newlines and spaces in it (since otherwise the 
% \word{script} above would only be a single sentence of one word). 
% To that end, there are three mechanisms which change the words of a 
% command sentence en route from script-string to sequence of words: 
% quoting, substitution, and expansion.
% 
% Quoting collects a piece of text into one word, overriding characters 
% which would otherwise force a word or sentence boundary. The most 
% important form of quoting is brace-quoting, which happens for words 
% where the first character is `|{|' (left brace), the last character 
% is `|}|', and the material between them is balanced with respect to 
% braces;\footnote{
%   Braces preceeded by a backslash don't count when balancing however, 
%   so you can get an unmatched brace if you need to. Even though the 
%   technical details are quite different, the net effect is 
%   very similar to that in \TeX\ where \cs{\{} and \cs{\}} don't 
%   count as braces that have to be balanced.
% } in this case the command argument becomes exactly the string 
% of characters between the outermost braces. This is the main 
% mechanism for nesting ``blocks'' of \Tcl\ code, since it inhibits 
% substitution in ``inner'' commands.
% 
% Substitution replaces a piece of text\Ldash often, but not always, an 
% entire word\Rdash by something else. There are three types of 
% substitution, which differ in where they get the replacement text 
% from and which character triggers them.
% \begin{description}
%   \item[Command substitution]
%     uses the return value of a script. It is triggered by a left 
%     bracket |[| and the script to evaluate is terminated by the 
%     matching right bracket |]|. This mechanism corresponds to 
%     function calls in most other languages, although the syntax is 
%     not the conventional
%     \begin{equation}
%       f(x_1,x_2,x_3,\dotsc,x_n)
%     \end{equation}
%     but rather
%     \begin{displaysyntax}
%       [f $x_1$ $x_2$ $x_3$ $\dots$ $x_n$]
%     \end{displaysyntax}
%     (Putting the ``function name'' inside the bracket with the 
%     arguments may seem odd, but we shall see that it is rather 
%     useful.) An extreme application of command substitution is the 
%     one-liner
%     \begin{quote}
%       |puts [join [lsort [split [read stdin] \n]] \n]|
%     \end{quote}
%     which works similarly to the Unix standard utility |sort|: |read|s 
%     standard in, |split|s it into a list of lines, sorts this list 
%     (|lsort|), |join|s the list back into a string, and out|puts| the 
%     result to standard out.
%     
%   \item[Backslash substitution]
%     is triggered by a backslash `|\|' and functions very much as in 
%     strings in the C~language: as a mechanism for escaping special 
%     interpretations of characters, and as a mechanism for expressing 
%     arbitrary characters using only those found in ``visible ASCII''. 
%     The |\n| above will thus be seen as a linefeed character by 
%     |join| and |split|, whereas any character with special syntactic 
%     meaning in \Tcl\ (backslash `|\|', dollar `|$|', braces `|{|' and 
%     `|}|', brackets `|[|' and `|]|', number sign `|#|', semicolon 
%     `|;|', and the various forms of whitespace\Dash \emph{that's the 
%     entire list!}) can be escaped into an ordinary character by 
%     prepending a backslash. The combination backslash--newline is a 
%     special case in that it counts as an unescaped space rather than 
%     as a newline character, but this makes it convenient to express 
%     sentences with too many words to fit on one line: 
%     
%     ---
%   
%   \item[Variable substitution]
%     uses the value of a variable. It is triggered by the |$| 
%     character, which is followed by the name of the variable to 
%     substitute.
%     
% \end{description}
% It should be observed that replacement text ``inserted'' by a 
% substitution will be passed on verbatim to the command; it will in 
% particular not be subjected to another round of substitution, even if 
% it contains backslashes, dollars, or brackets. Nor does whitespace in 
% substituted material count as word or sentence separators; if 
% something looks as one word before substitution, then it will 
% count as one word in the command sentence no matter what is 
% substituted.
% 
% 
% 
% 
% 
% ---

\subsection{The issue of a type system}

One thing that readers with computer science training tend to be 
curious about is what type system \mtl\ employs. Mathematicians, on 
the other hand, are not so inclined to worry about this and should 
therefore feel free to skip this subsection on a first reading. 
The simple truth is that \mtl\ doesn't have a type system. It doesn't 
need one anyhow, and there is evidence to suggest that adding a type 
system would mostly be harmful! Since these are claims that tend to be 
shocking for the average computer scientist, they need some 
elaboration.

To begin with, it should be observed that the standard foundation of 
mathematics\Ldash Zermelo--Fraenkel set theory and its variants\Rdash 
is untyped. The closest to a ``type'' in the programming language sense 
that you get in it is that one can recognise certain sets as being 
the domains of various functions (meaning it is okay to apply said 
function to an element of that set), but there is none of that business of 
values being associated to one specific type that is frequently presumed 
in computer science. Hence mathematics as such is untyped, or if you 
will one-typed: everything is a set, period. For a system for mathematics 
to be otherwise would therefore be a deviation from the established 
standard.

The historically versed reader may here object that it has not always been 
the case that the standard foundation was untyped. The 
set theory in the Russell--Whitehead \emph{Principia Mathematica} was 
certainly typed (it is even known as ``the type theory of sets''). 
In Greek antiquity (or at least that period during it which became 
authoritative for subsequent ages), a quantity had to be specified as 
either length, area, volume, or whatever; there was no canonical way in 
which e.g.~an area and a length could have comparable ``numerical 
values'', so they were of different types.
% is indeed true. 
Type distinctions of this kind have however a strong tendency to dissolve 
over time, as the cost of maintaining them (which can be found in the 
form of frequent extra manoeuvres in arguments to work around the 
type boundaries) is rarely accompanied by any advantages whatsoever. 
In particular the type theory of sets has, to the 21st-century observer, 
the distinct smell of a failed high-profile standard that collapsed under 
its own weight: too impractical and complicated to see much in the area 
of actual implementations, but nonetheless taken seriously and being 
pursued because of its association with big names and high-profile 
projects. There is no need for us to repeat that mistake today.

% Historically, that is 
% however an oversimplification: there was actually a short period when 
% the standard foundational set theory was one with types (more precisely 
% the type theory of sets in the Russell--Whitehead \emph{Principia 
% Mathematica}), but this never really caught on. For the 21st-century 
% observer, this set theory with types has the distinct smell of a failed 
% high-profile standard that collapses under its own weight: too impractical 
% and complicated to see much in the area of actual implementations, but 
% nonetheless taken seriously and being pursued because of its association 
% with big names and high-profile projects. Mandatory typing was 
% abandoned when more lightweight alternatives emerged as equally 
% consistent.

% One important problem with doing serious 
% mathematics on top of the Russellian set theory is that one 
% frequently encounters situations where everything works out except 
% that the types of two things don't match, and considerable effort has 
% to be spent on adjusting the types, whereas in an untyped setting one 
% would be done already.

Another problem with types is that many programming languages use 
them as a work-around for having limited vocabularies: rather than 
having separate names for what is technically distinct (even if 
analogous) operations, multiple operations are bundled together under 
the same name in the source, and there is a presumption that the 
programmer's exact intentions can be recovered later by analysing the 
operand types. There is no doubt that this to a great extent is 
inspired by generally accepted practices for writing mathematical 
formulae\Ldash in which addition or subtraction always denotes the 
operation suitable for the given operands (be they numbers, vectors, 
matrices, functions, or whatever) and multiplication can (depending 
on context) denote almost every operation imaginable\Rdash but that 
fails to take into account the formality gap between the two 
situations. A computer program is always fully formal (even if some 
languages may allow a relaxed-looking syntax) because it is meant 
to be parsed by a dumb mechanism. Mathematical formulae, on the other 
hand, are primarily meant to be read by humans and may therefore skip 
plenty of fine details as long as the intelligent reader will be able 
to fill them in. It is true that there is a substantial body of 
mathematical notation that pretty much all agree upon (and 
which thus could be formalised to everyone's content), but this unity 
quickly evaporates once one starts to venture into notational regions 
that are traversed chiefly by specific subdomains of mathematics; as 
a particular example, one may note that computer science and 
differential geometry have very different ideas about such basic 
concepts as function and argument! The matter is further complicated 
by the fact that many mathematical formulae admit multiple 
fundamentally distinct interpretations, often coming from quite 
different theories each offering its own understanding of the topic, 
and the equal validity of different interpretations can be deep 
theorems (nonstandard analysis, anyone?). In formal mathematical 
logic, where mathematical formulae \emph{are} required to be fully 
formal, symbols are as a rule specified using exact and complete names 
throughout.

This does not mean types are useless as theoretical devices. Much of 
type theory as a field of computer science is really about proving 
simple statements that can be phrased in the form `$x$ is-a $T$', 
where the ``type'' $T$ is some collection of claims that may or 
may not hold true for the ``value'' $x$. The interface specifications 
introduced in the next subsection can be read as providing a great deal 
of information that can be phrased in this form. That \mtl\ still 
does not have a type system in this epistemic sense is primarily 
because all ``type'' information contained in the interface 
specifications is \emph{informal}; there is no formal language in 
which one has to encode all typing relations. Maybe at some point in 
the future there will emerge some language in which one can 
conveniently encode all the type information a computing system 
could reasonably want to know, and which a compiler would then use to 
produce heavily optimised code or a static analyser could use to 
verify program correctness, but that remains to be seen; in 
particular correctness will, in the problem domain for which \mtl\ is 
designed, frequently depend on quite nontrivial mathematical theorems 
and thus for practical purposes be beyond what can be automated anyway. 
For the moment, it would be extremely premature to try to design a 
formal specification language for \mtl, as the main result of 
deploying any given language would probably only be that users' 
imaginations are constrained by having to fit their designs to 
what the specification language supports.

One final aspect of type is the \emph{encoding} aspect, in which a 
type is taken as a specification of how some set of values are 
encoded in terms of some lower-level data model (such as the bits of 
computer memory, characters in a text file, or bytes in a binary file). 
This is an aspect which \mtl\ cannot avoid, but decisions in this area 
tend to be quite local; each package decides for itself how it prefers 
to encode the values it operates on, and other packages generally treat 
these values as opaque units. Since many encoding problems are very 
far from being original however, it is useful to collect a body of 
example solutions to the most common problems (see 
Subsection~\ref{Ssec:TclValues}), and one could argue that this 
constitutes a sort of type system, even if it is of course 
not mandatory. In addition, a package may choose to publicly specify 
the encoding of some data, and for that as well it is useful to have a 
common body of data encoding principles and techniques for package 
authors to employ. Some suggestions along these lines for improving 
interoperability between packages can be found in 
Subsection~\ref{Ssec:StandardDataEncodings}.


% Another, more conceptual application of types in many modern 
% programming languages is for selecting definition of a polymorphic 
% operation; practically the same token is used as name of several 
% different operations, and the types (which in some cases are 
% compile-time properties of variables and in other runtime properties 
% of values) of the operation arguments are used to select an exact 
% definition. This idea was probably taken from ordinary mathematical 
% notation, where it is common that the intended interpretation of an 
% operation symbol (or lack thereof, as in $xy$) depends on what the 
% operands happen to denote; $G-u$ means one thing if $G$ and $u$ are 
% numbers, another thing if they are vectors, and something very 
% different if $G$ is a graph and $u$ is a vertex in $G$. While this 
% is absolutely standard, one should nonetheless observe that it really 
% is a shorthand notation; the standard formalisation of mathematics in 
% mathematical logic requires that operation with different definitions 
% are given distinct names. In particular the distinction between 
% `element' and `set' used to justify shorthands such as $f(S)$ for 
% $\left\{ f(x) \bigm\vert x \in S\right\}$ when $S$ is a set break 
% down when it is taken into account that mathematics is constructed on 
% top of set theory, since \emph{everything} is a set\footnote{
%   Or in some axiomatisations: a class; the same argument applies 
%   for all kinds of collections.
% } in standard set theory; if one were to actually take that shorthand 
% seriously as a rule of definition, one would find that every function 
% was defined for every mathematical object, and usually not in 
% any way that makes practical sense!


% ---
% 
% 
% For applications where types are convenient, it is perfectly 
% straightforward to emulate typing within a subsystem of mathematics 
% by simply implementing the type system of one's choice, but it is 
% frequently less straightforward to rid oneself of an awkward type 
% system that has been allowed to permeate the foundations. 
% 
% 
% ---


\subsection{The \Tcl\ value model}
\label{Ssec:TclValues}

Several references have been made above to `values', without actually 
defining what these can be or how they behave. Sadly, it is quite 
common for language manuals to dodge this question, since the true 
story tends to be both complicated and not all that flattering for 
the language designers; the model is frequently ``whatever happened 
to get implemented'' rather than something thought out in advance. 
\emph{\Tcl\ is different,} in that it has a clearly defined model 
whose simplicity, flexibility, and expressive power are befitting of 
mathematics. It should however be observed that this model has two 
rather different faces: on one hand, there is the formal model 
against which program correctness should be judged, and on the other 
hand there is the underlying technology, which determines the 
computational complexity of algorithm implementations. (There have 
unfortunately been some widely disseminated, although by now rather 
old, analyses of \Tcl\ whose outdated conclusions about complexity 
are based on the formal model alone.)

The formal model is simply this: \emph{everything} (i.e., every 
value) \emph{is a string!} Every value has a unique decomposition 
into a finite sequence (of length zero or more) of characters, and 
two values are equal if and only if they decompose to the same 
sequence of characters. Commands are allowed to treat unequal 
strings as denoting equivalent values\Ldash for example 
`\texttt{0xFF}' and `\texttt{255}' are equivalent to commands 
expecting an integer\Rdash but they may not distinguish between equal 
values constructed in different ways. This is useful for testing and 
debugging, since it has the corollary that ``what you see is what it 
got'' with respect to calling of commands; an input value constructed 
by some complicated subroutine is always the same as some literal 
string. It also means that the value model maps 
straightforwardly into several fundamental models of computation, 
particularly those of more mathematical flavour such as 
recursive functions (on natural numbers) and Turing machines.
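
As a minimal sketch of these rules in actual \Tcl\ code (the variable 
name |computed| is chosen only for this illustration), one may compare 
a test for equality as strings with a test for equality as integers:
\begin{verbatim}
# Formal model: two values are equal iff their strings are equal.
puts [string equal 0xFF 255]   ;# prints 0: different strings
puts [expr {0xFF == 255}]      ;# prints 1: equivalent as integers

# A computed value is indistinguishable from the same literal.
set computed [join [list 2 5 5] ""]
puts [string equal $computed 255]   ;# prints 1
puts [expr {$computed + 1}]         ;# prints 256
\end{verbatim}
The first two commands show strings that are unequal as values but 
equivalent to a command expecting integers; the last three show that a 
value produced by a computation cannot be told apart from the literal 
|255|.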

The underlying technology instead supports two separate views of 
values: the \emph{string representation} of the formal model, and an 
\emph{internal representation} adapted to the hardware at hand. The 
canonical string representation of a number is for example in 
decimal, but the internal representation (which the arithmetic 
operations make use of) is in binary. Conversions between the two are 
lazy, meaning a particular representation is only generated if 
something asks for it. Hence most numbers in a well-written \Tcl\ 
program are born, live, and eventually die possessing only their 
binary internal representations; it is sufficient for `everything is 
a string' (EIAS) that the string representations could have been 
generated.

The set of internal representations is furthermore not closed, but 
something that can be extended: many dynamically loadable \Tcl\ 
extensions define their own internal representation for some kind of 
data they provide operations on. The only requirement is really that 
they set up two C-level functions: one for deallocating the internal 
representation for a value (called when the last reference to that 
value goes away), and one function for generating a string 
representation for a value given its internal representation, to meet 
the requirement that every value potentially can be viewed as a string. 
The string representation can be something as raw as a hexdump of the 
internal representation, but it can equally well be something more 
enduring and readable, such as an OpenMath-XML encoding of the value 
in question (in case the value is a mathematical object). 

That the internal representation still has to live up to the promises 
set by the formal model also has the implication that all values are 
immutable: there's no way to change a value. For composite kinds of 
values (lists, dictionaries, \dots), there are often operations for 
replacing or modifying a part of the value (e.g.~replace the value in 
an element of a list by another value), but what these operations 
formally do is always that they create a new value which is the same 
as the old one, except in the part that was to be modified. The great 
practical benefit of this is that if the same composite value (e.g.~a 
matrix) is used in several parts of your program, then there is no 
way that operations applied to it in one place (e.g.~row operations 
to perform Gaussian elimination) can affect the value of the matrix 
as seen in other parts of the program. This is often \emph{not} the 
case in programming languages which employ pointers or other references 
to mutable storage as the primary means of constructing composite 
values (most of the mainstream is that way, really); there it is 
instead the responsibility of the programmer (indeed, the responsibility 
of \emph{every} programmer involved \emph{anywhere} in the codebase) to 
restrain themselves to only using operations for immutable values, if 
the system as a whole is to provide proper value semantics. That is 
actually a rather tall order, and thus a rich source of potential 
bugs! But in \Tcl\ you get the correctness of immutable values by 
default.

The elementary way to internally live up to this formal requirement 
is to make a copy every time some value should be modified, but \Tcl\ 
typically does better than that. If a value is \emph{shared} 
(referenced also from somewhere else) then a copy is made before the 
internal representation is modified, but the internal representation 
of an unshared value can be modified directly, since the 
make-modified-value operation would formally also consume the last 
instance of the old value. For this reason, many make-modified-value 
commands take the name of the variable in which the value to be modified 
is stored as argument; had they only taken the value, then the 
internal representation would always be shared (one reference is 
passed to the command, and one reference continues to be held by the 
variable). Thus to set element $2$ of the list stored in variable |L| 
to |x| one would say
\begin{quote}
  |lset L 2 x|
\end{quote}
with the variable name |L| as a bareword, whereas to retrieve element 
$2$ of the list stored in variable |L| one would say
\begin{quote}
  |lindex $L 2|
\end{quote}
where the |$L| retrieves the value stored in |L|.
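
The value semantics described above can be observed directly. The 
following is only a small sketch; the variable names and the procedure 
|zap_first| are invented for the illustration.
\begin{verbatim}
set L {a b c d}
set M $L            ;# L and M now hold the same value
lset L 2 x          ;# L gets a new value; M is unaffected
puts $L             ;# prints: a b x d
puts $M             ;# prints: a b c d
puts [lindex $L 2]  ;# prints: x

proc zap_first {lst} {
    lset lst 0 ZAPPED   ;# changes only the local variable lst
    return $lst
}
puts [zap_first $M] ;# prints: ZAPPED b c d
puts $M             ;# prints: a b c d
\end{verbatim}
Whether a copy of the internal representation is actually made at the 
|lset| commands is an implementation matter that the script cannot 
observe; formally, each of them just stores a new value in a variable.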


---


\section{Interpreting interface specifications}

This section explains how to interpret the various specifications of 
interfaces that make up a significant part of the \mtl\ 
infrastructure.

A structure is supposed to be more or less the kind of thing that 
universal algebra is about, although the need to make it work in 
practice rather than just in theory means some classical tricks to 
simplify the theoretical framework cannot be used. As a particular 
example, modules are in universal algebra often regarded as an 
algebra with one unary operation for every element of the thing 
acting on the module, which leads to huge (often infinite) axiom 
systems but makes it possible to keep the theory single-sorted. A 
more pragmatic approach will have to face up to the fact that many 
algebraic structures have several sorts of elements. To that end, the 
following definition is useful.

\begin{definition}
  A structure is said to be \defining{single-sorted} if it has one sort 
  of elements (all elements are syntactically equivalent) and 
  \defining{multiple-sorted} otherwise. A structure is 
  \defining{principal} if it is either single-sorted, or multiple-sorted 
  but equipped with a distinguished sort of element (the 
  \defining{principal} sort). The \defining{principal set of elements} 
  of a structure is 
  the set of elements of the principal sort.
\end{definition}

Fairly often, it is possible to ``hide'' the nonprincipal sorts of 
elements by making them the principal sets of elements of various 
``helper'' structures, but it is not always useful to do so. 
Therefore the concept of a multiple-sorted structure is needed.

Another example of how theory might not match practice is that in 
theoretical constructions it is often sufficient to know equality (or 
congruence, depending on how one looks at it) as an \emph{implicit} 
relation (give me two elements and I can tell you whether they are 
equal), but in practice it may be much more useful to provide a 
function that computes a canonical form for every element it is given. 
This matter is addressed more in Subsection~\ref{Ssec:Equality}.


Here is an example of an interface specification (recall that a 
\emph{magma} is an algebraic structure with a binary multiplication 
operation, which is not required to be associative or anything like 
that):
\begin{quote}
  \small\leavevmode
  \begin{APIspec}{division}{1.0}
    The |division| interface specifies the existence of a binary 
    division operation |/| that is an inverse of |*|. The quotient 
    need not be defined for all pairs of elements. It probably makes 
    more sense if |*| is at least associative, but division is 
    perfectly possible to define even in other situations.
    \begin{APIdescription}{magma}
      \begin{APImethod}{=}
        \word{element} \word{element}
      \end{APImethod}
        The |=| method must satisfy \APIref+{equality}{1.0}.
      \begin{APImethod}{*}
        \word{element} \word{element}
      \end{APImethod}
        The |*| method returns an element.
      \begin{APImethod}{/}
        \word{element} \word{element}
      \end{APImethod}
        This method may throw an error, but if
        \begin{displaysyntax}
          [$M$ / $a$ $b$]
        \end{displaysyntax}
        for elements $a$ and $b$ of a \meta{magma} $M$ returns $x$ 
        then that $x$ must be an element such that the two expressions
        \begin{displaysyntax}
          [$M$ * $b$ $x$]\par
          $a$
        \end{displaysyntax}
        are |=|-equal.
        
        \begin{remark}
          In terms of a multiplicative inverse, this defines $a/b$ 
          to be $b^{-1}a$, i.e., it is a \emph{left} division. This 
          may seem unintuitive, but it makes it possible to have 
          \(a/(bc) = (a/b)/c\). Right division would have \(a/(bc) = 
          (a/c)/b\).
        \end{remark}
        
        If this method throws an error, then it should set the 
        |-errorcode| to one of the following lists:
        \begin{displaysyntax}
          API division nosolution\par
          API division unimplemented
        \end{displaysyntax}
        Use of the |nosolution| form signals that there really isn't 
        any solution $x$ to \(bx=a\). Use of the |unimplemented| form 
        merely means that division has not been implemented for these 
        arguments (e.g.~a ring of polynomials might have division 
        implemented only for monomial denominators), in which case 
        the error does not imply nonexistence of a solution.
    \end{APIdescription}
    
    Note that |/| is not required to be congruent. This is 
    because the \texttt{division} interface does not require the 
    equation \(bx=a\) to have a unique solution, and 
    ensuring that the choice of solution is independent of 
    argument representation could well be unreasonably expensive. 
  \end{APIspec}
\end{quote}
The name and version number in the left margin mark where the formal 
specification begins. The triangle marks where it ends (compare 
\qedsymbol\ in proofs).
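
To make the example more concrete, here is a minimal sketch of a 
command that is meant to follow the |division| specification for the 
integers under multiplication. The command name |Zmul| and the wording 
of the error message are invented for this illustration; only the method 
names and the |-errorcode| lists are taken from the specification above, 
and the |=| method is simply assumed to be adequate as an equality.
\begin{verbatim}
proc Zmul {method args} {
    switch -exact -- $method {
        = { lassign $args a b; return [expr {$a == $b}] }
        * { lassign $args a b; return [expr {$a * $b}] }
        / {
            lassign $args a b
            if {$b == 0 && $a == 0} { return 0 }
            if {$b != 0 && $a % $b == 0} {
                return [expr {$a / $b}]
            }
            return -code error \
                -errorcode {API division nosolution} \
                "no x with $b*x = $a"
        }
        default {
            return -code error "unknown method \"$method\""
        }
    }
}

set M Zmul
puts [$M / 12 4]                     ;# prints 3
puts [$M = [$M * 4 [$M / 12 4]] 12]  ;# prints 1
if {[catch {$M / 7 2} msg opts]} {
    puts [dict get $opts -errorcode]
    # prints: API division nosolution
}
\end{verbatim}
Since \(2x = 7\) has no integer solution, the last call throws an error 
with the |nosolution| error code; a structure that had merely not 
implemented some case would use the |unimplemented| list instead.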


\subsection{Pseudocode formulae}
\label{Ssec:Pseudokodsformler}

For many mathematical structures, it is appropriate for 
specifications to spell out the axioms that the primitive concepts of 
such structures must obey. However, since the main computing environment 
by Policy~\ref{Policy:Tcl} is a \Tcl\ interpreter, it follows that the 
only syntax for these concepts that is guaranteed to be formally defined 
is that which is exposed in this environment; more traditional 
mathematical syntaxes usually exist, but it is hard for a 
specification to rely on them. 
Therefore the axioms will generally have to be expressed using 
\emph{pseudocode formulae}, which mix traditional mathematical 
notation with some basics of \Tcl\ syntax.

The pseudocode formula syntax is essentially given by the following 
rules:
\begin{displaysyntax}
  \(\meta{formula} \longrightarrow \meta{mathematical formula} \mid 
  {}\)[\meta{sentence}]\par
  \(\meta{sentence} \longrightarrow \meta{word} \mid
  \meta{sentence}\meta{whitespace}\meta{word}\)\par
  \(\meta{word} \longrightarrow \meta{literal} \mid \meta{formula}\)
\end{displaysyntax}
Here, a \meta{sentence} is the syntactical unit corresponding to a 
\Tcl\ command call, and the brackets signal that we want to make use 
of the value returned by that call. A typical example of how such a 
\meta{formula} might look is
\begin{displaysyntax}
  [$G$ * [$G$ * $a$ $b$] $c$]
\end{displaysyntax}
where `$G$', `$a$', `$b$', and `$c$' are \meta{mathematical formula}s 
while `\texttt{*}' is a \meta{literal}; literals will be written in 
\texttt{typewriter} font.

A `\texttt{[}\meta{sentence}\texttt{]}' formula is evaluated as 
follows. First the individual \meta{word}s are evaluated, and a list 
of their values is constructed. Second, the value of the first word 
is interpreted as a \emph{command name}, and the corresponding 
command implementation is looked up. Third, the list of values is 
passed on to that command, to be interpreted in whatever way it sees 
fit, and the return value of that call is taken as the value of the 
formula as a whole. In other words, sentence formulae will formally 
be in \emph{prefix notation}; the value of a formula `\texttt{[f $x$ 
[g $y$] $z$]}' is $f(x,g(y),z)$ if $f$ and $g$ are the implementations 
of the commands named `\texttt{f}' and `\texttt{g}' respectively. 
The very simple syntax does however leave plenty of room in practice 
for deviations from this prefix notation, where the programmer deems 
that they are appropriate.
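
The three evaluation steps can be traced in a small \Tcl\ sketch. The 
procedures |f| and |g| and the variable |name| are hypothetical and 
serve only to mirror the formula `\texttt{[f $x$ [g $y$] $z$]}' above.
\begin{verbatim}
proc g {y} { return [expr {$y + 1}] }
proc f {x y z} { return [list $x $y $z] }

# The first word need not be a literal: it is evaluated like any
# other word, and its value is then looked up as a command name.
set name f
puts [$name 1 [g 2] 3]   ;# prints: 1 3 3
\end{verbatim}
Here the inner sentence |[g 2]| is evaluated to |3| before |f| is 
called, so |f| receives the three argument values |1|, |3|, and |3|.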

The trick is that every command is free to interpret its arguments in 
whatever way it wants; in particular it may expect one or several 
arguments to be \meta{literal}s that control what will be done to 
remaining arguments. The most important use of this 
possibility is the \emph{object command pattern},\footnote{
  So called because it is in \Tcl\ the traditional form of method 
  calls in object-oriented (OO) programming systems: each object is 
  a separate command that has the methods as subcommands. Different 
  methods of a single object may have wildly different syntaxes, but 
  methods of the same name for different objects of the same class 
  will normally have exactly the same syntax.
} where the actual 
command name is not explicit in the code (it is rather kept in a 
variable or function parameter) but the \emph{second} word is a 
literal name of a \emph{subcommand}; it is known that the primary 
command will perform a new round of dispatch based on its subcommand 
argument. Thus, in the example above, `$G$' is to be thought of as 
such an ``object command'' for a mathematical structure, while 
`\texttt{*}' is a subcommand name. Indeed, if $G$ is a group and 
\texttt{*} the group operation then the example above is the left 
hand side of the associativity axiom \((ab)c = a(bc)\), or \((a 
*_G\nobreak b) *_G c = a *_G (b *_G\nobreak c)\) if the operation 
symbol and dependence on the structure cannot be elided. In 
comparison to the latter, the pseudocode formula counterpart
\begin{displaysyntax}
  [$G$ * [$G$ * $a$ $b$] $c$] ${}={}$ [$G$ * $a$ [$G$ * $b$ $c$]]
\end{displaysyntax}
isn't too onerous---especially for formulae involving things that are 
not merely the associative binary operations for which traditional 
mathematical notation has been optimised.

One reservation one can have regarding the formula above is however 
that the equality relation `$=$' itself is one of the things that 
depend on the structure\Ldash indeed, a perfectly sensible 
implementation of many quotient structures is to have everything the 
same as in the numerator structure, except that the equality relation 
considers more things to be equal\Rdash and therefore the claim 
should rather be that
\begin{displaysyntax}
  [$G$ = [$G$ * [$G$ * $a$ $b$] $c$] [$G$ * $a$ [$G$ * $b$ $c$]]]
\end{displaysyntax}
has the value true for all $G$-elements $a$, $b$, and $c$. This is 
also what needs to be done when specifying more unusual structures, 
but for ordinary structures with equality the following intermediate 
formulation (of \(aa^{-1} = a^{-1}a = 1\)) is convenient:
\begin{quote}
  For all \(a \in G\), the three expressions
  \begin{displaysyntax}
    [$G$ * $a$ [$G$ inverse $a$]]\par
    [$G$ * [$G$ inverse $a$] $a$]\par
    [$G$ 1]
  \end{displaysyntax}
  are |=|-equal.
\end{quote}
The formal interpretation of this would be: \texttt{[$G$ = $x$ $y$]} 
must return true when given as $x$ and $y$ any two of those three 
expressions.
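
As a hedged sketch of how such a statement can be checked by 
brute force, consider the nonzero residues modulo~$7$ under 
multiplication. The command name |Z7| is invented for this illustration, 
and the implementation is deliberately naive: elements are kept as 
unreduced integers, and only the |=| method knows about the modulus.
\begin{verbatim}
proc Z7 {method args} {
    switch -exact -- $method {
        = { lassign $args a b
            return [expr {($a - $b) % 7 == 0}] }
        * { lassign $args a b; return [expr {$a * $b}] }
        inverse { lassign $args a; return [expr {$a ** 5}] }
        1 { return 1 }
        default {
            return -code error "unknown method \"$method\""
        }
    }
}

set G Z7
foreach a {1 2 3 4 5 6} {
    set values [list \
        [$G * $a [$G inverse $a]] \
        [$G * [$G inverse $a] $a] \
        [$G 1]]
    foreach x $values {
        foreach y $values {
            if {![$G = $x $y]} {
                error "inverse axiom fails for a=$a"
            }
        }
    }
}
puts "inverse axiom holds"
\end{verbatim}
For \(a=3\) the first expression evaluates to the string |729| whereas 
the third evaluates to |1|; these are unequal as values, but they are 
|=|-equal, which is all that the axiom asks for.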


\subsection{Command syntax specifications}

Besides the axioms characterising a mathematical structure, it is 
also necessary for an interface specification to detail the syntax 
of each subcommand. In principle this can of course be done using BNF 
rules for \meta{syntax elements} as above, but two extensions of this 
formalism turn out to be very useful, to the point that almost all 
syntaxes one needs to specify can be handled without any 
$\longrightarrow$ rules at all.

The first extension stems from the fact that all a \Tcl\ command gets 
to see is a list of words (or rather the values these words evaluate 
to)\Dash hence it is preferable to point out when an element makes up 
exactly one word. This is done by changing the angle brackets around 
an \meta{element} to braces, like so: \word{element}. This makes it 
totally clear that the |foo| command with syntax
\begin{displaysyntax}
  foo \word{bar} \word{baz}
\end{displaysyntax}
has exactly two arguments.
Because of this, the basic \meta{foo} kind of syntax element is most 
commonly used when the element is a list of items that are going to 
be inserted into the sentence as one word per item. It is however 
also used for syntax elements that may only constitute part 
of a word, such as the \meta{unsigned integer}s of 
Subsection~\ref{Ssec:InterfaceVersion} below.

The second extension is that made in EBNF~\cite[Sec.~6]{XML-spec}, 
namely that one may use the repetition and grouping constructions of 
regular expressions within a syntax description. Hence
\begin{longtable}{l@{ means }p{0.6\linewidth}}
  \word{term}\regstar& zero or more \word{term}s,\\
  \word{term}\regplus& one or more \word{term}s,\\
  \word{term}\regopt& zero or one \word{term},\\
  \word{foo} \begin{regblock} \word{bar}\regalt \word{baz} 
    \end{regblock}& 
    a \word{foo} followed by a \word{bar} or a \word{baz},\\
  \begin{regblock}[\regstar] \word{key} \word{value} \end{regblock}&
    an even number (which may be zero) of elements, which 
    alternatingly are \word{key}s and \word{value}s, beginning with a 
    \word{key},
\end{longtable}\noindent
and so on. The \regstar, \regplus, and \regopt\ in particular often 
come in handy to clarify whether an operation is binary, unary, or 
variadic.
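
On the implementation side, these constructions correspond fairly 
directly to features of the \Tcl\ |proc| command. The following sketch 
uses invented procedure names and is not part of \mtl; the comments 
indicate the syntax each procedure accepts.
\begin{verbatim}
# count_terms {term}* : "args" collects zero or more words
proc count_terms {args} { return [llength $args] }

# max_term {term}+ : one mandatory word, then zero or more
proc max_term {first args} {
    set m $first
    foreach t $args { if {$t > $m} { set m $t } }
    return $m
}

# greet {name}? : a parameter with a default is optional
proc greet {{name world}} { return "hello $name" }

# settings ( {key} {value} )* : words must come in pairs
proc settings {args} {
    if {[llength $args] % 2} {
        return -code error "expected key-value pairs"
    }
    return [dict create {*}$args]
}

puts [count_terms]        ;# prints: 0
puts [max_term 3 7 5]     ;# prints: 7
puts [greet]              ;# prints: hello world
puts [settings a 1 b 2]   ;# prints: a 1 b 2
\end{verbatim}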


\subsection{Lists in pseudocode formulae}

When writing pseudocode formulae, it is sometimes necessary to 
deal with sequences whose lengths are not fixed, precisely as in 
regular mathematical formulae. Several styles of such expressions 
exist, and the choice between them is mainly a stylistic one.

The first style is the traditional mathematical ellipsis (\dots) 
notation. One might for example say that
\begin{quote}
  For any \(a_1,\dotsc,a_m,b_1,\dotsc,b_n \in G\), the two expressions
  \begin{displaysyntax}
    [$G$ * $a_1$ $\dots$ $a_m$ $b_1$ $\dots$ $b_n$]\par
    [$G$ * [$G$ * $a_1$ $\dots$ $a_m$] [$G$ * $b_1$ $\dots$ $b_n$]]
  \end{displaysyntax}
  are |=|-equal.
\end{quote}
Here it is understood that each `$a_1$ $\dots$ $a_m$' represents a 
stretch of zero or more words, the exact number being denoted by $m$. 
If one wishes to express a stretch of one or more words, then one can 
do so as `$a_1$ $a_2$ $\dots$ $a_m$'.

The second style derives from syntax specifications as described 
above, and consists essentially in the observation that a 
\meta{quantity} within angle brackets may represent any amount of 
material, thus also a sequence of zero or more words. Expressing the 
above in this style would instead look like
\begin{quote}
  For any sequences \meta{A} and \meta{B} of elements of $G$, the 
  two expressions
  \begin{displaysyntax}
    [$G$ * \meta{A} \meta{B}]\par
    [$G$ * [$G$ * \meta{A}] [$G$ * \meta{B}]]
  \end{displaysyntax}
  are |=|-equal.
\end{quote}
This style may be more appropriate in cases where the \meta{quantity} is 
effectively opaque (i.e., one is not in the current discussion 
supposed to take it apart), such as the \meta{prefix} that implements 
a particular structure.

The third style derives from the syntax of \Tcl\ itself, and could 
arguably be considered an extension of the pseudocode formula syntax 
given in Subsection~\ref{Ssec:Pseudokodsformler}. It applies when 
some quantity is \emph{the list of} those values that one would like 
to give as separate words, and takes the form of the expand-prefix 
`\splode' followed by (the pseudocode formula for) the list in 
question. In this notation, the previous example looks like
\begin{quote}
  For any lists $A$ and $B$ of elements of $G$, the two expressions
  \begin{displaysyntax}
    [$G$ * \splode$A$ \splode$B$]\par
    [$G$ * [$G$ * \splode$A$] [$G$ * \splode$B$]]
  \end{displaysyntax}
  are |=|-equal.
\end{quote}
This is most useful when the list of these values is an 
abstraction which occurs naturally in the argument, rather than 
as something which is introduced ad hoc to build a collection.
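
In actual \Tcl\ code (version~8.5 or later) the expand-prefix is 
written |{*}|. The following minimal sketch evaluates the two 
expressions above for a toy structure. The command |demo_monoid| is a 
hypothetical stand-in (the integers under multiplication) and not part 
of \mtl; its |=| method is plain integer comparison.
\begin{verbatim}
# Toy structure (hypothetical, not part of mathematcl): the integers
# under multiplication, with a variadic * method and an = method.
proc demo_monoid {method args} {
    switch -- $method {
        *  { return [::tcl::mathop::* {*}$args] }
        =  { lassign $args a b; return [expr {$a == $b}] }
        default { error "unknown method \"$method\"" }
    }
}

set G demo_monoid
set A {2 3}
set B {5 7 11}

# The two expressions from the example, written with {*} expansion.
set lhs [{*}$G * {*}$A {*}$B]
set rhs [{*}$G * [{*}$G * {*}$A] [{*}$G * {*}$B]]
puts [{*}$G = $lhs $rhs]
\end{verbatim}
Both sides evaluate to $2310$, so the script prints |1|.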

There can conversely also be a need to have a notation for forming a 
list from separate words, if some operation requires such a list as an 
argument. One basic way of doing this is to use \Tcl's list 
construction command |list|, since
\begin{displaysyntax}
  [list $a_1$ \dots\ $a_n$]
\end{displaysyntax}
evaluates to the length $n$ list whose elements are $a_1$, \dots, 
$a_n$. There is however also a tradition of simply enumerating the 
list elements between braces, like so:\footnote{
  This example is straight from the \texttt{switch.n} manpage.
}
\begin{displaysyntax}
  switch \meta{options}\regopt\ \word{string}
  |{| \begin{regblock}[\regplus] \word{pattern} \word{body} 
  \end{regblock} |}|
\end{displaysyntax}


---

% The 


% $^{\splode} L + {}^{\splode}\!L + {\splode} + \hbox{|{*}|} L$
% 
% $^{\circledast} L + {}^{\circledast}\!L + 
% {}^{\circledast}\!\!L + {\circledast} L$

% \begin{displaysyntax}
%   [$M$ + [$M$ + \splode$L_1$] [$M$ + \splode$L_2$]]\par
%   [$M$ + \splode$L_1$ \splode$L_2$]
% \end{displaysyntax}
% \begin{displaysyntax}
%   [$M$ + [$M$ + \tsplode$L_1$] [$M$ + \tsplode$L_2$]]\par
%   [$M$ + \tsplode$L_1$ \tsplode$L_2$]
% \end{displaysyntax}

---


\subsection{Interface names and versions}
\label{Ssec:InterfaceVersion}

An important aspect of the \mtl\ system is that information about 
which interfaces are supported by a structure should be available at 
runtime, for other structures to query and act upon. This is done 
using the \emph{API convention}, according to which every structure 
should have a method |API| with the following three forms:
\begin{displaysyntax}
  \meta{structure} API \word{interface} \word{version}\par
  \meta{structure} API \word{interface}\par
  \meta{structure} API
\end{displaysyntax}
In these calls, the \word{interface} is the name of the interface and 
the \word{version} is a version number such as \texttt{1.0} or 
\texttt{2.3.1}. Formally, the \word{interface} can be an arbitrary 
string and the \word{version} is defined by the rules
\begin{displaysyntax}
  \meta{version} $\longrightarrow$ \meta{unsigned integer}\relax
    \begin{regblock}[\regstar] .\meta{unsigned integer} \end{regblock}
    \par
  \meta{unsigned integer} $\longrightarrow$ 0 $\bigm\vert$ 
    \meta{digit 1--9}\meta{digit}\regstar\par
  \meta{digit} $\longrightarrow$ 0 $\vert$ 1 $\vert$ 2 $\vert$ 3 
    $\vert$ 4 $\vert$ 5 $\vert$ 6 $\vert$ 7 $\vert$ 8 $\vert$ 9\par
  \meta{digit 1--9} $\longrightarrow$ 1 $\vert$ 2 $\vert$ 3 
    $\vert$ 4 $\vert$ 5 $\vert$ 6 $\vert$ 7 $\vert$ 8 $\vert$ 9
\end{displaysyntax}
The most important form of the |API| method is the first, since it 
constitutes the query ``does \meta{structure} support version 
\word{version} of \word{interface}?''; the return value is a boolean. 
When the result is true, the user knows that this structure lives up 
to what is specified in that version of the interface.
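
The rules for \word{version} given above amount to a regular language, 
so a candidate string can be checked with a single regular expression. 
The helper name |is_valid_version| below is hypothetical; this is a 
sketch of the grammar, not a command provided by \mtl.
\begin{verbatim}
# Hypothetical helper (not part of mathematcl): checks a string against
# the <version> grammar given above.
proc is_valid_version {string} {
    # <unsigned integer>: either 0, or a nonzero digit followed by digits.
    set uint {(?:0|[1-9][0-9]*)}
    # <version>: unsigned integers separated by single periods.
    return [regexp -- "^${uint}(?:\\.${uint})*\$" $string]
}

puts [is_valid_version 1.0]      ;# prints 1
puts [is_valid_version 2.3.1]    ;# prints 1
puts [is_valid_version 01.2]     ;# prints 0 (leading zero)
puts [is_valid_version 1.]       ;# prints 0 (trailing period)
\end{verbatim}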

The idea of the version numbers is that interface specifications, like 
standards in general, may evolve over time and that later versions 
are better (for example from a usability perspective) than earlier 
versions; hence it is desirable to support as high a version number 
as possible. On the other hand, it is usually easier to implement 
lower versions of an interface since these make fewer claims that the 
implementation needs to satisfy. That \mtl\ interfaces are more or less 
\emph{theories} in the sense of mathematical logic implies that they 
evolve in a slightly different manner than for example software 
packages do.

The first \meta{unsigned integer} of a \word{version} is called the 
\emph{major version number}, whereas the remaining ones are collectively 
known as \emph{minor version numbers}. Within a series of versions 
with the same major version number, later versions are 
specialisations of earlier versions, meaning any axiom claimed by the 
earlier version must hold also in the later version. Versions with 
different major numbers are however formally as independent as 
interfaces with different names; a jump to a new major number signals 
a fresh start, sometimes because of taking a completely different 
view of the subject (e.g.~starting from a different set of primitive 
concepts), but other times because the previous specification turned 
out to have botched some technical detail and thereby became less 
useful than what was intended.
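
One way a structure might answer the first form of the |API| query is 
sketched below. It rests on an assumption not stated above, namely that 
versions within a major series are ordered componentwise in the same 
way as \Tcl\ package versions, so that |package vsatisfies| can do the 
comparison. The table contents and the name |supports| are made up for 
the illustration.
\begin{verbatim}
# Hypothetical sketch, not the mathematcl implementation. The table maps
# each interface name to the highest version implemented in each major
# series (the entries are invented for this example).
set implemented {
    division {1.1}
    group    {1.0 2.3.1}
}

proc supports {table interface version} {
    if {![dict exists $table $interface]} {return 0}
    foreach have [dict get $table $interface] {
        # True when $have has the same major number as $version and is
        # at least as high; other major series are ignored.
        if {[package vsatisfies $have $version]} {return 1}
    }
    return 0
}

puts [supports $implemented division 1.0]   ;# prints 1
puts [supports $implemented division 2.0]   ;# prints 0
puts [supports $implemented group 2.3]      ;# prints 1
\end{verbatim}
Supporting version~1.1 thus implies supporting version~1.0, whereas 
nothing is implied about any version~2.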

Taking the above specification of \texttt{division} as an example, 
one could (especially in view of the \emph{remark}) imagine that there 
had been some version~0.0 of this interface according to which |/| 
would instead do right division, but that this got deprecated because 
in a \texttt{group} $G$ it would unintuitively imply that 
\texttt{[$G$ / $a$ [$G$ * $b$ $c$]]} is equal to 
\texttt{[$G$ / [$G$ / $a$ $c$] $b$]}. (There wasn't such a version, 
but there could have been.) Bumping the major version number up to~1 
would then be the way to start anew. Structures where |*| is 
commutative could actually claim to support both versions, but 
noncommutative ones would have to choose one or the other.

For minor version numbers, one could imagine a \texttt{division}~1.1 
where the syntax of |/| was generalised to
\begin{displaysyntax}
  \meta{magma} / \word{numerator} \word{denominator}\regplus
\end{displaysyntax}
This should then be combined with an axiom that serves as a 
definition of this extended division operation. One possibility is to 
say that
\begin{displaysyntax}
  [$M$ / $a$ $b$ \splode$C$]\par
  [$M$ / [$M$ / $a$ $b$] \splode$C$]
\end{displaysyntax}
are |=|-equal for all \(a,b \in M\) and nonempty lists $C$ of 
elements of $M$, but this has the disadvantage of enforcing an 
algorithm for computing the multiple-argument form of |/|. A 
different approach is to say that if \(x \in M\) is the value of
\begin{displaysyntax}
  [$M$ / $a$ $b_1$ $\dots$ $b_n$]
\end{displaysyntax}
then
\begin{displaysyntax}
  [$M$ * $b_1$ [$\dots$ [$M$ * $b_n$ $x$]$\dots$]]
\end{displaysyntax}
is |=|-equal to $a$.
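
To see how the two formulations relate, assume (for this illustration 
only) that $M$ is a group and that the binary |/| is left division, 
i.e., that \texttt{[$M$ / $a$ $b$]} is $b^{-1}a$. The second 
formulation then says \(b_1 b_2 \dotsm b_n x = a\), which is equivalent 
to
\[
  x = b_n^{-1} \dotsm b_2^{-1} b_1^{-1} a 
    = (b_1 b_2 \dotsm b_n)^{-1} a .
\]
The first formulation gives the same value by induction on $n$, since 
dividing $b_1^{-1}a$ by $b_2$, \dots, $b_n$ in turn multiplies from the 
left by $b_2^{-1}$, \dots, $b_n^{-1}$. In a group the two formulations 
thus agree, and the second merely avoids prescribing an order of 
evaluation.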


\section{Basic notions}

This section covers some basics of the \mtl\ programming environment.


\subsection{Implementations and constructors}

There are typically (at least) two \Tcl\ commands associated with each 
structure. One provides the structure \emph{implementation}, in that 
it is the underlying command of the command prefix representing the 
structure. The other command constructs such command prefixes, and is 
therefore called a \emph{constructor}. It is sometimes possible for 
several constructors to make use of the same implementation, but 
it is perhaps more common that one constructor chooses one of several 
implementations\Dash usually the most specialised that will be 
applicable, since that would probably provide the most features and 
take advantage of the maximal number of optimisations.
Implementation commands need not be part of the documented interface 
of a package, since it is sufficient that some constructor ``knows'' 
its name, but it may at times be appropriate to support users calling 
the implementation command directly.

\begin{example}[Integers]
  The |mtmtcl::rings::integers::all| command is an implementation of 
  the ring of integers $\mathbb{Z}$, and it is documented that users 
  may call this command directly. A constructor companion 
  |mtmtcl::rings::integers::make| is provided, but the main 
  reason for using it would be that it returns the fully qualified 
  name of the implementation; there is no shame in inlining what is 
  returned instead of calling this.
\end{example}

\begin{example}[Integers modulo $n$]
  The |mtmtcl::rings::integers::modulo| command is an implementation 
  of the ring of integers-modulo-$n$ $\mathbb{Z}_n$. While it is 
  documented that this command has the call syntax
  \begin{displaysyntax}
    mtmtcl::rings::integers::modulo \word{n} \word{API} 
    \word{subcommand} \word{argument}\regstar
  \end{displaysyntax}
  (where the first three words make up the \meta{structure}), the 
  preferred usage is rather via the constructor command
  \begin{displaysyntax}
    |mtmtcl::rings::integers::make_modulo| \word{n}
  \end{displaysyntax}
  since computing the \word{API} parameter is rather technical. The 
  reason for still documenting the implementation command is that it 
  can be useful for other constructors. In particular, a constructor 
  for finite fields can use |modulo| directly for fields of prime 
  order, but must resort to something more complicated for other 
  finite fields.
\end{example}
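
The division of labour between implementation and constructor can be 
sketched in a few lines of \Tcl. Everything in the |demo| namespace 
below is hypothetical and much simplified; in particular the 
\word{API} parameter of the real |modulo| command is left out, and the 
method names are chosen only for the illustration.
\begin{verbatim}
# Toy illustration with hypothetical names; the real mathematcl commands
# take additional parameters (such as the API word) that are omitted here.
namespace eval demo {}

# Implementation: the words "::demo::modulo $n" form the structure prefix.
proc demo::modulo {n method args} {
    switch -- $method {
        +  { return [expr {[::tcl::mathop::+ {*}$args] % $n}] }
        *  { return [expr {[::tcl::mathop::* {*}$args] % $n}] }
        =  { lassign $args a b; return [expr {($a - $b) % $n == 0}] }
        default { error "unknown method \"$method\"" }
    }
}

# Constructor: returns a command prefix for the integers modulo n.
proc demo::make_modulo {n} {
    return [list ::demo::modulo $n]
}

set Z5 [demo::make_modulo 5]
puts [{*}$Z5 + 3 4]     ;# prints 2
puts [{*}$Z5 * 3 4]     ;# prints 2
\end{verbatim}
Callers only ever see the prefix stored in |Z5|, so the constructor 
remains free to return a prefix built on some other implementation.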


---

It seems useful to make a distinction between \emph{adapters} (for 
run-time adaptation, sitting between the caller and the actual 
implementation) and \emph{adaptors} (for define-time adaptation, 
defining a new command which does not itself call the adaptor).


\subsection{Combining interfaces}

It often feels natural to say things like ``if this structure 
satisfies interface $Y$ in addition to interface $X$ then it must 
also be the case that \dots'' The problem with this is that it would 
then not be safe to declare that a structure satisfies interface $Y$ 
(for example as a consequence of an adaptation) unless one can also 
vouch for every implication that arises from combining that interface 
with other interfaces on that structure, and the set of such 
implications can grow after the adaptor gets implemented. The 
consequence would thus be 
that an adaptor has to throw away all interfaces it doesn't know 
about, which would be somewhat counterproductive.

Therefore the rule is that properties of an interface may not be 
conditionalised on other interfaces; each interface makes its claims 
on its own. Interfaces may extend other interfaces, but then there is 
a direct implication. If the natural combination of two interfaces is 
slightly stronger than the logical conjunction of the two (e.g.~field 
and totally ordered set), then that should be handled by making the 
combination a third interface (e.g.~ordered field) which implies the 
first two.


---


\subsection{Concrete and abstract data}
\label{Ssec:StandardDataEncodings}

The interfaces mostly deal with \emph{abstract} data, i.e., they do 
not specify how the data is to be encoded. This is generally because 
the interface may productively be applied to mathematical objects 
well beyond the interface author's imagination; when even the 
basic information that needs to be encoded is unknown, there 
obviously wouldn't be any point in trying to prescribe an encoding 
for it. There are however also data involved in the interfaces that 
are quite concretely known, and for those it seems appropriate to lay 
down standard encodings.

\paragraph{Booleans}
A boolean is true or false. In \Tcl\ the canonical representations 
for these are |1| and |0| respectively, but any nonzero integer is 
accepted as true, and several strings (|on|, |off|, |yes|, |no|, 
|true|, and |false|) are accepted as boolean values. In \mtl, 
anything \Cfunctionidentifier{Tcl\_GetBooleanFromObj} accepts as boolean 
is a valid boolean.
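
When a canonical value is needed, one simple approach is to let |expr| 
do the interpretation, as in the sketch below. The helper name 
|canonical_boolean| is hypothetical, and it has not been checked here 
that |expr| agrees with 
\Cfunctionidentifier{Tcl\_GetBooleanFromObj} in every corner case.
\begin{verbatim}
# Hypothetical helper: maps an accepted boolean to canonical 1 or 0.
proc canonical_boolean {value} {
    return [expr {$value ? 1 : 0}]
}

puts [canonical_boolean yes]   ;# prints 1
puts [canonical_boolean off]   ;# prints 0
puts [canonical_boolean 17]    ;# prints 1
\end{verbatim}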

\paragraph{Integers}
Any string matching the regular expression \verb"0|-?[1-9][0-9]*" is 
a valid integer, and two such strings are equal as integers if and 
only if they are equal as strings. (Hence these are \emph{proper} 
integers, not machine-integers.) These are the canonical 
representations of integers. There are also some valid noncanonical 
representations, e.g.~hexadecimal notation as strings matching 
|-?0x[0-9A-Fa-f]+|, and octal notation.
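
The following sketch tests for the canonical form and converts a 
noncanonical representation to it. Both helper names are hypothetical. 
The conversion relies on |expr| in \Tcl~8.5 or later, which computes 
with integers of arbitrary size.
\begin{verbatim}
# Hypothetical helpers, not part of mathematcl.
proc is_canonical_integer {string} {
    return [regexp -- {^(?:0|-?[1-9][0-9]*)$} $string]
}

proc canonical_integer {string} {
    # Adding 0 makes expr parse the string as a number and return it in
    # decimal form. No check is made that the input is an integer.
    return [expr {$string + 0}]
}

puts [is_canonical_integer -17]     ;# prints 1
puts [is_canonical_integer 0x1F]    ;# prints 0
puts [is_canonical_integer -0]      ;# prints 0
puts [canonical_integer 0x1F]       ;# prints 31
\end{verbatim}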

\paragraph{Permutations}
A permutation $\sigma$ is typically regarded as a bijection from 
$\{0,1,\dotsc, n-\nobreak1\}$ to itself. It is encoded as a list of 
$n$ elements, where the element at index $i$ is the value of 
$\sigma(i)$.
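
With this encoding the basic operations on permutations reduce to list 
indexing, as the following sketch shows. The helper names are 
hypothetical and not part of \mtl.
\begin{verbatim}
# Hypothetical helpers. A permutation is a list whose element at
# index i is sigma(i).
proc perm_apply {sigma i} {
    return [lindex $sigma $i]
}

# Composition: the result maps i to sigma(tau(i)).
proc perm_compose {sigma tau} {
    set result {}
    foreach j $tau {
        lappend result [lindex $sigma $j]
    }
    return $result
}

# Inverse: if sigma(i) = j then the result maps j to i.
proc perm_inverse {sigma} {
    set result $sigma   ;# same length; every position is overwritten
    set i 0
    foreach j $sigma {
        lset result $j $i
        incr i
    }
    return $result
}

set sigma {1 2 0}
puts [perm_inverse $sigma]                          ;# prints 2 0 1
puts [perm_compose $sigma [perm_inverse $sigma]]    ;# prints 0 1 2
\end{verbatim}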

\paragraph{Data-tree}
A \emph{data-tree} is effectively the type of data structure that 
XML encodes: a rooted tree where the nodes carry a type-tag and 
optionally attributes, the children of a node are ordered, and (as a 
special case) strings may appear as children of a node. The encoding 
of these follows the \textsf{tdom} ``list'' format for XML nodes, 
i.e., a data-tree is either (i)~a three element list
\begin{quote}
  \word{tag} \word{attributes} \word{children}
\end{quote}
where \word{tag} is a string (which must be a valid XML name), 
\word{attributes} is a dictionary (mapping attribute name to value), 
and \word{children} is a list of data-trees, or a data-tree is (ii)~a 
two element list
\begin{quote}
  |#text| \word{string}
\end{quote}
encoding the explicit string \word{string}.

Note that all \word{string}s and attribute values are \Tcl\ strings, 
not XML encodings of such strings. This means the ampersand character 
is really `|&|', not `|&amp;|'. Also note that |#text| is not a valid 
\word{tag}, since `|#|' is not allowed in XML names. This implies 
that \textbf{data-trees can be processed directly using the 
data-is-code technique}.
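
A minimal sketch of the data-is-code technique: since every data-tree 
is a list whose first element is either a tag or |#text|, it can be 
evaluated as a command once a namespace defines one command per node 
type. The namespace |demo::textof| and the tags |p| and |em| are 
hypothetical; the example extracts the text content of a tree.
\begin{verbatim}
# Hypothetical namespace and tag names, for illustration only.
namespace eval demo::textof {
    # Case (ii): a text node is a call of the #text command.
    proc #text {string} {
        return $string
    }
    # Case (i): one command per tag, here all with the same behaviour.
    foreach tag {p em} {
        proc $tag {attributes children} {
            set result ""
            foreach child $children {
                # The child data-tree is itself a command to evaluate.
                append result [{*}$child]
            }
            return $result
        }
    }
    # Entry point: evaluate a data-tree within this namespace.
    proc of {tree} {
        return [{*}$tree]
    }
}

set tree {p {class intro} {{#text {Fish & }} {em {} {{#text chips}}}}}
puts [demo::textof::of $tree]    ;# prints Fish & chips
\end{verbatim}
The trees are invoked through |{*}| rather than passed to |eval| as 
strings, because a script that begins with |#text| would be parsed as a 
comment.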


\DocInclude{support}
\DocInclude{groups}
\DocInclude{rings}
% \DocInclude%{export}


\begin{thebibliography}{99}

\bibitem{XML-spec}
  \textit{Extensible Markup Language (XML) 1.0} (Fifth Edition).
  C. M. Sperberg-McQueen, F. Yergeau, E. Maler, T. Bray, J. Paoli, 
  Editors.
  W3C Recommendation, 26 November 2008, 
  http://www.w3.org/TR/2008/REC-xml-20081126/ 
  % Latest version available at http://www.w3.org/TR/xml .

\end{thebibliography}

\PrintIndex

\end{document}


\endinput

---


As a particular example, one may consider polynomials. The standard 
library naturally contains a construction of polynomials over 
(i.e., with coefficients from) an arbitrary ring $\mathcal{R}$; the 
result of that construction being the polynomial ring $\mathcal{R}[X]$. 
However, a semiring enthu


This makes 
\mtl\ different from many computer algebra systems, where the things 
built in tend to be special (e.g.~$2+2$ is always $4$, never $1$ as in 
$\mathbb{Z}_3$). Integers are certainly fun, but there is nothing 
about 


---


A construction of polynomials over something at least requires the 
underlying structure of scalars to be a ring,\footnote{Semiring 
enthusiasts may at this point object that a semiring, such as for 
example the natural numbers $\mathbb{N}$, is sufficient, and this is 
true. However it is not practically possible to always offer the most 
general construction of everything, so in reality there will likely be 
a standard ``polynomial over'' construction that starts with a ring 
$\mathcal{R}$ and produces some ring $\mathcal{R}[X]$. There is no 
restriction or disadvantage in this, as long as anyone can also 
implement and use an alternative semiring-aware construction of 
polynomials.}  and 

The only 
thing common to all structure objects is the way in which they can be 
queried which interfaces they support.


---