-
Notifications
You must be signed in to change notification settings - Fork 0
/
interfaces.tex
1711 lines (1536 loc) · 79.3 KB
/
interfaces.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass{mtmtcl}
\newcommand{\cs}[1]{\texttt{\PrintChar{92}#1}}
% \newcommand{\tsplode}{\texttt{%
% \PrintChar{123}\kern-0.1em*\kern-0.1em\PrintChar{125}%
% }}
\theoremstyle{plain}
\newtheorem{policy}{Policy}
\theoremstyle{remark}
\newtheorem*{example}{Example}
%
% \usepackage{amsmath,amsfonts,amssymb}
% \usepackage{amsthm}
%
% \usepackage{longtable}
%
% \theoremstyle{definition}
% \newtheorem*{definition}{Definition}
% \theoremstyle{remark}
% \newtheorem*{remark}{Remark}
%
%
% \makeatletter
% \NewDescribeCommand{\defining}{%
% \XD@grab@oarg\XD@grab@sarg{*}\XD@grab@oarg\XD@grab@marg
% }{4}{%
% \IndexEntry{%
% \ifx \NoValue#1%
% \LevelSame{\ifx\NoValue#3#4\else#3\fi}%
% \else
% \LevelSorted{#1}{\ifx\NoValue#3#4\else#3\fi}%
% \fi
% }{main}{\thepage}%
% \textbf{#4}%
% \@gobble % Eats \ignorespaces
% }
% \makeatother
%
% \PageIndex
% \CodelineNumbered
% \setcounter{IndexColumns}{2}
%
% \newenvironment{procmethod}{%
% \tclsubcommand{method}{submethod}%
% }{\endtclsubcommand}
%
%
% % \newcommand{\Z}{\ensuremath{\mathbb{Z}}}
% \newcommand{\Z}{\texttt{Z}}
%
% \newcommand*{\mw}[1]{\word{$#1$}}
%
% \newcommand{\mtl}{\texttt{mathematcl}}
% \newcommand{\Tcl}{\Tcllogo}
% \newcommand{\TclObj}{\Tcllogo\_Obj}
% \newcommand{\TclObjs}{\TclObj s}
%
% \providecommand{\Ldash}{---}
% \providecommand{\Rdash}{---}
% \providecommand{\Dash}{---}
\begin{document}
\title{\mtl~interfaces}
\author{Lars Hellstr\"om}
\date{2006-06-21--}
\maketitle
\begin{abstract}
A key part in the \mtl\ system is the use of \emph{interfaces}
for structures as a way of making them work together. This
document is the primary collection of specifications of these
interfaces.
\end{abstract}
\tableofcontents
\section{Introduction}
The \mtl\ system for algebraic computations is designed to meet the
following goals:
\begin{itemize}
\item
It should be reasonably easy to program.
\iffalse
and in particular not
constantly burden the mathematician's mind with issues of how
data is being stored.
\fi
\item
It should be open to new concepts and new implementations\Dash
built-in operations should not receive preferential treatment or
display behaviour that is otherwise unattainable.
\item
It should be able to manage the many levels of abstraction that
occur in algebraic constructions.
\item
The system should be portable.
% \item
% \dots
\end{itemize}
To meet these goals, the following design decisions have been made:
\begin{policy} \label{Pol:Strukturer}
Focus is on mathematical \emph{structures} and the \emph{interfaces}
these support. Generic algorithms should be expressed not in terms
of concrete implementations, but in terms of abstract structures
provided as parameters. Structures expose their capabilities and
useful properties by declaring support for particular interfaces.
\end{policy}
Basic examples of interfaces are `ring', `group', `lattice', and so
on. Examples of structures are $\mathbb{Z}$~(ring, additive group,
lattice under $\max$ and $\min$), $\mathbb{C}$~(field, ring, additive
group), $\mathbb{N}$~(semiring, additive monoid, multiplicative
monoid, divisor lattice, max/min lattice), $\mathbb{Z}[x]$~(ring etc.),
$\mathbb{Z}[x]\big/ \langle x^2 - 2\rangle$~(ditto), $\mathbb{Z} +
i\mathbb{Z}$~(Gaussian integers: Euclidean domain, ring etc.),
$\mathrm{M}_n(\mathbb{R})$~(real $n \times n$ matrices: ring etc.),
$\mathrm{GL}_n(\mathbb{R})$~(invertible real $n \times n$ matrices: a
group, subset of $\mathrm{M}_n(\mathbb{R})$, etc.), and so on.
What should be apparent from these examples is
that many structures naturally support more than one interface; it is
seldom useful to ask for a name describing all that a structure is,
but generally quite sufficient to pick an interface and ask whether
the structure supports it. The case of the natural numbers should
also make it clear that there are sometimes more than one way in which
a set can often be turned into a structure supporting a particular
interface; the ways in which one can reinterpret the basic
mathematical structures are surprisingly numerous.
What this means in practice is that in order to multiply two
elements $a$ and $b$ of some algebra $\mathcal{C}$ one does not
write `$a$ times $b$', but rather `$\mathcal{C}$-multiply $a$ and
$b$'; cf.~the notation \(a \cdot_{\mathcal{C}} b\) that is
sometimes used when one needs to clarify that this is the
multiplication operation of~$\mathcal{C}$. Provided that (references
to) structures such as this `$\mathcal{C}$' can be passed around as
easily as elements of structures\Ldash which is essentially the
`provided as parameters' in the policy statment, and explained in
more detail below\Rdash this is a cheap and effective way of
explaining to the computer what operation is desired. Equally
important is the fact that it scales well: that it can handle
hundreds of structures as easily as one. Both are necessary to meet
the goal of handling multiple levels of abstraction, since a level
$n$ structure is usually constructed from one or several level $n-1$
structures, which in turn are constructed from level $n-2$
structures, and so on; the implementation of a basic operation in
a high level structure can easily exercise dozens of operations in
lower level structures.
\begin{example}
The field of rational numbers $\mathbb{Q}$ is typically implemented
as the field of fractions of the ring of integers $\mathbb{Z}$.
This means every rational number is a pair of integers. Even for
such an ``easy'' operation as addition in $\mathbb{Q}$, it is
necessary to compute a common multiple of the denominators of the
terms and extend all the fractions accordingly, which requires at
least multiplication in $\mathbb{Z}$. One furthermore typically
wants to use as small numbers as possible in these calculations,
and then it is also necessary to compute quotients and GCDs in
$\mathbb{Z}$.
Consider next the ring of $n \times n$ matrices over $\mathbb{Q}$.
One addition in this ring is $n^2$ separate additions in $\mathbb{Q}$.
Multiplication in the matrix ring requires both the multiplication
and addition operations in $\mathbb{Q}$. Going further, one can
consider the subgroup $\mathrm{SL}_n(\mathbb{Q})$, and even the
corresponding group algebra $\mathbb{Q}\bigl[
\mathrm{SL}_n(\mathbb{Q}) \bigr]$. There are now four different
multiplication operations involved in this definition, so the
computer would have a lot to choose from if we didn't tell it which
one of these we mean in each case.
\end{example}
In ordinary mathematical writing the structure name
is usually omitted\Ldash e.g.~when each particular argument only
involves one multiplication operation anyway\Rdash but in higher
algebra the normal state of affairs is rather than there are
several multiplication operations that have to be distinguished.
A system for algebraic computation must deal with e.g.~the fact
that the multiplication in a semigroup algebra is defined in
terms of the multiplications of the underlying semigroup and
coefficient ring.
\begin{policy}
There are no ``blessed'' structures, only standard implementations
provided for convenience, and anyone is free to provide
alternatives. Constructions should not rely on details of a
structure that are not publicly declared through an interface.
\end{policy}
This is part of the \emph{openness} goal. Policy~\ref{Pol:Strukturer}
relates more to the goal of managing many levels of abstraction,
since the very act of relying on declared interfaces implies that
all lower levels are ignored. Portability and ease of programming is
the subject of the next policy.
\begin{policy} \label{Policy:Tcl}
The main computing environment shall be a \Tcl~(Tool Command
Language) interpreter.
\end{policy}
Formally, the \emph{main computing environment} is merely that in
which independent pieces of code is combined. It \emph{may} be
thought of strictly as a generic dispatch mechanism (that just happens
to contain a complete programming language too) for interfacing with
other people's code (possibly written in some quite different language
than what you're using), but you may also find that it serves your
actual computing needs quite well.
\Tcl\ is both syntactically and semantically an extremely simple
language\Ldash yet expressive and highly flexible\Rdash which makes
it easy to use in this context. Unlike the majority of contemporary
programming languages, the syntax of \Tcl\ is not descendant from that
of traditional mathematical formulae (hence it will probably seem
unfamiliar the first time it is encountered), but when it comes to
programming this is mostly an improvement; the syntax of mathematical
formulae is actually very complicated and it is often a matter of
convention rather than grammar how a nontrivial formula like $\sin
2x$ or $\sin 2\ln x$ should be understood. In addition to this
simplicity, the \Tcl\ syntax also provides an elegant solution to
the problem of distinguishing between several similarly-named
operations that can be a major headache when describing complex
algebraic constructions.
Some may find it a concern that \Tcl\ is not a heavily optimised or
optimising language, in the sense that a procedure call in \Tcl\ is
not nearly as fast as a corresponding function call in C, but that
is less important than one might think. The main reason is the above
observation that one level~$n$ operation usually boils down to
several level~$n-1$ operations, each of which in turn boil down to
several level~$n-2$ operations, and so on: the number of level~$n$
operation steps are \emph{massively} outnumbered by the level~$0$
operation steps.
This means dispatch (or even execution) efficiency at level $n$ has a
rather small impact on overall performance; what matters is
efficiency at level $0$ and $1$. And the nice thing about \Tcl\ here
is that it can just as well use implementations in some other (more
optimised) language for those low level operations. The idea is not
to use the same \Tcl\ source code for implementing the level $0$ and
$1$ structures while somehow swapping in other implementations of
primitives, but to replace the entire implementation of selected
time-critical level $0$ and $1$ structures by an optimised one in
e.g.~C, and reap the performance benefits of this.
%
% \begin{itemize}
%
% \item
%
%
%
%
% ---
%
%
%
% \item
% The system should \emph{not} employ a type system to interpret
% operations. Instead, it is required that every operation is fully
% identified independently of the data it is to operate on.
%
% The main use for type systems in computer languages has been to
% reserve memory for data storage, but modern high level languages
% generally allocate memory dynamically when needed, freeing the
% programmer from having to manage allocation explicitly.
% A secondary use for a type system that has grown more common in
% the last decades is to resolve overloaded (polymorphic)
% operations, where the same symbol is used for several operations
% and the language environment picks one whose type-labelling
% matches that of the operands. It is primarily the latter practice
% that this decision rejects.
%
% The runtime rationale for this decision is that type-controlled
% polymorphism inserts an impractical detour into the very heart of
% the computing environment. The programmer implementing an
% algorithm always knows \emph{exactly} (up to parameters defining
% the context for the computation) which operation is intended at
% every point in the program, and this is also what the computer
% needs to know to execute the program.
% Relying upon type-controlled polymorphism when instructing the
% computer will however mean that a lot of this information is
% omitted or provided only indirectly, which requires constant
% detours into type-matching to resolve the ambiguity created by
% the overloading.
%
% The coding time rationale is that the system becomes much simpler
% if the administrative overhead of assigning formalised types to
% everything can be dropped. There is however also a coding time
% cost in that polymorphism is often exploited to produce more
% compact source code; the explicit identification of operations
% must not become tedious.
% \end{itemize}
\section{Remarks on the computational environment}
This section collects some introductory notes on \mtl\ from a
programming perspective. Readers who (for the moment) are more
interested in the mathematical perspective may prefer to skip ahead.
Readers who are interested in the philosophical considerations
underlying \mtl\ should however pay attention.
\subsection{A \Tcl\ primer}
At its conceptual core, a \Tcl\ interpreter is a dispatch engine: it
processes a sequence of calls, each of which has the form of a list of
values, by interpreting the first value of each call as the name of a
command and then handing the call over to (the function implementing)
that command, getting back a result that may become one of the values
in a later call. Commands may be implemented natively in \Tcl\
(procedures, ensembles, aliases, etc.), in C (as is the case with many
core commands, for speed or because they need to interface more
closely with the OS), or in a variety of other languages such as C++
and Fortran (if you can link it with a reasonably modern C, then you can
use it with Tcl). A classical development paradigm is that one
tries to implement commands in \Tcl\ first (which typically allows for
rapid prototyping, particularly of interfaces) and maybe later
reimplements critical parts in a lower level language if the \Tcl\
implementation turned out to not be performant enough. (Another
time-tested approach, which is appropriate when the wanted operations
are already available in some code library, is to provide a thin
wrapper for that library which makes its functionality available as
\Tcl\ commands.)
The \Tcl\ \emph{language} syntax describes how a \Tcl\ program piece
(or \emph{script}) is translated to a sequence of calls. Technically, a
script consists of a sequence of \emph{sentences}, which are
separated by newlines (or semicolons), and each sentence consists of
a sequence of \emph{words}, which are separated by ordinary
whitespace. The basic idea is that each sentence codes one command
call, and each word of that sentence contributes one element to the
list of values making up that call, but there is (as explained below)
some room for bending those rules.
Words in \Tcl\ programs tend to fall in five functional categories:
\begin{description}
\item[Variable references]
A word like `|$foo|', where the first character a dollar sign
`\texttt{\$}' and the rest is the name of a variable, is a
reference to that variable. Its contribution to the command call
is the current value of that variable.
\item[Barewords]
A word simply denotes itself if it does not contain any of the
characters |{}[]$\"#;| (left and right brace, left and right
bracket, dollar, backslash, quotation mark, hash sign, and
semicolon)
% `|{|' (left brace), `|}|' (right brace), `|[|' (left
% bracket), `|]|' (right bracket), `|$|' (dollar), `|\|'
% (backslash), `|"|' (quotation mark), `|#|' (hash sign),
% `|;|'~(semicolon),
or whitespace; such \emph{barewords} typically serve as
identifiers\slash symbols or explicit constants. Note that the
characters allowed in barewords include `|+|', `|-|',
`|*|', `|/|', `|=|', `|<|', `|>|', `|!|', and many more, in
particular all the non-ASCII characters in Unicode! This often
makes it possible to use the ``proper'' mathematical symbol when
denoting something, whereas in more traditional languages the
wanted character would be unsupported or appropriated for
some built-in functionality of the language.
\item[Recursive calls]
A word like `|[bar baz $foo]|'\Ldash where the first character is a
left bracket `\texttt{[}', the last character is the matching
right bracket `\texttt{]}', and what is between them is a
complete sentence of its own\Rdash is a recursive command call.
This first causes the call coded by the inner sentence to be
performed, and then the value it returns will be used as the
contribution from this word to the call of the outer sentence.
Hence, whereas a function call in traditional mathematical
formulae looks like `$f(x,y,z)$'\Ldash the main pattern being
function--left parenthesis--arguments--right parenthesis\Rdash
in \Tcl\ it rather becomes `\texttt{[$f$ $x$ $y$ $z$]}', i.e.,
left bracket--function--arguments--right bracket.
\item[Strings]
A word like `|"My hovercraft is full of eels."|'\Ldash where the
first and last characters are (straight ASCII) quotation marks\Rdash
typically denotes a string; the effect of the initial quote is that
the word does not end until another quote is encountered, so
e.g.~spaces and semicolons temporarily lose their roles as
separators. `|$|', `|[|', and `|\|' are still special however,
triggering variable, command, and backslash substitution
respectively, in which the substitution character sequence is
replaced by some other piece of text in the string being
constructed. The first two behave as explained above (variable
references and recursive calls) whereas backslash substitution is
like character escapes in strings in~C: literal quotes, left
brackets, etc.~in the string can be written as `|\"|', `|\[|',
`|\$|', and `|\\|'.
\item[Inlined verbatim material]
A word that begins with a left brace `|{|' and ends with the
matching right brace `|}|' can be used to inline material that is
passed on verbatim in the call; the value for this word is the
exact string of characters between (but not including) the
delimiting braces. The target command is then free to apply any
interpretation it sees fit to the word in question. This
mechanism can be used to embed code following syntactic
principles other than those of \Tcl\Ldash such as ordinary infix
form mathematical expressions, regular expressions, or SQL
statements\Rdash into \Tcl\ scripts. It is part of the \Tcl\
philosophy that one should not force everything into one
syntactic mold, but instead let each thing be expressed in the
manner which suits it best.
Brace-delimited words are also used to embed \Tcl\ scripts into
\Tcl\ scripts; most \Tcl\ commands implementing basic control
structures, such as |if|, |for|, |while|, and |proc|, expect one or
several arguments to be \Tcl\ scripts that they recursively call
the \Tcl\ interpreter to have evaluated when the functionality
they implement requires this! That way, the overall language
syntax need not have special cases for control structures built
in, and it is possible for user-defined commands to implement
custom control structures. Visually, the effect can often be that
braces in \Tcl\ appear to be block statement constructors,
similarly to braces in C and \textbf{begin}--\textbf{end} in
Pascal. In
\begin{quote}
|set sum 0.0|\\
|foreach row $matrix {|\\
| foreach cell $row {|\\
| set sum [expr {$sum + $cell}]|\\
| }|\\
|}|
\end{quote}
the two outer levels of braces delimit inner scripts (bodies for
|foreach| loops), whereas the innermost brace level delimits an
artihmetic expression in infix form, this latter interpretation
being imposed by the |expr| command that receives the expression
as argument.
\end{description}
(Technically, the parsing of words is a bit more unified than the
above may have suggested, but for the exact story we refer to the
Tcl.n manpage.)
The slight bending of the rule that one sentence corresponds to one
call is that brackets may embed recursive calls, as explained above.
The slight bending of the rule that one word contributes one element
to the call is due to a mechanism known as \emph{expansion}. If a
word begins with the three-character sequence `|{*}|' (brace,
asterisk, brace), then the rest of that word first gives rise to a
value as explained above, but then that value is interpreted as a
list and each element of that list is contributed as a separate
element to the call being constructed. A word such as `|{*}$list|' or
`|{*}[foo $bar]|' thus contributes zero or more values to the
surrounding sentence. This has several applications in common \mtl\
idioms.
In order to parse a \Tcl\ program, one must in addition to the
overall language syntax also be aware of the syntaxes of the
particular commands used in the program, which by convention are
documented in separate manpages (section \texttt{n} in the Unix man
system). Some important commands, and some commands whose existence
may be hard to guess, are:
\begin{longtable}{l p{0.7\linewidth}}
\texttt{set}& Assign new value to a variable.\\
\texttt{proc}& Define a procedure\Dash a command that executes a
subroutine whose body is a \Tcl\ script.\\
\texttt{if}& Conditional controlled by boolean expression.\\
\texttt{switch}& Multiway-choice conditional.\\
\texttt{foreach}& Loop over the elements of a list,
or synchronously over several lists.\\
\texttt{expr}& Expressions in C-like infix notation.\\
\texttt{incr}& Increment (or decrement) integer variable.\\
\texttt{info}& Introspection into the \Tcl\ interpreter.
\end{longtable}
Some notable functional areas and commands related to these are:
\begin{longtable}{l p{0.7\linewidth}}
Flow control&
\texttt{break}, \texttt{catch}, \texttt{continue},
\texttt{error}, \texttt{eval}, \texttt{for}, \texttt{foreach},
\texttt{if}, \texttt{return}, \texttt{switch}, \texttt{tailcall},
\texttt{throw}, \texttt{try}, \texttt{uplevel}, \texttt{while}
\\
Variable scoping&
\texttt{global}, \texttt{upvar}, \texttt{variable}\\
Files and I/O&
\texttt{chan}, \texttt{close}, \texttt{eof}, \texttt{exec},
\texttt{fblocked}, \texttt{fconfigure}, \texttt{fcopy},
\texttt{file}, \texttt{fileevent}, \texttt{flush}, \texttt{gets},
\texttt{glob}, \texttt{open}, \texttt{pid}, \texttt{puts},
\texttt{read}, \texttt{seek}, \texttt{socket}, \texttt{tell}
\\
Lists&
\texttt{concat}, \texttt{join}, \texttt{lappend},
\texttt{lassign}, \texttt{lindex}, \texttt{list},
\texttt{llength}, \texttt{lmap}, \texttt{lrange},
\texttt{lrepeat}, \texttt{lreplace}, \texttt{lreverse},
\texttt{lsearch}, \texttt{lset}, \texttt{lsort}, \texttt{split}
\\
Dictionaries&
\texttt{dict}, \texttt{array}
\\
Strings&
\texttt{append}, \texttt{format}, \texttt{join}, \texttt{regexp},
\texttt{regsub}, \texttt{scan}, \texttt{split}, \texttt{string},
\texttt{subst}
\\
Binary data&
\texttt{binary}, \texttt{encoding}, \texttt{string},
\texttt{zlib}
\end{longtable}
Finally, it should be observed that \Tcl\ has a concept of
\emph{namespaces} which affect resolution of command (and nonlocal
variable) names. The namespaces form a hierarchic structure similar
to a file system, but with the two-character sequence `|::|' as
separator rather than forward or backward slash. The position
of a command or variable in the namespace hierarchy generally
reflects to what package (or subpackage) it belongs. The global (i.e.,
root) namespace is |::|. Command or variable names that begin with
|::| are interpreted relative to the global namespace, names that do
not are interpreted relative to the current namespace. All the core
commands listed above reside in the global namespace, but by
tradition their names are usually written without the leading |::| in
scripts since \Tcl\ will try to resolve a name relative to the global
namespace if no definition of it could be found relative to the
current namespace. Control of and introspection into namespaces is
provided by the |namespace| command.
% The namespaces form a tree-like hierarchy, where
% The is call
% Informally, a \Tcl\ program (or \emph{script}) is a sequence of
% \emph{command sentences}\Ldash typically one per line, although there
% is variance in both directions\Dash and a sentence consists of
% \emph{words} which are separated by whitespace. The style of these
% sentences can range from the cryptic:
% \begin{quote}
% |while {$b} {set b [expr {$a % [set a $b]}]}|
% \end{quote}
% via shell-like:
% \begin{quote}
% |fconfigure $socket -encoding utf-8 -eofchar "\x19"|
% \end{quote}
% to almost pseudocode:
% \begin{quote}
% |pick z maximizing realpart from points|
% \end{quote}
% but as far as the language is concerned, all of these are merely
% lists of words, the first of which is the name of a command. Beyond
% that, it is up to that command to interpret and use the remaining
% words as it sees fit. This allows e.g.~`|*|' to be both a binary
% infix operation (times) in numeric expressions and a unary postfix
% operation (zero or more) in regular expressions; neither is part of
% the overall language syntax, so it is entirely up to the command
% whether to make such an interpretation of the `|*|' character.
% \Tcl\ was designed to \emph{not get in the way} of programmers
% needing to express some domain-specific concept, and will happily
% allow several ``little languages'' (of which numerical expressions
% and regexps are two built-in examples) to coexist within a single
% interpreter.
%
% What the language syntax \emph{does} control is how a script is split
% up into sentences, how these are split up into words, and what the
% contents of these words will be. Within each sentence, there are
% three interacting processes: grouping, substitution, and expansion.
%
% \emph{Grouping} collects text into a word, even if it contains whitespace
% characters which would otherwise act as word or sentence separators.
% The characters which can trigger grouping are `|"|', `|{|', and
% `|}|', and they appear as delimiters at the beginning and end of the
% word, with the contents of the word consisting of all characters in
% between. A word begun by a quote ends with the next quote (that has
% not been escaped), whereas a word begun by a left brace ends at the
% matching right brace; the contents of a brace-delimited word have to
% be balanced with respect to (unescaped) left and right braces. In
% practice, quotes are often used to delimit ``ordinary'' strings,
% whereas braces are used to delimit blocks of code or data, but once a
% word has been parsed, it will have no memory of whether it was
% quote-, brace-, or just plain whitespace-delimited.
%
% The difference between the two lies instead in their relation to
% \emph{substitution}, which is inhibited in brace-delimited words but
% carried out otherwise. The three types of substitution are:
% \begin{description}
% \item[Variable substitution,]
% which is triggered by the dollar sign `|$|'.
% , which should be
% followed by the name of a variable.
% \end{description}
%
% ---
%
% syntactically a sequence of
% \emph{command sentences} and comments, separated by newlines. Each
% sentence is a sequence of \emph{words}, separated by whitespace
% (typically one or several spaces, but tabs are fine too). Words can
% be arbitrary strings. Semantically, the first word of each sentence
% is interpreted as the name of the command to execute, whereas the
% other words are passed to that command as arguments to interpret
% and process in whatever way it sees fit. Each command produces
% a result, and the result of the last command it the result of the
% script as a whole.
%
% What makes this a Turing-complete programming language is that there
% are commands for all the usual elementary operations\Dash in
% particular there are commands for the basic control structures.
% Unbounded repetition can for example be achieved through the |while|
% command, which has the syntax
% \begin{displaysyntax}
% while \word{expression} \word{script}
% \end{displaysyntax}
% When this command is executed\Ldash or \emph{evaluated}, as is the
% official term in \Tcl\Rdash the \word{expression} and \word{script}
% are alternatingly evaluated, with the command completing as soon as
% the \word{expression} value is not boolean true. Other control
% structures available as core commands include |if|~(if--then--else
% choice), |for|~(loop with control variable), |foreach|~(loop over
% list elements), |switch|~(multiway choice), |break|~(loop abortion),
% and |proc|~(subroutine creation).
% For such \word{script} arguments to be interesting, it must however
% be possible to embed newlines and spaces in it (since otherwise the
% \word{script} above would only be a single sentence of one word).
% To that end, there are three mechanisms which change the words of a
% command sentence en route from script-string to sequence of words:
% quoting, substitution, and expansion.
%
% Quoting collects a piece of text into one word, overriding characters
% which would otherwise force a word or sentence boundary. The most
% important form of quoting is brace-quoting, which happens for words
% where the first character is `|{|' (left brace), the last character
% is `|}|', and the material between them is balanced with respect to
% braces;\footnote{
% Braces preceeded by a backslash don't count when balancing however,
% so you can get an unmatched brace if you need to. Even though the
% technical details are quite different, the net effect is
% very similar to that in \TeX\ where \cs{\{} and \cs{\}} don't
% count as braces that have to be balanced.
% } in this case the command argument becomes exactly the string
% of characters between the outermost braces. This is the main
% mechanism for nesting ``blocks'' of \Tcl\ code, since it inhibits
% substitution in ``inner'' commands.
%
% Substitution replaces a piece of text\Ldash often, but not always, an
% entire word\Rdash by something else. There are three types of
% substitution, which differ in where they get the replacement text
% from and which character triggers them.
% \begin{description}
% \item[Command substitution]
% uses the return value of a script. It is triggered by a left
% bracket |[| and the script to evaluate is terminated by the
% matching right bracket |]|. This mechanism corresponds to
% function calls in most other languages, although the syntax is
% not the conventional
% \begin{equation}
% f(x_1,x_2,x_3,\dotsc,x_n)
% \end{equation}
% but rather
% \begin{displaysyntax}
% [f $x_1$ $x_2$ $x_3$ $\dots$ $x_n$]
% \end{displaysyntax}
% (Putting the ``function name'' inside the bracket with the
% arguments may seem odd, but we shall see that it is rather
% useful.) An extreme application of command substitution is the
% one-liner
% \begin{quote}
% |puts [join [lsort [split [read stdin] \n]] \n]|
% \end{quote}
% which works similarly to the Unix standard utility |sort|: |read|s
% standard in, |split|s it into a list of lines, sorts this list
% (|lsort|), |join|s the list back into a string, and out|puts| the
% result to standard out.
%
% \item[Backslash substitution]
% is triggered by a backslash `|\|' and functions very much as in
% strings in the C~language: as a mechanism for escaping special
% interpretations of characters, and as a mechanism for expressing
% arbitrary characters using only those found in ``visible ASCII''.
% The |\n| above will thus be seen as a linefeed character by
% |join| and |split|, whereas any character with special syntactic
% meaning in \Tcl\ (backslash `|\|', dollar `|$|', braces `|{|' and
% `|}|', brackets `|[|' and `|]|', number sign `|#|', semicolon
% `|;|', and the various forms of whitespace\Dash \emph{that's the
% entire list!}) can be escaped into an ordinary character by
% prepending a backslash. The combination backslash--newline is a
% special case in that it counts as an unescaped space rather than
% as a newline character, but this makes it convenient to express
% sentences with too many words to fit on one line:
%
% ---
%
% \item[Variable substitution]
% uses the value of a variable. It is triggered by the |$|
% character, which is followed by the name of the variable to
% substitute.
%
% \end{description}
% It should be observed that replacement text ``inserted'' by a
% substitution will be passed on verbatim to the command; it will in
% particular not be subjected to another round of substitution, even if
% it contains backslashes, dollars, or brackets. Nor does whitespace in
% substituted material count as word or sentence separators; if
% something looks as one word before substitution, then it will
% count as one word in the command sentence no matter what is
% substituted.
%
%
%
%
%
% ---
\subsection{The issue of a type system}
One thing that readers with computer science training tend to be
curious about is what type system \mtl\ employs. Mathematicians, on
the other hand, are not so inclined to worry on this and should
therefore feel free to skip this subsection on a first reading.
The simple truth is that \mtl\ doesn't have a type system. It doesn't
need one anyhow, and there is evidence to suggest that adding a type
system would mostly be harmful! Since these are claims that tend to be
shocking for the average computer scientist, they need some
elaboration.
To begin with, it should be observed that the standard foundation of
mathematics\Ldash Zermelo--Fraenkel set theory and its variants\Rdash
is untyped. The closest to a ``type'' in the programming language sense
that you get in it is that one can recognise certain sets as being
the domains of various functions (meaning it is okay to apply said
function to an element of that set), but there is none of that business of
values being associated to one specific type that is frequently presumed
in computer science. Hence mathematics as such is untyped, or if you
will one-typed: everything is a set, period. For a system for mathematics
to be otherwise would therefore be a deviation from the established
standard.
The historically versed reader may here object that it has not always been
the case that the standard foundation was untyped. The
set theory in the Russell--Whitehead \emph{Principia Mathematica} was
certainly typed (it is even known as ``the type theory of sets'').
In Greek antiquity (or at least that period during it which became
authorative for subsequent ages), a quantity had to be specified as
either length, area, volume, or whatever; there was no canonical way in
which e.g.~an area and a length could have comparable ``numerical
values'', so they were of different types.
% is indeed true.
Type distinctions of this kind have however a strong tendency to dissolve
over time, as the cost of maintaining them (which can be found in the
form of frequent extra manoeuvres in arguments to work around the
type boundaries) is rarely accompanied by any advantages whatsoever.
In particular the type theory of sets has, to the 21th century observer,
the distinct smell of a failed high-profile standard that collapsed under
its own weight: too impractical and complicated to see much in the area
of actual implementations, but nonetheless taken seriously and being
pursued because of its association with big names and high-profile
projects. There is no need for us to repeat that mistake today.
% Historically, that is
% however an oversimplification: there was actually a short period when
% the standard foundational set theory was one with types (more precisely
% the type theory of sets in the Russell--Whitehead \emph{Principia
% Mathematica}), but this never really caught on. For the 21th century
% observer, this set theory with types has the distinct smell of a failed
% high-profile standard that collapses under its own weight: too impractical
% and complicated to see much in the area of actual implementations, but
% nonetheless taken seriously and being pursued because of its association
% with big names and high-profile projects. Mandatory typing was
% abandoned when more lightweight alternatives emerged as equally
% consistent.
% One important problem with doing serious
% mathematics on top of the Russellian set theory is that one
% frequently encounters situations where everything works out except
% that the types of two things don't match, and considerable effort has
% to be spent on adjusting the types, whereas in an untyped setting one
% would be done already.
Another problem with types is that many programming languages use
them as a work-around for having limited vocabularies: rather than
having separate names for what is technically distinct (even if
analogous) operations, multiple operations are bundled together under
the same name in the source, and there is a presumption that the
programmer's exact intentions can be recovered later by analysing the
operand types. There is no doubt that this to a great extent is
inspired by generally accepted practices for writing mathematical
formulae\Ldash in which addition or substraction always denotes the
operation suitable for the given operands (be they numbers, vectors,
matrices, functions, or whatever) and multiplication can (depending
on context) denote almost every operation imaginable\Rdash but that
fails to take into account the formality gap between the two
situations. A computer program is always fully formal (even if some
languages may allow a relaxed-looking syntax) because they are meant
to be parsed by a dumb mechanism. Mathematical formulae, on the other
hand, are primarily meant to be read by humans and may therefore skip
plenty of fine details as long as the intelligent reader will be able
to fill them in. It is true that there is a substantial body of
mathematical notation that pretty much all agree upon (and
which thus could be formalised to everyone's content), but this unity
quickly evaporates once one starts to venture into notational regions
that are traversed chiefly by specific subdomains of mathematics; as
a particular example, one may note that computer science and
differential geometry have very different ideas about such basic
concepts as function and argument! The matter is further complicated
by the fact that many mathematical formulae admit multiple
fundamentally distinct interpretations, often coming from quite
different theories each offering its own understanding of the topic,
and the equal validity of different interpretations can be deep
theorems (nonstandard analysis, anyone?). In formal mathematical
logic, where mathematical formulae \emph{are} required to be fully
formal, symbols are as a rule specified using exact and complete names
throughout.
This does not mean types are useless as theoretical devices. Much of
type theory as a field of computer science is really about proving
simple statements that can be phrased on the form `$x$ is-a $T$',
where the ``type'' $T$ is some collection of claims that may or
may not hold true for the ``value'' $x$. The interface specifications
introduced in the next subsection can be read as providing a great deal
of information that can be phrased in this form. That \mtl\ still
does not have a type system in this epistemic sense is primarily
because all ``type'' information contained in the interface
specifications is \emph{informal}; there is no formal language using
which one has to encode all typing relations. Maybe at some point in
the future will there emerge some language using which one can
conveniently encode all the type information a computing system
could reasonably want to know, and which a compiler would then use to
produce heavily optimised code or a static analyser could use to
verify program correctness, but that remains to be seen; in
particular correctness will, in the problem domain for which \mtl\ is
designed, frequently depend on quite nontrivial mathematical theorems
and thus for practical purposes be beyond what can be automated anyway.
For the moment, it would be extremely premature to try to design a
formal specification language for \mtl, as the main result of
deploying any given language would probably only be that users'
imaginations are constrained by having to fit their designs to
what the specification language supports.
One final aspect of type is the \emph{encoding} aspect, in which a
type is taken as a specification of how some set of values are
encoded in terms of some lower-level data model (such as the bits of
computer memory, characters in a text file, or bytes in a binary file).
This is an aspect which \mtl\ cannot avoid, but decisions in this area
tend to be quite local; each package decides for itself how it prefers
to encode the values it operates on, and other packages generally treat
these values as opaque units. Since many encoding problems are very
far from being original however, it is useful to collect a body of
example solutions to the most common problems (see
Subsection~\ref{Ssec:TclValues}), and one could argue that this
consistutes a sort of type system, even if it is of course
not mandatory. In addition, a package may choose to publicly specify
the encoding of some data, and it is for that as well useful to have a
common body of data encoding principles and techniques for package
authors to employ. Some suggestions along these lines for improving
interoperability between packages can be found in
Subsection~\ref{Ssec:StandardDataEncodings}.
% Another, more conceptual application of types in many modern
% programming languages is for selecting definition of a polymorphic
% operation; practically the same token is used as name of several
% different operations, and the types (which in some cases are
% compile-time properties of variables and in other runtime properties
% of values) of the operation arguments are used to select an exact
% definition. This idea was probably taken from ordinary mathematical
% notation, where it is common that the intended interpretation of an
% operation symbol (or lack thereof, as in $xy$) depends on what the
% operands happen to denote; $G-u$ means one thing if $G$ and $u$ are
% numbers, another thing if they are vectors, and something very
% different if $G$ is a graph and $u$ is a vertex in $G$. While this
% is absolutely standard, one should nonetheless observe that it really
% is a shorthand notation; the standard formalisation of mathematics in
% mathematical logic requires that operation with different definitions
% are given distinct names. In particular the distinction between
% `element' and `set' used to justify shorthands such as $f(S)$ for
% $\left\{ f(x) \bigm\vert x \in S\right\}$ when $S$ is a set break
% down when it is taken into account that mathematics is constructed on
% top of set theory, since \emph{everything} is a set\footnote{
% Or in some axiomatisations: a class; the same argument applies
% for all kinds of collections.
% } in standard set theory; if one were to actually take that shorthand
% seriously as a rule of definition, one would find that every function
% was defined for every mathematical object, and usually not in
% any way that makes practical sense!
% ---
%
%
% For applications where types are convenient, it is perfectly
% straightforward to emulate typing within a subsystem of mathematics
% by simply implementing the type system of one's choice, but it is
% frequently less straightforward to rid oneself of an awkward type
% system that has been allowed to permeate the foundations.
%
%
% ---
\subsection{The \Tcl\ value model}
\label{Ssec:TclValues}
Several references have been made above to `values', without actually
defining what these can be or how they behave. Sadly, it is quite
common for language manuals to dodge this question, since the true
story tends to be both complicated and not all that flattering for
the language designers; the model is frequently ``whatever happened
to get implemented'' rather than something thought out in advance.
\emph{\Tcl\ is different,} in that it has a clearly defined model
whose simplicity, flexibility, and expressive power are befitting of
mathematics. It should however be observed that this model has two
rather different faces: on one hand, there is the formal model
against which program correctness should be judged, and on the other
hand there is the underlying technology, which determines the
computational complexity of algorithm implementations. (There have
unfortunately been some widely disseminated, although by now rather
old, analyses of \Tcl\ whose outdated conclusions about complexity
are based on the formal model alone.)
The formal model is simply this: \emph{everything} (i.e., every
value) \emph{is a string!} Every value has a unique decomposition
into a finite sequence (of length zero or more) of characters, and
two values are equal if and only if they decompose to the same
sequence of characters. Commands are allowed to treat unequal
strings as denoting equivalent values\Ldash for example
`\texttt{0xFF}' and `\texttt{255}' are equivalent to commands
expecting an integer\Rdash but they may not distinguish between equal
values constructed in different ways. This is useful for testing and
debugging, since it has the corollary that ``what you see is what it
got'' with respect to calling of commands; an input value constructed
by some complicated subroutine is always the same as some literal
string. It also means that the value model maps
straightforwardly into several fundamental models of computations,
particularly those of more mathematical flavour such as
recursive functions (on natural numbers) and Turing machines.
The underlying technology instead supports two separate views of
values: the \emph{string representation} of the formal model, and an
\emph{internal representation} adapted to the hardware at hand. The
canonical string representation of a number is for example in
decimal, but the internal representation (which the arithmetic
operations make use of) is in binary. Conversions between the two are
lazy, meaning a particular representation is only generated if
somethings asks for it. Hence most numbers in a well-written \Tcl\
program are born, live, and eventually die possessing only their
binary internal representations; it is sufficient for `everything is
a string' (EIAS) that the string representations could have been
generated.
The set of internal representations is furthermore not closed, but
something that can be extended: many dynamically loadable \Tcl\
extensions define their own internal representation for some kind of
data they provide operations on. The only requirement is really that
they set up two C-level functions: one for deallocating the internal
representation for a value (called when the last reference to that
value goes away), and one function for generating a string
representation for a value given its internal representation, to meet
the requirement that every value potentially can be viewed as a string.
The string representation can be something as raw as a hexdump of the
internal representation, but it can equally well be something more
enduring and readable, such as an OpenMath-XML encoding of the value
in question (in case the value is a mathematical object).
That the internal representation still has to live up to the promises
set by the formal model also has the implication that all values are
immutable: there's no way to change a value. For composite kinds of
values (lists, dictionaries, \dots), there are often operations for
replacing or modifying a part of the value (e.g.~replace the value in
an element of a list by another value), but what these operations
formally do is always that they create a new value which is the same
as the old one, except in the part that was to be modified. The great
practical benefit of this is that if the same composite value (e.g.~a
matrix) is used in several parts of your program, then there is no
way that operations applied to it in one place (e.g.~row operations
to perform Gaussian elimination) can affect the value of the matrix
as seen in other parts of the program. This is often \emph{not} the
case in programming languages which employ pointers or other references
to mutable storage as the primary means of constructing composite
values (most of the mainstream is that way, really); there it is
instead the responsibility of the programmer (indeed, the responsibility
of \emph{every} programmer involved \emph{anywhere} in the codebase) to
restrain themselves to only using operations for immutable values, if
the system as a whole is to provide proper value semantics. That is
actually a rather tall order, and thus a rich source of potential
bugs! But in \Tcl\ you get the correctness of immutable values by
default.
The elementary way to internally live up to this formal requirement
is to make a copy every time some value should be modified, but \Tcl\
typically does better than that. If a value is \emph{shared}
(referenced also from somewhere else) then a copy is made before the
internal representation is modified, but the internal representation
of an unshared value can be modified directly, since the