-
Notifications
You must be signed in to change notification settings - Fork 25
/
chap6.tex
206 lines (170 loc) · 9.2 KB
/
chap6.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
\chapter{Conclusion}
%% A generation of dynamic languages have been designed by trying variants of
%% the class-based object oriented paradigm. This process has been aided by
%% the development of standard techniques (e.g. bytecode VMs) and reusable
%% infrastructure such as code generators, garbage collectors, and whole VMs
%% like the JVM and CLR.
%% It is possible to envision a future generation of languages that generalize
%% this design to set-theoretic subtyping instead of just classes. This next
%% generation will require its own new tools, such as partial evaluators (already
%% under development in PyPy and Truffle). One can also imagine these future
%% language designers wanting reusable program analyses, and tools for
%% developing lattices and their operators.
It is probably not possible to define technical computing precisely.
That is just as well, since being ``domain specific'' is not our goal.
Programming languages want to be general purpose, and evidence suggests
people want them to be as well.
Recent analysis of programming language adoption~\cite{Meyerovich:2013:EAP:2509136.2509515}
found that when a language is popular only within a certain niche, it is not
usually the most popular language in that niche --- broadly popular,
general languages are still preferred.
We speculate that wherever this property fails to hold, there is an opportunity
to improve our general purpose languages.
As noted in the introduction, at least some forms of technical computing,
however imprecisely defined, have surely been just such a case.
This dissertation takes a small step towards understanding this niche in a
more general framework.
%we see it as a motivation, not a target
%want to be more general, not domain specific
%``I've yet to hear anyone explain how you decide what are the boundaries of a `domain-specific' language. Isn't the `domain' mathematics and science itself?''~\cite{bobharperdsl}
%tc is not a domain, and doesnt need to be:
% \cite{meyerovich2012socio} about appealing to some domain first
% technical computing was ripe for such a treatment
We began by observing that technical computing users have priorities
that have rarely been addressed directly by programming languages.
Arguably, even many of the current technical computing languages only
address these priorities for a subset of cases.
% in particular, they want flexible notation, and code specialized
% for information regardless of when that information is known.
In some sense all that was missing was an appropriate notion of
partial information.
Partial information has many uses.
It describes sets of values, and therefore code applicability.
It describes what sorts of information a compiler can determine,
contributing to a performance model.
It leads to staged programming: generating code requires
knowing something about what needs to be computed, but not everything, so that
generated code can be reused enough to be worthwhile.
Of course, this sounds a lot like a type system.
However programmers in our area of interest, and working in dynamic languages
generally, have resisted this.
In our view this can change as more uses of types are discovered and
advocated.
Surely there are many more ways to obtain useful run time behavior based on
partial information.
The rest of this concluding chapter will reflect on what more could be done.
% what's wrong with dynamic languages is not just a lack of static
% guarantees, but that they don't seem to be pareto optimal.
% actually not enough is done with dynamic typing: you give up type
% checking, and all you get is untyped dictionaries in return?
% we know these programs need more organization and type info.
% this is the first truly viable approach
% - bad that they need to think about type stability, but
% also good that is relatively easy to grasp, and it gets
% people thinking about it who wouldn't normally
% - the types give people the tools to deal with it
% - can't pretend that people wont have to worry about this stuff;
% it's fundamental
%exploiting partial information is key.
%dispatch was an especially natural way to do this,
%are there others?
%What else can we do with partial information as part of a programming
%model?
% somehow being dual to ML-family languages; what's hard for them is
% easy for us and vice versa.
\section{Performance}
If we are interested in abstraction, then type specialization is the
first priority for getting performance.
However in a broader view there is much more to performance.
For example ``code selection'' appears in a different guise in the
idea of \emph{algorithmic choice}~\cite{Ansel:2009:PLC:1542476.1542481,Ansel:2014:OEF:2628071.2628092}.
At any point in a program, we might have a choice of several algorithms and
want to automatically pick the fastest one.
Julia's emphasis on writing functions in small pieces and describing
applicability seems naturally suited to this functionality.
A possible approach could be to dispatch on components of an execution plan
object.
High-performance code generators such as Spiral~\cite{Puschel:2004:SGP:1080647.1080687}
often specialize on data size.
Julia makes it easy to represent array sizes in the type system, and the
usual pattern of dynamic dispatch to specialized code could be applicable
in these cases as well.
% immutable arrays, specializing on size
%There are several key aspects of performance programming that our design
%does not directly address, e.g. storage and in-place optimizations.
While Julia includes a distributed computing library, we have not
sufficiently explored shared memory parallelism.
The Cilk model~\cite{Joerg96,Randall98} would be a good match for the
level of performance and abstraction we aim for.
\section{Other future work}
As with any practical system, we must begin the process of trying to get back
what our design initially trades away.
For example, the specialization-based polymorphism we use does not support
separate compilation.
Fortunately that's never stopped anyone before: C++ templates have the same
problem, but separately compiled modules can still be generated.
We plan to do the same, with the same cost of re-analyzing all relevant code
for each module.
Another approach is to use layering instead of separate modules.
Starting with the core language, one next identifies and compiles a useful base
library.
That layer is then considered ``sealed'' and can no longer be extended ---
it effectively becomes part of the language.
Applications using this library can be compiled more efficiently.
More layers can be added (slowest-changing first) to support more specific
use cases.
This is essentially the ``telescoping languages'' approach~\cite{telescoping}.
An ideal approach to higher-order programming in Julia remains somewhat
elusive.
For example the \texttt{map} function in section~\ref{sec:implementingmap}
always returns an array of element type \texttt{Bottom} when the input
array is empty, which is fully defensible but still confusing and surprising
to many programmers.
It is possible the situation could be improved by some syntactic sugar for
nominal arrow types.
Approaches to static type checking are also worth thinking about.
Most importantly, we feel we have the priorities right: first introduce
at least some kind of type information, then gradually add checks and
tooling to improve safety.
For example, a useful alternative form of the language could require
types to match when control flow edges meet.
Tools like \texttt{TypeCheck.jl}~\cite{typecheckjl} can be used to take
better advantage of whatever type information we are able to infer.
%There are many opportunities to incorporate more detailed static analysis.
% higher resolution type info degrades SNR
%% \begin{itemize}
%% \item formalize more details of dispatch, meet, join, widen
%% \item product domains, how to incorporate finer types more smoothly
%% *\item how to incorporate static checks? \cite{typecheckjl}
%% *\item separate compilation
%% *\item function types
%% \item shared memory, cilk
%% \item can we implement BLAS?
%% *\item incorporate autotuning?
%% \end{itemize}
% oscar:
% - what we have now is the ``minimal julia''
% - ``easy to write code that's easy to analyze statically''
% - got non-programmer people to write easy to analyze code
% mention lines of code
%% How much of the past 30 years of handwritten Matlab internals can
%% be autogenerated with a compiler? (A lot)
%% 70kLOC julia, 34kLOC c/c++, 6kLOC scheme
% but this includes a REPL, package manager
% easy polymorphism.
% - starts with a very simple HLL
% - performs well because of underlying mechanisms,
% which then unfold gradually when you want more control over performance
% making everything a method, and everything glued together by types
% gives certain reflective properties.
\section{Project status}
Not only is Julia free software, but we want to make
the most of it by facilitating reading and modifying its code.
Most of Julia is defined in libraries, lowering the barrier to contributing
to more aspects of the system.
There are currently 590 packages listed in our package management system,
and the majority of them are written entirely in Julia.
%over 350 contributors
% something about how ``information hiding'' is anathema to science,
% and coincidentally is not the focus of t.c. languages or rigorous
% functional languages.