Manual (to be processed)
Note for @radumereuta (by Grigore): Radu, please revise the sort/operation inference and sort annotations sections. The principle underlying sort/operation inference is very simple:
- Variable sorts and operation overloading are inferred in a way that maximizes the sort of any involved term.
Of course, in the process, all user-provided sort constraints must be respected. Warning/Error messages should be reported if there is no way to achieve the above.
With the above in mind, I think this text in the Sort annotations section is obsolete (and confusing):
The dual is `<:Sort` and forces the inner production to have **exactly**
the type specified (no subsorts). This can be useful sometimes when ambiguous
productions have related sorts. For example, a common ambiguity in K is related
to overloaded lists where we have `Exps ::= List{Exp,","}` and
`Ids ::= List{Id,","}`. Also we have the subsort `Ids < Exps`. For the
following input: `(A, B)<:Exps`, both productions will be considered, but we
can annotate the expression with the desired sort to choose the desired production.
The operation `<:Sort` should only require its argument to be parse-able to sort `Sort`, which includes any subsorts of `Sort`. If one just writes `(A, B)<:Exps` then one gets the expected parsing, because `A`, `B`, and `(A,B)` will be inferred their maximal sorts obeying the given constraints, which means `Exps` for all; if you want `A` and `B` to have sort `Exp` then you have to say so explicitly, `(A:Exp, B:Exp)<:Exps`, or have other constraints from the outside. Either way, there is no ambiguity here. If one means `(A, B)` to parse to `Ids`, then one just writes `(A, B)<:Ids`; this will also enforce that `A` and `B` are inferred the sort `Ids`.
The principle underlying sort annotations is also simple, and I believe the above conventions correctly implement it:
- Sort annotations should allow users to syntactically coerce the parser to choose any operations they want in any context, without a need to use abstract syntax (i.e., the labelled notation).
Radu, as you revise the sort/operation inference and sort annotations sections, please find a nice way to also include the two underlying principles above. I actually believe that the best way to do this is to merge the two sections into only one.
The text in this file will eventually be incorporated within the K reference manual.
To test the definition against all the programs in this folder, go to the parent folder where the simple-untyped.k definition is and execute
ktest simple-untyped.k --programs programs_and_results --extension simple --exclude dekker
The `--exclude` option tells which programs to exclude from krunning. You may want to exclude programs which take a lot of time, or which don't terminate.
If you want to skip the PDF poster generation, which takes some time, execute
ktest simple-untyped.k --programs programs_and_results --extension simple --exclude dekker --skip pdf
If you already kompiled your definition and want to only krun the programs:
ktest simple-untyped.k --programs programs_and_results --extension simple --exclude dekker --skip "kompile pdf"
To test your solution, go to the folder where it is located (i.e., where the file `simple-untyped.k` that you wrote is) and then call the ktest tool on the `config.xml` file (the one in this folder). For example, if your solution is in the parent folder, to test it go there and type

ktest tests/config.xml
As another example, if the K distribution is installed at `/home/cs422-Fall-2013/k` and you solve this exercise in `/home/cs422-Fall-2013/joe/hw3/ex1`, then test it by executing the following command from the folder where your solution is:

ktest /home/cs422-Fall-2013/k/dist/tutorial/2_languages/1_simple/1_untyped/exercises/break-continue/config.xml
If you want to skip the kompilation and the pdf generation, then use the following command instead (and similarly for the second example above):
ktest tests/config.xml --skip "kompile pdf"
If you want to skip only one of kompile or pdf (typically the pdf), then use only that one in the `--skip` option (e.g., `--skip pdf`).
Both the original SIMPLE untyped programs and the programs in this folder will be tested. If your solution is correct, then all tests should pass, except, potentially, the new break-from-function.simple (see the comments there for the reason).
If your solution is somewhere else, then use the `-d` option of ktest to give the directory where your K definition is located. For example, provided that your definition is in the parent folder, then to test it you can type the following command in this folder (i.e., programs_and_results):

ktest config.xml -d ..
To test this definition, you can call ktest on the config.xml file in this folder. Recall that if you call ktest from any folder other than where your definition is, then you need the option `-d path/to/your/def/directory`. For example, if you want to call it from this folder, then type

ktest config.xml -d ..

If you want to call it from the parent folder then just type:

ktest results/config.xml

Also recall that if your definition is already kompiled, then you can skip kompilation to save time using the `--skip kompile` option; if you want to skip both kompilation and pdf generation then use the option `--skip "kompile pdf"`.
The general philosophy of ktest is that the config file contains all the static info (where the programs and their results are), but it does not contain the dynamic info (where the actual definition to be tested is). The same philosophy holds when we use include. So all the non-definition attributes need to be statically resolved at the place where the include attribute appears. The use of an include attribute is nothing but syntactic sugar, so the user does not have to repeat all the information which has already been given in the included configuration. The include attributes can be desugared completely statically, and thus independently of where the config file containing the include attribute is called from.
In short, the inclusion of a config2.xml file into a config1.xml file is equivalent to having a flat config1'.xml file instead of config1.xml, which would behave exactly the same way as config1.xml, wherever ktest may call it from. So imagine that you have that config1'.xml built that way, completely statically (actually, this is probably the way you would implement it anyway). Now, just as I can call ktest on config1'.xml from wherever I want, I want to be able to completely equivalently call ktest on config1.xml from the same place.
For example, suppose that you are somewhere called "the working directory" and that the definition you want to test is at path/lang.k from the working directory. Suppose that for some reason you really need to put your config1'.xml/config1.xml file at path1 from the working directory. Then your config1'.xml file will look something like this:
<test> definition="path/lang.k" programs="P1" ... </test>
where P1 is the relative path from the directory path1 to where the programs are, and you would call it with the command
ktest path1/config1'.xml
Now suppose that a config2.xml file for testing lang.k against the same programs has already been created in the past, and that you would like to replace the verbose and redundant config1'.xml with a config1.xml which simply includes config2.xml. For concreteness, assume that config2.xml is at relative path path2 from where config1'.xml/config1.xml is, and that its content is
<test> definition="lang.k" programs="P2" ... </test>
Then it must be the case that P1=path1/P2. Knowing this, we can call ktest on config2.xml as follows:
ktest path1/path2/config2.xml --directory path
This suggests that we include config2.xml into config1.xml as follows:
<include file="path2/config2.xml" directory="path">
That is, the include attribute in config.xml contains exactly and only the information that I would have to manually pass to ktest to call config2.xml instead of config1'.xml or config1.xml.
However, since configuration files have fixed programs folders, we have to agree on what the programs corresponding to the include attribute in config1.xml are. There is only one reasonable choice, namely the same programs folder as in config1'.xml, that is, P1. So in a command like
ktest path1/config1.xml --directory path --programs P
the root directory P is applied to P1 and NOT to P2 (again, think config1'.xml instead of config1.xml). So the above command would be equivalent to
ktest path1/path2/config2.xml --directory path --programs P/path1/P2
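The path composition described above can be captured in a few lines. This is an illustrative model of the desugaring rule only, not ktest's actual implementation; the function name and signature are invented for the example.

```python
import os

def desugar_include(path1, path2, directory, p2, programs_root=None):
    """Return the equivalent direct ktest invocation for an include:
    (config file to call, --directory value, effective programs root)."""
    config = os.path.join(path1, path2, "config2.xml")
    p1 = os.path.join(path1, p2)          # P1 = path1/P2
    if programs_root:                      # a --programs root applies to P1
        p1 = os.path.join(programs_root, p1)
    return config, directory, p1

# ktest path1/config1.xml --directory path --programs P  desugars to:
print(desugar_include("path1", "path2", "path", "P2", "P"))
# i.e. ktest path1/path2/config2.xml --directory path --programs P/path1/P2
```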
In batch mode, ktest requires a configuration file. A configuration file can have multiple `<test>` elements. A `<test>` element corresponds to a single job, where each attribute corresponds to the command line option of the same name.

A `<test>` element requires a K definition to be tested. For example, the following `<test>` element just kompiles "lang.k":

<test
definition="lang.k"
/>
Given the above, ktest does nothing but kompilation; this can be useful when you only want to check that your K definition kompiles.
A `<test>` element can also have a "programs" attribute that specifies root directories of programs to be tested. Note that all nested sub-directories are considered when searching for programs. Several root directories can be given, separated by white space. A "programs" attribute requires an "extension" attribute that indicates the extensions of programs to be executed. For example, the following `<test>` element kompiles "lang.k" and runs all programs under the directories "P1" and "P2" whose extension is either "E1" or "E2":
<test
definition="lang.k"
programs="P1 P2"
extension="E1 E2"
/>
In other words, given the above `<test>` element, ktest behaves the same as the following bash shell script:

kompile lang.k
for i in `find P1 P2 -name '*.E1' -o -name '*.E2'`
do
krun $i
done
Note that if multiple files with the same name exist under the "P1" and "P2" directories, only the one in "P2" is executed.
An "exclude" attribute specifies which programs are not to be executed. A program whose file name contains a pattern given by the "exclude" attribute is excluded from execution.
A "results" attribute specifies root directories of reference outputs. For each program "pgm.ext", a reference output file "pgm.out" is searched for under the directories given by the "results" attribute. Like the "programs" attribute, a "results" attribute can have several root directories separated by white space; likewise, when multiple reference outputs with the same name exist, the one from the last root directory is used. The directories given by a "results" attribute are also used when searching for an input file for each program.
A `<test>` element can have several `<kompile-option>` elements, which are used when kompiling the given K definition. The default options of krun can be specified by an `<all-programs>` element that contains multiple `<krun-option>` elements. Note that at most one `<all-programs>` element can be given.

Program-specific options can be given by `<program>` elements. A `<program>` element has a "name" attribute and multiple `<krun-option>` elements. A program whose name contains the pattern given by the "name" attribute is executed with the options specified by that `<program>` element.
A `<test>` element can have an optional "directory" attribute, which specifies where the kompiled directory is created. If this attribute is not given, the kompiled directory is created, by default, in the directory in which the K definition resides. The "directory" attribute is necessary when you provide multiple `<test>` elements that have the same K definition but different `<kompile-option>` elements.
To sum up, given the following `<test>` element:
<test
definition="lang.k"
directory="D"
programs="P1 P2"
extension="E1 E2"
exclude="X1 X2"
results="R1 R2"
/>
<kompile-option name="--O1" value="V1" />
<kompile-option name="--O2" value="V2" />
<all-programs>
<krun-option name="--O3" value="V3" />
<krun-option name="--O4" value="V4" />
</all-programs>
<program name="N1">
<krun-option name="--O5" value="V5" />
<krun-option name="--O6" value="V6" />
</program>
<program name="N2">
<krun-option name="--O7" value="V7" />
<krun-option name="--O8" value="V8" />
</program>
</test>
ktest works as follows: first, "lang.k" is kompiled with the options "--O1=V1 --O2=V2", and its kompiled directory "lang-kompiled" is saved at "D". Then all programs under the "P1" and "P2" directories whose extension is either "E1" or "E2" are executed with the options "--O3=V3 --O4=V4". However, a program whose name contains "N1" (or "N2", resp.) is executed with another series of options, "--O5=V5 --O6=V6" (or "--O7=V7 --O8=V8", resp.). Furthermore, a program whose name contains "X1" or "X2" is excluded from execution. When a program, say "pgm.ext", is executed, an input file "pgm.in" under "R1" or "R2", if any, is given as stdin, and an output file "pgm.out" is used for comparison with the actual stdout. Note that if multiple files with the same name exist under the "P1" and "P2" (or "R1" and "R2", resp.) directories, only the one in "P2" (or "R2", resp.) is used.
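The option-selection behavior described above can be modeled with a short sketch. This is an illustrative reconstruction of the rules as described, not ktest's source code; all function and variable names are invented.

```python
def krun_options(program, all_programs_opts, per_program_opts, excludes):
    """Pick the krun options for one program path, per the ktest rules:
    exclude patterns are substrings of the path; a <program name="N">
    entry whose pattern occurs in the name replaces the defaults."""
    if any(x in program for x in excludes):
        return None                      # excluded from execution
    for pattern, opts in per_program_opts.items():
        if pattern in program:
            return opts                  # program-specific options win
    return all_programs_opts             # fall back to <all-programs>

defaults = ["--O3=V3", "--O4=V4"]
specific = {"N1": ["--O5=V5", "--O6=V6"], "N2": ["--O7=V7", "--O8=V8"]}
print(krun_options("dir/N1-test.E1", defaults, specific, ["X1", "X2"]))
print(krun_options("dir/X1-test.E1", defaults, specific, ["X1", "X2"]))
```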
A configuration file can have multiple `<include>` elements as well as `<test>` elements. With an `<include>` element, you can include another configuration file. For example, you can include the "another_config.xml" file as follows:
<include
file="another_config.xml"
/>
Basically, an `<include>` element inherits all attributes and elements from the included configuration file; these can of course be overwritten by explicitly specifying them, except for the "definition", "directory", "programs", and "results" attributes. For example, let a configuration file be:
<include
file="another_config.xml"
exclude="new_exclude"
/>
and the included configuration file "another_config.xml" be:
<test
definition="D"
programs="P"
exclude="X"
/>
then the included `<test>` element is effectively the same as the following:
<test
definition="D"
programs="P"
exclude="new_exclude"
/>
Although the "programs" and "results" attributes cannot be overwritten, they can be extended by using the "more-programs" and "more-results" attributes. For example, let a configuration file be:
<include
file="config.xml"
more-programs="P"
more-results="R"
/>
and the included configuration file "config.xml" be:
<test
definition="lang.k"
programs="P1 P2"
results="R1 R2"
/>
then the included `<test>` element is effectively the same as the following:
<test
definition="lang.k"
programs="P1 P2 P"
results="R1 R2 R"
/>
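The overwrite-versus-extend rules above can be summarized in a small sketch. This is a hypothetical model of the merging behavior as described, not ktest's implementation; the function and set names are invented.

```python
# Attributes that an <include> may never overwrite directly.
PROTECTED = {"definition", "directory", "programs", "results"}

def merge_include(included, including):
    """Merge an <include> element's attributes over the included <test>:
    ordinary attributes (e.g. "exclude") are overwritten; the protected
    ones stay; more-programs/more-results append to the inherited values."""
    merged = dict(included)
    for key, value in including.items():
        if key == "more-programs":
            merged["programs"] = merged.get("programs", "") + " " + value
        elif key == "more-results":
            merged["results"] = merged.get("results", "") + " " + value
        elif key not in PROTECTED:
            merged[key] = value
    return merged

inner = {"definition": "lang.k", "programs": "P1 P2", "results": "R1 R2"}
outer = {"more-programs": "P", "more-results": "R"}
print(merge_include(inner, outer))
# programs becomes "P1 P2 P" and results "R1 R2 R", as in the example.
```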
Note that the "definition" attribute cannot be overwritten at all.
Also note that an `<include>` element can have "directory", "programs", or "results" attributes, but they have a different meaning: path resolution.
Every path in a configuration file is relative. Three different kinds of paths exist: definition, programs, and results. A root for the relative paths can be specified using command line options. For example, let the command line arguments of ktest be:
$ ktest C/config.xml --directory D --programs P --results R
and 'C/config.xml' be:
<test
definition="D1/lang.k"
programs="P1 P2"
results="R1 R2"
/>
then the paths are resolved as follows:
<test
definition="D/D1/lang.k"
programs="P/P1 P/P2"
results="R/R1 R/R2"
/>
Note that the option `--directory` is optional, and its default value is the current directory. Also, the options `--programs` and `--results` are optional, and their default values are the directory where the given configuration file resides. For example, the following command

ktest C/config.xml

is effectively the same as the following:

ktest C/config.xml --directory . --programs C --results C
The root can also be specified using `<include>`'s attributes. The same rule applies here. For example, let a configuration file 'C/config.xml' have the following `<include>` element:
C/config.xml:
<include
file=C1/config1.xml
directory=D
programs=P
results=R
/>
and the included configuration file 'C/C1/config1.xml' be:
C/C1/config1.xml:
<test
definition="D1/lang.k"
programs="P1 P2"
results="R1 R2"
/>
then the `<include>` element is resolved as follows:
C/config.xml:
<test
definition="D/D1/lang.k"
programs="P/P1 P/P2"
results="R/R1 R/R2"
/>
Note that the attribute 'directory' is optional, and its default value is the current directory. Also, the attributes 'programs' and 'results' are optional, and their default values are the directory where the included configuration file resides. For example, the following `<include>` element
<include
file=C1/config1.xml
/>
is effectively the same as the following:
<include
file=C1/config1.xml
directory=.
programs=C1
results=C1
/>
To sum up, the roots given by command line options and `<include>`'s attributes are accumulated to resolve the relative paths. For example, let the command line arguments of ktest be:
$ ktest C/config.xml --directory D --programs P --results R
and configuration file 'C/config.xml' be:
C/config.xml:
<include
file=C1/config1.xml
directory=D0
programs=P0
results=R0
/>
and the included configuration file 'C/C1/config1.xml' be:
C/C1/config1.xml:
<test
definition="D1/lang.k"
programs="P1 P2"
results="R1 R2"
/>
then the paths are eventually resolved as follows:
<test
definition="D/D0/D1/lang.k"
programs="P/P0/P1 P/P0/P2"
results="R/R0/R1 R/R0/R2"
/>
Note that when they are not specified, the default roots are:
D = .
P = C
R = C
D0 = .
P0 = C1
R0 = C1
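The accumulation of roots can be sketched as follows. This is an illustrative model with invented names, assuming POSIX-style paths: the command-line root and the `<include>` root are composed, then prefixed onto the relative paths of the innermost `<test>`.

```python
import os

def resolve(roots, rel_paths):
    """Prefix the accumulated roots onto each relative path."""
    prefix = os.path.join(*roots) if roots else "."
    return [os.path.normpath(os.path.join(prefix, p)) for p in rel_paths]

# Command line: --directory D --programs P --results R
# <include>:    directory=D0  programs=P0  results=R0
print(resolve(["D", "D0"], ["D1/lang.k"]))   # definition path
print(resolve(["P", "P0"], ["P1", "P2"]))    # programs paths
print(resolve(["R", "R0"], ["R1", "R2"]))    # results paths
```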
ktest has a feature for automatic generation of output (*.out) files. This feature consists of two options: `--update-out` and `--generate-out`.
ktest --update-out
When this option is enabled, ktest updates every output file with the newly generated output of krun. This option creates a new set of reference output files, discarding the previous one. This option is useful when you want to change the format of the output files due to a version update of krun.
Use this option at your own risk. With this option, ktest does not check the correctness of krun's output.
ktest --generate-out
When this option is enabled, ktest generates a new output file at a new place whenever krun's output is different from the existing output file. The new output file is created in the last directory of the given 'results' attributes.
This option is useful when you create a new K definition and want to construct a new set of test programs and outputs for the K definition.
NOTE:
- A new output file is also created when no previous output file exists.
- A previous output file may be updated if the last directory of the 'results' attributes is the same as the directory in which the previous output file resides.
NOTE:
These two options make ktest run in sequential mode, ignoring the `--threads` option of ktest. Since multiple tests can share a single output file, generating/updating output files must be done sequentially; otherwise, parallel updates may cause a race condition on the shared output files.
Each test element contains three steps: kompile, pdf, and krun. Each step can be skipped using either a command line option or an attribute of the config.xml file.
In order to skip a step for the entire test suite, you can use the command line option `--skip <steps>`. For example, the following command runs ktest without generating PDFs for any test element:
$ ktest --skip pdf config.xml
You can also skip multiple steps by enumerating them. For example, the following command runs only the krun steps for all tests:
$ ktest --skip "kompile pdf" config.xml
In order to skip a step for an individual test element, you can use the 'skip' attribute of the config.xml file. For example, the following test element skips the PDF generation:
<test
...
skip="pdf"
...
/>
You can also skip multiple steps by enumerating them as follows:
<test
...
skip="kompile pdf"
...
/>
All kompile/krun options, such as the `<kompile-option>`, `<all-programs>`, and `<program>` elements, in an `<include>` overwrite their included counterparts. That is, this behavior is not a bug but a design decision.

The rationale for this design is that it would be too difficult for users to keep track of all those options if we allowed them to be extended. It would be much trickier when some of the extended options conflict with the previous ones. In that case, we would need a way to describe how to resolve those conflicts, which would make ktest much more complex.
Note that such an extension is only allowed for `more-programs` and `more-results`, which extend the previous `programs` and `results` attributes. They cannot cause a conflict, because they only add to the previous programs/results.
Of course, I admit that this design is not consistent in some sense, although it was the best we could do at the time. It would be great if you could suggest a more general and simpler design.
ktest is much more complex than your quick description. The program options are necessary for everybody who writes non-trivial definitions, not only for the adventurous. So is the capability to exclude things. And the programs and results can be multiple folders, not only one. The `directory` and the `programs`/`results` have different meanings and semantics; they should never be put into the same category. K definitions change and evolve, while the programs and their outputs stay mostly the same; sometimes they change too, which is why you also want the capability to override them. But most of the time they don't. For example, you have lots of programs and tests for n language features. Now you add feature number n+1. You want to reuse all the previous programs and their outputs, but with a different language definition.
The semantics of a ktest config file config.xml is the following pair:

[[config.xml]] = ( { (p1,i1,o1), (p2,i2,o2), ..., (pn,in,on) }, lang.k )

The first element is a set of triples (program,input,output), namely the actual tests to be performed against a K definition. The second element is the actual definition. Based on this domain, we can give semantics to all of ktest's constructs. For example, `[[include config.xml]]` adds to the semantics of the current configuration file all the triples (program,input,output) from the semantics of config.xml, but ignores the definition of config.xml, because we want to test the current definition, not the old one. Other constructs, like `more-programs`, `exclude`, `programs`, etc., allow you to modify the set of triples (program,input,output) in a compact way. This design was derived by handling tens of definitions and exercises.
We may want an additional include tag, maybe `import`, which would include the definition as well, not only the programs/inputs/outputs. A couple of people I talked to thought that was the meaning of `include`. Having two different ways to import tests will at least make users reflect on the difference between the two and hopefully choose the right one for their needs.
`exclude = ""` actually means excluding EVERYTHING. `exclude` is simply a pattern that is searched for in FULL FILE PATHS. The `exclude` string is not a glob or regex pattern; it is just a character string. Since someone else was also confused by this, I think it is a good idea to either change this behavior or improve the help message.
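The substring semantics of `exclude` can be stated precisely in a few lines; this sketch is illustrative, not ktest's code.

```python
def is_excluded(full_path, exclude_patterns):
    """A program is excluded iff any exclude pattern occurs as a plain
    substring of its full file path (no glob/regex interpretation)."""
    return any(pat in full_path for pat in exclude_patterns)

assert is_excluded("tests/programs/dekker.simple", ["dekker"])
assert is_excluded("any/path/at/all.simple", [""])   # "" matches everything
assert not is_excluded("tests/programs/sum.simple", ["dekker"])
print("ok")
```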
If you want to write something to a file all at once, you can use `#open` and `#write`. For example:
rule <k> run => writeFile("file", "content") ... </k>
syntax K ::= writeFile(String,String)
rule writeFile(File:String, Content:String)
=> writeFD(#open(File), Content)
syntax K ::= writeFD(Int,String)
rule writeFD(FD:Int, Content:String)
=> #write(FD, Content) ~> #close(FD)
TODO: how to read from a file
Suppose that you want to parse configuration variables (say CFG1, CFG2, ...) with different parsers ('parser1', 'parser2', ...). Here is how you can set a parser for each configuration variable:

$ krun -config-var-parser="parser1" -cCFG1="some term" -cCFG3="another term" -config-var-parser="parser2" -cCFG2="other term"

Note that once you have set parser1, it remains active until the -config-var-parser option is used again. That is, in our example, the values of CFG1 and CFG3 are parsed using parser1 and the value of CFG2 is parsed using parser2.
This feature allows one to customize strictness for all positions or for individual positions.
Example 1:
syntax Exp ::= Exp "+" Exp [left, strict<k>(all(context('rvalue)))]
syntax K ::= rvalue(K) [strict,context(result(RVal))]
syntax RVal ::= Int
The first line declares "+" strict in the `<k>` cell, on all arguments, wrapping each of them for evaluation with the 'rvalue context wrapper.
The second line declares rvalue, which is itself strict and which, when used as a context, cools for results of sort RVal (defined on the third line).
Example 2:
syntax Exp ::= Exp "=" Exp [right, strict(1, 2(context(rvalue))), assign]
syntax Exp ::= "rvalue" "(" Exp ")" [context(result(RVal)), strict, klabel(rvalue)]
The first line declares "=" strict (in the `<k>` cell), evaluating the first argument to the default computation result type (KResult) and the second argument wrapped with the rvalue context wrapper (note the klabel annotation on line 2).
General syntax:
StrictAttribute ::= StrictHead | StrictHead "(" StrictPositions ")"
StrictHead ::= StrictType | StrictType "<" CellNames ">"
StrictType ::= "strict" | "seqstrict"
CellNames ::= List{CellName,","}
StrictPositions ::= List{StrictPosition, ","}
StrictPosition ::= Position | Position "(" StrictPositionAttributes ")"
Position ::= Int | "all" | "other"
StrictPositionAttributes ::= List{StrictPositionAttribute, ","}
StrictPositionAttribute ::= "context" "(" KLabel ")"
| ContextPositionAttribute
ContextPositionAttribute ::= "result" "(" KSort ")"
| "cell" "(" CellNames ")"
ContextContextAttribute ::= "context" "(" ContextPositionAttributes ")"
ContextPositionAttributes ::= List{ContextPositionAttribute, ","}
The semantics of transforming these into contexts is based on the following principles:
- The "<" CellNames ">" argument of StrictHead is transformed into "cell" "(" CellNames ")" attributes for all positions.
- StrictPosition declarations are processed in order. If duplicates are found, the latter updates the attribute values it redeclares.
- "all" refers to all positions, updating the existing ones and adding those missing.
- "other" refers to all positions which have not been declared up to encountering "other".
- If a "context" attribute is found, the RHS of the context rule is wrapped with the corresponding label, and the subattributes of the "context" attribute are used to update the current attributes.
- The attributes passed to the context rule for a given position are obtained by updating the production attributes with the StrictPositionAttributes (possibly updated using the context attributes, as specified in the previous item) corresponding to that position.
Note: Currently, generalized strictness does not apply to strict labels
What is the difference between 'macro', 'anywhere', and 'function'? The generated Maude code seems to be the same except for metadata. What, if anything, makes a difference?
They all generate the same code now. However, they are semantically different:
- macro should only be used for desugaring, and should be applied to programs before any evaluation is started.
- function should only be used for annotating function declarations.
- anywhere should be used for rules which we want to apply anywhere, disabling configuration concretization for them.

Function rules are anywhere rules, but anywhere rules are not constrained to functions.
The [function] attribute can be attached to syntax declarations. All the rules corresponding to that function are automatically annotated with [anywhere], which is only a rule attribute. However, if we want to say that a rule applies anywhere without being related to a function declaration, we simply annotate it with [anywhere] and the contextual transformation will not affect that rule. Therefore the functionality of [function] is implemented using [anywhere], and [anywhere] is an attribute which does not fill in the context for that rule.
Does desugaring also apply to rules?
It surely applies to programs. Concerning rules, that has been debated in the past, but I'm not sure we reached a definite conclusion.
The best way to understand these is to forget about the Maude backend. They translate the same way into Maude because Maude does not provide us with the subtle differences that we need here.
macro: yes, macros are supposed to apply statically, before any rewriting starts, both on the program and on the semantic rules. So in terms of performance, they are supposed to not count (i.e., their amortized cost is zero).
function: if a label has this attribute, then it is as if it were a builtin function, the only difference being that it is "implemented" using K rules. We even considered changing the language in which functions are implemented, following a functional style instead. So they are meant to be very fast, almost instantaneous, when compared to the other rules. Different backends are free to implement the functions in different ways. The symbolic execution engine also considers them as part of the axiomatization of the underlying domain in which semantic rules are applied. So mathematically speaking, functions do not count as semantic computational steps.
anywhere: these are just rules which can apply anywhere they match, so they are expected to be comparatively very slow. Their applications count as computational steps, so they are interleaved, etc., in formal analyses. It is hard to reason with them in reachability logic. One should avoid such rules by any means possible, and use strictness and normal rules instead.
Currently the compilation process assumes that all KLabels having 'is' as a prefix are of the form 'isSort' where Sort is the name of a syntactic category. Therefore users defining their own labels should be aware of this and understand that defining and using an isXXX label which is not related to a sort XXX will result in undefined behavior.
This is done using the `require` keyword at the beginning of a file, followed by a string with the path of the file, either absolute or relative to the current file. If the required file is not found at the relative path, then it is looked for in the `/k/include` directory, where the framework stores the predefined builtin files. The paths are resolved by the JVM, so relative paths should be portable across Windows, Linux, and OS X.
During kompilation, some predefined information is needed in order to parse a definition. All this information is stored in files in the `/include` directory of the distribution. The `autoincluded.k` file is the root file and it is always required, unless the `--no-prelude` option is provided. From that file, the `AUTO-INCLUDED-MODULE` is automatically added to all the user-defined modules. The single exception is the separate syntax module (usually the main module name suffixed by `-SYNTAX`, or specified on the command line). In this case only the `AUTO-INCLUDED-MODULE-SYNTAX` module is added, which contains only the minimal amount of syntax usually required to define a programming language. The user can always extend this behavior by manually adding the required modules, but the program parser may be polluted with unwanted syntactic productions.
Rewrite rules are by far the hardest part of a definition to parse. Parsing them combines, at runtime, the grammar of the defined language and the grammar of K. This means the resulting grammar will most likely be ambiguous. The parser has some builtin disambiguation filters which try to find a parse that validates the following conditions:
- `=>` (rewrite) is greedy (has the lowest priority)
- `~>` has the lowest priority after rewrite
- casting has the highest priority
- named variables must have the same sort at every location
- for overloaded operators, only the most general production that fits the current context is considered. Two productions are considered overloaded iff all terminals match and all non-terminal sorts are subsorted or equal.
The variable type inferencer tries to maximize the sort of each variable in order to make the rule match as many cases as possible. Example:
rule A + B => A +Int B
Commonly found in definitions, this is the semantics of addition. On the left-hand side, `A` and `B` could have sort `Exp`, but on the right-hand side they can only be `Int`. The sorts inferred by default will therefore be `Int`. The default inferred sorts for variables can be seen by running `kompile` with the `-w all` option.
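A sketch of how explicit annotations interact with this inference (the `Exp` syntax and the `dump` construct are hypothetical):

```k
syntax Exp   ::= Int | Exp "+" Exp [left]
syntax KItem ::= dump(Exp)

// A, B are inferred Int: the right-hand side admits nothing larger.
rule A + B => A +Int B

// Here A + B occurs under dump, so A and B are maximized to Exp,
// the largest sorts satisfying the Exp argument constraint.
rule dump(A + B) => dump(A) ~> dump(B)
```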
To make defining languages easier and definitions smaller, the terminals in productions are automatically rejected from matching any lexical production. This is a simple way to ensure that keywords don't produce ambiguities. Example:
Stmt ::= "return" ";"
| Exp ";"
Exp ::= Id ...
Id ::= Token{[a-zA-Z][a-zA-Z0-9]*}
Because "return" can match the regular expression of Id, it may produce ambiguities. The solution is to say that "return" is not allowed to parse as an Id. There might be cases when this default behavior is not wanted; it can simply be overridden by declaring the desired production again:
Id ::= "return"
An additional way to reject unwanted parses is to write:
Float ::= Token{[0-9]* "." [0-9]*}
Float ::= "." [reject]
This will reject "." only from the Float category; other sorts are unaffected. The default behavior can be overridden altogether by adding the attribute `noAutoReject` to a token declaration:
Bubble ::= Token{~[\n\r\t]+} [noAutoReject]
This will no longer reject the definition's terminals from that token sort, and the user will have to reject the desired keywords manually.
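With `noAutoReject`, keyword rejection becomes the user's job; a sketch (the sort and keyword are illustrative):

```k
syntax Key ::= Token{[a-z]+} [noAutoReject]

// No terminal is auto-rejected from Key anymore, so reject by hand
// the keywords that would otherwise be ambiguous:
syntax Key ::= "while" [reject]
```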
Suppose that you want to define your own floating point numbers. For example, suppose that `a.k` is given as follows:
$ cat a.k
module A
syntax MyFloat ::= Token{ "Infinity" } [notInRules, notInGround]
rule S:String => #parse(S,"MyFloat")
endmodule
And `1.a`:
$ cat 1.a
"Infinity"
Then, `krun` with the java backend results in `#noparse`:
$ ~/dev/k/bin/kompile --backend java a.k
$ ~/dev/k/bin/krun 1.a
[1, 20.47 ms]
<k>
#noparse
</k>
This is because the built-in `float.k` contains the following keyword declaration,
syntax #Float ::= "NaN" | "Infinity"
which triggers the automatic keyword rejection, rejecting `"Infinity"` from being parsed as `MyFloat`.
In order to fix this problem, you can use the `noAutoReject` attribute for `MyFloat`:
syntax MyFloat ::= Token{ "Infinity" } [notInRules, notInGround, noAutoReject]
TODO(Cosmin): implement it properly using strategies
When I kompile these three sets of rules, the java backend will result in non-termination:
non-termination due to sorts:
module C-LTLMC
syntax C ::= TypeSpecifier
syntax TypeSpecifier ::= "Void"
syntax Expression ::= "-" K [prec(22)]
| "+" K [prec(22)]
| "!" K [prec(22)]
| "~" K [prec(22)]
| "*" K [prec(22)]
| "&" K [strict, prec(22)]
syntax CId ::= "#NoName" // new
syntax C ::= CId
syntax Expression ::= CId
syntax Bits ::= Int
syntax SymLoc ::= "NullPointer"
syntax SymBase ::= string(String)
syntax CSize ::= Int
syntax CValue ::= CSize
syntax Statement ::= "loopMarked"
syntax Type ::= type(K) [function]
syntax FieldInfo ::= fieldInfo(List, Map, Map)
// Ltl atomic propositions.
syntax ValueProp ::= Int
| #ptr(SymLoc, Type)
| #struct(List, FieldInfo)
| #union(List, FieldInfo)
syntax CProp ::= ValueProp | CId
| "-" CProp
| "+" CProp
| "*" CProp
> CProp "[" CProp "]"
> CProp "+" CProp [left]
| CProp "-" CProp [left]
syntax Prop ::= CProp
syntax ValueProp ::= ltlRVal(Bag, Prop) [function]
rule ltlRVal(_, V:ValueProp) => V
rule ltlRVal(B:Bag, - P:CProp)
=> ltlRVal(B, 0 - ltlRVal(B, P))
rule ltlRVal(B:Bag, + P:CProp)
=> ltlRVal(B, P)
rule ltlRVal(B:Bag, L:CProp[R:CProp])
=> ltlRVal(B, * (L + R))
rule ltlRVal(B:Bag, X:CId) => ltlRVal(B, * ltlLVal(B, X))
// Not so interested in l-values.
syntax ValueProp ::= ltlLVal(Bag, CId) [function]
syntax ValueProp ::= "ltlLVal'" "(" Bag "," String ")" [function]
endmodule
module C-LTLMC
syntax C ::= AttributeWrapper(K, K) [klabel('AttributeWrapper)]
syntax Expression ::= "-" K [prec(22)]
| "+" K [prec(22)]
| "!" K [prec(22)]
| "~" K [prec(22)]
| "*" K [prec(22)]
| "&" K [strict, prec(22)]
syntax CId ::= "#NoName" // new
syntax C ::= CId
syntax Expression ::= CId
syntax Bits ::= Int
syntax SymLoc ::= "NullPointer"
syntax SymBase ::= string(String)
syntax CSize ::= Int
syntax CValue ::= CSize
syntax Statement ::= "loopMarked"
syntax Type ::= type(K) [function]
syntax FieldInfo ::= fieldInfo(List, Map, Map)
// Ltl atomic propositions.
syntax ValueProp ::= Int
| #ptr(SymLoc, Type)
| #struct(List, FieldInfo)
| #union(List, FieldInfo)
syntax CProp ::= ValueProp | CId
| "-" CProp
| "+" CProp
| "*" CProp
> CProp "[" CProp "]"
> CProp "+" CProp [left]
| CProp "-" CProp [left]
syntax Prop ::= CProp
syntax ValueProp ::= ltlRVal(Bag, Prop) [function]
rule ltlRVal(_, V:ValueProp) => V
rule ltlRVal(B:Bag, - P:CProp)
=> ltlRVal(B, 0 - ltlRVal(B, P))
rule ltlRVal(B:Bag, + P:CProp)
=> ltlRVal(B, P)
rule ltlRVal(B:Bag, L:CProp[R:CProp])
=> ltlRVal(B, * (L + R))
rule ltlRVal(B:Bag, X:CId) => ltlRVal(B, * ltlLVal(B, X))
// Not so interested in l-values.
syntax ValueProp ::= ltlLVal(Bag, CId) [function]
syntax ValueProp ::= "ltlLVal'" "(" Bag "," String ")" [function]
endmodule
module C-LTLMC
syntax C ::= TypeSpecifier
syntax TypeSpecifier ::= TAtomic(K, K)
[klabel('TAtomic)]
syntax Expression ::= "-" K [prec(22)]
| "+" K [prec(22)]
| "!" K [prec(22)]
| "~" K [prec(22)]
| "*" K [prec(22)]
| "&" K [strict, prec(22)]
syntax CId ::= "#NoName" // new
syntax C ::= CId
syntax Expression ::= CId
syntax Bits ::= Int
syntax SymLoc ::= "NullPointer"
syntax SymBase ::= string(String)
syntax CSize ::= Int
syntax CValue ::= CSize
syntax Statement ::= "loopMarked"
syntax Type ::= type(K) [function]
syntax FieldInfo ::= fieldInfo(List, Map, Map)
// Ltl atomic propositions.
syntax ValueProp ::= Int
| #ptr(SymLoc, Type)
| #struct(List, FieldInfo)
| #union(List, FieldInfo)
syntax CProp ::= ValueProp | CId
| "-" CProp
| "+" CProp
| "*" CProp
> CProp "[" CProp "]"
> CProp "+" CProp [left]
| CProp "-" CProp [left]
syntax Prop ::= CProp
syntax ValueProp ::= ltlRVal(Bag, Prop) [function]
rule ltlRVal(_, V:ValueProp) => V
rule ltlRVal(B:Bag, - P:CProp)
=> ltlRVal(B, 0 - ltlRVal(B, P))
rule ltlRVal(B:Bag, + P:CProp)
=> ltlRVal(B, P)
rule ltlRVal(B:Bag, L:CProp[R:CProp])
=> ltlRVal(B, * (L + R))
rule ltlRVal(B:Bag, X:CId) => ltlRVal(B, * ltlLVal(B, X))
// Not so interested in l-values.
syntax ValueProp ::= ltlLVal(Bag, CId) [function]
syntax ValueProp ::= "ltlLVal'" "(" Bag "," String ")" [function]
endmodule
non-termination due to macros:
module C-SYNTAX
syntax C ::= CId
syntax CId ::= "#NoName" // new
syntax Expression ::= "*" K [prec(22)]
> K "+" K [prec(33), left]
syntax Expression ::= CId
syntax Expression ::= K "[" K "]"
syntax Expression ::= K "." CId
syntax Expression ::= K "->" CId
rule K:K -> F:CId => (* K) . F [macro, structural]
rule E1:K[E2:K] => *(E1 + E2)
[macro, structural]
endmodule
module C-LTLMC
imports C-SYNTAX
syntax ValueProp ::= Int
syntax CProp ::= ValueProp | CId
| "*" CProp
> CProp "[" CProp "]"
| CProp "." CId
| CProp "->" CId
> CProp "+" CProp [left]
syntax Prop ::= CProp
syntax ValueProp ::= ltlRVal(Bag, Prop) [function]
rule ltlRVal(_, V:ValueProp) => V
rule ltlRVal(B:Bag, L:CProp[R:CProp])
=> ltlRVal(B, * (L + R))
endmodule
The key features causing this non-termination are as follows. First, there is a sort named "CProp" with a subsort "ValueProp", and there are two very similar functions returning the subsort "ValueProp". Second, the identifier sort "CId" is a subsort of many sorts. Third, there are two groups of expressions belonging to two different sorts, and they share common operators. Fourth, there is a sort "C" such that "CId" is a subsort of it, and another sort "TypeSpecifier" which is a subsort of "C" and has some operators that are not used anywhere in the context.
When I run the C semantics with the command:
krun -cOPTIONS="(SetItem(NOLINK) .Set)" -cOBJ1=".Bag" -cOBJ2=".Bag" -d c11-translation-kompiled ctype.c11-translation
It works fine on some small programs; however, if I run the command:
krun -cOPTIONS="(SetItem(NOLINK) .Set)" -cOBJ1=".Bag" -cOBJ2=".Bag" --parser "kast --parser ground" --parser="cat" -d c11-translation-kompiled ctype.c11-translation
The java backend krun will complain; something must be wrong here. The output looks like:
Exception in thread "main" java.lang.ClassCastException: org.kframework.kil.BackendTerm cannot be cast to org.kframework.backend.java.kil.Term
at org.kframework.backend.java.symbolic.KILtoBackendJavaKILTransformer.visit(KILtoBackendJavaKILTransformer.java:240)
at org.kframework.backend.java.symbolic.KILtoBackendJavaKILTransformer.visit(KILtoBackendJavaKILTransformer.java:78)
at org.kframework.kil.KSequence.accept(KSequence.java:46)
at org.kframework.kil.AbstractVisitor.visitNode(AbstractVisitor.java:92)
at org.kframework.backend.java.symbolic.KILtoBackendJavaKILTransformer.visit(KILtoBackendJavaKILTransformer.java:279)
at org.kframework.backend.java.symbolic.KILtoBackendJavaKILTransformer.visit(KILtoBackendJavaKILTransformer.java:78)
at org.kframework.kil.Cell.accept(Cell.java:287)
at org.kframework.kil.AbstractVisitor.visitNode(AbstractVisitor.java:92)
at org.kframework.backend.java.symbolic.KILtoBackendJavaKILTransformer.visit(KILtoBackendJavaKILTransformer.java:326)
at org.kframework.backend.java.symbolic.KILtoBackendJavaKILTransformer.visit(KILtoBackendJavaKILTransformer.java:78)
at org.kframework.kil.Bag.accept(Bag.java:69)
at org.kframework.kil.AbstractVisitor.visitNode(AbstractVisitor.java:92)
at org.kframework.backend.java.symbolic.KILtoBackendJavaKILTransformer.visit(KILtoBackendJavaKILTransformer.java:269)
at org.kframework.backend.java.symbolic.KILtoBackendJavaKILTransformer.visit(KILtoBackendJavaKILTransformer.java:78)
at org.kframework.kil.Cell.accept(Cell.java:287)
at org.kframework.kil.AbstractVisitor.visitNode(AbstractVisitor.java:92)
at org.kframework.backend.java.symbolic.KILtoBackendJavaKILTransformer.transformTerm(KILtoBackendJavaKILTransformer.java:134)
at org.kframework.backend.java.kil.Term.of(Term.java:50)
at org.kframework.backend.java.symbolic.JavaSymbolicKRun.javaKILRun(JavaSymbolicKRun.java:96)
at org.kframework.backend.java.symbolic.JavaSymbolicKRun.internalRun(JavaSymbolicKRun.java:81)
at org.kframework.backend.java.symbolic.JavaSymbolicKRun.run(JavaSymbolicKRun.java:76)
at org.kframework.krun.Main.normalExecution(Main.java:372)
at org.kframework.krun.Main.execute_Krun(Main.java:1454)
at org.kframework.main.Main.main(Main.java:64)
You can use `#tokenToString` and `#parseToken`, built-in functions for converting between tokens and strings. For example,
syntax Name ::= Token{[\_\$A-Za-z][\_\$A-Za-z0-9]*} [notInRules]
syntax String ::= StringOfName(Name) [function, notInPrograms]
rule StringOfName(N:Name) => #tokenToString(N)
syntax Name ::= NameOfString(String) [function, notInPrograms]
rule NameOfString(S:String) => #parseToken("Name", S)
Once you have the string representation of the token, you can use the built-in string library functions to manipulate it, such as `substrString`, `findString`, `replaceAll`, `lengthString`, `Char2String`, and `String2Char`.
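Building on the `StringOfName` and `NameOfString` functions above, a sketch of a token-manipulating function (the `prefixName` function and the prefix are hypothetical):

```k
syntax Name ::= prefixName(Name) [function, notInPrograms]

// prepend a prefix by round-tripping through String:
rule prefixName(N:Name) => NameOfString("pre" +String StringOfName(N))
```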
Here are the declarations that use the enriched syntax for attributes:
syntax Exp ::= "*" Exp [strict(all(context(rvalue), result(Ref))), indirect]
> Exp "+" Exp [strict(all(context(rvalue), result(RVal))), left, plus]
> Exp "=" Exp [strict(1, 2(context(rvalue), result(RVal))), right, assign]
syntax KItem ::= "rvalue" "(" K ")" [context(result(Val)), strict, klabel(rvalue)]
Without support for the above declarations, a user must explicitly write the heating & cooling rules:
syntax Exp ::= "*" Exp [ indirect]
> Exp "+" Exp [left, plus]
> Exp "=" Exp [right, assign]
syntax KItem ::= "rvalue" "(" K ")" [strict, klabel(rvalue)]
rule rvalue(K:K) => K ~> rvalue(HOLE)
requires (isVal(K) =/=K true)
[heat]
rule K:KItem ~> rvalue(HOLE) => rvalue(K)
requires isVal(K)
[cool]
rule * K:KItem => rvalue(K) ~> * HOLE
requires isRef(K) =/=K true
[heat]
rule rvalue(K) ~> * HOLE => * K
requires isRef(K)
[cool]
rule KL:K + KR:K => rvalue(KL) ~> HOLE + KR
requires isRVal(KL) =/=K true
[heat]
rule rvalue(KL:K) ~> HOLE + KR:K => KL + KR
requires isRVal(KL)
[cool]
rule KL:K + KR:K => rvalue(KR) ~> KL + HOLE
requires isRVal(KR) =/=K true
[heat]
rule rvalue(KR:K) ~> KL:K + HOLE => KL + KR
requires isRVal(KR)
[cool]
context HOLE = _
rule KL:K = KR:K => rvalue(KR) ~> KL = HOLE
requires isRVal(KR) =/=K true
[heat]
rule rvalue(KR:K) ~> KL:K = HOLE => KL = KR
requires isRVal(KR)
[cool]
We see that for each binary operator we have to write four rules. For 10 such operators, there are 40 rules, and in C and C++ there are more than 10 operators.
The meaning of the enriched syntax can be briefly explained as follows: The following declarations describe the situation with a binary operator that requires special strict attributes for each argument:
syntax Exp ::= Exp "op" Exp [strict(1(context(ctx1),result(Res1)),
2(context(ctx2),result(Res2))),
left, oplabel]
syntax KItem ::= ctx1(K) [context(result(ResCtx1)), strict, klabel(ctx1)]
syntax KItem ::= ctx2(K) [context(result(ResCtx2)), strict, klabel(ctx2)]
This is equivalent to the following declarations and rules (they are self-explanatory):
syntax Exp ::= Exp "op" Exp [left, oplabel]
syntax KItem ::= ctx1(K) [klabel(ctx1)]
syntax KItem ::= ctx2(K) [klabel(ctx2)]
//heating&cooling rules for the operator
rule KL op KR => ctx1(KL) ~> HOLE op KR requires isRes1(KL) =/=K true [heat]
rule ctx1(KL) ~> HOLE op KR => KL op KR requires isRes1(KL) [cool]
rule KL op KR => ctx2(KR) ~> KL op HOLE requires isRes2(KR) =/=K true [heat]
rule ctx2(KR) ~> KL op HOLE => KL op KR requires isRes2(KR) [cool]
//heating&cooling rules for the contexts
rule ctx1(K) => K ~> ctx1(HOLE) requires isResCtx1(K) =/=K true [heat]
rule K ~> ctx1(HOLE) => ctx1(K) requires isResCtx1(K) [cool]
rule ctx2(K) => K ~> ctx2(HOLE) requires isResCtx2(K) =/=K true [heat]
rule K ~> ctx2(HOLE) => ctx2(K) requires isResCtx2(K) [cool]
There are the constraints `Res1 < ResCtx1` and `Res2 < ResCtx2`.
Of course, we may have ctx1 equal to ctx2 if ResCtx1 = ResCtx2.
I think that mechanism is simple and it can be implemented in K, not necessarily in this form.
To run K with nailgun, set the `K_NAILGUN` environment variable, and then spawn a server process to handle requests using the `kserver` command. It is recommended to start the server with a high timeout value to avoid disconnects caused by server latency. For example, to listen on port 2113 (the default) with a timeout of 100 seconds, run `kserver 2113 100000`.
Once the server is started, and as long as the `K_NAILGUN` variable is set, users can run commands against the server exactly as if they were running them locally. If the variable is not set, the command runs in a new process as usual.
Users are encouraged to run large test suites using nailgun: the speedup from reduced per-process startup cost is dramatic, and krun can be as much as 7-8x faster on small programs.
The `!X:Sort` notation generates fresh `Sort` constants based on an integer counter. Internally, the backend manages the counter and passes it to a function that returns the fresh constant. By default, K provides such functions only for `Int` and `Id`. One can define such functions for other sorts as follows:
syntax Sort ::= freshSort(Int) [freshGenerator, function]
rule freshSort(I:Int) => ...
For instance, the function for `Id` is:
rule freshId(I:Int) => #parseToken("Id", "#" +String Int2String(I))
The convention is that fresh constants start with `#` and fresh variables with `_`, to keep them distinct.
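Putting the pieces together, a sketch of a fresh generator for a hypothetical `Foo` token sort:

```k
syntax Foo ::= Token{[a-z][a-zA-Z0-9]*} [notInRules]
syntax Foo ::= freshFoo(Int) [freshGenerator, function]

// fresh constants start with # by convention:
rule freshFoo(I:Int) => #parseToken("Foo", "#foo" +String Int2String(I))

// now  !F:Foo  in a rule yields #foo0, #foo1, etc., as the counter advances
```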
If you declare a production of sort KLabel with no nonterminals, this is a KLabel constant declaration. A KLabel constant declaration is treated the same as a declaration of a KItem, except that it carries no information about the arity and sorts of its children. If sort information is required, the children are assumed to be of sort K; if the arity is required, an error is thrown unless the production is annotated with an "arity" attribute.
For example,
syntax KLabel ::= "foo" [arity(2)]
is treated the same as the production
syntax KItem ::= foo(K, K) [klabel(foo)]
Annotating a term with `:Sort` translates to a semantic check, so the rule won't apply if the subterm does not have the expected sort or a subsort of it.
There are other sort annotations which are purely syntactic and only affect how the parsed terms are statically type checked. To change the sort of a production, the suffix `:>Sort` can be used. The inner term can have any sort, but the production can be placed anywhere `Sort` is expected: `A:>Exp + B` will allow `A` to have any sort, but `B` has to be a subsort of `Exp`.
The dual is `<:Sort`, which requires its argument to parse to sort `Sort`, including any of its subsorts. For example, a common situation in K involves overloaded lists, where we have `Exps ::= List{Exp,","}` and `Ids ::= List{Id,","}`, together with the subsort `Ids < Exps`. Writing `(A, B)<:Exps` gives the expected parse, because `A`, `B`, and `(A, B)` are inferred their maximal sorts subject to the given constraints, namely `Exps` for all. If you want `A` and `B` to have sort `Exp`, you must say so explicitly, as in `(A:Exp, B:Exp)<:Exps`, or have other constraints from the outside. If you mean `(A, B)` to parse as `Ids`, just write `(A, B)<:Ids`; this also forces `A` and `B` to be inferred the sort `Ids`.
Combining the two behaviors can be done with the annotation `::Sort` or `<:>Sort`. In practice these annotations are productions of the form:
Sort ::= Sort "::Sort"
Sort ::= Sort "<:>Sort"
K ::= Sort "<:Sort"
Sort ::= K ":>Sort"
and have the highest priority of all productions (in `A + B:Sort`, the annotation applies only to `B`).
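A sketch collecting these annotations on the list example above (the subsort is written in K's production form):

```k
syntax Exp  ::= Id
syntax Exps ::= List{Exp,","}
syntax Ids  ::= List{Id,","}
syntax Exps ::= Ids   // Ids < Exps

// (A, B)<:Exps         -- A, B, and (A, B) maximized: all Exps
// (A:Exp, B:Exp)<:Exps -- A and B pinned to Exp explicitly
// (A, B)<:Ids          -- forces A and B to be inferred sort Ids
// A:>Exp + B           -- A may have any sort; B must be a subsort of Exp
```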
To generate a pdf of a compiled K definition, run the command `kdoc` from within the directory in which the `-kompiled` directory is located. For more information on the options that can be specified to kdoc, run `kdoc --help`.
To execute a program symbolically using unification instead of pattern matching, simply add the `--symbolic-execution` flag to krun.
To determine why a particular rule (for example, one at line 50 of file.k) did not apply at a particular numbered rewrite step (for example, step 100), you can use the following command:
krun <program> --audit-file file.k --audit-line 50 --audit-step 100
When we add the appropriate check, it will be an error for macros to have side conditions.
Here is a high-level design of the IO capabilities of K:
- Cells can be linked to streams, which can come from stdin, stdout, sockets, files, etc., and in general from any URIs. This linkage of a cell to a stream is the only thing the user has to do, and it is done when the configuration is defined.
- Rules can match (or read) or write items in those cells, just like in any other cells. When a rule matches (or reads) a term of a certain sort in a streamed cell, the parser is invoked to read an appropriate sequence of characters from the streamed cell. When a rule writes a term of a certain sort to a streamed cell, the pretty-printer is invoked to stream that term as a sequence of characters.
No, this is not doomed to be slow! It will be slow only if implemented naively. There is no reason not to "parse" primitive values in an input stream using the fastest POSIX (or other specialized) library functions available on the current platform, and similarly for writing things to cells. If one wants to read an expression of sort `Exp` or a program of sort `Pgm`, well, that's a different story, and things can be slower in that case (it depends on the complexity of the grammar; see, for example, our paper on monitoring CFG specifications: http://fsl.cs.illinois.edu/index.php/Efficient_Monitoring_of_Parametric_Context-Free_Patterns).
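The design above can be sketched with the standard stdin/stdout stream cells; the `read` and `print` constructs are hypothetical language features:

```k
configuration <T>
                <k> $PGM:K </k>
                <in stream="stdin"> .List </in>
                <out stream="stdout"> .List </out>
              </T>

syntax Exp  ::= "read" | Int
syntax Stmt ::= "print" "(" Exp ")" [strict]

// reading: the parser supplies an Int item from the stdin stream
rule <k> read => I ...</k> <in> ListItem(I:Int) => .List ...</in>

// writing: the pretty-printer streams the item to stdout
rule <k> print(I:Int) => .K ...</k> <out>... .List => ListItem(I) </out>
```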