- 1. Overview
- 2. Context Tables
- 3. Rules
- 4. Symbols
- 5. Writing Speech Lines
- 6. Formatting Speech Lines
- 7. Speech Database
- 8. Functions
Contextual Dialogue is a framework for building a database for contextual dialogue barks. It facilitates randomly selecting and generating speech lines from the database. This is designed as a valid interpreter for DrakonScript, which can be used to more easily write speechbanks for this framework.
This framework provides the following features:
- A context table implementation that can be used to store flattened facts about the world.
- A hierarchical database of speechbanks that stores rules and their criteria in a highly sorted fashion.
- A highly efficient method of selecting a matching rule, then generating speech lines from that rule.
- Support for placeholders such as symbols, functions, and context expressions within a speech line and filling them in when the speech line is generated.
- Options for extending the functionality to fine-tune it to your project, such as adding your own speech query parameters via subclassing and adding your own functions.
This framework also makes several design choices that may not be ideal for all scenarios:
- When a list expression is printed, a random item is selected from the list. This uses random selection without replacement, which adds natural variety when the same list is used more than once.
- When a non-negative integer expression is printed, the number is converted to a word representation (e.g. `3` is printed as `three`). Float (decimal) expressions are converted to an integer and then converted in the same way.
- When selecting a matching speech line, only the matching rules with the highest priority are considered. Multiple candidate rules with the same priority are tiebroken with random selection. Only the speech lines within the matching rule attempt to generate.
  - An alternative implementation could be that all potential speech lines across all matching rules are added into a weighted pool, allowing for more natural variety without the use of the fail criterion. However, the current approach was chosen for its efficiency in not having to check every rule in the database if a high-priority rule already matches.
- When searching for context without a name, the speech query searches through a list of default tables in a predefined order.
  - An alternative implementation could be to search through all of the currently available context tables; however, for efficiency and consistency a set of default tables was used instead.
- In many parts of the implementation, the program prioritizes faster runtime performance over memory usage.
While this framework is not meant to be run by itself, since it is more of a code library, two sample `main()` functions are provided:
- `Tester.java`: Runs various tests on the code; used during development.
- `Main.java`: Used by the DrakonScript command line tool, which takes in several arguments to read the speech database and generate speech lines from a single group/category.
These classes can be used as examples of how to use this code to load a speechbank database and generate lines with it.
The following data types may be used in functions and symbol expressions. When the data is part of the speech line's content as a root-level component, meaning that it is actually printed as part of the result, it resolves to a text version of its actual value (such as the integer `1` becoming `one`). This is different for each data type, and some data types may refuse to be printed at all and instead make the speech line fail to generate.
Note that most of these expressions do not work if written directly in the speech line; for example, writing `I own 3 chicken` does not convert `3` to `three`. However, if a symbol is set to 3, such as `@num_chicken = 3`, then `I own @num_chicken chicken` resolves to `I own three chicken`.
Text enclosed in double quotes, such as `"apple"`. This is printed as the text without double quotes, e.g. `apple`. In some cases, such as when writing strings inline (within a speech line's text instead of as a symbol), the quotes should be omitted.
A whole number, either positive or negative but without decimal points, such as `1`, `-3`, or `0`. If the number is non-negative (0 or greater), it is printed as a word representation of that number, e.g. `23` -> `twenty-three`. If the number is negative, the speech line fails to generate instead.
Any number, including both whole numbers and numbers with decimal points, such as `-1`, `4.3`, and `-31.56`. When printed, the number is converted to an integer by truncating the decimal portion, then printed according to Integer rules.
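The printing rules for Integers and Numbers described above can be sketched roughly as follows. This is only an illustration, not the framework's code: the class name `NumberWords`, the method `print`, and the 0 to 999 range limit are all assumptions made to keep the example short, and an empty result stands in for "the speech line fails to generate".

```java
import java.util.Optional;

// Hypothetical helper, not part of the framework.
class NumberWords {
    private static final String[] ONES = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };
    private static final String[] TENS = {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"
    };

    // Truncates the decimal portion, then prints the integer as words.
    // Returns empty for negative values (the documented failure case) and,
    // as a limitation of this sketch only, for values above 999.
    static Optional<String> print(double value) {
        long n = (long) value; // the cast truncates toward zero
        if (n < 0 || n > 999) {
            return Optional.empty();
        }
        return Optional.of(words((int) n));
    }

    private static String words(int n) {
        if (n < 20) {
            return ONES[n];
        }
        if (n < 100) {
            return TENS[n / 10] + (n % 10 == 0 ? "" : "-" + ONES[n % 10]);
        }
        return ONES[n / 100] + " hundred" + (n % 100 == 0 ? "" : " " + words(n % 100));
    }
}
```

With this sketch, `NumberWords.print(23)` yields `twenty-three`, `NumberWords.print(4.3)` yields `four`, and `NumberWords.print(-3)` yields an empty result.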
A sequence of items of any data type, enclosed in square brackets and separated by commas, e.g. `["a", "b", "c"]`. Items do not have to be the same data type, and can also include placeholder types as well as nested lists.
When printed, a randomly selected item from the list is chosen. If the same list is printed multiple times in the same speech line, it samples without replacement. This also records a list choice, which can be accessed by the `@prev()` and `@prev_match()` functions.
Lists can only be printed if they contain only Integers or Strings, as that is what is supported by context and useful for most speech lines. However, it is possible to make lists containing the other data types for other purposes.
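Sampling without replacement within one speech line could look something like the sketch below. The class `ListSampler` is hypothetical, and the behavior when the list runs out (refilling and reshuffling) is an assumption; the text above does not say what the framework does in that case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical per-speech-line sampler, not part of the framework.
class ListSampler {
    private final List<String> items;
    private final List<String> remaining = new ArrayList<>();
    private final Random random;

    ListSampler(List<String> items, Random random) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("cannot sample from an empty list");
        }
        this.items = new ArrayList<>(items);
        this.random = random;
    }

    // Returns an item that has not been returned since the last refill.
    String next() {
        if (remaining.isEmpty()) {
            // Assumption: once every item has been used, start over with a fresh shuffle.
            remaining.addAll(items);
            Collections.shuffle(remaining, random);
        }
        return remaining.remove(remaining.size() - 1);
    }
}
```

Printing a three-item list three times in one line through the same `ListSampler` produces each item exactly once, in random order.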
Either `true` or `false`. The speech line fails to generate when attempting to print these; however, functions such as `@if_else()` can be used, e.g. `@if_else(true, "yes", "no")` -> `yes`.
The following data types are placeholder types that must resolve to one of the normal data types when evaluated. Therefore, these can never be printed directly.
References a piece of context directly using `#table.name` syntax, or `#name` if the table is omitted. Context can be any of the normal data types, as they are all supported by context tables.
It is important to note that the value and type of a context expression are ambiguous at compile-time, since the values in the context table are unknown until the speech query is actually performed (in addition, the context could simply not exist). Therefore, context cannot be type-checked and is prone to causing speech line generation to fail if the context does not exist or is not an appropriate type. Avoiding frequent failed speech lines therefore relies on proper design of context tables and speechbanks.
References a previously defined symbol. Resolves to the symbol's expression, which can be any of the normal data types.
Calls a function. The value and type of the function call are based on the function's output (otherwise known as what it returns). For example, `@add(1, 3)` returns `4` as an Integer.
Context tables consist of mutable key-value pairs representing facts about the world or other entities. Context tables are flattened, meaning they cannot be nested within each other. This is intended to make it as easy as possible for writers to understand what information is available via context, without having to deal with nested data or functions. A context's key is always a string. Its value can be one of multiple types:
- A number (integer or float)
- A string
- A boolean
- A list of strings or integers
To speed up comparisons, all string and boolean values are converted into numbers. Booleans are converted into `0.0f` and `1.0f` for `false` and `true`, respectively. Strings use a String Cache, which is a two-way lookup table that assigns each string a unique integer. Thus, a word like `town` is converted into an integer such as `1000`, which both context tables and criteria use internally (for example, the criterion `structure.type = "town"` is really performed as `structure.type = 1000`). This helps static comparisons (such as "equals" or "less than") be more efficient by avoiding a branching penalty, as the program does not have to figure out how to handle the comparison's type first.
All lists are actually integer sets, as strings can be converted into integers. The nature of a set means that order is irrelevant, and duplicates are not stored. These properties make the two main list operations, `includes` and `empty`, more efficient since the order of the list does not matter.
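A two-way lookup table of the kind described for the String Cache might be sketched like this. The class and method names (`StringCache`, `encode`, `decode`) are illustrative assumptions, and the IDs here simply start at 0; the framework's actual numbering scheme is not specified above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-way string/integer table, not the framework's own class.
class StringCache {
    private final Map<String, Integer> idsByString = new HashMap<>();
    private final List<String> stringsById = new ArrayList<>();

    // Returns the existing ID for a string, or assigns the next free one.
    int encode(String s) {
        Integer existing = idsByString.get(s);
        if (existing != null) {
            return existing;
        }
        int id = stringsById.size();
        stringsById.add(s);
        idsByString.put(s, id);
        return id;
    }

    // Reverse lookup, needed when a cached value has to be printed again.
    String decode(int id) {
        return stringsById.get(id);
    }
}
```

Once both the context value and the criterion's string have been passed through `encode`, an equality criterion reduces to comparing two integers.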
The speech database consists of rules, which are criteria-response tuples (pairs). The criteria of a rule describe the conditions that must all be true for the rule to match. The responses are various outputs that are available if the rule is selected by a speech query. There are multiple kinds of responses:
- Speech lines: A list of speech lines, one of which is randomly chosen whenever this rule is selected.
- Actions: Operations that can modify the current context, such as adding new context, deleting existing context, or modifying existing context. These action statements are executed sequentially.
Rules have a priority generally equal to the number of criteria the rule has (including criteria inherited from preset rules). This can also be manipulated by the Dummy criterion, which can add or remove priority manually.
When loaded, a rule's criteria are automatically sorted to optimize the matching process. The criteria that are most likely to decide whether the rule matches are checked first, so that the rule is eliminated as soon as possible (if it would fail). Ties are broken by how expensive each criterion is to compute, so that simple criteria that have a chance to make the rule fail are compared first (such as Fail criteria, which use a single random number), followed by expensive criteria (such as comparisons), followed by criteria that do not contribute anything to the matching process (such as Dummy, which always succeeds and is only used to set the priority of the rule).
Rules may also specify a list of named rules to use as presets. Assuming that the named rule was declared in a parent speechbank or in a previous rule in the same category, this rule inherits all the criteria from that rule. Duplicate criteria are not combined, and will be checked twice (which is inefficient). The rule's priority also takes into account the number of criteria from preset rules that it references.
Symbol declaration is described by DrakonScript. As described by DrakonScript's specification, symbols can be declared at the group level (known as group-level symbols) or at the rule level (known as rule-specific symbols). Group-level symbols can be used anywhere in the speechbank, as well as the speechbank's children. Rule-specific symbols can only be used inside the rule that declares them and do not exist outside of that rule.
Symbols are defined sequentially, so symbols are able to reference other symbols in the same block if they were defined in an earlier line. For example, the following would be valid in DrakonScript:
@a = 5
@b = @a + 2
But not this:
@b = @a + 2
@a = 5
Symbols are immutable, so trying to declare the same symbol twice results in an error. This means that if a symbol is already defined at the group level, it cannot be defined again at the rule level.
If an expression is used more than once, it is generally more memory-efficient to represent it with a symbol. See Symbol Replacement for technical details on this.
This framework supports speech lines with placeholders, where the line in the speechbank undergoes some processing before the final line is generated (or printed). For example, `"Hello @name!"` can replace the symbol `@name` with `Joe` and result in the generated line `"Hello Joe!"`.
The generated speech line is always a single string. If the speech line goes over multiple lines, then forward slashes `/` are used within the string to indicate there should be a line break.
While other kinds of placeholders besides symbols may be used, defining an expression as a symbol is generally easier to write (and can be more memory-efficient) and is therefore recommended. Other options are available primarily for shorthand which should be used only for special cases.
Symbols can be written exactly as they are declared, by using `@` before the symbol's name. An example of this can be seen above.
Note that symbols can only consist of letters, numbers, and underscores (and must also start with a letter), so punctuation characters such as `!`, `.`, `?`, and `'` can be used directly adjacent to the symbol. In other words, `Hello @name!` knows to look for `@name`, not `@name!`, since `!` is not a valid symbol character.
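The symbol naming rule above maps naturally onto a regular expression. The following is a minimal sketch of filling symbols into plain text, assuming symbol values are already resolved to strings; `SymbolFiller` and `fill` are made-up names, throwing on an unknown symbol is this sketch's choice, and function calls, context references, and backslash escapes are deliberately left out.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the framework.
class SymbolFiller {
    // '@', then a letter, then any run of letters, digits, or underscores.
    private static final Pattern SYMBOL = Pattern.compile("@([A-Za-z][A-Za-z0-9_]*)");

    static String fill(String line, Map<String, String> symbols) {
        Matcher matcher = SYMBOL.matcher(line);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = symbols.get(matcher.group(1));
            if (value == null) {
                // Assumption: an undefined symbol is treated as an error in this sketch.
                throw new IllegalArgumentException("undefined symbol: @" + matcher.group(1));
            }
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Because `!` cannot be part of a symbol name, `SymbolFiller.fill("Hello @name!", Map.of("name", "Joe"))` produces `Hello Joe!`.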
Context can be directly referenced using `#table.name` syntax, or `#name` if the table is omitted. For example, `"Hello #speaker.name!"` attempts to replace `#speaker.name` with the name of the speaker in context. This follows rules similar to those for writing symbols, including how punctuation characters are handled. As mentioned in the Context data type, the speech line fails to generate if the context does not exist or cannot be printed.
Functions can be called directly using `@function_name(arg1, arg2)`. As mentioned in the Function data type, the speech line fails to generate if the function's return type cannot be printed.
Number, string, context, and symbol arguments can be written inline. Booleans and lists cannot be written inline; you should use symbols to represent these instead. Strings do not have double quotes when written inline.
When writing inline function arguments or the normal speech line text, backslashes (`\`) can be used to escape the next character. This prevents that character from being recognized as a special character, allowing it to be written as normal text.
For example, `@upper(hi, how are you)` would interpret the comma as a break between arguments and convert this into `@upper("hi", "how are you")` (which would cause an error since `@upper` only takes one argument). To include the comma, you should instead write `@upper(hi\, how are you)`, which would correctly output `HI, HOW ARE YOU`.
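The effect of the backslash on argument splitting can be illustrated with a small sketch. `ArgSplitter` is a hypothetical helper; it assumes the text between the parentheses has already been extracted, trims whitespace around each argument, and does not handle nested function calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the framework.
class ArgSplitter {
    // Splits on commas, except where a comma is preceded by a backslash.
    static List<String> split(String args) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : args.toCharArray()) {
            if (escaped) {
                current.append(c); // escaped character is kept as literal text
                escaped = false;
            } else if (c == '\\') {
                escaped = true;    // the backslash itself is dropped
            } else if (c == ',') {
                result.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        result.add(current.toString().trim());
        return result;
    }
}
```

Given the text `hi, how are you`, this returns two arguments; given `hi\, how are you`, it returns the single argument `hi, how are you`.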
A newer feature added to this system is speech formatting, which allows arbitrary formatting to be applied to speech lines. This inserts invisible tokens (written as text enclosed in `{}`) that do not appear when the speech line is displayed, and can be used by your speech line delivery mechanism to decide exactly how the line should appear.
While the exact uses of these are extremely flexible, some shorthands for common formatting have been added. Their equivalents in the "general" speech format syntax are shown; however, these should only be written using the shorthand and not with general syntax, as this allows the tokenizer to better optimize these tokens.
Enclosing text in a single asterisk such as `*text1*` italicizes it, making it display as *text1*. Under the surface, the line is converted to `{italics=true}text1{italics=false}`.
Similarly, enclosing text in a double asterisk such as `**text2**` bolds it, making it display as **text2**. This line in general syntax would be written as `{bold=true}text2{bold=false}`.
Writing a triple asterisk (`***`) will always consider the first two asterisks to be bold, and the third to be italics (and would be converted into `{bold=true}{italics=true}`). Tokenization fails if these asterisks are not balanced, such as if `*text3` is written without a closing asterisk. To write the same text without throwing an error (and have a literal asterisk in the speech line), use a backslash, such as `\*text3`. Tokenization also fails if there are redundant bold/italics pairs, such as `****abc****`, which could just be written as `**abc**`.
Pauses in the speech line can be written using underscores (`_`). For example, `One...___two...___three` puts a lengthy pause between each word. The length of a single pause depends on how speech line delivery is implemented; this length is multiplied by the number of consecutive `_` characters.
The example line in general syntax would be written as `One...{pause=3}two...{pause=3}three`. Speech lines cannot end on a pause, so `three__` would cause a tokenization error.
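As a rough illustration of the pause shorthand, the sketch below rewrites each run of underscores into a `{pause=N}` token. `PauseExpander` is a hypothetical name, and the sketch assumes plain text only: it does not account for underscores inside symbol or function names, nor for backslash-escaped underscores, which a real tokenizer would need to handle.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the framework.
class PauseExpander {
    private static final Pattern PAUSE = Pattern.compile("_+");

    static String expand(String line) {
        if (line.endsWith("_")) {
            throw new IllegalArgumentException("speech line cannot end on a pause");
        }
        Matcher matcher = PAUSE.matcher(line);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // The number of consecutive underscores becomes the pause multiplier.
            String token = "{pause=" + matcher.group().length() + "}";
            matcher.appendReplacement(out, Matcher.quoteReplacement(token));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Calling `PauseExpander.expand("One...___two...___three")` returns `One...{pause=3}two...{pause=3}three`, while `three__` is rejected.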
A forward slash `/` indicates a line break. Lines should not be empty, so multiple forward slashes in a row (which would normally create blank lines) cause a tokenization error. For example, `line1/line2` in general syntax is written as `line1{linebreak}line2`. Note that this does not use the standard `\n` for line breaks, in favor of a more explicit token. Speech lines also cannot end on a line break.
For any other formatting supported by your speech delivery, general formatting syntax must be used. This follows the form `{k1=v1, k2=v2, k3=v3}`, where the `k` values are the attributes and the `v` values are the values to assign to them. Values can be strings (which are not enclosed by quotes), booleans (`true`/`false`), and numbers. In addition, an "empty" attribute can be specified without an assigned value, such as `{k1, k2, k3}`, which is useful for certain formatting options such as line breaks.
Multiple attributes can be specified in a single `{}` block; however, under the surface these are expanded sequentially. For example, the first form would be converted to `{k1=v1}{k2=v2}{k3=v3}`, with each attribute being set one at a time.
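The sequential expansion of a multi-attribute block can be sketched as below. `FormatBlockExpander` is a hypothetical helper that works on a single `{...}` block as text; it assumes attribute values contain no commas or braces, and rejecting empty attributes is this sketch's choice rather than documented behavior.

```java
// Hypothetical helper, not part of the framework.
class FormatBlockExpander {
    // Turns "{k1=v1, k2=v2}" into "{k1=v1}{k2=v2}".
    static String expand(String block) {
        if (block.length() < 2 || !block.startsWith("{") || !block.endsWith("}")) {
            throw new IllegalArgumentException("not a formatting block: " + block);
        }
        String inner = block.substring(1, block.length() - 1);
        StringBuilder out = new StringBuilder();
        for (String attribute : inner.split(",")) {
            String trimmed = attribute.trim();
            if (trimmed.isEmpty()) {
                throw new IllegalArgumentException("empty attribute in block: " + block);
            }
            out.append('{').append(trimmed).append('}');
        }
        return out.toString();
    }
}
```

The same expansion applies to value-less attributes, so `{k1, k2, k3}` becomes `{k1}{k2}{k3}`.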
This is a flexible syntax meant to support a wide variety of use cases. Ideas for other attributes include changing to a certain image/emote, playing a sound, changing letter-by-letter text scrolling speed, and setting text color. Of course, if you are only concerned with plain text, then none of these need to be used or implemented.
The speech database contains all the speechbanks. It is divided as a hierarchical database first by group, then by category. Each mini-database contains the list of rules pertaining to that group and category. Since group and category are always given to the speech query, this makes lookup time very efficient since only that group and category's rules need to be considered.
Within each of these mini-databases, rules are sorted by their priority, from highest to lowest, in order to skip unnecessary rule comparisons. When reading through the database in order, the first matching rule has the highest priority, so no rules with a lower priority need to be compared. If the rule after the matching rule has the same priority, it also needs to be compared. If there are multiple matching rules, a random one is selected.
Speech queries represent a request for a speech line. They are passed through the speech database and may return either a speech line, or `null` if a line failed to generate. Speech queries consist of the following components:
- The group that the speech line should come from.
- The category that the speech line should come from.
- A collection of context tables, which represent the context available to the query.
The context tables can come from any source, and are labelled with keys such as `speaker`, `listener`, `world`, etc. These labels are assigned when constructing the query, and are not a part of the context table itself. This allows the same context table to have different labels in different scenarios; for example, a character can be the `speaker` in one query, but the `listener` in another.
When a speech line is requested, the database looks only at the rules in the mini-database corresponding to the given group and category. Rules are sorted by priority within each mini-database, so it begins with the rule that has the highest priority and works its way downwards. The highest priority matching rule is selected using the following logic:
- If the rule matches, it is stored as a candidate rule. This continues until the priority of the next rule is less than the priority of a candidate rule, if one exists. Therefore, rules with the same priority may all become candidate rules if they all match. However, if the next rule has a lower priority and a candidate rule already exists, then the remaining rules are ignored since they can never have a higher priority than the already-matching candidate rule and thus will never be selected.
- Then, a rule is selected from the candidate rules at random. The selected rule selects one of its speech lines at random and attempts to generate it. It repeats this random selection up to 3 times until a speech line is generated successfully.
If this rule selection and speech line generation process fails at any point, such as if there is no mini-database for this group-category pair or if there are no matching rules, then a speech line fails to generate.
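The candidate-collection step described above can be sketched as follows. `RuleSelector` and its nested `Rule` type are illustrative stand-ins rather than the framework's classes: a rule's criteria are reduced to a single `Predicate`, the input list is assumed to be already sorted by priority from highest to lowest, and the speech line generation with up to 3 attempts is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical sketch of rule selection, not the framework's own code.
class RuleSelector {
    static final class Rule<C> {
        final String name;
        final int priority;
        final Predicate<C> criteria;

        Rule(String name, int priority, Predicate<C> criteria) {
            this.name = name;
            this.priority = priority;
            this.criteria = criteria;
        }
    }

    // sortedRules must be ordered by priority, highest first.
    // Returns null when no rule matches.
    static <C> Rule<C> select(List<Rule<C>> sortedRules, C context, Random random) {
        List<Rule<C>> candidates = new ArrayList<>();
        for (Rule<C> rule : sortedRules) {
            // Stop once a candidate exists and the remaining rules have lower priority.
            if (!candidates.isEmpty() && rule.priority < candidates.get(0).priority) {
                break;
            }
            if (rule.criteria.test(context)) {
                candidates.add(rule);
            }
        }
        if (candidates.isEmpty()) {
            return null;
        }
        // Ties between equal-priority matches are broken at random.
        return candidates.get(random.nextInt(candidates.size()));
    }
}
```

The early `break` is what lets the sorted order pay off: once any rule of a given priority matches, lower-priority rules are never evaluated.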
Parent-child relationships can be declared between two speechbanks. A speechbank can have more than one child, but can only have one parent. This parent speechbank provides the following functionality:
- All group-level symbols are passed down from the parent to the child and can be used by the child's speech lines and other expressions. Since symbols are immutable, trying to declare a symbol with the same name in the child speechbank causes an error.
- All named rules are passed down from the parent to the child.
- When generating a speech line, if the child speechbank fails to generate a speech line entirely, then the system attempts to generate a speech line using the parent.
This does not combine the two speechbanks into one, and there is no inheritance of rules in the same category. Rules in the parent speechbank will never be selected unless the child speechbank fails entirely.
When loading the database from a directory of JSON files, the parent speechbank is always loaded before its children. If the specified parent does not exist or causes a circular reference (speechbanks are each other's parent), the program throws an error. The preset speechbank is always loaded first.
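One way to obtain a parent-before-child load order while detecting circular references is a depth-first walk up each parent chain. This is only a sketch of that general technique, not the framework's loader: `LoadOrder` is a made-up name, and the input is assumed to be a map from speechbank name to parent name in which only the preset speechbank has a `null` parent (defaults already applied).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the framework.
class LoadOrder {
    // Returns speechbank names ordered so every parent precedes its children.
    static List<String> order(Map<String, String> parentOf) {
        List<String> ordered = new ArrayList<>();
        Set<String> loaded = new HashSet<>();
        for (String name : parentOf.keySet()) {
            visit(name, parentOf, loaded, new HashSet<>(), ordered);
        }
        return ordered;
    }

    private static void visit(String name, Map<String, String> parentOf,
                              Set<String> loaded, Set<String> inProgress,
                              List<String> ordered) {
        if (loaded.contains(name)) {
            return;
        }
        if (!parentOf.containsKey(name)) {
            throw new IllegalStateException("parent does not exist: " + name);
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("circular parent reference at: " + name);
        }
        String parent = parentOf.get(name);
        if (parent != null) {
            visit(parent, parentOf, loaded, inProgress, ordered); // load the parent first
        }
        loaded.add(name);
        ordered.add(name);
    }
}
```

Under the stated assumption that every chain ends at the preset speechbank, the preset is always the first entry in the returned order.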
A special preset speechbank, stored in `preset.json` in the root speechbank directory, has a special purpose. If a speechbank does not have a parent explicitly assigned to it, its parent is the preset speechbank by default. Therefore, the preset speechbank is "global", or an ancestor that every other speechbank can access using the parent speechbank rules. This is useful for the following applications:
- Any group-level symbols declared here are global and can be accessed by any speechbank. This is useful for symbols that are commonly used by all speechbanks, such as the speaker's name.
- Any named rules declared here are global and can be accessed by any speechbank. These are useful to define sets of criteria that are used often, and generally should have an empty response.
It is recommended to make the categories in the preset speechbank begin with `preset_` so they are not confused with actual categories and the system never tries to generate lines from these empty rules.
Functions are expressions of the form `@name(arg1, arg2, ...)` that perform an operation on the input arguments (within the parentheses) and return a specific output. While the syntax of functions is similar to symbols, they are differentiated by the use of parentheses `()` after the name.
The list of functions supported by this framework can be found here.
The parser attempts to validate the arguments of a function at compile-time in a process known as type checking. The goal of this process is to check that all arguments are the correct type so they can be used by the function.
This allows it to reject obvious type errors, such as if you try to subtract two Strings using `@sub()`. In addition, it is also able to partially infer the types of Symbol or Function data types by analyzing the symbol's expression or the function's return type, respectively.
However, the type of a Context expression is ambiguous, since its value is based on the current context, which cannot be checked at compile-time. Therefore, it does not cause any errors at compile-time. This may lead to errors at runtime if the actual value of the context is not compatible with the function it is used with. If this occurs, the speech line fails to generate.
The parser also uses implicit type coercion to make certain function calls easier to write, where it automatically converts a value of one type into another (matching the type of the function's argument) if they are compatible.
The simplest example of this is converting an Integer to a Number for use in a function that only works with Numbers, which can be performed since all integers are numbers. However, not all numbers are integers, so numbers cannot be coerced into integers implicitly. You can use the function `@to_int()` to accomplish this manually.
Another important coercion concerns Strings. All normal data types (with the exception of Booleans, since they cannot be printed) can be converted into Strings by using their printed representation. For example, consider the integer `1` (which is printed as `one`) and the function `@upper()`, which takes in a single string as an argument. The call `@upper(1)` evaluates to `ONE` because the argument is coerced into a String before the function is evaluated.
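The coercion rules stated above can be summarized in a small compatibility check. This is a simplified sketch that covers only the rules mentioned in this section; the `Coercion` class and `Type` enum are hypothetical, and the actual parser may apply further rules (for example, for placeholder types) that are not modeled here.

```java
// Hypothetical helper, not part of the framework.
class Coercion {
    enum Type { STRING, INTEGER, NUMBER, LIST, BOOLEAN }

    // Returns true if a value of type 'from' may be passed where 'to' is expected.
    static boolean canCoerce(Type from, Type to) {
        if (from == to) {
            return true;
        }
        // Every integer is a number, but not the other way around.
        if (from == Type.INTEGER && to == Type.NUMBER) {
            return true;
        }
        // Anything printable can become a String; Booleans cannot be printed.
        return to == Type.STRING && from != Type.BOOLEAN;
    }
}
```

Under these rules, `canCoerce(Type.INTEGER, Type.STRING)` is true, which is why `@upper(1)` is accepted, while `canCoerce(Type.NUMBER, Type.INTEGER)` is false and requires an explicit `@to_int()`.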