Utilities for validation and modification of Text
values in the Dhall configuration language. Not recommended for use in production.
The Text
type in Dhall is opaque by design. It is impossible to inspect the value of a Text
expression during evaluation. (One may, however, test Text
values for equality during the type-checking phase, e.g. assert : "abc" === "ab" ++ "c"
.) The Text
-related language bulit-ins are also scarce: Text/show
, Text/replace
and the concatenation operator ++
.
This library contains a number of functions, implemented using the above toolset and a dose of creativity, that allow one to manipulate Text
values based on their content:
let L = ./src/Logic/package.dhall
let P = ./src/Predicates/package.dhall
let Prelude = ./src/Prelude.dhall
let animals = ["aardvark", "hippo", "dog", "chimpanzee", "cat", "chickadee", "flapper", "manatee","aardwolf"]
let f : Text -> Text =
\(t : Text)
-> "${Text/show t} "
++
L.ifThenElse
(L.or [P.hasPrefix "aa" t, P.hasSuffix "ee" t, P.contains "pp" t])
("starts with \"aa\", or ends with \"ee\", or contains \"pp\"!")
("is not very interesting")
in Prelude.Text.concatSep "\n"
(Prelude.List.map Text Text f animals)
validate Text
values at the type-checking phase:
let L = ./src/Logic/package.dhall
let P = ./src/Predicates/package.dhall
let badWords = ["adults", "only", "really", "bad", "swear", "words"]
let rhyme =
''
As I went up the apple tree
All the apples fell on me.
Apple pudding, apple pie
Did you ever tell a lie?
''
let test = assert : L.isFalse (P.containsOneOf badWords rhyme)
in rhyme
or use dependent types to control allowed function input:
let L = ./src/Logic/package.dhall
let P = ./src/Predicates/package.dhall
let T = ./src/Transformations/package.dhall
let hexDigits = ["0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F"]
let hex2binary: ∀(hex : Text) -> (L.isTrue (P.consistsOf hexDigits hex)) -> Text =
\(hex : Text) -> \(_ : L.isTrue (P.consistsOf hexDigits hex)) ->
T.applyAll
[Text/replace "F" "1111"
,Text/replace "E" "1110"
,Text/replace "D" "1101"
,Text/replace "C" "1100"
,Text/replace "B" "1011"
,Text/replace "A" "1010"
,Text/replace "9" "1001"
,Text/replace "8" "1000"
,Text/replace "7" "0111"
,Text/replace "6" "0110"
,Text/replace "5" "0101"
,Text/replace "4" "0100"
,Text/replace "3" "0011"
,Text/replace "2" "0010"
,Text/replace "1" "0001"
,Text/replace "0" "0000"
]
hex
in hex2binary "BADF00D" L.QED
And some parser-like capabilities are in the making!
More examples can be found in the examples/
directory.
The library was created out of curiosity. I wanted to see how far I can get with validation and manipulation of Text
values using the limited Text
toolbox of Dhall. It was great fun!
The following discussions gave me some inspiration:
- dhall-lang/dhall-lang#1035
- dhall-lang/dhall-lang#669 (comment)
- https://discourse.dhall-lang.org/t/text-manipulation-functions/176/55
Note that some of the possibilities that the library gives are strongly against the philosophy of Dhall (and good programming practices), e.g. it allows to string-encode some of the business logic instead of keeping it at the type level.
While writing the library I mostly thought about extending it with new capabilities, keeping it correct, and making its use simple. Performance was never a goal.
For this reason I do not advise to use dhall-text-utils
in production. The performance is probably very poor (it was never
benchmarked). While some improvements are definitely possible, the whole idea behind the library seems doomed to
have bad performance. The library was also not tested thoroughly.
However, the library demonstrates that Dhall already has some text validation capabilities. This may be an argument to have them included, in some form, in a future version of the language standard as built-in features.
The library makes use of several properties of the Text
related operations. Some of them are inherent to these operations
(e.g. the empty string being the neutral element of concatenation), some are more incidental (e.g. the escaping rules used
by Text/show
).
While the Text
type is opaque in Dhall, i.e. we can not inspect Text
values and get, say, a Bool
or a Natural
result
depending on their content. However, Text
values may be modified. And the result of the Text/replace : Text -> Text -> Text -> Text
function
depends on the content of its arguments in an interesting way. In particular Text/replace t s t
equals s
for any non-empty string t
, but it
equals ""
if t === ""
. Using this fact we can, given a Text
value t
, obtain some fixed string, let it be "x"
, if t
is
non-empty, and obtain ""
otherwise:
-- returns "" if t is empty and "x" if t is not empty
let nonempty2x = \(t : Text) -> Text/replace t "x" t
It also happens that ""
is the neutral element of the Text
concatenation monoid (i.e. appending or prepending ""
does not change
a string), which allows us tu "detect" the empty string in a similar fashion:
-- returns "" if t is not empty and "x" if t is empty
let empty2x = \(t : Text) -> Text/replace "xx" "" ("x" ++ Text/replace t "x" t)
Now it is easy to write a function that accepts three arguments and returns the second argument if the first argument is the empty string and returns the third argument otherwise:
let ifEmptyThenElse =
\(t : Text) -> \(whenEmpty : Text) -> \(whenNotEmpty : Text) ->
(Text/replace "x" whenEmpty (empty2x t))
++
(Text/replace "x" whenNotEmpty (nonempty2x t))
If we are able to implement a function (let's call it a "predicate") hasPropertyP : Text -> Text
that converts a Text
value t
either to "x"
or to ""
, depending on t
having or not having some property P
, then we may use ifEmptyThenElse
to construct a function ifHasPropertyPThenElse : Text -> Text -> Text
which returns either the second argument if the first argument has property P
, and the third argument otherwise. Moreover, we can use hasPropertyP
to validate function arguments at the type-checking level:
let f : forall ( t : Text ) -> ("x" === hasPropertyP t) -> SomeType =
\(t : Text) -> \(assert : ("x" === hasPropertyP t)) -> [...]
or to write assert
-based tests:
let test = assert : "x" === hasPropertyP someVariable
It turns out such "predicates" exist for many properties. For example, to check whether t
is equal to some given string, we can use:
let equals = \(s : Text) -> \(t : Text) ->
empty2x (Text/replace s "" t ++ Text/replace t "" s)
See ./src/Predicates
for the "predicates" that the library provides.
Now, if we assign names to "x"
and ""
:
let true : Text = "x"
let false : Text = ""
we can implement Boolean operators:
let not : Text -> Text = empty2x
let or : Text -> Text -> Text = \(t1 : Text) -> \(t2 : Text) -> nonempty2x (t1 ++ t2)
let and : Text -> Text -> Text = \(t1 : Text) -> \(t2 : Text) -> not (or (not t1) (not t2))
that work as expected when their input is either ""
or "x"
.
The dhall-text-utils
library wraps true
and false
in a record type (named TextBool
, see ./src/Logic
) and adds some sugar here and there, but this is the basic idea.
One other notable trick that is used in the stripPrefix
and stripSuffix
functions (and in the histogram example) is based on the fact that the output of Text/show
will never contain certain substrings
(e.g. three consecutive quotation marks) or characters (e.g. a tab or a newline, which are converted to \t
and \n
). The details are left to the interested
reader.
The library consists of several interrelated sub-packages:
Logic
Predicates
CharacterClasses
Transformations
Here we describe the contents of these packages. Studying the examples provided in this README and in the ./examples
directory should
be enough to start working with the library.
Contains the definition of the Logic.TextBool
type, which is used to represent "true" and "false" encoded as Text
wrapped in a record. Values of the type Logic.TextBool
should never be constructed by hand. The details of the implementation of this type can change without notice and this is not
considered a breaking change. The main source of Logic.TextBool
values are the functions living in the Predicates
package. However, the functions Logic.true
and Logic.false
can be used to create Logic.TextBool
"literals".
The basic Boolean operations are provided by Logic.not
, Logic.or
, Logic.and
and Logic.xor
. An if
-like conditional is given in Logic.ifThenElse
.
The package also contains some sugar used for assertions and dependent types: Logic.isTrue
, Logic.isFalse
can be used instead of === Logic.true
or === Logic.false
. For example, instead of writing Predicates.isEmpty x === Logic.true
we can write Logic.isTrue (Predicates.isEmpty x)
. This
is mostly a matter of style. There exists another helper Logic.QED
, that provides a "proof" for dependently typed functions:
let f : forall (t : Text) -> Logic.isFalse (Predicates.isEmpty t) -> Text =
\(t : Text) -> \(_ : Logic.isFalse (Predicates.isEmpty t)) -> [...]
in f "abc" Logic.QED
The functions Logic.any
and Logic.all
can be used to map predicates over a list and aggregate the result using either alternative or conjunction.
The package also contains a number of less notable functions.
We defina a character class to be a list of Text
values, each consisting of a single codepoint. For example ["a","b","3"]
and [] : List Text
are character classes, but ["a","bc"]
is not.
The package contains definitions of several common character classes (e.g. the list of all ASCII characters, the list of ASCII digits, the list of Unicode control characters).
Note that if a function in the library accepts a List Text
value that is expected to be a charcater class, then it should be a character class. This is not
controlled at the type level and may lead to errors! One should pay special attention to Unicode characters consisting of several codepoints.
By a predicate, in the context of this library, we understand a TextBool
-valued function. The package contains a number of predicates related to various properties of Text
values, e.g. Predicates.isEmpty : Text -> TextBool
, Predicates.hasPrefix : Text -> Text -> TextBool
, Predicates.containsOneOf : List Text -> Text -> Text
.
Some of these have versions with arguments flipped, e.g. Predicates.contains
and Predicates.isContainedIn
, or Predicates.hasSuffix
and Predicates.isSuffixOf
. Some may be easily constructed using the others and the functions from the Logic
package (e.g. Predicates.containsOneOf
is implemented using Logic.any
and Predicates.isContainedIn
). I do not strive for minimality. There is at present no rule which of these derived predicates are included in the library, and which are not. One exception is that I do not want to include negations of existsing predicates. For example:
let isNotEmpty : Text -> TextBool = \(t : Text) -> Logic.not (Predicates.isEmpty t)
is not going to be included in the Predicates
package.
Some of the predicates (e.g. Predicates.consistsOf
or Predicates.hasLengthUsing
) accept an argument that is required to be a character class (see the section above). This is not checked at the type level!
There are a few dependently typed predicates: Predicates.hasLengthAtLeastUsing
, Predicates.hasLengthAtMostUsing
, Predicates.hasLengthUsing
. These functions use the fact that a given string consists entirely of codepoints belonging to a given character class. To use these predicates you have to guarantee, at the type-checking level, that this property holds. So, for example:
Predicates.hasLengthUsing 5 CharacterClasses.ASCII "abcde" Logic.QED === Logic.true
but
Predicates.hasLengthUsing 5 CharacterClasses.ASCII "ąbćdę" Logic.QED
will fail.
If you need to use a non-dependently typed version of these functions, then you may either try to use Predicates.hasSubstringOfLengthAtLeastConsistingOf
. Or, when you are sure that your Text
values consist only of character belonging to a certain class, create your own versions of these predicates, by copying the source and getting rid of the last argument.
The package contains a miscellany of functions that may be useful when manipulating Text
values, like Transformations.stripPrefix
, Transformations.stripSuffix
or Transformations.applyAll
, which consecutively applies a list of Text -> Text
functions to a given Text
value.