Tool to sample a random uniform Wheeler DFA (WDFA) with the given number of nodes, number of edges, and alphabet size.
- A modern C++17 compiler such as
g++
version 8.4 or higher. - The cmake build system, version \geq 3.10.
- A Linux 64-bit operating system.
To clone the repository, run the following command:
git clone https://github.com/regindex/Wheeler-DFA-generation
You can compile with make:
cd Wheeler-DFA-generation
make
You need to give in input the number of states (
- unordered list
$\sigma < n$ $\sigma \leq m \leq n \cdot \sigma$ $m \geq n-1$
usage: gen-WDFA.py [-h] [-n N] [-m M] [-a A] [--interm] [--binary] [-o O]
[-s S] [--verbose]
Tool to generate random uniform Wheeler DFAs.
optional arguments:
-h, --help show this help message and exit
-n N, --N N number of states
-m M, --M M number of edges
-a A, --A A alphabet size (def. 128)
--interm output the resulting WDFA in intermediate format (def. False)
--binary output the resulting WDFA in binary format (def. False)
-o O, --O O output file basename (def. n_m_a.wdfa)
-s S, --S S random number generation seed (def. rand)
--verbose verbose mode on (def. False)
After compiling, run:
python3 gen-WDFA.py -n 5 -m 4 -a 4 -o test.wdfa
This command will generate a WDFA with
strict digraph {
0 -> 1 [ label = 0 ];
2 -> 1 [ label = 0 ];
3 -> 2 [ label = 1 ];
4 -> 2 [ label = 1 ];
0 -> 3 [ label = 2 ];
1 -> 3 [ label = 2 ];
2 -> 3 [ label = 2 ];
3 -> 4 [ label = 3 ];
}
Enabling the '--interm' flag the tool can also output the resulting WDFA in an intermediate format (see below for an example),
5 8 4 0
0
0 1
2 1
1
3 2
4 2
2
0 3
1 3
2 3
3
3 4
The first line contains the number of states, number of edges, alphabet size, and source node separated by newline characters. Each other line contains either a label or an edge (pair of states: origin-destination). All edges between two consecutive labels, c and c', are labeled with c. For instance, the two edges 0 1 and 2 1 are labeled with 0.
Using the '--binary' flag, it is possible to stream the resulting WDFA in binary format (the fastest method); in this case, each line after the first contains three integers representing an edge (origin label destination). The integers in the first line are represented using 64 bits, while the others can be represented using either 32 bits (if
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon Europe research and innovation programme, project REGINDEX, grant agreement No 101039208