Training

Record the training signal to a tape:

This requires both vhs-teletext and raspi-teletext to be installed. vhs-teletext generates the training data and pipes it to raspi-teletext, which outputs it as a teletext signal. This was tested on Raspbian 10 (buster), kernel 5.10.63-v7+ (32-bit).

teletext training generate | <folder of raspi-teletext>/teletext -

# Example with a concrete path:
teletext training generate | /home/marten/github/raspi-teletext/teletext -

Play the tape back and capture the signal to training.vbi.
Then run the following script to process the training file into patterns:

#!/bin/sh
SPLITS=$(mktemp -d -p .)
# For uncompressed vbi input use:
teletext training split $SPLITS training.vbi
# For flac compressed input use this line instead of the one above:
# flac -c -d --force-raw-format --sign=unsigned --endian=little training.flac | teletext training split $SPLITS

teletext training squash $SPLITS training.dat
teletext training build -m full -b 4 20 training.dat full.dat
teletext training build -m parity -b 6 20 training.dat parity.dat
teletext training build -m hamming -b 1 20 training.dat hamming.dat
cp full.dat parity.dat hamming.dat ~/Source/vhs-teletext/teletext/vbi/data/
echo $SPLITS

Theory

The idea behind training is to record a known teletext signal onto tape and then play it back into the computer in the same way as you would when recovering a tape. Then the original and observed signal can be compared to build a database of patterns.

To make sure we can identify the degraded training packets, each one has an ID and checksum. These are encoded so that each bit of data is three bits wide in the output. This makes recovery of the original trivial. There are also fixed bytes which can be used to help with alignment.
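The three-bits-wide encoding can be illustrated with a minimal sketch. The function names here are hypothetical, and the majority vote used for decoding is an assumption about one simple way to recover the data, not necessarily what vhs-teletext does:

```python
def encode_triple(bits):
    """Widen every data bit to three output bits."""
    return [b for b in bits for _ in range(3)]


def decode_triple(received):
    """Recover each data bit by majority vote over its group of three.

    Majority voting is an assumption made for this sketch.
    """
    return [
        1 if sum(received[i:i + 3]) >= 2 else 0
        for i in range(0, len(received), 3)
    ]


original = [1, 0, 1, 1, 0]
sent = encode_triple(original)

# Simulate tape damage: flip one output bit inside the second group.
received = sent.copy()
received[4] ^= 1

recovered = decode_triple(received)
print(recovered == original)
```

A single damaged output bit per group does not change the decoded value, which is why the ID and checksum remain readable even when the surrounding packet is degraded.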

We want to fit as many patterns as possible onto as little tape as possible. A De Bruijn sequence is used to do this. It is the shortest cyclic sequence over a given alphabet (0 and 1 in this case) that contains every possible string of length N exactly once. As a consequence it also contains every shorter string.

We use the De Bruijn sequence B(2, 24), that is, an alphabet of 2 symbols and a window of 24 bits. It contains every possible 24-bit sequence of ones and zeros, which is 2^24 = 16,777,216 (about 16 million) patterns. The ID field stores an offset into this sequence.
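To make the De Bruijn property concrete, here is a small sketch that builds a binary sequence for a window of 4 bits and checks that every 4-bit pattern appears exactly once when the sequence is read cyclically. It uses the standard textbook construction based on Lyndon words. How vhs-teletext actually generates its 24-bit sequence is not stated here, so treat this only as an illustration:

```python
def de_bruijn(k, n):
    """Return a De Bruijn sequence for alphabet size k and window length n.

    Standard recursive construction that concatenates Lyndon words.
    Intended for small n only, as an illustration.
    """
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def cyclic_windows(seq, n):
    """Collect every length-n window, wrapping around the end."""
    doubled = seq + seq[:n - 1]
    return {tuple(doubled[i:i + n]) for i in range(len(seq))}


seq = de_bruijn(2, 4)
windows = cyclic_windows(seq, 4)
print(len(seq), len(windows))  # both equal 2**4
```

The same property at a window of 24 bits gives a sequence of 2**24 bits in which each 24-bit pattern occurs once.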

When recording, it is possible for a run of whole frames to be lost, so we do not simply display the whole De Bruijn sequence from start to finish. Instead, for each packet, we add a prime number to the offset and take the result modulo the sequence length. This way every part of the sequence is shown multiple times, and even a long run of frame drops is unlikely to cause total loss of any part of the pattern.
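The offset stepping can be sketched as follows. The sequence length and the prime are toy values chosen for this example (the real values used by vhs-teletext are not given here), and the helper name is hypothetical:

```python
from math import gcd


def packet_offsets(length, stride, count, start=0):
    """Yield the sequence offset used by each successive packet."""
    offset = start
    for _ in range(count):
        yield offset
        offset = (offset + stride) % length


LENGTH = 2 ** 8   # toy sequence length; the real sequence has 2**24 bits
STRIDE = 37       # hypothetical prime step, chosen only for illustration

visited = list(packet_offsets(LENGTH, STRIDE, LENGTH))

# An odd prime shares no factor with a power of two, so the walk
# reaches every offset before it repeats.
print(gcd(LENGTH, STRIDE) == 1)
print(len(set(visited)) == LENGTH)

# A run of ten consecutive lost packets removes offsets that are spread
# across the sequence rather than one contiguous block.
lost = sorted(visited[100:110])
print(lost)
```

Because consecutive packets land far apart in the sequence, a dropped run of frames thins out coverage across the whole sequence instead of erasing one region.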

After recording the signal back into the computer it is sliced into patterns representing 24 bits of data. For a specific 24-bit pattern there will be multiple slices in the signal. An average is taken of every occurrence and saved along with the original data it represents. This is the intermediate training data.
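The slicing and averaging step might look roughly like the sketch below. It is a toy model: it assumes one observed sample per bit and a short window, and the function name is hypothetical. The real signal has its own sample rate and a 24-bit window:

```python
def average_slices(original_bits, observed, width):
    """Average all observed windows that share the same original bits.

    Returns a dict mapping each original bit pattern (as a tuple) to the
    element-wise mean of every observed slice recorded for it.
    """
    sums = {}
    counts = {}
    for i in range(len(original_bits) - width + 1):
        key = tuple(original_bits[i:i + width])
        window = observed[i:i + width]
        if key not in sums:
            sums[key] = [0.0] * width
            counts[key] = 0
        sums[key] = [s + w for s, w in zip(sums[key], window)]
        counts[key] += 1
    return {
        key: [s / counts[key] for s in total]
        for key, total in sums.items()
    }


# Toy data: the pattern (0, 1) occurs twice with different degradation.
original = [0, 1, 0, 1]
observed = [0.0, 1.0, 0.5, 0.5]

intermediate = average_slices(original, observed, width=2)
for pattern, mean in sorted(intermediate.items()):
    print(pattern, mean)
```

Averaging many occurrences of the same pattern suppresses noise that differs between occurrences while keeping the distortion that the tape applies consistently.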

Finally, the pattern files are built. A pattern is described like this:

Number of bits to match before.
Set of possible bytes to match.
Number of bits to match after.

For example, the parity data file is built like this:

build_pattern(args.parity, 'parity.dat', 4, 18, parity_set)

This means:

Match 4 bits before.
Match any byte with odd parity. (128 possibilities, equivalent to 7 free bits)
Match 3 bits after.

This gives 4 + 7 + 3 = 14 bits in total, or 2^14 = 16384 patterns.
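The pattern count for the parity file can be checked with a few lines of arithmetic. The way the set of odd-parity bytes is built below is an assumption for illustration. Only the name parity_set comes from the snippet above:

```python
def has_odd_parity(byte):
    """True if the byte contains an odd number of 1 bits."""
    return bin(byte).count("1") % 2 == 1


# Hypothetical construction of the set of valid bytes.
parity_set = [b for b in range(256) if has_odd_parity(b)]

bits_before = 4
bits_after = 3

# 16 contexts before, 128 valid bytes, 8 contexts after.
total_patterns = (2 ** bits_before) * len(parity_set) * (2 ** bits_after)

print(len(parity_set))   # 128
print(total_patterns)    # 16384
```

Half of all 256 byte values have odd parity, so the byte contributes 7 bits of choice rather than 8.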

To build the pattern data the intermediate data is processed and any pattern which matches the criteria is added to a list. Then the average for each list is taken. That is the final pattern we will match against.
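A minimal sketch of this final build step is shown below, under several assumptions: the intermediate data is modelled as a dict from bit tuples to one averaged sample per bit, the window sizes are tiny toy values, and the function name and parameters are hypothetical rather than the real build_pattern signature:

```python
from collections import defaultdict


def build_patterns(intermediate, start, before, width, after, allowed):
    """Group intermediate entries by their matched bits and average them.

    start   : index of the first matched bit in each entry
    before  : number of context bits matched before the symbol
    width   : number of bits in the symbol itself
    after   : number of context bits matched after the symbol
    allowed : set of bit tuples that are valid symbols
    """
    end = start + before + width + after
    groups = defaultdict(list)
    for bits, samples in intermediate.items():
        symbol = bits[start + before:start + before + width]
        if symbol in allowed:
            groups[bits[start:end]].append(samples[start:end])
    return {
        key: [sum(column) / len(column) for column in zip(*rows)]
        for key, rows in groups.items()
    }


# Toy intermediate data: 5-bit patterns, one averaged sample per bit.
toy_intermediate = {
    (0, 0, 1, 1, 0): [0.0, 0.0, 1.0, 1.0, 0.0],
    (0, 0, 1, 1, 1): [0.0, 0.5, 0.5, 1.0, 1.0],
    (0, 1, 1, 0, 0): [0.0, 1.0, 1.0, 0.0, 0.0],  # symbol (1, 1) is rejected
}
odd_parity_symbols = {(0, 1), (1, 0)}

patterns = build_patterns(
    toy_intermediate, start=0, before=1, width=2, after=1,
    allowed=odd_parity_symbols,
)
print(patterns)
```

The first two entries differ only in a bit outside the matched window, so they fall into the same list and are averaged into one final pattern.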
