-
Notifications
You must be signed in to change notification settings - Fork 2
/
cbw.doc
1047 lines (794 loc) · 50.8 KB
/
cbw.doc
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
CRYPT BREAKERS WORKBENCH
USERS MANUAL
Robert W. Baldwin
mit-eddie!baldwin
October, 1986
Overview
The Crypt Breakers' Workbench (CBW) is an interactive multi-window system
for mounting a cipher-text only attack on a file encrypted by the Unix crypt
command. CBW is a workbench in the sense that it provides the user with an
integrated set of tools that simplify the initial, middle and final portions of
the decryption process. A user interacts with the workbench by choosing tools
and setting parameters. CBW carries out the work and displays the results. A
moderately experienced user of CBW can easily decrypt both long and short
messages when bigram statistics are known for the message space. The basic
cryptanalytic techniques used by CBW are described in a paper by Reeds and
Weinberger that appeared in the October 1984 issue of the ATT Bell Laboratories
Technical Journal. This manual explains the capabilities and operating
procedures of CBW.
1. Introduction to the Workbench
A workbench is a place where a craftsman can quickly and comfortably carry
out a wide range of tasks. The ease of performing tasks comes from the range
of tools present on the workbench and the experience of the craftsman. Each
tool has its strengths and limitations, but the craftsman can select the best
tool for the job at hand. The workbench is not a static thing. Often the
craftsman builds a jig that is tailored to a particular problem. Some jigs are
later discarded while others become permanent fixtures of the workbench. It is
in this sense of the term workbench that CBW is a workbench for breaking the
Unix crypt cipher.
CBW has evolved over the past two years, and it will continue to evolve as
the process of decrypting Unix files becomes better understood. Already I can
think of commands that I want to add to speed up the end-game of the decryption
process. It is being released in its current form to allow others to play with
it and add their own ideas.
One goal of releasing CBW is to discourage people from using the Unix
crypt command for sensitive information. Crypt was never intended to be a
highly secure cipher. It's primary advantage has been its speed, which made it
suitable for integration into a wide range of application programs. The CBW
program simply points out how easy the cipher is to break using the ideas
described by Reeds and Weinberger in the October 1984 issue of the ATT Bell
Labs Technical Journal. As those authors suggested, if you must use crypt, you
should run compress on the file before encrypting it.
2. Key Space Attack
The basic metaphor of CBW is that the computer knows several message-space
searching strategies, and the user decides which strategy to try next. The
program takes on the burden of carrying out the strategy, but it is up to the
user to choose the strategy and to set the necessary parameters (e.g., how
close a scoring statistic has to be to its expected value before the program
will accept a guess). When a strategy completes, its results are displayed.
An important feature of the program is that the results are displayed in a
manner that makes it easy for the user to decide what to do next. Further, it
easy to carry out decisions such as accepting the outcome as is, accepting it
with minor changes, or starting over with a different strategy or a different
parameter setting.
To use CBW it is necessary to know a few facts about the underlying cipher
system, in this case the Unix crypt program. The primary fact to know is that
the job of deciphering a file can be broken down into several smaller jobs.
The ciphertext being decoded can be divided into blocks of 256 characters, and
each block can be decoded separately.
Each block is encrypted using a relatively simple cipher system, but the
key to this simple system changes at the end of each block. Effectively, each
block is encrypted by a single rotor enigma system. The rotor has 256
positions to handle full eight-bit bytes, and this leads to the number of
characters in each block. CBW represents the wiring for each block's rotor by
a permutation matrix.
Once several blocks are decoded (or mostly decoded), it becomes possible
to determine how the key changes after each block. The key change is expressed
as a permutation matrix called Zee. Typically one needs to solve four or five
blocks (about 1000 characters), before it is possible to automatically compute
Zee and solve the rest of the file. For short messages there is no need to
compute Zee.
After the inter-block relationship has been determined, the program,
zeecode, can be used to decipher the whole file. The program zeecode looks at
the save-file produced by CBW to get the information it needs to decode an
entire file.
3. Files, Terminals and Shell Variables
This documentation assumes that the code, source, and datafiles are kept
in a directory called /projects/Ecrypt, but these files could be placed
anywhere.
The most important files are the program itself, and the files that
contain the English language statistics. The program is called 'cbw'. The
statistics files are 'mss.stats', 'mss-bigram.stats', and 'trigram.stats'. For
the probable word guessing command there are files like 'common.words'.
To make it easy for users to provide their own statistics files, cbw uses
the shell variables LETTERSTATS, BIGRAMSTATS, and TRIGRAMSTATS to access these
files. The variable DICTIONARY identifies a file of words for the
lookup-pattern command (one word per line). These shell variables must be set
to the full pathnames for the desired files. The file stats.slice defines all
of these variables. Your .login file should contain a line like:
source /projects/Ecrypt/stats.slice
The main input to cbw is a ciphertext file, and the main output is a save
file that specifies how the ciphertext can be decoded. The program 'zeecode'
uses CBW's save file to decode the ciphertext file. The ciphertext file for
cbw is assumed to have a name like 'fileroot.cipher'. The output of cbw is
stored in the file named 'fileroot.perm'. The '.perm' stands for permutation.
3.1. Terminal compatibility
CBW is designed to run on any display terminal. To run on a particular
terminal CBW must be able to generate graphic characters on that terminal and
be able to recognize the escape sequences sent by that terminal's function keys
(e.g., the arrow keys). CBW adapts to different terminals by examining shell
variables (TERM, GRAPHICSMAP, and KEYMAP), the termcap entry, and compiled in
defaults. It builds a keystroke map and a graphics symbol map which translate
between arbitrarily long sequences of ascii characters and the internal
representation for keystroke commands and graphic symbols. The rest of this
document will describe the action of keys, but the reader should remember that
the binding between commands and keys can be changed.
You should be able to use CBW without understanding the details of
terminal handling, but for completeness, those details are presented in the
appendix. You will need to be sure that the shell variable TERM identifies the
kind of terminal you are using. You should also be able to find a shell script
that will set up the graphics map for your terminal. List the files matching
/projects/Ecrypt/*.slice, if there is one for your terminal (e.g.,
vt100.slice), execute it to initialize the necessary shell variables (e.g.,
source /projects/Ecrypt/vt100.slice).
In general an unrecognized keystroke generates a self-insert command.
However, control characters other than newline and tab are reserved for default
keystroke commands. To insert a control character, precede it by a C-Q. The
C-Q command will create a self-insert command for the next ascii character
received from the terminal.
The notation C-N will be used to represent pressing the N or n key while
the control key is held down.
4. Getting Started and Getting Out
After setting up the shell variables described in the previous section,
you can invoke the Crypt Breakers' Workbench from any directory by typing:
/projects/Ecrypt/cbw fileroot
Where 'fileroot.cipher' is the name of the ciphertext file in the current
directory. You do not need to specify the full pathname of the cbw program if
/project/Ecrypt is in your shell's search path.
To exit the program move the cursor to the bottom window (using the arrow
keys or C-N or C-X), then type 'q', a space, and a return. The 'q' signifies
the quit command, the space causes command completion to be invoked (i.e.,
'qui ' would work just as well), and the return key causes the command to be
executed. If you have changed the decryption window since you last saved your
work, the program will ask if you really want to quit. Type 'y' if you want to
quit, or any other key to abort the quit command.
5. Commands
There are two types of commands: keystroke and user commands. Keystroke
commands are interpreted by the window that currently contains the cursor. For
example, the up-arrow key generally moves the cursor up by one line, but in the
decryption-block window, it moves the cursor up two lines. An attempt has been
made to keep the keystroke commands uniform and emacs-like. The keystroke
commands will be explained for each window.
The user commands can take arguments typed by the user. They are entered
into the command line via a simple editor and executed when the user types a
return. The cursor must be in the user i/o window to type or execute user
commands. The details of the editor are given in the section on the user i/o
window. Each user command is explained in section 7.
-------------------------------------------------------------------------------
Crypt Breakers' Workbench
Block - 0 Know 14 of 128 wires |Word Match
|.n.y
@Device[dover]......e[[email protected]. |Andy
|envy
......................... ...................................... |indy
|only
....... .................................................xt.r.a. |
|
..o.......s.........d.s.............e.........t....o.mu......on |
|
|
|
Probable word search -- Done |
@Device[dover]..t...e[Font.timesroman..].....e.A..i.l....@Sectio |
n..n..................... .r...n......n.ly.....on.....h......... |
....... .........;................ons.......e .c.... ....xt.r.a. |
..o.......s.........d.s.............e.........t.. .o.mu..c...on |
|
`----------
Help : F3 enters guess, ^G undoes it.
Status : Command completed.
Command: pwords-from file: mss.words Max dev: 1.2 (try 1.0)
-------------------------------------------------------------------------------
Figure 5-1: Sample Screen
6. Windows
The screen is partitioned into seven windows as shown in figure 5-1.
Associated with each window is a keystroke command interpreter and private
state information. By convention keys like up-arrow, C-D, and delete do
similar things in all the windows.
There are some keystrokes that have the same effects in all windows, they
are:
C-L Redraw the whole screen.
C-X, F4 Jump to command input line.
C-Q Quote. Insert the following character.
C-Z Abort the current command, restore the terminal modes, suspend
the program and return to the shell. If the program is
restarted, the terminal will be initialized for CBW and the
screen redrawn. The command that was active (if any) when the
C-Z was received is aborted. Actually this command is invoked
when CBW receives a SIGTSTP signal.
C-C Restore the terminal modes and kill the CBW process. Actually
this command is invoked when CBW receives a SIGINT signal.
The following subsections describe the features of the windows starting at
the top of the screen and working downwards.
6.1. Banner window
This window contains the program title. Default behavior for arrow keys.
6.2. Decryption Block Label window
This window contains a single line that describes the block displayed in
the decryption block storage window. The block number is shown at the left end
of the line. The right hand side of this line displays how much is known about
the decryption permutation for the current block. This value is expressed in
terms of the number of wires (2-cycles in the permutation) that are known. A
decryption permutation that maps character A to B also maps B to A, so there
are only 128 wires even though there are 256 possible characters.
6.3. Decryption Block Storage window
Displays what is known about the plaintext corresponding to a single block
of 256 ciphertext characters. A dot character, which is different from a
period character, is shown when the plaintext character is unknown. Other
graphic characters represent control characters like newline, tab, and
form-feed. If a character decrypts into a byte that has the high bit set
(i.e., not an ascii character), it shows as a solid diamond.
Most keystrokes are self-insert commands. When a character is inserted,
it replaces the character at the current position, and any deduced characters
are displayed and highlighted (by underlining). The other keystroke command
characters are:
C-D, DEL Delete character forward, delete backward. This forces a
permutation wiring to be removed. It can do unexpected things.
Consider a permutation with A wired to B and P wired to
Q. Adding a wire from B to P, then removing it with the DEL
command does not restore the original AB and PQ wirings.
C-G Undo. Backup the permutation to the state it had before the
last contiguous sequence of keystrokes. Arrow commands
terminate a contiguous sequence of keystrokes. In most cases,
undo should be used instead of DEL or C-D.
C-T Try-all. Display a list of the ascii characters that could be
put in the current cursor position without conflicting with the
existing permutation wirings. The list is sorted with the most
likely character first (leftmost).
C-L Redraws the whole screen. As a side effect this erases the
list of characters produced by C-T.
F1, C-R Move to the previous block of 256 characters.
F2, C-S Advance to next block of 256 characters.
6.4. Guess Block Label window
Title for the Guess Block Storage window. Default arrow behavior.
Parameters for guessing commands are displayed in this window.
6.5. Guess Block Storage window
This window is multiplexed between different strategies for guessing that
a particular plaintext string is at some position in the block. The general
meaning of the keystroke commands are:
F2, C-S Goto next guess.
F3, C-A Accept guess, merge it into the current decryption block. This
does not automatically advance to the next guess.
C-G Undo the merge of a guess. Actually this is identical to the
undo in the decryption block storage window, so it will only
undo the guess if you have not typed any commands to the
decryption window.
6.6. Word Lookup
This window displays the list of words in a dictionary that match a
particular pattern. The pattern is specified by the lookup-pattern command
(see section 7.4). Only the first 20 or so matches are displayed. The
dictionary file defaults to /usr/dict/words, but the pathname can be set using
the environment variable DICTIONARY. The dictionary file contains a sorted
list of words separated by newline characters. Case is ignored.
6.7. User I/O window
This window consists of three lines: help, status, and command. The help
line changes as the cursor moves between windows. It reminds the user of the
commands available in the current window. A more elaborate help facility has
not been implemented. The status line is used to show error and completion
messages. The cursor cannot be positioned over either the help or status
lines.
The command line behaves like a line editor. The editing characters are:
C-U Erase the whole line.
C-F, C-B Move forward and backward one character.
C-D, DEL Delete one character forward or backward.
F2, C-S Next-Argument. It searches for the first "%" character in the
command line, deletes it and leaves the cursor there. This is
used with command templates.
Space Within the first word on the command line, a space character
invokes command completion. A completed command string is a
template command with argument slots (marked by %) and
suggested argument values.
Return Execute command. The cursor can be anywhere within the command
line.
7. User Commands
7.1. quit
Exit the program. If any of the permutations have changed since they were
last saved, the user is asked for confirmation.
7.2. save-permutation
Associated with each block of the ciphertext is a permutation that tells
how to decode the known characters. This command saves those permutations.
There is a special permutation, called Zee, that relates the permutations of
successive blocks. The save command also saves Zee.
7.3. load-permutations
Inverse of save. The ciphertext file is loaded automatically when CBW
starts, but the permutation file must be loaded explicitly. This command is
used to load previously saved work. Warning: the load command will replace the
current permutations without asking whether you want to save them first.
7.4. lookup-pattern
Searches a dictionary for words matching a pattern. The dictionary is
specified by the shell variable DICTIONARY. The pattern is specified as an
argument to the command. Currently, the pattern consists of characters and
periods (.). The period characters match any one character in a word from the
dictionary. This means that you have to know the length of the word you are
looking for. The current dictionary does not contain suffixes, so you will
often need to try a few patterns to find the desired word. There are many ways
to improve this command.
7.5. bigram guessing
This command uses a statistics based on letter pairs (bigrams) to
carefully guess at the plaintext. This is the most useful command in CBW and
it represents a significant improvement on the techniques discussed in the
Reeds and Weinberger paper. This tool works by finding the best position for
guessing, and then trying all 128 ascii characters in that position. For each
of the 128 trials the deduced characters are scored and the trial with the best
score is accepted if it meets the criteria set by the user. The process
repeats for the next best position until no positions satisfy the user's cutoff
criteria.
The best position for guessing is the one that produces the most evidence.
It is the position that generates the largest number of deduced characters.
Preference is given to positions that are deduced next to an already accepted
character because bigram statistics can be used in these places. An
approximation to the best position can be identified by examining the
ciphertext. For the crypt cipher it is possible to identify all the characters
in the ciphertext that were the output of a particular terminal of the block
rotor. Each of the 128 trial plaintext characters for that position will
generate a wiring from that terminal to some other terminal. If the other
terminal is already used, the guess is rejected immediately. However, if the
rotor terminal is free, additional character positions may be deduced because
of the symmetric nature of the rotor (recall that if the rotor maps X to Y then
it also maps Y to X). Thus by examining the ciphertext CBW can set a lower
limit on the number of characters that will be deduced by guessing in a
particular position.
The bigram command takes two arguments, a cut-off level and a minimum
probability. The purpose of the cutoff level is to insure that a guess is only
accepted if the probability of the guess being right is at least some factor
greater than the probability that the guess is wrong. A cutoff level of 2.0
tells CBW to only accept guesses when the probability of the guess being right
is at least twice as great as the probability that the guess is wrong. Since
all 128 possible guesses are tried, the probability of a guess being wrong is
just the sum of the probabilities that any other guess is correct.
The probability that a guess is correct is based on a statistic that is
the sum of the logarithm of the expected letter pair frequencies of deduced
characters appearing next to the deduced or accepted characters. The mean and
variance of this statistic is a simple function of the number of characters
deduced, so it is possible to approximate the probability density distribution
of the statistic with a gaussian distribution. The probability (actually
probability density) that a guess is correct can be computed from the gaussian
distribution based on the number of standard deviations that the observed value
of the statistics differs from its expected value.
The minimum probability tells CBW to only accept the best guess if its
probability is higher than some value. The minimum probability comes into play
when very few letter pairs are deduced. In those cases it is better to skip
guessing until additional characters have been deduced.
This command can be used to automatically deduce about 100 of the 256
possible positions within a block. It is by far the most useful command in the
workbench. My usual strategy is to run the command three times on each block,
first with a cut-off of 1.5 and a minimum probability of 0.6, then with a
cut-off of 1.2 and a probability of 0.4, and finally with a cut-off of 1.0 and
a probability of 0.15.
An essential aspect of this command is that it carefully selects the
positions where it does guessing. After each guess it selects the position
that will deduce the maximum number of letters and letter pairs. Without the
careful searching, it is only slightly more effective than the equivalence
class command (which is now obsolete).
The method for computing the bigram probability is somewhat interesting.
Rather than having a 128 x 128 entry table, the program uses a 36 x 36 entry
table with a character mapping table. For example, the character map treats
upper and lower case letters the same, and it treats all left brackets and
parenthesis the same. To compensate for the information lost by treating upper
and lower case letters as the same, the program bases its guess probabilities
on the product (actually products are taken by summing logs) of the monogram
and bigram probabilities for the deduced characters. Compensating for the lost
information lead to a 30% increase in the number of correctly guessed
characters.
7.6. pwords
Given a list of words that are likely to be found in the text, this
command tries each word in every possible starting position, and accepts a
guess if the score for the deduced characters is within a given number of
standard deviations of its expected value. The list of words must be stored in
a file, one word per line, using standard C backslashing conventions for
control characters (e.g., "\n" for newline).
CBW used to have a command that accepted words from the keyboard, but I
found that users are more thorough in selecting words if they have to
separately edit a file. The result is a collection of reusable files that
becomes a permanent part of your workbench. For example, I have files for mail
headers, C programs, computer science articles, common short words, and various
text formatters. Of course in the spirit of a workbench, if you do not like
this limitation it is easy to add the desired command.
7.7. auto-trigram guessing
This command is obsolete. Use pword instead.
7.8. knit-blocks
This command is used to deduce the relationship between successive blocks
of the cipher. That relationship is stored in a permutation called Zee. Given
four or more partially filled in blocks, the knit command can be used to deduce
information about Zee. You can then use the propagate command to transfer what
you have solved in one block to any other block (see section 7.10). For the
theory of knitting see the Reeds and Weinberger paper.
The knitting processes is not deterministic unless you have completely
solved four blocks or partially solved more than five blocks. When knitting
blocks k through n, the knit command displays block n+1 in the decryption
window and in the guess window it displays the characters that would be deduced
by each knitting guess. You can use the F3 (or C-A) key to accept a guess
and/or the F2 to advance to the next guess.
The knit command is intolerant of incorrect decodings of a block. If you
are not sure of the plaintext value of a character (e.g., whether some position
contains a space or a tab character), it is best to leave character undecoded.
If any character is incorrectly decoded, then the knit command will run for
hours. If the command runs for more than five minutes, abort it with C-Z. A
clever algorithm might avoid this mis-feature.
The knit command displays how much of Zee is currently known.
7.9. Clear-zee
Sets the Zee permutation to a state where no information is known. This
is the only way to recover from a bad knitting.
7.10. propagate using Zee
If the Zee permutation is partially known, the propagate command can be
used to deduce unknown characters in one block from known characters in another
block. Its arguments are any two block numbers. The blocks do not have to be
adjacent. The destination block will be displayed in the decryption window
with the deduced characters highlighted.
8. Known Bugs
1. Due to a simple memory allocation strategy, only the first fifteen
or so blocks of a file can be examined.
2. CBW can only process full 256 character blocks, which means that it
cannot be used to decode very short files.
3. Trigram stats are no longer used, so they should not be loaded.
4. Replacing the knit window without accepting the current guess (by
say invoking the propagate command), causes the current knitting
guess to be accepted. This bug comes from an oversight in my window
package and is a bit of a pain to fix.
5. See the file TODO.txt for a list of suggested enhancements.
Acknowledgments
The author would like to thank Simson L. Garfinkel, Samuel M. Levitin,
Clifford Neuman, and Tim Shepard for the ideas and programs they contributed to
this project. Paul Paloski provided extensive comments on the alpha test
version of CBW, and John Gilmore enhanced it for portability.
I. Graphics Map
CBW uses non-ascii symbols to display non-printing characters such as
newline and tab. Each kind of terminal supports a different set of graphics
characters (if any), and even if two terminals support the same graphic (e.g.,
middle horizontal bar) they usually represent it by different sequences of
ascii characters. CBW handles these differences by using an explicit graphics
symbol map. The map converts an internal symbol identifier into a string of
ascii characters which cause the terminal to display the desired symbol.
The graphics map can be set by the user via a shell environment variable.
The map is build in three steps. CBW first looks in the termcap entry for this
terminal for the definitions that describes how to enter and exit the
terminal's standout mode (inverse video) and graphics mode. There must be a
definition for standout mode, but the graphics mode fields can be missing.
Next, a default graphics map is read from a compiled in string (see
terminal.h). The default map uses standout mode to represent symbols. For
example, the newline and tab characters are drawn by displaying an 'n' and a
't' in standout mode. A complete list of the defaults is given in table I-1.
-------------------------------------------------------------------------------
GRAPHIC USE
\St Tab characters.
\SX Non-Ascii character.
\Sn Newline character.
\Sr Return character.
\Sf Form feed character.
\SC Other control characters.
\S Plaintext character unknown (standout space).
\N^ Pseudo underline to highlight char above this one.
\N| Vertical line segment.
\N- Horizontal line segment.
\N` Lower-left corner of box.
Table I-1: Default graphic symbols
-------------------------------------------------------------------------------
The last step of building the graphics map is to read the shell variable
GRAPHICSMAP (if it is defined). This variable is formatted like a /etc/termcap
entry. It is a sequence of definitions separated by whitespace (spaces, tabs,
or newlines) or colons. Each definition has a label and a value. The label is
a string terminated by an '=' character. The label associated with each symbol
is given in table I-2. The value is a sequence of characters terminated by a
colon (or the end of the environment variable string). The first two
characters of the value string are special. The first character must be a
backslash ('\'). The second character specifies the mode that the terminal
should be in when the remainder of the value string is displayed. That
character must be one of 'N', 'S', or 'G' for normal, standout, or graphic
mode. The combined mode of standout plus graphics cannot be represented (this
could be fixed in terminal.c). The rest of the value characters can use the
standard backslash conventions (\n, \r, \t, \f, \\, and \177) with the addition
that \E will be translated into an escape character (\033).
-------------------------------------------------------------------------------
LABEL SYMBOL
tb Tab characters.
na Non-Ascii character.
lf Newline character.
cr Return character.
ff Form feed character.
cc Other control characters.
uk Plaintext character unknown.
ul Pseudo underline to highlight char above this one.
vb Vertical line segment (bar).
hb Horizontal line segment (bar).
ll Lower-left corner of box.
Table I-2: Symbol labels for graphic map
-------------------------------------------------------------------------------
The default map string and the h19 graphics map string are listed below to
illustrate the format.
Default Graphics Map String:
#define DGRAPHICS "tb=\\St:na=\\SX: lf=\\Sn:cr=\\Sr:\
ff=\\Sf:cc=\\SC:uk=\\S :::ul=\\N^:
hb=\\N-:vb=\\N|:ll=\\N`:"
H19 Graphics Map String:
setenv GRAPHICSMAP 'tb=\Gh:lf=\Gk:cr=\Gg:na=\Gi:\
ff=\G~:cc=\Gw:uk=\G^:ul=\Gz:\
hb=\G'\`':vb=\Ga:ll=\Ge'
To make up a new graphics map, you should first find out the range of
symbols that your terminal can display. Just put your terminal in graphics
mode (see the 'as' and 'ae' definitions in the termcap for your terminal) and
send it all the ascii character The file /projects/Ecrypt/graphics does this
for terminal that use \EF and \EG to enter and exit graphics mode (vt100 and
h19).
II. Key Map
CBW uses a keymap to convert arbitrary sequences of ascii characters into
keystroke commands. The information in the map comes from the shell
environment variable KEYMAP, the termcap entry for the user's terminal, and
from a compiled in string that specifies the default map (see terminal.h). The
shell variable overrides the information from the other sources.
The termcap entry for the terminal should provide all the information
needed by CBW. It should not be necessary to use the KEYMAP variable. The
termcap entry can come from the environment variable TERMCAP, or from the file
/etc/termcap based on the value of the variable TERM.
The format of the key map environment variable string is the same as the
format of the graphics map variable except that the first two character of the
value field are not treated specially. Each definition in the string binds a
keystroke command to some sequence of ascii characters. Note that the same
command can be bound to two or more sequences. The label portion of the
definition identifies the command and the value portion gives the character
sequence. Table II-1 lists the label to use for each keystroke command. Many
of the default bindings come the termcap entry. In those cases, the label for
the definition in the termcap entry is given. The command labels used in
KEYMAP are different from the terminal capability labels used in TERMCAP.
-------------------------------------------------------------------------------
LABEL COMMAND
up Move cursor up.
do Move cursor down.
le Move cursor left.
ri Move cursor right.
re Redraw screen.
un Undo last guess accepted.
cl Clear command line.
ws Word search (not fully implemented).
df Delete current character and move forward.
db Move left and delete that character.
pr Previous block or guess.
ne Next block or guess.
ac Accept guess.
ex Execute command line or insert newline.
ta Try all characters in this position.
jc Jump to command line.
Table II-1: Command labels for key map
-------------------------------------------------------------------------------
Several termcap definitions are required by CBW and several others are
used if they are present. CBW will print a message and exit if one of the
required definitions is missing. Table II-2 list all the definitions used by
CBW.
-------------------------------------------------------------------------------
LABEL USE
is Terminal initialization string.
ce Erase to end of line.
cd Erase to end of screen.
cl Erase whole screen.
cm Cursor motion.
as Start graphics mode.
ae End graphics mode.
so Start standout mode.
se End standout mode.
ks Start send keypad escapes.
ke End send keypad escapes.
k1 The f1 key.
k2 The f2 key.
k3 The f3 key.
k4 The f4 key.
ku Up arrow.
kd Down arrow.
kl Left arrow.
kr Right arrow.
Table II-2: TERMCAP definitions used by CBW
-------------------------------------------------------------------------------
III. Command Summary
KEYSTROKE COMMAND
C-L Redraw the whole screen. As a side effect this erases the list
of characters produced by C-T.
C-X, F4 Jump to command input line.
C-Q Quote. Insert the following character.
C-Z Abort the current command, restore the terminal modes, suspend
the program and return to the shell. If the program is
restarted, the terminal will be initialized for CBW and the
screen redrawn. The command that was active (if any) when the
C-Z was received is aborted. Actually this command is invoked
when CBW receives a SIGTSTP signal.
C-C Restore the terminal modes and kill the CBW process. Actually
this command is invoked when CBW receives a SIGINT signal.
C-D, DEL Delete character forward, delete backward. This forces a
permutation wiring to be removed. It can do unexpected things.
Consider a permutation with A wired to B and P wired to
Q. Adding a wire from B to P, then removing it with the DEL
command does not restore the original AB and PQ wirings.
C-G Undo. Backup the permutation to the state it had before the
last contiguous sequence of keystrokes. Arrow commands
terminate a sequence. In most cases, undo should be used
instead of DEL or C-D. The cursor must be in either the guess
block or the decryption block window.
C-T Try-all. Display a list of the ascii characters that could be
put in the current cursor position without conflicting with the
existing permutation wirings. The list is sorted with the most
likely character first (leftmost). The cursor must be in the
decryption block window.
F1, C-R Move to the previous block of 256 characters. Decryption block
window only.
F2, C-S Advance to next block of 256 characters. Decryption block
window only.
F2, C-S Goto next guess. Guess block window only.
F3, C-A Accept guess, merge it into the current decryption block. This
does not automatically advance to the next guess. Guess block
window only.
C-F, C-B Move forward and backward one character.
C-D, DEL Delete one character forward or backward.
F2, C-S Next-Argument. It searches for the first "%" character in the
command line, deletes it and leaves the cursor there. This is
used with command templates.
Space Within the first word on the command line, a space character
invokes command completion. A completed command string is a
template command with argument slots (marked by %) and
suggested argument values.
Return Execute command. The cursor can be anywhere within the command
line.
-------------------------------------------------------------------------------
USER COMMANDS
quit-program permanently
auto-trigram max_dev: % min_total_chars: % min_wire_chars: %
knitting using blocks from: % to: % Min show count: %
lookup-pattern: % in dictionary
equivalence-class guess, use accept level: % (try 2.0)
pwords-from file: % Max dev: % (try 1.0)
load-permutations
save-permutations
clear-zee permutation
propagate-info from: % to: % using Zee
bigram-guess level: % (2.0), min_prob: % (0.15)
IV. Tutorial
The following is a step by step sequence of commands to compile and
exercise the Crypt Breakers Workbench. It demonstrates how easily crypt files
can be broken.
1. Edit stats.slice to set the name of the directory that contains the
statistics files. Statistics for scribe documents are included with
the source files.
The stats.slice file also defines the location of the dictionary
used by the lookup-pattern command. The default dictionary is
/usr/dict/words. The dictionary is just a list of words, one per
line. Case does not matter.
2. Execute 'source stats.slice' to initialize the necessary shell
variables.
3. If there is a .slice file for your terminal type (e.g., vt100.slice,
or h19.slice), execute source on that file. This initializes the
graphics map and key map.
4. Print out cbw.doc, so you can read it after you have decided that
you can't figure out how the program works.
5. Copy test3.perm and .cipher to foo.perm and foo.cipher. The .txt
files contain the original plaintext.
6. Execute 'cbw foo'.
7. The cursor will on the command line. Use the arrow keys (or C-P,
C-N, C-F, C-B) to move the cursor to the upper lefthand position of
the decryption window. Try typing '@Device[Dover]'. Notice that
most of the characters you type deduced other characters in the
block.
8. The 'D' in 'Dover' is wrong. Using the arrow keys, position the
cursor over the 'D' and type 'd'.
9. Advance to the position after the ']' and type C-T. A list of
possible characters for this position will be displayed. The list
is sorted with the most likely character on the left. Notice that
many characters are not possible because they would deduce non-ascii
characters elsewhere in the block, or they would conflict with
previously accepted guesses.
Try guessing tab, comma and linefeed for the character after the
']'. Use C-G to undo each guess. Delete and C-D do not restore the
old state, they just erase the wiring that was deduced by a
particular character.
10. Move the cursor down to the command line. You can use emacs cursor
characters (C-N, C-P, C-F, C-B) or the arrow keys. Unfortunately,
C-U does not work as in emacs. The C-X key or F4 will jump directly
to the command line.
11. Type 'pw '. The space will cause command completion for the
probable-word guessing command. Type F2 (or C-S) to advance to the
first argument, and enter the file name 'mss.words'. That file
contains a list of keywords used by the Scribe (Finalword) text
formatter. Press F2 to advance to the second argument, which
specifies a cut-off level for automatically accepting guesses. The
level is the maximum number of standard deviations that the scoring
function can be away from its expected value. Enter 1.2, and press
return to invoke the command.
12. A partially filled in block will appear in the guessing window. To
accept the result of this command, press F3 (or C-A).
13. Try the pword guess command again with a level of 3. To do this,
just move to the command line, change the 1.2 to a 3, and press
return. Again F3 accepts the guess. If some guesses look wrong
(such as the 'F' on the second line under the '[Article]'), you can
correct them using the editor in the decryption block window.
14. Advance to block 1 of the file by moving the cursor to the
decryption window and pressing F2 (or C-S). F1 (or C-R) moves back
one block, F2 moves ahead one block.
15. The second block is likely to be plain english with few scribe
commands. Move to the command window, type C-U to erase the line
and type 'bi ' to setup the bigram guessing command. Try an
acceptance level of 1.0 and a minimum probability of 0.6. Type
return to invoke the command.
16. After a short wait (~15 seconds), a partial block will appear.
Accept the guess with the F3 key in the guessing window.
17. Try looking up a pattern in the dictionary. In the command window
type 'look ', use F2 to advance to the pattern, and type in the
pattern '....llit.', and press return. This will match against the
word 'satellite' if it is in you site's dictionary.
18. One could go on like this, but let's skip ahead by loading in a
previously saved state. Invoke the load command (it loads the file
foo.perm, just as save dumps to foo.perm (making this command take a
filename is a good implementation exercise)). Type C-U, 'load ',
and return. Notice that all the work so far is replaced by the
information in the .perm file. This can be considered a feature.
19. Use the F1 and F2 keys in the decryption window to view the blocks
that have been solved. Notice that a fully solved block does not
specify all the wirings of the block key (permutation). Typically
only 105 of the 128 wires are used.
20. Lets try deducing the inter-block relationship (zee). Execute the
clear-zee command.
21. Execute knit blocks 0 through 3 with a min show count of 20. This
means that the program should only show you guesses that deduce 20
or more wirings of the zee permutation. Press return to invoke this
guessing strategy. The cursor will move to the guessing window.
Press F2 to start the first round of guessing.
The running time of the knit command is exponential in the number of
blocks knitted, and it will run for a very long time if any of the
blocks are decrypted incorrectly. This means that it is better to
leave a position blank than to guess at it.
22. The program moves to block 4 and shows you the characters in block 4
that would be deduced from block 3 given the deduced wirings of zee.
If these look reasonable, press F3 to accept the guess, and press F2
to try the next guess. To reject a guess, press F2 without pressing
F3.
23. Now that part of zee is known, try propagating the settings of block
1 into block 0 using zee. The propagate command will do this. Also
try propagating blocks 2, 3, and 4 to zero.
Notice that the number of known wires can increase without deducing
additional characters in the block.
24. There should be a command that propagates information between groups
of blocks, but for now you must do this one block at a time. After
propagating block 1 - 4 to block zero, and block zero to blocks 1 -
4, et cetera, try the knit command again.
25. Propagate block 4 to 5 to provide a framework to evaluate new
guesses.
26. Knit block 0 through 4 with a minimum show level of 2. You may want
to skip accepting guesses that do not deduce any characters. Repeat
this process with a show level of 1.
27. The program should now know all 256 wirings of zee. Repeat the
process of propagating information between blocks until all 128
wires of the block zero are known.
28. Save your work with the save-permutations command (on the command
line type 'sa ' and press return). This writes the file foo.perm.
29. Exit the program with the quit command.
30. Copy foo.perm to zeecode.perm.
31. Execute 'zeecode < foo.cipher | more '. This program reads CBW's
save file to find the permutation for block zero and the Zee
permutation that describes how the remaining block keys are