An example of frequency analysis. No need to write this, there's The Gold Bug by E.A.Poe. It's even in the public domain. Yay!
[[[name:goldbug section:20
Here Legrand, having re-heated the parchment, submitted It my inspection. The following characters were rudely traced, in a red tint, between the death's-head and the goat:
53++!305))6*;4826)4+.)4+);806*;48!8`60))85;]8*:+*8!83(88)5*!;
46(;88*96*?;8)*+(;485);5*!2:*+(;4956*2(5*-4)8`8*; 4069285);)6
!8)4++;1(+9;48081;8:8+1;48!85;4)485!528806*81(+9;48;(88;4(+?3
4;48)4+;161;:188;+?;
"But," said I, returning him the slip, "I am as much in the dark as ever. Were all the jewels of Golconda awaiting me on my solution of this enigma, I am quite sure that I should be unable to earn them."
"And yet," said Legrand, "the solution is by no means so difficult as you might be led to imagine from the first hasty inspection of the characters. These characters, as any one might readily guess, form a cipher --that is to say, they convey a meaning; but then, from what is known of Kidd, I could not suppose him capable of constructing any of the more abstruse cryptographs. I made up my mind, at once, that this was of a simple species --such, however, as would appear, to the crude intellect of the sailor, absolutely insoluble without the key."
"And you really solved it?"
"Readily; I have solved others of an abstruseness ten thousand times greater. Circumstances, and a certain bias of mind, have led me to take interest in such riddles, and it may well be doubted whether human ingenuity can construct an enigma of the kind which human ingenuity may not, by proper application, resolve. In fact, having once established connected and legible characters, I scarcely gave a thought to the mere difficulty of developing their import.
"In the present case --indeed in all cases of secret writing --the first question regards the language of the cipher; for the principles of solution, so far, especially, as the more simple ciphers are concerned, depend on, and are varied by, the genius of the particular idiom. In general, there is no alternative but experiment (directed by probabilities) of every tongue known to him who attempts the solution, until the true one be attained. But, with the cipher now before us, all difficulty is removed by the signature. The pun on the word 'Kidd' is appreciable in no other language than the English. But for this consideration I should have begun my attempts with the Spanish and French, as the tongues in which a secret of this kind would most naturally have been written by a pirate of the Spanish main. As it was, I assumed the cryptograph to be English.
"You observe there are no divisions between the words. Had there been divisions, the task would have been comparatively easy. In such case I should have commenced with a collation and analysis of the shorter words, and, had a word of a single letter occurred, as is most likely, (a or I, for example,) I should have considered the solution as assured. But, there being no division, my first step was to ascertain the predominant letters, as well as the least frequent. Counting all, I constructed a table, thus:
Of the character 8 there are 33.
" ; " 26.
" 4 " 19.
" + " 16.
" ) " 16.
" * " 13.
" 5 " 12.
" 6 " 11.
" ! " 8.
" 1 " 8.
" 0 " 6.
" 9 " 5.
" 2 " 5.
" : " 4.
" 3 " 4.
" ? " 3.
" ` " 2.
" - " 1.
" . " 1.
"Now, in English, the letter which most frequently occurs is e. Afterwards, the succession runs thus: a o i d h n r s t u y c f g l m w b k p q x z. E however predominates so remarkably that an individual sentence of any length is rarely seen, in which it is not the prevailing character.
"Here, then, we have, in the very beginning, the groundwork for something more than a mere guess. The general use which may be made of the table is obvious --but, in this particular cipher, we shall only very partially require its aid. As our predominant character is 8, we will commence by assuming it as the e of the natural alphabet. To verify the supposition, let us observe if the 8 be seen often in couples --for e is doubled with great frequency in English --in such words, for example, as 'meet,' 'fleet,' 'speed, 'seen,' 'been,' 'agree,' &c. In the present instance we see it doubled less than five times, although the cryptograph is brief.
"Let us assume 8, then, as e. Now, of all words in the language, 'the' is the most usual; let us see, therefore, whether they are not repetitions of any three characters in the same order of collocation, the last of them being 8. If we discover repetitions of such letters, so arranged, they will most probably represent the word 'the.' On inspection, we find no less than seven such arrangements, the characters being ;48. We may, therefore, assume that the semicolon represents t, that 4 represents h, and that 8 represents e --the last being now well confirmed. Thus a great step has been taken.
"But, having established a single word, we are enabled to establish a vastly important point; that is to say, several commencements and terminations of other words. Let us refer, for example, to the last instance but one, in which the combination ;48 occurs --not far from the end of the cipher. We know that the semicolon immediately ensuing is the commencement of a word, and, of the six characters succeeding this 'the,' we are cognizant of no less than five. Let us set these characters down, thus, by the letters we know them to represent, leaving a space for the unknown--
t eeth.
"Here we are enabled, at once, to discard the 'th,' as forming no portion of the word commencing with the first t; since, by experiment of the entire alphabet for a letter adapted to the vacancy we perceive that no word can be formed of which this th can be a part. We are thus narrowed into
t ee,
and, going through the alphabet, if necessary, as before, we arrive at the word 'tree,' as the sole possible reading. We thus gain another letter, r, represented by (, with the words 'the tree' in juxtaposition.
"Looking beyond these words, for a short distance, we again see the combination ;48, and employ it by way of termination to what immediately precedes. We have thus this arrangement:
the tree ;4(+?34 the,
or substituting the natural letters, where known, it reads thus:
the tree thr+?3h the.
"Now, if, in place of the unknown characters, we leave blank spaces, or substitute dots, we read thus:
the tree thr...h the,
when the word 'through' makes itself evident at once. But this discovery gives us three new letters, o, u and g, represented by + ? and 3.
"Looking now, narrowly, through the cipher for combinations of known characters, we find, not very far from the beginning, this arrangement, 83(88, or egree, which, plainly, is the conclusion of the word 'degree,' and gives us another letter, d, represented by !.
"Four letters beyond the word 'degree,' we perceive the combination
46(;88*.
"Translating the known characters, and representing the unknown by dots, as before, we read thus:
th.rtee.
an arrangement immediately suggestive of the word 'thirteen,' and again furnishing us with two new characters, i and n, represented by 6 and *.
"Referring, now, to the beginning of the cryptograph, we find the combination,
53++!.
"Translating, as before, we obtain
.good,
which assures us that the first letter is A, and that the first two words are 'A good.'
"To avoid confusion, it is now time that we arrange our key, as far as discovered, in a tabular form. It will stand thus:
5 represents a
! " d
8 " e
3 " g
4 " h
6 " i
* " n
+ " o
( " r
; " t
"We have, therefore, no less than ten of the most important letters represented, and it will be unnecessary to proceed with the details of the solution. I have said enough to convince you that ciphers of this nature are readily soluble, and to give you some insight into the rationale of their development. But be assured that the specimen before us appertains to the very simplest species of cryptograph. It now only remains to give you the full translation of the characters upon the parchment, as unriddled. Here it is:
'A good glass in the bishop's hostel in the devil's seat twenty-one degrees and thirteen minutes northeast and by north main branch seventh limb east side shoot from the left eye of the death's-head a bee line from the tree through the shot fifty feet out.'"
"But," said I, "the enigma seems still in as bad a condition as ever. How is it possible to extort a meaning from all this jargon about 'devil's seats,' 'death's-heads,' and 'bishop's hostel'?"
"I confess," replied Legrand, "that the matter still wears a serious aspect, when regarded with a casual glance. My first endeavor was to divide the sentence into the natural division intended by the cryptographist."
"You mean, to punctuate it?"
"Something of that kind."
"But how was it possible to effect this?"
"I reflected that it had been a point with the writer to run his words together without division, so as to increase the difficulty of solution. Now, a not overacute man, in pursuing such an object, would be nearly certain to overdo the matter. When, in the course of his composition, he arrived at a break in his subject which would naturally require a pause, or a point, he would be exceedingly apt to run his characters, at this place, more than usually close together. If you will observe the MS., in the present instance, you will easily detect five such cases of unusual crowding. Acting on this hint, I made the division thus:
'A good glass in the bishop's hostel in the devil's --twenty-one degrees and thirteen minutes --northeast and by north --main branch seventh limb east side --shoot from the left eye of the death's-head --a bee-line from the tree through the shot fifty feet out.'"
"Even this division," said I, "leaves me still in the dark."
]]]
Frequncy of letters in English:
ETAOINSHRDLCUMWFGYPBVKJXQZ
The most common digrams in English:
TH HE IN ER AN RE ED ON ES ST
The most common trigrams in English:
THE ING AND HER ERE ENT THA NTH WAS ETH
It's worth noting that Morse code was designed to be as compact as possible. Terefore, the most common letters have shortest Morse representations:
E .
T -
A .-
O ---
I ..
N -.
S ...
H ....
R .-.
D -..
L .-..
C -.-.
U ..-
M --
W .--
F ..-.
G --.
Y -.--
P .--.
B -...
V ...-
K -.-
J .---
X -..-
Q --.-
Z --..