Therefore the cipher is substitution; and if substitution, then simple substitution, since double substitution (for reasons that will be given later) would very likely have more than twenty letters represented, and would show no such violent variations in frequency as the drop from 13 P’s to a single M, H, and V.
Having cleared the track by identifying the cipher according to type, the cryptographer now turns to his table of letter frequencies (Table I). Here he finds that E is the most frequent letter in English, with T next. It seems likely, therefore, that P is E and S is T. In a message as short as this the order of the first two may be reversed; but it will be noted that the P’s are scattered fairly evenly through the message while the S’s tend to bunch, which would indicate the correctness of the P = E solution. Consonants group themselves; vowels invariably scatter. The message is accordingly rewritten with the provisional values P = E and S = T in the proper places below:
This seems very reasonable; it contains no impossible linguistic combinations and the spacing of the T’s and E’s appears what it should be in normal text. Following T and E the next letters in the alphabet in order of frequency are A, O, N, R, I and S. In the message under consideration this corresponds very well with the high frequencies of the letters K, Q, R, Z, C, and E; but both in message and in frequency table these six letters are so closely grouped that it would be very difficult to tell which was which without extensive experiment along trial and error lines.
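The tally on which this reasoning rests is purely mechanical, and a minimal sketch in Python may make it concrete. The function name and the short sample string are hypothetical, chosen only for illustration; the sample is not the message under discussion.

```python
from collections import Counter


def letter_counts(ciphertext):
    """Return (letter, count) pairs, most frequent first.

    Ties are broken alphabetically so the order is reproducible.
    Spaces, digits and punctuation are ignored.
    """
    counts = Counter(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical sample, far too short for the frequencies to mean anything.
    print(letter_counts("SZPQP SZP"))
    # [('P', 3), ('S', 2), ('Z', 2), ('Q', 1)]
```

Such a ranking can only suggest the first one or two values; as the text observes, the letters that follow are too closely grouped to be assigned by rank alone.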
The cryptographer therefore takes a short cut by consulting the table of trigrams, or three-letter groups (Table XII). These show that THE is overwhelmingly the most frequent three-letter combination in the language; and further, that it is very rare to find any letter but H standing between T and E. In the present message the combination T-blank-E occurs twice, in groups 1 and 5. In both cases the blank is represented by the same letter of the cipher (Z), and on frequency Z can well represent H.
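The search for T-blank-E can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reconstruction of the cryptographer's working: the function name and the sample string are hypothetical, and the spacing between groups is ignored because the groups do not mark word divisions.

```python
from collections import Counter


def middle_letters(ciphertext, first, last):
    """Tally the cipher letters found between `first` and `last`.

    With first="S" (provisionally T) and last="P" (provisionally E),
    this counts every candidate for the middle letter of THE.
    """
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    tally = Counter()
    # Slide a three-letter window along the message.
    for a, b, c in zip(letters, letters[1:], letters[2:]):
        if a == first and c == last:
            tally[b] += 1
    return tally


if __name__ == "__main__":
    # Hypothetical sample in which S-blank-P occurs twice.
    print(dict(middle_letters("SZPQP ASZPB", "S", "P")))
    # {'Z': 2}
```

A single cipher letter filling every such blank, as Z does here, is strong evidence for H.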
But if Z = H, then Q is probably R or S; for with the insertion of the H, group 1 reads THE-blank-E, which is a strong possibility for THERE or THESE. Q, which fills the blank, could be, on frequency, either R or S. However, in group 10 the combination ZPQ occurs and the ZP has been solved as HE; and the same combination is repeated in groups 19–20. Reference to the trigram table shows that HER is one of the most common in the language, while HES is relatively rare. The balance of the probabilities thus favors the hypothesis that:
Q = R
and it is accordingly filled in that way.
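The filling in of provisional values can be sketched as follows. The function name is hypothetical. The cipher group SZPQP is not quoted from the message but inferred from the equations above, which require group 1 to read so if it deciphers as THERE.

```python
def apply_partial_key(ciphertext, key):
    """Replace every solved cipher letter with its plain value.

    Unsolved letters are shown as "-" (the text's "blank").
    Spaces and other non-letters are passed through unchanged.
    """
    out = []
    for ch in ciphertext.upper():
        if "A" <= ch <= "Z":
            out.append(key.get(ch, "-"))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    group_1 = "SZPQP"  # inferred from the equations, see the note above
    print(apply_partial_key(group_1, {"P": "E", "S": "T"}))
    # T-E-E
    print(apply_partial_key(group_1, {"P": "E", "S": "T", "Z": "H"}))
    # THE-E
    print(apply_partial_key(group_1, {"P": "E", "S": "T", "Z": "H", "Q": "R"}))
    # THERE
```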
In groups 17–18 occurs the combination SESTSE. This has been partially solved to read T-blank-T-blank-T-blank, which, with the repeated E’s, constitutes a pattern word. The cryptographer therefore looks at his table of pattern-words (Table XI) and discovers that this pattern usually means TITUTI or TETATE. Since P = E in this cipher, the pattern must represent the first of these two combinations, which yields the values:
E = I; T = U
both of which check very well by the frequency table for the message. These values are now filled in, and it becomes evident that the cipher is near complete solution.
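The pattern-word test just applied can be sketched in Python. This is a minimal illustration, with hypothetical function names, of two checks: that the cipher group and the candidate word repeat their letters in the same positions, and that the candidate does not contradict values already fixed.

```python
def pattern(word):
    """Number each letter by order of first appearance.

    SESTSE gives (0, 1, 0, 2, 0, 1), and so do TITUTI and TETATE.
    """
    seen = {}
    return tuple(seen.setdefault(ch, len(seen)) for ch in word)


def consistent(cipher_word, plain_word, known):
    """True if plain_word can stand for cipher_word given the known values.

    `known` maps solved cipher letters to plain letters. In a simple
    substitution no plain letter may be claimed by two cipher letters.
    """
    if pattern(cipher_word) != pattern(plain_word):
        return False
    taken = set(known.values())
    for c, p in zip(cipher_word, plain_word):
        if c in known:
            if known[c] != p:
                return False
        elif p in taken:
            return False  # plain letter already assigned elsewhere
    return True


if __name__ == "__main__":
    known = {"P": "E", "S": "T", "Z": "H", "Q": "R"}
    print(consistent("SESTSE", "TITUTI", known))  # True
    print(consistent("SESTSE", "TETATE", known))  # False: plain E is already P
```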
Of the little group of letters that showed high frequencies in the message there now remain unsolved R, C, K and J; of high-frequency letters for which no values in the message have been found there remain A, O, N and S. Two of these drop into place with the acceptance of the TITUTI combination, which can hardly end in anything but ON, yielding the equations:
K = O; J = N
If this be correct, groups 1–2 now read THERE I-NOR, or, dividing it into words along the obvious lines, THERE I- NO R, which makes it apparent that:
R = S
This leaves only one letter in the high-frequency group (A) to be accounted for and only one letter of high frequency in the message (C); and unless there be some strong reason to the contrary the cryptographer can assume that:
C = A
Once more filling in, with the obvious word divisions indicated, the following result is obtained:
It is apparent how nearly this finishes the task. Obviously nothing will do at the end of group 11 but the letters L and D, to complete the word SHOULD, which gives the equations:
F = L; M = D
Similarly, replacing the G of the message in group 6 with M yields a satisfactory result, and the U’s in groups 4 and 14 work out nicely as V’s. LON-blank in groups 13–14 now becomes clear as LONG, and H = B is required in group 17. The remainder can now be filled in:
V = W; X = Y; L = P; I = C.
The message is solved and the cryptographer now draws up his table of equivalents:
The key-word was evidently “chimpanzee” with the final E dropped off (repetitions are not permissible) and it is now possible to fill in the whole table and wait for the appearance of the next message written with the same key.
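The way a key-word fills out the whole table can be sketched in Python. The sketch assumes the usual arrangement: the key-word, with repeated letters dropped, is written under the normal alphabet, and the unused letters follow in their normal order. The function names are hypothetical. Under that assumption the key-word reproduces every equation found above.

```python
import string


def keyed_alphabet(keyword):
    """Key-word letters first (repeats dropped), then the unused letters."""
    seen = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if "A" <= ch <= "Z" and ch not in seen:
            seen.append(ch)
    return "".join(seen)


def decipher_table(keyword):
    """Map each cipher letter to the plain letter it stands for."""
    return dict(zip(keyed_alphabet(keyword), string.ascii_uppercase))


if __name__ == "__main__":
    print(keyed_alphabet("chimpanzee"))
    # CHIMPANZEBDFGJKLOQRSTUVWXY
    table = decipher_table("chimpanzee")
    # Equations recovered in the text, written as cipher -> plain.
    solved = {"P": "E", "S": "T", "Z": "H", "Q": "R", "E": "I", "T": "U",
              "K": "O", "J": "N", "R": "S", "C": "A", "F": "L", "M": "D",
              "G": "M", "U": "V", "H": "B", "V": "W", "X": "Y", "L": "P",
              "I": "C"}
    print(all(table[c] == p for c, p in solved.items()))
    # True
```

The same table also supplies the values the message never called for, which is what allows the next message in the same key to be read at sight.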
V
This is the basic method in all decipherments of substitution ciphers. Admittedly the example shows the process at its shortest and simplest. In normal English text the alphabetic frequency tables are unreliable until two hundred or more letters have been reached, whether in one message or in two or three written with the same key. Still, with the backing of the bigram and trigram tables any cryptographer can dismember any simple substitution cipher in a few minutes and with a minimum of trials. The fact was evidently widely known when Sicco Simonetta wrote that first book on ciphers, for he included alphabetic frequency tables in it.
At the same time the earliest codes were also being found wanting. In the sense that they are mentioned earlier, they antedate ciphers; and, like ciphers, appear to have grown out of the then new custom of keeping at foreign courts resident ambassadors who found it necessary to send reports home and ask for instructions. Venice and the Papal Curia were the first powers to use resident ambassadors; and, though the latter may well have used means of secret communication before the republic of the lagoons, the oldest reference to any code system is in an instruction to the Venetian ambassador at the Court of Austria. In his dispatches he is ordered to refer to the Doge as “V,” the King of Hungary as “P” and the Pope as “Q.”
The context alone would apparently betray the secrets of such messages if they were intercepted, an observation with which the Venetians evidently agreed; for only twelve years after this first crude code, another Venetian instruction orders the city’s diplomats to refer to important personages by periphrasis—that is, to speak of Austria as the “Sun,” since the sun rises in the east; the east = Ost = Österreich; and to replace all the verbs in their dispatches with meaningless words according to a regular system outlined in a little code dictionary they were given when setting out on a mission.
Then comes another of those gaps in cipher history, followed by the appearance of the Trattati in Cifra of G. di Lavinde in 1480. It is quite a remarkable book, showing cryptography already in a state of considerable development, for he recommends a method of decipherment by attacking the vowels, which is still classic for the Latin languages, and a method of defeating this decipherment, by first