We can now encipher our specimen text as follows:
Variants of this cipher were used by the British army as a field cipher in World War I, and by the Americans and Germans in World War II. It's a substantial improvement on Vigenère as the statistics that an analyst can collect are of digraphs (letter pairs) rather than single letters, so the distribution is much flatter and more ciphertext is needed for an attack.
Again, it's not enough for the output of a block cipher to just look intuitively “random”. Playfair ciphertexts look random; but they have the property that if you change a single letter of a plaintext pair, then often only a single letter of the ciphertext will change. Thus using the key in Figure 5.7, rd
enciphers to TB
while rf
enciphers to OB
and rg
enciphers to NB
. One consequence is that given enough ciphertext, or a few probable words, the table (or an equivalent one) can be reconstructed [740]. In fact, the quote at the head of this chapter is a Playfair-encrypted message sent by the future President Jack Kennedy when he was a young lieutenant holed up on a small island with ten other survivors after his motor torpedo boat had been sunk in a collision with a Japanese destroyer. Had the Japanese intercepted it, they might possibly have decrypted it, and history could be different. For a stronger cipher, we will want the effects of small changes in the cipher's input to diffuse completely through its output. Changing one input bit should, on average, cause half of the output bits to change. We'll tighten these ideas up in the next section.
The security of a block cipher can also be greatly improved by choosing a longer block length than two characters. For example, the Data Encryption Standard (DES), which is widely used in payment systems, has a block length of 64 bits and the Advanced Encryption Standard (AES), which has replaced it in most other applications, has a block length of twice this. I discuss the internal details of DES and AES below; for the time being, I'll just remark that we need more than just an adequate block size.
For example, if a bank account number always appears at the same place in a transaction, then it's likely to produce the same ciphertext every time a transaction involving it is encrypted with the same key. This might allow an opponent to cut and paste parts of two different ciphertexts in order to produce a valid but unauthorised transaction. Suppose a crook worked for a bank's phone company, and monitored an enciphered transaction that he knew said “Pay IBM $10,000,000”. He might wire $1,000 to his brother causing the bank computer to insert another transaction saying “Pay John Smith $1,000”, intercept this instruction, and make up a false instruction from the two ciphertexts that decrypted as “Pay John Smith $10,000,000”. So unless the cipher block is as large as the message, the ciphertext will contain more than one block and we'll need some way of binding the blocks together.
5.2.4 Hash functions
The third classical type of cipher is the hash function. This evolved to protect the integrity and authenticity of messages, where we don't want someone to be able to manipulate the ciphertext in such a way as to cause a predictable change in the plaintext.
After the invention of the telegraph in the mid-19th century, banks rapidly became its main users and developed systems for transferring money electronically. What's ‘wired’ is a payment instruction, such as:
‘To Lombard Bank, London. Please pay from our account with you no. 1234567890 the sum of £1000 to John Smith of 456 Chesterton Road, who has an account with HSBC Bank Cambridge no. 301234 4567890123, and notify him that this was for “wedding present from Doreen Smith”. From First Cowboy Bank of Santa Barbara, CA, USA. Charges to be paid by us.’
Since telegraph messages were relayed from one office to another by human operators, it was possible for an operator to manipulate a payment message.
In the nineteenth century, banks, telegraph companies and shipping companies developed code books that could not only protect transactions but also shorten them – which was important given the costs of international telegrams at the time. A code book was essentially a block cipher that mapped words or phrases to fixed-length groups of letters or numbers. So “Please pay from our account with you no.” might become ‘AFVCT’. Sometimes the codes were also enciphered.
The banks realised that neither stream ciphers nor code books protect message authenticity. If, for example, the codeword for ‘1000’ is ‘mauve’ and for ‘1,000,000’ is ‘magenta’, then the crooked telegraph clerk who can compare the coded traffic with known transactions should be able to figure this out and substitute one for the other.
The critical innovation, for the banks’ purposes, was to use a code book but to make the coding one-way by adding the code groups together into a number called a test key. (Modern cryptographers would describe it as a hash value or message authentication code, terms I'll define more carefully later.)
Here is a simple example. Suppose the bank has a code book with a table of numbers corresponding to payment amounts as in Figure 5.8.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
x 1000 | 14 | 22 | 40 | 87 | 69 | 93 | 71 | 35 | 06 | 58 |
x 10,000 | 73 | 38 | 15 | 46 | 91 | 82 | 00 | 29 | 64 | 57 |
x 100,000 | 95 | 70 | 09 | 54 | 82 | 63 | 21 | 47 | 36 | 18 |
x 1,000,000 | 53 | 77 | 66 | 29 | 40 | 12 | 31 | 05 | 87 | 94 |
Figure 5.8: A simple test key system
Now in order to authenticate a transaction for £376,514 we might add together 53 (no millions), 54 (300,000), 29 (70,000) and 71 (6,000) ignoring the less significant digits. This gives us a test key of 207.
Most real systems were more complex than this; they usually had tables for currency codes, dates and even recipient account numbers. In the better systems, the code groups were four digits long rather than two, and in order to make it harder