IM-QPP: REF IRON MOUTH ARTICLE

VIGENERE CIPHERS:

WHAT THEY ARE AND HOW TO SOLVE THEM

Iron Mouth

The Vigenere Cipher

The Vigenere is a well-known, rather old method of encipherment (some say dating from the 16th century). The Vigenere is a polyalphabetic cipher, meaning that it uses more than one cipher alphabet, rotated in regular patterns according to a keyword. Recall that, in the simple substitution cipher, one uses a single alphabet for the entire encipherment process, substituting one and only one cipher letter each time a given letter in plaintext appears. Because it is a feature of many languages that letter combinations repeat themselves with greater or lesser degrees of frequency, the simple substitution cipher is rather easy to break, except in the case of very short messages and/or those in which heroic paroxysms of text manipulation are performed either before or after the text is enciphered (Q makes numerous tantalizing references to these latter operations throughout the Quiller corpus, calling them "transfers," "dupes," "nulls," and so on).

The theory behind the polyalphabetic cipher is that the encipherer gains a twofold advantage, in terms of security, over the simple substitution cipher. First, by changing substitution alphabets regularly, one reduces the likelihood that a repeated cipher sequence can be identified solely by casual observation of the ciphered text. In other words, the common English language digram "ER," which might be enciphered each and every time in a simple substitution cipher as "AR," in a polyalphabetic (such as the Vigenere), might appear once as "FS," at another point as "LY," at another as "SF," and so on, depending upon which of the simple substitution alphabets is used to encipher each occurrence of "ER." Second, because each plaintext letter is represented by several different cipher letters, a simple frequency count of the whole message no longer gives the common letters away. To clarify these points, let us turn to an example of encipherment using the Vigenere.

Enciphering Using the Vigenere Table

The chief disadvantage of the Vigenère as a field cipher is immediately seen in the Vigenere Table (Figure 1) which is generally employed both for encipherment and decipherment. (Note: for your convenience, I've provided a more abbreviated version of the Table which is more suitable for printing and use in decrypting the Vigenere cryptogram for this month.) Although it is not essential to use the Table to encipher (as will be evident, for any given cipher, seldom is more than a small portion of the whole Table used), other methods of encipherment (and accompanying decipherment) are perhaps even more awkward, necessitating extensive writing and the accompanying risk of an opposition party discovering the method likely to have been used by intelligence agents to encode messages. Quiller's frequently expressed contempt for agents who court danger by carrying the key to a cipher on them through a mission is not at all misplaced.

It is apparent that the Vigenere Table is in fact composed of twenty-six alphabets which can be read either vertically or horizontally, each of which begins with a different letter of the alphabet and runs through the letter "Z," beginning again with the letter "A" and continuing until all twenty-six letters have been used. For each set of vertical or horizontal lines, there will be an "A" alphabet, a "B" alphabet, a "C" alphabet, and so on through the "Z" alphabet.
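For readers who would rather generate the Table than print it, here is a minimal sketch in Python. The names ALPHABET and build_table are my own, chosen for illustration; the only assumption is the standard 26-letter English alphabet, with each row shifted one place further than the row above it.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def build_table():
    """Return the 26 key alphabets, indexed by the letter each one begins with."""
    return {key: ALPHABET[i:] + ALPHABET[:i] for i, key in enumerate(ALPHABET)}

table = build_table()

# Print only the rows a keyword such as SHADOW would actually use.
for key in "SHADOW":
    print(key, table[key])

# Reading the Table: row "S", column "E" gives the cipher letter.
print(table["S"][ALPHABET.index("E")])
```

Printing only the rows for the keyword letters shows why seldom more than a small portion of the whole Table is needed for any given cipher.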

To encipher a plaintext message using the Vigenere Table, one begins by choosing a keyword long enough to ensure a suitable number of substitution alphabets to render the cipher more difficult to break (usually a keyword of from four to ten letters is used). Unlike the keyword for the Playfair, it is not necessary for the Vigenere keyword to be devoid of repeated letters, although of course the more alphabets one uses, the greater the security, and repeated letters mean that one uses repeated alphabets, thereby enhancing the probability of detection of repeated sequences. For purposes of illustration let us choose the keyword, "SHADOW."

In the traditional method of encipherment, the first step involves writing the keyword over and over again above the plaintext. Let us say that our plaintext message is Loman's pedantic assessment of what will happen to the U.K. should the Tango Victor wreck be discovered before it can be destroyed: "The image of the U.K. will receive a certain degree of damage. Regrettably, a criminal element has been manufacturing and selling a deadly chemical warfare material. Nothing more. We shall hope to avoid the disastrous outcome of much more serious revelations." [TTB, 18:179] The first portion of the plaintext would be prepared as in Figure 2.

To encipher the first letter of the plaintext, and every sixth letter thereafter (the seventh, the thirteenth, and so on), go to the Vigenere Table (Figure 1) and locate the horizontal line beginning with the letter "S" (called the "key alphabet S"), and follow it over until you find the appropriate letter of the plaintext (at the top of the Table). The letter you find at the intersection of the key row and the plaintext column is the letter to be placed in the ciphered message. Enciphering with the "S" key-alphabet, we obtain the results in Figure 3.

Next, one enciphers the second letter, and every sixth letter thereafter (the eighth, the fourteenth, and so on), according to the "H" alphabet (Figure 4), and so on, until all six key alphabets are used to encipher all of the plaintext (Figure 5).
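The Table lookup is equivalent to adding letter positions modulo 26, so the whole procedure can be sketched in a few lines of Python. This is an illustration rather than a definitive implementation: the function names are mine, and I assume that spaces and punctuation are simply dropped before encipherment.

```python
import string

ALPHABET = string.ascii_uppercase

def encipher(plaintext, keyword):
    """Encipher the letters of plaintext, repeating the keyword above them."""
    letters = [c for c in plaintext.upper() if c in ALPHABET]
    out = []
    for i, c in enumerate(letters):
        key = keyword[i % len(keyword)]  # keyword written over and over
        out.append(ALPHABET[(ALPHABET.index(c) + ALPHABET.index(key)) % 26])
    return "".join(out)

def decipher(ciphertext, keyword):
    """Reverse the process by subtracting the key letter instead of adding it."""
    out = []
    for i, c in enumerate(ciphertext):
        key = keyword[i % len(keyword)]
        out.append(ALPHABET[(ALPHABET.index(c) - ALPHABET.index(key)) % 26])
    return "".join(out)

cipher = encipher("The image", "SHADOW")
print(cipher)                      # LOELAWYL
print(decipher(cipher, "SHADOW"))  # THEIMAGE
```

The first eight letters, "THEIMAGE," are enciphered with the alphabets S, H, A, D, O, W, S, H in turn, which is exactly what writing the keyword repeatedly above the plaintext accomplishes.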

It will be evident that the advantage of increased security over simple substitutions, said to characterize the periodic cipher, certainly holds true here. In the message above, notice that the plaintext letter "E" is variously enciphered as "E" (in the "A" alphabet), "L" (in the "H" alphabet), "W" (in the "S" alphabet), "A" (in the "W" alphabet), and so on. Thus, although the cipher letter for "E" in the simple substitution cipher might be identified simply on the basis of its being the most frequent letter in the coded text ("E" being the most frequent letter in English), that is certainly not the case when the plaintext "E" can conceivably be represented by any one of six cipher letters.
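A quick way to see this for yourself is to encipher the single letter "E" under each letter of the keyword. The sketch below assumes the usual letter-addition form of the Vigenere; the helper name shift is hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase

def shift(plain_letter, key_letter):
    """Encipher one plaintext letter with one key alphabet."""
    return ALPHABET[(ALPHABET.index(plain_letter) + ALPHABET.index(key_letter)) % 26]

# Plaintext "E" under each of the six key alphabets of SHADOW.
e_forms = {key: shift("E", key) for key in "SHADOW"}
print(e_forms)
# {'S': 'W', 'H': 'L', 'A': 'E', 'D': 'H', 'O': 'S', 'W': 'A'}
```

Six key alphabets yield six different cipher letters for the one plaintext letter, which is why a frequency count of the whole message tells you so little.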

How to "Crack" a Vigenere Cipher: The Kasiski Method

For nearly three centuries, the Vigenere (and similar ciphers) were regarded as fairly safe. In 1863, all that changed, with the publication of findings by the German cryptanalyst, Major F. W. Kasiski.

From a mathematical point of view, the Kasiski method is elegant and beautiful. It is based on the seemingly self-evident fact that, given a message of sufficient length, encoded by a periodic cipher, repeated sequences in the plaintext will sometimes line up with the same letters of the repeating keyword. When they do, sequences of enciphered text will repeat themselves, just as do letter sequences in the simple substitution cipher. To return to the fully encoded passage we referenced earlier (Figure 6), we can see this principle in operation. Notice the addition of numerical markers every five letters; this becomes vitally important in future operations.

Notice that the ciphered text "HCB" (underlined) appears at position 40, and is repeated at position 184. Notice also that the plaintext being enciphered at each position (also underlined) is the same: "EOF." This is a characteristic to be found in all periodic ciphers of sufficient length to afford the opportunity for matched letter combinations to repeat themselves.
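The hunt for repeats can also be handed to a short program. The sketch below re-enciphers Loman's message with "SHADOW" (assuming, as I have throughout, that punctuation and spaces are dropped and "U.K." counts as two letters) and then lists where each repeated trigram begins. The function names are my own.

```python
import string
from collections import defaultdict

ALPHABET = string.ascii_uppercase

PLAINTEXT = (
    "The image of the U.K. will receive a certain degree of damage. "
    "Regrettably, a criminal element has been manufacturing and selling "
    "a deadly chemical warfare material. Nothing more. We shall hope to "
    "avoid the disastrous outcome of much more serious revelations."
)

def encipher(plaintext, keyword):
    """Vigenere-encipher the letters of plaintext, ignoring everything else."""
    letters = [c for c in plaintext.upper() if c in ALPHABET]
    return "".join(
        ALPHABET[(ALPHABET.index(c) + ALPHABET.index(keyword[i % len(keyword)])) % 26]
        for i, c in enumerate(letters)
    )

def repeated_sequences(ciphertext, length=3):
    """Map each repeated sequence to its starting positions (counting from 1)."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i + 1)
    return {seq: pos for seq, pos in positions.items() if len(pos) > 1}

cipher = encipher(PLAINTEXT, "SHADOW")
repeats = repeated_sequences(cipher)
print(repeats["HCB"])
```

Positions are counted from 1 so that they agree with the numbering used on the ciphered text.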

Kasiski was the first publicly to make the observation that any distance between repeated sequences could be broken down into its factors, and further, that the most frequently occurring factor, over all factored distances, would reveal the length of the keyword (referred to as its "period"). For example, to determine the distance between the trigrams HCB in the example just cited, we merely subtract 40 from 184, obtaining the number 144. That particular number happens to have a relatively large number of factors, among them 2, 3, 4, 6, 8, 9, 12, 16, and 18. Because we are dealing with an example whose keyword is already known, we know that one of these numbers, 6, is in fact the correct period for the cipher, which makes use of six simple substitution cipher schemes keyed by the six-letter word "SHADOW."
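The arithmetic for a single pair of repeats is easy to check. In this small sketch the function name factors is my own, and the cutoff of 20 is simply an assumption about how long a keyword is worth considering.

```python
def factors(n):
    """Return every factor of n other than 1 and n itself."""
    return [d for d in range(2, n) if n % d == 0]

distance = 184 - 40  # positions of the two HCB trigrams
print(distance)      # 144
print(factors(distance))
# [2, 3, 4, 6, 8, 9, 12, 16, 18, 24, 36, 48, 72]

# Keywords are seldom very long, so only the smaller factors are
# plausible periods; 20 is an arbitrary cutoff chosen for illustration.
candidates = [f for f in factors(distance) if f <= 20]
print(candidates)
# [2, 3, 4, 6, 8, 9, 12, 16, 18]
```

A single distance cannot settle the matter, since every one of these candidates divides 144 equally well; it is the tally over many distances that singles out the period.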

One might ask: of what use is it to know the period of the cipher? Remember that we are dealing with what are, in essence, six simple substitution ciphers. If we know the correct number of simple substitution cipher alphabets used, then we can break the ciphered text down into six coded segments (the first segment composed of the first letter and every sixth letter thereafter, the second segment composed of the second letter and every sixth letter thereafter, and so on), rather than one complete message, and conduct a frequency count for each of the six segments. Under these circumstances, the "cracking" of each of the six alphabets becomes much easier, not to mention the fact that correctly "cracking" each simple substitution cipher gives you a letter of the keyword. Many times, with two or three letters in hand, the keyword can be accurately guessed, at which point the ciphered message is effectively broken.
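Segmenting is a one-line job in Python. As a sketch, take "LOELAWYLOIHDWBK," which is what "THE IMAGE OF THE UK" becomes under the keyword "SHADOW," and split it by a period of six. The function name split_segments is hypothetical.

```python
def split_segments(ciphertext, period):
    """Segment k holds letter k and every period-th letter after it."""
    return [ciphertext[k::period] for k in range(period)]

# Opening of the example message, enciphered with SHADOW.
cipher = "LOELAWYLOIHDWBK"
segments = split_segments(cipher, 6)
for number, segment in enumerate(segments, start=1):
    print(number, segment)
# 1 LYW
# 2 OLB
# 3 EOK
# 4 LI
# 5 AH
# 6 WD
```

Every letter in segment 1 was enciphered with the "S" alphabet, every letter in segment 2 with the "H" alphabet, and so on, so each segment is a simple substitution cipher in its own right.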

The most difficult part of this process is that which confronts the cryptanalyst who first views a ciphered Vigenere message in all its opaque and frustrating glory: how does one determine the cipher's period?

Determining the Cipher's Period

After numbering the ciphered message in groups of five letters, as shown in Figure 6, the first step is to identify repeated letter sequences and the distance between them. Here's an important tip: when preparing for cryptanalysis of the Vigenere, leave a lot of space in between the lines, because you're going to need it. In fact, I find it useful to read or type the ciphered message into a word processing program. Word processing programs are incredibly useful in this work: you can duplicate messages for work copies, use bold, underlined, italic, and colored type to highlight key sequences, use search mechanisms to look for repeated letter sequences, and carry out many other tasks which would have required pencil, paper, and some considerable eyestrain in times past.

In Figure 7 are listed the repeated letter combinations in the ciphered message, together with the distances they are from each other. The next step is to take the distances and break them down into their factors. Doing so, we obtain the values in Figure 8.
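The same bookkeeping can be sketched in Python. Two assumptions of mine are worth flagging: the program looks only at repeated trigrams (the hand count may include other lengths), and it considers only factors up to 20. Its tallies therefore need not match the hand-prepared figures exactly.

```python
import string
from collections import Counter, defaultdict
from itertools import combinations

ALPHABET = string.ascii_uppercase

PLAINTEXT = (
    "The image of the U.K. will receive a certain degree of damage. "
    "Regrettably, a criminal element has been manufacturing and selling "
    "a deadly chemical warfare material. Nothing more. We shall hope to "
    "avoid the disastrous outcome of much more serious revelations."
)

def encipher(plaintext, keyword):
    """Vigenere-encipher the letters of plaintext, ignoring everything else."""
    letters = [c for c in plaintext.upper() if c in ALPHABET]
    return "".join(
        ALPHABET[(ALPHABET.index(c) + ALPHABET.index(keyword[i % len(keyword)])) % 26]
        for i, c in enumerate(letters)
    )

def repeat_distances(ciphertext, length=3):
    """Distances between every pair of occurrences of each repeated sequence."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    distances = []
    for pos in positions.values():
        distances.extend(b - a for a, b in combinations(pos, 2))
    return distances

def factor_tally(distances, max_factor=20):
    """Count how many distances each candidate period divides evenly."""
    tally = Counter()
    for d in distances:
        for f in range(2, max_factor + 1):
            if d % f == 0:
                tally[f] += 1
    return tally

cipher = encipher(PLAINTEXT, "SHADOW")
distances = repeat_distances(cipher)
tally = factor_tally(distances)
for factor, count in sorted(tally.items()):
    print(factor, count)
```

The printed table has one row per candidate period, with the number of distances it divides; the factor to watch is the one that stands out once 2 and 3 are set aside.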

Immediately, we see dramatic confirmation of the Kasiski method of discovering the keyword length. Except for 2 and 3 (which seldom are true periods, since many of their appearances are due simply to chance, a feature that can even occur with factors as large as 4 or 5), the factor 6 far predominates, occurring 22 times. Of course, in this particular ciphered text, we already know that six is in fact the true period of the keyword, "SHADOW." Suffice it to say that things aren't always quite so clear-cut. More often, one encounters two or more factors equally represented in the coded text (say, 15 instances of the factor "4" and 15 of the factor "6"). In such cases, the following rules may help you in your efforts to choose the factor which represents the "true" keyword period.

First, if one factor is a multiple of the other, choose the larger factor. Second, in cases where one factor is not a multiple of the other, test a factor which is a multiple of both factors. Third, along with the correct factor, you will usually find a significant number of multiples of the factor, which grow fewer as their size increases. Fourth, look at sequences that are longer than two letters: three-letter combinations (called "trigrams") are less likely to occur than two-letter combinations ("digrams"), and combinations of longer than three letters are even rarer. Therefore, factors which occur in longer repeated sequences should be given more weight. Fifth, in the absence of any of the first three conditions, one should usually choose the larger over the smaller factor. Sixth, if all else fails, simply pick one factor and test it out.

To test the period you've chosen, you need to segment the cipher into as many portions as the period has letters (six, in our example). For the test period six, this has been done in Figure 9. The next step is to do a frequency count for each of the six message segments, which we've done in Figure 10.
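Both steps, segmenting and counting, can be sketched together in Python. As before, the function names are mine and the message is re-enciphered from Loman's plaintext with spaces and punctuation dropped, so treat the counts as illustrative of the method.

```python
import string
from collections import Counter

ALPHABET = string.ascii_uppercase

PLAINTEXT = (
    "The image of the U.K. will receive a certain degree of damage. "
    "Regrettably, a criminal element has been manufacturing and selling "
    "a deadly chemical warfare material. Nothing more. We shall hope to "
    "avoid the disastrous outcome of much more serious revelations."
)

def encipher(plaintext, keyword):
    """Vigenere-encipher the letters of plaintext, ignoring everything else."""
    letters = [c for c in plaintext.upper() if c in ALPHABET]
    return "".join(
        ALPHABET[(ALPHABET.index(c) + ALPHABET.index(keyword[i % len(keyword)])) % 26]
        for i, c in enumerate(letters)
    )

def segment_counts(ciphertext, period):
    """Frequency count for each of the period segments."""
    return [Counter(ciphertext[k::period]) for k in range(period)]

cipher = encipher(PLAINTEXT, "SHADOW")
counts = segment_counts(cipher, 6)
for number, count in enumerate(counts, start=1):
    print(number, count.most_common(3))
```

Each row shows the three most frequent cipher letters in one segment, which is usually all you need to start guessing at the key alphabet.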

Determining the Key-Alphabets

From this point, you proceed much the same as you'd do with a simple substitution cipher. For example, the first step is often to consider the most frequent ciphered letter in each segment as possibly standing for the plaintext letter "E," the most common letter in English. If you check for yourself, using the Vigenere Table (Figure 1), you will see that, in the first five segments, the most frequent letter indeed represents the plaintext letter "E" (in the first segment, the cipher letter "W" appears the most frequently [five times] and in the key alphabet "S" is indeed the letter which stands for "E"; in the second segment, "L" [six times] stands for "E" in the key alphabet "H"; and so on). In fact, in the example, only the sixth segment does not have the cipher letter for the plaintext "E" as its most frequently occurring letter.

If you are lucky enough to be presented with a very high frequency in your tabulation (such as the ciphered letter "S" in segment 5), that is usually a good place to begin. Let us say you theorize that "S" is "E" in alphabet 5. Go to the Vigenere Table (Figure 1), find the column for the plaintext "E," and follow it down until you come to "S." Note that the horizontal line on which "S" appears is that of the key alphabet "O" (indicated in the far left column).

At this point, it's best to try to find some verification of your theory. Go to the second most frequent ciphered letter in segment five, "C" (appearing four times). If your theory is correct, then the letter which is ciphered as "C" in key alphabet "O" should also be fairly frequent in English. We see that indeed it is, representing the plaintext "O," which in English is generally ranked the fourth most frequent letter.

Try another one (well, actually, three other ones), since the next highest frequency in segment five is three, claimed by the three different ciphered letters "A" ("M" in plaintext if your theory about key alphabet "O" is accurate), "O" ("A" in key alphabet "O"), and "W" ("I" in key alphabet "O"). Two of those fit ("A" is the third most frequent letter in English, and "I" is the fifth [or sixth] most frequent). It looks as if your theory is correct, and you can proceed to place your deciphered letters above the coded text.
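The Table lookups in this test amount to subtracting letter positions, which the following sketch makes explicit. The function names key_letter and decipher_letter are hypothetical, chosen only for this illustration.

```python
import string

ALPHABET = string.ascii_uppercase

def key_letter(cipher_letter, assumed_plain):
    """Which key alphabet turns assumed_plain into cipher_letter?"""
    return ALPHABET[(ALPHABET.index(cipher_letter) - ALPHABET.index(assumed_plain)) % 26]

def decipher_letter(cipher_letter, key):
    """Read a cipher letter back to plaintext under a given key alphabet."""
    return ALPHABET[(ALPHABET.index(cipher_letter) - ALPHABET.index(key)) % 26]

# Theory: cipher "S" in segment five stands for plaintext "E".
key = key_letter("S", "E")
print(key)  # O

# Check the theory against the other frequent cipher letters in the segment.
for c in "CAOW":
    print(c, "->", decipher_letter(c, key))
# C -> O
# A -> M
# O -> A
# W -> I
```

If most of the letters that come out are common ones in English, the theory about the key alphabet is probably sound.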

From here on out, it's a matter of trial and error, gaining clues from three sources: "correctly" distributed frequencies for segments; deciphered plaintext (which can give you clues that enable you to guess alphabets adjacent to letters you've deciphered); and probable candidates for the keyword (dealing with Quiller-related material, one would not have far to go to guess the keyword, if one had already extracted the keyword letters "S _ _ D _ W"!).

And always, always remember the first commandment of the cryptanalyst of periodic ciphers: you only have to identify one letter correctly to "crack" the entire alphabet.

In passing, the third segment (represented by the key alphabet "A") presents some especially noteworthy features. Notice that the cipher and plaintext letters are always exactly the same! Thus, the letter "E" in the cipher alphabet is also the "E" in the plaintext. If you are therefore fortunate enough to be presented with a test segment which shows a count of letters in roughly the same distribution pattern as would occur in plaintext English, be aware that you might possibly have a case of using key alphabet "A." However, you also know that, for precisely this reason, encrypters often avoid using keywords which contain the letter "A."
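The reason is simply that "A" sits at position zero, so adding it changes nothing. A brief sketch, using the same letter-addition assumption as the Table and a function name of my own:

```python
import string

ALPHABET = string.ascii_uppercase

def encipher(plaintext, keyword):
    """Vigenere-encipher the letters of plaintext, ignoring everything else."""
    letters = [c for c in plaintext.upper() if c in ALPHABET]
    return "".join(
        ALPHABET[(ALPHABET.index(c) + ALPHABET.index(keyword[i % len(keyword)])) % 26]
        for i, c in enumerate(letters)
    )

# Key alphabet "A" is a shift of zero: the text comes out unchanged.
unchanged = encipher("degree of damage", "A")
print(unchanged)  # DEGREEOFDAMAGE
```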

Some Important Caveats

Now let me interject a note of caution. The message that we've used here for an example is of a length that would not be normally encountered, particularly in an actual espionage situation. Moreover, the "message" includes a number of words that habitually are omitted in preparing plaintext for encipherment (words such as "a," "the," "an," "of," "and," and so on). Such words are not usually needed to convey the gist of a plaintext message, and merely serve to contribute to a frequency distribution pattern which approaches normality (exactly as they have done here).

At the same time, even if you are presented with a shorter enciphered message (these are coming next month, trust me!), the frequency count for segments should still look something like a frequency distribution that would be encountered in normal English. If you find one segment (or especially, more than one segment) that does not exhibit such a pattern, you have probably selected the wrong period for your keyword.

I have tested the Kasiski method out on this month's Vigenere cipher and it should, relatively easily, enable you to guess the correct period for the cipher. From there, you're on your own, although as I have said, the most difficult part of cryptanalysis of periodic ciphers is the determination of the keyword's period.

Let me know if you have any questions. I'm sure I haven't covered everything in this essay that I wanted to. I can be reached at ironmouth@sprintmail.com.

Good luck!