Author: Jean Michel Polak
This tool illustrates how to break a Vigenère encryption. As a first step, please enter a text encoded with Vigenère. Then you can select the length of the N-grams to perform the analysis. This analysis is based on the English alphabet, so non-alphabetic characters like '?', commas and spaces are filtered out.
The possible key lengths are then displayed in the diagram, and the divisors of the autocorrelation results are also calculated and included in the diagram.
The second view available is a table, in which all N-grams for which the distance between them, has at least one common divisor. These dividers are displayed in the table shown in colour and with a cross. The most likely key length has the most entries. The possible keys are shown below the diagram, these are ordered according to their probability.
Press one of the key lengths and two letter distributions are displayed. The upper distribution is the one from the english language, in the lower distribution will depend on the the letters of the encrypted text in n groups. The first letter represents the first group. Let's assume you have decided on key length 7, then the first letter is placed in the first group , 7th, 14th and so on in the first Group. Now with the help of a letter frequency distribution each Group considered individually. For example, if in the third group the letter 'G' is the most common, it most likely corresponds in plain language to the 'E'.
Whenever the distributions are shifted, the corresponding autocorrelation value calculated and displayed, it is desirable to find the minimum value. Each time the text is moved, the ciphertext is decrypted with the existing key.
For the automatic release of a Vigenère encryption you are invites to use the software CrypTool2 (https://www.cryptool.org/en/cryptool2) .
The Vigenère cipher was developed in the 16th century by the French cryptologist Blaise de Vigenère (* 15th April 1523 in Saint-Pourçain; † 1596)¹. It is based on the usage of the Caesar cipher, but with changing alphabets. For long time this cipher was regarded as unbreakable. Finally, Friedrich Wilhelm Kasiski published a method to decode a text that was encoded with a Vigenère cipher.²
(1) o.V.: “Blaise de Vigenère”, http://de.wikipedia.org/wiki/Blaise_de_Vigenère, 2009-02-20
(2) o.V.: „Geschichte der Kryptographie“, http://krypto.informatik.fh-augsburg.de/geschichte.htm, 2009-02-11.
In the preceding example, we have chosen a key, which is significantly shorter than the plaintext. If we had used a longer example, it would have become clear that we are simply using 3 different Caesar ciphers over and over again. It is obvious that the security depends on the length of the key. If the key has only a length of one (the passphrase is only one letter) then the Vigenère cipher would be identical to just using the Caesar cipher. Usually the first step for an attacker would be to identify the length of the key. If the length is known, the attacker just has to perform frequency analyses for different Caesar ciphers. The special case where the key-length is identical with the length of the plaintext is called Vernam cipher. This cipher is much harder to crack but still not secure. With sufficiently long texts, key- as well as natural language, an attacker can exploit the property that the different characters in the plaintext, key and encoded text do not occur with an equal frequency and run a frequency analysis. This cipher can be made secure by using a random sequence of characters for the key, this is called a One-Time-Pad and it is unbreakable.³
(3) Schmeh, Klaus: „Kryptografie“, dpunkt.verlag, 2007, p. 47
A key of arbitrary length has to be chosen. The key, and the text that will be encoded, have to use characters from the same alphabet. For demonstration purposes, we will use the capital letters A-Z only.
As an example, the sentence "DIES IST EIN GEHEIMER TEXT" (German for: “This is a secret text”) will now be decoded with the key "KEY". First, we write the key beneath the plaintext and repeat it until the whole length of the plaintext is covered…
DIES IST EIN GEHEIMER TEXT (Klartext)
KEYK EYK EYK EYKEYKEY KEYK (Schlüssel)
Now we are using the Caesar cipher. The first character occurring in the plaintext, D is encoded by the character corresponding to the first character of the key, K. This means that the original mapping of the Caesar cipher:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
is shifted by the key 'K' to the left. The 'K' is the eleventh character in the alphabet, so we have to shift the mapping 10 positions to the left, which reads the following:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
A would thus be mapped to K, B to L, C to M and therefore the D (from our message "DIES IST EIN GEHEIMER TEXT") to the N. Therefore, the first character of the ciphertext will be O. Next, the second character in the plaintext, I, will be mapped by the Caesar cipher with a character corresponding to the second character in the key E.
E is the fifth character in the alphabet, so we have to shift the original mapping by 4 positions to the left:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z E F G H I J K L M N O P Q R S T U V W X Y Z A B C D
Therefore, the I will be mapped to the M. The encoded text so far is NM. If we encode the whole message with the same procedure we will get the encoded text: "NMCC MQD IGX KCRIGWIP DIVD".
What is an alphabet?
An alphabet is an ordered set of all characters which can occur in a plaintext, a secret text, or the key. "Ordered" means that sorting is possible and we can speak of the n-th character of an alphabet.
For classical methods, the alphabet often consists only of the uppercase letters (A-Z). Characters not belonging to the alphabet are not encrypted or allowed as keys.
The handling of non-alphabet characters (convert, skip, ...) can be set in the options - but this is not a function of the actual encryption process itself. This requires additional meta-information of the letters that must be recorded before encryption. Also, there is no general match on how to handle digits or special characters. Instead of performing a transformation before encryption, this implementation allows several alphabets to be specified (see below), thereby accomplishing the same within the encryption process.
Our implementation of Vigenère, Beaufort, etc. does not work internally with letters, but with numbers. Therefore, a translation must take place, which can on the one hand transform letters in numbers and, conversely, re-generate letters again. A function that performs this is called an alphabet function.
Let s be such a reversible function. Then the Vigenère encryption for an input character in and a key key can be described as:
out = s-1(s(in) + s(key))
The letters of in and key are converted into numbers, these numbers are added, and the sum is re-converted to a letter. The conversion to letters takes place modulo to the alphabet length: If a 1 is added to the last character, the result of the sum is the first character of the alphabet.
How to describe an alphabet?
Alphabets (yes, there may be several: more below) can be described by a list L of letters. The alphabet function sL returns the smallest index at which it occurs to a letter that is present in L. The index of the first character can be configured. For letters that do not occur in L, the alphabet function sL is undefined.
Although the function is well-defined when a letter occurs more than once, this makes little sense in encryption algorithms, since the reversibility suffers. A corresponding warning is displayed.
The inverse function returns the n-th character for a number n in L. To n, the length of the list L is added or subtracted as often as necessary until the index lies in the list.
Example of an alphabet
For example, take the list L =
"ABCD", whose length is 4. The message
"ACDC" should be encrypted with the key
"ABBA" according to the Vigenère method. The following steps take place:
|in||sL(in)||key||sL(key)||sL(in) + sL(key)||sL–1(sL(in) + sL(key))|
| ||0|| ||0||0|| |
| ||2|| ||1||3|| |
| ||3|| ||1||4|| |
| ||2|| ||0||2|| |
In the example, an overflow has occurred in the third letter, so that modulo |L| = 4 is calculated.
Shortcuts for alphabets
In order to simplify the representation of the alphabets, the following abbreviation has been introduced: The minus sign in the following letter 1-letter 2 is extended to all the letters between the two flanking letters.
"A-Z"for all uppercase letters,
"a-z"for all lowercase letters,
"z-a"for all lowercase letters in reverse order and
"0-9"for all digits.
The only disadvantage is that the minus sign itself has to be written as
"---", so as not to be confused as a range operator.
In the detailed representation of the alphabets (click on the "..." -button), the alphabets can be edited in the short-write mode.
It is possible to distinguish between 2 types of actions in the plain text: uppercase letters [A-Z] and digits [0-9]
- Complete alphabet:
The two partial alphabets [A-Z] and [0-9] are combined in a certain order to form a new alphabet [A-Z0-9]. If you 'shold' here 3, you get to the '2'.
- Separated partial alphabets:
However, all transformations within the respective sub-alphabet are also carried out. This means that a capital letter in the plain text is also mapped to a capital letter in the encrypted text. If you 'shuffle' here 3, you get to the 'C'.
Separated partial Alphabets
The use of several alphabets does not require the algorithms to distinguish between upper and lower case letters.
The algorithm memorizes the alphabet with which it has determined the number of the plaintext. The same alphabet is used to generate the encrypted text.
When a letter occurs in several alphabets, the first of these alphabets is used.
Options regulate the case when a letter does not appear in any alphabet: it is not encrypted, but transferred directly to the output. This is also the case when the letter is in the key. Alternatively, the non-alphabet letters in the key and the plain text can also be filtered out to increase the security. This, however, limits readability.
Example for several alphabets: Addition at Vigenère
Here both approaches are treated: for separate partial alphabets and for a memorized alphabet.
As a small example we consider Vigenère with the following two alphabets:
- L1 =
- L2 =
In both cases, both the plaintext and the key should both consist of the text
For separate partial alphabets the following results:
- The encrypted text is the smallest digit of an addition of plaintext and key when both are hexadecimal digits.
- The encrypted text is:
- Since the first alphabet L1 has been used for the digits, digits and uppercase letters in the encrypted text are always the numbers in the plain text.
- Note the difference in 'D' and 'd': The index value is the same, but the 'd' is L2, so the results differ in the encrypted text: 'A' and 'a'.
For a merged alphabets, the encrypted text is
The following table shows the calculation for the case of the separated partial alphabets L1, L2 as well as for a merged alphabet L =
|separate alphabets||merged alphabet|
|8||8||8||8||16 = 0||0||8||8||16||a|
|9||9||9||9||18 = 2||2||9||9||18||c|
|a||a||10||10||20 = 4||4||16||16||32 = 10||A|
|b||b||11||11||22 = 6||6||17||17||34 = 12||C|
|c||c||12||12||24 = 8||8||18||18||36 = 14||E|
|d||d||13||13||26 = 10||a||19||19||38 = 16||a|
|A||A||10||10||20 = 4||4||10||10||20||e|
|B||B||11||11||22 = 6||6||11||11||22 = 0||0|
|C||C||12||12||24 = 8||8||12||12||24 = 2||2|
|D||D||13||13||26 = 10||A||13||13||26 = 4||4|
cleartext = key = "0123456789abcdABCD"
Method 1: Separated: In each sub-alphabet, mod 16 is calculated (hex addition), since each sub-alphabet contains 16 elements, and it remains in the same partial alphabet from which the plaintext letter originates.
Method 2: Merged: In the alphabet, mod 22 is calculated because the alphabet contains 22 elements.