Cryptanalysis of Enigma
More modern approaches include Jim Gillogly's paper Ciphertext only Cryptanalysis of the Enigma. A followup letter correcting some mistakes in the paper can be found here. Another paper that builds on Jim Gillogly's paper is Applying Statistical Language Recognition Techniques in the Ciphertext only Cryptanalysis of Enigma by Heidi Williams. Another interesting resource is The Cryptographic Mathematics of Enigma.
The Task §
The keyspace of the Enigma cipher consists of several things: the rotors and their order, the 3 letter indicator settings, the 3 letter ring settings, and the plugboard settings. In total there are 26^3 possible indicator settings, 26^3 ring settings, 10 possible combinations of 5 rotors (which can each have 6 permutations of order), for a total of 60*26^6 = 18,534,946,560 possible configurations, then on top of this the plugboard settings add a vast number of further possibilities. In this example, we will ignore the plugboard settings, the resulting problem is much easier to deal with.
We want to decrypt messages of length 200-250 characters using only the ciphertext, as 250 characters was the maximum length used by the germans in the second world war. What we will find is that, without a plugboard, we can break messages as short as 50 characters in less than 30 seconds. We assume all messages have been enciphered with an M3 Enigma with 5 rotors. The papers referenced above use the fact that the rotor order and indicator settings can be determined independent of the plugboard and ring settings. This reduces the overall complexity significantly, with the first step being to break the rotor order and indicator settings. This first step requires 60*26^3 decryptions, and is the most computationally intensive part of the process.
We will be cracking the following Enigma message as an example:
This is a 56 character message. We will be relying on quadgram statistics as a fitness function to tell us when we have the correct settings.
Determining the Rotor Order §
The first step is to try deciphering the ciphertext with each rotor combination and rotor order for all possible combinations of indicator setting. This requires roughly 60*20^3 decryptions, and takes about 20 seconds for the C code provided below. We initially assume that the ring settings are e.g. 'AAA'. This assumption will almost always be wrong, but we can find the rotor order anyway.
After decrypting the message with rotors '123' and all possible indicator settings, the best result so far is:
ELFTELEBODZICABNOROUSMIIHPPWWUNKRLQPLGGHLORSEVNCBZNPXVRY indicator=WYE, rotors=123, rings=AAA
which has a fitness of -440.016235 (calculated using quadgram statistics). If we continue trying possible rotor combinations, we see the next best one is:
WIBLYACUXDWUCUVUCBLRGZLOUGENYVSAGLOTHRONISCLOPWTYEYVHCZY indicator=RAP, rotors=231, rings=AAA
with a fitness of -439.386963. This fitness is slightly higher, so we will store it. After some more searching we find
KDOELLIGENCEPOINTSLLHFVONWIBHHEEASTWALLOFTHEVKYDHWVOZSHG indicator=DXR, rotors=254, rings=AAA
which has a fitness of -411.859955. This is a big jump in fitness, and indeed we see large sections of what looks like plaintext showing through. After searching all possible rotors and indicators, this remains the highest scoring decryption, so now we move on to the next part.
Determining the Ring Settings §
Using the best rotor and indicator settings from the last part, we wish to find the ring settings. We try each possible ring setting on the fast rotor first (the right-most rotor), since it will on average affect more output characters than a bad ring setting on the middle rotor. The ring setting of the slowest rotor does not need to be changed, since it has no effect on the encryption. The rotors must stay registered with the recovered indicator setting, so as each ring is moved, the trial indicator setting for that rotor is moved in the same direction; thus only 26 ring/indicator settings need to be tested for each rotor. We again use quadgram statistics to rank the candidates. This process is very quick, only 26^2 = 676 possibilities need be tried. The highest scoring decryption is the following:
decryption: INTELLIGENCEPOINTSTOATTACKONTHEEASTWALLOFTHECASTLEATDAWN indicator=DWG, rotors=254, rings=AAP
It is interesting to note that the actual key settings used the indicator 'SIG' with ring settings 'PMP', but the recovered key gives an identical decryption. This is related to the location of the notches on the rotors.
C Code for Breaking Enigma §
I used the following C code to break the message above, it includes an implementation of the Enigma cipher, the quadgram scoring and a few other helper functions. To specify the message you want to decrypt, edit the ctext variable in break_enigma.c.To compile, use the following:
gcc break_enigma.c enigma.c NBestList.c scoreText.c -lm -O3
The optimisation flag -O3 helps speed things up. In general it will take less than 30 seconds to break short messages (50 characters), slightly longer for longer messages. The code is provided below, beware this is proof of concept code only, it would not be difficult to crash it. Use it at your own risk. I have python code as well, but it takes significantly longer to run (~30 min). If anyone really wants it I may post it.
If you are decypting German messages, the following quadgrams might be useful, they are based on several gigabytes of german text, but e.g. ü is assumed to be u since the enigma can't do umlauts:comments powered by Disqus