Cryptanalysis of Enigma, Part 2
Introduction §
This page continues on from part 1: Cryptanalysis of Enigma, where we looked at breaking Enigma ciphers with no plugboard. Now we will move on to ciphers which use a plugboard, and we will see that it makes the problem of cryptanalysis much harder.
We will start with a known-plaintext attack, then see what needs to be available to generalise it to a ciphertext only attack. Known plaintext means we know both the ciphertext and the corresponding plaintext, but we need to determine the key that is in use. Situations such as this arose often in WW2, where the British would intercept a message sent from a spy in britain. The spy would use a hand cipher that the british could break, then the Germans would re-transmit the message using Enigma once they recieved it. In this way a plaintext/ciphertext pair was available. If the key could be found, then all other messages transmitted on the same day could also be read by the British.
Known Plaintext attack §
We will try to break the following ciphertext:
YXBMXADQBDBAAYIMKDODAYIXNBDQZFJKOLFVEEQBCLUUXDFVQYGKEYBVRHONJKPJMKUNLYLZUKBKJOA JTWVWMOMDPGVXEPUKXBVSGHROFOSBCNKEHEHAKWKOGWTBZFXSYCGSUUPPIZTRTFVCXZVCXTFLMTPTAQ VMREGWSBFZBM
which corresponds to the plaintext:
THEENIGMACIPHERWASAFIELDCIPHERUSEDBYTHEGERMANSDURINGWORLDWARIITHEENIGMAISONEOFT HEBETTERKNOWNHISTORICALENCRYPTIONMACHINESANDITACTUALLYREFERSTOARANGEOFSIMILARCI PHERMACHINES indicator:KJS rings:WOB rotors:251 plugs:(PO ML IU KJ NH YT)
The first change we are going to make from Part 1 is the fitness function. In part 1 we used quadgram probabilities as the fitness because we did not know the actual plantext. In this 'known plaintext' scenario, we compute the fitness as the number of characters correctly deciphered. So for a length 100 ciphertext, the best fitness is 100, the worst is 0 and a random guess will achieve a fitness around 4.
If we decrypt the ciphertext using the method from part 1 but with the new fitness function, our best decryption is as follows:
TNEGHUGLACUONZSBUSVFUXGDAUONERCIBRBTYNEGEJCLHSDIRLHNWHRMDQJQMWQXKPVPJZCUHRHGLTY RVBEYTSHJIPWHNNBMPRUCAUEECWTOYEEHKAFNBHCFAHAUYACYZAMMPFCGARNYHARAHGEPFDUMWNBOCU ODERLACNUHER
This decryption doesn't look much like English, but the fitness is 48, which is quite high. Despite the plugboard, some letters survive intact; these are the self-steckered letters. All other letters are hoplessly scrambled.
Now we will try and break the plugboard settings using the rotor order and indicator settings found in the previous step. We do this by choosing a plug: e.g. 'AB' indicates we connect 'A' and 'B' and the plugboard. Because 'AB' and 'BA' are identical, there are only 26 choose 2 = 325 possible ways of choosing the first plug. We try each one, decrypting the ciphertext with this plug and the known indicator and rotor settings, and score the resulting decrypted text in the same manner as before. After doing this, we find the pair which maximises the score: 'HN'. With the first plug out of the way, we choose another plug, but there are only 24 choose 2 = 276 to search through this time. This quickly gives us the list: 'HN' 'IU' 'JK' 'LM' 'OP' 'TY'.
indicator=OAS, rotors=251, rings=AFB, plugboard= HN IU JK LM OP TY decryption: THEENIGMACIPHERWASAFIELDCIPHERUSEDBYTHEGERMANSDURINGWORLDWARIITHE ENIGMAISONEOFTHEBETTERKNOWNHISTORICALENCRYPTIONMACHINESANDITACTUALLYREFERSTOA RANGEOFSIMILARCIPHERMACHINES
As in part 1, we find the key discovered by us is not the same as the key actually used to encipher the message, which means there are multiple keys that result in identical ciphertext.
Generalising to Ciphertext-only §
The previous section required a known piece of plaintext to help us with scoring, it would be nice if we could relax this requirement. Indeed, it is possible, but in general more ciphertext is required.
The approach described in Jim Gillogly's paper Ciphertext only Cryptanalysis of the Enigma uses a very similar procedure to that described here. A followup letter correcting some mistakes in the paper can be found here. Another paper that builds on Jim Gillogly's paper is Applying Statistical Language Recognition Techniques in the Ciphertext only Cryptanalysis of Enigma by Heidi Williams.
To summarise the differences, the Williams paper uses monogram likelyhood as the fitness function to score decryption keys. This uses the fact that even with the plugboard some letters are decrypted without error, and with enough ciphertext these unmodified letters can give enough information to determine the rotor and indicator settings. In general, 1000 or more ciphertext characters are required for this to work, which may not be a reasonable assumption given that 250 characters was the longest allowable during WW2. The method described by Williams was used as a base for the version shown here, simply modifying it to be a known-plaintext attack. This drastically reduces the amount of ciphertext we actually need.
C Code for Enigma Cryptanalysis §
The code here is very similar to the code from Part 1, but there are small modifications to several of the files. The biggest change is to the scoring function which is now much simpler.
To specify the message you want to decrypt, edit the ctext and otext variables in break_enigma2.c. To compile, use the following:
gcc break_enigma2.c enigma2.c NBestList2.c scoreText2.c -lm -O3
The optimisation flag -O3 helps speed things up. In general it will take aroud 30 seconds to break 250 character messages. The code is provided below, beware this is proof of concept code only, it would not be difficult to crash it. Use it at your own risk.
comments powered by Disqus