Monogram, Bigram and Trigram frequency counts

Introduction to Frequency Analysis §

Frequency analysis is the practice of counting the number of occurrences of different ciphertext characters in the hope that the information can be used to break ciphers. Frequency analysis is not limited to single characters: it is also possible to measure the frequency of bigrams (also called digraphs), that is, how often pairs of characters occur in text. Trigram frequency counts measure the occurrence of three-letter combinations.

When talking about bigram and trigram frequency counts, this page concentrates on text characterisation as opposed to solving polygraphic ciphers such as Playfair. The difference is that text characterisation uses every 2-character combination in the text, since we wish to know about as many bigrams as we can, so the bigrams are allowed to overlap. When cracking Playfair, the bigrams are not allowed to overlap, because the cipher encrypts the text in consecutive, non-overlapping pairs.
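
The difference between overlapping and non-overlapping bigrams can be illustrated with a short sketch. The function name `splitBigrams` and its `step` parameter are hypothetical, chosen only for this example, and the choice to uppercase the text and drop everything except A-Z is an assumption rather than something this page prescribes.

```javascript
// Split text into bigrams.
// step = 1 gives overlapping bigrams (text characterisation);
// step = 2 gives consecutive, non-overlapping pairs (as used for Playfair).
// Assumption: only the letters A-Z matter, so everything else is removed.
function splitBigrams(text, step) {
  const letters = text.toUpperCase().replace(/[^A-Z]/g, "");
  const pairs = [];
  for (let i = 0; i + 2 <= letters.length; i += step) {
    pairs.push(letters.slice(i, i + 2));
  }
  return pairs;
}

console.log(splitBigrams("ATTACK", 1).join(" ")); // AT TT TA AC CK
console.log(splitBigrams("ATTACK", 2).join(" ")); // AT TA CK
```

With a step of 1 a text of n letters yields n - 1 bigrams, while a step of 2 yields only about half as many, which is why overlapping counts are preferred when the goal is to characterise a text.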

If you want monogram, bigram, trigram or quadgram frequencies pre-calculated for a certain language, see Letter frequencies for various languages.

Monogram Counts §

Monogram frequency counts are most effective on substitution-type ciphers such as the Caesar cipher, substitution cipher, Polybius square etc. They work because natural English text follows a very specific frequency distribution (E, T and A are the most common letters), which substitution ciphers do not mask: each letter is merely relabelled, so its frequency carries over to the ciphertext symbol that replaces it. The distribution looks like:

The following JavaScript counts the occurrences of each character and displays the result. See substitution cipher cryptanalysis for applications of frequency counts to solving substitution ciphers.
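
A minimal sketch of such a counter is given here. It is not the page's original script: the names `monogramCounts` and `sortByFrequency` are hypothetical, and folding case and ignoring non-letters are assumptions made for the example.

```javascript
// Count how often each letter A-Z occurs in a piece of text.
// Assumption: case is folded and anything that is not A-Z is ignored.
function monogramCounts(text) {
  const counts = new Map();
  for (const ch of text.toUpperCase()) {
    if (ch >= "A" && ch <= "Z") {
      counts.set(ch, (counts.get(ch) || 0) + 1);
    }
  }
  return counts;
}

// Return [letter, count] pairs, most frequent first (ties in alphabetical order).
function sortByFrequency(counts) {
  return [...counts.entries()].sort(
    (a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : 1)
  );
}

const monogramSample = "DEFEND THE EAST WALL OF THE CASTLE";
for (const [letter, n] of sortByFrequency(monogramCounts(monogramSample))) {
  console.log(letter, n);
}
```

For this sample sentence the most frequent letter is E (6 occurrences), followed by T (4). On real ciphertext the same ranking, compared against the expected English distribution, suggests which ciphertext symbols stand for which plaintext letters.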


Bigram Counts §

Bigram counts follow the same principle as monogram counts, but instead of counting occurrences of single characters, they count the frequency of pairs of characters.
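
As a rough sketch, overlapping bigram counts could be computed as follows. The name `bigramCounts` is hypothetical, and the code assumes that spaces and punctuation are removed first, so bigrams may span word boundaries.

```javascript
// Count overlapping bigrams (pairs of adjacent letters) in a text.
// Assumption: non-letters are stripped before counting, so a bigram can
// straddle two words (e.g. "ET" in "THE THEORY").
function bigramCounts(text) {
  const letters = text.toUpperCase().replace(/[^A-Z]/g, "");
  const counts = new Map();
  for (let i = 0; i + 1 < letters.length; i++) {
    const pair = letters.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) || 0) + 1);
  }
  return counts;
}

const bigramSample = "THE THEORY OF THE THING";
const bigramResult = bigramCounts(bigramSample);
console.log("TH:", bigramResult.get("TH")); // TH: 4
console.log("HE:", bigramResult.get("HE")); // HE: 3
```

Whether to keep word boundaries is a design choice. Stripping them matches the usual situation in classical cryptanalysis, where ciphertext is written without spaces.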


Trigram Counts §

Just as bigram counts count the frequency of pairs of characters, trigram counts count the frequency of character triples.
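
Since only the window length changes, one way to sketch trigram counting is with a general n-gram counter called with n = 3. The function name `ngramCounts` is hypothetical, and stripping non-letters before counting is again an assumption.

```javascript
// Count overlapping n-grams of length n; n = 3 gives trigram counts.
// Assumption: only the letters A-Z are kept, with case folded.
function ngramCounts(text, n) {
  const letters = text.toUpperCase().replace(/[^A-Z]/g, "");
  const counts = new Map();
  for (let i = 0; i + n <= letters.length; i++) {
    const gram = letters.slice(i, i + n);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

const trigramSample = "THE THEORY OF THE THING";
const trigramResult = ngramCounts(trigramSample, 3);
console.log("THE:", trigramResult.get("THE")); // THE: 3

// List the three most frequent trigrams in the sample.
const topTrigrams = [...trigramResult.entries()]
  .sort((a, b) => b[1] - a[1])
  .slice(0, 3);
console.log(topTrigrams);
```

Calling the same function with n = 1 or n = 2 reproduces monogram and bigram counts, and n = 4 gives quadgram counts.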
