English Letter Frequencies

The frequencies on this page were generated from around 4.5 billion characters of English text, sourced from Wortschatz. The text files containing the counts can be used with ngram_score.py to break ciphers; see this page for details. If you want to compute the letter frequencies of your own text, you can use this page.
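If you would rather compute letter frequencies locally, a minimal sketch follows. It assumes that only the 26 ASCII letters matter and that case is ignored; letter_frequencies is a hypothetical helper, not part of ngram_score.py.

```python
from collections import Counter
import string


def letter_frequencies(text: str) -> dict[str, float]:
    """Return the percentage of each letter A-Z in text, ignoring case and non-letters."""
    # Keep only ASCII letters, upper-cased, so 'e' and 'E' are counted together
    letters = [c for c in text.upper() if c in string.ascii_uppercase]
    total = len(letters)
    if total == 0:
        return {c: 0.0 for c in string.ascii_uppercase}
    counts = Counter(letters)
    # Report all 26 letters so that absent letters show up as 0.0
    return {c: 100.0 * counts[c] / total for c in string.ascii_uppercase}


if __name__ == "__main__":
    freqs = letter_frequencies("Hello, World!")
    # Print the three most frequent letters in the same style as the table below
    for letter, pct in sorted(freqs.items(), key=lambda kv: -kv[1])[:3]:
        print(f"{letter} : {pct:5.2f}")
```

On a short sample like this the percentages are far from the English averages; the figures on this page only emerge over large amounts of text.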

Monogram Frequencies §

English single-letter frequencies are as follows (in percent):

A :  8.55        K :  0.81        U :  2.68
B :  1.60        L :  4.21        V :  1.06
C :  3.16        M :  2.53        W :  1.83
D :  3.87        N :  7.17        X :  0.19
E : 12.10        O :  7.47        Y :  1.72
F :  2.18        P :  2.07        Z :  0.11
G :  2.09        Q :  0.10                 
H :  4.96        R :  6.33                 
I :  7.33        S :  6.73                 
J :  0.22        T :  8.94                 

The english_monograms.txt file provides the counts used to generate the frequencies above.
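The percentages in the table are each count divided by the total count, times 100. The sketch below assumes each line of a counts file has the form `NGRAM COUNT` separated by a single space (an assumption about the file layout); load_counts and to_percentages are hypothetical helpers. With the real file you would pass `open("english_monograms.txt")` instead of the toy in-memory sample.

```python
import io
from typing import Iterable


def load_counts(lines: Iterable[str], sep: str = " ") -> dict[str, int]:
    """Parse lines of the assumed form 'NGRAM<sep>COUNT' into a dict."""
    counts: dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        key, count = line.split(sep)
        counts[key] = int(count)
    return counts


def to_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw counts to percentages of the total count."""
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}


if __name__ == "__main__":
    # Toy counts for illustration only, not values from english_monograms.txt
    sample = io.StringIO("E 3\nT 1\n")
    for letter, pct in to_percentages(load_counts(sample)).items():
        print(f"{letter} : {pct:5.2f}")
```

The same two functions work unchanged for the bigram, trigram, quadgram and quintgram files, since only the key length differs.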

Common English Words §

The following words are the most common in a 'news' text corpus of around 900 million words. The numbers give each word's percentage of occurrence, i.e. 'THE' makes up around 6.42% of all words.

      THE :  6.42            ON :  0.78           ARE :  0.47
       OF :  2.76          WITH :  0.75          THIS :  0.42
      AND :  2.75            HE :  0.75             I :  0.41
       TO :  2.67            IT :  0.74           BUT :  0.40
        A :  2.43            AS :  0.71          HAVE :  0.39
       IN :  2.31            AT :  0.58            AN :  0.37
       IS :  1.12           HIS :  0.55           HAS :  0.35
      FOR :  1.01            BY :  0.51           NOT :  0.34
     THAT :  0.92            BE :  0.48          THEY :  0.33
      WAS :  0.88          FROM :  0.47            OR :  0.30

The english_words.txt file provides the counts used to generate the frequencies above; words that occurred fewer than 5 times in the corpus were not included.
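A word table like this can be approximated with a sketch along the following lines. The tokenisation (runs of letters, optionally joined by apostrophes as in DON'T) is an assumption, since the page does not say how the corpus was split into words. Percentages are taken over all words counted, and words seen fewer than min_count times are dropped from the result, mirroring the cut-off described above. word_frequencies is a hypothetical helper.

```python
import re
from collections import Counter


def word_frequencies(text: str, min_count: int = 5) -> dict[str, float]:
    """Percent of all words taken by each word seen at least min_count times."""
    # Assumed tokenisation: letters, optionally joined by apostrophes (DON'T)
    words = re.findall(r"[A-Z]+(?:'[A-Z]+)*", text.upper())
    total = len(words)
    counts = Counter(words)
    # most_common() yields words from most to least frequent
    return {w: 100.0 * n / total for w, n in counts.most_common() if n >= min_count}


if __name__ == "__main__":
    for word, pct in word_frequencies("the cat and the dog " * 5).items():
        print(f"{word:>5} : {pct:5.2f}")
```

Note that rare words still contribute to the total, so dropping them does not inflate the percentages of the words that are kept.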

Bigram Frequencies §

A.k.a. digraphs. We can't list all of the bigram frequencies here; the top 30 are as follows (in percent):

TH :  2.71        EN :  1.13        NG :  0.89
HE :  2.33        AT :  1.12        AL :  0.88
IN :  2.03        ED :  1.08        IT :  0.88
ER :  1.78        ND :  1.07        AS :  0.87
AN :  1.61        TO :  1.07        IS :  0.86
RE :  1.41        OR :  1.06        HA :  0.83
ES :  1.32        EA :  1.00        ET :  0.76
ON :  1.32        TI :  0.99        SE :  0.73
ST :  1.25        AR :  0.98        OU :  0.72
NT :  1.17        TE :  0.98        OF :  0.71

The english_bigrams.txt file provides the counts used to generate the frequencies above.
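Entries such as NTHE, OFTH and OFTHE in the longer n-gram tables on this page suggest that spaces and punctuation were removed before counting, so n-grams run across word boundaries. A minimal sketch of counting overlapping n-grams under that assumption (ngram_counts is a hypothetical helper):

```python
import re
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count overlapping n-grams of letters in text.

    Non-letters are removed first, so n-grams span word boundaries,
    e.g. 'OF THE' yields the bigrams OF, FT, TH and HE.
    """
    letters = re.sub(r"[^A-Z]", "", text.upper())
    # Slide a window of width n one character at a time;
    # a text shorter than n produces no n-grams at all
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))


if __name__ == "__main__":
    print(ngram_counts("Of the", 2))
```

A text of L letters yields L - n + 1 overlapping n-grams, which is the total that the percentages are relative to.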

Trigram Frequencies §

A.k.a. trigraphs. We can't list all of the trigram frequencies here; the top 30 are as follows (in percent):

THE :  1.81        ERE :  0.31        HES :  0.24
AND :  0.73        TIO :  0.31        VER :  0.24
ING :  0.72        TER :  0.30        HIS :  0.24
ENT :  0.42        EST :  0.28        OFT :  0.22
ION :  0.42        ERS :  0.28        ITH :  0.21
HER :  0.36        ATI :  0.26        FTH :  0.21
FOR :  0.34        HAT :  0.26        STH :  0.21
THA :  0.33        ATE :  0.25        OTH :  0.21
NTH :  0.33        ALL :  0.25        RES :  0.21
INT :  0.32        ETH :  0.24        ONT :  0.20

The english_trigrams.txt file provides the counts used to generate the frequencies above.
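Top-30 tables like the ones on this page can be produced from a full set of counts by taking the most frequent entries and converting them to percentages of the overall total. The sketch below uses `collections.Counter.most_common`; top_table is a hypothetical helper, and the column-major layout (ranks 1 to 10 in the first column) is chosen to match the tables here.

```python
from collections import Counter


def top_table(counts: dict[str, int], k: int = 30, cols: int = 3) -> str:
    """Format the k most frequent n-grams as percentages, column by column."""
    total = sum(counts.values())  # percentages are relative to ALL n-grams
    top = Counter(counts).most_common(k)
    rows = -(-len(top) // cols)  # ceiling division: entries per column
    lines = []
    for r in range(rows):
        cells = []
        for c in range(cols):
            i = c * rows + r  # column-major index into the ranked list
            if i < len(top):
                gram, n = top[i]
                cells.append(f"{gram} : {100.0 * n / total:5.2f}")
        lines.append("        ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy counts for illustration only
    print(top_table({"THE": 6, "AND": 3, "ING": 1}, k=3, cols=3))
```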

Quadgram Frequencies §

We can't list all of the quadgram frequencies here; the top 30 are as follows (in percent):

TION :  0.31        OTHE :  0.16        THEM :  0.12
NTHE :  0.27        TTHE :  0.16        RTHE :  0.12
THER :  0.24        DTHE :  0.15        THEP :  0.11
THAT :  0.21        INGT :  0.15        FROM :  0.10
OFTH :  0.19        ETHE :  0.15        THIS :  0.10
FTHE :  0.19        SAND :  0.14        TING :  0.10
THES :  0.18        STHE :  0.14        THEI :  0.10
WITH :  0.18        HERE :  0.13        NGTH :  0.10
INTH :  0.17        THEC :  0.13        IONS :  0.10
ATIO :  0.17        MENT :  0.12        ANDT :  0.10

The english_quadgrams.txt file provides the counts used to generate the frequencies above.
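For cipher breaking, quadgram counts are typically turned into a fitness score: the sum of log10 probabilities of every overlapping quadgram in a candidate plaintext, where higher (less negative) means more English-like. The sketch below is written in the spirit of ngram_score.py but is not that script; the class name NgramScorer and the floor penalty of log10(0.01 / total) for unseen n-grams are assumptions.

```python
import math


class NgramScorer:
    """Score text by summing log10 probabilities of its overlapping n-grams."""

    def __init__(self, counts: dict[str, int]):
        self.n = len(next(iter(counts)))  # all keys assumed to share one length
        total = sum(counts.values())
        self.logp = {g: math.log10(c / total) for g, c in counts.items()}
        # Unseen n-grams get a fixed penalty instead of log10(0) = -infinity
        self.floor = math.log10(0.01 / total)

    def score(self, text: str) -> float:
        # Keep only letters A-Z so scoring matches how the counts were made
        text = "".join(ch for ch in text.upper() if "A" <= ch <= "Z")
        return sum(
            self.logp.get(text[i:i + self.n], self.floor)
            for i in range(len(text) - self.n + 1)
        )


if __name__ == "__main__":
    # Toy counts for illustration; real use would load english_quadgrams.txt
    scorer = NgramScorer({"TION": 3, "ATIO": 1})
    print(scorer.score("ation"), scorer.score("qzxjk"))
```

Working in log space turns a product of many small probabilities into a sum, which avoids floating-point underflow on long texts. Scores are only comparable between texts of the same length.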

Quintgram Frequencies §

These are 5-gram frequencies for English. We can't list all of them here; the top 30 are as follows (in percent):

OFTHE :  0.18        ANDTH :  0.07        CTION :  0.05
ATION :  0.17        NDTHE :  0.07        WHICH :  0.05
INTHE :  0.16        ONTHE :  0.07        THESE :  0.05
THERE :  0.09        EDTHE :  0.06        AFTER :  0.05
INGTH :  0.09        THEIR :  0.06        EOFTH :  0.05
TOTHE :  0.08        TIONA :  0.06        ABOUT :  0.04
NGTHE :  0.08        ORTHE :  0.06        ERTHE :  0.04
OTHER :  0.07        FORTH :  0.06        IONAL :  0.04
ATTHE :  0.07        INGTO :  0.06        FIRST :  0.04
TIONS :  0.07        THECO :  0.05        WOULD :  0.04

The english_quintgrams.txt file provides the counts used to generate the frequencies above.
