N-Gram Analysis


Author: Lina Kuche

An N-gram is a sequence of N consecutive characters.

N-Gram Name   N     Example
Monogram      1     A
Bigram        2     AB
Trigram       3     UNO
Tetragram     4     CASE
Pentagram     5     POINT
Hexagram      6     PERSON
...           ...   ...
Multigram     N     CRYPTOLOGY
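
As a minimal sketch of the definition above, the N-grams of a text can be read off with a window of width N that slides one character at a time. The function name ngrams is a hypothetical helper chosen for this illustration, not part of any particular tool.

```python
def ngrams(text: str, n: int) -> list[str]:
    """Return every run of n consecutive characters, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A text of length L yields L - n + 1 overlapping N-grams (none if L < n).
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(ngrams("CRYPTOLOGY", 3))
# ['CRY', 'RYP', 'YPT', 'PTO', 'TOL', 'OLO', 'LOG', 'OGY']
```

The windows overlap, so a 10-letter word contains 8 trigrams rather than 3.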

N-gram analysis determines how often each N-gram occurs in a text. The distances between repeated N-grams in a ciphertext can be particularly useful for breaking a cipher, because they can point to the key length. The following German sentence yields the N-gram analysis shown below.

"Dieser Beispieltext führt bei einer n-Gramm-Analyse nach Histogrammen, Bigrammen und Trigrammen zu nachfolgendem Ergebnis, wenn lediglich die 20 häufigsten Treffer berücksichtigt werden."

(English: "This sample text yields the following result in an n-gram analysis by histograms, bigrams and trigrams, if only the 20 most frequent hits are considered.")

Rank  Monogram  Count  Percent   Bigram  Count  Percent   Trigram  Count  Percent
1     E         25     16.13%    EN      8      5.16%     RAM      4      2.58%
2     I         15     9.68%     ER      6      3.87%     AMM      4      2.58%
3     N         15     9.68%     IG      5      3.23%     GRA      4      2.58%
4     R         13     8.39%     RA      4      2.58%     MEN      3      1.94%
5     T         9      5.81%     IE      4      2.58%     MME      3      1.94%
6     G         9      5.81%     CH      4      2.58%     RBE      2      1.29%
7     M         9      5.81%     MM      4      2.58%     ERB      2      1.29%
8     A         8      5.16%     GR      4      2.58%     ICH      2      1.29%
9     S         7      4.52%     ME      4      2.58%     ACH      2      1.29%
10    H         7      4.52%     AM      4      2.58%     NAC      2      1.29%
11    D         6      3.87%     EI      3      1.94%     DIE      2      1.29%
12    L         5      3.23%     DI      3      1.94%     IGR      2      1.29%
13    B         5      3.23%     NA      3      1.94%     BEI      2      1.29%
14    F         5      3.23%     BE      3      1.94%     NNL      1      0.65%
15    C         5      3.23%     IS      3      1.94%     NTR      1      0.65%
16    U         3      1.94%     SE      2      1.29%     NLE      1      0.65%
17    W         2      1.29%     ND      2      1.29%     NGR      1      0.65%
18    O         2      1.29%     AC      2      1.29%     NIS      1      0.65%
19    Z         1      0.65%     IC      2      1.29%     NUN      1      0.65%
20    Y         1      0.65%     RB      2      1.29%     NZU      1      0.65%
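
The following sketch shows one way such a ranking could be computed, together with the distances between repeated N-grams mentioned earlier. It rests on two assumptions inferred from the numbers in the table rather than stated in the text: the input is uppercased and every character outside A-Z (umlauts, digits, spaces, punctuation) is dropped, and percentages are taken relative to the number of remaining letters (25 of 155 letters gives 16.13%). The names normalize, ngram_ranking and ngram_gaps are hypothetical helpers for this illustration.

```python
from collections import Counter

SAMPLE = ("Dieser Beispieltext führt bei einer n-Gramm-Analyse nach "
          "Histogrammen, Bigrammen und Trigrammen zu nachfolgendem Ergebnis, "
          "wenn lediglich die 20 häufigsten Treffer berücksichtigt werden.")


def normalize(text: str) -> str:
    """Uppercase the text and keep only A-Z (assumed normalization)."""
    return "".join(c for c in text.upper() if "A" <= c <= "Z")


def ngram_ranking(text: str, n: int, top: int = 20) -> list[tuple[str, int, float]]:
    """Return (ngram, count, percent) for the `top` most frequent N-grams."""
    clean = normalize(text)
    counts = Counter(clean[i:i + n] for i in range(len(clean) - n + 1))
    # Assumption: percent is relative to the number of letters, not of N-grams.
    return [(g, c, round(100 * c / len(clean), 2))
            for g, c in counts.most_common(top)]


def ngram_gaps(text: str, n: int) -> dict[str, list[int]]:
    """For each repeated N-gram, list the distances between consecutive occurrences."""
    clean = normalize(text)
    positions: dict[str, list[int]] = {}
    for i in range(len(clean) - n + 1):
        positions.setdefault(clean[i:i + n], []).append(i)
    return {g: [b - a for a, b in zip(p, p[1:])]
            for g, p in positions.items() if len(p) > 1}


for n in (1, 2, 3):
    print(n, ngram_ranking(SAMPLE, n, top=3))

print(ngram_gaps("ABCXXABCYYYABC", 3))
# {'ABC': [5, 6]}
```

N-grams with equal counts are listed in order of first appearance here, so ties may be ordered differently than in the table. In a ciphertext produced with a repeating key, distances such as those returned by ngram_gaps tend to share a common factor, which is why they can hint at the key length.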

Weblinks

http://en.wikipedia.org/wiki/N-gram

