Neural Cipher Identifier (NCID)
NCID identifies the cipher type given only a piece of ciphertext. To do so, it uses several neural networks, from which you can select one or more. The following models were trained on the 55 classical ciphers standardized by the American Cryptogram Association (ACA): a feedforward neural network (FFNN), a long short-term memory network (LSTM), a Transformer, and a Naive Bayes network (NB). Selecting an ensemble of multiple architectures usually leads to better accuracy. Further details can be found in the "Description" tab.
The NCID project started as a master's thesis supervised by the University of Applied Sciences Upper Austria, Hagenberg, and the CrypTool project.
The project contains code for detecting ciphers and classifying them by classical cipher type, using one or more neural networks.
Several neural networks were trained to detect the cipher type of encrypted historical texts. To train the NCID models on all 55 ACA ciphers, texts from the Gutenberg library were used with one of two length settings: ciphertexts with an exact length of 100 characters, or ciphertexts with variable lengths of 51-428 characters. This online version provides only the models with a fixed length of 100 characters.
Selecting multiple neural network architectures combines all selected networks into an ensemble, which in many cases leads to better accuracy. The following table shows the accuracy of the different architectures for a fixed length of 100 characters:
| Architecture | FFNN | Transformer | RF | NB | LSTM | Ensemble Mean | Ensemble Weighted |
|---|---|---|---|---|---|---|---|
| Accuracy in % | 78.31 | 72.33 | 73.50 | 52.79 | 72.16 | 82.67 | 82.78 |
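The two ensemble columns can be illustrated with a minimal sketch. It assumes that each model outputs one probability per cipher type and that the ensemble averages these vectors, either uniformly or with per-model weights. The function names, the toy probabilities, and the choice of weights (here the single-model accuracies from the table) are assumptions for illustration; the source does not state how NCID derives its weights.

```python
def ensemble_mean(probabilities):
    """Unweighted average of the per-class probabilities of all models."""
    n_models = len(probabilities)
    return [sum(column) / n_models for column in zip(*probabilities)]


def ensemble_weighted(probabilities, weights):
    """Average of the per-class probabilities, weighted per model."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, column)) / total
        for column in zip(*probabilities)
    ]


# Toy input: three hypothetical models scoring three hypothetical classes.
model_probs = [
    [0.6, 0.3, 0.1],  # e.g. FFNN
    [0.2, 0.5, 0.3],  # e.g. Transformer
    [0.5, 0.4, 0.1],  # e.g. LSTM
]
# Assumed weights: the single-model accuracies listed in the table.
model_weights = [78.31, 72.33, 72.16]

mean_probs = ensemble_mean(model_probs)
weighted_probs = ensemble_weighted(model_probs, model_weights)

# The predicted class is the index with the highest combined probability.
predicted_class = max(range(len(weighted_probs)), key=weighted_probs.__getitem__)
```

With weights this close to each other, the weighted average barely differs from the plain mean, which would be consistent with the small gap between the two ensemble columns.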
Comparison to the BION classifier
With the help of William Mason, we compared NCID, using both the models with a fixed length of 100 characters and the models with lengths of 51-428 characters, to the established BION classifier. The comparison used 100 handpicked plaintexts from the Kaggle Amazon reviews dataset.
The cipher types detected by BION and NCID differ, because BION clusters some cipher types together. BION only puts cipher types that are very close variants into the same cluster. Therefore, a common set of cipher types has to be defined, and a misclassification within a cluster is not treated as a failure.
The following 51 cipher types were used by BION: 6x6bifid, 6x6playfair, amsco, autokey, bazeries, beaufort, bifid, cadenus, checkerboard, cmBifid, columnar, condi, digrafid, foursquare, fractionatedMorse, grandpre, grille, gromark, homophonic, keyphrase, monomeDinome, morbit, myszkowski, nicodemus, nihilistSub, NihilistSub6x6, nihilistTransp, numberedKey, simplesubstitution, periodicGromark, phillips, playfair, pollux, porta, portax, progressiveKey, quagmire, ragbaby, redefence, routeTransp, runningKey, sequenceTransp, seriatedPlayfair, swagman, syllabary, tridigital, trifid, trisquare, twosquare, vigenère/variant, slidefair
The following 56 cipher types were used in NCID: amsco, autokey, baconian, bazeries, beaufort, bifid, cadenus, checkerboard, columnar_transposition, condi, cmbifid, digrafid, foursquare, fractionated_morse, grandpre, grille, gromark, gronsfeld, headlines, homophonic, key_phrase, monome_dinome, morbit, myszkowski, nicodemus, nihilist_substitution, nihilist_transposition, null, numbered_key, periodic_gromark, phillips, phillips_rc, plaintext, playfair, pollux, porta, portax, progressive_key, quagmire1, quagmire2, quagmire3, quagmire4, ragbaby, railfence, redefence, route_transposition, running_key, seriated_playfair, slidefair, swagman, tridigital, trifid, tri_square, two_square, variant, vigenère
- Quagmire I-IV are combined in BION. Therefore, all misclassifications between these classes are counted as correct in NCID.
- Gronsfeld is included in the Vigenère/Variant type in BION. Therefore, all misclassifications between Gronsfeld/Vigenère/Variant are counted as correct in NCID.
- Numbered Key implementations differ and are therefore skipped.
- Phillips C and Phillips RC are combined into one type in BION. Therefore, misclassifications between Phillips C/Phillips RC are counted as correct in NCID.
- Railfence is included in the Redefence type in BION. Therefore, misclassifications between Railfence/Redefence are counted as correct in NCID.
- Cipher types that require a specific input text length (for example, Cadenus) are skipped in the random text length test.
- The key length is always 8, if applicable.
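The cluster rules above can be sketched as a scoring function that treats a prediction as correct when the true and predicted labels fall into the same cluster. This is an illustration under stated assumptions, not the evaluation code used for the comparison: the names `CLUSTERS`, `SKIPPED`, `canonical`, and `cluster_accuracy` are hypothetical, and the NCID label `phillips` is assumed to correspond to Phillips C.

```python
# Clusters from the rules above, written with NCID label names.
# Assumption: the NCID label "phillips" stands for Phillips C.
CLUSTERS = [
    {"quagmire1", "quagmire2", "quagmire3", "quagmire4"},
    {"gronsfeld", "vigenère", "variant"},
    {"phillips", "phillips_rc"},
    {"railfence", "redefence"},
]
# Types left out of the comparison entirely.
SKIPPED = {"numbered_key"}

# Map every clustered label to one representative label of its cluster.
CANONICAL = {
    label: sorted(cluster)[0] for cluster in CLUSTERS for label in cluster
}


def canonical(label):
    """Return the cluster representative, or the label itself if unclustered."""
    return CANONICAL.get(label, label)


def cluster_accuracy(true_labels, predicted_labels):
    """Fraction of non-skipped samples predicted within the correct cluster."""
    pairs = [
        (t, p) for t, p in zip(true_labels, predicted_labels) if t not in SKIPPED
    ]
    if not pairs:
        return 0.0
    hits = sum(canonical(t) == canonical(p) for t, p in pairs)
    return hits / len(pairs)


# Toy example with made-up predictions.
true_labels = ["quagmire1", "railfence", "playfair", "numbered_key", "vigenère"]
predictions = ["quagmire3", "redefence", "bifid", "amsco", "gronsfeld"]
score = cluster_accuracy(true_labels, predictions)
```

In the toy example the `numbered_key` sample is ignored, three of the remaining four predictions land in the right cluster, and only `playfair` predicted as `bifid` counts as an error.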
Final list of cipher types used to create the ciphertexts: amsco, autokey, bazeries, beaufort, bifid, cadenus, checkerboard, columnar_transposition, condi, cmbifid, digrafid, foursquare, fractionated_morse, grandpre, grille, gromark, gronsfeld, homophonic, key_phrase, monome_dinome, morbit, myszkowski, nicodemus, nihilist_substitution, nihilist_transposition, periodic_gromark, phillips, phillips_rc, playfair, pollux, porta, portax, progressive_key, quagmire1, quagmire2, quagmire3, quagmire4, ragbaby, railfence, redefence, route_transposition, running_key, seriated_playfair, slidefair, swagman, tridigital, trifid, tri_square, two_square, variant, vigenère
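Comparing the lists above suggests how the final list relates to the NCID list: it is the NCID list without the four types that have no BION counterpart (baconian, headlines, null, plaintext) and without numbered_key, which is skipped. The following sketch reproduces that filtering; the variable names are hypothetical and the rule is inferred from the lists, not taken from the project's code.

```python
# The 56 NCID cipher types, in the order given above.
NCID_TYPES = """
amsco autokey baconian bazeries beaufort bifid cadenus checkerboard
columnar_transposition condi cmbifid digrafid foursquare fractionated_morse
grandpre grille gromark gronsfeld headlines homophonic key_phrase
monome_dinome morbit myszkowski nicodemus nihilist_substitution
nihilist_transposition null numbered_key periodic_gromark phillips
phillips_rc plaintext playfair pollux porta portax progressive_key
quagmire1 quagmire2 quagmire3 quagmire4 ragbaby railfence redefence
route_transposition running_key seriated_playfair slidefair swagman
tridigital trifid tri_square two_square variant vigenère
""".split()

# Inferred exclusions: no BION counterpart, or skipped (numbered_key).
EXCLUDED = {"baconian", "headlines", "null", "plaintext", "numbered_key"}

# Keep the original order so the result lines up with the final list.
final_types = [name for name in NCID_TYPES if name not in EXCLUDED]
```

Under this rule, 56 types minus 5 exclusions leaves 51 types, which matches the number of entries in the final list.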
The Gromark and Periodic Gromark ciphers did not use any primers in the BION implementation, but they did use them in NCID. Therefore, the results for these two ciphers are not directly comparable.
- Histocrypt 2021: A Massive Machine-Learning Approach For Classical Cipher Type Detection Using Feature Engineering, pages 111ff.
- Nils Kopal: Of Ciphers and Neurons – Detecting the Type of Ciphers Using Artificial Neural Networks