Cracking the Voynich Code 2016 - Final Report
Acknowledgements
First, we would like to thank our supervisor, Professor Derek Abbott. With his help, the project was completed on time, and he offered kind suggestions throughout. We would also like to thank our co-supervisor, Dr Brian Ng, who provided many useful methods during the project.
Abstract
The aim of this project is to crack the Voynich manuscript, a hand-written book in an unknown script. The text has variously been considered an unknown language, a cipher or a hoax. This report presents methods for determining possible features of the Voynich manuscript. All of the methods are based on data mining, computer programming and statistics, and each method carried out in the project is explained in detail. The report also describes how the project was managed. In the final part, some hypotheses are proposed on the basis of the whole investigation, in order to provide a possible breakthrough point for cracking the Voynich manuscript.
Contents
- 1 Project Introduction
- 2 Related Work (the history of the Voynich manuscript research)
- 3 Requirement
- 4 Proposed Method
- 5 Project Management
- 6 Results
- 7 Comment on progress
- 8 Conclusion
- 9 References
- 10 Appendix
Project Introduction
Background
The Voynich manuscript is a document written in an unknown alphabet that was found by Wilfrid Voynich (1865-1930) in 1912 [1]. Over the manuscript’s long history some pages have gone missing, and about 240 pages remain [2]. The folios of the manuscript are numbered from f1 to f116, and each folio has two pages, r (recto) and v (verso).
Since the book cannot be read, it is divided into six sections according to the style and content of its illustrations:
a) Herbal
Each page shows one or more plants, in the format of European herbals.
b) Astronomical
Circular diagrams containing suns, moons and stars suggest that this part concerns astronomy or astrology.
c) Biological
The drawings, mostly of naked women, suggest that this part is a biological section.
d) Cosmological
Circular diagrams of an obscure nature lead to this part being labelled the cosmological section.
e) Pharmaceutical
Drawings of isolated plant parts and objects resembling apothecary jars suggest that this section concerns pharmaceuticals.
f) Recipes
This part consists of full pages of text in short paragraphs.
Overall, the Voynich manuscript is made up of three elements: text, illustrations and marginal symbols.
Aim
The aim of the project is to use statistics and comparison to infer, from the perspective of digits, whether the Voynich manuscript is a code, a natural language, a constructed language, a cipher or a hoax.
In addition, the project aims to crack the initial digits of the Voynich manuscript and to determine which letters may stand for digits.
Because of the very large number of words and illustrations in the manuscript, solving the whole manuscript is beyond the scope of a one-year project.
Motivation
In the field of linguistics, the Voynich manuscript is a representative example of an undeciphered text. Researchers believe that the mysterious alphabet of the manuscript may carry useful information.
In the course of this project, statistics and comparison are applied in an attempt to crack the Voynich manuscript. If the manuscript can be cracked, the results of this project will help linguists to study other unknown languages.
Significance
There are many conjectures about the Voynich manuscript. Because of the manuscript’s long history, many historians believe that its mysterious alphabet is related to ancient civilizations [3]. If the manuscript can be cracked, it will help historians to explore the culture of ancient societies.
In addition, the statistical methods used in this project are also useful in other fields, such as engineering, finance and architecture. Text comparison is likewise widely used, for example in Turnitin, Google Translate, Grammarly and Bing.
Technical Background
The major technique applied in this project is data mining, which is an effective way to search for patterns in large amounts of data. The two data mining methods used here are statistics and comparison. Statistics are used to count how often particular words occur. Comparison is used to find relations between two languages.
The European Voynich Alphabet (EVA) is a representative scheme for the digital transcription of the Voynich manuscript [4]. The Japanese linguist Takahashi transcribed the whole Voynich manuscript using EVA [5].
Therefore, the main data for this project are extracted from Takahashi’s transcription.
Other resources are also considered, such as the written forms of some representative ancient languages.
Knowledge gaps
Because of the large amount of data in the Voynich manuscript, the project requires solid data processing and programming skills; however, no one in the project team has dealt with so much data before. Team members therefore need to develop their data processing and programming abilities.
The project also requires specific knowledge of statistics, so members must become adept at sorting data.
Technical challenges
The technical challenges of this project fall into two areas.
First, it is very difficult to infer which language the author used. The language of the manuscript does not belong to any known language [6], and it may even be extinct. Moreover, because of the long history of the Voynich manuscript, some important information cannot be found, such as reliable information about the author. It is therefore not possible to infer the language from the author’s nationality. To address this problem, members must consult many different languages as references and compare them with the language of the manuscript.
Secondly, reference material on cracking the Voynich manuscript is limited. Because of the unknown language and the mysterious illustrations, it is difficult to crack the whole manuscript. Although researchers claim to have cracked a very few words, no one can guarantee that those results are right, and there is no generally accepted decipherment in the field of linguistics. Reliable references are therefore hard to find, and members must search a variety of sources to gather enough accurate references.
Related Work (the history of the Voynich manuscript research)
Over the past decades, many researchers have tried to crack the Voynich manuscript using different methods.
Mary E. D’Imperio:
In 1975, Mary E. D’Imperio was introduced to the problem of the Voynich manuscript by John Tiltman [7]. In the following years, she summed up different features of the Voynich manuscript text [8].
Nick Pelling:
Nick Pelling published his book ‘The Curse of the Voynich’ in 2006. Based on the illustrations in the rosettes folio of the Voynich manuscript, he argued that the manuscript originated in Milan [9].
William Ralph Bennett:
William Ralph Bennett, a Yale professor, studied the Voynich manuscript by computer, focusing on statistical analysis of the text. He was probably the first to note the low entropy of the Voynich manuscript text. The only language he found with entropy similar to that of the Voynich manuscript was Hawaiian [10].
John Tiltman:
John Tiltman was a British intelligence specialist. He worked on the text of the Voynich manuscript with William Friedman. Tiltman and Friedman eventually suggested that the text of the manuscript was written in a kind of artificial (constructed) language [11].
Joseph Martin Feely:
Joseph Martin Feely was a Rochester lawyer. In 1943, Feely published a book which proposed a solution to the Voynich manuscript. His solution used a method of replacing some words in the manuscript with Latin [12].
First Study Group:
The First Study Group (FSG) was founded in 1944 and dissolved in 1946 [13]. Its members included:
- Robert A. Caldwell
- G. E. McCracken
- Tomas A. Miller
- Frances Puckett, later Frances Wilbur
- Mark Rhoads
- William M. Seaman
Through the joint efforts of these researchers, the FSG transcribed most of the Voynich manuscript and devised a transcription alphabet [14]. The details of the transcription alphabet are shown in Appendix section A.1.
Requirement
Although it is not necessary to crack the whole manuscript, the project has the following basic requirements:
- Text investigation: find linguistic patterns in some paragraphs of the Voynich manuscript, such as the total number of words, the frequency of particular words and the frequency of particular single letters. The Voynich manuscript is then compared with other known languages.
- Illustration research: look for patterns in some illustrations from the perspective of digits, such as statistics for the illustrations on each page and digit analysis.
- Marginal symbols investigation: examine the marginal symbols thoroughly from the perspective of digits.
- The code runs smoothly.
- The results are evaluated.
- Some hypotheses are made that are helpful for further research.
Proposed Method
As shown in the Appendix section A.2, the proposed methods of this project are divided into three phases.
Phase 1: Text Investigation
This phase has two parts: words and digits.
For the word research, Matlab is used as the essential tool. Team members look for patterns in three respects:
- The total number of words in the Voynich manuscript.
- The characters and words which may stand for digits from some paragraphs of the manuscript.
- The frequency of special characters and words.
For the digit investigation, team members collect different known ways of writing digits and compare them with the words in the Voynich manuscript. For example, the Roman numerals are shown in Appendix section A.3, and a word extracted from the Voynich manuscript is shown in Appendix section A.4. The word in A.4 has the form ‘*##’, that is, one character followed by two identical characters. By the comparison method described above, this word may stand for ‘seven’ in Roman numerals (VII).
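The form test described above can be sketched in a few lines. The project itself used Matlab; the Python below is only an illustration, and the helper name `matches_form`, the requirement that the leading character differ from the repeated one, and the sample words are all assumptions made for this sketch.

```python
import re

def matches_form(word: str, repeat: int = 2) -> bool:
    """Return True if `word` is one character followed by `repeat`
    copies of a different character, e.g. 'aii' for repeat=2 ('*##')
    or 'aiii' for repeat=3 ('*###').

    Hypothetical helper: the report does not give its matching rule,
    so whole-word matching is assumed here.
    """
    # (.)      the leading character '*'
    # (?!\1)   the next character must differ from the leading one
    # (.)\2{n} the repeated character '#', `repeat` times in total
    pattern = r"(.)(?!\1)(.)\2{%d}" % (repeat - 1)
    return re.fullmatch(pattern, word) is not None

# Illustrative sample, not data taken from the transcription file.
sample_words = ["aii", "dee", "daiin", "ooo", "aiii"]
print([w for w in sample_words if matches_form(w, repeat=2)])
print([w for w in sample_words if matches_form(w, repeat=3)])
```

Matching on substrings instead of whole words would only need `re.search` in place of `re.fullmatch`; which of the two the original Matlab code used is not stated.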
Phase 2: Illustrations Investigation
An illustration extracted from the Voynich manuscript is shown in Appendix section A.5.
In this phase, illustrations are analysed using Matlab. Three tasks need to be completed:
- Count the different elements in the illustrations.
- Find the characters which may stand for digits.
- Match the characters and digits.
Phase 3: Marginal Symbols Research
A page which contains marginal symbols is shown in Appendix section A.6.
This phase also requires proficiency in Matlab programming. It covers four major tasks:
- Record the ordering and quantitative features of the marginal symbols on each page.
- Search for the characters which may stand for digits.
- Identify the differences between the marginal symbols on each page.
- Match the characters and digits and make inferences about the relationship between them.
Project Management
Deliverables
As shown in Table 1, deliverables involve eleven parts.
Work Breakdown
The tasks are detailed in Appendix section A.2. The key tasks involve three aspects:
- Text investigation (digits).
- Illustrations research.
- Marginal symbols investigation.
Timeline
The project timeline has six parts. The details are shown in Table 2.
Task Allocation
Task allocation is divided into six parts:
Management Strategy
Team members are managed through at least two internal meetings every week and at least one meeting with the supervisors every fortnight. Preparation for each meeting covers three items:
- Achievements in the past two weeks.
- Questions about the work of the past two weeks.
- Plan for next two weeks.
After each meeting, there are two tasks:
- Meeting review.
- Code modification.
Budget
The budget involves four points:
- 500 Australian dollars for team members.
- Further research may need to be carried out.
- All programs that need to be used are available on the university system.
- All major work can be done on a computer.
Risk Analysis
Details of the risk analysis are shown in Table 4.
Mismanagement of Time
Because of other commitments in daily life, time may be mismanaged. Each member should therefore plan their time in advance to avoid clashes.
Loss of Data or Files
During the project, accidents may occur, such as lost code or failed file storage. To guard against this, team members should keep backup files on two or more USB flash drives.
Team Member Withdrawal
To avoid this, team members should keep in frequent contact with each other.
Lack of References
As mentioned before, references on the Voynich manuscript are limited, so members should widen the scope of their search, using Bing, Grammarly and other websites.
Health Issues
Members should keep regular working hours and take breaks to prevent health problems.
Results
Phase 1: Text Investigation
As introduced in section 5.4, ‘Text investigation’ is a cooperative task.
The Total Number of Words
In this stage, Matlab is used to count the total number of words in the Voynich manuscript. The results are shown in Table 5.
According to Table 5, the Voynich manuscript contains 234507 characters and 37104 words in total, of which 8486 words are unique. The average number of characters per word is 6.32.
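As a rough illustration of how such totals can be obtained, the Python sketch below counts characters, words and unique words in a transcription string. It is not the project's Matlab code: the function name `word_statistics`, the use of whitespace as the word separator and the sample string are assumptions, and the report does not say whether its character total includes separators (this sketch counts letters only). The reported average of 6.32 is consistent with 234507 / 37104.

```python
def word_statistics(text: str) -> dict:
    """Summarise a transcription given as whitespace-separated words.

    Hypothetical helper: a real EVA transcription file would first
    need its own separators and annotations converted to plain words.
    """
    words = text.split()
    total_chars = sum(len(w) for w in words)  # letters only, no spaces
    return {
        "total_characters": total_chars,
        "total_words": len(words),
        "unique_words": len(set(words)),
        "average_word_length": total_chars / len(words) if words else 0.0,
    }

# Illustrative sample, not an extract from the Takahashi transcription.
sample = "daiin ol chedy daiin qokeedy ol"
print(word_statistics(sample))
```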
The Frequency of Words
In this stage, Matlab is used to count the frequency of words in the Voynich manuscript, and statistics are used to analyse the characteristics of the manuscript. This stage is divided into three parts:
- The frequency and the number of single letters.
- The frequency of words.
- Comparing the Voynich manuscript with other known languages.
The Frequency and the Number of Single Letters
The results are shown in Figure 1, Figure 2 and Figure 3.
As shown in the figures above, the frequencies of the single letters ‘b’, ‘j’, ‘u’ and ‘w’ are zero, which means these letters never appear in the transcription of the Voynich manuscript. The letter with the highest frequency (0.133) is ‘o’.
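A letter-frequency count of this kind can be sketched as follows. This is an illustrative Python version rather than the project's Matlab code; the function name `letter_frequencies` and the sample text are assumptions, and only the 26 lower-case Latin letters used by the EVA transcription are counted.

```python
from collections import Counter
import string

def letter_frequencies(text: str) -> dict:
    """Relative frequency of each letter a-z in `text`.

    Letters that never occur are kept with frequency 0.0 so that
    absent letters are easy to list.
    """
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    return {c: (counts[c] / total if total else 0.0)
            for c in string.ascii_lowercase}

# Illustrative sample, not an extract from the transcription.
freqs = letter_frequencies("daiin ol chedy")
absent = [c for c, f in freqs.items() if f == 0.0]
most_common = max(freqs, key=freqs.get)
print(most_common, round(freqs[most_common], 3))
print("absent letters:", "".join(absent))
```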
The Frequency of Words
The results are shown in Figure 4.
In Figure 4, the x axis shows the words in the manuscript and the y axis shows their frequency. Because there are 8486 unique words in the Voynich manuscript, the x axis of Figure 4 cannot label every word. To analyse the high-frequency words accurately, we extract the 100 most frequent words. The results are shown in Figure 5.
As shown in Figure 5, the line trends downward and then levels off, which means the frequencies of the last few words are very low. The x axis still cannot label every word, so we extract the 20 most frequent words. The results are shown in Figure 6 and Figure 7.
From the figures above, the word with the highest frequency (0.022) is ‘daiin’.
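The ranking behind Figures 4 to 7 can be reproduced in outline by sorting words by relative frequency and keeping the top n. The Python below is a minimal sketch under that assumption; `top_words` and the sample list are hypothetical and do not come from the project's code.

```python
from collections import Counter

def top_words(words, n=20):
    """Return the n most frequent words with their relative frequency,
    in descending order of frequency."""
    counts = Counter(words)
    total = len(words)
    return [(w, c / total) for w, c in counts.most_common(n)]

# Illustrative sample, not an extract from the transcription.
sample = ["daiin", "ol", "daiin", "chedy", "ol", "daiin"]
for word, freq in top_words(sample, n=2):
    print(f"{word}: {freq:.3f}")
```

Calling `top_words(words, n=100)` and `top_words(words, n=20)` on the full word list would give the data for the two zoomed-in plots.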
Comparing the Voynich Manuscript with Other Known Languages
As introduced in section 1.1, the Voynich manuscript was found in 1912. During the 17th and 18th centuries, the most commonly used languages were Latin, English, French and German [15]. In this section, the project team therefore collects reference data on letter frequencies in those four languages and compares the Voynich manuscript with them [16].
Part 1: The Voynich versus Latin.
The letter occurrence frequencies in the Voynich manuscript and in Latin are shown in Figure 8 and Figure 9.
For ease of analysis, Figure 8 and Figure 9 are converted into proportions, as shown in Figure 10.
According to Figure 10, the commonly used letters in Latin are all shown as capitals. Because the Takahashi edition is a transcription of the Voynich manuscript, the letter ‘o’ in the Takahashi edition does not necessarily mean ‘o’; it only looks like ‘o’ in the manuscript. The correlation between the Voynich and Latin letter frequencies is therefore calculated, and the result is 98.60%. On this basis, the ‘o’ in the Takahashi edition may stand for ‘I’ in Latin. In the same way, there are potential relationships between ‘e’ and ‘E’, ‘h’ and ‘A’, and ‘y’ and ‘U’.
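The report does not state how the correlation was computed. One plausible reading, sketched below in Python, is to sort both frequency tables in descending order, pair the letters by rank, and take the Pearson correlation of the two ranked frequency lists. The function names, the rank-pairing step and the toy frequencies are all assumptions made for illustration; they are not the project's Matlab code or its measured data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def compare_ranked_frequencies(freq_a: dict, freq_b: dict):
    """Pair letters of two alphabets by frequency rank and correlate
    the ranked frequencies (assumed method, see lead-in)."""
    ranked_a = sorted(freq_a.items(), key=lambda kv: kv[1], reverse=True)
    ranked_b = sorted(freq_b.items(), key=lambda kv: kv[1], reverse=True)
    k = min(len(ranked_a), len(ranked_b))  # compare the common ranks only
    pairs = [(ranked_a[i][0], ranked_b[i][0]) for i in range(k)]
    r = pearson([v for _, v in ranked_a[:k]], [v for _, v in ranked_b[:k]])
    return r, pairs

# Toy values chosen for illustration only, not measured frequencies.
voynich = {"o": 0.40, "e": 0.30, "y": 0.20, "h": 0.10}
latin = {"I": 0.20, "E": 0.15, "U": 0.10, "A": 0.05}
r, pairs = compare_ranked_frequencies(voynich, latin)
print(f"correlation = {r:.2%}", pairs)
```

Because any two lists sorted in descending order tend to correlate strongly, a high value obtained this way says more about the shape of the two distributions than about a letter-by-letter correspondence.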
Part 2: The Voynich versus English.
The occurrence frequencies of letters in the Voynich manuscript and English are shown in Figure 11 and Figure 12.
The proportions are shown in Figure 13.
According to Figure 13, the correlation between the Voynich manuscript and English is calculated, and the result is 97.76%.
To examine the correlation between the Voynich manuscript and English more closely, the next step is to compare the Voynich manuscript with books written in English; the results are shown in Part 5.
Part 3: The Voynich versus French.
The occurrence frequencies of letters in the Voynich manuscript and French are shown in Figure 14 and Figure 15.
The proportions are shown in Figure 16.
According to Figure 16, the correlation between the Voynich manuscript and French is 98.11%.
Although the analysis above shows some similarities between the Voynich manuscript and French, there are many more differences. For example, as shown in Figure 16, French has 38 letters in total, whereas the Voynich manuscript has only 24.
Part 4: The Voynich versus German.
The occurrence frequencies of letters in the Voynich manuscript and German are shown in Figure 17 and Figure 18.
The proportions are shown in Figure 19.
According to Figure 19, the correlation between the Voynich manuscript and German is 95.86%.
Although the analysis above shows some similarities between the Voynich manuscript and German, there are some differences. For example, as shown in Figure 19, German has 30 letters in total, whereas the Voynich manuscript has only 24.
To examine the correlation between the Voynich manuscript and German more closely, the next step is to compare the Voynich manuscript with books written in German; the result is shown in Part 5.
Part 5: Comparing the Voynich with other books which are written in the known languages.
To check the accuracy of the results, the project team selected some literary classics written in English, French and German and compared the Voynich manuscript with those books. For a fair comparison, the same number of words was extracted from every book. The results are shown in Figure 20, Figure 21 and Figure 22.
To make the comparison easier, line charts are drawn to show the potential relationships between the Voynich manuscript and these books. The results are shown in Figure 23, Figure 24 and Figure 25.
Figure 23 shows the ratio of unique words to total words. There is a large difference between the Voynich manuscript and the English books (47.9%) and the French books (27.7%). The difference between the Voynich manuscript and the German books is much smaller (13.6%).
Figure 24 shows the word length in the Voynich manuscript, English, French and German. The difference in word length between the Voynich manuscript and English (6.7%) or French (6.0%) is small, and the difference between the Voynich manuscript and German is negligible (0.1%).
Figure 25 compares the ratio of words that appear more than once to total unique words. The differences between the Voynich manuscript and English (41.0%), French (38.9%) and German (22.8%) are all large, but the difference from the German books is the smallest of the three.
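The three measures compared in these figures can each be computed from a plain word list. The Python sketch below shows one way to do so; the function name `lexical_profile`, the dictionary keys and the sample words are assumptions, and the formula the report used to turn two profiles into a percentage difference is not stated, so it is not reproduced here.

```python
from collections import Counter

def lexical_profile(words):
    """Compute three simple lexical measures for a non-empty word list:
    unique words / total words, mean word length, and the share of
    unique words that occur more than once."""
    counts = Counter(words)
    total = len(words)
    unique = len(counts)
    repeated = sum(1 for c in counts.values() if c > 1)
    return {
        "unique_ratio": unique / total,
        "mean_word_length": sum(len(w) for w in words) / total,
        "repeated_ratio": repeated / unique,
    }

# Illustrative sample, not an extract from any of the compared texts.
print(lexical_profile(["daiin", "ol", "daiin", "chedy"]))
```

Running the same function on equal-sized samples of each book, as the project team did, keeps the unique-word ratio comparable, since that ratio falls as a sample grows.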
In addition, as analysed in Part 4 of section 6.1.2.3, the correlation between the Voynich manuscript and German is high (95.86%). The language used in the Voynich manuscript may therefore be a branch of German, and there are potential relationships between the Voynich manuscript and German.
Digits
As introduced in section 1.1, the Voynich manuscript was found in 1912. During the 17th and 18th centuries, the most commonly used way of writing digits was Roman numerals [14]. The Roman numerals are shown in Appendix section A.3, and the comparison method is introduced in section 4.1.
The results of searching for words of the form ‘*##’ in the Voynich manuscript are shown in Figure 26.
As shown in Figure 26, the words of the form ‘*##’ in the Voynich manuscript are ‘aii’, ‘dee’, ‘kee’, ‘lee’, ‘oee’, ‘oii’, ‘qee’, ‘qoo’, ‘ree’, ‘see’, ‘tee’ and ‘yee’. The x axis shows the occurrence count of each word. The most common word is ‘aii’, which occurs 563 times, and the least common is ‘ree’, which occurs twice. The y axis shows the position of each occurrence. For example, the position of the 563rd ‘aii’ is 36821, which means it is the 36821st word in the Voynich manuscript.
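The two quantities plotted in Figure 26, the running occurrence count and the word position, can be derived from a single pass over the word list. The Python sketch below illustrates this; `word_positions` and the sample list are hypothetical, and positions are counted from 1 to match the report's wording.

```python
def word_positions(words, target):
    """Return the 1-based positions at which `target` occurs in `words`.

    The k-th element of the result is the position of the k-th
    occurrence, so the length of the list is the occurrence count.
    """
    return [i for i, w in enumerate(words, start=1) if w == target]

# Illustrative sample, not an extract from the transcription.
sample = ["aii", "dee", "aii", "kee", "aii"]
positions = word_positions(sample, "aii")
print(len(positions), positions)
```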
Following the analysis above, ‘aii’ may stand for ‘seven’ in Roman numerals (VII).
The words of the form ‘*###’ are then extracted by the same method. The results are shown in Figure 27.
From Figure 27, ‘aiii’ has the highest occurrence frequency, so ‘aiii’ may mean ‘eight’ in Roman numerals (VIII). The exact data are shown in Appendix section A.7.
These triple words are then compared with triple words from known languages: English, German and Russian. The results are shown in Table 6, Table 7 and Table 8.
According to the tables above, triple ‘l’ and triple ‘s’ are the most common in English, while triple ‘e’ and triple ‘i’ are the most common in the Voynich manuscript. It can therefore be inferred that there may be relationships between ‘l’ and ‘s’ in English and ‘e’ and ‘i’ in the Voynich manuscript.
Table 7 shows that triple ‘e’ is the most common in German. Compared with the Voynich manuscript, there may be a relationship between ‘e’ in German and ‘e’ and ‘i’ in the Voynich manuscript.
Moreover, triple ‘o’ is the most common in Russian, so there may be relationships between ‘o’ in Russian and ‘e’ and ‘i’ in the Voynich manuscript.
Phase 2: Illustrations investigation
This phase includes three parts: statistics for the illustrations on each page, digit mining and conclusion. According to the task allocation in section 5.4, this part was completed by Yaxin Hu.
Searching initial numbers and possible numerical words inside images
The first part of this section is to find all initial numbers inside the images of the whole Voynich manuscript. Part of the list of initial numbers is given below:
To compare and map the initial numbers against the Voynich manuscript text, all words that may stand for numbers are collected. Part of the list of possible words is given below:
Mapping all initial numbers and numerical words
Comparing the initial numbers with the possible words reveals some potential relationships. For example, ‘s’ and ‘2’ often appear on the same page (54 pairs), as do ‘o’ and ‘1’ (24 pairs) and ‘ol’ and ‘10’ (14 pairs). To simplify the comparison, a mapping between initial numbers and possible words is made to show whether there is any relationship between them. Part of the list of mapping pairs is given below; the rest is shown in Appendix section A.8.
To make the more likely relationships easier to find, we select the most frequent pairs for each word and make a new list, part of which is shown below; the rest is shown in Appendix section A.9.
It can easily be seen that ‘o’ and ‘1’ appear together 24 times. There are also many co-occurrences of ‘ol’ and ‘10’ (14 times), ‘ol’ and ‘13’ (12 times), ‘ol’ and ‘12’ (11 times), ‘or’ and ‘10’ (19 times), ‘or’ and ‘12’ (13 times), ‘or’ and ‘13’ (12 times), and ‘os’ and ‘19’ (11 times). Since each of these words begins with ‘o’ and each of these numbers begins with the digit 1, there is a potential relationship between ‘o’ in the Voynich manuscript and the number ‘1’.
Furthermore, ‘r’ appears with ‘1’ 48 times, with ‘2’ 26 times and with ‘3’ 21 times; ‘s’ appears with ‘2’ 54 times, with ‘1’ 46 times, with ‘3’ 41 times and with ‘5’ 32 times; and ‘y’ appears with ‘2’ 36 times, with ‘1’ 30 times, with ‘3’ 29 times and with ‘5’ 20 times. There may be potential relationships among them, which need further investigation.
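The pair counts quoted above can be thought of as page-level co-occurrence counts. The Python sketch below shows one way to compute them, assuming each page is represented by the list of numbers found in its images and the list of candidate words found in its text. The function name `cooccurrence_counts`, the page identifiers and the sample data are invented for illustration, and the rule of counting each pair at most once per page is an assumption.

```python
from collections import Counter
from itertools import product

def cooccurrence_counts(pages):
    """Count, for every (word, number) pair, the pages on which both occur.

    `pages` maps a page identifier to a tuple (numbers, words).
    Each pair is counted at most once per page (assumed rule).
    """
    counts = Counter()
    for numbers, words in pages.values():
        for pair in product(set(words), set(numbers)):
            counts[pair] += 1
    return counts

# Hypothetical pages, not data taken from the manuscript.
pages = {
    "page_a": ([1, 2], ["o", "s"]),
    "page_b": ([2], ["s"]),
    "page_c": ([1, 10], ["o", "ol"]),
}
counts = cooccurrence_counts(pages)
print(counts[("s", 2)], counts[("o", 1)], counts[("ol", 10)])
```

Sorting the counter with `counts.most_common()` and keeping the top entries per word would give a list like the one in Appendix section A.9.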
For ease of comparison, all the possible pairs are listed below:
Phase 3: Marginal symbol research
According to the section 4.3, this phase is divided into three parts: statistics for marginal stars of each page, digits mining and conclusion. According to the section 5.4 task allocation, this phase is completed by Ruihang Feng.
Statistics for marginal stars of each page
There are 15 pages which involve marginal stars in the Voynich manuscript. As described in section 4.3, an example is shown in Appendix section A.6. The results of this part are shown in Appendix section A.10.
From A.10, we can see that there are two kinds of marginal stars in the Voynich manuscript: white stars and coloured stars. A.10 also gives detailed information about the number of stars, their arrangement and their location in the text.
Digits analysis
In this phase, the number of marginal stars on each page is counted first. Then the letters which may stand for digits are extracted. An example (page number: f58r) is shown in Figure 28.
On this page there are 3 white stars (according to Appendix section A.10), and the single letters which may stand for digits are m, o, r and s. All 25 pages are then counted in this way.
As a result, these 25 pages involve 16 different digits: 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17 and 19. Some of them are the total number of stars on a page; others are the number of white stars or the number of coloured stars on a page. The detailed information is shown in Appendix section A.10.
The results of this phase are shown in Appendix section A.11. The first column lists the 16 digits, with the number of pages which involve each digit given in brackets (for example, ‘3 pages’ next to the digit ‘5’ means that 3 pages involve ‘5’). The red marks indicate the letters with the highest occurrence frequency. The second column lists the pages which involve each digit, and the last column lists the letters which may stand for digits.
Conclusion
According to sections 6.3.1 and 6.3.2, the conclusion of the phase ‘marginal symbol research’ is shown in Table 19.
The letters in the first column are extracted according to the red marks in Appendix section A.11. The fourth column gives the occurrence frequency of each letter-digit pairing. For example, the occurrence frequency of y=5 is 3/18 = 16.67%, where ‘3’ is the number of pages which involve ‘y=5’ (according to Appendix section A.10) and ‘18’ is the number of pages which involve ‘y’.
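The occurrence frequency in the fourth column is a simple ratio, which the following Python sketch makes explicit. The function name `pairing_frequency` is an assumption; the numbers 3 and 18 are the ones quoted in the worked example above.

```python
def pairing_frequency(pages_with_pair: int, pages_with_letter: int) -> float:
    """Percentage of the pages containing a letter on which the letter
    also coincides with a given digit."""
    if pages_with_letter <= 0:
        raise ValueError("the letter must occur on at least one page")
    return 100.0 * pages_with_pair / pages_with_letter

# Worked example from the text: 'y=5' on 3 of the 18 pages containing 'y'.
print(f"{pairing_frequency(3, 18):.2f}%")  # prints 16.67%
```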
As a result, according to the results above, the most likely potential relationships are between:
- ‘y’ and ‘7’
- ‘l’ and ‘7’
- ‘r’ and ‘5’
- ‘s’ and ‘6’
- ‘o’ and ‘1’
- ‘o’ and ‘6’
- ‘ar’ and ‘13’
- ‘al’ and ‘13’
- ‘or’ and ‘13’
- ‘ol’ and ‘13’
- ‘am’ and ‘12’
- ‘am’ and ‘19’
- ‘dy’ and ‘14’
- ‘dy’ and ‘19’
- ‘om’ and ‘16’
Comment on progress
Over the past two semesters, the project progressed normally. Although we met many problems in the course of the project, such as limited references and Matlab code errors, we adjusted and modified our original plan in time. As a result, the overall project schedule was not greatly affected.
In general, we finished this project on time and reached the expected goals.
Conclusion
This project is divided into three phases: text investigation, illustration research and marginal symbol investigation. The major work of the project can be done on a computer.
The goals of the project involve three parts:
- Use statistical methods in Matlab to search for linguistic patterns in the Voynich manuscript.
- Search for patterns in the illustrations from the perspective of digits.
- Investigate patterns in the marginal symbols from the perspective of digits.
Over the past two semesters, all phases have been finished. From the analysis in section 6.1, we can infer that the language used in the Voynich manuscript may be a branch of German.
In addition, combining sections 6.2 and 6.3 gives the results of the digit analysis:
References
[1] R. Zandbergen (2016). The Voynich MS-Introduction [Online]. Available: http://www.voynich.nu/intro.html
[2] Kevin Knight, Sravana Reddy, What We Know About The Voynich Manuscript [Online]. Available: http://www.isi.edu/natural-language/people/voynich-11.pdf
[3] Stojko, John, Letters to God’s Eye: The Voynich Manuscript for the first time deciphered and translated into English. New York: Vantage Press, 1978.
[4] Joachim Dathe, The EVA-Transcription [Online]. Available: https://voynich2arabic.wordpress.com/eva-transcription/
[5] Vladimir Sazonov, Voynich Manuscript [Online]. Available: http://voynich.naobum.de/
[6] Reed Johnson (2013, July 9), The Unread: The Mystery Of The Voynich Manuscript [Online]. Available: http://www.newyorker.com/books/page-turner/the-unread-the-mystery-of-the-voynich-manuscript
[7] R. Zandbergen (2016), History of research of the Voynich MS [Online]. Available: http://www.voynich.nu/solvers.html#n01
[8] Mary E. D’Imperio, An Application of Cluster Analysis and Multiple Scaling to the Question of "Hands" and "Languages" in the Voynich Manuscript. Washington, DC, 1992.
[9] Pelling, Nicholas, The curse of the Voynich; the secret history of the world's most mysterious manuscript, Compelling Press, Surbiton, 2006.
[10] Bennett, William Ralph, Scientific and Engineering Problem Solving with the Computer. Englewood Cliffs: Prentice-Hall, 1976.
[11] Tiltman, John, “The Voynich Manuscript, The Most Mysterious Manuscript in the World”. NSA Technical Journal 12 (July 1967), pp.41-85.
[12] Feely, Joseph M, Roger Bacon's Cipher: The Right Key Found, Rochester, 1943.
[13] D'Imperio, Mary E, The Voynich Manuscript - an elegant enigma, Aegean Park Press, 1978.
[14] R. Zandbergen (2016), History of research of the Voynich MS [Online]. Available: http://www.voynich.nu/solvers.html#n43
[15] Wikipedia, Medieval Literature [Online]. Available: https://en.wikipedia.org/wiki/Medieval_literature#Languages
[16] Wikipedia, Letter frequency [Online]. Available: https://en.wikipedia.org/wiki/Letter_frequency
Appendix
A.1. FSG
A.2. Proposed Method
A.3. Roman Numeral
A.4. Words from the Voynich manuscript
A.5. Illustration from the Voynich manuscript
A.6. Marginal symbols from the manuscript
A.7. Digits ‘*###’
A.8. Digits ‘*###’
A.9. Mapping list for most frequent letters and numbers
A.10. Statistics for marginal stars of each page
A.11. Digits mining