Cracking the Voynich Code 2016 - Final Report


Acknowledgements

First of all, we want to express our gratitude to our supervisor, Professor Derek Abbott. With his help, we were able to complete the project on time, and throughout the project he always gave us kind suggestions. Secondly, we want to express our gratitude to our co-supervisor, Doctor Brian Ng, who provided many useful methods during the course of the project.


Abstract

The aim of this project is to crack the Voynich manuscript, a hand-written book in an unknown script. The text is variously considered to be an unknown language, a cipher or a hoax. This report presents methods for determining possible features of the Voynich manuscript. All of the methods are based on data mining, computer programming and statistics, and each is explained in detail. The report also describes how the project was managed. In the final part, some possible hypotheses are given on the basis of the whole investigation, which may provide a starting point for cracking the Voynich manuscript.

Project Introduction

Background

The Voynich manuscript is a document written in an unknown alphabet that was found by Wilfrid Voynich (1865-1930) in 1912 [1]. Over the manuscript’s long history, some pages have gone missing, and about 240 pages remain [2]. The folios of the manuscript are numbered from f1 to f116, and each folio consists of two pages, r (recto) and v (verso).

Since the book cannot be read, it is conventionally divided into six sections according to the style and content of the illustrations:

a) Herbal

Each page shows one or more plants, in the format of European herbals.

b) Astronomical

Circular diagrams with suns, moons and stars suggest that this part concerns astronomy or astrology.

c) Biological

The illustrations, mostly of naked women, suggest that this is a biological section.

d) Cosmological

Circular diagrams of an obscure nature give this section its name.

e) Pharmaceutical

Drawings of isolated plant parts and of objects resembling apothecary jars suggest that this section concerns pharmacy.

f) Recipes

This part consists of full pages of text in short paragraphs.

Generally, the Voynich manuscript is made up of three parts: text, illustrations and marginal symbols.

Aim

The aim of the project is to use statistics and comparison, from the perspective of digits, to infer whether the Voynich manuscript is a natural language, a constructed language, a cipher or a hoax.

In addition, the project aims to identify the initial digits of the Voynich manuscript and to determine which letters may stand for digits.

Because of the massive number of words and illustrations in the manuscript, solving the whole manuscript is beyond the scope of a one-year project.

Motivation

In the field of linguistics, the Voynich manuscript is a well-known unsolved problem. Researchers believe that some useful information is hidden in the mysterious script of the manuscript.

In this project, statistics and comparison are applied to the Voynich manuscript. If the manuscript can be cracked, the results of this project will help linguists study other unknown languages.

Significance

There are many conjectures about the Voynich manuscript. Because of the manuscript’s long history, many historians believe that its mysterious script is related to ancient civilizations [3]. If the manuscript can be cracked, it will help historians explore the culture of ancient societies.

In addition, the statistical methods used in this project are also useful in other fields, such as engineering, finance and architecture. Text comparison is likewise widely used in tools such as Turnitin, Google Translate, Grammarly and Bing.

Technical Background

The major technique applied in this project is data mining, an effective way of finding regularities in large amounts of data. The two data-mining methods used here are statistics and comparison. Statistics is used to count how often particular words and letters occur. Comparison is used to find relationships between two languages.

The European Voynich Alphabet (EVA) is a widely used scheme for the digital transcription of the Voynich manuscript [4]. Takahashi, a Japanese linguist, transcribed the whole Voynich manuscript in EVA [5].

Therefore, the main data for this project are extracted from Takahashi’s transcription.

In addition, other resources are considered, such as expressions from some representative ancient languages.

Knowledge gaps

Because the Voynich manuscript contains a massive amount of data, the project requires solid data-processing and programming skills; however, no one in the project team had previously dealt with so much data. Team members therefore needed to develop their data-processing and programming abilities.

The project also requires knowledge of statistics, so members must be adept at sorting and analysing data.

Technical challenges

The technical challenges of this project fall into two areas.

First, it is very difficult to infer which language the author used. The language of the manuscript does not belong to any known language [6], and it may be extinct. What is more, because of the manuscript’s long history, some important information, such as the identity of the author, cannot be found. It is therefore not possible to infer the language from the author’s nationality. To address this problem, members must use many different languages as references and compare them with the language of the manuscript.

Secondly, references on cracking the Voynich manuscript are limited. Because of the unknown language and the mysterious illustrations, it is difficult to crack the whole manuscript. Although researchers claim to have cracked a few words, no one can guarantee that the results are right, and there are no generally accepted results in the field of linguistics. Reliable references are therefore hard to find, so members must search a range of sources to collect enough accurate references.

Related Work (the history of Voynich manuscript research)

Over the years, many researchers have tried to crack the Voynich manuscript using different methods.

Mary E. D’Imperio:

In 1975, Mary E. D’Imperio was introduced to the problem of the Voynich manuscript by John Tiltman [7]. In the following years, she summed up different features of the Voynich manuscript text [8].

Nick Pelling:

Nick Pelling published his book ‘The Curse of the Voynich’ in 2006. Based on the illustrations in the rosettes folio of the Voynich manuscript, he argued that the manuscript originated in Milan [9].

William Ralph Bennett:

William Ralph Bennett, a Yale professor, studied the Voynich manuscript by computer, applying statistical methods to the text. He was probably the first to note the low entropy of the Voynich manuscript text. The only language he found with entropy similar to that of the Voynich manuscript was Hawaiian [10].

John Tiltman:

John Tiltman was a British intelligence specialist. He studied the text of the Voynich manuscript together with William Friedman. Tiltman and Friedman eventually suggested that the text of the manuscript was a kind of artificial (constructed) language [11].

Joseph Martin Feely:

Joseph Martin Feely was a Rochester lawyer. In 1943, he published a book proposing a solution to the Voynich manuscript. His solution replaced some words in the manuscript with Latin [12].

First study group:

The First Study Group (FSG) was founded in 1944 and dissolved in 1946 [13]. Its members included:

  • Robert A. Caldwell
  • G. E. McCracken
  • Tomas A. Miller
  • Frances Puckett, later Frances Wilbur
  • Mark Rhoads
  • William M. Seaman

Through the joint efforts of these researchers, the FSG transcribed most of the Voynich manuscript and devised a transcription alphabet [14]. The details of the transcription alphabet are shown in Appendix A.1.

Requirements

Although it is not necessary to crack the whole manuscript, the project has the following basic requirements:

  • Text investigation: find linguistic regularities in some paragraphs of the Voynich manuscript, such as the total number of words, the frequency of particular words and the frequency of particular single letters. The Voynich manuscript is then compared with other known languages.
  • Illustration research: look for regularities in some illustrations from the perspective of digits, such as statistics for the illustrations on each page and digit analysis.
  • Marginal symbols investigation: make a thorough study of the marginal symbols from the perspective of digits.
  • The code runs smoothly.
  • Evaluation of the results.
  • Make some assumptions that are helpful for further research.

Proposed Method

As shown in Appendix A.2, the proposed method of this project is divided into three phases.

Phase 1: Text Investigation

There are two parts in this phase: words and digits.

For the word research, Matlab is used as the essential tool. Team members look for regularities in three aspects:

  • The total number of words in the Voynich manuscript.
  • The characters and words which may stand for digits in some paragraphs of the manuscript.
  • The frequency of particular characters and words.

For the digit investigation, team members look for different known ways of expressing digits and compare them with the words in the Voynich manuscript. For example, Roman numerals are shown in Appendix A.3. The word shown in Appendix A.4 is extracted from the Voynich manuscript; its form is ‘*##’, that is, one character followed by two identical characters. By comparison with Roman numerals, this word may mean ‘seven’ (VII).
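The ‘*##’ form can be checked mechanically. The report's Matlab code is not reproduced here, so the following is only a minimal Python sketch. The function name `matches_star_hash` is hypothetical, and the requirement that the leading character differ from the repeated one is an assumption based on the examples given in this report.

```python
import re

def matches_star_hash(word: str, repeats: int = 2) -> bool:
    """Return True if `word` is one character followed by `repeats`
    copies of a single, different character (e.g. 'aii' for repeats=2).

    Assumption: the leading '*' character must differ from the
    repeated '#' character; the report does not state this explicitly.
    """
    # (.) is the leading character, (.) the repeated one, and \2{n}
    # demands n further copies of the second group.
    match = re.fullmatch(r"(.)(.)\2{%d}" % (repeats - 1), word)
    return match is not None and match.group(1) != match.group(2)

# Hypothetical sample of EVA words, for illustration only.
sample = ["daiin", "aii", "dee", "chol", "aiii"]
three_letter_hits = [w for w in sample if matches_star_hash(w)]
four_letter_hits = [w for w in sample if matches_star_hash(w, repeats=3)]
```

Setting `repeats=3` gives the corresponding check for the ‘*###’ form.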

Phase 2: Illustrations Investigation

An illustration extracted from the Voynich manuscript is shown in Appendix A.5.

In this phase, illustrations are analysed using Matlab. There are three tasks:

  • Count the number of different elements in the illustrations.
  • Find the characters which may stand for digits.
  • Match the characters and digits.


Phase 3: Marginal Symbols Research

A page which contains marginal symbols is shown in Appendix A.6.

This phase also requires proficiency in Matlab programming. There are four major aspects:

  • The ordering and quantitative features of the marginal symbols on each page.
  • Search for the characters which may stand for digits.
  • The differences between the marginal symbols on each page.
  • Match the characters and digits and infer the relationship between them.

Project Management

Deliverables

As shown in Table 1, the deliverables comprise eleven parts.

Table 1. Deliverables

Work Breakdown

The details of the tasks are shown in Appendix A.2. The key tasks are:

  • Text investigation (digits).
  • Illustrations research.
  • Marginal symbols investigation.


Timeline

The timeline of the project has six parts. The details are shown in Table 2.


Table 2. Timeline

Task Allocation

Task allocation is divided into six parts:

Table 3. Task allocation

Management Strategy

The team is managed through a minimum of two internal meetings every week and a minimum of one meeting with the supervisors every fortnight. The preparation for each meeting covers three items:

  • Achievements in the past two weeks.
  • Questions about the work of the past two weeks.
  • Plan for the next two weeks.

After each meeting, there are two tasks:

  • Meeting review.
  • Code modification.

Budget

The budget involves four aspects:

  • 500 Australian dollars for team members.
  • Research need to be carried out further research.
  • All programs that need to be used are available on the university system.
  • All major work can be done by computer.


Risk Analysis

Details of the risk analysis are shown in Table 4.

Table 4. Risk analysis

Mismanagement of Time

Because of other commitments in daily life, time may be mismanaged. Each member should therefore plan their time in advance to avoid clashes.

Loss of Data or Files

During the project, accidents may occur, such as lost code or a file storage failure. To avoid this, team members should keep backup files on two or more USB flash drives.

Team Member Quitting

In order to avoid this case, team members should keep in frequent contact with each other.

Lack of References

As mentioned before, references on the Voynich manuscript are limited. Members should therefore widen the scope of their search, using tools such as Bing, Grammarly and other websites.

Health Issues

Members should keep regular hours of work and rest to prevent health problems.


Results

Phase 1: Text Investigation

As described in Section 5.4, ‘Text investigation’ is a cooperative task.

The Total Number of Words

In this stage, Matlab is used to count the total number of words in the Voynich manuscript. The results are shown in Table 5.

Table 5. Total counts for the Voynich manuscript

According to Table 5, the Voynich manuscript contains 234507 characters and 37104 words, of which 8486 are unique. The average number of characters per word is 6.32.
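The counts in Table 5 are simple aggregates over the transcription. As an illustration only (the report used Matlab, and its code is not shown), a Python sketch of the same quantities might look as follows. The function name `text_statistics` is hypothetical, and counting characters inside words only is an assumption, since the report does not say how characters were counted.

```python
def text_statistics(words):
    """Basic counts for a list of transcribed words.

    Assumption: characters are counted inside words only (no spaces
    or markup); the report does not state its counting rule.
    """
    total_words = len(words)
    total_chars = sum(len(w) for w in words)
    return {
        "total_characters": total_chars,
        "total_words": total_words,
        "unique_words": len(set(words)),
        "mean_word_length": total_chars / total_words if total_words else 0.0,
    }

# Hypothetical five-word sample, for illustration only.
stats = text_statistics(["daiin", "ol", "daiin", "chedy", "aii"])

# Consistency check on the reported totals: characters / words.
reported_mean = round(234507 / 37104, 2)
```

The last line reproduces the reported average from the reported totals: 234507 / 37104 rounds to 6.32.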

The Frequency of Words

In this stage, Matlab is used to count the frequency of words in the Voynich manuscript, and statistics is used to analyse the characteristics of the manuscript. This stage is divided into three parts:

  • The frequency and the number of single letters.
  • The frequency of words.
  • Comparing the Voynich manuscript with other known languages.

The Frequency and the Number of Single Letters

The results are shown in Figure 1, Figure 2 and Figure 3.

Figure 1. Letter Frequency
Figure 2. Letter Frequency
Figure 3. Letter Frequency

As shown in the figures above, the frequencies of the single letters ‘b’, ‘j’, ‘u’ and ‘w’ are zero, which means that these letters never appear in the transcription of the Voynich manuscript. The letter with the highest frequency (0.133) is ‘o’.
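A letter-frequency table of this kind, including the letters that never occur, can be produced in a few lines. The sketch below is in Python rather than the Matlab used in the project, and the names `letter_frequencies` and `absent_letters` are hypothetical. It assumes the transcription uses plain lowercase Latin letters, as EVA does for its basic characters.

```python
from collections import Counter
import string

def letter_frequencies(text):
    """Relative frequency of each letter a-z in `text`.

    Letters that never occur get frequency 0.0, so absent letters
    are visible in the result.
    """
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    return {c: (counts[c] / total if total else 0.0)
            for c in string.ascii_lowercase}

def absent_letters(freqs):
    """Letters whose frequency is exactly zero."""
    return [c for c, f in freqs.items() if f == 0.0]

# Hypothetical three-word sample, for illustration only.
freqs = letter_frequencies("daiin ol chedy")
missing = absent_letters(freqs)
```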

The Frequency of Words

The results are shown in Figure 4.

Figure 4. Total Word Frequency

In Figure 4, the x axis shows the words in the manuscript and the y axis shows their frequency. Because there are 8486 unique words in the Voynich manuscript, the x axis in Figure 4 cannot label every word. To analyse the high-frequency words more closely, we extract the 100 most frequent words. The results are shown in Figure 5.

Figure 5. Top 100 words

As shown in Figure 5, the curve falls steadily and then flattens, which means that the frequencies of the later words are very low. The x axis still cannot label every word, so we extract the 20 most frequent words. The results are shown in Figure 6 and Figure 7.

Figure 6. Top 20 words
Figure 7. Top 20 words

From the figures above, the word with the highest frequency (0.022) is ‘daiin’.
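Restricting the plot to the most frequent words amounts to ranking words by count and keeping the first n. A minimal Python sketch of that step is given below; `top_words` is a hypothetical name and the sample data are invented for illustration.

```python
from collections import Counter

def top_words(words, n=20):
    """Return the n most frequent words with their relative frequency,
    most frequent first."""
    counts = Counter(words)
    total = len(words)
    return [(word, count / total) for word, count in counts.most_common(n)]

# Hypothetical sample, for illustration only.
sample_words = ["daiin", "ol", "daiin", "chedy"]
ranking = top_words(sample_words, n=2)
```

Calling `top_words(words, 100)` and `top_words(words, 20)` on the full word list would give the data behind plots like Figure 5 and Figures 6 and 7.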

Comparing the Voynich Manuscript with Other Known Languages

As described in Section 1.1, the Voynich manuscript was found in 1912. During the 17th and 18th centuries, the most commonly used languages were Latin, English, French and German [15]. In this section, the project team therefore takes reference letter frequencies for those four languages and compares the Voynich manuscript with each of them [16].

Part 1: The Voynich versus Latin.

The occurrence frequencies of letters in the Voynich manuscript and in Latin are shown in Figure 8 and Figure 9.

Figure 8. Frequency of Letters in Voynich
Figure 9. Frequency of Letters in Latin

For ease of analysis, Figure 8 and Figure 9 are converted to proportions, as shown in Figure 10.

Figure 10. Comparison between Voynich & Latin

In Figure 10, the commonly used Latin letters are all shown as capitals. Because the Takahashi edition is a transcription of the Voynich manuscript, the letter ‘o’ in the Takahashi edition does not necessarily mean ‘o’; it only looks like ‘o’ in the Voynich manuscript. The correlation between the Voynich manuscript and Latin is therefore calculated, and the result is 98.60%. On this basis, ‘o’ in the Takahashi edition may stand for ‘I’ in Latin. In the same way, there are potential relationships between ‘e’ and ‘E’, ‘h’ and ‘A’, and ‘y’ and ‘U’.
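The report does not state how the correlation was computed. One plausible reading, offered here only as an assumption, is a Pearson correlation between the two frequency profiles after each is sorted from most to least frequent, with letters then paired by rank. The Python sketch below illustrates that reading; the function names and the sample frequencies are hypothetical and are not the project's data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def rank_profile_correlation(freq_a, freq_b):
    """Correlate two letter-frequency profiles by rank.

    Assumption: each profile is sorted in descending order and the
    shorter length is used, so alphabets of different size compare.
    """
    a = sorted(freq_a.values(), reverse=True)
    b = sorted(freq_b.values(), reverse=True)
    k = min(len(a), len(b))
    return pearson(a[:k], b[:k])

def rank_pairs(freq_a, freq_b):
    """Pair letters of the two alphabets by frequency rank."""
    a = sorted(freq_a, key=freq_a.get, reverse=True)
    b = sorted(freq_b, key=freq_b.get, reverse=True)
    return list(zip(a, b))

# Invented toy profiles, for illustration only.
voynich_like = {"o": 0.5, "e": 0.3, "y": 0.2}
latin_like = {"I": 0.2, "E": 0.5, "A": 0.3}
r = rank_profile_correlation(voynich_like, latin_like)
pairs = rank_pairs(voynich_like, latin_like)
```

Under this reading a high correlation only says that the two rank-ordered frequency curves have a similar shape, so any letter pairing it suggests is a hypothesis rather than a decipherment.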

Part 2: The Voynich versus English.

The occurrence frequencies of letters in the Voynich manuscript and in English are shown in Figure 11 and Figure 12.

Figure 11. Frequency of Letters in Voynich
Figure 12. Frequency of Letters in English

The proportions are shown in Figure 13.

Figure 13. Comparison between Voynich & English

According to Figure 13, the correlation between the Voynich manuscript and English is 97.76%.

To examine the correlation between the Voynich manuscript and English more closely, the next step is to compare the Voynich manuscript with books written in English; the results are shown in Part 5.

Part 3: The Voynich versus French.

The occurrence frequencies of letters in the Voynich manuscript and in French are shown in Figure 14 and Figure 15.

Figure 14. Frequency of Letters in Voynich
Figure 15. Frequency of Letters in French

The proportions are shown in Figure 16.

Figure 16. Comparison between Voynich & French

According to Figure 16, the correlation between the Voynich manuscript and French is 98.11%.

Although the analysis above shows some similarities between the Voynich manuscript and French, there are many more differences. For example, as shown in Figure 16, there are 38 letters in total in French but only 24 letters in the Voynich manuscript.

Part 4: The Voynich versus German.

The occurrence frequencies of letters in the Voynich manuscript and in German are shown in Figure 17 and Figure 18.

Figure 17. Frequency of Letters in Voynich
Figure 18. Frequency of Letters in German

The proportions are shown in Figure 19.

Figure 19. Comparison between Voynich & German

According to Figure 19, the correlation between the Voynich manuscript and German is 95.86%.

Although the analysis above shows some similarities between the Voynich manuscript and German, there are some differences. For example, as shown in Figure 19, there are 30 letters in total in German but only 24 letters in the Voynich manuscript.

To examine the correlation between the Voynich manuscript and German more closely, the next step is to compare the Voynich manuscript with books written in German; the result is shown in Part 5.

Part 5: Comparing the Voynich manuscript with books written in known languages.

To check the accuracy of the results, the project team selected some literary classics written in English, French and German and compared the Voynich manuscript with those books. For a fair comparison, the same number of words was extracted from every book. The results are shown in Figure 20, Figure 21 and Figure 22.


Figure 20. Results of Books
Figure 21. Results of Books
Figure 22. Results of Books

To make the comparison easier, line charts were made that show the potential relationship between the Voynich manuscript and these books. The results are shown in Figure 23, Figure 24 and Figure 25.

Figure 23. Comparison - Ratio of Unique Words

Figure 23 shows the ratio of unique words to total words. There is a significant difference between the Voynich manuscript and the English books (47.9%) or the French books (27.7%). However, the difference between the Voynich manuscript and the German books is much smaller (13.6%).

Figure 24. Comparison - Word Length

Figure 24 shows the word length for the Voynich manuscript, English, French and German. There is a small difference in word length between the Voynich manuscript and English (6.7%) or French (6.0%), and almost no difference between the Voynich manuscript and German (0.1%).

Figure 25. Comparison - Ratio Of Words Appears More Than Once

Figure 25 shows the ratio of words that appear more than once to total unique words. There is a large difference between the Voynich manuscript and English (41.0%), French (38.9%) and German (22.8%). However, the difference between the Voynich manuscript and the German books is the smallest of the three.
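The two ratios compared in Figure 23 and Figure 25 can be computed directly from a word list. The Python sketch below shows one way to do so; `vocabulary_ratios` is a hypothetical name, the sample is invented, and the report does not say how the percentage differences between texts were derived from these ratios, so that step is not sketched.

```python
from collections import Counter

def vocabulary_ratios(words):
    """Two vocabulary measures for a list of words:
    - unique_to_total: unique words / total words
    - repeated_to_unique: words occurring more than once / unique words
    """
    counts = Counter(words)
    unique = len(counts)
    repeated = sum(1 for c in counts.values() if c > 1)
    return {
        "unique_to_total": unique / len(words),
        "repeated_to_unique": repeated / unique,
    }

# Hypothetical sample, for illustration only.
ratios = vocabulary_ratios(["a", "b", "a", "c"])
```

Because both ratios depend on sample size, they are only comparable across texts when the same number of words is taken from each, which is why the project extracted equal-sized samples.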

In addition, as analysed in Part 4 of Section 6.1.2.3, the correlation between the Voynich manuscript and German is high (95.86%). The language used in the Voynich manuscript may therefore be a branch of German. In summary, there are potential relationships between the Voynich manuscript and German.

Digits

As described in Section 1.1, the Voynich manuscript was found in 1912. During the 17th and 18th centuries, the most commonly used way of expressing digits was Roman numerals [14]. Roman numerals are shown in Appendix A.3, and the comparison method is introduced in Section 4.1.

The results of searching the Voynich manuscript for words of the form ‘*##’ are shown in Figure 26.

Figure 26. Results of all VII pattern words

As shown in Figure 26, the words of the form ‘*##’ in the Voynich manuscript are ‘aii’, ‘dee’, ‘kee’, ‘lee’, ‘oee’, ‘oii’, ‘qee’, ‘qoo’, ‘ree’, ‘see’, ‘tee’ and ‘yee’. The x axis shows the occurrence number of each word. The most common word is ‘aii’, which occurs 563 times; the least common is ‘ree’, which occurs twice. The y axis shows the position of each occurrence. For example, the position of the 563rd ‘aii’ is 36821, which means that it is the 36821st word in the Voynich manuscript.

Following the analysis above, ‘aii’ may stand for ‘seven’ in Roman numerals (VII).
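The occurrence numbers and positions plotted in Figure 26 correspond to listing, for a given word, the 1-based index of every place it appears in the running text. A minimal Python sketch follows; `occurrence_positions` is a hypothetical name and the sample is invented.

```python
def occurrence_positions(words, target):
    """1-based positions of every occurrence of `target` in `words`."""
    return [i for i, w in enumerate(words, start=1) if w == target]

# Hypothetical sample, for illustration only.
sample_text = ["aii", "daiin", "aii", "ol", "aii"]
positions = occurrence_positions(sample_text, "aii")
occurrences = len(positions)      # how many times the word occurs
last_position = positions[-1]     # position of the final occurrence
```

On the full transcription, `len(positions)` would give the occurrence count and `positions[-1]` the position of the last occurrence, the two quantities quoted for ‘aii’ above.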

Then, the words of the form ‘*###’ are extracted using the same method. The results are shown in Figure 27.

Figure 27. Results of all VIII pattern words

From Figure 27, ‘aiii’ has the highest occurrence frequency, so ‘aiii’ may mean ‘eight’ in Roman numerals (VIII). The exact data are shown in Appendix A.7.

Then, these words with tripled letters are compared with words with tripled letters from known languages: English, German and Russian. The results are shown in Table 6, Table 7 and Table 8.

Table 6. VIII pattern words In English


Table 7. VIII pattern words In German


Table 8. VIII pattern words In Russian

According to the tables above, tripled ‘l’ and tripled ‘s’ are the most common in English, while tripled ‘e’ and tripled ‘i’ are the most common in the Voynich manuscript. It can therefore be inferred that there are potential relationships between ‘l’ and ‘s’ in English and ‘e’ and ‘i’ in the Voynich manuscript.

Similarly, Table 7 shows that tripled ‘e’ is the most common in German. Compared with the Voynich manuscript, there is a potential relationship between ‘e’ in German and ‘e’ and ‘i’ in the Voynich manuscript.

Moreover, tripled ‘o’ is the most common in Russian, so there may be a relationship between ‘o’ in Russian and ‘e’ and ‘i’ in the Voynich manuscript.

Phase 2: Illustrations investigation

This phase includes three parts: statistics for the illustrations on each page, digit mining and conclusion. According to the task allocation in Section 5.4, this part was completed by Yaxin Hu.

Searching initial numbers and possible numerical words inside images

The first step is to find all initial numbers inside the images of the whole Voynich manuscript. Part of the list of initial numbers is given below:

Table 9. A part of the initial numbers

In order to compare and map the initial numbers to the text of the Voynich manuscript, all possible words that may stand for numbers are collected. Part of the list of possible words is given below:

Table 10. A part of the possible words

Mapping all initial numbers and numerical words

Comparing the initial numbers with the possible words reveals some potential relationships: for example, ‘s’ and ‘2’ appear on the same page many times (54 pairs), as do ‘o’ and ‘1’ (24 pairs) and ‘ol’ and ‘10’ (14 pairs). To make the comparison simpler, the initial numbers are mapped to the possible words to show whether there is any relationship between them. Part of the list of mapping pairs is given below; the other parts are shown in Appendix A.8.

Table 11. Mapping pairs for letter m
Table 12. Mapping pairs for letter n
Table 13. Mapping pairs for letter o
Table 14. Mapping pairs for letter r
Table 15. Mapping pairs for letter s
Table 16. Mapping pairs for letter y

To make the most likely relationships easier to find, we select the most frequent pairs for each word and compile a new list, which is shown below; the other parts are shown in Appendix A.9.

Table 17. The most frequent pairs

It can easily be seen that ‘o’ and ‘1’ appear together 24 times. Furthermore, many other pairs appear together: ‘ol’ and ‘10’ (14 times), ‘ol’ and ‘13’ (12 times), ‘ol’ and ‘12’ (11 times), ‘or’ and ‘10’ (19 times), ‘or’ and ‘12’ (13 times), ‘or’ and ‘13’ (12 times), and ‘os’ and ‘19’ (11 times). Therefore, there is a potential relationship between ‘o’ in the Voynich manuscript and the number ‘1’.

Furthermore, ‘r’ appears with ‘1’ 48 times, with ‘2’ 26 times and with ‘3’ 21 times; ‘s’ appears with ‘2’ 54 times, with ‘1’ 46 times, with ‘3’ 41 times and with ‘5’ 32 times; and ‘y’ appears with ‘2’ 36 times, with ‘1’ 30 times, with ‘3’ 29 times and with ‘5’ 20 times. There may be potential relationships among them, which need further investigation.
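The pair counts quoted above are co-occurrence counts between candidate words and numbers found on the same page. The report does not give its exact counting rule, so the Python sketch below assumes that each (word, number) combination is counted once per page on which both occur. The function name, the page identifiers and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import product

def count_pairs(pages):
    """Count how often each (word, number) pair shares a page.

    `pages` maps a page id to a tuple (candidate words, numbers).
    Assumption: a pair is counted once per page, however many times
    the word or number is repeated on that page.
    """
    pair_counts = Counter()
    for words, numbers in pages.values():
        for pair in product(set(words), set(numbers)):
            pair_counts[pair] += 1
    return pair_counts

# Invented pages, for illustration only.
example_pages = {
    "page_a": (["o", "s"], [1, 2]),
    "page_b": (["s"], [2]),
}
pair_counts = count_pairs(example_pages)
most_frequent = pair_counts.most_common(1)
```

Sorting the result with `most_common` gives the kind of "most frequent pairs" list summarised in Table 17.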

For ease of comparison, all the possible pairs are listed below:

Table 18. Possible relationships between letters in the Voynich manuscript and numbers

Phase 3: Marginal symbol research

According to Section 4.3, this phase is divided into three parts: statistics for the marginal stars on each page, digit mining and conclusion. According to the task allocation in Section 5.4, this phase was completed by Ruihang Feng.

Statistics for marginal stars of each page

There are 15 pages with marginal stars in the Voynich manuscript. As described in Section 4.3, an example is shown in Appendix A.6. The results of this part are shown in Appendix A.10.

From Appendix A.10, we can see that there are two kinds of marginal stars in the Voynich manuscript: white stars and coloured stars. Appendix A.10 also gives detailed information about the number of stars, their arrangement and their location in the text.

Digits analysis

In this part, the number of marginal stars on each page is counted first. Then the letters which may stand for digits are extracted. An example (page f58r) is shown in Figure 28.

Figure 28. Marginal stars

On this page there are 3 white stars (according to Appendix A.10), and the single letters which may stand for digits are m, o, r and s. All 25 pages are counted in this way.

As a result, these 25 pages involve 16 different digits: 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17 and 19. Some of them are the total number of stars on a page; others are the number of white stars or of coloured stars on a page. The detailed information is shown in Appendix A.10.

The results of this part are shown in Appendix A.11. The first column lists the 16 digits; the number in brackets is the number of pages which involve that digit (for example, for the digit ‘5’ the bracket shows 3 pages, meaning that 3 pages involve ‘5’). The red marks indicate the letters with the highest occurrence frequency. The second column lists the pages which involve each digit, and the last column lists the letters which may stand for digits.

Conclusion

According to Sections 6.3.1 and 6.3.2, the conclusion of the phase ‘marginal symbol research’ is shown in Table 19.

Table 19. Conclusion

The letters in the first column are extracted according to the red marks in Appendix A.11. The fourth column gives the occurrence frequency of each letter-digit pair. For example, the occurrence frequency of y=5 is 3/18 = 16.67%, where ‘3’ is the number of pages which involve ‘y=5’ (according to Appendix A.10) and ‘18’ is the number of pages which involve ‘y’.
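The occurrence frequency in the fourth column is a conditional proportion: of the pages that contain a letter, the share that also involve a given digit. A small Python sketch of that calculation is shown below. The function name and the page data are hypothetical; the data are constructed only so that the result matches the 3/18 example above.

```python
def pair_page_frequency(pages, letter, digit):
    """Share of pages containing `letter` that also involve `digit`.

    `pages` maps a page id to a tuple (set of letters, set of digits).
    Returns 0.0 if no page contains the letter.
    """
    with_letter = [p for p, (letters, _) in pages.items() if letter in letters]
    if not with_letter:
        return 0.0
    with_pair = [p for p in with_letter if digit in pages[p][1]]
    return len(with_pair) / len(with_letter)

# Constructed data: 18 pages contain 'y', 3 of them involve the digit 5.
star_pages = {f"page{i}": ({"y"}, {5} if i < 3 else {7}) for i in range(18)}
star_pages["other"] = ({"o"}, {5})   # a page without 'y' is ignored

frequency_percent = round(100 * pair_page_frequency(star_pages, "y", 5), 2)
```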

As a result, according to the tables above, the most likely potential relationships are between:

  • ‘y’ and ‘7’
  • ‘l’ and ‘7’
  • ‘r’ and ‘5’
  • ‘s’ and ‘6’
  • ‘o’ and ‘1’
  • ‘o’ and ‘6’
  • ‘ar’ and ‘13’
  • ‘al’ and ‘13’
  • ‘or’ and ‘13’
  • ‘ol’ and ‘13’
  • ‘am’ and ‘12’
  • ‘am’ and ‘19’
  • ‘dy’ and ‘14’
  • ‘dy’ and ‘19’
  • ‘om’ and ‘16’

Comment on progress

Over the past two semesters, the project proceeded normally. Although we met many problems during the project, such as limited references and Matlab code errors, we adjusted and modified our original plan in time. As a result, the overall project schedule was not greatly affected.

In general, we finished this project on time and reached the expected goal.

Conclusion

This project is divided into three phases: text investigation, illustration research and marginal symbol investigation. The major work of this project can be done by computer.

The goals of this project involve three parts:

  • Use statistical methods in Matlab to search for linguistic regularities in the Voynich manuscript.
  • Search for regularities in the illustrations from the perspective of digits.
  • Investigate regularities in the marginal symbols from the perspective of digits.

Over the past two semesters, all of the phases have been finished. From the analysis in Section 6.1, we can infer that the language used in the Voynich manuscript may be a branch of German.

In addition, combining Sections 6.2 and 6.3 gives the results of the digit analysis:

[Figure: combined results of the digit analysis]

References

[1] R. Zandbergen (2016). The Voynich MS-Introduction [Online]. Available: http://www.voynich.nu/intro.html

[2] Kevin Knight, Sravana Reddy, What We Know About The Voynich Manuscript [Online]. Available: http://www.isi.edu/natural-language/people/voynich-11.pdf

[3] Stojko, John, Letters to God’s Eye: The Voynich Manuscript for the first time deciphered and translated into English. New York: Vantage Press, 1978.

[4] Joachim Dathe, The EVA-Transcription [Online]. Available: https://voynich2arabic.wordpress.com/eva-transcription/

[5] Vladimir Sazonov, Voynich Manuscript [Online]. Available: http://voynich.naobum.de/

[6] Reed Johnson (2013, July 9), The Unread: The Mystery Of The Voynich Manuscript [Online]. Available: http://www.newyorker.com/books/page-turner/the-unread-the-mystery-of-the-voynich-manuscript

[7] R. Zandbergen (2016), History of research of the Voynich MS [Online]. Available: http://www.voynich.nu/solvers.html#n01

[8] Mary E. D’Imperio, An Application of Cluster Analysis and Multiple Scaling to the Question of "Hands" and "Languages" in the Voynich Manuscript. Washington, DC, 1992.

[9] Pelling, Nicholas, The curse of the Voynich; the secret history of the world's most mysterious manuscript, Compelling Press, Surbiton, 2006.

[10] Bennett, William Ralph, Scientific and Engineering Problem Solving with the Computer. Englewood Cliffs: Prentice-Hall, 1976.

[11] Tiltman, John, “The Voynich Manuscript, The Most Mysterious Manuscript in the World”. NSA Technical Journal 12 (July 1967), pp.41-85.

[12] Feely, Joseph M, Roger Bacon's Cipher: The Right Key Found, Rochester, 1943.

[13] D'Imperio, Mary E, The Voynich Manuscript - an elegant enigma, Aegean Park Press, 1978.

[14] R. Zandbergen (2016), History of research of the Voynich MS [Online]. Available: http://www.voynich.nu/solvers.html#n43

[15] Wikipedia, Medieval Literature [Online]. Available: https://en.wikipedia.org/wiki/Medieval_literature#Languages

[16] Wikipedia, Letter frequency [Online]. Available: https://en.wikipedia.org/wiki/Letter_frequency

Appendix

A.1. FSG

A.2. Proposed Method

A.3. Roman Numeral

A.4. Words from the Voynich manuscript

A.5. Illustration from the Voynich manuscript

A.6. Marginal symbols from the manuscript

A.7. Digits ‘*###’

A.8. Mapping pairs between initial numbers and possible words

A.9. Mapping list for the most frequent letters and numbers

A.10. Statistics for marginal stars of each page

A.11. Digits mining