Difference between revisions of "Cracking the Voynich code"
Revision as of 19:21, 5 March 2014
Honours students
- 2014: Bryce Shi and Peter Roush, see Cracking the Voynich code 2014
Project guidelines
Project description
The Voynich Manuscript is a mysterious 15th-century book; no one today knows what it says or who wrote it. The book is written in an unknown alphabet. See details here: https://en.wikipedia.org/wiki/Voynich_manuscript
Fortunately, the whole book has been converted into an electronic format, with each character mapped to a convenient ASCII character. We want you to write software that will search the text and perform statistical tests to get clues as to the nature of the writing. Does the document bear the statistics of a natural language, or is it a fake?
We already have Support Vector Machine (SVM) and Multiple Discriminant Analysis (MDA) software that you can adapt for your purposes. This software is set up to test whether or not two texts were written by the same author. The great thing about our software is that it is independent of language, so you could compare the manuscript against the existing writings of Roger Bacon, who is a suspected author.
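One way to probe the natural-language question is a Zipf rank-frequency fit: in natural-language texts, word frequency tends to fall off roughly as the inverse of frequency rank, so the slope of log(frequency) against log(rank) is often near -1. The following is only a minimal sketch. The function name `zipf_slope` is hypothetical, and it assumes words are separated by whitespace, which may not match the separator used in the actual transliteration file. A Zipf-like slope on its own does not show that a text is meaningful language.

```python
import math
from collections import Counter


def zipf_slope(text):
    """Least-squares slope of log(frequency) against log(rank) for words in text.

    Assumption: words are separated by whitespace.
    """
    # Word counts sorted from most to least frequent (rank 1 = most frequent).
    counts = sorted(Counter(text.split()).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope: covariance(x, y) / variance(x).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Running the same function on the transliteration and on known texts of similar length (for example from the e-book collections listed under References) gives a like-for-like comparison.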
Useful notes
Approach and methodology
You have an advantage that as engineers you know more about information theory and statistics than the average policeman or code breaking expert. You will take a structured approach to writing software code to use a process of elimination to say whether particular coding schemes were used or not.
Start with the Playfair cipher and the Vigenère cipher; you should find that you can easily test the above sequence of letters to prove the Vigenère cipher was definitely not used. Then you can go on to explore other encryption schemes.
- Note from Matthew: If you include the extra line, I'm not so sure you can prove it's not the Vigenère cipher. Also, given the date of the murder, and the dates of invention of some ciphers, there are some you could reasonably rule out (e.g. I doubt it's RSA for historical and technical reasons), however you can still implement them and try them out :). If you dig into some of the historical documents on the case you may find clues to possible decryption keys.
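A common first check for a polyalphabetic cipher such as Vigenère is the index of coincidence (IC): the probability that two letters drawn at random from the text are the same. English plaintext has an IC of roughly 0.067, while uniformly random text over 26 letters gives roughly 0.038. The sketch below uses hypothetical function names and is not the project's existing software. On a short message the IC is statistical evidence rather than proof, which is consistent with Matthew's caution above.

```python
from collections import Counter


def index_of_coincidence(text):
    """Probability that two distinct positions in text hold the same letter."""
    letters = [c for c in text.upper() if c.isalpha()]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def column_ics(text, key_length):
    """IC of each column when the letters are split by a candidate key length.

    If a Vigenere key of this length was used, each column is a simple
    shift cipher and its IC should be close to that of the plaintext language.
    Each column must contain at least two letters.
    """
    letters = [c for c in text.upper() if c.isalpha()]
    return [
        index_of_coincidence("".join(letters[i::key_length]))
        for i in range(key_length)
    ]
```

If the whole-text IC is already close to the plaintext-language value, a polyalphabetic cipher with a key longer than one letter becomes less plausible; if it is low but the column ICs rise for some key length, that key length is a candidate.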
We would also like you to perform simple statistical tests to show whether or not English was the most likely language of the original message. You should also be able to show whether the code consists of the initial letters of a sequence of words or of whole words. A list of letter frequency rankings for different languages can be found here.
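One simple form such a test can take is a chi-squared statistic comparing observed letter counts with the counts expected under a reference language. This is a sketch only: the frequency table holds approximate, commonly quoted percentages for English, and `chi_squared_vs_english` is a hypothetical name. A lower statistic means a closer match, and tables for other languages can be substituted to rank candidates.

```python
from collections import Counter

# Approximate English letter frequencies in percent (commonly quoted values;
# treat them as illustrative and replace with your own reference table).
ENGLISH_FREQ = {
    "E": 12.70, "T": 9.06, "A": 8.17, "O": 7.51, "I": 6.97, "N": 6.75,
    "S": 6.33, "H": 6.09, "R": 5.99, "D": 4.25, "L": 4.03, "C": 2.78,
    "U": 2.76, "M": 2.41, "W": 2.36, "F": 2.23, "G": 2.02, "Y": 1.97,
    "P": 1.93, "B": 1.49, "V": 0.98, "K": 0.77, "J": 0.15, "X": 0.15,
    "Q": 0.095, "Z": 0.074,
}


def chi_squared_vs_english(text):
    """Chi-squared statistic of the letter counts in text against ENGLISH_FREQ."""
    letters = [c for c in text.upper() if c in ENGLISH_FREQ]
    n = len(letters)
    if n == 0:
        raise ValueError("text contains no letters A-Z")
    counts = Counter(letters)
    total = sum(ENGLISH_FREQ.values())  # normalise so probabilities sum to 1
    statistic = 0.0
    for letter, percent in ENGLISH_FREQ.items():
        expected = n * percent / total
        observed = counts.get(letter, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic
```

To test the initial-letter hypothesis, the same function can be run with a table of word-initial letter frequencies in place of `ENGLISH_FREQ`; such a table is not included here.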
Then if you have time and if you are excited to take this project to a higher level you can start to check out the work of the great electrical engineer Claude Shannon and apply his techniques from information theory. You can measure the information content in the message in terms of bits for starters.
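As a starting point for the Shannon approach, the first-order entropy of a message can be estimated from its symbol frequencies as H = -Σ p(x) log2 p(x), in bits per symbol. The sketch below uses a hypothetical function name and treats every character, including spaces, as a symbol. It ignores dependence between neighbouring symbols, so it gives an upper bound on the per-symbol information rather than the full picture.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """First-order Shannon entropy of text, in bits per symbol."""
    n = len(text)
    if n == 0:
        raise ValueError("text is empty")
    counts = Counter(text)
    # H = -sum(p * log2(p)) over the observed symbols.
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def total_information_bits(text):
    """Entropy per symbol multiplied by the message length."""
    return shannon_entropy(text) * len(text)
```

Comparing the result with the same estimate for known-language texts of similar length, and then extending it to pairs of symbols, is a natural next step.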
Possible extension
If you knock off this project too easily and are looking for a harder code-cracking problem to try your software out on, you can progress to analyzing another famous unsolved mystery: the Voynich Manuscript.
Expectations
- We don't really expect you to find the killer, though that would be cool if you do and you'll become very famous overnight.
- To get good marks we expect you to show a logical approach to decisively eliminating which coding schemes were definitely not used.
- In your conclusion, you need to come up with a short list of likely possibilities and a list of schemes you can definitely rule out.
- We expect you to critically look at the conclusions of the previous project groups and highlight to what extent your conclusions agree and where you disagree.
- We expect all the written work to be placed on this wiki. No paper reports are to be handed up. Just hand up a CD with your complete project directory at the end. One CD for each group member.
- It is expected that you fill out a short progress report on the wiki each week, every Friday evening, to briefly state what you did that week and what the goals are for the following week.
- It is important to see your main supervisors regularly. Don't let more than 2 weeks go by without them seeing your face briefly.
- You should be making at least one formal progress meeting with supervisors per month. It does not strictly have to be exactly a month, but roughly each month you should be in a position to show some progress and have some problems and difficulties to discuss.
- The onus is on you to drive the meetings, make the appointments and set them up.
- You are expected to make a YouTube presentation of your whole project.
Relationship to possible career path
Whilst the project is fascinating as you'll learn about a specific cold case—and we do want you to have a lot of fun with it—the project does have a hard-core serious engineering side. It will familiarize you with techniques in information theory, probability, statistics, encryption, decryption, and datamining. It will also improve your software skills. The project will also involve writing software code that trawls for patterns on the world wide web (exploiting it as a huge database). This will force you to learn about search engines and databases; and the new tools you develop may lead to new IP in the area of datamining and also make you rich/famous. The types of jobs out there where these skills are useful are in computer security, comms, or in digital forensics. The types of industries that will need you are: the software industry, e-finance industry, e-security, IT industry, Google, telecoms industry, ASIO, ASIS, defence industry (e.g. DSD), etc. So go ahead and have fun with this, but keep your eye on the bigger engineering picture and try to build up an appreciation of why these techniques are useful to our industry. Now go find that killer...this message will self-destruct in five seconds :-)
See also
- Cracking the Voynich code 2014
References and useful resources
If you find any useful external links, list them here:
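- Adelaide Uni Library e-book collection: http://ebooks.adelaide.edu.au/
- Project Gutenberg e-books: http://www.gutenberg.org/wiki/Main_Page
- Foreign language e-books: http://onlinebooks.library.upenn.edu/archives.html#foreign
- UN Declaration of Human Rights - different languages: http://www.ohchr.org/EN/UDHR/Pages/Introduction.aspx
- Evolutionary algorithm for decryption of monoalphabetic homophonic substitution ciphers encoded as constraint satisfaction problems: http://portal.acm.org/citation.cfm?id=1389095.1389425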