==Web Crawler Investigation==

===Concept===
The purpose of creating the web crawler search application was to determine whether it could be used to search for patterns evident in the Somerton Code. This section describes the use of the completed system to perform a trial investigation.

===Technical Challenges===
The main challenges in this trial investigation were:
# Extracting relevant patterns from the code
# Identifying seed websites

The choice of an appropriate pattern to search for is a key element in determining the quality of results. Great care should therefore be taken not to waste time and other resources searching for patterns that do not provide usable results. Similarly, the seed webpage should be well chosen: given the enormous number of pages available on the internet, this trial investigation can examine only a minuscule proportion, so a relevant seed is important.

===Design===
[[Image:The_Code.png|thumb|170px|right|Ambiguous letters evident.]]
The Somerton Code, shown to the right, contains distinct patterns; for example “ABAB”, “TTMT”, and “AIA”. These patterns are bounded by letters that are hard to identify (e.g. the first letter of the first and second lines, which could be a W or an M) or may be crossed out (such as the O in the third line).

Several test cases were designed to investigate theories regarding the code:
# Pattern match for “ABAB”, “TTMT”, “AIA”, and “TTMTSAMST”. Based on the theory of a substitution cipher; pattern matching would be able to detect words exhibiting these patterns.
# Exact initial match for “RGOABAB”, “TBIMPANETP”, “MLIAB”, “AIAQ”, and “TTMTSAM”. Based on the English initialism theory; these sequences are designed to avoid the ambiguous letters evident in certain sections of the code.

The seed websites chosen for this trial investigation are poem archives. The reasoning behind this is that the structure of the code suggests a four-line poem is possible, and that the code was found in the back of the Rubaiyat of Omar Khayyam, a book of poems.

===Results===
Searches for pattern matches to the shorter identified patterns (“ABAB”, “TTMT”, “AIA”) returned too many results for meaningful analysis in the time available. The longer pattern (“TTMTSAMST”) returned no results. In future examinations the many results for the shorter patterns could be analysed to determine frequencies and likelihoods.

Searches for the initial letter sequences were more useful, producing fewer hits. Crawling approximately 50,000 websites returned only one significant result: a poem titled “My Love Is A Butterfly” by Katerina Yanchuk<ref name=Yan>Yanchuk, Katerina, ''My Love is a Butterfly'', http://www.poemhunter.com/best-poems/katerina-yanchuk/my-love-is-a-butterfly/</ref>, matching the initialism “MLIAB”. The poem can be viewed [[My Love is a Butterfly|here]]. Analysis has revealed no other links between this poem and the Somerton Code, so the match is considered a coincidence. A screenshot of the successful match is provided in the following thumbnail.

[[File:MLIAB match.png|200px|thumb|center|Successful match! Click to enlarge.]]
<center>'''Figure 16 - Web crawler "MLIAB" initialism search match'''</center>
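The crawler's source code is not reproduced in this report, but the two matching rules used in these searches can be illustrated with a short sketch. The following is a minimal Python example (the implementation language, function names, and structure below are not taken from the project and are illustrative only): it normalises each word to its letter pattern for the substitution-cipher test, and scans the initial letters of consecutive words for the initialism test.

<syntaxhighlight lang="python">
import re

def letter_pattern(word):
    """Normalise a word to its letter pattern, e.g. 'coco' and 'ABAB' both give 'ABAB'."""
    mapping = {}
    out = []
    for ch in word.upper():
        if ch not in mapping:
            mapping[ch] = chr(ord('A') + len(mapping))
        out.append(mapping[ch])
    return ''.join(out)

def pattern_matches(text, target):
    """Substitution-cipher test: words whose letter pattern equals that of the target,
    e.g. target 'ABAB' matches 'coco' or 'dodo'."""
    words = re.findall(r"[A-Za-z]+", text)
    want = letter_pattern(target)
    return [w for w in words if letter_pattern(w) == want]

def initialism_matches(text, initials):
    """English initialism test: runs of consecutive words whose first letters spell
    the target sequence, e.g. 'MLIAB' matches 'My Love Is A Butterfly'."""
    words = re.findall(r"[A-Za-z]+", text)
    firsts = ''.join(w[0].upper() for w in words)
    hits = []
    i = firsts.find(initials.upper())
    while i != -1:
        hits.append(' '.join(words[i:i + len(initials)]))
        i = firsts.find(initials.upper(), i + 1)
    return hits

# Example: applying the initialism test to a line of the matched poem.
print(initialism_matches("My love is a butterfly, fragile and free", "MLIAB"))
# -> ['My love is a butterfly']
</syntaxhighlight>

In the crawler, checks along these lines would be applied to the text of each page fetched from the poem-archive seeds, with any hit (such as the “MLIAB” match in Figure 16) logged for manual review.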
===Conclusion===
The investigation with the web crawler was extremely limited due to time and resource constraints towards the end of the project. Only a few searches were executed, covering a tiny portion of the internet. A further limitation was the use of a wireless internet connection, which meant only about 5,000 pages per hour could be analysed.

The limited results do not rule out the theory that the pattern matching web crawler has the capacity to find the message behind the code. One significant pattern matching result was identified; however, it has been assessed as a coincidence. Potential future work could look at distributed crawling over an Ethernet connection to speed up the process.