==Methodology==

===Statistical Analysis of Letters===
[[Image:OED.png|thumb|300px|right|Initial Letters Frequency in Oxford Dictionary.]]
The initial step in this project involved reviewing the work done by previous groups and setting up tests to confirm that their results were consistent. The main focus for verification was the analysis performed on the code, and the statistical analysis that led to their conclusions. The results from previous years suggested that the code was most likely to be in English, and represented the initial letters of words.

To test this theory and to create some baseline results, the online Oxford English Dictionary was searched and the number of words beginning with each letter of the alphabet was extracted. From this data, the frequency with which each letter is used in English was calculated. The results show many inconsistencies with the Somerton Code, but they can be used as a baseline for comparison with other results. The main issue with these results is that most of the words included are not commonly used, and as such are a poor representation of the English language: the likelihood of a letter occurring is not well related to the number of dictionary words beginning with that letter.

In order to produce more useful results for comparison, a source text was found which had been translated into over 100 different languages. The text was the Tower of Babel passage from the Bible, and consisted of approximately 100 words and 1000 characters, which made it a suitable size for testing.<ref name="Tower of Babel">Ager, S. "Translations of the Tower of Babel"; http://www.omniglot.com/babel/index.htm</ref>

To create the frequency representation for these results, a Java program was written which takes in the text for each language and outputs the number of occurrences of each letter (a sketch of this counting-and-comparison approach is given at the end of this subsection). This process was repeated for 85 of the most common languages available, and from this data the standard deviation and sum of differences relative to the Somerton Man Code were determined. The results were quite inconsistent with previous results, as well as with each other. The top four results for both the standard deviation and the sum of differences were Sami North, Ilocano, White Hmong and Wolof, though in different orders. These languages are geographically inconsistent, representing languages spoken in Northern Europe, Southeast Asia and West Africa, which suggests they are unlikely to represent the language used for the code.

Previous studies suggested that the code represents the initial letters of words, so this theory was tested as well. The process from before was repeated with a modified Java program which records only the first letter of each word within the text. From these results, the frequency of each letter occurring was calculated.

[[Image:Top20SD.png|thumb|300px|left|Top 20 Standard Deviation for Initial Letters.]]
[[Image:Top20SoD.png|thumb|300px|right|Top 20 Sum of Difference for Initial Letters.]]

The results shown in the figures above are more consistent than before. The top three languages for both the sum of differences and the standard deviation were, in order: Ilocano, Tagalog and English. The first two languages are from the Philippines, and since all the other information about the Somerton Man suggests he was of "Britisher" appearance, it is unlikely that he spoke either of these languages. This leaves English as the next most likely, as it is the most consistent with the information known about both the code and the Somerton Man. It also reinforces the conclusion that the code most likely represents the first letters of an unknown sentence, and is consistent with results from previous years.
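The original Java programs are not reproduced in this report, and the report does not spell out exactly how the two comparison measures were defined. The sketch below is therefore only illustrative: the class and method names are ours, the program reads the code letters and the translations from files given on the command line, and it assumes "sum of differences" means the sum of absolute per-letter differences and "standard deviation" means the standard deviation of those per-letter differences.

<syntaxhighlight lang="java">
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

/**
 * Illustrative sketch (not the original project code): counts letter
 * frequencies in a text, optionally using only the first letter of each
 * word, and compares the resulting distribution with a reference
 * distribution (the Somerton Man code letters).
 */
public class LetterFrequency {

    /** Relative frequency (0..1) of each letter A-Z in the given text. */
    static double[] frequencies(String text, boolean initialLettersOnly) {
        int[] counts = new int[26];
        int total = 0;
        for (String word : text.toUpperCase(Locale.ROOT).split("[^A-Z]+")) {
            if (word.isEmpty()) continue;
            int limit = initialLettersOnly ? 1 : word.length();
            for (int i = 0; i < limit; i++) {
                counts[word.charAt(i) - 'A']++;
                total++;
            }
        }
        double[] freq = new double[26];
        for (int i = 0; i < 26; i++) freq[i] = total == 0 ? 0 : (double) counts[i] / total;
        return freq;
    }

    /** Sum of absolute per-letter differences between two distributions (assumed definition). */
    static double sumOfDifferences(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < 26; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }

    /** Standard deviation of the per-letter differences (assumed definition). */
    static double stdDevOfDifferences(double[] a, double[] b) {
        double mean = 0;
        for (int i = 0; i < 26; i++) mean += (a[i] - b[i]) / 26.0;
        double var = 0;
        for (int i = 0; i < 26; i++) var += Math.pow((a[i] - b[i]) - mean, 2) / 26.0;
        return Math.sqrt(var);
    }

    public static void main(String[] args) throws Exception {
        // args[0]: file containing the Somerton Man code letters
        // args[1..]: one Tower of Babel translation per file
        double[] code = frequencies(new String(Files.readAllBytes(Paths.get(args[0]))), false);
        for (int i = 1; i < args.length; i++) {
            String text = new String(Files.readAllBytes(Paths.get(args[i])));
            double[] lang = frequencies(text, true); // initial letters of each word
            System.out.printf("%s  SoD=%.4f  SD=%.4f%n",
                    args[i], sumOfDifferences(code, lang), stdDevOfDifferences(code, lang));
        }
    }
}
</syntaxhighlight>

Running the program over the 85 translations and sorting by either measure gives a ranking of candidate languages of the kind shown in the figures above.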
===Cipher GUI===
[[Image:CipherGUI.png|thumb|350px|right|Cipher GUI Interface.]]
In the course of evaluating a range of different ciphers, last year's group developed a Java program which could both encode and decode most of the ciphers they investigated. Part of our task this year was to double-check what they had done. To do this, we tested each cipher by encoding and decoding both words and phrases, looking for errors. Whilst most ciphers were faultless, we discovered an error in the decryption of the Affine cipher that led to junk being returned. We investigated possible causes of the error, and were able to resolve a wrapping problem (going past the beginning of the alphabet, C, B, A, Z, Y..., was not handled correctly), so the cipher now functions correctly in both directions (a sketch of this kind of fix is given at the end of this subsection).

We also sought to increase the range of ciphers covered. One example we found was the Baconian cipher, whereby each letter is represented by a unique series of (usually 5) As and Bs (i.e. A = AAAAA, B = AAAAB, and so on). Whilst this cipher was known during the Somerton Man's time, the use of letters besides A and B in the code clearly indicates that the Baconian cipher was not used. Despite this, it would have made a useful addition to the CipherGUI. However, the way the program was coded, and our lack of familiarity with it, made the addition of further ciphers difficult, so we decided to focus our efforts on other aspects of the project.

Last year's group did a lot of good work developing a "Cipher Cross-off list", which went through a range of encryption methods and determined whether it was likely or not that the Somerton Man passage had been encoded using them. We analysed their conclusions, and for the most part could find no fault with their reasoning. However, they dismissed "One-Time Pads" wholesale, arguing that the frequency analysis of the code did not fit that of an OTP. In doing so they assumed that the frequency distribution of an OTP would be flat, since a random pattern of letters would be used. This is fine for most purposes, but if the Somerton Man had used an actual text - for example the Rubaiyat - then its frequency distribution would not be flat. It is for all intents and purposes impossible to completely discount OTPs as the source of the cipher, since the number of possible texts that could have been used - even ignoring random sequences of letters - is near infinite. You cannot disprove something for which no evidence exists; you can only prove its existence by finding it.
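The CipherGUI source is not reproduced here, so the following is only an illustrative sketch of the standard affine cipher (encryption E(x) = (ax + b) mod 26, decryption D(y) = a⁻¹(y − b) mod 26) and of the kind of wrapping fix described above. The class and method names are ours; a wrapping bug of this sort typically comes from applying Java's % operator to a negative intermediate value, which Math.floorMod avoids.

<syntaxhighlight lang="java">
/**
 * Minimal affine cipher sketch (illustrative, not the CipherGUI code).
 * Encryption: E(x) = (a*x + b) mod 26; decryption: D(y) = aInv*(y - b) mod 26.
 */
public class Affine {

    /** Modular multiplicative inverse of a mod 26 (a must be coprime with 26). */
    static int inverse(int a) {
        for (int x = 1; x < 26; x++) {
            if ((a * x) % 26 == 1) return x;
        }
        throw new IllegalArgumentException("a has no inverse mod 26");
    }

    static String encrypt(String plain, int a, int b) {
        StringBuilder out = new StringBuilder();
        for (char c : plain.toUpperCase().toCharArray()) {
            if (c < 'A' || c > 'Z') { out.append(c); continue; }
            out.append((char) ('A' + (a * (c - 'A') + b) % 26));
        }
        return out.toString();
    }

    static String decrypt(String cipher, int a, int b) {
        int aInv = inverse(a);
        StringBuilder out = new StringBuilder();
        for (char c : cipher.toUpperCase().toCharArray()) {
            if (c < 'A' || c > 'Z') { out.append(c); continue; }
            // Math.floorMod keeps the result in 0..25 even when (c - 'A' - b) is
            // negative, i.e. when decryption steps back past 'A' and must wrap to 'Z'.
            out.append((char) ('A' + Math.floorMod(aInv * (c - 'A' - b), 26)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String cipher = encrypt("TAMAM SHUD", 5, 8);
        System.out.println(cipher + " -> " + decrypt(cipher, 5, 8));
    }
}
</syntaxhighlight>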
===Web Crawler GUI and Pre-indexed Web Data===
[[Image:OldGUI.png|thumb|300px|center|Original Web Crawler GUI Interface.]]
The WebCrawler developed by last year's group is a highly competent, fully-functional program that does what it was designed to do very well. However, on first appearances it is intimidating, with a range of options required to be specified before a search can even be undertaken. There is a detailed and helpful README file provided with the WebCrawler, but it is well known that most people don't resort to manuals until absolutely desperate.

Therefore we sought to redesign only the front end of the WebCrawler, in order to encourage more experimentation and greater usage by people interested in getting their search results as quickly and easily as possible. Our brief was to design something more like Google, which is immediately accessible and could not be more intuitive. To achieve this without a loss of functionality, we decided to abstract away the more complex options and provide default values for key parameters, so that the user is only required to enter a search string to perform a basic web crawl. If they want to fine-tune the experience - change the web address the search starts from, save the results to a different location, or define a proxy server connection - they have access to this through an "Advanced" menu (a simplified sketch of this layout is given at the end of this section). Once a search has commenced, the user is shown a results page which is able to display a greater number of matches at a time - meaning less scrolling and making the results more immediately accessible to the user.

[[Image:NewGUISearchPanel.png|thumb|250px|center|New Web Crawler GUI Interface.]]
[[Image:NewGUIOptionsPanel.png|thumb|250px|left|Advanced Options.]]
[[Image:NewGUIResultsPanel.png|thumb|250px|right|Displaying Results.]]

====Pre-Indexed Web Data====
The internet is a massive place, and the time it would take to crawl every single web page - even with the WebCrawler developed by last year's group - would be prohibitively long. It was therefore proposed that we look for a means of traversing the web faster. With search engines such as Google regularly trawling the internet and caching the data, it was suggested we find a means of accessing this pre-indexed data so that search times could be significantly reduced (consider how quickly Google returns its search results). However, the data collected by search engines is not publicly available, so an alternative approach was required.

We looked into the possibility of using a service such as Amazon's Elastic Compute Cloud (EC2) to speed the process up. Further investigation indicated that this would require a substantial re-write of the WebCrawler - which is designed to use a graphical user interface for its input - and would only side-step the issue. Amazon EC2, and other similar cloud computing solutions, would only provide a means to run large numbers of instances of the WebCrawler simultaneously. Given enough instances, this would be harsh on popular websites with large numbers of links to them, which would receive requests from many instances in a short space of time, potentially (and accidentally) causing a Distributed Denial of Service attack as the crawlers use up all the servers' bandwidth. Given the amount of time and effort required to fully evaluate the possibilities for speeding up trawling the internet, and to rework the WebCrawler code, this would make an interesting and useful project in itself for a future group to undertake.
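The redesigned front end itself is shown in the screenshots above. As a rough illustration of the design idea (not the actual WebCrawler code), a minimal Swing sketch of "one search box, defaults for everything else, advanced options hidden until requested" might look like the following; the field names and default values are placeholders.

<syntaxhighlight lang="java">
import javax.swing.*;
import java.awt.*;

/**
 * Illustrative sketch only (not the redesigned WebCrawler front end):
 * a single search field with sensible defaults, and the less common
 * options (start address, save location, proxy) hidden behind an
 * "Advanced" toggle.
 */
public class SimpleCrawlerFrontEnd {

    public static void main(String[] args) {
        SwingUtilities.invokeLater(SimpleCrawlerFrontEnd::createAndShow);
    }

    private static void createAndShow() {
        JFrame frame = new JFrame("Web Crawler");

        // Basic view: just a search string, like a search engine.
        JTextField searchField = new JTextField(30);
        JButton searchButton = new JButton("Search");

        // Advanced options, pre-filled with defaults (placeholder values).
        JTextField startUrl = new JTextField("http://www.example.com", 30);
        JTextField saveLocation = new JTextField("results.txt", 30);
        JTextField proxy = new JTextField("", 30);

        JPanel advanced = new JPanel(new GridLayout(3, 2, 4, 4));
        advanced.add(new JLabel("Start address:"));   advanced.add(startUrl);
        advanced.add(new JLabel("Save results to:")); advanced.add(saveLocation);
        advanced.add(new JLabel("Proxy server:"));    advanced.add(proxy);
        advanced.setVisible(false); // hidden until requested

        JToggleButton advancedToggle = new JToggleButton("Advanced");
        advancedToggle.addActionListener(e -> {
            advanced.setVisible(advancedToggle.isSelected());
            frame.pack();
        });

        searchButton.addActionListener(e ->
                // In the real program this would start the crawl with either
                // the defaults or the user's advanced settings.
                System.out.printf("Crawl for \"%s\" from %s, saving to %s%n",
                        searchField.getText(), startUrl.getText(), saveLocation.getText()));

        JPanel top = new JPanel();
        top.add(searchField);
        top.add(searchButton);
        top.add(advancedToggle);

        frame.setLayout(new BorderLayout());
        frame.add(top, BorderLayout.NORTH);
        frame.add(advanced, BorderLayout.CENTER);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.pack();
        frame.setVisible(true);
    }
}
</syntaxhighlight>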
===3D Reconstruction===
[[Image:OurBust3.jpeg|thumb|400px|right|The bust of the Somerton Man.]]
The main addition we were tasked to make this year was to provide a means through which a positive identification of the Somerton Man could be made. In order to do this, we needed to create a likeness of the Somerton Man as he would have appeared whilst he was alive. However, the only existing photographs and other identifiers were made after his death, so some modification would be required to make him appear life-like.

The best avenue for creating a likeness was a bust made of his head and shoulders after he had been dead for some time and post-autopsy. This meant that, along with being plaster-coloured and over 60 years old, the bust was not a complete likeness: his neck is flattened and his shoulders hunched due to the body having lain on a slab for several months, and there is a distinct ridge around his forehead where the skull had been cut open to examine his brain. Our task was therefore three-fold: firstly, we needed to find a method of creating a three-dimensional model of the bust that could be displayed on a computer; secondly, we needed to be able to manipulate the model to fix the inaccuracies introduced post-mortem, such as the forehead ridge; and thirdly, we needed to be able to add colour to the model, based on the coroner's description of his skin, hair and eyes. After completing all of those tasks, we would end up with a reasonable likeness of how the Somerton Man may have appeared before he died, which could then be circulated to allow a positive identification to be made.

[[Image:SoftwareComparison.png|thumb|400px|right|Comparison of Software Considered.]]
The first step in creating the model was determining a method by which we could reconstruct the bust in three dimensions on a computer. We looked into several means of doing this, such as 123D Catch from Autodesk, PhotoModeler Scanner from PhotoModeler, and the DAVID LaserScanner from DAVID. The former two are very simple systems, whereby a series of photographs is taken of the desired subject from multiple angles all around it, and these are then analysed by a piece of software (remotely on Autodesk servers in the case of 123D Catch, and on your own PC for PhotoModeler Scanner), which attempts to create the three-dimensional model by finding common points in each photo that allow it to interpolate how the whole subject appears in three dimensions. However, having tested these products, we found that while they were initially simple to use, they were very particular about which images they could connect, and you only became aware of this after attempting to combine the images. Given that we would be doing the scan on-site at the Police History Museum, we preferred a system that immediately indicated when a picture was not helpful, so that time was not wasted. PhotoModeler Scanner and 123D Catch also had a tendency to flatten the images, creating a rough approximation of the subject in three dimensions but losing smaller protrusions as two-dimensional details on the object. With the requirement to make as accurate a reconstruction as possible, this was not of a high enough standard, so we were forced to reject both 123D Catch and PhotoModeler Scanner as candidates.

This left the DAVID LaserScanner as our main option. This program works somewhat differently from the others, using a laser line and a webcam to capture depth measurements of the subject relative to calibration panels that provide a baseline. By taking multiple scans from different angles, then combining them into one model, it is possible to create a far more accurate reconstruction of the subject.
Although slightly more fiddly to set up (requiring calibration of the camera and the use of the calibration panels), once in operation it is straightforward to use: in a darkened room, the laser line is scanned vertically over the subject repeatedly, with the scan being updated in real time so the user can see if they have missed any spots or if there are any problems with the scan. This process is simply repeated as many times as necessary, slowly rotating the subject after each scan is complete, to obtain data about every side of the subject. With just a few scans a rough reconstruction is possible, and more scans lead to a more accurate representation of the original subject - the more scans the better, though this needs to be offset against any time constraints that might exist.

Once we had settled upon the DAVID LaserScanner as our solution, we immediately had an order for the kit placed. This included all the equipment required for a successful scan: a webcam and stand, calibration panels and baseboard, a 5 mW laser line, and the all-important DAVID LaserScanner software. However, since we were using a laser with a power output greater than 1 mW, we were required to perform a Risk Assessment and provide a Standard Operating Procedure in order to enable correct usage and avoid the risks associated with lasers. We consulted Henry Ho, a technician in the School, who was able to provide us with examples of Risk Assessments and Standard Operating Procedures, and to advise us on the best approach to take, what particular risks we might need to consider, and what steps might be necessary regarding the correct use of lasers. We proceeded to draft these documents, and with feedback from Henry we produced versions which were submitted to the School Manager, Stephen Guest, for approval. These required minor adjustment, but were then approved, and we were able to obtain the LaserScanner equipment during the mid-year break.

Once we were in possession of the kit, we first familiarised ourselves with its operation by consulting the manual and doing some dry runs with just the software. When we were relatively confident, we started doing test scans on small objects - an apple and a hat model. These scans helped us work out how long each scan would likely take, and so how many scans we could do in a fixed period of time. They also showed us the limitations of the system - such as the difficulty in getting scans of the top and bottom of the subject - and helped us to figure out ways to overcome them. Through this process we were able to produce our first fully successful model, taking in all the scans and combining them to create a reasonable facsimile of the real-world object, the hat model.

[[Image:Bust2Left.png|thumb|170px|left|First Fully Successful Scan.]]
[[Image:Bust2Right.png|thumb|170px|right|First Fully Successful Scan.]]

One significant difference between the hat model and the actual bust was the extent of the shoulders. The hat model remains narrow because it truncates at the base of the neck, but the bust would be significantly wider due to the inclusion of the Somerton Man's shoulders. The calibration panels provided were A3 in size, and this turned out not to be large enough for our purposes: either the bust would have to be too far away from the panels for accurate reproduction, or the bust would be wider than the panels and so could not be scanned successfully.
Therefore we had enlarged versions made which we felt would overcome this problem, though we had no way to test this with certainty without access to the bust. These enlarged calibration panels were mounted on a thicker backboard to provide stability, but this also prevented them from fitting into the baseboard that came with the originals. The software requires a 90 degree angle between the boards, and we were no longer able to guarantee this.

After we were satisfied that we were as prepared as we could be, we contacted the SA Police Museum and were given permission to visit and scan the bust on-site. The scanning itself was conducted over a period of two hours, with 65 separate scans covering all angles of the bust to ensure we had enough data to make as accurate a reconstruction as possible. Fortunately, the glass case the bust was contained in had a groove around each side (for the glass to sit in) that was a perfect fit for the boards and gave a good right angle for the software to work with. Our time with the bust was a success, though we felt more time would have been beneficial - the time constraints meant that more attention ended up being focused on one side than the other, and this is evident in the resulting 3-D model.

With our time with the bust finished, we turned our attention to combining the scans into one continuous model. This was a very time-consuming and processor-intensive task, requiring two scans to be combined to achieve the best match-up, then a third added and aligned, then a fourth, and so on. We made several separate attempts at this task, using varying numbers of the scans, adding them in different orders, and using different techniques for adding them (all scans available at once to be added; or just two, combined, then adding the third, and so forth), until eventually we achieved a model we were happy with given the time constraints and the limitations of the software. This model has since been sent to be made a reality using a 3-D printer within the School, and we look forward to seeing the results of our efforts.

[[Image:FinalScan.png|thumb|300px|center|Screenshot of the Final Scan.]]

====Aside: Ear Analysis====
Though the Somerton Man is yet to be identified, numerous attempts have been made using his relatively rare physiological features. One such feature is the enlarged cymba in relation to the cavum of his ears. This is the inverse of the norm, where the cavum tends to be the larger of the two. However, the relative sizes can be distorted by taking pictures at angles other than directly in line with the ear (i.e. with the camera not parallel to the ground at ear height).

To investigate this effect, we initially attempted to use the DAVID LaserScanner to do a 3-D scan of a human ear (my own). However, the software is only effective with completely stationary subjects, so our attempts at scanning were not of sufficient quality for our purposes. Instead, Tom took multiple photographs of my ear from different angles, and we then used these to compute the areas of the cymba and cavum visible in each image. These areas were then turned into a ratio (cavum area/cymba area) to generate a single value which could be compared across pictures and represented graphically (a sketch of this calculation follows the table below). The results showed that as the camera angle gets lower relative to the ear, the size of the cymba relative to the cavum increases - so pictures taken from below parallel overestimate the size of the cymba, and could therefore be used to falsely suggest an identity for the Somerton Man.
Picture 1 represents the image taken from highest above the ear (i.e. the greatest angle, where the camera level with the ear represents 0°, and directly above and directly below the ear represent +90° and -90° respectively), through to Picture 7, the lowest image relative to the ear.

[[Image:CymbaCavumGraph.png|thumb|300px|right|Graph showing relationship between cavum/cymba ratio and photo angle.]]

{| class="wikitable" style="text-align: center;"
|-
! Picture !! Cymba (pixels) !! Cavum (pixels) !! Cavum/Cymba Ratio
|-
| 1 || 223 || 2712 || 12.16143498
|-
| 2 || 473 || 3032 || 6.410147992
|-
| 3 || 1257 || 3451 || 2.745425617
|-
| 4 || 1782 || 3718 || 2.086419753
|-
| 5 || 1905 || 2732 || 1.434120735
|-
| 6 || 2373 || 1640 || 0.691108302
|-
| 7 || 1294 || 398 || 0.307573416
|}
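The ratio calculation itself is straightforward. The following minimal sketch simply recomputes the cavum/cymba ratios from the pixel areas in the table above; how the areas were measured from the photographs (outlining the cymba and cavum in each image) is not shown, and the class name is ours.

<syntaxhighlight lang="java">
/**
 * Illustrative sketch: recomputes the cavum/cymba ratios from the pixel
 * areas in the table above (Pictures 1 to 7).
 */
public class EarRatio {

    public static void main(String[] args) {
        int[] cymba = {223, 473, 1257, 1782, 1905, 2373, 1294};
        int[] cavum = {2712, 3032, 3451, 3718, 2732, 1640, 398};

        for (int i = 0; i < cymba.length; i++) {
            double ratio = (double) cavum[i] / cymba[i];
            System.out.printf("Picture %d: cavum/cymba = %.3f%n", i + 1, ratio);
        }
    }
}
</syntaxhighlight>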