==Project Management - Planning and Feasibility==

===Work Breakdown/Deliverables===
The workload for this project was broken down into its main tasks, which are listed in the Final Project Gantt Chart (see the Timeline section). The ''key'' deliverables are represented as milestones on the Gantt Chart. The dependencies of the tasks and deliverables are shown on the Gantt Chart as black arrows and are as follows. The Research Proposal and Progress Report depend on the Draft Research Proposal, which in turn depends on the Proposal Seminar. Of the specific project tasks, Task 1 was completed first, and Tasks 2, 3 and 4 were completed in parallel. The Final Seminar Presentation, Project Exhibition Poster, Final Performance, YouTube video and dump of final work all depend on the completion of the specific project tasks. The Final Report/Honours Thesis was completed in parallel with the rest of the work from the Research Proposal and Progress Report hand-in onwards.

===Timeline===
The timeline for this project was created in the form of a Gantt Chart. The proposed Gantt Chart can be seen in Figure 42.

[[File:Project_Gantt_Chart.png|thumb|1000px|centre|'''Fig. 42:''' Proposed Project Gantt Chart]]

The final Gantt Chart, after all revisions and updates, can be seen in Figure 43.

[[File:Gantt_Chart_Final.png|thumb|1000px|centre|'''Fig. 43:''' Final Project Gantt Chart]]

Changes made from the originally proposed Gantt Chart to the final revised Gantt Chart include the renaming of Tasks 2 and 4 to N-Gram Search and Statistical Frequency of Letters Reanalysis. Task 2 was completed earlier than expected, but cleaning up the results for presentation and finding meaningful combinations among them took longer than expected, so the second part of Task 2 was extended. Task 3 was also extended so that Jikai could complete it. Task 4 was commenced earlier than proposed, since the bulk of Task 2 was completed early; as a result, Task 4 was completed in parallel with Tasks 2 and 3 towards the end of the project timeline. The dump of final work and the project YouTube video were moved to after the due date of the Final Report/Thesis following discussion with our supervisors. Overall, our initially proposed Gantt Chart estimated the project timeline quite accurately and only minor changes were needed.

===Task Allocation===
The workload for the tasks within this project was allocated based on the strengths and skill set of each member, as well as the estimated time and complexity of each task. A table of the project task allocation can be seen in Figure 44. The key allocations were that Nicholas Gencarelli undertook Project Management, the N-Gram Search and the Project Exhibition Poster, while Jikai Yang undertook the use of the ''Rubaiyat of Omar Khayyam'' as a ''One-time Pad'' and the project YouTube video. The allocations did not require changing throughout the project life cycle, apart from the decision for both members to perform a statistical reanalysis for Task 4 rather than analysing the mass spectrometer data from the Somerton Man's hair.

[[File:Table_of_Task_Allocation_Final.png|thumb|500px|centre|'''Fig. 44:''' Table of Task Allocation]]

===Management Strategy===
A number of management strategies were adopted throughout the project. One was frequent face-to-face contact through regular meetings every 2-3 weeks. Another was regular communication between group members via text message and email.
Collaboration was another useful strategy: if one member required assistance on a particular task, the other was able to step in and help. This was achieved through flexible task allocation. The group made use of collaborative software, including Google Drive for working together on project documents and a GitHub repository for working together on code for software development. The project Wiki page was updated in real time, including the weekly progress section, to monitor and review the work completed by each member every week and to plan tasks for the upcoming week. Finally, a Gantt chart was used as a management strategy, incorporating clearly defined tasks and goals and establishing a critical path through the use of task dependencies.

===Budget===
The project budget for this honours group was set at 500 dollars at the commencement of the project. It was initially proposed that the budget would depend on the n-gram database chosen for the search engine in Task 2. As discussed in the Method section of Task 2: N-Gram Search, a variety of options were considered, and the two largest databases were found to be Microsoft Web N-Gram Services<ref>C. X. Zhai et al. (2010, July 19-23). Web N-gram Workshop [Online]. Available: http://research.microsoft.com/en-us/events/webngram/sigir2010web_ngram_workshop_proceedings.pdf</ref> and Google N-Gram<ref>Google Books. (2012, July). Ngram Viewer [Online]. Available: http://storage.googleapis.com/books/ngrams/books/datasetsv2.html</ref>. The Microsoft alternative was free to use for academic purposes after applying for a user token, and is hosted on Microsoft's web server, so there was no need to purchase storage for the database<ref>C. X. Zhai et al. (2010, July 19-23). Web N-gram Workshop [Online]. Available: http://research.microsoft.com/en-us/events/webngram/sigir2010web_ngram_workshop_proceedings.pdf</ref>. The Google alternative was available for free as a raw dataset, or at a cost of 150 dollars for a student licence when purchased from the University of Pennsylvania Linguistic Data Consortium<ref>T. Brants and A. Franz. (2006). Web 1T 5-gram Version 1 [Online]. Available: https://catalog.ldc.upenn.edu/LDC2006T13</ref>. Unlike the Microsoft alternative, if the Google N-Gram option was chosen, a portion of the budget would have had to be dedicated to storing the database; it was initially proposed to store the database on a hard drive at a cost of approximately 100 dollars. The proposed budget, highlighting the key costs of each option, can be seen in Figure 45.

[[File:Proposed_Budget.png|thumb|500px|centre|'''Fig. 45:''' Proposed Budgeting Table]]

For reasons discussed in the Method section of Task 2: N-Gram Search, upon deciding to use the Google N-Gram database, a decision had to be made whether to purchase the University of Pennsylvania Linguistic Data Consortium version or to obtain it for free directly from Google. The free database provided by Google was chosen, as it was not deemed justifiable to spend 150 dollars on the processed data from the Linguistic Data Consortium when the raw dataset could be cleaned up by writing our own software.
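As an illustration of the kind of clean-up this involved, the minimal sketch below filters raw Google Books n-gram records down to plain alphabetic n-grams and aggregates their match counts across years. The tab-separated record layout (n-gram, year, match count, volume count) follows the published Version 2 dataset, but the file names and filtering rules here are assumptions for illustration only, not our exact processing code.

<pre>
import csv
import gzip
import re
from collections import defaultdict

# Hypothetical shard and output names; the real dataset is split across
# many gzipped shards (e.g. googlebooks-eng-all-5gram-20120701-aa.gz).
INPUT_FILE = "googlebooks-eng-all-5gram-20120701-aa.gz"
OUTPUT_FILE = "cleaned-5grams.tsv"

# Keep only n-grams made of plain alphabetic words (no POS tags, numbers
# or punctuation), which are the entries useful for searching the code.
WORD_RE = re.compile(r"^[A-Za-z]+$")

totals = defaultdict(int)

with gzip.open(INPUT_FILE, mode="rt", encoding="utf-8", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    for row in reader:
        if len(row) != 4:
            continue  # skip malformed lines
        ngram, year, match_count, volume_count = row
        words = ngram.split(" ")
        if all(WORD_RE.match(w) for w in words):
            # Sum match counts over all years for each surviving n-gram.
            totals[ngram.lower()] += int(match_count)

with open(OUTPUT_FILE, "w", encoding="utf-8") as out:
    for ngram, count in sorted(totals.items(), key=lambda kv: -kv[1]):
        out.write(f"{ngram}\t{count}\n")
</pre>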
The initial budget was based on the assumption that the Google N-Gram database could be stored locally. Although this was feasible in its compressed form, the local computing power available would have been insufficient to run the search engine code over the database within the time frame of the project. As discussed in the Method section of Task 2: N-Gram Search, a cloud-based computing service, Amazon Elastic Compute Cloud (EC2), was instead used to store and process the database. The free tier was considered but did not provide the specifications required for our task, so instances on Amazon EC2 were hired at a rate of 0.853 dollars per hour<ref>Amazon Web Services. (2015). Amazon EC2 Pricing [Online]. Available: https://aws.amazon.com/ec2/pricing/</ref>. After storing the full database, running our search code and downloading the results generated by the code, the total cost of using the service came to 576 dollars, which exceeded the initially proposed budget (an indicative conversion of this spend into instance-hours is sketched at the end of this subsection). The reason for the additional expenditure was that, despite our efforts, it was difficult to predict the precise time it would take to upload, store and process the database on the cloud service. The initially proposed budget did not include a costing for the Amazon server, since this could not reasonably have been foreseen at the start of the project: it was initially thought that the Microsoft N-Gram Service would be suitable for the needs of the project and, failing that, that the Google N-Gram alternative could be stored locally. The final revised budget, including total project expenditure, can be seen in Figure 46.

[[File:Final_Budget.png|thumb|500px|centre|'''Fig. 46:''' Final Budgeting Table]]

In conclusion, despite the project going over budget, the additional funds were kindly provided by the School of Electrical and Electronic Engineering after we submitted an application for funding with justification of our purchases. The project work benefited from the purchase of the Amazon service, since we were able to complete a search of specific n-gram combinations of the code over the full Google N-Gram database. It provided us with results to present as part of our thesis and allowed us to meet the requirements set out in the aim of Task 2.
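As a rough sanity check on the figures above, the sketch below converts the total EC2 spend back into instance-hours at the quoted on-demand rate. The rate, total and original budget are taken from this section; the split between upload, processing and download time is not recorded here, so the calculation is indicative only.

<pre>
# Indicative only: converts the quoted EC2 rate and total spend into
# instance-hours; the actual usage pattern (number of instances, idle
# time, data-transfer charges) is not broken down in this report.
HOURLY_RATE_DOLLARS = 0.853   # quoted on-demand rate per instance-hour
TOTAL_SPEND_DOLLARS = 576.0   # final cost of storing and searching the database
BUDGET_DOLLARS = 500.0        # original project budget

instance_hours = TOTAL_SPEND_DOLLARS / HOURLY_RATE_DOLLARS
overrun = TOTAL_SPEND_DOLLARS - BUDGET_DOLLARS

print(f"Approximate instance-hours consumed: {instance_hours:.0f}")  # ~675 hours
print(f"Budget overrun: {overrun:.0f} dollars")                      # 76 dollars
</pre>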
===Risk Analysis===
A risk assessment was undertaken for this project, covering risk identification, analysis, evaluation and treatment strategies, using the University of Adelaide risk matrix procedure<ref>The University of Adelaide. Risk Management Handbook [Online]. Available: http://www.adelaide.edu.au/legalandrisk/docs/resources/Risk_Management_Handbook.pdf</ref>. The assessment can be seen in Figure 47.

One of the risks that occurred during the project was the inaccurate estimation of time and resources. This arose because the group and supervisors were unhappy with the results obtained from the initial analysis of letter frequency performed in Task 1. It was rectified by using the flexibility of our schedule and replacing the initially proposed Task 4: Mass Spectrometer Data Analysis with a new Task 4: Statistical Frequency of Letters Reanalysis. Another risk that occurred during the project was illness; this was dealt with relatively easily by working from home for a short period of time.

Minor misunderstandings of project tasks occurred on a few occasions, but these were clarified by scheduling meetings with group members and supervisors. Bugs in code were reduced to the best of our ability through thorough testing and debugging; the style of testing used is illustrated in the sketch after the risk table below. Finally, the inability to decipher the Somerton Man code was a risk estimated with an almost certain likelihood. Although this risk could not be avoided, its effects were considered negligible: the group was still able to complete all work to the best of its ability and to further the research into the decryption of the code, not only for future honours groups but also for the wider community, through publishing our results on our Wiki.

[[File:Risk_Assessment_Final.png|thumb|500px|centre|'''Fig. 47:''' Table of Risk Assessment]]
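As an illustration of the testing strategy mentioned above, the sketch below shows the style of small unit test used to catch bugs early. The function under test, count_letter_frequencies, is a hypothetical stand-in for our letter-frequency code, not the project code itself.

<pre>
import unittest
from collections import Counter


def count_letter_frequencies(text: str) -> Counter:
    """Hypothetical stand-in for the project's letter-frequency code:
    counts alphabetic characters only, ignoring case."""
    return Counter(ch for ch in text.upper() if ch.isalpha())


class TestLetterFrequencies(unittest.TestCase):
    def test_ignores_non_letters(self):
        counts = count_letter_frequencies("A1B2 C!")
        self.assertEqual(counts, Counter({"A": 1, "B": 1, "C": 1}))

    def test_case_insensitive(self):
        counts = count_letter_frequencies("aAbB")
        self.assertEqual(counts, Counter({"A": 2, "B": 2}))


if __name__ == "__main__":
    unittest.main()
</pre>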