Cipher cracking 2010 weekly progress
From Derek
Contents
- 1 Weekly progress and questions
- 1.1 Semester 1 Week 1
- 1.2 Semester 1 Week 2
- 1.3 Semester 1 Week 3
- 1.4 Semester 1 Week 4
- 1.5 Semester 1 Week 5
- 1.6 Semester 1 Mid-Semester Break
- 1.7 Semester 1 Week 6
- 1.8 Semester 1 Week 7
- 1.9 Semester 1 Week 8
- 1.10 Semester 1 Week 9
- 1.11 Semester 1 Week 10
- 1.12 Semester 1 Week 11
- 1.13 Semester 1 Week 12
- 1.14 Semester 2 Week 1
- 1.15 Semester 2 Week 2
- 1.16 Semester 2 Week 3
- 1.17 Semester 2 Week 4
Weekly progress and questions
This is where you record your progress and ask questions. Make sure you update this every week. The deadline is every Friday evening. However, if you sometimes slip a little into the weekend (so long as you don't do it too often) we won't be too hard on your marks.
Please remember that we make use of this progress section to give you your project mark. Your mark will suffer if you don't complete this section.
Semester 1 Week 1
Kevin
This Week
- Had an initial group meeting.
- Researched the background of the case using Wikipedia and by looking at the work from last year's project.
Michael
This Week
- Had an introductory meeting with the project coordinators and Kevin.
- Researched the basic case background and ciphers.
Semester 1 Week 2
Kevin
This Week
- Looked into different web crawlers that we could use for our project.
- Started working on the Proposal Seminar.
- Bought the Secrets of Codes book to help with the Proposal Seminar.
Michael
This Week
- Developed a rough work breakdown structure for the project.
- Began compiling information on some of the available web crawlers.
- Constructed a basic outline for the Proposal Seminar.
Semester 1 Week 3
Kevin
This Week
- Had our Proposal Seminar presentation.
- Started verifying the previous year's results by carefully examining the Java code from the previous project group.
Next Week Plan
- Survey several people, both sober and intoxicated, for sequences of random letters.
- Finish, or nearly finish, verifying last year's results and check whether we reach the same conclusions.
- Brush up on Java programming skills.
Michael
This Week
- Converted the work breakdown structure into a Gantt chart.
- Completed the Proposal Seminar and presented it on Wednesday.
- Drew up a survey sheet for recording 45 random letters, to be collected from a variety of subjects.
- Started collecting random letter samples to explore the theory that the code is just a sequence of random letters.
Next Week Plan
- Complete the collection of random letter samples (intoxicated samples will most likely have to wait until the following weekend :D).
- Draw up an outline for the Stage 1 Design Document.
- Continue researching available web crawlers and hopefully settle on the one that will be most useful for the project.
Semester 1 Week 4
Kevin
This Week
- Started writing up my sections of the Stage 1 Design Document.
- Got a few random letter surveys done. Not really sure how intoxicated the subjects were, though ;(.
- Looked through the Java code from last year's project. Fairly certain the code works correctly, and we get the same outputs.
- Drew up an initial flow chart of what we think the web crawler code should do.
Next Week Plan
- Finish the Stage 1 Design Document.
- Put the finishing touches on confirming last year's results, if needed.
- Possibly start initial coding of the web crawler if time permits.
Michael
This Week
- Gathered quite a few more random letter samples, which should be a sufficient amount. Haven't had the opportunity to get intoxicated samples yet; however, the upcoming holidays should provide them.
- Began compiling our Stage 1 Design Document. Acquired specific guidelines for the contents of the document that are relevant to the project.
Next Week Plan
- Main focus is to complete the Stage 1 Design Document. In doing so, the project will also progress significantly in terms of knowledge of available web crawlers and of the specific hypotheses that should be tested.
Semester 1 Week 5
Kevin
This Week
- Focused mainly on finishing the Stage 1 Design Document.
Next 2 Weeks Plan
- Start coding a basic search algorithm.
- Finish verifying the previous year's project results.
- Read up on how web crawlers work and come to a conclusion on which web crawler is best for us to modify.
Michael
This Week
- Worked on and completed the Stage 1 Design Document.
- Downloaded and experimented briefly with the Arachnid and JSpider web crawlers.
Next 2 Weeks Plan
- Finalise the collection of random letter samples and compile the results.
- Continue researching and experimenting with pre-existing web crawlers.
- Brush up on Java coding.
Semester 1 Mid-Semester Break
Kevin
Last 2 Weeks
- Finished verifying the previous year's code. To the best of my understanding of the comments and Java code, I agree with the results.
- Looked at the user guides/manuals of several different web crawlers. Still not sure which one we're going to use, though.
- Started coding a basic Java program to parse text.
Next Week Plan
- Complete the Peer Review.
- Continue writing the basic Java code.
- Hopefully decide which web crawler we want to use.
Michael
Last 2 Weeks
- Completed the collection of random letter samples.
- Read into web crawler fundamentals to get a better understanding of their general structure.
Next Week Plan
- Main focus will be to get the Peer Review written up.
- Compile the random letter samples into graphs and compare them with last year's results.
- Spend some time continuing with crawler fundamentals.
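Compiling the samples into graphs starts with a letter tally. The following is a minimal sketch, assuming each survey sample is stored as a plain string of letters. The class and method names are hypothetical and this is not the project's own code. It returns the relative frequency of each letter A to Z, ready to be plotted as a bar chart.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: tallies A-Z letter frequencies across random-letter survey samples. */
class LetterFrequency {

    /** Returns the relative frequency of each letter A-Z (index 0 = 'A'); non-letters are ignored. */
    static double[] relativeFrequencies(List<String> samples) {
        long[] counts = new long[26];
        long total = 0;
        for (String sample : samples) {
            for (char c : sample.toUpperCase(Locale.ROOT).toCharArray()) {
                if (c >= 'A' && c <= 'Z') {
                    counts[c - 'A']++;
                    total++;
                }
            }
        }
        double[] freqs = new double[26];
        if (total == 0) {
            return freqs; // no letters at all: every frequency stays 0
        }
        for (int i = 0; i < 26; i++) {
            freqs[i] = (double) counts[i] / total;
        }
        return freqs;
    }

    public static void main(String[] args) {
        // Placeholder strings for illustration only; these are not survey results.
        List<String> samples = List.of("QWERTY", "ASDFQ", "zxq");
        double[] freqs = relativeFrequencies(samples);
        for (int i = 0; i < 26; i++) {
            if (freqs[i] > 0) {
                System.out.printf("%c %.3f%n", (char) ('A' + i), freqs[i]);
            }
        }
    }
}
```

Normalising to proportions rather than raw counts makes sample sets of different sizes directly comparable, for example this year's samples against last year's.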
Semester 1 Week 6
Kevin
This Week
- Started doing the Peer Review.
- Spent most of this week doing other assignments and catching up on other courses.
Next Week Plan
- Finish the Peer Review.
- Continue working on the basic pattern-finding Java code.
Michael
This Week
- Began and completed the Peer Review.
- Started running small tests and playing with the Arachnid crawler. It appears to be quite slow and inconsistent.
Next Week Plan
- Keep experimenting with Arachnid.
- Try running at least 1 other crawler.
Semester 1 Week 7
Kevin
This Week
- Finished the Peer Review Report.
- Continued to work on the text-parsing Java code. It still has quite a few errors that need to be ironed out.
Next Week Plan
- Continue modifying the text-parsing code and hopefully finish it.
- Play around with some of the web crawlers to get a better understanding of how they work and how to modify their code.
Michael
This Week
- Ran some more tests with the supplied Arachnid code.
- Read up on the JSpider crawler.
Next Week Plan
- Try editing Arachnid to manipulate the results of tests.
- Tests and assignments for other courses next week will take up most of my time.
Semester 1 Week 8
Kevin
This Week
- Was busy with tests and assignments in other subjects and wasn't really able to get much done :(.
Next Week Plan
- Fix and finish up basic text matching code.
- Otherwise, same goals as last week as I wasn't able to complete them.
Michael
This Week
- Spent most of my time completing assignments and studying for tests.
- Ran a few more tests with Arachnid.
Next Week Plan
- Meeting with Derek and Matt to hopefully get some more guidance.
- Read further into JSpider to see where that may take us.
Semester 1 Week 9
Kevin
This Week
- Finished writing code to find exact matches in a text file. It appears to work exactly as intended. Adding the ability to ignore HTML code and/or search initialisms only should take just a few minor changes.
- Experimented with the JSpider and Arachnid web crawlers to get a feel for them.
Next Week Plan
- Implement the above changes to the Java code.
- Read up on how to integrate the code into the web crawler.
Michael
This Week
- Made progress with Arachnid, but still having issues with consistency.
- Got some ideas for alternative approaches from the meeting (e.g. using Wget).
Next Week Plan
- Begin outlining the Progress Report.
- Look into Wget and how to use it
- Make a decision on Arachnid and whether or not it is going to be useful
Semester 1 Week 10
Kevin
This Week
- Modified my Java program so that it can search for either exact words or initialisms in a text file. Haven't yet implemented ignoring HTML code.
- Started working on the Progress Report.
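The difference between the two search modes can be shown with a minimal sketch, assuming the text has already been split into lower-case words. The class and method names are hypothetical, and punctuation handling is deliberately left out. An exact search compares whole consecutive words, while an initialism search compares only the first letter of each word in a sliding window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of two search modes: exact word sequences vs. initialisms. */
class InitialismSearch {

    /** Splits text on whitespace into lower-case words (punctuation handling omitted for brevity). */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(w.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    /** Returns the word indices at which the phrase's words appear consecutively. */
    static List<Integer> exactMatches(List<String> words, String phrase) {
        List<String> target = words(phrase);
        List<Integer> hits = new ArrayList<>();
        if (target.isEmpty()) {
            return hits;
        }
        for (int i = 0; i + target.size() <= words.size(); i++) {
            if (words.subList(i, i + target.size()).equals(target)) {
                hits.add(i);
            }
        }
        return hits;
    }

    /** Returns the word indices at which consecutive words' first letters spell the initialism. */
    static List<Integer> initialismMatches(List<String> words, String initialism) {
        String target = initialism.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        if (target.isEmpty()) {
            return hits;
        }
        for (int i = 0; i + target.length() <= words.size(); i++) {
            boolean match = true;
            for (int j = 0; j < target.length() && match; j++) {
                match = words.get(i + j).charAt(0) == target.charAt(j);
            }
            if (match) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> w = words("the quick brown fox jumps over the quick dog");
        System.out.println(exactMatches(w, "the quick")); // prints [0, 6]
        System.out.println(initialismMatches(w, "TQB"));  // prints [0]
    }
}
```

Returning word indices rather than a simple count keeps enough information to print the matching words in context afterwards.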
Next Week Plan
- Main aim is to finish the progress report.
Michael
This Week
- Found an issue with Arachnid related to character encoding: it returns null titles for UTF-8 HTML pages. Also often getting "handleBadIO" errors, which isn't very promising.
- Began looking into Wget.
- Started the Progress Report.
Next Week Plan
- Major focus is to complete our Progress Report
- Secondary task is to get Wget running on my laptop so that I can begin experimenting with downloading web pages and pulling out text/URLs.
Semester 1 Week 11
Kevin
This Week
- Still working on the Progress Report.
- Found some errors in my Java code when testing different cases, and had to debug and modify it to get it working correctly. Searching for exact initialisms seems to work perfectly now.
Next Week Plan
- Finish the Progress Report.
- Implement ignoring HTML. The only way I can think of is to ignore characters/words between "<" and ">".
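The "<" to ">" idea can be sketched as a small state machine that tracks whether the scan is currently inside a tag. This is a minimal sketch under that assumption, and the class name is hypothetical. Replacing each tag with a single space is an extra design choice that stops words on either side of a tag, as in "end</b>start", from being glued together. It does not handle the contents of script/style blocks, comments containing ">", or entities such as &amp;.

```java
/** Hypothetical sketch: removes everything from '<' up to and including the next '>'. */
class HtmlStripper {

    static String stripTags(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean insideTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                if (!insideTag) {
                    out.append(' '); // keep the words on either side of a tag apart
                }
                insideTag = true;
            } else if (c == '>' && insideTag) {
                insideTag = false;
            } else if (!insideTag) {
                out.append(c); // ordinary text, including a stray '>' outside any tag
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b></p>").trim());
    }
}
```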
Michael
This Week
- Made a start on the Progress Report.
- Still getting errors with character encoding, but have a few things to try.
Next Week Plan
- Completing the Progress Report will be the priority, along with assignments for other subjects.
- Try to make progress with the character encoding issue if possible; otherwise this may need to wait until I get time between studying for exams.
Semester 1 Week 12
Kevin
This Week
- Finished the Progress Report.
- Implemented ignoring HTML by ignoring characters between "<" and ">". Also implemented wildcard searches when "*" is used.
- Both implemented methods seem to work, though I haven't done much testing yet.
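The log does not specify what "*" matches, so the following is a minimal sketch that assumes it stands for exactly one unknown letter. This reading is useful when a letter of the code is uncertain. The class name is hypothetical. A candidate, such as the first letters of a window of words, matches if it has the same length and agrees at every non-"*" position.

```java
/** Hypothetical sketch: '*' stands for exactly one unknown letter (an assumption about the wildcard's meaning). */
class WildcardMatch {

    /** True if candidate has pattern's length and agrees with it at every non-'*' position, ignoring case. */
    static boolean matches(String pattern, String candidate) {
        if (pattern.length() != candidate.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = Character.toLowerCase(pattern.charAt(i));
            char c = Character.toLowerCase(candidate.charAt(i));
            if (p != '*' && p != c) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("T*B", "tqb")); // true
        System.out.println(matches("T*B", "tqd")); // false
        System.out.println(matches("T*B", "tb"));  // false: lengths differ
    }
}
```

If "*" was instead meant to match any run of letters, as in a shell glob, a simple alternative is to quote the literal parts of the pattern and turn each "*" into ".*" in a java.util.regex pattern.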
Next Week Plan
- Test and debug the above implementations more.
- Figure out a way to implement "similar" initialisms.
- Read up on how to integrate my code with the web crawler.
- These will most likely have to wait until after exams because I'm busy studying.
Michael
This Week
- Completed the Progress Report.
- Looked further into Wget.
Next Week Plan
- Exam study will take up most of my time for the next few weeks; however, after exams there will be considerable time to focus solely on the project.
- After-exam plans:
  - Get Wget working properly and try to use it to retrieve data from websites.
  - Decide whether to continue with Arachnid or focus on Wget.
  - Work with Kevin to start amalgamating our parts of the project.
Semester 2 Week 1
Kevin
Last Few Weeks
- Tested ignoring HTML code and wildcards. Seemed to work properly.
- Changed code so that it could parse through multiple files.
- Added code to allow the user to enter a directory of files to be parsed. There were some errors involving the directory list, but they were fixed, and the parser now searches the entered directory and all subdirectories.
- The parser now also outputs the results to a text file called "Results.txt". Also had to make sure that the parser filters out files with "Result" in the name so it doesn't parse through the results file.
- Started working with Michael to bring parser and web crawler together to tabulate results.
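The directory handling described above can be sketched with java.nio.file.Files.walk, which visits a directory and all of its subdirectories. This is a minimal sketch, not the project's parser. The class name is hypothetical, and listing file names into Results.txt is a stand-in for a real parser's match output. The filter on "Result" mirrors the rule above, so a previous Results.txt is never parsed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: collect parseable files under a directory tree, skipping result files. */
class DirectoryScan {

    /** Lists regular files under root (recursively), excluding any whose name contains "Result". */
    static List<Path> filesToParse(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) { // the stream holds directory handles, so close it
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> !p.getFileName().toString().contains("Result"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> files = filesToParse(root);
        // Placeholder output: one line per file. A real parser would write its match results here.
        List<String> lines = files.stream().map(Path::toString).collect(Collectors.toList());
        Files.write(root.resolve("Results.txt"), lines, StandardCharsets.UTF_8);
        System.out.println(files.size() + " files queued for parsing");
    }
}
```

Writing with an explicit UTF-8 charset, and reading the same way, avoids depending on the platform default encoding. This matters once non-English pages are involved.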
Next Week Plan
- Get more parsing results from web sites and analyse them.
- Modify parser to work with Cyrillic alphabet for Russian, etc.
Michael
Last Few Weeks
- While researching Wget, I came across the website mirroring software HTTrack and found it quite useful.
- Developed a batch file that takes an input URL, either from a file or from user input, and collects HTML files using HTTrack.
- Experimented with parameters and settings to avoid downloading images and to alter the search depth.
- Began working with Kevin to get results using his pattern-matching algorithm.
- Combined the pattern code and HTTrack to be able to parse several URLs and run pattern-matching searches on the results.
Next Week Plan
- Determine sites/searches of interest and collect results for initialisms.
- Go through the Somerton Man code to find the most likely initialism sequences.
- Work with Kevin on his code to optimise and extend it.
Semester 2 Week 2
Kevin
This Week
- Worked with Michael to get some results ready for our meeting.
- Added a results summary to the output file for easy reading and made the code's terminal interface more user-friendly.
- Had meeting with Derek and Matt to talk about what we've done and what we should be doing now.
Next Week Plan
- Get some results from the Bible, Shakespeare and Rubaiyat text files.
- Try to work on some more of the things we discussed in our meeting.
Michael
This Week
Next Week Plan
Semester 2 Week 3
Kevin
This Week
- Was busy this week with assignments and studying for tests.
- Edited the parser to treat periods, commas, exclamation marks, etc. as word separators (i.e. 'word1.word2' was previously read as one word, but is now counted as two). This makes the parser more accurate and lowers the chance of finding false positives in the results.
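Treating punctuation as a separator comes down to splitting on every run of characters that cannot belong to a word. This is a minimal sketch that assumes letters, digits and apostrophes form words, so that contractions like "don't" stay intact. The class name is hypothetical. Because \p{L} matches letters in any script, the same split also works for Cyrillic text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: splits text into words at punctuation as well as whitespace. */
class Tokenizer {

    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        // Any run of characters that are not letters, digits or apostrophes separates two words.
        // \p{L} matches letters in any script, so Cyrillic text is split the same way.
        for (String raw : text.split("[^\\p{L}\\p{N}']+")) {
            // Strip apostrophes used as quote marks, e.g. 'word' -> word, but keep don't intact.
            String word = raw.replaceAll("^'+|'+$", "");
            if (!word.isEmpty()) {
                words.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("word1.word2"));                // prints [word1, word2]
        System.out.println(tokenize("End.Start, don't! 'quoted'")); // prints [end, start, don't, quoted]
    }
}
```

Keeping apostrophes inside words matters for initialism searches, because splitting "don't" into "don" and "t" would inject a spurious "t" initial.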
Next Week Plan
- Continue working on last week's tasks.
Michael
This Week
- Worked with Kevin to optimise the pattern code.
- Added an English probability calculation algorithm so that its results are displayed after a search.
- Obtained some useful text files to run tests on and collect results.
- Got a few more intoxicated letter samples; still not enough for a useful result.
Next Week Plan
- Main aim is to collect intoxicated letter samples and run tests with our code to compile results
Semester 2 Week 4
Michael
This Week
- Finalised the collection of intoxicated letter samples.
- Ran initial tests with Kevin on the Rubaiyat of Omar Khayyam and Shakespeare texts.
- Developed a new pattern-matching algorithm to search for substitution patterns. It currently works for all possible 4-symbol combinations, i.e. @@@@, @@@#, @@##, etc.
Next Week Plan
- Produce a graph of the intoxicated letter samples and see what the result looks like.
- Get the pattern-matching algorithm embedded into the main "FindMatch.java" file.
- Produce a 3-symbol pattern-matching method, as this will probably be more useful with the Mystery Code letters.
Kevin
This Week
- Integrated Michael's FindPattern code into the parser. I think he still needs to do more work on it and some testing.
- Ran the parser and got results from the Rubaiyat, Shakespeare's works and the Bible (RSV and KJV). Used randomly chosen 4-character segments of the Somerton Man code to find the expected and actual proportions of initialisms in all of the texts.
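The log does not say how the expected proportion was calculated. The following is a minimal sketch under one plausible assumption: word-initial letters are treated as independent, so the expected proportion of an initialism is the product of each letter's share of word-initial letters in the text. The actual proportion is the fraction of sliding word windows whose initials spell the target. The class and method names are hypothetical, and the sample words in main are placeholders, not project data.

```java
import java.util.List;

/**
 * Hypothetical sketch comparing how often an initialism occurs in a text with how often it
 * would occur if word-initial letters were independent. This definition of "expected" is an
 * assumption, not necessarily the one used in the project.
 */
class InitialismStats {

    /** Fraction of length-n word windows whose first letters spell target (words must be non-empty). */
    static double actualProportion(List<String> words, String target) {
        int n = target.length();
        int windows = words.size() - n + 1;
        if (n == 0 || windows <= 0) {
            return 0.0;
        }
        int hits = 0;
        for (int i = 0; i < windows; i++) {
            boolean match = true;
            for (int j = 0; j < n && match; j++) {
                match = Character.toLowerCase(words.get(i + j).charAt(0))
                        == Character.toLowerCase(target.charAt(j));
            }
            if (match) {
                hits++;
            }
        }
        return (double) hits / windows;
    }

    /** Product of each target letter's share of word-initial letters in the text. */
    static double expectedProportion(List<String> words, String target) {
        if (words.isEmpty()) {
            return 0.0;
        }
        double p = 1.0;
        for (char t : target.toCharArray()) {
            char lower = Character.toLowerCase(t);
            long count = words.stream()
                    .filter(w -> Character.toLowerCase(w.charAt(0)) == lower)
                    .count();
            p *= (double) count / words.size();
        }
        return p;
    }

    public static void main(String[] args) {
        // Placeholder words for illustration only.
        List<String> words = List.of("a", "big", "apple", "by", "a", "bay");
        System.out.printf("actual=%.3f expected=%.3f%n",
                actualProportion(words, "ab"), expectedProportion(words, "ab"));
    }
}
```

Under this assumption, an actual-to-expected ratio well above 1 would suggest that an initialism appears more often than chance alone predicts for that text.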
Next Week Plan
- Put the results into a readable table for easy comparison.
- Try different parts of the Somerton Man code.
- Try using 3-character segments instead of 4.
- As above, but for other texts or websites that might be useful.