Authorship detection: 2010 group
Supervisors
Collaborators
- François-Pierre Huchet, ITII Pays de la Loire, Nantes, France.
- J. José Alviar, University of Navarra, Spain
Students
Weekly progress and questions
Semester 2, Week 1
Jie Dong
Progress and Status this week:
- First meeting with Derek, Brian and Maryam, and with the other group members, Leng and Tien-en.
- Derek, Brian and Maryam introduced us to the basic idea of this data mining project
- The idea of authorship detection was introduced
- Several applications to which data mining techniques can be applied were mentioned
- Research by past years' students was mentioned, and Maryam sent us several past research reports together with the code
- Researched the project, especially SVM and some related algorithms
Plan and Goals for new week:
- Prepare for the proposal seminar.
- Read research reports from past years' students.
- Understand the project handbook.
Leng Tan
Progress and Status This Week
- the 1st meeting for the final year project was held with the supervisor, Prof Derek Abbott, the co-supervisor, Dr Brian Ng, and Mrs Maryam, along with the team members.
- the initial project scope was introduced and the general aim of the project was discussed.
- the basic ideas behind the techniques of authorship detection were shown as well.
- several ideas for future applications of this project were highlighted.
- some hints on getting started were given: read Talis's final year report, which will be provided by Mrs Maryam.
- we were reminded of the first milestone of the project, the proposal seminar.
Plan and Goals for Next Week
- fully read and understand Talis's report.
- have a brief look at the code that will be supplied by Mrs Maryam.
- do some research on the background of some controversial cases, such as the works of William Shakespeare, the Federalist Papers and the Letter to the Hebrews.
- read through the 2010 project handbook to get a rough idea of all the project milestones, focusing on the project seminar.
Tien-en Phua
Progress and Status this week:
- Met with the project supervisor, Prof Derek Abbott, the co-supervisor, Dr Brian Ng, and Mrs Maryam
- Derek discussed the concept behind authorship detection
- Derek explained the use of multi-dimensional graphs to link a disputed text to a known author.
- Discussed possible future applications. Brian suggested code plagiarism and possibly music.
- Was provided by Maryam with other students' project reports and started to go through the report by Talis.
- Went through the FYP project handbook
Plan and Goals for new week:
- Identify the methods Talis used in his report
- Research various methods
- Read up on past work on authorship detection
- Research the controversies
Semester 2, Week 2
Jie Dong
Progress and Status this week:
- Three methods were chosen for this project: word frequency, word recurrence interval, and trigram Markov model
- Reading material on SVM (SVM tutorial)
- Experimented with SVM software in Matlab
- Prepare slides for proposal seminar presentation on project aim, background, and part of project process
Plan and Goals for new week:
- Combine slides with those of the other group members and make some modifications
- Send slides draft to supervisor for feedback
- Make further modifications
- Presentation on Thursday
Leng Tan
Progress and Status This Week
- identified the 3 methods that were mentioned by Talis.
- gained some brief knowledge of the controversial issues.
- have a brief idea of the upcoming proposal seminar.
Plan and Goals for Next Week
- research on SVM.
- research on the background history of the project
- research on the different techniques used in the past
- prepare project proposal
Tien-en Phua
Progress and Status this week:
- Identified the three methods that Talis applied in his project, namely Word Frequency, Word Recurrence Interval and Trigram Markov
- Briefly understood how the three methods work
- Identified the past work done by other researchers.
- Identified three main controversies, namely the Federalist Papers, Shakespeare's plays and the Letter to the Hebrews
Plan and Goals for new week:
- Prepare for Project Proposal
- Develop Gantt chart, project budget and risk analysis
- Identify major milestones in the project
- Write up on the controversies
- Further research on the three methods
Semester 2, Week 3
Jie Dong
Progress and Status this week:
- We were introduced to Matthew and François-Pierre Huchet, who are also participating in this project, at Monday's meeting.
- Came up with the first complete draft of the proposal presentation slides. Discussed the role of each person.
- Sent slides to Brian and Matthew for feedback
- Modified our slides
- Presentation on Thursday
Plan and Goals for new week:
- Do more research on the three methods and SVM
- Prepare for the stage 1 design document
Leng Tan
Progress and Status This Week
- rough draft slides on the past research have been done for the proposal seminar.
- a comparison list of the different techniques was done.
- started research on SVM, to be added to the slides alongside the different techniques
- had a meeting with supervisors, and was introduced to Dr Matthew.
- focused 100% on the proposal seminar.
Plan and Goals for Next Week
- have a more detailed review of the 3 methods.
- read the criteria for the stage 1 design document.
Tien-en Phua
Progress and Status this week:
- Prepared for project proposal
- Developed Gantt chart, project budget and risk analysis
- Developed slides for milestones and controversies
- Research on SVM (Support Vector Machine)
- Gained a better understanding of Word Frequency, WRI and Trigram Markov
Plan and Goals for new week:
- Proceed to develop Stage 1 Design Document
- Understand SVM
- Develop Work Breakdown Structure
- Delegate tasks to individual members
- Read up on the other 4 reports
Semester 2, Week 4
Jie Dong
Progress and Status This Week
- In this project, we plan to have each person working on one method; I am working on the Trigram Markov model
- Read past reports for information on the Trigram Markov model
- Made a stage 1 design document template
- Wrote project aim, background, and project approach in the design document
Plan and Goals for Next Week
- Modify the design document draft
- Send to supervisors for feedback
- More modification
- Prepare a tutorial on SVM for other group members
Leng Tan
Progress and Status This Week
- research on the 3 methods has been completed.
- fully read and understood the criteria for stage 1 design document.
- had a brief meeting with group members to delegate the tasks in preparing the stage 1 design document.
Plan and Goals for Next Week
- do a rough draft of the tasks that are allocated.
- do a layout design for the document.
Tien-en Phua
Progress and Status this week:
- Developed Work Breakdown Structure
- Identified tasks required for Stage 1 Design Document
- Broke down tasks and assigned them to each member
- In the process of developing the Stage 1 Design Document
- Further research on SVM and Word Frequency
Plan and Goals for new week:
- Complete write up on Word Frequency and SVM
- Complete Stage 1 Design Document
- Coding and further research on Word Frequency
- Read up on the other 4 reports
Semester 2, Week 5
Jie Dong
Progress and Status this week:
- Done abstract, project aim, background and significance
- Done description of data extraction part for Trigram Markov model in design document
- Feedback from supervisors on design document
- Final modification on design document
- Format the design document on wiki
Plan and Goals for Next Week:
- Design the Trigram Markov model
- learn to use SVM
- a bit of coding on the Trigram Markov model
Leng Tan
Progress and Status this week:
- Done Literature Review of design document
- Done description of data extraction part for WRI in design document
- Done project approach and milestone for design document
- added modified WBS in appendix
- done initial check and compilation of Design document
Plan and Goals for Next Week:
- start a rough design of the WRI data extraction in Java
- read up on SVM
Tien-en Phua
Progress and Status this week:
- Completed design document
- Project Requirements
- Description of data extraction of Function Word Frequency analysis
- Project Budget
- Background and Significance of Hebrews
- Edited Gantt Chart, WBS to synchronise
- Edited and grammar check etc
- Basic layout of software design for data extraction algorithm
- Wiki page
Plan and Goals for Next Week:
- Commence programming of algorithm using Java
- Read up on SVM
Semester 2, Week 6
Jie Dong
Progress and Status this week:
- Research on Trigram Markov model
- Two models are proposed:
- Simple trigram Markov model: considers only the effect of trigrams in the text
- Potential problem with the first model: sparse data; new trigrams appearing in the test text lead to poor cross entropy
- Second model: hidden Markov model on trigrams: not only trigram counts, but also unigram and bigram effects are taken into consideration. The transition probability is composed of all three probabilities.
- The existence of punctuation and uppercase letters should be considered for text written in English.
- Programming of text file input and exception handling in Java
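One common way to build a transition probability from unigram, bigram and trigram probabilities, as the second model describes, is linear interpolation. The sketch below is only an illustration under assumptions that are not in the notes above: word-level tokens, the class name `InterpolatedTrigramModel`, the weights 0.6/0.3/0.1, and the probability floor used in the cross entropy are all hypothetical choices.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a trigram model smoothed by linear interpolation (hypothetical design). */
class InterpolatedTrigramModel {
    private final Map<String, Integer> unigrams = new HashMap<>();
    private final Map<String, Integer> bigrams = new HashMap<>();
    private final Map<String, Integer> trigrams = new HashMap<>();
    private int totalTokens = 0;

    // Assumed interpolation weights (must sum to 1); in practice they would be tuned.
    private static final double W3 = 0.6, W2 = 0.3, W1 = 0.1;
    // Assumed floor so that log(0) never occurs for completely unseen words.
    private static final double FLOOR = 1e-10;

    /** Counts unigrams, bigrams and trigrams in the training tokens. */
    void train(List<String> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            unigrams.merge(tokens.get(i), 1, Integer::sum);
            totalTokens++;
            if (i >= 1) {
                bigrams.merge(tokens.get(i - 1) + " " + tokens.get(i), 1, Integer::sum);
            }
            if (i >= 2) {
                trigrams.merge(tokens.get(i - 2) + " " + tokens.get(i - 1) + " " + tokens.get(i),
                        1, Integer::sum);
            }
        }
    }

    /** Interpolated estimate of P(w3 | w1 w2) from relative frequencies. */
    double probability(String w1, String w2, String w3) {
        double p1 = totalTokens == 0 ? 0.0 : unigrams.getOrDefault(w3, 0) / (double) totalTokens;
        int c2 = unigrams.getOrDefault(w2, 0);
        double p2 = c2 == 0 ? 0.0 : bigrams.getOrDefault(w2 + " " + w3, 0) / (double) c2;
        int c12 = bigrams.getOrDefault(w1 + " " + w2, 0);
        double p3 = c12 == 0 ? 0.0
                : trigrams.getOrDefault(w1 + " " + w2 + " " + w3, 0) / (double) c12;
        return W3 * p3 + W2 * p2 + W1 * p1;
    }

    /** Cross entropy in bits per trigram of a test text under the trained model. */
    double crossEntropy(List<String> tokens) {
        double sum = 0.0;
        int n = 0;
        for (int i = 2; i < tokens.size(); i++) {
            double p = probability(tokens.get(i - 2), tokens.get(i - 1), tokens.get(i));
            sum += Math.log(Math.max(p, FLOOR)) / Math.log(2);
            n++;
        }
        return n == 0 ? 0.0 : -sum / n;
    }

    public static void main(String[] args) {
        InterpolatedTrigramModel model = new InterpolatedTrigramModel();
        model.train(Arrays.asList("the cat sat on the mat the cat ran".split(" ")));
        // "on the cat" never occurs as a trigram, but still gets a non-zero probability.
        System.out.println(model.probability("on", "the", "cat"));
        System.out.println(model.crossEntropy(Arrays.asList("the cat sat on the mat".split(" "))));
    }
}
```

With interpolation, a trigram that is new in the test text falls back on its bigram and unigram estimates instead of receiving zero probability, which is the sparse-data problem noted for the first model.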
Plan and Goals for new week:
- Discuss the models with supervisor
- SVM problem
- Programming on first model
Leng Tan
Progress and Status this week:
- completed a design for the WRI code after discussion with group members.
- wrote about 50% of the code for data extraction using WRI.
- read a bit on SVM but still don't understand it.
Plan and Goals for new week:
- finish the coding for WRI.
- try to get help for SVM.
Tien-en Phua
Progress and Status this week:
- Finished the algorithm design (pseudo-code) for function word frequency, to be coded in Java.
- Started implementing the algorithm code.
- Code is halfway done.
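A function word frequency extractor of the kind described above could look roughly like the following. This is a minimal sketch, not the group's actual code: the class name `FunctionWordFrequency`, the six-word list, and the tokenisation rule (lowercase, split on anything that is not a letter a-z) are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch: relative frequency of each function word in a text (hypothetical design). */
class FunctionWordFrequency {
    // Illustrative list only; a real study would use a much longer function word list.
    static final String[] FUNCTION_WORDS = {"the", "of", "and", "to", "in", "a"};

    /** Returns, for each function word, its count divided by the total number of words. */
    static Map<String, Double> extract(String text) {
        // Assumed tokenisation: lowercase, split on any run of non-letters.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z]+");
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : FUNCTION_WORDS) {
            counts.put(w, 0);
        }
        int total = 0;
        for (String t : tokens) {
            if (t.isEmpty()) {
                continue; // split() yields an empty first token if the text starts with punctuation
            }
            total++;
            if (counts.containsKey(t)) {
                counts.merge(t, 1, Integer::sum);
            }
        }
        Map<String, Double> frequencies = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            frequencies.put(e.getKey(), total == 0 ? 0.0 : e.getValue() / (double) total);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        System.out.println(extract("The cat and the dog went to the park."));
    }
}
```

Because the map keeps the function words in a fixed order, its values can be read off as a feature vector with one dimension per function word.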
Plan and Goals for new week:
- Finish coding.
- Discuss SVM problems.
Semester 2, Week 7
Jie Dong
Leng Tan
Progress and Status this week:
- Finished the Java coding for the WRI technique in the data extraction algorithm.
- Tested and verified that the code works properly using a small test file (a text file with only a few sentences).
- Had a meeting with Brian to discuss the SVM input and output.
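The WRI extraction and the small-file check described above might be sketched as follows. The notes do not define the interval precisely, so this example assumes it is the difference in word position between successive occurrences of the same word; the class name `WordRecurrenceInterval`, the tokenisation rule and the choice of the mean as a summary value are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch: word recurrence intervals for one target word (hypothetical design). */
class WordRecurrenceInterval {

    /** Returns the gaps, in word positions, between successive occurrences of the word. */
    static List<Integer> intervals(String text, String word) {
        // Assumed tokenisation: lowercase, split on any run of non-letters.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z]+");
        String target = word.toLowerCase(Locale.ROOT);
        List<Integer> result = new ArrayList<>();
        int position = 0;      // index of the current word in the text
        int lastSeen = -1;     // position of the previous occurrence, -1 if none yet
        for (String t : tokens) {
            if (t.isEmpty()) {
                continue;
            }
            if (t.equals(target)) {
                if (lastSeen >= 0) {
                    result.add(position - lastSeen);
                }
                lastSeen = position;
            }
            position++;
        }
        return result;
    }

    /** Mean interval, or 0 if the word occurs fewer than two times. */
    static double mean(List<Integer> values) {
        if (values.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        // A text of only a few words, in the spirit of the small test file.
        List<Integer> gaps = intervals("The cat, the dog and then the bird.", "the");
        System.out.println(gaps + " mean=" + mean(gaps));
    }
}
```

A word that appears fewer than twice produces no intervals, so any feature built from these values needs a convention for that case; returning 0 here is only a placeholder.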
Plan and Goals for new week:
- Figure out SVM.
- Test and try out SVM in Matlab using small test files.
Tien-en Phua
Progress and Status this week:
- Completed coding for data extraction algorithm (DEA)
- Discussed how to pass the output data from the DEA to the SVM
- Analysed how other researchers analyse their data
Plan and Goals for new week:
- Modification and refining of DEA code
- Continue analysis of how other researchers used this DEA for authorship attribution
- Try applying data to SVM
Semester 2, Week 8
Tien-en Phua
Jie Dong
Leng Tan
Progress and Status this week:
- Received a stage 1 design document on "Audio Assisted Vision System for Visually Impaired People".
- The document was read in full and notes were taken on presentation and various other aspects.
- The document was reviewed and a formal peer review report was produced.
- Investigation of SVM in Matlab was halted for a moment due to the peer review report.
Plan and Goals for new week:
- Figure out SVM.
- Test and try out SVM in Matlab using small test files.
Semester 2, Week 9
Tien-en Phua
Jie Dong
Leng Tan
Semester 2, Week 10
Tien-en Phua
Jie Dong
Leng Tan
Semester 2, Week 11
Tien-en Phua
Jie Dong
Leng Tan
Semester 2, Week 12
Tien-en Phua
Jie Dong
Leng Tan
See also
- Authorship detection: Who wrote the Letter to the Hebrews?
- Critical design review 2010: Who wrote the Letter to the Hebrews?
- Final report 2010: Who wrote the Letter to the Hebrews?