Authorship detection: 2010 group
- François-Pierre Huchet, ITII Pays de la Loire, Nantes, France.
- J. José Alviar, University of Navarra, Spain
Weekly progress and questions
Semester 2, Week 1
Jie Dong
Progress and Status this week:
- First meeting with Derek, Brian and Maryam, and with the other group members, Leng and Tien-en.
- Derek, Brian and Maryam introduced us to the basic idea of this data mining project
- The idea of authorship detection was introduced
- Several applications to which data mining techniques can be applied were mentioned
- Research by past years' students was mentioned, and Maryam sent us several past research reports together with the code
- Research on the project, especially on SVM and some algorithms
Plan and Goals for new week:
- Prepare for the proposal seminar.
- Read research reports from past years' students.
- Understand project handbook.
Leng Tan
Progress and Status This Week
- the 1st meeting for the final year project was held with the supervisor, Prof Derek Abbott, the co-supervisor, Dr Brian Ng, and Mrs Maryam, along with the team members.
- the initial project scope was introduced and the general aim of the project was discussed.
- the basic idea behind the techniques of authorship detection was shown as well.
- several ideas for future applications of this project were highlighted.
- some hints on getting started were given, namely to read Talis's final year report, which will be provided by Mrs Maryam.
- we were reminded of the first milestone of the project, the proposal seminar.
Plan and Goals for Next Week
- fully read and understand Talis's report.
- have a brief look at the code that will be supplied by Mrs Maryam.
- do some research on the background of some controversial issues, such as the works of William Shakespeare, the Federalist Papers and the Letter to the Hebrews.
- read through the 2010 project handbook to get a rough idea of all the project milestones, focusing on the project seminar.
Tien-en Phua
Progress and Status this week:
- Met up with the project supervisor, Prof Derek Abbott, the co-supervisor, Dr Brian Ng, and Mrs Maryam
- Derek discussed the concept behind authorship detection
- Derek explained how multi-dimensional graphs can link a disputed text to a known author.
- Discussed possible future applications. Brian suggested code plagiarism and possibly music.
- Was provided by Maryam with other students' project reports and started to go through the report by Talis.
- Went through the FYP Project handbook
Plan and Goals for new week:
- Identify the methods Talis used in his report
- Research on various methods
- Read up on past works regarding authorship detection
- Research on the controversies
Semester 2, Week 2
Jie Dong
Progress and Status this week:
- Three methods were chosen for this project: word frequency, word recurrence interval (WRI), and the trigram Markov model
- Reading material on SVM (SVM tutorial)
- Experiment with SVM software in MATLAB
- Prepare slides for proposal seminar presentation on project aim, background, and part of project process
Plan and Goals for new week:
- Combine slides with the other group members and make some modifications
- Send slides draft to supervisor for feedback
- Make further modifications
- Presentation on Thursday
Leng Tan
Progress and Status This Week
- identified the 3 methods that were mentioned by Talis.
- gained brief knowledge of the controversial issues.
- have a brief idea of the upcoming proposal seminar.
Plan and Goals for Next Week
- research on SVM.
- research on the background history of the project
- research on the different techniques used before in history
- prepare project proposal
Tien-en Phua
Progress and Status this week:
- Identify the three methods that Talis applied in his project, namely Word Frequency, Word Recurrence and Trigram Markov
- Briefly understand how the three methods work
- Identify the past works done by other researchers.
- Identify three main controversies, namely the Federalist Papers, Shakespeare's plays and the Letter to the Hebrews
Plan and Goals for new week:
- Prepare for Project Proposal
- Develop Gantt chart, project budget and risk analysis
- Identify major milestones in project
- Write up on controversy
- Further research on three methods
Semester 2, Week 3
Jie Dong
Progress and Status this week:
- In Monday's meeting we were introduced to Matthew and François-Pierre Huchet, who are also participating in this project.
- Came up with the first complete draft of the proposal presentation slides. Discussed the role of each person.
- Sent slides to Brian and Matthew for feedback
- Modified our slides
- Presentation on Thursday
Plan and Goals for new week:
- Do more research on the three methods and SVM
- Prepare for stage 1 design document
Leng Tan
Progress and Status This Week
- rough draft slides on the past research have been done for the proposal seminar.
- a comparison list of the different techniques was done.
- started research on SVM, which is to be added to the slides with the different techniques
- had a meeting with supervisors, and was introduced to Dr Matthew.
- focus 100% on the proposal seminar.
Plan and Goals for Next Week
- have a more detailed review on the 3 methods.
- read the criteria for the stage 1 design document.
Tien-en Phua
Progress and Status this week:
- Prepare for project proposal
- Developed Gantt chart, project budget and risk analysis
- Developed slides for milestones and controversy
- Research on SVM (Support Vector Machine)
- Gained a better understanding of Word Frequency, WRI and Trigram Markov
Plan and Goals for new week:
- Proceed to develop Stage 1 Design Document
- Understand SVM
- Develop Work Breakdown Structure
- Delegate tasks to individual members
- Read up on the other 4 reports
Semester 2, Week 4
Jie Dong
Progress and Status This Week
- In this project, we plan to have each person working on one method; I am working on the Trigram Markov model
- Read past reports for trigram Markov information
- Make stage 1 design document template
- Write project aim, background, and project approach in design document
Plan and Goals for Next Week
- Modify the design document draft
- Send to supervisors for feedback
- More modification
- Prepare a tutorial on SVM for other group members
Leng Tan
Progress and Status This Week
- research on the 3 methods has been completed.
- fully read and understood the criteria for stage 1 design document.
- had a brief meeting with group members to delegate the tasks in preparing the stage 1 design document.
Plan and Goals for Next Week
- do a rough draft of the tasks that are allocated.
- do a layout design for the document.
Tien-en Phua
Progress and Status this week:
- Develop Work Breakdown Structure
- Identify tasks required for Stage 1 Design Document
- Broke down tasks and assigned them to each member
- In the process of developing the Stage 1 Design Document
- Further research on SVM and Word Frequency
Plan and Goals for new week:
- Complete write up on Word Frequency and SVM
- Complete Stage 1 Design Document
- Coding and further research on Word Frequency
- Read up on the other 4 reports
Semester 2, Week 5
Jie Dong
Progress and Status this week:
- Done abstract, project aim, background and significance
- Done description of data extraction part for Trigram Markov model in design document
- Feedback from supervisors on design document
- Final modification on design document
- Format the design document on wiki
Plan and Goals for Next Week:
- Design on Trigram Markov model
- learn to use SVM
- a bit of coding on the trigram Markov model
Leng Tan
Progress and Status this week:
- Done Literature Review of design document
- Done description of data extraction part for WRI in design document
- Done project approach and milestone for design document
- added modified WBS in appendix
- done initial check and compilation of Design document
Plan and Goals for Next Week:
- start a rough design for WRI data extraction in Java
- read about SVM
Tien-en Phua
Progress and Status this week:
- Completed design document
- Project Requirements
- Description of data extraction of Function Word Frequency analysis
- Project Budget
- Background and Significance of Hebrews
- Edited Gantt Chart, WBS to synchronise
- Edited and grammar check etc
- Basic layout of software design for data extraction algorithm
- Wiki page
Plan and Goals for Next Week:
- Commence programming of algorithm using Java
- Read up on SVM
Semester 2, Week 6
Jie Dong
Progress and Status this week:
- Research on Trigram Markov model
- Two models are proposed:
- Simple Trigram Markov model: only consider the effect of trigram in the text
- Potential problem with the first model: sparse data; new trigrams appearing in the test text lead to poor cross entropy
- Second model: Hidden Markov model on trigrams: not only trigram counts but also unigram and bigram effects are taken into consideration. The transition probability is composed of all three probabilities.
- The existence of punctuation and uppercase letters should be considered for text written in English.
- Programming on text file input and exception handling in Java
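One possible way to combine the three probabilities mentioned above is linear interpolation of unigram, bigram and trigram estimates. The sketch below is an illustration only, not the model used in the project: the class name, the method names and the three weights are assumptions, and linear interpolation is swapped in here as a simple stand-in for the proposed second model.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class: linear interpolation of unigram, bigram and trigram estimates.
final class InterpolatedTrigramModel {
    private final Map<String, Integer> unigrams = new HashMap<>();
    private final Map<String, Integer> bigrams = new HashMap<>();
    private final Map<String, Integer> trigrams = new HashMap<>();
    private int totalWords = 0;

    // Assumed weights for illustration; they must sum to 1.
    private static final double W_UNI = 0.1, W_BI = 0.3, W_TRI = 0.6;

    // Count every unigram, bigram and trigram in the training word sequence.
    public void train(String[] words) {
        for (int i = 0; i < words.length; i++) {
            unigrams.merge(words[i], 1, Integer::sum);
            totalWords++;
            if (i >= 1) {
                bigrams.merge(words[i - 1] + " " + words[i], 1, Integer::sum);
            }
            if (i >= 2) {
                trigrams.merge(words[i - 2] + " " + words[i - 1] + " " + words[i], 1, Integer::sum);
            }
        }
    }

    // Relative frequency, defined as 0 when the context was never seen.
    private static double ratio(int numerator, int denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    // Estimate of P(w3 | w1 w2) as a weighted mix of the three estimates.
    public double probability(String w1, String w2, String w3) {
        double pUni = ratio(unigrams.getOrDefault(w3, 0), totalWords);
        double pBi = ratio(bigrams.getOrDefault(w2 + " " + w3, 0),
                           unigrams.getOrDefault(w2, 0));
        double pTri = ratio(trigrams.getOrDefault(w1 + " " + w2 + " " + w3, 0),
                            bigrams.getOrDefault(w1 + " " + w2, 0));
        return W_UNI * pUni + W_BI * pBi + W_TRI * pTri;
    }

    public static void main(String[] args) {
        InterpolatedTrigramModel model = new InterpolatedTrigramModel();
        model.train("a b c a b d".split(" "));
        System.out.println(model.probability("a", "b", "c"));
        System.out.println(model.probability("d", "a", "c")); // trigram never seen in training
    }
}
```

With this mix, a trigram that never occurred in training still receives a non-zero probability as long as its last word was seen, which keeps the cross entropy finite for such trigrams.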
Plan and Goals for new week:
- Discuss the models with supervisor
- SVM problem
- Programming on first model
Leng Tan
Progress and Status this week:
- Completed a design for the WRI code after discussion with group members.
- wrote about 50% of the code for data extraction using WRI.
- read a bit on SVM but still don't understand it.
Plan and Goals for new week:
- finish the coding for WRI.
- try to get help for SVM.
Tien-en Phua
Progress and Status this week:
- Finished designing the algorithm (pseudo-code) for function word frequency, to be coded in Java.
- Started implementing the algorithm code.
- Code is about halfway done.
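For illustration, function word frequency extraction of the kind described above could look roughly like the following. This is a minimal sketch, not the group's code: the class name, the tokenisation rule (lowercase, split on anything that is not a letter or apostrophe) and the very short function word list are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: relative frequency of a fixed list of function words.
final class FunctionWordFrequency {
    // Assumed, deliberately short list; a real study would use a much longer one.
    static final List<String> FUNCTION_WORDS = List.of("the", "of", "and", "to", "in");

    // Returns each function word's share of all word tokens in the text.
    public static Map<String, Double> relativeFrequencies(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : FUNCTION_WORDS) {
            counts.put(word, 0);
        }
        int totalTokens = 0;
        for (String token : text.toLowerCase().split("[^a-z']+")) {
            if (token.isEmpty()) {
                continue; // split can yield an empty first token
            }
            totalTokens++;
            if (counts.containsKey(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        Map<String, Double> frequencies = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            double share = totalTokens == 0 ? 0.0 : (double) entry.getValue() / totalTokens;
            frequencies.put(entry.getKey(), share);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        System.out.println(relativeFrequencies("The letter of the law and the spirit of it"));
    }
}
```

Dividing by the total token count makes texts of different lengths comparable, which matters when a short disputed text is compared against longer known works.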
Plan and Goals for new week:
- Finish coding.
- Discuss about SVM problems.
Semester 2, Week 7
Jie Dong
Progress and Status this week:
- Reading the chapter about Hidden Markov Chains in "Statistical Language Learning"
- Came up with my own test text to verify that my code is working properly
- Meeting with Brian to discuss my current work; the current approach does not work efficiently
Plan and Goals for new week:
- The previous algorithm only considers the effect of word trigrams. The result for a test paragraph contains a lot of useless information: about 70% of trigrams appear only once, and only about 10% of the information is worth using in classification. When the common trigrams are extracted from several test texts, few of them are left. Hence an enhanced model, in which unigrams and bigrams are also taken into consideration, will be tested in the following week.
- SVM will also be used to test the results in the coming week. Investigating how to use the SVM functions in MATLAB, svmtrain and svmclassify (Bioinformatics Toolbox).
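The sparsity observation above (most trigrams occurring only once) can be checked with a short count. The following is a sketch under assumptions: the class and method names are hypothetical, words are taken as already tokenised, and the fraction is computed over distinct trigrams, which may differ from how the 70% figure in the log was measured.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class: measures how many distinct word trigrams occur exactly once.
final class TrigramSingletons {
    // Counts every run of three consecutive words.
    public static Map<String, Integer> countTrigrams(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 2; i < words.length; i++) {
            String trigram = words[i - 2] + " " + words[i - 1] + " " + words[i];
            counts.merge(trigram, 1, Integer::sum);
        }
        return counts;
    }

    // Fraction of distinct trigrams whose count is exactly one.
    public static double singletonFraction(Map<String, Integer> counts) {
        if (counts.isEmpty()) {
            return 0.0;
        }
        long singletons = counts.values().stream().filter(c -> c == 1).count();
        return (double) singletons / counts.size();
    }

    public static void main(String[] args) {
        String[] words = "a b c a b c d".split(" ");
        System.out.println(singletonFraction(countTrigrams(words)));
    }
}
```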
Leng Tan
Progress and Status this week:
- Finished the Java coding for the WRI technique in the data extraction algorithm.
- Tested and verified that the code is working properly using a small test file (a text file with only a few sentences).
- Had a meeting with Brian to discuss the SVM input and output.
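As a rough illustration of the WRI data extraction mentioned above, the sketch below records, for each word, the gaps between its consecutive occurrences. It is not the group's code: the class name and tokenisation rule are assumptions, and the interval is assumed here to be the difference in word positions (so two occurrences separated by two other words give an interval of 3).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: gaps between consecutive occurrences of each word.
final class WordRecurrenceInterval {
    // Maps each repeated word to the list of position differences between its occurrences.
    public static Map<String, List<Integer>> intervals(String text) {
        Map<String, Integer> lastSeen = new HashMap<>();
        Map<String, List<Integer>> gaps = new HashMap<>();
        int position = 0;
        for (String token : text.toLowerCase().split("[^a-z']+")) {
            if (token.isEmpty()) {
                continue; // split can yield an empty first token
            }
            // put returns the previous position, or null on first sight of the word
            Integer previous = lastSeen.put(token, position);
            if (previous != null) {
                gaps.computeIfAbsent(token, k -> new ArrayList<>()).add(position - previous);
            }
            position++;
        }
        return gaps;
    }

    public static void main(String[] args) {
        System.out.println(intervals("The cat saw the dog and the bird").get("the"));
    }
}
```

A small hand-checkable sentence like the one in `main` is the same kind of test input the log describes: a text with only a few sentences whose intervals can be verified by eye.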
Plan and Goals for new week:
- Figure out SVM.
- Test and try out SVM in MATLAB using small test files.
Tien-en Phua
Progress and Status this week:
- Completed coding for data extraction algorithm (DEA)
- Discussed how to pass the output data from the DEA to the SVM
- Analyse how other researchers analyse their data
Plan and Goals for new week:
- Modification and refining of DEA code
- Continue analysis of how other researchers used this DEA for authorship attribution
- Try applying data to SVM
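One simple way to hand DEA output to an SVM tool is to write each text as one numeric row, class label first and features after it, in a comma-separated file that MATLAB can load as a matrix. The sketch below only illustrates that idea: the class name, the label-first layout and the use of CSV are assumptions, not the interface the group agreed on.

```java
import java.util.StringJoiner;

// Hypothetical class: formats one text's features as a CSV row for later loading.
final class FeatureRowWriter {
    // Builds "label,feature1,feature2,..." for a single text.
    public static String toCsvRow(int label, double[] features) {
        StringJoiner row = new StringJoiner(",");
        row.add(Integer.toString(label));
        for (double feature : features) {
            // Double.toString is locale-independent, so the decimal separator is always '.'
            row.add(Double.toString(feature));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsvRow(1, new double[] {0.3, 0.2, 0.1}));
    }
}
```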
Semester 2, Week 8
Jie Dong
Leng Tan
Progress and Status this week:
- Received a stage 1 design document on "Audio Assisted Vision System for Visually Impaired People".
- The document was read in full and notes were taken on presentation and various other aspects.
- The document was reviewed and a formal peer review report was produced.
- Investigation of SVM in MATLAB was halted for a while due to the peer review report.
Plan and Goals for new week:
- Figure out SVM.
- Test and try out SVM in MATLAB using small test files.
Tien-en Phua
Progress and Status this week:
- Complete the Data Extraction Algorithm
- Completed Peer Review
- Review Peer Document
Plan and Goals for new week:
- Apply data to SVM
- Determine progress of project
Semester 2, Week 9
Jie Dong
Leng Tan
Tien-en Phua
Semester 2, Week 10
Jie Dong
Leng Tan
Tien-en Phua
Semester 2, Week 11
Jie Dong
Leng Tan
Tien-en Phua
Semester 2, Week 12
Jie Dong
Leng Tan
Tien-en Phua
See also
- Authorship detection: Who wrote the Letter to the Hebrews?
- Critical design review 2010: Who wrote the Letter to the Hebrews?
- Final report 2010: Who wrote the Letter to the Hebrews?