Authorship detection: 2011 group
Supervisors
Collaborators
- François-Pierre Huchet, ITII Pays de la Loire, Nantes, France.
- Talis Putnins, BICEPS, Latvia.
- J. José Alviar, University of Navarra, Spain.
2011 Students
Weekly progress and questions
Semester 2, Week 1
Yan Xie
Progress and Status this week:
- All team members had their first meeting with supervisor Derek and co-supervisors Brian and Maryam
- Derek introduced the basic idea of the project and its various applications
- Discuss previous attempts and further exploration at the meeting
- Research the topics of authorship detection and data mining
- Review the research of previous years’ students
Plan and Goals for next week:
- Further study of past research
- Search for suitable algorithms
- Have a group meeting with the other members, Kai and Zhaokun
Kai He
Progress and Status This Week
- Met with supervisor Derek and co-supervisors Brian and Maryam.
- The supervisors introduced the concept of this project and discussed the outcomes of last year’s project students
- Research on authorship detection
- Study the previous algorithms
Plan and Goals for Next Week
- Literature search training will be held next week
- Have a meeting with team members
- Research on various methods
- Read papers on authorship detection
Zhaokun Wang
Progress and Status This Week
1. First meeting with Derek, Brian and the other group members, Kai and Yan.
2. Derek and Brian introduced the outline and background of this project.
3. Based on the previous year’s research, Derek gave some suggestions for further research.
4. Derek passed the previous research resources on to us.
Plan and Goals for Next Week
1. Read through and understand the previous research report.
2. Research the authorship controversy.
3. Research various methods.
4. Prepare for the proposal seminar.
Semester 2, Week 2
Yan Xie
Progress and Status this week:
- Review last year’s three methods: word frequency, word recurrence interval and the trigram Markov model
- Ongoing research
- Attend a literature search training session with the other members
- Discuss the algorithms chosen for this project at the meeting
- Prepare for the proposal seminar in Week 3
Plan and Goals for next week:
- Modify the slides and send them to the supervisors
- Prepare the presentation
- Analyse the chosen algorithms
- Discuss project management with the other members at the next meeting
Kai He
Progress and Status This Week
- Attend the literature search training
- Identify the algorithms to be used in this project
- Prepare the proposed algorithms and complete the presentation slides
- Further reading of research papers
Plan and Goals for Next Week
- Set up the Work Breakdown Structure, Milestones, Gantt Chart and Project Budget
- Send the presentation slides to the supervisors
- Prepare the presentation for next week’s proposal seminar
- Analyse the proposed algorithms used in this project
- Research and discuss the classifier
- Have a team meeting with the other members
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 2, Week 3
Yan Xie
Progress and Status this week:
- Complete the Gantt Chart, Work Breakdown Structure, Milestones, Budget and risk analysis with the other team members
- Modify the presentation slides
- Prepare the presentation
- Introduce the Common N-grams method
Plan and Goals for next week:
- Research the SVM classifier to be used with the Common N-grams algorithm
- Start designing the Common N-grams method
- Make a template for the stage one progress report
Kai He
Progress and Status This Week
- Modify the slides after getting feedback from Brian
- Prepare the presentation this week
- Identify the classifiers to be used with the algorithms
- Plan the upcoming goals for the proposed algorithms
- Start designing the Maximal Frequent Word Sequence method
Plan and Goals for Next Week
- Review the Maximal Frequent Word Sequence method in detail
- Understand the Naïve Bayes classifier
- Prepare the stage one progress report
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 2, Week 4
Yan Xie
Progress and Status this week:
- Work on the Common N-grams method in Java
- Read the papers on the algorithm and classifier in full
- Discuss the design of Common N-grams with the other members
- Delegate tasks of the stage one progress report to individual members
Plan and Goals for next week:
- Complete the Executive Summary, Previous Studies, Coding Requirements and Tasks on Stage Two parts of the stage one progress report
- Modify the Work Breakdown Structure, Risk Assessment, Milestones, Monitoring Scheme and Proposed Budget
- Complete the write-up on Common N-grams and SVM
- Write up the draft of the stage one progress report and send it to the supervisors for feedback
- Revise the stage one progress report until the deadline
Kai He
Progress and Status This Week
- Completed research on the Maximal Frequent Word Sequence method
- Coding on Maximal Frequent Word Sequence
- Have a meeting with the other members to delegate the tasks of the stage one progress report
- Write the Project Background and Significance, Technical Background, Motivations and Key Requirements sections of the stage one progress report
- Modify the stage one report according to the criteria
- Grammar checking
Plan and Goals for Next Week
- Coding on Maximal Frequent Word Sequence
- Complete my tasks on the stage one report
- Send the draft to the supervisors
- Modify and format
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 2, Week 5
Yan Xie
Progress and Status this week:
- Finished my allocated parts of the stage one report
- Attend the weekly group meeting and discuss uncompleted sections
- Help with formatting
- Send the report draft to the supervisors
- Modify the report after getting feedback from the supervisors
Plan and Goals for next week:
- Develop the method of Common N-grams
- Read papers
- Learn to use SVM
Kai He
Progress and Status This Week
- Finish Project Background and Significance, Technical Background, Motivations and Key Requirements
- Write Input and Output Specifications, and Testing and Verification
- Help write the Project Management part
- Grammar checking and formatting
- Revise the stage one progress report after getting feedback from the supervisors
- Finish the final version of the stage one progress report and submit it
- Coding on Maximal Frequent Word Sequence
Plan and Goals for Next Week
- Coding on Maximal Frequent Word Sequence
- Have a meeting with the other members discussing the upcoming goals
- Review papers
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 2, Week 6
Yan Xie
Progress and Status this week:
- Read the papers on the Common N-grams algorithm
- Draft the overall program structure for Common N-grams
- Review the SVM paper
- SVM classifier: still considering how to use the output text file produced by Common N-grams as the SVM input
- Participate in the group meeting
Plan and Goals for next week:
- Discuss the code with the team
- Coding on Common N-grams
- Design SVM
Kai He
Progress and Status This Week
- Research on Maximal Frequent Word Sequence
- Develop the program for Maximal Frequent Word Sequence
- Debugging
- Help the other members with coding
Plan and Goals for Next Week
- Complete about 30% - 40% of the code for data extraction using Maximal Frequent Word Sequence
- Discuss classifiers
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 2, Week 7
Yan Xie
Progress and Status this week:
- Discuss the Common N-grams problems with the other members
- Finish about 50% of the code for data extraction using Common N-grams
- Have a group meeting with the other two members to report my current progress on the Common N-grams extraction method
- Introduce the stage two report
Plan and Goals for next week:
- Continue coding Common N-grams
- Participate in the meeting about the stage two report with the other members
- Try to figure out how to use the SVM functions in MATLAB
Kai He
Progress and Status This Week
- Review the papers:
  - "Algorithm for Maximal Frequent Sequences in Document Clustering"
  - "Experimenting with Maximal Frequent Sequences for Multi-Document Summarization"
  - "Discovery of Frequent Word Sequences in Text"
- Finished 30% of the code for data extraction using Maximal Frequent Word Sequence
- Review the paper "Augmenting Naïve Bayes Classifiers with Statistical Language Models"
- Review the criteria of stage two report
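The Maximal Frequent Word Sequence idea from the papers above can be illustrated with a much simplified sketch. It assumes contiguous word sequences only (the method in "Discovery of Frequent Word Sequences in Text" also allows gaps between words), support counted at most once per text unit, and a cap on sequence length. The class and method names are hypothetical, and this is not the group's own code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified miner: contiguous word sequences only (no gaps allowed).
class MaximalFrequentSequences {

    /**
     * Returns the contiguous word sequences of at most maxLen words that occur in at
     * least minSupport units (e.g. sentences or documents) and are not contained in
     * a longer frequent sequence.
     */
    static Set<String> mine(List<List<String>> units, int minSupport, int maxLen) {
        // 1. Support counting: each sequence is counted at most once per unit.
        Map<String, Integer> support = new HashMap<>();
        for (List<String> words : units) {
            Set<String> seen = new HashSet<>();
            for (int i = 0; i < words.size(); i++) {
                StringBuilder seq = new StringBuilder();
                for (int j = i; j < words.size() && j - i < maxLen; j++) {
                    if (j > i) seq.append(' ');
                    seq.append(words.get(j));
                    seen.add(seq.toString());
                }
            }
            for (String s : seen) support.merge(s, 1, Integer::sum);
        }
        // 2. Keep the frequent sequences.
        Set<String> frequent = new HashSet<>();
        for (Map.Entry<String, Integer> e : support.entrySet()) {
            if (e.getValue() >= minSupport) frequent.add(e.getKey());
        }
        // 3. Keep only maximal ones: drop a sequence if a longer frequent one contains it.
        Set<String> maximal = new TreeSet<>();
        for (String s : frequent) {
            boolean contained = false;
            for (String t : frequent) {
                if (t.length() > s.length() && (" " + t + " ").contains(" " + s + " ")) {
                    contained = true;
                    break;
                }
            }
            if (!contained) maximal.add(s);
        }
        return maximal;
    }
}
```

The maximality check compares every frequent sequence with every other. That is fine for a toy example, but book-length texts would need a smarter, level-wise search.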
Plan and Goals for Next Week
- Coding and Debugging
- Discuss how to pass the output data from Maximal Frequent Word Sequence to the Naïve Bayes classifier
- Prepare the stage two report
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 2, Week 8
Yan Xie
Progress and Status this week:
- Coding of Common N-grams
- Discuss the project management
- Investigate SVM in MATLAB
- Participate in a meeting discussing how to apply the generated data to the classifier
Plan and Goals for next week:
- Complete software coding v1.0 by the end of Week 11
- Start to write the stage two report
- Review SVM from the previous attempt
Kai He
Progress and Status This Week
- Coding and Debugging on Maximal Frequent Word Sequence
- Further research on Naïve Bayes
- Discuss the Naïve Bayes Classifier with the other members
Plan and Goals for Next Week
- Write the project management section of the stage two report
- Continue coding and debugging
- Weekly meeting with the other team members
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 2, Week 9
Yan Xie
Progress and Status this week:
- Add some new classes to the Common N-grams code
- Code modification
- Weekly meeting with the other team members to report the progress of Common N-grams coding
- Write the Project Objectives, Background, Algorithm Programming and Project Management parts of the stage two report
- Get feedback on the stage one progress report from Brian
Plan and Goals for next week:
- Complete software coding v1.0 by the end of Week 11
- Continue code modification
- Testing
Kai He
Progress and Status This Week
- Finished 60% of the code for data extraction using Maximal Frequent Word Sequence
- Help debug the Common N-grams code
- Report the code progress so far in the team meeting
- Set up the upcoming goals: Software Coding V1.0, Stage 2 Report Due, Software Testing V1.0 and Software Coding V2.0
- Start to design the training process and classification process using Naïve Bayes Classifier
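The training and classification processes of a Naïve Bayes classifier can be sketched as follows. This assumes a multinomial model over string features (for example, extracted word sequences), add-one (Laplace) smoothing, and log probabilities to avoid numeric underflow. The feature representation, the smoothing choice and all names are assumptions; the group's own design is not recorded here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical multinomial Naïve Bayes over string features (e.g. word sequences).
class NaiveBayesAuthor {
    private final Map<String, Map<String, Integer>> featureCounts = new HashMap<>();
    private final Map<String, Integer> totalFeatures = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Training process: adds one document's features under its known author. */
    void train(String author, List<String> features) {
        Map<String, Integer> counts = featureCounts.computeIfAbsent(author, k -> new HashMap<>());
        for (String f : features) {
            counts.merge(f, 1, Integer::sum);
            vocabulary.add(f);
        }
        totalFeatures.merge(author, features.size(), Integer::sum);
        docCounts.merge(author, 1, Integer::sum);
        totalDocs++;
    }

    /** Classification process: returns the author with the highest log posterior. */
    String classify(List<String> features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int vocabSize = vocabulary.size();
        for (String author : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(author) / totalDocs); // log prior
            Map<String, Integer> counts = featureCounts.get(author);
            int total = totalFeatures.get(author);
            for (String f : features) {
                int c = counts.getOrDefault(f, 0);
                // Add-one smoothing keeps an unseen feature from zeroing the whole product.
                score += Math.log((c + 1.0) / (total + vocabSize));
            }
            if (score > bestScore) {
                bestScore = score;
                best = author;
            }
        }
        return best;
    }
}
```

Summing log probabilities gives the same ranking as multiplying the raw probabilities. It avoids the underflow that long texts with many features would otherwise cause.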
Plan and Goals for Next Week
- Write the Introduction, Objectives, Background, Algorithm Definition, Work Breakdown Structure, Milestones and Budgets parts of the stage two report
- Choose some simple text files to test
- Further research on the Naïve Bayes classifier
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 2, Week 10
Yan Xie
Progress and Status this week:
- Finished most of the Common N-grams code
- Delete the unused inner classes
- Discuss SVM with the other team members
Plan and Goals for next week:
- Complete software coding v1.0 by the end of Week 11
- Figure out SVM
- Try to test the code using some simple text files
- Write the stage two report
Kai He
Progress and Status This Week
- Write the stage two report
- Complete the coding of Maximal Frequent Word Sequence
- Working on modified Maximal Frequent Word Sequence
- Test efficiency using different input texts
Plan and Goals for Next Week
- Modify code of Maximal Frequent Word Sequence
- Design the Naïve Bayes classifier
- Report the progress in the team meeting
- Continue writing the stage two report
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 2, Week 11
Yan Xie
Progress and Status this week:
- Complete software coding v1.0 of the Common N-grams
- Use my own text to verify that the code works properly
- Compare the results from a small test file and a large test file
- Begin building large sets of training and testing data for SVM by randomly collecting extracted features from the author profiles
- Finished the draft of the stage two report
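Randomly drawing training and testing sets from the extracted features might look like the sketch below. It assumes each sample is one author-labelled feature vector, a fixed random seed so runs can be repeated, and a simple fraction-based cut. TrainTestSplit and its parameters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: random training/testing split of extracted samples.
class TrainTestSplit {

    /** Shuffles the samples with a fixed seed and returns {training, testing}. */
    static <T> List<List<T>> split(List<T> samples, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(samples);            // copy; input may be immutable
        Collections.shuffle(shuffled, new Random(seed));        // reproducible random order
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));                // training set
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));  // testing set
        return parts;
    }
}
```

Keeping the seed fixed means both extraction algorithms can be evaluated on exactly the same split.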
Plan and Goals for next week:
- Modify the stage two report
- Submit the stage two report
- Use the same training data and unknown data to test both extraction algorithms
Kai He
Progress and Status This Week
- Complete the draft of the stage two report
- Grammar checking and formatting
- The output of the Maximal Frequent Word Sequence code is incorrect, so modification is needed
Plan and Goals for Next Week
- Deliver the stage two report
- Complete the code of Maximal Frequent Word Sequence
- Test the output
- Have a meeting to discuss the upcoming goals
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 2, Week 12
Yan Xie
Progress and Status this week:
- Submit the stage two report and send it to the supervisors
- Report my individual work done so far
- Report that the Common N-grams code is completed and tested
- Report the progress of SVM
- Discuss the upcoming goals with the other members
Plan and Goals for next week:
- Prepare for exams
Kai He
Progress and Status This Week
- Send my stage two report to the supervisors
- Weekly meeting with the other team members to report the progress of the project
Plan and Goals for Next Week
- Pause the project
- Prepare for exams
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 1, Week 1
Yan Xie
Progress and Status this week:
- Review two algorithms and three classifiers
- Group members present their individual progress so far at the weekly group meeting
- Work on coding the SVM program
- Check the Milestones for the upcoming goals
Plan and Goals for next week:
- Email the supervisors to arrange a meeting to report our progress
- Discuss the performance of the current progress
- Modify the SVM program
- Prepare the project description and images for project exhibition
Kai He
Progress and Status This Week
- Meet with the team members to discuss the classifiers
- Simplify the code of Maximal Frequent Word Sequence
- Work on the Naïve Bayes classifier
- Do some testing
Plan and Goals for Next Week
- Arrange a meeting time with the supervisors
- Discuss the key methods used in Naïve Bayes with the team
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 1, Week 2
Yan Xie
Progress and Status this week:
- Confirm a meeting time with supervisors
- Complete a project description and image, and email them to Braden
- Discuss SVM with the team members
- Continue working on SVM
Plan and Goals for next week:
- Meet up with supervisors
- Code modification
- Plan the upcoming goals within the team
- Test programs using English text
- Start to prepare the exhibition and final seminar
Kai He
Progress and Status This Week
- Finished half of the Naïve Bayes program
- Change the classes in the program
- Code modification
- Check the project description and image
- Have a brief meeting with the team members
Plan and Goals for Next Week
- Have a meeting with the supervisors
- Develop software
- Prepare the exhibition and final seminar
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 1, Week 3
Yan Xie
Progress and Status this week:
- Get feedback from the meeting with the supervisors
- Consider punctuation removal, lowercase conversion, space combination and word overlapping
- Develop the Java code for Common N-gram
- Analyse the poor results from texts containing chapter numbers and titles
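The preprocessing steps listed above (punctuation removal, lowercase conversion, space combination and overlapping n-grams) could be sketched in Java, the language used for the Common N-gram code. This sketch assumes character-level n-grams and reads "word overlapping" as a sliding window that advances one character at a time. Neither detail is confirmed by the log, and TextPreprocessor is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; the group's actual Java code is not shown on this page.
class TextPreprocessor {

    /** Removes punctuation, converts to lowercase and merges runs of whitespace. */
    static String normalize(String raw) {
        String noPunct = raw.replaceAll("\\p{P}", " ");    // any Unicode punctuation, incl. Greek
        String lower = noPunct.toLowerCase(Locale.ROOT);    // lowercase conversion
        return lower.replaceAll("\\s+", " ").trim();        // space combination
    }

    /** Returns every overlapping character n-gram (window moves one character at a time). */
    static List<String> charNGrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }
}
```

Punctuation is replaced with a space rather than deleted, so words on either side of a full stop are not glued together. Single spaces are kept, which lets n-grams span word boundaries.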
Plan and Goals for next week:
- Complete the Java code for Common N-gram
- Test on the 132 English texts, the Federalist Papers and the Greek New Testament
Kai He
Progress and Status This Week
Plan and Goals for Next Week
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 1, Week 4
Yan Xie
Progress and Status this week:
- Remove all chapter numbers and titles
- Add a ranking method to the program
- Finish the Common N-gram code
- Run the completed program on the 132 English texts, the Federalist Papers and the Greek New Testament
- Draft the structure of the final seminar PPT
Plan and Goals for next week:
- Analyse the output for the tested texts, and consider removing the tail and setting a threshold for the large training data
- Discuss the test results with the group members
- Prepare the slides of final seminar with the group members
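One way to read the ranking method and the "removing tail and setting threshold" step is a profile that keeps only the most frequent n-grams. This sketch assumes the threshold is a maximum profile length (the top limit n-grams, with ties broken alphabetically). It could equally have been a minimum frequency, and NGramProfile is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: frequency ranking with the low-frequency tail cut off.
class NGramProfile {

    /** Counts overlapping character n-grams in already-normalised text. */
    static Map<String, Integer> count(String text, int n) {
        Map<String, Integer> freq = new HashMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            freq.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return freq;
    }

    /** Ranks n-grams by descending frequency (ties alphabetical) and keeps the top `limit`. */
    static LinkedHashMap<String, Integer> topRanked(Map<String, Integer> freq, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        LinkedHashMap<String, Integer> profile = new LinkedHashMap<>(); // keeps rank order
        for (Map.Entry<String, Integer> e : entries) {
            if (profile.size() == limit) break;   // threshold reached: drop the tail
            profile.put(e.getKey(), e.getValue());
        }
        return profile;
    }
}
```

Breaking ties alphabetically makes the cut-off deterministic, so the same text always produces the same profile.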
Kai He
Progress and Status This Week
Plan and Goals for Next Week
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 1, Week 5
Yan Xie
Progress and Status this week:
- Set a threshold on the output for the tested texts
- Analyse the SVM input format
- Work on preparing the final seminar
Plan and Goals for next week:
- Send the draft of the PPT to Brian
- Modify the PPT slides
- Prepare the presentation with the group members
Kai He
Progress and Status This Week
Plan and Goals for Next Week
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 1, Week 6
Yan Xie
Progress and Status this week:
- Classify all authors’ output files after setting the threshold, for N from 2 to 10
- Common N-gram Java code update:
  - For example, for the 132 English texts with N = 2, combine the six authors’ features to create a master list.
  - N = 2 to N = 10 gives 9 master lists. Find each author’s features in the master list with their frequency of occurrence, and list only the frequencies as one part of the SVM input format.
- Also classify the output files of the Federalist Papers and the Greek New Testament
- Finish the SVM input format and write the MATLAB code for SVM
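The master-list step described for this week can be sketched as follows. Take the union of all authors' features for one N, then turn each author's profile into a vector of frequencies in master-list order, with zero where the author lacks a feature. Sorting the master list alphabetically and the name MasterList are assumptions. The vectors are what would be handed to the SVM, whose MATLAB code is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: master list for one N and per-author frequency vectors.
class MasterList {

    /** Union of every author's features for one value of N, in sorted order. */
    static List<String> build(List<? extends Map<String, Integer>> authorProfiles) {
        TreeSet<String> all = new TreeSet<>();
        for (Map<String, Integer> profile : authorProfiles) all.addAll(profile.keySet());
        return new ArrayList<>(all);
    }

    /** Frequencies only, in master-list order (0 when the author lacks a feature). */
    static int[] toVector(List<String> master, Map<String, Integer> profile) {
        int[] v = new int[master.size()];
        for (int i = 0; i < master.size(); i++) {
            v[i] = profile.getOrDefault(master.get(i), 0);
        }
        return v;
    }
}
```

Repeating this for N = 2 to N = 10 gives the nine master lists, with one frequency vector per author for each.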
Plan and Goals for next week:
- Prepare the final report, which is due in Week 11
- SVM code modification
- Do some testing
Kai He
Progress and Status This Week
Plan and Goals for Next Week
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 1, Week 7
Yan Xie
Progress and Status this week:
- Amend the SVM MATLAB code
- Test on the 132 English texts, the Federalist Papers and the Greek New Testament, and produce the output for the disputed text
- Obtain the performance results and arrive at a conclusion (possible authors)
- Meet with the other group members and discuss the results
- Build the structure of the final report
Plan and Goals for next week:
- Analyse the Common N-gram results and compare their classification accuracy with that of the Maximal Frequent Word Sequence algorithm with the group members
- Suggest potential modifications
- Start working on some parts of the final report
Kai He
Progress and Status This Week
Plan and Goals for Next Week
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 1, Week 8
Yan Xie
Progress and Status this week:
- Summarise the Common N-gram and Maximal Frequent Word Sequence algorithms
- Test the other text files (English New Testament) using the Common N-gram algorithm and SVM classification
- Write the Common N-gram part of the final report
- Have a meeting with the other group members to discuss the upcoming goals
Plan and Goals for next week:
- Analyse the English New Testament output from SVM classification and compare it with the results of the Maximal Frequent Word Sequence algorithm with Naïve Bayes classification
- Write the final report
Kai He
Progress and Status This Week
Plan and Goals for Next Week
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 1, Week 9
Yan Xie
Progress and Status this week:
Plan and Goals for next week:
Kai He
Progress and Status This Week
Plan and Goals for Next Week
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 1, Week 10
Yan Xie
Progress and Status this week:
Plan and Goals for next week:
Kai He
Progress and Status This Week
Plan and Goals for Next Week
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 1, Week 11
Yan Xie
Progress and Status this week:
Plan and Goals for next week:
Kai He
Progress and Status This Week
Plan and Goals for Next Week
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
Semester 1, Week 12
Yan Xie
Progress and Status this week:
Plan and Goals for next week:
Kai He
Progress and Status This Week
Plan and Goals for Next Week
Zhaokun Wang
Progress and Status This Week
Plan and Goals for Next Week
See also
- Authorship detection: Who wrote the Letter to the Hebrews?
- Minutes of Meeting 2011: Who wrote the Letter to the Hebrews?
- Proposal Seminar 2011: Who wrote the Letter to the Hebrews?
- Final Seminar 2011: Who wrote the Letter to the Hebrews?
- Stage One Progress Report 2011: Who wrote the Letter to the Hebrews?
- Stage Two Progress Report 2011: Who wrote the Letter to the Hebrews?
- Final Report 2011: Who wrote the Letter to the Hebrews?
- Youtube Video Presentation 2011: Who wrote the Letter to the Hebrews?