Difference between revisions of "Authorship detection: 2011 group"

From Derek
(Created page with '== Supervisors == *Prof Derek Abbott *Dr Matthew Berryman *Dr Brian Ng *Mrs Maryam Ebrahimpour ...')
 
 
(92 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
== Supervisors ==
 
== Supervisors ==
 
*[[Derek Abbott|Prof Derek Abbott]]
 
*[[Derek Abbott|Prof Derek Abbott]]
*[[Matthew J. Berryman|Dr Matthew Berryman]]
+
*[[Matthew Berryman|Dr Matthew Berryman]]
 
*[[Brian W.-H. Ng|Dr Brian Ng]]
 
*[[Brian W.-H. Ng|Dr Brian Ng]]
 
*[[Maryam Ebrahimpour|Mrs Maryam Ebrahimpour]]
 
*[[Maryam Ebrahimpour|Mrs Maryam Ebrahimpour]]
 +
 
===Collaborators===
 
===Collaborators===
*[[François-Pierre Huchet]], ITII Pays de la Loire, Nantes, France.
+
*[[Francois Huchet|François-Pierre Huchet]], ITII Pays de la Loire, Nantes, France.
 +
*[[Talis Putnins]], BICEPS, Latvia.
 
*[[J. José Alviar]], University of Navarra, Spain
 
*[[J. José Alviar]], University of Navarra, Spain
  
==Students==
+
==2011 Students==
*[[Jie Dong]]
+
 
*[[Leng Tan]]
+
*[[Yan Xie]]
*[[Tien-en Phua]]
+
*[[Kai He]]
 +
*[[Zhaokun Wang]]  
  
 
== Weekly progress and questions ==
 
== Weekly progress and questions ==
Line 17: Line 20:
 
===Semester 2, Week 1===
 
===Semester 2, Week 1===
  
====Jie Dong====
+
====Yan Xie====
 
'''Progress and Status this week:'''
 
'''Progress and Status this week:'''
# Had the first meeting with Derek, Brian and Maryam, together with the other group members, Leng and Tien-en.
+
# All team members had the first meeting with the supervisor, Derek, and the co-supervisors, Brian and Maryam
# Derek, Brian and Maryam introduced us to the basic idea of this data mining project
+
# The basic idea and various applications were introduced by Derek
# The idea of authorship detection was introduced
+
# Discussed previous attempts and further exploration at the meeting
# Several applications to which data mining techniques can be applied were mentioned
+
# Researched the topic of authorship detection and data mining
# Research by past years' students was mentioned, and Maryam sent us several past research reports together with the code
+
# Reviewed the research of past years' students
# Researched the project, especially SVM and some algorithms
+
  
 
'''Plan and Goals for new week:'''
 
'''Plan and Goals for new week:'''
# Prepare for the proposal seminar.
+
# Further study of the past research
# Read research reports from past years' students.
+
# Search for suitable algorithms
# Understand the project handbook.
+
# Have a group meeting with the other members Kai and Zhaokun
  
====Leng Tan====
+
====Kai He====
 
'''Progress and Status This Week'''
 
'''Progress and Status This Week'''
  
# The first meeting for the final year project was held with the supervisor, Prof Derek Abbott, the co-supervisors, Dr Brian Ng and Mrs Maryam, and the team members.
+
# Met with the supervisor, Derek, and the co-supervisors, Brian and Maryam.
# The initial project scope was introduced and the general aim of the project was discussed.
+
# The supervisors introduced the concept of this project and discussed the outcomes of last year's project students
# The basic idea behind the techniques of authorship detection was shown as well.
+
# Research on authorship detection
# Several ideas for future applications of this project were highlighted.
+
# Study the previous algorithms
# Some hints on getting started were given, namely to read Talis's final year report, which will be provided by Mrs Maryam.
+
# We were reminded of the first milestone of the project, the proposal seminar.
+
  
 
''' Plan and Goals for Next Week '''
 
''' Plan and Goals for Next Week '''
  
# Fully read and understand Talis's report.
+
# Literature search training will be held next week
# Have a brief look at the code that will be supplied by Mrs Maryam.
+
# Have a meeting with team members
# Do some research on the background of some controversial authorship cases, such as the works of William Shakespeare, the Federalist Papers and the Letter to the Hebrews.
+
# Research on various methods
# Read through the 2010 project handbook to get a rough idea of all the project milestones, focusing on the project seminar.
+
# Read papers on authorship detection
  
====Tien-en Phua====
+
====Zhaokun Wang====
'''Progress and Status this week:'''  
+
'''Progress and Status This Week'''
# Met with the project supervisor, Prof Derek Abbott, and the co-supervisors, Dr Brian Ng and Mrs Maryam
+
# Derek discussed the concept behind authorship detection
+
# Derek explained the use of multi-dimensional graphs to link a disputed text to a known author.
+
# Discussed possible future applications. Brian suggested code plagiarism and possibly music.
+
# Maryam provided other students' project reports, and I started going through the report by Talis.
+
# Went through the FYP Project handbook
+
  
'''Plan and Goals for new week:'''  
+
1. First meeting with Derek and Brian and the other group members, Kai and Yan.
# Identify the methods Talis used in his report
+
 
# Research on various methods
+
2. Derek and Brian introduced the outline and background about this project
# Read up on past works regarding authorship detection
+
 
# Research on controversy
+
3. Based on previous years' research, Derek gave some suggestions for the research to follow.
 +
 
 +
4. Derek passed the previous research resources to us.
 +
 
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
1. Read through and understand previous research report.
 +
 
 +
2. Research on controversy.
 +
 
 +
3. Research on various methods.
 +
 
 +
4. Prepare the proposal seminar.
  
 
===Semester 2, Week 2===
 
===Semester 2, Week 2===
  
====Jie Dong====
+
====Yan Xie====
 
'''Progress and Status this week:'''
 
'''Progress and Status this week:'''
# Three methods are chosen for this project: word frequency, word recurrence interval, and trigram Markov model
+
# Review past year’s three methods: word frequency, word recurrence interval and trigram Markov model
# Reading material on SVM (SVM tutorial)
+
# Ongoing research
# Experiment with SVM software in MATLAB
+
# Attend a literature search training session with the other members
# Prepare slides for proposal seminar presentation on project aim, background, and part of project process
+
# Discuss the algorithms chosen for this project at the meeting
 +
# Prepare for the proposal seminar in week 3
  
 
'''Plan and Goals for new week:'''
 
'''Plan and Goals for new week:'''
# Combine slides with the other group members and make some modifications
+
# Modify the slides and send them to the supervisors
# Send the draft slides to the supervisor for feedback
+
# Prepare the presentation
# Make further modifications
+
# Analyse the chosen algorithms
# Presentation on Thursday
+
# Discuss the project management with the other members at the next meeting
  
====Leng Tan====
+
====Kai He====
 
'''Progress and Status This Week'''
 
'''Progress and Status This Week'''
  
# Identified the 3 methods mentioned by Talis.
+
# Attend the literature search training
# Gained a basic knowledge of the controversial authorship issues.
+
# Identify the algorithms used in this project
# Have a rough idea of the upcoming proposal seminar.
+
# Prepare the proposed algorithms and complete the slides for presentation
 +
# Further reading on research papers
  
'''Plan and Goals for Next Week '''
+
''' Plan and Goals for Next Week '''
  
# Research SVM.
+
# Set up the Work Breakdown Structure, Milestones, Gantt Chart and Project Budget
# Research the background history of the project
+
# Send the presentation slides to supervisors
# Research the different techniques used in the past
+
# Prepare the presentation of proposal seminar next week
# Prepare the project proposal
+
# Analyse the proposed algorithms used in this project
 +
# Research and discuss the classifier
 +
# Have a team meeting with the other members
  
====Tien-en Phua====
+
====Zhaokun Wang====
'''Progress and Status this week:'''  
+
'''Progress and Status This Week'''
# Identified the three methods that Talis applied in his project, namely Word Frequency, Word Recurrence Interval and Trigram Markov
+
# Gained a basic understanding of how the three methods work
+
# Identified the past work done by other researchers.
+
# Identified three main controversies, namely the Federalist Papers, Shakespeare's plays and the Letter to the Hebrews
+
  
'''Plan and Goals for new week:'''  
+
# Abstract on proposal seminar.
# Prepare for Project Proposal
+
# Allocate seminar role for each group member.
# Develop Gantt chart, project budget and risk analysis
+
# Prepare outline PowerPoint slides.
# Identify the major milestones in the project
+
# Identify the basic idea of the project.
# Write up on controversy
+
 
# Further research on three methods
+
 
 +
''' Plan and Goals for Next Week '''
 +
# Present proposal seminar.
 +
# Identify the methods on project.
 +
# Identify classifiers on project.
  
 
===Semester 2, Week 3===
 
===Semester 2, Week 3===
  
====Jie Dong====
+
====Yan Xie====
 
'''Progress and Status this week:'''
 
'''Progress and Status this week:'''
# We were introduced to Matthew and François-Pierre Huchet who are also participating in this project in Monday's meeting.
+
# Complete the Gantt Chart, Work Breakdown Structure, Milestones, Budget and risk analysis with the other team members
# Came up with the first complete draft of the proposal presentation slides. Discussed the role of each person.
+
# Modifications on the slides of presentation
# Sent the slides to Brian and Matthew for feedback
+
# Prepare the presentation
# Modified our slides
+
# Introduce the Common N-grams
# Presentation on Thursday
+
  
 
'''Plan and Goals for new week:'''
 
'''Plan and Goals for new week:'''
# Do more research on the three methods and SVM
+
# Research the SVM classifier to be used with the Common N-grams algorithm
# Prepare for stage 1 design document
+
# Start to design the Common N-grams
 
+
# Make stage one progress report template
====Leng Tan====
+
  
 +
====Kai He====
 
'''Progress and Status This Week'''
 
'''Progress and Status This Week'''
  
# Rough draft slides on past research have been prepared for the proposal seminar.
+
# Modify the slides after getting feedback from Brian
# A comparison list of the different techniques has been made.
+
# Prepare the presentation this week
# Started research on SVM, to be added to the slides alongside the different techniques
+
# Identify the classifiers used with the algorithms
# Had a meeting with the supervisors and was introduced to Dr Matthew.
+
# Plan the upcoming goal for the proposed algorithms
# Focused 100% on the proposal seminar.
+
# Start to design the method: Maximal Frequent Word Sequence
  
'''Plan and Goals for Next Week'''
+
''' Plan and Goals for Next Week '''
  
# Have a more detailed review of the 3 methods.
+
# Have a detailed review of the Maximal Frequent Word Sequence method
# Read the criteria for the stage 1 design document.
+
# Understand the Naïve Bayes classifier
 +
# Prepare the stage one progress report
  
====Tien-en Phua====
+
====Zhaokun Wang====
'''Progress and Status this week:'''  
+
'''Progress and Status This Week'''
# Prepare for project proposal
+
# Developed Gantt chart, project budget and risk analysis
+
# Developed slides for milestones and controversy
+
# Research on SVM (Support Vector Machine)
+
# Gained a better understanding of Word Frequency, WRI and Trigram Markov
+
  
'''Plan and Goals for new week:'''  
+
# Discuss about proposal slides with Brian.
# Proceed to develop Stage 1 Design Document
+
# Modify the slides.
# Understand SVM
+
# Present proposal seminar.
# Develop Work Breakdown Structure
+
 
# Delegate tasks to individual members
+
 
# Read up on the other 4 reports
+
''' Plan and Goals for Next Week '''
 +
 
 +
# Further research on methods.
 +
# Prepare for stage one report
  
 
===Semester 2, Week 4===
 
===Semester 2, Week 4===
  
====Jie Dong====
+
====Yan Xie====
 +
'''Progress and Status this week:'''
 +
# Work on the Common N-grams method in Java
 +
# Fully read the papers on the algorithm and classifier
 +
# Discuss the design of Common N-grams with the other members
 +
# Delegate tasks of the stage one progress report to individual members
 +
 
 +
'''Plan and Goals for new week:'''
 +
# Complete the Executive Summary, Previous Studies, Coding Requirements and Tasks on Stage Two Report sections of the stage one progress report
 +
# Modify Work Breakdown Structure, Risk Assessment, Milestones, Monitoring Scheme and Proposed Budget
 +
# Complete writing on Common N-grams and SVM
 +
# Write up the draft of the stage one progress report and send it to supervisors for feedback
 +
# Revise the stage one progress report until the deadline
  
 +
====Kai He====
 
'''Progress and Status This Week'''
 
'''Progress and Status This Week'''
# In this project, we plan to have each person working on one method -- I am working on Trigram Markov model
 
# Read past reports for trigram Markov information
 
# Make stage 1 design document template
 
# Write project aim, background, and project approach in design document
 
  
'''Plan and Goals for Next Week'''
+
# Research on the Maximal Frequent Word Sequence method has been completed
# Modify the design document draft
+
# Coding on Maximal Frequent Word Sequence
# Send to supervisors for feedback
+
# Have a meeting with the other members to delegate the tasks of the stage one progress report
# More modification
+
# Write Project Background and Significance, Technical Background, Motivations and Key Requirements of the stage one progress report
# Prepare a tutorial on SVM for other group members
+
# Modify the stage one report to meet the criteria
 +
# Grammar checking
  
====Leng Tan====
 
  
'''Progress and Status This Week'''  
+
''' Plan and Goals for Next Week '''
  
# Research on the 3 methods has been completed.
+
# Coding on Maximal Frequent Word Sequence
# Fully read and understood the criteria for the stage 1 design document.
+
# Complete my tasks on stage one report
# Had a brief meeting with group members to delegate the tasks in preparing the stage 1 design document.
+
# Send the draft to supervisors
 +
# Modify and format
  
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
  
'''Plan and Goals for Next Week'''
+
# Test previous methods.
 +
# Compared with previous research, clarified and identified the methods and classifiers we will use.
 +
# Working on the stage one report.
  
# Do a rough draft of the allocated tasks.
+
''' Plan and Goals for Next Week '''
# Do a layout design for the document.
+
  
====Tien-en Phua====
+
# Finish stage one report.
'''Progress and Status this week:'''
+
# Allocate the report roles to each group member.
# Develop Work Breakdown Structure
+
# Identified the tasks required for the Stage 1 Design Document
+
# Broke down the tasks and assigned them to each member
+
# In the process of development of Stage 1 Design Document
+
# Further research on SVM and Word Frequency
+
 
+
'''Plan and Goals for new week:'''
+
# Complete write up on Word Frequency and SVM
+
# Complete Stage 1 Design Document
+
# Coding and further research on Word Frequency
+
# Read up on the other 4 reports
+
  
 
===Semester 2, Week 5===
 
===Semester 2, Week 5===
  
====Jie Dong====
+
====Yan Xie====
 
+
 
'''Progress and Status this week:'''
 
'''Progress and Status this week:'''
 +
# Completed my allocated parts of the stage one report
 +
# Attend a group weekly meeting within the team and discuss uncompleted sections
 +
# Helped with formatting
 +
# Send the report draft to supervisors
 +
# Modify the report after getting feedback from supervisors
  
# Completed the abstract, project aim, background and significance
+
'''Plan and Goals for new week:'''
# Completed the description of the data extraction part for the Trigram Markov model in the design document
+
# Develop the method of Common N-grams
# Feedback from supervisors on design document
+
# Read papers
# Final modification on design document
+
# Learn to use SVM
# Format the design document on wiki
+
  
'''Plan and Goals for Next Week:'''
+
====Kai He====
# Design on Trigram Markov model
+
'''Progress and Status This Week'''
# learn to use SVM
+
# A bit of coding on the trigram Markov model
+
  
====Leng Tan====
+
# Finish Project Background and Significance, Technical Background, Motivations and Key Requirements
 +
# Write Input and Output Specifications, and Testing and Verification
 +
# Help to write the part of Project Management
 +
# Grammar checking and formatting
 +
# Modification on the stage one progress report after getting feedback from supervisors
 +
# Completed the final version of the stage one progress report and submitted it
 +
# Coding on Maximal Frequent Word Sequence
  
'''Progress and Status this week:'''
 
  
# Completed the Literature Review of the design document
+
''' Plan and Goals for Next Week '''
# Completed the description of the data extraction part for WRI in the design document
+
# Completed the project approach and milestones for the design document
+
# Added the modified WBS to the appendix
+
# Completed the initial check and compilation of the design document
+
  
'''Plan and Goals for Next Week:'''
+
# Coding on Maximal Frequent Word Sequence
 +
# Have a meeting with the other members discussing the upcoming goals
 +
# Review papers
  
# Start a rough design of the WRI data extraction in Java
+
====Zhaokun Wang====
# Read up on SVM
+
'''Progress and Status This Week'''
  
==== Tien-en Phua ====
+
# Allocated stage one report roles.
'''Progress and Status this week:'''
+
# Research method allocated to me: Common N-grams.
# Completed design document
+
# Classifier method allocated to me: dissimilarity calculation.
#* Project Requirements
+
# Modify stage one report after feedback.
#* Description of data extraction of Function Word Frequency analysis
+
#* Project Budget
+
#* Background and Significance of Hebrews
+
#* Edited Gantt Chart, WBS to synchronise
+
#* Editing, grammar checking, etc.
+
# Basic layout of software design for data extraction algorithm
+
# Wiki page
+
'''Plan and Goals for Next Week:'''
+
# Commence programming of algorithm using Java
+
# Read up on SVM
+
  
===Semester 2, Week 6===
 
  
====Jie Dong====
+
''' Plan and Goals for Next Week '''
 +
 
 +
# Coding and developing N-gram
 +
# Researching on dissimilarity
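
The log does not say which dissimilarity calculation is planned for the N-gram profiles. As a minimal sketch only, the Java below assumes character n-grams and the relative-difference measure often used with Common N-grams profiles (the sum, over every n-gram in either profile, of the squared normalised frequency difference). The class and method names are hypothetical, and truncating each profile to its most frequent n-grams is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: n-gram profiles and a relative-difference dissimilarity.
class NgramProfileSketch {

    // Relative frequency of every character n-gram in the text (assumed feature).
    static Map<String, Double> profile(String text, int n) {
        Map<String, Double> freq = new HashMap<>();
        int total = text.length() - n + 1;
        if (total <= 0) {
            return freq; // text shorter than n: empty profile
        }
        for (int i = 0; i < total; i++) {
            freq.merge(text.substring(i, i + n), 1.0, Double::sum);
        }
        freq.replaceAll((gram, count) -> count / total);
        return freq;
    }

    // Sum over n-grams in either profile of (2 * (f1 - f2) / (f1 + f2))^2,
    // where an n-gram missing from a profile has frequency 0.
    static double dissimilarity(Map<String, Double> p1, Map<String, Double> p2) {
        Set<String> grams = new HashSet<>(p1.keySet());
        grams.addAll(p2.keySet());
        double sum = 0.0;
        for (String g : grams) {
            double f1 = p1.getOrDefault(g, 0.0);
            double f2 = p2.getOrDefault(g, 0.0);
            double d = 2.0 * (f1 - f2) / (f1 + f2);
            sum += d * d;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> known = profile("the cat sat on the mat", 3);
        Map<String, Double> disputed = profile("the cat sat on the hat", 3);
        System.out.println(dissimilarity(known, disputed));
    }
}
```

Under this assumption a disputed text would be attributed to the author whose profile gives the smallest dissimilarity.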
 +
 
 +
===Semester 2, Week 6===
  
 +
====Yan Xie====
 
'''Progress and Status this week:'''
 
'''Progress and Status this week:'''
# Research on Trigram Markov model
+
# Read the papers of the algorithm of Common N-grams
# Two models are proposed:
+
# Have an overall structure for programming Common N-grams
#* Simple Trigram Markov model: only considers the effect of trigrams in the text
+
# Review paper on SVM
#* Potential problem with the first model: sparse data; new trigrams that appear in the test text lead to poor cross entropy
+
# SVM classifier: still considering how to use the produced output text file as input to the SVM
#* Second model: hidden Markov model on trigrams: not only trigram counts but also unigram and bigram effects are taken into consideration. The transition probability is composed of all three probabilities.
+
# Participate in the group meeting
# The presence of punctuation and uppercase letters should be considered for text written in English.
+
# Programming text file input and exception handling in Java
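
The second trigram model above combines unigram, bigram and trigram probabilities, but the log does not say how. One common way to do this is linear interpolation, sketched below in Java as an assumption rather than the method actually implemented. The class name, the weights l3, l2 and l1, and the use of space-joined keys (which assumes tokens contain no spaces) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: interpolated trigram probability over word tokens.
class InterpolatedTrigramSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private int totalWords = 0;

    // Count every unigram, bigram and trigram in a tokenised text.
    void train(String[] words) {
        for (int i = 0; i < words.length; i++) {
            totalWords++;
            counts.merge(words[i], 1, Integer::sum);
            if (i >= 1) {
                counts.merge(words[i - 1] + " " + words[i], 1, Integer::sum);
            }
            if (i >= 2) {
                counts.merge(words[i - 2] + " " + words[i - 1] + " " + words[i], 1, Integer::sum);
            }
        }
    }

    private int count(String key) {
        return counts.getOrDefault(key, 0);
    }

    private static double ratio(int numerator, int denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    // P(w3 | w1 w2) as a weighted mix of trigram, bigram and unigram estimates.
    // The weights are assumed to be non-negative and to sum to 1.
    double probability(String w1, String w2, String w3, double l3, double l2, double l1) {
        double tri = ratio(count(w1 + " " + w2 + " " + w3), count(w1 + " " + w2));
        double bi = ratio(count(w2 + " " + w3), count(w2));
        double uni = ratio(count(w3), totalWords);
        return l3 * tri + l2 * bi + l1 * uni;
    }

    public static void main(String[] args) {
        InterpolatedTrigramSketch model = new InterpolatedTrigramSketch();
        model.train("the cat sat on the mat".split(" "));
        // An unseen trigram still gets a non-zero probability from the lower orders.
        System.out.println(model.probability("cat", "the", "mat", 0.6, 0.3, 0.1));
    }
}
```

The point of the mix is the sparse-data problem noted for the first model: a trigram never seen in training no longer forces a zero probability.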
+
  
 
'''Plan and Goals for new week:'''
 
'''Plan and Goals for new week:'''
# Discuss the models with supervisor
+
# Discuss the code with the team
# SVM problem
+
# Coding on Common N-grams
# Programming on first model
+
# Design SVM
  
====Leng Tan====
+
====Kai He====
 +
'''Progress and Status This Week'''
  
'''Progress and Status this week:'''
+
# Research on Maximal Frequent Word Sequence
# Completed a design for the WRI code after discussion with group members.
+
# Develop the program for Maximal Frequent Word Sequence
# Wrote about 50% of the code for data extraction using WRI.
+
# Debugging
# Read a bit about SVM but still don't understand it.
+
# Help the other members with coding
  
'''Plan and Goals for new week:'''
+
''' Plan and Goals for Next Week '''
# Finish the coding for WRI.
+
# Try to get help with SVM.
+
  
====Tien-en Phua====
+
# Complete about 30% - 40% of the code for data extraction using Maximal Frequent Word Sequence
 +
# Discuss classifiers
  
'''Progress and Status this week:'''
+
====Zhaokun Wang====
# Finished designing the algorithm in Java for function word frequency (pseudocode).
+
'''Progress and Status This Week'''
# Start implementing the algorithm code.
+
# The code is half done.
+
  
'''Plan and Goals for new week:'''
+
# Learning and coding on N-gram
# Finish coding.
+
# Debugging
# Discuss about SVM problems.
+
  
===Semester 2, Week 7===
+
''' Plan and Goals for Next Week '''
  
====Jie Dong====
+
# Discuss coding within the team
 +
# Design classifier method
  
 +
===Semester 2, Week 7===
 +
 +
====Yan Xie====
 
'''Progress and Status this week:'''
 
'''Progress and Status this week:'''
# Read the chapter on hidden Markov chains in "Statistical Language Learning"
+
# Discuss the Common N-grams problems with the other members
# Came up with my own test text to verify that my code is working properly
+
# Finish about 50% of the code for data extraction using Common N-grams
# Met with Brian to discuss my current work; the current approach does not work efficiently
+
# Have a group meeting with the other two members reporting my current progress of extraction method of Common N-grams
 +
# Introduce the stage two report
  
 
'''Plan and Goals for new week:'''
 
'''Plan and Goals for new week:'''
# The previous algorithm only considers the effect of word trigrams. The result for a test paragraph contains a lot of useless information: about 70% of trigrams appear only once, and only about 10% of the information is worth using in classification. When common trigrams are extracted from several test texts, few of them are left. Hence another, enhanced model, in which unigrams and bigrams are also taken into consideration, will be tested in the following week.
+
# Continue coding of Common N-grams
# SVM will also be used to test the result in the coming week. Investigating how to use the SVM functions in MATLAB, svmtrain and svmclassify (Bioinformatics Toolbox)
+
# Participate in the meeting about the stage two report with the other members
# Peer review assessment
+
# Try to figure out how to use the SVM functions in MATLAB
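
The observation above that about 70% of trigrams appear only once can be checked with a short count. The Java below is a minimal sketch, not the project's code: it assumes word trigrams over an already tokenised text, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: how sparse are the word trigrams of a text?
class TrigramSparsitySketch {

    // Fraction of distinct word trigrams that occur exactly once.
    static double singletonFraction(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 < words.length; i++) {
            counts.merge(words[i] + " " + words[i + 1] + " " + words[i + 2], 1, Integer::sum);
        }
        if (counts.isEmpty()) {
            return 0.0; // fewer than three words: no trigrams
        }
        long singletons = counts.values().stream().filter(c -> c == 1).count();
        return (double) singletons / counts.size();
    }

    public static void main(String[] args) {
        String[] words = "a b c a b c d".split(" ");
        System.out.println(singletonFraction(words));
    }
}
```

A high fraction on the training texts is a sign that raw trigram counts alone carry little reusable information, which is the motivation given for the enhanced model.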
====Leng Tan====
+
  
'''Progress and Status this week:'''
+
====Kai He====
# Finish the Java coding for WRI technique in data extraction algorithm.
+
'''Progress and Status This Week'''
# Tested and verified that the code is working properly using a small test file (a text file with only a few sentences).
+
# Had a meeting with Brian to discuss the SVM input and output.
+
  
'''Plan and Goals for new week:'''
+
# Review the following papers:
# Figure out SVM.
+
# Algorithm for Maximal Frequent Sequences in Document Clustering
# Test and try out SVM in MATLAB using small test files.
+
# Experimenting with Maximal Frequent Sequences for Multi-Document Summarization
 +
# Discovery of Frequent Word Sequences in Text
 +
# Completed 30% of the code for data extraction using Maximal Frequent Word Sequence
 +
# Review the paper "Augmenting Naïve Bayes Classifiers with Statistical Language Models"
 +
# Review the criteria of stage two report
  
====Tien-en Phua====
+
''' Plan and Goals for Next Week '''
'''Progress and Status this week:'''
+
# Completed coding for data extraction algorithm (DEA)
+
# Discuss implementation of output of data from DEA to SVM
+
# Analyse how other researchers analyse their data
+
  
'''Plan and Goals for new week:'''
+
# Coding and Debugging
# Modification and refining of DEA code
+
# Discuss implementation of output of data from Maximal Frequent Word Sequence to Naïve Bayes Classifiers
# Continue analysis of how other researchers used this DEA for authorship attribution
+
# Prepare the stage two report
# Try applying data to SVM
+
 
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
 +
 
 +
# Group meeting, discussing with other team members.
 +
# Coding on N-gram
 +
# Structuring the stage two report
 +
 
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Keep coding N-gram
 +
# Group meeting about stage two report
 +
# Begin coding the dissimilarity classifier
  
 
===Semester 2, Week 8===
 
===Semester 2, Week 8===
====Jie Dong====
+
 
 +
====Yan Xie====
 
'''Progress and Status this week:'''
 
'''Progress and Status this week:'''
# Peer review assessment on the design document on "Audio assisted vision system"
+
# Coding of Common N-grams
 +
# Discuss the project management
 
# Investigation on SVM in MATLAB
 
# Investigation on SVM in MATLAB
# Working on modified trigram model
+
# Participate in a meeting to discuss how to apply the generated data to the classifier
  
 
'''Plan and Goals for new week:'''
 
'''Plan and Goals for new week:'''
# Test the output of my Java program with SVM
+
# Complete software coding v1.0 at the end of Week 11
 +
# Start to write the stage two report
 +
# Review SVM from previous attempt
  
====Leng Tan====
+
====Kai He====
 +
'''Progress and Status This Week'''
  
'''Progress and Status this week:'''
+
# Coding and Debugging on Maximal Frequent Word Sequence
# Received a stage 1 design document on "Audio Assisted Vision System for Visually Impaired People".
+
# Further research on Naïve Bayes
# The document was read in full and notes were taken on presentation and various other aspects.
+
# Discuss the Naïve Bayes Classifier with the other members
# The document was reviewed and a formal peer review report was produced.
+
# Investigation of MATLAB for SVM was paused for the moment because of the peer review report.
+
  
'''Plan and Goals for new week:'''
+
''' Plan and Goals for Next Week '''
# Figure out SVM.
+
# Test and try out SVM in MATLAB using small test files.
+
  
====Tien-en Phua====
+
# Write the project management of the stage two report
'''Progress and Status this week:'''
+
# Continue coding and debugging
# Completed the coding of the data extraction algorithm. It can load a file, remove punctuation and create a new output file for Support Vector Machine input
+
# Weekly meeting with the other team members
# Reviewed the peer document and did some research on the principles of echolocation in bats in order to understand the document
+
# Completed the peer review of "Audio Assisted Vision System for Visually Impaired People"
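
The data extraction step described above (load a text, remove punctuation, produce values for the SVM) can be illustrated for function word frequency. The Java below is a minimal sketch under stated assumptions, not the project's code: the function word list is a hypothetical placeholder, the text is assumed to be plain English already read into a string, and replacing punctuation with spaces splits contractions such as "don't" in two.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: strip punctuation, then count function word frequencies.
class FunctionWordSketch {

    // Lower-case the text and replace everything except a-z and whitespace with a space.
    static String[] tokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z\\s]", " ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    // Relative frequency of each listed function word, in the order given.
    static Map<String, Double> frequencies(String text, List<String> functionWords) {
        String[] tokens = tokenize(text);
        Map<String, Double> result = new LinkedHashMap<>();
        for (String word : functionWords) {
            result.put(word, 0.0);
        }
        for (String token : tokens) {
            if (result.containsKey(token)) {
                result.merge(token, 1.0, Double::sum);
            }
        }
        if (tokens.length > 0) {
            result.replaceAll((word, count) -> count / tokens.length);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "and", "of"); // placeholder list
        System.out.println(frequencies("The cat, and the dog!", words));
    }
}
```

Keeping the words in a fixed order means every text yields a feature vector with the same column layout, which is what a classifier input file needs.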
+
  
'''Plan and Goals for new week:'''
+
====Zhaokun Wang====
# Apply the data generated by the data extraction algorithm to the support vector machine
+
'''Progress and Status This Week'''
# Determine progress of project and review schedule.
+
 
 +
# Coding N-gram
 +
# Group meeting about stage two report
 +
# Try to begin coding dissimilarity classifier
 +
# Researching on dissimilarity classifier
 +
 
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Write stage two report
 +
# Group meeting
  
 
===Semester 2, Week 9===
 
===Semester 2, Week 9===
====Jie Dong====
+
 
 +
====Yan Xie====
 
'''Progress and Status this week:'''
 
'''Progress and Status this week:'''
# The hidden Markov model is implemented in Java, and the program produces a table of probability information for some common trigrams from the input texts. The current problem is that, because I am feeding every word that appears in the texts into the program, there are few common trigrams among a given number of input texts. For example, with a total of 20 input texts from two authors, the number of trigrams they have in common is just one. I also set the program to allow only some of the texts to share a trigram, with zero probabilities assigned for the others, but the result is still not efficient.
+
# Add some new classes on the code of Common N-grams
# Read through Talis's trigram description and code. I found that he simplified the method and extracted the key specification by deleting the non-key words. Testing his idea in Java code, I found that it extracts a lot more information than mine; however, it also raises the question of whether it reduces classification accuracy, since it changes the original text into another. This simplification needs to be validated.
+
# Code modification
# The result produced by the extraction algorithm was fed into the MATLAB SVM functions (svmtrain and svmclassify); it shows that my extraction algorithm is not working properly. Sometimes the predicted authors for the chosen texts are correct and sometimes they are not. As for the SVM functions themselves, they only support classification into two groups, and multi-group classification produces an error. In addition, they can only plot the SVM structure for two-dimensional data. Hence, more capable SVM toolboxes should be studied.
+
# Weekly meeting with the other team members to report the progress of Common N-grams coding
 +
# Write parts of Project Objectives, Background, Algorithm Programming and Project Management on the stage two report
 +
# Get feedback of the stage one progress report from Brian
  
'''Plan and Goals for next week:'''
+
'''Plan and Goals for new week:'''
# GUI design
+
# Complete software coding v1.0 at the end of Week 11
# Test efficiency using different groups of input texts
+
# Continue code modification
# Try another SVM toolbox from: http://asi.insa-rouen.fr/enseignants/~arakotom/toolbox/index.html
+
# Testing
  
====Leng Tan====
+
====Kai He====
'''Progress and Status this week:'''
+
'''Progress and Status This Week'''
# A basic SVM code which receives a text file input is produced.
+
# The SVM code will need 2 training data groups and a number of test data groups.
+
# The standardised format for the SVM input was decided by the team members.
+
# The input format will be an MxN matrix in which the first column is the author and the subsequent columns are the data (in my case, standard deviations).
+
# The initial data uses 20 standard deviation columns.
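
The matrix layout described above (author in the first column, data in the following columns) can be sketched as one text line per input text. The Java below is an illustration under assumptions, not the team's agreed format: it assumes the standard deviations are those of the gaps between successive occurrences of a word (the word recurrence interval), that columns are separated by a single space, and that authors are encoded as integer labels. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: one row of the SVM input matrix from recurrence intervals.
class SvmRowSketch {

    // Population standard deviation of the gaps between successive occurrences of a word.
    static double recurrenceStdDev(String[] tokens, String word) {
        List<Integer> gaps = new ArrayList<>();
        int last = -1;
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(word)) {
                if (last >= 0) {
                    gaps.add(i - last);
                }
                last = i;
            }
        }
        if (gaps.isEmpty()) {
            return 0.0; // fewer than two occurrences: no interval to measure
        }
        double mean = 0.0;
        for (int g : gaps) {
            mean += g;
        }
        mean /= gaps.size();
        double variance = 0.0;
        for (int g : gaps) {
            variance += (g - mean) * (g - mean);
        }
        return Math.sqrt(variance / gaps.size());
    }

    // One matrix row: the author label first, then the feature values.
    static String row(int authorLabel, double[] features) {
        StringJoiner joiner = new StringJoiner(" ");
        joiner.add(Integer.toString(authorLabel));
        for (double f : features) {
            joiner.add(Double.toString(f));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String[] tokens = "the a the a a a the".split(" ");
        double[] features = { recurrenceStdDev(tokens, "the"), recurrenceStdDev(tokens, "a") };
        System.out.println(row(1, features));
    }
}
```

With 20 chosen words, each text would give a row of 21 columns, matching the 20 standard deviation columns mentioned above.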
+
  
'''Plan and Goals for next week:'''
+
# Done 60% of the code for data extraction using Maximal Frequent Word Sequence
# The SVM predicts the author wrongly and this needs to be resolved.
+
# Help debugging the code of the common N-grams
# This might be due to insufficient training data.
+
# Report the code progress so far in the team meeting
# Further testing is required.
+
# Set up the upcoming goals: Software Coding V1.0, Stage 2 Report Due, Software Testing V1.0 and Software Coding V2.0
# Might consider implementing a GUI.
+
# Start to design the training process and classification process using Naïve Bayes Classifier
# Need to have a meeting with the supervisors on progress and GUI implementation (can the Java and MATLAB GUIs be combined?)
+
  
====Tien-en Phua====
+
''' Plan and Goals for Next Week '''
'''Progress and Status this week:'''
+
 
# Research for statistical software for obtaining the covariance of data [http://www.statgraphics.com/ StatGraphics]
+
# Write the parts of Introduction, Objectives, Background, Algorithm Definition, Work Breakdown Structure, Milestones and Budgets on the stage two report
# Downloaded and installed the chosen software and attempted to operate the program
+
# Choose some simple text files to test
# Research on a book discussing the possible author of Hebrews [http://orders.koorong.com/search/product/view.jhtml?code=9780805447149 Nacsbt: Lukan Authorship Of Hebrews]
+
# Further research on the classifier of Naïve Bayes
'''Plan and Goals for next week:'''
+
 
# Obtain the covariance of the data
+
====Zhaokun Wang====
# Check whether the data extraction algorithm produces results similar to Talis's
+
'''Progress and Status This Week'''
# Produce code to "chop" all text files to a specific length for analysis
+
 
# Input data to SVM and observe the outcome
+
# Coding and debugging N-gram
# Combine functions for analysis
+
# Writing stage two report
 +
# Made slight changes to the progress schedule
 +
# Group meeting with the group members to report progress up to now
 +
 
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Writing the stage two report
 +
# Developing on dissimilarity classifier
 +
# Testing
  
 
===Semester 2, Week 10===
 
====Jie Dong====
 
'''Progress and Status this week:'''
 
# Original JAVA program is re-built in a standard eclipse project
 
# Delete Transition class, no longer used
 
# Change three classes (State, Gram, Record) to inner classes correspondingly
 
# Reduce original three main methods in separate class to only one in Driver class
 
# Move methods for User inputs to Driver class, including parameters and paths
 
# Add three header lines to Java program output: number of texts, number of disputed texts, number of trigrams used
 
'''Plan and Goals for next week:'''
 
# Standardise three algorithms into one project folder
 
# Use same training data, unknown data to test three extraction algorithms
 
# Compare their accuracies in different situations (number of key words, number of texts, etc.)
 
  
====Leng Tan====
+
====Yan Xie====
 
'''Progress and Status this week:'''
 
# Had a meeting with the supervisors and reported on the progress of the project.
+
# Completed most of the code for the common N-grams
# SVM code remains the same for the time being.
+
# Delete the unused inner classes
# A table of results should be produced to compare the differences between the data extraction algorithms.
+
# Discuss SVM with the other team members
# The main idea of the progress report was discussed.
+
  
'''Plan and Goals for next week:'''
+
'''Plan and Goals for new week:'''
# A standardised template to combine all 3 data extraction algorithms was discussed.
+
# Complete software coding v1.0 at the end of Week 11
# WRI code needs to be slightly modified.
+
# Figure out SVM
# Need to plan the initial design for the GUI.
+
# Try to test the code using some simple text file
 +
# Write the stage two report
  
====Tien-en Phua====
+
====Kai He====
'''Progress and Status this week:'''
+
'''Progress and Status This Week'''
# Modify code to accept multiple inputs
+
 
# Extract out federalist papers for testing on support vector machine using function word analysis
+
# Write the stage two report
# Meeting with supervisors on Wednesday for progress updates and guidance on next step
+
# Complete the coding of Maximal Frequent Word Sequence
# Commencement of progress report
+
# Working on modified Maximal Frequent Word Sequence
 +
# Test efficiency using different input texts
 +
 
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Modify code of Maximal Frequent Word Sequence
 +
# Design the Naïve Bayes classifier
 +
# Report the progress in the team meeting
 +
# Continue writing the stage two report
 +
 
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
 +
 
 +
# Testing N-gram code and debugging
 +
# Writing stage two report
 +
# Coding dissimilarity classifier
 +
# Group meeting
 +
 
 +
 
 +
 
 +
''' Plan and Goals for Next Week '''
  
'''Plan and Goals for next week:'''
+
# Finish coding on N-gram
# Produce a table of result displaying the accuracy of the algorithm with SVM Kernel function
+
# Coding on dissimilarity classifier
# Complete progress report, project background, project specification, progress thus far and project management
+
# Writing report  
# Combine the three algorithm together into a single driver file
+
# Group meeting to report N-gram coding
# Discuss and design possible implementation of a GUI
+
  
 
===Semester 2, Week 11===
 
====Jie Dong====
+
 
 +
====Yan Xie====
 
'''Progress and Status this week:'''
 
# Update progress report
+
# Complete software coding v1.0 of the Common N-grams
# JAVA program modification:
+
# Using my own text to verify this code is working properly
#* Sort list of files read in according to their name order
+
# Compare using a small test file with a large test file
#* Replace manual parameter setup with automatic reading of data, forming the training set and testing set according to the three header lines
+
# Begin by building large sets of training data and testing data by randomly collecting extracted features from Author Profiles on SVM
 +
# Completed the draft of the stage two report
  
'''Plan and Goals for next week:'''
+
'''Plan and Goals for new week:'''
# Write a standard document to combine our java extraction program together
+
# Modify the stage two report
# Complete Progress report
+
# Submit the stage two report
 +
# Use same training data, unknown data to test two extraction algorithms
  
====Leng Tan====
+
====Kai He====
'''Progress and Status this week:'''
+
'''Progress and Status This Week'''
# Do progress report.
+
  
'''Plan and Goals for next week:'''
+
# Complete the draft of the stage two report
# catch up on assignments and prepare for exams.
+
# Grammar checking and formatting
 +
# The output of the Maximal Frequent Word Sequence code is incorrect; modification is needed
  
====Tien-en Phua====
 
'''Progress and Status this week:'''
 
# Update of progress report
 
  
'''Plan and Goals for next week:'''
+
''' Plan and Goals for Next Week '''
# Complete 4 upcoming assignments
+
 
# Prepare for power system quiz
+
# Deliver the stage two report
 +
# Complete the code of Maximal Frequent Word Sequence
 +
# Test the output
 +
# Have a meeting to discuss the upcoming goals
 +
 
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
 +
 
 +
# Modify and finish N-gram
 +
# Testing N-gram code using training texts
 +
#Coding dissimilarity classifier
 +
#Working on stage two report
 +
 
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Modify stage two report
 +
# Using training data to test N-gram coding
  
 
===Semester 2, Week 12===
 
====Jie Dong====
+
 
 +
====Yan Xie====
 
'''Progress and Status this week:'''
 
# Draft a rough standard for the Java extraction program and send it to the other group members
+
# Submit the stage two report and send it to supervisors
# Modify progress report and upload to Wiki
+
# Report my individual work done so far
 +
# Report the code of the common N-grams completed and tested
 +
# Report the progress of SVM
 +
# Discuss the upcoming goals with the other members
  
'''Plan and Goals for next week:'''
+
'''Plan and Goals for new week:'''
# Stop project for a period of time to prepare for exams
+
# Prepare for exams
  
====Leng Tan====
+
====Kai He====
'''Progress and Status this week:'''
+
'''Progress and Status This Week'''
# Assignments due this week are completed.
+
# Send my stage two report to supervisors
 +
# Weekly meeting with the other team members to report the progress of the project
  
'''Plan and Goals for next week:'''
+
''' Plan and Goals for Next Week '''
# Stop project as exams are coming.
+
  
====Tien-en Phua====
+
# Stop project
'''Progress and Status this week:'''
+
# Work on exams
# Completed all assignments due this week
+
 
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
 +
 
 +
# Submit stage two report
 +
# Group meeting to report progress
 +
# Coding on dissimilarity classifier
 +
 
 +
''' Plan and Goals for Next Week '''
  
'''Plan and Goals for next week:'''
+
# None (prepare for final exams)
# Need to prepare for exams. SWOT week next week.
+
* Project will "pause" until after the exam period (20 Nov 2010); thereafter the team will work individually back in their home countries and update each other via email
+
  
 
===Semester 1, Week 1===
 
  
====Jie Dong====
+
====Yan Xie====
 
'''Progress and Status this week:'''
 
# Had a small discussion with the team members and work on SVM.
+
# Review two algorithms and three classifiers
# Modify SVM program to support multi-group classification function
+
# Group members present individual report so far on the group weekly meeting
# Test the accuracy of the whole classifying program with English texts
+
# Work on coding SVM program
# Generate accuracy table with respect to three different variables: tolerance, number of key words and kernel function (linear, quadratic, rbf, polynomial)
+
# Check the Milestones for the upcoming goals
 +
 
 
'''Plan and Goals for new week:'''
 
# Discuss with supervisor about the performance of current program and suggest ways to increase accuracy
+
# Email supervisors to have a meeting reporting the progress of the report
# Apply interface developed by Joel
+
# Discuss the performance of the current progress
 +
# Modify the SVM program
 +
# Prepare the project description and images for project exhibition
  
====Leng Tan====
+
====Kai He====
'''Progress and Status this week:'''
+
'''Progress and Status This Week'''
# Brief discussion with team members on the project.
+
 
# The English texts are used to test the accuracy of the program.
+
# Meet with the team members discussing the classifiers
# Try different kernel functions of the SVM while testing the accuracy.
+
# Simplify the code of Maximal Frequent Word Sequence
 +
# Work on the Naïve Bayes classifier
 +
# Do some testing
  
 
''' Plan and Goals for Next Week '''
 
# Organize a meeting with the supervisors for updates.
 
# Discuss a constant text length with Joel.
 
# Try to combine the code.
 
  
====Tien-en Phua====
+
# Arrange a meeting time with supervisors
'''Progress and Status this week:'''
+
# Discuss the key methods used in Naïve Bayes with the team
# Conduct a brief meeting with team members to further evaluate on SVM.
+
# Modified program to use arrays and ArrayList instead of function word objects, improving resource management and run time
+
# Modified program to take in large amount of data as input instead of a single file
+
# Modified program to create a new folder to store all temporary (or modified) data. Reduce the clutter in the parent folder
+
# Test program using the federalist papers
+
  
'''Plan and Goals for new week:'''
+
====Zhaokun Wang====
# Have a meeting with supervisors showing the results.
+
'''Progress and Status This Week'''
# Further testing
+
 
 +
# Group meeting to report progress of project during the summer break
 +
# Keep coding on dissimilarity classifier
 +
# Do testing on training data
 +
 
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Plan a meeting with supervisor
 +
# Modify and code the dissimilarity classifier
  
 
===Semester 1, Week 2===
 
====Jie Dong====
 
'''Progress and Status this week:'''
 
# Met up with supervisors
 
# Applied the trigram model algorithm to the 170 English texts and tested the accuracy of the SVM for the trigram Markov model
 
# Number of key words used in the test are 5,10,15,20,25,30,35,40,45,50
 
# Four different kernel functions were used: linear, quadratic, rbf and polynomial. The linear kernel function showed the best performance of the four. However, the accuracy is still very low, at about 50%.
 
'''Plan and Goals for new week:'''
 
# The effect of punctuations in the text should be taken into consideration, such as "-" and "'"
 
# Modified Trigram software
 
# Further testing
 
====Leng Tan====
 
'''Progress and Status this week:'''
 
# Met up with supervisor
 
# Applied the algorithm to the 170 English texts and tested the accuracy of the SVM for WRI
 
# Applied different kernel functions and observed the different results
 
# Developed a word count program for text
 
'''Plan and Goals for new week:'''
 
# Modify the delete-punctuation method in the interface (see minutes report number 10 for specs)
 
# Implement interface
 
# Change the number of keywords (currently 20; try 5, 10, 15, 20, 25 and observe the difference)
 
# Start using the New Testament as test data
 
  
====Tien-en Phua====
+
====Yan Xie====
 
'''Progress and Status this week:'''
 
# Had meeting with supervisors
+
# Confirm a meeting time with supervisors
# Applied algorithm to 170 English texts
+
# Complete a project description and image, also email to Braden
# Applied algorithm to 85 Federalist Papers
+
# Discuss SVM with the team members
# Monitor project progress and re-evaluate the project milestone and timeline
+
# Continue working on SVM
# Develop software for chopping text
+
# Develop software to count total words of text and also the number of occurrence of each word for better text analysis
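
A word-counting tool of the kind described above could look roughly like the sketch below. It is not the team's CountWord program: the class name <code>WordCounter</code>, whitespace tokenisation and case folding are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a word-count tool; names are assumptions.
final class WordCounter {

    /** Counts how often each whitespace-separated word occurs, ignoring case. */
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for readable output
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** The total number of words is the sum of all per-word counts. */
    static int totalWords(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("the cat and the dog");
        System.out.println(totalWords(counts) + " words: " + counts);
    }
}
```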
+
  
 
'''Plan and Goals for new week:'''
 
# Identify the reason for incorrect classifications
+
# Meet up with supervisors
# Further testing to ensure the correct operation
+
# Code modification
# Study Greek alphabets
+
# Plan the upcoming goals within the team
 +
# Test programs using English text
 +
# Start to prepare the exhibition and final seminar
  
===Semester 1, Week 3===
+
====Kai He====
====Jie Dong====
+
'''Progress and Status This Week'''
'''Progress and Status this week:'''
+
# Get rid of the concept of tolerance.
+
# Consider the meaning of punctuation appearing in the English texts, especially "-" and "'".
+
# Content not written by the author, such as chapter numbers and titles, should be removed before extraction.
+
# Test the effect of above modification
+
  
 +
# Completed half of the Naïve Bayes program
 +
# Change the classes in the program
 +
# Code modification
 +
# Check the project description and image
 +
# Have a brief meeting with the team members
  
'''Goals Next Week'''
+
''' Plan and Goals for Next Week '''
# Prepare test data using Federalist Paper
+
 
# Prepare test data using Greek text
+
# Have a meeting with supervisors
 +
# Develop software
 +
# Prepare the exhibition and final seminar
 +
 
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
 +
 
 +
# Group meeting within group
 +
# Modify and coding dissimilarity classifier
 +
# Working on project description and image
 +
 
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Meeting with supervisor
 +
# Keep coding
 +
# Prepare for the final seminar
 +
 
 +
===Semester 1, Week 3===
  
====Leng Tan====
+
====Yan Xie====
 
'''Progress and Status this week:'''
 
# Developed a program to count the total number of words that contained "-" and "'"
+
# Get feedback from meeting with supervisors
# Implemented interface made by Joel
+
# Consider punctuation removal, lowercase conversion, space combination and word overlapping
# Modified the WRI method and change the threshold of the number of keywords.
+
# Develop the java code of the Common N-gram
'''Plan and Goals for new week:'''
+
# Analyse the poor results from texts with chapter numbers and titles
# Try to improve the accuracy.
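
The pre-processing steps mentioned this week (punctuation removal, lowercase conversion and combining runs of spaces) could be sketched as follows. The class name <code>TextNormalizer</code> is hypothetical, and keeping "-" and "'" inside words is an assumption; the log only says that these two characters need special consideration.

```java
import java.util.Locale;

// Hypothetical pre-processing sketch; not the team's actual code.
final class TextNormalizer {

    /** Lowercases text, strips punctuation and collapses whitespace runs. */
    static String normalize(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        // Assumption: letters, digits, whitespace, apostrophes and hyphens are
        // kept; every other character is replaced by a space.
        String noPunctuation = lower.replaceAll("[^\\p{L}\\p{Nd}\\s'-]", " ");
        // Combine consecutive whitespace into a single space.
        return noPunctuation.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Hello,   World! It's well-known."));
    }
}
```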
+
 
====Tien-en Phua====
+
'''Progress and Status this week:'''
+
# Analyse the results for the Federalist and 170 English Text
+
# Continue developing auxiliary software (ie CountWord program, Punctuation program)
+
# Research on ways to balance the training data to SVM
+
 
'''Plan and Goals for new week:'''
 
# Continue testing on Federalist and 170 English Text
+
# Complete the java code of the Common N-gram
# Aim to achieve a 70% accuracy
+
# Test the 155 English texts, 82 Federalist Papers and 27 Greek New Testament texts
# Standardize the training data to SVM
+
 
 +
====Kai He====
 +
'''Progress and Status This Week'''
 +
 
 +
# Have a meeting with supervisors to discuss our project’s progress.
 +
# Consider how to realize overlapping detection using colors in Java.
 +
# Continue developing the Maximal Frequent Word Sequence Algorithm
 +
# Start preparing the final Seminar in week 6.
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
#      Finish coding the Maximal Frequent Word Sequence Algorithm
 +
#      Have a draft for the final seminar.
 +
 
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
 +
 
 +
# Getting feedback from supervisor
 +
# Fixing on N-gram (suggestion from supervisors)
 +
# Group meeting with team members
 +
 
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Keep on dissimilarity classifier
 +
# Finish fixing N-gram
  
 
===Semester 1, Week 4===
 
====Jie Dong====
+
 
 +
====Yan Xie====
 
'''Progress and Status this week:'''
 
# Run program modified last week on English data set
+
# Engage in removing all chapter numbers and titles
# Varying threshold and size of training data
+
# Add ranking method in the program
# Achieve a classification accuracy of around 80%
+
# Finish the code of Common N-gram
# Help group member to prepare for Federalist data set
+
# Run the completed program on 155 English texts, 82 Federalist Papers and 27 Greek New Testament texts
 +
# Draft the structure of the final seminar PPT
  
 
'''Plan and Goals for new week:'''
 
# Study the cause of unsatisfactory classification accuracy and try to improve it
+
# Analyse the output of the tested texts and consider removing the tail and setting a threshold for the large training data
# Perform similar tests on Federalist Paper
+
# Discuss the tested result with the group members
# Discuss results with other group member, and see their algorithm performance
+
# Prepare the slides of final seminar with the group members
  
====Leng Tan====
+
====Kai He====
'''Progress and Status this week:'''
+
'''Progress and Status This Week'''
# Had a meeting with supervisors.
+
# English text achieves an accuracy of only around 25-30%.
+
# Study the New Testament.
+
'''Plan and Goals for new week:'''
+
# Try to find the Greek file for the New Testament.
+
# Try using the Federalist text.
+
====Tien-en Phua====
+
'''Progress and Status this week:'''
+
# Develop a method of normalizing text.
+
# Run test on 170 English Text. Obtained a 100% accuracy
+
# Run test on Federalist Text. Obtained a 91% accuracy
+
  
'''Plan and Goals for new week:'''
+
# Maximal Frequent Word Sequence code is completed to combine features for different threshold n.
# Obtain a full set of Greek text
+
# Remove titles and redundant information from the allocated 150 English corpus.
# Chop Greek text accordingly
+
# Generate extracted features from the text corpuses.
# Require further testing and analysis
+
# A first draft PowerPoint is completed for the final seminar.
# Apply Greek text accordingly
+
# Researched the overlapping problem and found that it cannot be done using Java, since the text corpora are plain text and do not support color highlighting.
  
===Semester 1, Week 5===
+
''' Plan and Goals for Next Week '''
====Jie Dong====
+
'''Progress and Status this week:'''
+
# Algorithm update:
+
#* The new version of the trigram extraction algorithm inserts a "#" before a sentence and a "$" after a sentence. For example, given the string "Today is a good day. I want to go to picnic.", after the TextEditor class it becomes "# Today is a good day $ # I want to go to picnic $"
+
#* The motivation for this modification is that, in an English text, each sentence is relatively independent of the others. In terms of the example above, "......a good day. I want ......", it is not necessary to calculate the probability of appearance of "I" after the bigram "good day". Instead, it is more significant to characterise an author's writing habit by knowing the probability of appearance of "I" at the start of a sentence, i.e. after the bigram "$ #". Likewise, the probability of a word appearing at the end of a sentence is important to know as well, that is "day $ #". In addition, by this method, we can discover how often a specific word is used in one sentence
+
#* To determine the beginning and end of a sentence, delimiter "." is used. In the future, with further study of English text characteristics, there might be more delimiters
+
# Generate classification results based on Federalist Text.
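
The sentence-marker idea in the algorithm update above can be illustrated with a short sketch. It is not the project's TextEditor class: the class name <code>SentenceTrigrams</code> and both method names are hypothetical, and only "." is treated as a sentence delimiter, as described in the update.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of sentence marking followed by trigram counting.
final class SentenceTrigrams {

    /** Wraps each sentence in "#" ... "$" markers and returns the token list. */
    static List<String> markSentences(String text) {
        List<String> tokens = new ArrayList<>();
        for (String sentence : text.split("\\.")) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty fragments, e.g. after a trailing "."
            }
            tokens.add("#");
            for (String word : trimmed.split("\\s+")) {
                tokens.add(word);
            }
            tokens.add("$");
        }
        return tokens;
    }

    /** Counts every run of three consecutive tokens, markers included. */
    static Map<String, Integer> countTrigrams(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 < tokens.size(); i++) {
            String trigram = tokens.get(i) + " " + tokens.get(i + 1) + " " + tokens.get(i + 2);
            counts.merge(trigram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = markSentences("Today is a good day. I want to go to picnic.");
        System.out.println(String.join(" ", tokens));
        System.out.println(countTrigrams(tokens));
    }
}
```

Because the markers are ordinary tokens, a trigram such as "$ # I" records how often "I" opens a sentence, and "day $ #" records how often "day" closes one.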
+
  
 +
#      Finish coding the Naïve Bayes classifier to take multiple input files.
 +
#      Assemble the PowerPoint and start practicing.
  
'''Plan and Goals for new week:'''
+
====Zhaokun Wang====
# Perform more tests on different disputed texts
+
'''Progress and Status This Week'''
# Try another key words selection algorithm: based on occurring frequency
+
 
 +
# Finish coding N-gram
 +
#Removing unnecessary marks on the testing texts
 +
# Run all texts using N-gram code
 +
# Group meeting about final seminar
 +
# Finalize dissimilarity classifier
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Prepare for final seminar
 +
# Finish running the N-gram code on the texts
 +
# Compare with the training data and analyse the output of the tested texts
 +
 
 +
===Semester 1, Week 5===
  
====Leng Tan====
+
====Yan Xie====
 
'''Progress and Status this week:'''
 
Tried using Federalist Text.
+
# Set threshold in the output of the tested text
Best results give accuracy up to 70% when threshold = 10 and data dimension = 25. This might be due to the short text length of the Federalist Text.
+
# Analyse the input format of SVM
It is noted that WRI works better without normalization.
+
# Work on preparing final seminar
Found a Greek file for the New Testament, but not sure if it is the right one.
+
  
 
'''Plan and Goals for new week:'''
 
Do Federalist Text again with different disputed text.
+
# Send the draft of PPT to Brian
Try redoing the English text with normalization.
+
# PPT Slides modification
====Tien-en Phua====
+
# Prepare the presentation with the group members
'''Progress and Status this week:'''
+
 
# Analysis of federalist result as it is most similar in style to the new testaments text
+
====Kai He====
# Namely, most of the Federalist Papers are written by Hamilton, and likewise most of the New Testament is written by Paul, with a few other books written by different authors such as Luke, John and Peter
+
'''Progress and Status This Week'''
# Comparison of results with other feature extraction algorithm
+
 
# After comparison of Function Word Analysis (FWA) and frequency occurrence of function words, the FWA proves to be a better algorithm as it produces more accurate results than frequency occurrence.  
+
# Naïve Bayes classifier code 80% modified. Have bugs in the code.
# Using FWA reduces the need to chop text and allows less data to be "chunked" out.
+
# Group meeting to prepare for the final seminar.
'''Plan and Goals for new week:'''
+
# PowerPoint slides are merged into one file; roles and tasks are allocated to each member.
# According to Gantt Chart, the implementation of controversies should take place next week.
+
 
# Implement both FWA and frequency occurrence to the KJV text
+
 
# Frequency occurrence should produce consistent results to Talis.
+
''' Plan and Goals for Next Week '''
 +
 
 +
#       Finish debugging.
 +
#       Send the completed PowerPoint to our supervisors for feedback.
 +
#       Prepare the final seminar
 +
 
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
 +
 
 +
# Allocate tasks for the final seminar
 +
# Finish dissimilarity classifier
 +
# Fixing input format on dissimilarity classifier
 +
 
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Modify PPT slides for final seminar
 +
# Preparing final seminar
  
 
===Semester 1, Week 6===
 
====Jie Dong====
+
 
'''Progress and Status this week:'''
+
====Yan Xie====
# With the modification last week, I re-ran the test on English data set
+
# The classification accuracy increased to 85% - 90%. The highest was achieved when threshold = 30
+
# A clear trend can be observed: with increasing size of training data, accuracy increases, and threshold first increases and then drops
+
# Perform tests on Federalist paper, but the accuracy is very low, at about 35% average
+
# Discuss with supervisor and group member with the result on Federalist paper
+
# Since function word analysis achieves good performance, it was suggested that part of it be combined with this algorithm to enhance it
+
'''Plan and Goals for new week:'''
+
# Implement Trigram Markov model to select trigrams with "Golden Key words"
+
# Start to prepare for the final seminar
+
# Achieve test results on King James Version
+
====Leng Tan====
+
 
'''Progress and Status this week:'''
 
# It was verified that the English text actually does not have any problem.
+
# Classify all authors’ output file after setting threshold when N equals from 2 to 10
# The results were not favourable. The prediction was very inconsistent, achieving a low accuracy rate of 53% most of the time.
+
# The Java code of Common N-gram update:
# It was suggested that the testing could be biased towards Madison, as only Madison text was taken as testing data.
+
#* eg. In 155 English Text, when n = 2, combine six authors’ features and create a master list.  
# Compared with the earlier results using the English text, which involved 100 training texts and 70 disputed texts, the accuracy and consistency were much lower.
+
#* From N=2 to N=10, it gives 9 master lists. Find each author’s features with its frequency of occurrence in the master list and only list frequencies as one part of the input format of SVM.
# WRI might not be suitable for authorship detection.
+
# Also classify the output files of Federalist Paper and Greek New Testament
# It was suggested to combine the Function Word Frequency developed by Joel to enhance the algorithm.
+
# Finish the input format of SVM and write matlab code of SVM
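
The master-list step described in the Common N-gram update above could be sketched as follows for a single value of N. This is an illustration only: the class name <code>MasterList</code>, the alphabetical ordering of the master list and the use of 0 for features an author never uses are assumptions, not details confirmed by the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: merge per-author n-gram features into one master list,
// then express each author as a frequency vector over that list.
final class MasterList {

    /** Builds the sorted union of all features found in any author profile. */
    static List<String> build(List<Map<String, Integer>> authorProfiles) {
        TreeSet<String> union = new TreeSet<>();
        for (Map<String, Integer> profile : authorProfiles) {
            union.addAll(profile.keySet());
        }
        return new ArrayList<>(union);
    }

    /** Looks up each master-list feature in one profile; a missing feature counts as 0. */
    static int[] toVector(List<String> masterList, Map<String, Integer> profile) {
        int[] vector = new int[masterList.size()];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = profile.getOrDefault(masterList.get(i), 0);
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, Integer> authorA = Map.of("of the", 3, "in a", 1);
        Map<String, Integer> authorB = Map.of("of the", 2, "to be", 4);
        List<String> master = build(List.of(authorA, authorB));
        System.out.println(master);
        System.out.println(java.util.Arrays.toString(toVector(master, authorA)));
    }
}
```

Repeating this for N = 2 to N = 10 would give the nine master lists mentioned above, with every author's vector for a given N having the same length, which is what a fixed-dimension SVM input needs.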
  
 
'''Plan and Goals for new week:'''
 
# Examine the WRI algorithm with further testing.
+
# Prepare the final report, which is due in week 11
# Implement enhanced version of the WRI by combining the algorithm with function word frequency.
+
# SVM code modification
# Prepare the powerpoint slides for the seminar.
+
# Do some testing
# Start to make the initial stage of the video.
+
  
====Tien-en Phua====
+
====Kai He====
'''Progress and Status this week:'''
+
'''Progress and Status This Week'''
# Analysis results of FWA on KJV
+
# Analysis results of frequency occurrence on KJV
+
# Frequency occurrence produces consistent results with Talis listing down Paul, Barnabas, Luke and Matthew as the possible authors
+
# FWA produces different results; the reason why will be discussed.
+
# Discuss seminar structure with team
+
# Delegate task to team members for seminar
+
# Produce a uniform set of data for testing and results presentation
+
  
'''Plan and Goals for new week:'''
+
# Naïve Bayes classifier debugged. Now consider how to present the output results.
# Consolidate results from English text, Federalist text and King James Version
+
# Have meeting with Brian to talk about our PowerPoint slides.
# Research on future improvement for FWA
+
# Finalize our PowerPoint.
# Conduct a detailed literature review on the background of the New Testament
+
# More practice on the final seminar.
 +
# Did our final seminar on Friday.
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Consider the structure of the final report.
 +
# Further test on the methods .
 +
 
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
 +
 
 +
# Classify the output files of federalist paper and Greek New Testament
 +
# Fixing problems about input format on dissimilarity classifier
 +
# Classify all authors output files and setting N (2 to 10)
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Modify dissimilarity classifier
 +
# Do testing
  
 
===Semester 1, Week 7===
 
====Jie Dong====
+
 
'''Progress and Status this week:'''
+
====Yan Xie====
# Implemented Trigram Markov model to select trigrams with chosen function words
+
# Discovered that trigrams containing chosen function words usually occur more than once. Hence, a selection threshold is applied to them, similar to the function word selection method
+
# Made draft of powerpoint slides on SVM and trigram part
+
# Run classification test on King James Version of New Testament
+
# Finalise performance results for English text, Federalist and KJV
+
'''Plan and Goals for new week:'''
+
# Combine slides made by all group members and modify them; slides should be finalised by next week
+
# Practice makes perfect!!! :)
+
# Discuss results of our own extraction algorithms among group members, make suggestion on potential modification
+
====Leng Tan====
+
 
'''Progress and Status this week:'''
 
# Enhanced version of the WRI combined with function word frequency is done.
+
# Amend the SVM matlab code
# Get the results and arrive at a conclusion.
+
# Test the 155 English Text, 82 Federalist Paper and 27 Greek New Testament, and produce the output of the dispute text
# Prepare the powerpoint slides for the final seminar.
+
# Obtain the performance results and arrive at a conclusion (possible authors)
# Started recording some video footage for the final year project video.
+
# Meet with the other group members and discuss the results
 +
# Build the structure of the final report
  
 
'''Plan and Goals for new week:'''
 
# Examine the WRI algorithm with further testing.
+
# Analyse the results of the Common N-gram and compare its classification accuracy with that of the Maximal Frequent Word Sequence algorithm with group members
# Implement enhanced version of the WRI by combining the algorithm with function word frequency.
+
# Give some suggestions on potential modification
# Complete the powerpoint slides for the seminar.
+
# Start working on some parts of the final report
# Start to make the initial stage of the video.
+
  
====Tien-en Phua====
+
====Kai He====
'''Progress and Status this week:'''
+
'''Progress and Status This Week'''
# Consolidate results from English text, Federalist text and King James Version
+
# Research on future improvement for FWA
+
## Calculate mean of a function word in a group of text by an author
+
## Calculate the standard deviation of a function word in a group of text by an author
+
## Consider calculating the probability of the occurrence of a function word by inputting the above parameters to SVM
+
# Possible authors of Hebrews namely Apollos, Clement, Paul, Barnabas, Luke and Peter
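
The per-author mean and standard deviation of a function word, proposed above as a possible FWA improvement, could be computed along these lines. The class name <code>FunctionWordStats</code> is hypothetical, and two choices are assumptions: the statistic is the word's relative frequency in each text, and the population (divide by n) standard deviation is used.

```java
import java.util.Locale;

// Hypothetical sketch of per-author statistics for one function word.
final class FunctionWordStats {

    /** Fraction of whitespace-separated tokens in text equal to word (case-insensitive). */
    static double relativeFrequency(String text, String word) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0.0;
        }
        String[] tokens = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        int hits = 0;
        for (String token : tokens) {
            if (token.equals(word)) {
                hits++;
            }
        }
        return (double) hits / tokens.length;
    }

    /** Arithmetic mean of the values (one value per text by the same author). */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Population standard deviation: sqrt of the mean squared deviation. */
    static double stdDev(double[] values) {
        double m = mean(values);
        double squares = 0.0;
        for (double v : values) {
            squares += (v - m) * (v - m);
        }
        return Math.sqrt(squares / values.length);
    }

    public static void main(String[] args) {
        String[] texts = {"the cat and the dog", "a cat saw the dog"};
        double[] freqs = new double[texts.length];
        for (int i = 0; i < texts.length; i++) {
            freqs[i] = relativeFrequency(texts[i], "the");
        }
        System.out.println("mean=" + mean(freqs) + " sd=" + stdDev(freqs));
    }
}
```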
+
  
'''Plan and Goals for new week:'''
+
# Have a brief idea of how the final report will be structured.
# Complete presentation slides
+
# Capture test results for the final report.
# Practice presentation at least twice before seminar
+
# Meeting with the group.
# Assist team members in analyzing their results
+
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Modify the output file for using SVM.
 +
# Evaluate results.
 +
# Plan to upload things to this wiki
 +
 
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
 +
 
 +
# Group meeting with group members  
 +
# Doing tests using dissimilarity method
 +
# Test the 132 English Text, Federalist Paper and Greek New Testament, and produce the output of the dispute texts
 +
# Layout for final report
 +
 
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Writing final report
 +
# Analyse the accuracy of the two methods
  
 
===Semester 1, Week 8===
 
====Jie Dong====
 
'''Progress and Status this week:'''
 
# Practice more times for the Final Year Seminar.
 
# Final seminar on Thursday
 
'''Plan and Goals for new week:'''
 
# Run tests on Gospel of Luke and Acts of Apostles in KJV which were prepared by Joel.
 
  
====Leng Tan====
+
====Yan Xie====
 
'''Progress and Status this week:'''
 
# Final Year Project Seminar
+
# Summarise the results from the Common N-gram and Maximal Frequent Word Sequence algorithms
'''Plan and Goals for new week:'''
+
# Test the other text files (English New Testament) using Common N-gram algorithm and SVM classification
# Need to further discuss on pre-processing of the texts before implementing feature extraction algorithm.
+
# Write the part of Common N-gram in the final report
# Run tests on Gospel of Luke and Acts of Apostles in Koine Greek which were prepared by Joel.
+
# Have a meeting with the other group members discussing the upcoming goal
# Discuss with Joel on automated function word.
+
  
====Tien-en Phua====
 
'''Progress and Status this week:'''
 
# Project final seminar
 
 
'''Plan and Goals for new week:'''
 
'''Plan and Goals for new week:'''
# Discuss on the techniques for pre-processing of Koine Greek
+
# Analyse the English New Testament output obtained from SVM classification and compare it with the results of the Maximal Frequent Word Sequence algorithm and Naïve Bayes classification
# Run test on Gospel of Luke and Acts of Apostles in Koine Greek
+
# Write the final report
# Obtain Koine Greek on possible authors of the letter to Hebrews
+
 
 +
====Kai He====
 +
'''Progress and Status This Week'''
 +
 
 +
# Group meeting .
 +
# Obtained test results from the Federalist Papers and the New Testament.
 +
# Finish coding in order to use SVM.
 +
# Help debug code from other group members.
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# More tests and writings
 +
# To upload things to wiki
 +
 
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
 +
 
 +
#
 +
#
 +
#
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
#
 +
#
 +
#
  
 
===Semester 1, Week 9===
 
====Jie Dong====
 
'''Progress and Status this week:'''
 
# Meeting with supervisor
 
# Run test on English Version of Gospel of Luke
 
# Run test on English Version of Acts of Apostles
 
'''Plan and Goals for new week:'''
 
# Perform the same tests on Koine Greek version of New Testament
 
# Planning for final report
 
# Think of ideas on the video
 
  
====Leng Tan====
+
====Yan Xie====
 
'''Progress and Status this week:'''
 
'''Progress and Status this week:'''
# Meeting with supervisor discuss on final report
+
# Found that a small part of the output generated by Common N-gram needs modifying, which takes a few extra lines of code, e.g. duplicate feature adding
# Run test on Gospel of Luke
+
# All the text files, including the 155 English texts, 82 Federalist Papers and 27 Greek New Testament texts, need to be generated again, and the output processed as input to the SVM to compute the probabilities
# Run test on Acts of Apostles
+
# Also, try testing the English version text of New Testament, which contains 27 texts, as well
'''Plan and Goals for new week:'''
+
# Analyse the results obtained, compare them with those of the Maximal Frequent Word Sequence algorithm, and document them
# Start planning for the final report
+
# Work on writing the final report, as only two weeks are left
# discuss with team on the video
+
  
====Tien-en Phua====
 
'''Progress and Status this week:'''
 
# Meeting with supervisor
 
# Run test on Koine Greek + KJV to determine the author of the Gospel of Luke
 
# Run test on Koine Greek + KJV to determine the author of the Acts of the Apostle
 
# Analysis results and discuss with team
 
 
'''Plan and Goals for new week:'''
 
'''Plan and Goals for new week:'''
# Commence final report writing and discussion
+
# Commence the section of SVM of the final report
# Obtain set of text for Barnabas and Clement
+
# Email supervisors with some queries about the final report
 +
# Discuss the YouTube video and poster due in the next three weeks
 +
 
 +
====Kai He====
 +
'''Progress and Status This Week'''
 +
 
 +
# Compare results with Common N-gram.
 +
# Upload and help format stage reports on the wiki page.
 +
# Upload my weekly reports onto the wiki.
 +
# Write final report.
 +
# Methods code modification.
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
#      Have a draft final report.
 +
 
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
 +
 
 +
#
 +
#
 +
#
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
#
 +
#
 +
#
  
 
===Semester 1, Week 10===
 
====Jie Dong====
 
'''Progress and Status this week:'''
 
# Discuss with the team on the structure of the final report
 
# Validate trigram Markov model using Koine Greek version of New Testament: Luke and Acts
 
# Predict potential authors for the Letter to the Hebrews
 
# Write up section for Support Vector Machine
 
# Start to write on Trigram Markov model
 
'''Plan and Goals for new week:'''
 
# Find people to do a brief proof reading on what I write
 
# Complete the report
 
  
====Leng Tan====
+
====Yan Xie====
 
'''Progress and Status this week:'''
 
'''Progress and Status this week:'''
# Discuss with team on the video.
+
# Write up the SVM in the final report
# Discuss on the overall style of the report.
+
# Meet up with the group for the final report
# Completed a template to use for the final report.
+
# Consider the video and poster
# Write on past research.
+
 
# Write on project management.
+
# Write on WRI.
+
 
'''Plan and Goals for new week:'''
 
'''Plan and Goals for new week:'''
# Proof read the report.
+
# Work on the final report
# Complete the report.
+
# Email supervisors to arrange a time to run tests, report what we have done and predict the potential authors of the Letter to the Hebrews
# Prepare for final exhibition poster
+
# Prepare the poster
  
====Tien-en Phua====
+
====Kai He====
'''Progress and Status this week:'''
+
'''Progress and Status This Week'''
# Discussion with the team on the structure for the final report
+
# Write up background of the letter of Hebrews
+
# Write up background of the Bible
+
# Write up on project aim, approach and report structure
+
# Research on a standard set of corpus for team to work on
+
# Obtained the texts of the Epistle of Barnabas and the First Epistle of Clement to the Corinthians in Koine Greek.
+
# Process the Koine Greek text to Beta Code
+
  
'''Plan and Goals for new week:'''
+
# Group meeting for the poster and video.
# Complete final report
+
# Write final report
# commence planning for exhibition
+
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
# Plan to have a meeting with supervisors to report our progress.
 +
# Finish the final report.
 +
# Upload the rest of my weekly reports to the wiki
 +
 
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
 +
 
 +
#
 +
#
 +
#
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
#
 +
#
 +
#
  
 
===Semester 1, Week 11===
 
====Jie Dong====
+
 
 +
====Yan Xie====
 
'''Progress and Status this week:'''
 
'''Progress and Status this week:'''
# Apply the common set of data to Trigram Markov model
+
# Output data analysis and documentation
# Complete testing results for Trigram Markov Model to write up results for final report
+
# Write sections of Common Ngram and SVM  
# Write section for Trigram Markov model and edit SVM part
+
# Complete the final report
# Prepare appendix section
+
# Prepare the poster
# Discuss with Clement about layout of poster
+
# Meet with supervisors and present the potential author of the Letter to the Hebrews
# Made a draft for our poster
+
## Background, color theme, layout and detail section content
+
## Draft for Introduction and Controversy
+
## Flow diagram of our project approach
+
  
 
'''Plan and Goals for new week:'''
 
'''Plan and Goals for new week:'''
# Complete our poster with other members
+
# Send the poster to Braden
# Prepare for exhibition
+
# Prepare the project exhibition
 +
# Start recording video with the other group members
  
====Leng Tan====
+
====Kai He====
'''Progress and Status this week:'''
+
'''Progress and Status This Week'''
# complete results for WRI.
+
# Touch up on the final report.
+
# Preliminary discussion of the poster.
+
'''Plan and Goals for new week:'''
+
# Prepare the poster.
+
  
====Tien-en Phua====
+
#Write the project final report
'''Progress and Status this week:'''
+
#Have meeting with supervisors to present the project's outcomes
# Prepare a common set of data for team to write up results for final report
+
#Prepare poster and video for the exhibition
* English Text, 156 text, 26 per author. 22 training, 4 disputed
+
* The Federalist papers, 82 text, 17 disputed, 65 training
+
* King James Version
+
* Koine Greek using Barnabas, Clement, John, Luke, Mark, Matthew, Paul, Peter
+
# Write up on results for the English text and discussion
+
# Write up on results for the Federalist Papers text and discussion
+
# Write up on results for the King James Version and discussion
+
# Write up on results for the Koine Greek and discussion
+
# Write up abstract
+
# Prepare appendix
+
  
'''Plan and Goals for new week:'''
+
 
# Prepare for project exhibition
+
''' Plan and Goals for Next Week '''
 +
 
 +
#Finalise the poster and video
 +
#Prepare the exhibition
 +
 
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
 +
 
 +
#
 +
#
 +
#
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
#
 +
#
 +
#
  
 
===Semester 1, Week 12===
 
====Jie Dong====
+
 
 +
====Yan Xie====
 
'''Progress and Status this week:'''
 
'''Progress and Status this week:'''
# Complete final exhibition poster
+
# Finish poster and send it to Braden
# Making flyers for final year exhibition
+
# Discuss the structure of the video within the team
# Start video editing for introduction and SVM process
+
# Finish video
 +
# Present results at the project exhibition
  
 
'''Plan and Goals for new week:'''
 
'''Plan and Goals for new week:'''
# Complete the video
+
# Upload documents to the wiki page
 +
# Project closeout
  
====Leng Tan====
+
====Kai He====
'''Progress and Status this week:'''
+
'''Progress and Status This Week'''
# Complete the poster with the team
+
# Make the flyers
+
# Upload the final report to wiki format
+
# Start the video editing for results and future application
+
  
'''Plan and Goals for new week:'''
+
#Send poster to Braden
#Complete the video
+
#Make video
 +
#Demonstrate the project's outcomes at exhibition
 +
#Project closeout
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
#Upload document to project wiki page
 +
 
 +
====Zhaokun Wang====
 +
'''Progress and Status This Week'''
 +
 
 +
#
 +
#
 +
#
 +
 
 +
''' Plan and Goals for Next Week '''
 +
 
 +
#
 +
#
 +
#
  
====Tien-en Phua====
 
'''Progress and Status this week:'''
 
# start the video editing for SVM and three algorithms
 
# create new account for youtube
 
  
'''Plan and Goals for new week:'''
 
#complete video and upload to youtube
 
  
 
==See also==
 
 
*[[Authorship detection: Who wrote the Letter to the Hebrews?]]
 
*[[Minutes of Meeting 2010: Who wrote the Letter to the Hebrews?]]
+
*[[Proposal Seminar 2011: Who wrote the Letter to the Hebrews?]]
*[[Critical design review 2010: Who wrote the Letter to the Hebrews?]]
+
*[[Final Seminar 2011: Who wrote the Letter to the Hebrews?]]
*[[Progress Report 2010: Who wrote the Letter to the Hebrews?]]
+
*[[Stage One Progress Report 2011: Who wrote the Letter to the Hebrews?]]
*[[Final report 2010: Who wrote the Letter to the Hebrews?]]
+
*[[Stage Two Progress Report 2011: Who wrote the Letter to the Hebrews?]]
*[[Youtube Video Presentation 2010: Who wrote the Letter to the Hebrews?]]
+
*[[Final Report 2011: Who wrote the Letter to the Hebrews?]]
 +
*[[Exhibition Poster 2011: Who wrote the Letter to the Hebrews?]]
 +
*[[Youtube Video Presentation 2011: Who wrote the Letter to the Hebrews?]]
  
 
==Back==
 

Latest revision as of 03:58, 7 June 2012


Supervisors

Collaborators

2011 Students

Weekly progress and questions

Semester 2, Week 1

Yan Xie

Progress and Status this week:

  1. All team members had their first meeting with supervisor Derek and co-supervisors Brian and Maryam
  2. The basic idea and various applications were introduced by Derek
  3. Discussed previous attempts and further exploration at the meeting
  4. Research the topic of authorship detection and data mining
  5. Review the research of past years' students

Plan and Goals for new week:

  1. Further study of past research
  2. Search for suitable algorithms
  3. Have a group meeting with the other members, Kai and Zhaokun

Kai He

Progress and Status This Week

  1. Met with supervisor Derek and co-supervisors Brian and Maryam.
  2. The supervisors introduced the concept of this project and discussed the outcomes from last year's project students
  3. Research on authorship detection
  4. Study the previous algorithms

Plan and Goals for Next Week

  1. Literature search training will be held next week
  2. Have a meeting with team members
  3. Research on various methods
  4. Read papers on authorship detection

Zhaokun Wang

Progress and Status This Week

1. First meeting with Derek, Brian and the other group members, Kai and Yan.

2. Derek and Brian introduced the outline and background about this project

3. Based on previous years' research, Derek gave some suggestions for the research ahead.

4. Derek passed the previous research resources to us.


Plan and Goals for Next Week

1. Read through and understand previous research report.

2. Research on controversy.

3. Research on various methods.

4. Prepare the proposal seminar.

Semester 2, Week 2

Yan Xie

Progress and Status this week:

  1. Review last year's three methods: word frequency, word recurrence interval and trigram Markov model
  2. On-going research
  3. Attend a literature search training session with the other members
  4. Discuss the algorithms chosen for this project at the meeting
  5. Prepare for the proposal seminar in week 3

Plan and Goals for new week:

  1. Modify the slides and send them to supervisors
  2. Prepare the presentation
  3. Analyse the chosen algorithms
  4. Discuss project management with the other members at the next meeting

Kai He

Progress and Status This Week

  1. Attend the literature search training
  2. Identify the algorithms to be used in this project
  3. Prepare the proposed algorithms and complete the slides for the presentation
  4. Further reading on research papers

Plan and Goals for Next Week

  1. Set up the Work Breakdown Structure, Milestones, Gantt Chart and Project Budget
  2. Send the presentation slides to supervisors
  3. Prepare the presentation for the proposal seminar next week
  4. Analyse the proposed algorithms used in this project
  5. Research and discuss the classifier
  6. Have a team meeting with the other members

Zhaokun Wang

Progress and Status This Week

  1. Write the abstract for the proposal seminar.
  2. Allocate a seminar role to each group member.
  3. Prepare outline PowerPoint slides.
  4. Form a brief idea of the project.


Plan and Goals for Next Week

  1. Present proposal seminar.
  2. Identify the methods for the project.
  3. Identify the classifiers for the project.

Semester 2, Week 3

Yan Xie

Progress and Status this week:

  1. Complete the Gantt Chart, Work Breakdown Structure, Milestones, Budget and risk analysis with the other team members
  2. Modify the presentation slides
  3. Prepare the presentation
  4. Introduce the Common N-grams

Plan and Goals for new week:

  1. Research the SVM classifier to be used with the Common N-grams algorithm
  2. Start to design the Common N-grams
  3. Make stage one progress report template

Kai He

Progress and Status This Week

  1. Modify the slides after getting feedback from Brian
  2. Prepare the presentation this week
  3. Identify the classifiers to be used with the algorithms
  4. Plan the upcoming goals for the proposed algorithms
  5. Start to design the method: Maximal Frequent Word Sequence

Plan and Goals for Next Week

  1. Do a detailed review of the Maximal Frequent Word Sequence method
  2. Understand the Naïve Bayes classifier
  3. Prepare the stage one progress report

Zhaokun Wang

Progress and Status This Week

  1. Discuss the proposal slides with Brian.
  2. Modify the slides.
  3. Present proposal seminar.


Plan and Goals for Next Week

  1. Further research on methods.
  2. Prepare for stage one report

Semester 2, Week 4

Yan Xie

Progress and Status this week:

  1. Work on the Common N-grams method using Java
  2. Read the papers on the algorithm and classifier in full
  3. Discuss the design of Common N-grams with the other members
  4. Delegate tasks of the stage one progress report to individual members

Plan and Goals for new week:

  1. Complete the Executive Summary, Previous Studies, Coding Requirements and Tasks on Stage Two Report parts of the stage one progress report
  2. Modify Work Breakdown Structure, Risk Assessment, Milestones, Monitoring Scheme and Proposed Budget
  3. Complete writing on Common N-grams and SVM
  4. Write up the draft of the stage one progress report and send it to supervisors for feedback
  5. Keep modifying the stage one progress report until the deadline

Kai He

Progress and Status This Week

  1. Research on the Maximal Frequent Word Sequence method has been completed
  2. Coding on Maximal Frequent Word Sequence
  3. Have a meeting with the other members to delegate the tasks of the stage one progress report
  4. Write Project Background and Significance, Technical Background, Motivations and Key Requirements of the stage one progress report
  5. Modify the stage one report against the criteria
  6. Grammar checking


Plan and Goals for Next Week

  1. Coding on Maximal Frequent Word Sequence
  2. Complete my tasks on stage one report
  3. Send the draft to supervisors
  4. Modify and format

Zhaokun Wang

Progress and Status This Week

  1. Test previous methods.
  2. Compare with previous research, and clarify and identify the methods and classifiers we will use.
  3. Work on the stage one report.

Plan and Goals for Next Week

  1. Finish stage one report.
  2. Allocate report roles to each group member.

Semester 2, Week 5

Yan Xie

Progress and Status this week:

  1. Completed my allocated parts of the stage one report
  2. Attend the weekly group meeting and discuss uncompleted sections
  3. Help with formatting
  4. Send the report draft to supervisors
  5. Modify the report after getting feedback from supervisors

Plan and Goals for new week:

  1. Develop the method of Common N-grams
  2. Read papers
  3. Learn to use SVM

Kai He

Progress and Status This Week

  1. Finish Project Background and Significance, Technical Background, Motivations and Key Requirements
  2. Write Input and Output Specifications, and Testing and Verification
  3. Help to write the part of Project Management
  4. Grammar checking and formatting
  5. Modify the stage one progress report after getting feedback from supervisors
  6. Completed and submitted the final version of the stage one progress report
  7. Coding on Maximal Frequent Word Sequence


Plan and Goals for Next Week

  1. Coding on Maximal Frequent Word Sequence
  2. Have a meeting with the other members discussing the upcoming goals
  3. Review papers

Zhaokun Wang

Progress and Status This Week

  1. Allocate stage one report roles.
  2. Research method allocated to me: Common N-gram.
  3. Classifier method allocated to me: dissimilarity calculation.
  4. Modify stage one report after feedback.


Plan and Goals for Next Week

  1. Coding and developing N-gram
  2. Researching on dissimilarity

Semester 2, Week 6

Yan Xie

Progress and Status this week:

  1. Read the papers on the Common N-grams algorithm
  2. Have an overall structure for programming Common N-grams
  3. Review the paper on SVM
  4. SVM classifier: still considering how to use the generated output text file as the input to the SVM
  5. Participate in the group meeting

Plan and Goals for new week:

  1. Discuss the code with the team
  2. Coding on Common N-grams
  3. Design SVM

Kai He

Progress and Status This Week

  1. Research on Maximal Frequent Word Sequence
  2. Develop the program for Maximal Frequent Word Sequence
  3. Debugging
  4. Help the other members with coding

Plan and Goals for Next Week

  1. Complete about 30% - 40% of the code for data extraction using Maximal Frequent Word Sequence
  2. Discuss classifiers

Zhaokun Wang

Progress and Status This Week

  1. Learning and coding on N-gram
  2. Debugging

Plan and Goals for Next Week

  1. Discussing within team about coding
  2. Design classifier method

Semester 2, Week 7

Yan Xie

Progress and Status this week:

  1. Discuss the Common N-grams problems with the other members
  2. Finish about 50% of the code for data extraction using Common N-grams
  3. Have a group meeting with the other two members to report my current progress on the Common N-grams extraction method
  4. Introduce the stage two report

Plan and Goals for new week:

  1. Continue coding of Common N-grams
  2. Participate in the meeting about the stage two report with the other members
  3. Try to figure out how to use the SVM function in MATLAB

Kai He

Progress and Status This Week

  1. Review the following papers (items 2 to 4):
  2. Algorithm for Maximal Frequent Sequences in Document Clustering
  3. Experimenting with Maximal Frequent Sequences for Multi-Document Summarization
  4. Discovery of Frequent Word Sequences in Text
  5. Completed 30% of the code for data extraction using Maximal Frequent Word Sequence
  6. Review the paper Augmenting Naïve Bayes Classifiers with Statistical Language Models
  7. Review the criteria for the stage two report

Plan and Goals for Next Week

  1. Coding and Debugging
  2. Discuss how to feed the output data from Maximal Frequent Word Sequence into the Naïve Bayes classifier
  3. Prepare the stage two report

Zhaokun Wang

Progress and Status This Week

  1. Group meeting, discussing with other team members.
  2. Coding on N-gram
  3. Structuring the stage two report


Plan and Goals for Next Week

  1. Keep coding N-gram
  2. Group meeting about stage two report
  3. Begin coding the dissimilarity classifier

Semester 2, Week 8

Yan Xie

Progress and Status this week:

  1. Coding of Common N-grams
  2. Discuss the project management
  3. Investigation on SVM in MATLAB
  4. Participate in a meeting to discuss how to apply the generated data to the classifier

Plan and Goals for new week:

  1. Complete software coding v1.0 by the end of Week 11
  2. Start to write the stage two report
  3. Review SVM from previous attempt

Kai He

Progress and Status This Week

  1. Coding and Debugging on Maximal Frequent Word Sequence
  2. Further research on Naïve Bayes
  3. Discuss the Naïve Bayes Classifier with the other members

Plan and Goals for Next Week

  1. Write the project management of the stage two report
  2. Continue coding and debugging
  3. Weekly meeting with the other team members

Zhaokun Wang

Progress and Status This Week

  1. Coding N-gram
  2. Group meeting about stage two report
  3. Try to begin coding dissimilarity classifier
  4. Researching on dissimilarity classifier


Plan and Goals for Next Week

  1. Write stage two report
  2. Group meeting

Semester 2, Week 9

Yan Xie

Progress and Status this week:

  1. Add some new classes to the Common N-grams code
  2. Code modification
  3. Weekly meeting with the other team members to report the progress of Common N-grams coding
  4. Write the Project Objectives, Background, Algorithm Programming and Project Management parts of the stage two report
  5. Get feedback on the stage one progress report from Brian

Plan and Goals for new week:

  1. Complete software coding v1.0 by the end of Week 11
  2. Continue code modification
  3. Testing

Kai He

Progress and Status This Week

  1. Completed 60% of the code for data extraction using Maximal Frequent Word Sequence
  2. Help debug the Common N-grams code
  3. Report the code progress so far in the team meeting
  4. Set up the upcoming goals: Software Coding V1.0, Stage 2 Report Due, Software Testing V1.0 and Software Coding V2.0
  5. Start to design the training process and classification process using Naïve Bayes Classifier

Plan and Goals for Next Week

  1. Write the Introduction, Objectives, Background, Algorithm Definition, Work Breakdown Structure, Milestones and Budgets parts of the stage two report
  2. Choose some simple text files to test
  3. Further research on the classifier of Naïve Bayes

Zhaokun Wang

Progress and Status This Week

  1. Coding and debugging N-gram
  2. Writing stage two report
  3. Adjusting the schedule slightly
  4. Group meeting with group members to report progress so far


Plan and Goals for Next Week

  1. Writing the stage two report
  2. Developing on dissimilarity classifier
  3. Testing

Semester 2, Week 10

Yan Xie

Progress and Status this week:

  1. Completed most of the Common N-grams code
  2. Delete the unused inner classes
  3. Discuss SVM with the other team members

Plan and Goals for new week:

  1. Complete software coding v1.0 by the end of Week 11
  2. Figure out SVM
  3. Try to test the code using some simple text files
  4. Write the stage two report

Kai He

Progress and Status This Week

  1. Write the stage two report
  2. Complete the coding of Maximal Frequent Word Sequence
  3. Working on modified Maximal Frequent Word Sequence
  4. Test efficiency using different input texts


Plan and Goals for Next Week

  1. Modify code of Maximal Frequent Word Sequence
  2. Design the Naïve Bayes classifier
  3. Report the progress in the team meeting
  4. Continue writing the stage two report

Zhaokun Wang

Progress and Status This Week

  1. Testing N-gram code and debugging
  2. Writing stage two report
  3. Coding dissimilarity classifier
  4. Group meeting


Plan and Goals for Next Week

  1. Finish coding on N-gram
  2. Coding on dissimilarity classifier
  3. Writing report
  4. Group meeting to report N-gram coding

Semester 2, Week 11

Yan Xie

Progress and Status this week:

  1. Complete software coding v1.0 of the Common N-grams
  2. Use my own text to verify that the code works properly
  3. Compare the results from a small test file and a large test file
  4. Begin building large sets of training and testing data for the SVM by randomly collecting extracted features from author profiles
  5. Completed the draft of the stage two report

Plan and Goals for new week:

  1. Modify the stage two report
  2. Submit the stage two report
  3. Use the same training data and unknown data to test the two extraction algorithms

Kai He

Progress and Status This Week

  1. Complete the draft of the stage two report
  2. Grammar checking and formatting
  3. The output of the Maximal Frequent Word Sequence code is not correct; modification is needed


Plan and Goals for Next Week

  1. Deliver the stage two report
  2. Complete the code of Maximal Frequent Word Sequence
  3. Test the output
  4. Have a meeting to discuss the upcoming goals

Zhaokun Wang

Progress and Status This Week

  1. Modify and finish N-gram
  2. Testing N-gram code using training texts
  3. Coding dissimilarity classifier
  4. Working on stage two report


Plan and Goals for Next Week

  1. Modify stage two report
  2. Using training data to test N-gram coding

Semester 2, Week 12

Yan Xie

Progress and Status this week:

  1. Submit the stage two report and send it to supervisors
  2. Report my individual work done so far
  3. Report that the Common N-grams code is completed and tested
  4. Report the progress of SVM
  5. Discuss the upcoming goals with the other members

Plan and Goals for new week:

  1. Prepare for exams

Kai He

Progress and Status This Week

  1. Send my stage two report to supervisors
  2. Weekly meeting with the other team members to report the progress of the project

Plan and Goals for Next Week

  1. Stop project
  2. Work on exams

Zhaokun Wang

Progress and Status This Week

  1. Submit stage two report
  2. Group meeting to report progress
  3. Coding on dissimilarity classifier

Plan and Goals for Next Week

  1. None (prepare for final exams)

Semester 1, Week 1

Yan Xie

Progress and Status this week:

  1. Review two algorithms and three classifiers
  2. Group members present their individual reports so far at the weekly group meeting
  3. Work on coding SVM program
  4. Check the Milestones for the upcoming goals

Plan and Goals for new week:

  1. Email supervisors to have a meeting reporting the progress of the report
  2. Discuss the performance of the current progress
  3. Modify the SVM program
  4. Prepare the project description and images for project exhibition

Kai He

Progress and Status This Week

  1. Meet with the team members discussing the classifiers
  2. Simplify the code of Maximal Frequent Word Sequence
  3. Work on the Naïve Bayes classifier
  4. Do some testing

Plan and Goals for Next Week

  1. Arrange a meeting time with supervisors
  2. Discuss the key methods used in Naïve Bayes with the team

Zhaokun Wang

Progress and Status This Week

  1. Group meeting to report progress of project during the summer break
  2. Keep coding on dissimilarity classifier
  3. Do testing on training data


Plan and Goals for Next Week

  1. Plan a meeting with supervisor
  2. Modify and code the dissimilarity classifier

Semester 1, Week 2

Yan Xie

Progress and Status this week:

  1. Confirm a meeting time with supervisors
  2. Complete a project description and image, and email them to Braden
  3. Discuss SVM with the team members
  4. Continue working on SVM

Plan and Goals for new week:

  1. Meet up with supervisors
  2. Code modification
  3. Plan the upcoming goals within the team
  4. Test programs using English text
  5. Start to prepare the exhibition and final seminar

Kai He

Progress and Status This Week

  1. Completed half of the Naïve Bayes program
  2. Change the classes in the program
  3. Code modification
  4. Check the project description and image
  5. Have a brief meeting with the team members

Plan and Goals for Next Week

  1. Have a meeting with supervisors
  2. Develop software
  3. Prepare the exhibition and final seminar

Zhaokun Wang

Progress and Status This Week

  1. Group meeting
  2. Modify and code the dissimilarity classifier
  3. Work on the project description and image


Plan and Goals for Next Week

  1. Meeting with supervisor
  2. Keep coding
  3. Prepare for the final seminar

Semester 1, Week 3

Yan Xie

Progress and Status this week:

  1. Get feedback from meeting with supervisors
  2. Consider punctuation removal, lowercase conversion, merging of consecutive spaces and word overlapping
  3. Develop the Java code for Common N-gram
  4. Analyse the poor results from texts containing chapter numbers and titles
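
The clean-up steps considered in this list (punctuation removal, lowercase conversion and merging of consecutive spaces) could be sketched in Java roughly as follows. This is only an illustration, not the team's actual code: the class name `TextNormalizer` and the method `normalize` are hypothetical, and the sketch assumes that ASCII punctuation is simply replaced by a space.

```java
import java.util.Locale;

// Hypothetical helper, not taken from the project's code base.
final class TextNormalizer {
    private TextNormalizer() {}

    /** Lowercases the text, replaces ASCII punctuation with spaces and collapses whitespace. */
    static String normalize(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        // \p{Punct} matches ASCII punctuation only. Replacing it with a space
        // (assumption) keeps "word,word" from fusing into a single token.
        String noPunct = lower.replaceAll("\\p{Punct}", " ");
        // Collapse runs of spaces, tabs and newlines into one space.
        return noPunct.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("In the  beginning, God created\nthe heaven and the earth."));
    }
}
```

Chapter numbers and titles are not handled here; the log notes that these were removed from the texts separately.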

Plan and Goals for new week:

  1. Complete the Java code for Common N-gram
  2. Test the 155 English texts, 82 Federalist Papers and 27 Greek New Testament texts

Kai He

Progress and Status This Week

  1. Have a meeting with supervisors to discuss our project’s progress.
  2. Consider how to implement overlapping detection using colors in Java.
  3. Continue developing the Maximal Frequent Word Sequence Algorithm
  4. Start preparing the final Seminar in week 6.

Plan and Goals for Next Week

  1. Finish coding the Maximal Frequent Word Sequence Algorithm
  2. Have a draft for the final seminar.

Zhaokun Wang

Progress and Status This Week

  1. Getting feedback from supervisor
  2. Fixing on N-gram (suggestion from supervisors)
  3. Group meeting with team members


Plan and Goals for Next Week

  1. Keep working on the dissimilarity classifier
  2. Finish fixing N-gram

Semester 1, Week 4

Yan Xie

Progress and Status this week:

  1. Remove all chapter numbers and titles
  2. Add a ranking method to the program
  3. Finish the Common N-gram code
  4. Run the completed program on the 155 English texts, 82 Federalist Papers and 27 Greek New Testament texts
  5. Draft the structure of the final seminar PPT
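
A Common N-gram profile with a ranking step, as mentioned in this list, might look like the sketch below. It assumes character-level n-grams and relative frequencies, which the log does not confirm; `NGramProfile`, `build`, `n` and `limit` are illustrative names rather than the project's own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an author profile: the most frequent n-grams of a text.
final class NGramProfile {
    private NGramProfile() {}

    /**
     * Returns the `limit` most frequent character n-grams of `text`, mapped to
     * their relative frequency and ordered from most to least frequent.
     */
    static Map<String, Double> build(String text, int n, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        int total = Math.max(0, text.length() - n + 1);
        for (int i = 0; i < total; i++) {
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // Rank by descending count; ties are broken alphabetically so that
        // the result is deterministic.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Double> profile = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : ranked.subList(0, Math.min(limit, ranked.size()))) {
            profile.put(e.getKey(), e.getValue() / (double) total);
        }
        return profile;
    }

    public static void main(String[] args) {
        System.out.println(build("the theory of the thing", 2, 5));
    }
}
```

Cutting the ranked list at `limit` is one simple way to drop the low-frequency tail; a frequency threshold would be an alternative.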

Plan and Goals for new week:

  1. Analyse the output for the tested texts, and consider removing the tail and setting a threshold for the large training data sets
  2. Discuss the tested result with the group members
  3. Prepare the slides of final seminar with the group members

Kai He

Progress and Status This Week

  1. The Maximal Frequent Word Sequence code is complete and can combine features for different thresholds n.
  2. Remove titles and redundant information from the allocated corpus of 150 English texts.
  3. Generate extracted features from the text corpora.
  4. A first draft PowerPoint is completed for the final seminar.
  5. Researched the overlapping problem and found that it cannot be done this way in Java: the text corpora are plain text files, which do not support color highlighting.

Plan and Goals for Next Week

  1. Finish coding the Naïve Bayes classifier to take multiple input files.
  2. Assemble the PowerPoint and start practicing.

Zhaokun Wang

Progress and Status This Week

  1. Finish coding N-gram
  2. Remove unnecessary marks from the test texts
  3. Run all texts through the N-gram code
  4. Group meeting about final seminar
  5. Finalize dissimilarity classifier
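
The log does not state which dissimilarity measure the classifier uses. As a hedged illustration only, the sketch below implements the relative-difference measure from the Common N-Grams method of Kešelj et al., which sums the squared difference of two normalised frequencies divided by their mean over all n-grams in either profile. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the measure is an assumption, not confirmed by the log.
final class ProfileDissimilarity {
    private ProfileDissimilarity() {}

    /**
     * Sums ((fa - fb) / ((fa + fb) / 2))^2 over every n-gram in either profile.
     * Profiles map an n-gram to its normalised frequency; a missing n-gram counts as 0.
     */
    static double distance(Map<String, Double> a, Map<String, Double> b) {
        Set<String> union = new HashSet<>(a.keySet());
        union.addAll(b.keySet());
        double sum = 0.0;
        for (String gram : union) {
            double fa = a.getOrDefault(gram, 0.0);
            double fb = b.getOrDefault(gram, 0.0);
            double mean = (fa + fb) / 2.0;
            if (mean == 0.0) {
                continue; // both frequencies are zero, so the term contributes nothing
            }
            double relative = (fa - fb) / mean;
            sum += relative * relative;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> known = new HashMap<>();
        known.put("th", 0.6);
        known.put("he", 0.4);
        Map<String, Double> disputed = new HashMap<>();
        disputed.put("th", 0.5);
        disputed.put("an", 0.5);
        System.out.println(distance(known, disputed));
    }
}
```

Under this scheme a disputed text would be attributed to the author whose profile gives the smallest distance.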

Plan and Goals for Next Week

  1. Prepare for the final seminar
  2. Finish running the N-gram code on the texts
  3. Compare with the training data and analyse the output for the tested texts

Semester 1, Week 5

Yan Xie

Progress and Status this week:

  1. Set a threshold on the output for the tested texts
  2. Analyse the input format of the SVM
  3. Work on preparing the final seminar

Plan and Goals for new week:

  1. Send the draft of PPT to Brian
  2. Modify the PPT slides
  3. Prepare the presentation with the group members

Kai He

Progress and Status This Week

  1. Naïve Bayes classifier code is 80% modified; the code still has bugs.
  2. Group meeting to prepare for the final seminar.
  3. PowerPoint slides are merged into one file; roles and tasks are allocated to each member.


Plan and Goals for Next Week

  1. Finish debugging.
  2. Send the completed PowerPoint to our supervisors for feedback.
  3. Prepare the final seminar

Zhaokun Wang

Progress and Status This Week

  1. Allocate tasks for the final seminar
  2. Finish the dissimilarity classifier
  3. Fix the input format of the dissimilarity classifier


Plan and Goals for Next Week

  1. Modify the PPT slides for the final seminar
  2. Prepare for the final seminar

Semester 1, Week 6

Yan Xie

Progress and Status this week:

  1. Classify all authors’ output files after setting the threshold, for N from 2 to 10
  2. Update the Java code of the Common N-gram:
    • E.g. for the 155 English texts with N = 2, combine the six authors’ features to create a master list.
    • N = 2 to N = 10 gives 9 master lists. For each author, look up the frequency of occurrence of each master-list feature and list only the frequencies as one part of the SVM input format.
  3. Also classify the output files of the Federalist Papers and the Greek New Testament
  4. Finish the SVM input format and write the MATLAB code for the SVM
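
The master-list step described above can be sketched as follows: the features of all authors for one value of N are merged into a single ordered list, and each author is then turned into a vector of frequencies in that order. This is a minimal sketch under assumptions: the names `MasterList`, `build` and `toVector` are hypothetical, alphabetical ordering is an arbitrary choice, and the exact SVM input layout used by the group is not given in the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of the master-list idea; not the group's actual code.
final class MasterList {
    private MasterList() {}

    /** Merges the features of every author profile into one sorted, duplicate-free list. */
    static List<String> build(List<Map<String, Integer>> authorProfiles) {
        TreeSet<String> features = new TreeSet<>();
        for (Map<String, Integer> profile : authorProfiles) {
            features.addAll(profile.keySet());
        }
        return new ArrayList<>(features);
    }

    /** Lists one profile's frequencies in master-list order; absent features give 0. */
    static int[] toVector(List<String> master, Map<String, Integer> profile) {
        int[] vector = new int[master.size()];
        for (int i = 0; i < master.size(); i++) {
            vector[i] = profile.getOrDefault(master.get(i), 0);
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, Integer> authorA = new HashMap<>();
        authorA.put("th", 12);
        authorA.put("he", 9);
        Map<String, Integer> authorB = new HashMap<>();
        authorB.put("he", 4);
        authorB.put("an", 7);

        List<Map<String, Integer>> profiles = new ArrayList<>();
        profiles.add(authorA);
        profiles.add(authorB);

        List<String> master = build(profiles);
        System.out.println(master);
        System.out.println(java.util.Arrays.toString(toVector(master, authorA)));
        System.out.println(java.util.Arrays.toString(toVector(master, authorB)));
    }
}
```

Because every vector follows the same master-list order, the vectors all have the same length and can be stacked as rows of a feature matrix for a classifier.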

Plan and Goals for new week:

  1. Prepare the final report, which is due in week 11
  2. Modify the SVM code
  3. Do some testing

Kai He

Progress and Status This Week

  1. Naïve Bayes classifier debugged. Now consider how to present the output results.
  2. Have a meeting with Brian to talk about our PowerPoint slides.
  3. Finalize our PowerPoint.
  4. More practice for the final seminar.
  5. Presented our final seminar on Friday.

Plan and Goals for Next Week

  1. Consider the structure of the final report.
  2. Further testing of the methods.

Zhaokun Wang

Progress and Status This Week

  1. Classify the output files of the Federalist Papers and the Greek New Testament
  2. Fix problems with the input format of the dissimilarity classifier
  3. Classify all authors’ output files for N from 2 to 10

Plan and Goals for Next Week

  1. Modify dissimilarity classifier
  2. Do testing

Semester 1, Week 7

Yan Xie

Progress and Status this week:

  1. Amend the SVM MATLAB code
  2. Test the 155 English texts, 82 Federalist Papers and 27 Greek New Testament texts, and produce the output for the disputed texts
  3. Obtain the performance results and arrive at a conclusion (possible authors)
  4. Meet with the other group members and discuss the results
  5. Build the structure of the final report

Plan and Goals for new week:

  1. Analyse the Common N-gram results and, with group members, compare their classification accuracy with that of the Maximal Frequent Word Sequence algorithm
  2. Suggest potential modifications
  3. Start working on some parts of the final report

Kai He

Progress and Status This Week

  1. Have a brief idea of how the final report will be structured.
  2. Capture test results for the final report.
  3. Meeting with the group.

Plan and Goals for Next Week

  1. Modify the output file for use with the SVM.
  2. Evaluate results.
  3. Plan to upload things to this wiki

Zhaokun Wang

Progress and Status This Week

  1. Group meeting with group members
  2. Run tests using the dissimilarity method
  3. Test the 132 English texts, the Federalist Papers and the Greek New Testament, and produce the output for the disputed texts
  4. Lay out the final report


Plan and Goals for Next Week

  1. Write the final report
  2. Compare the accuracy of the two methods

Semester 1, Week 8

Yan Xie

Progress and Status this week:

  1. Summarise the results from the Common N-gram and Maximal Frequent Word Sequence algorithms
  2. Test the other text files (English New Testament) using the Common N-gram algorithm and SVM classification
  3. Write the Common N-gram part of the final report
  4. Have a meeting with the other group members to discuss the upcoming goals

Plan and Goals for new week:

  1. Analyse the English New Testament output obtained from SVM classification and compare it with the results of the Maximal Frequent Word Sequence algorithm and Naïve Bayes classification
  2. Write the final report

Kai He

Progress and Status This Week

  1. Group meeting.
  2. Obtained test results from the Federalist Papers and New Testament texts.
  3. Finish coding in order to use SVM.
  4. Help debug other group members' code.

Plan and Goals for Next Week

  1. More testing and writing
  2. Upload material to the wiki

Zhaokun Wang

Progress and Status This Week

Plan and Goals for Next Week

Semester 1, Week 9

Yan Xie

Progress and Status this week:

  1. Found that a small part of the Common N-gram output for the text files needed modification, and wrote a few lines of code to do it, e.g. adding duplicate features
  2. All the text files, including the 155 English texts, 82 Federalist Papers and 27 Greek New Testament texts, need to be regenerated, and the output fed into the SVM as input to obtain the probabilities
  3. Also try testing the English version of the New Testament, which contains 27 texts
  4. Analyse the results obtained, compare them with the Maximal Frequent Word Sequence algorithm, and document them
  5. Work on writing the final report, as only two weeks are left

Plan and Goals for new week:

  1. Commence the SVM section of the final report
  2. Email supervisors with some queries about the final report
  3. Discuss the YouTube video and poster due in the next three weeks

Kai He

Progress and Status This Week

  1. Compare results with Common N-gram.
  2. Upload and help format stage reports on the wiki page.
  3. Upload my weekly reports onto the wiki.
  4. Write final report.
  5. Modify the code of the methods.

Plan and Goals for Next Week

  1. Have a draft final report.

Zhaokun Wang

Progress and Status This Week

Plan and Goals for Next Week

Semester 1, Week 10

Yan Xie

Progress and Status this week:

  1. Write up the SVM in the final report
  2. Meet up with the group for the final report
  3. Consider the video and poster

Plan and Goals for new week:

  1. Work on the final report
  2. Email supervisors to arrange a time to run tests, report what we have done and predict the potential authors of the Letter to the Hebrews
  3. Prepare the poster

Kai He

Progress and Status This Week

  1. Group meeting for the poster and video.
  2. Write final report

Plan and Goals for Next Week

  1. Plan to have a meeting with supervisors to report our progress.
  2. Finish the final report.
  3. Upload the rest of my weekly reports to the wiki

Zhaokun Wang

Progress and Status This Week

Plan and Goals for Next Week

Semester 1, Week 11

Yan Xie

Progress and Status this week:

  1. Output data analysis and documentation
  2. Write the Common N-gram and SVM sections
  3. Complete the final report
  4. Prepare the poster
  5. Meet with supervisors and present the potential author of the Letter to the Hebrews

Plan and Goals for new week:

  1. Send the poster to Braden
  2. Prepare the project exhibition
  3. Start recording video with the other group members

Kai He

Progress and Status This Week

  1. Write the project final report
  2. Have a meeting with supervisors to present the project's outcomes
  3. Prepare poster and video for the exhibition


Plan and Goals for Next Week

  1. Finalise the poster and video
  2. Prepare the exhibition

Zhaokun Wang

Progress and Status This Week

Plan and Goals for Next Week

Semester 1, Week 12

Yan Xie

Progress and Status this week:

  1. Finish poster and send it to Braden
  2. Discuss the structure of the video within the team
  3. Finish video
  4. Present results at the project exhibition

Plan and Goals for new week:

  1. Upload documents to the wiki page
  2. Project closeout

Kai He

Progress and Status This Week

  1. Send poster to Braden
  2. Make video
  3. Demonstrate the project's outcomes at exhibition
  4. Project closeout

Plan and Goals for Next Week

  1. Upload document to project wiki page

Zhaokun Wang

Progress and Status This Week

Plan and Goals for Next Week


See also

Back