Authorship detection: 2010 group

Revision as of 07:38, 18 September 2010

Supervisors

Collaborators

Students

Weekly progress and questions

Semester 2, Week 1

Jie Dong

Progress and Status this week:

  1. First meeting with Derek, Brian and Maryam, and with the other group members, Leng and Tien-en.
  2. Derek, Brian and Maryam introduced us to the basic idea of this data mining project.
  3. The idea of authorship detection was introduced.
  4. Several applications to which data mining techniques can be applied were mentioned.
  5. Research by past years' students was mentioned, and Maryam sent us several past research reports together with the code.
  6. Researched the project, especially SVM and some algorithms.

Plan and Goals for new week:

  1. Prepare for the proposal seminar.
  2. Read research reports from past years' students.
  3. Understand the project handbook.

Leng Tan

Progress and Status This Week

  1. The first meeting for the final year project was held with the supervisor, Prof Derek Abbott, the co-supervisor, Dr Brian Ng, and Mrs Maryam, along with the team members.
  2. The initial project scope was introduced and the general aim of the project was discussed.
  3. The basic idea behind the techniques of authorship detection was presented as well.
  4. Several ideas for future applications of this project were highlighted.
  5. A hint on getting started was given: read Talis's final year report, which Mrs Maryam will provide.
  6. We were reminded of the first milestone of the project, the proposal seminar.

Plan and Goals for Next Week

  1. Fully read and understand Talis's report.
  2. Have a brief look at the code that will be supplied by Mrs Maryam.
  3. Do some research on the background of some authorship controversies, such as the works of William Shakespeare, the Federalist Papers and the Letter to the Hebrews.
  4. Read through the 2010 project handbook to get a rough idea of all the project milestones, focusing on the project seminar.

Tien-en Phua

Progress and Status this week:

  1. Met with the project supervisor, Prof Derek Abbott, the co-supervisor, Dr Brian Ng, and Mrs Maryam.
  2. Derek discussed the concept behind authorship detection.
  3. Derek explained the use of multi-dimensional graphs to link a disputed text to a known author.
  4. Discussed possible future applications. Brian suggested code plagiarism and possibly music.
  5. Was provided by Maryam with other students' projects and started going through the report by Talis.
  6. Went through the FYP project handbook.

Plan and Goals for new week:

  1. Identify the methods Talis used in his report
  2. Research various methods
  3. Read up on past work on authorship detection
  4. Research the controversies

Semester 2, Week 2

Jie Dong

Progress and Status this week:

  1. Three methods were chosen for this project: word frequency, word recurrence interval, and trigram Markov model
  2. Read material on SVM (SVM tutorial)
  3. Experimented with SVM software in Matlab
  4. Prepared slides for the proposal seminar presentation on the project aim, background, and part of the project process

Plan and Goals for new week:

  1. Combine slides with those of the other group members and make some modifications
  2. Send draft slides to supervisor for feedback
  3. Make further modifications
  4. Presentation on Thursday

Leng Tan

Progress and Status This Week

  1. Identified the 3 methods that were mentioned by Talis.
  2. Gained brief knowledge of the controversial issues.
  3. Have a rough idea of the upcoming proposal seminar.

Plan and Goals for Next Week

  1. Research SVM.
  2. Research the background history of the project.
  3. Research the different techniques used in the past.
  4. Prepare the project proposal.

Tien-en Phua

Progress and Status this week:

  1. Identified the three methods that Talis applied in his project, namely Word Frequency, Word Recurrence and Trigram Markov
  2. Briefly understood how the three methods work
  3. Identified the past work done by other researchers.
  4. Identified three main controversies, namely the Federalist Papers, Shakespeare's plays and the Letter to the Hebrews

Plan and Goals for new week:

  1. Prepare for Project Proposal
  2. Develop Gantt chart, project budget and risk analysis
  3. Identify major milestones in project
  4. Write up on controversy
  5. Further research on three methods

Semester 2, Week 3

Jie Dong

Progress and Status this week:

  1. In Monday's meeting we were introduced to Matthew and François-Pierre Huchet, who are also participating in this project.
  2. Came up with the first complete draft of the proposal presentation slides. Discussed the role of each person.
  3. Sent slides to Brian and Matthew for feedback
  4. Modified our slides
  5. Presentation on Thursday

Plan and Goals for new week:

  1. Do more research on the three methods and SVM
  2. Prepare for stage 1 design document

Leng Tan

Progress and Status This Week

  1. Rough draft slides on the past research have been done for the proposal seminar.
  2. A comparison list of the different techniques has been done.
  3. Started research on SVM, to be added to the slides alongside the different techniques.
  4. Had a meeting with the supervisors, and was introduced to Dr Matthew.
  5. Focused 100% on the proposal seminar.

Plan and Goals for Next Week

  1. Have a more detailed review of the 3 methods.
  2. Read the criteria for the stage 1 design document.

Tien-en Phua

Progress and Status this week:

  1. Prepared for project proposal
  2. Developed Gantt chart, project budget and risk analysis
  3. Developed slides for milestones and controversies
  4. Researched SVM (Support Vector Machine)
  5. Gained a better understanding of Word Frequency, WRI and Trigram Markov

Plan and Goals for new week:

  1. Proceed to develop Stage 1 Design Document
  2. Understand SVM
  3. Develop Work Breakdown Structure
  4. Delegate tasks to individual members
  5. Read up on the other 4 reports

Semester 2, Week 4

Jie Dong

Progress and Status This Week

  1. In this project, we plan to have each person work on one method; I am working on the Trigram Markov model
  2. Read past reports for information on the trigram Markov model
  3. Made a stage 1 design document template
  4. Wrote the project aim, background, and project approach in the design document

Plan and Goals for Next Week

  1. Modify the design document draft
  2. Send to supervisors for feedback
  3. More modification
  4. Prepare a tutorial on SVM for other group members

Leng Tan

Progress and Status This Week

  1. Research on the 3 methods has been completed.
  2. Fully read and understood the criteria for the stage 1 design document.
  3. Had a brief meeting with group members to delegate the tasks in preparing the stage 1 design document.


Plan and Goals for Next Week

  1. Do a rough draft of the tasks that are allocated.
  2. Do a layout design for the document.

Tien-en Phua

Progress and Status this week:

  1. Developed Work Breakdown Structure
  2. Identified tasks required for Stage 1 Design Document
  3. Broke down tasks and assigned them to each member
  4. In the process of developing the Stage 1 Design Document
  5. Further research on SVM and Word Frequency

Plan and Goals for new week:

  1. Complete write up on Word Frequency and SVM
  2. Complete Stage 1 Design Document
  3. Coding and further research on Word Frequency
  4. Read up on the other 4 reports

Semester 2, Week 5

Jie Dong

Progress and Status this week:

  1. Finished abstract, project aim, background and significance
  2. Finished description of the data extraction part for the Trigram Markov model in the design document
  3. Received feedback from supervisors on the design document
  4. Made final modifications to the design document
  5. Formatted the design document on the wiki

Plan and Goals for Next Week:

  1. Design the Trigram Markov model
  2. Learn to use SVM
  3. Do a bit of coding on the trigram Markov model

Leng Tan

Progress and Status this week:

  1. Finished Literature Review of design document
  2. Finished description of the data extraction part for WRI in design document
  3. Finished project approach and milestones for design document
  4. Added modified WBS in appendix
  5. Did initial check and compilation of design document

Plan and Goals for Next Week:

  1. Start a rough design for WRI data extraction in Java
  2. Read up on SVM

Tien-en Phua

Progress and Status this week:

  1. Completed design document
    • Project Requirements
    • Description of data extraction of Function Word Frequency analysis
    • Project Budget
    • Background and Significance of Hebrews
    • Edited Gantt chart and WBS to keep them synchronised
    • Editing and grammar check
  2. Basic layout of software design for data extraction algorithm
  3. Wiki page

Plan and Goals for Next Week:

  1. Commence programming of algorithm using Java
  2. Read up on SVM

Semester 2, Week 6

Tien-en Phua

Progress and Status this week:

  1. Finished the algorithm design (pseudo-code) for function word frequency in Java.
  2. Started implementing the algorithm.
  3. Code is about half done.
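
As an illustration of function word frequency extraction, here is a minimal Java sketch. It is not the project's actual design: the class name, the very short function-word list and the tokenisation rule are all assumptions made for the example. It counts how often each listed word occurs and divides by the total number of words in the text.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FunctionWordFrequency {
    // Hypothetical, deliberately short list; a real study would use a much longer one.
    static final List<String> FUNCTION_WORDS = Arrays.asList("the", "and", "of", "to", "in");

    /** Returns the relative frequency (occurrences / total words) of each function word. */
    static Map<String, Double> extract(String text) {
        // Assumed tokenisation: lowercase, split on anything that is not a letter or apostrophe.
        String[] tokens = text.toLowerCase().split("[^a-z']+");
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : FUNCTION_WORDS) {
            counts.put(w, 0);
        }
        int total = 0;
        for (String t : tokens) {
            if (t.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            total++;
            if (counts.containsKey(t)) {
                counts.merge(t, 1, Integer::sum);
            }
        }
        Map<String, Double> frequencies = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            frequencies.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        System.out.println(extract("The cat and the dog went to the park in the rain."));
    }
}
```

The resulting map has one entry per function word in a fixed order, so it can be read off directly as a feature vector for a classifier.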

Plan and Goals for new week:

  1. Finish coding.
  2. Discuss SVM problems.

Jie Dong

Progress and Status this week:

  1. Research on Trigram Markov model
  2. Two models are proposed:
    • Simple Trigram Markov model: considers only the effect of trigrams in the text
    • Potential problem with the first model: sparse data. A trigram that first appears in the test text leads to poor cross entropy
    • Second model: Hidden Markov model on trigrams: counts not only trigrams; unigram and bigram effects are also taken into consideration. The transition probability is composed of all three probabilities.
  3. The presence of punctuation and uppercase letters should be considered for text written in English.
  4. Programming of text file input and exception handling in Java
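
To make the second model concrete, here is a minimal Java sketch of one common way to combine unigram, bigram and trigram estimates: linear interpolation, scored by cross entropy. This is an illustration under assumptions, not the model the group implemented. Word-level tokens, the weights 0.1/0.3/0.6, add-one smoothing on the unigram term and all names are hypothetical choices.

```java
import java.util.HashMap;
import java.util.Map;

class InterpolatedTrigramModel {
    // Hypothetical weights (they sum to 1); in practice they would be tuned on held-out text.
    private static final double W_UNI = 0.1, W_BI = 0.3, W_TRI = 0.6;

    private final Map<String, Integer> uni = new HashMap<>();
    private final Map<String, Integer> bi = new HashMap<>();
    private final Map<String, Integer> tri = new HashMap<>();
    private int total = 0;

    // Assumed tokenisation: lowercase, whitespace-separated words.
    static String[] tokenize(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    /** Accumulates unigram, bigram and trigram counts from a training text. */
    void train(String text) {
        String[] w = tokenize(text);
        for (int i = 0; i < w.length; i++) {
            uni.merge(w[i], 1, Integer::sum);
            total++;
            if (i >= 1) bi.merge(w[i - 1] + " " + w[i], 1, Integer::sum);
            if (i >= 2) tri.merge(w[i - 2] + " " + w[i - 1] + " " + w[i], 1, Integer::sum);
        }
    }

    /** Interpolated estimate of P(c | a, b); never zero thanks to the smoothed unigram term. */
    double probability(String a, String b, String c) {
        double p1 = (uni.getOrDefault(c, 0) + 1.0) / (total + uni.size() + 1.0);
        int countB = uni.getOrDefault(b, 0);
        double p2 = countB == 0 ? 0.0 : (double) bi.getOrDefault(b + " " + c, 0) / countB;
        int countAB = bi.getOrDefault(a + " " + b, 0);
        double p3 = countAB == 0 ? 0.0
                : (double) tri.getOrDefault(a + " " + b + " " + c, 0) / countAB;
        return W_UNI * p1 + W_BI * p2 + W_TRI * p3;
    }

    /** Average negative log2 probability per trigram; lower means a better fit. */
    double crossEntropy(String text) {
        String[] w = tokenize(text);
        if (w.length < 3) throw new IllegalArgumentException("need at least three tokens");
        double sum = 0.0;
        for (int i = 2; i < w.length; i++) {
            sum -= Math.log(probability(w[i - 2], w[i - 1], w[i])) / Math.log(2);
        }
        return sum / (w.length - 2);
    }

    public static void main(String[] args) {
        InterpolatedTrigramModel model = new InterpolatedTrigramModel();
        model.train("the cat sat on the mat the cat sat on the rug");
        System.out.println(model.crossEntropy("the cat sat on the mat"));
        System.out.println(model.crossEntropy("rug the on sat cat the"));
    }
}
```

Because the unigram term is smoothed, a trigram that never occurred in training still gets a small non-zero probability, so the cross entropy stays finite instead of blowing up as it would with trigram counts alone.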

Plan and Goals for new week:

  1. Discuss the models with supervisor
  2. SVM problem
  3. Programming on first model

Leng Tan

Progress and Status this week:

  1. Finished a design for the WRI code after discussion with group members.
  2. Written about 50% of the code for data extraction using WRI.
  3. Read a bit about SVM but still don't understand it.

Plan and Goals for new week:

  1. Finish the coding for WRI.
  2. Try to get help with SVM.

Semester 2, Week 7

Tien-en Phua

Progress and Status this week:

  1. Completed coding for data extraction algorithm (DEA)
  2. Discussed how to pass the output data from the DEA to the SVM
  3. Analysed how other researchers analyse their data
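
This log does not record which SVM package or input format was chosen. Purely as an illustration, the Java sketch below assumes the sparse text format read by LIBSVM, where each line is `label index:value ...` with 1-based, ascending indices and zero-valued features omitted. The class and method names are hypothetical.

```java
import java.util.Locale;

class SvmLineFormatter {
    /** Formats one labelled feature vector as a LIBSVM-style line, skipping zero features. */
    static String toLine(int label, double[] features) {
        StringBuilder sb = new StringBuilder(Integer.toString(label));
        for (int i = 0; i < features.length; i++) {
            if (features[i] != 0.0) {
                sb.append(' ')
                  .append(i + 1) // LIBSVM feature indices start at 1
                  .append(':')
                  .append(String.format(Locale.ROOT, "%.6f", features[i]));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The label 1 stands for a hypothetical "author 1"; the values are made-up frequencies.
        System.out.println(toLine(1, new double[] {0.25, 0.0, 0.5}));
        // prints: 1 1:0.250000 3:0.500000
    }
}
```

One line would be written per text sample, with the label identifying the known author for training data.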

Plan and Goals for new week:

  1. Modification and refining of DEA code
  2. Continue analysis of how other researchers used this DEA for authorship attribution
  3. Try applying data to SVM

Jie Dong

Leng Tan

Progress and Status this week:

  1. Finished the Java coding for the WRI technique in the data extraction algorithm.
  2. Tested and verified that the code works properly using a small test file (a text file with only a few sentences).
  3. Had a meeting with Brian to discuss the SVM input and output.
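
For reference, a minimal Java sketch of the word recurrence interval (WRI) idea, checked on a few-word input in the spirit of the small test file above. It is not the code written for the project. It assumes that an interval is the difference in word position between successive occurrences of the same word, and it summarises the intervals by their standard deviation divided by their mean, a statistic chosen here only for illustration. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordRecurrenceInterval {
    /** Maps each word that occurs at least twice to the gaps between its successive positions. */
    static Map<String, List<Integer>> intervals(String text) {
        // Assumed tokenisation: lowercase, split on anything that is not a letter or apostrophe.
        String[] tokens = text.toLowerCase().split("[^a-z']+");
        Map<String, Integer> lastSeen = new HashMap<>();
        Map<String, List<Integer>> gaps = new HashMap<>();
        int position = 0;
        for (String t : tokens) {
            if (t.isEmpty()) {
                continue;
            }
            Integer previous = lastSeen.put(t, position);
            if (previous != null) {
                gaps.computeIfAbsent(t, k -> new ArrayList<>()).add(position - previous);
            }
            position++;
        }
        return gaps;
    }

    /** Population standard deviation of the gaps divided by their mean; 0 for an empty list. */
    static double scaledStdDev(List<Integer> gaps) {
        if (gaps.isEmpty()) {
            return 0.0;
        }
        double mean = 0.0;
        for (int g : gaps) {
            mean += g;
        }
        mean /= gaps.size();
        double variance = 0.0;
        for (int g : gaps) {
            variance += (g - mean) * (g - mean);
        }
        variance /= gaps.size();
        return Math.sqrt(variance) / mean;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> gaps = intervals("a b a b b a");
        System.out.println(gaps.get("a")); // prints: [2, 3]
        System.out.println(gaps.get("b")); // prints: [2, 1]
    }
}
```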

Plan and Goals for new week:

  1. Figure out SVM.
  2. Test and try out SVM in Matlab using small test files.

Semester 2, Week 8

Tien-en Phua

Jie Dong

Leng Tan

Progress and Status this week:

  1. Received a stage 1 design document on "Audio Assisted Vision System for Visually Impaired People".
  2. Read the document in full and took notes on presentation and various other aspects.
  3. Reviewed the document and produced a formal peer review report.
  4. Investigation of Matlab for SVM was paused for a while because of the peer review report.

Plan and Goals for new week:

  1. Figure out SVM.
  2. Test and try out SVM in Matlab using small test files.

Semester 2, Week 9

Tien-en Phua

Jie Dong

Leng Tan

Semester 2, Week 10

Tien-en Phua

Jie Dong

Leng Tan

Semester 2, Week 11

Tien-en Phua

Jie Dong

Leng Tan

Semester 2, Week 12

Tien-en Phua

Jie Dong

Leng Tan

See also

Back