Characterization Code
Latest revision as of 11:40, 4 November 2015

Currently in progress...

Description

The characterization code returns basic, first-order statistics on a given text. These statistics include:

  • Unique words
  • Unique characters
  • Total number of words
  • Total number of characters
  • Characters that only appear at the start or end of words (Separate code)
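The first four statistics above can be illustrated with a minimal Python sketch. This is not the project's MATLAB code: the function name `characterize` is hypothetical, and the sketch assumes that words are separated by whitespace, that whitespace is not counted as a character, and that counting is case-sensitive with no punctuation stripping. The actual code may define these differently.

```python
def characterize(text: str) -> dict:
    """Return first-order statistics for a text.

    Assumptions (not taken from the original MATLAB code):
    - words are whitespace-separated tokens,
    - whitespace itself is not counted as a character,
    - counting is case-sensitive and punctuation is kept.
    """
    words = text.split()      # split on any run of whitespace
    chars = "".join(words)    # every non-whitespace character, in order
    return {
        "unique_words": len(set(words)),
        "unique_characters": len(set(chars)),
        "total_words": len(words),
        "total_characters": len(chars),
    }


stats = characterize("the cat sat on the mat")
print(stats)
```

For the sentence "the cat sat on the mat", a manual count gives 6 words, 5 unique words, 17 characters and 9 unique characters, which is what the sketch returns under the assumptions above.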

Usage

To use the code, download the 'Characteristics - All Files Within A Folder' folder and extract the files into your MATLAB folder. Open MATLAB, navigate to the 'Characteristics - All Files Within A Folder' folder and make it your working folder. Run the 'Driver.m' file. A dialog will open asking for a folder; choose the folder that contains the text files to be characterized (NOTE: This folder should ONLY contain files that you want read and characterized). Allow the program to run. This may take a few minutes, depending on the amount of data it must process. Once it has completed, the results tables are written to a file called 'TestData.txt', which contains the characteristic data for all of your text files.
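The workflow described above (read every file in a chosen folder, compute the statistics, write one table to an output file) can be sketched in Python as follows. This is an illustration only and does not reproduce 'Driver.m': the names `characterize`, `characterize_folder` and `COLUMNS`, the tab-separated layout, and the UTF-8 encoding are all assumptions, and the folder is passed as an argument rather than chosen in a dialog.

```python
from pathlib import Path

# Column order of the output table (hypothetical layout).
COLUMNS = ("unique_words", "unique_characters", "total_words", "total_characters")


def characterize(text):
    """First-order statistics; assumes whitespace-separated words."""
    words = text.split()
    chars = "".join(words)
    return {
        "unique_words": len(set(words)),
        "unique_characters": len(set(chars)),
        "total_words": len(words),
        "total_characters": len(chars),
    }


def characterize_folder(folder, output_path):
    """Characterize every file in `folder` and write one table to `output_path`.

    Every regular file in the folder is read, which is why the folder
    should contain only the texts to be characterized.
    """
    folder = Path(folder)
    output_path = Path(output_path)
    rows = []
    for path in sorted(folder.iterdir()):
        # Skip sub-folders and the output file itself if it lives in the folder.
        if not path.is_file() or path.resolve() == output_path.resolve():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        rows.append((path.name, characterize(text)))

    # One header line, then one tab-separated line per input file.
    lines = ["file\t" + "\t".join(COLUMNS)]
    for name, stats in rows:
        lines.append(name + "\t" + "\t".join(str(stats[c]) for c in COLUMNS))
    output_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return rows
```

A call such as `characterize_folder("TestFolder", "TestData.txt")` would then produce one row per text file; the folder and file names here simply mirror those mentioned on this page.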

(OPTIONAL: If time permits, create YouTube video showing the running of the code)

Testing

The code was tested on a short paragraph of English text, which can be found in the 'TestFolder'. The returned statistics were compared with values counted by hand to confirm that the code produces the expected results.

Results

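Results of the characterization code on the Voynich Manuscript can be found below:

  • 'Voynich Results': https://drive.google.com/a/student.adelaide.edu.au/folderview?id=0B3xk_r8iaE_IR2I3SG0yVkpQN1k&usp=sharing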

Each text file contains the results for one transcriber. The "X" in the file name '"X" Data' is the transcriber code as outlined in the Interlinear Archive.

See also

Back