Characterization Code

From Derek
Revision as of 11:40, 4 November 2015 by A1211832 (talk | contribs) (Description)

Currently in progress...

Description

The characterization code returns basic, first-order statistics on a given text. These statistics include:

  • Unique words
  • Unique characters
  • Total number of words
  • Total number of characters
  • Characters that only appear at the start or end of words (Separate code)
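As a rough illustration of what these statistics mean, the following is a minimal Python sketch, not the project's MATLAB code. It assumes that words are separated by whitespace, that whitespace is not counted as a character, and that counting is case-sensitive. The function names `characterize` and `edge_only_characters` are hypothetical, and the reading of "only appear at the start or end of words" used here (every occurrence of the character is in the first, or last, position of a word) is also an assumption.

```python
def characterize(text):
    """Return basic first-order statistics for a text (assumed definitions)."""
    words = text.split()                      # whitespace-separated words
    chars = [c for w in words for c in w]     # characters, whitespace excluded
    return {
        "total_words": len(words),
        "unique_words": len(set(words)),
        "total_characters": len(chars),
        "unique_characters": len(set(chars)),
    }


def edge_only_characters(text):
    """Find characters whose every occurrence is at a word start, or at a word end."""
    all_chars = set()
    seen_off_start = set()   # characters seen anywhere except the first position
    seen_off_end = set()     # characters seen anywhere except the last position
    for word in text.split():
        last = len(word) - 1
        for i, c in enumerate(word):
            all_chars.add(c)
            if i != 0:
                seen_off_start.add(c)
            if i != last:
                seen_off_end.add(c)
    return {
        "start_only": sorted(all_chars - seen_off_start),
        "end_only": sorted(all_chars - seen_off_end),
    }


if __name__ == "__main__":
    sample = "the cat sat on the mat"
    print(characterize(sample))
    print(edge_only_characters(sample))
```

Under these assumptions a single-letter word counts its letter as being at both the start and the end; the actual code may treat that case differently.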

Usage

To use the code, download the 'Characteristics - All Files Within A Folder' folder and extract its files into your MATLAB folder. Open MATLAB, make the 'Characteristics - All Files Within A Folder' folder your working folder, and run 'Driver.m'. A dialog will open asking for a folder; choose the folder that contains the text files to be characterized (NOTE: this folder should contain ONLY the files that you want read and characterized). Allow the program to run; this may take a few minutes, depending on the amount of data it must process. Once it has finished, the tables are written to a file called 'TestData.txt', which contains the characteristic data for all of your text files.
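The workflow above (read every file in a chosen folder, compute the statistics, write one table to an output file) can be sketched in Python as follows. This is an illustrative sketch only and not the behaviour of 'Driver.m': the folder is passed as an argument instead of being picked in a dialog, the files are assumed to be UTF-8 text, and the tab-separated layout of the output is an assumption. The names `basic_stats` and `characterize_folder` are hypothetical.

```python
from pathlib import Path


def basic_stats(text):
    """Return (total words, unique words, total characters, unique characters)."""
    words = text.split()
    chars = "".join(words)   # whitespace is not counted as a character
    return len(words), len(set(words)), len(chars), len(set(chars))


def characterize_folder(folder, output_file="TestData.txt"):
    """Characterize every file in `folder` and write one table row per file."""
    rows = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():   # every file is read, so keep only text files here
            text = path.read_text(encoding="utf-8")
            rows.append((path.name, *basic_stats(text)))

    header = ("file", "total_words", "unique_words",
              "total_characters", "unique_characters")
    lines = ["\t".join(header)]
    lines += ["\t".join(str(value) for value in row) for row in rows]
    Path(output_file).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return rows
```

Because every file in the folder is read, writing the output file into the same folder would cause it to be characterized on the next run; in this sketch it is safer to point `output_file` somewhere else.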

(OPTIONAL: If time permits, create YouTube video showing the running of the code)

Testing

This code was tested on a small paragraph of English text, which can be found in the 'TestFolder'. The returned results were compared with manually computed results to confirm that the code returns the expected values.
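A check of this kind can be sketched in Python as below. The sample sentence and its hand-counted values are made up for illustration and are not the contents of the 'TestFolder'; the statistic definitions (whitespace-separated words, whitespace not counted as a character) and the names `basic_stats` and `compare_with_manual` are likewise assumptions.

```python
def basic_stats(text):
    """Compute the basic statistics under the assumed definitions."""
    words = text.split()
    chars = "".join(words)
    return {
        "total_words": len(words),
        "unique_words": len(set(words)),
        "total_characters": len(chars),
        "unique_characters": len(set(chars)),
    }


def compare_with_manual(text, expected):
    """Return {statistic: (expected, actual)} for every value that disagrees."""
    actual = basic_stats(text)
    return {key: (expected[key], actual[key])
            for key in expected if actual[key] != expected[key]}


if __name__ == "__main__":
    sample = "to be or not to be"
    # Counted by hand: 6 words, 4 distinct words, 13 letters, 6 distinct letters.
    manual = {"total_words": 6, "unique_words": 4,
              "total_characters": 13, "unique_characters": 6}
    mismatches = compare_with_manual(sample, manual)
    print("all statistics match" if not mismatches else mismatches)
```

An empty result means the computed and hand-counted values agree; any entry in the result shows which statistic differs and by how much.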

Results

Results of the characterization code on the Voynich Manuscript can be found below:

Each text file contains the results for one transcriber; in the file name '"X" Data', X is the transcriber code as outlined in the Interlinear Archive.

See also

Back