Characterization Code

From Derek
Revision as of 16:37, 31 March 2015 by A1211832 (Talk | contribs)

Currently in progress...

Description

The characterization code returns basic, first-order statistics on a given text. These statistics include:

  • Unique words
  • Unique characters
  • Total number of words
  • Total number of characters
  • Characters that only appear at the start or end of words (To be implemented)
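The statistics listed above can be sketched in a few lines. The following is a minimal illustration in Python, not the project's MATLAB code. The function names `characterize` and `edge_only_characters` are hypothetical. It assumes words are separated by whitespace, that counting is case-sensitive, and that whitespace is not counted as a character. The edge-only function is one possible reading of the item marked "To be implemented": characters that never occur in the interior of any word.

```python
def characterize(text):
    """Return basic first-order statistics for a text.

    Assumptions (not taken from the original code): words are
    whitespace-separated, counting is case-sensitive, and whitespace
    itself is not counted as a character.
    """
    words = text.split()
    chars = [c for c in text if not c.isspace()]
    return {
        "unique_words": len(set(words)),
        "unique_characters": len(set(chars)),
        "total_words": len(words),
        "total_characters": len(chars),
    }


def edge_only_characters(text):
    """Return characters that never appear in the interior of a word.

    This is one interpretation of "characters that only appear at the
    start or end of words"; the original feature is not yet implemented.
    """
    words = text.split()
    all_chars = set("".join(words))
    # Interior characters are everything except the first and last
    # character of each word.
    interior = set()
    for word in words:
        interior.update(word[1:-1])
    return all_chars - interior
```

For example, `characterize("the cat and the hat")` counts 5 words in total, 4 of them unique, and 15 characters in total, 7 of them unique.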

Usage

To use the code, download the 'Characteristics - All Files Within A Folder' folder and extract the files into your MATLAB folder. Open MATLAB, navigate to the 'Characteristics - All Files Within A Folder' folder, and make it your working folder. Run the 'Driver.m' file. A dialog will open asking for a folder; choose the folder that contains the text files to be characterized (NOTE: This folder should ONLY contain files that you want read and characterized). Allow the program to run. Once it has finished, the tables are written to a file called 'TestData.txt', which contains all the characteristic data for your text files.
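The workflow described above (read every file in a chosen folder, compute statistics, write one table to an output file) can be sketched as follows. This is a Python illustration under stated assumptions, not a description of how 'Driver.m' works. The names `count_words` and `characterize_folder` are hypothetical, as are the tab-separated table layout and the UTF-8 encoding; only the output file name 'TestData.txt' comes from the description above. Only word counts are shown to keep the example short.

```python
from pathlib import Path


def count_words(text):
    """Return (total words, unique words), splitting on whitespace."""
    words = text.split()
    return len(words), len(set(words))


def characterize_folder(folder, out_path="TestData.txt"):
    """Characterize every file in `folder` and write one table.

    Every regular file in the folder is read, which is why the folder
    should contain only the text files to be characterized.
    """
    rows = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip subfolders
        text = path.read_text(encoding="utf-8")  # encoding is an assumption
        total, unique = count_words(text)
        rows.append((path.name, total, unique))

    # Hypothetical table layout: one tab-separated row per file.
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("file\ttotal_words\tunique_words\n")
        for name, total, unique in rows:
            out.write(f"{name}\t{total}\t{unique}\n")
    return rows
```

In this sketch the output file should be written outside the input folder; otherwise a second run would read the previous results as if they were a text to characterize.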

Testing

The code was tested using a small paragraph of English text, which can be found in the 'TestFolder'. The returned results were compared with manually computed results to confirm that the code produced the expected values.

Results

Results of the characterization code on the Voynich Manuscript can be found below:

Each text file contains the results for one transcriber. The "X" in the file name '"X" Data' is the transcriber code as outlined in the Interlinear Archive.
