===Methods===
The ethnicity analysis used Eurogenes Ad-Mix Utilities, a tool provided by GEDmatch that generates a report of ethnicity proportions for a given DNA kit. The Eurogenes K13 model was selected as the 'calculator' model. This model estimates ethnicity proportions across 13 global regions, as shown in Figure 10, and is primarily intended for people of European background because it provides more sub-continental regions for Europe. The Somerton Man's DNA was selected as the input kit and the ethnicity report was generated.

[[File:ethnicity_sample.png|thumb|300px|center|Figure 10: A sample report from Eurogenes Ad-Mix Utilities]]

In addition, to investigate how reliable an ethnicity report is when it is generated from a low-quality DNA data file, several complete DNA samples were analysed. The project ordered two sets of complete DNA reference data from 23andMe, which provides files in the same format as the Somerton Man's file. A program was developed that allows the user to degrade a selected DNA file to different levels of completeness. This program was also developed using C++. The project team degraded each complete DNA sample file into nine files by removing 10% of the SNPs, then 20%, and so on in 10% steps up to 90%. For each set of complete DNA sample data, an extra file containing only the SNPs whose rsids also appear in the Somerton Man's DNA file was created and named degraded_DNA. These files were then uploaded to GEDmatch and the same ethnicity analysis was carried out as for the Somerton Man's raw DNA data. All ethnicity reports were recorded, and the way the ethnicity proportions changed with the level of degradation was observed.
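
The construction of the degraded_DNA file can be illustrated with a minimal C++ sketch. It is not the project's actual program. It assumes the 23andMe raw data layout of '#' comment lines followed by tab-separated lines of rsid, chromosome, position and genotype, and it assumes that SNPs are matched on rsid alone, so a no-call entry in the target file still counts as present. The function names <code>rsidOf</code>, <code>collectRsids</code> and <code>keepSharedSnps</code> are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Return the rsid, i.e. the first tab-separated field of a data line.
// Callers are expected to skip blank lines before calling this.
std::string rsidOf(const std::string& line) {
    assert(!line.empty());
    return line.substr(0, line.find('\t'));
}

// Collect every rsid found in a raw data file (given as lines),
// skipping blank lines and '#' comment lines.
std::unordered_set<std::string> collectRsids(const std::vector<std::string>& lines) {
    std::unordered_set<std::string> ids;
    for (const auto& line : lines) {
        if (line.empty() || line[0] == '#') continue;
        ids.insert(rsidOf(line));
    }
    return ids;
}

// Keep the comment lines of the reference file plus only those SNP lines
// whose rsid is in `keep` (assumption: matching is by rsid only).
std::vector<std::string> keepSharedSnps(const std::vector<std::string>& reference,
                                        const std::unordered_set<std::string>& keep) {
    std::vector<std::string> out;
    for (const auto& line : reference) {
        if (line.empty()) continue;
        if (line[0] == '#' || keep.count(rsidOf(line)) > 0) {
            out.push_back(line);
        }
    }
    return out;
}
```

In this sketch, <code>collectRsids</code> would be run on the lines of the Somerton Man's file, and <code>keepSharedSnps</code> on the lines of a complete reference file. The returned lines can then be written out as the degraded_DNA file.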

To provide stronger evidence on whether the ethnicity report from a low-quality DNA file is reliable, several different degradation algorithms were introduced. In each of them, n% denotes the percentage of SNPs to be removed. The first algorithm removed the first n% of the SNPs in every group of 10 SNPs. The second algorithm was the opposite of the first: it removed the last n% of the SNPs in every group of 10 SNPs. The third and fourth algorithms removed the first n% and the last n% of the SNPs in each chromosome, respectively.
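
The four removal strategies described above can be sketched in C++ as follows. This is an illustrative sketch under stated assumptions, not the program used in the project. The <code>Snp</code> struct, the <code>Strategy</code> enum and the <code>degrade</code> function are hypothetical names. The sketch assumes that n is a multiple of 10, that groups of 10 are counted in file order across the whole file, that a trailing partial group follows the same slot rule, and that the per-chromosome count of removed SNPs is rounded down.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One SNP record (hypothetical layout mirroring the raw data columns).
struct Snp {
    std::string rsid;
    std::string chromosome;
    long position;
    std::string genotype;
};

enum class Strategy {
    FirstOfEveryTen,    // remove the first n% of every group of 10 SNPs
    LastOfEveryTen,     // remove the last n% of every group of 10 SNPs
    FirstOfChromosome,  // remove the first n% of each chromosome's SNPs
    LastOfChromosome    // remove the last n% of each chromosome's SNPs
};

// Return the SNPs that remain after removing `percent` of them.
// Assumption: `percent` is a multiple of 10 between 0 and 100.
std::vector<Snp> degrade(const std::vector<Snp>& snps, int percent, Strategy strategy) {
    assert(percent >= 0 && percent <= 100 && percent % 10 == 0);
    const std::size_t perTen = static_cast<std::size_t>(percent / 10);
    std::vector<Snp> kept;

    if (strategy == Strategy::FirstOfEveryTen || strategy == Strategy::LastOfEveryTen) {
        for (std::size_t i = 0; i < snps.size(); ++i) {
            const std::size_t slot = i % 10;  // position inside the group of 10
            const bool removed = (strategy == Strategy::FirstOfEveryTen)
                                     ? slot < perTen
                                     : slot >= 10 - perTen;
            if (!removed) kept.push_back(snps[i]);
        }
        return kept;
    }

    // Per-chromosome strategies: count the SNPs on each chromosome first.
    std::map<std::string, std::size_t> total;
    for (const auto& s : snps) ++total[s.chromosome];

    std::map<std::string, std::size_t> seen;  // SNPs visited so far per chromosome
    for (const auto& s : snps) {
        const std::size_t n = total[s.chromosome];
        const std::size_t cut = n * perTen / 10;  // number to remove, rounded down
        const std::size_t idx = seen[s.chromosome]++;
        const bool removed = (strategy == Strategy::FirstOfChromosome)
                                 ? idx < cut
                                 : idx >= n - cut;
        if (!removed) kept.push_back(s);
    }
    return kept;
}
```

Calling <code>degrade</code> with percentages from 10 to 90 under each strategy would yield nine degraded files per strategy for each reference sample. Because every strategy here is deterministic, the same input always produces the same degraded file, which makes the resulting ethnicity reports directly comparable across strategies.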