Final Report 2011

===Design===
The design of the web crawler process is shown in the diagram below (Figure 12).

<center>[[Image:Web crawler process.png| Web Crawler process.]]</center>
<center>'''Figure 12 - Web crawler software operational process'''</center>

The process starts from a seed URL, for which the crawler first attempts to load the site's robots exclusion protocol file, robots.txt. If robots.txt does not exist, or exists but does not restrict crawling, the web page content is loaded and its embedded links are extracted. These links are added to the link queue by a sub-process, while the thread returns to loading the next web page in the queue.
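The loop described above can be sketched as follows. This is a simplified, single-threaded illustration, not the project's implementation. It assumes a hypothetical <code>fetch</code> callable that returns a page's text or <code>None</code> when the page does not exist. <code>crawl</code>, <code>robots_allows</code>, <code>enqueue_links</code>, <code>LinkExtractor</code> and the user-agent string are all illustrative names. <code>enqueue_links</code> stands in for the link-queue sub-process. The robots.txt handling uses Python's standard <code>urllib.robotparser</code>.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical fetcher: returns the text at a URL, or None if it does not exist.
Fetcher = Callable[[str], Optional[str]]
USER_AGENT = "ExampleCrawler"  # assumed user-agent string


class LinkExtractor(HTMLParser):
    """Collects <a href> targets, resolved against the page's own URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def robots_allows(url: str, fetch: Fetcher, cache: dict) -> bool:
    """Load robots.txt once per host; a missing file places no limit on crawling."""
    parts = urlparse(url)
    root = f"{parts.scheme}://{parts.netloc}"
    if root not in cache:
        text = fetch(root + "/robots.txt")
        if text is None:
            cache[root] = None  # no robots.txt: every URL may be crawled
        else:
            parser = RobotFileParser()
            parser.parse(text.splitlines())
            parser.modified()  # mark rules as loaded so can_fetch() applies them
            cache[root] = parser
    parser = cache[root]
    return parser is None or parser.can_fetch(USER_AGENT, url)


def enqueue_links(links, queue: deque, seen: set) -> None:
    """Stands in for the sub-process that adds newly found links to the queue."""
    for link in links:
        if link not in seen:
            seen.add(link)
            queue.append(link)


def crawl(seed: str, fetch: Fetcher, max_pages: int = 100) -> list:
    """Breadth-first crawl from the seed, honouring robots.txt for each host."""
    queue, seen, robots = deque([seed]), {seed}, {}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if not robots_allows(url, fetch, robots):
            continue  # robots.txt restricts crawling of this URL
        html = fetch(url)
        if html is None:
            continue  # page could not be loaded
        visited.append(url)
        extractor = LinkExtractor(url)
        extractor.feed(html)
        extractor.close()
        enqueue_links(extractor.links, queue, seen)
    return visited


pages = {
    "http://example.com/robots.txt": "User-agent: *\nDisallow: /private",
    "http://example.com/": '<a href="/a">A</a> <a href="/private/x">X</a>',
    "http://example.com/a": '<a href="/">home</a> <a href="http://other.org/">other</a>',
    "http://other.org/": "<p>no links</p>",
}
print(crawl("http://example.com/", pages.get))
# → ['http://example.com/', 'http://example.com/a', 'http://other.org/']
```

In this toy site, <code>/private/x</code> is skipped because its host's robots.txt disallows <code>/private</code>. <code>other.org</code> has no robots.txt, so it is crawled without restriction. A real crawler would replace <code>pages.get</code> with an HTTP fetcher and run the fetch and enqueue steps concurrently, as in the threaded design above.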