Small World Networks 2008
Contents
Supervisors
Prof Derek Abbott & Dr Matthew Berryman
Weekly progress and questions
Week 2
Wenrui (Linda):
Sweet and I had a short meeting last Wednesday, we decided to separate our proposal into three parts. Each person will talk about a little bit in each part. because we think it's necessary for both of us to understand every parts to get started.
For myself, I have prepared some definition about the seminar, and also I have read the book called "linked". But I haven't finish it yet. For the coming week, I will continue work on the proposal. maybe need to get start to make the PPT. find some fun video from www.youtube.com.
Zhujing (Sweet):
Linda & I managed our coming proposal this Wendnesday. There will be 3 parts and each part will take 5 mins(total 15 mins):
1.Introductions and background
What is "Small World Network"?
What is "Six Degrees of Separation"?
Is this theory existing in our life?
2.Show some fun things to support that the theory is existing around us
Relationship network of Hollywood stars
High school dating
A pretended network
About the pretended one, I have an idea. Recently, I am playing a wii game, "Super Mario Galaxy". The whole world in the game just 6 galaxys and each galaxy has 4-5 planets. If Mario wanna go any planet, he needs to fly to the centre of the galaxy first which the planet belongs to, and then tracks the network amoung the planets to get the destination.I sopport every planet is a individual node and each flying track is a link. We can calculate manually that how many degrees Mario needs :)
3.The serious things, we have no idea about this part, but we will think about it this weekend.
Week 3
Wenrui (Linda):
For this week, I just use all the information I prepared for last week to made the ppt, basically on the introduction part and the fun part.
For the coming week, we will have our regular meeting(once two weeks) to discuss the serious part. such as the hierarchical clustering algorithms, so hopefully we will work out more about this part next week.
Zhujing (Sweet):
This week, we continued prepare our part 1&2, and had some ideas of the part 3.
1> I decided to change "Mario galaxy" to another game which mentioned in fun part last week, because the other one has a more clear network image.
2> Start drawing some pretended small networks(with 5-10 nodes) to find some measures for the part 3.
3> Read the book "Linked" to get some more ideas.
Week 4
Wenrui (Linda):
We had a regular meeting with Matthew last Tuesday, during the meeting, we talked about the serious part. we talked about three types of failure - random link (or node) failure, targeted link (or node) failure, and cascading failures (one link fails, then the next links fail), and also an example of Google's algorithm ranks. hopefully it will be very helpful for the seminar next week.
For the coming weekends,I will finish my PPT, and send it to Matthew on the early of next week, hopefully can get some good feedback before the formal seminar.
Zhujing (Sweet):
After our regular meeting, we got some useful sense of the coming seminar, especially the most important part, the serious one.
1> About the fun part, we decided change the movie to a new series, because this series has better and more smooth images. And in the old movie there are some kinds of different languages, we must to use the subtitle. But the series is all English, that is more convenient for our work.
2>For the serious part, I was praparing the pertended networks. And I changed something of my old networks to make these show some different characters, such as target link. Because of these changes I need to recalculate some stuffs.
3>Countiue reading "Linked" to get more sense :)
Week 5
Wenrui(Linda):
last week, we did our proposal seminar, I didn't do that well, I talked too much about the hierarchical networks. However, I didn't clearly explain something important, and didn't include something important. e.g.
- we should include a plan for our software design.
- I should talk more about our Gantt chat, and as well as the break down structures.
- In the risks analysis part, the idea is to talk about the technical risks in the project itself, but not some personal issues.
- We should include the discussion of the data sets we will use.
- We should include the discussion of relevance to electrical engineering.
- we should mention what is our aim for the project.
- we should include the slides to explain finding shortest path, analyzing network efficiency, finding important hubs, finding weaknesses that can cause failure of a network, identifying weaknesses that make the network vulnerable to attack.
During the presentation, we learned our lessons, i.e. if we couldn't answer the question from the audiences, at least we should try to say something which is related to the question, it's very bad to just stand there and say nothing. if couldn't understand the question, just ask them to ask in a easy way. if still cannot answer, then we can say:" good question, We do not know the answer to that and we will definitely follow it up later."
For the coming week, I am planning to work on the software design. I will basically read the algorithms on the internet. hopefully I can write some simple code myself, and start to test it with the Mini networks.
Zhujian(Sweet):
This week, we just finished our proposal seminar. There were some weaknesses in our presentation, but we will improve these next time. The biggest problem for me is too nervous to listen to the questions. After prsentation, I just recalled these and found I could answer most of these. That is the most important thing I need to improve.
For the coming week, we will start some matlab analysis and read the references about our project.
Week 6
Wenrui(Linda):
For last week, I have searched something about the software design part on the internet, but I haven't really finish that yet. I will continue my research next week as we got more time for the holidays.
Use enron as an example, we can start to write some simple program to analysis the degree distribution and as well as the clustering coefficient. e.g. the train map in Hongkong or New York.(any big cities)
what's more, I need to prepare for next week's meeting, we will talk about the software design by using enron as an example, hopefully next week's regular meeting will be helpful for our progress.
Project proposal
For the project proposal you need to have:
- Some fun stuff (fun small-world networks)
- Some more serious engineering reasons why small-world networks are important - namely robustness of power networks, communications networks, Google searching, finding who is the leader in social networks (like we discussed about Enron)
- Some slides on network measures (correlation coefficient, cluster detection algorithms).
- How you will start with some small networks that you can test your software on before moving on to a large data set.
- Some slides on project management - how you might structure your software modules & data flow, and definitely Gantt Chart (including slack space to cope with risks), budget, milestones, who is doing what (& why).
Here are some fun network graphs: [1] There are some fun and some more serious networks here: [2]
When you include graphics from the web, don't forget to include on the slide the web address you found them on. It is also a good idea to have a slide at the end with some key references (e.g. Barabasi book, Milgram's paper, the paper I gave you on finding hierarchies).
Critical Design Review
For the Critical Design Review (not due until Week 8) you need to have the following parts:
- Project Aim & Background
- Literature review, where you discuss a number (say 10-15) of papers / books and their contributions. Try and summarise in a few sentences what each one is about.
- Your approach - what you are going to do, your software design (see below), and who is going to do what.
- Analysis of your software design - what are the hard sections, what are the critical parts that you need to get working first.
- Project management stuff. For risks you don't need to worry much about Occupational Health & Safety (apart from general computer set up) but there are other risks in your project to do with task timing, what happens if someone (including Derek and I) is unavailable, budget over-runs (should be minimal risk but you can still mention it).
Aside from the engineering applications (where small-world networks are important), the other engineering in this project is software engineering. You should have a block diagram showing the main parts of the software, i.e. graph file format converter, graph drawer, network measure calculators, hierarchy algorithm, and matlab analysis (graphing and statistics), and my software for converting the email data set into a graph file. For some of these you can then include what sub-modules they have, e.g. the network measure calculator might have sub-modules for degree distribution, for clustering coefficient. You should also have a flow diagram showing the data flow between these modules, and in the critical design review give details of the algorithms. You can software you find on the web but you will need to acknowledge it in your proposal report, make sure you understand how it works, and test it. It may be easier though, to write the software yourself then writing a file format converter to get other software to work.
The Critical Design Review can just be emailed directly to myself and Derek on the due date, and you only need one for your group. For the final report you need to write one each (please don't copy each other's writing but write it in your words - you can copy equations, flow charts, graphs, diagrams, reference list, software though. Derek and I are happy to read draft copies.
Worked examples
Directed graph
http://www.berrymanconsulting.com/images/SimpleDirectedGraph.jpg
Pajek .net file
*vertices 4 1 "1" 2 "2" 3 "3" 4 "4" *arcslist 1 2 4 2 3 4 3 1 4 1 2
Note that in my drawing done with different software I have drawn the bidirectional edges as two separate edges, whereas Pajek draws them as a single edge with arrows on both ends.
In-degree of nodes:
- Node 1 has 2 incoming edges
- 2
- 1
- 2
So the in-degree distribution is:
- 1
- 3
(i.e. the number of edges with 1 incoming edge is 1, and the number of edges with 2 incoming edges is 3) Average in-degree is (1*1+3*2)/4=1.75
Out-degree of nodes:
- 2
- 2
- 1
- 2
So the out-degree distribution is:
- 1
- 3
Average out-degree is (1*1+3*2)/4=1.75 Note in general that for a directed graph, the out-degree distribution is not the same as the in-degree.
To calculate the clustering coefficients we first need to find the neighbourhoods of each node. The neighbourhood of node i, is all the other nodes that either have an edge from node i or to node i.
The neighbourhood of node 1 is {2,3,4}, the neighbourhood of node 2 is {1,3,4}, the neighbourhood of node 3 is {1,2} and the neighbourhood of node 4 is {1,2}.
We also need the total degree of each node = in-degree + out-degree, thus
- 4
- 4
- 2
- 4
Using the notation from here, these are the k_i (k underscore i).
We also need the number of edges between nodes j and k in the neighbourhood of i, i.e. for node 1, we count all the edges between nodes 2, 3, and 4 (the neighbourhood of 1). The counts are:
- 3
- 3
- 1
- 1
So the clustering coefficient for each node is the count divided by k_i(k_i - 1), i.e.
- 3/(4*3) = 1/4
- 3/(4*3) = 1/4
- 1/(2*1) = 1/2
- 1/(4*3) = 1/12
The clustering coefficient for the graph as a whole is just the average of the clustering coefficients of the nodes, i.e. (1/4 + 1/4 + 1/2 + 1/12) / 4 = 13/48 approx. 0.271
Undirected Graph
http://www.berrymanconsulting.com/images/SimpleUndirectedGraph.jpg
Pajek .net file:
*vertices = 5 1 "1" 2 "2" 3 "3" 4 "4" 5 "5" *edges 1 2 1 3 2 3 2 4 2 5 3 5 4 5
Degree:
- 2
- 4
- 3
- 2
- 3
Degree distribution:
- 0
- 2
- 2
- 1
Average degree = (2*2 + 2*3 + 4)/5 = 14/5 = 2.8
Neighbourhoods:
- {2,3}
- {1,3,4,5}
- {1,2,5}
- {2,5}
- {2,3,4}
Number of edges within neighbourhoods:
- 1
- 3
- 2
- 1
- 2
Clustering coefficients
- 2*1/(2*1) = 1
- 2*3/(4*3) = 1/2
- 2*2/(3*2) = 2/3
- 2*1/(2*1) = 1
- 2*2/(3*2) = 2/3
Average clustering coefficient = (1 + 1/2 + 2/3 + 1 + 2/3) / 5 approx. 0.767
Reading list
Simulations
Data sets
- Enron email data
- My conversion of data from Andres Corrada-Emmanuel's Enron research page
- Network data compiled by Mark Newman. Most of the data is in the GML format. This data includes the karate club social data.
Software
- Pajek
- Pajek file format (also supported by JUNG)
- JUNG - Java Universal Network / Graph framework
- To use JUNG you will also need:
- COLT jar file
- Xerces (only if you want to read Graph ML files).
- Software for random hierarchical graphs
- Eclipse Java development environment for Windows
- Tutorial on using Eclipse to start a new Java project
- Note that this window has the option for External JAR files (which you can use to include the COLT and Xerces files).