Small World Networks 2008
Contents
Supervisors
Prof Derek Abbott & Dr Matthew Berryman
Weekly progress and questions
Week 2
Wenrui (Linda):
Sweet and I had a short meeting last Wednesday, we decided to separate our proposal into three parts. Each person will talk about a little bit in each part. because we think it's necessary for both of us to understand every parts to get started.
For myself, I have prepared some definition about the seminar, and also I have read the book called "linked". But I haven't finish it yet. For the coming week, I will continue work on the proposal. maybe need to get start to make the PPT. find some fun video from www.youtube.com.
Zhujing (Sweet):
Linda & I managed our coming proposal this Wendnesday. There will be 3 parts and each part will take 5 mins(total 15 mins):
1.Introductions and background
What is "Small World Network"?
What is "Six Degrees of Separation"?
Is this theory existing in our life?
2.Show some fun things to support that the theory is existing around us
Relationship network of Hollywood stars
High school dating
A pretended network
About the pretended one, I have an idea. Recently, I am playing a wii game, "Super Mario Galaxy". The whole world in the game just 6 galaxys and each galaxy has 4-5 planets. If Mario wanna go any planet, he needs to fly to the centre of the galaxy first which the planet belongs to, and then tracks the network amoung the planets to get the destination.I sopport every planet is a individual node and each flying track is a link. We can calculate manually that how many degrees Mario needs :)
3.The serious things, we have no idea about this part, but we will think about it this weekend.
Week 3
Wenrui (Linda):
For this week, I just use all the information I prepared for last week to made the ppt, basically on the introduction part and the fun part.
For the coming week, we will have our regular meeting(once two weeks) to discuss the serious part. such as the hierarchical clustering algorithms, so hopefully we will work out more about this part next week.
Zhujing (Sweet):
This week, we continued prepare our part 1&2, and had some ideas of the part 3.
1> I decided to change "Mario galaxy" to another game which mentioned in fun part last week, because the other one has a more clear network image.
2> Start drawing some pretended small networks(with 5-10 nodes) to find some measures for the part 3.
3> Read the book "Linked" to get some more ideas.
Week 4
Wenrui (Linda):
We had a regular meeting with Matthew last Tuesday, during the meeting, we talked about the serious part. we talked about three types of failure - random link (or node) failure, targeted link (or node) failure, and cascading failures (one link fails, then the next links fail), and also an example of Google's algorithm ranks. hopefully it will be very helpful for the seminar next week.
For the coming weekends,I will finish my PPT, and send it to Matthew on the early of next week, hopefully can get some good feedback before the formal seminar.
Zhujing (Sweet):
After our regular meeting, we got some useful sense of the coming seminar, especially the most important part, the serious one.
1> About the fun part, we decided change the movie to a new series, because this series has better and more smooth images. And in the old movie there are some kinds of different languages, we must to use the subtitle. But the series is all English, that is more convenient for our work.
2>For the serious part, I was praparing the pertended networks. And I changed something of my old networks to make these show some different characters, such as target link. Because of these changes I need to recalculate some stuffs.
3>Countiue reading "Linked" to get more sense :)
Week 5
Wenrui(Linda):
last week, we did our proposal seminar, I didn't do that well, I talked too much about the hierarchical networks. However, I didn't clearly explain something important, and didn't include something important. e.g.
- we should include a plan for our software design.
- I should talk more about our Gantt chat, and as well as the break down structures.
- In the risks analysis part, the idea is to talk about the technical risks in the project itself, but not some personal issues.
- We should include the discussion of the data sets we will use.
- We should include the discussion of relevance to electrical engineering.
- we should mention what is our aim for the project.
- we should include the slides to explain finding shortest path, analyzing network efficiency, finding important hubs, finding weaknesses that can cause failure of a network, identifying weaknesses that make the network vulnerable to attack.
During the presentation, we learned our lessons, i.e. if we couldn't answer the question from the audiences, at least we should try to say something which is related to the question, it's very bad to just stand there and say nothing. if couldn't understand the question, just ask them to ask in a easy way. if still cannot answer, then we can say:" good question, We do not know the answer to that and we will definitely follow it up later."
For the coming week, I am planning to work on the software design. I will basically read the algorithms on the internet. hopefully I can write some simple code myself, and start to test it with the Mini networks.
Zhujian(Sweet):
This week, we just finished our proposal seminar. There were some weaknesses in our presentation, but we will improve these next time. The biggest problem for me is too nervous to listen to the questions. After prsentation, I just recalled these and found I could answer most of these. That is the most important thing I need to improve.
For the coming week, we will start some matlab analysis and read the references about our project.
Week 6
Wenrui(Linda):
For last week, I have searched something about the software design part on the internet, but I haven't really finish that yet. I will continue my research next week as we got more time for the holidays.
Use enron as an example, we can start to write some simple program to analysis the degree distribution and as well as the clustering coefficient. e.g. the train map in Hongkong or New York.(any big cities)
what's more, I need to prepare for next week's meeting, we will talk about the software design by using enron as an example, hopefully next week's regular meeting will be helpful for our progress.
Zhujian(Sweet):
I was programming some functions in Java and read more references about the properties of network to figure out the measures of my hand-calculating part. I am not very sure whether my measures are correct. Next week we will have a meeting, I will ask for some helps.
Btw, when I run some Java codes in windows system, my computer alway not very stable :( Maybe I need install a Linux system :(
Week 7
Wenrui(Linda):
According to our meeting on Wednesday, we talked about how to write the Pajek .net file from a simple state diagram. How we calculate the average degree distributions, and hence the find out the clustering coefficients.
For the coming week ,we will design a simple state diagram ourselves, and try to use the same method as Matthew showed us to calculate the clustering coefficients. I was planning to use New York city train map as an example to build our network, but Matthew gave us a better suggestion, that is to use the Singapore train map, because the road the is smaller and more organized. So that can reduce our work load.
Zhujian(Sweet):
We just had meeting on Wednesday. And then, I find some mistakes of my measures. I will fix these and figure out how to test in Pajek. There are still some problems of Java running in windows system. I am trying solve this this week. Maybe something softwares crash.
Week 8
Wenrui(Linda):
From last week, I have started some calculations with the simple systems(train map), this is basic on the clustering coefficients calculation. Moreover, I have done the introduction part of the report which is due on week 8, it is similar to what we have done for proposal anyway.
For the coming week, we will try to do as much as we can for the Critical Design Review report.
Zhujian(Sweet):
This week, I start to read a new book named "Models and Methods in Social Network Analysis". That is very helpful for my understanding of the measures. I try to prove some fomulas by my own way. That is a very interesting thing :)
Week 9
Wenrui(Linda)
From last week, I have finished the calculations of the clustering coefficient by using both the undirected and directed graphs, for undirected graph, I used Adelaide train map system as an example, set each station as a node in the graph. I used the Adelaide road map as an example of the directed graph, because some of the street are only on way to go. but some of them are both ways. I have also finish the project management part, which is the risks anysis, and the budget.
For the comming week, we will finish the report.
Zhujian(Sweet)
1> I have finished all the mini network examples which I mentioned in the proposal seminar manual calculation. And change these undirected networks to directed networks to compare the difference between them. The calculations include average degree (for directed networks include in-degree and out-degree), clustering coefficient, short path between 2 nodes, the longest distance between 2 nodes in the network, etc.
2> I downloaded a book named “Exploratory Social Network Analysis with Pajek” to know more properties of network and be familiar with Pajek, But still have some confused problem of using this software. I was trying solving this.
3> Preparing the critical design review report. I am in the charge of manual calculation and coding test. Coz I got in trouble of using Pajek, I will do the test next week, just need some hints. And for the JUNG part, now I decide to temporary stop. I forgot a lot of about java programming stuffs. That need a lot of time to recall. I will do that after the due week.
Week 10
Wenrui(Linda)
For this week, we have corrected the calculation mistakes that Matthew corrected for us, what's more, we have finished the report.
For the comming week, i will work on the peer review. read the book and as well as study more about the software.
Matthew, do we need more meetings for the last 3 weeks of this semester?
Zhujian(Sweet)
1> Corrected the mistakes in our calculation parts, found the right way to use Pajek to test and finally, finished the report.
2> Download the book "Models and Methods in Social Network Analysis", which I mentioned in mid-break, to solve some problems I met in book "Exploratory Social Network Analysis with Pajek".
3> Prapared peer review report.
Week 11 & 12
Wenrui(Linda)
For these two weeks, we had just read some reference books, we got our feedback for critical design review, so that we can improve our final project report. For the coming winter holidays, we decided to work on the software part of the project, some functionality of the software, such as java, Pajek
Zhujian(Sweet)
1> Read the books which I mentioned before
2> Find some courses which I am studing using our "Small world network". Some ones are quite similar, and the others are not but use some theories of our project. Such as, shortest path, degree, target link
Week 13
Wenrui(Linda)
Last week, I have watched the movie "number" that Matthew gave to us. Volume one is talking about to find the central node of a network system, it's pretty relevant to our project. hopefully we can also work out some big events later by applying our discover during the project.
Also I will ask sweet to get the copy of "Models and Methods in Social Network Analysis" for me to read through.
Zhujian(Sweet)
Week 14
Wenrui(Linda)
For the last few weeks of the semester, we will be busy for the exams and some of the homeworks. so maybe we will stop our project for the exam period, will continue our research after the exams. during these time, we will just read some reference book, such as "Linked" and some other papers. Matthew, could you give us a tasks list, i think it will be more clear for us to know our assignment for the project. we will figure that out during the semester break.
Zhujian(Sweet)
Project proposal
For the project proposal you need to have:
- Some fun stuff (fun small-world networks)
- Some more serious engineering reasons why small-world networks are important - namely robustness of power networks, communications networks, Google searching, finding who is the leader in social networks (like we discussed about Enron)
- Some slides on network measures (correlation coefficient, cluster detection algorithms).
- How you will start with some small networks that you can test your software on before moving on to a large data set.
- Some slides on project management - how you might structure your software modules & data flow, and definitely Gantt Chart (including slack space to cope with risks), budget, milestones, who is doing what (& why).
Here are some fun network graphs: [1] There are some fun and some more serious networks here: [2]
When you include graphics from the web, don't forget to include on the slide the web address you found them on. It is also a good idea to have a slide at the end with some key references (e.g. Barabasi book, Milgram's paper, the paper I gave you on finding hierarchies).
Critical Design Review
For the Critical Design Review (not due until Week 8) you need to have the following parts:
- Project Aim & Background
- Literature review, where you discuss a number (say 10-15) of papers / books and their contributions. Try and summarise in a few sentences what each one is about.
- Your approach - what you are going to do, your software design (see below), and who is going to do what.
- Analysis of your software design - what are the hard sections, what are the critical parts that you need to get working first.
- Project management stuff. For risks you don't need to worry much about Occupational Health & Safety (apart from general computer set up) but there are other risks in your project to do with task timing, what happens if someone (including Derek and I) is unavailable, budget over-runs (should be minimal risk but you can still mention it).
Aside from the engineering applications (where small-world networks are important), the other engineering in this project is software engineering. You should have a block diagram showing the main parts of the software, i.e. graph file format converter, graph drawer, network measure calculators, hierarchy algorithm, and matlab analysis (graphing and statistics), and my software for converting the email data set into a graph file. For some of these you can then include what sub-modules they have, e.g. the network measure calculator might have sub-modules for degree distribution, for clustering coefficient. You should also have a flow diagram showing the data flow between these modules, and in the critical design review give details of the algorithms. You can software you find on the web but you will need to acknowledge it in your proposal report, make sure you understand how it works, and test it. It may be easier though, to write the software yourself then writing a file format converter to get other software to work.
The Critical Design Review can just be emailed directly to myself and Derek on the due date, and you only need one for your group. For the final report you need to write one each (please don't copy each other's writing but write it in your words - you can copy equations, flow charts, graphs, diagrams, reference list, software though. Derek and I are happy to read draft copies.
Worked examples
Directed graph
http://www.berrymanconsulting.com/images/SimpleDirectedGraph.jpg
Pajek .net file
*vertices 4 1 "1" 2 "2" 3 "3" 4 "4" *arcslist 1 2 4 2 3 4 3 1 4 1 2
Note that in my drawing done with different software I have drawn the bidirectional edges as two separate edges, whereas Pajek draws them as a single edge with arrows on both ends.
In-degree of nodes:
- Node 1 has 2 incoming edges
- 2
- 1
- 2
So the in-degree distribution is:
- 1
- 3
(i.e. the number of edges with 1 incoming edge is 1, and the number of edges with 2 incoming edges is 3) Average in-degree is (1*1+3*2)/4=1.75
Out-degree of nodes:
- 2
- 2
- 1
- 2
So the out-degree distribution is:
- 1
- 3
Average out-degree is (1*1+3*2)/4=1.75 Note in general that for a directed graph, the out-degree distribution is not the same as the in-degree.
To calculate the clustering coefficients we first need to find the neighbourhoods of each node. The neighbourhood of node i, is all the other nodes that either have an edge from node i or to node i.
The neighbourhood of node 1 is {2,3,4}, the neighbourhood of node 2 is {1,3,4}, the neighbourhood of node 3 is {1,2} and the neighbourhood of node 4 is {1,2}.
We also need the total degree of each node = in-degree + out-degree, thus
- 4
- 4
- 2
- 4
Using the notation from here, these are the k_i (k underscore i).
We also need the number of edges between nodes j and k in the neighbourhood of i, i.e. for node 1, we count all the edges between nodes 2, 3, and 4 (the neighbourhood of 1). The counts are:
- 3
- 3
- 1
- 1
So the clustering coefficient for each node is the count divided by k_i(k_i - 1), i.e.
- 3/(4*3) = 1/4
- 3/(4*3) = 1/4
- 1/(2*1) = 1/2
- 1/(4*3) = 1/12
The clustering coefficient for the graph as a whole is just the average of the clustering coefficients of the nodes, i.e. (1/4 + 1/4 + 1/2 + 1/12) / 4 = 13/48 approx. 0.271
Undirected Graph
http://www.berrymanconsulting.com/images/SimpleUndirectedGraph.jpg
Pajek .net file:
*vertices = 5 1 "1" 2 "2" 3 "3" 4 "4" 5 "5" *edges 1 2 1 3 2 3 2 4 2 5 3 5 4 5
Degree:
- 2
- 4
- 3
- 2
- 3
Degree distribution:
- 0
- 2
- 2
- 1
Average degree = (2*2 + 2*3 + 4)/5 = 14/5 = 2.8
Neighbourhoods:
- {2,3}
- {1,3,4,5}
- {1,2,5}
- {2,5}
- {2,3,4}
Number of edges within neighbourhoods:
- 1
- 3
- 2
- 1
- 2
Clustering coefficients
- 2*1/(2*1) = 1
- 2*3/(4*3) = 1/2
- 2*2/(3*2) = 2/3
- 2*1/(2*1) = 1
- 2*2/(3*2) = 2/3
Average clustering coefficient = (1 + 1/2 + 2/3 + 1 + 2/3) / 5 approx. 0.767
Reading list
Simulations
Data sets
- Enron email data
- My conversion of data from Andres Corrada-Emmanuel's Enron research page
- Network data compiled by Mark Newman. Most of the data is in the GML format. This data includes the karate club social data.
Software
- Pajek
- Software for random hierarchical graphs
- Pajek file format (also supported by JUNG)
- JUNG - Java Universal Network / Graph framework
- To use JUNG you will also need:
- COLT jar file
- Xerces (only if you want to read Graph ML files).
- Eclipse Java development environment for Windows
- Tutorial on using Eclipse to start a new Java project
- Note that this window has the option for External JAR files (which you can use to include the COLT and Xerces files).