
I have a CSV file with 4 million edges of a directed network representing people communicating with each other (e.g. John sends a message to Mary, Mary sends a message to Ann, John sends another message to Mary, etc.). I would like to do two things:

  1. Find degree, betweenness and (maybe) eigenvector centrality measures for each person.
  2. Get a visualization of the network.

I would like to do this on the command line on a Linux server, since my laptop does not have much power. I have R and the statnet library installed on that server. I found this 2009 post by someone more competent than me who was trying to do the same thing and ran into problems with it. So I was wondering if anyone has pointers on how to do this, preferably step by step, since I only know how to load the CSV file and nothing else.

Just to give you an idea, this is what my CSV file looks like:

$ head comments.csv
"src","dest"
"6493","139"
"406705","369798"

$ wc -l comments.csv 
4210369 comments.csv

asked Feb 15 '11 at 17:07

Andrés Monroy Hernández

edited Feb 15 '11 at 17:14


2 Answers:

Degree should be easy. I don't know about betweenness. For eigenvector centrality, I did it for a similarly sized problem in this piece of Python code, which computes the first eigenvector for 5M edges that represent links between Wikipedia articles. I used both a randomized SVD approach and the more classical power iteration method (the method behind PageRank). Both were tractable on a single MacBook Pro (only one core is used).
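To make the two tractable pieces concrete, here is a minimal sketch in plain Python (standard library only). It is not the code referred to above: the function names `read_edges`, `degrees` and `eigenvector_centrality` are made up for illustration, the file layout is assumed to match the `"src","dest"` sample in the question, and the iteration uses the adjacency matrix plus the identity (a common trick to help convergence) rather than PageRank's damping. Convergence is not guaranteed for every directed graph, and for 4M edges a sparse-matrix library would be much faster.

```python
import csv
import io
import math
from collections import Counter, defaultdict


def read_edges(handle):
    """Yield (src, dest) pairs from a CSV file with a "src","dest" header."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    for src, dest in reader:
        yield src, dest


def degrees(edges):
    """Return (out_degree, in_degree); repeated messages are counted each time."""
    out_deg, in_deg = Counter(), Counter()
    for src, dest in edges:
        out_deg[src] += 1
        in_deg[dest] += 1
    return out_deg, in_deg


def eigenvector_centrality(edges, iterations=100, tol=1e-9):
    """Power iteration: a node's score is its own plus its in-neighbours' scores."""
    incoming = defaultdict(list)
    nodes = set()
    for src, dest in edges:
        incoming[dest].append(src)
        nodes.update((src, dest))
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # One multiplication by (A^T + I), followed by L2 normalisation.
        new = {n: score[n] + sum(score[s] for s in incoming.get(n, ()))
               for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        change = sum(abs(new[n] - score[n]) for n in nodes)
        score = new
        if change < tol:
            break
    return score


# Tiny in-memory stand-in for comments.csv (hypothetical data).
sample = '"src","dest"\n"1","2"\n"2","1"\n"3","1"\n"1","2"\n'
edges = list(read_edges(io.StringIO(sample)))
out_deg, in_deg = degrees(edges)
scores = eigenvector_centrality(edges)
```

For the real file, replace the `io.StringIO(sample)` handle with `open("comments.csv", newline="")`. The edges are materialised with `list(...)` because both functions need to iterate over them.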

For larger datasets I would probably go for the distributed implementation of the Lanczos algorithm available in Apache Mahout. This implementation requires setting up a Hadoop MapReduce cluster.

answered Feb 15 '11 at 17:42

ogrisel

The igraph library in R should easily be able to handle this. For details on how to load and process the dataset, see this post: http://www.cybaea.net/Blogs/Data/SNA-with-R-Loading-large-networks-using-the-igraph-library.html

A single machine with 8 GB of RAM should have no trouble with it.
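igraph computes betweenness with compiled code, which is what you would want at this scale. To show what that measure actually involves, here is a rough sketch of Brandes' algorithm for an unweighted directed graph in plain Python. This is not igraph's API; the function name `betweenness` is made up for illustration, repeated messages between the same pair are assumed to count as a single edge, and the running time (one breadth-first search per node) makes it suitable only for a small sample of the data, not the full 4M edges.

```python
from collections import defaultdict, deque


def betweenness(edges):
    """Unnormalised betweenness for an unweighted directed graph (Brandes)."""
    succ = defaultdict(set)
    nodes = set()
    for src, dest in edges:
        succ[src].add(dest)  # repeated messages collapse into one edge
        nodes.update((src, dest))
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in succ.get(v, ()):
                if w not in dist:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical toy network: a -> b -> c, so only b sits between others.
chain = betweenness([("a", "b"), ("b", "c")])
```

The scores are raw counts of shortest paths passing through each node; dividing by (n - 1)(n - 2) for n nodes is one common way to normalise them for a directed graph.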

answered Oct 06 '11 at 17:05

David A Shamma

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.