
Hi, I wish to retrieve data from the FIFA website and then analyze it from a Social Network Analysis perspective. So far I have been manually copying the data, so I've only managed to analyze the Germany vs Australia pass network and Xavi's pass network during Spain vs Switzerland.

My question is: what is the best way to retrieve the data from the website (e.g. http://bit.ly/9eU5lB)? Specifically, the data that can only be accessed once we click on the plus (+) sign.

Will Perl be able to do this? Many thanks.

Update: I downloaded the JSON-formatted file and then parsed it in Excel using VBA. I've since tried to visualize the data using Cytoscape and ORA, e.g.: http://bit.ly/b4QTMe

asked Jul 06 '10 at 22:08

Mathias Dharmawirya

edited Jul 11 '10 at 10:13


7 Answers:

FIFA.com provides statistics encoded in JSON, so all you need is a language that allows you to parse JSON data. Additionally, a web scraping library would allow you to retrieve complementary data, if needed.

Team statistics

FIFA provides overall and per-match statistics in a single JSON file, whose URI has the following pattern:

http://www.fifa.com/worldcup/statistics/teams/team=TEAM_ID/mstat.txt

When you click on the plus sign, you are shown per-match statistics; these correspond to children of the matches node.

All 32 TEAM_IDs are listed in the following paste: http://paste2.org/p/906573
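A minimal Python sketch of parsing that per-team mstat.txt file. The "matches" node comes from the answer above; the field names inside it ("opponent", "goals", "passes") are assumptions standing in for whatever keys the real file uses, so inspect a downloaded copy before relying on them.

```python
import json

# In practice you would fetch the file first, e.g.:
#   from urllib.request import urlopen
#   raw = urlopen("http://www.fifa.com/worldcup/statistics/"
#                 "teams/team=TEAM_ID/mstat.txt").read().decode("utf-8")
# Here a tiny inline sample (with made-up keys) stands in for the download:
raw = """
{
  "overall": {"goals": 4, "shots": 25},
  "matches": [
    {"opponent": "Australia", "goals": 4, "passes": 540},
    {"opponent": "Serbia",    "goals": 0, "passes": 480}
  ]
}
"""
stats = json.loads(raw)
print("overall:", stats["overall"])
for m in stats["matches"]:  # the per-match rows hidden behind the "+" sign
    print(m["opponent"], m["goals"], m["passes"])
```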

Player statistics

Each player has a unique ID (henceforth PLAYER_ID). You can first retrieve the PLAYER_IDs for each team from:

http://www.fifa.com/worldcup/statistics/teams/team=TEAM_ID/teamplayersstats.txt

Match statistics for individual players can be retrieved from:

http://www.fifa.com/worldcup/statistics/players/player=PLAYER_ID/mstat.txt
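The two URL patterns above can be glued together: pull the PLAYER_IDs from teamplayersstats.txt, then build each player's mstat.txt URL. The URL patterns are from this answer; the JSON shape used to extract the IDs (a "players" list with "id" and "name") is an assumption to be checked against a real download.

```python
import json

BASE = "http://www.fifa.com/worldcup/statistics"

def team_players_url(team_id):
    """URL of the per-team player list (pattern from the answer above)."""
    return "%s/teams/team=%s/teamplayersstats.txt" % (BASE, team_id)

def player_stats_url(player_id):
    """URL of a single player's per-match statistics."""
    return "%s/players/player=%s/mstat.txt" % (BASE, player_id)

# Assumed shape of teamplayersstats.txt (check a real download):
sample = '{"players": [{"id": 178518, "name": "XAVI"}]}'
for p in json.loads(sample)["players"]:
    print(p["name"], player_stats_url(p["id"]))
```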

answered Jul 07 '10 at 11:15

Alex Laertes

edited Jul 07 '10 at 11:21

Besides Perl's WWW::Mechanize and Python's BeautifulSoup, you can look at a list of more than 30 different web scrapers.

answered Jul 08 '10 at 12:21

Clifton Phua

I'd go with the JSON option and parse it that way, but if you're in a Ruby frame of mind, you could use Hpricot.

answered Jul 07 '10 at 11:21

Nick Ryberg

Table2Clipboard is another nice Firefox extension for copying data from a website.

As an aside, have you seen this data? http://spreadsheets.google.com/ccc?key=0AonYZs4MzlZbdENPOHJvbjlpZ21RV2VHcnptSHBiQWc&hl=en#gid=0

answered Jul 07 '10 at 05:10

Anton Ballus

I would use R for this. I posted an example here that is specifically related to scraping the FIFA website.

One of the benefits of doing this (beyond the relative ease) is that your data is now in R and can be analyzed immediately. I recommend the igraph package for the SNA.

answered Jul 07 '10 at 04:48

Shane

edited Jul 08 '10 at 13:05

You need two things to scrape a website: 1) Firebug and 2) a scraping library.

Firebug ( http://getfirebug.com/ ) is a Firefox extension that lets you identify specific regions of a page and get an address for them (called an XPath). Using it, you can figure out which addresses in a page contain which information, then plug those addresses into your scraping script to collect that information automatically.

A screen-scraping library lets you simulate a virtual user who surfs a website in some predetermined manner and, using XPath addresses, collects information from the different pages it travels to.

Most languages have a couple of scraping libraries available. Here are some:
- http://scrapy.org/
- http://mechanize.rubyforge.org/mechanize/GUIDE_rdoc.html
- http://search.cpan.org/dist/WWW-Mechanize/
You should be able to find one in whichever language you like working with.

Using Firebug plus a scraping library, you can write scripts that collect information from any website that presents it in a set of fixed templates.
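The library side of that workflow can be sketched in a few lines: once Firebug has told you where the data lives, a path expression pulls it out. This uses Python's standard-library ElementTree (which supports only a limited XPath subset, unlike the full XPath the scraping libraries above offer), and the HTML snippet is a made-up stand-in for a stats table.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, as a full scraping library would fetch it:
html = """
<table id="stats">
  <tr><td>XAVI</td><td>91</td></tr>
  <tr><td>INIESTA</td><td>67</td></tr>
</table>
"""
root = ET.fromstring(html)
rows = []
for tr in root.findall(".//tr"):  # XPath-like path expression
    cells = [td.text for td in tr.findall("td")]
    rows.append((cells[0], int(cells[1])))
print(rows)
```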

I haven't studied how the '+' buttons work, but I don't think it would be hard to scrape what they reveal.

Before you start writing a scraper, you should definitely verify that this information is not already available in XML or an RSS feed somewhere.

answered Jul 06 '10 at 23:38

Aditya Mukherji

edited Jul 06 '10 at 23:43

For simple data-scraping tasks like this you can use the BeautifulSoup library in Python.

I also highly suggest you get Firebug for Firefox so you can see which hidden URLs the page's AJAX calls are using. That big table might be pulled from a simple XML file, and Firebug will show you that. Then you can just download that file and keep it updated with Python's urllib library or a wget command.
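A small sketch of that download-and-reuse step with the standard urllib module. The URL and local filename here are placeholders, and the network call is skipped whenever a local copy already exists, so repeated runs of an analysis don't re-hit the site.

```python
import os
import json
import urllib.request

def cached_fetch(url, local_path):
    """Download url to local_path unless a copy exists; return the text."""
    if not os.path.exists(local_path):
        urllib.request.urlretrieve(url, local_path)  # network call
    with open(local_path, encoding="utf-8") as f:
        return f.read()

# Example (placeholder URL; the file parses as JSON per the answers above):
#   text = cached_fetch("http://www.fifa.com/worldcup/statistics/"
#                       "teams/team=TEAM_ID/mstat.txt", "mstat.txt")
#   stats = json.loads(text)
```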

R is also capable of handling nice data-scraping tasks.

answered Jul 06 '10 at 22:28

Mark Alen


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.