Revision history
Revision n. 1

Jul 09 '10 at 00:49

Joseph Turian

I second the suggestion to use 80legs for simple computational crawling. You can use my code py80legsformat to grok their data from within Python.

get-theinfo is a great mailing list of data hoarders. Many times, the people on that list already have the data that you need.

I also think there would be value in asking on this site where to find a specific dataset.

Revision n. 2

Jul 12 '10 at 06:38

Joseph Turian

I second the suggestion to use 80legs for simple computational crawling. You can use my code py80legsformat to grok their data from within Python.

get-theinfo is a great mailing list of data hoarders. Many times, the people on that list already have the data that you need.

I also think there would be value in asking on this site where to find a specific dataset.

[edit: SiteScraper recently released their Python web-scraping code as webscraping and sitescraper. Given that web scraping is the author's primary freelancing specialty, I am interested in checking it out.]

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.