Revision history
Revision n. 1

Jul 08 '10 at 11:32

Andrew Montalenti

Check out Scrapy:

http://scrapy.org/

It's perfect for large-scale scraping tasks. We use it for all sorts of one-time scraping tasks at my startup, http://parse.ly. It usually takes about an hour to write a scraper for a big site, and the crawls then run quickly because Scrapy is built on Twisted, Python's event-driven networking framework. It also comes with a nice web-based console for monitoring crawl jobs in progress.
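To give a feel for why evented IO makes crawls quick, here is a rough sketch. It is not Scrapy or Twisted code: it uses Python's standard-library asyncio as a stand-in for Twisted, assumes Python 3.7 or later, and fakes the downloads with a short sleep. The URLs, the `fetch`, `crawl` and `timed_crawl` names, and the latency value are all made up for illustration.

```python
import asyncio
import time

# Hypothetical page list; no real network requests are made.
URLS = [f"http://example.com/page/{i}" for i in range(20)]
SIMULATED_LATENCY = 0.05  # seconds each fake "download" spends waiting


async def fetch(url):
    # Stand-in for a network call: while this coroutine waits,
    # the event loop is free to make progress on other downloads.
    await asyncio.sleep(SIMULATED_LATENCY)
    return url, f"<html>fake body for {url}</html>"


async def crawl(urls):
    # Start every download at once and collect results in input order.
    return await asyncio.gather(*(fetch(u) for u in urls))


def timed_crawl(urls):
    # Run the crawl and report how long the whole batch took.
    start = time.perf_counter()
    pages = asyncio.run(crawl(urls))
    return pages, time.perf_counter() - start


if __name__ == "__main__":
    pages, elapsed = timed_crawl(URLS)
    sequential = SIMULATED_LATENCY * len(URLS)
    print(f"{len(pages)} pages in {elapsed:.2f}s "
          f"(one at a time would take about {sequential:.2f}s)")
```

Because the waits overlap, the whole batch should finish in roughly the time of one download rather than twenty. A real crawler would also want limits on concurrency and politeness delays, which this sketch leaves out.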

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.