If I wanted to extract product names from text, how would I get training data? For example, 'makeup' and 'Dolce & Gabbana' are both terms that indicate something a person might buy. Running a Google search to see whether ads appear for a term seems like a good signal, but Google would block you as a bot before long. What other ways are there to determine whether a word has commercial significance?

asked Nov 30 '11 at 11:45


Ben McCann

One Answer:

Have you tried Wikipedia? The Dolce & Gabbana article is in the category "Luxury brands" (listed at the bottom of the article), which is itself in the category "Brands", and so on up the hierarchy. You can download Wikipedia dumps in SQL or XML format.
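A minimal Python sketch of this idea, assuming you work from a pages-articles XML dump. The helper names (iter_pages, categories_in_wikitext, build_parent_map, reaches_category) are hypothetical. The category regex is a simplification: it only sees explicit [[Category:...]] links in the raw wikitext, not categories added through templates (the categorylinks SQL dump should capture those). The search walks up the category graph breadth-first with a depth limit, because the real graph is large and can contain cycles, and the further you climb the more likely you are to drift into unrelated categories.

```python
import re
import xml.etree.ElementTree as ET
from collections import deque

# Matches explicit category links in wikitext, e.g. "[[Category:Luxury brands]]"
# or "[[Category:Fashion houses|Dolce]]" (text after "|" is a sort key).
# Simplification: categories added indirectly by templates are not seen here.
CATEGORY_RE = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]", re.IGNORECASE
)


def iter_pages(xml_path):
    """Stream (title, wikitext) pairs from a pages-articles XML dump.

    Dump tags carry a version-specific namespace, so compare local names only.
    """
    for _, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag.split("}")[-1] != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = child.tag.split("}")[-1]
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        if title is not None:
            yield title, text
        elem.clear()  # drop the finished page's children to keep memory low


def categories_in_wikitext(wikitext):
    """Return the category names explicitly linked from one page's wikitext."""
    return [name.replace("_", " ") for name in CATEGORY_RE.findall(wikitext)]


def build_parent_map(pages):
    """Map each page title to its parent category titles ("Category:<name>").

    `pages` is any iterable of (title, wikitext) pairs, e.g. iter_pages(...).
    """
    return {
        title: ["Category:" + c for c in categories_in_wikitext(text)]
        for title, text in pages
    }


def reaches_category(title, target, parents, max_depth=5):
    """Breadth-first search up the category graph from `title`.

    Returns True if `target` (e.g. "Category:Brands") is an ancestor within
    `max_depth` hops. The `seen` set stops the search looping on cycles.
    """
    seen = {title}
    queue = deque([(title, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == target:
            return True
        if depth == max_depth:
            continue
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return False


if __name__ == "__main__":
    # Invented toy pages mirroring the Dolce & Gabbana example; not real dump data.
    toy_pages = [
        ("Dolce & Gabbana", "Fashion house. [[Category:Luxury brands]]"),
        ("Category:Luxury brands", "[[Category:Brands]]"),
        ("Makeup", "[[Category:Cosmetics]]"),
    ]
    parents = build_parent_map(toy_pages)
    for term in ("Dolce & Gabbana", "Makeup"):
        print(term, reaches_category(term, "Category:Brands", parents))
```

On the toy data, "Dolce & Gabbana" reaches Category:Brands and "Makeup" does not. That also shows a limitation of using a single root: a brand hierarchy catches brand names but not generic product words like "makeup", so you would likely want to search from several root categories and use the hits as (noisy) labels for training data.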

answered Dec 02 '11 at 11:22


Renaud Richardet


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.