|
If I wanted to extract product names from text, how would I get training data? For example, 'makeup' and 'Dolce & Gabbana' are both terms that indicate something a person might buy. Running a Google search to check whether ads appear for the term seems like a good signal, but Google would ban you as a bot before long. What other ways are there to determine whether a word has commercial significance? |
|
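Have you tried Wikipedia? The D&G article is in the category Luxury Brands (see the bottom of the page), which is itself in the category brand, and so on. Following a term's categories upward therefore tells you whether it sits under a commercial category. You can download Wikipedia dumps in SQL or XML. |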
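To make the category idea concrete, here is a minimal sketch of the lookup, assuming you have already loaded the page-to-category and category-to-parent links from a dump into a dictionary. The toy `PARENTS` data, the `COMMERCIAL_ROOTS` set, and the `is_commercial` helper are all hypothetical names for illustration, and the real category names in a dump may differ.

```python
from collections import deque

# Hypothetical toy data standing in for links parsed from a Wikipedia dump:
# each key is a page or category, each value lists its parent categories.
PARENTS = {
    "Dolce & Gabbana": ["Luxury brands"],
    "Luxury brands": ["Brands"],
    "Makeup": ["Cosmetics"],
    "Cosmetics": ["Products"],
    "Photosynthesis": ["Plant physiology"],
    "Plant physiology": ["Botany"],
}

# Assumed set of top-level categories we treat as "commercial".
COMMERCIAL_ROOTS = {"Brands", "Products"}


def is_commercial(term, parents=PARENTS, roots=COMMERCIAL_ROOTS, max_depth=5):
    """Breadth-first walk up the category graph, at most max_depth levels."""
    seen = {term}
    queue = deque([(term, 0)])
    while queue:
        node, depth = queue.popleft()
        if node in roots:
            return True
        if depth == max_depth:
            continue  # do not climb any further from this node
        for parent in parents.get(node, []):
            if parent not in seen:  # category graphs can contain cycles
                seen.add(parent)
                queue.append((parent, depth + 1))
    return False


if __name__ == "__main__":
    for term in ("Dolce & Gabbana", "Makeup", "Photosynthesis"):
        print(term, is_commercial(term))
```

The depth limit matters in practice because category graphs are broad and loosely organized, so an unbounded walk tends to connect almost anything to almost anything. Terms that reach a commercial root can serve as positive training examples and the rest as candidate negatives.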