I am trying to match new product descriptions against existing ones. A product description looks like this: Panasonic DMC-FX07EB digital camera silver. These are the steps to be performed:

  1. Tokenize the description into attribute labels, e.g. Panasonic => Brand, DMC-FX07EB => Model, etc.
  2. Retrieve a few candidates with similar features.
  3. Select the best candidate.
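
To make steps 2 and 3 concrete, here is a minimal sketch that uses plain token overlap (Jaccard similarity) as a stand-in for "similar features". The function names, the sample catalog, and the choice of Jaccard are assumptions for illustration only; once step 1 works, the comparison would presumably be done per attribute instead of on raw tokens.

```python
def tokens(description):
    """Lowercase the description and split it into a set of whitespace tokens."""
    return set(description.lower().split())


def jaccard(a, b):
    """Token-set overlap between two descriptions, in the range [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def top_candidates(query, catalog, k=3):
    """Step 2: keep the k existing descriptions most similar to the query."""
    return sorted(catalog, key=lambda d: jaccard(query, d), reverse=True)[:k]


def best_candidate(query, catalog):
    """Step 3: pick the highest-scoring candidate, or None for an empty catalog."""
    candidates = top_candidates(query, catalog)
    return candidates[0] if candidates else None


# Hypothetical existing descriptions, invented for this example.
catalog = [
    "Canon IXUS 70 digital camera black",
    "Panasonic DMC-FX07EB camera silver",
    "Panasonic DMC-TZ3 digital camera blue",
]
query = "Panasonic DMC-FX07EB digital camera silver"
print(best_candidate(query, catalog))
```

Raw token overlap treats every token equally, so a shared word like "camera" counts as much as a shared model code, which is exactly why the attribute labels from step 1 matter.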

I am having a problem with the first step. To get Panasonic => Brand, DMC-FX07EB => Model, silver => Color, I need an index in which each token of a product description corresponds to an attribute name (brand, model, color, etc.) in the existing database. The problem is that my database stores each product description as a single atomic attribute, e.g. 'description', with no separate product attributes.
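
As an illustration of the output I am after in step 1, here is a rough sketch that assumes such an index already exists. The `ATTRIBUTE_INDEX` contents and the model-code regular expression are hypothetical and hand-written for this example; building that index from the unstructured descriptions is the part I am missing.

```python
import re

# Hypothetical hand-built index: attribute name -> known values (lowercase).
ATTRIBUTE_INDEX = {
    "brand": {"panasonic", "canon", "nikon"},
    "color": {"silver", "black", "red"},
}

# Assumed heuristic: a model code mixes letters and digits, e.g. DMC-FX07EB.
MODEL_PATTERN = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9-]+$")


def tag_tokens(description):
    """Label each whitespace token with an attribute name, or 'other'."""
    tagged = []
    for token in description.split():
        label = "other"
        for attribute, values in ATTRIBUTE_INDEX.items():
            if token.lower() in values:
                label = attribute
                break
        else:
            # No index entry matched; fall back to the model-code heuristic.
            if MODEL_PATTERN.match(token):
                label = "model"
        tagged.append((token, label))
    return tagged


for token, label in tag_tokens("Panasonic DMC-FX07EB digital camera silver"):
    print(f"{token} => {label}")
```

Tokens that match neither the index nor the heuristic (here "digital" and "camera") are labeled "other", which shows how much depends on the coverage of the index.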

Basically, I don't have training data, so I am trying to build an index of all product attributes that I can use to create training data. Any suggestions? Is there a better approach?

P.S. For every product I have a manually matched product description, which is also stored as a single atomic attribute.

asked Dec 01 '14 at 05:42

Dzeno

edited Dec 01 '14 at 07:14

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.