Dear all,

I'm new to machine learning and want to clarify a few things before I dig into implementation.

We work on semantic e-commerce; more specifically, our task is to add machine-readable annotations to web pages, for example to product offerings (see http://purl.org/goodrelations/ ).

Our approach to date is to develop extensions for shop systems that extract values such as product name and price from the DB, wrap them in RDFa markup, and embed the result as an RDFa chunk at the bottom of the HTML page.

My idea is this: since we have a decent base of HTML source pages (product offerings) with matching machine-readable RDFa (or, if you prefer, the data entities such as price), one could use that to "train" a machine to extract this data automatically from unseen pages.

Do you think this approach is viable?

Thanks in advance,

Uwe

asked Aug 09 '10 at 10:44

semantium

The main question is what the penalty is if your automatic approach extracts the information incorrectly, and how expensive it is to manually correct the automatic extraction.

(Aug 09 '10 at 11:10) Joseph Turian ♦♦

One Answer:

This approach is viable, yes. This is called information extraction in the machine learning community, and you can find a good reference here. I think there is software out there that would make it easy to duplicate these results.
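
To make the idea concrete, here is a minimal sketch of how pages with already-known values could be turned into labeled training examples. It is only an illustration under assumptions: the sample page, the field names, and the helper names (`TextNodeCollector`, `label_nodes`) are made up, and labeling by exact string match is a simplification (real prices and names would need normalization or fuzzy matching). It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not stay on the open-tag stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextNodeCollector(HTMLParser):
    """Collect (tag_path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, outermost first
        self.nodes = []  # collected (path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def label_nodes(html, known_values):
    """Label each text node with the field whose known value it equals.

    known_values maps field name -> value taken from the shop DB / RDFa
    (hypothetical input format). Nodes matching no value get the label "O".
    """
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()
    value_to_field = {value: field for field, value in known_values.items()}
    return [
        (path, text, value_to_field.get(text, "O"))
        for path, text in collector.nodes
    ]


# Hypothetical product page and the values already known for it.
page = (
    "<html><body><h1>Acme Kettle</h1><p>Great kettle.</p>"
    '<span class="price">19.99</span></body></html>'
)
known = {"name": "Acme Kettle", "price": "19.99"}

examples = label_nodes(page, known)
for path, text, label in examples:
    print(label, path, text)
```

Each resulting triple (tag path, text, label) could then serve as one training example for a classifier or sequence labeler; which features and which model to use is a separate decision that this sketch does not make.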

answered Aug 09 '10 at 11:01

Alexandre Passos ♦

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.