Hi, I would like to extract relevant information from HTML pages as well as from free text. I have read about several information extraction (IE) approaches and tools, and found some that might be useful, such as WHISK, RAPIER, RoadRunner, and SRV.

Has anyone tried these approaches before, or used other ones? I would appreciate any comments or reviews on this problem.

Thanks!

This question is marked "community wiki".

asked Jul 20 '11 at 03:17


khanh leo

4 Answers:

You can use a grammar description language to extract information, such as http://code.google.com/p/graph-expression/
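
To give a rough idea of what rule-based extraction looks like in general, here is a minimal Python sketch. It does not use graph-expression or its API; it only uses the standard library, and the rule (`RULE`), the helper names (`html_to_text`, `extract`) and the sample sentence pattern are all hypothetical, made up for illustration. It strips the tags from an HTML fragment and then applies one hand-written pattern to the remaining text.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    # Simplification: text chunks are joined with spaces, so a word that is
    # split by an inline tag (e.g. "foo<b>bar</b>") would be broken in two.
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical hand-written rule: "<Person> joined <Organization> in <year>".
RULE = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)+) joined "
    r"(?P<org>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*) in (?P<year>\d{4})"
)


def extract(html):
    """Return one dict per match of RULE in the text of the HTML fragment."""
    text = html_to_text(html)
    return [m.groupdict() for m in RULE.finditer(text)]


if __name__ == "__main__":
    sample = "<p><b>Jane Doe</b> joined <i>Acme Labs</i> in 2009.</p>"
    print(extract(sample))
```

A single regular expression like this is brittle. Grammar-based tools and learned systems such as the ones named in the question aim to express or induce many such rules at once, so treat this only as a picture of the basic idea.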

answered Jul 20 '11 at 04:58


yura
Thanks for your reply! Does the open-source project you recommended have any documentation or API reference? And how powerful is it for information extraction?

answered Jul 20 '11 at 09:37


khanh leo
It is pretty powerful: it is used in several commercial NLP startups, with some extensions, as a replacement for GATE.

(Jul 21 '11 at 03:33) yura

Here are my bookmarks on the topic: http://pinboard.in/u:lrwiman/t:information+extraction I've been collecting all the papers and links I've seen on the topic for the past several months. I hope that's helpful.

answered Jul 31 '11 at 02:37


Lucas Wiman
You may find this blog post helpful. There is also a large list of approaches and resources that could guide you.

answered Jul 21 '11 at 20:13


johny
edited Jul 21 '11 at 20:14

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.