
I'm very new so bear with me.

What specific NLP techniques should I consider for three tasks that extract information from researchers' summary text documents?

(1) Identify the researcher, the name of the research project, and the sponsoring institution. [I believe this falls within the entity recognition NLP task.]

(2) Identify the research project's phases (planning, research design, research analysis, research dissemination) and the elements within each phase, e.g. planning: problem/opportunity described, hypothesis and/or question(s), purpose, significance, anticipated results, and limitations and challenges. [I believe this requires topic identification.]

(3) Identify keywords/phrases within the research summary text document to facilitate matching and user searches.

Lastly, are there particular open-source toolkits that will accomplish these tasks, e.g. Mallet, the Stanford University modeling toolbox, the GenSim topic model, Apache Mahout, NLTK, WEKA?

Thank you

asked Apr 26 '13 at 14:21


Matthew Reed


2 Answers:

Thank you Eugene.

I shall review the links/documents you provided. I believe the XML files associated with each research project and its summaries/abstracts should be available and should supply at least some of the keywords and entity names, reducing the amount of NLP processing needed.

Again thank you.

answered Apr 27 '13 at 20:26


Matthew Reed

There's been a good amount of work on streamlining grants, assigning papers to referees, measuring diversity at conferences, providing more helpful "Categories and Subject Descriptors" and "Keywords", etc. As with general indexing, there are quite a few features to select from: the raw text of the paper, outbound citations, and inbound citations to the researcher's prior work, among others.

http://www.cs.cmu.edu/~dshahaf/kdd2012-shahaf-guestrin-horvitz.pdf

http://www.ics.uci.edu/~newman/

http://www.cs.toronto.edu/~lcharlin/papers/framework_for_optimizing_paper_matching.pdf

http://www.reddit.com/r/MachineLearning/comments/1bd95f/using_ml_to_assign_conference_abstracts_to/
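For the keyword/phrase task (3), using only the "raw text" feature mentioned above, here is a minimal sketch of TF-IDF keyword ranking in plain Python. It is not taken from any of the linked papers; the toy summaries, the stopword list, and the function names `tokenize` and `tfidf_keywords` are all made up for illustration. A real pipeline would more likely rely on a toolkit's tokenizer and a proper stopword list.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for research summaries (an assumption).
SUMMARIES = [
    "We study topic models for grant proposals and topic drift over time.",
    "This project measures citation networks among grant recipients.",
    "We design a survey on classroom learning and analyse survey responses.",
]

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"we", "a", "an", "and", "the", "for", "on", "of",
             "this", "over", "among", "to", "in"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Score = relative term frequency * log(N / document frequency).
        scores = {
            w: (count / len(tokens)) * math.log(n_docs / df[w])
            for w, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:top_k])
    return results


if __name__ == "__main__":
    for summary, keywords in zip(SUMMARIES, tfidf_keywords(SUMMARIES)):
        print(keywords, "<-", summary)
```

Terms that recur within one summary but are rare across the collection rank highest, which is why a word shared by several summaries (here "grant") scores low. The same per-document term weights could also serve as a starting point for matching summaries against user queries.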

answered Apr 27 '13 at 17:57


eugene tani

edited Apr 27 '13 at 18:04


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.