|
I am working on a classification task in which we build models that detect the type of entity present in a span of text (i.e., an annotation). These models can be built from a dataset in which each instance is represented by three independent text variables:
Data Set 1: Entity type classification in spans of text.
Data Set 2: Boundary detection ("start-of-entity").
In the first example, the transition between preContext and text marks the start of an organization-type entity. In the second example, no entity is present at the transition between preContext and text, so all of the dependent variable columns are marked as zero.
I have been using basic NLP techniques such as TF-IDF, n-grams, tokenizers, stemmers, POS taggers, and stop lists for the problems above. What I really want to do now is experiment with techniques other than the ones I have already tried, but I have not been able to find any suitable ones. It seems the only way to make significant further gains is to start thinking outside the box. Could you please suggest some new techniques for solving the problems above? |
This looks like named entity recognition (NER). Conditional random fields (CRFs) are popular for that task, typically with feature sets made up of indicator features such as "previous word is X" for every word X in the vocabulary.
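
To make the "previous word is X" idea concrete, here is a minimal sketch of per-token feature extraction of the kind usually fed to a CRF. The function names, the feature keys, the example sentence, and the BIO label scheme (`B-ORG`, `I-ORG`, `O`) are my own assumptions for illustration, not something taken from your dataset.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for the token at position i (hypothetical feature set)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # capitalised words often start entities
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:].lower(),
    }
    # "previous word is X": one string-valued feature per context position
    if i > 0:
        feats["prev_word.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True               # beginning of sequence
    if i < len(tokens) - 1:
        feats["next_word.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True               # end of sequence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Made-up example: B-ORG marks the start of an organization entity,
# which corresponds to the "start-of-entity" boundary in Data Set 2.
tokens = ["She", "joined", "Acme", "Corp", "in", "2010"]
labels = ["O", "O", "B-ORG", "I-ORG", "O", "O"]
X = sentence_features(tokens)
```

Lists of feature dicts like `X`, paired with label sequences like `labels`, are the input format accepted by CRF libraries such as sklearn-crfsuite. With this framing, both the entity type and the entity start come out of a single sequence-labelling model.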