Dear all,

I was wondering whether there are any papers (or technical reports, blog posts, ...) about template engineering for CRFs. One of the most critical parts of using a CRF is telling the graphical model how to build features from our dataset. In other words, which relations among tokens and features we think are sensible for the classification task. In CRF++, and other similar tools, we do this by crafting a template file.
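To make the role of the template concrete, here is a minimal Python sketch of how a CRF++-style macro such as %x[row,col] can be read: the row is an offset relative to the current token and the column is an index into that token's attributes. The function name `expand`, the toy data, and the `_PAD_` placeholder are assumptions made for illustration; CRF++ performs this expansion internally and uses its own boundary symbols.

```python
import re

# Matches a CRF++-style macro such as %x[-1,0] (row offset, column index).
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")

def expand(template_line, rows, pos):
    """Expand one template line at token position `pos`.

    `rows` is a list of tokens, each a list of column values.
    Positions outside the sequence get an illustrative placeholder.
    """
    def lookup(match):
        row = pos + int(match.group(1))
        col = int(match.group(2))
        if 0 <= row < len(rows):
            return rows[row][col]
        return "_PAD_"  # hypothetical boundary marker, not CRF++'s own
    return MACRO.sub(lookup, template_line)

# Toy sequence: column 0 is the word, column 1 is a POS tag.
rows = [["He", "PRP"], ["reckons", "VBZ"], ["the", "DT"]]

print(expand("U01:%x[-1,0]", rows, 1))          # U01:He
print(expand("U05:%x[-1,1]/%x[0,1]", rows, 1))  # U05:PRP/VBZ
print(expand("U00:%x[-2,0]", rows, 1))          # U00:_PAD_
```

Each template line therefore yields one feature string per token position, which is why the number of template lines drives the size of the feature set.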

Currently, I am using a Python script that reads the training set and generates a template file according to some rules I coded into it. The problem is that those rules are applied to every feature without checking whether feature N is actually related to feature N+X. This produces a huge template (and therefore a huge feature set), which inevitably leads to some over-fitting.

One possibility would be to read the feature weights computed during training and smooth them a little, but this is what the regularisation parameter already tries to do.

My point: I would like to be able to write a .template file by hand, deliberately choosing what goes into it. Are there any guidelines, tricks, or suggestions for doing this? Personal experience is welcome too.

Thanks, michele.

asked Feb 04 '13 at 07:04

michele

One Answer:

I don't know of any papers on this specifically, but engineering templates for CRFs is not very different from ordinary feature engineering, and you can find examples of features in a large fraction of NLP papers. One good idea is to store the feature template configuration in a data file, such as XML, and write code to generate the templates from it, so you can easily cross-validate different configurations.
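As a rough sketch of that idea, the Python below reads a small XML configuration and emits CRF++-style template lines. The XML element names (`unigram`, `conjunction`, `bigram`), their attributes, and the function `build_template` are all made up for this example and are not part of CRF++; only the output format (U-prefixed lines with %x[row,col] macros, and a bare B line for bigram features) follows CRF++ conventions.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration schema: one element per group of features.
CONFIG = """
<templates>
  <unigram column="0" offsets="-2 -1 0 1 2"/>
  <unigram column="1" offsets="-1 0 1"/>
  <conjunction column="1" offsets="-1 0"/>
  <bigram/>
</templates>
"""

def macro(offset, column):
    # Build a CRF++-style macro, e.g. %x[-1,0].
    return "%x[{},{}]".format(offset, column)

def build_template(xml_text):
    """Turn the XML configuration into a list of template lines."""
    lines = []
    n = 0  # running id so that every unigram template gets a unique prefix
    for node in ET.fromstring(xml_text.strip()):
        if node.tag == "bigram":
            lines.append("B")
            continue
        column = int(node.get("column"))
        offsets = [int(o) for o in node.get("offsets").split()]
        if node.tag == "unigram":
            # One template line per offset in the window.
            for off in offsets:
                lines.append("U{:02d}:{}".format(n, macro(off, column)))
                n += 1
        elif node.tag == "conjunction":
            # A single template line combining several positions.
            joined = "/".join(macro(off, column) for off in offsets)
            lines.append("U{:02d}:{}".format(n, joined))
            n += 1
    return lines

template = build_template(CONFIG)
print("\n".join(template))
```

Because each column has its own window in the configuration, a cross-validation loop can generate several variants (for example, narrower windows for weakly informative columns), train on each, and keep the one that scores best on held-out data.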

answered Feb 04 '13 at 13:15

Alexandre Passos ♦

Thanks Alex.

(Feb 05 '13 at 08:10) michele

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.