|
Dear all,

I was wondering if there are papers (or technical reports, blog posts, ...) about template engineering for CRFs. One of the most critical parts of using a CRF is telling the graphical model how to build features from the dataset, i.e. which relations among tokens and features we think are sensible for the classification task. In CRF++, and other similar tools, we do this by crafting a template file.

Currently I use a Python script that reads the training set and generates a template file according to some rules I coded into it. The problem is that those rules are applied to every feature without checking whether feature N is significantly related to feature N+X. This leads to huge templates (and therefore huge feature sets), which inevitably cause some over-fitting. One possibility would be to read the feature weights computed after the training phase and smooth them a little, but that is what the regularisation parameter already tries to do.

My point: I would like to be able to produce a .template file manually, i.e. with awareness. Are there any guidelines, tricks, or suggestions around for doing this? Even personal experiences are welcome.

Thanks,
michele.
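For readers unfamiliar with the file being discussed: a CRF++ template is a plain-text file of macros, where `%x[row,col]` refers to a token at a relative row offset and column in the training data. A minimal hand-written example (standard CRF++ syntax, offsets chosen here just for illustration) looks like:

```
# Unigram features: current token and a +/-1 context window, column 0
U00:%x[-1,0]
U01:%x[0,0]
U02:%x[1,0]
# A conjoined feature: previous token combined with current token
U03:%x[-1,0]/%x[0,0]

# Bigram section: a bare B generates label-bigram features
B
```

Each `U` line expands into one feature function per training position per label, so every extra line multiplies the feature set; that is why an automatically generated template with many offsets grows so quickly.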
|
There are no papers on this specifically that I know of, but engineering templates for CRFs is not very different from ordinary feature engineering, and you can find examples of features in a large fraction of NLP papers. One good idea is to store the feature-template configuration in a data file (e.g. XML) and write code to generate the templates from it, so you can easily cross-validate different configurations. Thanks, Alex.
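The suggestion above can be sketched in a few lines of Python. This is a hypothetical helper (the function name and its `window`/`column`/`pairs` parameters are illustrative, not part of CRF++): a small config drives template generation, so a cross-validation loop can sweep over window sizes instead of hand-editing the .template file.

```python
# Sketch: generate CRF++ unigram/bigram template lines from a tiny config,
# so different context windows can be compared by cross-validation.

def make_template(window=2, column=0, pairs=False):
    """Return CRF++ template text for tokens in `column` within +/- `window`.

    If `pairs` is True, also emit one conjoined previous/current feature.
    """
    lines = []
    offsets = list(range(-window, window + 1))
    for i, off in enumerate(offsets):
        # One unigram macro per context offset, e.g. U00:%x[-1,0]
        lines.append(f"U{i:02d}:%x[{off},{column}]")
    if pairs:
        # Conjoin the previous and current token into a single feature.
        lines.append(f"U{len(offsets):02d}:%x[-1,{column}]/%x[0,{column}]")
    lines.append("")   # blank line before the bigram section
    lines.append("B")  # bare B asks CRF++ for label-bigram features
    return "\n".join(lines)

print(make_template(window=1, pairs=True))
```

A cross-validation driver would then call `make_template` for each candidate window, write the result to a file, and train/evaluate with `crf_learn`; only the template that generalises best is kept, which directly addresses the over-fitting the question describes.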
(Feb 05 '13 at 08:10)
michele
|