I am trying to understand the details of the second architecture described in "Natural language processing (almost) from scratch". The paper calls it the "sentence approach": it extracts features from a local window at each position in the input sentence and then takes a max over word positions, in the standard max-pooling way. Table 9 in the paper reports part-of-speech tagging, chunking, and NER results for this max-pooling approach. For these tasks the system needs to make a prediction for each word in the sentence. How can the pooling system still make accurate predictions for these tasks when the function it implements is invariant to a reordering of the input words?

I suppose it isn't quite invariant: the weights for a single window will produce a different activation if the words in that window are reordered. But once the local window features are computed, their ordering is lost. Is this enough to get the results in Table 9? Or do the inputs get augmented in a similar way to what is done for SRL, i.e., each input position gets an extra feature giving its displacement from the position being tagged by the network?
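To make the invariance question concrete, here is a minimal NumPy sketch of the sentence approach as described above: one shared linear layer applied to the window around each word, followed by a max over word positions. The vocabulary size, embedding size, window size, hidden size, zero padding at the sentence edges, and random untrained weights are all assumptions for illustration, not the paper's setup. It checks that shuffling the per-window feature vectors leaves the pooled vector unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes and random (untrained) weights, for illustration only.
VOCAB, EMB, WIN, HID = 20, 5, 3, 8
E = rng.normal(size=(VOCAB, EMB))          # word embedding lookup table
W = rng.normal(size=(HID, WIN * EMB))      # one linear layer shared by all windows
b = rng.normal(size=HID)


def window_features(word_ids):
    """Apply the shared layer to the WIN-word window around each position."""
    x = E[word_ids]                                    # (n, EMB)
    pad = WIN // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))               # zero padding (an assumption)
    win = np.stack([xp[i:i + WIN].ravel() for i in range(len(word_ids))])
    return win @ W.T + b                               # (n, HID): one row per position


def max_pool(features):
    """Max over word positions: keeps the largest value of each hidden unit."""
    return features.max(axis=0)                        # (HID,)


sentence = np.array([3, 7, 1, 12, 5])
F = window_features(sentence)
perm = rng.permutation(len(sentence))
# Shuffling the rows (the per-window features) does not change the pooled vector.
print(np.allclose(max_pool(F), max_pool(F[perm])))  # True
```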

Although I can't find any text saying that the input features are augmented for tasks other than SRL, it seems they must be. The feature vector after pooling has a fixed size, so to make predictions for multiple positions the network has to be run once per position. For it not to predict the same thing at every position, the input has to change between runs.
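The argument above can be checked with a small NumPy sketch, assuming toy sizes, random untrained weights, zero padding at the sentence edges, and a hypothetical output layer `V` on top of the pooled vector. If the network is simply rerun once per position, with nothing in the input marking which word is being tagged, every run computes exactly the same scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes and random (untrained) weights, for illustration only.
VOCAB, EMB, WIN, HID, NTAGS = 20, 5, 3, 8, 4
E = rng.normal(size=(VOCAB, EMB))          # word embeddings
W = rng.normal(size=(HID, WIN * EMB))      # shared window layer
b = rng.normal(size=HID)
V = rng.normal(size=(NTAGS, HID))          # hypothetical output layer


def scores_without_markers(word_ids):
    """Window features -> max over positions -> tag scores. No position marker."""
    pad = WIN // 2
    xp = np.pad(E[word_ids], ((pad, pad), (0, 0)))
    win = np.stack([xp[i:i + WIN].ravel() for i in range(len(word_ids))])
    pooled = (win @ W.T + b).max(axis=0)
    return V @ pooled


sentence = np.array([3, 7, 1, 12, 5])
# "Tag" each position by rerunning the network; the input is identical every time.
scores = np.stack([scores_without_markers(sentence) for _ in range(len(sentence))])
print(np.allclose(scores, scores[0]))  # True: every word gets the same prediction
```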

asked Jan 21 '13 at 19:20

gdahl ♦

edited Jan 21 '13 at 19:42


One Answer:

See section 3.3.3: in the sentence approach there are always "additional markers in the network input," which encode each word's distance to the word of interest. For tagging tasks such as NER you need just one marker, i - pos_w, where w is the word currently being tagged; for SRL you need a second one, i - pos_v, giving the distance to the verb, as described on page 2471.
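A minimal NumPy sketch of how such markers could be wired in, under stated assumptions: each distance is clipped to a hypothetical range `MAXD` and looked up in its own small embedding table (`D_w`, `D_v`, both hypothetical names) that is concatenated to the word embedding; sizes are toy values, edges are zero-padded, and the weights are random and untrained. The network is run once per word to tag, and only the markers change between runs, so the pooled vector, and hence the prediction, can differ from position to position.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; the paper's real hyperparameters differ.
VOCAB, EMB, WIN, HID, NTAGS = 20, 5, 3, 8, 4
MAXD, DEMB = 10, 3  # assumed distance clipping range and marker-embedding size

E = rng.normal(size=(VOCAB, EMB))             # word embeddings
D_w = rng.normal(size=(2 * MAXD + 1, DEMB))   # embeddings of i - pos_w
D_v = rng.normal(size=(2 * MAXD + 1, DEMB))   # embeddings of i - pos_v (SRL only)
# One window layer per input width: word + 1 marker (tagging), word + 2 markers (SRL).
W = {d: rng.normal(size=(HID, WIN * d)) for d in (EMB + DEMB, EMB + 2 * DEMB)}
b = rng.normal(size=HID)
V = rng.normal(size=(NTAGS, HID))             # output layer: pooled vector -> tag scores


def distance_ids(n, pos):
    """Table index for i - pos at each position i, clipped to [-MAXD, MAXD]."""
    return np.clip(np.arange(n) - pos, -MAXD, MAXD) + MAXD


def windows(x):
    """Stack the WIN-row window around each position (zero padding at the edges)."""
    pad = WIN // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([xp[i:i + WIN].ravel() for i in range(len(x))])


def tag_scores(word_ids, pos_w, pos_v=None):
    """Scores for the word at pos_w; pass pos_v to add the SRL verb marker."""
    n = len(word_ids)
    parts = [E[word_ids], D_w[distance_ids(n, pos_w)]]
    if pos_v is not None:
        parts.append(D_v[distance_ids(n, pos_v)])
    x = np.concatenate(parts, axis=1)         # (n, EMB + k * DEMB)
    h = windows(x) @ W[x.shape[1]].T + b      # local window features, (n, HID)
    return V @ h.max(axis=0)                  # max over positions, then tag scores


sentence = np.array([3, 7, 1, 12, 5])
# Tagging: one network run per word; only the i - pos_w markers change.
scores = np.stack([tag_scores(sentence, p) for p in range(len(sentence))])
# SRL: same idea, with a second marker for a verb assumed to sit at position 1.
srl_scores = np.stack([tag_scores(sentence, p, pos_v=1) for p in range(len(sentence))])
print(scores.argmax(axis=1))  # one tag index per word (arbitrary: weights are untrained)
```

Using a separate lookup table per marker is only one possible encoding; the answer itself just says the markers carry the relative distances.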

There's no way to reorder the words of the sentence without disrupting some of the local features, as you observe.
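The same point as a toy NumPy check, again assuming small sizes, zero padding at the edges, and random untrained weights rather than the paper's configuration: reversing a sentence reverses the word order inside every window, so the window features change and the max-pooled vector changes with them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes and random (untrained) weights, for illustration only.
VOCAB, EMB, WIN, HID = 20, 5, 3, 8
E = rng.normal(size=(VOCAB, EMB))          # word embeddings
W = rng.normal(size=(HID, WIN * EMB))      # shared window layer (order-sensitive)
b = rng.normal(size=HID)


def pooled_features(word_ids):
    """Window features at every position, then a max over word positions."""
    pad = WIN // 2
    xp = np.pad(E[word_ids], ((pad, pad), (0, 0)))
    win = np.stack([xp[i:i + WIN].ravel() for i in range(len(word_ids))])
    return (win @ W.T + b).max(axis=0)


sentence = np.array([3, 7, 1, 12, 5])
pooled = pooled_features(sentence)
pooled_rev = pooled_features(sentence[::-1])  # same words, reversed order
# Every window now sees its words in reverse order, so the pooled vector differs.
print(np.allclose(pooled, pooled_rev))
```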

answered Dec 03 '13 at 11:44

Christopher Malon

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.