
I have a simple neural network that classifies words with IOB labels, and 99%+ of my labels are O (outside any sequence). The architecture is word embeddings as the input layer, then a hidden layer of 80 neurons, and a categorical output layer of 7 neurons (O plus 3 pairs of B/I labels).

The loss drops to very small values very quickly with Adagrad and categorical cross-entropy, and I wonder whether the network can train at all on such small gradients.
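
A minimal Keras sketch of this kind of setup, in case it helps; the vocabulary size, embedding dimension, context window, and ReLU activation below are placeholder assumptions, not my real values:

    from keras.models import Sequential
    from keras.layers import Embedding, Flatten, Dense

    # Placeholder sizes -- not the actual values used in my experiments.
    vocab_size = 20000
    embedding_dim = 100
    window = 5  # assumed fixed context window of words per classified token

    model = Sequential()
    model.add(Embedding(vocab_size, embedding_dim, input_length=window))
    model.add(Flatten())                       # concatenate the window's embeddings
    model.add(Dense(80, activation='relu'))    # hidden layer of 80 neurons
    model.add(Dense(7, activation='softmax'))  # O + 3 pairs of B-/I- labels
    model.compile(optimizer='adagrad',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])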

For example, these are my training and validation loss/accuracy values after 146 epochs:

Epoch 146

276055/276058 [============================>.] - ETA: 0s - loss: 0.0065 - acc: 0.9989

Validation -- loss: 0.00444149752754 - acc: 0.99941503533

Is there anything I can do to make training effective in such a case? I have already created balanced training and validation sets (with the same number of sentences with and without interesting I/B sequences), but that doesn't solve the problem.
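
The balancing I mean is roughly the following (a hypothetical helper, assuming each sentence is a (tokens, labels) pair with IOB string labels):

    import random

    def balance_sentences(sentences, seed=0):
        """Keep equal numbers of sentences with and without I/B labels."""
        rng = random.Random(seed)
        with_spans = [s for s in sentences if any(l != 'O' for l in s[1])]
        without_spans = [s for s in sentences if all(l == 'O' for l in s[1])]
        n = min(len(with_spans), len(without_spans))
        balanced = with_spans[:n] + rng.sample(without_spans, n)
        rng.shuffle(balanced)
        return balanced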

asked Sep 17 at 12:01

Beka
