I am new to machine learning, but I would like some feedback on an observation. When you train a neural network for a classification task, all the examples I have seen so far update the weights even when the classification is correct (that is, when the output node with the maximum value corresponds to the correct class). The explanation seems to be that this reinforces the correct classification. However, when you train on data with class imbalance, you end up with the majority, if not all, of the classifications being of the majority class. It seems that the majority class is reinforced too much and the weight updates in favor of the minority classes are reversed.

The winnow algorithm only updates the weights when the classification is incorrect, and at first glance it seems better suited to class imbalance. I know it is designed for a different problem, but the strategy of only updating weights on an incorrect classification seems applicable, in that no class is reinforced too much and weight updates for the minority classes are not reversed.

I haven't had the patience to test it properly (more than 10 training iterations over the data with SGD), as my hard drive has an issue that makes the OS seize up after too much work, but it seems to match the class distribution better even if the accuracy isn't very good. Is there any mathematical reason why it wouldn't work this way? Or is it likely that with more training you would just end up with the same class imbalance anyway? Is it worth looking at again after the HD replacement, or should I move on?
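The mistake-driven update I have in mind looks roughly like winnow's multiplicative rule: do nothing when the prediction is correct, and only promote or demote weights on an error. A minimal sketch (assuming binary, non-negative features and two classes; the function name, threshold, and numbers are just illustrative):

```python
import numpy as np

def winnow_update(w, x, y_true, threshold, alpha=2.0):
    """Winnow-style multiplicative update, applied only on a mistake."""
    y_pred = 1 if w @ x >= threshold else 0
    if y_pred == y_true:
        return w                    # correct: weights are left untouched
    if y_true == 1:
        return w * alpha ** x       # false negative: promote active features
    return w / alpha ** x           # false positive: demote active features

# A false negative triggers a promotion step...
w = winnow_update(np.ones(3), np.array([1, 0, 1]), y_true=1, threshold=4.0)
# ...while a correct prediction leaves the weights exactly as they were.
w2 = winnow_update(np.array([5.0, 0.0, 0.0]), np.array([1, 0, 1]),
                   y_true=1, threshold=4.0)
```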

asked Jan 12 '13 at 05:57

blah blah blah


One Answer:

You can use whatever loss function you want when training a neural net. You don't have to use one that produces non-zero updates for correctly classified training cases. The behavior you are talking about stems from using the log loss, or in other words, training with a cross entropy loss.

In order to have the outputs of a neural net correspond to well calibrated probabilities, it is necessary to push the network to increase the probability of the correct class even if the decision rule used with the network would already have selected that class.

For example, imagine a training set with all the training cases in a single class (bizarre, I know, but this is just a hypothetical to illustrate my point). For concreteness, call the always correct class A and let class B refer to the other class the neural net can output. Suppose that the neural net being trained has initial random weights that assign a probability of at least 0.6 to class A for all training cases. If we use a training objective that only updates on an incorrect classification, the weights will never be updated. In this case, even though we should be certain from the training data that everything is of class A, the net will only ever produce a probability of around 60% that a training case is in class A. However, if we optimize the cross entropy loss, the net will eventually assign probability 1 to all training inputs being in class A and thus report what I would call the "correct" confidence in its labeling decision.
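To make this concrete numerically: for a softmax output with cross entropy, the gradient with respect to the logits is p − y (predicted probabilities minus the one-hot target), which is non-zero even when the argmax decision is already correct. A small sketch (the logit values are illustrative, chosen so class A gets roughly the 0.6 probability from the example above):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

logits = np.array([0.4, 0.0])         # class A already wins (p_A ≈ 0.6)
p = softmax(logits)
target = np.array([1.0, 0.0])         # true class is A
grad = p - target                     # cross-entropy gradient w.r.t. logits

# The argmax decision is already correct, yet the gradient is non-zero,
# so training keeps pushing p_A toward 1.
```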

answered Jan 13 '13 at 20:33

gdahl ♦

Thanks for the answer. Nice explanation.

(Jan 15 '13 at 04:40) blah blah blah


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.