The entropy of a feature's distribution across classes might be a good way to assess its relevance to your classification task.
A feature which occurs uniformly across classes has high entropy and does not really provide much information to help you in your classification.
OTOH one which occurs only in one or just a few classes has a lower entropy, i.e. is more opinionated, so might be a better training feature.
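To make that concrete, here is a minimal sketch of one way to score features this way, assuming each sample is a set of binary (present/absent) features plus a class label. The function names and the toy data are made up for illustration; nothing here comes from a particular library beyond the Python standard library.

```python
import math
from collections import Counter

def class_entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def feature_entropies(samples):
    """Map each feature to the entropy of the classes it occurs in.

    `samples` is a list of (features, label) pairs; `features` is a set.
    This layout is an assumption made for the example.
    """
    labels_by_feature = {}
    for features, label in samples:
        for f in features:
            labels_by_feature.setdefault(f, []).append(label)
    return {f: class_entropy(ls) for f, ls in labels_by_feature.items()}

# Toy data: "the" shows up in every class, "goal" and "vote" in one each.
samples = [
    ({"the", "goal"}, "sports"),
    ({"the", "vote"}, "politics"),
    ({"the", "goal"}, "sports"),
    ({"the", "vote"}, "politics"),
]
scores = feature_entropies(samples)

# Lowest entropy first, i.e. the most "opinionated" features lead the list.
ranked = sorted(scores, key=scores.get)
```

One caveat: a feature that occurs only a handful of times will tend to get a low entropy just by chance, so you probably want to combine this with a minimum-frequency cutoff.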
Eventually you could add further features by figuring out which ones help you discriminate between the class pairs in the off-diagonal cells of your confusion matrix that show a high misclassification rate.
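Finding those pairs is easy to automate. The sketch below assumes you have the true and predicted labels as two parallel lists; the helper name and the example labels are hypothetical. It ranks each (true, predicted) pair by the fraction of the true class that ended up misclassified that way.

```python
from collections import Counter

def confusion_rates(y_true, y_pred):
    """Rank (true, predicted) pairs by misclassification rate.

    The rate is the number of samples of the true class predicted as the
    other class, divided by the total number of samples of the true class.
    """
    totals = Counter(y_true)
    # Only off-diagonal cells: pairs where the prediction was wrong.
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    rates = {pair: n / totals[pair[0]] for pair, n in errors.items()}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)

# Made-up labels purely for illustration.
y_true = ["cat", "cat", "dog", "dog", "fox", "fox", "fox"]
y_pred = ["cat", "dog", "dog", "dog", "dog", "dog", "fox"]

worst_pairs = confusion_rates(y_true, y_pred)
```

The pairs at the top of `worst_pairs` are the ones to look at when hunting for new features: you want something that is present in one class of the pair and absent (or much rarer) in the other.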