|
I am training a classifier using a set of observations that take on fractional values (e.g., 3/4 or 49/70). I don't have enough training data for a value like 49/70 to occur many times, so I was tempted to use some kind of binning scheme to turn these values into discrete feature values. The problem is that both the ratio and the cardinality of the numerator/denominator are important parts of each observation. Is there a principled way to convert fractions like these into features to feed into the classifier? Assume the classifier treats features independently, like Naive Bayes, so converting each observation into multiple features isn't a good option.

For example, if I have observations 3/4 (= .75), 36/50 (= .72), and 48/62 (≈ .77), I want to turn these into discrete feature values where the second and third are more similar to each other (i.e., more likely to take on the same feature value) than to the first, because their numerators/denominators are much larger. I am not particularly familiar with strategies for converting observations into discrete feature values, so any pointers would be much appreciated.
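To make the goal concrete, here is a sketch of one way to get the behavior described above: bin the ratio and the order of magnitude of the denominator jointly, and collapse the pair into a single discrete label so the classifier still sees one feature. The bin count, magnitude cap, and label format are all illustrative, not a recommendation:

```python
import math

def fraction_feature(num, den, ratio_bins=10, max_mag=3):
    """Map a fraction num/den to one discrete label combining
    the binned ratio with the denominator's order of magnitude."""
    ratio_bin = min(int(num / den * ratio_bins), ratio_bins - 1)
    mag = min(int(math.log10(den)), max_mag)  # 0 for den < 10, 1 for den < 100, ...
    return f"r{ratio_bin}_m{mag}"

# 3/4 falls in ratio bin 7 with magnitude 0, while 36/50 and 48/62
# share ratio bin 7 with magnitude 1, so the latter two collide and
# the small-denominator fraction stays distinct.
print(fraction_feature(3, 4))    # r7_m0
print(fraction_feature(36, 50))  # r7_m1
print(fraction_feature(48, 62))  # r7_m1
```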
|
Does it make sense to use the ratio and add a supplemental feature for the denominator or the order of magnitude of the denominator? |
Maybe you can describe more of what you are trying to do?
Are you trying to build a feature set to classify some complex data points, that themselves have some kind of binomial observations in them? Is the cardinality important because you have high uncertainty when your cardinality is low?
Good question, I guess I left out that important detail. The actual observations are somewhat arcane, but the important thing is that they are much more likely to have low cardinality, so the ones with high cardinality are fairly sparse. My training data is missing many x/y values, so I thought that binning would allow me to group the observation values and provide more coverage with my training data.
My first attempt used bins over the fractional values (e.g., 0 <= x/y < .1 in bin 1, etc.), but observations with low cardinality have very different class distributions than observations with high cardinality, so that didn't work well.
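One alternative to binning the raw ratio is to shrink each ratio toward a prior before binning, so low-cardinality fractions get pulled strongly toward the prior mean while high-cardinality ones barely move. A minimal sketch using Beta-style pseudo-counts; `prior_mean`, `prior_strength`, and `n_bins` are illustrative values, not tuned recommendations:

```python
def shrunken_bin(num, den, prior_mean=0.5, prior_strength=10, n_bins=5):
    """Bin the posterior-mean estimate of the ratio under a Beta-style
    pseudo-count prior.  A small denominator means the observed ratio
    is shrunk heavily toward prior_mean; a large denominator leaves it
    nearly unchanged."""
    p = (num + prior_mean * prior_strength) / (den + prior_strength)
    return min(int(p * n_bins), n_bins - 1)

# 3/4 shrinks to about .57 (bin 2), while 36/50 (~.68) and 48/62 (~.74)
# stay close to their raw ratios and land together in bin 3.
print(shrunken_bin(3, 4))    # 2
print(shrunken_bin(36, 50))  # 3
print(shrunken_bin(48, 62))  # 3
```

With this scheme the cardinality never becomes a separate feature; it only controls how much each ratio is trusted before discretization, which may sidestep the independence restriction mentioned above.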