|
Hi. If I have a continuous-valued feature, say price, then I can apply all sorts of transformations to it, like mean normalization and so on. But when you have a discrete feature (let's say some sort of ID), how do you do all these steps? Say I am doing linear regression with price and some ID as features; then my learned function is weight_1 * price + weight_2 * id? That doesn't sound right. Any pointers will be appreciated.
|
Encode a categorical feature using expanded boolean / binary variables. For example, a single feature

    color: one of {red, green, blue}

becomes three binary features:

    color=red: 0/1, color=green: 0/1, color=blue: 0/1
Then you can proceed with regular feature normalization techniques (e.g. variance scaling, whitening). Note that if you have categories with high cardinality (e.g. "some sort of ids"), this binary representation will greatly increase the number of features. In that case you will have to use a sparse matrix representation for your dataset, and you should do any variance scaling without centering the data, so as not to destroy the zeros and explode the memory usage.

Hi. I am not sure that I understood. So in your example, instead of one feature color: (some value), I will have something like color=red 1, color=blue 0, color=green 0 if I have to indicate that the current example has red color? Or do I represent it as a vector 100? And then how would I apply a feature transformation to it? Let's say I have 3 examples whose color attributes are red, blue and green. How would I do variance scaling on this? Thanks
(Mar 27 '12 at 00:26)
Mohitdeep Singh
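To make the comment's example concrete, here is a minimal sketch using scikit-learn (my choice of tools, not specified in the answer above; the example data is made up): three examples whose colors are red, blue, and green are one-hot encoded with DictVectorizer, and the result is variance-scaled without centering, as the answer recommends for sparse data.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.preprocessing import StandardScaler

    # Three examples: one categorical feature (color) and one continuous one (price).
    examples = [
        {"color": "red",   "price": 10.0},
        {"color": "blue",  "price": 20.0},
        {"color": "green", "price": 15.0},
    ]

    # One-hot encode the categorical feature; the output is a sparse matrix
    # with one binary column per category, plus the price column passed through.
    vectorizer = DictVectorizer(sparse=True)
    X = vectorizer.fit_transform(examples)
    print(vectorizer.get_feature_names_out())
    # ['color=blue' 'color=green' 'color=red' 'price']

    # Variance scaling WITHOUT centering (with_mean=False), so the sparse
    # zeros are preserved and memory usage does not explode.
    scaler = StandardScaler(with_mean=False)
    X_scaled = scaler.fit_transform(X)
    print(X_scaled.toarray())

So each example becomes a row like [0, 0, 1, 10.0] (the "vector 100" from the comment, plus price), and scaling divides each column by its standard deviation without subtracting the mean.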
|
|
Alright, I discussed this in detail on Quora and finally understood. Just to share back with the community, here is the link: http://www.quora.com/Machine-Learning/What-are-good-ways-to-deal-with-problems-where-you-have-both-discrete-and-continous-features. Thanks for all the help!