Hi. If I have a continuous-valued feature, say price, I can apply all sorts of transformations to it, like mean normalization. But when I have a discrete feature (let's say some sort of IDs), how do I do these steps? Say I am doing linear regression with price and some ID as features. Then my learned function is weight_1 * price + weight_2 * id? That doesn't sound right. Any pointers will be appreciated.

asked Mar 26 '12 at 18:02


Mohitdeep Singh


2 Answers:

Encode the categorical feature as a set of expanded boolean / binary indicator variables, one per category:

"colors" in ["red", "blue", "green"]

becomes:

"colors=red" in [0, 1]
"colors=blue" in [0, 1]
"colors=green" in [0, 1]

Then you can proceed with regular feature normalization techniques (e.g. variance scaling, whitening). Note that if you have a categorical feature with high cardinality (e.g. "some sort of ids"), this binary representation will greatly increase the number of features. In that case you will have to use a sparse matrix representation for your dataset, and you should do any variance scaling without centering the data: subtracting the mean would turn the zeros into non-zero values, destroying the sparsity and exploding the memory usage.
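
To make the sparse case concrete, here is a minimal sketch in plain Python, not taken from the original answer. It assumes each example is stored as a dict holding only its non-zero indicator columns; the ID values (`u17`, `u42`, `u99`) and the helper name `scale_without_centering` are made up for illustration. Each column is divided by its standard deviation but the mean is never subtracted, so the implicit zeros stay zeros.

```python
from collections import defaultdict

def scale_without_centering(sparse_rows):
    """Divide every stored value by its column's standard deviation.

    Rows are dicts {column_name: value}; missing columns are implicit zeros.
    The mean is only used to compute the variance, it is never subtracted,
    so no new non-zero entries are created.
    """
    n = len(sparse_rows)
    total = defaultdict(float)      # per-column sum of values
    total_sq = defaultdict(float)   # per-column sum of squared values
    for row in sparse_rows:
        for col, val in row.items():
            total[col] += val
            total_sq[col] += val * val

    std = {}
    for col in total:
        mean = total[col] / n
        var = total_sq[col] / n - mean * mean
        # Guard against constant columns (zero variance).
        std[col] = var ** 0.5 if var > 0 else 1.0

    return [{col: val / std[col] for col, val in row.items()}
            for row in sparse_rows]

# Hypothetical high-cardinality feature: one indicator column per distinct ID.
ids = ["u17", "u42", "u17", "u99"]
sparse_rows = [{"id=" + v: 1.0} for v in ids]
scaled = scale_without_centering(sparse_rows)
```

Every row still stores a single entry after scaling, whatever the number of distinct IDs. In practice a sparse matrix type from a numerical library would typically replace the dicts; the dicts here only keep the sketch self-contained.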

answered Mar 26 '12 at 19:11


ogrisel

edited Mar 26 '12 at 19:11

Hi. I am not sure I understood. In your example, instead of one feature color: (some value), I will now have something like feature:color --> red 1, blue 0, green 0 (if I have to indicate that the current example has the color red)? Or is it something like representing it as the vector 100? And then how would I apply a feature transformation to it? Say I have 3 examples whose color attributes are red, blue and green. How would I do variance scaling on this? Thanks.

(Mar 27 '12 at 00:26) Mohitdeep Singh
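
For the three-example case asked about in this comment, a small sketch (an illustration, not part of the original thread) could look like the following. Each example becomes a 0/1 vector with one position per color, each column is then divided by its own standard deviation, and the regression gets one weight per indicator column rather than a single weight multiplied by an ID. The function names, the weights and the price passed to `predict` are hypothetical.

```python
CATEGORIES = ("red", "blue", "green")

def encode_color(color, categories=CATEGORIES):
    """Return the indicator vector, e.g. "red" -> [1.0, 0.0, 0.0]."""
    return [1.0 if color == c else 0.0 for c in categories]

def column_std(rows, j):
    """Population standard deviation of column j."""
    n = len(rows)
    mean = sum(r[j] for r in rows) / n
    return (sum((r[j] - mean) ** 2 for r in rows) / n) ** 0.5

examples = ["red", "blue", "green"]
X = [encode_color(c) for c in examples]

# Variance scaling: divide each column by its standard deviation.
# Nothing is subtracted, so the zeros remain zeros.
stds = [column_std(X, j) for j in range(len(CATEGORIES))]
X_scaled = [[v / s for v, s in zip(row, stds)] for row in X]

def predict(price, color_row, w_price, w_colors, bias=0.0):
    """Linear model: bias + w_price * price + sum_c w_c * [color == c]."""
    return bias + w_price * price + sum(w * x for w, x in zip(w_colors, color_row))

# Hypothetical weights, only to show the shape of the learned function.
y_red = predict(10.0, X[0], w_price=2.0, w_colors=[1.0, 5.0, 7.0])
```

So the example with color red is indeed the vector 1 0 0, and only the weight attached to "colors=red" contributes to its prediction.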

Alright, I discussed this in detail on Quora and finally understood. Just to share back with the community, here is the link: http://www.quora.com/Machine-Learning/What-are-good-ways-to-deal-with-problems-where-you-have-both-discrete-and-continous-features

Thanks for all the help.

answered Mar 28 '12 at 17:30


Mohitdeep Singh

edited Mar 28 '12 at 17:32

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.