|
Hi everyone, I'm studying simple machine learning algorithms, beginning with a simple gradient descent, but I'm having some trouble implementing it in Python. Here is the example I'm trying to reproduce: I have data about houses (living area in feet² and number of bedrooms) with the resulting price: Living area (feet²): 2104, #bedrooms: 3, Price (1000$s): 400. I'm trying to do a simple regression using the gradient descent method, but my algorithm won't work... The algorithm deliberately does not use vectors (I'm trying to understand it step by step).
I understand the math: I'm constructing a prediction function $$h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$ with $x_1$ and $x_2$ being the variables (living area, number of bedrooms) and $h_{\theta}(x)$ the estimated price. I'm using the cost function ($hserror$) (for one point): $$hserror = \frac{1}{2} (h_{\theta}(x) - y)^2$$ This is a common problem, but I'm more of a software engineer and I'm learning one step at a time. Can you tell me what's wrong? Thank you for your time. |
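As a rough illustration of the setup described above (this is not the code from the original post), a non-vectorized gradient descent on this single data point might look like the sketch below. The function names, the zero initialization, the learning rate of 1e-7 and the stopping tolerance are all assumptions chosen for the example.

```python
def predict(theta, x1, x2):
    """Prediction function h_theta(x) = theta0 + theta1*x1 + theta2*x2."""
    return theta[0] + theta[1] * x1 + theta[2] * x2


def gradient_descent(x1, x2, y, learning_rate=1e-7, tolerance=1e-9,
                     max_iterations=10000):
    """Minimize 0.5 * (h_theta(x) - y)**2 for one data point, no vectors."""
    theta = [0.0, 0.0, 0.0]  # assumed starting point
    error = None
    for iteration in range(max_iterations):
        residual = predict(theta, x1, x2) - y
        error = 0.5 * residual ** 2
        if error < tolerance:
            break
        # The partial derivative of the cost with respect to theta_j is
        # residual * x_j, where x_0 = 1 for the intercept term.
        theta[0] -= learning_rate * residual
        theta[1] -= learning_rate * residual * x1
        theta[2] -= learning_rate * residual * x2
    return theta, error, iteration


theta, error, iteration = gradient_descent(2104.0, 3.0, 400.0)
print("iteration : %d, error : %g" % (iteration, error))
print("theta0 : %f, theta1 : %f, theta2 : %f" % tuple(theta))
print("prediction : %f" % predict(theta, 2104.0, 3.0))
```

The residual is computed once per iteration before any theta is changed, so all three parameters are updated from the same prediction.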
|
Technically your code is correct. However, the input values you use differ a lot in magnitude, and the value 2104 will cause the algorithm to jump around too wildly. This simply means you have to use a very low learning rate. Try setting it to 1e-7 or 1e-8. In practice this is less of an issue when you have a dataset with more than one data point and the variables do not differ too extremely in mean and variance. Hope this helped.

OK, I did as you said (thank you for telling me I'm not completely crazy :)), but now the only answer I end up with is the tuple (0, 0, 0) for (theta0, theta1, theta2). Is it because I use only one point to find three variables?
(Oct 01 '10 at 08:26)
ogirardot
I actually ran your code :) and for me this is not the case when I use a sufficiently low learning rate... Are you sure you didn't change anything else? It should not matter that there is only one data point, although the problem is underdetermined and in this case there will be infinitely many solutions with zero error (like (400, 0, 0), for example). (0, 0, 0) is obviously not one of those though.
(Oct 01 '10 at 08:46)
Philemon Brakel
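To see why the learning rate matters so much here, the sketch below runs a few update steps on the single data point for several learning rates. It assumes the squared-error cost from the question and a start at (0, 0, 0); the variable and function names are made up for the example. With one data point, each step multiplies the residual by 1 - learning_rate * (1 + x1² + x2²), so the iteration can only shrink the residual if the learning rate is below 2 / (1 + x1² + x2²).

```python
x1, x2, y = 2104.0, 3.0, 400.0

# Sum of squared inputs, counting the constant 1 that multiplies theta0.
input_scale = 1.0 + x1 ** 2 + x2 ** 2
stability_limit = 2.0 / input_scale  # learning rates above this diverge


def residual_after(steps, learning_rate):
    """Run plain gradient descent from (0, 0, 0) and return h(x) - y."""
    theta0 = theta1 = theta2 = 0.0
    for _ in range(steps):
        residual = theta0 + theta1 * x1 + theta2 * x2 - y
        theta0 -= learning_rate * residual
        theta1 -= learning_rate * residual * x1
        theta2 -= learning_rate * residual * x2
    return theta0 + theta1 * x1 + theta2 * x2 - y


print("stability limit : %g" % stability_limit)
for learning_rate in (1e-6, 1e-7, 1e-8):
    factor = 1.0 - learning_rate * input_scale
    print("learning rate %g : per-step factor %f, residual after 10 steps %g"
          % (learning_rate, factor, residual_after(10, learning_rate)))
```

For this data point the limit is roughly 4.5e-7, which is consistent with the suggestion to try 1e-7 or 1e-8.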
You're right, I don't know what happened. Now I get this: iteration : 211, error : 0.982183829028, derror : 0.0930941727598 theta0 : 35, theta1 : 0, theta2 : 98 but using these thetas I end up with: 35 + 2104*0 + 98*3 = 329 (!= 400). I guess this is the consequence of my stopping condition (error < 0.1).
(Oct 01 '10 at 09:01)
ogirardot
Is there any way for me to check (graphically, for example) that this is "actually" working? Because now, even if I lower the error condition to (error < 0.0000001), I get results like this: iteration : 375, error : 1.01978422998e-09, derror : 9.66580450426e-11 theta0 : 63, theta1 : 0, theta2 : 40 63 + 40*3 = 183 Even further from any solution... I don't get it.
(Oct 01 '10 at 09:05)
ogirardot
You should indeed be very close to the actual score now with such a low error score. Are you sure theta1 is exactly 0? If it is only like 0.01 it still has a significant influence.
(Oct 01 '10 at 09:11)
Philemon Brakel
I posted the new code I use. BTW, I reproduced the (0,0,0) solution: it happens when I initialize the thetas between 0 and 1 using random.random() values... Something must be wrong.
(Oct 01 '10 at 09:17)
ogirardot
Actually you were right, it's just that Python printed integers (lol). I tried something with the following code, and it worked, as you can see from the output.
(Oct 01 '10 at 09:23)
ogirardot
BTW, as you could guess, there was also no problem with the (0,0,0) solution: it turned out to be more like (theta0 : 0.409616, theta1 : 0.188541, theta2 : 0.966485)... Sorry for being silly.
(Oct 01 '10 at 09:34)
ogirardot
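The printing issue is easy to reproduce. The sketch below takes the theta values quoted in the comment above and formats them with %d and with %f; the format strings are an assumption modeled on the log lines in this thread. With %d, Python truncates each float to an integer, which is how a perfectly good solution can show up as (0, 0, 0).

```python
theta = (0.409616, 0.188541, 0.966485)  # values quoted in the comment above
x1, x2 = 2104, 3

# %d truncates floats toward zero, hiding the fractional part entirely.
as_integers = "theta0 : %d, theta1 : %d, theta2 : %d" % theta
# %f keeps six decimal places by default.
as_floats = "theta0 : %f, theta1 : %f, theta2 : %f" % theta

prediction = theta[0] + theta[1] * x1 + theta[2] * x2

print(as_integers)
print(as_floats)
print("prediction : %f" % prediction)
```

Even a theta1 that prints as 0 contributes most of the prediction here, because it is multiplied by 2104.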
|
|
Executing this code 10 times:
and the result is:
So it works! The %d was just obscuring it; after changing it to %f, it becomes:
Thank you for your help!!! I wouldn't have managed to get this far alone!
For me it works fine (error on the order of 1e-9), but it will indeed wander along directions of equivalent solutions because you have too many parameters. It can increase one coefficient and lower one of the others while staying at an equivalent solution and decreasing the error only slightly.
(Oct 01 '10 at 09:37)
Philemon Brakel
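The point about equivalent solutions can be checked with a small experiment. The sketch below is only an illustration: the two starting points, the learning rate and the number of steps are arbitrary assumptions. It runs the same gradient descent from two different initial thetas. Both runs fit the single data point, but they end at different parameter values, so which solution you get depends on where you start.

```python
X1, X2, Y = 2104.0, 3.0, 400.0


def predict(theta):
    return theta[0] + theta[1] * X1 + theta[2] * X2


def fit(start, learning_rate=1e-7, steps=200):
    """Plain gradient descent on the single data point from a given start."""
    theta0, theta1, theta2 = start
    for _ in range(steps):
        residual = theta0 + theta1 * X1 + theta2 * X2 - Y
        theta0 -= learning_rate * residual
        theta1 -= learning_rate * residual * X1
        theta2 -= learning_rate * residual * X2
    return theta0, theta1, theta2


solution_a = fit((0.0, 0.0, 0.0))      # hypothetical starting point
solution_b = fit((100.0, 0.0, 50.0))   # another hypothetical starting point

for name, solution in (("a", solution_a), ("b", solution_b)):
    print("solution %s : theta = (%f, %f, %f), prediction = %f"
          % ((name,) + solution + (predict(solution),)))
```

With more data points than parameters (and inputs that are not redundant), this ambiguity disappears.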
|
If you're using Python, consider writing your code in Theano, which can automatically compute gradients for you.