Hi everyone, I'm studying simple machine learning algorithms, beginning with plain gradient descent, but I'm having trouble implementing it in Python.

Here is the example I'm trying to reproduce. I have data about houses (living area in square feet and number of bedrooms) with the resulting price:

Living area (feet²) : 2104

#bedrooms : 3

Price (1000$s) : 400

I'm trying to do a simple regression using gradient descent, but my algorithm won't work... The code avoids vectors on purpose (I'm trying to understand it step by step).

i = 1
import sys
derror=sys.maxint
error = 0
step = 0.0001
dthresh = 0.1
import random

theta1 = random.random()
theta2 = random.random()
theta0 = random.random()
while derror>dthresh:
    diff = 400 - theta0 - 2104 * theta1 - 3 * theta2
    theta0 = theta0 + step * diff * 1
    theta1 = theta1 + step * diff * 2104
    theta2 = theta2 + step * diff * 3
    hserror = diff**2/2
    derror = abs(error - hserror)
    error = hserror
    print 'iteration : %d, error : %s' % (i, error)
    i+=1

I understand the math: I'm constructing a prediction function $$h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$ with $x_1$ and $x_2$ being the variables (living area, number of bedrooms) and $h_{\theta}(x)$ the estimated price.

I'm using the cost function ($hserror$) (for one point): $$hserror = \frac{1}{2} (h_{\theta}(x) - y)^2$$ This is a common problem, but I'm more of a software engineer and I'm learning one step at a time. Can you tell me what's wrong?

Thank you for your time.

asked Oct 01 '10 at 05:53

ogirardot
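
For reference, the update used in the loop above follows from differentiating the one-point cost with respect to each parameter. This is only a sketch of the standard derivation; $\alpha$ stands for the code's `step` and $x_0 = 1$ is an assumed constant feature for the intercept (neither symbol appears in the question itself):

$$\frac{\partial}{\partial \theta_j} \frac{1}{2} (h_{\theta}(x) - y)^2 = (h_{\theta}(x) - y)\, x_j$$

$$\theta_j := \theta_j - \alpha\, (h_{\theta}(x) - y)\, x_j = \theta_j + \alpha\, (y - h_{\theta}(x))\, x_j$$

With $y = 400$ and $x = (1, 2104, 3)$, the quantity $y - h_{\theta}(x)$ is the code's `diff`, and the three assignments to `theta0`, `theta1` and `theta2` are this rule for $j = 0, 1, 2$.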

If you're using python, consider writing your code in Theano, which can automatically compute gradients for you.

(Oct 01 '10 at 10:46) Joseph Turian ♦♦

2 Answers:

Technically your code is correct. However, the input values differ a lot in magnitude, and the value 2104 makes each update overshoot so that the algorithm jumps around too wildly. This simply means you have to use a very low learning rate. Try setting it to 1e-7 or 1e-8. In practice this is less of an issue when the dataset has more than one data point and the variables do not differ too extremely in mean and variance. Hope this helps.

answered Oct 01 '10 at 07:20

Philemon Brakel
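
To see why the learning rate matters so much here, note that for this single data point one update multiplies `diff` by `1 - step * (1 + 2104**2 + 3**2)`. The sketch below (Python 3, whereas the question's code is Python 2) just evaluates that factor; the helper name `residual_factor` and the variable `max_stable_step` are made up for illustration.

```python
# The single training point from the question, with x0 = 1 for the intercept.
x = (1.0, 2104.0, 3.0)


def residual_factor(step, x=x):
    """Factor by which diff is multiplied after one update of all thetas."""
    return 1.0 - step * sum(xi * xi for xi in x)


# A factor with absolute value above 1 means diff grows on every iteration.
for step in (1e-4, 1e-7, 1e-8):
    print('step : %g, factor : %f' % (step, residual_factor(step)))

# The iteration only shrinks diff when |factor| < 1, i.e. step < 2 / sum(x_j^2).
max_stable_step = 2.0 / sum(xi * xi for xi in x)
print('largest stable step : %g' % max_stable_step)
```

With `step = 0.0001` the factor is far outside (-1, 1), so the error blows up, while 1e-7 and 1e-8 both shrink `diff` at every iteration.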

Okay, I did as you said (thank you for telling me I'm not completely crazy :)), but now the only answer I end up with is the tuple (0, 0, 0) for (theta0, theta1, theta2). Is it because I use only one point to find three variables?

(Oct 01 '10 at 08:26) ogirardot

I actually ran your code :) and for me this is not the case when I use a sufficiently low learning rate... Are you sure you didn't change anything else? It should not matter that there is only one data point, although the problem is underdetermined (one equation, three unknowns), so there are infinitely many solutions with zero error in this case (like (400, 0, 0) for example). (0, 0, 0) is obviously not one of those though.

(Oct 01 '10 at 08:46) Philemon Brakel

you're right i don't know what happened, now i get this : iteration : 211, error : 0.982183829028, derror : 0.0930941727598 theta0 : 35, theta1 : 0, theta2 : 98

but using these thetas I end up with: 35 + 2104*0 + 98*3 = 329 (!= 400).

i guess this is the consequence of my stopping condition (error < 0.1).

(Oct 01 '10 at 09:01) ogirardot

Is there any way for me to check that this is "actually" working (graphically, for example)? Because now, even if I lower the error condition to (error < 0.0000001), I get results like this:

iteration : 375, error : 1.01978422998e-09, derror : 9.66580450426e-11

theta0 : 63, theta1 : 0, theta2 : 40

63+40*3 = 183

Even further from any solution ... i don't get it

(Oct 01 '10 at 09:05) ogirardot

You should indeed be very close to the actual score now with such a low error score. Are you sure theta1 is exactly 0? If it is only like 0.01 it still has a significant influence.

(Oct 01 '10 at 09:11) Philemon Brakel

i posted the new code i use, btw i reproduced the (0,0,0) solution, it's when i use thetas between [0:1] using random.random() values...

Something must be wrong

(Oct 01 '10 at 09:17) ogirardot

Actually you were right, it's just that Python printed integers (lol). I tried something with the following code, and it worked, as you can see from the output.

(Oct 01 '10 at 09:23) ogirardot

btw as you could guess there also was no problem with the (0,0,0) solution as it turned out to be more ( theta0 : 0.409616, theta1 : 0.188541, theta2 : 0.966485) solutions...

Sorry for being silly

(Oct 01 '10 at 09:34) ogirardot

executing this code 10 times :

import random
import sys

for x in range(10):
    i = 1
    derror = sys.maxint  # Python 2: start with a huge change so the loop runs
    error = 0
    step = 0.00000001
    dthresh = 0.0000000001

    # random starting point in [0, 100) for each parameter
    theta1 = random.random()*100
    theta2 = random.random()*100
    theta0 = random.random()*100
    while derror > dthresh:
        diff = 400 - (theta0 + 2104 * theta1 + 3 * theta2)
        theta0 = theta0 + step * diff * 1
        theta1 = theta1 + step * diff * 2104
        theta2 = theta2 + step * diff * 3
        hserror = diff**2/2
        derror = abs(error - hserror)
        error = hserror
        #print 'iteration : %d, error : %s, derror : %s' % (i, error, derror)
        i += 1
    print ' theta0 : %d, theta1 : %d, theta2 : %d' % (theta0, theta1, theta2)
    print ' done : %d' % (theta0 + 2104 * theta1 + 3*theta2)

and the result is :

 theta0 : 54, theta1 : 0, theta2 : 2
 done : 400
 theta0 : 50, theta1 : 0, theta2 : 78
 done : 400
 theta0 : 47, theta1 : 0, theta2 : 94
 done : 400
 theta0 : 66, theta1 : 0, theta2 : 81
 done : 400
 theta0 : 88, theta1 : 0, theta2 : 2
 done : 400
 theta0 : 38, theta1 : 0, theta2 : 26
 done : 400
 theta0 : 42, theta1 : 0, theta2 : 98
 done : 400
 theta0 : 0, theta1 : 0, theta2 : 14
 done : 400
 theta0 : 15, theta1 : 0, theta2 : 58
 done : 400
 theta0 : 17, theta1 : 0, theta2 : 45
 done : 400

So it works! The %d was just hiding it by truncating the floats to integers. Changing it to %f, it becomes:

 theta0 : 48.412337, theta1 : 0.094492, theta2 : 50.925579
 done : 400.000043
 theta0 : 0.574007, theta1 : 0.185363, theta2 : 3.140553
 done : 400.000042
 theta0 : 28.588457, theta1 : 0.041746, theta2 : 94.525769
 done : 400.000043
 theta0 : 42.240593, theta1 : 0.096398, theta2 : 51.645989
 done : 400.000043
 theta0 : 98.452431, theta1 : 0.136432, theta2 : 4.831866
 done : 400.000043
 theta0 : 18.022160, theta1 : 0.148059, theta2 : 23.487524
 done : 400.000043
 theta0 : 39.461977, theta1 : 0.097899, theta2 : 51.519412
 done : 400.000042
 theta0 : 40.979868, theta1 : 0.040312, theta2 : 91.401406
 done : 400.000043
 theta0 : 15.466259, theta1 : 0.111276, theta2 : 50.136221
 done : 400.000043
 theta0 : 72.380926, theta1 : 0.013814, theta2 : 99.517853
 done : 400.000043
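
The formatting effect can be reproduced in isolation. This is a small Python 3 sketch using the first triple from the %f output above; the variable names are only illustrative.

```python
# First set of parameters from the %f output in this answer.
theta0, theta1, theta2 = 48.412337, 0.094492, 50.925579

# %d truncates each float towards zero, so theta1 is displayed as 0.
as_int = 'theta0 : %d, theta1 : %d, theta2 : %d' % (theta0, theta1, theta2)
as_float = 'theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2)
print(as_int)
print(as_float)

# Recomputing the price by hand from the truncated values is far from 400...
truncated_prediction = int(theta0) + 2104 * int(theta1) + 3 * int(theta2)
# ...while the actual float parameters predict roughly 400.
full_prediction = theta0 + 2104 * theta1 + 3 * theta2
print(truncated_prediction, full_prediction)
```

Because theta1 is multiplied by 2104, dropping its fractional part removes about half of the predicted price, which is why the hand-computed sums in the comments looked so wrong.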

Thank you for your help !!! i wouldn't have managed to go this far alone !

answered Oct 01 '10 at 09:16

ogirardot

edited Oct 01 '10 at 09:26

For me it works fine (error of order 1e-9), but it will indeed walk through directions of equivalent solutions because you have too many parameters. It can increase one coefficient and lower one of the others while staying at essentially the same solution and decreasing the error only slightly.

(Oct 01 '10 at 09:37) Philemon Brakel
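
The many equivalent solutions can be made concrete. With a single data point every update moves the parameters along the same direction (1, 2104, 3), so the end point depends on the starting point. The Python 3 sketch below checks this under assumptions of my own: a fixed, hand-picked start instead of `random.random()`, a fixed number of iterations instead of the `derror` test, and a helper called `run_gd`.

```python
def run_gd(theta, x, y, step=1e-7, iters=200):
    """Plain gradient descent on one data point, same update as in the thread."""
    theta = list(theta)
    for _ in range(iters):
        diff = y - sum(t * xi for t, xi in zip(theta, x))
        theta = [t + step * diff * xi for t, xi in zip(theta, x)]
    return theta


x = (1.0, 2104.0, 3.0)    # x0 = 1 for the intercept, living area, bedrooms
y = 400.0
start = (48.0, 0.5, 51.0)  # arbitrary start, chosen by hand

theta = run_gd(start, x, y)
prediction = sum(t * xi for t, xi in zip(theta, x))

# Closed form: the start shifted along x by just enough to reach zero error.
diff0 = y - sum(t * xi for t, xi in zip(start, x))
scale = diff0 / sum(xi * xi for xi in x)
expected = [t + scale * xi for t, xi in zip(start, x)]

print(theta)
print(expected)
print(prediction)
```

In this run theta0 and theta2 barely move from their starting values and almost all of the correction goes into theta1, which fits the outputs above where theta0 and theta2 look random and theta1 is small.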
User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.