  • I am using Python with scikits (scikit-learn)
  • I am working on a logistic regression problem
  • The results I get are as follows (0 means the driver is not alert, 1 means the driver is alert):

    C = 1.9

    Cross-validation set
               precision    recall  f1-score   support

              0      0.83      0.73      0.78     51003
              1      0.82      0.89      0.85     69863

    avg / total      0.82      0.82      0.82    120866

    Test set
               precision    recall  f1-score   support

              0      0.82      0.73      0.77     50590
              1      0.82      0.89      0.85     70276

    avg / total      0.82      0.82      0.82    120866

  • To learn what's going wrong, I need to understand whether the model suffers from high variance or high bias, and then take the necessary steps

  • What are good ways to test this? I am very new to the field, so I don't know much about it

asked Feb 22 '12 at 10:16

daydreamer


2 Answers:

From a practical perspective, what you really want to know is whether you are overfitting the training data. A good way to test this is to compare your mean error on the training set with your mean error on the test set. If the two are about the same, you are probably not overfitting, although you may be underfitting (high bias), particularly if both errors are high. If the training error is significantly lower than the test error, that indicates overfitting (high variance). Ideally, your model should be as complex as possible (e.g., the regularization should be as weak as possible) while keeping the training and test errors about equal to each other.
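As a rough illustration of that comparison, here is a minimal sketch in plain Python. The helper names `error_rate` and `diagnose`, the `gap_tolerance` threshold of 0.02, and the toy labels are all assumptions made up for this example; they are not part of scikit-learn, and a sensible threshold depends on your data.

```python
def error_rate(y_true, y_pred):
    """Return the fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute an error rate on empty input")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def diagnose(train_error, test_error, gap_tolerance=0.02):
    """Compare training and test error (gap_tolerance is a hypothetical threshold)."""
    gap = test_error - train_error
    if gap > gap_tolerance:
        return "possible overfitting: training error is clearly below test error"
    return "no clear overfitting: if both errors are high, suspect underfitting"


# Toy labels and predictions, invented purely for illustration.
y_train_true = [0, 1, 1, 0, 1, 1, 0, 1, 0, 1]
y_train_pred = [0, 1, 1, 0, 1, 1, 0, 1, 0, 0]
y_test_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_test_pred = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]

train_error = error_rate(y_train_true, y_train_pred)
test_error = error_rate(y_test_true, y_test_pred)
print("train error:", train_error)
print("test error:", test_error)
print(diagnose(train_error, test_error))
```

With a fitted scikit-learn classifier you would pass `clf.predict(X_train)` and `clf.predict(X_test)` as the predictions. Note that the question reports cross-validation and test scores only, so the training-set error would still need to be computed before this comparison can be made.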

answered Feb 22 '12 at 11:05

Kevin Canini

Let me repeat here the answer I gave on Quora. You can estimate the amount of remaining bias vs. variance by plotting the learning curves, as demonstrated in this gist. The idea comes from "Practical advice for applying machine learning", a blog post that summarizes practical tips and tricks from the ml-class.org online class.
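The gist itself is not reproduced here, so the following is only a sketch of the same idea, assuming a recent scikit-learn where `learning_curve` lives in `sklearn.model_selection`. The synthetic data from `make_classification` is a stand-in for the driver-alertness features, and the choices of `cv=5`, five training sizes, and `max_iter=1000` are illustrative assumptions; only `C=1.9` comes from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in data (an assumption): replace with your own X and y.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# C=1.9 is the value from the question; max_iter is raised to help convergence.
model = LogisticRegression(C=1.9, max_iter=1000)

# Fit the model on growing subsets of the training folds and score each fit
# on both the subset it was trained on and the held-out fold.
sizes, train_scores, val_scores = learning_curve(
    model,
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    scoring="accuracy",
)

# Convert accuracy to error and average over the cross-validation folds.
train_err = 1.0 - train_scores.mean(axis=1)
val_err = 1.0 - val_scores.mean(axis=1)

for n, tr, va in zip(sizes, train_err, val_err):
    print(f"n={int(n):5d}  train error={tr:.3f}  validation error={va:.3f}")
```

Reading the result: if both errors converge to a similar but high value as the training size grows, the remaining problem is mostly bias, and more data is unlikely to help. If a clear gap persists between a low training error and a higher validation error, the problem is mostly variance, and more data or stronger regularization (a smaller `C`) may help. Plotting `train_err` and `val_err` against `sizes` gives the learning curves the answer refers to.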

answered Feb 22 '12 at 10:51

ogrisel



User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.