There are different things you can do to improve Logistic Regression, but it depends on how messy you want to get with the code. I see you are using "l1" as a penalty; why? Is your data set sparse? If not, perhaps you can get better results using l2. When you use cross validation, how many folds are you using? 2, 3, 4? You have to check whether you have a high-variance error or a high-bias error. If you have high variance, using more data or a smaller set of features might be a good idea. If you have high bias, you can try looking for more features. You can also try modifying the weight of the "penalty" parameter. If you have access to the source code, you can also try using other optimizers. Since it is Kaggle, you can always try other algorithms, like SVMs or Gaussian Processes. Here is a link with some rules of thumb to improve your results in a classification setting. (There is a concrete sketch of these suggestions just below.)

What specifically does "penalty" mean, and when should we use it? What do you mean by folds in cross validation? Also, thank you Leon for the details.
(Feb 22 '12 at 00:50)
daydreamer
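To make the suggestions above concrete, here is a minimal sketch; scikit-learn is assumed (the "l1" penalty string suggests it), and the dataset, solvers, and grid of C values are illustrative placeholders, not the asker's actual setup:

```python
# A minimal sketch of the suggestions above. scikit-learn is assumed
# (the penalty="l1" string suggests it); the dataset, solvers, and the
# grid of C values are illustrative placeholders, not the asker's setup.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# In scikit-learn, C is the inverse of the penalty weight:
# a smaller C means a stronger penalty on the weights.
for penalty, solver in [("l1", "liblinear"), ("l2", "lbfgs")]:
    for C in [0.01, 0.1, 1.0, 10.0]:
        clf = LogisticRegression(penalty=penalty, C=C, solver=solver)
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"penalty={penalty}, C={C:>5}: mean CV accuracy {scores.mean():.3f}")
```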
In regression problems (linear or logistic), you can say that the weights might tend to get out of control (keep growing) if you do a simple optimization. To prevent this, we use penalty (or regularization) terms, which "clamp" the growth of the weights. Without this clamping, your model would be a perfect fit for the training data but a terrible fit for your test data (that is called overfitting). http://en.wikipedia.org/wiki/Overfitting The clamp keeps your model flexible enough to handle new data. We call this "preventing overfitting". Basically, the model you end up with is not a perfect fit for your training data, but it will be a better fit for new data. Common examples of penalties are l1 and l2 regularization, which measure the magnitude of your parameter set in two different spaces. I hope this is clear.
(Feb 22 '12 at 00:55)
Leon Palafox ♦
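As a small illustration of this clamping (a sketch assuming scikit-learn and a synthetic dataset, not the asker's setup): stronger regularization shrinks the learned weights.

```python
# A small sketch of the "clamping" effect (scikit-learn and a synthetic
# dataset assumed): stronger regularization shrinks the learned weights.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Smaller C = stronger l2 penalty; the weight vector's norm drops.
for C in [100.0, 1.0, 0.01]:
    clf = LogisticRegression(penalty="l2", C=C, solver="lbfgs").fit(X, y)
    print(f"C={C:>6}: ||w||_2 = {np.linalg.norm(clf.coef_):.3f}")
```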
totally understood this, thank you Leon, I would try out the suggestions and will share my results. thank you again, much appreciated!
(Feb 22 '12 at 01:01)
daydreamer
Sure, good luck
(Feb 22 '12 at 01:02)
Leon Palafox ♦
BTW, I forgot: folds in cross validation are the number of parts you split your data into to do the cross checking. http://en.wikipedia.org/wiki/Cross-validation_(statistics) Usually libraries have a default of 2, but perhaps you can change that to more and see what happens.
(Feb 22 '12 at 01:04)
Leon Palafox ♦
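For instance, a quick sketch of trying different numbers of folds (scikit-learn and synthetic data assumed):

```python
# A quick sketch of changing the number of folds (scikit-learn assumed,
# synthetic data): cv=k splits the data into k parts, trains on k-1 of
# them, and validates on the held-out part, rotating k times.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=0)
clf = LogisticRegression(solver="lbfgs")

for folds in [2, 5, 10]:
    scores = cross_val_score(clf, X, y, cv=folds)
    print(f"{folds}-fold CV: {scores.mean():.3f} +/- {scores.std():.3f}")
```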
cool, thank you
(Feb 22 '12 at 01:13)
daydreamer
Could you please clarify what you mean by 'spaces' in "...l1 and l2 regularization, which measure the magnitude of your parameter set in two different spaces"?
(Feb 22 '12 at 16:36)
Vam
L2 and L1 are metrics; a regularizer is basically the sum of the magnitudes of your weights, and the magnitude of a vector is its distance to the origin. L1 and L2 define different spaces with different metrics, which means you define different operations for the distance. There are as many Lp spaces as you want, but L1 and L2 are the ones commonly used in ML settings. http://en.wikipedia.org/wiki/L1-norm
(Feb 22 '12 at 17:48)
Leon Palafox ♦
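As a tiny numeric illustration (numpy assumed, values arbitrary), the same weight vector has a different magnitude under each norm:

```python
# A tiny numeric illustration (numpy assumed, values arbitrary): the same
# weight vector has a different magnitude under the L1 and L2 metrics.
import numpy as np

w = np.array([3.0, -4.0])
print("L1 norm:", np.sum(np.abs(w)))        # |3| + |-4| = 7.0
print("L2 norm:", np.sqrt(np.sum(w ** 2)))  # sqrt(9 + 16) = 5.0
```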