|
Hi all,
(It is ok if the prediction is accurate within 1 month either side of the sale month.) So the models' outputs and actual outcomes look like this:
What is the best way to evaluate such a model? Thanks in advance, Anton |
|
For the question of when the car will sell, my answer is that it depends on how much you know. In particular, I'm struck by this statement here:
If you have some explicit knowledge about the costs of the model's errors (i.e. that off-by-one predictions have no cost ... if that is what that statement actually means) then it is best to incorporate that knowledge to the extent possible. The more accurately your evaluation function reflects the true misclassification costs, the better. If the model is making discrete predictions (i.e. will only ever predict 0, 1, 2, ... 12) and off-by-one predictions have no cost, then an explicit cost matrix may be the best answer. You could assign no cost to the diagonal and to the entries immediately next to it, and then decide on what it means to miss by two, three, four, or whatever months. Also, if it is more expensive to over-predict than under-predict, for example, you could incorporate that as well. Does that make sense? I hope this helps. |
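To make the cost-matrix idea concrete, here is a minimal sketch in Python. It assumes the labels are the integers 0 to 12 and treats them as one ordinal scale; the function names, the linear growth of the cost beyond the tolerance, and the `over_weight` / `under_weight` parameters are all illustrative assumptions, not something given in the question.

```python
def build_cost_matrix(n_labels=13, tolerance=1, over_weight=1.0, under_weight=1.0):
    """Return cost[actual][predicted] for labels 0 .. n_labels - 1.

    Assumption: misses within `tolerance` months are free, and larger
    misses cost (miss - tolerance), scaled by a direction-dependent weight.
    """
    cost = [[0.0] * n_labels for _ in range(n_labels)]
    for actual in range(n_labels):
        for predicted in range(n_labels):
            miss = abs(predicted - actual)
            if miss <= tolerance:
                continue  # diagonal and its immediate neighbours cost nothing
            weight = over_weight if predicted > actual else under_weight
            cost[actual][predicted] = weight * (miss - tolerance)
    return cost


def mean_cost(actual, predicted, cost):
    """Average the cost-matrix entry over all (actual, predicted) pairs."""
    return sum(cost[a][p] for a, p in zip(actual, predicted)) / len(actual)


# Made-up example data: over-predicting is assumed to cost twice as much.
cost = build_cost_matrix(over_weight=2.0)
actual = [3, 5, 0, 12]
predicted = [4, 8, 0, 9]
average_cost = mean_cost(actual, predicted, cost)
print(average_cost)  # 1.5
```

If a label such as 0 has a special meaning, its row and column of the matrix can be overwritten by hand after building it; that choice is left out of the sketch.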
|
Sounds like this could be formulated as a regression problem. In general you would use something like the RMS (root mean square) error: $\sqrt{\frac{\sum_{i=1}^{n} (f(x_i) - y_i)^2}{n}}$ Sorry, I have an image of this equation, but the interface of this site fails when selecting an image to upload :( |
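For reference, the RMS error above is only a few lines of Python. This is a sketch with made-up numbers; `rmse` is a name chosen here, and the predictions play the role of f(x_i) while the actual sale months play the role of y_i.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predictions f(x_i) and targets y_i."""
    n = len(actual)
    squared_errors = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_errors / n)


# Made-up example: the errors are 4, 4, 2 and 0 months.
actual = [3, 5, 0, 12]
predicted = [7, 1, 2, 12]
error = rmse(predicted, actual)
print(error)  # 3.0
```

Because the errors are squared, one large miss counts for more than several small ones.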
|
I guess you can use some sort of logistic regression for your prediction model. In order to evaluate the accuracy of your model you can use AUC. See the following paper for more information (or any other paper in a machine learning journal): Shivani Agarwal, Thore Graepel, Ralf Herbrich, Sariel Har-Peled, and Dan Roth. Generalization bounds for the area under the ROC curve. Journal of Machine Learning Research, 6:393–425, 2005. |
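As a rough illustration of AUC, here is a sketch that computes it directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. It assumes a binary target (1 or 0) and a model that outputs a score such as a predicted probability; the function name and the data are made up.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    Assumption: labels are 1 (positive) or 0 (negative). This is O(P * N),
    which is fine for a sketch but slow for large data sets.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up example: 2 positives, 3 negatives, 4 of the 6 pairs ranked correctly.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
area = auc(scores, labels)
print(area)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.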
|
Thank you both for your suggestions. Michael, the problem with using RMSE is that 0 is quite different from 1-12 but is treated the same by this evaluation method. Mark, in formulating this as a binary prediction problem, are you suggesting that predictions should be made for a given car in a given month? So if Car 1 is predicted to sell in January then the prediction should be 1, otherwise the prediction should be 0. This is a good solution; however, I also want to give credit to models that are able to predict accurately within a month either side of the actual month.

You're quite right. You could view this as two separate problems: first decide if the car will be sold at all or not (binary problem), then, if it is going to be sold, predict when it will be sold (regression or multi-class problem). Because the predictions are per month, the second problem is currently formulated as a multi-class problem. However, looking at the nature of the problem, it would be more natural to formulate it as a regression problem. Any chance you will have more accurate data than 'per month'?
(Jul 07 '10 at 06:41)
Michel Valstar
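The two-problem view can also be used for evaluation. The sketch below assumes that label 0 means the car was not sold and 1 to 12 is the sale month, which is how the thread reads but is not stated outright; the function name, the returned keys, and the choice to score timing only on cars that were both sold and predicted to sell are illustrative decisions.

```python
def two_stage_report(actual, predicted, tolerance=1):
    """Score 'sold at all?' and 'sold when?' separately.

    Assumption: label 0 means 'not sold', labels 1..12 are sale months.
    """
    pairs = list(zip(actual, predicted))

    # Stage 1: binary problem, was the sold / not-sold decision right?
    sold_correct = sum((a > 0) == (p > 0) for a, p in pairs)

    # Stage 2: timing, only where the car sold and a sale was predicted.
    timed = [(a, p) for a, p in pairs if a > 0 and p > 0]
    hits = sum(abs(a - p) <= tolerance for a, p in timed)

    return {
        "sold_accuracy": sold_correct / len(pairs),
        "timing_hit_rate": hits / len(timed) if timed else float("nan"),
    }


# Made-up example data.
actual = [3, 5, 0, 12, 0, 7]
predicted = [4, 8, 0, 0, 2, 7]
report = two_stage_report(actual, predicted)
print(report)
```

With `tolerance=1` a prediction one month either side of the sale month counts as a hit, and a predicted 1 for an unsold car (0) is penalised in stage 1 instead of being treated as a near miss.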
|