I am trying to evaluate model performance (regression problem). In the literature, some authors use RMSE and others use correlation.

Is there any difference between the two approaches? Here: http://stats.stackexchange.com/questions/56302/what-are-good-rmse-values I saw that RMSE depends on the data range. Is this the only difference? Please let me know.

Thanks :)

asked Apr 11 '14 at 06:10


Sharath Chandra


One Answer:

If your goal is to actually measure the performance of a regression model (i.e., accuracy), you should be using RMSE or a similar metric.

Spearman's correlation serves more as an indication of the monotonicity of the predictions, as it ignores the distances between values and looks only at their rankings.

In other words, RMSE lets you know if things are close to where you say they are; Spearman's lets you know if they are in the right order.
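To make the distinction concrete, here is a minimal sketch in plain Python. The data are made up for illustration, and the `ranks` helper assumes there are no tied values (a full Spearman implementation would average the ranks of ties). One set of predictions is in perfect order but far from the targets; the other is close to the targets but swaps two items.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank of each value (1 = smallest). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up example data.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
right_order_far_off = [10.0, 20.0, 30.0, 40.0, 50.0]  # perfect ranking, large errors
close_wrong_order = [1.2, 1.9, 3.3, 3.1, 5.0]         # small errors, two items swapped

for name, pred in [("right order, far off", right_order_far_off),
                   ("close, wrong order", close_wrong_order)]:
    print(name, "RMSE:", round(rmse(actual, pred), 3),
          "Spearman:", round(spearman(actual, pred), 3))
```

The first set gets a perfect Spearman score despite a large RMSE, while the second has a small RMSE but an imperfect Spearman score.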

answered Apr 11 '14 at 23:01


Daniel E Margolis

Is RMSE dependent on the range of values that the predicted outputs can take? For example, if my output range is -4 to 4 and I get an RMSE of 1, what does it mean? And if the range is -100 to 100, what does an RMSE of 1 mean?

Is it guaranteed that when RMSE decreases, correlation increases?

(Apr 12 '14 at 01:06) Sharath Chandra

Yes, the RMSE value depends on the range of values, since it is a measure of error expressed in the same units as the quantity being predicted.

For example, if you are trying to predict the amount of time that a computer process will take, your error is likely to be in minutes or seconds (or less). Alternatively, if you are trying to predict the amount of time that it will take to build a house, your error may be in days or weeks.

Being incorrect by an hour is a huge error in the first example, but is almost negligible in the second. So an RMSE of thirty seconds for a computer process might mean about the same thing as an RMSE of 12 hours for building a house. Thus, RMSE really can't be used to compare models that predict different quantities, only models predicting the same thing.
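As a rough illustration of the two ranges from the question, the sketch below uses made-up targets where every prediction is off by exactly 1. Dividing RMSE by the range of the actual values is one common normalization convention, assumed here only for illustration; `range_normalized_rmse` is a hypothetical helper name, not a standard library function.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def range_normalized_rmse(actual, predicted):
    """RMSE divided by (max - min) of the actual values (an assumed convention)."""
    return rmse(actual, predicted) / (max(actual) - min(actual))

# Hypothetical targets spanning -4..4, each prediction off by exactly 1.
small_actual = [-4.0, -2.0, 0.0, 2.0, 4.0]
small_pred = [a + 1.0 for a in small_actual]

# Hypothetical targets spanning -100..100, again each prediction off by exactly 1.
large_actual = [-100.0, -50.0, 0.0, 50.0, 100.0]
large_pred = [a + 1.0 for a in large_actual]

# Same raw RMSE in both cases...
print(rmse(small_actual, small_pred), rmse(large_actual, large_pred))

# ...but a very different share of the range being predicted.
print(range_normalized_rmse(small_actual, small_pred))
print(range_normalized_rmse(large_actual, large_pred))
```

Under these assumptions, an RMSE of 1 is 12.5% of the -4 to 4 range but only 0.5% of the -100 to 100 range.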

As to your other question, when RMSE is decreasing, it means that some error in the model's predictions is decreasing. This is less an issue of correlation than of accuracy. In other words, when you say that house X will take Y days to build, how close to Y days does it actually take on average?
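One way to see that a lower RMSE need not come with a higher correlation: Pearson's r does not change when a constant is added to every prediction, but RMSE does. The sketch below uses made-up numbers; removing a constant offset of 10 from the predictions lowers RMSE substantially while leaving the correlation unchanged.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up example data.
actual = [2.0, 4.0, 6.0, 8.0, 10.0]
unbiased = [2.5, 3.5, 6.5, 7.5, 10.5]   # small errors around the targets
biased = [p + 10.0 for p in unbiased]   # same pattern, shifted up by a constant 10

print("biased   RMSE:", round(rmse(actual, biased), 3),
      "r:", round(pearson(actual, biased), 6))
print("unbiased RMSE:", round(rmse(actual, unbiased), 3),
      "r:", round(pearson(actual, unbiased), 6))
```

So the two measures can move independently: here the RMSE drops while r stays the same.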

Of course, RMSE is a bit different from the mean absolute error, as squaring the errors before they are averaged amplifies the impact of larger errors (an error of 4 becomes 16). If your issue is purely one of correlation, you could calculate Pearson's r between the model predictions and the actual values, if desired. I would only be curious as to why you were choosing correlation rather than accuracy as your measure of performance.
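The effect of squaring can be seen with two made-up sets of errors that have the same mean absolute error: four errors of 1, versus a single error of 4 and three errors of 0. This is only a small sketch with illustrative numbers.

```python
import math

def mae(errors):
    """Mean absolute error from a list of individual errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error from a list of individual errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# Made-up error sets with the same total absolute error.
spread_out = [1.0, 1.0, 1.0, 1.0]    # every prediction off by 1
one_big_miss = [0.0, 0.0, 0.0, 4.0]  # three perfect predictions, one off by 4

print("MAE: ", mae(spread_out), mae(one_big_miss))    # 1.0 1.0
print("RMSE:", rmse(spread_out), rmse(one_big_miss))  # 1.0 2.0
```

Both sets have an MAE of 1, but the single large miss doubles the RMSE.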

(Apr 12 '14 at 10:09) Daniel E Margolis

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.