I am currently working on a RandomForest-based prediction method using protein sequence data. I have generated two models: the first model (NF) uses a standard set of features, and the second model (HF) uses hybrid features. I have calculated the Matthews Correlation Coefficient (MCC) and accuracy for both, and these are my results:

Model 1 (NF): Training Accuracy - 62.85%, Testing Accuracy - 56.38%, MCC - 0.1673

Model 2 (HF): Training Accuracy - 60.34%, Testing Accuracy - 61.78%, MCC - 0.1856

Since there is a trade-off in accuracy and MCC between the models, I am unsure about their predictive power. Could you please share your thoughts on which model I should consider for further analysis?
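For readers unfamiliar with the two metrics, here is a minimal sketch of how accuracy and MCC can be computed from the four cells of a binary confusion matrix. The function names and the example counts are hypothetical and are not taken from the models in the question; the formulas are the standard definitions.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when any row or column sum is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only (not the poster's data).
tp, tn, fp, fn = 60, 55, 45, 40
print("accuracy:", accuracy(tp, tn, fp, fn))
print("MCC:", mcc(tp, tn, fp, fn))
```

Because MCC uses all four cells, it is less sensitive to class imbalance than accuracy, which is one reason the two metrics can rank models differently.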

asked Nov 30 '10 at 16:07

Khader Shameer

edited Dec 02 '10 at 11:21

According to train and test accuracy, it looks like Model 1 is overfitting. What is MCC?

(Dec 01 '10 at 22:56) Frank
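The overfitting observation in the comment above can be made concrete by comparing the gap between training and testing accuracy for each model. The sketch below uses the figures from the question and assumes all accuracies are percentages on the same scale; the helper name `generalization_gap` is hypothetical. A large positive gap is only a rough indicator of overfitting, not a statistical test.

```python
# Figures as reported in the question (accuracies assumed to be percentages).
results = {
    "Model 1 (NF)": {"train": 62.85, "test": 56.38, "mcc": 0.1673},
    "Model 2 (HF)": {"train": 60.34, "test": 61.78, "mcc": 0.1856},
}


def generalization_gap(train_acc, test_acc):
    """Training accuracy minus testing accuracy, in percentage points."""
    return train_acc - test_acc


gaps = {
    name: round(generalization_gap(r["train"], r["test"]), 2)
    for name, r in results.items()
}

for name, gap in gaps.items():
    print(f"{name}: gap = {gap:+.2f} points, MCC = {results[name]['mcc']}")
```

With a single train/test split, differences of a few points may be within sampling noise, so repeating the comparison across cross-validation folds would give a more reliable picture.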

Frank, thanks for your suggestion. MCC = Matthews correlation coefficient.

(Dec 02 '10 at 13:04) Khader Shameer

I have cross-posted this question to Cross Validated: http://stats.stackexchange.com/questions/5093/statistical-validation-of-randomforest-models

(Dec 03 '10 at 00:26) Khader Shameer

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.