I'm building a price prediction model where product prices range from $5 to $250,000. The training prices are distributed like the right half of a bell curve, with most prices below $500. I've oversampled to balance the set, but the regressor still very rarely predicts toward the low end. The other problem is that I care more about accuracy for small prices than for large ones. For example, the difference between $5 and $100 is important, but the difference between $200,000 and $200,100 doesn't matter to me. To address this, I use a logarithmic scale for the prices, which seems to help, but the model still very rarely predicts below $500. Any suggestions on approaches? I'm using scikit-learn at the moment (with linear regression or random forests). Would building my own loss function help?
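
To make the log-scale point concrete, here is a minimal sketch in plain Python that compares squared error on the raw dollar scale with squared error on log prices, using the two price pairs from the example above. The function names `squared_error` and `squared_log_error` are made up for illustration and are not scikit-learn APIs; treating each pair as (actual, predicted) is also just an assumption for the demonstration.

```python
import math


def squared_error(actual, predicted):
    """Squared error on the raw dollar scale."""
    return (actual - predicted) ** 2


def squared_log_error(actual, predicted):
    """Squared error after taking the natural log of both prices."""
    return (math.log(actual) - math.log(predicted)) ** 2


# The two price pairs from the question, read as (actual, predicted).
cheap = (5.0, 100.0)
expensive = (200_000.0, 200_100.0)

raw_cheap = squared_error(*cheap)
raw_expensive = squared_error(*expensive)
log_cheap = squared_log_error(*cheap)
log_expensive = squared_log_error(*expensive)

print(f"raw scale: cheap={raw_cheap:.1f}, expensive={raw_expensive:.1f}")
print(f"log scale: cheap={log_cheap:.4f}, expensive={log_expensive:.2e}")
```

On the raw scale the $200,000 vs. $200,100 miss costs slightly more than the $5 vs. $100 miss, while on the log scale the cheap miss dominates by several orders of magnitude, which matches the "relative error matters more" behavior I'm after. Fitting a regressor on log prices with ordinary squared loss is, in effect, already a loss of this second kind.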

Any insights or suggestions would be appreciated.

Thanks, Ryan

asked May 08 '14 at 17:51

Ryan Stout2

edited May 10 '14 at 13:25


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.