I'm building a price prediction model where product prices range from $5 to $250,000. The training prices look like the right half of a bell curve, with most prices below $500. I've oversampled to balance the set more, but the regressor still very rarely predicts toward the low end.

The other problem is that for smaller prices I care more about accuracy than for larger ones. For example, the difference between $5 and $100 is important, but the difference between $200,000 and $200,100 doesn't matter to me. To address this, I use a logarithmic scale for the prices, which seems to help. But the model still very rarely predicts below $500.

Any suggestions on approaches? I'm using scikit-learn at the moment (with linear regression or random forests). Would writing my own loss function help? Any insights or suggestions would be appreciated.

Thanks,
Ryan
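For concreteness, here is a minimal sketch of the log-scale setup described above. It assumes scikit-learn's `TransformedTargetRegressor` wrapping a `RandomForestRegressor`, and it uses synthetic, hypothetical data (not the real product prices) with a similar right-skewed shape. The forest is fit on `log(price)` and predictions are mapped back with `exp`. The last lines report how far off predictions are as a ratio, plus the share of predictions below $500.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: right-skewed prices clipped to the $5..$250,000 range
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
log_price = 5.5 + 1.2 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)
y = np.clip(np.exp(log_price), 5, 250_000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The forest is trained on log(price); predict() applies exp to map back to dollars
model = TransformedTargetRegressor(
    regressor=RandomForestRegressor(n_estimators=100, random_state=0),
    func=np.log,
    inverse_func=np.exp,
)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Absolute error in log space measures the ratio between predicted and true price,
# so $5 vs $100 counts far more than $200,000 vs $200,100
log_err = np.abs(np.log(pred) - np.log(y_test))
ratio_factor = np.exp(np.median(log_err))
share_below_500 = np.mean(pred < 500)

print("median error as a ratio:", ratio_factor)
print("share of predictions below $500:", share_below_500)
```

Comparing `share_below_500` against `np.mean(y_test < 500)` is one quick way to check whether the model under-predicts the low end.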