|
Greetings to everyone! I have been trying to solve this riddle without success, so I would really appreciate it if someone could give me an answer. ;) I have this dataset (https://dl.dropbox.com/u/8546316/Electricity%20Dataset.csv) containing many characteristics of different houses, including their type of heating and the number of adults and children living in the house. In total there are 255 records. I want to use an algorithm that can be trained on this dataset to predict the electricity consumption of a house that is not in the set. I have tried several machine learning algorithms (using Weka and RapidMiner), such as linear regression and SVM. However, I got a mean absolute error of about 350, which is not good. I tried rescaling my data to the range 0 to 1 and deleting some characteristics, but I did not manage to get good results. Can you make it work? What type of preprocessing should I use, and what type of algorithm? If you are able to do that, you are unbelievable. Thanks in advance. I am waiting for the one who can solve it and help me learn from him/her :))) George |
|
How much are you paying?
@amair Haha. We have an economic crisis. It's a time for volunteering. ;) Would you like to give me a hand with this problem?
(Mar 13 '13 at 02:18)
GeorgeLewis
|
|
This dataset has several caveats that prevent you from using it directly in Weka or any other off-the-shelf ML algorithm. The features seem highly sparse (you have lots of zeros), so I would first apply a dimensionality reduction technique, such as PCA or Factor Analysis, to get a more compact representation of the data. You can then treat the problem as a logistic regression for multiple classes: train a multiclass classifier where each class is a bin covering a range of power consumption, and label each record with an integer identifying the range it falls in (that is the reason I asked whether you needed exact values or just estimates). A multiclass SVM could be used in much the same way as the multiclass logistic regression. |
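To make the binning step concrete, here is a minimal sketch of turning consumption values into integer class labels. The answer does not say how to choose the ranges, so the equal-frequency (quantile) cut points are an assumption, as are the helper names `make_bin_edges` and `to_class_label`. The consumption figures are made up for illustration and are not taken from the dataset.

```python
from bisect import bisect_right


def make_bin_edges(values, n_bins):
    """Return n_bins - 1 cut points so each bin holds roughly the same number of values.

    Equal-frequency binning is an assumption; fixed-width ranges would work too.
    """
    ordered = sorted(values)
    return [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]


def to_class_label(value, edges):
    """Map a consumption value to the integer index of the range it falls in."""
    return bisect_right(edges, value)


# Hypothetical consumption figures, NOT from the actual dataset.
consumption = [120.0, 340.0, 560.0, 800.0, 950.0, 1500.0, 2100.0, 3000.0]

edges = make_bin_edges(consumption, n_bins=4)
labels = [to_class_label(v, edges) for v in consumption]

print(edges)   # [560.0, 950.0, 2100.0]
print(labels)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

The resulting `labels` would replace the raw consumption column as the target of the multiclass classifier. A predicted class then maps back to a consumption range rather than an exact value, which fits if only an estimate is needed.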
Do you need the exact consumption or just an estimate?
Thank you, Mr @Leon, for your answer. I just need an estimate...
@Leon Palafox
Hi, I am also interested in this problem. Where does the data come from? Is it publicly available? Would it be possible for me to use it as well? If so, I might be able to help you, since I also need this kind of data for a project I am involved in, and I would therefore be able to help you during my work hours...