I'm trying to execute the code below and I don't understand what I'm doing wrong. The purpose of the code is to use scikit-learn's train_test_split function to partition the data into training and testing sets.
The data (downloadable here) is rental cost data for various houses/condos, along with each house/condo's properties. Ultimately I'm trying to use predictive modeling to predict rent prices (so rent prices are the target).
I'm using pandas to import the data because NumPy's genfromtxt was causing errors, presumably because of all the null values.
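For reference, a minimal sketch of how `pd.read_csv` treats empty fields: they are parsed as `NaN` rather than raising an error. The tiny CSV and its column names (`rent`, `bedrooms`, `sqft`) are hypothetical stand-ins for 6000_clean.csv, and dropping incomplete rows is only one simple option for dealing with the nulls.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for 6000_clean.csv (column names are invented).
csv_text = """rent,bedrooms,sqft
1500,2,900
1800,,1100
2100,3,
"""

df = pd.read_csv(StringIO(csv_text))

# Empty fields become NaN; count them per column.
print(df.isnull().sum())

# One simple option: keep only rows with no missing values.
complete = df.dropna()
print(complete)
```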
Here's the code:
import pandas as pd
rentdata = pd.read_csv('6000_clean.csv')
import sklearn as sk
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_validation import train_test_split
# trying to make "a" all rows of the first column and "b" all rows of columns 2-46,
# i.e., "a" will be only target data (rent prices) and "b" will be the data.
a, b = rentdata[ : ,0], rentdata[ : ,1:46]
This produces the following error:
TypeError Traceback (most recent call last)
<ipython-input-24-789fb8e8c2f6> in <module>()
8 from sklearn.cross_validation import train_test_split
9
---> 10 a, b = rentdata[ : ,0], rentdata[ : ,1:46]
11
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
2001 # get column
2002 if self.columns.is_unique:
-> 2003 return self._get_item_cache(key)
2004
2005 # duplicate columns
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\generic.pyc in _get_item_cache(self, item)
665 return cache[item]
666 except Exception:
--> 667 values = self._data.get(item)
668 res = self._box_item_values(item, values)
669 cache[item] = res
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\internals.pyc in get(self, item)
1653 def get(self, item):
1654 if self.items.is_unique:
-> 1655 _, block = self._find_block(item)
1656 return block.get(item)
1657 else:
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\internals.pyc in _find_block(self, item)
1933
1934 def _find_block(self, item):
-> 1935 self._check_have(item)
1936 for i, block in enumerate(self.blocks):
1937 if item in block:
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\internals.pyc in _check_have(self, item)
1939
1940 def _check_have(self, item):
-> 1941 if item not in self.items:
1942 raise KeyError('no item named %s' % com.pprint_thing(item))
1943
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\index.pyc in __contains__(self, key)
317
318 def __contains__(self, key):
--> 319 hash(key)
320 # work around some kind of odd cython bug
321 try:
TypeError: unhashable type
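A short sketch of what the brackets actually pass in: Python packs `[:, 0]` into a single tuple, `(slice(None, None, None), 0)`, and `DataFrame.__getitem__` treats that tuple as a column label. The traceback shows pandas calling `hash(key)` on it, which fails because `slice` objects were not hashable in the Python version used here. The `ShowKey` class below is a hypothetical helper that just echoes its key; `.iloc` is the pandas indexer that does accept a positional `[rows, columns]` pair.

```python
import pandas as pd


class ShowKey:
    """Hypothetical helper: returns whatever key Python passes to __getitem__."""

    def __getitem__(self, key):
        return key


key = ShowKey()[:, 0]
print(key)  # (slice(None, None, None), 0)

df = pd.DataFrame({"rent": [1500, 1800], "bedrooms": [2, 3]})

# Positional [rows, columns] selection goes through .iloc instead.
print(df.iloc[:, 0].tolist())  # [1500, 1800]
```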
You can download the CSV to get a look at the data here: http://wikisend.com/download/776790/6000_clean.csv
What am I doing wrong here? Am I approaching this problem from the wrong direction, code issues aside?
asked Feb 02 '14 at 00:23 by Nick Jones
Your problem is in Pandas, not scikit-learn.
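A minimal sketch of the fix, assuming the target is in the first column and the features are in the remaining columns: select by position with `.iloc`, then pass the pieces to `train_test_split`. The randomly generated DataFrame and its column names are hypothetical stand-ins for 6000_clean.csv. The sketch imports from `sklearn.model_selection`, where the function lives in current releases; the `sklearn.cross_validation` module used in the question was removed in later versions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for 6000_clean.csv: rent first, features after.
rng = np.random.default_rng(0)
rentdata = pd.DataFrame(
    rng.integers(0, 100, size=(20, 5)),
    columns=["rent", "bedrooms", "bathrooms", "sqft", "parking"],
)

# .iloc takes [rows, columns] by position, like NumPy slicing.
a = rentdata.iloc[:, 0]     # target: rent prices (a Series)
b = rentdata.iloc[:, 1:46]  # features: columns 2-46 (only 4 exist here)

# Arrays come back in the order given: b splits first, then a.
b_train, b_test, a_train, a_test = train_test_split(
    b, a, test_size=0.25, random_state=42
)

print(b_train.shape, b_test.shape)  # (15, 4) (5, 4)
```

An `.iloc` slice that runs past the last column is simply truncated, so `1:46` is safe here. If a plain NumPy array is preferred, `rentdata.values` (or `rentdata.to_numpy()` in newer pandas) supports the original `[:, 0]` syntax directly. Note that many scikit-learn estimators will reject `NaN` values, so the nulls still need to be handled before fitting.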