I'm trying to run the code below and I don't understand what I'm doing wrong. The goal is to use scikit-learn's train_test_split function to partition the data into training and testing sets.

The data (linked at the end of this question) is rent data for various houses/condos, along with each house/condo's properties. Ultimately I want to use predictive modeling to predict rent prices, so rent is the target.

I'm using Pandas to import the data because genfromtxt was causing errors, presumably because of all the null values.

Here's the code:

    import pandas as pd
    rentdata = pd.read_csv('6000_clean.csv')

    import sklearn as sk
    import numpy as np
    import matplotlib.pyplot as plt

    from sklearn.cross_validation import train_test_split

    # Trying to make "a" all rows of the first column and "b" all rows of
    # columns 2-46, i.e., "a" will be only target data (rent prices) and "b"
    # will be the data.

    a, b = rentdata[ : ,0], rentdata[ : ,1:46]

What results is the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-24-789fb8e8c2f6> in <module>()
      8 from sklearn.cross_validation import train_test_split
      9 
---> 10 a, b = rentdata[ : ,0], rentdata[ : ,1:46]
     11

C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
   2001             # get column
   2002             if self.columns.is_unique:
-> 2003                 return self._get_item_cache(key)
   2004 
   2005             # duplicate columns

C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\generic.pyc in _get_item_cache(self, item)
    665             return cache[item]
    666         except Exception:
--> 667             values = self._data.get(item)
    668             res = self._box_item_values(item, values)
    669             cache[item] = res

C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\internals.pyc in get(self, item)
   1653     def get(self, item):
   1654         if self.items.is_unique:
-> 1655             _, block = self._find_block(item)
   1656             return block.get(item)
   1657         else:

C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\internals.pyc in _find_block(self, item)
   1933 
   1934     def _find_block(self, item):
-> 1935         self._check_have(item)
   1936         for i, block in enumerate(self.blocks):
   1937             if item in block:

C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\internals.pyc in _check_have(self, item)
   1939 
   1940     def _check_have(self, item):
-> 1941         if item not in self.items:
   1942             raise KeyError('no item named %s' % com.pprint_thing(item))
   1943

C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\index.pyc in __contains__(self, key)
    317 
    318     def __contains__(self, key):
--> 319         hash(key)
    320         # work around some kind of odd cython bug
    321         try:

TypeError: unhashable type

You can download the CSV to get a look at the data here: http://wikisend.com/download/776790/6000_clean.csv

What am I doing wrong here? Am I approaching this problem from the wrong direction, code issues aside?

asked Feb 02 '14 at 00:23

Nick Jones

Your problem is in Pandas, not scikit-learn.

(Feb 02 '14 at 05:31) Gael Varoquaux

One Answer:

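Pandas indexing is obtuse. Plain brackets on a DataFrame, as in rentdata[ : ,0], look up a column by label, so the (slice, 0) tuple is treated as a key and fails when pandas tries to hash it. You want the df.ix[...] form in order to use slices. (Later pandas releases deprecated and then removed .ix; df.iloc[...] is the positional equivalent there.)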
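
As a rough illustration of the slicing, here is a minimal sketch on a small made-up frame rather than the real CSV. The column names (rent, sqft, bedrooms) and values are assumptions for the example only. It uses .iloc, the purely positional indexer in current pandas, in place of .ix.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for 6000_clean.csv: the first column is the target
# (rent) and the remaining columns are features, some with missing values.
rentdata = pd.DataFrame({
    "rent": [1200.0, 950.0, 1800.0, 1400.0],
    "sqft": [700.0, 550.0, np.nan, 900.0],
    "bedrooms": [1, 1, 3, 2],
})

# .iloc selects by integer position, in [rows, columns] order.
a = rentdata.iloc[:, 0]   # Series: all rows of the first column (target)
b = rentdata.iloc[:, 1:]  # DataFrame: all rows of the remaining columns

print(a.shape, b.shape)
```

On the real data the second slice would presumably be rentdata.iloc[:, 1:46]. The resulting a and b can then be passed to train_test_split, though the null values may still need handling before fitting a model.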

answered Feb 02 '14 at 18:39

cityhall

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.