Filtering by condition in Python dataset

I'm struggling with a filtering operation on a Stata file in Python 3. I was asked to keep only the households without kids in the dataset/table.

I used a filtering condition to filter these rows out of the table:

filtering_condition = df["kids"] > 0
df_nokids = df.loc[filtering_condition,"kids"]

This, however, gives me an error I don't understand:

KeyError                                  Traceback (most recent call last)
/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   1944             try:
-> 1945                 return self._engine.get_loc(key)
   1946             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)()

KeyError: 'kids'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-321-e72cd8a67065> in <module>()
      1 #keep households without kids , use dataset   rest of assignment
----> 2 filtering_condition = df["kids"] > 0
      3 df_nokids = df.loc[filtering_condition,"kids"]

/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
   1995             return self._getitem_multilevel(key)
   1996         else:
-> 1997             return self._getitem_column(key)
   1998
   1999     def _getitem_column(self, key):

/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2002         # column
   2003         if self.columns.is_unique:
-> 2004             return self._get_item_cache(key)
   2005
   2006         # duplicate columns & possible reduce dimensionality

/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1348         res = cache.get(item)
   1349         if res is None:
-> 1350             values = self._data.get(item)
   1351             res = self._box_item_values(item, values)
   1352             cache[item] = res

/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3288
   3289             if not isnull(item):
-> 3290                 loc = self.items.get_loc(item)
   3291             else:
   3292                 indexer = np.arange(len(self.items))[isnull(self.items)]

/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   1945                 return self._engine.get_loc(key)
   1946             except KeyError:
-> 1947                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   1948
   1949         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)()

KeyError: 'kids'

Any explanation of what I'm doing wrong?

Thanks!

datafile:

Do you mean this:

df_kids = df[df['kids']>0]

This selects the rows where the 'kids' column is greater than zero.
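
As for the traceback itself: KeyError: 'kids' is raised by df["kids"], which means the DataFrame has no column labelled exactly 'kids'. Here is a minimal sketch of how one might check that and then keep the households without kids. The DataFrame below is a made-up stand-in (the real Stata file's column names are unknown), and the " Kids " label and hh_id column are assumptions chosen only to reproduce the problem:

```python
import pandas as pd

# Hypothetical stand-in for the Stata data; the real column names are unknown.
df = pd.DataFrame({
    "hh_id": [1, 2, 3, 4],
    " Kids ": [0, 2, 0, 1],  # deliberately not exactly "kids"
})

# KeyError: 'kids' means no column has exactly that label,
# so look at the actual labels first.
print(df.columns.tolist())

# Normalise the labels (strip whitespace, lower-case) so "kids" can be found.
df.columns = df.columns.str.strip().str.lower()

# Households WITHOUT kids: the condition is == 0, not > 0.
no_kids = df["kids"] == 0

# Passing only the row mask to .loc keeps every column, not just "kids".
df_nokids = df.loc[no_kids]
print(df_nokids)
```

Two things to note about the code in the question, once the column name is sorted out: the condition > 0 keeps the households with kids, and df.loc[filtering_condition, "kids"] returns only the 'kids' column as a Series rather than the filtered table.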
