Need to assign a dict to a Pandas DataFrame - python

I have a problem when I try to assign a dict to a cell of the DataFrame df,
df.loc[index,'count'] = dict()
as I get this error message:
Incompatible indexer with Series
To work around this problem, I can do this,
df.loc[index,'count'] = [dict()]
, but I don't like this solution, since I then have to unwrap the list to get the dictionary back, i.e.
a = (df.loc[index,'count'])[0]
How can I solve this situation in a more elegant way?
EDIT1
One way to reproduce the issue is as follows.
Code:
import pandas as pd
df = pd.DataFrame(columns= ['count', 'aaa'])
d = dict()
df.loc[0, 'count'] = [d]; print('OK!');
df.loc[0, 'count'] = d
Output:
OK!
Traceback (most recent call last):
File "<ipython-input-193-67bbd89f2c69>", line 4, in <module>
df.loc[0, 'count'] = d
File "/usr/lib64/python3.6/site-packages/pandas/core/indexing.py", line 194, in __setitem__
self._setitem_with_indexer(indexer, value)
File "/usr/lib64/python3.6/site-packages/pandas/core/indexing.py", line 625, in _setitem_with_indexer
value = self._align_series(indexer, Series(value))
File "/usr/lib64/python3.6/site-packages/pandas/core/indexing.py", line 765, in _align_series
raise ValueError('Incompatible indexer with Series')
ValueError: Incompatible indexer with Series
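For reference, a minimal sketch of one possible alternative (an assumption, not part of the original post): if the row already exists and the column holds object dtype, the scalar accessor .at writes a single cell, so the dict is stored as-is without the list wrapper.
import pandas as pd

df = pd.DataFrame(columns=['count', 'aaa'])
df.loc[0] = [None, None]                  # create the row first
df['count'] = df['count'].astype(object)  # make sure the cell can hold an arbitrary object
df.at[0, 'count'] = {'key': 'value'}      # .at sets one cell, so no Series alignment happens
print(df.loc[0, 'count'])                 # {'key': 'value'}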

Related

Why does pandas generate a KeyError when looking up a date in a date-indexed table?

Consider the following code:
import numpy as np
import pandas as pd

date_index = np.array(['2019-01-01', '2019-01-02'], dtype=np.datetime64)
df = pd.DataFrame({'a': np.array([1, 2])}, index=date_index)
date_to_lookup = date_index[0]
print(df.at[date_to_lookup, 'a'])
One might expect it to work and print 1. Yet (at least in Anaconda python 3.7.3 with Pandas 0.24.2) it fails with the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../site-packages/pandas/core/indexing.py", line 2270, in __getitem__
return self.obj._get_value(*key, takeable=self._takeable)
File ".../site-packages/pandas/core/frame.py", line 2771, in _get_value
return engine.get_value(series._values, index)
File "pandas/_libs/index.pyx", line 81, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 89, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 447, in pandas._libs.index.DatetimeEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 987, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 993, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 17897
It appears that Pandas DataFrame and Series objects always store dates as dtype 'datetime64[ns]' or 'datetime64[ns, tz]', and the issue arises because Pandas automatically converts 'datetime64[D]' dtype to 'datetime64[ns]' when creating the index, but does not do that when looking up an element in that index. I could avoid the error above by converting the key to 'datetime64[ns]'. E.g. both of the following lines successfully print 1:
print(df.at[pd.to_datetime(date_to_lookup), 'a'])
print(df.at[date_to_lookup.astype('datetime64[ns]'), 'a'])
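To make the mismatch described above visible, one can compare the two dtypes directly (a quick check added here for illustration, not part of the original post):
print(df.index.dtype)        # datetime64[ns] - pandas converted the index on creation
print(date_to_lookup.dtype)  # datetime64[D]  - the raw NumPy key keeps day resolution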
This behavior (automatic dtype conversion when creating an index, but not when looking up an element) seems counterintuitive to me. What is the reason it was implemented this way? Is there some coding style one is expected to follow to avoid errors like this? Or is it a bug I should file?
You can avoid this by selecting by position with DataFrame.iat and Index.get_loc for the position of column a (positional lookups never touch the datetime index, so the dtype mismatch does not matter):
print(df.iat[0, df.columns.get_loc('a')])
#alternative
#print(df.iloc[0, df.columns.get_loc('a')])
1
Another idea is to use df.index for selecting, instead of date_index[0]:
print(df.at[df.index[0], 'a'])
I think this is a bug you found in 0.24.2; it works on my system with Python 3.7.2 and pandas 0.25.3:
date_index = np.array(['2019-01-01', '2019-01-02'], dtype=np.datetime64)
df = pd.DataFrame({'a': np.array([1, 2])}, index=date_index)
date_to_lookup = date_index[0]
print(df.at[date_to_lookup, 'a'])
1

tz_localize: KeyError: ('Asia/Singapore', u'occurred at index 0')

Reference to: Python pandas convert unix timestamp with timezone into datetime
I did a search on this topic but still can't find the answer.
I have a dataframe which is in the following format:
df:
    timestamp
1  1549914000
2  1549913400
3  1549935000
3  1549936800
5  1549936200
I use the following to convert epoch to date:
df['date'] = pd.to_datetime(df['timestamp'], unit='s')
This line will produce a date that is always 8 hours behind my local time.
So I followed the example in the link to use apply + tz_localize with Asia/Singapore; I tried the following code on the next line, after the code above.
df['date'] = df.apply(lambda x: x['date'].tz_localize(x['Asia/Singapore']), axis=1)
but python return an error as below:
Traceback (most recent call last):
File "/home/test/script.py", line 479, in <module>
schedule.every(10).minutes.do(main).run()
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/schedule/__init__.py", line 411, in run
ret = self.job_func()
File "/home/test/script.py", line 361, in main
df['date'] = df.apply(localize_ts, axis = 1)
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/frame.py", line 4877, in apply
ignore_failures=ignore_failures)
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/frame.py", line 4973, in _apply_standard
results[i] = func(v)
File "/home/test/script.py", line 359, in localize_ts
return pd.to_datetime(row['date']).tz_localize(row['Asia/Singapore'])
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/series.py", line 623, in __getitem__
result = self.index.get_value(self, key)
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2574, in get_value
raise e1
KeyError: ('Asia/Singapore', u'occurred at index 0')
Did I fill in .tz_localize(x['tz']) incorrectly?
As written, your code is looking for a column named Asia/Singapore. Try this instead:
df['date'] = df['date'].dt.tz_localize('Asia/Singapore')
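A self-contained sketch of that suggestion, using sample values from the question (the intermediate names are just for illustration):
import pandas as pd

df = pd.DataFrame({'timestamp': [1549914000, 1549913400, 1549935000]})
df['date'] = pd.to_datetime(df['timestamp'], unit='s')    # naive datetimes from epoch seconds
df['date'] = df['date'].dt.tz_localize('Asia/Singapore')  # attach the timezone to the whole column
print(df['date'])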
You can also try adding the UTC+8 offset for Singapore (8 * 3600 = 28800 seconds) manually:
import numpy as np
import pandas as pd
df = pd.DataFrame({'timestamp': [1549952400, 1549953600]},index=['1', '2'])
df['timestamp2'] = df['timestamp'] + 28800
df['date'] = pd.to_datetime(df['timestamp2'], unit='s')
df = df.drop('timestamp2', axis=1)

How can I add rows in pandas by using "loc" and "for"?

I want to add data from a dataframe to a new dataframe using "loc". I used "loc", but an error occurred. How can I add the data?
>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1.0, 1.2, 3.4, 4.1, 8.2]})
>>> import pandas as pd
>>> df_new = pd.DataFrame(columns=['A'])
>>> for i in df:
...     df_new.loc[i] = df.loc[i]
...
Traceback (most recent call last):
File "/Users/Hajime/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1434, in _has_valid_type
error()
File "/Users/Hajime/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1429, in error
(key, self.obj._get_axis_name(axis)))
KeyError: 'the label [A] is not in the [index]'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/Users/Hajime/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1328, in __getitem__
return self._getitem_axis(key, axis=0)
File "/Users/Hajime/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1551, in _getitem_axis
self._has_valid_type(key, axis)
File "/Users/Hajime/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1442, in _has_valid_type
error()
File "/Users/Hajime/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1429, in error
(key, self.obj._get_axis_name(axis)))
KeyError: 'the label [A] is not in the [index]'
But the following code succeeds.
>>> df_new.loc[1] = df.loc[1]
>>> df_new
A
1 1.2
Why don't you take a look at what the for loop is iterating over here?
In [353]: for i in df:
     ...:     print(i)
     ...:
A
Conclusion - Iteration over df results in iteration over the column names. What you're looking for is something along the lines of df.iterrows, or iterating over df.index.
For example,
for i, r in df.iterrows():
    df_new.loc[i, :] = r

df_new
     A
0  1.0
1  1.2
2  3.4
3  4.1
4  8.2
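The other option mentioned above, iterating over df.index instead of over the frame itself, looks like this (a minimal sketch of the same idea, reusing the question's df and df_new):
for i in df.index:
    df_new.loc[i] = df.loc[i]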
The error is in this part:
for i in df:
    df_new.loc[i] = df.loc[i]
For loc, the first argument is the row index, but i here is a column name.
If you just want to add df to df_new, use concat:
df_new = pd.concat([df_new, df])
import pandas as pd
df = pd.DataFrame({'A': [1.0, 1.2, 3.4, 4.1, 8.2]})
import pandas as pd
df_new = pd.DataFrame(columns=['A'])
for i in df:
Just adding :, before i will do what you want in the first place. Remember the pattern:
df.loc[index of row, column name]
Now, what are you doing wrong? You are passing a column name as the row index, which does not exist. Inside the loop, write instead:
df_new.loc[:, i] = df.loc[:, i]
Anyhow, you can pass all the columns in one go:
df_new[col_names] = df[col_names]
where col_names is a list of column names.

ValueError in DataFrame Pandas

My objective is to:
if the dataframe is empty, insert a row with index -> value of the variable URL and columns -> value of URL along with the sorted_list;
if non-empty, insert a row with index -> value of the variable URL and columns -> sorted_list.
What I did was: I initialized a DataFrame self.df, and then for each such row I created a local DataFrame variable df1 and appended it to self.df.
My code:
import pandas as pd

class Reward_Matrix:
    def __init__(self):
        self.df = pd.DataFrame()

    def add(self, URL, webpage_list):
        sorted_list = []
        check_list = list(self.df.columns.values)
        print('check_list: ', check_list)
        for i in webpage_list:  # to ensure no duplicated columns
            if i not in check_list:
                sorted_list.append(i)
        if self.df.empty:
            sorted_list.insert(0, URL)
            df1 = pd.DataFrame(0, index=[URL], columns=[sorted_list])
        else:
            df1 = pd.DataFrame(0, index=[URL], columns=[sorted_list])
        print(df1)
        print('sorted_list: ', sorted_list)
        print("length: ", len(df1.columns))
        self.df.append(df1)
But I get the following error:
Traceback (most recent call last):
File "...Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4294, in create_block_manager_from_blocks
placement=slice(0, len(axes[0])))]
File "...Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 2719, in make_block
return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
File "...Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 115, in __init__
len(self.mgr_locs)))
ValueError: Wrong number of items passed 1, placement implies 450
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "...eclipse-workspace\Crawler\crawl_core\src_main\run.py", line 23, in test_start
test.crawl_run(self.URL)
File "...eclipse-workspace\Crawler\crawl_core\src_main\test_crawl.py", line 42, in crawl_run
self.reward.add(URL, webpage_list)
File "...eclipse-workspace\Crawler\crawl_core\src_main\dynamic_matrix.py", line 21, in add
df1 = pd.DataFrame(0,index=[URL], columns=[sorted_list])
File "...Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 352, in __init__
copy=False)
File "...Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 483, in _init_ndarray
return create_block_manager_from_blocks([values], [columns, index])
File "...Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4303, in create_block_manager_from_blocks
construction_error(tot_items, blocks[0].shape[1:], axes, e)
File "...Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4280, in construction_error
passed, implied))
ValueError: Shape of passed values is (1, 1), indices imply (450, 1)
I am not well-versed with DataFrames and pandas. I have been getting this error for quite some time, and I get confused when I go through similar questions on Stack Overflow, as I can't understand where I went wrong!
Can someone help me out?
I think you need to remove the [], because otherwise you get a nested list:
df1 = pd.DataFrame(0,index=[URL], columns=sorted_list)
Sample:
sorted_list = ['a','b','c']
URL = 'url1'
df1 = pd.DataFrame(0,index=[URL], columns=sorted_list)
print (df1)
      a  b  c
url1  0  0  0
df1 = pd.DataFrame(0,index=[URL], columns=[sorted_list])
print (df1)
ValueError: Shape of passed values is (1, 1), indices imply (3, 1)
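Putting the fix back into the question's class, a sketch might look like this (an illustration, not the original code; pd.concat is used instead of DataFrame.append, which was removed in pandas 2.0, and the result is assigned back because neither call works in place):
import pandas as pd

class Reward_Matrix:
    def __init__(self):
        self.df = pd.DataFrame()

    def add(self, URL, webpage_list):
        # keep only columns that are not already present, to avoid duplicates
        sorted_list = [i for i in webpage_list if i not in self.df.columns]
        if self.df.empty:
            sorted_list.insert(0, URL)
        df1 = pd.DataFrame(0, index=[URL], columns=sorted_list)  # no extra brackets
        self.df = pd.concat([self.df, df1])

rm = Reward_Matrix()
rm.add('url1', ['a', 'b', 'c'])
print(rm.df)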

Pandas MultiIndex names not working

The axis 0 in the IndexError strikes me as odd. Where is my mistake?
It works if I do not rename the columns before setting the MultiIndex (uncomment line df = df.set_index([0, 1]) and comment the three above). Tested with stable and dev versions.
I am fairly new to python and pandas so any other suggestions for improvement are much appreciated.
import itertools
import datetime as dt
import numpy as np
import pandas as pd
from pandas.io.html import read_html
dfs = read_html('http://www.epexspot.com/en/market-data/auction/auction-table/2006-01-01/DE',
                attrs={'class': 'list hours responsive'},
                skiprows=1)
df = dfs[0]
hours = list(itertools.chain.from_iterable([[x, x] for x in range(1, 25)]))
df[0] = hours
df = df.rename(columns={0: 'a'})
df = df.rename(columns={1: 'b'})
df = df.set_index(['a', 'b'])
#df = df.set_index([0, 1])
today = dt.datetime(2006, 1, 1)
days = pd.date_range(today, periods=len(df.columns), freq='D')
colnames = [day.strftime(format='%Y-%m-%d') for day in days]
df.columns = colnames
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/frame.py", line 2099, in __setattr__
super(DataFrame, self).__setattr__(name, value)
File "properties.pyx", line 59, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:29330)
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/generic.py", line 656, in _set_axis
self._data.set_axis(axis, labels)
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/internals.py", line 1039, in set_axis
block.set_ref_items(self.items, maybe_rename=maybe_rename)
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/internals.py", line 93, in set_ref_items
self.items = ref_items.take(self.ref_locs)
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/index.py", line 395, in take
taken = self.view(np.ndarray).take(indexer)
IndexError: index 7 is out of bounds for axis 0 with size 7
This is a very subtle bug. It is going to be fixed by https://github.com/pydata/pandas/pull/5345 in the upcoming release 0.13 (very shortly).
As a workaround, you can do this after the set_index but before the column assignment:
df = pd.DataFrame(dict([(c, col) for c, col in df.iteritems()]))
The internal state of the frame was off; it is the renames followed by the set_index which caused this, so this recreates it so you can work with it.
