Pandas MultiIndex names not working - python

The axis 0 in the IndexError strikes me as odd. Where is my mistake?
It works if I do not rename the columns before setting the MultiIndex (uncomment line df = df.set_index([0, 1]) and comment the three above). Tested with stable and dev versions.
I am fairly new to Python and pandas, so any other suggestions for improvement are much appreciated.
import itertools
import datetime as dt
import numpy as np
import pandas as pd
from pandas.io.html import read_html
dfs = read_html('http://www.epexspot.com/en/market-data/auction/auction-table/2006-01-01/DE',
                attrs={'class': 'list hours responsive'},
                skiprows=1)
df = dfs[0]
hours = list(itertools.chain.from_iterable([[x, x] for x in range(1, 25)]))
df[0] = hours
df = df.rename(columns={0: 'a'})
df = df.rename(columns={1: 'b'})
df = df.set_index(['a', 'b'])
#df = df.set_index([0, 1])
today = dt.datetime(2006, 1, 1)
days = pd.date_range(today, periods=len(df.columns), freq='D')
colnames = [day.strftime(format='%Y-%m-%d') for day in days]
df.columns = colnames
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/frame.py", line 2099, in __setattr__
super(DataFrame, self).__setattr__(name, value)
File "properties.pyx", line 59, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:29330)
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/generic.py", line 656, in _set_axis
self._data.set_axis(axis, labels)
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/internals.py", line 1039, in set_axis
block.set_ref_items(self.items, maybe_rename=maybe_rename)
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/internals.py", line 93, in set_ref_items
self.items = ref_items.take(self.ref_locs)
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/index.py", line 395, in take
taken = self.view(np.ndarray).take(indexer)
IndexError: index 7 is out of bounds for axis 0 with size 7

This is a very subtle bug. It is going to be fixed by https://github.com/pydata/pandas/pull/5345 in the upcoming 0.13 release (very shortly).
As a workaround, you can do this after the set_index but before the column assignment:
df = pd.DataFrame(dict([(c, col) for c, col in df.iteritems()]))
The internal state of the frame was off; the renames followed by the set_index caused this, so rebuilding the frame gives you one you can work with.
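On newer pandas versions, where DataFrame.iteritems no longer exists, the same rebuild can be written with items(). The following is only a sketch under assumptions: the small table is made up to stand in for the scraped one, and the rebuild is shown for illustration rather than because current releases need it.

```python
import pandas as pd

# Made-up stand-in for the scraped table: integer column labels 0..3.
df = pd.DataFrame([[1, "price", 10.0, 11.0],
                   [1, "volume", 20.0, 21.0],
                   [2, "price", 30.0, 31.0]])

# Rename both key columns in one call, then build the MultiIndex.
df = df.rename(columns={0: "a", 1: "b"}).set_index(["a", "b"])

# Rebuild the frame from its columns (the workaround, written with items()).
df = pd.DataFrame({c: col for c, col in df.items()})

# Assign one date string per remaining column, as in the question.
days = pd.date_range("2006-01-01", periods=len(df.columns), freq="D")
df.columns = [day.strftime("%Y-%m-%d") for day in days]
```

Passing both renames in a single dict also covers the request for general suggestions: it replaces the two separate rename calls.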

Related

How to solve this error : TypeError: 'last' only supports a DatetimeIndex index

I am getting this error: 'last' only supports a DatetimeIndex index
def create_excel_file():
    master_list = []
    for name in filelist:
        new_path = Path(name).parent
        base = os.path.basename(new_path)
        final = os.path.splitext(base)[0]
        with open(name, "r") as f:
            soupObj = bs4.BeautifulSoup(f, "lxml")
        df = pd.DataFrame([(x["uri"], *x["t"].split("T"), x["u"], x["desc"])
                           for x in soupObj.find_all("log")],
                          columns=["Document", "Date", "Time", "User", "Description"])
        df.insert(0, 'Database', f'{final}')
        df['Document'] = df['Document'].astype(str)
        df['Date'] = pd.to_datetime(df['Date']).dt.date
        master_list.append(df)
    df = pd.concat(master_list, axis=0, ignore_index=True)
    df = df.sort_values(by='Date', ascending=True).set_index('Date').last('3M')
    df = df.sort_values(by='Date', ascending=False)
    df.to_excel("logfile.xlsx", index=True)
create_excel_file()
Please suggest what I am doing wrong.
Error message:
Traceback (most recent call last):
File "C:\Users\Desktop\project\Final test.py", line 40, in <module>
create_excel_file()
File "C:\Users\Desktop\project\Final test.py", line 34, in create_excel_file
df = df.sort_values(by='Date', ascending=True).set_index('Date').last('3M')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\AppData\Roaming\Python\Python311\site-packages\pandas\core\generic.py", line 9001, in last
raise TypeError("'last' only supports a DatetimeIndex index")
TypeError: 'last' only supports a DatetimeIndex index
Process finished with exit code 1
I get the error shown above.
From the documentation:
For a DataFrame with a sorted DatetimeIndex, this function selects the last few rows based on a date offset.
So you need to make sure that the column you set as the index holds datetime values.
In your code, pd.to_datetime(df['Date']).dt.date turns the column into plain Python date objects, so set_index('Date') produces an object Index rather than a DatetimeIndex, and this line raises:
df = df.sort_values(by='Date', ascending=True).set_index('Date').last('3M')
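A minimal sketch of one way to keep a DatetimeIndex, using made-up rows in place of the parsed log files. It swaps .dt.date for .dt.normalize(), and it filters with an explicit cutoff instead of last('3M'), since newer pandas releases deprecate last; the cutoff's edge handling may differ slightly from what last would return.

```python
import pandas as pd

# Made-up log rows; only the Date column matters here.
df = pd.DataFrame({
    "Date": ["2023-01-15", "2023-05-01", "2023-06-20"],
    "User": ["a", "b", "c"],
})

# .dt.date would yield Python date objects (object dtype);
# .dt.normalize() drops the time part but keeps datetime64.
df["Date"] = pd.to_datetime(df["Date"]).dt.normalize()
df = df.sort_values("Date").set_index("Date")
is_datetime_index = isinstance(df.index, pd.DatetimeIndex)

# Keep rows within three months of the newest date, newest first.
cutoff = df.index.max() - pd.DateOffset(months=3)
recent = df[df.index > cutoff].sort_index(ascending=False)
```

If only the date part should appear in the Excel file, the formatting can be applied at export time rather than by converting the column beforehand.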

How to merge 2+ columns with different length? ValueError: Length of values

I am trying to create a dataframe main_df that is indexed by date and has one column of df['high'] - df['low'] for each ticker.
Note:
In the example, the data for the 3 tickers runs from 1996/1/1 to 2020/12/31.
ACN went public on 2001/07/19,
so the length of df['high'] - df['low'] differs between tickers.
The following code is what I used:
import pandas as pd
def test_demo():
    tickers = ['ADI', 'ACN', 'ABT']
    df2 = pd.DataFrame()
    main_df = pd.DataFrame()
    pd.set_option('display.max_columns', None)
    for count, ticker in enumerate(tickers):
        df = pd.read_csv('demo\{}.csv'.format(ticker))
        print(df)
        df = df.set_index('date')
        df2['date'] = df.index
        df2 = df2.set_index('date')
        df2[ticker] = df['high'] - df['low']
        if main_df.empty:
            main_df = df2
            count = 1
        else:
            main_df = main_df.join(df2, on='date', how='outer')
        # main_df = main_df.merge(df, on='date')
        # print(main_df)
        if count % 10 == 0:
            print(count)
    main_df.to_csv('testdemo.csv')
test_demo()
It gives me the following error and traceback:
Traceback (most recent call last):
File "D:\PycharmProjects\backtraderP1\Main.py", line 81, in <module>
from zfunctions.WebDemo import test_demo
File "D:\PycharmProjects\backtraderP1\zfunctions\WebDemo.py", line 33, in <module>
test_demo()
File "D:\PycharmProjects\backtraderP1\zfunctions\WebDemo.py", line 13, in test_demo
df2['date'] = df.index
File "C:\Users\Cornerstone\AppData\Roaming\Python\Python39\site-packages\pandas\core\frame.py", line 3163, in __setitem__
self._set_item(key, value)
File "C:\Users\Cornerstone\AppData\Roaming\Python\Python39\site-packages\pandas\core\frame.py", line 3242, in _set_item
value = self._sanitize_column(key, value)
File "C:\Users\Cornerstone\AppData\Roaming\Python\Python39\site-packages\pandas\core\frame.py", line 3899, in _sanitize_column
value = sanitize_index(value, self.index)
File "C:\Users\Cornerstone\AppData\Roaming\Python\Python39\site-packages\pandas\core\internals\construction.py", line 751, in sanitize_index
raise ValueError(
ValueError: Length of values (4895) does not match length of index (6295)
Process finished with exit code 1
The code gets through the first ticker, ADI, and the error appears when it reaches the ACN data.
The lines df2['date'] = df.index and df2[ticker] = df['high'] - df['low'] shouldn't be the problem by themselves, since they appear in answers to other posts, but the combination doesn't work in this case.
If someone can help me understand and solve this issue, that would be great.
Many thanks.
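One reading of the traceback: df2 is created once, outside the loop, so by the second ticker it already carries ADI's 6295-row index, and assigning ACN's 4895 dates as a new column fails. A minimal sketch that avoids the shared df2, assuming small made-up frames in place of the CSV files (the frames and spreads names are illustrative, not from the post):

```python
import pandas as pd

# Made-up stand-ins for the per-ticker CSV files; ACN starts later than ADI.
frames = {
    "ADI": pd.DataFrame({"date": ["2001-07-17", "2001-07-18", "2001-07-19"],
                         "high": [10.0, 11.0, 12.0],
                         "low": [9.0, 9.5, 10.0]}),
    "ACN": pd.DataFrame({"date": ["2001-07-19"],
                         "high": [15.0],
                         "low": [14.0]}),
}

# Build one Series per ticker, each indexed by its own dates.
spreads = {}
for ticker, df in frames.items():
    df = df.set_index("date")
    spreads[ticker] = df["high"] - df["low"]

# concat along axis=1 aligns on the date index; dates a ticker lacks become NaN.
main_df = pd.concat(spreads, axis=1)
```

Because every Series keeps its own index, the differing lengths are handled by alignment rather than by assignment into a frame of fixed length.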

tz_localize: KeyError: ('Asia/Singapore', u'occurred at index 0')

Reference: Python pandas convert unix timestamp with timezone into datetime
I searched on this topic but still can't find the answer.
I have a dataframe in the following format:
df timestamp
1 1549914000
2 1549913400
3 1549935000
3 1549936800
5 1549936200
I use the following to convert epoch to date:
df['date'] = pd.to_datetime(df['timestamp'], unit='s')
This line produces a date that is always 8 hours behind my local time.
So I followed the example in the link and used apply + tz_localize with Asia/Singapore. I tried the following code on the line right after the one above:
df['date'] = df.apply(lambda x: x['date'].tz_localize(x['Asia/Singapore']), axis=1)
but Python returns the error below:
Traceback (most recent call last):
File "/home/test/script.py", line 479, in <module>
schedule.every(10).minutes.do(main).run()
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/schedule/__init__.py", line 411, in run
ret = self.job_func()
File "/home/test/script.py", line 361, in main
df['date'] = df.apply(localize_ts, axis = 1)
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/frame.py", line 4877, in apply
ignore_failures=ignore_failures)
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/frame.py", line 4973, in _apply_standard
results[i] = func(v)
File "/home/test/script.py", line 359, in localize_ts
return pd.to_datetime(row['date']).tz_localize(row['Asia/Singapore'])
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/series.py", line 623, in __getitem__
result = self.index.get_value(self, key)
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2574, in get_value
raise e1
KeyError: ('Asia/Singapore', u'occurred at index 0')
Did I replace .tz_localize(x['tz']) correctly?
As written, your code is looking for a column named Asia/Singapore. Try this instead:
df['date'] = df['date'].dt.tz_localize('Asia/Singapore')
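Note that tz_localize only attaches a zone to the naive values and leaves the wall-clock time unchanged. If the goal is to move the displayed time forward by 8 hours, the values also need a tz_convert. A minimal sketch, assuming the timestamp column holds UTC epoch seconds as in the question:

```python
import pandas as pd

df = pd.DataFrame({"timestamp": [1549914000, 1549913400]})

# Parse the epoch seconds as UTC, then convert to Singapore time.
df["date"] = (pd.to_datetime(df["timestamp"], unit="s", utc=True)
                .dt.tz_convert("Asia/Singapore"))

# For contrast: localizing the naive values keeps the UTC wall-clock time.
naive = pd.to_datetime(df["timestamp"], unit="s")
localized_only = naive.dt.tz_localize("Asia/Singapore")
```

Here the first row becomes 2019-02-12 03:40:00+08:00 after the conversion, while localized_only still reads 19:40 on 2019-02-11.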
You can try adding the 8-hour offset (28800 seconds) to the timestamp before converting:
import pandas as pd
df = pd.DataFrame({'timestamp': [1549952400, 1549953600]}, index=['1', '2'])
df['timestamp2'] = df['timestamp'] + 28800
df['date'] = pd.to_datetime(df['timestamp2'], unit='s')
df = df.drop('timestamp2', axis=1)

Need to assign dict to Pandas DataFrame

I have a problem when I try to assign a dict to a cell of the DataFrame df,
df.loc[index,'count'] = dict()
as I get this error message:
Incompatible indexer with Series
To work around this problem, I can wrap the dict in a list,
df.loc[index,'count'] = [dict()]
but I don't like this solution, since I have to unpack the list before getting the dictionary, i.e.
a = (df.loc[index,'count'])[0]
How can I solve this in a more elegant way?
EDIT1
One way to replicate the problem is as follows
Code:
import pandas as pd
df = pd.DataFrame(columns= ['count', 'aaa'])
d = dict()
df.loc[0, 'count'] = [d]; print('OK!');
df.loc[0, 'count'] = d
Output:
OK!
Traceback (most recent call last):
File "<ipython-input-193-67bbd89f2c69>", line 4, in <module>
df.loc[0, 'count'] = d
File "/usr/lib64/python3.6/site-packages/pandas/core/indexing.py", line 194, in __setitem__
self._setitem_with_indexer(indexer, value)
File "/usr/lib64/python3.6/site-packages/pandas/core/indexing.py", line 625, in _setitem_with_indexer
value = self._align_series(indexer, Series(value))
File "/usr/lib64/python3.6/site-packages/pandas/core/indexing.py", line 765, in _align_series
raise ValueError('Incompatible indexer with Series')
ValueError: Incompatible indexer with Series
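One possible way around the list wrapper is the scalar accessor .at, which sets a single cell without trying to align the value. This is a sketch under assumptions, not a confirmed fix for the exact setup above: it assumes the row already exists and the column has object dtype, and the dict contents are made up.

```python
import pandas as pd

# One pre-existing row; object dtype so a cell can hold any Python object.
df = pd.DataFrame({"count": [None], "aaa": [None]}, dtype=object)

d = {"x": 1}
df.at[0, "count"] = d      # .at writes one cell and does not align the value

a = df.at[0, "count"]      # the dict comes back directly, no [0] needed
```

Reading the cell back with .at returns the stored dict itself, so the extra indexing step from the list workaround goes away.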

How can I add rows in pandas by using "loc" and "for"?

I want to add the data from one dataframe to a new dataframe using "loc". I used "loc", but an error occurred. Can I add the data this way?
>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1.0, 1.2, 3.4, 4.1, 8.2]})
>>> import pandas as pd
>>> df_new = pd.DataFrame(columns=['A'])
>>> for i in df:
...     df_new.loc[i] = df.loc[i]
...
Traceback (most recent call last):
File "/Users/Hajime/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1434, in _has_valid_type
error()
File "/Users/Hajime/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1429, in error
(key, self.obj._get_axis_name(axis)))
KeyError: 'the label [A] is not in the [index]'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/Users/Hajime/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1328, in __getitem__
return self._getitem_axis(key, axis=0)
File "/Users/Hajime/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1551, in _getitem_axis
self._has_valid_type(key, axis)
File "/Users/Hajime/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1442, in _has_valid_type
error()
File "/Users/Hajime/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1429, in error
(key, self.obj._get_axis_name(axis)))
KeyError: 'the label [A] is not in the [index]'
But the following code succeeds.
>>> df_new.loc[1] = df.loc[1]
>>> df_new
A
1 1.2
Why don't you take a look at what for is iterating over here?
In [353]: for i in df:
     ...:     print(i)
     ...:
A
Conclusion - Iteration over df results in iteration over the column names. What you're looking for is something along the lines of df.iterrows, or iterating over df.index.
For example,
for i, r in df.iterrows():
    df_new.loc[i, :] = r
df_new
A
0 1.0
1 1.2
2 3.4
3 4.1
4 8.2
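The other option mentioned above, iterating over df.index, can be sketched like this. It reuses the frames from the question; only the loop variable changes from column names to row labels.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 1.2, 3.4, 4.1, 8.2]})
df_new = pd.DataFrame(columns=["A"])

# Iterating over the frame itself yields column names, not row labels.
column_names = list(df)

# Iterating over df.index yields the row labels that loc expects.
for i in df.index:
    df_new.loc[i] = df.loc[i]
```

Row-by-row copying like this is fine for a handful of rows; for whole frames, a single bulk operation is usually simpler.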
The error is in this part:
for i in df:
    df_new.loc[i] = df.loc[i]
For loc, the first argument is the row index label, but i is a column name.
If you just want to add df to df_new, use concat:
df_new = pd.concat([df_new, df])
import pandas as pd
df = pd.DataFrame({'A': [1.0, 1.2, 3.4, 4.1, 8.2]})
df_new = pd.DataFrame(columns=['A'])
for i in df:
Just adding :, before i will do what you wanted in the first place. The general form is
df.loc[index of row, column name]
Now, what are you doing wrong? You are passing a column name as the row index, and no such row label exists.
df_new.loc[:,i] = df.loc[:,i]
Anyhow, you can pass all the columns in one go, where col_names is a list of column names:
df_new[col_names]=df[col_names]
