tz_localize: KeyError: ('Asia/Singapore', u'occurred at index 0') - python

Reference to: Python pandas convert unix timestamp with timezone into datetime
I searched for this topic but still can't find the answer.
I have a dataframe in the following format:
df timestamp
1 1549914000
2 1549913400
3 1549935000
3 1549936800
5 1549936200
I use the following to convert epoch to date:
df['date'] = pd.to_datetime(df['timestamp'], unit='s')
This line will produce a date that is always 8 hours behind my local time.
So I followed the example in the link and used apply + tz_localize with Asia/Singapore, adding the following line right after the code above:
df['date'] = df.apply(lambda x: x['date'].tz_localize(x['Asia/Singapore']), axis=1)
but Python returns the following error:
Traceback (most recent call last):
File "/home/test/script.py", line 479, in <module>
schedule.every(10).minutes.do(main).run()
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/schedule/__init__.py", line 411, in run
ret = self.job_func()
File "/home/test/script.py", line 361, in main
df['date'] = df.apply(localize_ts, axis = 1)
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/frame.py", line 4877, in apply
ignore_failures=ignore_failures)
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/frame.py", line 4973, in _apply_standard
results[i] = func(v)
File "/home/test/script.py", line 359, in localize_ts
return pd.to_datetime(row['date']).tz_localize(row['Asia/Singapore'])
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/series.py", line 623, in __getitem__
result = self.index.get_value(self, key)
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2574, in get_value
raise e1
KeyError: ('Asia/Singapore', u'occurred at index 0')
Did I replace .tz_localize(x['tz']) incorrectly?

As written, your code is looking for a column named Asia/Singapore. Try this instead:
df['date'] = df['date'].dt.tz_localize('Asia/Singapore')
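Note that tz_localize('Asia/Singapore') only attaches the zone to the existing naive values. Because pd.to_datetime(..., unit='s') returns UTC wall-clock times, the result would still be 8 hours behind local time. A minimal sketch that shifts the values is to localize to UTC first and then convert (the column name date_sg is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'timestamp': [1549914000, 1549913400, 1549935000,
                                 1549936800, 1549936200]})

# unit='s' reads the integers as seconds since the Unix epoch and returns
# naive datetimes that hold UTC wall-clock times.
df['date'] = pd.to_datetime(df['timestamp'], unit='s')

# Mark the naive values as UTC, then convert them to Singapore time (UTC+8).
df['date_sg'] = df['date'].dt.tz_localize('UTC').dt.tz_convert('Asia/Singapore')
print(df)
```

Passing utc=True to pd.to_datetime gives tz-aware UTC values directly, so pd.to_datetime(df['timestamp'], unit='s', utc=True).dt.tz_convert('Asia/Singapore') is an equivalent one-liner. Either way, no row-wise apply is needed.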

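You can try adding the 8-hour offset (28800 seconds) before converting:
import pandas as pd
df = pd.DataFrame({'timestamp': [1549952400, 1549953600]}, index=['1', '2'])
# 28800 s = 8 * 3600 s, the UTC+8 offset for Singapore
df['timestamp2'] = df['timestamp'] + 28800
# the result is still naive (no timezone attached), just shifted by 8 hours
df['date'] = pd.to_datetime(df['timestamp2'], unit='s')
# drop the helper column (keyword form; the positional axis argument was removed in pandas 2.0)
df = df.drop(columns='timestamp2')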

Related

How to merge 2+ columns with different length? ValueError: Length of values

I am trying to create a dataframe main_df that has date as its index, followed by one df['high'] - df['low'] column for each ticker.
Note: in the example, the data for the 3 tickers spans 1996/1/1 to 2020/12/31. ACN went public on 2001/07/19, so the length of df['high'] - df['low'] differs between tickers.
The following code is what I used:
import pandas as pd
def test_demo():
    tickers = ['ADI', 'ACN', 'ABT']
    df2 = pd.DataFrame()
    main_df = pd.DataFrame()
    pd.set_option('display.max_columns', None)
    for count, ticker in enumerate(tickers):
        df = pd.read_csv('demo\{}.csv'.format(ticker))
        print(df)
        df = df.set_index('date')
        df2['date'] = df.index
        df2 = df2.set_index('date')
        df2[ticker] = df['high'] - df['low']
        if main_df.empty:
            main_df = df2
            count = 1
        else:
            main_df = main_df.join(df2, on='date', how='outer')
        # main_df = main_df.merge(df, on='date')
        # print(main_df)
        if count % 10 == 0:
            print(count)
    main_df.to_csv('testdemo.csv')
test_demo()
It gives me the following error and traceback:
Traceback (most recent call last):
File "D:\PycharmProjects\backtraderP1\Main.py", line 81, in <module>
from zfunctions.WebDemo import test_demo
File "D:\PycharmProjects\backtraderP1\zfunctions\WebDemo.py", line 33, in <module>
test_demo()
File "D:\PycharmProjects\backtraderP1\zfunctions\WebDemo.py", line 13, in test_demo
df2['date'] = df.index
File "C:\Users\Cornerstone\AppData\Roaming\Python\Python39\site-packages\pandas\core\frame.py", line 3163, in __setitem__
self._set_item(key, value)
File "C:\Users\Cornerstone\AppData\Roaming\Python\Python39\site-packages\pandas\core\frame.py", line 3242, in _set_item
value = self._sanitize_column(key, value)
File "C:\Users\Cornerstone\AppData\Roaming\Python\Python39\site-packages\pandas\core\frame.py", line 3899, in _sanitize_column
value = sanitize_index(value, self.index)
File "C:\Users\Cornerstone\AppData\Roaming\Python\Python39\site-packages\pandas\core\internals\construction.py", line 751, in sanitize_index
raise ValueError(
ValueError: Length of values (4895) does not match length of index (6295)
Process finished with exit code 1
The code processes ADI successfully; the error appears when it reaches the ACN data.
Neither df2['date'] = df.index nor df2[ticker] = df['high'] - df['low'] should be the problem on its own, and both appear in answers to other posts, but the combination doesn't work in this case.
If someone can help me understand and solve this issue, that would be great.
Many thanks.
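The traceback points at df2['date'] = df.index. A likely cause: df2 is created once, outside the loop, so after the ADI iteration it already carries ADI's 6295-row date index, and ACN's 4895 dates cannot be assigned into it. A minimal sketch that avoids the shared frame computes one Series per ticker and lets pd.concat(axis=1) align them on the date index (an outer join, with NaN where a ticker has no data). The helper name high_low_range and the tiny sample frames are hypothetical stand-ins for the CSV files:

```python
import pandas as pd

def high_low_range(frames):
    """Return one high - low column per ticker, aligned on the date index.

    frames: dict mapping ticker -> DataFrame with 'date', 'high', 'low'
    columns (in the question these would come from pd.read_csv).
    """
    ranges = {}
    for ticker, df in frames.items():
        df = df.set_index('date')
        # Each Series keeps its own index, so different lengths are fine.
        ranges[ticker] = df['high'] - df['low']
    # concat along axis=1 outer-joins the indexes; missing dates become NaN.
    return pd.concat(ranges, axis=1)

# Hypothetical sample data: ACN starts later than ADI, as in the question.
adi = pd.DataFrame({'date': ['2001-07-18', '2001-07-19', '2001-07-20'],
                    'high': [10.0, 11.0, 12.0], 'low': [9.0, 10.5, 11.0]})
acn = pd.DataFrame({'date': ['2001-07-19', '2001-07-20'],
                    'high': [15.0, 16.0], 'low': [14.0, 15.5]})

main_df = high_low_range({'ADI': adi, 'ACN': acn})
print(main_df)
```

With the real files, the input dict could be built as {t: pd.read_csv(...) for t in tickers}, and main_df.to_csv('testdemo.csv') written once at the end.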

Need to assign dict to Pandas Dataframe

I have problems when I try to assign a dict to a cell of the df DataFrame:
df.loc[index,'count'] = dict()
as I get this error message:
Incompatible indexer with Series
To work around this problem, I can do this:
df.loc[index,'count'] = [dict()]
but I don't like this solution, because I have to unwrap the list before getting the dictionary, i.e.
a = (df.loc[index,'count'])[0]
How can I solve this in a more elegant way?
EDIT1
One way to reproduce the problem is as follows:
Code:
import pandas as pd
df = pd.DataFrame(columns= ['count', 'aaa'])
d = dict()
df.loc[0, 'count'] = [d]; print('OK!');
df.loc[0, 'count'] = d
Output:
OK!
Traceback (most recent call last):
File "<ipython-input-193-67bbd89f2c69>", line 4, in <module>
df.loc[0, 'count'] = d
File "/usr/lib64/python3.6/site-packages/pandas/core/indexing.py", line 194, in __setitem__
self._setitem_with_indexer(indexer, value)
File "/usr/lib64/python3.6/site-packages/pandas/core/indexing.py", line 625, in _setitem_with_indexer
value = self._align_series(indexer, Series(value))
File "/usr/lib64/python3.6/site-packages/pandas/core/indexing.py", line 765, in _align_series
raise ValueError('Incompatible indexer with Series')
ValueError: Incompatible indexer with Series
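The traceback shows .loc wrapping the dict in Series(value) and trying to align it, which is what fails. One approach that usually works, sketched under two assumptions (the row label already exists and the column has object dtype), is DataFrame.at, which sets exactly one cell and stores the dict as a single value. The sample frame is hypothetical:

```python
import pandas as pd

# Create the row up front; None values give the columns object dtype,
# which is what lets a cell hold an arbitrary Python object.
df = pd.DataFrame({'count': [None], 'aaa': [None]})

d = {'a': 1}
# .at addresses a single cell, so the dict is stored as-is rather than
# being expanded into a Series the way .loc does.
df.at[0, 'count'] = d

a = df.at[0, 'count']  # the dict comes back directly, no list to unwrap
print(a)
```

If the row does not exist yet, .at falls back to .loc-style enlargement, so creating the row first (for example with None placeholders) is the safer pattern.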

Getting error slicing time series with pandas

I'm trying to slice a time series. I can do it perfectly this way:
subseries = series['2015-07-07 01:00:00':'2015-07-07 03:30:00']
But the following code doesn't work:
import datetime
import pandas as pd
def GetDatetime():
    Y = int(raw_input("Year "))
    M = int(raw_input("Month "))
    D = int(raw_input("Day "))
    d = datetime.datetime(Y, M, D) #creates a datetime object
    return d
filePath = "pathtofile.csv"
series = pd.read_csv(str(filePath), index_col='date')
series.index = pd.to_datetime(series.index, unit='s')
d = GetDatetime()
f = GetDatetime()
subseries = series[d:f]
The last line generates this error:
Traceback (most recent call last):
File "dontgivemeerrorsbrasommek.py", line 37, in <module>
brasla7nina= df[d:f]
File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.2-py2.7-linux-x86_64.egg/pandas/core/frame.py", line 1952, in __getitem__
indexer = convert_to_index_sliceable(self, key)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.2-py2.7-linux-x86_64.egg/pandas/core/indexing.py", line 1896, in convert_to_index_sliceable
return idx._convert_slice_indexer(key, kind='getitem')
File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.2-py2.7-linux-x86_64.egg/pandas/core/indexes/base.py", line 1407, in _convert_slice_indexer
indexer = self.slice_indexer(start, stop, step, kind=kind)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.2-py2.7-linux-x86_64.egg/pandas/core/indexes/datetimes.py", line 1515, in slice_indexer
return Index.slice_indexer(self, start, end, step, kind=kind)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.2-py2.7-linux-x86_64.egg/pandas/core/indexes/base.py", line 3350, in slice_indexer
kind=kind)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.2-py2.7-linux-x86_64.egg/pandas/core/indexes/base.py", line 3538, in slice_locs
start_slice = self.get_slice_bound(start, 'left', kind)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.20.2-py2.7-linux-x86_64.egg/pandas/core/indexes/base.py", line 3487, in get_slice_bound
raise err
KeyError: 1435802520000000000
I thought it was a timestamp conversion problem, so I tried the following, but it still doesn't work:
d3 = pandas.Timestamp(datetime(Y, M, D, H, m))
d2 = pandas.to_datetime(d)
Your help would be appreciated, thank you. :)
Change the return value of GetDatetime() to:
return str(d)
This returns a datetime string, which the time series index can handle when slicing.
If I understand your code correctly, when you do this:
subseries = series['2015-07-07 01:00:00':'2015-07-07 03:30:00']
you're slicing series (by the way, that name is confusing, since pandas has a Series type) with two strings.
If that works, then for subseries = df[d:f] you need d and f to be strings as well.
You can do that by calling the datetime method .strftime(), e.g.:
d= GetDatetime().strftime('%Y-%m-%d 00:00:00')
f= GetDatetime().strftime('%Y-%m-%d 00:00:00')
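Why strings work and datetime objects don't: the traceback ends in get_slice_bound raising a KeyError, which in pandas 0.20 (the version shown) appears to happen when the DatetimeIndex is not sorted and a bound is not an exact index value. Older pandas had a special fallback for string bounds on unsorted indexes, but not for datetime bounds. A minimal sketch, assuming the CSV rows are out of order, sorts the index first so either kind of bound works. The data and the helper to_bound are hypothetical:

```python
import datetime
import pandas as pd

# Hypothetical data: ten half-hourly readings on 2015-07-07, stored out of
# order to mimic an unsorted CSV (the question's real file is not shown).
index = pd.date_range('2015-07-07 00:00:00', periods=10, freq='30min')
series = pd.Series(range(10), index=index).iloc[[3, 0, 7, 1, 9, 2, 5, 8, 4, 6]]

def to_bound(d):
    # Format a datetime as a label string, as the answers above suggest.
    return d.strftime('%Y-%m-%d %H:%M:%S')

start = datetime.datetime(2015, 7, 7, 1, 0)
end = datetime.datetime(2015, 7, 7, 3, 30)

# Sort first: label slicing with bounds expects a monotonic DatetimeIndex.
series = series.sort_index()
by_string = series.loc[to_bound(start):to_bound(end)]
by_datetime = series.loc[start:end]  # datetime bounds also work once sorted
print(by_string)
```

Both slices include the endpoints, because label-based slicing in pandas is inclusive on both sides.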

Combining columns using pandas

I am trying to combine date and time columns of a csv file and convert them to timestamp using pandas.
Here is a sample of my csv file when read into a dataframe
Dataframe after reading
Id Station Month Parameter Date From To
1.0 ANANDVIHAR Dec ?PM2.5 2015-12-01 ?00:00:00 ?00:59:00
The following code:
df['DateTime'] = df.apply(lambda row: datetime.datetime.strptime(row['Date']+ ':' + row['From'], '%Y.%m.%d:%H:%M:%S'), axis=1)
gives the following error:
Traceback (most recent call last):
File "project101.py", line 36, in <module>
df['DateTime'] = df.apply(lambda row: datetime.datetime.strptime(row['Date']+ ':' + row['From'], '%Y.%m.%d:%H:%M:%S'), axis=1)
File "c:\Python27\lib\site-packages\pandas\core\frame.py", line 4133, in apply
return self._apply_standard(f, axis, reduce=reduce)
File "c:\Python27\lib\site-packages\pandas\core\frame.py", line 4229, in _apply_standard
results[i] = func(v)
File "project101.py", line 36, in <lambda>
df['DateTime'] = df.apply(lambda row: datetime.datetime.strptime(row['Date']+ ':' + row['From'], '%Y.%m.%d:%H:%M:%S'), axis=1)
File "c:\Python27\lib\_strptime.py", line 332, in _strptime
(data_string, format))
ValueError: ("time data '2015-12-01:\\xa000:00:00' does not match format '%Y.%m.%d:%H:%M:%S'", u'occurred at index 0')
You can simply do:
df['DateTime'] = pd.to_datetime(df['Date'].str.cat(df['From'], sep=" "),
format='%Y-%m-%d \xa0%H:%M:%S', errors='coerce')
The '\xa0' in the format string matches the non-breaking space (U+00A0) in front of each time. That character is what the printed dataframe shows as a question mark, and it appears as '\\xa0' in the error message.
You can use the pandas.Series.str.cat function.
The following code gives you a basic idea:
>>> pd.Series(['a', 'b', 'c']).str.cat(['A', 'B', 'C'], sep=',')
0 a,A
1 b,B
2 c,C
dtype: object
For more information, please check this:
http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.str.cat.html
Hope this solves your problem...
I finally found a solution: I stripped the leading question mark from the From column and then applied to_datetime() to the combined Date and From values:
df['From'] = df['From'].map(lambda x: str(x)[1:])
df['FromTime'] = pd.to_datetime(df['Date'].str.cat(df['From'], sep=" "),format='%Y-%m-%d %H:%M:%S', errors='coerce')
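Slicing with [1:] assumes there is exactly one stray character. A slightly more robust sketch strips surrounding whitespace instead: Python's str.strip() with no arguments treats U+00A0 as whitespace, and pandas' .str.strip() uses it. The sample frame is hypothetical and only mirrors the question's layout:

```python
import pandas as pd

# Hypothetical sample: each time string starts with a non-breaking space
# (U+00A0), which the console printed as '?'.
df = pd.DataFrame({'Date': ['2015-12-01', '2015-12-01'],
                   'From': ['\xa000:00:00', '\xa001:00:00']})

# .str.strip() with no arguments removes Unicode whitespace, including U+00A0.
clean_from = df['From'].str.strip()

# Join date and time with a plain space and parse in one vectorized call.
df['DateTime'] = pd.to_datetime(df['Date'].str.cat(clean_from, sep=' '),
                                format='%Y-%m-%d %H:%M:%S', errors='coerce')
print(df['DateTime'])
```

errors='coerce' turns any row that still fails to parse into NaT instead of raising, so checking df['DateTime'].isna() afterwards shows which rows need attention.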

Pandas MultiIndex names not working

The axis 0 in the IndexError strikes me as odd. Where is my mistake?
It works if I do not rename the columns before setting the MultiIndex (uncomment the line df = df.set_index([0, 1]) and comment out the three lines above it). Tested with stable and dev versions.
I am fairly new to python and pandas so any other suggestions for improvement are much appreciated.
import itertools
import datetime as dt
import numpy as np
import pandas as pd
from pandas.io.html import read_html
dfs = read_html('http://www.epexspot.com/en/market-data/auction/auction-table/2006-01-01/DE',
attrs={'class': 'list hours responsive'},
skiprows=1)
df = dfs[0]
hours = list(itertools.chain.from_iterable([[x, x] for x in range(1, 25)]))
df[0] = hours
df = df.rename(columns={0: 'a'})
df = df.rename(columns={1: 'b'})
df = df.set_index(['a', 'b'])
#df = df.set_index([0, 1])
today = dt.datetime(2006, 1, 1)
days = pd.date_range(today, periods=len(df.columns), freq='D')
colnames = [day.strftime(format='%Y-%m-%d') for day in days]
df.columns = colnames
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/frame.py", line 2099, in __setattr__
super(DataFrame, self).__setattr__(name, value)
File "properties.pyx", line 59, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:29330)
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/generic.py", line 656, in _set_axis
self._data.set_axis(axis, labels)
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/internals.py", line 1039, in set_axis
block.set_ref_items(self.items, maybe_rename=maybe_rename)
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/internals.py", line 93, in set_ref_items
self.items = ref_items.take(self.ref_locs)
File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/index.py", line 395, in take
taken = self.view(np.ndarray).take(indexer)
IndexError: index 7 is out of bounds for axis 0 with size 7
This is a very subtle bug. It is going to be fixed by https://github.com/pydata/pandas/pull/5345 in the upcoming 0.13 release (very shortly).
As a workaround, you can do this after the set_index but before the column assignment:
df = pd.DataFrame(dict([(c, col) for c, col in df.items()]))
The frame's internal state was inconsistent: the renames followed by set_index caused this, and rebuilding the frame this way resets that state so you can work with it.
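With the fix in place, the rename-then-set_index sequence should work without the rebuild. A minimal sketch, with a small hypothetical in-memory table standing in for the read_html result (the 'Price'/'Volume' labels and values are invented for illustration); it also renames both label columns in one rename call:

```python
import datetime as dt
import itertools
import pandas as pd

n_hours = 2
hours = list(itertools.chain.from_iterable([[x, x] for x in range(1, n_hours + 1)]))

# Hypothetical stand-in for dfs[0]: columns 0 and 1 hold the row labels,
# the remaining integer columns hold one value per day.
raw = pd.DataFrame({0: hours,
                    1: ['Price', 'Volume'] * n_hours,
                    2: [10.0, 100.0, 11.0, 110.0],
                    3: [12.0, 120.0, 13.0, 130.0]})

# Rename both label columns at once, then build the MultiIndex from them.
df = raw.rename(columns={0: 'a', 1: 'b'}).set_index(['a', 'b'])

# Label the remaining columns with consecutive dates, as in the question.
days = pd.date_range(dt.datetime(2006, 1, 1), periods=len(df.columns), freq='D')
df.columns = [day.strftime('%Y-%m-%d') for day in days]
print(df)
```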
