Converting time zone pandas dataframe - python

I have data:
Symbol bid ask
Timestamp
2014-01-01 21:55:34.378000 EUR/USD 1.37622 1.37693
2014-01-01 21:55:40.410000 EUR/USD 1.37624 1.37698
2014-01-01 21:55:47.210000 EUR/USD 1.37619 1.37696
2014-01-01 21:55:57.963000 EUR/USD 1.37616 1.37696
2014-01-01 21:56:03.117000 EUR/USD 1.37616 1.37694
The timestamp is in GMT. Is there a way to convert that to Eastern?
Note when I do:
data.index
I get output:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-01 21:55:34.378000, ..., 2014-01-01 21:56:03.117000]
Length: 5, Freq: None, Timezone: None

Localize the index (using tz_localize) to UTC (to make the Timestamps timezone-aware) and then convert to Eastern (using tz_convert):
import pytz
eastern = pytz.timezone('US/Eastern')
df.index = df.index.tz_localize(pytz.utc).tz_convert(eastern)
For example:
import pandas as pd
import pytz
index = pd.date_range('20140101 21:55', freq='15S', periods=5)
df = pd.DataFrame(1, index=index, columns=['X'])
print(df)
# X
# 2014-01-01 21:55:00 1
# 2014-01-01 21:55:15 1
# 2014-01-01 21:55:30 1
# 2014-01-01 21:55:45 1
# 2014-01-01 21:56:00 1
# [5 rows x 1 columns]
print(df.index)
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 21:55:00, ..., 2014-01-01 21:56:00]
# Length: 5, Freq: 15S, Timezone: None
eastern = pytz.timezone('US/Eastern')
df.index = df.index.tz_localize(pytz.utc).tz_convert(eastern)
print(df)
# X
# 2014-01-01 16:55:00-05:00 1
# 2014-01-01 16:55:15-05:00 1
# 2014-01-01 16:55:30-05:00 1
# 2014-01-01 16:55:45-05:00 1
# 2014-01-01 16:56:00-05:00 1
# [5 rows x 1 columns]
print(df.index)
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 16:55:00-05:00, ..., 2014-01-01 16:56:00-05:00]
# Length: 5, Freq: 15S, Timezone: US/Eastern

The simplest way is to use to_datetime with utc=True:
df = pd.DataFrame({'Symbol': ['EUR/USD'] * 5,
'bid': [1.37622, 1.37624, 1.37619, 1.37616, 1.37616],
'ask': [1.37693, 1.37698, 1.37696, 1.37696, 1.37694]})
df.index = pd.to_datetime(['2014-01-01 21:55:34.378000',
'2014-01-01 21:55:40.410000',
'2014-01-01 21:55:47.210000',
'2014-01-01 21:55:57.963000',
'2014-01-01 21:56:03.117000'],
utc=True)
For more flexibility, you can convert timezones with tz_convert(). If your data column/index is not timezone-aware, tz_convert raises a TypeError, so first make the data timezone-aware with tz_localize.
df = pd.DataFrame({'Symbol': ['EUR/USD'] * 5,
'bid': [1.37622, 1.37624, 1.37619, 1.37616, 1.37616],
'ask': [1.37693, 1.37698, 1.37696, 1.37696, 1.37694]})
df.index = pd.to_datetime(['2014-01-01 21:55:34.378000',
'2014-01-01 21:55:40.410000',
'2014-01-01 21:55:47.210000',
'2014-01-01 21:55:57.963000',
'2014-01-01 21:56:03.117000'])
df.index = df.index.tz_localize('GMT')
df.index = df.index.tz_convert('America/New_York')
This also works similarly for datetime columns, but you need dt after accessing the column:
df['column'] = df['column'].dt.tz_convert('America/New_York')
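As a minimal sketch of the column case, assuming a naive datetime column that is known to hold UTC times (the column names ts and ts_eastern are made up for illustration), localize first and then convert, both through the dt accessor:

```python
import pandas as pd

# Hypothetical frame: a naive datetime column assumed to be in UTC
df = pd.DataFrame({'ts': pd.to_datetime(['2014-01-01 21:55:34',
                                         '2014-07-01 21:55:34'])})

# Make the column timezone-aware (UTC), then convert to Eastern time
df['ts_eastern'] = (df['ts']
                    .dt.tz_localize('UTC')
                    .dt.tz_convert('America/New_York'))
print(df)
```

The July row is included only to show that the offset follows daylight saving time, so the conversion is not a fixed five-hour shift.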

To convert EST times to an Asian time zone (here Asia/Kolkata), localize the naive index first, then convert:
df.index = df.index.tz_localize('EST')
df.index = df.index.tz_convert('Asia/Kolkata')
Pandas now has built-in time zone conversion.

Related

Convert time object to datetime format in python pandas

I have a dataset with a column named DateTime of dtype object.
df['DateTime'] = pd.to_datetime(df['DateTime'])
I used the code above to convert it to datetime format, then split the column into separate date and time columns:
df['date'] = df['DateTime'].dt.date
df['time'] = df['DateTime'].dt.time
After the split, both columns have object dtype, and converting the time column back to datetime raises: TypeError: is not convertible to datetime
How can I convert the time column to datetime format?
You can use combine in list comprehension with zip:
df = pd.DataFrame({'DateTime': ['2011-01-01 12:48:20', '2014-01-01 12:30:45']})
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['date'] = df['DateTime'].dt.date
df['time'] = df['DateTime'].dt.time
import datetime
df['new'] = [datetime.datetime.combine(a, b) for a, b in zip(df['date'], df['time'])]
print (df)
DateTime date time new
0 2011-01-01 12:48:20 2011-01-01 12:48:20 2011-01-01 12:48:20
1 2014-01-01 12:30:45 2014-01-01 12:30:45 2014-01-01 12:30:45
Or convert to strings, join together and convert again:
df['new'] = pd.to_datetime(df['date'].astype(str) + ' ' +df['time'].astype(str))
print (df)
DateTime date time new
0 2011-01-01 12:48:20 2011-01-01 12:48:20 2011-01-01 12:48:20
1 2014-01-01 12:30:45 2014-01-01 12:30:45 2014-01-01 12:30:45
Alternatively, if you use floor to strip the times from the dates and convert the times to timedeltas, you can simply add the two columns with +:
df['date'] = df['DateTime'].dt.floor('d')
df['time'] = pd.to_timedelta(df['DateTime'].dt.strftime('%H:%M:%S'))
df['new'] = df['date'] + df['time']
print (df)
DateTime date time new
0 2011-01-01 12:48:20 2011-01-01 12:48:20 2011-01-01 12:48:20
1 2014-01-01 12:30:45 2014-01-01 12:30:45 2014-01-01 12:30:45
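A variant of the floor approach, sketched here under the assumption that the original DateTime column is still available: subtracting the midnight timestamp from the full timestamp yields the time of day as a timedelta directly, without the round trip through strftime.

```python
import pandas as pd

df = pd.DataFrame({'DateTime': pd.to_datetime(['2011-01-01 12:48:20',
                                               '2014-01-01 12:30:45'])})

# normalize() sets the time to midnight but keeps the datetime64 dtype
df['date'] = df['DateTime'].dt.normalize()
# The difference between the timestamp and its midnight is a timedelta
df['time'] = df['DateTime'] - df['date']
# datetime64 + timedelta64 gives back a datetime64 column
df['new'] = df['date'] + df['time']
print(df.dtypes)
```

Both columns stay in native pandas dtypes this way, so they can be added back together with a plain +.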
How to convert the time column back to datetime format
There appears to be a misunderstanding. Pandas datetime series must include date and time components. This is non-negotiable. You can simply use pd.to_datetime without specifying a date and use the default 1900-01-01 date:
# data from jezrael's answer
print(pd.to_datetime(df['time'], format='%H:%M:%S'))
0 1900-01-01 12:48:20
1 1900-01-01 12:30:45
Name: time, dtype: datetime64[ns]
Or use another date component, for example today's date:
today = pd.Timestamp('today').strftime('%Y-%m-%d')
print(pd.to_datetime(today + ' ' + df['time'].astype(str)))
0 2018-11-25 12:48:20
1 2018-11-25 12:30:45
Name: time, dtype: datetime64[ns]
Or recombine from your date and time series:
print(pd.to_datetime(df['date'].astype(str) + ' ' + df['time'].astype(str)))
0 2011-01-01 12:48:20
1 2014-01-01 12:30:45
dtype: datetime64[ns]

Parsing datetime64 and datetime.time Python 3.6.5

I have two columns: one has type datetime64 and the other datetime.time. The
first column has the day and the second one the hour and minutes. I
am having trouble combining them:
Leistung_0011
ActStartDateExecution ActStartTimeExecution
0 2016-02-17 11:00:00
10 2016-04-15 07:15:00
20 2016-06-10 10:30:00
Leistung_0011['Start_datetime'] = pd.to_datetime(Leistung_0011['ActStartDateExecution'].astype(str) + ' ' + Leistung_0011['ActStartTimeExecution'].astype(str))
ValueError: ('Unknown string format:', 'NaT 00:00:00')
You can convert to str and join with whitespace before passing to pd.to_datetime:
df['datetime'] = pd.to_datetime(df['day'].astype(str) + ' ' + df['time'].astype(str))
print(df, df.dtypes, sep='\n')
# day time datetime
# 0 2018-01-01 15:00:00 2018-01-01 15:00:00
# 1 2015-12-30 05:00:00 2015-12-30 05:00:00
# day datetime64[ns]
# time object
# datetime datetime64[ns]
# dtype: object
Setup
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'day': ['2018-01-01', '2015-12-30'],
'time': ['15:00', '05:00']})
df['day'] = pd.to_datetime(df['day'])
df['time'] = df['time'].apply(lambda x: datetime.strptime(x, '%H:%M').time())
print(df['day'].dtype, type(df['time'].iloc[0]), sep='\n')
# datetime64[ns]
# <class 'datetime.time'>
Complete example including seconds:
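import pandas as pd
from datetime import datetime
from io import StringIO
x = StringIO(""" ActStartDateExecution ActStartTimeExecution
0 2016-02-17 11:00:00
10 2016-04-15 07:15:00
20 2016-06-10 10:30:00""")
# sep=r'\s+' splits on runs of whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pd.read_csv(x, sep=r'\s+')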
df['ActStartDateExecution'] = pd.to_datetime(df['ActStartDateExecution'])
df['ActStartTimeExecution'] = df['ActStartTimeExecution'].apply(lambda x: datetime.strptime(x, '%H:%M:%S').time())
df['datetime'] = pd.to_datetime(df['ActStartDateExecution'].astype(str) + ' ' + df['ActStartTimeExecution'].astype(str))
print(df.dtypes)
ActStartDateExecution datetime64[ns]
ActStartTimeExecution object
datetime datetime64[ns]
dtype: object
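The 'NaT 00:00:00' in the question's error suggests that at least one date is missing. As a hedged sketch (the small frame below is invented to mimic that situation, with shortened column names), one way to sidestep string parsing of such rows is to turn the time column into a timedelta and add it to the date column; rows with a missing date then simply stay NaT:

```python
import datetime
import pandas as pd

# Hypothetical data: the second row has a missing date (NaT)
df = pd.DataFrame({
    'day': pd.to_datetime(['2016-02-17', None, '2016-06-10']),
    'time': [datetime.time(11, 0), datetime.time(0, 0), datetime.time(10, 30)],
})

# str(datetime.time) is 'HH:MM:SS', which pd.to_timedelta can parse
offset = pd.to_timedelta(df['time'].astype(str))
# NaT + timedelta stays NaT, so no error is raised for the missing date
df['datetime'] = df['day'] + offset
print(df['datetime'])
```

Whether keeping NaT is the right outcome depends on the data; dropping those rows beforehand with dropna is another option.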

How can I print certain rows from a CSV in pandas

My problem is that I have a big dataframe with over 40000 rows, and now I want to select the rows from 2013-01-01 00:00:00 until 2013-12-31 00:00:00.
print(df.loc[df['localhour'] == '2013-01-01 00:00:00'])
That's my code now, but I cannot choose an interval for printing. Any ideas?
One way is to set your index as datetime and then use pd.DataFrame.loc with string indexers:
df = pd.DataFrame({'Date': ['2013-01-01', '2014-03-01', '2011-10-01', '2013-05-01'],
'Var': [1, 2, 3, 4]})
df['Date'] = pd.to_datetime(df['Date'])
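# sort the index first: slicing an unsorted DatetimeIndex with missing bounds raises KeyError in recent pandas
res = df.set_index('Date').sort_index().loc['2010-01-01':'2013-01-01']
print(res)
Var
Date
2011-10-01 3
2013-01-01 1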
Make a datetime object and then apply the condition:
print(df)
date
0 2013-01-01
1 2014-03-01
2 2011-10-01
3 2013-05-01
df['date']=pd.to_datetime(df['date'])
df['date'].loc[(df['date']<='2013-12-31 00:00:00') & (df['date']>='2013-01-01 00:00:00')]
Output:
0 2013-01-01
3 2013-05-01
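The mask above returns only the date column. A small sketch that keeps every column of the matching rows, assuming the same toy data plus an illustrative Var column, can use Series.between, which includes both endpoints by default:

```python
import pandas as pd

df = pd.DataFrame({'date': ['2013-01-01', '2014-03-01', '2011-10-01', '2013-05-01'],
                   'Var': [1, 2, 3, 4]})
df['date'] = pd.to_datetime(df['date'])

# Boolean mask: True where the date falls inside the interval (inclusive)
mask = df['date'].between(pd.Timestamp('2013-01-01'), pd.Timestamp('2013-12-31'))
# Index the whole frame with the mask to keep all columns
rows_2013 = df.loc[mask]
print(rows_2013)
```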

Python Pandas remove date from timestamp

I have a large data set like this
user category
time
2014-01-01 00:00:00 21155349 2
2014-01-01 00:00:00 56347479 6
2014-01-01 00:00:00 68429517 13
2014-01-01 00:00:00 39055685 4
2014-01-01 00:00:00 521325 13
I want to make it as
user category
time
00:00:00 21155349 2
00:00:00 56347479 6
00:00:00 68429517 13
00:00:00 39055685 4
00:00:00 521325 13
How do you do this using pandas?
If you want to mutate a series (column) in pandas, the pattern is to apply a function to it (one that updates one element in the series at a time), and then to assign that series back into the dataframe:
import pandas
from io import StringIO
# load data
data = '''date,user,category
2014-01-01 00:00:00, 21155349, 2
2014-01-01 00:00:00, 56347479, 6
2014-01-01 00:00:00, 68429517, 13
2014-01-01 00:00:00, 39055685, 4
2014-01-01 00:00:00, 521325, 13'''
df = pandas.read_csv(StringIO(data))
df['date'] = pandas.to_datetime(df['date'])
# make the required change
without_date = df['date'].apply(lambda d: d.time())
df['date'] = without_date
# display results
print(df)
If the problem is because the date is the index, you've got a few more hoops to jump through:
df = pandas.read_csv(StringIO(data), index_col='date')
ser = pandas.to_datetime(df.index).to_series()
# set_index returns a new dataframe, so assign the result back
df = df.set_index(ser.apply(lambda d: d.time()))
As suggested by @DSM, if you have pandas later than 0.15.2, you can use the .dt accessor on the series to do fast updates:
df = pandas.read_csv(StringIO(data), index_col='date')
ser = pandas.to_datetime(df.index).to_series()
df = df.set_index(ser.dt.time)

Python Pandas business day range bdate_range doesn't take 1min freq?

I am trying to use bdate_range with '1min' freq to get minute by minute data on all business days.
df = pd.bdate_range('20130101 9:30','20130106 16:00',freq='1min')
The output ends with:
......
2013-01-05 23:59:00
2013-01-06 00:00:00
Notice that 2013-01-05 and 2013-01-06 fall on a weekend, and the output is not limited to the times between 9:30 and 16:00.
I think freq='1min' completely overrides the business-day frequency ('B') implied by the function name bdate_range.
I also tried using date_range. It worked for the time range from 9:30 to 16:00, but it can't exclude weekends.
Thanks!
You could do it like this
In [28]: rng = pd.date_range('2012-01-01', '2013-01-01', freq="1min")
In [29]: rng
Out[29]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-01 00:00:00, ..., 2013-01-01 00:00:00]
Length: 527041, Freq: T, Timezone: None
Limit the times that I want
In [30]: x = rng[rng.indexer_between_time('9:30','16:00')]
In [31]: x
Out[31]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-01 09:30:00, ..., 2012-12-31 16:00:00]
Length: 143106, Freq: None, Timezone: None
Only days that are mon-fri
In [32]: x = x[x.dayofweek<5]
In [33]: x
Out[33]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-02 09:30:00, ..., 2012-12-31 16:00:00]
Length: 102051, Freq: None, Timezone: None
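Applied to the date range from the question, the same two filters can be sketched as one short script. This is only an illustration of the approach above, not a full trading calendar: it drops weekends but keeps public holidays such as 2013-01-01.

```python
import pandas as pd

# Every minute from the start of the first day to the end of the last day
rng = pd.date_range('2013-01-01 00:00', '2013-01-06 23:59', freq='1min')

# Keep only the minutes between 09:30 and 16:00 (both ends included)
in_hours = rng[rng.indexer_between_time('09:30', '16:00')]

# Keep only Monday (0) through Friday (4)
business_minutes = in_hours[in_hours.dayofweek < 5]
print(business_minutes[0], business_minutes[-1], len(business_minutes))
```

Each remaining day contributes 391 minutes (09:30 through 16:00 inclusive), and the range contains four weekdays, Tuesday through Friday.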
