I have a dataframe as follows:
period
1651622400000.00000
1651536000000.00000
1651449600000.00000
1651363200000.00000
1651276800000.00000
1651190400000.00000
1651104000000.00000
1651017600000.00000
I have converted it into a human-readable datetime with:
df['period'] = pd.to_datetime(df['period'], unit='ms')
and this outputs:
2022-04-04 00:00:00
2022-04-05 00:00:00
2022-04-06 00:00:00
2022-04-07 00:00:00
2022-04-08 00:00:00
2022-04-09 00:00:00
2022-04-10 00:00:00
2022-04-11 00:00:00
2022-04-12 00:00:00
The hours, minutes, and seconds are all 0.
I checked this on https://www.epochconverter.com/ and it gives:
GMT: Monday, April 4, 2022 12:00:00 AM
Your time zone: Monday, April 4, 2022 5:45:00 AM GMT+05:45
How do I get h, m, and s as well?
https://www.epochconverter.com/ additionally shows the time converted to your local time zone.
If you need to add time zones to the column, use Series.dt.tz_localize and then Series.dt.tz_convert:
df['period'] = (pd.to_datetime(df['period'], unit='ms')
.dt.tz_localize('GMT')
.dt.tz_convert('Asia/Kathmandu'))
print (df)
period
0 2022-05-04 05:45:00+05:45
1 2022-05-03 05:45:00+05:45
2 2022-05-02 05:45:00+05:45
3 2022-05-01 05:45:00+05:45
4 2022-04-30 05:45:00+05:45
5 2022-04-29 05:45:00+05:45
6 2022-04-28 05:45:00+05:45
7 2022-04-27 05:45:00+05:45
There is no problem with your code or with pandas. Nor do I think the time zone is the issue here, as the other answer suggests. April 4, 2022 12:00:00 AM is exactly the same date and time as 2022-04-04 00:00:00; one is just written in 12-hour AM/PM notation. You could specify time zones as jezrael describes, or with utc=True (see the pd.to_datetime docs), but I don't think that is your problem.
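The first value in the question, 1651622400000, is midnight UTC on 2022-05-04 (the date shown in jezrael's output). A minimal sketch, assuming a reasonably recent pandas, that converts it with utc=True and then formats the same instant with %I and %p, which produces the "12:00:00 AM" form that epochconverter.com prints:

```python
import pandas as pd

# First value from the question, in milliseconds since the Unix epoch
s = pd.Series([1651622400000.0])

# utc=True returns a timezone-aware (UTC) datetime Series
ts = pd.to_datetime(s, unit='ms', utc=True)
print(ts.iloc[0])

# %I (12-hour clock) and %p (AM/PM) render the same instant as midnight "12:00:00 AM"
print(ts.dt.strftime('%Y-%m-%d %I:%M:%S %p').iloc[0])
```

Both lines describe the same instant; only the notation differs.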
I have a pandas Series s, and I would like to extract the Monday before the third Friday of each month.
With the help of the answer at the following link, I can resample to the third Friday, but I am still not sure how to get the Monday just before it.
pandas resample to specific weekday in month
from pandas.tseries.offsets import WeekOfMonth
s.resample(rule=WeekOfMonth(week=2,weekday=4)).bfill().asfreq(freq='D').dropna()
Any help is welcome
Many thanks
For each source date, compute your "wanted" date in 3 steps:
Shift back to the first day of the current month.
Shift forward to the Friday of the third week.
Shift back 4 days (from Friday to Monday).
For a Series containing dates, the code to do it is:
s.dt.to_period('M').dt.to_timestamp() + pd.offsets.WeekOfMonth(week=2, weekday=4)\
- pd.Timedelta('4D')
To test this code I created the source Series as:
s = (pd.date_range('2020-01-01', '2020-12-31', freq='MS') + pd.Timedelta('1D')).to_series()
It contains the second day of each month, both as the index and value.
When you run the above code, you will get:
2020-01-02 2020-01-13
2020-02-02 2020-02-17
2020-03-02 2020-03-16
2020-04-02 2020-04-13
2020-05-02 2020-05-11
2020-06-02 2020-06-15
2020-07-02 2020-07-13
2020-08-02 2020-08-17
2020-09-02 2020-09-14
2020-10-02 2020-10-12
2020-11-02 2020-11-16
2020-12-02 2020-12-14
dtype: datetime64[ns]
The left column contains the original index (source date) and the right
column the "wanted" date.
Note that the "third Monday" formula (as proposed in one of the comments) is wrong.
E.g. the third Monday in January 2020 is 2020-01-20, whereas the correct date is 2020-01-13.
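A self-contained sketch of the three steps above, run on the same test Series, next to the "third Monday" shortcut for comparison. It assumes a recent pandas; adding WeekOfMonth to a Series may emit a PerformanceWarning because the offset is applied element by element.

```python
import pandas as pd

# Test input from the answer: the 2nd day of every month of 2020
s = (pd.date_range('2020-01-01', '2020-12-31', freq='MS') + pd.Timedelta('1D')).to_series()

# Step 1: shift back to the first day of the month
month_start = s.dt.to_period('M').dt.to_timestamp()

# Steps 2 and 3: third Friday (week is 0-based, weekday 4 = Friday), then back 4 days
monday_before = month_start + pd.offsets.WeekOfMonth(week=2, weekday=4) - pd.Timedelta('4D')

# "Third Monday" shortcut (weekday 0 = Monday), only for comparison
third_monday = month_start + pd.offsets.WeekOfMonth(week=2, weekday=0)

print(pd.DataFrame({'monday_before': monday_before, 'third_monday': third_monday}))
```

The two columns agree only in months that start on a Saturday, Sunday, or Monday (e.g. February, March, and June 2020).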
Edit
If you have a DataFrame, something like:
Date Amount
0 2020-01-02 10
1 2020-01-12 10
2 2020-01-13 2
3 2020-01-20 2
4 2020-02-16 2
5 2020-02-17 12
6 2020-03-15 12
7 2020-03-16 3
8 2020-03-31 3
and you want something like resample, but with each "period" starting
on the Monday before the third Friday of each month, and you want to e.g. compute
a sum for each period, you can:
Define the following function:
def dateShift(d):
d += pd.Timedelta(4, 'D')
d = pd.offsets.WeekOfMonth(week=2, weekday=4).rollback(d)
return d - pd.Timedelta(4, 'D')
That is:
Add 4 days (e.g. move 2020-01-13 (Monday) to 2020-01-17 (Friday)).
Roll back to the most recent third Friday (in the above case the date is already on offset, so it is not moved).
Subtract 4 days.
Run:
df.groupby(df.Date.apply(dateShift))[['Amount']].sum()
The result is:
Amount
Date
2019-12-16 20
2020-01-13 6
2020-02-17 24
2020-03-16 6
E.g. the two values of 10 for 2020-01-02 and 2020-01-12 are assigned
to the period starting on 2019-12-16 (the "wanted" date for December 2019).
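A runnable sketch of the DataFrame case, rebuilding the sample data from the answer. One assumption worth flagging: the Amount column is selected before summing, since recent pandas versions may raise a TypeError when asked to sum the datetime Date column.

```python
import pandas as pd

def dateShift(d):
    """Map a date to the start of its period: the Monday before a third Friday."""
    d += pd.Timedelta(4, 'D')                                  # Monday -> Friday
    d = pd.offsets.WeekOfMonth(week=2, weekday=4).rollback(d)  # back to a third Friday (if not on one)
    return d - pd.Timedelta(4, 'D')                            # Friday -> Monday

# Sample data from the answer
df = pd.DataFrame({
    'Date': pd.to_datetime(['2020-01-02', '2020-01-12', '2020-01-13', '2020-01-20',
                            '2020-02-16', '2020-02-17', '2020-03-15', '2020-03-16',
                            '2020-03-31']),
    'Amount': [10, 10, 2, 2, 2, 12, 12, 3, 3],
})

# Group by period start; select Amount so only the numeric column is summed
result = df.groupby(df['Date'].apply(dateShift))[['Amount']].sum()
print(result)
```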
I have a dataframe looking like this
open Start show Einde show
5 NaN 11:30 NaN
6 16:00 18:00 19:45
7 14:30 16:30 18:15
8 NaN NaN NaN
9 18:45 20:45 22:30
These hours are in string format and I would like to transform them to datetime format.
Whenever I try to use pd.to_datetime(evs['open'], errors='coerce') (to change one of the columns), it changes the hours to a full datetime like this: 2020-04-03 16:00:00, with today's date. I would like to have just the hour, but still in a datetime format so I can add minutes etc.
Now when I use dt.hour to access the hour, it returns a string and not the HH:MM format.
Can someone help me out, please? I'm reading in a CSV through pandas read_csv, but when I use the date parser I get the same problem. Ideally this would get fixed in the read_csv step instead of separately, but at this point I'll take anything.
Thanks!
As Chris commented, it is not possible to convert just the hours and minutes into datetime format. But you can use timedeltas to solve your problem.
import datetime
import pandas as pd
def to_timedelta(date):
date = pd.to_datetime(date)
try:
date_start = datetime.datetime(date.year, date.month, date.day, 0, 0)
except TypeError:
return pd.NaT # to keep dtype of series; Alternative: pd.Timedelta(0)
return date - date_start
df['open'] = df['open'].apply(to_timedelta)  # assign back so the later examples operate on timedeltas
df['open']
Output:
5 NaT
6 16:00:00
7 14:30:00
8 NaT
9 18:45:00
Name: open, dtype: timedelta64[ns]
Now you can use datetime.timedelta to add/subtract minutes, hours or whatever:
df['open'] + datetime.timedelta(minutes=15)
Output:
5 NaT
6 16:15:00
7 14:45:00
8 NaT
9 19:00:00
Name: open, dtype: timedelta64[ns]
Also, it is pretty easy to get back to full datetimes:
df['open'] + datetime.datetime(2020, 4, 4)
Output:
5 NaT
6 2020-04-04 16:00:00
7 2020-04-04 14:30:00
8 NaT
9 2020-04-04 18:45:00
Name: open, dtype: datetime64[ns]
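If you only need a timedelta since midnight, a shorter sketch (a swapped-in alternative, not the approach above) is to parse with an explicit format='%H:%M'. pandas then fills in the default date 1900-01-01, which can simply be subtracted off; missing values become NaT. The DataFrame below rebuilds only the question's open column.

```python
import numpy as np
import pandas as pd

# The "open" column from the question: "HH:MM" strings with missing values
df = pd.DataFrame({'open': [np.nan, '16:00', '14:30', np.nan, '18:45']},
                  index=[5, 6, 7, 8, 9])

# Parsing only %H:%M places every time on 1900-01-01;
# subtracting that date leaves the time of day as a timedelta (NaN -> NaT)
open_td = pd.to_datetime(df['open'], format='%H:%M') - pd.Timestamp('1900-01-01')
print(open_td)

# Timedeltas support arithmetic directly, e.g. adding 15 minutes
print(open_td + pd.Timedelta(minutes=15))
```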
I have a timestamp column in a dataframe as below, and I want to create another column called "Day of Week" from it. How can I do it?
Input:
Pickup date/time
07/05/2018 09:28:00
14/05/2018 17:00:00
15/05/2018 17:00:00
15/05/2018 17:00:00
23/06/2018 17:00:00
29/06/2018 17:00:00
Expected Output:
Pickup date/time Day of Week
07/05/2018 09:28:00 Monday
14/05/2018 17:00:00 Monday
15/05/2018 17:00:00 Tuesday
15/05/2018 17:00:00 Tuesday
23/06/2018 17:00:00 Saturday
29/06/2018 17:00:00 Friday
You can use weekday_name (deprecated since pandas 0.23.0; see the edit below for newer versions):
df['date/time'] = pd.to_datetime(df['date/time'], format = '%d/%m/%Y %H:%M:%S')
df['Day of Week'] = df['date/time'].dt.weekday_name
You get
date/time Day of Week
0 2018-05-07 09:28:00 Monday
1 2018-05-14 17:00:00 Monday
2 2018-05-15 17:00:00 Tuesday
3 2018-05-15 17:00:00 Tuesday
4 2018-06-23 17:00:00 Saturday
5 2018-06-29 17:00:00 Friday
Edit:
For newer versions of pandas, use day_name():
df['Day of Week'] = df['date/time'].dt.day_name()
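A self-contained sketch using the sample values and column name from the question. The explicit day-first format matters here: without it, a value like 07/05/2018 could be read as July 5.

```python
import pandas as pd

# Sample values from the question (DD/MM/YYYY HH:MM:SS)
df = pd.DataFrame({'Pickup date/time': ['07/05/2018 09:28:00', '14/05/2018 17:00:00',
                                        '15/05/2018 17:00:00', '15/05/2018 17:00:00',
                                        '23/06/2018 17:00:00', '29/06/2018 17:00:00']})

# Parse with an explicit day-first format, then take the weekday name via the .dt accessor
df['Pickup date/time'] = pd.to_datetime(df['Pickup date/time'], format='%d/%m/%Y %H:%M:%S')
df['Day of Week'] = df['Pickup date/time'].dt.day_name()
print(df)
```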
pandas>=0.23.0: pandas.Timestamp.day_name()
df['Day of Week'] = df['date/time'].dt.day_name()
https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.day_name.html
pandas>=0.18.1,<0.23.0: pandas.Timestamp.weekday_name()
Deprecated since version 0.23.0
https://pandas-docs.github.io/pandas-docs-travis/reference/api/pandas.Timestamp.weekday_name.html
I have previously only worked in Stata but am now trying to switch to Python. I want to conduct an event study. More specifically, I have 4 fixed dates a year: the first day of every quarter, e.g. 1 January, 1 April, ..., and an event window of ±10 days around each event date. To partition my sample to the desired window, I am using the following command:
smpl = merged.ix[datetime.date(year=2013,month=12,day=21):datetime.date(year=2014,month=1,day=10)]
I want to write a loop that automatically shifts the chosen sample period 90 days forward in every run of the loop, so that I can subsequently run the required analysis in that step. I know how to run the analysis, but I do not know how to shift the sample 90 days forward in each step of the loop. For example, the next sample in the loop should be:
smpl = merged.ix[datetime.date(year=2014,month=3,day=21):datetime.date(year=2014,month=4,day=10)]
It's probably pretty simple, something like month=i and then shifting by +3 months each time. I am just too much of a noob in Python to get the syntax right.
Any help is greatly appreciated.
I'd use this:
for beg in pd.date_range('2013-12-21', '2017-05-17', freq='90D'):
smpl = merged.loc[beg:beg + pd.Timedelta('20D')]
...
Demo:
In [158]: for beg in pd.date_range('2013-12-21', '2017-05-17', freq='90D'):
...: print(beg, beg + pd.Timedelta('20D'))
...:
2013-12-21 00:00:00 2014-01-10 00:00:00
2014-03-21 00:00:00 2014-04-10 00:00:00
2014-06-19 00:00:00 2014-07-09 00:00:00
2014-09-17 00:00:00 2014-10-07 00:00:00
2014-12-16 00:00:00 2015-01-05 00:00:00
2015-03-16 00:00:00 2015-04-05 00:00:00
2015-06-14 00:00:00 2015-07-04 00:00:00
2015-09-12 00:00:00 2015-10-02 00:00:00
2015-12-11 00:00:00 2015-12-31 00:00:00
2016-03-10 00:00:00 2016-03-30 00:00:00
2016-06-08 00:00:00 2016-06-28 00:00:00
2016-09-06 00:00:00 2016-09-26 00:00:00
2016-12-05 00:00:00 2016-12-25 00:00:00
2017-03-05 00:00:00 2017-03-25 00:00:00
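Note in the demo output that a fixed 90-day step drifts away from the quarter starts (2014-06-19, ..., 2017-03-05). If the events really are the first day of each quarter, one alternative (not the answer's approach) is to generate the event dates with freq='QS' and slice ±10 days around each. Here `merged` is a hypothetical daily DataFrame standing in for the asker's data; .loc replaces .ix, which was removed from newer pandas.

```python
import pandas as pd

# Hypothetical stand-in for `merged`: one row per day on a DatetimeIndex
idx = pd.date_range('2013-12-01', '2015-01-31', freq='D')
merged = pd.DataFrame({'value': range(len(idx))}, index=idx)

window = pd.Timedelta('10D')
# 'QS' = quarter start: 1 January, 1 April, 1 July, 1 October
for event in pd.date_range('2014-01-01', '2014-12-31', freq='QS'):
    # Label slicing on a DatetimeIndex includes both endpoints (21 days here)
    smpl = merged.loc[event - window:event + window]
    print(event.date(), smpl.index[0].date(), smpl.index[-1].date(), len(smpl))
    # ... run the analysis on smpl here ...
```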
I have a group of dates. I would like to subtract each one from its forward neighbor to get the delta between them. My code looks like this:
import pandas, numpy, StringIO
txt = '''ID,DATE
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-05-07 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-06-03 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-13 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-27 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2001-02-01 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2008-01-20 00:00:00
'''
df = pandas.read_csv(StringIO.StringIO(txt))
df = df.sort('DATE')
df.DATE = pandas.to_datetime(df.DATE)
grouped = df.groupby('ID')
df['X_SEQUENCE_GAP'] = pandas.concat([g['DATE'].sub(g['DATE'].shift(), fill_value=0) for title,g in grouped])
I am getting pretty incomprehensible results, so I assume I have a logic error.
The results I get are as follows:
ID DATE X_SEQUENCE_GAP
0 002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 12277 days, 00:00:00
1 002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 00:00:00
3 0088f218a1f00e0fe1b94919dc68ec33 2006-06-03 00:00:00 27 days, 00:00:00
2 0088f218a1f00e0fe1b94919dc68ec33 2006-05-07 00:00:00 13275 days, 00:00:00
5 00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 13216 days, 00:00:00
4 00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 00:00:00
6 0101d3286dfbd58642a7527ecbddb92e 2007-10-13 00:00:00 13799 days, 00:00:00
7 0101d3286dfbd58642a7527ecbddb92e 2007-10-27 00:00:00 14 days, 00:00:00
9 0103bd73af66e5a44f7867c0bb2203cc 2008-01-20 00:00:00 2544 days, 00:00:00
8 0103bd73af66e5a44f7867c0bb2203cc 2001-02-01 00:00:00 11354 days, 00:00:00
I was expecting, for example, that rows 0 and 1 would both have a result of 0. Any help is most appreciated.
This is in 0.11rc1 (I don't think it will work in a prior version).
When you shift dates, the first one is NaT (like NaN, but for datetimes/timedeltas).
In [27]: df['X_SEQUENCE_GAP'] = grouped.apply(lambda g: g['DATE']-g['DATE'].shift())
In [30]: df.sort()
Out[30]:
ID DATE X_SEQUENCE_GAP
0 002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 NaT
1 002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 00:00:00
2 0088f218a1f00e0fe1b94919dc68ec33 2006-05-07 00:00:00 NaT
3 0088f218a1f00e0fe1b94919dc68ec33 2006-06-03 00:00:00 27 days, 00:00:00
4 00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 NaT
5 00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 00:00:00
6 0101d3286dfbd58642a7527ecbddb92e 2007-10-13 00:00:00 NaT
7 0101d3286dfbd58642a7527ecbddb92e 2007-10-27 00:00:00 14 days, 00:00:00
8 0103bd73af66e5a44f7867c0bb2203cc 2001-02-01 00:00:00 NaT
9 0103bd73af66e5a44f7867c0bb2203cc 2008-01-20 00:00:00 2544 days, 00:00:00
You can then use fillna (but you have to do this awkward type conversion because of a numpy bug, which will get fixed in 0.12).
In [57]: df['X_SEQUENCE_GAP'].sort_index().astype('timedelta64[ns]').fillna(0)
Out[57]:
0 00:00:00
1 00:00:00
2 00:00:00
3 27 days, 00:00:00
4 00:00:00
5 00:00:00
6 00:00:00
7 14 days, 00:00:00
8 00:00:00
9 2544 days, 00:00:00
Name: X_SEQUENCE_GAP, dtype: timedelta64[ns]
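The large values in the question's output appear to come from fill_value=0 being read as the Unix epoch: 2003-08-13 minus 1970-01-01 is exactly 12277 days. In current pandas (Python 3's io.StringIO, sort_values instead of the removed DataFrame.sort), a sketch of the same computation uses groupby(...).diff(), which leaves NaT on each group's first row, and then fills those with a zero timedelta:

```python
import io
import pandas as pd

txt = '''ID,DATE
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-05-07 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-06-03 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-13 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-27 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2001-02-01 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2008-01-20 00:00:00
'''
df = pd.read_csv(io.StringIO(txt), parse_dates=['DATE'])
df = df.sort_values(['ID', 'DATE'])  # order dates within each ID before differencing

# Gap to the previous date within the same ID; the first row of each ID becomes 0
df['X_SEQUENCE_GAP'] = df.groupby('ID')['DATE'].diff().fillna(pd.Timedelta(0))
print(df)
```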