Parsing date/time strings in Pandas DataFrame - python

I have the following Pandas DataFrame column of date/time strings:
import pandas as pd
df = pd.DataFrame({"GMT": ["13 Feb 20089:30 AM", "22 Apr 20098:30 AM",
                           "14 Jul 20108:30 AM", "01 Jan 20118:30 AM"]})
GMT
13 Feb 20089:30 AM
22 Apr 20098:30 AM
14 Jul 20108:30 AM
01 Jan 20118:30 AM
What I would like is to split the date and time portions into two separate columns, i.e.
Date Time
13 Feb 2008 9:30 AM
22 Apr 2009 8:30 AM
14 Jul 2010 8:30 AM
01 Jan 2011 8:30 AM
Any help? I thought about simply slicing each string individually, but was wondering whether there is a better solution that returns them as datetime objects.

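Use to_datetime + dt.strftime. The hour directive must be %I (12-hour clock) for %p to take effect; with %H the AM/PM marker is ignored when parsing:
df['GMT'] = pd.to_datetime(df['GMT'], format='%d %b %Y%I:%M %p')
df['Date'] = df['GMT'].dt.strftime('%d %b %Y')
df['Time'] = df['GMT'].dt.strftime('%I:%M %p')
print(df)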
GMT Date Time
0 2008-02-13 09:30:00 13 Feb 2008 09:30 AM
1 2009-04-22 08:30:00 22 Apr 2009 08:30 AM
2 2010-07-14 08:30:00 14 Jul 2010 08:30 AM
3 2011-01-01 08:30:00 01 Jan 2011 08:30 AM
And for datetime objects, use dt.date and dt.time:
df['GMT'] = pd.to_datetime(df['GMT'], format='%d %b %Y%I:%M %p')
df['Date'] = df['GMT'].dt.date
df['Time'] = df['GMT'].dt.time
print(df)
GMT Date Time
0 2008-02-13 09:30:00 2008-02-13 09:30:00
1 2009-04-22 08:30:00 2009-04-22 08:30:00
2 2010-07-14 08:30:00 2010-07-14 08:30:00
3 2011-01-01 08:30:00 2011-01-01 08:30:00
For formats check http://strftime.org/.
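One detail worth checking with 12-hour input: per the Python strptime documentation, %p only affects the parsed hour when the hour is read with %I. The sketch below uses a hypothetical sample containing a PM time (not part of the question's data) to show the 12-hour directive in action.

```python
import pandas as pd

# Hypothetical sample in the same layout as the question, with one PM value.
s = pd.Series(["13 Feb 20089:30 AM", "22 Apr 20098:30 PM"])

# %I reads the hour on a 12-hour clock, so %p (AM/PM) is applied to it.
parsed = pd.to_datetime(s, format="%d %b %Y%I:%M %p")

# Split into separate string columns, as in the answer above.
date_part = parsed.dt.strftime("%d %b %Y")
time_part = parsed.dt.strftime("%I:%M %p")

hours = parsed.dt.hour.tolist()
print(hours)
```

If the hours come out as 9 and 20, the AM/PM marker was honoured.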

Related

How to shift dates in pandas dataframe with reference to a particular column having business days?

I am trying to use a column called 'Unloading point' to map my dates to particular business days: each date in the 'Date' column should be mapped to a day listed in 'Unloading point'. If 'Unloading point' is empty and the date falls on a weekend, it should just roll back to the nearest business day. I am quite new to pandas dataframes. Any help will be deeply appreciated.
Date        Unloading point       Expected Output
5/30/2021   MON-TUE-WED-THU-FRI   5/31/2021
6/11/2021   MON-TUE-WED-THU-FRI   6/11/2021
6/5/2021    THU                   6/3/2021
6/4/2021    THU                   6/3/2021
5/27/2021   THU                   5/27/2021
5/29/2021   THU                   5/27/2021
5/29/2021                         5/28/2021
6/6/2021    MON-TUE-WED-THU-FRI   6/7/2021
6/1/2021                          6/1/2021
5/29/2021   TUE                   5/25/2021
6/1/2021                          6/1/2021
7/31/2021   THU                   7/29/2021
6/1/2021    WED                   6/2/2021
5/26/2021   WED                   5/26/2021
6/14/2021   MON-TUE-WED-THU-FRI   6/14/2021
5/27/2021   MON-TUE-WED           5/26/2021
6/15/2021   MON-TUE-WED           6/15/2021
5/22/2021   TUE-WED               5/19/2021
6/10/2021   MON-TUE-WED           6/10/2021
6/24/2021   TUE-FRI               6/22/2021
To answer this question we can use Custom Business Days. The constructor of these objects takes a weekmask parameter listing the days of the week that count as business days, for example "Mon Tue Wed". It is also possible to specify holidays. These objects work much like the BusinessDay class, but with the freedom to specify the business days as we please.
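As a minimal sketch of how a weekmask changes rolling behaviour, the snippet below builds one offset where only Thursdays are valid and rolls a Saturday in both directions. The variable names are illustrative only.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Only Thursdays count as business days for this offset.
thursdays_only = CustomBusinessDay(weekmask="Thu")

saturday = pd.Timestamp("2021-06-05")  # a Saturday

# rollback: latest valid day on or before the date
previous_valid = thursdays_only.rollback(saturday)
# rollforward: earliest valid day on or after the date
next_valid = thursdays_only.rollforward(saturday)

# A date that is already valid is returned unchanged.
thursday = pd.Timestamp("2021-05-27")
unchanged = thursdays_only.rollback(thursday)

print(previous_valid.date(), next_valid.date(), unchanged.date())
```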
Answer
The key here is to create a CDay object for each row.
import pandas as pd
from pandas.tseries.offsets import CDay  # Same as CustomBusinessDay
# Convert the date column to datetime
df['date'] = pd.to_datetime(df['date'])
# Format the weekmask correctly, e.g. "MON-TUE" -> "Mon Tue"
df['weekmask'] = df['weekmask'].str.replace('-', ' ').str.title()
def roll(s):
    # The default weekmask is Monday to Friday.
    cday = CDay(weekmask=s.weekmask) if s.weekmask else CDay()
    next_day = cday.rollforward(s.date)
    prev_day = cday.rollback(s.date)
    # Pick whichever valid day is closer; ties go to the earlier day.
    return next_day if next_day - s.date < s.date - prev_day else prev_day
df['roll'] = df.apply(roll, axis='columns')
Result
date weekmask roll
0 2021-05-30 Mon Tue Wed Thu Fri 2021-05-31
1 2021-06-11 Mon Tue Wed Thu Fri 2021-06-11
2 2021-06-05 Thu 2021-06-03
3 2021-06-04 Thu 2021-06-03
4 2021-05-27 Thu 2021-05-27
5 2021-05-29 Thu 2021-05-27
6 2021-05-29 None 2021-05-28
7 2021-06-06 Mon Tue Wed Thu Fri 2021-06-07
8 2021-06-01 None 2021-06-01
9 2021-05-29 Tue 2021-06-01
10 2021-06-01 None 2021-06-01
11 2021-07-31 Thu 2021-07-29
12 2021-06-01 Wed 2021-06-02
13 2021-05-26 Wed 2021-05-26
14 2021-06-14 Mon Tue Wed Thu Fri 2021-06-14
15 2021-05-27 Mon Tue Wed 2021-05-26
16 2021-06-15 Mon Tue Wed 2021-06-15
17 2021-05-22 Tue Wed 2021-05-19
18 2021-06-10 Mon Tue Wed 2021-06-09
19 2021-06-24 Tue Fri 2021-06-25

Create another dataframe datetime column based on the value of the datetime in another dataframe column

I have a dataframe with a datetime column; let's call it my_dates.
I also have a list of dates, which for this example contains 5 dates:
15th Jan 2020
20th Mar 2020
28th Jun 2020
20th Jul 2020
8th Aug 2020
What I want to do is create another column in my dataframe that looks at the datetime in the my_dates column and takes the first date in my list that is on or after it.
For example, if my_dates is 23rd June 2020, I want the new column to hold 28th June 2020 for that row. Hopefully the examples below are clear.
More examples
my_dates expected_values
14th Jan 2020 15th Jan 2020
15th Jan 2020 15th Jan 2020
16th Jan 2020 20th Mar 2020
... ...
19th Mar 2020 20th Mar 2020
20th Mar 2020 20th Mar 2020
21st Mar 2020 28th Jun 2020
What is the most efficient way to do this rather than looping?
IIUC, you need pd.merge_asof with the argument direction set to "forward":
import pandas as pd
dates = ['15th Jan 2020',
         '20th Mar 2020',
         '28th Jun 2020',
         '20th Jul 2020',
         '8th Aug 2020']
dates_proper = [pd.to_datetime(d) for d in dates]
# ISO dates avoid the ambiguity of day-first strings such as '14-01-2020'
df = pd.DataFrame(pd.date_range('2020-01-14', '2020-03-21'), columns=['my_dates'])
df1 = pd.DataFrame(dates_proper, columns=['date_list'])
merged_df = pd.merge_asof(
    df, df1, left_on=["my_dates"], right_on=["date_list"], direction="forward"
)
print(merged_df)
my_dates date_list
0 2020-01-14 2020-01-15
1 2020-01-15 2020-01-15
2 2020-01-16 2020-03-20
3 2020-01-17 2020-03-20
4 2020-01-18 2020-03-20
.. ... ...
63 2020-03-17 2020-03-20
64 2020-03-18 2020-03-20
65 2020-03-19 2020-03-20
66 2020-03-20 2020-03-20
67 2020-03-21 2020-06-28
Finally a use case for pd.merge_asof! :) From the documentation:
Perform an asof merge. This is similar to a left-join except that we match on nearest key rather than equal keys.
It would have been helpful to make your example reproducible like this:
In [12]: reference = pd.DataFrame([['15th Jan 2020'],['20th Mar 2020'],['28th Jun 2020'],['20th Jul 2020'],['8th Aug 2020']], columns=['reference']).astype('datetime64[ns]')
In [13]: my_dates = pd.DataFrame([['14th Jan 2020'], ['15th Jan 2020'], ['16th Jan 2020'], ['19th Mar 2020'], ['20th Mar 2020'],['21st Mar 2020']], columns=['dates']).astype('datetime64[ns]')
In [15]: pd.merge_asof(my_dates, reference, left_on='dates', right_on='reference', direction='forward')
Out[15]:
dates reference
0 2020-01-14 2020-01-15
1 2020-01-15 2020-01-15
2 2020-01-16 2020-03-20
3 2020-03-19 2020-03-20
4 2020-03-20 2020-03-20
5 2020-03-21 2020-06-28
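Two practical points about merge_asof are easy to miss: both frames must be sorted on their key columns, and with direction="forward" a row that falls after the last reference date gets no match (NaT). A minimal sketch with made-up dates, assuming the same column names as above:

```python
import pandas as pd

# Hypothetical data: the left frame is deliberately unsorted and
# contains one date later than every reference date.
reference = pd.DataFrame(
    {"reference": pd.to_datetime(["2020-01-15", "2020-03-20", "2020-08-08"])}
)
my_dates = pd.DataFrame(
    {"dates": pd.to_datetime(["2020-03-21", "2020-01-14", "2020-09-01"])}
)

# merge_asof requires both key columns to be sorted, so sort first.
out = pd.merge_asof(
    my_dates.sort_values("dates"),
    reference.sort_values("reference"),
    left_on="dates",
    right_on="reference",
    direction="forward",
)
print(out)
```

Rows without a later reference date can then be handled explicitly, for example with fillna or by dropping them.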

Divide Single to Multiple Columns by Delimiter Python Dataframe

I have a dataframe called "dates" with shape (4380, 1) that looks like this:
date
0 2017-01-01 00:00:00
1 2017-01-01 06:00:00
2 2017-01-01 12:00:00
3 2017-01-01 18:00:00
4 2017-01-02 00:00:00
...
4375 2019-12-30 18:00:00
4376 2019-12-31 00:00:00
4377 2019-12-31 06:00:00
4378 2019-12-31 12:00:00
4379 2019-12-31 18:00:00
But I need to split the single date column on the delimiter "-" (dash) so that I can group by month, e.g. 01, 02, ..., 12. So my final result for the new dataframe should have shape (4380, 4) and look like:
Year Month Day HHMMSS
0 2017 01 01 00:00:00
1 2017 01 01 06:00:00
...
4379 2019 12 31 18:00:00
I cannot find how to do this transformation from a single column to multiple columns based on a delimiter in Python. Thank you very much!
Use Series.dt.strftime and Series.str.split (here df is the "dates" dataframe from the question):
new_df = df['date'].dt.strftime('%Y-%m-%d-%H:%M:%S').str.split('-', expand=True)
new_df.columns = ['Year', 'Month', 'Day', 'HHMMSS']
print(new_df)
Year Month Day HHMMSS
0 2017 01 01 00:00:00
1 2017 01 01 06:00:00
2 2017 01 01 12:00:00
3 2017 01 01 18:00:00
4 2017 01 02 00:00:00
...
4375 2019 12 30 18:00:00
4376 2019 12 31 00:00:00
4377 2019 12 31 06:00:00
4378 2019 12 31 12:00:00
4379 2019 12 31 18:00:00
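If the end goal is only to group by month, the string split can be skipped: the .dt accessor exposes the components directly. The following is a rough sketch on a small made-up sample, assuming the column is already of datetime dtype; note that the parts come out as integers rather than zero-padded strings.

```python
import pandas as pd

# Hypothetical small sample standing in for the 4380-row "dates" dataframe.
dates = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-01-01 00:00:00",
        "2017-01-01 06:00:00",
        "2017-02-01 12:00:00",
        "2019-12-31 18:00:00",
    ])
})

# Build the component columns from the datetime accessor.
parts = pd.DataFrame({
    "Year": dates["date"].dt.year,
    "Month": dates["date"].dt.month,
    "Day": dates["date"].dt.day,
    "HHMMSS": dates["date"].dt.strftime("%H:%M:%S"),
})

# Group directly on the month number, without creating any new columns.
counts = dates.groupby(dates["date"].dt.month).size()
print(parts)
print(counts)
```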

Arranging the data-series according to month vs year in python

I have two Python series like below.
obstime temperature
2012-01-31 -10.203452
2012-02-29 -7.818472
2012-03-31 -10.965704
2012-04-30 -12.800104
2012-05-31 -16.666207
2012-06-30 -11.511220
2012-07-31 -17.928276
2012-08-31 -14.837011
2012-09-30 -13.116554
2012-10-31 -9.929026
2012-11-30 -5.082396
2012-12-31 -10.915046
2013-01-31 -15.459292
2013-02-28 -8.278767
2013-03-31 -13.764899
2013-04-30 -13.262068
2013-05-31 -15.787945
2013-06-30 -13.096949
2013-07-31 -15.841149
2013-08-31 -16.051178
...
2016-01-31 -4.883573
And I want to arrange the data in year vs month format like shown below:
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2012 -10.20 -7.81 ......
2013 -15.45 -8.27....
...
2016 -4.88 -7.94
I need to parse the year and month from obstime and fill the table with the values from the series.
You need df.pivot_table:
# If obstime is one of the columns, first make it the index
# df.set_index('obstime', inplace=True)
# Convert the index to datetime
df.index = pd.to_datetime(df.index)
df['Year'] = df.index.year
df['Month'] = df.index.strftime('%b')
df.pivot_table(columns='Month', index='Year', values='temperature')
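One thing to watch with month abbreviations as column labels: pivot_table sorts the columns alphabetically (Apr, Aug, ...), not in calendar order. A possible fix, sketched here on a few values from the question, is to reorder the columns afterwards using the standard library's calendar.month_abbr (which follows the same locale as strftime('%b')).

```python
import calendar
import pandas as pd

# A few rows from the question's data, indexed by observation time.
obstime = pd.to_datetime(["2012-01-31", "2012-02-29", "2013-01-31", "2013-02-28"])
df = pd.DataFrame(
    {"temperature": [-10.203452, -7.818472, -15.459292, -8.278767]},
    index=obstime,
)

df["Year"] = df.index.year
df["Month"] = df.index.strftime("%b")
table = df.pivot_table(index="Year", columns="Month", values="temperature")

# Keep only the months that actually occur, in calendar order.
month_order = [m for m in calendar.month_abbr[1:] if m in table.columns]
table = table[month_order]
print(table.round(2))
```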

Simple way to extract date from text in Pandas

Given the following excerpt from a pandas dataframe:
df =
index date
7838 2012 January
7790 2012 January
7853 2015 September
7889 2016 March
7928 2015 October
7847 1999 January
7884 2006 January
7826 1992 January
Is there a simple (and pythonic) way to convert free text into a standard date time variable? Something like:
df =
index date
7838 2012-01-01
7790 2012-01-01
7853 2015-09-01
7889 2016-03-01
7928 2015-10-01
7847 1999-01-01
7884 2006-01-01
7826 1992-01-01
Use pd.to_datetime() to convert from text to a datetime type, passing the matching strftime format codes (%Y for the four-digit year, %B for the full month name):
df['date'] = pd.to_datetime(df['date'], format='%Y %B')
to_datetime handles this fine without any specific format specifier:
In [83]:
pd.to_datetime(df['date'])
Out[83]:
0 2012-01-01
1 2012-01-01
2 2015-09-01
3 2016-03-01
4 2015-10-01
5 1999-01-01
6 2006-01-01
7 1992-01-01
Name: date, dtype: datetime64[ns]
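When the column is free text, some entries may not match the expected pattern. A minimal sketch, using a made-up invalid entry, of combining an explicit format with errors="coerce" so that unparseable values become NaT instead of raising:

```python
import pandas as pd

# Hypothetical sample; the last entry is deliberately not a date.
df = pd.DataFrame({"date": ["2012 January", "2015 September", "not a date"]})

# Explicit format for "<year> <full month name>"; bad values become NaT.
df["parsed"] = pd.to_datetime(df["date"], format="%Y %B", errors="coerce")

# Rows that failed to parse can be inspected separately.
failed = df[df["parsed"].isna()]
print(df)
```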
