How does the MonthEnd(1) function work? - python

The MonthEnd function offsets a date to the end of the month, as shown below, but I am not clear on how it works. Does it scan the left side of the '+' sign and then calculate the number of days to add? Normally we would expect something like MonthEnd(Date).
import pandas as pd
from pandas.tseries.offsets import MonthEnd
df = pd.DataFrame({'Date': [20010410, 20050805, 20100219, 20160211, 19991208, 20061122]})
df['EndOfMonth'] = pd.to_datetime(df['Date'], format="%Y%m%d") + MonthEnd(1)
print(df['EndOfMonth'])
0 2001-04-30
1 2005-08-31
2 2010-02-28
3 2016-02-29
4 1999-12-31
5 2006-11-30
Name: EndOfMonth, dtype: datetime64[ns]
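A small sketch may make the mechanism clearer. MonthEnd(1) is an offset object, and the '+' operator hands the left-hand dates to that object, which shifts each one to a month-end anchor; nothing is scanned or counted in days up front. The two timestamps below are assumptions chosen for illustration, one in the middle of a month and one already on a month end.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

mid_month = pd.Timestamp("2016-02-11")   # illustrative date, not on a month end
month_end = pd.Timestamp("2016-02-29")   # illustrative date, already a month end

# "date + offset" asks the offset object to shift the date.
forward_from_mid = mid_month + MonthEnd(1)    # rolls forward to the end of February
forward_from_end = month_end + MonthEnd(1)    # already on a month end, so it moves to the end of March
stay_on_end = month_end + MonthEnd(0)         # n=0 leaves a month-end date where it is
back_from_mid = mid_month + MonthEnd(-1)      # rolls back to the end of January

print(forward_from_mid, forward_from_end, stay_on_end, back_from_mid)
```

In other words, MonthEnd(1) reads as "advance to the next month-end anchor", which for a mid-month date is the end of that same month.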

Related

How do I adjust the dates of a column in pandas according to a threshold?

I have a data frame with a datetime column like so:
dates
0 2017-09-19
1 2017-08-28
2 2017-07-13
I want to know if there is a way to adjust the dates with this condition:
If the day of the date is before 15, then change the date to the end of last month.
If the day of the date is 15 or after, then change the date to the end of the current month.
My desired output would look something like this:
dates
0 2017-09-30
1 2017-08-31
2 2017-06-30
Using np.where and Josh's suggestion of MonthEnd, this can be simplified a bit.
Given:
dates
0 2017-09-19
1 2017-08-28
2 2017-07-13
Doing:
import numpy as np
from pandas.tseries.offsets import MonthEnd
# Where the day is less than 15,
# give the month end of the previous month.
# Otherwise, give the month end of the current month
# (MonthEnd(0) leaves a date that is already a month end unchanged).
df.dates = np.where(df.dates.dt.day.lt(15),
                    df.dates.add(MonthEnd(-1)),
                    df.dates.add(MonthEnd(0)))
print(df)
# Output:
dates
0 2017-09-30
1 2017-08-31
2 2017-06-30
Easy with MonthEnd
Let's set up the data:
dates = pd.Series({0: '2017-09-19', 1: '2017-08-28', 2: '2017-07-13'})
dates = pd.to_datetime(dates)
Then:
from pandas.tseries.offsets import MonthEnd
pre, post = dates.dt.day < 15, dates.dt.day >= 15
dates.loc[pre] = dates.loc[pre] + MonthEnd(-1)
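dates.loc[post] = dates.loc[post] + MonthEnd(0)
Explanation: create the masks (pre and post) first, then use them to get the month end of the previous or current month, as appropriate. MonthEnd(0) is used for the current month because MonthEnd(1) would push a date that already falls on a month end into the next month.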
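The masking idea can also be wrapped in a reusable function. This is only a sketch: the function name, the threshold parameter, and the extra 2017-08-31 row are assumptions added here to check the case where a date already sits on a month end.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

def snap_to_month_end(dates: pd.Series, threshold: int = 15) -> pd.Series:
    """Hypothetical helper: previous month end before `threshold`, else current month end."""
    before = dates.dt.day < threshold
    current_end = dates + MonthEnd(0)     # unchanged if already on a month end
    previous_end = dates + MonthEnd(-1)
    # Keep current_end where the day is at or past the threshold, otherwise use previous_end.
    return current_end.where(~before, previous_end)

dates = pd.to_datetime(pd.Series(["2017-09-19", "2017-08-28", "2017-07-13", "2017-08-31"]))
result = snap_to_month_end(dates)
print(result)
```

Computing both candidate columns and selecting with where avoids assigning back through .loc, at the cost of evaluating both offsets for every row.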

Pandas dataframe timedelta is giving exceptions

I am trying to get the first date of the next month based on billDate in a dataframe.
I did this:
import pandas as pd
import datetime
from datetime import timedelta
dt = pd.to_datetime('15/4/2019', errors='coerce')
print(dt)
print((dt.replace(day=1) + datetime.timedelta(days=32)).replace(day=1))
This works perfectly, and the output is:
2019-04-15 00:00:00
2019-05-01 00:00:00
Now I am applying the same logic to my dataframe in the code below:
df[comNewColName] = (pd.to_datetime(df['billDate'], errors='coerce').replace(day=1) + datetime.timedelta(days=32)).replace(day=1)
But I get an error like this:
---> 69 df[comNewColName] = (pd.to_datetime(df['billDate'], errors='coerce').replace(day=1) + datetime.timedelta(days=32)).replace(day=1)
70 '''print(df[['billDate']])'''
71 '''df = df.assign(Product=lambda x: (x['Field_1'] * x['Field_2'] * x['Field_3']))'''
TypeError: replace() got an unexpected keyword argument 'day'
You can use Series.to_period for month periods, add 1 for next month and then convert back to datetimes by Series.dt.to_timestamp:
print (df)
billDate
0 15/4/2019
1 30/4/2019
2 15/8/2019
df['billDate'] = (pd.to_datetime(df['billDate'], errors='coerce', dayfirst=True)
                    .dt.to_period('M')
                    .add(1)
                    .dt.to_timestamp())
print (df)
billDate
0 2019-05-01
1 2019-05-01
2 2019-09-01
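The error comes from Series.replace, which substitutes values and has no day keyword, unlike the replace method of a single Timestamp. As an alternative to the period round trip, here is a minimal sketch using the MonthBegin offset; the sample rows and the nextMonthStart column name are assumptions for illustration.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

# Illustrative data (an assumption); the last row already falls on the first of a month.
df = pd.DataFrame({"billDate": ["15/04/2019", "30/04/2019", "01/08/2019"]})

# Parse the day-first strings, then let the offset move each date to the next month start.
parsed = pd.to_datetime(df["billDate"], format="%d/%m/%Y", errors="coerce")
df["nextMonthStart"] = parsed + MonthBegin(1)
print(df)
```

A date that is already the first of a month moves to the first of the following month, which matches the "next month" requirement in the question.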

How to swap day by month in a Series with python?

I have a column in which there are dates :
df['Date']
Date
0 2020-25-04
1 2020-26-04
2 2020-27-04
3 2020-12-05
4 2020-06-05
Name: Date, Length: 5, dtype: datetime64[ns]
I want to swap the day and month elements so that I get:
df['Date']
Date
0 2020-04-25
1 2020-04-26
2 2020-04-27
3 2020-05-12
4 2020-05-06
Name: Date, Length: 5, dtype: datetime64[ns]
Any help would be appreciated.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Date':[np.datetime64('2020-04-25') ,np.datetime64('2020-04-26')]})
df['Date'] = df['Date'].apply(lambda x: x.strftime('%Y-%m-%d'))
print(df)
I converted the data to np.datetime64 values and applied a lambda function that formats each one as year-month-day.
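If the column actually holds strings written as year-day-month, as the first listing in the question suggests, one option is to tell the parser that order explicitly. This is a sketch under that assumption; the sample values are copied from the question.

```python
import pandas as pd

# Assumed input: strings in year-day-month order.
raw = pd.Series(
    ["2020-25-04", "2020-26-04", "2020-27-04", "2020-12-05", "2020-06-05"],
    name="Date",
)

# %Y-%d-%m reads the middle field as the day and the last field as the month.
fixed = pd.to_datetime(raw, format="%Y-%d-%m")
print(fixed)
```

The result is a datetime column, so it displays in the usual year-month-day order without any further string manipulation.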

pd.to_datetime is getting half my dates with flipped day / months

My dataset has dates in the European format, and I'm struggling to get them parsed correctly by pd.to_datetime: for every date whose day is 12 or less, the month and day get switched.
Is there an easy solution to this?
import pandas as pd
import datetime as dt
df = pd.read_csv(loc,dayfirst=True)
df['Date']=pd.to_datetime(df['Date'])
Is there a way to force to_datetime to recognize that the input is formatted as dd/mm/yy?
Thanks for the help!
Edit, a sample from my dates:
renewal["Date"].head()
Out[235]:
0 31/03/2018
2 30/04/2018
3 28/02/2018
4 30/04/2018
5 31/03/2018
Name: Earliest renewal date, dtype: object
After running the following:
renewal['Date']=pd.to_datetime(renewal['Date'],dayfirst=True)
I get:
Out[241]:
0 2018-03-31 #Correct
2 2018-04-01 #<-- this number is wrong and should be 01-04 instead
3 2018-02-28 #Correct
Add format.
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
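A short sketch of why an explicit format helps: every value is read with the same field order, and a value that does not fit the pattern raises an error rather than being silently reinterpreted. The sample strings below are assumptions for illustration.

```python
import pandas as pd

dates = pd.Series(["31/03/2018", "01/04/2018", "28/02/2018"])  # sample values (assumed)

# Day first, then month, for every row, including ambiguous ones like 01/04.
parsed = pd.to_datetime(dates, format="%d/%m/%Y")
print(parsed)

# A string in a different layout does not match the format and raises ValueError.
try:
    pd.to_datetime(pd.Series(["2018-04-01"]), format="%d/%m/%Y")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```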
You can control the date construction directly if you define separate columns for 'year', 'month' and 'day', like this:
import pandas as pd
df = pd.DataFrame(
{'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)
date_parts = df['Date'].apply(lambda d: pd.Series(int(n) for n in d.split('/')))
date_parts.columns = ['day', 'month', 'year']
df['Date'] = pd.to_datetime(date_parts)
date_parts
# day month year
# 0 1 3 2018
# 1 6 8 2018
# 2 31 3 2018
# 3 30 4 2018
df
# Date
# 0 2018-03-01
# 1 2018-08-06
# 2 2018-03-31
# 3 2018-04-30

Pandas Timedelta in Days

I have a dataframe in pandas called 'munged_data' with two columns, 'entry_date' and 'dob', which I have converted to Timestamps using pd.to_timestamp. I am trying to figure out how to calculate people's ages from the time difference between 'entry_date' and 'dob'. To do this I need the difference in days between the two columns (so that I can then do something like round(days/365.25)). I do not seem to be able to find a way to do this using a vectorized operation. When I do munged_data.entry_date-munged_data.dob I get the following:
internal_quote_id
2 15685977 days, 23:54:30.457856
3 11651985 days, 23:49:15.359744
4 9491988 days, 23:39:55.621376
7 11907004 days, 0:10:30.196224
9 15282164 days, 23:30:30.196224
15 15282227 days, 23:50:40.261632
However, I do not seem to be able to extract the days as an integer so that I can continue with my calculation.
Any help appreciated.
Using the pandas Timedelta type, available since v0.15.0, you can also do:
In[1]: import pandas as pd
In[2]: df = pd.DataFrame([ pd.Timestamp('20150111'),
pd.Timestamp('20150301') ], columns=['date'])
In[3]: df['today'] = pd.Timestamp('20150315')
In[4]: df
Out[4]:
date today
0 2015-01-11 2015-03-15
1 2015-03-01 2015-03-15
In[5]: (df['today'] - df['date']).dt.days
Out[5]:
0 63
1 14
dtype: int64
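Applied to the question's age calculation, the same accessor could be used roughly as follows. This is a sketch only: the column names come from the question, while the sample dates and the age_days and age_years column names are assumptions.

```python
import pandas as pd

# Hypothetical sample rows standing in for munged_data.
munged_data = pd.DataFrame({
    "dob": pd.to_datetime(["1980-01-01", "2000-06-15"]),
    "entry_date": pd.to_datetime(["2013-01-01", "2013-07-01"]),
})

# Vectorized difference in whole days, then an approximate age in completed years.
munged_data["age_days"] = (munged_data["entry_date"] - munged_data["dob"]).dt.days
munged_data["age_years"] = (munged_data["age_days"] // 365.25).astype(int)
print(munged_data)
```

Dividing by 365.25 is an approximation, so results can be off by one for entry dates very close to a birthday.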
You need 0.11 for this (0.11rc1 is out; the final release is probably next week)
In [9]: df = DataFrame([ Timestamp('20010101'), Timestamp('20040601') ])
In [10]: df
Out[10]:
0
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00
In [11]: df = DataFrame([ Timestamp('20010101'),
Timestamp('20040601') ],columns=['age'])
In [12]: df
Out[12]:
age
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00
In [13]: df['today'] = Timestamp('20130419')
In [14]: df['diff'] = df['today']-df['age']
In [16]: df['years'] = df['diff'].apply(lambda x: float(x.item().days)/365)
In [17]: df
Out[17]:
age today diff years
0 2001-01-01 00:00:00 2013-04-19 00:00:00 4491 days, 00:00:00 12.304110
1 2004-06-01 00:00:00 2013-04-19 00:00:00 3244 days, 00:00:00 8.887671
You need this odd apply at the end because there is not yet full support for timedelta64[ns] scalars (e.g. like how we now use Timestamps for datetime64[ns]); that is coming in 0.12.
Not sure if you still need it, but in Pandas 0.14 I usually use the .astype('timedelta64[X]') method
http://pandas.pydata.org/pandas-docs/stable/timeseries.html (frequency conversion)
df = pd.DataFrame([ pd.Timestamp('20010101'), pd.Timestamp('20040605') ])
df.ix[0]-df.ix[1]
Returns:
0 -1251 days
dtype: timedelta64[ns]
(df.ix[0]-df.ix[1]).astype('timedelta64[Y]')
Returns:
0 -4
dtype: float64
Hope that will help
Let's specify that you have a pandas series named time_difference which has type
numpy.timedelta64[ns]
One way of extracting just the day (or whatever desired attribute) is the following:
just_day = time_difference.apply(lambda x: pd.Timedelta(x).days)
This function is used because the numpy.timedelta64 object does not have a 'days' attribute.
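To convert a duration given in another fixed unit into whole days, use pd.Timedelta(...).days (calendar units such as 'Y' and 'M' are not accepted by current pandas, since they have no fixed length):
pd.Timedelta(1985, unit='h').days
82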
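When fractional days matter rather than only the whole-day component, dividing by a one-day Timedelta is another option. The sketch below uses made-up durations, and the variable names are assumptions.

```python
import pandas as pd

# Sample durations (assumed) held as a timedelta64[ns] Series.
durations = pd.Series(pd.to_timedelta(["1 days 12:00:00", "82 days 17:00:00"]))

whole_days = durations.dt.days                        # integer day component only
fractional_days = durations / pd.Timedelta(days=1)    # float number of days
total_seconds = durations.dt.total_seconds()          # full duration in seconds

print(whole_days.tolist(), fractional_days.tolist(), total_seconds.tolist())
```

The .dt.days accessor truncates to the day component, while the division keeps the hours as a fraction of a day.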
