Pandas Using Frequencies to Offset Date - python

I am trying to build a function in Python where, if a user provides an offset frequency such as 1D, 10M or 1Y, I can return the date shifted by that offset.
For example, if the user inputs 1M along with the date 2021-08-25:
pd.Timestamp('2021-08-25') - pd.tseries.frequencies.to_offset('1M')
The above code outputs Timestamp('2021-07-31 00:00:00'), which is not one month prior to the date provided by the user. The expected output is Timestamp('2021-07-25 00:00:00').
How can I achieve this?

You need to use pd.DateOffset:
>>> pd.Timestamp("2020-08-25") - pd.DateOffset(months=1)
Timestamp('2020-07-25 00:00:00')
frequencies.to_offset("1M") returns a <MonthEnd> object, which is why you were getting 07-31:
>>> pd.tseries.frequencies.to_offset("1M")
<MonthEnd>
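To make the difference concrete, here is a small sketch comparing the two kinds of offset. The month-end start date in the last example is an illustration that is not in the question; it shows that DateOffset clips to the last valid day of the target month.

```python
import pandas as pd

ts = pd.Timestamp("2021-08-25")

# MonthEnd is an anchored offset: subtracting it rolls back to the previous month end.
month_end_result = ts - pd.offsets.MonthEnd(1)

# DateOffset(months=1) does calendar arithmetic and keeps the day of the month.
calendar_result = ts - pd.DateOffset(months=1)

# Illustrative edge case (not from the question): there is no February 31,
# so the result is clipped to the last day of February.
clipped = pd.Timestamp("2021-03-31") - pd.DateOffset(months=1)

print(month_end_result, calendar_result, clipped)
```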

Since years and months don't have a fixed length, you can use pd.offsets.DateOffset for calendar arithmetic on years and months, similar to dateutil's relativedelta.
Because you'll need to specify both the argument names and values for this to work, you can change your function to pass a dict with the arguments as opposed to just the offset frequency.
import pandas as pd
def offset_date(timestamp, offset):
    """
    offset: dict of {'offset frequency': periods}
    """
    return timestamp + pd.offsets.DateOffset(**offset)
timestamp = pd.Timestamp('2021-08-25')
offset_date(timestamp, {'months': 1})
#Timestamp('2021-09-25 00:00:00')
offset_date(timestamp, {'days': 10})
#Timestamp('2021-09-04 00:00:00')
offset_date(timestamp, {'years': -3})
#Timestamp('2018-08-25 00:00:00')
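If the user input has to stay in the short "1D" / "10M" / "1Y" form from the question, one possible sketch is to translate the string into DateOffset keyword arguments first. The names parse_offset and _UNIT_TO_KWARG are hypothetical, and treating "M" as months (rather than minutes) is an assumption based on the question.

```python
import re

import pandas as pd

# Hypothetical mapping from the unit letters in the question to DateOffset keywords.
# Assumption: "M" means calendar months here, not minutes.
_UNIT_TO_KWARG = {"D": "days", "W": "weeks", "M": "months", "Y": "years"}


def parse_offset(spec):
    """Turn a string such as '10M' into pd.DateOffset(months=10)."""
    match = re.fullmatch(r"(\d+)([DWMY])", spec.strip().upper())
    if match is None:
        raise ValueError(f"unsupported offset: {spec!r}")
    count, unit = match.groups()
    return pd.DateOffset(**{_UNIT_TO_KWARG[unit]: int(count)})


one_month_back = pd.Timestamp("2021-08-25") - parse_offset("1M")
print(one_month_back)
```

Anything the pattern does not recognise raises ValueError, so unexpected input is reported instead of being silently interpreted as a different frequency.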

Related

Convert year from unicode format to python's datetime format

I have only year parameter as input in the following manner:
2014,2015,2016
I want to convert each element from my list into python's datetime format. Is it possible to do this kind of things if the only given parameter is the year ?
Just set month and day manually to 1
from datetime import date
YearLst = [2014,2015,2016]
list(map(lambda t: date(t, 1, 1), YearLst))  # list() is needed on Python 3, where map is lazy
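If the years arrive as a single comma-separated string, as the question's example suggests, a minimal sketch could look like this. The variable names raw_years and year_dates are hypothetical.

```python
from datetime import date

# Assumed input shape: years as one comma-separated string.
raw_years = "2014,2015,2016"

# A year alone does not identify a day, so January 1 is chosen by convention.
year_dates = [date(int(year), 1, 1) for year in raw_years.split(",")]

print(year_dates)
```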

How to include end date in pandas date_range method?

From pd.date_range('2016-01', '2016-05', freq='M', ).strftime('%Y-%m'), the last month is 2016-04, but I was expecting it to be 2016-05. It seems to me this function is behaving like the range method, where the end parameter is not included in the returning array.
Is there a way to get the end month included in the returning array, without processing the string for the end month?
Here is a way to do it without working out the month ends yourself:
pd.date_range(*(pd.to_datetime(['2016-01', '2016-05']) + pd.offsets.MonthEnd()), freq='M')
DatetimeIndex(['2016-01-31', '2016-02-29', '2016-03-31', '2016-04-30',
'2016-05-31'],
dtype='datetime64[ns]', freq='M')
You can use .union to add the next logical value after initializing the date_range. It should work as written for any frequency:
d = pd.date_range('2016-01', '2016-05', freq='M')
d = d.union(pd.DatetimeIndex([d[-1] + d.freq])).strftime('%Y-%m')  # adding an integer to a Timestamp is no longer supported
Alternatively, you can use period_range instead of date_range. Depending on what you intend to do, this might not be the right thing to use, but it satisfies your question:
pd.period_range('2016-01', '2016-05', freq='M').strftime('%Y-%m')
In either case, the resulting output is as expected:
['2016-01' '2016-02' '2016-03' '2016-04' '2016-05']
For the later crowd: you can also use the month-start frequency.
>>> pd.date_range('2016-01', '2016-05', freq='MS')
DatetimeIndex(['2016-01-01', '2016-02-01', '2016-03-01', '2016-04-01',
'2016-05-01'],
dtype='datetime64[ns]', freq='MS')
Include the day when specifying the dates in date_range call
pd.date_range('2016-01-31', '2016-05-31', freq='M', ).strftime('%Y-%m')
array(['2016-01', '2016-02', '2016-03', '2016-04', '2016-05'],
dtype='|S7')
I had a similar problem when using datetime objects in a dataframe. I would set the boundaries with the .min() and .max() functions and then fill in missing dates using the pd.date_range function. Unfortunately the returned list/df was missing the maximum value.
I found two workarounds for this:
1) Add the closed=None parameter to the pd.date_range call. This worked in the example below; however, it didn't work for me when working only with dataframes (no idea why).
2) If option #1 doesn't work, you can add one extra unit (in this case a day) using datetime.timedelta(). In the case below it over-indexes by a day, but it can work for you if date_range isn't giving you the full range.
import pandas as pd
import datetime as dt
#List of dates as strings
time_series = ['2020-01-01', '2020-01-03', '2020-01-5', '2020-01-6', '2020-01-7']
#Creates dataframe with time data that is converted to datetime object
raw_data_df = pd.DataFrame(pd.to_datetime(time_series), columns = ['Raw_Time_Series'])
#Creates an indexed_time list that includes missing dates and the full time range
#Option No. 1 is to use the closed = None parameter choice.
indexed_time = pd.date_range(start = raw_data_df.Raw_Time_Series.min(),end = raw_data_df.Raw_Time_Series.max(),freq='D',closed= None)
print('indexed_time option #1 = ', indexed_time)
#Option No. 2 if the function allows you to extend the time by one unit (in this case day)
#by using the datetime.timedelta function to get what you need.
indexed_time = pd.date_range(start = raw_data_df.Raw_Time_Series.min(),end = raw_data_df.Raw_Time_Series.max()+dt.timedelta(days=1),freq='D')
print('indexed_time option #2 = ', indexed_time)
#In this case you over-index by an extra day because the date_range function works properly.
#However, if the closed=None parameter doesn't extend through the full range, this is a good workaround.
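As a cross-check, here is a short sketch that relies only on the default behaviour of date_range, which includes both boundaries, and then lists the dates that were missing from the raw series. The variable names are hypothetical, and the raw dates are the ones from the example, zero-padded. Note that newer pandas releases name the boundary parameter inclusive rather than closed, so this sketch avoids passing either.

```python
import pandas as pd

# Same raw dates as in the example, written with zero-padded days.
raw = pd.to_datetime(
    ["2020-01-01", "2020-01-03", "2020-01-05", "2020-01-06", "2020-01-07"]
)

# With no boundary argument, both the start and the end date are included.
full_range = pd.date_range(start=raw.min(), end=raw.max(), freq="D")

# Dates in the full daily range that do not appear in the raw data.
missing = full_range.difference(raw)

print(full_range)
print(missing)
```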
I don't think so. You need to add the (n+1) boundary:
pd.date_range('2016-01', '2016-06', freq='M' ).strftime('%Y-%m')
The start and end dates are strictly inclusive. So it will not
generate any dates outside of those dates if specified.
http://pandas.pydata.org/pandas-docs/stable/timeseries.html
Either way, you have to manually add some information. I believe adding just one more month is not a lot of work.
The explanation for this issue is that the function pd.to_datetime() converts a '%Y-%m' date string by default to the first of the month datetime, or '%Y-%m-01':
>>> pd.to_datetime('2016-05')
Timestamp('2016-05-01 00:00:00')
>>> pd.date_range('2016-01', '2016-02')
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
'2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
'2016-01-09', '2016-01-10', '2016-01-11', '2016-01-12',
'2016-01-13', '2016-01-14', '2016-01-15', '2016-01-16',
'2016-01-17', '2016-01-18', '2016-01-19', '2016-01-20',
'2016-01-21', '2016-01-22', '2016-01-23', '2016-01-24',
'2016-01-25', '2016-01-26', '2016-01-27', '2016-01-28',
'2016-01-29', '2016-01-30', '2016-01-31', '2016-02-01'],
dtype='datetime64[ns]', freq='D')
Everything follows from that. Specifying freq='M' returns the month ends between 2016-01-01 and 2016-05-01, which is the list you received; it excludes 2016-05-31 because that date falls after the end of the range. Specifying month starts with 'MS', as the second answer does, includes 2016-05-01 because it falls within the range. The default behavior of pd.date_range() is not like the range function, since both ends are included. From the docs:
closed controls whether to include start and end that are on the boundary. The default includes boundary points on either end.
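The explanation above can be checked with a short sketch. It passes the MonthEnd offset object as freq instead of a string alias, and rolls the parsed end date forward to its own month end with MonthEnd(0). The variable names are hypothetical.

```python
import pandas as pd

end = pd.to_datetime("2016-05")  # parsed as the first of the month, 2016-05-01

# Month ends between 2016-01-01 and 2016-05-01: the last one is April 30.
month_ends = pd.date_range("2016-01", end, freq=pd.offsets.MonthEnd())

# MonthEnd(0) rolls a date forward to the end of its own month.
rolled_end = end + pd.offsets.MonthEnd(0)

# With the end rolled forward, May 31 falls inside the range and is included.
full = pd.date_range("2016-01", rolled_end, freq=pd.offsets.MonthEnd())

print(month_ends[-1], rolled_end, list(full.strftime("%Y-%m")))
```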

Pandas: select all dates with specific month and day

I have a dataframe full of dates and I would like to select all dates where the month==12 and the day==25 and replace the zero in the xmas column with a 1.
Is there any way to do this? The second line of my code errors out.
df = DataFrame({'date':[datetime(2013,1,1).date() + timedelta(days=i) for i in range(0,365*2)], 'xmas':np.zeros(365*2)})
df[df['date'].month==12 and df['date'].day==25] = 1
Pandas Series with datetime now behaves differently. See .dt accessor.
This is how it should be done now:
df.loc[(df['date'].dt.day==25) & (df['date'].dt.month==12), 'xmas'] = 1
What you tried won't work because you need to use & to combine boolean arrays, and you need parentheses because of operator precedence. On top of this, you should use loc to perform the indexing, and on a Series the month and day are reached through the .dt accessor:
df.loc[(df['date'].dt.month==12) & (df['date'].dt.day==25), 'xmas'] = 1
An update was needed in reply to this question. As of today, there's a slight difference in how you extract months from datetime objects in a pd.Series.
So from the very start, in case you have a raw date column, first convert it to datetime objects by using a simple function:
import datetime as dt
def read_as_datetime(str_date):
    # replace %Y-%m-%d with your own date format
    return dt.datetime.strptime(str_date,'%Y-%m-%d')
Then apply this function to your dates column and save the results in a new column named datetime:
df['datetime'] = df.dates.apply(read_as_datetime)
Finally, to select rows by day and month, use the same piece of code that @Shayan RC explained, applying the .dt accessor to the datetime column:
df.loc[(df['datetime'].dt.month==12) & (df['datetime'].dt.day==25), 'xmas'] = 1
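Putting the pieces together, here is a self-contained sketch of the whole task. It builds the date column with pd.date_range so the column already has a datetime dtype, which is an assumption that differs from the question's list of datetime.date objects; the name is_xmas is hypothetical.

```python
import numpy as np
import pandas as pd

# Two years of daily dates starting 2013-01-01, with the xmas flag initialised to 0.
df = pd.DataFrame(
    {
        "date": pd.date_range("2013-01-01", periods=365 * 2, freq="D"),
        "xmas": np.zeros(365 * 2),
    }
)

# Element-wise comparison through the .dt accessor, combined with & (not `and`).
is_xmas = (df["date"].dt.month == 12) & (df["date"].dt.day == 25)

# .loc with a boolean mask and a column label sets only the xmas column.
df.loc[is_xmas, "xmas"] = 1

print(df[df["xmas"] == 1])
```

If the column holds strings or datetime.date objects instead, converting it first with pd.to_datetime(df["date"]) makes the .dt accessor available.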

Break-up year, months & days in Pandas

I have an input parameter dictionary as below -
InparamDict = {'DataInputDate':'2014-10-25'
}
Using the field InparamDict['DataInputDate'], I want to pull up data from 2013-10-01 till 2013-10-25. What would be the best way to arrive at the same using Pandas?
The sql equivalent is -
DATEFROMPARTS(DATEPART(year,GETDATE())-1,DATEPART(month,GETDATE()),'01')
You forgot to mention if you're trying to pull up data from a DataFrame, Series or what. If you just want to get the date parts, you just have to get the attribute you want from the Timestamp object.
from pandas import Timestamp
dt = Timestamp(InparamDict['DataInputDate'])
dt.year, dt.month, dt.day
If the dates are in a DataFrame (df) and you convert them to datetimes instead of strings, you can select the data by ranges as well, for instance:
df[df['DataInputDate'] > Timestamp(2013, 10, 1)]
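For the specific window in the question (the first of the month through the same day, one year earlier), a possible sketch follows. The sample DataFrame is made up purely for illustration, and the names window_start, window_end and window are hypothetical.

```python
import pandas as pd

InparamDict = {"DataInputDate": "2014-10-25"}

input_date = pd.Timestamp(InparamDict["DataInputDate"])

# Same day one year earlier, then the first day of that month.
window_end = input_date - pd.DateOffset(years=1)
window_start = window_end.replace(day=1)

# Made-up sample data: 40 consecutive days starting 2013-09-28.
df = pd.DataFrame(
    {"DataInputDate": pd.date_range("2013-09-28", periods=40, freq="D")}
)

# between() includes both boundaries by default.
window = df[df["DataInputDate"].between(window_start, window_end)]

print(window_start, window_end, len(window))
```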

Querying a DateField with a range of 2 years

This is a pretty straight-forward question:
I have two models, each with a DateField. I want to query Model-A based on the date in Model-B. I want a query that returns all the objects of Model-A that have a date within 2 years, plus or minus, of the date in one object of Model-B. How can this be done?
Assuming you have a date value from model B, calculate two dates, one 2 years in the past and one 2 years in the future, with the help of the python-dateutil module (taken partially from here). Then use the __range lookup to filter A records by date range:
from dateutil.relativedelta import relativedelta
def yearsago(from_date, years):
    return from_date - relativedelta(years=years)
b_date = b.my_date
date_min, date_max = yearsago(b_date, 2), yearsago(b_date, -2)
data = A.objects.filter(my_date__range=(date_min, date_max))
where b is a B model instance.
Also see: Django database query: How to filter objects by date range?
Hope that helps.
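The date arithmetic can be tried on its own, without a Django project. This sketch uses a hypothetical helper date_window and a leap-day input chosen for illustration, to show that relativedelta clips to a valid day rather than raising an error.

```python
from datetime import date

from dateutil.relativedelta import relativedelta


def date_window(center, years):
    """Return (center - years, center + years) using calendar arithmetic."""
    return center - relativedelta(years=years), center + relativedelta(years=years)


# Illustrative input: February 29 does not exist in 2018 or 2022,
# so both bounds are clipped to February 28.
date_min, date_max = date_window(date(2020, 2, 29), 2)

print(date_min, date_max)

# In the Django query above, this pair would be passed as
# my_date__range=(date_min, date_max).
```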
