I am trying to get the previous 2 and 3 month end date. In the below code, I am able to get the last month end date which is 2021-01-31, but I also need to get 2020-12-31 and 2020-11-30.
Any advice is greatly appreciated.
import datetime
today = datetime.date.today()
first = today.replace(day=1)
lastMonth = first - datetime.timedelta(days=1)
date1=lastMonth.strftime("%Y-%m-%d")
date1
Out[90]: '2021-01-31'
Try:
prev2month = lastMonth - pd.offsets.MonthEnd(n=1)
prev3month = lastMonth - pd.offsets.MonthEnd(n=2)
More information on using offsets (e.g. MonthEnd, MonthBegin) can be found in the pandas documentation.
A quick and dirty way of doing this, without having to deal with the varying number of days in each month, is to simply repeat the process N times, where N is the number of months back you want:
import datetime
today = datetime.date.today()
temp_date = today.replace(day=1)
for _ in range(3):
    previous_month = temp_date - datetime.timedelta(days=1)
    print(previous_month.strftime("%Y-%m-%d"))
    temp_date = previous_month.replace(day=1)
outputs
2021-01-31
2020-12-31
2020-11-30
You can use the calendar package:
import calendar
calendar.monthrange(2020, 2)[1] # gives you the last day of Feb 2020
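As a rough sketch of how monthrange could be applied to the original question, the helper below steps back one month at a time and looks up the last day of each month. The function name previous_month_ends and the fixed reference date are made up for illustration; they do not come from the answer above.

```python
import calendar
import datetime

def previous_month_ends(today, n):
    """Return the last day of each of the n months before `today`, newest first."""
    year, month = today.year, today.month
    result = []
    for _ in range(n):
        # step back one month, wrapping from January to December of the prior year
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        # monthrange returns (weekday of the 1st, number of days in the month)
        last_day = calendar.monthrange(year, month)[1]
        result.append(datetime.date(year, month, last_day))
    return result

ends = previous_month_ends(datetime.date(2021, 2, 15), 3)
print([d.isoformat() for d in ends])
# ['2021-01-31', '2020-12-31', '2020-11-30']
```

Pass datetime.date.today() instead of the fixed date to get results relative to the current day.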
If you're definitely using pandas, you can make use of date_range, e.g.:
pd.date_range('today', periods=4, freq='-1M', normalize=True)
That'll give you:
DatetimeIndex(['2021-02-28', '2021-01-31', '2020-12-31', '2020-11-30'], dtype='datetime64[ns]', freq='-1M')
Ignore the first element and use as needed...
Alternatively:
dr = pd.date_range(end='today', periods=3, freq='M', normalize=True)[::-1]
Which gives you:
DatetimeIndex(['2021-01-31', '2020-12-31', '2020-11-30'], dtype='datetime64[ns]', freq='-1M')
Then if you want strings you can use dr.strftime('%Y-%m-%d') which'll give you Index(['2021-01-31', '2020-12-31', '2020-11-30'], dtype='object')
Related
I've been working on a scraping and EDA project on Python3 using Pandas, BeautifulSoup, and a few other libraries and wanted to do some analysis using the time differences between two dates. I want to determine the number of days (or months or even years if that'll make it easier) between the start dates and end dates, and am stuck. I have two columns (air start date, air end date), with dates in the following format: MM-YYYY (so like 01-2021). I basically wanted to make a third column with the time difference between the end and start dates (so I could use it in later analysis).
# split air_dates column into start and end date
dateList = df["air_dates"].str.split("-", n = 1, expand = True)
df['air_start_date'] = dateList[0]
df['air_end_date'] = dateList[1]
df.drop(columns = ['air_dates'], inplace = True)
df.drop(columns = ['rank'], inplace = True)
# changing dates to numerical notation
df['air_start_date'] = pds.to_datetime(df['air_start_date'])
df['air_start_date'] = df['air_start_date'].dt.date.apply(lambda x: x.strftime('%m-%Y') if pds.notnull(x) else npy.NaN)
df['air_end_date'] = pds.Series(df['air_end_date'])
df['air_end_date'] = pds.to_datetime(df['air_end_date'], errors = 'coerce')
df['air_end_date'] = df['air_end_date'].dt.date.apply(lambda x: x.strftime('%m-%Y') if pds.notnull(x) else npy.NaN)
df.isnull().sum()
df.dropna(subset = ['air_end_date'], inplace = True)
def time_diff(time_series):
    return datetime.datetime.strptime(time_series, '%d')
df['time difference'] = df['air_end_date'].apply(time_diff) - df['air_start_date'].apply(time_diff)
The last four lines are my attempt at getting a time difference, but I got an error saying 'ValueError: unconverted data remains: -2021'. Any help would be greatly appreciated, as this has had me stuck for a good while now. Thank you!
As far as I understand, if you have a start date and time and an end date and time, you can use Python's datetime module.
For example:
from datetime import datetime
# variable = datetime(year, month, day, hour, minute, second)
start = datetime(2017, 5, 8, 18, 56, 40)
end = datetime(2019, 6, 27, 12, 30, 58)
print(end - start)  # prints the difference between the two datetimes as a timedelta
Hope this answer helps you.
Ok so I figured it out. In my second to last line, I replaced the %d with %m-%Y and now it populates the new column with the number of days between the two dates. I think the format needed to be consistent when running strptime so that's what was causing that error.
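A minimal sketch of why the format string matters, using only the standard library. The helper name day_diff and the sample strings are illustrative assumptions, not the asker's actual data: strptime must consume the whole string, so '%d' alone leaves '-2021' unparsed, while '%m-%Y' matches it completely.

```python
from datetime import datetime

FMT = "%m-%Y"  # must describe the whole string, e.g. "01-2021"

def day_diff(start, end, fmt=FMT):
    # strptime defaults to day 1 when the format has no %d
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).days

print(day_diff("04-2009", "07-2010"))  # 456

# A format that covers only part of the string reproduces the error
try:
    datetime.strptime("01-2021", "%d")
except ValueError as exc:
    print(exc)  # unconverted data remains: -2021
```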
Here's a slightly cleaned-up version: subtract the start date from the end date to get a timedelta, then take its days attribute.
Example:
import pandas as pd
df = pd.DataFrame({'air_dates': ["Apr 2009 - Jul 2010", "not a date - also not a date"]})
df['air_start_date'] = df['air_dates'].str.split(" - ", expand=True)[0]
df['air_end_date'] = df['air_dates'].str.split(" - ", expand=True)[1]
df['air_start_date'] = pd.to_datetime(df['air_start_date'], errors="coerce")
df['air_end_date'] = pd.to_datetime(df['air_end_date'], errors="coerce")
df['timediff_days'] = (df['air_end_date']-df['air_start_date']).dt.days
For the dummy example, that gives:
df['timediff_days']
0 456.0
1 NaN
Name: timediff_days, dtype: float64
Regarding the difference in months, you can find some suggestions on how to calculate it here. I'd go with @piRSquared's approach:
df['timediff_months'] = ((df['air_end_date'].dt.year - df['air_start_date'].dt.year) * 12 +
(df['air_end_date'].dt.month - df['air_start_date'].dt.month))
df['timediff_months']
0 15.0
1 NaN
Name: timediff_months, dtype: float64
I need to generate the start/end of weekly date ranges given a year as a simple range that goes df['start'] and df['end'].
I have the following solution, but it is not fully inclusive and maybe a bit hacky. And it needs to be general so that it's not dependent on the year (2018, 2019, etc.). I am curious if there is something more accurate.
Thanks for any suggestions.
# create weekly range
df = pd.DataFrame(pd.date_range("20180101", "20181231",freq='7d'), columns=['start_date'])
# add end date from weekly-range start date
df['end_date'] = df['start_date'].shift(-1)
# manually adjust to create a "true" week range
df['end_date'] = df['end_date'] - pd.to_timedelta(1, unit='d')
df.head()
df.tail()
If the missing value in the last row is not needed, I think the simplest approach is to add 6 days:
df['end_date'] = df['start_date'] + pd.to_timedelta(6, unit='d')
If you need to keep the last value missing:
df['end_date'] = df['start_date'].iloc[:-1] + pd.to_timedelta(6, unit='d')
Just use timedelta with the days argument, which is 6 in your case.
from datetime import timedelta
df['end_date'] = df['start_date'] + timedelta(days=6)
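For a version that does not depend on the year, one possible sketch is below. It assumes that weeks are 7-day blocks counted from 1 January and that the final block should be clipped at 31 December; the function name week_ranges is hypothetical, and the question does not confirm either assumption.

```python
import datetime

def week_ranges(year):
    """Return (start, end) date pairs of 7-day blocks covering `year`, starting on 1 January."""
    one_day = datetime.timedelta(days=1)
    year_end = datetime.date(year, 12, 31)
    start = datetime.date(year, 1, 1)
    ranges = []
    while start <= year_end:
        # each block spans 7 days inclusive; the final one is clipped at 31 December
        end = min(start + 6 * one_day, year_end)
        ranges.append((start, end))
        start = end + one_day
    return ranges

weeks = week_ranges(2018)
print(len(weeks))  # 53
print(weeks[0])    # (datetime.date(2018, 1, 1), datetime.date(2018, 1, 7))
print(weeks[-1])   # (datetime.date(2018, 12, 31), datetime.date(2018, 12, 31))
```

The resulting list of pairs can be passed to pd.DataFrame(weeks, columns=['start_date', 'end_date']) if a DataFrame is needed.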
In Pandas for Python, I have a data set that has a column of datetimes in it. I need to create a new column that has the date of the following Sunday for each row.
I've tried various methods trying to use iterrows and then figure out the day of the week, and add a day until the day is 7, but it hasn't worked and I'm not even sure how I'd return the date instead of just the day number then. I also don't feel like iterrows would be the best way to do it either.
What is the best way to return a column of the following Sunday from a date column?
Use the Pandas date offsets, e.g.:
>>> pd.to_datetime('2019-04-09') + pd.offsets.Week(n=0, weekday=6)
Timestamp('2019-04-14 00:00:00')
Here the offset rolls the given date (a Tuesday) forward to the following Sunday. This is vectorised, so you can run it against a series like so:
temp['sunday_dates'] = temp['our_dates'] + pd.offsets.Week(n=0, weekday=6)
    our_dates  random_data sunday_dates
0  2010-12-31         4012   2011-01-02
1  2007-12-31         3862   2008-01-06
2  2006-12-31         3831   2007-01-07
3  2011-12-31         3811   2012-01-01
N.b. Pass n=0 to keep a day, which is already on a Sunday, on that day. Pass n=1 if you want to force it to the next Sunday. The Week(weekday=INT) parameter is 0 indexed on Monday and takes values from 0 to 6 (inclusive). Thus, passing 0 yields all Mondays, 1 yields all Tuesdays, etc. Using this, you can make everything any day of the week you would like.
N.b. If you want to go to the last Sunday, just swap + to - to go back.
N.b. (Such note, much bene) The specific documentation on time series functionality can be found here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
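To make the weekday numbering and the "keep vs. force" distinction concrete, here is a plain-datetime analogue. It is not how pandas implements Week; the helper roll_to_weekday and its strict flag are hypothetical names, with strict=False playing the role described for n=0 and strict=True the role described for n=1.

```python
import datetime

def roll_to_weekday(d, weekday=6, strict=False):
    """Roll `d` forward to `weekday` (Monday=0 ... Sunday=6).

    With strict=False a date already on `weekday` is returned unchanged;
    with strict=True it is pushed a full week ahead.
    """
    days_ahead = (weekday - d.weekday()) % 7
    if strict and days_ahead == 0:
        days_ahead = 7
    return d + datetime.timedelta(days=days_ahead)

print(roll_to_weekday(datetime.date(2019, 4, 9)))                # 2019-04-14 (from a Tuesday)
print(roll_to_weekday(datetime.date(2019, 4, 14)))               # 2019-04-14 (already a Sunday)
print(roll_to_weekday(datetime.date(2019, 4, 14), strict=True))  # 2019-04-21
```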
The function
import datetime
def datetime_to_next_sunday(original_datetime):
    return original_datetime + datetime.timedelta(days=6-original_datetime.weekday())
returns the datetime.datetime shifted to the next Sunday. Given
import pandas as pd
df = pd.DataFrame({'A': ['Foo', 'Bar'],
'datetime': [datetime.datetime.now(),
datetime.datetime.now() + datetime.timedelta(days=1)]})
the following line should do the job:
df['dt_following_sunday'] = df['datetime'].apply(datetime_to_next_sunday)
I suggest using the calendar library:
import calendar
import datetime
# today's date
now = datetime.datetime.now()
print(now.year, now.month, now.day, now.hour, now.minute, now.second)
# difference in days between the current date and Sunday (calendar.weekday: Monday=0 ... Sunday=6)
difday = 6 - calendar.weekday(now.year, now.month, now.day)
# next Sunday from today; adding a timedelta handles month and year boundaries
nextsunday = now.date() + datetime.timedelta(days=difday)
print(nextsunday)
Wrap this in a function and use it.
The accepted answer is the way to go, but you can also use Series.apply() and pandas.Timedelta() for this, i.e.:
df["ns"] = df["d"].apply(lambda d: d + pd.Timedelta(days=(7 if d.weekday() == 6 else 6-d.weekday())))
d ns
0 2019-04-09 21:22:10.886702 2019-04-14 21:22:10.886702
I am using datetime.timedelta to subtract a day from today. When I run the code, the month part of the resulting datetime seems to change for some reason. Please help explain.
days_to_subtract = 1
date = (datetime.datetime.today() - datetime.timedelta(days=days_to_subtract))
I expect the result to be 2/10/2019, but the output gives 10/02/2019.
import datetime
days_to_subtract = 1
date = (datetime.datetime.today() - datetime.timedelta(days=days_to_subtract))
print (date)
#output
2019-02-10 13:02:07.645241
print (date.strftime('%m/%d/%Y'))
#output
02/10/2019
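Put differently, the subtraction only moves the date back by one day; whether the result reads as 02/10 or 10/02 depends on the order of the format codes passed to strftime. A small sketch with a fixed date (chosen here for illustration so the output is reproducible):

```python
import datetime

# fixed "today" so the output is reproducible; use datetime.datetime.today() in practice
date = datetime.datetime(2019, 2, 11, 13, 2, 7) - datetime.timedelta(days=1)

print(date.strftime('%m/%d/%Y'))               # 02/10/2019 (month/day/year)
print(date.strftime('%d/%m/%Y'))               # 10/02/2019 (day/month/year)
print(f"{date.month}/{date.day}/{date.year}")  # 2/10/2019 (no zero padding)
```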
Wanted to create a time range from August 1, 2011 dynamically to the last month of the existing data. Don't know why I'm returning time series with the last day of the month instead of the first.
Any suggestions? Please ignore my petty comments to my coworkers.
# Our Formatting System is garbage - So Have to code it to non garbage
start_date = pd.Timestamp(2011, 8, 1,)
months = pd.date_range(start_date, periods=len(timeseries[0]), freq='M')
print(months)
By using the freq M, you are telling it to use month ends. See this link for a description of datetime offsets in pandas, but in short, M is "month end frequency".
Use 'MS' (month start) instead:
>>> start_date = pd.Timestamp(2011, 8, 1,)
>>> months = pd.date_range(start_date, periods=10, freq='MS')
>>> print(months)
DatetimeIndex(['2011-08-01', '2011-09-01', '2011-10-01', '2011-11-01',
               '2011-12-01', '2012-01-01', '2012-02-01', '2012-03-01',
               '2012-04-01', '2012-05-01'],
              dtype='datetime64[ns]', freq='MS')