I want to create a function that returns the difference between two dates, excluding weekends and holidays.
E.g. the difference between 01/07/2019 and 08/07/2019 should be 5 days, excluding the weekend on 06/07/2019 and 07/07/2019.
What is the best way to achieve this?
Convert the strings to dates with pd.to_datetime(), passing the format of your dates.
Then use np.busday_count to count the days between them, excluding weekends.
import pandas as pd
import numpy as np
date1 = "01/07/2019"
date2 = "08/07/2019"
date1 = pd.to_datetime(date1,format="%d/%m/%Y").date()
date2 = pd.to_datetime(date2,format="%d/%m/%Y").date()
days = np.busday_count( date1 , date2)
print(days)
5
In case you want to provide holidays:
holidays = pd.to_datetime("04/07/2019",format="%d/%m/%Y").date()
days = np.busday_count(date1, date2, holidays=[holidays])
print(days)
4
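Since the question asks for a function, the two steps above can be wrapped up as a minimal sketch. The name business_days_between and its parameters are illustrative, not part of NumPy or pandas. Note that np.busday_count includes the start date and excludes the end date.

```python
import numpy as np
import pandas as pd


def business_days_between(start, end, holidays=(), fmt="%d/%m/%Y"):
    """Count weekdays from start (inclusive) to end (exclusive), skipping holidays.

    Hypothetical helper: all arguments are date strings in the given format.
    """
    def to_day(text):
        # Parse the string, then convert to a day-resolution NumPy date.
        return np.datetime64(pd.to_datetime(text, format=fmt).date())

    # An explicit dtype keeps an empty holiday list valid.
    holiday_days = np.array([to_day(h) for h in holidays], dtype="datetime64[D]")
    return int(np.busday_count(to_day(start), to_day(end), holidays=holiday_days))


print(business_days_between("01/07/2019", "08/07/2019"))
print(business_days_between("01/07/2019", "08/07/2019", holidays=["04/07/2019"]))
```

With the dates from the question this prints 5 and then 4, matching the results above.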
I am filtering a dataframe by dates to produce two separate versions:
Data from only today's date
Data from the last two years
However, when I try to filter on the date, it seems to miss dates that are within the last two years.
date_format = '%m-%d-%Y' # desired date format
today = dt.now().strftime(date_format) # today's date. Will always result in today's date
today = dt.strptime(today, date_format).date() # converting 'today' into a datetime object
today = today.strftime(date_format)
two_years = today - relativedelta(years=2) # date is today's date minus two years.
two_years = two_years.strftime(date_format)
# normalizing the format of the date column to the desired format
df_data['date'] = pd.to_datetime(df_data['date'], errors='coerce').dt.strftime(date_format)
df_today = df_data[df_data['date'] == today]
df_two_year = df_data[df_data['date'] >= two_years]
Which results in:
all dates ['07-17-2020' '07-15-2020' '08-01-2019' '03-25-2015']
today df ['07-17-2020']
two year df ['07-17-2020' '08-01-2019']
The 07-15-2020 date is missing from the two year, even though 08-01-2019 is captured.
You don't need to convert anything to strings; simply work with the datetime dtype. For example:
import pandas as pd
df = pd.DataFrame({'date': pd.to_datetime(['07-17-2020','07-15-2020','08-01-2019','03-25-2015'])})
today = pd.Timestamp('now')
print(df[df['date'].dt.date == today.date()])
# date
# 0 2020-07-17
print(df[(df['date'].dt.year >= today.year-1) & (df['date'].dt.date != today.date())])
# date
# 1 2020-07-15
# 2 2019-08-01
What you get from the comparison operations (adjust them as needed...) are boolean masks - you can use them nicely to filter the df.
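Your datatype conversions are the problem here: after strftime the dates are strings, and strings compare character by character, so '07-15-2020' sorts before '07-17-2018'. You could do this: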
today = dt.now() # today's date. Will always result in today's date
two_years = today - relativedelta(years=2) # date is today's date minus two years.
This prints '2018-07-17 18:40:42.704395'. You can then convert it to the date only format.
two_years = two_years.strftime(date_format)
two_years = dt.strptime(two_years, date_format).date()
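Putting both answers together, here is a minimal sketch that keeps everything as datetimes and never compares strings. The helper name split_by_date and its parameters are made up for illustration, and "today" is passed in explicitly so the result is reproducible.

```python
import pandas as pd


def split_by_date(df, today, column="date", fmt="%m-%d-%Y"):
    """Return (rows dated today, rows from the last two years).

    Hypothetical helper: the column holds date strings in the given format.
    """
    today = pd.Timestamp(today).normalize()
    cutoff = today - pd.DateOffset(years=2)
    # Parse once and compare as datetimes, not as strings.
    dates = pd.to_datetime(df[column], format=fmt, errors="coerce")
    df_today = df[dates.dt.normalize() == today]
    df_two_year = df[dates >= cutoff]
    return df_today, df_two_year


df_data = pd.DataFrame(
    {"date": ["07-17-2020", "07-15-2020", "08-01-2019", "03-25-2015"]}
)
df_today, df_two_year = split_by_date(df_data, "2020-07-17")
print(df_today["date"].tolist())
print(df_two_year["date"].tolist())
```

With the sample dates from the question, the two-year frame now contains 07-15-2020 as well.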
If I have a series of dates such as "2020-01-01, 2020-01-02, ...", how can I add 6 months to every date? The result should be "2020-07-01, 2020-07-02, ...". In Excel I can use the EDATE function, so is there a similar function in Python?
You can do the following:
from dateutil.relativedelta import relativedelta
for i, date in enumerate(datelist):
    datelist[i] = date + relativedelta(months=+6)
Here, datelist is your list of dates.
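If the dates live in a pandas Series rather than a plain list, a vectorised alternative is pd.DateOffset. This is a sketch that assumes the Series already has a datetime dtype; the variable names are illustrative.

```python
import pandas as pd

# Sample data; the last entry shows what happens at a month end.
dates = pd.Series(pd.to_datetime(["2020-01-01", "2020-01-02", "2020-08-31"]))

# DateOffset(months=6) shifts by calendar months, like Excel's EDATE.
shifted = dates + pd.DateOffset(months=6)

print(shifted.dt.strftime("%Y-%m-%d").tolist())
```

As with relativedelta, a day that does not exist in the target month is clipped to the last day of that month, so 2020-08-31 becomes 2021-02-28.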
I need to calculate the difference between 2 differently formatted ISO dates. For example, 2019-06-28T05:28:14Z and 2019-06-28T05:28:14-04:00. Most of the answers here focus on only one format or another, i.e Z-formatted.
Here is what I have attempted using this library iso8601:
import iso8601
date1 = iso8601.parse_date("2019-06-28T05:28:14-04:00")
date2 = iso8601.parse_date("2019-06-28T05:28:14Z")
difference = date2 - date1
>>> datetime.timedelta(days=-1, seconds=75600)
I have also tried to replace Z with -00:00 but the difference is the same:
date2 = iso8601.parse_date("2019-06-28T05:28:14Z".replace("Z", "-00:00"))
If I understand this correctly, it should show a difference of 4 hours. How do I calculate the difference in hours/days between 2 different date formats?
I am using Python 3.8.1.
I used the pandas module, but I think it works the same way as iso8601.
To get the right difference I had to specify the same timezone in the parsing function, as follows:
import pandas as pd
date1 = pd.to_datetime("2019-06-28T05:28:14-04:00",utc=True)
date2 = pd.to_datetime("2019-06-28T05:28:14Z",utc=True)
Then my difference is expressed as a Timedelta:
difference = (date2 - date1)
print(difference)
>> -1 days +20:00:00
A Timedelta of -1 days and +20 hours means -4 hours (date2 is 4 hours earlier than date1); in fact, if I convert the total seconds to hours I get:
print(difference.total_seconds()//3600)
>> -4.0
I hope this could be of help.
An alternative is the metomi-isodatetime package. It's developed by the Met Office, so it should be standards-compliant.
The package also has no other dependencies, so it's lightweight.
from metomi.isodatetime.parsers import TimePointParser
date1_s = "2019-06-28T05:28:14Z"
date2_s = "2019-06-28T05:28:14-04:00"
date1 = TimePointParser().parse(date1_s)
print(date1)
date2 = TimePointParser().parse(date2_s)
print(date2)
datediff = date2 - date1
print(type(datediff))
print(datediff)
print(datediff.hours)
Running the above will produce the following output:
2019-06-28T05:28:14Z
2019-06-28T05:28:14-04:00
<class 'metomi.isodatetime.data.Duration'>
PT4H
4
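For completeness, the same calculation can be sketched with only the standard library. The helper parse_iso is a made-up name. It assumes timestamps shaped like the two in the question, and rewrites a trailing "Z" because datetime.fromisoformat only accepts it from Python 3.11 onward.

```python
from datetime import datetime


def parse_iso(text):
    """Parse an ISO 8601 timestamp with either a 'Z' or a numeric UTC offset."""
    # Older Pythons (such as 3.8) reject a trailing "Z", so spell it as +00:00.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


date1 = parse_iso("2019-06-28T05:28:14-04:00")
date2 = parse_iso("2019-06-28T05:28:14Z")

# Both values are timezone-aware, so subtraction accounts for the offsets.
difference = date2 - date1
hours = difference.total_seconds() / 3600

print(hours)            # -4.0
print(abs(difference))  # 4:00:00
```

Taking abs() of the timedelta gives the unsigned gap when only its size matters.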
Is there a better / more direct way to calculate this than the following?
import datetime
import numpy as np
# 1. Set up the start and end date for which you want to calculate the
# number of business days excluding holidays.
start_date = '01JAN1986'
end_date = '31DEC1987'
start_date = datetime.datetime.strptime(start_date, '%d%b%Y')
end_date = datetime.datetime.strptime(end_date, '%d%b%Y')
# 2. Generate a list of holidays over this period
from pandas.tseries.holiday import USFederalHolidayCalendar
calendar = USFederalHolidayCalendar()
holidays = calendar.holidays(start_date, end_date)
holidays
Which gives a pandas.tseries.index.DatetimeIndex
DatetimeIndex(['1986-01-01', '1986-01-20', '1986-02-17', '1986-05-26',
'1986-07-04', '1986-09-01', '1986-10-13', '1986-11-11',
'1986-11-27', '1986-12-25', '1987-01-01', '1987-01-19',
'1987-02-16', '1987-05-25', '1987-07-03', '1987-09-07',
'1987-10-12', '1987-11-11', '1987-11-26', '1987-12-25'],
dtype='datetime64[ns]', freq=None, tz=None)
But you need a list for numpy busday_count
holiday_date_list = holidays.date.tolist()
Then with and without the holidays you get:
np.busday_count(start_date.date(), end_date.date())
>>> 521
np.busday_count(start_date.date(), end_date.date(), holidays = holiday_date_list)
>>> 501
There are some other questions slightly similar but generally working with pandas Series or Dataframes (Get business days between start and end date using pandas, Counting the business days between two series)
If you put the index you created in a dataframe, you can use resample to fill in the gaps. The offset passed to .resample() can include things like business days and even (custom) calendars:
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
C = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
start_date = '01JAN1986'
end_date = '31DEC1987'
(
pd.DataFrame(index=pd.to_datetime([start_date, end_date]))
.resample(C, closed='right')
.asfreq()
.index
.size
) - 1
The size of the index minus 1 then gives us the number of business days.
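Staying with NumPy, another option is to build a reusable np.busdaycalendar from the pandas holiday index. This is a sketch under the assumption that the holiday index converts cleanly to day-resolution datetime64 values; the variable names are illustrative.

```python
import numpy as np
from pandas.tseries.holiday import USFederalHolidayCalendar

start = np.datetime64("1986-01-01")
end = np.datetime64("1987-12-31")

# Holidays as a DatetimeIndex, converted to the day resolution NumPy expects.
holidays = USFederalHolidayCalendar().holidays(start="1986-01-01", end="1987-12-31")
holiday_days = holidays.values.astype("datetime64[D]")

# A busdaycalendar can be reused across many busday_count / busday_offset calls.
calendar = np.busdaycalendar(holidays=holiday_days)

without_holidays = int(np.busday_count(start, end))
with_holidays = int(np.busday_count(start, end, busdaycal=calendar))

print(without_holidays, with_holidays)
```

This skips the intermediate Python list and should reproduce the two counts from the question.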
I'm trying to delete rows of a dataframe based on one date column, [Delivery Date].
I need to delete rows that are more than 6 months old, unless their year is 1970.
I've created 2 variables:
from datetime import date, timedelta
sixmonthago = date.today() - timedelta(188)
import time
nineteen_seventy = time.strptime('01-01-70', '%d-%m-%y')
but I don't know how to delete rows based on these two variables, using the [Delivery Date] column.
Could anyone provide the correct solution?
You can just filter them out:
df[(df['Delivery Date'].dt.year == 1970) | (df['Delivery Date'] >= sixmonthago)]
This returns all rows where the year is 1970 or the date is less than 6 months old.
You can use boolean indexing and pass multiple conditions to filter the df. For multiple conditions you need the element-wise operators (| instead of or) and parentheses around each condition because of operator precedence.
Check the docs for an explanation of boolean indexing
Make sure the calculation of "6 months" prior is accurate. You may not want to hardcode 188 days, since not all months are the same length.
from datetime import date
from dateutil.relativedelta import relativedelta
#http://stackoverflow.com/questions/546321/how-do-i-calculate-the-date-six-months-from-the-current-date-using-the-datetime
six_months = date.today() - relativedelta( months = +6 )
Then you can apply the following logic.
import time
nineteen_seventy = time.strptime('01-01-70', '%d-%m-%y')
df = df[(df['Delivery Date'].dt.year == nineteen_seventy.tm_year) | (df['Delivery Date'] >= six_months)]
If you truly want to drop rows from the dataframe, select the rows to delete (not from 1970 and older than six months) and drop them by index:
df = df.drop(df[(df['Delivery Date'].dt.year != nineteen_seventy.tm_year) & (df['Delivery Date'] < six_months)].index)
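The filtering logic can be tried end to end with a small sketch. The helper name drop_stale_rows and the sample data are invented for illustration. The cutoff is built as a pd.Timestamp rather than a datetime.date, on the assumption that this is the safer type to compare against a datetime64 column.

```python
import pandas as pd


def drop_stale_rows(df, column="Delivery Date", today=None, months=6):
    """Keep rows from 1970 or from the last `months` months (hypothetical helper)."""
    today = pd.Timestamp.today().normalize() if today is None else pd.Timestamp(today)
    cutoff = today - pd.DateOffset(months=months)
    is_1970 = df[column].dt.year == 1970
    is_recent = df[column] >= cutoff
    # Element-wise OR of the two boolean masks selects the rows to keep.
    return df[is_1970 | is_recent]


sample = pd.DataFrame(
    {"Delivery Date": pd.to_datetime(["1970-01-01", "2020-07-01", "2019-01-01"])}
)
kept = drop_stale_rows(sample, today="2020-07-17")
print(kept["Delivery Date"].dt.strftime("%Y-%m-%d").tolist())
```

Passing today explicitly makes the result reproducible; leaving it out uses the current date.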