How to get difference between 2 dates with different ISO format? - python

I need to calculate the difference between 2 differently formatted ISO dates. For example, 2019-06-28T05:28:14Z and 2019-06-28T05:28:14-04:00. Most of the answers here focus on only one format or the other, e.g. only the Z-suffixed format.
Here is what I have attempted using this library iso8601:
import iso8601
date1 = iso8601.parse_date("2019-06-28T05:28:14-04:00")
date2 = iso8601.parse_date("2019-06-28T05:28:14Z")
difference = date2 - date1
>>> datetime.timedelta(days=-1, seconds=75600)
I have also tried to replace Z with -00:00 but the difference is the same:
date2 = iso8601.parse_date("2019-06-28T05:28:14Z".replace("Z", "-00:00"))
If I understand this correctly, it should show a difference of 4 hours. How do I calculate the difference in hours/days between 2 different date formats?
I am using Python 3.8.1.

I used the pandas module, but I think the same applies to iso8601.
To get the right difference I had to specify the same timezone in the parsing function, as follows:
import pandas as pd
date1 = pd.to_datetime("2019-06-28T05:28:14-04:00",utc=True)
date2 = pd.to_datetime("2019-06-28T05:28:14Z",utc=True)
Then my difference is expressed in a Timedelta format:
difference = (date2 - date1)
print(difference)
>> Timedelta('-1 days +20:00:00')
A timedelta of -1 days and +20 hours is equal to -4 hours. In fact, converting the total seconds to hours gives:
print(difference.total_seconds()//3600)
>> -4.0
I hope this could be of help.
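For comparison, the same calculation can be sketched with only the standard library. This is a minimal sketch and not part of the original answers: the helper name parse_iso is hypothetical, and replacing the trailing Z with +00:00 is a workaround assumed for Python versions before 3.11, where datetime.fromisoformat() does not accept Z.

```python
from datetime import datetime, timedelta


def parse_iso(text):
    # Hypothetical helper: normalise a trailing "Z" to an explicit UTC
    # offset so that fromisoformat() accepts it on Python 3.7 to 3.10.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


date1 = parse_iso("2019-06-28T05:28:14-04:00")
date2 = parse_iso("2019-06-28T05:28:14Z")

# Both values are timezone-aware, so the subtraction compares instants.
difference = date2 - date1
hours = difference.total_seconds() / 3600

print(difference)       # -1 day, 20:00:00
print(hours)            # -4.0
print(abs(difference))  # 4:00:00
```

The negative result only reflects the order of subtraction; abs() gives the 4-hour gap regardless of which date comes first.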

An alternative is the metomi-isodatetime package. It is developed by the Met Office, so it should be standards-compliant.
Also, the package has no other dependencies, so it's 'lightweight'.
from metomi.isodatetime.parsers import TimePointParser
date1_s = "2019-06-28T05:28:14Z"
date2_s = "2019-06-28T05:28:14-04:00"
date1 = TimePointParser().parse(date1_s)
print(date1)
date2 = TimePointParser().parse(date2_s)
print(date2)
datediff = date2 - date1
print(type(datediff))
print(datediff)
print(datediff.hours)
Running the above will produce the following output:
2019-06-28T05:28:14Z
2019-06-28T05:28:14-04:00
<class 'metomi.isodatetime.data.Duration'>
PT4H
4

Related

Python3 - datetime function -- clarification on .days meaning

I am wanting to take user input of date1 and date2 and calculate the difference to determine how many weeks are in between. My entire program is below. The example dates I'm using are:
date1 = 2023-03-15
date2 = 2022-11-09
Output is Number of weeks: 18 -- which is correct.
My 1st question that I need help in clarifying is why do I need the .days after the days = abs(date2-date1).days? I have searched for many hours via Google, stackoverflow, Youtube and Python docs https://docs.python.org/3.9/library/datetime.html?highlight=datetime#module-datetime. I'm pretty new to Python and reading the docs sometimes trips me up, so please forgive if it's in there -- I've struggled reading through some of it. Why is the .days needed? I know that if I remove .days, the output is: Number of weeks: 18 days, 0:00:00. Where is the documentation on needing the .days listed in the datetime module docs??? Can someone help me understand this please?
My 2nd question is why do I get Number of weeks: 0 when I change .days to .seconds? (this is when I was testing things and comment out the weeks = days//7 and print out days) The one part in the docs that I think addresses this the following: https://docs.python.org/3.9/library/datetime.html?highlight=datetime#module-datetime:~:text=the%20given%20year.-,Supported%20operations%3A,!%3D.%20The%20latter%20cases%20return%20False%20or%20True%2C%20respectively.,-In%20Boolean%20contexts.... and if this is correct, am I reading it correctly that if the difference in dates are to be determined, only "days" are returned, and thus no seconds or microseconds?
Thank you for your help! Code below:
#Find the number of weeks between two given dates
from datetime import datetime
#User input for 1st date in YYYY-MM-DD format
date1 = input("Enter 1st date in YYYY-MM-DD format: ")
date1 = datetime.strptime(date1, "%Y-%m-%d")
#User input for 2nd date in YYYY-MM-DD format
date2 = input("Enter 2nd date in YYYY-MM-DD format: ")
date2 = datetime.strptime(date2, "%Y-%m-%d")
#Calculate the weeks between the 2 given dates
days = abs(date2-date1).days
weeks = days//7
print("Number of weeks: ", weeks)
Output-correct answer with .days included:
Enter 1st date in YYYY-MM-DD format: 2023-03-15
Enter 2nd date in YYYY-MM-DD format: 2022-11-09
Number of weeks: 18
Output-with no .days added:
Enter 1st date in YYYY-MM-DD format: 2023-03-15
Enter 2nd date in YYYY-MM-DD format: 2022-11-09
Number of weeks: 18 days, 0:00:00
Output-(regarding 2nd question with the .seconds put in place of .days:
Enter 1st date in YYYY-MM-DD format: 2023-03-15
Enter 2nd date in YYYY-MM-DD format: 2022-11-09
Number of weeks: 0
Subtracting two datetime.datetime objects returns a datetime.timedelta object:
>>> date1 = datetime.strptime('2023-03-15', "%Y-%m-%d")
>>> date2 = datetime.strptime('2022-11-09', "%Y-%m-%d")
>>> date2-date1
datetime.timedelta(days=-126)
>>>
From the docs:
Only days, seconds and microseconds are stored internally. Arguments are converted to those units:
A millisecond is converted to 1000 microseconds.
A minute is converted to 60 seconds.
An hour is converted to 3600 seconds.
A week is converted to 7 days.
The docs also show some example usage of datetime.timedelta objects. So for your second question, I believe that you're right: both of your datetimes are at midnight, so the difference is a whole number of days and .seconds is 0. For .seconds to be nonzero, the difference would need a leftover component of at least one second (1,000,000 microseconds) but less than one day (86,400 seconds).
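The normalization described in the docs can be checked directly. The following is a small sketch with made-up values (they are not from the question); it shows how the constructor arguments are folded into the three stored fields, and that total_seconds() is the way to get the whole duration as one number.

```python
from datetime import timedelta

# 1 week + 25 hours + 1 minute + 1500 milliseconds (illustrative values)
delta = timedelta(weeks=1, hours=25, minutes=1, milliseconds=1500)

# Stored fields after normalization:
#   days         = 7 (week) + 1 (24 of the 25 hours)          = 8
#   seconds      = 3600 (leftover hour) + 60 + 1 (from 1500ms) = 3661
#   microseconds = 500000 (remaining half second)
print(delta.days, delta.seconds, delta.microseconds)  # 8 3661 500000

# .seconds is only the leftover part of a day, not the full duration.
print(delta.total_seconds())  # 694861.5
```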
TL;DR: The answer to both of your questions is "because that is a property of the datetime.timedelta class."
More fully, the date1 and date2 objects you create in your code are both instances of the datetime.datetime class. The - operation between them makes a timedelta object.
Why is the .days needed to avoid printing ", 0:00:00"?
By default, printing a timedelta object shows all of its information, i.e. the days plus the time part. That is still true after the //7 operation, because floor-dividing a timedelta by an integer returns another timedelta. When you use the . operator, you access the days attribute of the timedelta object, which is a plain integer, and only that value is used.
Why do I get "Number of weeks: 0" when I change .days to .seconds?
The value of the seconds attribute of the timedelta object you created with the operation abs(date2-date1) is integer 0.
See below:
>>> from datetime import datetime
>>> date1 = "2023-03-15"
>>> date1, type(date1)
('2023-03-15', <class 'str'>)
>>> date1 = datetime.strptime(date1, "%Y-%m-%d")
>>> date1, type(date1)
(datetime.datetime(2023, 3, 15, 0, 0), <class 'datetime.datetime'>)
>>> date2 = "2022-11-09"
>>> date2, type(date2)
('2022-11-09', <class 'str'>)
>>> date2 = datetime.strptime(date2, "%Y-%m-%d")
>>> date2, type(date2)
(datetime.datetime(2022, 11, 9, 0, 0), <class 'datetime.datetime'>)
>>> abs(date2-date1), type(abs(date2-date1))
(datetime.timedelta(days=126), <class 'datetime.timedelta'>)
>>> abs(date2-date1).days, type(abs(date2-date1).days)
(126, <class 'int'>)
>>> abs(date2-date1).seconds, type(abs(date2-date1).seconds)
(0, <class 'int'>)
See also: this discussion.
To be precise, you are calculating the difference in 7-day intervals, but a week is actually a calendar period which starts on Monday (or Sunday) and ends on Sunday (or Saturday). So the difference between 2022-10-01 (Sat) and 2022-10-04 (Tue) in weeks of the year is 1, but in days it is 3 (which is 0 seven-day intervals in your case).
So if you need the distance between two dates in weeks of the year, you have to take the weekdays into account:
from datetime import date
d1 = date(2022,10,18)
d2 = date(2022,10,5)
w1 = d1.weekday()
w2 = d2.weekday()
# difference in weeks of year = 2
((d1-d2).days - (w1-w2))/7 # 2.0
# difference in days = 13
d1-d2 # datetime.timedelta(days=13)
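The same idea can be wrapped in a function by snapping each date back to the Monday of its week before subtracting. This is a sketch under the assumption that weeks start on Monday; the names week_start and calendar_weeks_between are hypothetical.

```python
from datetime import date, timedelta


def week_start(d):
    # Monday of the week containing d (weekday() is 0 for Monday).
    return d - timedelta(days=d.weekday())


def calendar_weeks_between(d1, d2):
    # Number of week boundaries crossed, independent of argument order.
    return abs((week_start(d1) - week_start(d2)).days) // 7


print(calendar_weeks_between(date(2022, 10, 18), date(2022, 10, 5)))  # 2
print(calendar_weeks_between(date(2022, 10, 1), date(2022, 10, 4)))   # 1
```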

python - get difference between two times [duplicate]

This question already has answers here:
Python timedelta issue with negative values
(3 answers)
Closed 1 year ago.
I have a simple question today. I want to get the difference between the 2 dates and check if the difference is around ~5 minutes.
And I found a problem with getting the difference, to check.
I compared the same date with one that had a few minutes difference, and it printed a difference of -1 day? That doesn't make sense to me.
Test it yourself, if you want.
Compared Dates:
Date1: 2021-05-15 00:38:57.244000
Date2: 2021-05-15 02:40:42.245693
The printed difference was:
Diff: -1 day, 21:58:14.998307
So why is it at -1 day? Shouldn't the difference be just ~2 hours? And what's the best way to get the difference between the 2 dates? As I said, I want to check if the difference is smaller (or equal to) 5 minutes. How is that possible if the dates can be every time different?
Important info: The dates are always different because I check account creation dates.
I used this code to make the difference:
diff = date1 - date2
This problem can be solved by ordering the two date objects so that the diff comparison always returns a positive number. Then you can use a timedelta object to check if the diff is less than 5 minutes.
from datetime import datetime, timedelta
# Define dates as datetime objects
Date1 = datetime.fromisoformat("2021-05-15 00:38:57.244000")
Date2 = datetime.fromisoformat("2021-05-15 02:40:42.245693")
# Define 5 minutes as a timedelta object
five_minute_duration = timedelta(minutes=5)
Compare the two dates by subtracting the older date from the newer date. The result will be a timedelta object.
diff = Date2 - Date1
If you're not sure which date object is the older one, you can sort them. This makes the result independent of the order and prevents the unexpected negative timedelta.
dates_sorted = sorted([Date2, Date1])
diff = dates_sorted[1] - dates_sorted[0]
Difference in hours. In this example, the duration is 2.029... hours.
diff.total_seconds() / (60 * 60)
Check if the diff is smaller or equal to the 5 minute duration.
if diff <= five_minute_duration:
    print("Yes")
else:
    print("No")
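As for why the original subtraction printed -1 day: a negative timedelta is stored with a negative days field and non-negative seconds and microseconds, so a difference of about minus two hours shows up as "-1 day" plus roughly 22 hours. The sketch below uses the dates from the question; taking abs() is offered as an alternative to sorting, not as the answer's own method.

```python
from datetime import datetime, timedelta

date1 = datetime.fromisoformat("2021-05-15 00:38:57.244000")
date2 = datetime.fromisoformat("2021-05-15 02:40:42.245693")

diff = date1 - date2  # older minus newer, so the result is negative
print(diff)                                        # -1 day, 21:58:14.998307
print(diff.days, diff.seconds, diff.microseconds)  # -1 79094 998307

# abs() flips the sign and gives the actual gap of about two hours.
gap = abs(diff)
print(gap)                           # 2:01:45.001693
print(gap <= timedelta(minutes=5))   # False
```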
You can compare the two dates before subtracting them, so that the smaller one is always subtracted from the larger one. An example is shown below.
from datetime import datetime
dt1 = datetime(2020, 4, 19, 12, 20)
dt2 = datetime(2020, 4, 21, 15, 2)
def get_date_diff(date1, date2):
    diff = date2 - date1 if date1 < date2 else date1 - date2
    return diff
def test():
    print(get_date_diff(dt1, dt2))
test()
Create an upper and lower bound by creating two timedeltas with positive and negative values for the allowed difference
lower = timedelta(minutes=-5)
upper = timedelta(minutes=5)
Then check if the difference between the datetimes falls within those bounds
lower < date1 - date2 < upper
The advantage of this approach is that, if the comparison is made many times, lower and upper can be created just once and the datetimes do not have to be sorted.
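A runnable version of the bounds check might look like the sketch below. The function name within_five_minutes and the sample datetimes are hypothetical, and <= is used instead of < on the assumption that a difference of exactly 5 minutes should count, as the question asks for "smaller or equal".

```python
from datetime import datetime, timedelta

# Created once and reused for every comparison.
LOWER = timedelta(minutes=-5)
UPPER = timedelta(minutes=5)


def within_five_minutes(a, b):
    # Chained comparison works whichever of a and b is older.
    return LOWER <= a - b <= UPPER


t0 = datetime(2021, 5, 15, 0, 38, 57)
print(within_five_minutes(t0, t0 + timedelta(minutes=3)))   # True
print(within_five_minutes(t0 + timedelta(minutes=5), t0))   # True
print(within_five_minutes(t0, t0 + timedelta(hours=2)))     # False
```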

Timestamp subtraction must have the same timezones

I keep getting the following error:
TypeError: Timestamp subtraction must have the same timezones or no
timezones
At this line
df['days_in_Month'].loc[df['Month'] == min_date_Month] = (df['Month_end'] - \
pd.to_datetime(min_date,format="%Y-%m-%d"))
My df['TransactionDate'] is a column with the following format 2019-08-23T00:00:00.000Z. I am programming on Python3.3.7.
df['Month'] = df['TransactionDate'].apply(lambda x : str(x)[:7])
df['Month_begin'] = pd.to_datetime(df['Month'], format="%Y-%m") + MonthBegin(0)
df['Month_end'] = pd.to_datetime(df['Month'], format="%Y-%m") + MonthEnd(1)
df['days_in_Month'] = (df['Month_end'] - df['Month_begin'])#.days()
print(df.columns)
print(df)
min_date = df['TransactionDate'].min()
min_date_Month = min_date[:7]
df['days_in_Month'].loc[df['Month'] == min_date_Month] = (df['Month_end'] - \
pd.to_datetime(min_date,format="%Y-%m-%d"))
df['Month_begin'].loc[df['Month'] == min_date_Month] = pd.to_datetime(min_date,format="%Y-%m-%d")
When you run a piece of your offending instruction:
pd.to_datetime(min_date, format="%Y-%m-%d")
you will get:
Timestamp('2019-11-01 00:00:00+0000', tz='UTC')
It indicates that format="%Y-%m-%d" does not prevent this function
from parsing the whole input string, so the result is with
a time zone.
To parse only the date part, run:
pd.to_datetime(min_date[:10])
(even without format) and you will get:
Timestamp('2019-11-01 00:00:00')
without the time zone.
But your whole instruction looks odd.
When you run the left hand side alone:
df['days_in_Month'].loc[df['Month'] == min_date_Month]
you will get:
0 29 days
Name: days_in_Month, dtype: timedelta64[ns]
But when you run the right hand side alone:
df['Month_end'] - pd.to_datetime(min_date[:10])
you will get:
0 29 days
1 60 days
2 91 days
3 120 days
Name: Month_end, dtype: timedelta64[ns]
So you attempt to save the whole column under a single cell.
Maybe this instruction should be:
df['days_in_Month'] = df['Month_end'] - pd.to_datetime(min_date[:10])
instead?
And yet another remark: Your days_in_Month column is actually of
timedelta64 type, not the number of days.
To have the number of days in each month (as an integer), you should run:
df['days_in_Month'] = (df['Month_end'] - df['Month_begin']).dt.days + 1
Note that e.g. the difference between 2019-11-01 and 2019-11-30
is 29 days, whereas November has 30 days.
The problem is that the Z in your datetime string causes the datetime to be interpreted as being in the UTC timezone,
but your Month_end column has no timezone information attached to it.
pandas does not know how to subtract a timezone-aware value from a timezone-naive one, so you need to either remove the timezone from the datetime string or, better, make your other datetimes timezone-aware in UTC.
pandas makes this relatively easy:
Month_end = pandas.to_datetime(month_end_strings,utc=True)
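Both options can be seen on a pair of single timestamps. This is a minimal sketch with made-up values, assuming pandas is installed; the exact wording of the TypeError differs between pandas versions, so it is printed rather than matched.

```python
import pandas as pd

aware = pd.to_datetime("2019-08-23T00:00:00.000Z")  # trailing Z -> tz-aware (UTC)
naive = pd.Timestamp("2019-08-31")                  # no timezone attached

mixing_failed = False
try:
    naive - aware  # mixing aware and naive values is rejected
except TypeError as exc:
    mixing_failed = True
    print("TypeError:", exc)

# Option 1: drop the timezone from the aware value.
gap_naive = naive - aware.tz_localize(None)

# Option 2: attach UTC to the naive value.
gap_aware = naive.tz_localize("UTC") - aware

print(gap_naive, gap_aware)  # 8 days 00:00:00 8 days 00:00:00
```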

Calculate difference between two dates excluding weekends in python?

I want to create a function which returns the difference between two dates, excluding weekends and holidays.
E.g., the difference between 01/07/2019 and 08/07/2019 should be 5 days (excluding the weekend on 06/07/2019 and 07/07/2019).
What is the best way to achieve this?
Convert each string into a date with pd.to_datetime(), passing the format of your dates.
Then use np.busday_count to find the number of days between them, excluding weekends:
import pandas as pd
import numpy as np
date1 = "01/07/2019"
date2 = "08/07/2019"
date1 = pd.to_datetime(date1,format="%d/%m/%Y").date()
date2 = pd.to_datetime(date2,format="%d/%m/%Y").date()
days = np.busday_count( date1 , date2)
print(days)
5
In case you want to provide holidays:
holidays = pd.to_datetime("04/07/2019",format="%d/%m/%Y").date()
days = np.busday_count(date1, date2, holidays=[holidays])
print(days)
4
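If NumPy is not available, the same count can be sketched with the standard library alone. The function name business_days_between is hypothetical; like np.busday_count, it counts the start date but not the end date, and it assumes a Monday-to-Friday working week.

```python
from datetime import date, timedelta


def business_days_between(start, end, holidays=()):
    """Count weekdays in [start, end), skipping the given holiday dates."""
    skipped = set(holidays)
    count = 0
    current = start
    while current < end:
        # weekday() is 0..4 for Monday..Friday
        if current.weekday() < 5 and current not in skipped:
            count += 1
        current += timedelta(days=1)
    return count


start, end = date(2019, 7, 1), date(2019, 7, 8)
print(business_days_between(start, end))                      # 5
print(business_days_between(start, end, [date(2019, 7, 4)]))  # 4
```

This loops over every day, so it is fine for short ranges but slower than np.busday_count for long ones.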

Compute number of dates between two string dates and return an integer

I have a .txt file data-set like this with the date column of interest:
1181206,3560076,2,01/03/2010,46,45,M,F
2754630,2831844,1,03/03/2010,56,50,M,F
3701022,3536017,1,04/03/2010,40,38,M,F
3786132,3776706,2,22/03/2010,54,48,M,F
1430789,3723506,1,04/05/2010,55,43,F,M
2824581,3091019,2,23/06/2010,59,58,M,F
4797641,4766769,1,04/08/2010,53,49,M,F
I want to work out the number of days between each date and 01/03/2010 and replace the date with the days offset {0, 2, 3, 21...} yielding an output like this:
1181206,3560076,2,0,46,45,M,F
2754630,2831844,1,2,56,50,M,F
3701022,3536017,1,3,40,38,M,F
3786132,3776706,2,21,54,48,M,F
1430789,3723506,1,64,55,43,F,M
2824581,3091019,2,114,59,58,M,F
4797641,4766769,1,156,53,49,M,F
I've been trying for ages and it's getting really frustrating. I've tried converting to a date using the datetime.datetime.strptime('01/03/2010', "%d/%m/%Y").date() method and then subtracting the two dates, but it gives me an output of e.g. '3 days, 0:00:00' and I can't seem to get an output of only the number!
The difference between two dates is a timedelta. Any timedelta instance has days attribute that is an integer value you want.
This is fairly simple. Using the code you gave:
date1 = datetime.datetime.strptime('01/03/2010', '%d/%m/%Y').date()
date2 = datetime.datetime.strptime('04/03/2010', '%d/%m/%Y').date()
You get two date objects.
(date2-date1)
will give you the time delta. The mistake you're making is to convert that timedelta to a string. timedelta objects have a days attribute. Therefore, you can get the number of days using it:
(date2-date1).days
This generates the desired output.
Using your input (a bit verbose...)
#!/usr/bin/env python3
import datetime
# Reference date that every row is compared against
d_first = datetime.date(2010, 3, 1)
with open('input') as fd:
    for line in fd:
        # The date is the fourth comma-separated field, as DD/MM/YYYY
        date = line.split(',')[3]
        day, month, year = date.split('/')
        d = datetime.date(int(year), int(month), int(day))
        diff = d - d_first
        print(diff.days)
Gives
0
2
3
21
64
114
156
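To produce the full rows with the date replaced, as shown in the question, the offset can be written back into each row. The sketch below is one possible approach rather than the answer's own code: the function name replace_dates_with_offsets is hypothetical, and an in-memory sample of three rows from the question stands in for the real file.

```python
import csv
import io
from datetime import datetime

# Three rows copied from the question; a real run would open the .txt file.
SAMPLE = """\
1181206,3560076,2,01/03/2010,46,45,M,F
2754630,2831844,1,03/03/2010,56,50,M,F
1430789,3723506,1,04/05/2010,55,43,F,M
"""


def replace_dates_with_offsets(lines, reference="01/03/2010", column=3):
    fmt = "%d/%m/%Y"
    ref = datetime.strptime(reference, fmt).date()
    for row in csv.reader(lines):
        current = datetime.strptime(row[column], fmt).date()
        # .days turns the timedelta into a plain integer
        row[column] = str((current - ref).days)
        yield ",".join(row)


result = list(replace_dates_with_offsets(io.StringIO(SAMPLE)))
for line in result:
    print(line)
```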
Have a look at pleac; it has a lot of date examples using Python.
