Get the average number of days between two dates in Python - python

I have a data frame with only dates in one column.
My objective is to calculate the number of days between each date, in order to get the average number of days between two dates (each date corresponds to an operation).
I tried something like this:
for i in range(len(df)):
    if i != 0:
        d0 = df.iloc[[i-1,1]]
        d1 = df.iloc[[i,1]]
        L.append((d1 - d0).days)
But I got the error message: 'unsupported operand type(s) for -: 'str' and 'str''

You can subtract one date from another if both are in a proper format, such as a timestamp in seconds or a datetime object. As you might have noticed, you can't subtract strings.
If the dates are ISO-formatted strings, the fix is easy; just do this before the subtraction:
from datetime import datetime
d0 = datetime.fromisoformat(d0)
d1 = datetime.fromisoformat(d1)
The result is a datetime.timedelta object with a .total_seconds() method and a .days attribute, so you get the days like this:
difference_in_days = (d1 - d0).days
If it's something other than an ISO string, check out this question:
Converting string into datetime
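For the original goal (the average gap between consecutive operations), a vectorized pandas sketch may be simpler than the loop. It assumes the column is called `date` (a hypothetical name, since the question does not give one) and that it holds strings pd.to_datetime can parse.

```python
import pandas as pd

# Hypothetical sample data; the column name "date" is an assumption.
df = pd.DataFrame({"date": ["2021-01-01", "2021-01-04", "2021-01-10", "2021-01-11"]})

# Parse the strings into datetime64 values and sort them chronologically.
dates = pd.to_datetime(df["date"]).sort_values()

# diff() gives the gap between each date and the previous one (the first is NaT).
gaps_in_days = dates.diff().dt.days

# mean() skips the missing first value.
average_gap = gaps_in_days.mean()
print(average_gap)
```

Sorting first matters only if the rows are not already in chronological order; without it, some gaps could be negative.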

Related

My code works for a single input, but when I use a lambda it doesn't work?

I converted the 'date' column to datetime in order to calculate the difference between its cells and today:
df['date']=pd.to_datetime(df['date'], format='%Y-%d-%m %H:%M:%S',errors='ignore')
I can then run the code below for several random indexes, for example:
pd.Timestamp.now().normalize() - df['date'][1003]
Timedelta('2598 days 00:00:00')
As you can see, it works.
But when I run this code, I get an error:
df['Diff']=df['date'].agg(lambda x:pd.Timestamp.now().normalize()-x)
TypeError: unsupported operand type(s) for -: 'Timestamp' and 'str'
EDIT:
How can I recognize bad dates? I need all of the dates that become NaT.
To find the bad dates, filter for the missing values (NaT) that errors='coerce' generates when a value does not match the pattern %Y-%d-%m %H:%M:%S or when the datetime does not exist, like 2020-02-30 12:02:10:
out = df.loc[pd.to_datetime(df['date'],format='%Y-%d-%m %H:%M:%S',errors='coerce').isna(), 'date']
The problem is that errors='ignore' works differently from what you might expect.
It returns the input unchanged if at least one value cannot be parsed, so you can end up with datetimes mixed with strings, or with a column of strings only (like the original date column).
If you need a datetime column, replace the bad dates with NaT:
df['date']=pd.to_datetime(df['date'], format='%Y-%d-%m %H:%M:%S',errors='coerce')
df['Diff'] = pd.Timestamp.now().normalize() - df['date']
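A small sketch of the coerce-then-filter approach on made-up data may help here. The sample rows are hypothetical; only the format string and the column names come from the question.

```python
import pandas as pd

# Hypothetical sample: two valid rows and two that cannot be parsed.
df = pd.DataFrame({"date": [
    "2020-15-01 10:00:00",   # 15 January 2020 in year-day-month order
    "not a date",            # does not match the pattern
    "2020-30-02 12:02:10",   # 30 February does not exist
    "2019-01-12 08:30:00",   # 1 December 2019
]})
fmt = "%Y-%d-%m %H:%M:%S"

# errors="coerce" turns every unparseable value into NaT.
parsed = pd.to_datetime(df["date"], format=fmt, errors="coerce")

# The original strings that failed to parse.
bad_rows = df.loc[parsed.isna(), "date"]
print(bad_rows.tolist())

# With a real datetime column, the subtraction is vectorized; no lambda needed.
df["date"] = parsed
df["Diff"] = pd.Timestamp.now().normalize() - df["date"]
```

Rows that were coerced to NaT stay NaT in Diff, so they can be dropped or inspected separately.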

Subtracting dates and only take out number of days as number

Under df.dtypes I can confirm they are both datetime columns:
date1 datetime64[ns]
date2 datetime64[ns]
df_test['snaptoexpectedStart'] = df['date1'] - df['date2']
TypeError: '<' not supported between instances of 'str' and 'int'
I don't understand why I'm getting that error when both of the columns I'm trying to subtract are in the correct format.
I guess it has something to do with the datetime format. Try converting the columns this way to see if it works:
import pandas as pd
df_test['snaptoexpectedStart'] = pd.to_datetime(df['date1']) - pd.to_datetime(df['date2'])
If you are looking to get only the number of days, then try this:
df_test['snaptoexpectedStart'] = (df_test['date1'] - df_test['date2Date']).dt.days
You might want to look into the timedelta class:
According to the API, subtracting two datetimes (assuming they are datetime.datetime objects) results in a timedelta object. You can then use the .days attribute of the timedelta to get the difference in days.
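As a rough illustration of the difference between the column-wise and the scalar case, the sketch below uses a made-up frame whose column names follow the question. On a Series the day count comes from the .dt accessor, while a single element exposes .days directly.

```python
import pandas as pd

# Hypothetical frame; only the column names date1/date2 come from the question.
df = pd.DataFrame({
    "date1": pd.to_datetime(["2021-03-10", "2021-03-01"]),
    "date2": pd.to_datetime(["2021-03-01", "2021-03-05"]),
})

# Subtracting two datetime64 columns gives a timedelta Series.
delta = df["date1"] - df["date2"]

# .dt.days extracts whole days for every row at once.
df["days"] = delta.dt.days
print(df["days"].tolist())

# A single element is a pandas Timedelta with a scalar .days attribute.
first = delta.iloc[0]
print(first.days)
```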

Want to find difference, in days, between two dates, of different date format, in Python

I have two different dates that I am pulling from a database using a SQL query. I'm looking to do transformations in Python, but the two main dates I want to work with are stored in different formats. The first is in date format (YYYY/MM/DD) and the other is in (YYYY/MM/DD HH:MM:SS) format. I want a difference in days, so the time portion of the second date is irrelevant. What is the easiest way to do this in Python? Ideally, I would like to automate this by creating a DATE version of the DATETIME variable and taking the difference between the two DATES.
I've tried the following, but I get errors since I am dealing with Series. I am trying to get the delta for every row.
df.delta = (df.DATETIME - df.DATE)
and
df.delta = datetime.timedelta(df.DATETIME - df.DATE)
import datetime
d1 = datetime.datetime.strptime('2018/01/13', '%Y/%m/%d')
d2 = datetime.datetime.strptime('2018/01/15 18:34:02', '%Y/%m/%d %H:%M:%S')
delta = d2 - d1
print(delta.total_seconds())
print(delta.days)
Drop the time of day from the datetime column; you can then subtract the columns to get a delta. Assuming both columns already have a datetime64 dtype:
df['delta'] = (df.DATETIME.dt.normalize() - df.DATE).dt.days
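If the two columns arrive as strings, one possible end-to-end sketch is to parse each with its own format and then compare calendar dates only. The sample rows are hypothetical; the column names DATE and DATETIME follow the question.

```python
import pandas as pd

# Hypothetical rows shaped like the question's two columns.
df = pd.DataFrame({
    "DATE": ["2018/01/13", "2018/02/01"],
    "DATETIME": ["2018/01/15 18:34:02", "2018/02/01 23:59:59"],
})

# Parse each column with the format it actually uses.
date_col = pd.to_datetime(df["DATE"], format="%Y/%m/%d")
datetime_col = pd.to_datetime(df["DATETIME"], format="%Y/%m/%d %H:%M:%S")

# normalize() resets the time of day to midnight, so only the calendar date counts.
df["delta"] = (datetime_col.dt.normalize() - date_col).dt.days
print(df["delta"].tolist())
```

Assigning with df["delta"] rather than df.delta is deliberate: attribute assignment does not create a new column.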

Datetime and Strftime

I'm making a calculator that tells you how many days there are between today and a given date. The dates are imported from a file and are written in the formats yyyy/mm/dd and dd/mm/yyyy.
I have two problems:
1: The format in which the dates are written varies. A few of the dates are written in reverse. How do I reverse them? I get ValueError: day is out of range for month.
2: When I try to subtract the "dates" from "today" I get the error TypeError: unsupported operand type(s) for -: 'str' and 'str' and when I add "int" I get ValueError: invalid literal for int() with base 10: '2015, 10, 23'
Any advice? :)
for line in response:
    line = (line.decode(encoding).strip())
    year, month, day = line.split('/')
    today = date.today().strftime("%Y, %m, %d")
    dates = datetime(int(year), int(month), int(day)).strftime("%Y, %m, %d")
    print(int(today)-int(dates))
No need to convert into integer if you have two date objects, you can just subtract one from the other and query the resulting timedelta object for the number of days:
>>> from datetime import date
>>> a = date(2011,11,24)
>>> b = date(2011,11,17)
>>> a-b
datetime.timedelta(7)
>>> (a-b).days
7
And it works with datetimes too; the .days attribute rounds down to the whole number of days:
>>> from datetime import datetime
>>> a = datetime(2011,11,24,0,0,0)
>>> b = datetime(2011,11,17,23,59,59)
>>> a-b
datetime.timedelta(6, 1)
>>> (a-b).days
6
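The rounding can be checked with a short sketch that reuses the two datetimes above. It also shows that a negative difference rounds toward the more negative day, and one way to get fractional days instead.

```python
from datetime import datetime, timedelta

a = datetime(2011, 11, 24, 0, 0, 0)
b = datetime(2011, 11, 17, 23, 59, 59)

forward = a - b     # 6 days and 1 second
backward = b - a    # stored as -7 days plus 86399 seconds

print(forward.days)   # 6
print(backward.days)  # -7

# Dividing by a one-day timedelta gives the difference as fractional days.
print(forward / timedelta(days=1))
```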
Your second problem is caused by calling strftime too early. date objects can be subtracted from each other, but strings cannot, i.e.
today = date.today()
dates = date(int(year), int(month), int(day))
print((today-dates).days)
Also, you should use date objects for both.
Your first problem can be fixed with some simple error checking like
year, day = int(year), int(day)
if year < day:
    year, day = day, year  # the date was written as dd/mm/yyyy
or something more thorough than that, but you get the idea.
EDIT:
I forgot that subtraction returns a timedelta object. These objects only hold days and smaller units of time (seconds and microseconds).
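An alternative to swapping the parts by hand is to try each expected format in turn. This is only a sketch: the helper name parse_mixed is made up for illustration, and it assumes the file contains just the two formats mentioned in the question.

```python
from datetime import date, datetime

def parse_mixed(text):
    """Try yyyy/mm/dd first, then dd/mm/yyyy; raise ValueError if neither fits."""
    for fmt in ("%Y/%m/%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

# A fixed "today" keeps the example reproducible; use date.today() in practice.
today = date(2015, 11, 1)
for raw in ("2015/10/23", "23/10/2015"):
    print((today - parse_mixed(raw)).days)
```

Because %Y expects a four-digit year, a line such as 23/10/2015 fails the first format and falls through to the second.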

Compare two dates with different formats in Python

I'm not familiar with Python; I'm just debugging existing code. I'm comparing two dates here, but they have different formats. I get "TypeError: can't compare offset-naive and offset-aware datetimes" when I do the comparison.
if date_start <= current_date:
"TypeError: can't compare offset-naive and offset-aware datetimes"
str(date_start) >> 2015-08-24 16:25:00+00:00
str(current_date) >> 2015-08-24 17:58:42.092391
How do I make a valid date comparison? I'm assuming I need to convert one to another format.
UPDATE
hour_offset = 0
minute_offset = 0
if timezone_offset:
    offset_sign = int('%s1' % timezone_offset[0])
    hour_offset = offset_sign * int(timezone_offset[1:3])
    minute_offset = offset_sign * int(timezone_offset[3:5])
current_date = (datetime.datetime.now() +
                datetime.timedelta(hours=hour_offset,minutes=minute_offset))
The previous dev might have applied the timezone offset this way. Any thoughts on this one?
Use dt.replace(tzinfo=tz) to add a timezone to the naive datetime, so that they can be compared.
There aren't enough details here to solve this. But if you want an offset-naive time, try this:
(offset-aware-datetime).replace(tzinfo=None)
To add a timezone to an offset-naive time, try this:
(offset-naive-datetime).replace(tzinfo=tz)
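Both directions can be sketched with values shaped like the ones in the question. The assumption here, which may not hold for the original code, is that the naive value already represents UTC; replace() only attaches or removes the tzinfo and does not shift the clock time.

```python
from datetime import datetime, timezone

# Hypothetical values shaped like the question's output.
date_start = datetime(2015, 8, 24, 16, 25, tzinfo=timezone.utc)   # offset-aware
current_date = datetime(2015, 8, 24, 17, 58, 42, 92391)           # offset-naive

# Option 1: make the naive value aware (assumes it is already in UTC).
current_aware = current_date.replace(tzinfo=timezone.utc)
print(date_start <= current_aware)

# Option 2: drop the timezone from the aware value and compare naive values.
start_naive = date_start.replace(tzinfo=None)
print(start_naive <= current_date)
```

For the current time, datetime.now(timezone.utc) returns an aware value directly, which avoids applying the offset by hand.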
One way is to convert the dates to seconds since the epoch and then compare. Say your date is 2015-08-24 16:25:00; you can build it with the datetime constructor, which takes the parameters (year, month, day[, hour[, minute[, second[, microsecond[, tzinfo]]]]]) and returns a datetime object. Finally, you can call timestamp() to get the seconds since the epoch as a float. So your code can be:
import datetime
d1 = datetime.datetime(2015,8,24,16,25,0)
d2 = datetime.datetime(2015,8,24,17,58,42,92391)
# timestamp() is portable; strftime("%s") only works on some platforms
if d1.timestamp() > d2.timestamp():
    print("First one is bigger")
else:
    print("Second one is bigger")
I hope this helps!
