I have a data set of JSON-formatted dates and am trying to calculate the time difference between two of the DateTime values.
For example :
'2015-01-28T21:41:38.508275' - '2015-01-28T21:41:34.921589'
Please look at the python code below:
#let's say 'time' is my data frame and JSON formatted time values are under the 'due_date' column
time_spent = time.iloc[2]['due_date'] - time.iloc[10]['due_date']
This doesn't work. I also tried casting each operand to int, but that didn't help either. What are the different ways to perform this calculation?
I use the parser from dateutil.
Something like this:
from dateutil.parser import parse
first_date_obj = parse("2015-01-28T21:41:38.508275")
second_date_obj = parse("2015-02-28T21:41:38.508275")
print(second_date_obj - first_date_obj)
You can also access the year, month and day of the date object like this:
print(first_date_obj.year)
print(first_date_obj.month)
print(first_date_obj.day)
# and so on
from datetime import datetime
date_format = '%Y-%m-%dT%H:%M:%S.%f'
d2 = time.iloc[2]['due_date']
d1 = time.iloc[10]['due_date']
time_spent = datetime.strptime(d2, date_format) - datetime.strptime(d1, date_format)
print(time_spent.days) # 0
print(time_spent.microseconds) # 586686
print(time_spent.seconds) # 3
print(time_spent.total_seconds()) # 3.586686
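For ISO 8601 strings like the ones in the question, the standard library can also parse the value without an explicit format string. A minimal sketch, assuming Python 3.7 or later (where datetime.fromisoformat is available) and using the two sample values from the question in place of the data frame cells:

```python
from datetime import datetime

# Sample values from the question; in practice these would come from the data frame.
later = datetime.fromisoformat("2015-01-28T21:41:38.508275")
earlier = datetime.fromisoformat("2015-01-28T21:41:34.921589")

time_spent = later - earlier          # a datetime.timedelta
print(time_spent)                     # 0:00:03.586686
print(time_spent.total_seconds())     # 3.586686
```

Whether this is preferable to strptime depends on how uniform the input is: strptime rejects anything that does not match the given format, which can be useful as a validation step.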
The easiest thing to do is to use the pandas datetime capability (since you are already using iloc I assume you are using pandas). You can convert the entire dataframe column labeled due_date to be a pandas datetime datatype using
import pandas as pd
time['due_date'] = pd.to_datetime(time['due_date'])
then calculate the time difference you want using
time_spent = time.iloc[2]['due_date'] - time.iloc[10]['due_date']
time_spent will be a pandas timedelta object that you can then manipulate as necessary.
See https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html and https://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.html.
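To make the steps above concrete, here is a small self-contained sketch. The two-row data frame is made up for illustration and reuses the sample values from the question; the real frame would have more rows and different positions than 0 and 1.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data frame.
time = pd.DataFrame({
    "due_date": ["2015-01-28T21:41:34.921589", "2015-01-28T21:41:38.508275"],
})

# Convert the whole column from strings to datetime64 values.
time["due_date"] = pd.to_datetime(time["due_date"])

# Subtracting two timestamps yields a pandas Timedelta.
time_spent = time.iloc[1]["due_date"] - time.iloc[0]["due_date"]
print(time_spent.total_seconds())   # 3.586686

# Differences between consecutive rows for the whole column at once.
step = time["due_date"].diff()
```

The first entry of step is NaT because the first row has no predecessor.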
Is there an easier way to build this kind of date?
I'm trying to make a filter in pandas that finds the date 3 months ago and then selects that entire month with loc.
The code works, but I'm looking for a better way.
final_date = pd.to_datetime(f'{(datetime.today() - timedelta(days=90)).year}-{(datetime.today() - timedelta(days=90)).month}-01', dayfirst=True)
My Suggestion
This version of the code is more concise and easier to read.
It avoids using multiple calls to datetime.today() and string formatting.
import pandas as pd
from datetime import date, timedelta
# Calculate the date 90 days ago
d = date.today() - timedelta(days=90)
# Format the date as a string in the '%Y-%m-01' format
date_str = d.strftime('%Y-%m-01')
# Parse the string as a datetime object
final_date = pd.to_datetime(date_str, dayfirst=True)
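Note that 90 days is only an approximation of three months. If calendar months are what is wanted, pd.DateOffset can do the month arithmetic directly. A sketch, where the helper name first_day_months_ago and the fixed reference date are made up for illustration:

```python
import pandas as pd

def first_day_months_ago(ref, months=3):
    """Return the first day of the month lying `months` calendar months before `ref`."""
    shifted = pd.Timestamp(ref).normalize() - pd.DateOffset(months=months)
    return shifted.replace(day=1)

# Fixed reference date so the result is reproducible; use pd.Timestamp.today() in practice.
start = first_day_months_ago("2023-05-31")
end = start + pd.DateOffset(months=1)   # first day of the following month
print(start.date(), end.date())         # 2023-02-01 2023-03-01
```

For this reference date, subtracting 90 days lands on 2 March 2023, so the two approaches pick different months. The whole month can then be selected with a half-open range such as df[(df['date'] >= start) & (df['date'] < end)], where df and 'date' stand in for the real frame and column.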
I have a big Excel file with a column of datetime values that are stored as strings. The column looks like this:
ingezameldop
2022-10-10 15:51:18
2022-10-10 15:56:19
I have tried two ways of filtering the rows by time, but neither works.
First (nice way):
import pandas as pd
from datetime import datetime
from datetime import date
dagStart = datetime.strptime(str(date.today())+' 06:00:00', '%Y-%m-%d %H:%M:%S')
dagEind = datetime.strptime(str(date.today())+' 23:00:00', '%Y-%m-%d %H:%M:%S')
data = pd.read_excel('inzamelbestand.xlsx', index_col=9)
data = data.loc[pd.to_datetime(data['ingezameldop']).dt.time.between(dagStart.time(), dagEind.time())]
data.to_excel("oefenexcel.xlsx")
However, this gives me an Excel file identical to the original one. I can't seem to fix this.
Second way (sketchy):
import pandas as pd
from datetime import datetime
from datetime import date
df = pd.read_excel('inzamelbestand.xlsx', index_col=9)
# uitfilteren dag van vandaag
dag = str(date.today())
dag1 = dag[8]+dag[9]
vgl = df['ingezameldop']
vgl2 = vgl.str[8]+vgl.str[9]
df = df.loc[vgl2 == dag1]
# uitfilteren vanaf 6 uur 's ochtends
# str11 str12 = uur
df.to_excel("oefenexcel.xlsx")
This one works for filtering out the exact day, but not when I want to filter by hour. I use the same approach (taking the 11th and 12th characters of the string), but I can't use comparison operators (>=) on these strings, so I can't filter for times after 6.
You can modify this line of code
data = data.loc[pd.to_datetime(data['ingezameldop']).dt.time.between(dagStart.time(), dagEind.time())]
as
(dagStart.hour, dagStart.minute) <= (ingezameldop.hour, ingezameldop.minute) < (dagEind.hour, dagEind.minute)
where ingezameldop is a single value taken from data['ingezameldop']. This gives a boolean that is true only for records within the time range.
dagStart, dagEind and the values of data['ingezameldop'] must be datetime objects, so convert the column with pd.to_datetime first if it holds strings.
To apply it to each element of the column, wrap it in a function and use apply as follows
def filter(ingezameldop, dagStart, dagEind):
    # Compare (hour, minute) tuples so the check works on one timestamp at a time.
    return (dagStart.hour, dagStart.minute) <= (ingezameldop.hour, ingezameldop.minute) < (dagEind.hour, dagEind.minute)
then apply the filter to the column in this way
data['filter'] = pd.to_datetime(data['ingezameldop']).apply(filter, dagStart=dagStart, dagEind=dagEind)
This applies the function to each element of the series, which must be in datetime format (hence the pd.to_datetime call). Note that the name filter shadows Python's built-in filter function, so a different name may be preferable.
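A vectorized alternative avoids apply altogether and checks the day and the hours in one step. This is only a sketch: the sample rows and the fixed day are made up so that the result is reproducible, and in real use dag would be pd.Timestamp.today().normalize() with the data read from the Excel file.

```python
import pandas as pd

# Made-up sample rows in the same string format as the 'ingezameldop' column.
data = pd.DataFrame({"ingezameldop": [
    "2022-10-10 05:59:59",   # same day, too early
    "2022-10-10 15:51:18",   # same day, inside the window
    "2022-10-10 23:30:00",   # same day, too late
    "2022-10-09 15:56:19",   # inside the hours, but a different day
]})

ts = pd.to_datetime(data["ingezameldop"])

dag = pd.Timestamp("2022-10-10")           # assumption: stands in for "today"
dag_start = dag + pd.Timedelta(hours=6)    # 06:00 on that day
dag_eind = dag + pd.Timedelta(hours=23)    # 23:00 on that day

# Series.between is inclusive on both ends by default.
gefilterd = data.loc[ts.between(dag_start, dag_eind)]
print(gefilterd)
```

Because the comparison is against full timestamps rather than times of day, rows from other days are excluded as well.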
I have some columns that hold times in string format, like '01:19:55'. I want to convert these strings into a time format. For that, I am trying to use code like the following:
col_list2=['clm_rcvd_tm','gtwy_rcvd_tm']
pandas_df_1=df2.toPandas()
for x in col_list2:
pandas_df_1[x]=pd.to_datetime(pandas_df_1[x].replace(" ",""),format='%H:%M:%S').dt.time
As output, these columns return decimal values (e.g. 0.26974537037037).
Any help will be appreciated.
If your pandas dataframe holds a string such as 01:19:55, as you mentioned, you can convert it with the Python datetime module.
For example:
from datetime import datetime
time_str = "01:19:55"
dt = datetime.strptime(time_str, '%H:%M:%S') # for 24-hour format
dt = datetime.strptime(time_str, '%I:%M:%S') # for 12-hour format
# dt is a datetime dated 1900-01-01; dt.time() gives just the time of day
For reference, have a look at
Python Datetime formatting
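The same conversion can be done on a whole column in pandas. One detail in the question's code is worth noting: Series.replace(" ", "") only replaces cells whose entire value is a single space, whereas the .str accessor works inside each string. A sketch with made-up sample values, reusing the column name from the question:

```python
import pandas as pd

# Hypothetical sample data with stray whitespace around the time strings.
pandas_df_1 = pd.DataFrame({"clm_rcvd_tm": [" 01:19:55", "13:05:00 "]})

col = "clm_rcvd_tm"
cleaned = pandas_df_1[col].str.strip()     # remove surrounding whitespace in each string
pandas_df_1[col] = pd.to_datetime(cleaned, format="%H:%M:%S").dt.time

print(pandas_df_1[col].iloc[0])            # 01:19:55
```

The resulting column has dtype object and holds datetime.time values. This sketch does not explain where the decimal values in the question come from; it only shows a conversion that yields time objects.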
I want to find out how much time has passed from last night to this morning. What is the quickest way to do this? The format of the pandas dataframe is shown here.
You are looking for timedelta. Read this:
https://www.tutorialspoint.com/python_pandas/python_pandas_timedelta.htm
If the datatype is not already datetime, you can use this:
from datetime import datetime
frm = '%H:%M' #format
delta = datetime.strptime(df['wake_time'][row_number], frm) - datetime.strptime(df['sleep_time'][row_number], frm)
*assuming your pandas dataframe is named df and the row index is row_number
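One caveat with time-only strings: if the sleep time is before midnight and the wake time is after it, the plain subtraction comes out negative. A minimal sketch that adds one day in that case, assuming the interval is always shorter than 24 hours (the helper name time_asleep is made up for illustration):

```python
from datetime import datetime, timedelta

def time_asleep(sleep_time, wake_time, frm="%H:%M"):
    """Return the time between two clock times, wrapping past midnight if needed."""
    delta = datetime.strptime(wake_time, frm) - datetime.strptime(sleep_time, frm)
    if delta < timedelta(0):
        # Wake time is on the next calendar day.
        delta += timedelta(days=1)
    return delta

print(time_asleep("23:30", "07:15"))   # 7:45:00
print(time_asleep("01:00", "06:30"))   # 5:30:00
```

If the data frame stores full dates as well as times, subtracting the complete datetimes is the more robust option, since no wrap-around guess is needed.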
I'm trying to calculate the difference between time values stored as strings, but I cannot parse the microseconds format. Why do I get this kind of error, and how can I fix my code?
I have already tried "datetime.strptime" method to get string to time format then use pandas.dataframe.diff method to calculate the difference between each item in the list and create a column in excel for it.
```
from datetime import datetime
import pandas as pd
for itemz in time_list:
    df = pd.DataFrame(datetime.strptime(itemz, '%H %M %S %f'))
    ls_cnv.append(df.diff())
df = pd.DataFrame(time_list)
ls_cnv = [df.diff()]
print (ls_cnv)
```
I expect the output to be
ls_cnv = [NaN, 00:00:00, 00:00:00]
time_list = ['10:54:05.912783', '10:54:05.912783', '10:54:05.912783']
but I get this error instead: time data '10:54:05.906224' does not match format '%H %M %S %f'
The error you get is because the format string you pass to strptime does not match your data.
datetime.strptime(itemz, '%H:%M:%S.%f')
The format above, with colons and a dot, matches the values in your time_list; the one with spaces does not. You also create the DataFrame the wrong way. A DataFrame is a table of data, so pd.DataFrame expects a collection such as a list or dict, not a single datetime. The following lines create a new DataFrame on every iteration, replacing the previous one, for each itemz, which is one element of your list at a time. So in the first iteration the DataFrame would hold the single element '10:54:05.912783', and diff() would have no other value to compare it with.
for itemz in time_list:
    df = pd.DataFrame(datetime.strptime(itemz, '%H %M %S %f'))
    ls_cnv.append(df.diff())
Maybe what you wanted to do is the following:
from datetime import datetime
import pandas as pd
ls_cnv = []
time_list = ['10:54:03.912743', '10:54:05.912783', '10:44:05.912783']
df = pd.to_datetime(time_list)
data = pd.DataFrame({'index': range(len(time_list))}, index=df)
a = pd.Series(data.index).diff()
ls_cnv.append(a)
print (ls_cnv)
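A shorter variant of the same idea, sketched under the assumption that only the differences between consecutive values are needed: parse the list with an explicit format and call diff() on the resulting Series, without building an intermediate DataFrame.

```python
import pandas as pd

time_list = ['10:54:03.912743', '10:54:05.912783', '10:44:05.912783']

# An explicit format makes the colon and dot separators part of the contract.
times = pd.Series(pd.to_datetime(time_list, format='%H:%M:%S.%f'))

# Difference between each value and the one before it.
diffs = times.diff()
print(diffs)
```

The first entry is NaT because there is no previous value, the second is a little over two seconds, and the third is minus ten minutes because the last time is earlier than the one before it.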
The error occurs simply because your format string must include the colons and the decimal point, like this:
"%H:%M:%S.%f"