"How to calculate difference in succesive time values in Python"

"How to calculate difference in succesive time values in Python" - python

I'am trying to calculate the difference between string time values but i could not read microseconds format. Why i have this type of errors ? and how i can fix my code for it ?
I have already tried "datetime.strptime" method to get string to time format then use pandas.dataframe.diff method to calculate the difference between each item in the list and create a column in excel for it.
```
from datetime import datetime
import pandas as pd
for itemz in time_list:
df = pd.DataFrame(datetime.strptime(itemz, '%H %M %S %f'))
ls_cnv.append(df.diff())
df = pd.DataFrame(time_list)
ls_cnv = [df.diff()]
print (ls_cnv)
```
I expect the output to be
ls_cnv = [NaN, 00:00:00, 00:00:00]
time_list = ['10:54:05.912783', '10:54:05.912783', '10:54:05.912783']
but i have instead (time data '10:54:05.906224' does not match format '%H %M %S %f')

The error you get is because you are using strptime wrong.
df = pd.DataFrame(datetime.strptime(itemz, '%H:%M:%S.%f'))
The above would be the correct form, the one passed from your time_list but that's not the case. You create the DataFrame in the wrong way too. DataFrame is a table if you wish of data. The following lines will create and replace in every loop a new DataFrame for every itemz which is one element of your list at time. So it will create a DataFrame with one element in the first loop which will be '10:54:05.912783' and it will diff() that with itself while there is no other value.
for itemz in time_list:
df = pd.DataFrame(datetime.strptime(itemz, '%H %M %S %f'))
ls_cnv.append(df.diff())
Maybe what you wanted to do is the following:
from datetime import datetime
import pandas as pd
ls_cnv = []
time_list = ['10:54:03.912743', '10:54:05.912783', '10:44:05.912783']
df = pd.to_datetime(time_list)
data = pd.DataFrame({'index': range(len(time_list))}, index=df)
a = pd.Series(data.index).diff()
ls_cnv.append(a)
print (ls_cnv)

Just because your time format must include colons and point like this
"%H:%M:%S.%f"

Related

Sorting datetime; pandas

I have a big excel file with a datetime format column which are in strings. The column looks like this:
ingezameldop
2022-10-10 15:51:18
2022-10-10 15:56:19
I have found two ways of trying to do this, however they do not work.
First (nice way):
import pandas as pd
from datetime import datetime
from datetime import date
dagStart = datetime.strptime(str(date.today())+' 06:00:00', '%Y-%m-%d %H:%M:%S')
dagEind = datetime.strptime(str(date.today())+' 23:00:00', '%Y-%m-%d %H:%M:%S')
data = pd.read_excel('inzamelbestand.xlsx', index_col=9)
data = data.loc[pd.to_datetime(data['ingezameldop']).dt.time.between(dagStart.time(), dagEind.time())]
data.to_excel("oefenexcel.xlsx")
However, this returns me with an excel file identical to the original one. I cant seem to fix this.
Second way (sketchy):
import pandas as pd
from datetime import datetime
from datetime import date
df = pd.read_excel('inzamelbestand.xlsx', index_col=9)
# uitfilteren dag van vandaag
dag = str(date.today())
dag1 = dag[8]+dag[9]
vgl = df['ingezameldop']
vgl2 = vgl.str[8]+vgl.str[9]
df = df.loc[vgl2 == dag1]
# uitfilteren vanaf 6 uur 's ochtends
# str11 str12 = uur
df.to_excel("oefenexcel.xlsx")
This one works for filtering out the exact day. But when I want to filter out the hours it does not. Because I use the same way (getting the 11nd and 12th character from the string) but I cant use logic operators (>=) on strings, so I cant filter out for times >6

You can modify this line of code
data = data.loc[pd.to_datetime(data['ingezameldop']).dt.time.between(dagStart.time(), dagEind.time())]
as
(dagStart.hour, dagStart.minute) <= (data['ingezameldop'].hour, data['ingezameldop'].minute) < (dagEind.hour, dagEind.minute)
to get boolean values that are only true for records within the date range.
dagStart, dagEind and data['ingezameldop'] must be in datetime format.
In order to apply it on individual element of the column, wrap it in a function and use apply as follows
def filter(ingezameldop, dagStart, dagEind):
return (dagStart.hour, dagStart.minute) <= (data['ingezameldop'].hour, data['ingezameldop'].minute) < (dagEind.hour, dagEind.minute)
then apply the filter on the column in this way
data['filter'] = data['ingezameldop'].apply(filter, dagStart=dagStart, dagEind=dagEind)
That will apply the function on individual series element which must be in datetime format

Am i doing something wrong with the loops?

I am using python to do some data cleaning and i've used the datetime module to split date time and tried to create another column with just the time.
My script works but it just takes the last value of the data frame.
Here is the code:
import datetime
i = 0
for index, row in df.iterrows():
date = datetime.datetime.strptime(df.iloc[i, 0], "%Y-%m-%dT%H:%M:%SZ")
df['minutes'] = date.minute
i = i + 1
This is the dataframe :
Output

df['minutes'] = date.minute reassigns the entire 'minutes' column with the scalar value date.minute from the last iteration.
You don't need a loop, as 99% of the cases when using pandas.
You can use vectorized assignment, just replace 'source_column_name' with the name of the column with the source data.
df['minutes'] = pd.to_datetime(df['source_column_name'], format='%Y-%m-%dT%H:%M:%SZ').dt.minute
It is also most likely that you won't need to specify format as pd.to_datetime is fairly smart.
Quick example:
df = pd.DataFrame({'a': ['2020.1.13', '2019.1.13']})
df['year'] = pd.to_datetime(df['a']).dt.year
print(df)
outputs
a year
0 2020.1.13 2020
1 2019.1.13 2019

Seems like you're trying to get the time column from the datetime which is in string format. That's what I understood from your post.
Could you give this a shot?
from datetime import datetime
import pandas as pd
def get_time(date_cell):
dt = datetime.strptime(date_cell, "%Y-%m-%dT%H:%M:%SZ")
return datetime.strftime(dt, "%H:%M:%SZ")
df['time'] = df['date_time'].apply(get_time)

Convert a date to a different format for an entire new column

I want to convert the date in a column in a dataframe to a different format. Currently, it has this format: '2019-11-20T01:04:18'. I want it to have this format: 20-11-19 1:04.
I think I need to develop a loop and generate a new column for the new date format. So essentially, in the loop, I would refer to the initial column and then generate the variable for the new column in the format I want.
Can someone help me out to complete this task?
The following code works for one occasion:
import datetime
d = datetime.datetime.strptime('2019-11-20T01:04:18', '%Y-%m-%dT%H:%M:%S')
print d.strftime('%d-%m-%y %H:%M')

From a previous answer in this site , this should be able to help you, comments give explanation
You can read your data into pandas from csv or database or create some test data as shown below for testing.
>>> import pandas as pd
>>> df = pd.DataFrame({'column': {0: '26/1/2016', 1: '26/1/2016'}})
>>> # First convert the column to datetime datatype
>>> df['column'] = pd.to_datetime(df.column)
>>> # Then call the datetime object format() method, set the modifiers you want here
>>> df['column'] = df['column'].dt.strftime('%Y-%m-%dT%H:%M:%S')
>>> df
column
0 2016-01-26T00:00:00
1 2016-01-26T00:00:00
NB. Check to ensure that all your columns have similar date strings

You can either achieve it like this:
from datetime import datetime
df['your_column'] = df['your_column'].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S').strftime('%d-%m-%y %H:%M'))

How to play around with JSON date format?

I have a JSON date data set and trying to calculate the time difference between two different JSON DateTime.
For example :
'2015-01-28T21:41:38.508275' - '2015-01-28T21:41:34.921589'
Please look at the python code below:
#let's say 'time' is my data frame and JSON formatted time values are under the 'due_date' column
time_spent = time.iloc[2]['due_date'] - time.iloc[10]['due_date']
This doesn't work. I also tried to cast each operand to int, but it also didn't help. What are the different ways to perform this calculation?

I use parser from dateutil.
Something like that:
from dateutil.parser import parse
first_date_obj = parse("2015-01-28T21:41:38.508275")
second_date_obj = parse("2015-02-28T21:41:38.508275")
print(second_date_obj - first_date_obj)
You can also access the year, month, day of the date object like that:
print(first_date_obj.year)
print(first_date_obj.month)
print(first_date_obj.day)
# and so on

from datetime import datetime
date_format = '%Y-%m-%dT%H:%M:%S.%f'
d2 = time.iloc[2]['due_date']
d1 = time.iloc[10]['due_date']
time_spent = datetime.strptime(d2, date_format) - datetime.strptime(d1, date_format)
print(time_spent.days) # 0
print(time_spent.microseconds) # 586686
print(time_spent.seconds) # 3
print(time_spent.total_seconds()) # 3.586686

The easiest thing to do is to use the pandas datetime capability (since you are already using iloc I assume you are using pandas). You can convert the entire dataframe column labeled due_date to be a pandas datetime datatype using
import pandas as pd
time['due_date'] = pd.to_datetime(time['due_date']
then calculate the time difference you want using
time_spent = time.iloc[2]['due_date'] - time.iloc[10]['due_date']
time_spent will be a pandas timedelta object that you can then manipulate as necessary.
See https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html and https://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.html.

Extract Date from excel and append it in a list using python

I have an column in excel which has dates in the format ''17-12-2015 19:35". How can I extract the first 2 digits as integers and append it to a list? In this case I need to extract 17 and append it to a list. Can it be done using pandas also?
Code thus far:
import pandas as pd
Location = r'F:\Analytics Materials\files\paymenttransactions.csv'
df = pd.read_csv(Location)
time = df['Creation Date'].tolist()
print (time)

You could extract the day of each timestamp like
from datetime import datetime
import pandas as pd
location = r'F:\Analytics Materials\files\paymenttransactions.csv'
df = pd.read_csv(location)
timestamps = df['Creation Date'].tolist()
dates = [datetime.strptime(timestamp, '%d-%m-%Y %H:%M') for timestamp in timestamps]
days = [date.strftime('%d') for date in dates]
print(days)
The '%d-%m-%Y %H:%M'and '%d' bits are format specififers, that describe how your timestamp is formatted. See e.g. here for a complete list of directives.
datetime.strptime parses a string into a datetimeobject using such a specifier. dateswill thus hold a list of datetime instances instead of strings.
datetime.strftime does the opposite: It turns a datetime object into string, again using a format specifier. %d simply instructs strftime to only output the day of a date.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

"How to calculate difference in succesive time values in Python" - python

Just because your time format must include colons and point like this "%H:%M:%S.%f"

Related

Sorting datetime; pandas

Am i doing something wrong with the loops?

Convert a date to a different format for an entire new column

How to play around with JSON date format?

Extract Date from excel and append it in a list using python

Categories

Resources