I am trying to add more than two timestamp values and I expect the output in minutes/seconds. How can I add two timestamps? I basically want to do: '1995-07-01 00:00:01' + '1995-07-01 00:05:06' and check whether the total time is >= 60 minutes.
I tried this code: df['timestamp'][0] + df['timestamp'][1]. I referred to this post, but my timestamps come from a DataFrame.
Head of my dataframe column looks like this:
0 1995-07-01 00:00:01
1 1995-07-01 00:00:06
2 1995-07-01 00:00:09
3 1995-07-01 00:00:09
4 1995-07-01 00:00:09
Name: timestamp, dtype: datetime64[ns]
I am getting this error:
TypeError: unsupported operand type(s) for +: 'Timestamp' and 'Timestamp'
The problem is that adding Timestamps makes no sense: what if they were on different days? What you want is the sum of Timedeltas. You can create Timedeltas by subtracting a common date from the whole series; let's subtract the minimum date, then sum the Timedeltas. Let s be your series of Timestamps:
s.sub(s.dt.date.min()).sum().total_seconds()
34.0
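A self-contained sketch of the same idea (substituting s.dt.normalize().min() for the date subtraction, which is equivalent here since normalize() zeroes the time of day):

```python
import pandas as pd

s = pd.Series(pd.to_datetime([
    "1995-07-01 00:00:01", "1995-07-01 00:00:06", "1995-07-01 00:00:09",
    "1995-07-01 00:00:09", "1995-07-01 00:00:09"]))

# subtracting midnight of the earliest day turns every Timestamp into a Timedelta
total = (s - s.dt.normalize().min()).sum().total_seconds()
print(total)             # 34.0
print(total >= 60 * 60)  # False: well under 60 minutes
```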
# Adding two Timestamps is not supported, and not logical anyway.
# You probably want to add the times of day rather than the timestamps themselves.
# This extracts the time from each timestamp and sums the parts as timedeltas:
import datetime
import pandas as pd

t = ['1995-07-01 00:00:01', '1995-07-01 00:00:06', '1995-07-01 00:00:09',
     '1995-07-01 00:00:09', '1995-07-01 00:00:09']
df = pd.DataFrame(t, columns=['timestamp'])

tSum = datetime.timedelta()
for ts in df['timestamp']:
    dt = datetime.datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").time()
    tSum += datetime.timedelta(hours=dt.hour, minutes=dt.minute, seconds=dt.second)

if tSum.total_seconds() >= 60 * 60:  # total_seconds() is safe even past one day
    print("more than 1 hour")
else:
    print("less than 1 hour")
Hello,
I am trying to extract the date and time column from my Excel data. I get the column as a DataFrame with float values, and after using pandas.to_datetime I get a different date than the actual date in Excel. For example, in Excel the starting date is 01.01.1901 00:00:00, but in Python I get 1971-01-03 00:00:00.000000.
How can I solve this problem?
I need the final output in total seconds in a DataFrame, with the first cell starting at 0 seconds and each following cell stepping by the elapsed seconds (the time difference between cells is 15 min).
Thank you.
Your input is fractional days, so there's actually no need to convert to datetime if you want the duration in seconds relative to the first entry. Subtract that from the rest of the column and multiply by the number of seconds in a day:
import pandas as pd
df = pd.DataFrame({"Datum/Zeit": [367.0, 367.010417, 367.020833]})
df["totalseconds"] = (df["Datum/Zeit"] - df["Datum/Zeit"].iloc[0]) * 86400
df["totalseconds"]
0 0.0000
1 900.0288
2 1799.9712
Name: totalseconds, dtype: float64
If you have to use datetime, you'll need to convert to timedelta (a duration) to do the same, e.g.:
df["datetime"] = pd.to_datetime(df["Datum/Zeit"], unit="d")
# df["datetime"]
# 0 1971-01-03 00:00:00.000000
# 1 1971-01-03 00:15:00.028800
# 2 1971-01-03 00:29:59.971200
# Name: datetime, dtype: datetime64[ns]
# subtraction of datetime from datetime gives timedelta, which has total_seconds:
df["totalseconds"] = (df["datetime"] - df["datetime"].iloc[0]).dt.total_seconds()
# df["totalseconds"]
# 0 0.0000
# 1 900.0288
# 2 1799.9712
# Name: totalseconds, dtype: float64
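Regarding the mismatch with Excel's displayed date: Excel's 1900 date system counts days from an epoch of 1899-12-30, while pandas defaults to the Unix epoch. Passing the Excel epoch as origin should line the dates up (a sketch, assuming your values are 1900-system serial dates):

```python
import pandas as pd

df = pd.DataFrame({"Datum/Zeit": [367.0, 367.010417, 367.020833]})
# Excel's 1900 date system counts days from 1899-12-30, so use that
# as the origin instead of the default Unix epoch (1970-01-01)
df["datetime"] = pd.to_datetime(df["Datum/Zeit"], unit="d", origin="1899-12-30")
print(df["datetime"].iloc[0])  # 1901-01-01 00:00:00
```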
I have a data frame with a lot of columns and rows; the index column contains datetime objects.
date_time column1 column2
10-10-2010 00:00:00 1 10
10-10-2010 00:00:03 1 10
10-10-2010 00:00:06 1 10
Now I want to calculate the difference in time between the first and last datetime object. Therefore:
start = df["date_time"].head(1)
stop = df["date_time"].tail(1)
However, I now want to extract the datetime value itself so that I can use .total_seconds() to calculate the number of seconds between the two datetime objects, something like:
delta_t_seconds = (start - stop).total_seconds()
This however doesn't give the desired result, since start and stop are still Series with only one element each.
Please help.
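A minimal sketch of one way to get scalars out of the Series (my example data, parsed with an explicit format): .iloc[0] and .iloc[-1] return Timestamp scalars rather than one-element Series, so the difference is a single Timedelta with .total_seconds():

```python
import pandas as pd

df = pd.DataFrame({"date_time": pd.to_datetime(
    ["10-10-2010 00:00:00", "10-10-2010 00:00:03", "10-10-2010 00:00:06"],
    format="%d-%m-%Y %H:%M:%S")})

# .iloc[0] / .iloc[-1] return scalar Timestamps rather than 1-element Series,
# so the subtraction yields a Timedelta with a .total_seconds() method
start = df["date_time"].iloc[0]
stop = df["date_time"].iloc[-1]
delta_t_seconds = (stop - start).total_seconds()
print(delta_t_seconds)  # 6.0
```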
My dataframe has a column which measures time differences in the format HH:MM:SS.000.
The DataFrame is created from an Excel file; the column which stores the time difference has dtype object. Some entries have a negative time difference. The negative sign doesn't matter to me and needs to be removed, since it breaks a condition I'm filtering on.
Note: I only have the negative time difference there because of the issue I'm currently having.
I've tried several functions but I get errors, because some of the time-difference data is just 00:00:00, some is 00:00:02.65, and some is 00:00:02.111.
Firstly, how would I ensure that all data in this column has the form 00:00:00.000? And then how would I remove the '-' from some of the data?
Here's a sample of the time-diff column. I can't convert this column to datetime, as some of the entries don't have 3 digits after the decimal. Is there a way to iterate through the column and pad with a 0 if the value isn't 12 characters long?
00:00:02.97
00:00:03:145
00:00:00
00:00:12:56
28 days 03:05:23.439
It looks like you need to clean your input before you can parse to timedelta, e.g. with the following function:
import pandas as pd
def clean_td_string(s):
    if s.count(':') > 2:  # e.g. '00:00:03:145' -> '00:00:03.145'
        return '.'.join(s.rsplit(':', 1))
    return s
Applied to a df's column, this looks like
df = pd.DataFrame({'Time Diff': ['00:00:02.97', '00:00:03:145', '00:00:00', '00:00:12:56', '28 days 03:05:23.439']})
df['Time Diff'] = pd.to_timedelta(df['Time Diff'].apply(clean_td_string))
# df['Time Diff']
# 0 0 days 00:00:02.970000
# 1 0 days 00:00:03.145000
# 2 0 days 00:00:00
# 3 0 days 00:00:12.560000
# 4 28 days 03:05:23.439000
# Name: Time Diff, dtype: timedelta64[ns]
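The question also asked about removing the '-' sign, which the answer above doesn't cover; a small (assumed) extension of clean_td_string could strip it before parsing:

```python
import pandas as pd

def clean_td_string(s):
    s = s.lstrip('-')      # drop a leading minus sign; keep the magnitude only
    if s.count(':') > 2:   # e.g. '00:00:03:145' -> '00:00:03.145'
        return '.'.join(s.rsplit(':', 1))
    return s

td = pd.to_timedelta(
    pd.Series(['-00:00:02.97', '00:00:03:145']).apply(clean_td_string))
print(td.iloc[0])  # 0 days 00:00:02.970000
```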
I have a column of type object containing 500 rows of dates. I converted the column to a date type and am trying to get a count of the incorrect values, in order to fix them.
Here's a sample of the column; you can see examples of the wrong values in rows 3 and 5:
0 2018-06-14
1 2018-11-12
2 2018-10-09
3 2018-24-08
4 2018-11-12
5 11-02-2018
6 2018-12-31
I can fix the dates if I use this code:
dirtyData['date'] = pd.to_datetime(dirtyData['date'],dayfirst=True)
But I would like to check that the format in every row is %Y-%m-%d' and get the count of the inconsistent formats first. Then change the values.
Is it possible to achieve this?
The below code will work. However, as Michael Gardner mentioned, it won't distinguish between days and months if the day is 12 or less.
import datetime
import pandas as pd
date_list = ["2018-06-14", "2018-11-12", "2018-10-09", "2018-24-08",
"2018-11-12", "11-02-2018", "2018-12-31"]
series1 = pd.Series(date_list)
print(series1)
#The above code is to replicate your date series
count = 0
for item in series1:
    try:
        # succeeds only if the date matches the Year-Month-Day format
        datetime.datetime.strptime(item, "%Y-%m-%d")
    except ValueError:
        # a ValueError means an inconsistent format, so count it
        count += 1
print(count)
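A vectorised alternative (my sketch, not part of the answer above): pd.to_datetime with errors="coerce" turns rows that don't match the format into NaT, which can then be counted:

```python
import pandas as pd

s = pd.Series(["2018-06-14", "2018-11-12", "2018-10-09", "2018-24-08",
               "2018-11-12", "11-02-2018", "2018-12-31"])
# rows that don't parse as %Y-%m-%d become NaT and are counted with isna()
bad = pd.to_datetime(s, format="%Y-%m-%d", errors="coerce").isna().sum()
print(bad)  # 2
```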
I'm trying to read a log and compute the duration of a certain workflow. So the dataframe containing the log looks something like this:
Timestamp Workflow Status
20:31:52 ABC Started
...
...
20:32:50 ABC Completed
In order to compute the duration, I am doing using the following code:
start_time = log_text[(log_text['Workflow']=='ABC') & (log_text['Status']=='Started')]['Timestamp']
compl_time = log_text[(log_text['Workflow']=='ABC') & (log_text['Status']=='Completed')]['Timestamp']
duration = compl_time - start_time
and the answer I get is:
1 NaT
72 NaT
Name: Timestamp, dtype: timedelta64[ns]
I think since the index is different, the time difference is not being calculated correctly. Of course, I could get the correct answer by using the index of each row explicitly by:
duration = compl_time.loc[72] - start_time[1]
But this seems to be an inelegant way of doing things. Is there a better way to accomplish the same?
You are right: the problem is the different indexes, so the outputs cannot be aligned and you get NaNs.
The simplest fix is to convert one side to a NumPy array with .values, but both Series need the same length (here both have length == 1). For selecting with boolean indexing, it is better to use loc:
mask = log_text['Workflow']=='ABC'
start_time = log_text.loc[mask & (log_text['Status']=='Started'), 'Timestamp']
compl_time = log_text.loc[mask & (log_text['Status']=='Completed'),'Timestamp']
print (len(start_time))
1
print (len(compl_time))
1
duration = compl_time - start_time.values
print (duration)
1 00:00:58
Name: Timestamp, dtype: timedelta64[ns]
duration = compl_time.values - start_time.values
print (pd.to_timedelta(duration))
TimedeltaIndex(['00:00:58'], dtype='timedelta64[ns]', freq=None)
print (pd.Series(pd.to_timedelta(duration)))
0 00:00:58
dtype: timedelta64[ns]
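An alternative sketch (not in the answer above): extract scalars with .iloc[0], so no index alignment happens at all. Here the Timestamp column is modelled with to_timedelta for simplicity:

```python
import pandas as pd

# toy version of the log; clock times modelled as timedeltas for simplicity
log_text = pd.DataFrame({
    "Timestamp": pd.to_timedelta(["20:31:52", "20:32:50"]),
    "Workflow": ["ABC", "ABC"],
    "Status": ["Started", "Completed"]}, index=[1, 72])

mask = log_text["Workflow"] == "ABC"
start_time = log_text.loc[mask & (log_text["Status"] == "Started"), "Timestamp"]
compl_time = log_text.loc[mask & (log_text["Status"] == "Completed"), "Timestamp"]

# .iloc[0] pulls out scalar values, so no index alignment takes place
duration = compl_time.iloc[0] - start_time.iloc[0]
print(duration)  # 0 days 00:00:58
```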