Timestamp string to seconds in Dataframe - python

I have a large dataframe containing a Timestamp column like the one shown below:
Timestamp
16T122109960
16T122109965
16T122109970
16T122109975
[73853 rows x 1 columns]
I need to convert this into seconds elapsed since the first timestamp in the column (formatted like 12.523), using something like this:
start_time = log_file['Timestamp'][0]
log_file['Timestamp'] = log_file.Timestamp.apply(lambda x: x - start_time)
But first I need to parse the timestamps into seconds as quickly as possible. I've tried using a regex to split each timestamp into hours, minutes, seconds, and milliseconds and then multiplying and dividing appropriately, but I got a memory error. Is there a function within datetime or dateutil that would help?
The method I am using at the moment is below:
import re

def regex_time(time):
    # Split into date, 'T', hours, minutes, seconds and milliseconds
    parts = re.split(r"(\d*)(T)(\d{2})(\d{2})(\d{2})(\d{3})", time)
    date, delim, hours, minutes, seconds, mills = parts[1:-1]
    seconds = int(seconds)
    seconds += int(mills) / 1000
    seconds += int(minutes) * 60
    seconds += int(hours) * 3600
    return seconds

df['Timestamp'] = df.Timestamp.apply(regex_time)

You could try converting the timestamp to datetime format and then extracting the seconds in the format you want.
Here is a code sample showing how it works:
from datetime import datetime
timestamp = 1545730073
dt_object = datetime.fromtimestamp(timestamp)
seconds = dt_object.strftime("%S.%f")
print(seconds)
Output:
53.000000
You can also apply it to the dataframe you are using, for instance:
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'timestamp': [1545730073]})
df['datetime'] = df['timestamp'].apply(datetime.fromtimestamp)
df['seconds'] = df['datetime'].apply(lambda x: x.strftime("%S.%f"))
This returns a DataFrame containing:
    timestamp            datetime    seconds
0  1545730073 2018-12-25 10:27:53  53.000000
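For a whole column, a vectorized conversion may be faster than calling datetime.fromtimestamp row by row. The following is a minimal sketch, assuming the column holds Unix epoch seconds as in the sample above. Note that pd.to_datetime with unit='s' interprets the values as UTC, so the hour can differ from the local-time output shown above.

```python
import pandas as pd

df = pd.DataFrame({'timestamp': [1545730073]})

# Vectorized: interpret the integers as seconds since the Unix epoch (UTC)
df['datetime'] = pd.to_datetime(df['timestamp'], unit='s')

# Seconds field of each timestamp, including the fractional part
df['seconds'] = df['datetime'].dt.second + df['datetime'].dt.microsecond / 1e6
print(df)
```

Like strftime("%S.%f"), this only gives the seconds field within the minute, not an elapsed time.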

You could parse the string with a strptime-style format, subtract start_time as a pd.Timestamp, and use total_seconds() on the resulting timedelta:
import pandas as pd
df = pd.DataFrame({'Timestamp': ['16T122109960','16T122109965','16T122109970','16T122109975']})
start_time = pd.Timestamp('1900-01-01')
df['totalseconds'] = (pd.to_datetime(df['Timestamp'], format='%dT%H%M%S%f')-start_time).dt.total_seconds()
df['totalseconds']
# 0 1340469.960
# 1 1340469.965
# 2 1340469.970
# 3 1340469.975
# Name: totalseconds, dtype: float64
To use the first entry of the 'Timestamp' column as the reference time start_time, use:
start_time = pd.to_datetime(df['Timestamp'].iloc[0], format='%dT%H%M%S%f')
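Putting the two pieces together, the elapsed time since the first row could be computed as sketched below. The column name elapsed is only illustrative, and the sketch assumes all timestamps fall within the same month, since the strings carry only a day-of-month.

```python
import pandas as pd

df = pd.DataFrame({'Timestamp': ['16T122109960', '16T122109965',
                                 '16T122109970', '16T122109975']})

# Parse day-of-month, hours, minutes, seconds and fractional seconds in one vectorized call
parsed = pd.to_datetime(df['Timestamp'], format='%dT%H%M%S%f')

# Elapsed seconds relative to the first row, rounded to milliseconds
df['elapsed'] = (parsed - parsed.iloc[0]).dt.total_seconds().round(3)
print(df)
```

Because the parsing and the subtraction operate on whole columns, no per-row apply or regex is needed.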

Related

How to find the number of seconds elapsed from the start of the day in pandas dataframe

I have a pandas dataframe df with a column named time_column, which consists of timestamp objects. I want to calculate the number of seconds elapsed since the start of the day, i.e. since 00:00:00, for each timestamp. How can that be done?
You can use pandas.Series.dt.total_seconds:
df['time_column'] = pd.to_datetime(df['time_column'])
df['second'] = pd.to_timedelta(df['time_column'].dt.time.astype(str)).dt.total_seconds()
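An alternative that avoids the round trip through strings is to subtract each timestamp's own midnight, which Series.dt.normalize() provides. A minimal sketch, with made-up sample values:

```python
import pandas as pd

df = pd.DataFrame({'time_column': pd.to_datetime(['2021-01-19 00:00:10',
                                                  '2021-01-19 15:55:00'])})

# normalize() sets the time part to 00:00:00 while keeping the date
midnight = df['time_column'].dt.normalize()

# The difference is a timedelta column; total_seconds() turns it into floats
df['second'] = (df['time_column'] - midnight).dt.total_seconds()
print(df)
```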
Access the column with df['time_column']. Then loop over its values and compute the seconds since midnight for each one, for example:
time_elapsed = []
for ts in df['time_column']:
    # hours, minutes and seconds of the timestamp, expressed in seconds
    time_elapsed.append(ts.hour * 60 * 60 + ts.minute * 60 + ts.second)

How to convert timestamp to integer in Pandas dataframe. I tried to_numeric() function but it's not working

I want to convert a given time, let's say 09:25:59 (hh:mm:ss) into an integer value in pandas. My requirement is that I should know the number of minutes elapsed from midnight of that day. For example, 09:25:59 corresponds to 565 minutes. How can I do that in Python Pandas?
I think a sample dataframe will be useful here:
import pandas as pd
import datetime
data = {"time":["09:25:59", "09:35:59"],
"minutes_from_midnight": ""}
df = pd.DataFrame(data) # create df
# you don't seem to care about the date, so keep only the time from the datetime dtype
# create a datetime.datetime object to use for timedeltas
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")
df['time'] = [datetime.datetime.time(d) for d in df['time']] # keep only the time
df
You can then do:
# your comparison value
midnight= "00:00:00"
# https://stackoverflow.com/a/48967889/13834173
midnight = datetime.datetime.strptime(midnight, "%H:%M:%S").time()
# fill the empty column with the timedeltas
from datetime import datetime, date
df["minutes_from_midnight"] = df["time"].apply(
    lambda x: int((datetime.combine(date.today(), x) - datetime.combine(date.today(), midnight)).total_seconds() // 60)
)
df
# example with the 1st row time value
target_time = df["time"][0]
# using datetime.combine
delta= datetime.combine(date.today(), target_time) - datetime.combine(date.today(), midnight)
seconds = delta.total_seconds()
minutes = seconds//60
print(int(minutes)) # 565
from datetime import timedelta
hh, mm, ss = 9, 25, 59
delta = timedelta(hours=hh, minutes=mm, seconds=ss)
total_seconds = delta.total_seconds()
minutes = int(total_seconds // 60)
minutes now holds the minutes elapsed since midnight (565).
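The same count can also be computed for the whole column at once. A minimal sketch, assuming the column holds hh:mm:ss strings as in the question, which pd.to_timedelta can parse directly as durations:

```python
import pandas as pd

df = pd.DataFrame({"time": ["09:25:59", "09:35:59"]})

# Treat each hh:mm:ss string as a duration measured from midnight
elapsed = pd.to_timedelta(df["time"])

# Whole minutes elapsed, as integers
df["minutes_from_midnight"] = (elapsed.dt.total_seconds() // 60).astype(int)
print(df)
```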

Convert number of milliseconds from midnight from datetime object using pandas

I have a datetime index:
import pandas as pd
sample = ['2021-01-19 15:55:00-05:00',
'2021-01-19 15:56:00-05:00',
'2021-01-19 15:56:00-05:00']
sample = pd.to_datetime(sample)
I would like to create a new column that represents the time as the number of milliseconds since midnight.
There is a solution for seconds in "Get the time spent since midnight in dataframe", but I was not able to convert it to milliseconds.
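Try:
df = pd.DataFrame(sample, columns=['time'])
df['time'] = pd.to_datetime(df['time'])
# include the sub-second part so that fractional seconds are not dropped
df['milliseconds'] = df['time'].apply(lambda x: (x.hour * 3600 + x.minute * 60 + x.second) * 1000 + x.microsecond // 1000)
df
Output: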
                       time  milliseconds
0 2021-01-19 15:55:00-05:00      57300000
1 2021-01-19 15:56:00-05:00      57360000
2 2021-01-19 15:56:00-05:00      57360000
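A vectorized variant subtracts each timestamp's local midnight and divides the resulting timedelta by one millisecond. This is a minimal sketch with made-up sample values that include a fractional second. It assumes a fixed UTC offset as in the question; on a day with a daylight-saving change, the elapsed time and the wall-clock time since midnight could differ.

```python
import pandas as pd

sample = pd.to_datetime(['2021-01-19 15:55:00.000000-05:00',
                         '2021-01-19 15:56:00.250000-05:00'])
df = pd.DataFrame({'time': sample})

# normalize() gives midnight of the same day in the same time zone
since_midnight = df['time'] - df['time'].dt.normalize()

# Floor-dividing a timedelta column by a Timedelta yields integer counts
df['milliseconds'] = since_midnight // pd.Timedelta(milliseconds=1)
print(df)
```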

Dataframe - mean of string type column with time values

I have to calculate the mean() of a time column, but the column's type is string. How can I do it?
id time
1 1h:2m
2 1h:58m
3 35m
4 2h
...
You can use regex to extract hours and minutes. To calculate the mean time in minutes:
h = df['time'].str.extract(r'(\d{1,2})h').fillna(0).astype(int)
m = df['time'].str.extract(r'(\d{1,2})m').fillna(0).astype(int)
(h * 60 + m).mean()
Result:
0 83.75
dtype: float64
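The two extractions above can also be folded into a single pattern with named groups, and the mean converted back into a duration. This is a minimal sketch; the group names hours and minutes and the variable names are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'time': ['1h:2m', '1h:58m', '35m', '2h']})

# One pass: optional hours part, optional colon, optional minutes part
parts = df['time'].str.extract(r'(?:(?P<hours>\d+)h)?:?(?:(?P<minutes>\d+)m)?')

# Missing parts come back as NaN; treat them as zero before converting
parts = parts.fillna('0').astype(int)
minutes = parts['hours'] * 60 + parts['minutes']

# Express the mean number of minutes as a Timedelta
mean_duration = pd.Timedelta(minutes=float(minutes.mean()))
print(mean_duration)
```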
This is largely inspired by "How to construct a timedelta object from a simple string", but you can do as below:
import re
import datetime
import pandas as pd

def convertToSecond(time_str):
    # Optional hours, minutes and seconds parts, e.g. '1h:2m' or '35m'
    regex = re.compile(r'((?P<hours>\d+?)h)?:*((?P<minutes>\d+?)m)?:*((?P<seconds>\d+?)s)?')
    parts = regex.match(time_str)
    if not parts:
        return
    parts = parts.groupdict()
    time_params = {}
    for (name, param) in parts.items():
        if param:
            time_params[name] = int(param)
    return datetime.timedelta(**time_params).total_seconds()

df = pd.DataFrame({
    'time': ['1h:2m', '1h:58m', '35m', '2h']})
df['inSecond'] = df['time'].apply(convertToSecond)
mean_inSecond = df['inSecond'].mean()
print(f"Mean of Time Column: {datetime.timedelta(seconds=mean_inSecond)}")
Result:
Mean of Time Column: 1:23:45
Another possibility is to convert your string column into timedelta (since the values seem to be durations rather than times of day).
Since your strings are not all formatted equally, you unfortunately cannot use pandas' to_timedelta function. However, parser from dateutil has a fuzzy option that you can use to convert your column to datetime. If you subtract midnight today from that, you get the value as a timedelta.
import pandas as pd
from dateutil import parser
from datetime import date
from datetime import datetime
df = pd.DataFrame([[1,'1h:2m'],[2,'1h:58m'],[3,'35m'],[4,'2h']],columns=['id','time'])
today = date.today()
midnight = datetime.combine(today, datetime.min.time())
df['time'] = df['time'].apply(lambda x: (parser.parse(x, fuzzy=True)) - midnight)
This will convert your dataframe like this (print(df)):
id time
0 1 01:02:00
1 2 01:58:00
2 3 00:35:00
3 4 02:00:00
from which you can calculate the mean using print(df['time'].mean()):
0 days 01:23:45
Full example: https://ideone.com/Aze9mR

pandas get integer seconds when doing a time difference instead of a x days mm:ss:hh format

I have two date-time columns, A and B, in the format shown below. Both columns are in a pandas DataFrame.
yyyy-mm-dd hh:mm:ss
I create
df['difference'] = df['A'] - df['B']
and get values in a format like
0 days 00:01:13
I would prefer a column containing the difference in seconds as an integer. For instance, I need 73 in the example above instead of 1 min 13 s.
How can I do that?
We can use total_seconds
(df['A'] - df['B']).dt.total_seconds()
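Note that total_seconds() returns floats, so an integer column needs an explicit cast. A minimal sketch with made-up values matching the 1 min 13 s example from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'A': pd.to_datetime(['2021-01-19 15:56:13']),
    'B': pd.to_datetime(['2021-01-19 15:55:00']),
})

# Timedelta column -> float seconds -> integer seconds
df['difference'] = (df['A'] - df['B']).dt.total_seconds().astype(int)
print(df)
```

The cast truncates any fractional seconds; it would fail if the column contained missing values, which would need handling first.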
from datetime import datetime, timedelta
# Specified date
date1 = datetime.strptime('2015-01-01 01:00:00', '%Y-%m-%d %H:%M:%S')
old_time = date1
print(old_time)
new_time = old_time - timedelta(hours=2, minutes=10)
print(new_time)
