Convert fractional seconds elapsed since onset to a datetime index - python

For my football data analysis, I need to use the pandas between_time function, which requires converting a list of strings representing fractional seconds since measurement onset into a pandas DatetimeIndex. To achieve this I tried the following:
df['Time'] = df['Timestamp'] * (1 / freq)
df.index = pd.to_datetime(df['Time'], unit='s')
Here freq = 600 and Timestamp is the frame number, counting up from 0.
I was expecting the new index to show the following format:
%y%m%d-%H%M%S%f
But unfortunately, to_datetime doesn't seem to know how to handle my kind of time data (values counting up to 4750 s after the start).
My question is, therefore: how do I convert my time sample data into a DatetimeIndex?

Based on this topic, I have now created the following function:
from datetime import datetime

def timeDelta2DateTime(time_delta_list):
    '''Convert a list containing the time since measurement onset [seconds] into a
    list of datetime objects counting up from 00:00:00.

    Args:
        time_delta_list (list): Times since the measurement started, in seconds.

    Returns:
        list: The times as datetime objects.
    '''
    ### Use divmod to split the seconds into h, m, s and the fractional remainder ###
    s, fs = zip(*[divmod(item, 1) for item in time_delta_list])
    m, s = zip(*[divmod(item, 60) for item in s])
    h, m = zip(*[divmod(item, 60) for item in m])
    ### Create the datetime component list ###
    us = [item * 1000000 for item in fs]  # datetime takes microseconds, not milliseconds
    time_list_int = zip(map(int, h), map(int, m), map(int, s), map(int, us))
    ### Return the datetime object list ###
    return [datetime(2018, 1, 1, item[0], item[1], item[2], item[3]) for item in time_list_int]
As it seems to be very slow, feel free to suggest a better option.
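For reference, a fully vectorized sketch of the same conversion, assuming freq = 600 and a Timestamp column of frame numbers as in the question (the origin date is arbitrary and only there to give between_time something to work with):

import pandas as pd

freq = 600  # sampling frequency from the question
df = pd.DataFrame({'Timestamp': range(10 * freq)})  # hypothetical frame counter
# interpret frame / freq as a seconds offset from an arbitrary origin date
df.index = pd.to_datetime(df['Timestamp'] / freq, unit='s', origin=pd.Timestamp('2018-01-01'))
# the resulting DatetimeIndex works with between_time
subset = df.between_time('00:00:02', '00:00:05')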


How to transform a for loop to a lambda function

I have written this function:
from datetime import timedelta
import pandas as pd

def time_to_unix(df, dateToday):
    '''This function creates the timestamp column for the dataframe. It takes today's date
    (e.g. 2022-8-8 0:0:0) and adds the seconds that were originally in the timestamp column.

    input: dataframe, dateToday (type: pandas.core.series.Series)
    output: list of times
    '''
    dateTime = dateToday[0]
    times = []
    for i in range(0, len(df['timestamp'])):
        dateAndTime = dateTime + timedelta(seconds=float(df['timestamp'][i]))
        unix = pd.to_datetime([dateAndTime]).astype('int64') / 10**9
        times.append(unix[0])
    return times
So it takes a dataframe, gets today's date, takes the value of the timestamp column (which is in seconds, like 10, 20, ...), applies the function, and returns the times in Unix time.
However, because I have roughly 2 million rows in my dataframe, this code takes a long time to run.
How can I use a lambda function, or something else, to speed up my code?
Something along the lines of:
df['unix'] = df.apply(lambda row: something_here, axis=1)
What I think you'll find is that most of the time is spent creating and manipulating the datetime / timestamp objects in the dataframe (see here for more info). I also try to avoid using lambdas like this on large dataframes, since they go row by row, which should be avoided. What I've done when dealing with datetimes / timestamps / timezone changes in the past is to build a dictionary of the possible datetime combinations and then use map to apply them. Something like this:
import datetime as dt
import pandas as pd

# Make a time key column out of your date and timestamp fields
df['time_key'] = df['date'].astype(str) + '#' + df['timestamp'].astype(str)

# Build a dictionary from the unique time keys in the dataframe
time_dict = dict()
for time_key in df['time_key'].unique():
    time_split = time_key.split('#')
    # Create the Unix timestamp based on the values in the key; store it in the
    # dictionary so it can be mapped later
    time_dict[time_key] = (pd.to_datetime(time_split[0])
                           + dt.timedelta(seconds=float(time_split[1]))).timestamp()

# Now map the time_key to the unix column in the dataframe from the dictionary
df['unix'] = df['time_key'].map(time_dict)
Note that if all the datetime combinations in the dataframe are unique, this likely won't help.
I'm not exactly sure what type dateTime[0] has. But you could try a more vectorized approach:
import pandas as pd

df["unix"] = (
    (pd.Timestamp(dateTime[0]) + pd.to_timedelta(df["timestamp"], unit="seconds"))
    .astype("int64").div(10**9)
)
or
df["unix"] = (
(dateTime[0] + pd.to_timedelta(df["timestamp"], unit="seconds"))
.astype("int").div(10**9)
)

How to incorporate a time string into a classifier

One of the columns in my dataset looks like the following:
I'm wondering what the best practice is for incorporating this information into classifier models.
The project is done in Python in a Jupyter notebook.
You can translate the string to seconds and use it as one of the features:
df = pd.DataFrame(['25m 34s', '1m', '22s'], columns=['game_length'])
df['game_length_seconds'] = pd.to_timedelta(df['game_length']).apply(lambda x: x.seconds)
>>> df
  game_length  game_length_seconds
0     25m 34s                 1534
1          1m                   60
2         22s                   22
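For what it's worth, the lambda can be avoided entirely, since the .dt accessor exposes the same value in vectorized form; a small sketch using the same df as above:

# vectorized equivalent of .apply(lambda x: x.seconds)
df['game_length_seconds'] = pd.to_timedelta(df['game_length']).dt.seconds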
You can also write a function and map it over that column:
from datetime import datetime

def duration_from_epoc(time: str):
    # convert the string to a datetime; in this case '25m 28s' results in 1900-01-01 00:25:28
    date_time = datetime.strptime(time, "%Mm %Ss")
    # calculate the duration from the epoch / start datetime; you can CHANGE your starting time here
    date_time_from = datetime(1970, 1, 1)
    # return the delta's total seconds
    return (date_time - date_time_from).total_seconds()
Then you can call:
df["game_length"].map(duration_from_epoc)
For example, the result of '25m 28s' (str) would be -2208987272.0 (float).
Overall, the solution calculates the seconds from a fixed reference datetime to your datetime (parsed from the string form of the duration). Be aware that "25m 28s" converts to the datetime object 1900-01-01 00:25:28.
Finally, instead of saving the duration, consider saving the start and end time of each game play, so that you can always calculate the duration on the fly.
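A minimal sketch of that last suggestion, assuming hypothetical start and end columns holding parseable timestamps:

import pandas as pd

# hypothetical columns; replace with your real start/end fields
df = pd.DataFrame({'start': ['2022-08-08 12:00:00'], 'end': ['2022-08-08 12:25:28']})
# the duration is computed on the fly from the stored endpoints
df['duration_s'] = (pd.to_datetime(df['end']) - pd.to_datetime(df['start'])).dt.total_seconds()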

Python SumIfs for list of list dates

I have a list of lists composed of dates in Excel float format (every minute since July 5, 1996) and an integer value associated with each date, like this: [[datetime, integer], ...]. I need to create a new list composed of all of the dates (no hours or minutes) and the sum of the values for all of the datetimes within that date. In other words: what is the sum of the values for each date where listolists[x][0] >= math.floor(listolists[x][0]) and listolists[x][0] < math.floor(listolists[x][0]) + 1? Thanks
Since you didn't provide any actual data (just the data structure you used, nested lists), I created some dummy data below to demonstrate how you might do a SUMIFS-type of problem in Python.
from datetime import datetime
import numpy as np
import pandas as pd

dates_list = []
# just take one month as an example of how to group by day
year = 2015
month = 12
# generate data similar to what you might have
for day in range(1, 32):
    for hour in range(1, 24):
        for minute in range(1, 60):
            dates_list.append([datetime(year, month, day, hour, minute), np.random.randint(20)])

# unpack these nested list pairs so we have all of the dates in one list,
# and all of the values in the other; this makes it easier for pandas later
dates, values = zip(*dates_list)

# to eventually group by day, we need to forget about all intra-day data, e.g.
# different hours and minutes. We only care about the data for a given day,
# not the by-minute observations. So, let's set all of the intra-day values to
# some constant for easier rolling-up of these dates.
new_dates = []
for d in dates:
    new_d = d.replace(hour=0, minute=0)
    new_dates.append(new_d)

# throw the new dates and values into a pandas.DataFrame object
df = pd.DataFrame({'new_dates': new_dates, 'values': values})

# here's the SUMIFS function you're looking for
grouped = df.groupby('new_dates')['values'].sum()
Let's see the results:
>>> print(grouped.head())
new_dates
2015-12-01    12762
2015-12-02    13292
2015-12-03    12857
2015-12-04    12762
2015-12-05    12561
Name: values, dtype: int64
Edit: If you want these new grouped data back in the nested list format, just do this:
new_list = [[date, value] for date, value in zip(grouped.index, grouped)]
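As an aside, once the dates are set as the index, the same daily roll-up can also be expressed with resample; a sketch under the same dummy-data assumptions:

# equivalent daily sum via resample on a DatetimeIndex
daily = df.set_index('new_dates')['values'].resample('D').sum()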
Thanks everyone. This is the simplest code I could come up with that doesn't require pandas:
import math

for row in listolist:
    for k in (0, 1):
        row[k] = math.floor(float(row[k]))

date = {}
for d, v in listolist:
    if d in date:
        date[d].append(v)
    else:
        date[d] = [v]

result = [(d, sum(v)) for d, v in date.items()]
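The same idea reads a bit tighter with a defaultdict; a sketch assuming listolist holds [excel_float_date, value] pairs as above:

from collections import defaultdict
import math

totals = defaultdict(int)
for d, v in listolist:
    totals[math.floor(float(d))] += v  # accumulate the values per whole-day date
result = sorted(totals.items())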

Express time elapsed between elements of an array of dates in hours, minutes, seconds using python

I have a list of file creation times obtained using os.path.getmtime
time_creation_sorted
Out[45]:
array([1.47133334e+09, 1.47133437e+09, 1.47133494e+09,
       1.47133520e+09, 1.47133577e+09, 1.47133615e+09,
       1.47133617e+09, 1.47133625e+09, 1.47133647e+09])
I know how to convert those elements in hour minute seconds.
datetime.fromtimestamp(time_creation_sorted[1]).strftime('%H:%M:%S')
Out[62]: '09:59:26'
What I would like to do is create another table containing the time elapsed since the first element, expressed as hour:min:sec, so that it would look like:
array(['00:00:00','00:16:36',...])
But I have not managed to find out how to do that. Naively taking the difference between the elements of time_creation_sorted and trying to convert it to hour:min:sec does not give something logical:
datetime.fromtimestamp(time_creation_sorted[1]-time_creation_sorted[0]).strftime('%H:%M:%S')
Out[67]: '01:17:02'
Any idea or link on how to do that?
Thanks,
Grégory
You need to rearrange some parts of your code in order to get the desired output. fromtimestamp treats its argument as an absolute point in time (seconds since the epoch, in local time), which is why feeding it a difference gives an illogical result.
First you should convert the timestamps to datetime objects, whose differences come out as so-called timedelta objects. The str() representation of those timedelta objects is exactly what you want:
from datetime import datetime

tstamps = [1.47133334e+09, 1.47133437e+09, 1.47133494e+09, 1.47133520e+09, 1.47133577e+09,
           1.47133615e+09, 1.47133617e+09, 1.47133625e+09, 1.47133647e+09]
tstamps = [datetime.fromtimestamp(stamp) for stamp in tstamps]
tstamps_relative = [str(t - tstamps[0]) for t in tstamps]
print(tstamps_relative)
giving:
['0:00:00', '0:17:10', '0:26:40', '0:31:00', '0:40:30', '0:46:50', '0:47:10', '0:48:30', '0:52:10']
Check out timedelta objects; they represent the difference between two dates or times:
https://docs.python.org/2/library/datetime.html#timedelta-objects
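A minimal illustration; 1030 seconds is the first gap in the data above:

from datetime import timedelta

delta = timedelta(seconds=1030)
print(str(delta))  # prints 0:17:10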

Converting a list of datetime objects to a list of number of days since a certain date

I have a large list of dates that are datetime objects, for example
[datetime.datetime(2016,8,14), datetime.datetime(2016,8,13), datetime.datetime(2016,8,12), ...etc.]
Instead of datetime objects, what I want is a list of integer values counting the days since the date 1/1/1900. I have defined 1/1/1900 as the base date, and in the for loop below I calculate the days between each date in the list and that base date:
from datetime import datetime

baseDate = datetime(1900, 1, 1)
numericalDates = []
for i in enumerate(dates):
    a = i[1] - baseDate
    numericalDates.append(a)
print(numericalDates)
However, when I print this out, I get datetime.timedelta objects instead:
[datetime.timedelta(42594), datetime.timedelta(42593), datetime.timedelta(42592), ...etc.]
Any ideas on how I can convert it the proper way?
timedelta objects have a days attribute, so you can simply append that as an int:
numericalDates.append(a.days)
This will result in numericalDates being [42594, 42593, 42592].
Note that you can also simplify your code a bit by using a list comprehension:
numericalDates = [(d - baseDate).days for d in dates]
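Putting it together as a quick self-contained check (the output matches the values above):

from datetime import datetime

dates = [datetime(2016, 8, 14), datetime(2016, 8, 13), datetime(2016, 8, 12)]
baseDate = datetime(1900, 1, 1)
numericalDates = [(d - baseDate).days for d in dates]
print(numericalDates)  # [42594, 42593, 42592]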
