One of the columns in my dataset looks like the following:
I'm wondering what the best practice is to incorporate this information into the classifier models.
Please help me with it. The project is done in Python in a Jupyter Notebook.
You can convert the string to seconds and use it as one of the features:
import pandas as pd
df = pd.DataFrame(['25m 34s', '1m', '22s'], columns=['game_length'])
# total_seconds() covers the whole duration, while .seconds would drop any whole days
df['game_length_seconds'] = pd.to_timedelta(df['game_length']).dt.total_seconds().astype(int)
>>> df
  game_length  game_length_seconds
0     25m 34s                 1534
1          1m                   60
2         22s                   22
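A caveat when turning a timedelta into a number: the `.seconds` attribute holds only the seconds component (0 to 86399) and drops whole days, whereas `total_seconds()` covers the full duration. The sketch below uses the standard-library `timedelta` with a made-up duration (not taken from the question's data) to show the difference; pandas `Timedelta` objects expose the same attribute and method.

```python
from datetime import timedelta

# Hypothetical game length longer than one day (not from the question's data)
long_game = timedelta(days=1, minutes=25, seconds=34)

component_only = long_game.seconds          # seconds within the final partial day only
full_duration = long_game.total_seconds()   # the whole duration, as a float

print(component_only)  # 1534
print(full_duration)   # 87934.0
```

For game lengths well under 24 hours both give the same number, so this only matters if longer durations can appear in the column.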
You can write a function and map it over that column:
from datetime import datetime
def duration_from_epoch(time: str) -> float:
    # strptime fills in a default date, so "25m 28s" becomes 1900-01-01 00:25:28
    date_time = datetime.strptime(time, "%Mm %Ss")
    # reference datetime to measure from; it must match strptime's default date
    date_time_from = datetime(1900, 1, 1)
    # return the difference in total seconds
    return (date_time - date_time_from).total_seconds()
Then you can call:
df["game_length"].map(duration_from_epoch)
For example, the result of '25m 28s' (str) would be 1528.0 (float).
Overall, the solution parses the duration string into a datetime and measures the seconds elapsed since a reference datetime. Note that "25m 28s" is parsed as the datetime 1900-01-01 00:25:28, so the reference has to be 1900-01-01; subtracting datetime(1970, 1, 1) instead would give -2208987272.0. The format "%Mm %Ss" also requires both parts, so values such as '1m' or '22s' raise a ValueError.
Finally, rather than storing the duration, consider storing the start and end time of each game so that you can always compute the duration on the fly.
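If some values omit a unit (for example '1m' or '22s'), a small regular-expression parser is one way to handle every row without pandas or strptime. This is only a sketch: the function name `parse_game_length` and the supported units (h, m, s) are assumptions, since the question only shows minutes and seconds.

```python
import re

# Units assumed here: h, m, s (the question only shows m and s)
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)\s*([hms])")

def parse_game_length(text: str) -> int:
    """Return the duration in seconds for strings such as '25m 34s', '1m' or '22s'."""
    tokens = _TOKEN.findall(text)
    if not tokens:
        raise ValueError(f"unrecognised duration: {text!r}")
    # Sum each "<number><unit>" pair after scaling it to seconds
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in tokens)

for sample in ["25m 34s", "1m", "22s"]:
    print(sample, "->", parse_game_length(sample))
```

It can be applied to the column in the same way as any other function, e.g. `df["game_length"].map(parse_game_length)`.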
Related
I have experiment data with durations longer than 24 hours, for example [23:24:44, 25:10:44]. I would like to work with these test durations in Python, but I get a ValueError when I create a datetime.time() with an hour value greater than 23.
You could split your time by the colons in order to get a list of the component parts, which you could then use to initialise your timedelta:
from datetime import timedelta
myDuration = "25:43:12"
mD = [int(x) for x in myDuration.split(":")]
delta = timedelta(hours=mD[0], minutes=mD[1], seconds=mD[2])
print(delta)
# 1 day, 1:43:12
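The same idea can be wrapped in a small helper so that durations can be subtracted or compared directly. This is a sketch that assumes every value has exactly three colon-separated integer fields; `parse_duration` is a name chosen for illustration.

```python
from datetime import timedelta

def parse_duration(text: str) -> timedelta:
    """Parse 'H:MM:SS' where the hour field may exceed 23."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

# The two example values from the question
first = parse_duration("23:24:44")
second = parse_duration("25:10:44")
difference = second - first

print(difference)                  # 1:46:00
print(difference.total_seconds())  # 6360.0
```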
I have solar data in the format:
I want to convert the time index to hour angle using the pvlib package. So far my code reads in the input data from a .csv file using pandas and extracts the time data. I need to convert this data (in 30-minute intervals) into hour angles, but I keep getting the error:
TypeError: index is not a valid DatetimeIndex or PeriodIndex
Here is my code so far:
# Import modules
import pandas as pd
import pvlib
# Read in data from .csv file for time and DHI
headers = ["Time","DHI"]
data_file = pd.read_csv("path to csv file",names=headers)
time_data = data_file["Time"]
# Find equation of time for hour angle calc
equation_of_time = pvlib.solarposition.equation_of_time_spencer71(1)
# Find hour angle
hour_angle = pvlib.solarposition.hour_angle(time_data, -89.401230, equation_of_time)
As the error message states, the issue is that your index is not a DatetimeIndex. To calculate the hour angle we need to know the specific time, hence the need for a DatetimeIndex. Right now you are simply passing in a series of integers, which has no meaning to the function.
Let's first create a small example:
import pandas as pd
import pvlib
df = pd.DataFrame(data={'time': [0,570,720], 'DHI': [0,50,100]})
df.head()
   time  DHI
0     0    0
1   570   50
2   720  100
# Create a DatetimeIndex:
start_date = pd.Timestamp(2020,7,28).tz_localize('Europe/Copenhagen')
df.index = start_date + pd.to_timedelta(df['time'], unit='min')
Now the DataFrame looks like this:
                           time  DHI
time
2020-07-28 00:00:00+02:00     0    0
2020-07-28 09:30:00+02:00   570   50
2020-07-28 12:00:00+02:00   720  100
Now we can pass the index to the hour angle function as it represents unique time periods:
equation_of_time = pvlib.solarposition.equation_of_time_spencer71(df.index.dayofyear)
# Find hour angle
hour_angle = pvlib.solarposition.hour_angle(df.index, -89.401230,
                                            equation_of_time)
Note how the start date was localized to a specific time zone. This is necessary unless your data is in UTC, as otherwise the index does not represent unique points in time.
The answer by @Adam_Jensen is good, but not the simplest. If you look at the code for the hour_angle function, you will see that 2/3 of it is devoted to turning those timestamps back into integers. The rest is so easy you do not need pvlib.
# time_data and equation_of_time are defined in the question
LONGITUDE = -89.401230
LOCAL_MERIDIAN = -90
hour_angle = (LONGITUDE - LOCAL_MERIDIAN) + (time_data - 720 + equation_of_time) / 4
It's always good to understand what's going on under the hood!
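To make the arithmetic explicit, here is a minimal sketch of that formula as a plain function. It assumes the times are minutes since midnight in local standard time and that the equation of time is given in minutes; the Earth rotates 1 degree every 4 minutes, which is where the division by 4 comes from. The function name and the `equation_of_time=0.0` placeholder are illustrative only.

```python
def hour_angle_degrees(minutes_since_midnight, longitude, local_meridian, equation_of_time):
    """Hour angle in degrees from local standard time given in minutes.

    equation_of_time is in minutes; 4 minutes of time equal 1 degree of rotation.
    """
    longitude_correction = longitude - local_meridian
    return longitude_correction + (minutes_since_midnight - 720 + equation_of_time) / 4

# equation_of_time=0.0 is a placeholder, not a real value for any date
noon = hour_angle_degrees(720, -89.401230, -90, 0.0)
half_past_nine = hour_angle_degrees(570, -89.401230, -90, 0.0)

print(noon)            # small positive angle: the site is slightly east of the meridian
print(half_past_nine)  # negative angle: before solar noon
```

Because the function uses only arithmetic operators, it should work unchanged when `minutes_since_midnight` is a pandas Series or NumPy array.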
Say I have this time
00:46:19,870
where it represents 46m 19s and 870 is 870/1000 of a second (I think I can just get rid of the last part). How do I convert this to seconds?
I've tried
time.strptime('00:46:19,870'.split(',')[0],'%H:%M:%S')
but realized that it wouldn't work as it's using a format different than mine.
How can I convert 00:46:19,870 to 2779?
You are close. You can still use datetime; you just need to calculate the time delta. What you have isn't really a date but what appears to be a stopwatch time. You can still parse it with strptime, and you will notice that Python fills in a default year, month, and day. You can use that default to get the delta in seconds:
from datetime import datetime
DEFAULT_DATE = (1900, 1, 1)
stopwatch = datetime.strptime('00:46:19,870', '%H:%M:%S,%f')
a_timedelta = stopwatch - datetime(*DEFAULT_DATE)
seconds = a_timedelta.total_seconds()
print(seconds)
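An alternative that skips the default date altogether is to split the string and build a `timedelta` directly. This is a sketch that assumes the part after the comma is always a three-digit millisecond value; unlike `strptime` with `%H`, it also accepts hour values above 23. The name `stopwatch_to_seconds` is made up for illustration.

```python
from datetime import timedelta

def stopwatch_to_seconds(text: str) -> float:
    """Convert 'HH:MM:SS,mmm' to seconds (mmm is assumed to be milliseconds)."""
    clock, _, millis = text.partition(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    delta = timedelta(hours=hours, minutes=minutes, seconds=seconds,
                      milliseconds=int(millis or 0))
    return delta.total_seconds()

total = stopwatch_to_seconds("00:46:19,870")
print(total)       # 2779.87
print(int(total))  # 2779
```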
I am having an issue with converting the Epoch time format 1585542406929 into the 2020-09-14 Hours Minutes Seconds format.
I tried running this, but it gives me an error
from datetime import datetime
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'
datetime.utcfromtimestamp(df2.timestamp_ms).strftime('%Y-%m-%d %H:%M:%S')
error : cannot convert the series to <class 'int'>
What am I not understanding about this datetime function? Is there a better function that I should be using?
edit: should mention that timestamp_ms is my column from my dataframe called df.
Thanks to @chepner for helping me understand the format that this is in.
A quick solution is the following:
# make a new column with Unix time as @ForceBru mentioned
start_date = '1970-01-01'
df3['helper'] = pd.to_datetime(start_date)
# convert your column of JSON dates / numbers to days
df3['timestamp_ms'] = df3['timestamp_ms'].apply(lambda x: (((x/1000)/60)/60/24))
# add a day adder column
df3['time_added'] = pd.to_timedelta(df3['timestamp_ms'],'d')
# add the two columns together
df3['actual_time'] = df3['helper'] + df3['time_added']
Note that the resulting timestamps are in UTC, so you may have to shift them to your local time zone. For instance, I sent my message at 10:40 am Central Time (Midwest USA), but the timestamp showed 3:40 pm the same day.
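For a single value, the conversion can be sketched with the standard library alone: add the milliseconds to the Unix epoch and then shift to a local offset. The fixed UTC-5 offset below is an assumption standing in for US Central Daylight Time and ignores daylight-saving rules. For a whole column, `pd.to_datetime(df['timestamp_ms'], unit='ms')` should give the equivalent UTC result in one call.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumption: a fixed UTC-5 offset (US Central Daylight Time), ignoring DST rules
CENTRAL_DAYLIGHT = timezone(timedelta(hours=-5))

def from_epoch_ms(timestamp_ms: int) -> datetime:
    """Return a timezone-aware UTC datetime for a millisecond epoch timestamp."""
    return EPOCH + timedelta(milliseconds=timestamp_ms)

# The example value from the question
utc_time = from_epoch_ms(1585542406929)
local_time = utc_time.astimezone(CENTRAL_DAYLIGHT)

print(utc_time.strftime('%Y-%m-%d %H:%M:%S'))    # 2020-03-30 04:26:46
print(local_time.strftime('%Y-%m-%d %H:%M:%S'))  # 2020-03-29 23:26:46
```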
For my football data analysis, to use the pandas between_time function, I need to convert a list of strings representing fractional seconds from measurement onset into the pandas date_time index. The time data looks as follows:
In order to achieve this I tried the following:
df['Time'] = df['Timestamp']*(1/freq)
df.index = pd.to_datetime(df['Time'], unit='s')
In which freq=600 and Timestamp is the frame number counting up from 0.
I was expecting the new index to show the following format:
%y%m%d-%h%m%s%f
But unfortunately, the to_datetime doesn't know how to handle my type of time data (namely counting up till 4750s after the start).
My question is, therefore, how do I convert my time sample data into a date_time index.
Based on this topic I now created the following function:
# requires: from datetime import datetime
def timeDelta2DateTime(self, time_delta_list):
    '''This method converts a list containing the time since measurement onset [seconds] into a
    list containing dateTime objects counting up from 00:00:00.
    Args:
        time_delta_list (list): List containing the times since the measurement has started.
    Returns:
        list: A list with the time in the DateTime format.
    '''
    ### Use divmod to convert seconds to h, m, s and fractional seconds ###
    s, fs = list(zip(*[divmod(item, 1) for item in time_delta_list]))
    m, s = list(zip(*[divmod(item, 60) for item in s]))
    h, m = list(zip(*[divmod(item, 60) for item in m]))
    ### Create DateTime list ###
    us = [item*1000000 for item in fs]  # Convert fractional seconds to microseconds, the unit datetime expects
    time_list_int = list(zip(*[list(map(int,h)), list(map(int,m)), list(map(int,s)), list(map(int,us))]))  # Combine h,m,s,us in one list
    ### Return dateTime object list ###
    return [datetime(2018,1,1,item[0],item[1],item[2],item[3]) for item in time_list_int]
As it seems to be very slow, feel free to suggest a better option.
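One possibly simpler option is to skip the divmod steps and add each offset to a fixed start as a `timedelta`. The sketch below assumes the same arbitrary start date (2018-01-01) as the function above; `seconds_to_datetimes` is a hypothetical name. In pandas, the vectorized counterpart would be along the lines of `pd.Timestamp('2018-01-01') + pd.to_timedelta(df['Time'], unit='s')`, which is likely to be much faster on long recordings.

```python
from datetime import datetime, timedelta

def seconds_to_datetimes(time_delta_list, start=datetime(2018, 1, 1)):
    """Map seconds since measurement onset to datetimes counting up from `start`."""
    return [start + timedelta(seconds=value) for value in time_delta_list]

# Made-up sample offsets in seconds, including a fractional one
stamps = seconds_to_datetimes([0.0, 0.5, 4750.5])
print(stamps[-1])  # 2018-01-01 01:19:10.500000
```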