How to convert an integer to time - python

I have time in hours as shown below
hour:
0
1
2
3
4
5
6
7
8
9
10
11
12
14
15
16
17
18
19
20
21
22
23
I would like to have the time like:
00:00:00
01:00:00
02:00:00
03:00:00 etc

Here is a solution using pandas: the hour is parsed with the %H format code and then formatted with strftime. I hope it helps.
Code:
import pandas as pd
hour = 13
a = pd.to_datetime(hour, format='%H')
converted_time = a.strftime("%H:%M:%S")
print(converted_time)
Output
13:00:00
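The same idea can be applied to a whole column of hours rather than a single value. This is a minimal sketch, assuming the hours are held in a pandas Series; the name `hours` and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample; the question's column holds hours in the range 0..23.
hours = pd.Series([0, 1, 2, 13, 23], name="hour")

# Option 1: plain strings in HH:MM:SS form.
as_text = hours.map(lambda h: f"{h:02d}:00:00")

# Option 2: time offsets, which may be handier if arithmetic is needed later.
as_timedelta = pd.to_timedelta(hours, unit="h")

print(as_text.tolist())
print(as_timedelta.iloc[3])
```

Which option fits depends on whether the result is only displayed (strings) or used in further calculations (timedeltas).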

Just use time from datetime:
In [1]: from datetime import time
In [2]: for i in range(4):
   ...:     t = time(i, 0, 0)
   ...:     print(t.strftime('%H:%M:%S'))
   ...:
00:00:00
01:00:00
02:00:00
03:00:00
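The loop above could also be wrapped in a small helper. This is only a sketch; the function name `hour_to_hms` is hypothetical. One side effect of going through `datetime.time` is that values outside 0..23 are rejected with a ValueError instead of being formatted silently.

```python
from datetime import time


def hour_to_hms(hour):
    """Format an integer hour (0-23) as HH:MM:SS.

    datetime.time raises ValueError for hours outside 0..23.
    """
    return time(hour).strftime("%H:%M:%S")


# Build the labels for a full day.
labels = [hour_to_hms(h) for h in range(24)]
print(labels[:4])
```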

You can use datetime.strptime to parse the integer, like this:
from datetime import datetime
t = datetime.strptime(str(your_int), "%H")
print(t.strftime("%H:%M:%S"))
Using strptime is useful because, if the format of your integer changes, you can still parse it directly by changing the format codes. The available codes are listed here: https://docs.python.org/2/library/datetime.html?highlight=datetime#strftime-strptime-behavior
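To illustrate the point about changing formats, here is a sketch for a different, purely hypothetical integer layout in which 930 stands for 09:30. The helper name `hhmm_to_hms` and the layout are assumptions, not something from the question.

```python
from datetime import datetime


def hhmm_to_hms(value):
    """Parse an integer such as 930 or 1745 (HHMM) and format it as HH:MM:SS."""
    # Zero-pad to four digits so that 930 becomes "0930" before parsing.
    return datetime.strptime(f"{value:04d}", "%H%M").strftime("%H:%M:%S")


print(hhmm_to_hms(930))
print(hhmm_to_hms(1745))
```

Only the format string passed to strptime has to change when the input layout changes; the output formatting stays the same.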

Related

Date Time Format Issues Python

I am currently having issues with date-time format, particularly converting string input to the correct python datetime format
Date/Time Dry_Temp[C] Wet_Temp[C] Solar_Diffuse_Rate[[W/m2]] \
0 01/01 00:10:00 8.45 8.237306 0.0
1 01/01 00:20:00 7.30 6.968360 0.0
2 01/01 00:30:00 6.15 5.710239 0.0
3 01/01 00:40:00 5.00 4.462898 0.0
4 01/01 00:50:00 3.85 3.226244 0.0
These are examples of the timestamps I currently have in my data. I have tried splitting the date and time so that I now have the following columns:
WC_Humidity[%] WC_Htgsetp[C] WC_Clgsetp[C] Date Time
0 55.553640 18 26 1900-01-01 00:10:00
1 54.204342 18 26 1900-01-01 00:20:00
2 51.896272 18 26 1900-01-01 00:30:00
3 49.007770 18 26 1900-01-01 00:40:00
4 45.825810 18 26 1900-01-01 00:50:00
I have managed to get the year into datetime format, but there are still 2 problems to resolve:
the data was not recorded in 1900, so I would like to change the year in the Date,
I get the following error when trying to convert the time into Python datetime format
pandas/_libs/tslibs/strptime.pyx in pandas._libs.tslibs.strptime.array_strptime()
ValueError: time data '00:00:00' does not match format ' %m/%d %H:%M:%S' (match)
I tried having 24:00:00, however, python didn't like that either...
preferences:
I would prefer if they were both in the same cell without having to split this information into two columns.
I would also like to get rid of the seconds data as the data was recorded in 10 min intervals so there is no need for seconds in my case.
Any help would be greatly appreciated.
Regarding "the data was not recorded in 1900, so I would like to change the year in the Date": the replace method of a datetime.datetime instance can be used for this task. Consider the following example:
import pandas as pd
df = pd.DataFrame({"when":pd.to_datetime(["1900-01-01","1900-02-02","1900-03-03"])})
df["when"] = df["when"].apply(lambda x:x.replace(year=2000))
print(df)
output
when
0 2000-01-01
1 2000-02-02
2 2000-03-03
Note that replace can also be used without pandas, for example:
import datetime
d = datetime.datetime.strptime("","") # use all default values which result in midnight of Jan 1 of year 1900
print(d) # 1900-01-01 00:00:00
d = d.replace(year=2000)
print(d) # 2000-01-01 00:00:00
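An alternative that avoids the 1900 default altogether is to prepend the year to each string before parsing, which also keeps date and time in one column. This is a sketch under assumptions: the year 2021 is a placeholder, the sample strings follow the "MM/DD HH:MM:SS" layout from the question, and values such as 24:00:00 are not handled.

```python
import pandas as pd

# Sample values in the question's "MM/DD HH:MM:SS" layout.
raw = pd.Series(["01/01 00:10:00", "01/01 00:20:00", "01/01 00:30:00"])

year = 2021  # assumption: replace with the year the data was actually recorded

# Strip stray whitespace, prepend the year, then parse date and time together.
stamps = pd.to_datetime(f"{year}/" + raw.str.strip(), format="%Y/%m/%d %H:%M:%S")

# Text labels without seconds, in case they are only needed for display.
labels = stamps.dt.strftime("%Y-%m-%d %H:%M")

print(stamps)
print(labels.tolist())
```

The `stamps` column stays a real datetime column; `labels` is plain text and should only be used for presentation.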

Pandas DataFrame Time index using .loc function error

I have created a DataFrame with a DateTime index, then split the index into a Date level and a Time level. Now, when I try to select a row for a specific time using df.loc[], the system shows an error.
Here is an example of the steps I took to build the DataFrame, from the beginning up to the point of the problem.
import pandas as pd
import numpy as np
df= pd.DataFrame({'A':[1, 2, 3, 4], 'B':[5, 6, 7, 8], 'C':[9, 10, 11, 12],
'DateTime':pd.to_datetime(['2021-09-01 10:00:00', '2021-09-01 11:00:00', '2021-09-01 12:00:00', '2021-09-01 13:00:00'])})
df=df.set_index(df['DateTime'])
df.drop('DateTime', axis=1, inplace=True)
df
OUT >>
A B C
DateTime
2021-09-01 10:00:00 1 5 9
2021-09-01 11:00:00 2 6 10
2021-09-01 12:00:00 3 7 11
2021-09-01 13:00:00 4 8 12
In this step, I split the DateTime index into a multi-index of Date & Time
df.index = pd.MultiIndex.from_arrays([df.index.date, df.index.time], names=['Date','Time'])
df
OUT >>
A B C
Date Time
2021-09-01 10:00:00 1 5 9
11:00:00 2 6 10
12:00:00 3 7 11
13:00:00 4 8 12
##Here is the issue##
When I call this statement, the system shows an error:
df.loc["11:00:00"]
How to fix that?
1. If you want to use .loc, you can just specify the time by:
import datetime
df.loc[(slice(None), datetime.time(11, 0)), :]
or use pd.IndexSlice similar to the solution by BENY, as follows:
import datetime
idx = pd.IndexSlice
df.loc[idx[:,datetime.time(11, 0)], :]
(defining a variable idx to use pd.IndexSlice gives us cleaner code and less typing if you are going to use pd.IndexSlice multiple times).
Result:
A B C
Date Time
2021-09-01 11:00:00 2 6 10
2. If you want to select just for one day, you can use:
import datetime
df.loc[(datetime.date(2021, 9, 1), datetime.time(11, 0))]
Result:
A 2
B 6
C 10
Name: (2021-09-01, 11:00:00), dtype: int64
3. You can also use .xs to access the MultiIndex row index, as follows:
import datetime
df.xs(datetime.time(11,0), axis=0, level='Time')
Result:
A B C
Date
2021-09-01 2 6 10
4. Alternative way if you haven't split the DateTime index into a multi-index of Date & Time
If you haven't split the DatetimeIndex into separate date and time levels, you can also use the .between_time() function to filter by time, as follows:
df.between_time("11:00:00", "11:00:00")
You can specify a range of time to filter, instead of just a point of time, if you specify different values for the start_time and end_time.
Result:
A B C
DateTime
2021-09-01 11:00:00 2 6 10
As you can see, .between_time() lets you pass the time as a simple string instead of requiring datetime objects. This is probably the closest to the syntax you tried, df.loc["11:00:00"], which is not valid for this index.
As a suggestion, if you split the DatetimeIndex into separate date and time levels only in order to filter by time, consider using .between_time() instead.
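For reference, the two approaches can be compared side by side in one self-contained sketch. It rebuilds the frame from the question; the variable names `by_clock`, `split` and `by_level` are only illustrative.

```python
import datetime

import pandas as pd

stamps = pd.to_datetime(["2021-09-01 10:00:00", "2021-09-01 11:00:00",
                         "2021-09-01 12:00:00", "2021-09-01 13:00:00"])
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]},
                  index=stamps)

# Without splitting the index: filter by clock time using plain strings.
by_clock = df.between_time("11:00:00", "11:00:00")

# With the Date/Time MultiIndex: the key must be a datetime.time object.
split = df.copy()
split.index = pd.MultiIndex.from_arrays([df.index.date, df.index.time],
                                        names=["Date", "Time"])
by_level = split.xs(datetime.time(11, 0), level="Time")

print(by_clock)
print(by_level)
```

Both selections return the 11:00 row; the difference is only in how the key has to be written.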
We can select the matching rows with pd.IndexSlice:
import datetime
out = df.loc[pd.IndexSlice[:,datetime.time(11, 0)],:]
Out[76]:
A B C DateTime
Date Time
2021-09-01 11:00:00 2 6 10 2021-09-01 11:00:00
Why do you need to split your datetime into two parts?
You can use indexer_at_time
>>> df
A B C
DateTime
2021-09-01 10:00:00 1 5 9
2021-09-01 11:00:00 2 6 10
2021-09-01 12:00:00 3 7 11
2021-09-01 13:00:00 4 8 12
# Extract 11:00:00 from any day
>>> df.iloc[df.index.indexer_at_time('11:00:00')]
A B C
DateTime
2021-09-01 11:00:00 2 6 10
You can also create a proxy to save time typing:
T = df.index.indexer_at_time
df.iloc[T('11:00:00')]
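A related shortcut, as far as I know, is DataFrame.at_time, which selects rows at a given clock time on every day when the index is a DatetimeIndex. A minimal sketch with made-up data spanning two days:

```python
import pandas as pd

# Hypothetical two-day sample so that "from any day" is visible in the result.
idx = pd.to_datetime(["2021-09-01 10:00:00", "2021-09-01 11:00:00",
                      "2021-09-02 11:00:00", "2021-09-02 12:00:00"])
df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=idx)

# Rows whose time of day is exactly 11:00, regardless of the date.
at_eleven = df.at_time("11:00")
print(at_eleven)
```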

How aggregate a pandas date timeline series only by hour

I have a pandas time-series table containing datetime objects and scores:
datetime score
2018-11-23 08:33:02 4
2018-11-24 09:43:30 2
2018-11-25 08:21:34 5
2018-11-26 19:33:01 4
2018-11-23 08:50:40 1
2018-11-23 09:03:10 3
I want to aggregate the score by hour without taking into consideration the date, the result desired is :
08:00:00 10
09:00:00 5
19:00:00 4
So basically I have to remove the date-month-year, and then group score by hour,
I tried this command
monthagg = df['score'].resample('H').sum().to_frame()
This works, but it takes the day, month and year into consideration. How can I remove the DD-MM-YYYY part and aggregate by hour only?
One possible solution is to use DatetimeIndex.floor to set the minutes and seconds to 0, convert the DatetimeIndex to strings with DatetimeIndex.strftime, and then aggregate with sum:
a = df['score'].groupby(df.index.floor('H').strftime('%H:%M:%S')).sum()
#if column datetime
#a = df['score'].groupby(df['datetime'].dt.floor('H').dt.strftime('%H:%M:%S')).sum()
print (a)
08:00:00 10
09:00:00 5
19:00:00 4
Name: score, dtype: int64
Or use DatetimeIndex.hour and aggregate sum:
a = df.groupby(df.index.hour)['score'].sum()
#if column datetime
#a = df.groupby(df['datetime'].dt.hour)['score'].sum()
print (a)
datetime
8 10
9 5
19 4
Name: score, dtype: int64
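If the integer hours from the second variant should be shown as 08:00:00 and so on, the index can be relabelled after grouping. A minimal sketch using the sample data from the question, assuming the timestamps are in a column named `datetime`:

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2018-11-23 08:33:02", "2018-11-24 09:43:30", "2018-11-25 08:21:34",
        "2018-11-26 19:33:01", "2018-11-23 08:50:40", "2018-11-23 09:03:10",
    ]),
    "score": [4, 2, 5, 4, 1, 3],
})

# Group by the hour of day only; the date part is ignored.
by_hour = df.groupby(df["datetime"].dt.hour)["score"].sum()

# Relabel the integer hours as HH:00:00 strings for display.
by_hour.index = [f"{h:02d}:00:00" for h in by_hour.index]
print(by_hour)
```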
Setup to generate a frame with datetime objects:
import datetime
import pandas as pd
rows = [datetime.datetime.now() + datetime.timedelta(hours=i) for i in range(100)]
df = pd.DataFrame(rows,columns = ["date"])
You can now add an hour column like this, and then group by it:
df["hour"] = df["date"].dt.hour
# numeric_only=True skips the datetime column, which cannot be summed
df.groupby("hour").sum(numeric_only=True)
import pandas as pd
df = pd.DataFrame({'datetime':['2018-11-23 08:33:02','2018-11-24 09:43:30',
'2018-11-25 08:21:34',
'2018-11-26 19:33:01','2018-11-23 08:50:40',
'2018-11-23 09:03:10'],'score':[4,2,5,4,1,3]})
df['datetime']=pd.to_datetime(df['datetime'], errors='coerce')
df["hour"] = df["datetime"].dt.hour
# Sum only the score column; the datetime column itself cannot be summed
df.groupby("hour")["score"].sum()
Output:
8 10
9 5
19 4

Python Pandas sizeof times

I am working in a dataframe in Pandas that looks like this.
Identifier datetime
0 AL011851 00:00:00
1 AL011851 06:00:00
2 Al011851 12:00:00
This is my code so far:
import pandas as pd
hurricane_df = pd.read_csv("hurdat2.csv",parse_dates=['datetime'])
hurricane_df['datetime'] = pd.to_timedelta(hurricane_df['datetime'].dt.strftime('%H:%M:%S'))
hurricane_df
grouped = hurricane_df.groupby('datetime').size()
grouped
What I did was convert the datetime column to a timedelta to get the hours. I want to get the size of the datetime column but I want just hours like 1:00, 2:00, 3:00, etc. but I get minute intervals as well like 1:15 and 2:45.
Any way to just display the hour?
Thank you.
You can round to the nearest hour with round, available on a whole column through the Series.dt accessor:
df['datetime'] = df['datetime'].dt.round('h')
So
... datetime
01:15:00
02:45:00
becomes
... datetime
01:00:00
03:00:00
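Note that round goes to the nearest hour, so 02:45 ends up in the 03:00 bucket. If each value should instead count towards the hour it falls in, floor may be closer to what is wanted. A sketch comparing the two on made-up timedelta values, followed by the size count from the question:

```python
import pandas as pd

# Hypothetical sample of time-of-day values stored as timedeltas.
s = pd.Series(pd.to_timedelta(["01:00:00", "01:15:00", "02:45:00"]))

rounded = s.dt.round("h")   # nearest hour: 02:45 -> 03:00
floored = s.dt.floor("h")   # truncate to the hour: 02:45 -> 02:00

# Number of rows per hour bucket, as in the question's groupby(...).size().
counts = floored.groupby(floored).size()
print(counts)
```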
df = pd.DataFrame({'Identifier':['AL011851','AL011851','AL011851'],'datetime': ["2018-12-08 16:35:23","2018-12-08 14:20:45", "2018-12-08 11:45:00"]})
df['datetime'] = pd.to_datetime(df['datetime'])
df
Identifier datetime
0 AL011851 2018-12-08 16:35:23
1 AL011851 2018-12-08 14:20:45
2 AL011851 2018-12-08 11:45:00
from datetime import timedelta
# Rounds to nearest hour
def roundHour(t):
    return (t.replace(second=0, microsecond=0, minute=0, hour=t.hour)
            + timedelta(hours=t.minute//30))
df.datetime=df.datetime.map(lambda t: roundHour(t)) # Step 1: Round to nearest hour
df.datetime=df.datetime.map(lambda t: t.strftime('%H:%M')) # Step 2: Remove seconds
df
Identifier datetime
0 AL011851 17:00
1 AL011851 14:00
2 AL011851 12:00

pandas: selecting rows in a specific time window

I have a dataset of samples covering multiple days, all with a timestamp.
I want to select rows within a specific time window. E.g. all rows that were generated between 1pm and 3 pm every day.
This is a sample of my data in a pandas dataframe:
22 22 2018-04-12T20:14:23Z 2018-04-12T21:14:23Z 0 6370.1
23 23 2018-04-12T21:14:23Z 2018-04-12T21:14:23Z 0 6368.8
24 24 2018-04-12T22:14:22Z 2018-04-13T01:14:23Z 0 6367.4
25 25 2018-04-12T23:14:22Z 2018-04-13T01:14:23Z 0 6365.8
26 26 2018-04-13T00:14:22Z 2018-04-13T01:14:23Z 0 6364.4
27 27 2018-04-13T01:14:22Z 2018-04-13T01:14:23Z 0 6362.7
28 28 2018-04-13T02:14:22Z 2018-04-13T05:14:22Z 0 6361.0
29 29 2018-04-13T03:14:22Z 2018-04-13T05:14:22Z 0 6359.3
.. ... ... ... ... ...
562 562 2018-05-05T08:13:21Z 2018-05-05T09:13:21Z 0 6300.9
563 563 2018-05-05T09:13:21Z 2018-05-05T09:13:21Z 0 6300.7
564 564 2018-05-05T10:13:14Z 2018-05-05T13:13:14Z 0 6300.2
565 565 2018-05-05T11:13:14Z 2018-05-05T13:13:14Z 0 6299.9
566 566 2018-05-05T12:13:14Z 2018-05-05T13:13:14Z 0 6299.6
How do I achieve that? I need to ignore the date and just evaluate the time component. I could traverse the dataframe in a loop and evaluate the datetime that way, but there must be a simpler way to do it.
I converted messageDate, which was read as a string, to a datetime with
df["messageDate"]=pd.to_datetime(df["messageDate"])
But after that I got stuck on how to filter on time only.
Any input appreciated.
Datetime columns expose a .dt accessor (a DatetimeProperties object), from which you can extract components such as the hour and filter on them:
import pandas as pd
df = pd.DataFrame(
[
'2018-04-12T12:00:00Z', '2018-04-12T14:00:00Z','2018-04-12T20:00:00Z',
'2018-04-13T12:00:00Z', '2018-04-13T14:00:00Z', '2018-04-13T20:00:00Z'
],
columns=['messageDate']
)
df
messageDate
# 0 2018-04-12 12:00:00
# 1 2018-04-12 14:00:00
# 2 2018-04-12 20:00:00
# 3 2018-04-13 12:00:00
# 4 2018-04-13 14:00:00
# 5 2018-04-13 20:00:00
df["messageDate"] = pd.to_datetime(df["messageDate"])
time_mask = (df['messageDate'].dt.hour >= 13) & \
(df['messageDate'].dt.hour <= 15)
df[time_mask]
# messageDate
# 1 2018-04-12 14:00:00
# 4 2018-04-13 14:00:00
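One detail of the hour-based mask is that `hour <= 15` also keeps rows such as 15:30. If the window should end at 15:00 exactly, between_time on a DatetimeIndex is one option. A sketch with made-up timestamps, assuming it is acceptable to move messageDate into the index:

```python
import pandas as pd

df = pd.DataFrame({
    "messageDate": pd.to_datetime([
        "2018-04-12 12:00:00", "2018-04-12 14:00:00", "2018-04-12 15:30:00",
        "2018-04-13 13:00:00", "2018-04-13 20:00:00",
    ]),
    "value": [1, 2, 3, 4, 5],
})

# between_time needs a DatetimeIndex; both endpoints are included by default.
window = df.set_index("messageDate").between_time("13:00", "15:00")
print(window)
```

Here the 15:30 row is excluded, whereas the hour-based mask would have kept it.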
I hope the code is self explanatory. You can always ask questions.
import pandas as pd
# Prepping data for example
dates = pd.date_range('1/1/2018', periods=7, freq='H')
data = {'A' : range(7)}
df = pd.DataFrame(index = dates, data = data)
print(df)
# A
# 2018-01-01 00:00:00 0
# 2018-01-01 01:00:00 1
# 2018-01-01 02:00:00 2
# 2018-01-01 03:00:00 3
# 2018-01-01 04:00:00 4
# 2018-01-01 05:00:00 5
# 2018-01-01 06:00:00 6
# Creating a mask to filter the values we wish to keep or not.
# Here, we use df.index because the index is our datetime.
# If the datetime is a column, you can always say df['column_name']
mask = (df.index > '2018-1-1 01:00:00') & (df.index < '2018-1-1 05:00:00')
print(mask)
# [False False True True True False False]
df_with_good_dates = df.loc[mask]
print(df_with_good_dates)
# A
# 2018-01-01 02:00:00 2
# 2018-01-01 03:00:00 3
# 2018-01-01 04:00:00 4
df=df[(df["messageDate"].apply(lambda x : x.hour)>13) & (df["messageDate"].apply(lambda x : x.hour)<15)]
You can use x.minute, x.second similarly.
Try this after ensuring messageDate is indeed in datetime format, as you have done:
df.set_index('messageDate',inplace=True)
choseInd = [ind for ind in df.index if (ind.hour>=13)&(ind.hour<=15)]
df_select = df.loc[choseInd]
You can do the same without making the datetime column the index, as the answer using apply with a lambda shows.
Making the datetime the index, rather than a numerical one, just makes your dataframe 'better looking'.
