I am working in a dataframe in Pandas that looks like this.
Identifier datetime
0 AL011851 00:00:00
1 AL011851 06:00:00
2 Al011851 12:00:00
This is my code so far:
import pandas as pd
hurricane_df = pd.read_csv("hurdat2.csv",parse_dates=['datetime'])
hurricane_df['datetime'] = pd.to_timedelta(hurricane_df['datetime'].dt.strftime('%H:%M:%S'))
hurricane_df
grouped = hurricane_df.groupby('datetime').size()
grouped
What I did was convert the datetime column to a timedelta to get the hours. I want to get the size of the datetime column but I want just hours like 1:00, 2:00, 3:00, etc. but I get minute intervals as well like 1:15 and 2:45.
Any way to just display the hour?
Thank you.
You can use pandas.Timestamp.round with Series.dt shortcut:
df['datetime'] = df['datetime'].dt.round('h')
So
... datetime
01:15:00
02:45:00
becomes
... datetime
01:00:00
03:00:00
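To tie this back to the question's goal of counting rows per hour, here is a minimal sketch. It assumes the column already holds timedeltas, as in the question; the sample values are made up for illustration. `dt.floor('h')` could be swapped in if the minutes should always be dropped rather than rounded.

```python
import pandas as pd

# Hypothetical time-of-day values, stored as timedeltas like in the question
times = pd.Series(pd.to_timedelta(["01:15:00", "02:45:00", "03:00:00", "01:10:00"]))

# Round each value to the nearest full hour
rounded = times.dt.round("h")

# Count how many rows fall into each rounded hour
counts = rounded.groupby(rounded).size()
print(counts)
```

With rounding, 02:45 is counted under 03:00; with `dt.floor('h')` it would be counted under 02:00 instead.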
df = pd.DataFrame({'Identifier':['AL011851','AL011851','AL011851'],'datetime': ["2018-12-08 16:35:23","2018-12-08 14:20:45", "2018-12-08 11:45:00"]})
df['datetime'] = pd.to_datetime(df['datetime'])
df
Identifier datetime
0 AL011851 2018-12-08 16:35:23
1 AL011851 2018-12-08 14:20:45
2 AL011851 2018-12-08 11:45:00
# Round to the nearest hour
from datetime import timedelta

def roundHour(t):
    # Zero out minutes and seconds, then add one hour if the minute was 30 or later
    return (t.replace(second=0, microsecond=0, minute=0, hour=t.hour)
            + timedelta(hours=t.minute // 30))

df.datetime = df.datetime.map(roundHour)  # Step 1: Round to nearest hour
df.datetime = df.datetime.map(lambda t: t.strftime('%H:%M'))  # Step 2: Keep only hours and minutes
df
Identifier datetime
0 AL011851 17:00
1 AL011851 14:00
2 AL011851 12:00
Related
I have a dataframe where I have a date and a time column.
Each row describes some event. I want to calculate the timespan for each different day and add it as a new column. The actual calculation is not that important (which units etc.); I just want to know how I can get the first and last row for each date, to access the time value.
The dataframe is already sorted by date and all rows of the same date are also ordered by the time.
Minimal example of what I have
import pandas as pd
df = pd.DataFrame({"Date": ["01.01.2020", "01.01.2020", "01.01.2020", "02.02.2022", "02.02.2022"],
"Time": ["12:00", "13:00", "14:45", "02:00", "08:00"]})
df
and what I want
EDIT: The duration column should be calculated by
14:45 - 12:00 = 2:45 for the first date and
08:00 - 02:00 = 6:00 for the second date.
I suspect this is possible with the groupby function but I am not sure how exactly to do it.
I hope you will find this helpful.
import pandas as pd
df = pd.DataFrame({"Date": ["01.01.2020", "01.01.2020", "01.01.2020", "02.02.2022", "02.02.2022"],
"Time": ["12:00", "13:00", "14:45", "02:00", "08:00"]})
df["Datetime"] = pd.to_datetime((df["Date"] + " " + df["Time"]))
def date_diff(df):
    # Span between the latest and earliest timestamp within one date group
    df["Duration"] = df["Datetime"].max() - df["Datetime"].min()
    return df
df = df.groupby("Date").apply(date_diff)
df = df.drop("Datetime", axis=1)
Output:
Date Time Duration
0 01.01.2020 12:00 0 days 02:45:00
1 01.01.2020 13:00 0 days 02:45:00
2 01.01.2020 14:45 0 days 02:45:00
3 02.02.2022 02:00 0 days 06:00:00
4 02.02.2022 08:00 0 days 06:00:00
You can then do some string styling:
df['Duration'] = df['Duration'].astype(str).map(lambda x: x[7:12])
Output:
Date Time Duration
0 01.01.2020 12:00 02:45
1 01.01.2020 13:00 02:45
2 01.01.2020 14:45 02:45
3 02.02.2022 02:00 06:00
4 02.02.2022 08:00 06:00
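A possible variation on the same idea, sketched here with `groupby(...).transform` instead of `apply`, so no helper function or temporary column is needed. The explicit `format` string is an assumption that the dates are day.month.year, which the question's sample data does not settle.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["01.01.2020", "01.01.2020", "01.01.2020", "02.02.2022", "02.02.2022"],
                   "Time": ["12:00", "13:00", "14:45", "02:00", "08:00"]})

# Combine date and time into one timestamp (assumed day.month.year order)
stamp = pd.to_datetime(df["Date"] + " " + df["Time"], format="%d.%m.%Y %H:%M")

# transform broadcasts each group's max/min back onto every row of that group
grouped = stamp.groupby(df["Date"])
df["Duration"] = grouped.transform("max") - grouped.transform("min")
print(df)
```

The result is a timedelta column, which can be formatted afterwards in whatever way is needed.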
Here is one way to do it:
# groupby on Date and find the difference of max and min time in each group
# format it as HH:MM by extracting Hours and minutes
# and creating a dictionary
d=dict((df.groupby('Date')['Time'].apply(lambda x:
(pd.to_timedelta(x.max() +':00') -
pd.to_timedelta(x.min() +':00')
)
).astype(str).str.extract(r'days (..:..)')
).reset_index().values)
# map the dictionary and update the duration in DF
df['duration']=df['Date'].map(d)
df
Date Time duration
0 01.01.2020 12:00 02:45
1 01.01.2020 13:00 02:45
2 01.01.2020 14:45 02:45
3 02.02.2022 02:00 06:00
4 02.02.2022 08:00 06:00
With the example shown below, you can achieve what you want.
df['Start time'] = df.apply(lambda row: df[df['Date'] == row['Date']]['Time'].max(), axis=1)
df
Update:
import datetime
df['Duration'] = df.apply(lambda row: str(datetime.timedelta(seconds=(datetime.datetime.strptime(df[df['Date'] == row['Date']]['Time'].max(), '%H:%M') - datetime.datetime.strptime(df[df['Date'] == row['Date']]['Time'].min(), '%H:%M')).total_seconds())) , axis=1)
df
You can use:
from datetime import timedelta
import numpy as np
df['xdate']=pd.to_datetime(df['Date'] + ' '+ df['Time'],format='%d.%m.%Y %H:%M')
df['max']=df.groupby(df['xdate'].dt.date)['xdate'].transform(np.max) #get max dates each date
df['min']=df.groupby(df['xdate'].dt.date)['xdate'].transform(np.min) #get min date each date
#get difference max and min dates
df['Duration']= df[['min','max']].apply(lambda x: x['max'] - timedelta(hours=x['min'].hour,minutes=x['min'].minute,seconds=x['min'].second),axis=1).dt.strftime('%H:%M')
df=df.drop(['xdate','min','max'],axis=1)
print(df)
'''
Date Time Duration
0 01.01.2020 12:00 02:45
1 01.01.2020 13:00 02:45
2 01.01.2020 14:45 02:45
3 02.02.2022 02:00 06:00
4 02.02.2022 08:00 06:00
'''
Any ideas on how I can manipulate my current date-time data to make it suitable for use when converting the datatype to time?
For example:
df1['Date/Time'] = pd.to_datetime(df1['Date/Time'])
The current format for the data is mm/dd 00:00:00
an example of the column in the dataframe can be seen below.
Date/Time Dry_Temp[C] Wet_Temp[C] Solar_Diffuse_Rate[[W/m2]] \
0 01/01 00:10:00 8.45 8.237306 0.0
1 01/01 00:20:00 7.30 6.968360 0.0
2 01/01 00:30:00 6.15 5.710239 0.0
3 01/01 00:40:00 5.00 4.462898 0.0
4 01/01 00:50:00 3.85 3.226244 0.0
For the condition where the hour is denoted as 24, you have two choices. First, you can simply reset the hour to 00. Second, you can reset the hour to 00 and also add 1 to the date.
In either case the first step is detecting the condition, which can be done with a simple string search: t.find(' 24:')
Having detected the condition, the first case is a simple matter of resetting the hour to 00 and proceeding with formatting the field. The second case is a little more complicated, because adding 1 to the day can roll over into the next month.
Here is the approach I would use:
Given a df of form:
Date Time
0 01/01 00:00:00
1 01/01 00:24:00
2 01/01 24:00:00
3 01/31 24:00:00
The First Case
def parseDate(tx):
    ti = tx.find(' 24:')
    if ti >= 0:
        # Hour 24 found: reset it to 00 and keep the date unchanged
        return pd.to_datetime(tx[:5] + ' 00:' + tx[9:], format='%m/%d %H:%M:%S')
    return pd.to_datetime(tx, format='%m/%d %H:%M:%S')
df['Date Time'] = df['Date Time'].apply(parseDate)
Produces the following:
Date Time
0 1900-01-01 00:00:00
1 1900-01-01 00:24:00
2 1900-01-01 00:00:00
3 1900-01-31 00:00:00
For the second case, I used relativedelta from the dateutil library and slightly modified my parseDate function as shown below:
from dateutil.relativedelta import relativedelta
def parseDate2(tx):
    ti = tx.find(' 24:')
    if ti >= 0:
        # Hour 24 found: parse it as 00 and move to the next day (handles month rollover)
        tk = pd.to_datetime(tx[:5] + ' 00:' + tx[9:], format='%m/%d %H:%M:%S')
        return tk + relativedelta(hours=+24)
    return pd.to_datetime(tx, format='%m/%d %H:%M:%S')
df['Date Time'] = df['Date Time'].apply(parseDate2)
Yields:
Date Time
0 1900-01-01 00:00:00
1 1900-01-01 00:24:00
2 1900-01-02 00:00:00
3 1900-02-01 00:00:00
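The second case can also be sketched without a row-by-row function, using vectorized string methods. This is only an illustration under assumptions: the sample strings are made up, and the year 2021 is a placeholder added because the data has no year. Without a year the parser falls back to 1900, as in the output above, and a date such as 02/29 does not exist in 1900.

```python
import pandas as pd

# Hypothetical sample in the mm/dd HH:MM:SS layout from the question
raw = pd.Series(["01/01 00:10:00", "01/01 24:00:00", "01/31 24:00:00"])

# Flag rows that use hour 24, then rewrite that hour as 00 so the string parses
is_24 = raw.str.contains(" 24:", regex=False)
fixed = raw.str.replace(" 24:", " 00:", regex=False)

# Assumed placeholder year, since the source data does not carry one
parsed = pd.to_datetime("2021/" + fixed, format="%Y/%m/%d %H:%M:%S")

# Push the flagged rows forward by one day; month ends roll over automatically
parsed = parsed + pd.to_timedelta(is_24.astype(int), unit="D")
print(parsed)
```

Dropping the last step gives the behavior of the first case, where the hour is reset but the date stays the same.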
To access the values of the datetime (namely the time), you can use:
# These are now in a usable format
seconds = df1['Date/Time'].dt.second
minutes = df1['Date/Time'].dt.minute
hours = df1['Date/Time'].dt.hour
And if need be, you can create an independent series of times with:
df1['Date/Time'].dt.time
I have the following df:
time_series date sales
store_0090_item_85261507 1/2020 1,0
store_0090_item_85261501 2/2020 0,0
store_0090_item_85261500 3/2020 6,0
Here, 'date' is in Week/Year format.
So I tried the following code:
df['date'] = df['date'].apply(lambda x: datetime.strptime(x + '/0', "%U/%Y/%w"))
But it returns this df:
time_series date sales
store_0090_item_85261507 2020-01-05 1,0
store_0090_item_85261501 2020-01-12 0,0
store_0090_item_85261500 2020-01-19 6,0
But the first day of the first week of 2020 is 2019-12-29, considering Sunday as the first day. How can I get 2019-12-29 as the first day of the first week of 2020, and not 2020-01-05?
From the datetime module's documentation:
%U: Week number of the year (Sunday as the first day of the week) as a zero padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0.
Edit: My original answer doesn't work for input 1/2023, and using ISO 8601 date values doesn't work for 1/2021, so I've edited this answer by adding a custom function.
Here is a way with a custom function
import pandas as pd
from datetime import datetime, timedelta
##############################################
# to demonstrate issues with certain dates
print(datetime.strptime('0/2020/0', "%U/%Y/%w")) # 2019-12-29 00:00:00
print(datetime.strptime('1/2020/0', "%U/%Y/%w")) # 2020-01-05 00:00:00
print(datetime.strptime('0/2021/0', "%U/%Y/%w")) # 2020-12-27 00:00:00
print(datetime.strptime('1/2021/0', "%U/%Y/%w")) # 2021-01-03 00:00:00
print(datetime.strptime('0/2023/0', "%U/%Y/%w")) # 2023-01-01 00:00:00
print(datetime.strptime('1/2023/0', "%U/%Y/%w")) # 2023-01-01 00:00:00
#################################################
df = pd.DataFrame({'date':["1/2020", "2/2020", "3/2020", "1/2021", "2/2021", "1/2023", "2/2023"]})
print(df)
def get_first_day(date):
    # The Sundays of week 0 and week 1 coincide only when the year starts on a Sunday
    date0 = datetime.strptime('0/' + date.split('/')[1] + '/0', "%U/%Y/%w")
    date1 = datetime.strptime('1/' + date.split('/')[1] + '/0', "%U/%Y/%w")
    date = datetime.strptime(date + '/0', "%U/%Y/%w")
    # Otherwise %U starts counting at week 0, so shift back by one week
    return date if date0 == date1 else date - timedelta(weeks=1)
df['new_date'] = df['date'].apply(get_first_day)
print(df)
Input
date
0 1/2020
1 2/2020
2 3/2020
3 1/2021
4 2/2021
5 1/2023
6 2/2023
Output
date new_date
0 1/2020 2019-12-29
1 2/2020 2020-01-05
2 3/2020 2020-01-12
3 1/2021 2020-12-27
4 2/2021 2021-01-03
5 1/2023 2023-01-01
6 2/2023 2023-01-08
You'll want to use ISO week parsing directives, Ex:
import pandas as pd
date = pd.Series(["1/2020", "2/2020", "3/2020"])
pd.to_datetime(date+"/1", format="%V/%G/%u")
0 2019-12-30
1 2020-01-06
2 2020-01-13
dtype: datetime64[ns]
You can also shift by one day if the week should start on Sunday:
pd.to_datetime(date+"/1", format="%V/%G/%u") - pd.Timedelta('1d')
0 2019-12-29
1 2020-01-05
2 2020-01-12
dtype: datetime64[ns]
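As a cross-check that does not depend on pandas, the same ISO-week lookup can be sketched with the standard library's `date.fromisocalendar` (Python 3.8+). The helper name `iso_week_start` and its `sunday_start` flag are made up for this illustration. Note that ISO week numbering and the `%U`-based custom function above can disagree: for 1/2021 this sketch gives 2021-01-03 with a Sunday start, while the custom function gives 2020-12-27, so which one is right depends on how the weeks in the data were numbered.

```python
from datetime import date, timedelta

def iso_week_start(week_year: str, sunday_start: bool = False) -> date:
    """Return the first day of an ISO week given as 'week/year' (hypothetical helper)."""
    week, year = (int(part) for part in week_year.split("/"))
    # ISO weeks start on Monday (weekday 1)
    monday = date.fromisocalendar(year, week, 1)
    # Optionally step back one day so the week starts on the preceding Sunday
    return monday - timedelta(days=1) if sunday_start else monday

print(iso_week_start("1/2020"))
print(iso_week_start("1/2020", sunday_start=True))
```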
I have a pandas DataFrame in which one of the columns is a pandas datetime column created using pd.to_datetime(). I want to extract the date and hour from each datetime object; in other words, I want to change the minutes and seconds to 0.
I used normalize() to change the time to midnight, but I don't know how to change the time to the start of the hour. Please suggest a way to do so.
making some test data and turning it into a dataframe
rng = pd.date_range('1/1/2018 11:59:00', periods=3, freq='min')
df = pd.DataFrame(rng)
print(df)
print(df[0].dt.round('h'))
gives the input
0
0 2018-01-01 11:59:00
1 2018-01-01 12:00:00
2 2018-01-01 12:01:00
and rounded to the nearest hour gives
0
0 2018-01-01 12:00:00
1 2018-01-01 12:00:00
2 2018-01-01 12:00:00
and
print(df[0].dt.floor('h'))
gives
0
0 2018-01-01 11:00:00
1 2018-01-01 12:00:00
2 2018-01-01 12:00:00
if you always want to round down. Likewise, use dt.ceil('h') if you want to round up.
I think you need to check out pandas.Series.dt.strftime
Or try this:
import datetime
df=pd.DataFrame({'timestamp':[pd.Timestamp('today')]})
df['Date']=[pd.to_datetime(i.date())+ datetime.timedelta(hours=i.hour) for i in df['timestamp']]
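The list comprehension above can likely be replaced by a vectorized call. A minimal sketch, assuming a datetime column named `timestamp` as in that snippet; the sample values and the extra column names are made up:

```python
import pandas as pd

df = pd.DataFrame({"timestamp": pd.to_datetime(["2018-01-01 11:59:30",
                                                "2018-01-01 12:00:00"])})

# Set minutes and seconds to zero while keeping the date and hour
df["hour_start"] = df["timestamp"].dt.floor("h")

# Date and hour as separate columns, if that is more convenient
df["date"] = df["timestamp"].dt.date
df["hour"] = df["timestamp"].dt.hour
print(df)
```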
I have a pandas dataframe with time periods in the second column. Every period represents 30 minutes and it goes all the way up to 48 periods (24 hours). Is there some way to change the integers representing the periods into a time format and concatenate it with the date column for a full datetime? E.g. 1 becomes 00:30, 2 becomes 01:00, 3 becomes 01:30 and so on.
You can cast the DATE column to datetime and add a timedelta of 30 minutes multiplied by PERIOD.
import pandas as pd
df = pd.DataFrame({'DATE':['2015-01-03', '2015-01-03', '2015-01-03'],
'PERIOD':[1,2,3]})
df['DATETIME'] = pd.to_datetime(df['DATE']) + df['PERIOD'] * pd.Timedelta(30, unit='min')
# df
# DATE PERIOD DATETIME
# 0 2015-01-03 1 2015-01-03 00:30:00
# 1 2015-01-03 2 2015-01-03 01:00:00
# 2 2015-01-03 3 2015-01-03 01:30:00
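One thing worth checking with this mapping is the last period of the day. The sketch below uses made-up rows for periods 47 and 48 and shows that period 48 lands on midnight of the following day, which follows from "1 becomes 00:30". If the periods should instead label the start of each half hour, subtracting 1 from PERIOD first would be one option, but that is an assumption about the data.

```python
import pandas as pd

# Hypothetical rows for the last two half-hour periods of a day
df = pd.DataFrame({"DATE": ["2015-01-03", "2015-01-03"],
                   "PERIOD": [47, 48]})

# Each period adds 30 minutes to midnight of the given date
df["DATETIME"] = pd.to_datetime(df["DATE"]) + pd.to_timedelta(df["PERIOD"] * 30, unit="min")
print(df)
```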