How to apply a condition to datetimes generated by the date_range function? - python

I have a function that generates timestamps over 10 days at 15-minute intervals. These values are stored in a numpy array. I want to apply a condition to each one that checks the time: when it is between 6am and 7pm, print "something", and when it is from 7pm until 6am the next day, print "something else".
My code for the timestamps:
import datetime
import numpy as np
import pandas as pd
timeStamp = pd.date_range(start='1/1/2020', end='1/10/2020', freq='15min')
chargeTime = datetime.time(6, 0, 0)
dischargeTime = datetime.time(19, 0, 0)
I don't know how to extract the time from each entry of timeStamp and compare it with chargeTime and dischargeTime.

I think what you want is something like this: set charge_time and discharge_time to integers and compare them with the .hour attribute of each timestamp:
import pandas as pd
time_stamps = pd.date_range(start='1/1/2020', end='1/10/2020', freq='15min')
charge_time = 6  # 6am
discharge_time = 19  # 7pm
for t in time_stamps:
    # t is a pandas Timestamp, so t.hour is an int from 0 to 23
    if charge_time <= t.hour < discharge_time:
        print(str(t) + " - something")
    else:
        print(str(t) + " - something else")
This prints each timestamp followed by its message, like this:
2020-01-09 18:45:00 - something
2020-01-09 19:00:00 - something else
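The question defines chargeTime and dischargeTime as datetime.time objects, and they can be compared directly without converting to integer hours. The sketch below is an alternative to the loop above. It assumes that DatetimeIndex.time, which yields a NumPy object array of datetime.time values, compares element-wise against a datetime.time.

```python
import datetime

import numpy as np
import pandas as pd

time_stamps = pd.date_range(start='1/1/2020', end='1/10/2020', freq='15min')
charge_time = datetime.time(6, 0, 0)      # 6am
discharge_time = datetime.time(19, 0, 0)  # 7pm

# DatetimeIndex.time gives a NumPy object array of datetime.time values,
# which compare element-wise against a datetime.time
times = time_stamps.time
in_window = (times >= charge_time) & (times < discharge_time)

labels = np.where(in_window, "something", "something else")

# Show the switch-over around 7pm on the first day
for t, label in zip(time_stamps[74:78], labels[74:78]):
    print(t, "-", label)
```

Because the comparison runs on the whole array, no explicit Python-level if statement is needed.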

This should work. I'm using dt.hour to extract the hour of each datetime and np.where() to label every row: "something" if the hour is at least 6 and below 19, otherwise (from 19:00 to 06:00) "something else". This avoids a Python for loop and is vectorized, so it is efficient:
import numpy as np
import pandas as pd
timeStamp = pd.date_range(start='1/1/2020', end='1/10/2020', freq='15min')
# to_frame() keeps the timestamps as the index and also stores them as a column
df = timeStamp.to_frame()
df.columns = ['dates']
df['conditions'] = np.where((df['dates'].dt.hour >= 6) & (df['dates'].dt.hour < 19), "something", "something else")
For example:
print(df.loc['2020-01-01 05:45:00':'2020-01-01 06:30:00'])
Output:
dates conditions
2020-01-01 05:45:00 2020-01-01 05:45:00 something else
2020-01-01 06:00:00 2020-01-01 06:00:00 something
2020-01-01 06:15:00 2020-01-01 06:15:00 something
2020-01-01 06:30:00 2020-01-01 06:30:00 something
Finally, if you want different values for charge_time and discharge_time, define them once and use them in np.where():
charge_time = 6
discharge_time = 19
df['conditions'] = np.where((df['dates'].dt.hour >= charge_time) & (df['dates'].dt.hour < discharge_time), "something", "something else")
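pandas also has a built-in helper for time-of-day windows, DatetimeIndex.indexer_between_time, which returns the integer positions whose time falls within a range. The following sketch assumes a half-open window [06:00, 19:00), so 19:00 itself counts as "something else". It uses this helper in place of the hour comparisons.

```python
import pandas as pd

idx = pd.date_range(start='1/1/2020', end='1/10/2020', freq='15min')
df = pd.DataFrame({'dates': idx}, index=idx)

# Integer positions whose time of day is in [06:00, 19:00);
# include_end=False keeps 19:00 itself out of the window
pos = idx.indexer_between_time('06:00', '19:00', include_end=False)

# Default label first, then overwrite the in-window rows
df['conditions'] = 'something else'
df.iloc[pos, df.columns.get_loc('conditions')] = 'something'

print(df.loc['2020-01-01 05:45:00':'2020-01-01 06:15:00'])
```

Unlike an integer-hour test, this form also accepts boundaries that are not on the hour, such as '06:30'.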

Related

Pandas change time values based on condition

I have a dataframe:
data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)
I would like to convert the time based on conditions: if the hour is less than 9, I want to set it to 9 and if the hour is more than 17, I need to set it to 17.
I tried this approach:
df['time'] = np.where(((df['time'].dt.hour < 9) & (df['time'].dt.hour != 0)), dt.time(9, 00))
I am getting an error: Can only use .dt accessor with datetimelike values.
Can anyone please help me with this? Thanks.
Here's a way to do what your question asks:
df.time = pd.to_datetime(df.time)
# Work in int64 nanoseconds (3600*1000000000 ns = 1 hour): move hours before 9 up to 9
df.loc[df.time.dt.hour < 9, 'time'] = (df.time.astype('int64') + (9 - df.time.dt.hour)*3600*1000000000).astype('datetime64[ns]')
# ... and move hours after 17 down to 17, keeping minutes and seconds
df.loc[df.time.dt.hour > 17, 'time'] = (df.time.astype('int64') + (17 - df.time.dt.hour)*3600*1000000000).astype('datetime64[ns]')
Input:
time
0 2022-06-06 08:45:00
1 2022-06-06 09:30:00
2 2022-06-06 18:00:00
3 2022-06-06 15:00:00
Output:
time
0 2022-06-06 09:45:00
1 2022-06-06 09:30:00
2 2022-06-06 17:00:00
3 2022-06-06 15:00:00
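The int64 nanosecond arithmetic above can also be written with Timedelta objects, which can be easier to read. This is a sketch of the same hour shift, not the answerer's code. It assumes the strings are 'HH:MM:SS' and that only the hour should move while minutes and seconds are kept.

```python
import pandas as pd

df = pd.DataFrame({'time': ['08:45:00', '09:30:00', '18:00:00', '15:00:00']})
t = pd.to_datetime(df['time'], format='%H:%M:%S')  # date part defaults to 1900-01-01
hour = t.dt.hour

# Hours to add: positive for hours before 9, negative for hours after 17, else 0
shift = (9 - hour).clip(lower=0) + (17 - hour).clip(upper=0)

# Apply the shift and keep only the time of day
df['time'] = (t + pd.to_timedelta(shift, unit='h')).dt.time
print(df)
```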
UPDATE:
Here's alternative code to try to address OP's error as described in the comments:
import pandas as pd
import datetime
data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)
print('', 'df loaded as strings:', df, sep='\n')
df.time = pd.to_datetime(df.time, format='%H:%M:%S')
print('', 'df converted to datetime by pd.to_datetime():', df, sep='\n')
df.loc[df.time.dt.hour < 9, 'time'] = (df.time.astype('int64') + (9 - df.time.dt.hour)*3600*1000000000).astype('datetime64[ns]')
df.loc[df.time.dt.hour > 17, 'time'] = (df.time.astype('int64') + (17 - df.time.dt.hour)*3600*1000000000).astype('datetime64[ns]')
df.time = [t.time() for t in pd.to_datetime(df.time)]
print('', 'df with time column adjusted to have hour between 9 and 17, converted to type "time":', df, sep='\n')
Output:
df loaded as strings:
time
0 08:45:00
1 09:30:00
2 18:00:00
3 15:00:00
df converted to datetime by pd.to_datetime():
time
0 1900-01-01 08:45:00
1 1900-01-01 09:30:00
2 1900-01-01 18:00:00
3 1900-01-01 15:00:00
df with time column adjusted to have hour between 9 and 17, converted to type "time":
time
0 09:45:00
1 09:30:00
2 17:00:00
3 15:00:00
UPDATE #2:
To replace out-of-window times with exactly 9:00 or 17:00, rather than only changing their hour, and so apply 9:00 and 17:00 as the minimum and maximum times respectively (see OP's comment on this), you can do this:
df.loc[df['time'].dt.hour < 9, 'time'] = pd.to_datetime(pd.DataFrame({
    'year': df['time'].dt.year, 'month': df['time'].dt.month, 'day': df['time'].dt.day,
    'hour': [9]*len(df.index)}))
df.loc[df['time'].dt.hour > 17, 'time'] = pd.to_datetime(pd.DataFrame({
    'year': df['time'].dt.year, 'month': df['time'].dt.month, 'day': df['time'].dt.day,
    'hour': [17]*len(df.index)}))
df['time'] = [t.time() for t in pd.to_datetime(df['time'])]
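If the goal is only to clamp the time of day to the range 9:00 to 17:00, a shorter alternative (my own sketch, not taken from the answer above) avoids rebuilding full datetimes. It treats each 'HH:MM:SS' string as a duration since midnight with pd.to_timedelta and replaces out-of-range values with Series.mask.

```python
import pandas as pd

df = pd.DataFrame({'time': ['08:45:00', '09:30:00', '18:00:00', '15:00:00']})

# Parse the time-of-day strings as durations since midnight
td = pd.to_timedelta(df['time'])
lo, hi = pd.Timedelta(hours=9), pd.Timedelta(hours=17)

# Replace anything earlier than 09:00 with 09:00 and later than 17:00 with 17:00
td = td.mask(td < lo, lo).mask(td > hi, hi)

# Turn the durations back into 'HH:MM:SS' strings via a dummy date
df['time'] = (td + pd.Timestamp('1900-01-01')).dt.strftime('%H:%M:%S')
print(df)
```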
Since your 'time' column contains strings, they can be kept as strings, and you can assign new string values where appropriate. To filter for your criteria it is convenient to: create a datetime Series from the 'time' column, create boolean Series by comparing the datetime Series with your criteria, and use the boolean Series to select the rows that need to be changed.
Your data:
import numpy as np
import pandas as pd
data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)
print(df.to_string())
>>>
time
0 08:45:00
1 09:30:00
2 18:00:00
3 15:00:00
Convert to datetime, make boolean Series with your criteria
dts = pd.to_datetime(df['time'])
lt_nine = dts.dt.hour < 9
gt_seventeen = (dts.dt.hour >= 17)
print(lt_nine)
print(gt_seventeen)
>>>
0 True
1 False
2 False
3 False
Name: time, dtype: bool
0 False
1 False
2 True
3 False
Name: time, dtype: bool
Use the boolean series to assign a new value:
df.loc[lt_nine,'time'] = '09:00:00'
df.loc[gt_seventeen,'time'] = '17:00:00'
print(df.to_string())
>>>
time
0 09:00:00
1 09:30:00
2 17:00:00
3 15:00:00
Or just stick with strings altogether and create the boolean Series using regex patterns and .str.match.
data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00','07:22:00','22:02:06']}
dg = pd.DataFrame(data)
print(dg.to_string())
>>>
time
0 08:45:00
1 09:30:00
2 18:00:00
3 15:00:00
4 07:22:00
5 22:02:06
# regex patterns; str.match anchors each pattern at the start of the string
pattern_lt_nine = '^(?:0[0-8])'
pattern_gt_seventeen = '^(?:1[7-9]|2[0-3])'
Make boolean Series and assign new values
gt_seventeen = dg['time'].str.match(pattern_gt_seventeen)
lt_nine = dg['time'].str.match(pattern_lt_nine)
dg.loc[lt_nine,'time'] = '09:00:00'
dg.loc[gt_seventeen,'time'] = '17:00:00'
print(dg.to_string())
>>>
time
0 09:00:00
1 09:30:00
2 17:00:00
3 15:00:00
4 09:00:00
5 17:00:00
See the pandas documentation: Time series / date functionality; Working with text data.
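A related string-only shortcut (my own sketch, not from the answer) skips regex patterns altogether. It relies on the assumption that every value is a zero-padded 24-hour 'HH:MM:SS' string. Under that assumption, lexicographic order matches chronological order, so ordinary string comparisons behave like time comparisons.

```python
import pandas as pd

dg = pd.DataFrame({'time': ['08:45:00', '09:30:00', '18:00:00', '15:00:00',
                            '07:22:00', '22:02:06']})

# Zero-padded 'HH:MM:SS' strings sort in the same order as the times they
# represent, so plain string comparisons act like time comparisons
s = dg['time']
dg['time'] = s.mask(s < '09:00:00', '09:00:00').mask(s > '17:00:00', '17:00:00')
print(dg)
```

This breaks for values like '8:45:00' (no leading zero), so it is only safe when the format is guaranteed.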

Reading in Date / Time Values Correctly

Any ideas on how I can manipulate my current date-time data to make it suitable for use when converting the datatype to time?
For example:
df1['Date/Time'] = pd.to_datetime(df1['Date/Time'])
The current format of the data is mm/dd HH:MM:SS.
An example of the column in the dataframe can be seen below.
Date/Time Dry_Temp[C] Wet_Temp[C] Solar_Diffuse_Rate[[W/m2]] \
0 01/01 00:10:00 8.45 8.237306 0.0
1 01/01 00:20:00 7.30 6.968360 0.0
2 01/01 00:30:00 6.15 5.710239 0.0
3 01/01 00:40:00 5.00 4.462898 0.0
4 01/01 00:50:00 3.85 3.226244 0.0
For the condition where the hour is written as 24, you have two choices: you can simply reset the hour to 00, or you can reset the hour to 00 and also add 1 to the date.
In either case the first step is detecting the condition, which can be done with a simple find call: t.find(' 24:').
Having detected the condition, the first case is a simple matter of resetting the hour to 00 and then parsing the field. In the second case, however, adding 1 to the day is a little more complicated because the date can roll over to the next month.
Here is the approach I would use:
Given a df of form:
Date Time
0 01/01 00:00:00
1 01/01 00:24:00
2 01/01 24:00:00
3 01/31 24:00:00
The First Case
def parseDate(tx):
    ti = tx.find(' 24:')
    if ti >= 0:
        # Replace ' 24:' with ' 00:' and keep the minutes and seconds
        tx = tx[:ti] + ' 00:' + tx[ti + 4:]
    # The format has no year, so the year defaults to 1900
    return pd.to_datetime(tx, format='%m/%d %H:%M:%S')
df['Date Time'] = df['Date Time'].apply(parseDate)
Produces the following:
Date Time
0 1900-01-01 00:00:00
1 1900-01-01 00:24:00
2 1900-01-01 00:00:00
3 1900-01-31 00:00:00
For the second case, I used the dateutil relativedelta class and slightly modified my parseDate function as shown below:
import pandas as pd
from dateutil.relativedelta import relativedelta
def parseDate2(tx):
    ti = tx.find(' 24:')
    if ti >= 0:
        tk = pd.to_datetime(tx[:ti] + ' 00:' + tx[ti + 4:], format='%m/%d %H:%M:%S')
        # Hour 24 means midnight of the following day; month rollover is handled
        return tk + relativedelta(hours=+24)
    return pd.to_datetime(tx, format='%m/%d %H:%M:%S')
df['Date Time'] = df['Date Time'].apply(parseDate2)
Yields:
Date Time
0 1900-01-01 00:00:00
1 1900-01-01 00:24:00
2 1900-01-02 00:00:00
3 1900-02-01 00:00:00
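Both cases can also be handled without a row-by-row apply. This is a vectorized sketch, not the answerer's code. It assumes every value has the 'mm/dd HH:MM:SS' layout shown above, with hour 24 written as ' 24:'. It rewrites hour 24 as 00 with string methods, parses the whole column at once, and then adds one day to the affected rows, which gives the second case.

```python
import pandas as pd

df = pd.DataFrame({'Date Time': ['01/01 00:00:00', '01/01 00:24:00',
                                 '01/01 24:00:00', '01/31 24:00:00']})

s = df['Date Time']
# Rows whose hour field is literally 24 (a space, then 24, then a colon)
is_24 = s.str.contains(' 24:', regex=False)

# Rewrite hour 24 as hour 00 so the fixed format accepts it;
# without a year in the format, the year defaults to 1900
parsed = pd.to_datetime(s.str.replace(' 24:', ' 00:', regex=False),
                        format='%m/%d %H:%M:%S')

# Second case: push those rows to the next day (month rollover included)
df['Date Time'] = parsed + pd.to_timedelta(is_24.astype(int), unit='D')
print(df)
```

Dropping the final addition gives the first case instead.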
To access the parts of the datetime (namely the time), you can use:
# These are now in a usable format
seconds = df1['Date/Time'].dt.second
minutes = df1['Date/Time'].dt.minute
hours = df1['Date/Time'].dt.hour
And if need be, you can create an independent time series with:
df1['Date/Time'].dt.time
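The .dt accessors above only work once the column has been parsed to datetime, and an 'mm/dd HH:MM:SS' string has no year. A minimal sketch of that parse step follows, passing the layout explicitly to pd.to_datetime. The two sample rows are hypothetical, and the .str.strip() call is an assumption in case the values carry stray whitespace.

```python
import pandas as pd

# Hypothetical sample in the question's 'mm/dd HH:MM:SS' layout
df1 = pd.DataFrame({'Date/Time': [' 01/01 00:10:00', ' 01/01 00:20:00']})

# Strip stray whitespace, then parse with an explicit format; since the
# strings carry no year, pandas fills in 1900
df1['Date/Time'] = pd.to_datetime(df1['Date/Time'].str.strip(),
                                  format='%m/%d %H:%M:%S')

hours = df1['Date/Time'].dt.hour
minutes = df1['Date/Time'].dt.minute
times = df1['Date/Time'].dt.time  # datetime.time objects, no date part
```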

Datetime difference between 2 columns with datetime/str - Python

I have a dataset - below
Create Complete
0 2005-01-02 01:15:00 2005-01-05 14:05:00
1 2005-01-06 00:00:00 open
I want to get the difference in minutes between the two using the below code. However as the 'complete' column also contains a string value, how can I get pandas to ign
df['diff_mins'] = df.Create - df.Complete
You can use pd.to_datetime with errors='coerce', which turns values that cannot be parsed, such as 'open', into NaT. For example:
import pandas as pd
df = pd.DataFrame([
['2005-01-02 01:15:00', '2005-01-05 14:05:00'],
['2005-01-06 00:00:00', 'open']],
columns=('Create', 'Complete')
)
and then:
df['diff_mins'] = (
pd.to_datetime(df.Create) - pd.to_datetime(df.Complete, errors='coerce')
)
To get the value in hours, apply a simple lambda function, lambda x: x.total_seconds() / 60 / 60 (divide by 60 only once to get minutes):
df['diff_mins_hours'] = (
pd.to_datetime(df.Create) - pd.to_datetime(df.Complete, errors='coerce')
).apply(lambda x: x.total_seconds() / 60 / 60)
This gives you:
print(df)
Create Complete diff_mins diff_mins_hours
0 2005-01-02 01:15:00 2005-01-05 14:05:00 -4 days +11:10:00 -84.833333
1 2005-01-06 00:00:00 open NaT NaN
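Since the question asks for minutes, the lambda can be replaced by the vectorized .dt.total_seconds() on the timedelta Series. This is a small variant of the answer above, not a different method. NaT rows come out as NaN.

```python
import pandas as pd

df = pd.DataFrame([
    ['2005-01-02 01:15:00', '2005-01-05 14:05:00'],
    ['2005-01-06 00:00:00', 'open']],
    columns=('Create', 'Complete')
)

# 'open' cannot be parsed, so errors='coerce' turns it into NaT
delta = pd.to_datetime(df.Create) - pd.to_datetime(df.Complete, errors='coerce')

# .dt.total_seconds() is vectorized; NaT becomes NaN
df['diff_mins'] = delta.dt.total_seconds() / 60
print(df)
```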
I tried to do it using map. It should look something like this (the result is approximate, since every month is counted as 30 days):
import datetime
def get_diff_mins(elem_a, elem_b):
    if elem_b == 'open':
        # Treat an open item as completing now
        elem_b = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    a = elem_a.replace(' ', '-').replace(':', '-').split('-')
    b = elem_b.replace(' ', '-').replace(':', '-').split('-')
    # Roughly converts yearly time to mins
    # since month is always considered 30 days
    f = [60*24*30*12, 60*24*30, 60*24, 60, 1, 0]
    mins_a = sum(int(x) * w for x, w in zip(a, f))
    mins_b = sum(int(x) * w for x, w in zip(b, f))
    return mins_a - mins_b
# map returns an iterator in Python 3, so materialize it as a list
df['diff_mins'] = list(map(get_diff_mins, df.Create, df.Complete))
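The map approach treats 'open' as "now", but its month and year arithmetic is approximate. This sketch keeps the same 'open means now' assumption while computing the difference exactly. It fills the NaT produced by errors='coerce' with pd.Timestamp.now() and then uses timedelta arithmetic.

```python
import pandas as pd

df = pd.DataFrame([
    ['2005-01-02 01:15:00', '2005-01-05 14:05:00'],
    ['2005-01-06 00:00:00', 'open']],
    columns=('Create', 'Complete')
)

create = pd.to_datetime(df['Create'])
# Treat 'open' (unparseable, so NaT) as still running, i.e. the current time
complete = pd.to_datetime(df['Complete'], errors='coerce').fillna(pd.Timestamp.now())

# Exact difference in minutes, with no 30-day-month approximation
df['diff_mins'] = (create - complete).dt.total_seconds() / 60
print(df)
```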

Rounding datetime based on time of day

I have a pandas dataframe with timestamps shown below:
6/30/2019 3:45:00 PM
I would like to round the date based on time. Anything before 6AM will be counted as the day before.
6/30/2019 5:45:00 AM -> 6/29/2019
6/30/2019 6:30:00 AM -> 6/30/2019
What I have considered doing is splitting the date and time into two different columns, then using an if statement to shift the date (if time >= 06:00, etc.). I'm just wondering whether there is a built-in function in pandas to do this. I've seen posts of people rounding up and down to the closest hour, but never to a specific time threshold (6AM).
Thank you for the help!
There could be a better way to do this, but this is one way of doing it:
import pandas as pd
def checkDates(d):
    # Anything before 6AM counts as the previous day
    if d.hour < 6:
        return d - pd.Timedelta(days=1)
    else:
        return d
ls = ["12/31/2019 3:45:00 AM", "6/30/2019 9:45:00 PM", "6/30/2019 10:45:00 PM", "1/1/2019 4:45:00 AM"]
df = pd.DataFrame(ls, columns=["dates"])
df["dates"] = df["dates"].apply(lambda d: checkDates(pd.to_datetime(d)))
print(df)
dates
0 2019-12-30 03:45:00
1 2019-06-30 21:45:00
2 2019-06-30 22:45:00
3 2018-12-31 04:45:00
Also note that I am not removing the time component when returning the result.
If you just want the date at the end, you can get it from the datetime object like this:
print(pd.to_datetime("12/31/2019 3:45:00 AM").date())  # 2019-12-31
If you prefer a one-liner (at some cost to readability), the above can be written as:
df["dates"] = df["dates"].apply(lambda d: pd.to_datetime(d) - pd.Timedelta(days=1) if pd.to_datetime(d).time().hour < 6 else pd.to_datetime(d))
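A vectorized alternative, which is my own sketch rather than a built-in "round at 6AM" function, shifts every timestamp back by the 6-hour threshold and then drops the time with .dt.normalize(). Unlike the answer above, this returns only the date, at midnight. The two extra sample rows come from the question's examples, and the explicit format string is an assumption about the input layout.

```python
import pandas as pd

ls = ["12/31/2019 3:45:00 AM", "6/30/2019 9:45:00 PM",
      "6/30/2019 10:45:00 PM", "1/1/2019 4:45:00 AM",
      "6/30/2019 5:45:00 AM", "6/30/2019 6:30:00 AM"]
df = pd.DataFrame(ls, columns=["dates"])
df["dates"] = pd.to_datetime(df["dates"], format="%m/%d/%Y %I:%M:%S %p")

# Moving every timestamp back 6 hours makes 06:00 the start of the "day";
# normalize() then sets the time to midnight, keeping only the date
df["business_date"] = (df["dates"] - pd.Timedelta(hours=6)).dt.normalize()
print(df)
```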

How to aggregate a pandas date timeline series only by hour

I have a pandas timeline table containing dates objects and scores:
datetime score
2018-11-23 08:33:02 4
2018-11-24 09:43:30 2
2018-11-25 08:21:34 5
2018-11-26 19:33:01 4
2018-11-23 08:50:40 1
2018-11-23 09:03:10 3
I want to aggregate the score by hour without taking into consideration the date, the result desired is :
08:00:00 10
09:00:00 5
19:00:00 4
So basically I have to remove the day-month-year and then group the scores by hour.
I tried this command
monthagg = df['score'].resample('H').sum().to_frame()
This does work, but it takes the date (day-month-year) into account. How do I remove DD-MM-YYYY and aggregate by hour?
One possible solution is to use DatetimeIndex.floor to set minutes and seconds to 0, then convert the DatetimeIndex to strings with DatetimeIndex.strftime, and then aggregate with sum:
a = df['score'].groupby(df.index.floor('H').strftime('%H:%M:%S')).sum()
#if column datetime
#a = df['score'].groupby(df['datetime'].dt.floor('H').dt.strftime('%H:%M:%S')).sum()
print (a)
08:00:00 10
09:00:00 5
19:00:00 4
Name: score, dtype: int64
Or use DatetimeIndex.hour and aggregate with sum:
a = df.groupby(df.index.hour)['score'].sum()
#if column datetime
#a = df.groupby(df['datetime'].dt.hour)['score'].sum()
print (a)
datetime
8 10
9 5
19 4
Name: score, dtype: int64
Setup to generate a frame with datetime objects:
import datetime
import pandas as pd
rows = [datetime.datetime.now() + datetime.timedelta(hours=i) for i in range(100)]
df = pd.DataFrame(rows,columns = ["date"])
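You can now add an hour column like this, and then group by it:
df["hour"] = df["date"].dt.hour
# This frame has no numeric column to sum, so count rows per hour;
# with a 'score' column, use df.groupby("hour")["score"].sum()
df.groupby("hour").size()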
import pandas as pd
df = pd.DataFrame({'datetime': ['2018-11-23 08:33:02', '2018-11-24 09:43:30',
                                '2018-11-25 08:21:34',
                                '2018-11-26 19:33:01', '2018-11-23 08:50:40',
                                '2018-11-23 09:03:10'], 'score': [4, 2, 5, 4, 1, 3]})
df['datetime'] = pd.to_datetime(df['datetime'], errors='coerce')
df["hour"] = df["datetime"].dt.hour
# Sum only the score column; the datetime column cannot be summed
print(df.groupby("hour")["score"].sum())
Output:
8 10
9 5
19 4
