Filter Data Frame by time - python

I have a large data frame that is imported from an Excel sheet. I already filtered it to exclude weekends, but I also need to filter it so that only daytime hours, e.g. 7:00 - 18:00, are kept. Here is what the data frame looks like after successfully taking out weekends.
picture of data
import pandas as pd
from pandas.tseries.offsets import BDay
isBusinessDay = BDay().is_on_offset
match_series = pd.to_datetime(df['timestamp(America/New_York)']).map(isBusinessDay)
df_new = df[match_series]
df_new

A simple approach is to use filters on your datetime field using the Series dt accessor.
In this case...
filt = (df['timestamp(America/New_York)'].dt.hour >= 7) & (df['timestamp(America/New_York)'].dt.hour <= 18)
df_filtered = df.loc[filt, :]
More reading: https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.html
For a sample of this in action, see the code block below. The random date generator was taken from here and modified slightly.
import random
import time
import pandas as pd
def str_time_prop(start, end, time_format, prop):
    """Get a time at a proportion of a range of two formatted times.
    start and end should be strings specifying times formatted in the
    given format (strftime-style), giving an interval [start, end].
    prop specifies the proportion of the interval to be taken after
    start. The returned time will be in the specified format.
    """
    stime = time.mktime(time.strptime(start, time_format))
    etime = time.mktime(time.strptime(end, time_format))
    ptime = stime + prop * (etime - stime)
    return time.strftime(time_format, time.localtime(ptime))
def random_date(start, end, prop):
    return str_time_prop(start, end, '%Y-%m-%d %I:%M %p', prop)
dates = {'dtfield':[random_date("2007-1-1 1:30 PM", "2009-1-1 4:50 AM", random.random()) for n in range(1000)]}
df = pd.DataFrame(data=dates)
df['dtfield'] = pd.to_datetime(df['dtfield'])
filt = (df['dtfield'].dt.hour >= 7) & (df['dtfield'].dt.hour <= 18)
df_filtered = df.loc[filt, :]
df_filtered
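Note that comparing dt.hour <= 18 keeps everything up to 18:59. If the cut-off should be exactly 18:00, DataFrame.between_time is another option. Below is a minimal sketch on made-up rows (the timestamps and the value column are assumptions for illustration only). It moves the timestamp column into the index, because between_time needs a DatetimeIndex, and it relies on the default behaviour of including both end points.

```python
import pandas as pd

# Hypothetical sample rows; only the column name comes from the question.
col = "timestamp(America/New_York)"
df = pd.DataFrame({
    col: pd.to_datetime([
        "2021-04-01 06:59", "2021-04-01 07:00", "2021-04-01 12:30",
        "2021-04-01 18:00", "2021-04-01 18:01", "2021-04-01 23:15",
    ]),
    "value": range(6),
})

# Hour-based filter from the answer: keeps 18:01 because its hour is 18.
by_hour = df[(df[col].dt.hour >= 7) & (df[col].dt.hour <= 18)]

# between_time works on a DatetimeIndex and stops at 18:00 exactly.
daytime = df.set_index(col).between_time("07:00", "18:00").reset_index()

print(by_hour["value"].tolist())
print(daytime["value"].tolist())
```

Which of the two is right depends on whether rows such as 18:01 should count as daytime.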

Related

Is there a way to get Pandas DataFrame values based on timestamp from another column in Python?

I have rain data and sensor data that is collected on 15min intervals. What I want to do is only collect sensor data 72 hours after the last rain drop has fallen. If rain is observed between that time, the counter resets until 72 hours dry time is observed.
I converted the data to timestamp data but can't figure out the logic for the above. Links to example data as well as example tables below.
Timestamp           Precipitation(mm)
2021-04-01 00:15    6
2021-04-01 00:30    0

Timestamp           Sensor Depth (mm)
2021-04-01 00:15    12
2021-04-01 00:30    4
example rain data
example sensor data
import pandas as pd
import matplotlib.pyplot as plt
import os
from datetime import datetime, date, time
file = pd.read_csv('example_sensor.csv')
rain = pd.read_csv('example_rain.csv')
east1_df = pd.DataFrame(file)
east1_df['Timestamp'] = pd.to_datetime(east1_df['Timestamp'], format='%Y-%m-%d %H:%M')
east1_df.index=east1_df['Timestamp']
rain['Timestamp'] = pd.to_datetime(rain['Timestamp'], format='%Y-%m-%d %H:%M')
rain.index=rain['Timestamp']#pd.DatetimeIndex([east1_spring_df['Timestamp']], dtype='datetime64[ns]', freq=None)
I am not aware of a pandas functionality to achieve this.
However, there is a way to do this with numpy. You would just need to extract the data from the dataframe.
Using a boxcar function one can filter for events which span a certain period by convolving it with the rainfall data.
Here's a minimal example on how to achieve this using numpy:
import numpy as np
from datetime import datetime, timedelta
def datetime_range(start, end, delta):
    result = []
    current = start
    while current < end:
        result.append(current)
        current += delta
    return result
def create_boxcar(dry_hours, delta_minutes):
    n_dry = dry_hours * 60 // delta_minutes
    return np.ones(n_dry) / n_dry
def create_data(delta_minutes):
    stamps = np.array(datetime_range(datetime(2022, 2, 23), datetime(2022, 2, 28), timedelta(minutes=delta_minutes)))
    rainfall = np.random.randn(len(stamps))-1 # shifted normal distribution
    rainfall[rainfall < 0] = 0 # coerce negative values to zero
    sensor = np.arange(len(stamps)) # just a ramp
    return stamps, rainfall, sensor
delta_minutes = 15
stamps, rainfall, sensor = create_data(delta_minutes)
# get dry regions
no_rainfall = (rainfall == 0).astype(int)
# create boxcar filter with desired length
dry_hours_before_read = 3
box_filter = create_boxcar(dry_hours_before_read, delta_minutes)
# get regions with desired dry period:
# Convolve boxcar and data, look for a result of 1,
# i.e full overlap of boxcar and no_rainfall
# (np.isclose guards against floating-point rounding in the sum)
readout_region = np.isclose(np.convolve(no_rainfall, box_filter, 'same'), 1)
# get timestamps and values during dry period
timestamp_dry_enough = stamps[readout_region]
sensor_dry_enough = sensor[readout_region]
After that manipulation, you could feed that information back to the dataframe for further pandas-based filtering:
east1_df[f'no rain for {dry_hours_before_read} hours'] = readout_region
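For comparison, pandas does offer time-based rolling windows, which may be enough here. The following is only a sketch under assumptions: the helper name dry_enough, the sample values and the short 1-hour window are made up for illustration, and the rain series is assumed to have a sorted DatetimeIndex. Unlike the centred 'same' convolution above, the window here is trailing, so a row is marked dry only when no rain fell in the period leading up to it.

```python
import pandas as pd

def dry_enough(rain, window="72h"):
    """Boolean Series: True where no rain fell within the trailing window."""
    wet = (rain > 0).astype(float)
    # Time-based rolling window covering (t - window, t] for each timestamp t.
    rained_recently = wet.rolling(window).max() > 0
    # Rows without a full window of history are treated as "not dry yet".
    enough_history = (rain.index - rain.index[0]) >= pd.Timedelta(window)
    return ~rained_recently & enough_history

# Hypothetical 15-minute data; a 1-hour window keeps the example short.
idx = pd.date_range("2021-04-01 00:00", periods=9, freq="15min")
rain = pd.Series([0, 6, 0, 0, 0, 0, 0, 0, 0], index=idx, name="Precipitation(mm)")
sensor = pd.Series(range(9), index=idx, name="Sensor Depth (mm)")

mask = dry_enough(rain, window="1h")
sensor_dry = sensor[mask]  # sensor readings taken after a full dry window
print(sensor_dry)
```

With the real data the call would use window="72h". Any new rain inside the window resets the mask to False, which matches the "counter resets" requirement in the question.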

Pandas: Generate date intervals between two dates with yearly reset

I am trying to generate 8-day intervals between two dates using pandas.date_range. In addition, when an 8-day interval runs past the end of a year (i.e., day 365/366), I would like the range to restart at the beginning of the following year. Below is example code for just two years; however, I plan to use it across several years, e.g., 2014-01-01 to 2021-01-01.
import pandas as pd
print(pd.date_range(start='2018-12-01', end='2019-01-31', freq='8D'))
Results in,
DatetimeIndex(['2018-12-01', '2018-12-09', '2018-12-17', '2018-12-25','2019-01-02', '2019-01-10', '2019-01-18', '2019-01-26'], dtype='datetime64[ns]', freq='8D')
However, I would like the start of the interval in 2019 to reset to the first day, e.g., 2019-01-01
You could loop creating a date_range up to the start of the next year for each year, appending them until you hit the end date.
import pandas as pd
from datetime import date
def date_range_with_resets(start, end, freq):
    start = date.fromisoformat(start)
    end = date.fromisoformat(end)
    result = pd.date_range(start=start, end=start, freq=freq) # initialize result with just start date
    next_year_start = start.replace(year=start.year+1, month=1, day=1)
    while next_year_start < end:
        result = result.append(pd.date_range(start=start, end=next_year_start, freq=freq))
        start = next_year_start
        next_year_start = next_year_start.replace(year=next_year_start.year+1)
    result = result.append(pd.date_range(start=start, end=end, freq=freq))
    return result[1:] # remove duplicate start date
start = '2018-12-01'
end = '2019-01-31'
date_range_with_resets(start, end, freq='8D')
Edit:
Here's a simpler way without using datetime. Create a date_range of years between start and end, then loop through those.
def date_range_with_resets(start, end, freq):
    years = pd.date_range(start=start, end=end, freq='YS') # YS=year start
    if len(years) == 0:
        return pd.date_range(start=start, end=end, freq=freq)
    result = pd.date_range(start=start, end=years[0], freq=freq)
    for i in range(0, len(years) - 1):
        result = result.append(pd.date_range(start=years[i], end=years[i+1], freq=freq))
    result = result.append(pd.date_range(start=years[-1], end=end, freq=freq))
    return result
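One thing to watch with sub-ranges that end on 1 January is that the same date can appear twice when a step happens to land exactly on it. A possible variation, shown here only as a sketch (the function name date_range_yearly_reset is made up), clips each year's sub-range to 31 December so the pieces cannot overlap.

```python
import pandas as pd

def date_range_yearly_reset(start, end, freq="8D"):
    """Build freq-spaced dates that restart on 1 January of every year."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    stamps = []
    for year in range(start.year, end.year + 1):
        # Each piece starts at 1 January (or at start) and stays inside its year.
        seg_start = max(start, pd.Timestamp(year=year, month=1, day=1))
        seg_end = min(end, pd.Timestamp(year=year, month=12, day=31))
        stamps.extend(pd.date_range(seg_start, seg_end, freq=freq))
    return pd.DatetimeIndex(stamps)

result = date_range_yearly_reset("2018-12-01", "2019-01-31")
print(result)
```

For the dates in the question this gives four dates in December 2018 followed by 2019-01-01, 2019-01-09, 2019-01-17 and 2019-01-25.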

Getting list of months in between two dates according to specific format

start = "Nov20"
end = "Jan21"
# Expected output:
["Nov20", "Dec20", "Jan21"]
What I've tried so far is the following but am looking for more elegant way.
from calendar import month_abbr
from time import strptime
def get_range(a, b):
    start = strptime(a[:3], '%b').tm_mon
    end = strptime(b[:3], '%b').tm_mon
    dates = []
    for m in month_abbr[start:]:
        dates.append(m+a[-2:])
    for mm in month_abbr[1:end + 1]:
        dates.append(mm+b[-2:])
    print(dates)
get_range('Nov20', 'Jan21')
Note: I don't want to use pandas, as it doesn't make sense to import such a library just to generate dates.
The date range may span different years so one way is to loop from the start date to end date and increment the month by 1 until end date is reached.
Try this:
from datetime import datetime
def get_range(a, b):
    start = datetime.strptime(a, '%b%y')
    end = datetime.strptime(b, '%b%y')
    dates = []
    while start <= end:
        dates.append(start.strftime('%b%y'))
        if start.month == 12:
            start = start.replace(month=1, year=start.year+1)
        else:
            start = start.replace(month=start.month+1)
    return dates
dates = get_range("Nov20", "Jan21")
print(dates)
Output:
['Nov20', 'Dec20', 'Jan21']
You can use timedelta to step one month (31 days) forward, but make sure you stay on the 1st of the month, otherwise the days might add up and eventually skip a month.
from datetime import datetime
from datetime import timedelta
def get_range(a, b):
    start = datetime.strptime(a, '%b%y')
    end = datetime.strptime(b, '%b%y')
    dates = []
    while start <= end:
        dates.append(start.strftime('%b%y'))
        start = (start + timedelta(days=31)).replace(day=1) # go to 1st of next month
    return dates
dates = get_range("Jan20", "Jan21")
print(dates)
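Another way to step by month, sketched here with a made-up function name (month_labels), is to turn each date into a running month count (year * 12 + month - 1) and walk a plain range of integers. It assumes an English locale, since %b is locale dependent.

```python
from datetime import datetime

def month_labels(a, b, fmt="%b%y"):
    """List the month labels from a to b (inclusive) in the given format."""
    start = datetime.strptime(a, fmt)
    end = datetime.strptime(b, fmt)
    # Count months since year 0 so that year changes need no special case.
    first = start.year * 12 + start.month - 1
    last = end.year * 12 + end.month - 1
    labels = []
    for index in range(first, last + 1):
        year, month0 = divmod(index, 12)
        labels.append(datetime(year, month0 + 1, 1).strftime(fmt))
    return labels

print(month_labels("Nov20", "Jan21"))
```

The integer count avoids both the December special case and the 31-day stepping trick.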

find the interval of a given date in python

I'm struggling with date objects in python.
I have the following data:
from datetime import datetime, timedelta
# date retrieved from a list
ini = [u'2016-01-01']
# transform the ini in a readable string
ini2 = ', '.join(map(str, ini))
# transform the string a date object
date_1 = datetime.strptime(ini2, "%Y-%m-%d")
# number that is the length of the date
l = 365.0
# adding l to ini2
final = date_1 + timedelta(days = l)
Now I'd need to split the whole interval (that is the period from date_1 to final) by an input number (e.g. ts = 4) and, given another input date (e.g. new_date = u'2016-05-19') check in which interval it is (in the example 19th of May is in t2 = 2).
I hope I made myself clear enough.
Thanks
I tried different approaches but none seems the right one.
This might help:
from datetime import datetime, timedelta
def which_interval(date0, delta, date1, n_intervals):
    date0 = datetime.strptime(date0, '%Y-%m-%d')
    delta = timedelta(days = delta)
    date1 = datetime.strptime(date1, '%Y-%m-%d')
    delta1 = date1 - date0
    quadrile = int(((float(delta1.days) / delta.days) * n_intervals))
    return quadrile
# Example: figure out which quarter August 1st is in
interval = which_interval(
    '2016-01-01',
    366,
    '2016-08-01',
    4)
print('2016-08-01 is in interval %d, Q%d' % (interval, interval+1))
Note that this function uses python indices so it will start at quarter 0 and end at quarter 3. If you want 1-based indices (so the answer will be 1, 2, 3, or 4) you would want to add 1 to the result.
The timedelta object supports division, so use floor division by a step and you will get an interval index in range(ts):
new_date = datetime.strptime(u'2016-05-19', "%Y-%m-%d")
ts = 4
step = timedelta(days=l)/ts #divide by the number of steps
interval = (new_date - date_1)//step #get the number this interval is in
so for date_1 <= new_date < date_1 + step interval will be 0, for date_1+step<=new_date < date_1 + step*2 interval will be 1, etc.
This of course is using python style indices so to get the number starting from 1, add one:
interval = (new_date - date_1)//step + 1
EDIT: the functionality to divide timedelta objects was only added in Python 3; in Python 2 you would need to use the .total_seconds() method to do the calculation:
step = timedelta(days=l).total_seconds()/ts #divide by the number of intervals
interval = (new_date - date_1).total_seconds()//step
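Put together with the values from the question, the floor-division approach can be checked like this (Python 3; the variable name length_days stands in for l and is only a renaming for readability):

```python
from datetime import datetime, timedelta

date_1 = datetime.strptime("2016-01-01", "%Y-%m-%d")
new_date = datetime.strptime("2016-05-19", "%Y-%m-%d")
length_days = 365.0  # total length of the period, "l" in the question
ts = 4               # number of intervals

step = timedelta(days=length_days) / ts     # length of one interval
interval = (new_date - date_1) // step + 1  # 1-based interval number
print(interval)
```

19 May is 139 days after 1 January 2016 and one step is 91.25 days, so the date falls in the second interval, as the question expects.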
You could calculate this using the seconds total of the intervals.
import math
from datetime import datetime, timedelta
l = 365.0
factor = 4
date_1 = datetime.strptime('2016-01-01', "%Y-%m-%d")
lookup_dt = datetime.strptime('2016-12-01', "%Y-%m-%d")
def get_interval_num(factor, start_dt, td, lookup_dt):
    final = start_dt + td
    interval = (final - start_dt).total_seconds()
    subinterval = interval / factor
    interval_2 = (lookup_dt - start_dt).total_seconds()
    return int(math.ceil(interval_2 / subinterval))
num = get_interval_num(
    factor=factor,
    start_dt=date_1,
    td= timedelta(days=l),
    lookup_dt=lookup_dt
)
print("The interval number is: %s" % num)
Output would be:
The interval number is: 4
EDIT: clarified variable naming, extended code snippet

How to calculate the time interval between two time strings

I have two times, a start and a stop time, in the format of 10:33:26 (HH:MM:SS). I need the difference between the two times. I've been looking through documentation for Python and searching online and I would imagine it would have something to do with the datetime and/or time modules. I can't get it to work properly and keep finding only how to do this when a date is involved.
Ultimately, I need to calculate the averages of multiple time durations. I got the time differences to work and I'm storing them in a list. I now need to calculate the average. I'm using regular expressions to parse out the original times and then doing the differences.
For the averaging, should I convert to seconds and then average?
Yes, definitely datetime is what you need here. Specifically, the datetime.strptime() method, which parses a string into a datetime object.
from datetime import datetime
s1 = '10:33:26'
s2 = '11:15:49' # for example
FMT = '%H:%M:%S'
tdelta = datetime.strptime(s2, FMT) - datetime.strptime(s1, FMT)
That gets you a timedelta object that contains the difference between the two times. You can do whatever you want with that, e.g. converting it to seconds or adding it to another datetime.
This will return a negative result if the end time is earlier than the start time, for example s1 = 12:00:00 and s2 = 05:00:00. If you want the code to assume the interval crosses midnight in this case (i.e. it should assume the end time is never earlier than the start time), you can add the following lines to the above code:
if tdelta.days < 0:
    tdelta = timedelta(
        days=0,
        seconds=tdelta.seconds,
        microseconds=tdelta.microseconds
    )
(of course you need to include from datetime import timedelta somewhere). Thanks to J.F. Sebastian for pointing out this use case.
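A shorter way to get the same wrap-around behaviour, offered as a sketch rather than as part of the original answer, is to take the difference modulo one day (Python 3). The helper name wrapped_diff is made up for this example.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"
ONE_DAY = timedelta(days=1)

def wrapped_diff(start, end):
    """Difference end - start, assuming end is never more than a day later."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    # The modulo maps a negative difference onto the following day.
    return delta % ONE_DAY

print(wrapped_diff("10:33:26", "11:15:49"))  # same-day interval
print(wrapped_diff("12:00:00", "05:00:00"))  # interval crossing midnight
```

For s1 = 12:00:00 and s2 = 05:00:00 this yields 17 hours instead of minus 7 hours.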
Try this -- it's efficient for timing short-term events. If something takes more than an hour, then the final display probably will want some friendly formatting.
import time
start = time.time()
time.sleep(10) # or do something more productive
done = time.time()
elapsed = done - start
print(elapsed)
The time difference is returned as the number of elapsed seconds.
Here's a solution that supports finding the difference even if the end time is less than the start time (over midnight interval) such as 23:55:00-00:25:00 (a half an hour duration):
#!/usr/bin/env python
from datetime import datetime, time as datetime_time, timedelta
def time_diff(start, end):
    if isinstance(start, datetime_time): # convert to datetime
        assert isinstance(end, datetime_time)
        start, end = [datetime.combine(datetime.min, t) for t in [start, end]]
    if start <= end: # e.g., 10:33:26-11:15:49
        return end - start
    else: # end < start e.g., 23:55:00-00:25:00
        end += timedelta(1) # +day
        assert end > start
        return end - start
for time_range in ['10:33:26-11:15:49', '23:55:00-00:25:00']:
    s, e = [datetime.strptime(t, '%H:%M:%S') for t in time_range.split('-')]
    print(time_diff(s, e))
    assert time_diff(s, e) == time_diff(s.time(), e.time())
Output
0:42:23
0:30:00
time_diff() returns a timedelta object that you can pass (as a part of the sequence) to a mean() function directly e.g.:
#!/usr/bin/env python
from datetime import timedelta
def mean(data, start=timedelta(0)):
    """Find arithmetic average."""
    return sum(data, start) / len(data)
data = [timedelta(minutes=42, seconds=23), # 0:42:23
        timedelta(minutes=30)] # 0:30:00
print(repr(mean(data)))
# -> datetime.timedelta(0, 2171, 500000) # days, seconds, microseconds
# (Python 3.7+ prints datetime.timedelta(seconds=2171, microseconds=500000))
The mean() result is also timedelta() object that you can convert to seconds (td.total_seconds() method (since Python 2.7)), hours (td / timedelta(hours=1) (Python 3)), etc.
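As a small illustration of those conversions (Python 3, using the same two sample durations as above):

```python
from datetime import timedelta

data = [timedelta(minutes=42, seconds=23), timedelta(minutes=30)]
average = sum(data, timedelta(0)) / len(data)

average_seconds = average.total_seconds()     # float number of seconds
average_hours = average / timedelta(hours=1)  # float number of hours
print(average_seconds)
print(average_hours)
```

So averaging the timedelta objects directly and converting at the end gives the same result as converting every duration to seconds first.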
This site says to try:
import datetime as dt
start="09:35:23"
end="10:23:00"
start_dt = dt.datetime.strptime(start, '%H:%M:%S')
end_dt = dt.datetime.strptime(end, '%H:%M:%S')
diff = (end_dt - start_dt)
diff.seconds/60
This forum uses time.mktime()
The structure that represents a time difference in Python is called timedelta. If you have start_time and end_time as datetime objects, you can calculate the difference using the - operator:
diff = end_time - start_time
You should do this before converting to a particular string format (e.g. before start_time.strftime(...)). If you already have a string representation, you need to convert it back to time/datetime using the strptime method.
I like how this guy does it — https://amalgjose.com/2015/02/19/python-code-for-calculating-the-difference-between-two-time-stamps.
Not sure if it has some cons.
But looks neat for me :)
from datetime import datetime
from dateutil.relativedelta import relativedelta
t_a = datetime.now()
t_b = datetime.now()
def diff(t_a, t_b):
    t_diff = relativedelta(t_b, t_a) # later/end time comes first!
    return '{h}h {m}m {s}s'.format(h=t_diff.hours, m=t_diff.minutes, s=t_diff.seconds)
Regarding the question, you still need to use datetime.strptime(), as others said earlier.
Try this
import datetime
import time
start_time = datetime.datetime.now().time().strftime('%H:%M:%S')
time.sleep(5)
end_time = datetime.datetime.now().time().strftime('%H:%M:%S')
total_time=(datetime.datetime.strptime(end_time,'%H:%M:%S') - datetime.datetime.strptime(start_time,'%H:%M:%S'))
print(total_time)
OUTPUT :
0:00:05
import datetime as dt
from dateutil.relativedelta import relativedelta
start = "09:35:23"
end = "10:23:00"
start_dt = dt.datetime.strptime(start, "%H:%M:%S")
end_dt = dt.datetime.strptime(end, "%H:%M:%S")
timedelta_obj = relativedelta(end_dt, start_dt) # later/end time comes first
print(
    timedelta_obj.years,
    timedelta_obj.months,
    timedelta_obj.days,
    timedelta_obj.hours,
    timedelta_obj.minutes,
    timedelta_obj.seconds,
)
result:
0 0 0 0 47 37
Both time and datetime have a date component.
Normally if you are just dealing with the time part you'd supply a default date. If you are just interested in the difference and know that both times are on the same day then construct a datetime for each with the day set to today and subtract the start from the stop time to get the interval (timedelta).
Take a look at the datetime module and the timedelta objects. You should end up constructing a datetime object for the start and stop times, and when you subtract them, you get a timedelta.
You can use pendulum:
import pendulum
t1 = pendulum.parse("10:33:26")
t2 = pendulum.parse("10:43:36")
period = t2 - t1
print(period.seconds)
would output:
610
import datetime
day = int(input("day[1,2,3,..31]: "))
month = int(input("Month[1,2,3,...12]: "))
year = int(input("year[0~2020]: "))
start_date = datetime.date(year, month, day)
day = int(input("day[1,2,3,..31]: "))
month = int(input("Month[1,2,3,...12]: "))
year = int(input("year[0~2020]: "))
end_date = datetime.date(year, month, day)
time_difference = end_date - start_date
age = time_difference.days
print("Total days: " + str(age))
Concise if you are just interested in the time elapsed that is under 24 hours. You can format the output as needed in the return statement :
import datetime
def elapsed_interval(start,end):
    elapsed = end - start
    # "mins" rather than "min" to avoid shadowing the built-in min()
    mins,secs=divmod(elapsed.days * 86400 + elapsed.seconds, 60)
    hour, minutes = divmod(mins, 60)
    return '%.2d:%.2d:%.2d' % (hour,minutes,secs)
if __name__ == '__main__':
    time_start=datetime.datetime.now()
    """ do your process """
    time_end=datetime.datetime.now()
    total_time=elapsed_interval(time_start,time_end)
Usually, you have more than one case to deal with and perhaps have it in a pd.DataFrame(data) format. Then:
import pandas as pd
df['duration'] = pd.to_datetime(df['stop time']) - pd.to_datetime(df['start time'])
gives you the time difference without any manual conversion.
Taken from Convert DataFrame column type from string to datetime.
If you are lazy and do not mind the overhead of pandas, then you could do this even for just one entry.
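A minimal sketch of that idea on made-up rows follows (the column names come from the snippet above; the sample times and the explicit format string are assumptions). It also shows the averaging the question asks about.

```python
import pandas as pd

# Hypothetical sample data with time-only strings.
df = pd.DataFrame({
    "start time": ["10:33:26", "09:35:23"],
    "stop time": ["11:15:49", "10:23:00"],
})

fmt = "%H:%M:%S"
df["duration"] = (pd.to_datetime(df["stop time"], format=fmt)
                  - pd.to_datetime(df["start time"], format=fmt))

seconds = df["duration"].dt.total_seconds()  # durations as float seconds
average = df["duration"].mean()              # a pandas Timedelta
print(seconds.tolist())
print(average)
```

Intervals that cross midnight would still come out negative here and would need the same wrap-around handling as in the datetime answers.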
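Here is code for the case where the string also contains days, e.g. -1 day 32:43:02. It converts the value to total minutes, assuming the variable time holds that string (note that the sign is dropped):
print(
    (int(time.replace('-', '').split(' ')[0]) * 24) * 60
    + (int(time.split(' ')[-1].split(':')[0]) * 60)
    + int(time.split(' ')[-1].split(':')[1])
)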
