How can I get the list of intervals (as 2-tuples) of x minutes between two datetimes? It should be easy, but even pandas.date_range() returns only exact, evenly spaced steps and drops the end time. For example:
In:
from_date = '2023-01-05 16:01:58'
to_date = '2023-01-05 17:30:15'
x = 30
Expected:
[
('2023-01-05 16:01:58', '2023-01-05 16:30:00'),
('2023-01-05 16:30:00', '2023-01-05 17:00:00'),
('2023-01-05 17:00:00', '2023-01-05 17:30:00'),
('2023-01-05 17:30:00', '2023-01-05 17:30:15')
]
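For illustration, here is the date_range behaviour described above: the steps are anchored at the start time and the end time is dropped.
import pandas as pd

# Steps every 30 minutes from the start; 17:30:15 never appears.
print(pd.date_range('2023-01-05 16:01:58', '2023-01-05 17:30:15', freq='30min'))
# DatetimeIndex(['2023-01-05 16:01:58', '2023-01-05 16:31:58',
#                '2023-01-05 17:01:58'], ...)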
Here is one way to do it with Pandas to_datetime, DateOffset and dt.ceil:
import pandas as pd
from pandas.tseries.offsets import DateOffset
from_date = pd.to_datetime("2023-01-05 16:01:58")
to_date = pd.to_datetime("2023-01-05 17:30:15")
freq = "30min"
intervals = []
start = pd.Series(from_date)
while True:
    # Add one minute before ceiling so a start that already sits on a
    # 30-minute boundary still advances to the next mark.
    end = (start + DateOffset(minutes=1)).dt.ceil(freq)
    if end[0] >= to_date:
        intervals.append((str(start[0]), str(to_date)))
        break
    intervals.append((str(start[0]), str(end[0])))
    start = end
Then:
print(intervals)
# Output
[
('2023-01-05 16:01:58', '2023-01-05 16:30:00'),
('2023-01-05 16:30:00', '2023-01-05 17:00:00'),
('2023-01-05 17:00:00', '2023-01-05 17:30:00'),
('2023-01-05 17:30:00', '2023-01-05 17:30:15')
]
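The same idea also works directly on plain pd.Timestamp objects, without wrapping the start in a Series. A minimal sketch of that variant (the split_intervals name is just for illustration):
import pandas as pd

def split_intervals(from_date, to_date, freq="30min"):
    start, to_date = pd.Timestamp(from_date), pd.Timestamp(to_date)
    intervals = []
    while True:
        # Adding one minute before ceiling keeps the boundary moving forward
        # even when `start` already sits exactly on a 30-minute mark.
        end = (start + pd.Timedelta(minutes=1)).ceil(freq)
        if end >= to_date:
            intervals.append((str(start), str(to_date)))
            break
        intervals.append((str(start), str(end)))
        start = end
    return intervals

print(split_intervals("2023-01-05 16:01:58", "2023-01-05 17:30:15"))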
I want to randomly choose a date between 2021/1/1 and 2021/12/31. The process might go as follows:
generate a list of dates from 2021/1/1 to 2021/12/31 (365 elements in total);
randomly choose a date from the list.
Thanks!
As you tagged the question pandas, here is a pandas way:
out = (pd.date_range('2021/1/1', '2021/12/31')  # all dates in the range
         .strftime('%Y/%m/%d')                  # format as strings
         .to_series().sample(n=1)               # select 1 random date
         .squeeze()                             # unwrap to a scalar
       )
A variant, setting the start date and a number of periods instead of an end date:
out = (pd.date_range('2021/1/1', periods=365)  # 365 consecutive dates from the start
         .strftime('%Y/%m/%d')                 # format as strings
         .to_series().sample(n=1)              # select 1 random date
         .squeeze()                            # unwrap to a scalar
       )
example output: '2021/10/09'
list of random dates
You can easily adapt this to generate several dates:
out = (pd
.date_range('2021/1/1', periods=365)
.strftime('%Y/%m/%d').to_series()
.sample(n=10)
.to_list()
)
example output:
['2021/04/06', '2021/09/11', '2021/08/02', '2021/09/17', '2021/12/30',
'2021/10/27', '2021/03/09', '2021/02/27', '2021/11/28', '2021/01/18']
Here's another way, drawing random integers between the two epoch timestamps:
import pandas as pd
import numpy as np
date1 = pd.Timestamp("2021/01/01")
date2 = pd.Timestamp("2021/12/31")
print(date1.timestamp())
print(date2.timestamp())
n = 3  # How many samples to take
# Note: np.random.randint's upper bound is exclusive, so 2021-12-31 itself is never drawn.
out = pd.to_datetime(
    np.random.randint(date1.timestamp(), date2.timestamp(), n), unit="s"
).normalize()
print(out)
Output:
1609459200.0
1640908800.0
DatetimeIndex(['2021-04-13', '2021-01-17', '2021-08-24'], dtype='datetime64[ns]', freq=None)
from datetime import date, timedelta
from random import sample

start = date(2021, 1, 1)
end = date(2021, 12, 31)

# Build the full list of days, then sample from it.
dates = []
day = start
while day <= end:
    dates.append(day)
    day = day + timedelta(days=1)

sample(dates, 1)
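For a single random date, a shorter standard-library variant is also possible (a small sketch along the same lines, using random.randrange for the day offset):
from datetime import date, timedelta
from random import randrange

start = date(2021, 1, 1)
# 2021 has 365 days, so offsets 0..364 cover the whole year.
random_date = start + timedelta(days=randrange(365))
print(random_date.strftime('%Y/%m/%d'))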
Could you please help me with the following task?
I need to remove the weekend days from the dataframe (attached link: dataframe_running_example). I can get a list of all the weekend days between the min and max dates pulled out from the events, however I cannot filter the df based on the "list_excluded" list.
from datetime import timedelta, date
import pandas as pd
#Data Loading
df= pd.read_csv("running-example.csv", delimiter=";")
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["timestamp_date"] = df["timestamp"].dt.date
def daterange(date1, date2):
    for n in range(int((date2 - date1).days) + 1):
        yield date1 + timedelta(n)
#start_dt & end_dt
start_dt = df["timestamp"].min()
end_dt = df["timestamp"].max()
print("Start_dt: {} & end_dt: {}".format(start_dt, end_dt))
weekdays = [6, 7]  # isoweekday: 6 = Saturday, 7 = Sunday
#List comprehension
list_excluded = [dt for dt in daterange(start_dt, end_dt) if dt.isoweekday() in weekdays]
df.info()
df_excluded = pd.DataFrame(list_excluded).rename({0: 'timestamp_excluded'}, axis='columns')
df_excluded["ts_excluded"] = df_excluded["timestamp_excluded"].dt.date
df[~df["timestamp_date"].isin(df_excluded["ts_excluded"])]
Oh, the issue has been resolved. I used the pd.bdate_range() function.
from datetime import timedelta, date
import pandas as pd
import numpy as np
#Data loading
df= pd.read_csv("running-example.csv", delimiter=";")
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["timestamp_date"] = df["timestamp"].dt.date
#Timestamp range: start_dt & end_dt
start_dt = df["timestamp"].min()
end_dt = df["timestamp"].max()
print("Start_dt: {} & end_dt: {}".format(start_dt, end_dt))
bus_days = pd.bdate_range(start_dt, end_dt)
df["timestamp_date"] = pd.to_datetime(df["timestamp_date"])
df['Is_Business_Day'] = df['timestamp_date'].isin(bus_days)
df[df["Is_Business_Day"]]
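As a side note, the same weekday filter can be written without building a business-day index at all; a one-line sketch using dt.dayofweek (0 = Monday ... 6 = Sunday), reusing the df from the code above:
# Keep only weekday rows: Monday (0) through Friday (4).
df_weekdays = df[df["timestamp"].dt.dayofweek < 5]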
Assume you have two pandas datetimes: from_date and to_date. I need a function that splits the range into folds of n months (let's say n = 3). For example:
import pandas as pd
from_date = pd.to_datetime("2020-02-15")
to_date = pd.to_datetime("2020-05-20")
should be split into 2 folds:
{
"1": {"from_date": 2020-02-15, "to_date": 2020-05-15},
"2": {"from_date": 2020-05-16, "to_date": 2020-05-20}
}
Each fold needs to satisfy the condition:
from_date + pd.DateOffset(months=2) >= to_date. So it is not about the number of days between the start and end dates.
What is the most pythonic way to do this? Is there something in pandas?
My solution:
import pandas as pd

def check_and_split(from_date, to_date):
    from_date = pd.to_datetime(from_date)
    to_date = pd.to_datetime(to_date)
    done = False
    fold = 0
    result = {}
    start = from_date
    end = to_date
    while not done:
        if start + pd.DateOffset(months=2) > to_date:
            done = True
            end = to_date
        else:
            end = start + pd.DateOffset(months=3)
        result[fold] = {"from_date": start, "to_date": end}
        if not done:
            start = end + pd.DateOffset(days=1)
            fold += 1
    return result
Isn't there a more pythonic way? Something in pandas maybe?
Replace the print statement at the end according to the way you wish to use the two dates!
According to "How do I calculate the date six months from the current date using the datetime Python module?", dateutil.relativedelta can handle months with and without a 31st day!
import pandas as pd
from dateutil.relativedelta import relativedelta

from_date = pd.to_datetime("2020-02-15")
to_date = pd.to_datetime("2020-05-20")

fold = 0
result = {}
while from_date + relativedelta(months=+3) < to_date:
    curfrom = from_date  # retain the current 'from_date'
    from_date = from_date + relativedelta(months=+3)
    result[fold] = {"from_date": curfrom, "to_date": from_date}
    fold = fold + 1
    from_date = from_date + relativedelta(days=+1)  # so that the next 'from_date' starts 1 day later
# Last (partial) fold: starts where the loop left off and ends at to_date.
result[fold] = {"from_date": from_date, "to_date": to_date}
print(result)
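For a slightly more pandas-flavoured take on the same split, the fold boundaries can be generated with date_range and a DateOffset step and then paired up. A sketch (month_folds is an illustrative name, and exact-boundary edge cases, such as to_date landing exactly on a fold boundary, may need extra care):
import pandas as pd

def month_folds(from_date, to_date, n=3):
    from_date, to_date = pd.to_datetime(from_date), pd.to_datetime(to_date)
    # Boundaries every n months starting at from_date, e.g. 2020-02-15, 2020-05-15.
    bounds = pd.date_range(from_date, to_date, freq=pd.DateOffset(months=n))
    starts = [from_date] + [b + pd.DateOffset(days=1) for b in bounds[1:]]
    ends = list(bounds[1:]) + [to_date]
    return {i: {"from_date": s, "to_date": e}
            for i, (s, e) in enumerate(zip(starts, ends))}

print(month_folds("2020-02-15", "2020-05-20"))
# fold 0: 2020-02-15 -> 2020-05-15, fold 1: 2020-05-16 -> 2020-05-20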
I'm trying to use pandas/Python to load a dataframe and count outage minutes that occur between 09:00 and 21:00. I've been trying to get this per site but have only been able to get a single overall sum. An example dataframe is below; I'm trying to produce the data in the third column:
import pandas as pd
from pandas import Timestamp
import pytz
from pytz import all_timezones
import datetime
from datetime import time
from threading import Timer
import time as t
import xlrd
import xlwt
import numpy as np
import xlsxwriter
data = pd.read_excel('lab.xlsx')
data['outage'] = data['Up'] - data['Down']
data['outage'] = data['outage'] / np.timedelta64(1, 'm')
s = data.apply(lambda row: pd.date_range(row['Down'], row['Up'], freq='T'), axis=1).explode()
#returns total amount of downtime between 9-21 but not by site
total = s.dt.time.between(time(9), time(21)).sum()
#range of index[0] for s
slist = range(0, 20)
#due to the way this loop iterates, it returns the number of minutes between down and up
for num in slist:
    Duration = s[num].count()
    print(Duration)
#percentage of minutes during business hours
percentage = (total / sum(data['duration'])) * 100
print('The percentage of outage minutes during business hours is:', percentage)
#secondary function to test
def by_month():
    s = data.apply(lambda row: pd.date_range(row['Adjusted_Down'], row['Adjusted_Up'], freq='T'), axis=1).explode()
    downtime = pd.DataFrame({
        'Month': s.astype('datetime64[M]'),
        'IsDayTime': s.dt.time.between(time(9), time(21))
    })
    return downtime.groupby('Month')['IsDayTime'].sum()
#data.to_excel('delete.xls', 'a+')
You can use pandas' DatetimeIndex to convert the difference between your up time and down time into hours, minutes, and seconds. Then you can multiply the hours by 60 and add the minutes to get your total down time in minutes. See the example below:
import datetime as dt
import pandas as pd

date_format = "%m-%d-%Y %H:%M:%S"
# Example up and down times to insert into dataframe
down1 = dt.datetime.strptime('8-01-2019 00:00:00', date_format)
up1 = dt.datetime.strptime('8-01-2019 00:20:00', date_format)
down2 = dt.datetime.strptime('8-01-2019 02:26:45', date_format)
up2 = dt.datetime.strptime('8-01-2019 03:45:04', date_format)
down3 = dt.datetime.strptime('8-01-2019 06:04:00', date_format)
up3 = dt.datetime.strptime('8-01-2019 06:06:34', date_format)
time_df = pd.DataFrame([{'down':down1,'up':up1},{'down':down2,'up':up2},{'down':down3,'up':up3},])
# Subtract your down column from your up column and convert the result to a datetime index
down_time = pd.DatetimeIndex(time_df['up'] - time_df['down'])
# Access your new index, converting the hours to minutes and adding minutes to get down time in minutes
down_time_min = down_time.hour * 60 + down_time.minute
# Apply above array to new dataframe column
time_df['down_time'] = down_time_min
time_df
This is the result for this example:
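If the goal from the original question is the business-hours count per site rather than a single total, a rough sketch along the lines of the question's own explode() approach could look like this. It reuses the question's data frame, and the 'Site' column name is an assumption; substitute whatever identifies a site in the real data:
from datetime import time
import pandas as pd

# One row per outage minute; explode() keeps the original row index,
# so each minute can be traced back to its source row (and thus its site).
s = data.apply(lambda row: pd.date_range(row['Down'], row['Up'], freq='T'),
               axis=1).explode()
s = pd.to_datetime(s)  # explode() leaves object dtype; restore datetime64
business = s.dt.time.between(time(9), time(21))
# Map each minute's row label back to its (assumed) 'Site' value and sum the booleans.
per_site = business.groupby(s.index.map(data['Site'])).sum()
print(per_site)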
I have two times and I want to make a list of all the hours between them using the same format in Python
from= '2016-12-02T11:00:00.000Z'
to= '2017-06-06T07:00:00.000Z'
hours=to-from
so the result will be something like this
2016-12-02T11:00:00.000Z
2016-12-02T12:00:00.000Z
2016-12-02T13:00:00.000Z
..... and so on
How can I do this and what kind of library should I use?
If possible I would recommend using pandas.
import pandas
time_range = pandas.date_range('2016-12-02T11:00:00.000Z', '2017-06-06T07:00:00.000Z', freq='H')
If you need strings then use the following:
timestamps = [str(x) + 'Z' for x in time_range]
# Output
# ['2016-12-02 11:00:00+00:00Z',
# '2016-12-02 12:00:00+00:00Z',
# '2016-12-02 13:00:00+00:00Z',
# '2016-12-02 14:00:00+00:00Z',
# '2016-12-02 15:00:00+00:00Z',
# '2016-12-02 16:00:00+00:00Z',
# ...]
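If the strings need to match the original '...T11:00:00.000Z' layout exactly, one option is to format each timestamp manually (a sketch; strftime's %f gives microseconds, so the last three digits are trimmed to keep milliseconds):
timestamps = [x.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z' for x in time_range]
# ['2016-12-02T11:00:00.000Z', '2016-12-02T12:00:00.000Z', ...]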
A simpler solution using the standard library's datetime package:
from datetime import datetime, timedelta
DATE_TIME_STRING_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'
from_date_time = datetime.strptime('2016-12-02T11:00:00.000Z',
                                   DATE_TIME_STRING_FORMAT)
to_date_time = datetime.strptime('2017-06-06T07:00:00.000Z',
                                 DATE_TIME_STRING_FORMAT)

date_times = [from_date_time.strftime(DATE_TIME_STRING_FORMAT)]
date_time = from_date_time
while date_time < to_date_time:
    date_time += timedelta(hours=1)
    date_times.append(date_time.strftime(DATE_TIME_STRING_FORMAT))
will give us
>>> date_times
['2016-12-02T11:00:00.000000Z',
'2016-12-02T12:00:00.000000Z',
'2016-12-02T13:00:00.000000Z',
'2016-12-02T14:00:00.000000Z',
'2016-12-02T15:00:00.000000Z',
'2016-12-02T16:00:00.000000Z',
'2016-12-02T17:00:00.000000Z',
'2016-12-02T18:00:00.000000Z',
'2016-12-02T19:00:00.000000Z',
'2016-12-02T20:00:00.000000Z',
...]
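For what it's worth, the same list can also be built without the explicit while loop, assuming (as in this example) the span is a whole number of hours:
from datetime import datetime, timedelta

DATE_TIME_STRING_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'
from_date_time = datetime.strptime('2016-12-02T11:00:00.000Z', DATE_TIME_STRING_FORMAT)
to_date_time = datetime.strptime('2017-06-06T07:00:00.000Z', DATE_TIME_STRING_FORMAT)

# Number of whole hours between the two datetimes, endpoints included.
total_hours = int((to_date_time - from_date_time).total_seconds() // 3600)
date_times = [(from_date_time + timedelta(hours=h)).strftime(DATE_TIME_STRING_FORMAT)
              for h in range(total_hours + 1)]
Note that %f prints six digits (microseconds); if the three-digit millisecond form of the input matters, the last three digits before the 'Z' can be trimmed as in the pandas answer above.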