I'm trying to use pandas / python to load a dataframe and count outage minutes that occur between 0900-2100. I've been trying to get this per site but have only been able to get a sum value. Example dataframe is below. I'm trying to produce the data in the third column:
import pandas as pd
from pandas import Timestamp
import pytz
from pytz import all_timezones
import datetime
from datetime import time
from threading import Timer
import time as t
import xlrd
import xlwt
import numpy as np
import xlsxwriter
data = pd.read_excel('lab.xlsx')
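# outage length is the time between going down and coming back up
data['outage'] = data['Up'] - data['Down']
# express the outage length in minutes
data['outage'] = data['outage']/np.timedelta64(1,'m')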
s = data.apply(lambda row: pd.date_range(row['Down'], row['Up'], freq='T'), axis=1).explode()
#returns total amount of downtime between 9-21 but not by site
total = s.dt.time.between(time(9), time(21)).sum()
#range of index[0] for s
slist = range(0, 20)
#due to the way this loop iterates, it returns the number of minutes between down and up
for num in slist:
    Duration = s[num].count()
    print(Duration)
#percentage of minutes during business hours
percentage = (total / sum(data['duration'])) * 100
print('The percentage of outage minutes during business hours is:', percentage)
#secondary function to test
def by_month():
    s = data.apply(lambda row: pd.date_range(row['Adjusted_Down'], row['Adjusted_Up'], freq='T'), axis=1).explode()
    downtime = pd.DataFrame({
        'Month': s.astype('datetime64[M]'),
        'IsDayTime': s.dt.time.between(time(9), time(21))
    })
    # return the per-month count of daytime minutes so the caller can use it
    return downtime.groupby('Month')['IsDayTime'].sum()
#data.to_excel('delete.xls', 'a+')
You can use pandas' DatetimeIndex function to convert the difference between your down time and up time into hours, minutes, and seconds. Then you can multiply the hours by 60 and add minutes to get your total down time in minutes. See example below:
import datetime as dt
import pandas as pd
date_format = "%m-%d-%Y %H:%M:%S"
# Example up and down times to insert into dataframe
down1 = dt.datetime.strptime('8-01-2019 00:00:00', date_format)
up1 = dt.datetime.strptime('8-01-2019 00:20:00', date_format)
down2 = dt.datetime.strptime('8-01-2019 02:26:45', date_format)
up2 = dt.datetime.strptime('8-01-2019 03:45:04', date_format)
down3 = dt.datetime.strptime('8-01-2019 06:04:00', date_format)
up3 = dt.datetime.strptime('8-01-2019 06:06:34', date_format)
time_df = pd.DataFrame([{'down':down1,'up':up1},{'down':down2,'up':up2},{'down':down3,'up':up3},])
# Subtract your down column from your up column and convert the result to a datetime index
down_time = pd.DatetimeIndex(time_df['up'] - time_df['down'])
# Access your new index, converting the hours to minutes and adding minutes to get down time in minutes
down_time_min = down_time.hour * 60 + down_time.minute
# Apply above array to new dataframe column
time_df['down_time'] = down_time_min
time_df
This is the result for this example: the down_time column holds 20, 78 and 2 minutes for the three rows (leftover seconds are dropped).
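Recent pandas releases may refuse to build a DatetimeIndex from timedelta values, so here is a minimal sketch of the same calculation that stays with timedeltas. It reuses the example timestamps above; the use of `dt.total_seconds()` is a swapped-in technique, not the method of the answer itself.

```python
import pandas as pd

# Same three example outages as above
time_df = pd.DataFrame({
    "down": pd.to_datetime(["2019-08-01 00:00:00", "2019-08-01 02:26:45", "2019-08-01 06:04:00"]),
    "up": pd.to_datetime(["2019-08-01 00:20:00", "2019-08-01 03:45:04", "2019-08-01 06:06:34"]),
})

# Subtracting two datetime columns gives a timedelta column
outage = time_df["up"] - time_df["down"]

# Whole minutes of down time (leftover seconds are dropped)
time_df["down_time"] = (outage.dt.total_seconds() // 60).astype(int)
print(time_df)
```

Unlike the hour and minute arithmetic, this also keeps working for outages longer than 24 hours.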
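For the per-site part of the question, one possible approach is to compute, for each outage, how much of it overlaps the 09:00-21:00 window on each day, and then sum per site. This is only a sketch: the column name `Site`, the sample rows and the helper `business_minutes` are assumptions made for illustration, since the original dataframe is not shown.

```python
import pandas as pd

# Hypothetical sample data; the "Site" column name is an assumption
data = pd.DataFrame({
    "Site": ["A", "A", "B"],
    "Down": pd.to_datetime(["2019-08-01 08:50", "2019-08-01 20:55", "2019-08-01 12:00"]),
    "Up": pd.to_datetime(["2019-08-01 09:10", "2019-08-01 21:10", "2019-08-01 12:30"]),
})

def business_minutes(down, up, start_hour=9, end_hour=21):
    """Minutes of the outage [down, up] that fall between start_hour and end_hour."""
    total = pd.Timedelta(0)
    day = down.normalize()  # midnight of the day the outage starts
    while day <= up:
        window_start = day + pd.Timedelta(hours=start_hour)
        window_end = day + pd.Timedelta(hours=end_hour)
        # Overlap between the outage and this day's business window
        overlap = min(up, window_end) - max(down, window_start)
        if overlap > pd.Timedelta(0):
            total += overlap
        day += pd.Timedelta(days=1)
    return total.total_seconds() / 60

data["business_minutes"] = [
    business_minutes(down, up) for down, up in zip(data["Down"], data["Up"])
]

# Sum per site instead of over the whole dataframe
per_site = data.groupby("Site")["business_minutes"].sum()
print(per_site)  # A -> 15.0, B -> 30.0
```

Working with overlaps avoids expanding every outage into one row per minute, which can get large for long outages.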
I have a pandas dataframe df in which I have a column named time_column which consists of timestamp objects. I want to calculate the number of seconds elapsed from the start of the day i.e from 00:00:00 Hrs for each timestamp. How can that be done?
You can use pandas.Series.dt.total_seconds
df['time_column'] = pd.to_datetime(df['time_column'])
df['second'] = pd.to_timedelta(df['time_column'].dt.time.astype(str)).dt.total_seconds()
Use df['time_column'] to get the time column. Then loop over its values and convert each one to seconds since midnight, something like:
time_elapsed = []
for ts in df['time_column']:
    # seconds elapsed since 00:00:00 of the same day
    time_elapsed.append(ts.hour*60*60 + ts.minute*60 + ts.second)
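A vectorised alternative, sketched under the assumption that `time_column` already has a datetime dtype, is to subtract each timestamp's own midnight (obtained with `dt.normalize()`) and read off the total seconds. The sample values are made up for illustration.

```python
import pandas as pd

# Made-up sample timestamps
df = pd.DataFrame({
    "time_column": pd.to_datetime([
        "2021-03-01 00:00:10",
        "2021-03-01 09:25:59",
        "2021-03-02 23:59:59",
    ])
})

# Midnight of each timestamp's own day
midnight = df["time_column"].dt.normalize()

# Seconds elapsed since 00:00:00 of that day
df["second"] = (df["time_column"] - midnight).dt.total_seconds()
print(df)
```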
Could you please help me with the following problem?
I need to remove the weekend days from the dataframe (attached link: dataframe_running_example). I can get a list of all the weekend days between the min and max dates pulled from the events; however, I cannot filter the df based on the "list_excluded" list.
from datetime import timedelta, date
import pandas as pd
#Data Loading
df= pd.read_csv("running-example.csv", delimiter=";")
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["timestamp_date"] = df["timestamp"].dt.date
def daterange(date1, date2):
    for n in range(int((date2 - date1).days) + 1):
        yield date1 + timedelta(n)
#start_dt & end_dt
start_dt = df["timestamp"].min()
end_dt = df["timestamp"].max()
print("Start_dt: {} & end_dt: {}".format(start_dt, end_dt))
weekdays = [6,7]
#List comprehension
list_excluded = [dt for dt in daterange(start_dt, end_dt) if dt.isoweekday() in weekdays]
df.info()
df_excluded = pd.DataFrame(list_excluded).rename({0: 'timestamp_excluded'}, axis='columns')
df_excluded["ts_excluded"] = df_excluded["timestamp_excluded"].dt.date
df[~df["timestamp_date"].isin(df_excluded["ts_excluded"])]
OK, the issue has been resolved. I used the pd.bdate_range() function.
from datetime import timedelta, date
import pandas as pd
import numpy as np
#Load the data
df= pd.read_csv("running-example.csv", delimiter=";")
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["timestamp_date"] = df["timestamp"].dt.date
#Timestamp range: start_dt & end_dt
start_dt = df["timestamp"].min()
end_dt = df["timestamp"].max()
print("Start_dt: {} & end_dt: {}".format(start_dt, end_dt))
bus_days = pd.bdate_range(start_dt, end_dt)
df["timestamp_date"] = pd.to_datetime(df["timestamp_date"])
df['Is_Business_Day'] = df['timestamp_date'].isin(bus_days)
df[df["Is_Business_Day"]]
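If only Saturdays and Sundays need to go (no holidays), a shorter route is to filter on the day of the week directly. The sketch below uses made-up timestamps; `dt.dayofweek` numbers Monday as 0 and Sunday as 6.

```python
import pandas as pd

# Made-up sample: Friday, Saturday, Sunday, Monday
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-03-05 10:00",
        "2021-03-06 11:30",
        "2021-03-07 09:15",
        "2021-03-08 16:45",
    ])
})

# Keep only Monday (0) to Friday (4)
weekdays_only = df[df["timestamp"].dt.dayofweek < 5]
print(weekdays_only)
```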
I want to convert a given time, let's say 09:25:59 (hh:mm:ss) into an integer value in pandas. My requirement is that I should know the number of minutes elapsed from midnight of that day. For example, 09:25:59 corresponds to 565 minutes. How can I do that in Python Pandas?
It is easier to show with a sample dataframe:
import pandas as pd
import datetime
data = {"time":["09:25:59", "09:35:59"],
"minutes_from_midnight": ""}
df = pd.DataFrame(data) # create df
# you don't seem to care about the date, so keep only the time from the datetime dtype
# create a datetime.datetime object to use for timedeltas
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")
df['time'] = [datetime.datetime.time(d) for d in df['time']] # keep only the time
df
You can then do:
# your comparison value
midnight= "00:00:00"
# https://stackoverflow.com/a/48967889/13834173
midnight = datetime.datetime.strptime(midnight, "%H:%M:%S").time()
# fill the empty column with the timedeltas
from datetime import datetime, date
df["minutes_from_midnight"] = df["time"].apply(
    lambda x: int(
        (datetime.combine(date.today(), x) - datetime.combine(date.today(), midnight)).total_seconds() // 60
    )
)
df
# example with the 1st row time value
target_time = df["time"][0]
# using datetime.combine
delta= datetime.combine(date.today(), target_time) - datetime.combine(date.today(), midnight)
seconds = delta.total_seconds()
minutes = seconds//60
print(int(minutes)) # 565
from datetime import timedelta
hh, mm, ss = 9, 25, 59
delta = timedelta(hours=hh, minutes=mm, seconds=ss)
total_seconds = delta.total_seconds()
minutes = int(total_seconds // 60)
minutes now holds the minutes elapsed since midnight.
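When the times sit in a dataframe column as "hh:mm:ss" strings, the same idea can be applied to the whole column at once with `pd.to_timedelta`. This is a minimal sketch using the two sample values from above.

```python
import pandas as pd

df = pd.DataFrame({"time": ["09:25:59", "09:35:59"]})

# Parse "hh:mm:ss" strings as durations since midnight, then take whole minutes
elapsed = pd.to_timedelta(df["time"])
df["minutes_from_midnight"] = (elapsed.dt.total_seconds() // 60).astype(int)
print(df)
```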
I have a DataFrame with dates in the index. I make a Subset of the DataFrame for every Day. Is there any way to write a function or a loop to generate these steps automatically?
import json
import requests
import pandas as pd
from pandas.io.json import json_normalize
import datetime as dt
#Get the channel feeds from ThingSpeak
response = requests.get("https://api.thingspeak.com/channels/518038/feeds.json?api_key=XXXXXX&results=500")
#Convert Json object to Python object
response_data = response.json()
channel_head = response_data["channel"]
channel_bottom = response_data["feeds"]
#Create DataFrame with Pandas
df = pd.DataFrame(channel_bottom)
#rename Parameters
df = df.rename(columns={"field1":"PM 2.5","field2":"PM 10"})
#Drop all entries with at least one NaN
df = df.dropna(how="any")
#Convert time to datetime object
df["created_at"] = df["created_at"].apply(lambda x:dt.datetime.strptime(x,"%Y-%m-%dT%H:%M:%SZ"))
#Set dates as Index
df = df.set_index(keys="created_at")
#Make a DataFrame for every day
df_2018_12_07 = df.loc['2018-12-07']
df_2018_12_06 = df.loc['2018-12-06']
df_2018_12_05 = df.loc['2018-12-05']
df_2018_12_04 = df.loc['2018-12-04']
df_2018_12_03 = df.loc['2018-12-03']
df_2018_12_02 = df.loc['2018-12-02']
Supposing that you run this on the first day of the following week (that is, exporting Monday to Sunday on the next Monday), you can do it as follows:
from datetime import date, timedelta
today = date.today()
day = today - timedelta(days=7) # so, if today is Monday, we start on the Monday before
while day < today:
    df1 = df.loc[str(day)]
    df1.to_csv('mypath'+str(day)+'.csv') # so that export files have different names
    day = day + timedelta(days=1)
To select only the current day's rows, you can use:
from datetime import date
today = str(date.today())
df = df.loc[today]
and schedule the script using any scheduler such as crontab.
You can create dictionary of DataFrames - then select by keys for DataFrame:
dfs = dict(tuple(df.groupby(df.index.strftime('%Y-%m-%d'))))
print (dfs['2018-12-07'])
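The same grouping can also drive a loop, which is handy when each day should be processed or exported in turn. The sketch below uses a small made-up frame in place of the ThingSpeak data and groups on the normalized (midnight) index.

```python
import pandas as pd

# Made-up stand-in for the sensor data, indexed by timestamp
index = pd.to_datetime(["2018-12-06 08:00", "2018-12-06 20:00", "2018-12-07 09:30"])
df = pd.DataFrame({"PM 2.5": [10.0, 12.5, 9.1]}, index=index)

daily = {}
# One group per calendar day; `day` is the midnight Timestamp of that day
for day, day_df in df.groupby(df.index.normalize()):
    key = str(day.date())  # e.g. '2018-12-06'
    daily[key] = day_df
    print(key, len(day_df), "rows")
```

Inside the loop, `day_df.to_csv(...)` could replace the dictionary assignment if one file per day is wanted.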
Hey: I spent several hours trying to do a quite simple thing, but couldn't figure it out.
I have a dataframe with a column, df['Time'], which contains times starting from 0 and going up to 20 minutes, like this:
1:10,10
1:16,32
3:03,04
The first field is minutes, the second is seconds, and the third is milliseconds (only two digits, so effectively hundredths of a second).
Is there a way to automatically transform that column into seconds with Pandas, without making that column the time index of the series?
I already tried the following, but it won't work:
pd.to_datetime(df['Time']).convert('s') # AttributeError: 'Series' object has no attribute 'convert'
If the only way is to parse the time manually, just point that out and I will prepare a proper, detailed answer to this question myself; don't waste your time =)
Thank you!
Code:
import pandas as pd
import numpy as np
import datetime
df = pd.DataFrame({'Time':['1:10,10', '1:16,32', '3:03,04']})
df['time'] = df.Time.apply(lambda x: datetime.datetime.strptime(x,'%M:%S,%f'))
df['timedelta'] = df.time - datetime.datetime.strptime('00:00,0','%M:%S,%f')
df['secs'] = df['timedelta'].apply(lambda x: x / np.timedelta64(1, 's'))
print(df)
Output:
      Time                       time       timedelta    secs
0  1:10,10 1900-01-01 00:01:10.100000 00:01:10.100000   70.10
1  1:16,32 1900-01-01 00:01:16.320000 00:01:16.320000   76.32
2  3:03,04 1900-01-01 00:03:03.040000 00:03:03.040000  183.04
If you also have negative time deltas:
import pandas as pd
import numpy as np
import datetime
import re
regex = re.compile(r"(?P<minus>-)?((?P<minutes>\d+):)?(?P<seconds>\d+)(,(?P<centiseconds>\d{2}))?")
def parse_time(time_str):
    parts = regex.match(time_str)
    if not parts:
        return
    parts = parts.groupdict()
    time_params = {}
    for (name, param) in parts.items():
        if param and (name != 'minus'):
            time_params[name] = int(param)
    # timedelta has no centiseconds argument, so convert to milliseconds
    time_params['milliseconds'] = time_params.pop('centiseconds', 0) * 10
    return (-1 if parts['minus'] else 1) * datetime.timedelta(**time_params)
df = pd.DataFrame({'Time':['-1:10,10', '1:16,32', '3:03,04']})
df['timedelta'] = df.Time.apply(lambda x: parse_time(x))
df['secs'] = df['timedelta'].apply(lambda x: x / np.timedelta64(1, 's'))
print(df)
Output:
       Time         timedelta    secs
0  -1:10,10 -00:01:10.100000  -70.10
1   1:16,32  00:01:16.320000   76.32
2   3:03,04  00:03:03.040000  183.04
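If a row-by-row `apply` feels heavy, the parsing can also be sketched with `Series.str.extract`, which splits the whole column into named groups in one pass. The group names and the arithmetic below are illustrative choices, and the pattern assumes every value has the form `[-]m:ss,cc`.

```python
import pandas as pd

df = pd.DataFrame({"Time": ["-1:10,10", "1:16,32", "3:03,04"]})

# Split each value into optional sign, minutes, seconds and hundredths
parts = df["Time"].str.extract(
    r"^(?P<sign>-)?(?P<minutes>\d+):(?P<seconds>\d+),(?P<centis>\d{2})$"
)

# -1 where a leading minus was present, +1 otherwise
sign = parts["sign"].eq("-").map({True: -1, False: 1})

seconds = (
    parts["minutes"].astype(int) * 60
    + parts["seconds"].astype(int)
    + parts["centis"].astype(int) / 100
)
df["secs"] = sign * seconds
print(df)
```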