I am using Django in combination with TimescaleDB to process energy data from PV installations. I would now like to calculate how much energy multiple plants have generated within a given time bucket (usually per day). The plants write their new values into a table that looks like this:
customer  datasource  value  timestamp
1         1           1      01.05.22 10:00
1         1           5      01.05.22 18:00
1         2           3      01.05.22 09:00
1         2           9      01.05.22 17:00
1         1           5      02.05.22 10:00
1         1           12     02.05.22 18:00
1         2           9      02.05.22 09:00
1         2           16     02.05.22 17:00
Now what I would like to have is the overall daily gain of values (so, for each day: last entry value minus first entry value) for each customer, which means summing the daily generated energy values from all datasources that belong to that customer.
In the above example that would be for customer 1:
Day 01.05.22:
Daily Gain of Datasource 1: 5 - 1 = 4
Daily Gain of Datasource 2: 9 - 3 = 6
Overall: 4 + 6 = 10
Day 02.05.22:
Daily Gain of Datasource 1: 12 - 5 = 7
Daily Gain of Datasource 2: 16 - 9 = 7
Overall: 7 + 7 = 14
The result should look like this:
customer  timestamp       value
1         01.05.22 00:00  10
1         02.05.22 00:00  14
What I have now in Code is this:
dpes = (DataPointEnergy.timescale
        .filter(source__in=datasources, time__range=(_from, _to))
        .values('source', interval_end=timebucket)
        .order_by('interval_end')
        .annotate(value=Last('value', 'time') - First('value', 'time'))
        .values('interval_end', 'value', 'source')
        )
Which gives me the following result:
{'source': 16, 'timestamp': datetime.datetime(2022, 1, 9, 0, 0, tzinfo=<UTC>), 'value': 2.0}
{'source': 17, 'timestamp': datetime.datetime(2022, 1, 9, 0, 0, tzinfo=<UTC>), 'value': 2.0}
{'source': 16, 'timestamp': datetime.datetime(2022, 1, 10, 0, 0, tzinfo=<UTC>), 'value': 2.0}
{'source': 17, 'timestamp': datetime.datetime(2022, 1, 10, 0, 0, tzinfo=<UTC>), 'value': 2.0}
{'source': 16, 'timestamp': datetime.datetime(2022, 1, 11, 0, 0, tzinfo=<UTC>), 'value': 2.0}
{'source': 17, 'timestamp': datetime.datetime(2022, 1, 11, 0, 0, tzinfo=<UTC>), 'value': 2.0}
However, I still need to group the results by timestamp and sum the value column (which I am currently doing with Pandas). Is there a way to let the database do that work?
I tried to use another .annotate() to sum up the values but this results in the error:
django.core.exceptions.FieldError: Cannot compute Sum('value'): 'value' is an aggregate
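Until a pure-ORM solution is found (e.g. via a subquery or raw SQL), the second grouping step can also be done in plain Python instead of Pandas. A minimal sketch, assuming the queryset yields dicts shaped like the output above; the sample rows here are hypothetical:

```python
import datetime
from collections import defaultdict

# Hypothetical rows, shaped like the queryset output shown above.
rows = [
    {'source': 16, 'timestamp': datetime.datetime(2022, 1, 9), 'value': 2.0},
    {'source': 17, 'timestamp': datetime.datetime(2022, 1, 9), 'value': 2.0},
    {'source': 16, 'timestamp': datetime.datetime(2022, 1, 10), 'value': 2.0},
]

# Sum the per-datasource gains for each bucket timestamp.
totals = defaultdict(float)
for row in rows:
    totals[row['timestamp']] += row['value']

for ts, value in sorted(totals.items()):
    print(ts, value)
```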
I have a list of days: list = [1,5,16,29]
Considering the current month, September, and the year 2021,
I have a user-wise day df:
user_id day month year
1 1 9 2021
1 2 9 2021
1 6 9 2021
1 14 9 2021
1 22 9 2021
1 18 9 2021
2 2 9 2021
2 17 9 2021
2 3 9 2021
2 30 9 2021
2 29 9 2021
2 28 9 2021
How can I get, for each user, the days of the given month and year that are present neither in that user's df['day'] nor in the list?
Expected result
user_id remaining_days_of_month
1 3,4,7,8,9,10,11,12,13,15,17,19,20,21,23,24,25,26,27,28,30
2 4,6,7,8,9,10,11,12,13,14,15,18,19,20,21,22,23,24,25,26,27
You can use calendar.monthrange to get the number of days in a year-month. Note that it returns a (first_weekday, number_of_days) tuple, so only the second element is needed:
import calendar
import numpy as np

def get_remaining_days(group, lst):
    month = group.month.unique()[0]
    days_to_remove = np.unique(np.concatenate((group.day, lst)))
    lst_of_days = list(range(1, calendar.monthrange(2021, month)[1] + 1))
    remaining_days = [i for i in lst_of_days if i not in days_to_remove]
    return remaining_days

lst = [1, 5, 16, 29]
result = df.groupby(by=["user_id", "month"]).apply(lambda x: get_remaining_days(x, lst))
result.name = "remaining_days_of_month"
result = result.to_frame()
result
I made it work for different months and the same user. In case you happen to have different years too, it won't require much change.
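One thing worth double-checking in both answers is calendar.monthrange itself: it returns a (first_weekday, number_of_days) tuple, not a range of days, so only the second element gives the month length. A quick sanity check:

```python
import calendar

# monthrange returns (weekday_of_first_day, number_of_days_in_month),
# with Monday == 0. September 2021 starts on a Wednesday and has 30 days.
first_weekday, n_days = calendar.monthrange(2021, 9)
print(first_weekday, n_days)  # prints: 2 30
```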
Use calendar.monthrange to get the size of a month, then take a set difference:
import pandas as pd
import calendar

df = pd.DataFrame({'user_id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
                   'day': [1, 2, 6, 14, 22, 18, 2, 17, 3, 30, 29, 28],
                   'month': [9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]})

month = df['month'].iloc[0]
values = [1, 5, 16, 29]
days_of_month = set(range(1, 1 + calendar.monthrange(2021, month)[1])).difference(values)
df: pd.DataFrame = df.groupby('user_id')['day'].apply(list).reset_index()
df['day'] = df['day'].apply(lambda cell: set(days_of_month).difference(cell))
   user_id                                                                              day
0        1  {3, 4, 7, 8, 9, 10, 11, 12, 13, 15, 17, 19, 20, 21, 23, 24, 25, 26, 27, 28, 30}
1        2  {4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27}
I am trying to create this staffing grid to make my admin work easier at work. 'days' contains a week.
days = [M, T, W, Th, F]
days = [0, 1, 1, 1, 1] means s/he works every day except Mondays.
If the value is 2, that means they work a special shift.
S/he works from start_time to end_time - e.g. wakana works 0600-1400 everyday.
S/he works from special_start to special_end on days the value is 2, e.g. eleonor works 0700-1900 Monday and Friday, and 0700-1500 on Wednesday.
I got Monday down, but I know there is a better way, perhaps using a function, to print all days. I have been playing around with it forever, but I cannot figure it out. Thank you in advance! I have so much respect for all of you experts!
staffing_data = [
    {'name': 'wakana',
     'start_time': 6,
     'end_time': 14,
     'days': [1, 1, 1, 1, 1],
     'special_start': None,
     'special_end': None},
    {'name': 'kate',
     'start_time': 11,
     'end_time': 21,
     'days': [0, 1, 1, 1, 1],
     'special_start': None,
     'special_end': None},
    {'name': 'eleonor',
     'start_time': 7,
     'end_time': 19,
     'days': [1, 0, 2, 0, 1],
     'special_start': 7,
     'special_end': 15}]
at_7 = 0
at_11 = 0
at_15 = 0
at_19 = 0

for person in staffing_data:
    if person['start_time'] <= 7 and person['end_time'] > 7 and person['days'][0] == 1:
        at_7 += 1
    if person['start_time'] <= 11 and person['end_time'] > 11 and person['days'][0] == 1:
        at_11 += 1
    if person['start_time'] <= 15 and person['end_time'] > 15 and person['days'][0] == 1:
        at_15 += 1
    if person['start_time'] <= 19 and person['end_time'] > 19 and person['days'][0] == 1:
        at_19 += 1

print(f"{at_7} at 7")
print(f"{at_11} at 11")
print(f"{at_15} at 15")
print(f"{at_19} at 19")
#Monday Staffing
#2 at 7
#3 at 11
#1 at 15
#0 at 19
You just need another loop to iterate over the days, and somewhere to store the data.
staffing_data = [
    {'name': 'wakana',
     'start_time': 6,
     'end_time': 14,
     'days': [1, 1, 1, 1, 1],
     'special_start': None,
     'special_end': None},
    {'name': 'kate',
     'start_time': 11,
     'end_time': 21,
     'days': [0, 1, 1, 1, 1],
     'special_start': None,
     'special_end': None},
    {'name': 'eleonor',
     'start_time': 7,
     'end_time': 19,
     'days': [1, 0, 2, 0, 1],
     'special_start': 7,
     'special_end': 15}]

days = ['M', 'T', 'W', 'Th', 'F']

# result = [{"at_7": 0, "at_11": 0, "at_15": 0, "at_19": 0} for _ in range(len(days))]
result = []
for _ in range(len(days)):
    result.append({"at_7": 0, "at_11": 0, "at_15": 0, "at_19": 0})

for person in staffing_data:
    for day in range(len(days)):
        start = 'start_time'
        end = 'end_time'
        if person['days'][day] == 0:
            continue
        elif person['days'][day] == 2:
            start = 'special_start'
            end = 'special_end'
        if person[start] <= 7 and person[end] > 7:
            result[day]["at_7"] += 1
        if person[start] <= 11 and person[end] > 11:
            result[day]["at_11"] += 1
        if person[start] <= 15 and person[end] > 15:
            result[day]["at_15"] += 1
        if person[start] <= 19 and person[end] > 19:
            result[day]["at_19"] += 1

for i in range(len(days)):
    print(days[i])
    print(f"{result[i]['at_7']} at 7")
    print(f"{result[i]['at_11']} at 11")
    print(f"{result[i]['at_15']} at 15")
    print(f"{result[i]['at_19']} at 19")
    print()
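As a possible refinement (not from the original answer), the four hardcoded if blocks can be collapsed by treating the checkpoint hours as data and looping over them; a sketch of the same counting logic:

```python
staffing_data = [
    {'name': 'wakana', 'start_time': 6, 'end_time': 14,
     'days': [1, 1, 1, 1, 1], 'special_start': None, 'special_end': None},
    {'name': 'kate', 'start_time': 11, 'end_time': 21,
     'days': [0, 1, 1, 1, 1], 'special_start': None, 'special_end': None},
    {'name': 'eleonor', 'start_time': 7, 'end_time': 19,
     'days': [1, 0, 2, 0, 1], 'special_start': 7, 'special_end': 15},
]
days = ['M', 'T', 'W', 'Th', 'F']
checkpoints = [7, 11, 15, 19]

# result[day][hour] -> number of people working at that hour on that day
result = [{hour: 0 for hour in checkpoints} for _ in days]

for person in staffing_data:
    for day, flag in enumerate(person['days']):
        if flag == 0:
            continue
        # flag == 2 means the special shift applies on this day
        start = person['special_start'] if flag == 2 else person['start_time']
        end = person['special_end'] if flag == 2 else person['end_time']
        for hour in checkpoints:
            if start <= hour < end:
                result[day][hour] += 1

for name, counts in zip(days, result):
    print(name, counts)
```

Adding a new checkpoint hour then only means extending the checkpoints list.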
This question already has an answer here:
How to calculate time difference by group using pandas?
(1 answer)
Closed 4 years ago.
For a given data frame df:
import datetime
import pandas as pd

timestamps = [
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 1
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 1, 11, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 4, 10, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 5, 12, 0, 0, 0)   # person 3
]
df = pd.DataFrame({'person': [1, 2, 2, 2, 3, 3, 3, 3], 'timestamp': timestamps})
I want to calculate, for each person (df.groupby('person')), the time differences between all timestamps of that person, which I would do with diff().
df.groupby('person').timestamp.diff()
is only half the way there, because the mapping back to the person is lost.
What could a solution look like?
I think you should use
df.groupby('person').timestamp.transform(pd.Series.diff)
The problem is that diff does not aggregate values, so a possible solution is transform:
df['new'] = df.groupby('person').timestamp.transform(pd.Series.diff)
print (df)
person timestamp new
0 1 2018-01-01 10:00:00 NaT
1 2 2018-01-01 10:00:00 NaT
2 2 2018-01-01 11:00:00 0 days 01:00:00
3 2 2018-01-02 11:00:00 1 days 00:00:00
4 3 2018-01-01 10:00:00 NaT
5 3 2018-01-02 11:00:00 1 days 01:00:00
6 3 2018-01-04 10:00:00 1 days 23:00:00
7 3 2018-01-05 12:00:00 1 days 02:00:00
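For reference, here is the question's data and the transform answer combined into one runnable snippet; the key point is that transform keeps the original index, so the result aligns row-by-row with df:

```python
import datetime
import pandas as pd

timestamps = [
    datetime.datetime(2018, 1, 1, 10, 0),  # person 1
    datetime.datetime(2018, 1, 1, 10, 0),  # person 2
    datetime.datetime(2018, 1, 1, 11, 0),  # person 2
    datetime.datetime(2018, 1, 2, 11, 0),  # person 2
    datetime.datetime(2018, 1, 1, 10, 0),  # person 3
    datetime.datetime(2018, 1, 2, 11, 0),  # person 3
    datetime.datetime(2018, 1, 4, 10, 0),  # person 3
    datetime.datetime(2018, 1, 5, 12, 0),  # person 3
]
df = pd.DataFrame({'person': [1, 2, 2, 2, 3, 3, 3, 3], 'timestamp': timestamps})

# transform returns a Series aligned with df's index, so it can be
# assigned directly as a new column; each group's first row is NaT.
df['new'] = df.groupby('person').timestamp.transform(pd.Series.diff)
print(df)
```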
I have the following DataFrame:
from datetime import datetime
from pandas import DataFrame
df = DataFrame({
    'Buyer': ['Carl', 'Carl', 'Carl', 'Carl', 'Joe', 'Carl'],
    'Quantity': [18, 3, 5, 1, 9, 3],
    'Date': [
        datetime(2013, 9, 1, 13, 0),
        datetime(2013, 9, 1, 13, 5),
        datetime(2013, 10, 1, 20, 0),
        datetime(2013, 10, 3, 10, 0),
        datetime(2013, 12, 2, 12, 0),
        datetime(2013, 9, 2, 14, 0),
    ]
})
First: I am looking to add another column to this DataFrame which sums up the purchases of the last 5 days for each buyer. In particular the result should look like this:
Quantity
Buyer Date
Carl 2013-09-01 21
2013-09-02 24
2013-10-01 5
2013-10-03 6
Joe 2013-12-02 9
To do so I started with the following:
df1 = (df.set_index(['Date', 'Buyer'])
         .unstack(level=[1])
         .resample('D', how='sum')
         .fillna(0))
However, I do not know how to add another column to this DataFrame which can add up for each row the previous 5 row entries.
Second:
Add another column to this DataFrame which does not only sum up the purchases of the last 5 days like in (1) but also weights these purchases based on their dates. For example: those purchases from 5 days ago should be counted 20%, those from 4 days ago 40%, those from 3 days ago 60%, those from 2 days ago 80% and those from one day ago and from today 100%
I have a circulation pump that I check to see whether it's on or off, and this is not at any fixed interval whatsoever. For a single day that could give me a dataset looking like this, where 'value' represents the pump being on or off.
import datetime

data = (
    {'value': 0, 'time': datetime.datetime(2011, 1, 18, 7, 58, 25)},
    {'value': 1, 'time': datetime.datetime(2011, 1, 18, 8, 0, 3)},
    {'value': 0, 'time': datetime.datetime(2011, 1, 18, 8, 32, 10)},
    {'value': 0, 'time': datetime.datetime(2011, 1, 18, 9, 22, 7)},
    {'value': 1, 'time': datetime.datetime(2011, 1, 18, 9, 30, 58)},
    {'value': 1, 'time': datetime.datetime(2011, 1, 18, 12, 2, 23)},
    {'value': 0, 'time': datetime.datetime(2011, 1, 18, 15, 43, 11)},
    {'value': 1, 'time': datetime.datetime(2011, 1, 18, 20, 14, 55)})
The format is not that important and can be changed.
What I want to know is how to calculate how many minutes (or a timespan, or whatever) the 'value' has been 0 or 1 (i.e. ON or OFF).
This is just a small sample of the data, it stretches over several years so there could be a lot.
I have been using numpy/matplotlib for plotting some graphs, and there might be something in numpy to do this, but I'm not good enough at it.
Edit
What I would like to see as an output to this would be a sum of the time in the different states. Something like...
0 04:42:13
1 07:34:17
It really depends on how you are going to treat these data points and what they are representative of. Generally, to know when a switch occurs, you could use itertools.groupby like this:
>>> from itertools import groupby
>>> for i, grp in groupby(data, key=lambda x: x['value']):
...     lst = [x['time'] for x in grp]
...     print(i, max(lst) - min(lst))
...
0 0:00:00
1 0:00:00
0 0:49:57
1 2:31:25
0 0:00:00
1 0:00:00
This gives the minimal time you can be sure your system was up or down (assuming no interruptions between measurements).
Once you decide how to treat your points, modification to this algorithm would be trivial.
EDIT: since you only need sums of up/down-time, here is the simpler version:
>>> sums = {0: datetime.timedelta(0), 1: datetime.timedelta(0)}
>>> for cur, nex in zip(data, data[1:]):
...     sums[cur['value']] += nex['time'] - cur['time']
...
>>> for i, j in sums.items():
...     print(i, j)
...
0 5:32:10
1 6:44:20
If you expect long periods of continuous up/down-time, you might still benefit from itertools.groupby. This is a py3k version, so it won't be particularly efficient in py2k.
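For completeness, the pairwise version can be wrapped into a small self-contained function; a sketch using the sample data from the question (the open interval after the last reading is ignored):

```python
import datetime
from collections import defaultdict

data = (
    {'value': 0, 'time': datetime.datetime(2011, 1, 18, 7, 58, 25)},
    {'value': 1, 'time': datetime.datetime(2011, 1, 18, 8, 0, 3)},
    {'value': 0, 'time': datetime.datetime(2011, 1, 18, 8, 32, 10)},
    {'value': 0, 'time': datetime.datetime(2011, 1, 18, 9, 22, 7)},
    {'value': 1, 'time': datetime.datetime(2011, 1, 18, 9, 30, 58)},
    {'value': 1, 'time': datetime.datetime(2011, 1, 18, 12, 2, 23)},
    {'value': 0, 'time': datetime.datetime(2011, 1, 18, 15, 43, 11)},
    {'value': 1, 'time': datetime.datetime(2011, 1, 18, 20, 14, 55)},
)

def time_in_state(readings):
    """Sum, per state, the time until the next reading.

    Each reading's state is assumed to hold until the next reading;
    the last reading contributes nothing (its interval is open-ended).
    """
    sums = defaultdict(datetime.timedelta)
    for cur, nex in zip(readings, readings[1:]):
        sums[cur['value']] += nex['time'] - cur['time']
    return dict(sums)

for state, total in sorted(time_in_state(data).items()):
    print(state, total)
# 0 5:32:10
# 1 6:44:20
```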