I have a circulation pump that I check to see whether it's on or off, and not at any fixed interval whatsoever. For a single day that could give me a dataset looking like this, where 'value' represents the pump being on or off.
data=(
{'value': 0, 'time': datetime.datetime(2011, 1, 18, 7, 58, 25)},
{'value': 1, 'time': datetime.datetime(2011, 1, 18, 8, 0, 3)},
{'value': 0, 'time': datetime.datetime(2011, 1, 18, 8, 32, 10)},
{'value': 0, 'time': datetime.datetime(2011, 1, 18, 9, 22, 7)},
{'value': 1, 'time': datetime.datetime(2011, 1, 18, 9, 30, 58)},
{'value': 1, 'time': datetime.datetime(2011, 1, 18, 12, 2, 23)},
{'value': 0, 'time': datetime.datetime(2011, 1, 18, 15, 43, 11)},
{'value': 1, 'time': datetime.datetime(2011, 1, 18, 20, 14, 55)})
The format is not that important and can be changed.
What I do want to know is how to calculate how many minutes (or a timespan or whatever) the 'value' has been 0 or 1 (or ON or OFF).
This is just a small sample of the data; it stretches over several years, so there could be a lot.
I have been using numpy/matplotlib for plotting some graphs, and there might be something in numpy to do this, but I'm not good enough at it.
Edit
What I would like to see as an output to this would be a sum of the time in the different states. Something like...
0 04:42:13
1 07:34:17
It really depends on how you're going to treat these data points: what are they representative of? Generally, to find out when a switch occurs, you could use itertools.groupby like this:
>>> from itertools import groupby
>>> for i, grp in groupby(data, key=lambda x: x['value']):
...     lst = [x['time'] for x in grp]
...     print(i, max(lst) - min(lst))
...
0 0:00:00
1 0:00:00
0 0:49:57
1 2:31:25
0 0:00:00
1 0:00:00
This is an example of the minimal time you can be sure your system was up or down (assuming no interruptions between measurements).
Once you decide how to treat your points, modifying this algorithm would be trivial.
EDIT: since you only need sums of up/down-time, here is the simpler version:
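>>> import datetime
>>> sums = {0: datetime.timedelta(0), 1: datetime.timedelta(0)}
>>> # Each state is assumed to hold until the next sample is taken.
>>> for cur, nex in zip(data, data[1:]):
...     sums[cur['value']] += nex['time'] - cur['time']
...
>>> for i, j in sums.items():
...     print(i, j)
...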
0 5:32:10
1 6:44:20
If you expect long periods of continuous up/down-time, you might still benefit from itertools.groupby. This is the py3k version, so it won't be particularly efficient in py2k.
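As a rough sketch of that groupby variant (the function name time_in_state is made up for illustration, and it assumes each state holds until the next sample), you could first collapse every run of equal values to its first sample and then sum the gaps between those switch points:

```python
import datetime
from itertools import groupby

data = (
    {'value': 0, 'time': datetime.datetime(2011, 1, 18, 7, 58, 25)},
    {'value': 1, 'time': datetime.datetime(2011, 1, 18, 8, 0, 3)},
    {'value': 0, 'time': datetime.datetime(2011, 1, 18, 8, 32, 10)},
    {'value': 0, 'time': datetime.datetime(2011, 1, 18, 9, 22, 7)},
    {'value': 1, 'time': datetime.datetime(2011, 1, 18, 9, 30, 58)},
    {'value': 1, 'time': datetime.datetime(2011, 1, 18, 12, 2, 23)},
    {'value': 0, 'time': datetime.datetime(2011, 1, 18, 15, 43, 11)},
    {'value': 1, 'time': datetime.datetime(2011, 1, 18, 20, 14, 55)},
)


def time_in_state(samples):
    """Sum the time spent in each state (hypothetical helper).

    Assumption: a state lasts from its sample until the next sample.
    """
    # Keep only the first sample of every run of equal values (the switch points).
    switches = [next(grp) for _, grp in groupby(samples, key=lambda s: s['value'])]
    # The very last sample closes the final run, so use it as an end marker.
    switches.append(samples[-1])
    totals = {}
    for cur, nxt in zip(switches, switches[1:]):
        elapsed = nxt['time'] - cur['time']
        totals[cur['value']] = totals.get(cur['value'], datetime.timedelta(0)) + elapsed
    return totals


totals = time_in_state(data)
for state, total in sorted(totals.items()):
    print(state, total)
```

For the sample data this gives the same sums as the zip version, but the loop only runs once per switch rather than once per sample.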
Related
I'm currently trying to annotate and count some dates, based on the number of times they appear.
visits = Subs.objects.filter(camp=campdata, timestamp__lte=datetime.datetime.today(), timestamp__gt=datetime.datetime.today()-datetime.timedelta(days=30)).\
values('timestamp').annotate(count=Count('timestamp'))
If I print this in a for loop like,
for a in visits:
    print(a)
I would get back the following dicts.
{'timestamp': datetime.datetime(2018, 10, 5, 15, 16, 25, 130966, tzinfo=<UTC>), 'count': 1}
{'timestamp': datetime.datetime(2018, 10, 5, 15, 16, 45, 639464, tzinfo=<UTC>), 'count': 1}
{'timestamp': datetime.datetime(2018, 10, 6, 8, 43, 24, 721050, tzinfo=<UTC>), 'count': 1}
{'timestamp': datetime.datetime(2018, 10, 7, 4, 54, 59, tzinfo=<UTC>), 'count': 1}
This is kinda the right direction; however, it's counting to the second. I just need days, so that the events that happened on 2018, 10, 5 would give count: 2, for example.
Can anyone point me in the right direction?
Additionally, what's the most "django" way of converting the dates into something more json / api friendly?
My ideal json return would be something like
{'timestamp': 2018-10-5, 'count': 2}
Thanks!
You can use the TruncDate annotation to achieve this:
import datetime
from django.db.models import Count
from django.db.models.functions import TruncDate

visits = Subs.objects.annotate(date=TruncDate('timestamp')).filter(
    camp=campdata,
    date__lte=datetime.datetime.today(),
    date__gt=datetime.datetime.today() - datetime.timedelta(days=30)
).values('date').annotate(count=Count('date'))
As for your question about serializing dates for JSON, Django provides the DjangoJSONEncoder to help with just that:
import json
from django.core.serializers.json import DjangoJSONEncoder
json.dumps(list(visits), cls=DjangoJSONEncoder)
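If the rows are already in memory, the same per-day count can be sketched in plain Python. This is only an illustration of what the query computes, not a replacement for doing it in the database; the rows list below is sample data copied from the question, and the variable names are made up:

```python
import datetime
import json
from collections import Counter

utc = datetime.timezone.utc
# Sample rows, shaped like the dicts printed in the question.
rows = [
    {'timestamp': datetime.datetime(2018, 10, 5, 15, 16, 25, 130966, tzinfo=utc)},
    {'timestamp': datetime.datetime(2018, 10, 5, 15, 16, 45, 639464, tzinfo=utc)},
    {'timestamp': datetime.datetime(2018, 10, 6, 8, 43, 24, 721050, tzinfo=utc)},
    {'timestamp': datetime.datetime(2018, 10, 7, 4, 54, 59, tzinfo=utc)},
]

# Truncate every timestamp to its calendar date, then count occurrences.
counts = Counter(row['timestamp'].date() for row in rows)

# isoformat() turns each date into a JSON-friendly 'YYYY-MM-DD' string.
payload = [
    {'timestamp': day.isoformat(), 'count': n}
    for day, n in sorted(counts.items())
]
print(json.dumps(payload))
```

Note that .date() here uses the date in the timestamp's own timezone (UTC in this sample).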
I am working on an NLP (Natural Language Processing) project where I used the Counter class from Python's collections library. I am getting the results in the following form:
OUTPUT:
Counter({'due': 23, 'support': 20, 'ATM': 16, 'come': 12, 'case': 11, 'Sallu': 10, 'tough,': 9, 'team': 8, 'evident': , 'likely': 6, 'rupee': 4, 'depreciated': 2, 'senior': 1, 'neutral': 1, 'told': 1, 'tour\n\nRussia’s': 1, 'Vladimir': 1, 'indeed,': 1, 'welcome,”': 1, 'player': 1, 'added': 1, 'Games,': 1, 'Russia': 1, 'arrest': 1, 'system.\nBut': 1, 'rate': 1, 'Tuesday': 1, 'February,': 1, 'idea': 1, 'ban': 1, 'data': 1, 'consecutive': 1, 'interbank': 1, 'man,': 1, 'involved': 1, 'aggressive': 1, 'took': 1, 'sure': 1, 'market': 1, 'custody': 1, 'gang.\nWithholding': 1, 'cricketer': 1})
The problem is, I want to extract the words with a count of more than 1. In other words, I am trying to get only those words whose count is greater than 1 or 2.
I want to use the output to make a vocabulary list after removing the low-frequency words.
PS: I have more than 100 documents to test my data with, containing almost 2000 distinct words.
PPS: I have tried everything to get the results but have been unable to do so. I only need the logic and will be able to implement it myself.
You can iterate over the key, value pairs in the dict and add the matching keys to a separate list. This is in case you want to produce a list in the end; otherwise @jpp has the better solution.
from collections import Counter
myStr = "This this this is really really good."
myDict = Counter(myStr.split())
myList = [k for k, v in myDict.items() if v > 1]
# ['this', 'really']
You can use a dictionary comprehension to limit your Counter items to words with more than 1 count:
from collections import Counter
c = Counter({'due': 23, 'support': 20, 'ATM': 16, 'come': 12, 'Russia': 1, 'arrest': 1})
res = Counter({k: v for k, v in c.items() if v > 1})
# Counter({'ATM': 16, 'come': 12, 'due': 23, 'support': 20})
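Since the goal is a vocabulary built from many documents, here is a minimal sketch that combines the counting and the filtering. The function name build_vocabulary, the min_count parameter and the sample documents are all made up for illustration. It also strips ASCII punctuation, on the assumption that keys like 'tough,' in the question are unwanted:

```python
import string
from collections import Counter


def build_vocabulary(documents, min_count=2):
    """Return the sorted words seen at least min_count times (hypothetical helper)."""
    counts = Counter()
    for text in documents:
        # split() with no argument splits on any whitespace, including newlines.
        # strip() removes leading/trailing ASCII punctuation only, so curly
        # quotes such as the one in 'welcome,”' would need extra handling.
        counts.update(word.strip(string.punctuation) for word in text.split())
    counts.pop('', None)  # drop tokens that consisted of punctuation only
    return sorted(word for word, n in counts.items() if n >= min_count)


docs = ["due support due,", "support team\nBut due."]
vocab = build_vocabulary(docs, min_count=2)
print(vocab)
```

Counter.update adds to the existing counts, so one Counter can accumulate frequencies across all documents before the threshold is applied.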
I have a situation where I need to get the third latest date, i.e.
INPUT:
['14-04-2001', '29-12-2061', '21-10-2019',
'07-01-1973', '19-07-2014','11-03-1992','21-10-2019']
Also, INPUT:
6
14-04-2001
29-12-2061
21-10-2019
07-01-1973
19-07-2014
11-03-1992
OUTPUT : 19-07-2014
import datetime
datelist = ['14-04-2001', '29-12-2061', '21-10-2019', '07-01-1973', '19-07-2014','11-03-1992','21-10-2019' ]
for d in datelist:
    x = datetime.datetime.strptime(d, '%d-%m-%Y')
    print(x)
How can I achieve this?
You can sort the list and take the 3rd element from it.
my_list = [datetime.datetime.strptime(d, '%d-%m-%Y') for d in datelist]
# [datetime.datetime(2001, 4, 14, 0, 0), datetime.datetime(2061, 12, 29, 0, 0), datetime.datetime(2019, 10, 21, 0, 0), datetime.datetime(1973, 1, 7, 0, 0), datetime.datetime(2014, 7, 19, 0, 0), datetime.datetime(1992, 3, 11, 0, 0), datetime.datetime(2019, 10, 21, 0, 0)]
my_list.sort(reverse=True)
my_list[2]
# datetime.datetime(2019, 10, 21, 0, 0)
Also, as per Kerorin's suggestion, if you don't need to sort in-place and just need the 3rd element always, you can simply do
sorted(my_list, reverse=True)[2]
Update
To remove the duplicates, taking inspiration from this answer, you can do the following -
import datetime
datelist = ['14-04-2001', '29-12-2061', '21-10-2019', '07-01-1973', '19-07-2014', '11-03-1992', '21-10-2019']
seen = set()
my_list = [datetime.datetime.strptime(d, '%d-%m-%Y')
           for d in datelist
           if d not in seen and not seen.add(d)]
my_list.sort(reverse=True)
You can use heapq.nlargest to do this.
import heapq
from datetime import datetime
datelist = [
'14-04-2001',
'29-12-2061',
'21-10-2019',
'07-01-1973',
'19-07-2014',
'11-03-1992',
'21-10-2019'
]
heapq.nlargest(3, {datetime.strptime(d, "%d-%m-%Y") for d in datelist})[-1]
This returns datetime.datetime(2014, 7, 19, 0, 0).
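The expected output in the question is a string rather than a datetime, so a small wrapper could format the result back. This is only a sketch; the function name third_latest and the error handling are assumptions, not part of the answers above:

```python
import heapq
from datetime import datetime


def third_latest(date_strings, fmt='%d-%m-%Y'):
    """Return the third latest distinct date as a string (hypothetical helper)."""
    # A set comprehension drops duplicate dates before ranking.
    unique = {datetime.strptime(d, fmt) for d in date_strings}
    if len(unique) < 3:
        raise ValueError('need at least three distinct dates')
    # nlargest returns the three latest dates in descending order.
    return heapq.nlargest(3, unique)[-1].strftime(fmt)


datelist = ['14-04-2001', '29-12-2061', '21-10-2019',
            '07-01-1973', '19-07-2014', '11-03-1992', '21-10-2019']
result = third_latest(datelist)
print(result)  # 19-07-2014
```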
I have two Django QuerySets that I want to merge based on their date attribute. Well, it is not really a Django question, but I'll try to explain as clearly as I can.
I need to group entries based on two date attributes. Let's say I have a model:
class User(models.Model):
    start_date = models.DateField(blank=True, null=True)
    end_date = models.DateField(blank=True, null=True)
    ...
Now I need to group these entries by month (how many users started in May 2010, etc.):
truncate_start_date = connection.ops.date_trunc_sql('month', 'start_date')
report_start = User.objects.exclude(start_date__isnull=True)\
    .extra({'month': truncate_start_date}).values('month')\
    .annotate(start_count=Count('pk')).order_by('-month')
and I have the same query for end_date:
truncate_end_date = connection.ops.date_trunc_sql('month', 'end_date')
report_end = User.objects.exclude(end_date__isnull=True)\
    .extra({'month': truncate_end_date}).values('month')\
    .annotate(end_count=Count('pk')).order_by('-month')
Now this is what report_start looks like:
[{'start_count': 33, 'month': datetime.datetime(2016, 5, 1, 0, 0, tzinfo=<UTC>)},
{'start_count': 79, 'month': datetime.datetime(2016, 4, 1, 0, 0, tzinfo=<UTC>)},
{'start_count': 72, 'month': datetime.datetime(2016, 3, 1, 0, 0, tzinfo=<UTC>)},
... ]
Now, how do I merge these two lists of dicts into one based on month? I tried chain, but that left duplicate month records.
I want to get:
[{'start_count': 33, 'end_count': None, 'month': datetime.datetime(2016, 5, 1, 0, 0, tzinfo=<UTC>)},
{'start_count': 79, 'end_count': 2, 'month': datetime.datetime(2016, 4, 1, 0, 0, tzinfo=<UTC>)},
{'start_count': 72, 'end_count': 8, 'month': datetime.datetime(2016, 3, 1, 0, 0, tzinfo=<UTC>)},
... ]
What I was able to come up with was to convert it to a dict and then back to a list of dicts. But I believe this is not a very elegant solution, and there must be a more pythonic way to write it.
Any ideas? Here is my ugly code:
d = dict()
for end in report_end:
    d[end['month']] = {"end_count": end['end_count']}
for start in report_start:
    if start['month'] in d.keys():
        d[start['month']]["start_count"] = start['start_count']
    else:
        d[start['month']] = {"start_count": start['start_count']}
result = []
for key, i in d.items():
    result.append({'month': key,
                   'start_count': i['start_count'] if 'start_count' in i.keys() else None,
                   'end_count': i['end_count'] if 'end_count' in i.keys() else None})
datetime is hashable, so you can use it as a dict key and merge easily. Here is a slightly terser solution using itemgetter. This assumes that your timestamps are unique within each list of dicts.
from operator import itemgetter
import datetime
starts = [
    {'start_count': 33, 'month': datetime.datetime(2016, 5, 1, 0, 0)},
    {'start_count': 79, 'month': datetime.datetime(2016, 4, 1, 0, 0)},
    {'start_count': 72, 'month': datetime.datetime(2016, 3, 1, 0, 0)}
]
# dummy data
ends = [
    {'end_count': 122, 'month': datetime.datetime(2016, 5, 1, 0, 0)},
    {'end_count': 213, 'month': datetime.datetime(2016, 4, 1, 0, 0)},
    {'end_count': 121, 'month': datetime.datetime(2016, 3, 1, 0, 0)}
]
starts = dict(map(itemgetter('month', 'start_count'), starts))
ends = dict(map(itemgetter('month', 'end_count'), ends))
joined = [{'month': m, 'start_count': s, 'end_count': ends.get(m, None)}
          for m, s in starts.items()]
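Note that iterating over starts alone skips any month that appears only in ends. If such months can occur, a variant that walks the union of both key sets might look like the sketch below; the function name merge_by_month and the sample data are made up, and it still assumes months are unique within each list:

```python
import datetime


def merge_by_month(starts, ends):
    """Outer-join two lists of dicts on 'month' (hypothetical helper)."""
    start_by_month = {row['month']: row['start_count'] for row in starts}
    end_by_month = {row['month']: row['end_count'] for row in ends}
    # Union of the keys, newest month first, so no month is dropped.
    months = sorted(set(start_by_month) | set(end_by_month), reverse=True)
    return [
        {'month': m,
         'start_count': start_by_month.get(m),  # None when the month is missing
         'end_count': end_by_month.get(m)}
        for m in months
    ]


starts = [
    {'start_count': 33, 'month': datetime.datetime(2016, 5, 1)},
    {'start_count': 79, 'month': datetime.datetime(2016, 4, 1)},
]
ends = [
    {'end_count': 2, 'month': datetime.datetime(2016, 4, 1)},
    {'end_count': 8, 'month': datetime.datetime(2016, 3, 1)},
]
merged = merge_by_month(starts, ends)
for row in merged:
    print(row)
```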
I'm trying to do a 'day of week' aggregation. I have this code:
IN:
MyMode.objects.values('day').annotate(Sum('visits'))
OUT:
[{'visits__sum': 44, 'day': datetime.datetime(2015, 4, 5, 0, 0)},
{'visits__sum': 699, 'day': datetime.datetime(2015, 9, 6, 0, 0)},
{'visits__sum': 3, 'day': datetime.datetime(2015, 9, 3, 0, 0)},
{'visits__sum': 12, 'day': datetime.datetime(2011, 4, 5, 0, 0)}]
But I want to aggregate by the name of the day, not the date.
I need the total visits on Monday, Tuesday, etc. A Monday from 2015.08 should be in the same 'bag' as a Monday from 2015.06 or 2012.02.
For Django 1.8+:
from django.db.models import Count, F, Func

class DayOfWeek(Func):
    """ Sunday == 0, ..., Saturday == 6 """
    # extract(dow from ...) is PostgreSQL syntax
    template = 'extract(dow from %(expressions)s)'

>>> (
...     MyMode.objects.annotate(dow=DayOfWeek(F('day')))
...     .values('dow')
...     .annotate(c=Count('dow'))
... )
[{'c': 185, 'dow': 0.0}, {'c': 178, 'dow': 5.0}]
For Django 1.7-, I believe you need to do a raw query.
I don't think this is the solution you are looking for, but maybe it can help you:
For Sundays:
MyMode.objects.values('day').annotate(Sum('visits')).filter(day__week_day=1)
Change the filter value for the other days of the week. Days start at Sunday=1, Monday=2, etc.
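If doing it in the database is not an option, the per-day rows from the question could also be folded into weekday totals in plain Python. This is just a sketch: the rows list is the sample output from the question, and the WEEKDAYS list and variable names are made up for illustration:

```python
import datetime
from collections import defaultdict

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
            'Friday', 'Saturday', 'Sunday']

# Rows shaped like the output of values('day').annotate(Sum('visits')).
rows = [
    {'visits__sum': 44, 'day': datetime.datetime(2015, 4, 5, 0, 0)},
    {'visits__sum': 699, 'day': datetime.datetime(2015, 9, 6, 0, 0)},
    {'visits__sum': 3, 'day': datetime.datetime(2015, 9, 3, 0, 0)},
    {'visits__sum': 12, 'day': datetime.datetime(2011, 4, 5, 0, 0)},
]

# Put every row into the 'bag' of its weekday name, whatever the year or month.
visits_by_weekday = defaultdict(int)
for row in rows:
    visits_by_weekday[WEEKDAYS[row['day'].weekday()]] += row['visits__sum']

for name in WEEKDAYS:
    if name in visits_by_weekday:
        print(name, visits_by_weekday[name])
```

This pulls one row per distinct day into Python, so for large tables the database-side aggregation is likely the better fit.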