Efficiently sum items by type - python

I have a list of items, each with a "Type" and a "Time" property. I want to quickly sum the time for each "Type" and append the totals to another list. The list looks like this:
Items = [{'Name': A, 'Type': 'Run', 'Time': 5},
{'Name': B, 'Type': 'Walk', 'Time': 15},
{'Name': C, 'Type': 'Drive', 'Time': 2},
{'Name': D, 'Type': 'Walk', 'Time': 17},
{'Name': E, 'Type': 'Run', 'Time': 5}]
I want to do something that works like this:
Travel_Times=[("Time_Running","Time_Walking","Time_Driving")]
Run=0
Walk=0
Drive=0
for I in Items:
    if I['Type'] == 'Run':
        Run=Run+I['Time']
    elif I['Type'] == 'Walk':
        Walk=Walk+I['Time']
    elif I['Type'] == 'Drive':
        Drive=Drive+I['Time']
Travel_Times.append((Run,Walk,Drive))
With Travel_Times finally looking like this:
print(Travel_Times)
[("Time_Running","Time_Walking","Time_Driving"),
(10,32,2)]
This seems like something that should be easy to do efficiently with either a list comprehension or something similar to collections.Counter, but I can't figure it out. The best way I have figured is to use a separate list comprehension for each "Type" but that requires iterating through the list repeatedly. I would appreciate any ideas on how to speed it up.
Thanks

Note that case is very important in Python :
For isn't a valid statement
Travel_times isn't the same as Travel_Times
there's no : after elif
Travel_Times.append(... has a leading space, which confuses Python
items has one [ too many
A isn't defined
Having said that, a Counter works just fine for your example :
from collections import Counter
time_counter = Counter()
items = [{'Name': 'A', 'Type': 'Run', 'Time': 5},
{'Name': 'B', 'Type': 'Walk', 'Time': 15},
{'Name': 'C', 'Type': 'Drive', 'Time': 2},
{'Name': 'D', 'Type': 'Walk', 'Time': 17},
{'Name': 'E', 'Type': 'Run', 'Time': 5}]
for item in items:
    time_counter[item['Type']] += item['Time']
print(time_counter)
# Counter({'Walk': 32, 'Run': 10, 'Drive': 2})
To get a list of tuples :
[tuple(time_counter.keys()), tuple(time_counter.values())]
# [('Run', 'Drive', 'Walk'), (10, 2, 32)]
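The key order of the Counter follows the data (and is arbitrary on older Python versions), so it will not necessarily match the header row from the question. A minimal sketch of one way to pin the order down, assuming a hypothetical `labels` mapping and `order` list that are not part of the original code:

```python
from collections import Counter

items = [{'Name': 'A', 'Type': 'Run', 'Time': 5},
         {'Name': 'B', 'Type': 'Walk', 'Time': 15},
         {'Name': 'C', 'Type': 'Drive', 'Time': 2},
         {'Name': 'D', 'Type': 'Walk', 'Time': 17},
         {'Name': 'E', 'Type': 'Run', 'Time': 5}]

# Single pass over the list: add each item's time to its type's total.
totals = Counter()
for item in items:
    totals[item['Type']] += item['Time']

# Hypothetical helpers (assumptions for this sketch): the header text
# for each type and the column order wanted in the output.
labels = {'Run': 'Time_Running', 'Walk': 'Time_Walking', 'Drive': 'Time_Driving'}
order = ['Run', 'Walk', 'Drive']

# A Counter returns 0 for a missing key, so a type with no items is safe.
travel_times = [tuple(labels[t] for t in order),
                tuple(totals[t] for t in order)]
print(travel_times)
```

Because the columns are read out of the Counter by name, the result no longer depends on the order in which the types first appear.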

You can use a dict to keep track of the total times. Using the .get() method, you can tally up the total times. If the key for the activity doesn't already exist, set its tally to zero and count up from there.
items = [{'Name': 'A', 'Type': 'Run', 'Time': 5},
{'Name': 'B', 'Type': 'Walk', 'Time': 15},
{'Name': 'C', 'Type': 'Drive', 'Time': 2},
{'Name': 'D', 'Type': 'Walk', 'Time': 17},
{'Name': 'E', 'Type': 'Run', 'Time': 5}]
totals = {}
for item in items:
    totals[item['Type']] = totals.get(item['Type'], 0) + item['Time']
for k, v in totals.items():
    print("Time {}ing:\t {} mins".format(k, v))

You could use Counter from collections along with chain and repeat from itertools:
from itertools import chain, repeat
from collections import Counter
from_it = chain.from_iterable
res = Counter(from_it(repeat(d['Type'], d['Time']) for d in Items))
This small snippet results in a Counter instance containing the sums:
print(res)
Counter({'Drive': 2, 'Run': 10, 'Walk': 32})
It uses repeat to emit d['Type'] d['Time'] times for each item, joins those runs with chain.from_iterable, and lets Counter tally the resulting stream. Note that this only works when the times are integers, and it iterates over one element per unit of time.
If Items is itself a list of several such lists, you can again use chain.from_iterable to flatten it first:
res = Counter(from_it(repeat(d['Type'], d['Time']) for d in from_it(Items)))
This will get you a sum of all types in all the nested lists.
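To make the repeat trick less opaque, here is a small sketch that materialises the intermediate stream before counting it. The three-item `items` list is made up for illustration so that the expansion stays short:

```python
from collections import Counter
from itertools import chain, repeat

# Illustrative data (an assumption for this sketch), smaller than the question's.
items = [{'Type': 'Run', 'Time': 2},
         {'Type': 'Walk', 'Time': 3},
         {'Type': 'Run', 'Time': 1}]

# Each item contributes its type repeated Time times; chain joins the runs.
expanded = list(chain.from_iterable(repeat(d['Type'], d['Time']) for d in items))
print(expanded)

# Counting the labels therefore sums the times per type.
totals = Counter(expanded)
print(totals)
```

Since repeat needs an integer count, fractional times would raise a TypeError here, and large times mean a correspondingly long stream to count.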

You can use reduce with collections.Counter:
# from functools import reduce # Python 3
d = reduce(lambda x, y: x + Counter({y['Type']: y['Time']}), Items, Counter())
print(d)
# Counter({'Walk': 32, 'Run': 10, 'Drive': 2})
It simply builds up the Counter updating each Type using the corresponding Time value.
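One thing to be aware of with the version above: Counter addition builds a new Counter at every step and drops entries whose total is not positive. A possible variation, sketched here with a hypothetical helper named `add_item`, updates a single Counter in place instead:

```python
from collections import Counter
from functools import reduce

items = [{'Name': 'A', 'Type': 'Run', 'Time': 5},
         {'Name': 'B', 'Type': 'Walk', 'Time': 15},
         {'Name': 'C', 'Type': 'Drive', 'Time': 2},
         {'Name': 'D', 'Type': 'Walk', 'Time': 17},
         {'Name': 'E', 'Type': 'Run', 'Time': 5}]

def add_item(acc, item):
    """Add one item's time to the running totals and return the same Counter."""
    acc[item['Type']] += item['Time']
    return acc

# reduce threads the same Counter through every call, so nothing is copied.
totals = reduce(add_item, items, Counter())
print(totals)
```

At that point the plain for loop from the other answers does the same work and is arguably easier to read.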

Here is a brief way of expressing what you'd like in one line. By the way, your list Items doesn't need to be double bracketed:
>>> Items = [{'Type': 'Run', 'Name': 'A', 'Time': 5},
{'Type': 'Walk', 'Name': 'B', 'Time': 15},
{'Type': 'Drive', 'Name': 'C', 'Time': 2},
{'Type': 'Walk', 'Name': 'D', 'Time': 17},
{'Type': 'Run', 'Name': 'E', 'Time': 5}]
>>> list(zip(("Time_Running","Time_Walking","Time_Driving"), (sum(d['Time'] for d in Items if d['Type'] == atype) for atype in 'Run Walk Drive'.split())))
[('Time_Running', 10), ('Time_Walking', 32), ('Time_Driving', 2)]
Here I zipped your output labels to a generator that calculates the sum for each of the three transportation types you have listed. For your exact output you could just use:
>>> [("Time_Running","Time_Walking","Time_Driving"), tuple(sum(d['Time'] for d in Items if d['Type'] == atype) for atype in 'Run Walk Drive'.split())]
[('Time_Running', 'Time_Walking', 'Time_Driving'), (10, 32, 2)]

If you're willing to abuse generators for their side effects:
from collections import Counter
count = Counter()
# throw away the resulting elements, as .update does the work for us
[_ for _ in (count.update({item['Type']:item['Time']}) for item in items) if _]
>>> count
Counter({'Walk': 32, 'Run': 10, 'Drive': 2})
This works because Counter.update() returns None. if None will always evaluate False and throw out that element. So this generates a side effect empty list [] as the only memory overhead. if False would work equally well.

Just use a dictionary! Note that in Python it is idiomatic to use snake_case for variables and keys, so the snippet below assumes the keys and type values have been lowercased ('type', 'time', 'run', ...).
travel_times = {'run': 0, 'walk': 0, 'drive': 0}
for item in items:
    action, time = item['type'], item['time']
    travel_times[action] += time

Related

How do I check for duplicate dictionaries by a key and remove the dictionaries by a different key?

How do I check for duplicate dictionaries by the key 'initials' and remove the duplicate dictionaries with the lowest 'score' key?
Code:
scores = [{'initials': 'AS', 'score': 87},
{'initials': 'AS', 'score': 23},
{'initials': 'WI', 'score': 43},
{'initials': 'WI', 'score': 98}]
(code goes here)
print(scores)
Intended output:
[{'initials': 'AS', 'score': 87},
{'initials': 'WI', 'score': 98}]
Edit: I realize now that I was supposed to show my current attempt at the problem, but the problem got solved. For my next question I will show my attempt. Thank you for answering!
This is yet another example of where you want to group by a key and then aggregate the other keys; in this case, your aggregation is to take the max of the other keys. So use the dictionary grouping idiom:
>>> grouper = {}
>>> for d in scores:
...     key = d['initials']
...     if key in grouper:
...         grouper[key] = max(grouper[key], d['score'])
...     else:
...         grouper[key] = d['score']
...
>>> grouper
{'AS': 87, 'WI': 98}
At this point, this dictionary is likely a more appropriate data structure for what you want. But if you really must have a list of dicts, you can just transform the above:
>>> [dict(initials=k, score=v) for k,v in grouper.items()]
[{'initials': 'AS', 'score': 87}, {'initials': 'WI', 'score': 98}]
scores = [{'initials': 'AS', 'score': 87}, {'initials': 'AS', 'score': 23},
{'initials': 'WI', 'score': 43}, {'initials': 'WI', 'score': 98}]
dict_of_dict = dict()
for dic in scores:
    if dic['initials'] not in dict_of_dict:
        dict_of_dict[dic['initials']] = dic
    else:
        if dic['score'] > dict_of_dict[dic['initials']]['score']:
            dict_of_dict[dic['initials']] = dic
scores = list(dict_of_dict.values())
This is more like keeping the one with max score. I guess that is exactly what you meant.
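If this comes up more than once, the same idea can be wrapped in a small function. The following is only a sketch; `keep_best` and its parameter names are hypothetical, not something from the question:

```python
def keep_best(records, group_key, score_key):
    """Return one record per group_key value: the one with the highest score_key."""
    best = {}
    for rec in records:
        k = rec[group_key]
        # Keep the first record seen for a group, replace it only on a strictly higher score.
        if k not in best or rec[score_key] > best[k][score_key]:
            best[k] = rec
    return list(best.values())

scores = [{'initials': 'AS', 'score': 87},
          {'initials': 'AS', 'score': 23},
          {'initials': 'WI', 'score': 43},
          {'initials': 'WI', 'score': 98}]

deduped = keep_best(scores, 'initials', 'score')
print(deduped)
```

Unlike the score-only grouper, this keeps the whole winning dict, so any extra keys on the records survive. On a tie, the record seen first is kept.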

Find a value in a list of dicts [duplicate]

This question already has answers here:
Get the first item from an iterable that matches a condition
(18 answers)
Closed 4 years ago.
Let:
M = [{'name': 'john', 'result': 12},
{'name': 'sara', 'result': 20},
{'name': 'karl', 'result': 11}]
If I want to find Sara's result, I thought about:
M[[m['name'] for m in M].index('sara')]['result'] # hard to read, but works
and
[m['result'] for m in M if m['name'] == 'sara'][0] # better
Is there an even more natural way to do this in Python?
Use a generator with next().
next(m['result'] for m in M if m['name'] == 'sara')
If you have several lookups to perform, linear search isn't the best option.
Rebuild a dictionary once with the proper key (name):
M = [{'name': 'john', 'result': 12},
{'name': 'sara', 'result': 20},
{'name': 'karl', 'result': 11}]
newdict = {d["name"] : d["result"] for d in M}
It creates:
{'john': 12, 'karl': 11, 'sara': 20}
Now, when you need the result for a name, just do:
print(newdict.get('sara'))
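A short sketch of how the rebuilt dictionary behaves for present and absent names. The variable name `by_name` and the name 'nobody' are made up for illustration:

```python
M = [{'name': 'john', 'result': 12},
     {'name': 'sara', 'result': 20},
     {'name': 'karl', 'result': 11}]

# Build the index once; each later lookup is a single dict access.
# If a name appeared twice, the later entry would overwrite the earlier one.
by_name = {d['name']: d['result'] for d in M}

found = by_name.get('sara')              # present: returns the stored result
missing = by_name.get('nobody')          # absent: .get() returns None
with_default = by_name.get('nobody', 0)  # absent: returns the supplied default

print(found, missing, with_default)
```

Using .get() rather than by_name['nobody'] avoids a KeyError when the name is not in the list.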

Python: Fetch item in list where dict key is some value using lambda

Is it possible to fetch using lambda? I know that we can use the sorted function with a lambda, and it's VERY useful.
Is there a short-form way of fetching an object in a list in which the object at key 'id' is equal to, let's say, 20?
We can of course loop over the entire thing.
x = [
    {'Car': 'Honda', 'id': 12},
    {'Car': 'Mazda', 'id': 45},
    {'Car': 'Toyota', 'id': 20}
]
desired_val = None
for item in x:
    if item['id'] == 20:
        desired_val = item
        break
Is it possible to achieve the same functionality using lambda? I am not very knowledgeable with lambda.
Using lambda here isn't necessary. Lambda isn't something magical, it's just a shorthand for writing a simple function. It's less powerful than an ordinary way of writing a function, not more. (That's not to say sometimes it isn't very handy, just that it doesn't have superpowers.)
Anyway, you can use a generator expression with the default argument. Note that here I'm returning the object itself, not 20, because that makes more sense to me.
>>> somelist = [{"id": 10, "x": 1}, {"id": 20, "y": 2}, {"id": 30, "z": 3}]
>>> desired_val = next((item for item in somelist if item['id'] == 20), None)
>>> print(desired_val)
{'y': 2, 'id': 20}
>>> desired_val = next((item for item in somelist if item['id'] == 21), None)
>>> print(desired_val)
None
If you do want a lambda, as you asked, you can wrap a generator expression in one. A generator expression is generally considered more readable than filter, and it works equally well in Python 2 and 3.
lambda x: next(i for i in x if i['id'] == 20)
Usage:
>>> foo = lambda x: next(i for i in x if i['id'] == 20)
>>> foo(x)
{'Car': 'Toyota', 'id': 20}
And this usage of lambda is probably not very useful. We can define a function just as easily:
def foo(x):
    return next(i for i in x if i['id'] == 20)
But we can give it docstrings, and it knows its own name and has other interesting attributes that anonymous functions (that we then name) don't have.
Additionally, I really think what you're getting at is the filter part of the expression.
In
filter(lambda x: x['id']==20, x)
we have replaced that functionality with the conditional part of the generator expression. The expression part of a generator expression (or of a list comprehension, when in square brackets) similarly replaces map.
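The correspondence described above can be checked directly. This is a small sketch using the list from the question; the variable names are chosen only for illustration:

```python
x = [
    {'Car': 'Honda', 'id': 12},
    {'Car': 'Mazda', 'id': 45},
    {'Car': 'Toyota', 'id': 20},
]

# filter + lambda and the "if" clause of a comprehension select the same items.
via_filter = list(filter(lambda d: d['id'] == 20, x))
via_condition = [d for d in x if d['id'] == 20]

# map + lambda and the leading expression of a comprehension transform the same way.
via_map = list(map(lambda d: d['Car'], x))
via_expression = [d['Car'] for d in x]

print(via_filter == via_condition, via_map == via_expression)
```

The list() calls are there so the sketch behaves the same in Python 3, where filter and map return iterators.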
I would propose that your own method is the best way to find the first item in a list matching a criterion.
It is straightforward and will break out of the loop once the desired target is found.
It is also the fastest. Here it is compared with several other ways to return the FIRST dict in the list with 'id'==20:
from __future__ import print_function
def f1(LoD, idd=20):
    # loop until first one is found then break and return the dict found
    desired_dict = None
    for di in LoD:
        if di['id'] == idd:
            desired_dict = di
            break
    return desired_dict
def f2(LoD, idd=20):
    # The genexp is lazy: next() pulls items only until the first match, or returns None
    return next((di for di in LoD if di['id'] == idd), None)
def f3(LoD, idd=20):
    # NOTE: the 'filter' here is ifilter if Python2
    return next(filter(lambda di: di['id']==idd, LoD), None)
def f4(LoD, idd=20):
    desired_dict=None
    i=0
    while True:
        try:
            if LoD[i]['id']==idd:
                desired_dict=LoD[i]
                break
            else:
                i+=1
        except IndexError:
            break
    return desired_dict
def f5(LoD, idd=20):
    try:
        return [d for d in LoD if d['id']==idd][0]
    except IndexError:
        return None
if __name__ =='__main__':
    import timeit
    import sys
    if sys.version_info.major==2:
        from itertools import ifilter as filter
    x = [
        {'Car': 'Honda', 'id': 12},
        {'Car': 'Mazda', 'id': 45},
        {'Car': 'Toyota', 'id': 20}
    ] * 10 # the '* 10' makes a list of 30 dics...
    result=[]
    for f in (f1, f2, f3, f4, f5):
        fn=f.__name__
        fs="f(x, idd=20)"
        ft=timeit.timeit(fs, setup="from __main__ import x, f", number=1000000)
        r=eval(fs)
        result.append((ft, fn, r, ))
    result.sort(key=lambda t: t[0])
    for i, t in enumerate(result):
        ft, fn, r = t
        if i==0:
            fr='{}: {:.4f} secs is fastest\n\tf(x)={}\n========'.format(fn, ft, r)
        else:
            t1=result[0][0]
            dp=(ft-t1)/t1
            fr='{}: {:.4f} secs - {} is {:.2%} faster\n\tf(x)={}'.format(fn, ft, result[0][1], dp, r)
        print(fr)
If the value 'id'==20 is found, prints:
f1: 0.4324 secs is fastest
f(x)={'Car': 'Toyota', 'id': 20}
========
f4: 0.6963 secs - f1 is 61.03% faster
f(x)={'Car': 'Toyota', 'id': 20}
f3: 0.9077 secs - f1 is 109.92% faster
f(x)={'Car': 'Toyota', 'id': 20}
f2: 0.9840 secs - f1 is 127.56% faster
f(x)={'Car': 'Toyota', 'id': 20}
f5: 2.6065 secs - f1 is 502.77% faster
f(x)={'Car': 'Toyota', 'id': 20}
And, if not found, prints:
f1: 1.6084 secs is fastest
f(x)=None
========
f2: 2.0128 secs - f1 is 25.14% faster
f(x)=None
f5: 2.5494 secs - f1 is 58.50% faster
f(x)=None
f3: 4.4643 secs - f1 is 177.56% faster
f(x)=None
f4: 5.7889 secs - f1 is 259.91% faster
f(x)=None
Of course, as written, these functions only return the first dict in this list with 'id'==20. If you want ALL of them, you might use a list comprehension or filter with a lambda.
You can see that as you wrote the function originally, modified to return a list instead, it is still competitive:
def f1(LoD, idd):
    desired_lst = []
    for item in LoD:
        if item['id'] == idd:
            desired_lst.append(item)
    return desired_lst
def f2(LoD, idd):
    return [d for d in LoD if d['id']==idd]
def f3(LoD, idd):
    return list(filter(lambda x: x['id']==idd, LoD) )
Using the same code to time it, these functions print:
f2: 2.3849 secs is fastest
f(x)=[{'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}]
========
f1: 3.0051 secs - f2 is 26.00% faster
f(x)=[{'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}]
f3: 5.2386 secs - f2 is 119.66% faster
f(x)=[{'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}, {'Car': 'Toyota', 'id': 20}]
In this case, the list comprehension is better.
In Python 3, filter returns an iterator, so you can use next to get its first value:
val = next(filter(lambda x: x['id'] == 20, list))
For Python 2 use itertools.ifilter, because the built-in filter constructs the list with results:
from itertools import ifilter
val = next(ifilter(lambda x: x['id'] == 20, list))
Consider passing a default value to next; it is returned when the iterator is empty:
In [3]: next(filter(bool, [False]), 'default value here')
Out[3]: 'default value here'

How to categorize list of dictionaries based on the value of a key in python efficiently?

I have a list of dictionaries in Python that I want to categorize based on the value of a key which exists in all of the dictionaries, and then process each category separately. I don't know the values in advance; I just know that the key is always present. Here's the list:
dictList = [
{'name': 'name1', 'type': 'type1', 'id': '14464'},
{'name': 'name2', 'type': 'type1', 'id': '26464'},
{'name': 'name3', 'type': 'type3', 'id': '36464'},
{'name': 'name4', 'type': 'type5', 'id': '43464'},
{'name': 'name5', 'type': 'type2', 'id': '68885'}
]
This is the code I currently use:
while len(dictList):
    category = [l for l in dictList if l['type'] == dictList[0]['type']]
    processingMethod(category)
    for item in category:
        dictList.remove(item)
This iteration on the above list will give me following result:
Iteration 1:
category = [
{'name': 'name1', 'type': 'type1', 'id': '14464'},
{'name': 'name2', 'type': 'type1', 'id': '26464'},
]
Iteration 2:
category = [
{'name': 'name3', 'type': 'type3', 'id': '36464'}
]
Iteration 3:
category = [
{'name': 'name4', 'type': 'type5', 'id': '43464'}
]
Iteration 4:
category = [
{'name': 'name5', 'type': 'type2', 'id': '68885'}
]
Each time, I get a category, process it and finally remove processed items to iterate over remaining items, until there is no remaining item. Any idea to make it better?
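Your code can be rewritten using itertools.groupby. Note that groupby only groups consecutive items with equal keys, so this matches your loop only if dictList is already sorted (or at least clustered) by 'type'; otherwise sort it by that key first.
for _, category in itertools.groupby(dictList, key=lambda item:item['type']):
    processingMethod(list(category))
Or, if processingMethod can process any iterable,
for _, category in itertools.groupby(dictList, key=lambda item:item['type']):
    processingMethod(category)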
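Because groupby only merges neighbouring items, the effect of sorting first is easy to see on a small list. This is a sketch with made-up records where the two 'type1' entries are deliberately not adjacent:

```python
from itertools import groupby
from operator import itemgetter

# Illustrative records (an assumption for this sketch): 'type1' appears
# twice, separated by a 'type3' entry.
records = [
    {'name': 'name1', 'type': 'type1'},
    {'name': 'name3', 'type': 'type3'},
    {'name': 'name2', 'type': 'type1'},
]
by_type = itemgetter('type')

# Without sorting, 'type1' is split into two separate groups.
unsorted_keys = [key for key, _ in groupby(records, key=by_type)]

# Sorting by the same key first yields exactly one group per type.
grouped = {key: [r['name'] for r in group]
           for key, group in groupby(sorted(records, key=by_type), key=by_type)}

print(unsorted_keys)
print(grouped)
```

The sort costs O(n log n), so for a single pass over unsorted data the dictionary-of-lists approach may be the cheaper choice.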
If l['type'] is hashable for each l in dictList, here's a possible, somewhat-elegant solution:
bins = {}
for l in dictList:
    if l['type'] in bins:
        bins[l['type']].append(l)
    else:
        bins[l['type']] = [l]
for category in bins.itervalues():  # Python 2; use bins.values() in Python 3
    processingMethod(category)
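The same binning can be written without the explicit membership test by using collections.defaultdict. This is a sketch in Python 3 syntax; the name `dict_list` stands in for dictList, and it swaps in defaultdict rather than reproducing the answer's exact code:

```python
from collections import defaultdict

dict_list = [
    {'name': 'name1', 'type': 'type1', 'id': '14464'},
    {'name': 'name2', 'type': 'type1', 'id': '26464'},
    {'name': 'name3', 'type': 'type3', 'id': '36464'},
    {'name': 'name4', 'type': 'type5', 'id': '43464'},
    {'name': 'name5', 'type': 'type2', 'id': '68885'},
]

# defaultdict(list) creates an empty list the first time a type is seen.
bins = defaultdict(list)
for d in dict_list:
    bins[d['type']].append(d)

# Each value is one category, ready to hand to the processing function.
category_sizes = {key: len(category) for key, category in bins.items()}
print(category_sizes)
```

This makes a single pass over the list and does not require it to be sorted.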
The idea is that first, we'll sort all the ls into bins, using l['type'] as the key; second, we'll process each bin.
If l['type'] isn't guaranteed to be hashable for each l in dictList, the approach is essentially the same, but we'll have to use a list of tuples instead of the dict, which means this is a bit less efficient:
bins = []
for l in dictList:
    for bin in bins:
        if bin[0] == l['type']:
            bin[1].append(l)
            break
    else:
        bins.append((l['type'], [l]))
for _, category in bins:
    processingMethod(category)

How to uniquify the tuple element?

I have a result tuple of dictionaries.
result = ({'name': 'xxx', 'score': 120L }, {'name': 'xxx', 'score': 100L}, {'name': 'yyy', 'score': 10L})
I want to uniquify it. After the uniquify operation: result = ({'name': 'xxx', 'score': 120L }, {'name': 'yyy', 'score': 10L})
The result should contain only one dictionary for each name, and that dict should have the maximum score. The final result should be in the same format, i.e. a tuple of dictionaries.
from operator import itemgetter
names = set(d['name'] for d in result)
uniq = []
for name in names:
    scores = [res for res in result if res['name'] == name]
    uniq.append(max(scores, key=itemgetter('score')))
I'm sure there is a shorter solution, but you won't be able to avoid filtering the scores by name in some way first, then find the maximum for each name.
Storing scores in a dictionary with names as keys would definitely be preferable here.
I would create an intermediate dictionary mapping each name to the maximum score for that name, then turn it back to a tuple of dicts afterwards:
>>> result = ({'name': 'xxx', 'score': 120L }, {'name': 'xxx', 'score': 100L}, {'name': 'xxx', 'score': 10L}, {'name':'yyy', 'score':20})
>>> from collections import defaultdict
>>> max_scores = defaultdict(int)
>>> for d in result:
...     max_scores[d['name']] = max(d['score'], max_scores[d['name']])
...
>>> max_scores
defaultdict(<type 'int'>, {'xxx': 120L, 'yyy': 20})
>>> tuple({name: score} for (name, score) in max_scores.iteritems())
({'xxx': 120L}, {'yyy': 20})
Notes:
1) I have added {'name': 'yyy', 'score': 20} to your example data to show it working with a tuple with more than one name.
2) I use a defaultdict that assumes the minimum value for score is zero. If the score can be negative you will need to change the int parameter of defaultdict(int) to a function that returns a number smaller than the minimum possible score.
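For the negative-score case mentioned in note 2, one possible default is negative infinity. The sketch below is written for Python 3 (so no L suffixes and items() instead of iteritems()), uses made-up negative scores, and rebuilds the dicts in the format the question asked for:

```python
from collections import defaultdict

# Illustrative data (an assumption for this sketch) with only negative scores.
result = ({'name': 'xxx', 'score': -5},
          {'name': 'xxx', 'score': -12},
          {'name': 'yyy', 'score': -20})

# Any real score compares greater than -inf, so the first score always wins.
max_scores = defaultdict(lambda: float('-inf'))
for d in result:
    max_scores[d['name']] = max(d['score'], max_scores[d['name']])

# Turn the name -> best score mapping back into a tuple of dicts.
uniq = tuple({'name': name, 'score': score} for name, score in max_scores.items())
print(uniq)
```

With defaultdict(int) the same data would report a best score of 0 for every name, which is the failure the note warns about.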
Incidentally I suspect that having a tuple of dictionaries is not the best data structure for what you want to do. Have you considered alternatives, such as having a single dict, perhaps with a list of scores for each name?
I would reconsider the data structure to fit your needs better (for example, a dict keyed by name with a list of scores as the value), but I would do it like this (groupby assumes entries with the same name are adjacent, as they are here):
import operator as op
import itertools as it
result = ({'name': 'xxx', 'score': 120L },
{'name': 'xxx', 'score': 100L},
{'name': 'xxx', 'score': 10L},
{'name':'yyy', 'score':20})
# groupby
highscores = tuple(max(namegroup, key=op.itemgetter('score'))
                   for name,namegroup in it.groupby(result,
                                                    key=op.itemgetter('name'))
                   )
print highscores
How about...
inp = ({'name': 'xxx', 'score': 120L }, {'name': 'xxx', 'score': 100L}, {'name': 'yyy', 'score': 10L})
temp = {}
for dct in inp:
    if dct['score'] > temp.get(dct['name']): temp[dct['name']] = dct['score']
result = tuple({'name': name, 'score': score} for name, score in temp.iteritems())
