Python Most Efficient Way to Search a List - python

Bear with me as I am very new to Python. Basically I am looking for the most efficient way to search through a multi-dimensional list. So say I have the following list:
fruit = [
[banana, 6],
[apple, 5],
[banana, 9],
[apple, 10],
[pear, 2],
]
And I wanted the outcome of my function to produce: Apple: 15, Banana: 15, Pear: 2. What would be the most efficient way to do this?

That is not in any way a search...
What you want is
import collections
def count(items):
    data = collections.defaultdict(int)
    for kind, count in items:
        data[kind] += count
    return data
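As a quick check, the function above can be run against the question's data. This is only a minimal sketch: it assumes the fruit names are plain strings (the question leaves them unquoted), and it renames the helper to the hypothetical tally_counts so that the function does not share a name with its loop variable.

```python
import collections

def tally_counts(items):
    """Sum the counts for each kind in a list of [kind, count] pairs."""
    data = collections.defaultdict(int)  # missing keys start at 0
    for kind, count in items:
        data[kind] += count
    return data

# Assumption: the fruit names are strings.
fruit = [["banana", 6], ["apple", 5], ["banana", 9], ["apple", 10], ["pear", 2]]
totals = tally_counts(fruit)
print(dict(totals))
```

The list is walked exactly once, so the work grows linearly with the number of rows.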

fruit = [['banana', 6], ['apple', 5], ['banana', 9], ['apple', 10], ['pear', 2]]
f = {}
def fruit_count():
    for x in fruit:
        if x[0] not in f.keys():
            f.update({x[0]:x[1]})
        else:
            t = f.get(x[0])
            t = t + x[1]
            f.update({x[0]:t})
    return f
# After calling fruit_count():
# f == {'apple': 15, 'banana': 15, 'pear': 2}

Use a collections.defaultdict to accumulate, and iterate through the list.
import collections
accum = collections.defaultdict(int)
for e in fruit:
    accum[e[0]] += e[1]

myHash = {}
fruit = [
[banana, 6],
[apple, 5],
[banana, 9],
[apple, 10],
[pear, 2],
]
for i in fruit:
    if not i[0] in myHash.keys():
        myHash[i[0]] = 0
    myHash[i[0]] += i[1]
for i in myHash:
    print i, myHash[i]
would return
apple 15
banana 15
pear 2
Edit
I didn't know about defaultdict in python. That is a much better way.

I'm unsure what type apple and banana are, so I just made them empty classes and used their class names for identification. One approach to this problem is the dictionary method setdefault(), which checks whether a given key is already in the dictionary and, if so, simply returns its value; if not, it inserts the key with a default value and returns that.
To use it more efficiently for this problem by avoiding multiple dictionary key lookups, the count associated with each key needs to be stored in something mutable, which plain integers in Python are not. The trick is to store the numeric count in a one-element list, which can be changed in place. The first function in the code below shows how this can be done.
Note that the collections module in the Python standard library has a dictionary subclass called defaultdict that could be used instead; it effectively performs the setdefault() operation for you whenever a nonexistent key is first accessed. It also makes storing the count in a list unnecessary and makes updating it slightly simpler.
In Python 2.7 another dictionary subclass, Counter, was added to the collections module. Using it is probably the best solution, since it was designed for exactly this kind of application. The code below shows how to do it all three ways (and sorts the list of totals created).
class apple: pass
class banana: pass
class pear: pass
fruit = [
[banana, 6],
[apple, 5],
[banana, 9],
[apple, 10],
[pear, 2],
]
# ---- using regular dictionary
def tally(items):
    totals = dict()
    for kind, count in items:
        totals.setdefault(kind, [0])[0] += count
    return sorted([key.__name__,total[0]] for key, total in totals.iteritems())
print tally(fruit)
# [['apple', 15], ['banana', 15], ['pear', 2]]
import collections
# ---- using collections.defaultdict dict subclass
def tally(items):
    totals = collections.defaultdict(int) # requires Python 2.5+
    for kind, count in items:
        totals[kind] += count
    return sorted([key.__name__, total] for key, total in totals.iteritems())
print tally(fruit)
# [['apple', 15], ['banana', 15], ['pear', 2]]
# ---- using collections.Counter dict subclass
def tally(items):
    totals = collections.Counter() # requires Python 2.7+
    for kind, count in items:
        totals[kind] += count
    return sorted([key.__name__, total] for key, total in totals.iteritems())
print tally(fruit)
# [['apple', 15], ['banana', 15], ['pear', 2]]
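The code above is written for Python 2 (print statement, iteritems()). For readers on Python 3, the Counter variant might look roughly like the sketch below. It assumes the fruit names are plain strings rather than the placeholder classes, so the __name__ lookup is dropped.

```python
import collections

# Assumption: fruit names are plain strings, not the placeholder classes.
fruit = [["banana", 6], ["apple", 5], ["banana", 9], ["apple", 10], ["pear", 2]]

def tally(items):
    totals = collections.Counter()  # missing keys count as 0
    for kind, count in items:
        totals[kind] += count
    # items() replaces the Python 2-only iteritems()
    return sorted([kind, total] for kind, total in totals.items())

result = tally(fruit)
print(result)
# [['apple', 15], ['banana', 15], ['pear', 2]]
```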

Related

Looking for easiest way to perform average of common dictionaries in a list

I have a list of dictionaries like,
list1 = [{'a':[10,2],'b':[20,4]}, {'a':[60,6],'b':[40,8]}]
Trying to get the final output as
list1 = [{'a':[35,4],'b':[30,6]}]
I was trying to get the list of values for each key in each dictionary and average them based on the length of the list and put into a new dictionary.
Not sure of the best / most pythonic way of doing this.
Any help highly appreciated.
There are many different ways to do this. A simple one iterates over each key, over each inner index, and calculates the average to store to a new dictionary:
from pprint import pprint
list1 = [
{'a': [10, 2], 'b': [20, 4]},
{'a': [60, 6], 'b': [40, 8]},
]
means = {}
for key in list1[0]:
    key_means = []
    means[key] = key_means
    for index in range(len(list1[0][key])):
        key_means.append(
            sum(
                ab_dict[key][index]
                for ab_dict in list1
            ) / len(list1)
        )
pprint(means)
This implementation assumes that keys appearing in the first row are uniformly represented in all other rows.
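The same idea can be written more compactly with zip(), which pairs up the values found at the same index in every row. This is just a sketch under the same assumption (every dictionary has the same keys and equally long lists); average_by_key is a hypothetical helper name, and the division is Python 3 true division.

```python
list1 = [
    {'a': [10, 2], 'b': [20, 4]},
    {'a': [60, 6], 'b': [40, 8]},
]

def average_by_key(dicts):
    """Average the per-index values of each key across all dictionaries."""
    means = {}
    for key in dicts[0]:
        # zip(*...) yields one tuple per index, holding that index from every row
        columns = zip(*(d[key] for d in dicts))
        means[key] = [sum(column) / len(dicts) for column in columns]
    return means

means = average_by_key(list1)
print(means)
# {'a': [35.0, 4.0], 'b': [30.0, 6.0]}
```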

Weird behavior when using bool as dict key in python [duplicate]

I need a dictionary that has two keys with the same name, but different values. One way I tried to do this was by creating a class to wrap each key name of my dictionary, so that the keys would be different objects:
names = ["1", "1"]
values = [[1, 2, 3], [4, 5, 6]]
dict = {}
class Sets(object):
    def __init__(self,name):
        self.name = name
for i in range(len(names)):
    dict[Sets(names[i])] = values[i]
print dict
The result I was expecting was:
{"1": [1, 2, 3], "1": [4, 5, 6]}
But instead it was:
{"1": [4, 5, 6]}
[EDIT]
So I discovered that keys in a dictionary are meant to be unique; having two keys with the same name is an incorrect use of a dictionary. So I need to rethink my problem and use other methods available in Python.
What you are trying to do is not possible with dictionaries. In fact, it is contrary to the whole idea behind dictionaries.
Also, your Sets class won't help you, as it effectively gives each name a new (sort of random) hash code, making it difficult to retrieve items from the dictionary other than by checking all the items, which defeats the purpose of the dict. You cannot do dict.get(Sets(some_name)), as this will create a new Sets object with a different hash code than the one already in the dictionary!
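To illustrate the point about hash codes, here is a small sketch. It assumes the Sets class from the question, which defines neither __eq__ nor __hash__ and therefore falls back to identity-based comparison and hashing.

```python
class Sets(object):
    def __init__(self, name):
        self.name = name

d = {}
first = Sets("1")
d[first] = [1, 2, 3]
d[Sets("1")] = [4, 5, 6]  # a second, distinct key that merely has the same name

# A freshly created object is never equal to the stored keys, so the lookup fails.
lookup_by_new_object = d.get(Sets("1"))
# Only the very same object that was used as the key finds its entry again.
lookup_by_same_object = d.get(first)

print(len(d), lookup_by_new_object, lookup_by_same_object)
# 2 None [1, 2, 3]
```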
What you can do instead is:
Just create a list of (name, value) pairs, or
pairs = zip(names, values) # or list(zip(...)) in Python 3
create a dictionary mapping names to lists of values.
dictionary = {}
for n, v in zip(names, values):
    dictionary.setdefault(n, []).append(v)
The first approach, using lists of tuples, will have linear lookup time (you basically have to check all the entries), but the second one, a dict mapping to lists, is as close as you can get to "multi-key-dicts" and should serve your purposes well. To access the values per key, do this:
for key, values in dictionary.iteritems():
    for value in values:
        print key, value
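On Python 3, where iteritems() and the print statement are gone, the second approach could be sketched as follows. It uses collections.defaultdict(list) in place of setdefault(), which is a swapped-in but equivalent technique, and it reuses the names and values from the question.

```python
from collections import defaultdict

names = ["1", "1"]
values = [[1, 2, 3], [4, 5, 6]]

dictionary = defaultdict(list)  # a missing key starts out as an empty list
for n, v in zip(names, values):
    dictionary[n].append(v)

for key, vals in dictionary.items():  # items() replaces Python 2's iteritems()
    for value in vals:
        print(key, value)
```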
Instead of wanting multiple keys with the same name, could you get away with having multiple values for each key?
names = [1]
values = [[1, 2, 3], [4, 5, 6]]
dict = {}
for i in names:
    dict[i] = values
for k,v in dict.items():
    for v in dict[k]:
        print("key: {} :: v: {}".format(k, v))
Output:
key: 1 :: v: [1, 2, 3]
key: 1 :: v: [4, 5, 6]
Then you would access each value like this (or in a loop):
print("Key 1 value 1: {}".format(dict[1][0]))
print("Key 1 value 2: {}".format(dict[1][1]))

sum up values of dictionaries

I have a dictionary such as below:
grocery={
'James': {'Brocolli': 3, 'Carrot': 3, 'Cherry': 5},
'Jill': {'Apples': 2, 'Carrot': 4, 'Tomatoes': 8},
'Sunny': {'Apples': 5, 'Carrot': 2, 'Cherry': 2, 'Chicken': 3, 'Tomatoes': 6}
}
food={}
for a,b in grocery.items():
    for i,j in b.items():
        food[i]+=(b.get(i,0))
I am trying to calculate the total of each food item, and it is not working as expected.
For example, I would like to count the total of Carrot, the total of Apples and so on.
The above code gives me the following error:
  File "dictionary1.py", line 6, in <module>
    food[i]+=(b.get(i,0))
KeyError: 'Cherry'
How do I sum up the total of each item?
Simply do
from collections import defaultdict
food = defaultdict(int)  # default value of 0 for every nonexistent key
...and your code should work :)
PS. You get the error because you are trying to add values to uninitialized keys. Don't assume that nonexistent keys start from 0.
Your food dictionary is empty and has no keys at the start; you can't just sum up a value to something that isn't there yet.
Instead of +=, get the current value or a default, using dict.get() again:
food[i] = food.get(i, 0) + b.get(i,0)
You don't really need to use b.get() here, as you already have the values of b in the variable j:
food[i] = food.get(i, 0) + j
You could also use a collections.defaultdict() object to make keys 'automatically' exist when you try to access them, with a default value:
from collections import defaultdict
food = defaultdict(int) # insert int() == 0 when a key is not there yet
and in the inner loop then use food[i] += j.
I strongly recommend you use better names for your variables. If you iterate over dict.values() rather than dict.items(), you can look at the values only when you don't need the keys (like for the outer for loop):
food = {}
for shopping in grocery.values():
    for name, quantity in shopping.items():
        food[name] = food.get(name, 0) + quantity
Another option is to use a dedicated counting and summing dictionary subclass, called collections.Counter(). This class directly supports summing your groceries in a single line:
from collections import Counter
food = sum(map(Counter, grocery.values()), Counter())
map(Counter, ...) creates Counter objects for each of your input dictionaries, and sum() adds up all those objects (the extra Counter() argument 'primes' the function to use an empty Counter() as a starting value rather than an integer 0).
Demo of the latter:
>>> from collections import Counter
>>> sum(map(Counter, grocery.values()), Counter())
Counter({'Tomatoes': 14, 'Carrot': 9, 'Cherry': 7, 'Apples': 7, 'Brocolli': 3, 'Chicken': 3})
A Counter is still a dictionary, just one with extra functionality. You can always go back to a dictionary by passing the Counter to dict():
>>> food = sum(map(Counter, grocery.values()), Counter())
>>> dict(food)
{'Brocolli': 3, 'Carrot': 9, 'Cherry': 7, 'Apples': 7, 'Tomatoes': 14, 'Chicken': 3}
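If building one Counter per shopper feels wasteful, a single Counter can also be filled in a loop. This sketch relies on Counter.update(), which adds counts to existing ones instead of replacing them (unlike dict.update()); the grocery data is copied from the question.

```python
from collections import Counter

grocery = {
    'James': {'Brocolli': 3, 'Carrot': 3, 'Cherry': 5},
    'Jill': {'Apples': 2, 'Carrot': 4, 'Tomatoes': 8},
    'Sunny': {'Apples': 5, 'Carrot': 2, 'Cherry': 2, 'Chicken': 3, 'Tomatoes': 6},
}

food = Counter()
for shopping in grocery.values():
    food.update(shopping)  # adds each quantity to the running total

print(food['Carrot'], food['Apples'])
# 9 7
```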
You get the error, because in the beginning the keys, i.e. 'Apples', 'Tomatoes', ..., do not exist in food. You can correct this with a try-except block:
grocery={
"Jill":{"Apples":2, "Tomatoes":8,"Carrot":4},
"James":{"Carrot":3,"Brocolli":3,"Cherry":5},
"Sunny":{"Chicken":3,"Apples":5,"Carrot":2,"Tomatoes":6,"Cherry":2}
}
food={}
for a,b in grocery.items():
    for i,j in b.items():
        try:
            food[i] += j
        except KeyError:
            food[i] = j
Also, you can get rid of the b.get(i,0) statement, because you already iterate through b and only get values (j) that actually exist in b.

python while id is the same do something

I feel like it is something basic but somehow I don't get it. I would like to loop over a list and append all the persons to the same id. Or write to the file, it doesn't matter.
[1, 'Smith']
[1, 'Black']
[1, 'Mueller']
[2, 'Green']
[2, 'Adams']
[1; 'Smith', 'Black', 'Mueller']
[2; 'Green', 'Adams']
First I created a list of all ids, and then I had two for-loops like this:
final_doc = []
for id in all_ids:
    persons = []
    for line in doc:
        if line[0] == id:
            persons.append(line[1])
    final_doc.append([id, persons])  # append() takes a single argument
It takes ages. I tried to create a dictionary with the ids and then combine it somehow, but the dictionary kept each id only once (maybe I did something wrong there). Now I am thinking about using a while-loop: while the id is still the same, append persons. It is easy to see how to do that for a condition such as "while id is less than 25", but for "while it is the same" I am not sure what to do. Any ideas are much appreciated.
You can group them together in a dictionary.
Given
lists = [[1, 'Smith'],
[1, 'Black'],
[1, 'Mueller'],
[2, 'Green'],
[2, 'Adams'] ]
do
d = {}
for person_id, name in lists:
    d.setdefault(person_id, []).append(name)
d now contains
{1: ['Smith', 'Black', 'Mueller'], 2: ['Green', 'Adams']}
Note:
d.setdefault(person_id, []).append(name)
is a shortcut for
if person_id not in d:
    d[person_id] = []
d[person_id].append(name)
If you prefer your answer to be a list of lists with the person_id as the first item in the list (as implied in your question), change code to
d = {}
for person_id, name in lists:
    d.setdefault(person_id, [person_id]).append(name) # note [person_id] default
result = list(d.values()) # omit call to list if on Python 2.x
result contains
[[1, 'Smith', 'Black', 'Mueller'], [2, 'Green', 'Adams']]
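The "while the id is the same" idea from the question corresponds fairly closely to itertools.groupby, which collects consecutive rows sharing a key. The sketch below is one possible way to write it. It assumes that rows with the same id are already adjacent (as in the sample data); if they are not, the rows would need to be sorted by id first.

```python
from itertools import groupby
from operator import itemgetter

lists = [[1, 'Smith'],
         [1, 'Black'],
         [1, 'Mueller'],
         [2, 'Green'],
         [2, 'Adams']]

# groupby only merges neighbouring rows, so the input must be ordered by id.
result = [
    [person_id] + [name for _, name in rows]
    for person_id, rows in groupby(lists, key=itemgetter(0))
]

print(result)
# [[1, 'Smith', 'Black', 'Mueller'], [2, 'Green', 'Adams']]
```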

Append returning None [duplicate]

I want to do something like this:
myList = [10, 20, 30]
yourList = myList.append(40)
Unfortunately, list append does not return the modified list.
So, how can I allow append to return the new list?
See also: Why do these list operations (methods) return None, rather than the resulting list?
Don't use append but concatenation instead:
yourList = myList + [40]
This returns a new list; myList will not be affected. If you need myList to change as well, use .append() anyway and then assign yourList separately from (a copy of) myList.
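A short sketch of the difference, using snake_case variable names that are my own rather than the question's:

```python
my_list = [10, 20, 30]

returned = my_list.append(40)  # mutates my_list in place and returns None
your_list = my_list + [50]     # builds a new list; my_list is left alone

print(returned)
# None
print(my_list)
# [10, 20, 30, 40]
print(your_list)
# [10, 20, 30, 40, 50]
```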
In Python 3 (3.5 or later) you can create a new list by unpacking the old one and adding the new element:
a = [1,2,3]
b = [*a,4] # b = [1,2,3,4]
When you do:
myList + [40]
you actually have three lists: myList, the temporary [40], and the new list holding the result.
list.append is a built-in and therefore cannot be changed. But if you're willing to use something other than append, you could try +:
In [106]: myList = [10,20,30]
In [107]: yourList = myList + [40]
In [108]: print myList
[10, 20, 30]
In [109]: print yourList
[10, 20, 30, 40]
Of course, the downside to this is that a new list is created, which takes a lot more time than append.
Hope this helps
Try using itertools.chain(myList, [40]). That returns a lazy iterator over the combined sequence rather than allocating a new list. Essentially, it yields all of the elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted.
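A minimal sketch of how that behaves. Note that the chain object can be consumed only once, so it has to be wrapped in list() if a real list is needed later; the variable names are my own.

```python
from itertools import chain

my_list = [10, 20, 30]
lazy = chain(my_list, [40])  # nothing is copied yet

your_list = list(lazy)       # materialise the elements into a real list
leftover = list(lazy)        # the iterator is now exhausted

print(your_list)
# [10, 20, 30, 40]
print(leftover)
# []
```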
Unfortunately, none of the answers here solve exactly what was asked. Here is a simple approach:
lst = [1, 2, 3]
lst.append(4) or lst # the returned value here would be the OP's `yourList`
# [1, 2, 3, 4]
One may question the real need for doing this, for instance when someone wants to reduce RAM usage or run micro-benchmarks that are usually useless. However, sometimes someone really is "asking what was asked" (I don't know if that is the case here), and reality is more diverse than we can know. So here is a (contrived, because out-of-context) usage...
Instead of doing this:
dic = {"a": [1], "b": [2], "c": [3]}
key, val = "d", 4 # <- example
if key in dic:
    dic[key].append(val)
else:
    dic[key] = [val]
dic
# {'a': [1], 'b': [2], 'c': [3], 'd': [4]}
key, val = "b", 5 # <- example
if key in dic:
    dic[key].append(val)
else:
    dic[key] = [val]
dic
# {'a': [1], 'b': [2, 5], 'c': [3], 'd': [4]}
One can use the OR expression above in any place an expression is needed (instead of a statement):
key, val = "d", 4 # <- example
dic[key] = dic[key].append(val) or dic[key] if key in dic else [val]
# {'a': [1], 'b': [2], 'c': [3], 'd': [4]}
key, val = "b", 5 # <- example
dic[key] = dic[key].append(val) or dic[key] if key in dic else [val]
# {'a': [1], 'b': [2, 5], 'c': [3], 'd': [4]}
Or, equivalently, when there are no falsy values in the lists, one can try dic.get(key, <default value>) in some better way.
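For this particular append-or-create pattern, dict.setdefault() is probably the more common idiom. The sketch below is an alternative to the `or` trick, not the answer author's method, and it reuses the same example data.

```python
dic = {"a": [1], "b": [2], "c": [3]}

for key, val in [("d", 4), ("b", 5)]:
    # setdefault returns the existing list, or inserts and returns a new empty one
    dic.setdefault(key, []).append(val)

print(dic)
# {'a': [1], 'b': [2, 5], 'c': [3], 'd': [4]}
```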
You can subclass the built-in list type and redefine the 'append' method. Or even better, create a new one which will do what you want it to do. Below is the code for a redefined 'append' method.
#!/usr/bin/env python
class MyList(list):
    def append(self, element):
        return MyList(self + [element])
def main():
    l = MyList()
    l1 = l.append(1)
    l2 = l1.append(2)
    l3 = l2.append(3)
    print "Original list: %s, type %s" % (l, l.__class__.__name__)
    print "List 1: %s, type %s" % (l1, l1.__class__.__name__)
    print "List 2: %s, type %s" % (l2, l2.__class__.__name__)
    print "List 3: %s, type %s" % (l3, l3.__class__.__name__)
if __name__ == '__main__':
    main()
Hope that helps.
Just to expand on Storstamp's answer:
You only need to do
myList.append(40)
It will append the value to the original list; you can then return the variable containing the original list.
If you are working with very large lists, this is the way to go.
You only need to do
myList.append(40)
It will append it to the original list, not return a new list.
