How to count co-occurrences with collections.Counter() in Python?

I learned about the collections.Counter() class recently and, as it's a neat (and fast??) way to count stuff, I started using it.
But I recently found a bug in my program: when I try to update the count with a tuple, update() treats it as a sequence and updates the count for each item in the tuple instead of counting how many times I inserted that particular tuple.
For example, if you run:
import collections
counter = collections.Counter()
counter.update(('user1', 'loggedin'))
counter.update(('user2', 'compiled'))
counter.update(('user1', 'compiled'))
print counter
You'll get:
Counter({'compiled': 2, 'user1': 2, 'loggedin': 1, 'user2': 1})
as a result. Is there a way to count tuples with the Counter()? I could concatenate the strings but this is... ugly. Could I use named tuples? Implement my own very simple dictionary counter? Don't know what's best.

Sure: you simply have to add one level of indirection, namely pass .update a container with the tuple as an element.
>>> import collections
>>> counter = collections.Counter()
>>> counter.update((('user1', 'loggedin'),))
>>> counter.update((('user2', 'compiled'),))
>>> counter.update((('user1', 'compiled'),))
>>> counter.update((('user1', 'compiled'),))
>>> counter
Counter({('user1', 'compiled'): 2, ('user1', 'loggedin'): 1, ('user2', 'compiled'): 1})
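As a complement to the answer above, here is a minimal sketch of two other ways to count whole tuples: indexing the Counter with the tuple itself, and building the Counter from a list of tuples. The events list is made-up sample data.

```python
from collections import Counter

# Hypothetical sample data: (user, action) pairs.
events = [
    ("user1", "loggedin"),
    ("user2", "compiled"),
    ("user1", "compiled"),
    ("user1", "compiled"),
]

# Option 1: index the counter with the tuple itself.
counter = Counter()
for event in events:
    counter[event] += 1  # missing keys start at 0

# Option 2: pass an iterable whose elements are the tuples.
bulk = Counter(events)
```

Both forms treat each tuple as a single hashable key, so neither splits it into its elements.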


Return the first item of a dictionary as a dictionary in Python?

Let's assume we have a dictionary like this:
>>> d={'a': 4, 'b': 2, 'c': 1.5}
If I want to select the first item of d, I can simply run the following:
>>> first_item = list(d.items())[0]
>>> first_item
('a', 4)
However, I am trying to have first_item return a dict instead of a tuple i.e., {'a': 4}. Thanks for any tips.
Use itertools.islice to avoid creating the entire list, which is unnecessarily wasteful. Here's a helper function:
from itertools import islice
def pluck(mapping, pos):
    return dict(islice(mapping.items(), pos, pos + 1))
Note, the above will return an empty dictionary if pos is out of bounds, but you can check that inside pluck and handle that case however you want (IMO it should probably raise an error).
>>> pluck(d, 0)
{'a': 4}
>>> pluck(d, 1)
{'b': 2}
>>> pluck(d, 2)
{'c': 1.5}
>>> pluck(d, 3)
{}
>>> pluck(d, 4)
{}
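If you prefer an error over an empty dictionary for out-of-bounds positions, one possible variant is sketched below. The name pluck_strict and the choice of IndexError are assumptions for illustration, not part of the original helper.

```python
from itertools import islice

def pluck_strict(mapping, pos):
    """Return the item at position pos as a one-item dict, or raise IndexError."""
    # Reject negative positions too: islice does not accept them.
    if not 0 <= pos < len(mapping):
        raise IndexError("position %d is out of range" % pos)
    return dict(islice(mapping.items(), pos, pos + 1))
```

The bounds check uses len(mapping), which is cheap for a dict, so the only traversal is the one islice performs.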
Note, accessing an element by position in a dict requires traversing the dict. If you need to do this more often, for arbitrary positions, consider using a sequence type like list which can do it in constant time. Although dict objects maintain insertion order, the API doesn't expose any way to manipulate the dict as a sequence, so you are stuck with using iteration.
A dictionary is a collection of key-value pairs. It is not really an ordered collection, though since Python 3.7 it preserves the order in which keys were added.
Anyway, if you really want some "first" element, you can get it this way:
some_item = next(iter(d.items()))
You should not convert it into a list, because that takes O(n) memory and walks through the whole dict as well.
Still, I'd recommend not thinking of a dictionary as having a "first" element. It has keys and values. You can iterate over them, but in an order you don't know unless you control how the dict was created.
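Since the question asked for a dict rather than a tuple, the next(iter(...)) result can be wrapped back into a one-item dictionary. A minimal sketch, assuming the dictionary is not empty (next() raises StopIteration on an empty one):

```python
d = {'a': 4, 'b': 2, 'c': 1.5}

# Take the first (key, value) pair without building a list of all items.
key, value = next(iter(d.items()))

# Rebuild a one-item dict from that pair.
first_item = {key: value}
```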

How to make nested for loop more Pythonic

I have to create a list of blocked users per key. Each user has multiple attributes and if any of these attributes are in keys, the user is blocked.
I wrote the following nested for-loop and it works for me, but I want to write it in a more pythonic way with fewer lines and more readable fashion. How can I do that?
for key in keys:
    key.blocked_users = []
for user in get_users():
    for attribute in user.attributes:
        for key in keys:
            if attribute.name == key.name:
                key.blocked_users.append(user)
In your specific case, where the inner for loops rely on the outer loop variables, I'd leave the code just as is. You don't make code more pythonic or readable by forcefully reducing the number of lines.
If those nested loops were intuitively written, they are probably easy to read.
If you have nested for loops with "independent" loop variables, you can use itertools.product however. Here's a demo:
>>> from itertools import product
>>> a = [1, 2]
>>> b = [3, 4]
>>> c = [5]
>>> for x in product(a, b, c): x
...
(1, 3, 5)
(1, 4, 5)
(2, 3, 5)
(2, 4, 5)
You could use a conditional comprehension in your first for-loop:
for key in keys:
    keyname = key.name
    key.blocked_users = [user for user in get_users() if any(attribute.name == keyname for attribute in user.attributes)]
Aside from making it shorter, you could try to reduce the operations to functions that are optimized in Python. It may not be shorter, but it could be faster, and what's more pythonic than speed? :)
For example, you iterate over the keys for each attribute of each user. That just screams to be optimized "away". You could collect the key names in a dictionary (for the lookup) and a set (for the intersection with attribute names) once:
for key in keys:
    key.blocked_users = []
keyname_map = {key.name: key.blocked_users for key in keys}  # map the key name to blocked_user list
keynames = set(keyname_map)
The set(keyname_map) is a very efficient operation so it doesn't matter much that you keep two collections around.
And then use set.intersection to get the keynames that match an attribute name:
for user in get_users():
    for key in keynames.intersection({attribute.name for attribute in user.attributes}):
        keyname_map[key].append(user)
set.intersection is pretty fast too.
However, this approach requires that your attribute.names and key.names are hashable.
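To make the set-based approach concrete, here is a self-contained sketch. The key and user objects are hypothetical stand-ins built with SimpleNamespace, since the question does not show the real classes, and a plain users list replaces get_users().

```python
from types import SimpleNamespace

def make_user(name, *attribute_names):
    # Hypothetical user: a name plus a list of attribute objects with a .name.
    return SimpleNamespace(
        name=name,
        attributes=[SimpleNamespace(name=a) for a in attribute_names],
    )

# Hypothetical keys, each starting with an empty blocked_users list.
keys = [
    SimpleNamespace(name="spam", blocked_users=[]),
    SimpleNamespace(name="bot", blocked_users=[]),
]
users = [
    make_user("alice", "spam", "vip"),
    make_user("bob", "bot", "spam"),
    make_user("carol", "vip"),
]

# Build the lookup structures once.
keyname_map = {key.name: key.blocked_users for key in keys}
keynames = set(keyname_map)

for user in users:
    attribute_names = {attribute.name for attribute in user.attributes}
    # Only the key names that also occur among this user's attributes.
    for name in keynames & attribute_names:
        keyname_map[name].append(user)

blocked = {key.name: [u.name for u in key.blocked_users] for key in keys}
```

One behavioral difference from the nested loops: because attribute names are collected into a set, a user with the same attribute name twice is appended once per key rather than twice.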
Try using nested for loops in a list comprehension, if that's considered more Pythonic. The for clauses must appear in the same order as the nested loops, outermost first, something like:
[key.blocked_users.append(user)
 for user in get_users()
 for attribute in user.attributes
 for key in keys
 if attribute.name == key.name]

How to keep a unique bag of dicts?

I use a set when I need to keep a reference list of values which I want to keep unique (and later on, check if something is in that set). This does not work with dict because it is not hashable.
There are quite a few techniques to "uniquify" a list of dict but all of them assume that I have a final list, which I want to reduce to unique elements.
How to do that in a dynamic way? For a set I would just .add() an element and would know that it will be added only if it is unique. Is such a (EDIT: ideally, but not necessarily) built-in mechanism available for a bag of dict? (I use the word "bag" because I do not want to limit possible answers to any data container.)
You can use a frozen dict which is an immutable implementation of a regular dict.
This approach should allow you to use frozen dicts inside a set.
>>> from frozendict import frozendict
>>> x = [frozendict({'a':2, 'b':3}),frozendict({'b':3, 'a':2})]
>>> set(x)
{<frozendict {'b': 3, 'a': 2}>}
>>> frozendict({'b': 3, 'a': 2}) in set(x)
True
>>> frozendict({'b': 4, 'a': 2}) in set(x)
False
>>> frozendict({'a': 2, 'b': 3}) in set(x)
True
The set classes are implemented using dictionaries. Accordingly, the
requirements for set elements are the same as those for dictionary
keys; namely, that the element defines both __eq__() and __hash__().
As a result, sets cannot contain mutable elements such as lists or
dictionaries. However, they can contain immutable collections such as
tuples or instances of ImmutableSet.
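So if you only want to use built-ins, you could convert each dictionary to a tuple of tuples when adding it to a set, and convert it back to a dictionary when you want to use it. Note that tuple(a_dict.items()) depends on insertion order, so two dictionaries with the same content but a different key order produce different tuples; the values must also be hashable.
dict_set = set()
dict_set.add(tuple(a_dict.items()))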
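A sketch of an order-insensitive variant, swapping the tuple for a frozenset of the items (this is a substitution, not the code from the answer above). The helper name add_dict is hypothetical, and the sketch assumes all dictionary values are hashable.

```python
def add_dict(bag, d):
    """Add dict d to the set bag; return True if it was not already there."""
    frozen = frozenset(d.items())  # hashable and independent of key order
    if frozen in bag:
        return False
    bag.add(frozen)
    return True

seen = set()
first = add_dict(seen, {'a': 2, 'b': 3})      # new content
duplicate = add_dict(seen, {'b': 3, 'a': 2})  # same content, different order
other = add_dict(seen, {'a': 2, 'b': 4})      # different content

# Convert back to regular dictionaries when needed.
restored = [dict(items) for items in seen]
```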
For a set I would just .add() an element and would know that it will be added only if it is unique.
Instead of add(), use update() or |= with the dictionary's items. That will meet your goal of adding dynamically and incrementally while "knowing that it will be added only if it is unique.":
>>> d = dict(raymond='red')
>>> e = dict(raymond='blue')
>>> f = dict(raymond='red')
>>> s = set()
>>> s |= d.items()
>>> s |= e.items()
>>> s |= f.items()
>>> s

Counting elements in a Python deque

I've attempted to count 1 values within my deque (i.e. deque.count(1)) but get the following error:
'deque' object has no attribute 'count'
I assume that I am working with a Python version that's before 2.7 when the deque.count() function was first introduced.
Besides using a for loop, what would be the most efficient/fastest way of counting how many 1's there are in my deque?
"without loops" requirement is strange, but if you're curious...
len(filter(lambda x: x == 1, d))
I know you asked for no for loops, but I don't think there's any other way:
def count(dq, item):
    return sum(elem == item for elem in dq)
For example:
>>> from collections import deque
>>> d = deque([1, 2, 3, 1])
>>> count(d, 1)
2
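Another option that avoids an explicit loop in your own code is to copy the deque into a list and use list.count(), which is available in older Python versions as well. This is only a sketch: it costs an O(n) copy, and whether it is actually faster than the generator version has not been measured here. The name count_via_list is made up for illustration.

```python
from collections import deque

def count_via_list(dq, item):
    # Copy the deque into a list, then let list.count() do the scanning.
    return list(dq).count(item)

d = deque([1, 2, 3, 1])
ones = count_via_list(d, 1)
```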

Python map() dictionary values

I'm trying to use map() on the dict_values object returned by the values() function on a dictionary. However, I can't seem to be able to map() over a dict_values:
map(print, h.values())
Out[31]: <builtins.map at 0x1ce1290>
I'm sure there's an easy way to do this. What I'm actually trying to do is create a set() of all the Counter keys in a dictionary of Counters, doing something like this:
# counters is a dict with Counters as values
whole_set = set()
map(lambda x: whole_set.update(set(x)), counters.values())
Is there a better way to do this in Python?
In Python 3, map returns an iterator, not a list. You still have to iterate over it, either by calling list on it explicitly, or by putting it in a for loop. But you shouldn't use map this way anyway. map is really for collecting return values into an iterable or sequence. Since neither print nor set.update returns a value, using map in this case isn't idiomatic.
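The laziness is easy to see in a small experiment. In this sketch, calls is a made-up list used only to record side effects; nothing is appended until the map object is actually iterated.

```python
calls = []

# Creating the map object does not call calls.append yet.
lazy = map(calls.append, [1, 2, 3])
before = list(calls)  # snapshot taken before iterating

# Iterating over the map object is what triggers the calls.
for _ in lazy:
    pass
after = list(calls)
```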
Your goal is to put all the keys in all the counters in counters into a single set. One way to do that is to use a nested generator expression:
s = set(key for counter in counters.values() for key in counter)
There's also the lovely set comprehension syntax, a sibling of the dict comprehension, which is available in Python 2.7 and higher (thanks Lattyware!):
s = {key for counter in counters.values() for key in counter}
These are both roughly equivalent to the following:
s = set()
for counter in counters.values():
    for key in counter:
        s.add(key)
You want the set-union of all the values of counters? I.e.,
counters[1].union(counters[2]).union(...).union(counters[n])
? That's just functools.reduce:
import functools
s = functools.reduce(set.union, counters.values())
If counters.values() aren't already sets (e.g., if they're lists), then you should turn them into sets first. You can do it with a dict comprehension over items(), which is a little clunky:
>>> counters = {1:[1,2,3], 2:[4], 3:[5,6]}
>>> counters = {k:set(v) for (k,v) in counters.items()}
>>> print(counters)
{1: {1, 2, 3}, 2: {4}, 3: {5, 6}}
or of course you can do it inline, since you don't care about counters.keys():
>>> counters = {1:[1,2,3], 2:[4], 3:[5,6]}
>>> functools.reduce(set.union, [set(v) for v in counters.values()])
{1, 2, 3, 4, 5, 6}
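For the original case, where the values are Counter objects, the conversion step can be skipped, because set.union() accepts any iterables and iterating a Counter yields its keys. A minimal sketch with made-up sample counters:

```python
from collections import Counter

# Hypothetical data: a dict with Counters as values.
counters = {
    "doc1": Counter("abca"),
    "doc2": Counter("cd"),
}

# Start from an empty set and union in the keys of every Counter.
whole_set = set().union(*counters.values())
```

Starting from set() also keeps this working when counters is empty, where functools.reduce without an initial value would raise a TypeError.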
