from itertools import groupby
from operator import itemgetter
d = [{'k': 'v1'}]
r = ((k, v) for k, v in groupby(d, key=itemgetter('k')))
for k, v in r:
    print(k, list(v))  # v1 [{'k': 'v1'}]
print('---')
r = {k: v for k, v in groupby(d, key=itemgetter('k'))}
for k, v in r.items():
    print(k, list(v))  # v1 []
Seems like some quirk, or am I missing something?
This is a documented part of itertools.groupby:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list
In other words, you need to consume each group before getting the next item from the groupby iterator -- in this case, from inside the dict comprehension. To use it in a dict comprehension, you need to build the list inside the comprehension:
from itertools import groupby
from operator import itemgetter
d = [{'k': 'v1'}]
r = {k: list(v) for k, v in groupby(d, key=itemgetter('k'))}
for k, v in r.items():
    print(k, v)  # v1 [{'k': 'v1'}]
In your first example, because you are using a generator expression, you don't actually start iterating the groupby iterator until you start the for loop. However, you would have the same issue if you used a non-lazy list comprehension instead of a generator (i.e. r = [(k, v) for k, v in groupby(d, key=itemgetter('k'))]).
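Here is a minimal sketch of that list-comprehension version, using the same toy data as above, showing that the groups come out empty because the groupby object is fully consumed while the list is built:
from itertools import groupby
from operator import itemgetter

d = [{'k': 'v1'}]
r = [(k, v) for k, v in groupby(d, key=itemgetter('k'))]
for k, v in r:
    print(k, list(v))  # v1 []  -- the group was invalidated when groupby advanced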
Why does it work this way?
Preserving lazy iteration is the motivating idea behind itertools. Because it is dealing with (possibly large, or infinite) iterators, it never wants to store any values in memory. It just calls next() on the underlying iterator and does something with that value. Once you've called next() you can't go back to earlier values (without storing them, which itertools doesn't want to do).
With groupby it's easier to see with an example. Here is a simple generator that makes alternating ranges of positive and negative numbers and a groupby iterator that groups them:
def make_groups():
    i = 1
    while True:
        for n in range(1, 10):
            print("yielding: ", n * i)
            yield n * i
        i *= -1

g = make_groups()
grouper = groupby(g, key=lambda x: x > 0)
make_groups prints a line each time next() is called, before yielding the value, to help show what's happening. When we call next() on grouper, this results in a next() call to g and gets our first group and value:
> k, gr = next(grouper)
yielding: 1
Now each next() call on gr results in a next() call to the underlying g as you can see from the print:
> next(gr)
1 # already have this value from the initial next(grouper)
> next(gr)
yielding: 2 # gets the next value and advances the underlying generator to the next yield
2
Now look what happens if we call next() on grouper to get the next group:
> next(grouper)
yielding: 3
yielding: 4
yielding: 5
yielding: 6
yielding: 7
yielding: 8
yielding: 9
yielding: -1
groupby iterated through the generator until it hit a value that changed the key. All of those values have already been yielded by g. We can no longer get the next value of gr (i.e. 3) unless we had stored all those values, or had somehow tee'd off the underlying g generator into two independent generators. Neither of these is a good solution for the default implementation (especially since the point of itertools is to avoid storing values), so it is left up to you: you need to store those values before something causes next(grouper) to be called and advances the generator past the values you wanted.
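To make that concrete, here is a minimal sketch of the "store it before advancing" advice, re-creating a fresh generator (the one above has already been partially consumed) and materializing each group as a list the moment groupby hands it over:
from itertools import groupby, islice

g = make_groups()                        # fresh generator; make_groups is infinite
grouper = groupby(g, key=lambda x: x > 0)

# take the first two groups, turning each into a list *before* grouper advances
stored = [(k, list(gr)) for k, gr in islice(grouper, 2)]
print(stored)  # [(True, [1, ..., 9]), (False, [-1, ..., -9])]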
Related
I am very new to Python and I was wondering how to get the following dictionary from the list of tuples?
Question
x = [('A',1),('B',2),('C',3),('A',10),('B',10)]
required_dict = {'A': 11,'B': 12, 'C': 3}
Easiest way IMO is using defaultdict and a single for loop:
from collections import defaultdict
required_dict = defaultdict(int)
for k, v in x:
    required_dict[k] += v
You could also do this in a single line with a nested comprehension, but this is less efficient because it involves iterating over x repeatedly instead of doing it in a single pass:
required_dict = {k: sum(v for k1, v in x if k1 == k) for k, v in x}
Another comprehension-based solution that doesn't involve redundant iteration would be to use groupby in order to iterate only within each group of identical keys:
from itertools import groupby
required_dict = {
    k: sum(v for _, v in g)
    for k, g in groupby(sorted(x), key=lambda t: t[0])
}
These three approaches are respectively:
O(n) (single iteration)
O(n^2) (re-iteration for each element)
O(n log n) (a full sort followed by a single iteration)
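As a quick sanity check, here is a small self-contained sketch showing that all three approaches produce the required dictionary for the sample input:
from collections import defaultdict
from itertools import groupby

x = [('A', 1), ('B', 2), ('C', 3), ('A', 10), ('B', 10)]

d1 = defaultdict(int)
for k, v in x:
    d1[k] += v

d2 = {k: sum(v for k1, v in x if k1 == k) for k, v in x}

d3 = {k: sum(v for _, v in g) for k, g in groupby(sorted(x), key=lambda t: t[0])}

assert dict(d1) == d2 == d3 == {'A': 11, 'B': 12, 'C': 3}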
I'm trying to do grouping in python in a one line expression. I want to build a dict of groups and number of items in the group:
{k: {'objects': list(g), 'count': len(list(g))}
 for k, g in groupby(rows, key=lambda x: x['group_id'])}
But g is an iterator and is exhausted by the first list(g) call, so the second use, 'count': len(list(g)), does not work.
How can I do counting and reusing g in one line expression?
You can't consume an iterator more than once, no; calling list() on it a second time just returns an empty list. You have to store the result first.
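A tiny illustration of that: once an iterator has been consumed, a second list() call yields nothing:
it = iter([1, 2, 3])
print(list(it))  # [1, 2, 3]
print(list(it))  # []  -- the iterator is exhausted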
Your options are, in order of feasibility:
To not use a one-liner. Use a regular for loop and assign the list() result to a separate variable first.
Wrap the groupby() iterator in a generator expression that applies list() to the group object.
Add a second loop over a single-element tuple containing the list() result, so you can use the loop target as a variable for both values in the dictionary you are building.
Wait until Python 3.8, which adds PEP 572 assignment expressions, and assign the list() call result to a name you can re-use for len().
The first should be the preferred option. Readability counts!
result = {}
for group_id, group in groupby(rows, key=lambda x: x['group_id']):
    objects = list(group)
    result[group_id] = {'objects': objects, 'count': len(objects)}
Using a generator expression is perhaps the next best option:
list_group = ((k, list(g)) for k, g in groupby(rows, key=lambda x: x['group_id']))
result = {k: {'objects': gl, 'count': len(gl)} for k, gl in list_group}
The generator expression loop runs in tandem with the dict comprehension, one group at a time, as for k, gl in list_group iterates.
The second-loop option looks like this:
{
    k: {'objects': gl, 'count': len(gl)}
    for k, g in groupby(rows, key=lambda x: x['group_id'])
    for gl in (list(g),)
}
Because this trick is surprising and hard to read, I strongly recommend against using it.
In Python 3.8, with PEP 572 implemented, you can use:
{
    k: {'objects': (gl := list(g)), 'count': len(gl)}
    for k, g in groupby(rows, key=lambda x: x['group_id'])
}
Iterators can be 'doubled' by using itertools.tee(), but that has to cache the whole group in memory separately, doubling the memory cost, and the code would become no more readable (you'd have to use a similar trick to store the tee() iterators in variables too!).
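For completeness, a sketch of what that tee() route might look like (assuming the rows and group_id key from the question); it works, but it is not an improvement over the options above:
from itertools import groupby, tee

result = {}
for k, g in groupby(rows, key=lambda x: x['group_id']):
    g1, g2 = tee(g)              # two independent iterators over the same group
    result[k] = {'objects': list(g1), 'count': sum(1 for _ in g2)}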
I have a list of 2 lists, each with 700 dictionaries.
Each dictionary has a word count, and I want to combine them, such that values of same keys will be added.
I tried doing:
combine_dicts = collections.defaultdict(int)
for k, v in itertools.chain(x.iteritems() for x in tuple(dicts[0])):
    combine_dicts[k] += v
dicts[0] and dicts[1] are 2 lists of dictionaries.
But it throws the following error:
ValueError: too many values to unpack.
Is there any better way of doing this?
You misused chain; you wanted chain.from_iterable to chain together the iterables produced by your generator expression, not pass the generator itself as the single thing to chain (which makes chain a no-op that yields the iteritems() views themselves, not their contents):
for k, v in itertools.chain.from_iterable(x.iteritems() for x in dicts[0]):
That only gets the first list of dicts though; to get both, we need MOAR CHAINING!:
# Qualifying chain over and over is a pain
from itertools import chain

for k, v in chain.from_iterable(x.iteritems() for x in chain(*dicts)):
    combine_dicts[k] += v
from collections import defaultdict

combine_dicts = defaultdict(int)
for i in range(0, 2):
    for d in dicts[i]:
        for k, v in d.iteritems():
            combine_dicts[k] += v
This iterates over each dictionary exactly once, so memory usage should be efficient.
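Both answers above use the Python 2 iteritems(); on Python 3 the chained version might look like this sketch, assuming dicts is the list of two lists of dictionaries from the question:
from collections import defaultdict
from itertools import chain

combine_dicts = defaultdict(int)
# flatten the two lists of dicts, then flatten each dict into (key, value) pairs
for k, v in chain.from_iterable(d.items() for d in chain.from_iterable(dicts)):
    combine_dicts[k] += v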
I am trying to iterate over an ordered dictionary in last-in, first-out order.
While everything works fine for a standard dictionary, the first solution for the OrderedDict behaves strangely. It seems that popitem() returns one key/value pair (but somehow sequentially, since I can't unpack kv_pair into two variables), and the iteration then ends. I see no easy way to proceed to the next key/value pair.
While I found two working alternatives (shown below), both of them lack the elegance of the normal dictionary approach.
From what I found in the online help, it is impossible to tell, but I assume I have the wrong expectations. Is there a more elegant approach?
from collections import OrderedDict

normaldict = {"0": "a0.csf", "1": "b1.csf", "2": "c2.csf"}
for k, v in normaldict.iteritems():
    print k, ":", v

d = OrderedDict()
d["0"] = "a0.csf"
d["1"] = "b1.csf"
d["2"] = "c2.csf"
print d, "****"

for kv_pair in d.popitem():
    print kv_pair

print "++++"
for k in reversed(d.keys()):
    print k, d[k]

print "%%%%"
while len(d) > 0:
    k, v = d.popitem()
    print k, v
dict.popitem() is not the same thing as dict.iteritems(); it removes one pair from the dictionary as a tuple, and you are looping over that pair.
The most efficient method is to use a while loop instead; no need to call len(), just test against the dictionary itself, an empty dictionary is considered false:
while d:
    key, value = d.popitem()
    print key, value
The alternative is to use reversed():
for key, value in reversed(d.items()):
    print key, value
but that requires the whole dictionary to be copied into a list first.
However, if all you really need is a last-in, first-out queue, use collections.deque() instead:
from collections import deque

d = deque(["a0.csf", "b1.csf", "c2.csf"])
while d:
    item = d.pop()
or use deque.reverse().
d.popitem() returns only one (k, v) tuple, so your for loop iterates over that single pair and then the loop ends.
You can try:
while d:
    k, v = d.popitem()
Say that the following is our code:
d = {"Happy":"Clam", "Sad":"Panda"}
for i in d:
print(i)
Now print(i) will print out just the keys, but how could I change it so that it prints the values?
d = {"Happy":"Clam", "Sad":"Panda"}
for i in d:
print(i, d[i]) # This will print the key, then the value
or
d = {"Happy":"Clam", "Sad":"Panda"}
for k,v in d.items(): # A method that lets you access the key and value
print(k,v) # k being the key, v being the value
A Python dict has a number of methods available for getting a list of the keys, values, or both.
To answer your question, you can use d.values() to iterate over just the values:
d = {"Happy":"Clam", "Sad":"Panda"}
for v in d.values():
print(v)
Output:
Clam
Panda
The items() method is particularly useful, though, so should be mentioned.
d = {"Happy":"Clam", "Sad":"Panda"}
for k, v in d.items():
print(k, v)
will print:
Happy Clam
Sad Panda
A warning about the ordering of items from the documentation:
CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary's history of insertions and deletions.
d = {"Happy":"Clam", "Sad":"Panda"}
for i in d:
print(i, d[i])
Gives me:
('Sad', 'Panda')
('Happy', 'Clam')
A simple approach would be to use a for-each loop:
for value in d.values():
    print(value)
You could also make a generator. Generators are lazy iterators that you can pause and resume.
def generator(input):
    for value in input.values():
        yield value

gen = generator(d)
print(next(gen))  # Clam
# Doing complex calculations
answer = 21 + 21
print(next(gen))  # Panda
Another way is to use the higher-order function map():
list(map(print, d.values()))  # map() is lazy in Python 3, so wrap it in list() to force the printing
# You can also map over a generator ;)
list(map(print, generator(d)))
Lastly, Python 3 dictionaries support comprehensions. Dictionary comprehensions are great for creating dictionaries, not so much for printing the contents of each entry. They are worth looking up, since dictionaries are used everywhere in Python.
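For reference, a tiny dict comprehension example (building a new dictionary from an existing one, which is what comprehensions are good at):
d = {"Happy": "Clam", "Sad": "Panda"}
lengths = {k: len(v) for k, v in d.items()}
print(lengths)  # {'Happy': 4, 'Sad': 5}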