Python array of tuples group by first, store second

So I have an array of tuples something like this
query_results = [("foo", "bar"), ("foo", "qux"), ("baz", "foo")]
I would like to achieve something like:
{
"foo": ["bar", "qux"],
"baz": ["foo"]
}
So I have tried using this:
from itertools import groupby
grouped_results = {}
for key, y in groupby(query_results, lambda x: x[0]):
    grouped_results[key] = [u[1] for u in y]
The issue I have is that although the number of keys is correct, the number of values in each list is dramatically lower than it should be. Can anyone explain why this happens and what I should be doing?

You'd be better off using a defaultdict for this:
from collections import defaultdict
result = defaultdict(list)
for k, v in query_results:
    result[k].append(v)
Which yields:
>>> result
defaultdict(<class 'list'>, {'baz': ['foo'], 'foo': ['bar', 'qux']})
If you wish to turn it into a vanilla dictionary again, you can - after the for loop - use:
result = dict(result)
this then results in:
>>> dict(result)
{'baz': ['foo'], 'foo': ['bar', 'qux']}
A defaultdict is constructed with a factory, here list. In case the key cannot be found in the dictionary, the factory is called (list() constructs a new empty list). The result is then associated with the key.
So for each key k that is not yet in the dictionary, we will construct a new list first. We then call .append(v) on that list to append values to it.
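To make the factory behaviour concrete, here is a minimal sketch (the keys and values are only illustrative). It also shows a caveat worth knowing: merely reading a missing key inserts it.

```python
from collections import defaultdict

d = defaultdict(list)

# "foo" is missing, so the factory list() runs and its result is stored
# under "foo" before .append() is called on it.
d["foo"].append("bar")
d["foo"].append("qux")  # the key exists now, so the factory is not called again

# Caveat: merely reading a missing key also inserts it (with an empty list).
_ = d["baz"]

print(dict(d))  # {'foo': ['bar', 'qux'], 'baz': []}
```

If that auto-insertion is unwanted once the grouping is done, converting with dict(d) gives back a plain dictionary that raises KeyError for missing keys.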

Well, why not use a simple for loop?
grouped_results = {}
for key, value in query_results:
    grouped_results.setdefault(key, []).append(value)
Output:
{'foo': ['bar', 'qux'], 'baz': ['foo']}

How about using a defaultdict?
from collections import defaultdict
d = defaultdict(list)
for pair in query_results:
    d[pair[0]].append(pair[1])
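As for why the groupby attempt in the question loses values: itertools.groupby only groups consecutive items with equal keys, so when rows with the same key are not adjacent, a later group overwrites the earlier one in the dictionary. The sketch below uses hypothetical sample data (rows is not from the question) in which the "foo" rows are separated, and shows that sorting by the grouping key first is probably the fix the asker needs if they want to keep groupby.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data in which the "foo" rows are not adjacent.
rows = [("foo", "bar"), ("baz", "foo"), ("foo", "qux")]

# Without sorting, groupby starts a new group each time the key changes,
# so the second "foo" group overwrites the first one.
unsorted_groups = {}
for key, group in groupby(rows, key=itemgetter(0)):
    unsorted_groups[key] = [value for _, value in group]

# Sorting by the same key first makes equal keys adjacent.
sorted_groups = {}
for key, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
    sorted_groups[key] = [value for _, value in group]

print(unsorted_groups)  # {'foo': ['qux'], 'baz': ['foo']}
print(sorted_groups)    # {'baz': ['foo'], 'foo': ['bar', 'qux']}
```

The defaultdict and setdefault loops above do not depend on input order, which is one reason they are usually the simpler choice here.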

Related

Optimize creation of dictionary

I have a list with ids called ids. Every element in ids is a string. One id can exist multiple times in this list.
My aim is to create a dictionary which has the number of occurrences as the key and, as the value, a list of the ids which appear that often.
My current approach looks like this:
from collections import defaultdict
import numpy as np
ids = ["foo", "foo", "bar", "hi", "hi"]
counts = defaultdict(list)
for id in np.unique(ids):
    counts[ids.count(id)].append(id)
Output:
print counts
--> defaultdict(<type 'list'>, {1: ['bar'], 2: ['foo', 'hi']})
This works nicely if the list of ids is not too long. However, for longer lists the performance is rather bad.
How can I make this faster?

Instead of calling count for each element in the list, create a collections.Counter for the entire list:
from collections import Counter, defaultdict
ids = ["foo", "foo", "bar", "hi", "hi"]
counts = defaultdict(list)
for i, c in Counter(ids).items():
    counts[c].append(i)
# counts == {1: ['bar'], 2: ['foo', 'hi']} (key order may differ)
If you prefer a one-liner, you could also combine Counter.most_common (for a view of the elements sorted by count) and itertools.groupby (but I'd rather not):
>>> from itertools import groupby
>>> {k: [v[0] for v in g] for k, g in groupby(Counter(ids).most_common(), lambda x: x[1])}
{2: ['foo', 'hi'], 1: ['bar']}
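The speed-up comes from the number of passes over the list: ids.count(id) rescans the whole list for every unique id, whereas Counter tallies everything in a single pass. A minimal self-contained sketch of the Counter approach, wrapped in a function whose name (ids_by_count) is made up for illustration:

```python
from collections import Counter, defaultdict


def ids_by_count(ids):
    """Map each occurrence count to the list of ids that occur that often."""
    grouped = defaultdict(list)
    # Counter(ids) walks the list once; the loop then visits each unique id once.
    for item, n in Counter(ids).items():
        grouped[n].append(item)
    return dict(grouped)


print(ids_by_count(["foo", "foo", "bar", "hi", "hi"]))
# {2: ['foo', 'hi'], 1: ['bar']}
```

The printed key order assumes Python 3.7 or later, where dictionaries preserve insertion order.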

How do you merge and switch keys with values if they have same values?

Let's say I have a dictionary with these keys and values:
{'foo': 1, 'bar': 5, 'foo1': 1, 'bar1': 1, 'foo2': 5}
I can't zip them like this
dict(zip(my.values(), my.keys()))
because this happens:
{1: 'foo', 5: 'bar'}
What I would like my output to be is:
{1: {'bar1', 'foo', 'foo1'}, 5: {'bar', 'foo2'}}

You should use a collections.defaultdict:
from collections import defaultdict
result = defaultdict(list)
for k, v in my.items():
    result[v].append(k)

You can't have multiple values for a given key, so the values in such a data structure need to be containers such as lists. You can't do this transformation easily with zip(); you'll need a for loop:
my = {'foo': 1, 'bar': 5, 'foo1': 1, 'bar1': 1, 'foo2': 5}
rev = {}
for k, v in my.items():
    rev.setdefault(v, []).append(k)
From the edits in your question, it appears that you want to use a set for the values. This is also straightforward:
rev = {}
for k, v in my.items():
    rev.setdefault(v, set()).add(k)
You can also use a defaultdict as Daniel has suggested, but it seems overkill here to do an import just for that. Depending on the size of your dictionary, a defaultdict might be a little faster, because setdefault() evaluates its default argument on every call, so an empty container is created and thrown away each time the key already exists.
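For comparison, here is a sketch of the defaultdict variant that produces sets, which is the output shape the question asks for. It reuses the dictionary from the question; the final conversion to a plain dict is optional.

```python
from collections import defaultdict

my = {'foo': 1, 'bar': 5, 'foo1': 1, 'bar1': 1, 'foo2': 5}

# set() is only called when a value is seen for the first time.
rev = defaultdict(set)
for key, value in my.items():
    rev[value].add(key)

rev = dict(rev)  # optional: back to a plain dictionary
print(rev == {1: {'foo', 'foo1', 'bar1'}, 5: {'bar', 'foo2'}})  # True
```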

For a one-liner with a functional twist (probably neither the most readable nor the most performant code, though):
import itertools, operator
my_dict = {'foo': 1, 'bar': 5, 'foo1': 1, 'bar1': 1, 'foo2': 5}
# list(...) materializes each group right away; in Python 3 a bare map() is lazy and would yield nothing once groupby advances to the next group.
inverse_dict = {k: list(map(operator.itemgetter(0), v)) for k, v in itertools.groupby(sorted(my_dict.items(), key=operator.itemgetter(1)), operator.itemgetter(1))}
To aggregate using a set, just use set in place of list:
inverse_dict = {k: set(map(operator.itemgetter(0), v)) for k, v in itertools.groupby(sorted(my_dict.items(), key=operator.itemgetter(1)), operator.itemgetter(1))}

deconstructing and reconstructing python dictionary

I have a dictionary whose keys and values I need to split into, say, two lists (or any other type that does the job) and later, in another function, rebuild the exact same dictionary from those keys and values. What's the right way of approaching this?

You can use dict.items() to get all the key-value pairs from the dictionary, then either store them directly...
>>> d = {"foo": 42, "bar": 23}
>>> items = list(d.items())
>>> dict(items)
{'bar': 23, 'foo': 42}
... or distribute them to two separate lists, using zip:
>>> keys, values = zip(*d.items())
>>> dict(zip(keys, values))
{'bar': 23, 'foo': 42}

d = {'jack': 4098, 'sape': 4139}
k, v = d.keys(), d.values()
# Do stuff with keys and values
# -
# Create new dict from keys and values
nd = dict(zip(k, v))

Better not to deconstruct it. Wherever you need the keys and values as lists, you can get them as follows (here d is your dictionary):
keyList = list(d.keys())
valueList = [d[key] for key in keyList]  # or: [d[key] for key in d.keys()]
Hope it helps.

To deconstruct a dict into two lists:
>>> test_dict = {"a": 1, "b": 2}
>>> keyList = []
>>> valueList = []
>>> for key, val in test_dict.items():
...     keyList.append(key)
...     valueList.append(val)
...
>>> print(valueList)
[1, 2]
>>> print(keyList)
['a', 'b']
To construct the dict again from the two lists of keys and values, I would use zip along with a dict comprehension:
>>> {key: val for key, val in zip(keyList, valueList)}
{'a': 1, 'b': 2}
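Since the question mentions rebuilding the dictionary in another function, the round trip could be packaged roughly as follows. This is only a sketch: the helper names split_dict and join_dict are invented for illustration, and the length check is an added safeguard rather than something the answers above require.

```python
def split_dict(d):
    """Return the keys and values of d as two lists in matching order."""
    return list(d.keys()), list(d.values())


def join_dict(keys, values):
    """Rebuild a dictionary from two equally long lists."""
    if len(keys) != len(values):
        # zip() would silently drop the extra items, so fail loudly instead.
        raise ValueError("keys and values must have the same length")
    return dict(zip(keys, values))


original = {"foo": 42, "bar": 23}
keys, values = split_dict(original)
print(join_dict(keys, values) == original)  # True
```

This relies on keys() and values() iterating in the same order, which Python guarantees as long as the dictionary is not modified in between.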

Get a list of a dictionary's key names

How do I make a function that takes a dictionary as input and outputs the names of its keys? So like:
input_dictionary = {"foo": 1, "bar": 2}
names = get_names(input_dictionary) # returns ["foo", "bar"]

You should read the Python documentation for dictionaries (see here for Python 3.5). There are multiple ways that this can be done.
You could use:
input_dictionary.keys(), obviously the easiest solution (wrap it in list() so that you also get a list in Python 3, where keys() returns a view):
def get_names(input_dictionary):
    return list(input_dictionary.keys())
input_dictionary.iterkeys() (Python 2 only), to get an iterator over the dictionary's keys, then you can iterate over those keys to create a list of them:
def get_names(input_dictionary):
    list_of_keys = []
    for key in input_dictionary.iterkeys():
        list_of_keys.append(key)
    return list_of_keys
input_dictionary.iteritems() (Python 2 only; use items() in Python 3), which returns an iterator over the dictionary's (key, value) pairs, which you can iterate over and then extract the keys:
def get_names(input_dictionary):
    list_of_keys = []
    for item in input_dictionary.iteritems():
        list_of_keys.append(item[0])
    return list_of_keys
input_dictionary.popitem(), which pops (removes) and returns a (key, value) pair from your dictionary, from which you can extract the key. You probably don't want this one, though, since it removes the pair from your dictionary.
And finally, input_dictionary.viewitems() or input_dictionary.viewkeys() (Python 2 only; in Python 3, items() and keys() return such views) to get a view of the (key, value) pairs or the keys, respectively, for your dictionary. Anytime the dictionary changes, this view object will reflect that change.
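For readers on Python 3, where iterkeys(), iteritems(), viewkeys() and viewitems() no longer exist, a minimal sketch of get_names could look like this. It also illustrates the difference between a snapshot list and a live view.

```python
def get_names(input_dictionary):
    # Iterating a dict yields its keys, so list() gives a snapshot of them.
    return list(input_dictionary)


input_dictionary = {"foo": 1, "bar": 2}
names = get_names(input_dictionary)
keys_view = input_dictionary.keys()  # a live view, not a copy

input_dictionary["baz"] = 3

print(names)            # ['foo', 'bar']  (unchanged snapshot)
print(list(keys_view))  # ['foo', 'bar', 'baz']  (the view reflects the change)
```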

input_dictionary = {"foo": 1, "bar": 2}
list(input_dictionary.keys())  # ["foo", "bar"]

Using keys():
>>> input_dictionary = {"foo": 1, "bar": 2}
>>> print(list(input_dictionary.keys()))
['foo', 'bar']
So a function would be:
def dictkeys(mydictionary):
    return list(mydictionary.keys())
Output:
>>> dictkeys(input_dictionary)
['foo', 'bar']
You don't really need a function for this, though, because it's the same as simply calling list(dictionaryname.keys()).

Dictionary comprehension for swapping keys/values in a dict with multiple equal values

def invert_dict(d):
    inv = dict()
    for key in d:
        val = d[key]
        if val not in inv:
            inv[val] = [key]
        else:
            inv[val].append(key)
    return inv
This is an example from the book Think Python: a function for inverting (swapping) keys and values in a dictionary. The new values (former keys) are stored as lists, so if several keys were bound to the same value before inverting, the function simply appends each of them to that value's list.
Example:
somedict = {'one': 1, 'two': 2, 'doubletwo': 2, 'three': 3}
invert_dict(somedict) ---> {1: ['one'], 2: ['doubletwo', 'two'], 3: ['three']}
My question is, can the same be done with a dictionary comprehension? This function creates an empty dict, inv = dict(), which the if/else then checks for the presence of each value. A dict comprehension, in this case, would have to check itself. Is that possible, and what should the syntax look like?
The general dict comprehension syntax for swapping keys and values is:
{value: key for key, value in somedict.items()}
but if I want to add an 'if' clause, what should it look like? if value not in (what)?
Thanks.

I don't think it's possible with a simple dict comprehension without using other functions.
The following code uses itertools.groupby to group keys that have the same value.
>>> import itertools
>>> {k: [x[1] for x in grp]
...  for k, grp in itertools.groupby(
...      sorted((v, k) for k, v in somedict.items()),
...      key=lambda x: x[0])
... }
{1: ['one'], 2: ['doubletwo', 'two'], 3: ['three']}

You can use a set comprehension for its side effect:
somedict = {'one': 1, 'two': 2, 'doubletwo': 2, 'three': 3}
invert_dict = {}
{invert_dict.setdefault(v, []).append(k) for k, v in somedict.items()}
print(invert_dict)
# {1: ['one'], 2: ['two', 'doubletwo'], 3: ['three']}

Here is a good answer:
fts = {1:1,2:1,3:2,4:1}
new_dict = {dest: [k for k, v in fts.items() if v == dest] for dest in set(fts.values())}
Reference: Head First Python, 2nd Edition, page 502
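One thing to keep in mind about the pure comprehension: it rescans the whole dictionary once per distinct value, so it does more work than a single-pass loop when there are many distinct values. A small sketch, using the dictionary from the question, that checks both approaches give the same result:

```python
from collections import defaultdict

somedict = {'one': 1, 'two': 2, 'doubletwo': 2, 'three': 3}

# Pure dict comprehension: walks somedict.items() once per distinct value.
by_comprehension = {
    value: [k for k, v in somedict.items() if v == value]
    for value in set(somedict.values())
}

# Single pass over the items with a defaultdict.
by_loop = defaultdict(list)
for k, v in somedict.items():
    by_loop[v].append(k)

print(by_comprehension == dict(by_loop))  # True
```

For small dictionaries the difference hardly matters, so readability can decide.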
