Alternative way to use setdefault() using dictionary comprehension? - python

I have a nested dictionary that was created from a nested list: the first item in each inner list becomes the outer key, and the outer value is a dictionary built from the next two items. The following code works great using the two setdefault() calls, because it just adds to the nested dictionary when it sees a duplicate outer key. I was just wondering how you could do this same logic using a dictionary comprehension?
dict1 = {}
list1 = [[1, 2, 6],
         [1, 3, 7],
         [2, 5, 8],
         [2, 8, 9]]
for i in list1:
    dict1.setdefault(i[0], {}).setdefault(i[1], i[2])
OUTPUT:
{1: {2: 6, 3: 7}, 2: {5: 8, 8: 9}}

Use the loop because it's very readable and efficient. Not all code has to be a one-liner.
Having said that, it's possible. But it abuses syntax, is extremely unreadable and inefficient, and is generally just plain bad code (don't do it!):
out = {k: next(gg for gg in [{}] if all(gg.setdefault(a, b) for a,b in v)) for k, v in next(g for g in [{}] if not any(g.setdefault(key, []).append(v) for key, *v in list1)).items()}
Output:
{1: {2: 6, 3: 7}, 2: {5: 8, 8: 9}}

I actually tried to achieve that result and failed.
The comprehension overwrites the new entries.
After giving this idea a look, I found a similar post in which it is stated that it is not possible: https://stackoverflow.com/questions/11276473/append-to-a-dict-of-lists-with-a-dict-comprehension
I believe Amber's answer best summarizes the conclusion from my own failed attempt with dict comprehensions:
No - dict comprehensions are designed to generate non-overlapping keys with each iteration; they don't support aggregation. For this particular use case, a loop is the proper way to accomplish the task efficiently (in linear time).
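The loop-based aggregation the quote recommends can also be written with collections.defaultdict, which keeps the readable-loop spirit while dropping one of the setdefault() calls. A minimal sketch for the data above:

```python
from collections import defaultdict

list1 = [[1, 2, 6], [1, 3, 7], [2, 5, 8], [2, 8, 9]]

dict1 = defaultdict(dict)
for outer, inner, value in list1:
    # setdefault on the inner dict keeps the first value seen for a duplicate inner key
    dict1[outer].setdefault(inner, value)

print(dict(dict1))  # {1: {2: 6, 3: 7}, 2: {5: 8, 8: 9}}
```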

Related

Append elements in the value field of a dictionary using comprehensions

I have a list of elements, let's say:
y = [1, 3, 1, 5, 1]
And I would like to create a dictionary where:
Keys: are the elements in y
Values: is a list of the elements that appear before the Key in y
I attempted the following comprehension.
a = {elem: y[i] for i, elem in enumerate(y[1:])}
However, since the value field in the dictionary is not a list, it only keeps the previous element in the last occurrence of the key.
In other words, for this example I get the following:
{3: 1, 1: 5, 5: 3}
Is there a way to do so using comprehensions ?
Note: I forgot to add the desired result.
{3: [1], 1: [3,5], 5: [1]}
Your keys are duplicated, so you cannot create a dictionary with them (you'll lose the first elements).
So comprehensions are difficult to use here (and inefficient, as stated in the other answers) because of the accumulation effect that you need.
I suggest using collections.defaultdict(list) instead and a good old loop:
import collections
y = [1, 3, 1, 5, 1]
d = collections.defaultdict(list)
for i, x in enumerate(y[1:]):
    d[x].append(y[i])  # i is the index of the previous element in y
print(d)
result:
defaultdict(<class 'list'>, {1: [3, 5], 3: [1], 5: [1]})
Use enumerate and set operations.
{value: set(y[:i]) - {value} for i, value in enumerate(y)}
Out: {1: {3, 5}, 3: {1}, 5: {1, 3}}
It's a bit ugly and inefficient because in your example it works out a new answer each time it encounters 1, but it works out right because the final time it does this is the final time it encounters 1.
Just for the fun of it. Here's a comprehension.
a = {y[i]: [y[x-1] for x in range(1, len(y)) if y[x] == y[i]] for i in range(1, len(y))}
>> {3: [1], 1: [3, 5], 5: [1]}
(Note that both ranges must start at 1; starting the inner one at 0 would make y[x-1] wrap around to the last element.)
Just note that it's too long and inefficient for any practical program.
Using the defaultdict as Jean-François Fabre suggested in his answer above should be the proper way.
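That said, if a comprehension is really wanted, one workaround is to build the (element, previous) pairs first, sort them, and group with itertools.groupby. A sketch (O(n log n) because of the sort, so the defaultdict loop is still preferable):

```python
from itertools import groupby

y = [1, 3, 1, 5, 1]
# pair each element with the one before it, keyed on the element itself
pairs = sorted((curr, prev) for prev, curr in zip(y, y[1:]))
a = {k: [prev for _, prev in grp] for k, grp in groupby(pairs, key=lambda p: p[0])}
print(a)  # {1: [3, 5], 3: [1], 5: [1]}
```

Sorting makes equal keys adjacent, which is exactly what groupby needs; without it, duplicate keys would still overwrite each other.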

Converting a list of "pairs" into a dictionary of dictionaries?

This question was previously asked here with an egregious typo: Counting "unique pairs" of numbers into a python dictionary?
This is an algorithmic problem, and I don't know of the most efficient solution. My idea would be to somehow cache values in a list and enumerate pairs...but that would be so slow. I'm guessing there's something useful from itertools.
Let's say I have a list of integers in which there are never repeats:
list1 = [2, 3]
In this case, there is a unique pair 2-3 and 3-2, so the dictionary should be:
{2:{3: 1}, 3:{2: 1}}
That is, there is 1 pair of 2-3 and 1 pair of 3-2.
For larger lists, the pairing is the same, e.g.
list2 = [2, 3, 4]
has the dictionary
{2:{3:1, 4:1}, 3:{2:1, 4:1}, 4:{3:1, 2:1}}
(1) Once the size of the lists become far larger, how would one algorithmically find the "unique pairs" in this format using python data structures?
(2) I mentioned that the lists cannot have repeat integers, e.g.
[2, 2, 3]
is impossible, as there are two 2s.
However, one may have a list of lists:
list3 = [[2, 3], [2, 3, 4]]
whereby the dictionary must be
{2:{3:2, 4:1}, 3:{2:2, 4:1}, 4:{2:1, 3:1}}
as there are two pairs of 2-3 and 3-2. How would one "update" the dictionary given multiple lists within a list?
EDIT: My ultimate use case is, I want to iterate through hundreds of lists of integers, and create a single dictionary with the "counts" of pairs. Does this make sense? There might be another data structure which is more useful.
For the nested list example, you can do the following, making use of itertools.permutations and dict.setdefault:
from itertools import permutations
list3 = [[2, 3], [2, 3, 4]]
d = {}
for l in list3:
    for a, b in permutations(l, 2):
        d[a][b] = d.setdefault(a, {}).setdefault(b, 0) + 1
# {2: {3: 2, 4: 1}, 3: {2: 2, 4: 1}, 4: {2: 1, 3: 1}}
For flat lists l, use only the inner loop and omit the outer one.
For this example I'll just use a flat list with no nesting:
values = [3, 2, 4]
result = dict.fromkeys(values)
for key in result:
    inner = {}
    for num in values:
        if num != key:
            inner[num] = 1
    result[key] = inner
This creates a dict with each number as a key. Then, for each key, we assign a nested dict whose contents are num: 1 for each number in the original values list that isn't the key we're in.
Use defaultdict with permutations:
from collections import defaultdict
from itertools import permutations
d = defaultdict(dict)
for i in permutations([4, 2, 3]):
    d[i[0]] = {k: 1 for k in i[1:]}
output is
In [22]: d
Out[22]: defaultdict(dict, {2: {3: 1, 4: 1}, 4: {2: 1, 3: 1}, 3: {2: 1, 4: 1}})
For a nested list of lists, see https://stackoverflow.com/a/52206554/8060120
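Another sketch of the counting variant for nested input: tally the ordered pairs with collections.Counter, then fold the flat counts into the nested shape:

```python
from collections import Counter
from itertools import chain, permutations

list3 = [[2, 3], [2, 3, 4]]
# count every ordered pair within each sublist
counts = Counter(chain.from_iterable(permutations(l, 2) for l in list3))

d = {}
for (a, b), n in counts.items():
    d.setdefault(a, {})[b] = n
print(d)  # {2: {3: 2, 4: 1}, 3: {2: 2, 4: 1}, 4: {2: 1, 3: 1}}
```

Counter handles the "update across multiple lists" requirement for free, since it simply keeps adding to the same tallies.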

How can I deal with duplicate values when creating a dictionary in Python?

I am a beginner in python, and I would like to create a simple program that assigns each element in list1 to its respective element in list2 using the zip function.
list1 = [1, 2, 3, 4, 1]
list2 = [2, 3, 4, 5, 6]
dictionary = dict(zip(list1, list2))
print(dictionary)
However, because I have a duplicate value in list1, the dictionary displays the following results:
{1: 6, 2: 3, 3: 4, 4: 5}
Because 1 is a duplicate value in list1, only 1:6 is displayed and not 1:2 as well. How can I change my code such that the duplicate value is taken into account and is displayed in its respective order?
{1: 2, 2: 3, 3: 4, 4: 5, 1: 6}
Thank you
What you ask is not possible with a Python dict, since that goes against the definition of a dict--keys must be unique. The comments explain why that is the case.
However, there are multiple other ways that may be useful to you and can achieve almost the same effect. One simple, yet not terribly useful, way is:
almost_dictionary = list(zip(list1, list2))
This gives the result
[(1, 2), (2, 3), (3, 4), (4, 5), (1, 6)]
which sometimes can be used like a dictionary. However, this probably is not what you want. Better is a dict or defaultdict that, for each key, holds a list of all the values connected with that key. The defaultdict is easier to use, though harder to set up:
from collections import defaultdict

dictionary = defaultdict(list)
for k, v in zip(list1, list2):
    dictionary[k].append(v)
print(dictionary)
This gives the result
defaultdict(<class 'list'>, {1: [2, 6], 2: [3], 3: [4], 4: [5]})
and you see that each key has each value--the values are just in lists. The value of dictionary[1] is [2, 6], so you have both values to work with.
Which method you choose depends on your purpose for the dictionary.
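If you'd rather not import anything, the same grouping works with an ordinary dict and setdefault. A minimal sketch:

```python
list1 = [1, 2, 3, 4, 1]
list2 = [2, 3, 4, 5, 6]

dictionary = {}
for k, v in zip(list1, list2):
    # setdefault inserts an empty list the first time a key is seen
    dictionary.setdefault(k, []).append(v)
print(dictionary)  # {1: [2, 6], 2: [3], 3: [4], 4: [5]}
```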

Python: "Hash" a nested List

I have a dictionary master which contains around 50000 to 100000 unique lists which can be simple lists or also lists of lists. Every list is assigned to a specific ID (which is the key of the dictionary):
master = {12: [1, 2, 4], 21: [[1, 2, 3], [5, 6, 7, 9]], ...} # len(master) is several ten thousands
Now I have a few hundred dictionaries which again contain around 10000 lists each (same as above: they can be nested). Example of one of those dicts:
a = {'key1': [6, 9, 3, 1], 'key2': [[1, 2, 3], [5, 6, 7, 9]], 'key3': [7], ...}
I want to cross-reference this data for every single dictionary in reference to my master, i.e. instead of saving every list within a, I want to only store the ID of the master in case the list is present in the master.
=> a = {'key1': [6, 9, 3, 1], 'key2': 21, 'key3': [7], ...}
I can do that by looping over all values in a and all values of master and try to match the lists (by sorting them), but that'll take ages.
Now I'm wondering how would you solve this?
I thought of "hashing" every list in master to a unique string and store it as a key of a new master_inverse reference dict, e.g.:
master_inverse = {hash([1,2,4]): 12, hash([[1, 2, 3], [5, 6, 7, 9]]): 21}
Then it would be very simple to look it up later on:
for k, v in a.items():
    h = hash(v)
    if h in master_inverse:
        a[k] = master_inverse[h]
Do you have a better idea?
How could such a hash look like? Is there a built-in-method already which is fast and unique?
EDIT:
Dunno why I didn't come up with this approach instantly:
What do you think of using an MD5 hash of either the pickle or the repr() of any single list?
Something like this:
import hashlib
def myHash(obj):
    # repr() gives a stable string for the (possibly nested) list;
    # .encode() is needed because hashlib wants bytes
    return hashlib.md5(repr(obj).encode()).hexdigest()
master_inverse = {myHash(v): k for k, v in master.items()}
for k, v in a.items():
    h = myHash(v)
    if h in master_inverse:
        a[k] = master_inverse[h]
EDIT2:
I benched it: To check one of the hundred dicts (in my example a, a contains for my benchmark around 20k values) against my master_inverse is very fast, didn't expect that: 0.08sec. So I guess I can live with that well enough.
The MD5 approach will work, but you need to be cautious about the very small possibility of hash collisions (see How many random elements before MD5 produces collisions? for more details) when using an MD5 hash.
If you need to be absolutely sure that the program works correctly, you can convert the lists to tuples and create a dictionary where the keys are the tuples you have created and the values are the keys from your master dictionary (same as master_inverse, but with full values instead of MD5 hash values).
More info on how to use tuples as dictionary keys: http://www.developer.com/lang/other/article.php/630941/Learn-to-Program-using-Python-Using-Tuples-as-Keys.htm.
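A sketch of that tuple-based variant (to_tuple is a hypothetical helper name, not from the question): recursively convert lists to tuples so they become hashable dict keys, which removes any collision risk entirely:

```python
def to_tuple(x):
    # recursively turn lists into tuples so the result is hashable
    if isinstance(x, list):
        return tuple(to_tuple(item) for item in x)
    return x

master = {12: [1, 2, 4], 21: [[1, 2, 3], [5, 6, 7, 9]]}
master_inverse = {to_tuple(v): k for k, v in master.items()}

a = {'key1': [6, 9, 3, 1], 'key2': [[1, 2, 3], [5, 6, 7, 9]], 'key3': [7]}
for k, v in a.items():
    h = to_tuple(v)
    if h in master_inverse:
        a[k] = master_inverse[h]
print(a)  # {'key1': [6, 9, 3, 1], 'key2': 21, 'key3': [7]}
```

Replacing values during the items() iteration is safe here because no keys are added or removed.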

Flatten a dictionary of dictionaries (2 levels deep) of lists

I'm trying to wrap my brain around this but it's not flexible enough.
In my Python script I have a dictionary of dictionaries of lists. (Actually it gets a little deeper but that level is not involved in this question.) I want to flatten all this into one long list, throwing away all the dictionary keys.
Thus I want to transform
{1: {'a': [1, 2, 3], 'b': [0]},
 2: {'c': [4, 5, 1], 'd': [3, 8]}}
to
[1, 2, 3, 0, 4, 5, 1, 3, 8]
I could probably set up a map-reduce to iterate over items of the outer dictionary to build a sublist from each subdictionary and then concatenate all the sublists together.
But that seems inefficient for large data sets, because of the intermediate data structures (sublists) that will get thrown away. Is there a way to do it in one pass?
Barring that, I would be happy to accept a two-level implementation that works... my map-reduce is rusty!
Update:
For those who are interested, below is the code I ended up using.
Note that although I asked above for a list as output, what I really needed was a sorted list; i.e. the output of the flattening could be any iterable that can be sorted.
def genSessions(d):
    """Given the ipDict, return an iterator that provides all the sessions,
    one by one, converted to tuples."""
    for uaDict in d.itervalues():
        for sessions in uaDict.itervalues():
            for session in sessions:
                yield tuple(session)
...
# Flatten dict of dicts of lists of sessions into a list of sessions.
# Sort that list by start time
sessionsByStartTime = sorted(genSessions(ipDict), key=operator.itemgetter(0))
# Then make another copy sorted by end time.
sessionsByEndTime = sorted(sessionsByStartTime, key=operator.itemgetter(1))
Thanks again to all who helped.
[Update: replaced nthGetter() with operator.itemgetter(), thanks to @intuited.]
I hope you realize that any order you see in a dict is accidental -- it's there only because, when shown on screen, some order has to be picked, but there's absolutely no guarantee.
Net of ordering issues among the various sublists getting catenated,
[x for d in thedict.itervalues()
   for alist in d.itervalues()
   for x in alist]
does what you want without any inefficiency nor intermediate lists.
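On Python 3, where itervalues() is gone and dicts keep insertion order, the same one-pass flattening can be sketched with itertools.chain:

```python
from itertools import chain

thedict = {1: {'a': [1, 2, 3], 'b': [0]},
           2: {'c': [4, 5, 1], 'd': [3, 8]}}

# inner chain yields the sublists of each sub-dictionary;
# outer chain flattens those sublists into individual elements
flat = list(chain.from_iterable(
    chain.from_iterable(d.values() for d in thedict.values())))
print(flat)  # [1, 2, 3, 0, 4, 5, 1, 3, 8]
```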
edit: re-read the original question and reworked answer to assume that all non-dictionaries are lists to be flattened.
In cases where you're not sure how far down the dictionaries go, you would want to use a recursive function. @Arrieta has already posted a function that recursively builds a list of non-dictionary values.
This one is a generator that yields successive non-dictionary values in the dictionary tree:
def flatten(d):
    """Recursively flatten dictionary values in `d`.

    >>> hat = {'cat': ['images/cat-in-the-hat.png'],
    ...        'fish': {'colours': {'red': [0xFF0000], 'blue': [0x0000FF]},
    ...                 'numbers': {'one': [1], 'two': [2]}},
    ...        'food': {'eggs': {'green': [0x00FF00]},
    ...                 'ham': ['lean', 'medium', 'fat']}}
    >>> set_of_values = set(flatten(hat))
    >>> sorted(set_of_values)
    [1, 2, 255, 65280, 16711680, 'fat', 'images/cat-in-the-hat.png', 'lean', 'medium']
    """
    try:
        for v in d.itervalues():
            for nested_v in flatten(v):
                yield nested_v
    except AttributeError:
        for list_v in d:
            yield list_v
The doctest passes the resulting iterator to the set function. This is likely to be what you want, since, as Mr. Martelli points out, there's no intrinsic order to the values of a dictionary, and therefore no reason to keep track of the order in which they were found.
You may want to keep track of the number of occurrences of each value; this information will be lost if you pass the iterator to set. If you want to track that, just pass the result of flatten(hat) to some other function instead of set. Under Python 2.7, that other function could be collections.Counter. For compatibility with less-evolved pythons, you can write your own function or (with some loss of efficiency) combine sorted with itertools.groupby.
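For example (a sketch using a small flat list rather than the nested hat dict, just to show the two counting styles):

```python
from collections import Counter
from itertools import groupby

values = ['lean', 'fat', 'lean', 'medium', 'lean']

# Python 2.7+ / 3.x: Counter does the bookkeeping directly
print(dict(Counter(values)))  # {'lean': 3, 'fat': 1, 'medium': 1}

# pre-2.7 equivalent: sort so equal values are adjacent, then group
counts = dict((k, len(list(g))) for k, g in groupby(sorted(values)))
print(counts)  # {'fat': 1, 'lean': 3, 'medium': 1}
```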
A recursive function may work:
def flat(d, out=None):
    if out is None:
        out = []
    for val in d.values():
        if isinstance(val, dict):
            flat(val, out)  # recurse into the nested dict, not into d itself
        else:
            out += val
If you try it with :
>>> d = {1: {'a': [1, 2, 3], 'b': [0]}, 2: {'c': [4, 5, 6], 'd': [3, 8]}}
>>> out = []
>>> flat(d, out)
>>> print out
[1, 2, 3, 0, 4, 5, 6, 3, 8]
Notice that dictionaries have no order, so the list is in random order.
You can also return out (at the end of the function) and then you don't need to call the function with a list argument:
def flat(d, out=None):
    if out is None:
        out = []
    for val in d.values():
        if isinstance(val, dict):
            flat(val, out)
        else:
            out += val
    return out
call as:
my_list = flat(d)
