Creating Dictionaries from Lists inside of Dictionaries - python

I'm quite new to Python and I have been stumped by a seemingly simple task.
In part of my program, I would like to create Secondary Dictionaries from the values inside of lists, of which they are values of a Primary Dictionary.
I would also like to default those values to 0
For the sake of simplicity, the Primary Dictionary looks something like this:
primaryDict = {'list_a':['apple', 'orange'], 'list_b':['car', 'bus']}
What I would like my result to be is something like:
{'list_a':[{'apple':0}, {'orange':0}], 'list_b':[{'car':0}, {'bus':0}]}
I understand the process should be to iterate through each list in the primaryDict, then iterate through the items in the list and then assign them as Dictionaries.
I've tried many variations of "for" loops all looking similar to:
for listKey in primaryDict:
for word in listKey:
{word:0 for word in listKey}
I've also tried some methods of combining Dictionary and List comprehension,
but when I try to index and print the Dictionaries with, for example:
print(primaryDict['list_a']['apple'])
I get the "TypeError: list indices must be integers or slices, not str", which I interpret that my 'apple' is not actually a Dictionary, but still a string in a list. I tested that by replacing 'apple' with 0 and it just returns 'apple', proving it true.
I would like help with regards to:
-Whether or not the values in my list are assigned as Dictionaries with value '0'
or
-Whether the mistake is in my indexing (in the loop or the print function), and what I am mistaken with
or
-Everything I've done won't get me the desired outcome and I should attempt a different approach
Thanks

Here is a dict comprehension that works:
{k: [{v: 0} for v in vs] for k, vs in primaryDict.items()}
There are two problems with your current code. First, you are trying to iterate over listKey, which is a string. This produces a sequence of characters.
Second, you should use something like
[{word: 0} for word in words]
in place of
{word:0 for word in listKey}

You are close. The main issue is the way you iterate your dictionary, and the fact you do not append or assign your sub-dictionaries to any variable.
This is one solution using only for loops and list.append.
d = {}
for k, v in primaryDict.items():
d[k] = []
for w in v:
d[k].append({w: 0})
{'list_a': [{'apple': 0}, {'orange': 0}],
'list_b': [{'car': 0}, {'bus': 0}]}
A more Pythonic solution is to use a single list comprehension.
d = {k: [{w: 0} for w in v] for k, v in primaryDict.items()}
If you are using your dictionary for counting, which seems to be the implication, an even more Pythonic solution is to use collections.Counter:
from collections import Counter
d = {k: Counter(dict.fromkeys(v, 0)) for k, v in primaryDict.items()}
{'list_a': Counter({'apple': 0, 'orange': 0}),
'list_b': Counter({'bus': 0, 'car': 0})}
There are specific benefits attached to collections.Counter relative to normal dictionaries.

You can get the data structure that you desire via:
primaryDict = {'list_a':['apple', 'orange'], 'list_b':['car', 'bus']}
for k, v in primaryDict.items():
primaryDict[k] = [{e: 0} for e in v]
# primaryDict
{'list_b': [{'car': 0}, {'bus': 0}], 'list_a': [{'apple': 0}, {'orange': 0}]}
But the correct nested access would be:
print(primaryDict['list_a'][0]['apple']) # note the 0
If you actually want primaryDict['list_a']['apple'] to work, do instead
for k, v in primaryDict.items():
primaryDict[k] = {e: 0 for e in v}
# primaryDict
{'list_b': {'car': 0, 'bus': 0}, 'list_a': {'orange': 0, 'apple': 0}}

primaryDict = {'list_a':['apple', 'orange'], 'list_b':['car', 'bus']}
for listKey in primaryDict:
primaryDict[i] = [{word:0} for word in primaryDict[listKey]]
print(primaryDict)
Output:
{'list_a':[{'apple':0}, {'orange':0}], 'list_b':[{'car':0}, {'bus':0}]}
Hope this helps!

#qqc1037, I checked and updated your code to make it working. I have mentioned the problem with your code as comments. Finally, I have also added one more example using list comprehension, map() & lambda function.
import json
secondaryDict = {}
for listKey in primaryDict:
new_list = [] # You did not define any temporary list
for word in primaryDict [listKey]: # You forgot to use key that refers the list
new_list.append( {word:0}) # Here you forgot to append to list
secondaryDict2.update({listKey: new_list}) # Finally, you forgot to update the secondary dictionary
# Pretty printing dictionary
print(json.dumps(secondaryDict, indent=4));
"""
{
"list_a": [
{
"apple": 0
},
{
"orange": 0
}
],
"list_b": [
{
"car": 0
},
{
"bus": 0
}
]
}
"""
Another example: Using list comprehension, map(), lambda function
# Using Python 3.5.2
import json
primaryDict = {'list_a':['apple', 'orange'], 'list_b':['car', 'bus']}
secondaryDict = dict(map(lambda key: (key, [{item:0} for item in primaryDict[key]]), list(primaryDict) ))
# Pretty printing secondary dictionary
print(json.dumps(secondaryDict, indent=4))
"""
{
"list_a": [
{
"apple": 0
},
{
"orange": 0
}
],
"list_b": [
{
"car": 0
},
{
"bus": 0
}
]
}
"""

Related

How to find and append dictionary values which are present in a list

I have a list which has unique sorted values
arr = ['Adam', 'Ben', 'Chris', 'Dean', 'Flower']
I have a dictionary which has values as such
dict = {
'abc': {'Dean': 1, 'Adam':0, 'Chris':1},
'def': {'Flower':0, 'Ben':1, 'Dean':0}
}
From looking at values from arr I need to have each item and if the value isn't present in subsequent smaller dict that should be assigned a value -1
Result
dict = {
'abc': {'Adam':0, 'Ben':-1, 'Chris':1, 'Dean': 1, 'Flower':-1},
'def': {'Adam':-1, 'Ben':1, 'Chris':-1, 'Dean': 0, 'Flower':0}
}
how can I achieve this using list and dict comprehensions in python
dd = {
key: {k: value.get(k, -1) for k in arr}
for key, value in dd.items()
}
{k: value.get(k, -1) for k in arr} will make sure that your keys are in the same order as you defined in the arr list.
A side note on the order of keys in dictionary.
Dictionaries preserve insertion order. Note that updating a key does
not affect the order. Keys added after deletion are inserted at the
end.
Changed in version 3.7: Dictionary order is guaranteed to be insertion
order. This behavior was an implementation detail of CPython from 3.6.
Please do not make a variable called dict, rename it to dct or something since dict it is a reserved python internal.
As for your question: just iterate through your dct and add the missing keys using setdefault:
arr = ['Adam', 'Ben', 'Chris', 'Dean', 'Flower']
dct = {
'abc': {'Dean': 1, 'Adam':0, 'Chris':1},
'def': {'Flower':0, 'Ben':1, 'Dean':0}
}
def add_dict_keys(dct, arr):
for key in arr:
dct.setdefault(key, -1)
return dct
for k, v in dct.items():
add_dict_keys(v, arr)
print(dct) # has updated values

Efficiently filtering nested list of dictionary

I have a nested list of dictionary like follows:
list_of_dict = [
{
"key": "key1",
"data": [
{
"u_key": "u_key_1",
"value": "value_1"
},
{
"u_key": "u_key_2",
"value": "value_2"
}
]
},
{
"key": "key2",
"data": [
{
"u_key": "u_key_1",
"value": "value_3"
},
{
"u_key": "u_key_2",
"value": "value_4"
}
]
}
]
As you can see list_of_dict is a list of dict and inside that, data is also a list of dict. Assume that all the objects inside list_of_dict and data has similar structure and all the keys are always present.
In the next step I convert list_of_dict to list_of_tuples, where first element of tuple is key followed by all the values against value key inside data
list_of_tuples = [
('key1', 'value_1'),
('key1', 'value_2'),
('key2', 'value_3'),
('key2','value_4')
]
The final step is comparison with a list(comparison_list). List contains string values. The values inside the list CAN be from the value key inside data. I need to check if any value inside comparison_list is inside list_of_tuples and fetch the key(first item of tuple) of that value.
comparison_list = ['value_1', 'value_2']
My expected output is:
out = ['key1', 'key1']
My solution is follows:
>>> list_of_tuples = [(c.get('key'),x.get('value'))
for c in list_of_dict for x in c.get('data')]
>>> for t in list_of_tuple:
if t[1] in comparison_list:
print("Found: {}".format(t[0]))
So summary of problem is that I have list of values(comparison_list) which I need to find inside data array.
The dataset that I am operating on is quite huge(>100M). I am looking to speed up my solution and also make it more compact and readable.
Can I somehow skip the step where I create list_of_tuples and do the comparison directly?
There are a few simple optimization you can try:
make comparison_list a set so the lookup is O(1) instead of O(n)
make list_of_tuples a generator, so you don't have to materialize all the entries at once
you can also integrate the condition into the generator itself
Example:
comparison_set = set(['value_1', 'value_2'])
tuples_generator = ((c['key'], x['value'])
for c in list_of_dict for x in c['data']
if x['value'] in comparison_set)
print(*tuples_generator)
# ('key1', 'value_1') ('key1', 'value_2')
Of course, you can also keep the comparison separate from the generator:
tuples_generator = ((c['key'], x['value'])
for c in list_of_dict for x in c['data'])
for k, v in tuples_generator:
if v in comparison_set:
print(k, v)
Or you could instead create a dict mapping values from comparison_set to keys from list_of_dicts. This will make finding the key to a particular value faster, but note that you can then only keep one key to each value.
values_dict = {x['value']: c['key']
for c in list_of_dict for x in c['data']
if x['value'] in comparison_set}
print(values_dict)
# {'value_2': 'key1', 'value_1': 'key1'}
In last step you can use filter something like this instead of iterating over that:
comparison_list = ['value_1', 'value_2']
print(list(filter(lambda x:x[1] in comparison_list,list_of_tuples)))
output:
[('key1', 'value_1'), ('key1', 'value_2')]

Independent objects needed when zipping into dict

I have a list of items:
my_list = ['first', 'second', 'third']
I need to convert this items into a dictionary (so I can access each element by it's name), and associate multiple counters to each element: counter1, counter2, counter3.
So I do the following:
counter_dict = {'counter1': 0, 'counter2': 0, 'counter3': 0}
my_dict = dict(zip(mylist, [counter_dict]*len(mylist)))
This way I obtain a nested dictionary
{
'first': {
'counter1': 0,
'counter2': 0,
'counter3': 0
},
'second': {
'counter1': 0,
'counter2': 0,
'counter3': 0
},
'third': {
'counter1': 0,
'counter2': 0,
'counter3': 0
}
}
The problem here is that each counter dictionary is not independent, so when I update one counter, all of them are updated.
The following line not only updates the counter2 of second but also updates counter2 of first and third
my_dict['second']['counter2'] += 1
{
'first': {
'counter1': 0,
'counter2': 1,
'counter3': 0
},
'second': {
'counter1': 0,
'counter2': 1,
'counter3': 0
},
'third': {
'counter1': 0,
'counter2': 1,
'counter3': 0
}
}
I am aware that by doing [counter_dict]*len(mylist) I am pointing all dictionaries to a single one.
Question is, how can I achieve what I need creating independent dictionaries?
At the end I need to:
Acces each element of the list as a key
For each element have multiple counters that I can update independently
Thanks a lot
You can use copy.deepcopy to copy a dict and copy its contents.
import copy
Instead of [counter_dict]*len(mylist)
use:
[copy.deepcopy(counter_dict) for _ in mylist]
Try this one:
my_dict = {k: counter_dict.copy() for k in my_list}
Instead of zipping values - it's enough to iterate over keys and copy your counters to values using dict compression.
But note, that this will work only with one level counter_dict. If needed nested dictionary - use deepcopy from copy package:
from copy import deepcopy
my_dict = {k: deepcopy(counter_dict) for k in my_list}
Another aproach - using defaultdict from collections, so you dont even need to create dictionary with predefined counters:
from collections import defaultdict
my_dict = {k: defaultdict(int) for k in my_list}
Use a list comprehension, not the * operator, and create a new dictionary for each element of the list.
my_dict = dict(zip(mylist, [dict(counter_dict) for _ in mylist])
You can go and make a function that will create a counter_dict whenever you want to assign one:
def create_counter_dict(length):
return {'counter{}'.format(n+1) : 0 for n in range(length)}
Then when you want to assign a counter_dict to your my_dict:
my_dict = dict(zip(my_list, [create_counter_dict(3) for _ in range(len(my_list))]))
This way you don't have to make a copy of your dictionary, and you can alter this factory function to your needs.

Python Collections Counter for a List of Dictionaries

I have a dynamically growing list of arrays that I would like to add like values together. Here's an example:
{"something" : [{"one":"200"}, {"three":"400"}, {"one":"100"}, {"two":"800"} ... ]}
I'd like to be able to add together the dictionaries inside the list. So, in this case for the key "something", the result would be:
["one":400, "three": 400, "two": 800]
or something to that effect. I'm familiar with the Python's collection counter, but since the "something" list contains dicts, it will not work (unless I'm missing something). The dict is also being dynamically created, so I can't build the list without the dicts. EG:
Counter({'b':3, 'c':4, 'd':5, 'b':2})
Would normally work, but as soon as I try to add an element, the previous value will be overwritten. I've noticed other questions such as these:
Is there any pythonic way to combine two dicts (adding values for keys that appear in both)?
Python count of items in a dictionary of lists
But again, the objects within the list are dicts.
I think this does what you want, but I'm not sure because I don't know what "The dict is also being dynamically created, so I can't build the list without the dicts" means. Still:
input = {
"something" : [{"one":"200"}, {"three":"400"}, {"one":"100"}, {"two":"800"}],
"foo" : [{"a" : 100, "b" : 200}, {"a" : 300, "b": 400}],
}
def counterize(x):
return Counter({k : int(v) for k, v in x.iteritems()})
counts = {
k : sum((counterize(x) for x in v), Counter())
for k, v in input.iteritems()
}
Result:
{
'foo': Counter({'b': 600, 'a': 400}),
'something': Counter({'two': 800, 'three': 400, 'one': 300})
}
I expect using sum with Counter is inefficient (in the same way that using sum with strings is so inefficient that Guido banned it), but I might be wrong. Anyway, if you have performance problems, you could write a function that creates a Counter and repeatedly calls += or update on it:
def makeints(x):
return {k : int(v) for k, v in x.iteritems()}
def total(seq):
result = Counter()
for s in seq:
result.update(s)
return result
counts = {k : total(makeints(x) for x in v) for k, v in input.iteritems()}
One way would be do as follows:
from collections import defaultdict
d = {"something" :
[{"one":"200"}, {"three":"400"}, {"one":"100"}, {"two":"800"}]}
dd = defaultdict(list)
# first get and group values from the original data structure
# and change strings to ints
for inner_dict in d['something']:
for k,v in inner_dict.items():
dd[k].append(int(v))
# second. create output dictionary by summing grouped elemetns
# from the first step.
out_dict = {k:sum(v) for k,v in dd.items()}
print(out_dict)
# {'two': 800, 'one': 300, 'three': 400}
In here I don't use counter, but defaultdict. Its a two step approach.

How do I find an item in an array of dictionaries?

Suppose I have this:
list = [ { 'p1':'v1' } ,{ 'p2':'v2' } ,{ 'p3':'v3' } ]
I need to find p2 and get its value.
You can try the following ... That will return all the values equivilant to the givenKey in all dictionaries.
ans = [d[key] for d in list if d.has_key(key)]
If this is what your actual code looks like (each key is unique), you should just use one dictionary:
things = { 'p1':'v1', 'p2':'v2', 'p3':'v3' }
do_something(things['p2'])
You can convert a list of dictionaries to one dictionary by merging them with update (but this will overwrite duplicate keys):
dict = {}
for item in list:
dict.update(item)
do_something(dict['p2'])
If that's not possible, you'll need to just loop through them:
for item in list:
if 'p2' in item:
do_something(item['p2'])
If you expect multiple results, you can also build up a list:
p2s = []
for item in list:
if 'p2' in item:
p2s.append(item['p2'])
Also, I wouldn't recommend actually naming any variables dict or list, since that will cause problems with the built-in dict() and list() functions.
These shouldn't be stored in a list to begin with, they should be stored in a dictionary. Since they're stored in a list, though, you can either search them as they are:
lst = [ { 'p1':'v1' } ,{ 'p2':'v2' } ,{ 'p3':'v3' } ]
p2 = next(d["p2"] for d in lst if "p2" in d)
Or turn them into a dictionary:
dct = {}
any(dct.update(d) for d in lst)
p2 = dct["p2"]
You can also use this one-liner:
filter(lambda x: 'p2' in x, list)[0]['p2']
if you have more than one 'p2', this will pick out the first; if you have none, it will raise IndexError.
for d in list:
if d.has_key("p2"):
return d['p2']
If it's a oneoff lookup, you can do something like this
>>> [i['p2'] for i in my_list if 'p2' in i]
['v2']
If you need to look up multiple keys, you should consider converting the list to something that can do key lookups in constant time (such as a dict)
>>> my_list = [ { 'p1':'v1' } ,{ 'p2':'v2' } ,{ 'p3':'v3' } ]
>>> my_dict = dict(i.popitem() for i in my_list)
>>> my_dict['p2']
'v2'
Start by flattening the list of dictionaries out to a dictionary, then you can index it by key and get the value:
{k:v for x in list for k,v in x.iteritems()}['p2']

Categories