How do I find an item in an array of dictionaries? - python

Suppose I have this:
list = [ { 'p1':'v1' } ,{ 'p2':'v2' } ,{ 'p3':'v3' } ]
I need to find p2 and get its value.

You can try the following. It returns the values stored under the given key in every dictionary that contains it.
ans = [d[key] for d in list if key in d]

If this is what your actual code looks like (each key is unique), you should just use one dictionary:
things = { 'p1':'v1', 'p2':'v2', 'p3':'v3' }
do_something(things['p2'])
You can convert a list of dictionaries to one dictionary by merging them with update (but this will overwrite duplicate keys):
dict = {}
for item in list:
    dict.update(item)
do_something(dict['p2'])
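To make the overwrite behaviour concrete, here is a small sketch. The sample data (with a deliberately duplicated 'p2' key) and the names dicts and merged are made up for illustration:

```python
# Made-up sample data: 'p2' appears in two dictionaries.
dicts = [{'p1': 'v1'}, {'p2': 'v2'}, {'p2': 'later'}]

merged = {}
for item in dicts:
    merged.update(item)   # a later dictionary overwrites an earlier one on key clash

print(merged)  # {'p1': 'v1', 'p2': 'later'}
```

If the earlier value has to win instead, iterating over reversed(dicts) would be one way to get that.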
If that's not possible, you'll need to just loop through them:
for item in list:
    if 'p2' in item:
        do_something(item['p2'])
If you expect multiple results, you can also build up a list:
p2s = []
for item in list:
    if 'p2' in item:
        p2s.append(item['p2'])
Also, I wouldn't recommend actually naming any variables dict or list, since that will cause problems with the built-in dict() and list() functions.
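A short sketch of the kind of problem this causes. The function name shadow_demo is hypothetical, and the shadowing is kept inside a function so it doesn't leak into the rest of the program:

```python
def shadow_demo():
    list = [{'p1': 'v1'}, {'p2': 'v2'}]  # local name shadows the built-in list type
    try:
        list("abc")                       # calls our list object, not the built-in
    except TypeError as exc:
        return str(exc)


error = shadow_demo()
print(error)          # a TypeError message saying the object is not callable
print(list("abc"))    # outside the function the built-in still works: ['a', 'b', 'c']
```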

These shouldn't be stored in a list to begin with, they should be stored in a dictionary. Since they're stored in a list, though, you can either search them as they are:
lst = [ { 'p1':'v1' } ,{ 'p2':'v2' } ,{ 'p3':'v3' } ]
p2 = next(d["p2"] for d in lst if "p2" in d)
Or turn them into a dictionary:
dct = {}
any(dct.update(d) for d in lst)
p2 = dct["p2"]
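Note that next(...) without a default raises StopIteration when no dictionary contains the key. A minimal sketch of a variant that returns a fallback instead; the helper name find_value and its default parameter are assumptions made for this example:

```python
def find_value(dicts, key, default=None):
    """Return the value for `key` from the first dict that has it, else `default`."""
    # The second argument to next() is returned when the generator is exhausted.
    return next((d[key] for d in dicts if key in d), default)


lst = [{'p1': 'v1'}, {'p2': 'v2'}, {'p3': 'v3'}]
print(find_value(lst, 'p2'))       # v2
print(find_value(lst, 'p9'))       # None
print(find_value(lst, 'p9', '?'))  # ?
```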

You can also use this one-liner:
filter(lambda x: 'p2' in x, list)[0]['p2']
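If you have more than one 'p2', this will pick out the first; if you have none, it will raise IndexError. Note that this works as written only in Python 2, where filter returns a list; in Python 3 it returns an iterator, which cannot be indexed.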

def find_p2(dicts):
    # Return the value from the first dictionary that has the key 'p2'.
    for d in dicts:
        if "p2" in d:
            return d['p2']

If it's a one-off lookup, you can do something like this:
>>> [i['p2'] for i in my_list if 'p2' in i]
['v2']
If you need to look up multiple keys, you should consider converting the list to something that can do key lookups in constant time (such as a dict)
>>> my_list = [ { 'p1':'v1' } ,{ 'p2':'v2' } ,{ 'p3':'v3' } ]
>>> my_dict = dict(i.popitem() for i in my_list)
>>> my_dict['p2']
'v2'
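One caveat with the popitem() version: popitem() removes the pair from each source dictionary, so my_list is left holding empty dicts afterwards. A sketch of a non-destructive variant, assuming the same sample data:

```python
my_list = [{'p1': 'v1'}, {'p2': 'v2'}, {'p3': 'v3'}]

# Read the pairs through items() so the source dictionaries are left intact.
my_dict = {k: v for d in my_list for k, v in d.items()}

print(my_dict['p2'])  # v2
print(my_list)        # [{'p1': 'v1'}, {'p2': 'v2'}, {'p3': 'v3'}]
```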

Start by flattening the list of dictionaries out to a dictionary, then you can index it by key and get the value:
{k:v for x in list for k,v in x.items()}['p2']

Related

Merge multiple key in dictionary with same values inside an array

I need to merge the entries of a dictionary with list values into a single dict, based on whether they share values inside their lists. The dictionary looks like this:
data = {
    'A1':['Cheese','Cupcake', 'Salad','Sandwich'],
    'A2':['Cheese','Cupcake', 'Pasta','Pudding'],
    'A3':['Pudding','Pasta', 'Salad','Sandwich']
}
Then, the output would be like this:
{
    'A1,A2':['Cheese','Cupcake'],
    'A1,A3':['Salad', 'Sandwich'],
    'A2,A3':['Pudding','Pasta']
}
I've tried this:
tmp = {}
for key, value in data.items():
    if value in tmp:
        tmp[value].append(key)
    else:
        tmp[value] = [ key ]
print(tmp)
But it only works if the values aren't lists or arrays. Any solution?
Given your use case, you could use itertools.combinations and set intersection:
data = {
    'A1':['Cheese','Cupcake', 'Salad','Sandwich'],
    'A2':['Cheese','Cupcake', 'Pasta','Pudding'],
    'A3':['Pudding','Pasta', 'Salad','Sandwich']
}
from itertools import combinations
out = {f'{a},{b}': list(set(data[a]).intersection(data[b]))
       for a,b in combinations(data, 2)}
output:
{'A1,A2': ['Cupcake', 'Cheese'],
'A1,A3': ['Sandwich', 'Salad'],
'A2,A3': ['Pasta', 'Pudding']}
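Because sets are unordered, the order of the items inside each result list can vary from run to run. If a stable order matters, one possible variation (a sketch, not part of the answer above) keeps the order of the first list of each pair and uses the set only for the membership test:

```python
from itertools import combinations

data = {
    'A1': ['Cheese', 'Cupcake', 'Salad', 'Sandwich'],
    'A2': ['Cheese', 'Cupcake', 'Pasta', 'Pudding'],
    'A3': ['Pudding', 'Pasta', 'Salad', 'Sandwich'],
}

out = {}
for a, b in combinations(data, 2):
    other = set(data[b])                      # O(1) membership checks
    # Walk data[a] in its original order and keep what also occurs in data[b].
    out[f'{a},{b}'] = [item for item in data[a] if item in other]

print(out)
# {'A1,A2': ['Cheese', 'Cupcake'], 'A1,A3': ['Salad', 'Sandwich'], 'A2,A3': ['Pasta', 'Pudding']}
```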

Python add key to list of dictionary if key doesn't exist

I want to add the key 'Name' to list of dictionaries in whichever dictionary 'Name' doesn't exist.
For example,
[dict(item, **{'Name': 'apple'}) for item in d_list]
will update value of key 'Name' even if key already exists and
[dict(item, **{'Name': 'apple'}) for item in d_list if 'Name' not in item]
returns empty list
You need to handle two different cases: the list is empty, and the list is not empty.
It's not possible to handle both cases in a single list comprehension, since an empty input always produces an empty list. It is like doing for i in my_list: if the list is empty, the code inside the for block is never executed.
I would tackle it with a single loop. I find it more readable.
>>> default = {"Name": "apple"}
>>> miss_map = {"Data": "text"}
>>> exist_map = {"Name": "pie"}
>>>
>>> d = [miss_map, exist_map]
>>>
>>> list_dict = [miss_map, exist_map]
>>> for d in list_dict:
...     if "Name" not in d.keys():
...         d.update(default)
...
>>> list_dict
[{'Data': 'text', 'Name': 'apple'}, {'Name': 'pie'}]
>>>
You can then move it to its own function and pass it the list of dicts.
In one line of code:
d_list = [{**d, "Name": "apple"} for d in d_list if "Name" not in d] + [d for d in d_list if "Name" in d]
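Based on Abdul Aziz's comment, I could do it in one line using setdefault, which modifies each dictionary in place (the list of return values it builds can be discarded):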
[item.setdefault("Name", 'apple') for item in d_list]
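The setdefault approach can also be written as a plain loop, which avoids building a throwaway list of return values. A minimal sketch, with sample data made up for illustration:

```python
d_list = [{"Data": "text"}, {"Name": "pie"}]

# setdefault writes the key only when it is missing; existing values are untouched.
for item in d_list:
    item.setdefault("Name", "apple")

print(d_list)  # [{'Data': 'text', 'Name': 'apple'}, {'Name': 'pie'}]
```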

Creating Dictionaries from Lists inside of Dictionaries

I'm quite new to Python and I have been stumped by a seemingly simple task.
In part of my program, I would like to create Secondary Dictionaries from the items inside lists that are themselves the values of a Primary Dictionary.
I would also like to default those values to 0.
For the sake of simplicity, the Primary Dictionary looks something like this:
primaryDict = {'list_a':['apple', 'orange'], 'list_b':['car', 'bus']}
What I would like my result to be is something like:
{'list_a':[{'apple':0}, {'orange':0}], 'list_b':[{'car':0}, {'bus':0}]}
I understand the process should be to iterate through each list in the primaryDict, then iterate through the items in the list and then assign them as Dictionaries.
I've tried many variations of "for" loops all looking similar to:
for listKey in primaryDict:
    for word in listKey:
        {word:0 for word in listKey}
I've also tried some methods of combining Dictionary and List comprehension,
but when I try to index and print the Dictionaries with, for example:
print(primaryDict['list_a']['apple'])
I get the "TypeError: list indices must be integers or slices, not str", which I interpret to mean that my 'apple' is not actually a Dictionary, but still a string in a list. I tested that by replacing 'apple' with 0 and it just returns 'apple', proving it true.
I would like help with regards to:
-Whether or not the values in my list are assigned as Dictionaries with value '0'
or
-Whether the mistake is in my indexing (in the loop or the print function), and what I am mistaken with
or
-Everything I've done won't get me the desired outcome and I should attempt a different approach
Thanks
Here is a dict comprehension that works:
{k: [{v: 0} for v in vs] for k, vs in primaryDict.items()}
There are two problems with your current code. First, you are trying to iterate over listKey, which is a string. This produces a sequence of characters.
Second, you should use something like
[{word: 0} for word in words]
in place of
{word:0 for word in listKey}
You are close. The main issue is the way you iterate your dictionary, and the fact you do not append or assign your sub-dictionaries to any variable.
This is one solution using only for loops and list.append.
d = {}
for k, v in primaryDict.items():
    d[k] = []
    for w in v:
        d[k].append({w: 0})
{'list_a': [{'apple': 0}, {'orange': 0}],
 'list_b': [{'car': 0}, {'bus': 0}]}
A more Pythonic solution is to use a single dict comprehension with a nested list comprehension.
d = {k: [{w: 0} for w in v] for k, v in primaryDict.items()}
If you are using your dictionary for counting, which seems to be the implication, an even more Pythonic solution is to use collections.Counter:
from collections import Counter
d = {k: Counter(dict.fromkeys(v, 0)) for k, v in primaryDict.items()}
{'list_a': Counter({'apple': 0, 'orange': 0}),
'list_b': Counter({'bus': 0, 'car': 0})}
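collections.Counter has specific benefits relative to normal dictionaries: missing keys read as 0 instead of raising KeyError, and methods such as update() and most_common() handle the counting for you.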
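A brief sketch of what Counter offers in practice, assuming the same primaryDict as in the question; the counts added here are made up for illustration:

```python
from collections import Counter

primaryDict = {'list_a': ['apple', 'orange'], 'list_b': ['car', 'bus']}
d = {k: Counter(dict.fromkeys(v, 0)) for k, v in primaryDict.items()}

d['list_a']['apple'] += 2                 # ordinary dict-style update
d['list_a'].update(['orange', 'apple'])   # update() adds to the existing counts
print(d['list_a']['banana'])              # 0: a missing key reads as zero, no KeyError
print(d['list_a'].most_common(1))         # [('apple', 3)]
```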
You can get the data structure that you desire via:
primaryDict = {'list_a':['apple', 'orange'], 'list_b':['car', 'bus']}
for k, v in primaryDict.items():
    primaryDict[k] = [{e: 0} for e in v]
# primaryDict
{'list_b': [{'car': 0}, {'bus': 0}], 'list_a': [{'apple': 0}, {'orange': 0}]}
But the correct nested access would be:
print(primaryDict['list_a'][0]['apple']) # note the 0
If you actually want primaryDict['list_a']['apple'] to work, do instead
for k, v in primaryDict.items():
    primaryDict[k] = {e: 0 for e in v}
# primaryDict
{'list_b': {'car': 0, 'bus': 0}, 'list_a': {'orange': 0, 'apple': 0}}
primaryDict = {'list_a':['apple', 'orange'], 'list_b':['car', 'bus']}
for listKey in primaryDict:
    primaryDict[listKey] = [{word:0} for word in primaryDict[listKey]]
print(primaryDict)
Output:
{'list_a':[{'apple':0}, {'orange':0}], 'list_b':[{'car':0}, {'bus':0}]}
Hope this helps!
@qqc1037, I checked your code and updated it to make it work. The problems with your code are noted in the comments. I have also added one more example using a list comprehension, map() and a lambda function.
import json
secondaryDict = {}
for listKey in primaryDict:
    new_list = []  # You did not define any temporary list
    for word in primaryDict[listKey]:  # You forgot to use the key that refers to the list
        new_list.append({word: 0})  # Here you forgot to append to the list
    secondaryDict.update({listKey: new_list})  # Finally, you forgot to update the secondary dictionary
# Pretty printing dictionary
print(json.dumps(secondaryDict, indent=4))
"""
{
    "list_a": [
        {
            "apple": 0
        },
        {
            "orange": 0
        }
    ],
    "list_b": [
        {
            "car": 0
        },
        {
            "bus": 0
        }
    ]
}
"""
Another example: Using list comprehension, map(), lambda function
# Using Python 3.5.2
import json
primaryDict = {'list_a':['apple', 'orange'], 'list_b':['car', 'bus']}
secondaryDict = dict(map(lambda key: (key, [{item:0} for item in primaryDict[key]]), list(primaryDict) ))
# Pretty printing secondary dictionary
print(json.dumps(secondaryDict, indent=4))
"""
{
    "list_a": [
        {
            "apple": 0
        },
        {
            "orange": 0
        }
    ],
    "list_b": [
        {
            "car": 0
        },
        {
            "bus": 0
        }
    ]
}
"""

Efficiently filtering nested list of dictionary

I have a nested list of dictionaries as follows:
list_of_dict = [
    {
        "key": "key1",
        "data": [
            {
                "u_key": "u_key_1",
                "value": "value_1"
            },
            {
                "u_key": "u_key_2",
                "value": "value_2"
            }
        ]
    },
    {
        "key": "key2",
        "data": [
            {
                "u_key": "u_key_1",
                "value": "value_3"
            },
            {
                "u_key": "u_key_2",
                "value": "value_4"
            }
        ]
    }
]
As you can see, list_of_dict is a list of dicts, and inside each of them, data is also a list of dicts. Assume that all the objects inside list_of_dict and data have the same structure and that all the keys are always present.
In the next step I convert list_of_dict to list_of_tuples, where the first element of each tuple is key and the second is one of the values stored under the value key inside data:
list_of_tuples = [
    ('key1', 'value_1'),
    ('key1', 'value_2'),
    ('key2', 'value_3'),
    ('key2', 'value_4')
]
The final step is a comparison with a list (comparison_list) of string values. These values can be among the values stored under the value key inside data. I need to check whether any value in comparison_list appears in list_of_tuples and fetch the key (the first item of the tuple) for that value.
comparison_list = ['value_1', 'value_2']
My expected output is:
out = ['key1', 'key1']
My solution is as follows:
>>> list_of_tuples = [(c.get('key'),x.get('value'))
...                   for c in list_of_dict for x in c.get('data')]
>>> for t in list_of_tuples:
...     if t[1] in comparison_list:
...         print("Found: {}".format(t[0]))
So, in summary, I have a list of values (comparison_list) that I need to find inside the data arrays.
The dataset that I am operating on is huge (>100M). I am looking to speed up my solution and also make it more compact and readable.
Can I somehow skip the step where I create list_of_tuples and do the comparison directly?
There are a few simple optimizations you can try:
make comparison_list a set so the lookup is O(1) instead of O(n)
make list_of_tuples a generator, so you don't have to materialize all the entries at once
you can also integrate the condition into the generator itself
Example:
comparison_set = set(['value_1', 'value_2'])
tuples_generator = ((c['key'], x['value'])
                    for c in list_of_dict for x in c['data']
                    if x['value'] in comparison_set)
print(*tuples_generator)
# ('key1', 'value_1') ('key1', 'value_2')
Of course, you can also keep the comparison separate from the generator:
tuples_generator = ((c['key'], x['value'])
                    for c in list_of_dict for x in c['data'])
for k, v in tuples_generator:
    if v in comparison_set:
        print(k, v)
Or you could instead create a dict mapping values from comparison_set to keys from list_of_dicts. This will make finding the key to a particular value faster, but note that you can then only keep one key to each value.
values_dict = {x['value']: c['key']
               for c in list_of_dict for x in c['data']
               if x['value'] in comparison_set}
print(values_dict)
# {'value_2': 'key1', 'value_1': 'key1'}
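If a value can occur under more than one key and all of those keys are needed, one possible variation is to map each value to a list of keys. This is only a sketch: the name keys_by_value is made up, and the sample data is altered so that value_1 appears under both key1 and key2:

```python
from collections import defaultdict

# Altered sample data: 'value_1' occurs under both 'key1' and 'key2'.
list_of_dict = [
    {"key": "key1", "data": [{"u_key": "u_key_1", "value": "value_1"},
                             {"u_key": "u_key_2", "value": "value_2"}]},
    {"key": "key2", "data": [{"u_key": "u_key_1", "value": "value_1"},
                             {"u_key": "u_key_2", "value": "value_4"}]},
]
comparison_set = {'value_1', 'value_2'}

keys_by_value = defaultdict(list)
for c in list_of_dict:
    for x in c['data']:
        if x['value'] in comparison_set:
            keys_by_value[x['value']].append(c['key'])  # keep every matching key

print(dict(keys_by_value))
# {'value_1': ['key1', 'key2'], 'value_2': ['key1']}
```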
In the last step you can use filter instead of an explicit loop, something like this:
comparison_list = ['value_1', 'value_2']
print(list(filter(lambda x:x[1] in comparison_list,list_of_tuples)))
output:
[('key1', 'value_1'), ('key1', 'value_2')]

python create list of dictionaries without reference

I have a requirement in which I need to create dictionary objects with duplicate keys embedded into a list object, something like this:
[{ "key": "ABC" },{ "key": "EFG" } ]
I decided to have a top level list initialized to empty like outer_list=[] and a placeholder dictionary object like dict_obj= {}. Next I keep adding elements to my list using the following steps:
1. Assign { "key": "ABC" } to dict_obj using dict_obj["key"]="ABC"
2. Add this object to the list using outer_list.append(dict_obj)
3. Flush/pop the key/items in the dictionary object using dict_obj.clear()
4. Repeat steps 1 to 3 based on the number of key/item combinations in my data
Issue: the outer_list object maintains a reference to the original dict_obj and if the dict_obj is flushed or a new key/item is added it changes accordingly. So finally, I end up with this [{ "key": "EFG" },{ "key": "EFG" } ] instead of [{ "key": "ABC" },{ "key": "EFG" } ]
Please guide me with some workarounds if possible.
I think there are two ways to avoid the duplicate references.
The first is to append a copy of the dictionary, instead of a reference to it. dict instances have a copy method, so this is easy. Just change your current append call to:
outer_list.append(dict_obj.copy())
The other option is to not always use the same dict_obj object, but rather create a separate dictionary object for each entry. In this version, you'd replace your call to dict_obj.clear() with:
dict_obj = {}
For the second approach, you might choose to reorder things rather than doing a straight one-line replacement. You could move the setup code to the start of the loop and get rid of the reset logic at the end of the loop.
That is, change code that looks like this:
outer_list = []
dict_obj = {}
for foo in whatever:
    # add stuff to dict_obj
    outer_list.append(dict_obj)
    dict_obj.clear()
To:
outer_list = []
for foo in whatever:
    dict_obj = {}
    # add stuff to dict_obj
    outer_list.append(dict_obj)
If the logic for creating the inner dictionaries is simple enough to compute, you might even turn the whole thing into a list comprehension:
outer_list = [{"key": value, "key2": value2} for value, value2 in some_sequence]
The following should be self-explanatory:
# object reuse
d = {}
l = []
d['key'] = 'ABC'
l.append(d)
d.clear()
print(l) # [{}] : cleared!
d['key'] = 'EFG'
l.append(d)
print(l) # [{'key': 'EFG'}, {'key': 'EFG'}]
# explicit creation of new objects
d = {}
l = []
d['key'] = 'ABC'
l.append(d)
print(l)
d = {}
d['key'] = 'EFG'
l.append(d)
print(l)
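The difference between the two approaches can also be checked directly with the is operator, which tests whether two names refer to the same object. A small sketch; the variable names shared, aliased and separate are made up for this example:

```python
# Reusing one dictionary: both list entries are the same object.
shared = {}
aliased = []
for value in ("ABC", "EFG"):
    shared["key"] = value
    aliased.append(shared)

print(aliased)                     # [{'key': 'EFG'}, {'key': 'EFG'}]
print(aliased[0] is aliased[1])    # True

# Creating a new dictionary per iteration: the entries are independent.
separate = []
for value in ("ABC", "EFG"):
    separate.append({"key": value})

print(separate)                    # [{'key': 'ABC'}, {'key': 'EFG'}]
print(separate[0] is separate[1])  # False
```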
