Suppose you have JSON structured like this:
{
"valueA": "2",
"valueB": [
{
"key1": "value1"
},
{
"key2": "value2"
},
{
"key3": "value3"
}
]
}
When I run something like:
dict_new = {key:value for (key,value) in dict['valueB'] if key == 'key2'}
I get:
ValueError: need more than 1 value to unpack
Why does this happen, and how can I fix it?
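dict['valueB'] is a list of dictionaries. Iterating over it yields each dictionary in turn, and unpacking a one-key dictionary into (key, value) iterates over its single key, hence the error. You need another layer of nesting for your code to work, and since you are looking for one key, you need to produce a list here (keys must be unique in a dictionary):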
values = [value for d in dict['valueB'] for key, value in d.items() if key == 'key2']
If you tried to build a dictionary of key2: value pairs instead, only the last pair would survive, because each earlier value is replaced when the same key is assigned again.
Better still, look up that one key directly; there is no need to loop over all items when you only want one key:
values = [d['key2'] for d in dict['valueB'] if 'key2' in d]
This filters on the list of dictionaries in the dict['valueB'] list; if 'key2' is a key in that nested dictionary, we extract it.
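Both approaches can be checked end to end with a short script. This is a minimal sketch: the name data replaces dict so the built-in type is not shadowed, and the repeated list is hypothetical data added only to show the key collision.

```python
import json

# The document from the question, parsed into Python objects.
data = json.loads("""
{
  "valueA": "2",
  "valueB": [{"key1": "value1"}, {"key2": "value2"}, {"key3": "value3"}]
}
""")

# data["valueB"] yields dictionaries, so the pairs must come from items().
via_items = [value for d in data["valueB"] for key, value in d.items() if key == "key2"]

# Direct lookup: skip dictionaries that do not contain the key.
via_lookup = [d["key2"] for d in data["valueB"] if "key2" in d]

print(via_items)   # ['value2']
print(via_lookup)  # ['value2']

# Hypothetical data: the same key twice collapses to one entry in a dict.
repeated = [{"key2": "first"}, {"key2": "second"}]
collapsed = {key: value for d in repeated for key, value in d.items() if key == "key2"}
print(collapsed)   # {'key2': 'second'}
```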
I have a list of dictionaries and I want to filter it by values from another dictionary.
orig_list = [{"name":"Peter","last_name":"Wick","mail":"Peter#mail.com","number":"111"},
{"name":"John","last_name":"Hen","mail":"John#mail.com","number":"222"},
{"name":"Jack","last_name":"Malm","mail":"Jack#mail.com","number":"542"},
{"name":"Anna","last_name":"Hedge","mail":"Anna#mail.com"},
{"name":"Peter","last_name":"Roesner","mail":"Peter2#mail.com","number":"445"},
{"name":"Tino","last_name":"Tes","mail":"Tino#mail.com","number":"985"},]
Expected result, example 1:
filter = {"name":"Peter"}
orig_list = [{"name":"Peter","last_name":"Wick","mail":"Peter#mail.com","number":"111"},
{"name":"Peter","last_name":"Roesner","mail":"Peter2#mail.com","number":"445"}]
Expected result, example 2:
filter = {"name":"Peter","number":"445"}
orig_list = [
{"name":"Peter","last_name":"Roesner","mail":"Peter2#mail.com","number":"445"}]
The filter can have multiple keys. Possible keys are name, last_name and number.
Basically, I want to go through the list of dicts and check whether each dict contains the keys from the given filter and, if it does, whether the values match. If they don't, the whole dict should be removed from the list.
The final list does not have to be orig_list; it can be a new list. So it's not mandatory to delete dicts from orig_list. The matching dicts can also be copied to a new list of dicts.
You can use a list comprehension:
orig_list = [{"name":"Peter","last_name":"Wick","mail":"Peter#mail.com","number":"111"},
{"name":"John","last_name":"Hen","mail":"John#mail.com","number":"222"},
{"name":"Jack","last_name":"Malm","mail":"Jack#mail.com","number":"542"},
{"name":"Anna","last_name":"Hedge","mail":"Anna#mail.com"},
{"name":"Peter","last_name":"Roesner","mail":"Peter2#mail.com","number":"445"},
{"name":"Tino","last_name":"Tes","mail":"Tino#mail.com","number":"985"},]
filter_by = {"name":"Peter"}
result = [dic for dic in orig_list if all(key in dic and dic[key] == val for key, val in filter_by.items())]
print(result)
Output:
[
{
"name": "Peter",
"last_name": "Wick",
"mail": "Peter#mail.com",
"number": "111"
},
{
"name": "Peter",
"last_name": "Roesner",
"mail": "Peter2#mail.com",
"number": "445"
}
]
For filter_by = {"name":"Peter","number":"445"} you get:
[
{
"name": "Peter",
"last_name": "Roesner",
"mail": "Peter2#mail.com",
"number": "445"
}
]
A shorter way to write the same test is to merge the filter into each dictionary and check that nothing changed. A dictionary that lacks one of the filter's keys gains that key in the merge, so it is treated as a non-match:
filterDict = {"name":"Peter"}
result = [d for d in orig_list if {**d, **filterDict} == d]
Equivalently, as long as all the values are hashable, you can test that the filter's items are a subset of the dictionary's items:
result = [d for d in orig_list if {*filterDict.items()} <= {*d.items()}]
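Whether a missing key should count as a match is a design decision, and the two behaviours can be written side by side. This is a minimal sketch: the helper names filter_strict and filter_lenient are hypothetical, and records is a trimmed copy of the sample data in which "Anna" has no "number" key.

```python
# Trimmed sample records; the second one has no "number" key.
records = [
    {"name": "Peter", "last_name": "Wick", "number": "111"},
    {"name": "Anna", "last_name": "Hedge"},
    {"name": "Peter", "last_name": "Roesner", "number": "445"},
]

def filter_strict(rows, criteria):
    """Keep rows that contain every criteria key with an equal value."""
    return [row for row in rows
            if all(key in row and row[key] == val for key, val in criteria.items())]

def filter_lenient(rows, criteria):
    """Keep rows whose present keys agree; a missing key counts as a match."""
    return [row for row in rows
            if all(row.get(key, val) == val for key, val in criteria.items())]

criteria = {"number": "445"}
strict_names = [r["last_name"] for r in filter_strict(records, criteria)]
lenient_names = [r["last_name"] for r in filter_lenient(records, criteria)]
print(strict_names)   # ['Roesner']
print(lenient_names)  # ['Hedge', 'Roesner']
```

The lenient version relies on dict.get returning the expected value as its default, so an absent key compares equal by construction.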
I'm trying to compare two large dictionaries that describe the contents of product catalogs. Each dictionary consists of a unique, coded key and a list of terms for each key.
dict1 = {
"SKU001": ["Plumbing", "Pumps"],
"SKU002": ["Motors"],
"SKU003": ["Snow", "Blowers"],
"SKU004": ["Pnuematic", "Hose", "Pumps"],
...
}
dict2 = {
"FAS001": ["Pnuematic", "Pumps"],
"GRA001": ["Lawn", "Mowers"],
"FAS002": ["Servo", "Motors"],
"FAS003": ["Hose"],
"GRA002": ["Snow", "Shovels"],
"GRA003": ["Water", "Pumps"]
...
}
I want to create a new dictionary that borrows the keys from dict1 and whose values are a list of keys from dict2 where at least one of their term values match. The ideal end result may resemble this:
match_dict = {
"SKU001": ["FAS001", "GRA003"],
"SKU002": ["FAS002"],
"SKU003": ["GRA002"],
"SKU004": ["FAS001", "FAS003", "GRA003"],
...
}
I'm having issues creating this output though. Is it possible to create a list of keys and assign it as a value to another key? I've made a few attempts using nested loops like below, but the output isn't as desired and I'm unsure if it's even working properly. Any help is appreciated!
matches = {}
for key, values in dict1.items():
    for value in values:
        if value in dict2.values():
            matches[key] = value
print(matches)
This is one possible implementation:
dict1 = {
"SKU001": ["Plumbing", "Pumps"],
"SKU002": ["Motors"],
"SKU003": ["Snow", "Blowers"],
"SKU004": ["Pnuematic", "Hose", "Pumps"],
}
dict2 = {
"FAS001": ["Pnuematic", "Pumps"],
"GRA001": ["Lawn", "Mowers"],
"FAS002": ["Servo", "Motors"],
"FAS003": ["Hose"],
"GRA002": ["Snow", "Shovels"],
"GRA003": ["Water", "Pumps"]
}
match_dict_test = {
"SKU001": ["FAS001", "GRA003"],
"SKU002": ["FAS002"],
"SKU003": ["GRA002"],
"SKU004": ["FAS001", "FAS003", "GRA003"],
}
# Find keys for each item in dict2
dict2_reverse = {}
for k, v in dict2.items():
    for item in v:
        dict2_reverse.setdefault(item, []).append(k)
# Build dict of matches
match_dict = {}
for k, v in dict1.items():
    # Keys in dict2 associated to each item
    keys2 = (dict2_reverse.get(item, []) for item in v)
    # Save sorted list of keys from dict2 without repetitions
    match_dict[k] = sorted(set(k2i for k2 in keys2 for k2i in k2))
# Check result
print(match_dict == match_dict_test)
# True
Assuming that dict1 and dict2 can have duplicate value entries, you would need to build an intermediate multi-map dictionary and also handle uniqueness of the expanded value list for each SKU:
mapDict = dict()
for prod, attributes in dict2.items():
    for attribute in attributes:
        mapDict.setdefault(attribute, []).append(prod)
matchDict = dict()
for sku, attributes in dict1.items():
    for attribute in attributes:
        matchDict.setdefault(sku, set()).update(mapDict.get(attribute, []))
matchDict = {sku: sorted(prods) for sku, prods in matchDict.items()}
print(matchDict)
{'SKU001': ['FAS001', 'GRA003'], 'SKU002': ['FAS002'], 'SKU003': ['GRA002'], 'SKU004': ['FAS001', 'FAS003', 'GRA003']}
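Both answers rest on the same idea: invert dict2 into a term-to-keys index once, then look each term of dict1 up in that index. A compact sketch of that idea using collections.defaultdict is shown here; the name term_to_keys is illustrative and the data is the sample from the question.

```python
from collections import defaultdict

dict1 = {
    "SKU001": ["Plumbing", "Pumps"],
    "SKU002": ["Motors"],
    "SKU003": ["Snow", "Blowers"],
    "SKU004": ["Pnuematic", "Hose", "Pumps"],
}
dict2 = {
    "FAS001": ["Pnuematic", "Pumps"],
    "GRA001": ["Lawn", "Mowers"],
    "FAS002": ["Servo", "Motors"],
    "FAS003": ["Hose"],
    "GRA002": ["Snow", "Shovels"],
    "GRA003": ["Water", "Pumps"],
}

# Inverted index: term -> set of dict2 keys that list that term.
term_to_keys = defaultdict(set)
for key2, terms in dict2.items():
    for term in terms:
        term_to_keys[term].add(key2)

# For each dict1 key, take the union of the dict2 keys of all its terms.
# .get() is used so that unknown terms do not add empty entries to the index.
match_dict = {
    key1: sorted(set().union(*(term_to_keys.get(term, set()) for term in terms)))
    for key1, terms in dict1.items()
}
print(match_dict["SKU004"])  # ['FAS001', 'FAS003', 'GRA003']
```

Because every term list is scanned only once, the work grows with the total number of terms rather than with the product of the two dictionary sizes.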
I have dicts of two types representing the same data. They are consumed by two different channels, hence their keys are different.
For example:
Type A
{
"key1": "value1",
"key2": "value2",
"nestedKey1" : {
"key3" : "value3",
"key4" : "value4"
}
}
Type B
{
"equiKey1": "value1",
"equiKey2": "value2",
"equinestedKey1.key3" : "value3",
"equinestedKey1.key4" : "value4"
}
I want to map data from Type B to Type A.
Currently I am creating it as below:
{
"key1": typeBObj.get("equiKey1"),
.....
}
Is there a better and faster way to do that in Python?
First, you need a dictionary mapping keys in B to keys (or rather lists of keys) in A. (If the keys follow the pattern from your question, or a similar pattern, this dict might also be generated.)
B_to_A = {
"equiKey1": ["key1"],
"equiKey2": ["key2"],
"equinestedKey1.key3" : ["nestedKey1", "key3"],
"equinestedKey1.key4" : ["nestedKey1", "key4"]
}
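If the keys really do follow the pattern from the question, the mapping itself could be derived instead of typed out. This is only a sketch under that assumption: derive_path is a hypothetical helper, and the rule it applies (strip an "equi" prefix, lower-case the first letter, split on dots) is inferred from the four sample keys and may not hold for real data.

```python
def derive_path(b_key, prefix="equi"):
    """Hypothetical helper: strip the prefix, lower-case the first letter, split on dots."""
    stripped = b_key[len(prefix):] if b_key.startswith(prefix) else b_key
    stripped = stripped[:1].lower() + stripped[1:]
    return stripped.split(".")

# Keys of the Type B sample from the question.
b_keys = ["equiKey1", "equiKey2", "equinestedKey1.key3", "equinestedKey1.key4"]
B_to_A = {key: derive_path(key) for key in b_keys}
print(B_to_A["equinestedKey1.key3"])  # ['nestedKey1', 'key3']
```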
Then you can define a function for translating those keys.
def map_B_to_A(d):
    res = {}
    for key, val in d.items():
        r = res
        *head, last = B_to_A[key]
        for k in head:
            # Descend from the current level, creating nested dicts as needed
            r = r.setdefault(k, {})
        r[last] = val
    return res
print(map_B_to_A(B) == A) # True
Or a bit shorter, but probably less clear, using reduce:
from functools import reduce
def map_B_to_A(d):
    res = {}
    for key, val in d.items():
        *head, last = B_to_A[key]
        # reduce walks down the path and returns the innermost dict
        reduce(lambda level, k: level.setdefault(k, {}), head, res)[last] = val
    return res
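For a check that runs on its own, the sample dictionaries from the question can be written out in full. In this sketch the function is named map_b_to_a and takes the mapping as a second parameter (both are choices made here for illustration), and the three-level path "x.y.z" is hypothetical data added to show that deeper nesting works too.

```python
from functools import reduce

A = {"key1": "value1", "key2": "value2",
     "nestedKey1": {"key3": "value3", "key4": "value4"}}
B = {"equiKey1": "value1", "equiKey2": "value2",
     "equinestedKey1.key3": "value3", "equinestedKey1.key4": "value4"}
B_to_A = {
    "equiKey1": ["key1"],
    "equiKey2": ["key2"],
    "equinestedKey1.key3": ["nestedKey1", "key3"],
    "equinestedKey1.key4": ["nestedKey1", "key4"],
}

def map_b_to_a(flat, paths):
    """Rebuild the nested layout by walking each key path."""
    nested = {}
    for key, val in flat.items():
        *head, last = paths[key]
        # Each step returns the dict one level down, creating it if missing.
        reduce(lambda level, k: level.setdefault(k, {}), head, nested)[last] = val
    return nested

print(map_b_to_a(B, B_to_A) == A)  # True

# Hypothetical three-level path.
deep = map_b_to_a({"x.y.z": 1}, {"x.y.z": ["x", "y", "z"]})
print(deep)  # {'x': {'y': {'z': 1}}}
```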
I have a nested list of dictionaries, as follows:
list_of_dict = [
{
"key": "key1",
"data": [
{
"u_key": "u_key_1",
"value": "value_1"
},
{
"u_key": "u_key_2",
"value": "value_2"
}
]
},
{
"key": "key2",
"data": [
{
"u_key": "u_key_1",
"value": "value_3"
},
{
"u_key": "u_key_2",
"value": "value_4"
}
]
}
]
As you can see, list_of_dict is a list of dicts, and inside each one, data is also a list of dicts. Assume that all the objects inside list_of_dict and data have the same structure and that all the keys are always present.
In the next step I convert list_of_dict to list_of_tuples, where the first element of each tuple is the key, followed by the value stored under the value key inside data:
list_of_tuples = [
('key1', 'value_1'),
('key1', 'value_2'),
('key2', 'value_3'),
('key2','value_4')
]
The final step is a comparison with a list (comparison_list) of string values. The values in the list can come from the value key inside data. I need to check whether any value in comparison_list appears in list_of_tuples and fetch the key (the first item of the tuple) for that value.
comparison_list = ['value_1', 'value_2']
My expected output is:
out = ['key1', 'key1']
My solution is as follows:
list_of_tuples = [(c.get('key'), x.get('value'))
                  for c in list_of_dict for x in c.get('data')]
for t in list_of_tuples:
    if t[1] in comparison_list:
        print("Found: {}".format(t[0]))
To summarize, I have a list of values (comparison_list) that I need to find inside the data arrays.
The dataset I am operating on is quite large (>100M). I am looking to speed up my solution and also make it more compact and readable.
Can I somehow skip the step where I create list_of_tuples and do the comparison directly?
There are a few simple optimizations you can try:
make comparison_list a set so the lookup is O(1) instead of O(n)
make list_of_tuples a generator, so you don't have to materialize all the entries at once
you can also integrate the condition into the generator itself
Example:
comparison_set = set(['value_1', 'value_2'])
tuples_generator = ((c['key'], x['value'])
for c in list_of_dict for x in c['data']
if x['value'] in comparison_set)
print(*tuples_generator)
# ('key1', 'value_1') ('key1', 'value_2')
Of course, you can also keep the comparison separate from the generator:
tuples_generator = ((c['key'], x['value'])
for c in list_of_dict for x in c['data'])
for k, v in tuples_generator:
    if v in comparison_set:
        print(k, v)
Or you could instead create a dict mapping values from comparison_set to keys from list_of_dict. This makes finding the key for a particular value faster, but note that you can then keep only one key per value.
values_dict = {x['value']: c['key']
for c in list_of_dict for x in c['data']
if x['value'] in comparison_set}
print(values_dict)
# {'value_2': 'key1', 'value_1': 'key1'}
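If one value can occur under several keys and all of them are needed, a dict of lists avoids the one-key-per-value limit. This is a minimal sketch: the name keys_by_value is illustrative, and the sample data is changed so that value_1 also appears under key2 (a hypothetical modification to show the effect).

```python
from collections import defaultdict

list_of_dict = [
    {"key": "key1", "data": [{"u_key": "u_key_1", "value": "value_1"},
                             {"u_key": "u_key_2", "value": "value_2"}]},
    # Hypothetical change to the sample: value_1 is repeated under key2.
    {"key": "key2", "data": [{"u_key": "u_key_1", "value": "value_3"},
                             {"u_key": "u_key_2", "value": "value_1"}]},
]
comparison_set = {"value_1", "value_2"}

# Map each wanted value to every key it appears under.
keys_by_value = defaultdict(list)
for c in list_of_dict:
    for x in c["data"]:
        if x["value"] in comparison_set:
            keys_by_value[x["value"]].append(c["key"])

print(dict(keys_by_value))  # {'value_1': ['key1', 'key2'], 'value_2': ['key1']}
```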
In the last step you can use filter instead of an explicit loop, something like this:
comparison_list = ['value_1', 'value_2']
print(list(filter(lambda x:x[1] in comparison_list,list_of_tuples)))
output:
[('key1', 'value_1'), ('key1', 'value_2')]
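To get exactly the list of keys the question asks for, without building list_of_tuples at all, the comparison can go straight into one comprehension. This sketch uses the question's sample data and assumes a set for the membership test, as suggested in the other answer.

```python
list_of_dict = [
    {"key": "key1", "data": [{"u_key": "u_key_1", "value": "value_1"},
                             {"u_key": "u_key_2", "value": "value_2"}]},
    {"key": "key2", "data": [{"u_key": "u_key_1", "value": "value_3"},
                             {"u_key": "u_key_2", "value": "value_4"}]},
]
# A set gives constant-time membership tests on average.
comparison_set = {"value_1", "value_2"}

# One pass over the nested data, keeping only the outer key of each match.
out = [c["key"] for c in list_of_dict for x in c["data"]
       if x["value"] in comparison_set]
print(out)  # ['key1', 'key1']
```

For very large inputs, replacing the square brackets with parentheses turns this into a generator, so matches are produced one at a time instead of being held in memory.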
I wish to remove keys and values in one JSON dictionary based on another JSON dictionary's keys and values. In a sense I am looking to perform a "subtraction". Let's say I have JSON dictionaries a and b:
a = {
"my_app":
{
"environment_variables":
{
"SOME_ENV_VAR":
[
"/tmp",
"/tmp2"
]
},
"variables":
{ "my_var": "1",
"my_other_var": "2"
}
}
}
b = {
"my_app":
{
"environment_variables":
{
"SOME_ENV_VAR":
[
"/tmp"
]
},
"variables":
{ "my_var": "1" }
}
}
Imagine you could do a-b=c where c looks like this:
c = {
"my_app":
{
"environment_variables":
{
"SOME_ENV_VAR":
[
"/tmp2"
]
},
"variables":
{ "my_other_var": "2" }
}
}
How can this be done?
You can loop through your dictionary using for key in dictionary: and delete keys using del dictionary[key]; I think that's all you need. See the documentation for dictionaries: https://docs.python.org/2/tutorial/datastructures.html#dictionaries
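One caveat when combining the two: in Python 3, deleting from a dictionary while iterating over it directly typically raises RuntimeError ("dictionary changed size during iteration"). A minimal sketch of a safe flat version, iterating over a snapshot of the keys; the names settings and to_remove and their contents are hypothetical.

```python
# Hypothetical flat data, for illustration only.
settings = {"my_var": "1", "my_other_var": "2", "third_var": "3"}
to_remove = {"my_var": "1", "third_var": "0"}

# list(settings) takes a snapshot of the keys, so deleting inside the loop is safe.
for key in list(settings):
    # Delete only when both the key and its value match.
    if key in to_remove and settings[key] == to_remove[key]:
        del settings[key]

print(settings)  # {'my_other_var': '2', 'third_var': '3'}
```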
One way to do it is to:
Create a copy of a, called c;
Iterate over every key, value pair in b;
For each top-level key the two share, delete from c any inner keys whose values match;
Remove keys left with empty values.
You will need to modify the code if your case is different (for example, if it is not a dict of dicts).
import copy
print(A)
print(B)
# Deep copy, so deleting inner keys from C does not also change A
C = copy.deepcopy(A)
# INFO: Suppose your max depth is as follows: "A = dict(key:dict(), ...)"
for k0, v0 in B.items():
    # Look for similar outer keys (check if 'vars' or 'env_vars' in A)
    if k0 in C:
        # Look for similar inner (keys, values)
        for k1, v1 in v0.items():
            # If we have e.g. 'my_var' in B and in C and values are the same
            if k1 in C[k0] and v1 == C[k0][k1]:
                del C[k0][k1]
        # Remove empty 'vars', 'env_vars'
        if not C[k0]:
            del C[k0]
print(C)
Output:
{'environment_variables': {'SOME_ENV_VAR': ['/tmp']},
'variables': {'my_var': '2', 'someones_var': '1'}}
{'environment_variables': {'SOME_ENV_VAR': ['/tmp']},
'variables': {'my_var': '2'}}
{'variables': {'someones_var': '1'}}
The following does what you need:
def subtract(a, b):
    result = {}
    for key, value in a.items():
        if key not in b:
            # Nothing to subtract for this key
            result[key] = value
        elif b[key] != value:
            if isinstance(value, dict) and isinstance(b[key], dict):
                inner_dict = subtract(value, b[key])
                if len(inner_dict) > 0:
                    result[key] = inner_dict
            elif isinstance(value, list) and isinstance(b[key], list):
                result[key] = [item for item in value if item not in b[key]]
            else:
                result[key] = value
    return result
It checks whether both the key and the value are present. It could del items, but I think it is much better to return a new dict with the desired data instead of modifying the original.
c = subtract(a, b)
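A self-contained run on the question's data shows the expected c. This is a sketch, not the answer's exact code: it is a condensed variant that uses b.get(key) so keys missing from b do not raise KeyError, and it assumes the second list entry in a is "/tmp2", matching the desired result.

```python
def subtract(a, b):
    """Return the parts of a that are not matched in b, without modifying a."""
    result = {}
    for key, value in a.items():
        other = b.get(key)
        if key in b and other == value:
            continue  # identical in both: drop it
        if isinstance(value, dict) and isinstance(other, dict):
            inner = subtract(value, other)
            if inner:  # skip dicts that end up empty
                result[key] = inner
        elif isinstance(value, list) and isinstance(other, list):
            result[key] = [item for item in value if item not in other]
        else:
            result[key] = value
    return result

# Assumption: the list holds "/tmp2", as in the desired result c.
a = {"my_app": {"environment_variables": {"SOME_ENV_VAR": ["/tmp", "/tmp2"]},
                "variables": {"my_var": "1", "my_other_var": "2"}}}
b = {"my_app": {"environment_variables": {"SOME_ENV_VAR": ["/tmp"]},
                "variables": {"my_var": "1"}}}

c = subtract(a, b)
print(c)
# {'my_app': {'environment_variables': {'SOME_ENV_VAR': ['/tmp2']}, 'variables': {'my_other_var': '2'}}}
```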
UPDATE
I have just updated the answer for the latest version of the data provided in the question. It now 'subtracts' list values as well.
UPDATE 2
Working example: ipython notebook