I wish to remove keys and values in one JSON dictionary based on another JSON dictionary's keys and values. In a sense I am looking perform a "subtraction". Let's say I have JSON dictionaries a and b:
a = {
"my_app":
{
"environment_variables":
{
"SOME_ENV_VAR":
[
"/tmp",
"tmp2"
]
},
"variables":
{ "my_var": "1",
"my_other_var": "2"
}
}
}
b = {
"my_app":
{
"environment_variables":
{
"SOME_ENV_VAR":
[
"/tmp"
]
},
"variables":
{ "my_var": "1" }
}
}
Imagine you could do a-b=c where c looks like this:
c = {
"my_app":
{
"environment_variables":
{
"SOME_ENV_VAR":
[
"/tmp2"
]
},
"variables":
{ "my_other_var": "2" }
}
}
How can this be done?
You can loop through your dictionary using for key in dictionary: and you can delete keys using del dictionary[key], I think that's all you need. See the documentation for dictionaries: https://docs.python.org/2/tutorial/datastructures.html#dictionaries
The way you can do it is to:
Create copy of a -> c;
Iterate over every key, value pair inside b;
Check if for same top keys you have same inner keys and values and delete them from c;
Remove keys with empty values.
You should modify code, if your case will be somehow different (no dict(dict), etc).
print(A)
print(B)
C = A.copy()
# INFO: Suppose your max depth is as follows: "A = dict(key:dict(), ...)"
for k0, v0 in B.items():
# Look for similiar outer keys (check if 'vars' or 'env_vars' in A)
if k0 in C:
# Look for similiar inner (keys, values)
for k1, v1 in v0.items():
# If we have e.g. 'my_var' in B and in C and values are the same
if k1 in C[k0] and v1 == C[k0][k1]:
del C[k0][k1]
# Remove empty 'vars', 'env_vars'
if not C[k0]:
del C[k0]
print(C)
{'environment_variables': {'SOME_ENV_VAR': ['/tmp']},
'variables': {'my_var': '2', 'someones_var': '1'}}
{'environment_variables': {'SOME_ENV_VAR': ['/tmp']},
'variables': {'my_var': '2'}}
{'variables': {'someones_var': '1'}}
The following does what you need:
def subtract(a, b):
result = {}
for key, value in a.items():
if key not in b or b[key] != value:
if not isinstance(value, dict):
if isinstance(value, list):
result[key] = [item for item in value if item not in b[key]]
else:
result[key] = value
continue
inner_dict = subtract(value, b[key])
if len(inner_dict) > 0:
result[key] = inner_dict
return result
It checks if both key and value are present. It could del items, but I think is much better to return a new dict with the desired data instead of modifying the original.
c = subtract(a, b)
UPDATE
I have just updated for the latest version of the data provided by in the question. Now it 'subtract' list values as well.
UPDATE 2
Working example: ipython notebook
Related
How can i get all the keys by recursing the following nested dict.
DICTS = {
'test/test': [
{
'test1/test1': [{'test3/test3': []}],
'test2/test2': [],
'test4/test4': []
}
],
'test8/test8': [
{
'test1/test5': [
{
'test6/test6': []
}
],
'test7/test7': [],
'test7/test7': []
}
],
}
For example call a function by giving the key 'test/test' and get a list of values:
my_recursive_func('test/test')
test1/test1
test3/test3
test2/test2
test4/test4
You basically have two cases:
Case 1 when your dictionary is in another dictionary
Case 2 when your dictionary is in a dictionary array
For every key that you have in your dictionary, you put that key into keys array and recall the function get_keys with the nested dictionary.
If your nested dictionary is a list, you return get_keys() for every item in your list.
def get_keys(dictionary):
keys = []
if isinstance(dictionary, list):
for item in dictionary:
keys.extend(get_keys(item))
elif isinstance(dictionary, dict):
for key in dictionary:
keys.append(key)
keys.extend(get_keys(dictionary[key]))
return keys
print(get_keys(DICTS["test/test"]))
outputs
['test1/test1', 'test3/test3', 'test2/test2', 'test4/test4']
This solution should work for any given structure.
This solution would be valid only for your specific data structure.
def my_recursive_func(data):
result = []
if isinstance(data, list):
for datum in data:
result.extend(my_recursive_func(datum))
elif isinstance(data, dict):
for key, value in data.items():
result.append(key)
result.extend(my_recursive_func(value))
return result
my_recursive_func(DICTS['test/test'])
> ['test1/test1', 'test3/test3', 'test2/test2', 'test4/test4']
I have dicts of two types representing same data. These are consumed by two different channels hence their keys are different.
for example:
Type A
{
"key1": "value1",
"key2": "value2",
"nestedKey1" : {
"key3" : "value3",
"key4" : "value4"
}
}
Type B
{
"equiKey1": "value1",
"equiKey2": "value2",
"equinestedKey1.key3" : "value3",
"equinestedKey1.key4" : "value4"
}
I want to map data from Type B to type A.
currently i am creating it as below
{
"key1": typeBObj.get("equiKey1"),
.....
}
Is there a better and faster way to do that in Python
First, you need a dictionary mapping keys in B to keys (or rather lists of keys) in A. (If the keys follow the pattern from your question, or a similar pattern, this dict might also be generated.)
B_to_A = {
"equiKey1": ["key1"],
"equiKey2": ["key2"],
"equinestedKey1.key3" : ["nestedKey1", "key3"],
"equinestedKey1.key4" : ["nestedKey1", "key4"]
}
Then you can define a function for translating those keys.
def map_B_to_A(d):
res = {}
for key, val in B.items():
r = res
*head, last = B_to_A[key]
for k in head:
r = res.setdefault(k, {})
r[last] = val
return res
print(map_B_to_A(B) == A) # True
Or a bit shorter, but probably less clear, using reduce:
def map_B_to_A(d):
res = {}
for key, val in B.items():
*head, last = B_to_A[key]
reduce(lambda d, k: d.setdefault(k, {}), head, res)[last] = val
return res
I have a nested list of dictionary like follows:
list_of_dict = [
{
"key": "key1",
"data": [
{
"u_key": "u_key_1",
"value": "value_1"
},
{
"u_key": "u_key_2",
"value": "value_2"
}
]
},
{
"key": "key2",
"data": [
{
"u_key": "u_key_1",
"value": "value_3"
},
{
"u_key": "u_key_2",
"value": "value_4"
}
]
}
]
As you can see list_of_dict is a list of dict and inside that, data is also a list of dict. Assume that all the objects inside list_of_dict and data has similar structure and all the keys are always present.
In the next step I convert list_of_dict to list_of_tuples, where first element of tuple is key followed by all the values against value key inside data
list_of_tuples = [
('key1', 'value_1'),
('key1', 'value_2'),
('key2', 'value_3'),
('key2','value_4')
]
The final step is comparison with a list(comparison_list). List contains string values. The values inside the list CAN be from the value key inside data. I need to check if any value inside comparison_list is inside list_of_tuples and fetch the key(first item of tuple) of that value.
comparison_list = ['value_1', 'value_2']
My expected output is:
out = ['key1', 'key1']
My solution is follows:
>>> list_of_tuples = [(c.get('key'),x.get('value'))
for c in list_of_dict for x in c.get('data')]
>>> for t in list_of_tuple:
if t[1] in comparison_list:
print("Found: {}".format(t[0]))
So summary of problem is that I have list of values(comparison_list) which I need to find inside data array.
The dataset that I am operating on is quite huge(>100M). I am looking to speed up my solution and also make it more compact and readable.
Can I somehow skip the step where I create list_of_tuples and do the comparison directly?
There are a few simple optimization you can try:
make comparison_list a set so the lookup is O(1) instead of O(n)
make list_of_tuples a generator, so you don't have to materialize all the entries at once
you can also integrate the condition into the generator itself
Example:
comparison_set = set(['value_1', 'value_2'])
tuples_generator = ((c['key'], x['value'])
for c in list_of_dict for x in c['data']
if x['value'] in comparison_set)
print(*tuples_generator)
# ('key1', 'value_1') ('key1', 'value_2')
Of course, you can also keep the comparison separate from the generator:
tuples_generator = ((c['key'], x['value'])
for c in list_of_dict for x in c['data'])
for k, v in tuples_generator:
if v in comparison_set:
print(k, v)
Or you could instead create a dict mapping values from comparison_set to keys from list_of_dicts. This will make finding the key to a particular value faster, but note that you can then only keep one key to each value.
values_dict = {x['value']: c['key']
for c in list_of_dict for x in c['data']
if x['value'] in comparison_set}
print(values_dict)
# {'value_2': 'key1', 'value_1': 'key1'}
In last step you can use filter something like this instead of iterating over that:
comparison_list = ['value_1', 'value_2']
print(list(filter(lambda x:x[1] in comparison_list,list_of_tuples)))
output:
[('key1', 'value_1'), ('key1', 'value_2')]
Consider you have JSON structured like below:
{
"valueA": "2",
"valueB": [
{
"key1": "value1"
},
{
"key2": "value2"
},
{
"key3": "value3"
}
]
}
and when doing something like:
dict_new = {key:value for (key,value) in dict['valueB'] if key == 'key2'}
I get:
ValueError: need more than 1 value to unpack
Why and how to fix it?
dict['valueB'] is a list of dictionaries. You need another layer of nesting for your code to work, and since you are looking for one key, you need to produce a list here (keys must be unique in a dictionary):
values = [value for d in dict['valueB'] for key, value in d.items() if key == 'key2']
If you tried to make a dictionary of key2: value pairs, you will only have the last pair left, as the previous values have been replaced by virtue of having been associated with the same key.
Better still, just grab that one key, no need to loop over all items if you just wanted that one key:
values = [d['key2'] for d in dict['valueB'] if 'key2' in d]
This filters on the list of dictionaries in the dict['valueB'] list; if 'key2' is a key in that nested dictionary, we extract it.
Suppose I have this:
list = [ { 'p1':'v1' } ,{ 'p2':'v2' } ,{ 'p3':'v3' } ]
I need to find p2 and get its value.
You can try the following ... That will return all the values equivilant to the givenKey in all dictionaries.
ans = [d[key] for d in list if d.has_key(key)]
If this is what your actual code looks like (each key is unique), you should just use one dictionary:
things = { 'p1':'v1', 'p2':'v2', 'p3':'v3' }
do_something(things['p2'])
You can convert a list of dictionaries to one dictionary by merging them with update (but this will overwrite duplicate keys):
dict = {}
for item in list:
dict.update(item)
do_something(dict['p2'])
If that's not possible, you'll need to just loop through them:
for item in list:
if 'p2' in item:
do_something(item['p2'])
If you expect multiple results, you can also build up a list:
p2s = []
for item in list:
if 'p2' in item:
p2s.append(item['p2'])
Also, I wouldn't recommend actually naming any variables dict or list, since that will cause problems with the built-in dict() and list() functions.
These shouldn't be stored in a list to begin with, they should be stored in a dictionary. Since they're stored in a list, though, you can either search them as they are:
lst = [ { 'p1':'v1' } ,{ 'p2':'v2' } ,{ 'p3':'v3' } ]
p2 = next(d["p2"] for d in lst if "p2" in d)
Or turn them into a dictionary:
dct = {}
any(dct.update(d) for d in lst)
p2 = dct["p2"]
You can also use this one-liner:
filter(lambda x: 'p2' in x, list)[0]['p2']
if you have more than one 'p2', this will pick out the first; if you have none, it will raise IndexError.
for d in list:
if d.has_key("p2"):
return d['p2']
If it's a oneoff lookup, you can do something like this
>>> [i['p2'] for i in my_list if 'p2' in i]
['v2']
If you need to look up multiple keys, you should consider converting the list to something that can do key lookups in constant time (such as a dict)
>>> my_list = [ { 'p1':'v1' } ,{ 'p2':'v2' } ,{ 'p3':'v3' } ]
>>> my_dict = dict(i.popitem() for i in my_list)
>>> my_dict['p2']
'v2'
Start by flattening the list of dictionaries out to a dictionary, then you can index it by key and get the value:
{k:v for x in list for k,v in x.iteritems()}['p2']