Nested list of objects get all keys recursively - python

How can I get all the keys by recursing through the following nested dict?
DICTS = {
    'test/test': [
        {
            'test1/test1': [{'test3/test3': []}],
            'test2/test2': [],
            'test4/test4': []
        }
    ],
    'test8/test8': [
        {
            'test1/test5': [
                {
                    'test6/test6': []
                }
            ],
            'test7/test7': [],
            'test7/test7': []
        }
    ],
}
For example, call a function with the key 'test/test' and get a list of the keys nested under it:
my_recursive_func('test/test')
test1/test1
test3/test3
test2/test2
test4/test4

You basically have two cases:
Case 1: the nested value is a dictionary.
Case 2: the nested value is a list of dictionaries.
For every key in a dictionary, append that key to the keys list and call get_keys again on the value stored under it.
If the value is a list, call get_keys() on every item in the list and collect the results.
def get_keys(dictionary):
    keys = []
    if isinstance(dictionary, list):
        for item in dictionary:
            keys.extend(get_keys(item))
    elif isinstance(dictionary, dict):
        for key in dictionary:
            keys.append(key)
            keys.extend(get_keys(dictionary[key]))
    return keys

print(get_keys(DICTS["test/test"]))
outputs
['test1/test1', 'test3/test3', 'test2/test2', 'test4/test4']
This solution should work for any nesting of dicts and lists; values of any other type simply contribute no keys.
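If the nesting can get very deep, the recursive version can hit Python's recursion limit. A minimal sketch of the same traversal with an explicit stack (the name get_keys_iterative is hypothetical, not from the answer above), which should return the keys in the same depth-first order:

```python
def get_keys_iterative(data):
    """Collect dict keys depth-first, in the same order as the recursive get_keys."""
    keys = []
    # Each stack entry is ("key", name) for a key to emit, or ("node", value) to expand
    stack = [("node", data)]
    while stack:
        kind, item = stack.pop()
        if kind == "key":
            keys.append(item)
        elif isinstance(item, dict):
            # Push in reverse so the first key is popped (and emitted) first,
            # followed by everything nested under it
            for key, value in reversed(list(item.items())):
                stack.append(("node", value))
                stack.append(("key", key))
        elif isinstance(item, list):
            for element in reversed(item):
                stack.append(("node", element))
    return keys
```

Because no function calls are nested, a structure thousands of levels deep is handled without a RecursionError.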

This solution assumes your data is built only from dicts and lists; any other values are ignored.
def my_recursive_func(data):
    result = []
    if isinstance(data, list):
        for datum in data:
            result.extend(my_recursive_func(datum))
    elif isinstance(data, dict):
        for key, value in data.items():
            result.append(key)
            result.extend(my_recursive_func(value))
    return result

my_recursive_func(DICTS['test/test'])
> ['test1/test1', 'test3/test3', 'test2/test2', 'test4/test4']
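The question actually asks for a function that takes the key itself (my_recursive_func('test/test')) rather than the sub-structure. A minimal sketch, assuming you want the keys below the first occurrence of that key at any depth; the names collect_keys and keys_under are hypothetical:

```python
def collect_keys(data):
    """Return every dict key nested in data (same traversal as my_recursive_func)."""
    result = []
    if isinstance(data, list):
        for datum in data:
            result.extend(collect_keys(datum))
    elif isinstance(data, dict):
        for key, value in data.items():
            result.append(key)
            result.extend(collect_keys(value))
    return result


def keys_under(data, target):
    """Return the keys nested below the first occurrence of target, or None if it is absent."""
    if isinstance(data, list):
        for datum in data:
            found = keys_under(datum, target)
            if found is not None:
                return found
    elif isinstance(data, dict):
        for key, value in data.items():
            if key == target:
                return collect_keys(value)
            # Not this key: search inside its value before moving on
            found = keys_under(value, target)
            if found is not None:
                return found
    return None
```

With the data from the question, keys_under(DICTS, 'test/test') gives the same list as above, and keys_under(DICTS, 'test1/test5') finds the nested key as well. Returning None (rather than []) distinguishes "key not found" from "key found but has nothing below it".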

Related

How can I recursively walk 2 dictionaries, and modify original, based on the other?

I'm trying to traverse a dictionary (which has many strings, dicts, lists of dicts), and compare it against another dictionary.
Here's an example:
data = {
    "topic": "Seniors' Health Care Freedom Act of 2007",
    "foo": "bar",
    "last_update": "2011-08-29T20:47:44Z",
    "organisations": [
        {
            "organization_id": "22973",
            "name": "National Health Federation",
            "bar": "baz"
        },
        {
            "organization_id": "27059",
            "name": "A Christian Perspective on Health Issues"
        },
    ]
}
validate = {
    "topic": None,
    "last_update": "next_update",
    "organisations": [
        {
            "organization_id": None,
            "name": None
        }
    ]
}
Essentially, if the item exists in "data", but not in "validate" at the current point, it should be deleted from data.
So in this case, I'd want data["foo"] and data["organisations"][x]["bar"] to be removed from the data dict.
Additionally, if the key in validate has a string value and isn't None, I want to rename the key in data to that value, i.e. "last_update" should become "next_update".
I'm not sure of a good way to do this in Python. My current version removes "foo", but I'm struggling to remove nested keys like organisations[x]["bar"].
This is my current attempt:
def func1(data, validate, parent = None):
    for k, v in sorted(data.items()):
        if not parent:
            if k not in validate:
                data.pop(k, None)
        if isinstance(v, dict):
            func1(v, validate)
        elif isinstance(v, list):
            for val in v:
                func1(val, validate, parent = k)

func1(data, validate)
I tried to use something like this to compare the keys instead, but it doesn't work well if data has additional keys (it appeared to remove the wrong keys), because the zipped pairs no longer line up, so it wasn't useful for me:
for (k, v), (k2, v2) in zip(sorted(data.items()), sorted(validate.items())):
I've read similar posts such as How to recursively remove certain keys from a multi-dimensional(depth not known) python dictionary?, but this seems to use a flat set to filter so it doesn't take into account where in the dict the key is located which is important for me - as "last_update" can appear in other lists where I need to keep it.
Here is a simple recursive function. Well, it used to be simple; and then I added tons of checks and now it's an if forest.
def validate_the_data(data, validate):
    for key in list(data.keys()):
        if key not in validate:
            del data[key]
        elif validate[key] is not None:
            if isinstance(data[key], dict):
                validate_the_data(data[key], validate[key])
            elif isinstance(data[key], list):
                for subdata, subvalidate in zip(data[key], validate[key]):
                    if isinstance(subdata, dict) and isinstance(subvalidate, dict):
                        validate_the_data(subdata, subvalidate)
            else:
                data[key] = validate[key]
How it works: if data[key] is a dictionary and key is valid, then we want to check the keys in data[key] against the keys in validate[key]. So we do a recursive call, but instead of putting validate in the recursive call, we put validate[key]. Likewise if data[key] is a list.
Assumptions: the above code leaves list elements that are not dictionaries untouched, and it will fail or misbehave if data[key] is a dictionary when validate[key] exists but isn't a dictionary or None, or if data[key] is a list when validate[key] exists but isn't a list or None.
Important note about the if forest: The order of the if/else/if/elif/else matters. In particular, we only execute data[key] = validate[key] in the case where we don't have a list. If validate[key] is a list, then data[key] = validate[key] would result in data[key] becoming the same list, and not a copy of the list, which is most certainly not what you want.
Important note about list(data.keys()): I used the iteration for key in list(data.keys()): and not for key in data: or for key, value in data.items():. Normally this would not be the preferred way of iterating over a dict. But we use del inside the for loop to remove entries from the dictionary, and in Python 3 deleting from a dict while iterating over it directly raises RuntimeError: dictionary changed size during iteration. So we need to get the list of keys before deleting any element, and then use that list to iterate.
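To see why the snapshot matters, here is a small sketch (both function names are made up for illustration): deleting from a dict while iterating over it directly raises RuntimeError in Python 3, while iterating over list(d.keys()) does not.

```python
def delete_none_values_unsafe(d):
    # Iterates over the live dict: deleting a key changes its size mid-iteration
    for key in d:
        if d[key] is None:
            del d[key]


def delete_none_values_safe(d):
    # list(d.keys()) takes a snapshot of the keys before anything is deleted
    for key in list(d.keys()):
        if d[key] is None:
            del d[key]


unsafe = {"a": None, "b": 1}
try:
    delete_none_values_unsafe(unsafe)
except RuntimeError as exc:
    print("unsafe version failed:", exc)  # dictionary changed size during iteration

safe = {"a": None, "b": 1}
delete_none_values_safe(safe)
print(safe)  # {'b': 1}
```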
Interesting problem! To avoid a multitude of if...else branches, you need an approach that allows recursion regardless of the type of the incoming values.
So I presume you need the following rules:
1. If a value from data is None in validate, the value in data should be preserved.
2. If the values from data and validate are dictionaries, keep only the keys of data that are also present in validate, and apply these rules recursively to their values.
3. If the values from data and validate are lists, keep only the items of data whose position also exists in validate, and apply these rules recursively to those items.
4. If a value from data is not None in validate and rules 2 and 3 don't apply, the value in data should be replaced by the value in validate.
Here is my suggestion:
def sanitize(data1, data2):
    """Sanitize *data1* depending on *data2*
    """
    # If data2 is None, simply return data1
    if data2 is None:
        return data1
    # Update data1 recursively if both values are dictionaries.
    elif isinstance(data1, dict) and isinstance(data2, dict):
        return {
            key: sanitize(_value, data2.get(key))
            for key, _value in data1.items()
            if key in data2
        }
    # Update data1 recursively if both values are lists.
    elif isinstance(data1, list) and isinstance(data2, list):
        return [
            sanitize(subvalue1, subvalue2)
            for subvalue1, subvalue2
            in zip(data1, data2)
        ]
    # Otherwise, simply return data2.
    return data2
Using your values, you'd get the following output:
> sanitize(data, validate)
{
    'topic': "Seniors' Health Care Freedom Act of 2007",
    'last_update': 'next_update',
    'organisations': [
        {
            'organization_id': '22973',
            'name': 'National Health Federation'
        }
    ]
}
From rule 3, I presumed that you want to delete all list items from data that are not present in validate, hence the removal of the second item from "organisations".
If rule 3 should rather be:
If values from data and validate are lists, apply these rules recursively to other items.
Then you can simply replace the zip function with itertools.zip_longest.
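A sketch of that variant, assuming the alternative rule 3 above (the name sanitize_longest is hypothetical). List items beyond the length of validate are paired with None, so rule 1 returns them unchanged. One caveat: if validate's list is longer than data's, the missing data items are paired with None and the validate item itself ends up in the result.

```python
from itertools import zip_longest


def sanitize_longest(data1, data2):
    """Like sanitize, but list items beyond the end of data2 are kept as they are."""
    # Rule 1: None in validate means "keep the value from data"
    if data2 is None:
        return data1
    # Rule 2: filter dict keys and recurse into the remaining values
    if isinstance(data1, dict) and isinstance(data2, dict):
        return {
            key: sanitize_longest(value, data2.get(key))
            for key, value in data1.items()
            if key in data2
        }
    # Alternative rule 3: zip_longest pads the shorter list with None
    if isinstance(data1, list) and isinstance(data2, list):
        return [
            sanitize_longest(subvalue1, subvalue2)
            for subvalue1, subvalue2 in zip_longest(data1, data2)
        ]
    # Rule 4: otherwise take the value from validate
    return data2
```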
Dictionary and list comprehensions make quick work of the problem -
def from_schema(t, s):
    if isinstance(t, dict) and isinstance(s, dict):
        return { v if isinstance(v, str) else k: from_schema(t[k], v) for (k, v) in s.items() if k in t }
    elif isinstance(t, list) and isinstance(s, list):
        return [ from_schema(v, s[0]) for v in t if s ]
    else:
        return t
A few line breaks might make the comprehensions more... comprehensible -
import json

def from_schema(t, s):
    if isinstance(t, dict) and isinstance(s, dict):
        return {
            v if isinstance(v, str) else k: from_schema(t[k], v)
            for (k, v) in s.items()
            if k in t
        }
    elif isinstance(t, list) and isinstance(s, list):
        return [
            from_schema(v, s[0])
            for v in t
            if s
        ]
    else:
        return t

result = from_schema(data, validate)
print(json.dumps(result, indent=2))
{
  "topic": "Seniors' Health Care Freedom Act of 2007",
  "next_update": "2011-08-29T20:47:44Z",
  "organisations": [
    {
      "organization_id": "22973",
      "name": "National Health Federation"
    },
    {
      "organization_id": "27059",
      "name": "A Christian Perspective on Health Issues"
    }
  ]
}

Python Conditional Key/Value in Dict

I have the following piece of code:
payload = [
    {
        'car': {
            'vin': message.car_reference.vin,
            'brand': message.car_reference.model_manufacturer,
            'model': message.car_reference.model_description,
            'color': message.car_reference.color,
        },
    }
]
The only field on message.car_reference that is guaranteed to not be None is vin.
I want the other keys (brand, model, color) to be in the dict only if they have a value.
The payload gets sent to an external API that gives me an error if e.g. color = None.
How do I make it so that keys and values are only added if their value is not None?
What has come to mind so far is multiple if-statements, but that looks awful and I don't think it's the right way.
This code recursively walks the data structure and deletes, in place, every dictionary key whose value is None:
def recur_remover(collection):
    if isinstance(collection, list):
        # This allows you to pass in the whole list immediately
        for item in collection:
            recur_remover(item)
    elif isinstance(collection, dict):
        # When you hit a dictionary, check its values for nested dictionaries
        to_delete = []
        for key, val in collection.items():
            if val is None:
                to_delete.append(key)
            else:
                recur_remover(collection[key])
        for k in to_delete:
            # Delete the unwanted keys after the loop, since deleting while iterating raises RuntimeError
            del collection[k]
    else:
        return
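If I understand your problem correctly, you can filter each car's dict with a comprehension. Testing is not None (rather than the truthiness of v) drops only the None values and keeps legitimate values such as 0 or "":
your_car_collection = [{'car': {k: v for k, v in car['car'].items() if v is not None}} for car in your_car_collection]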
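Alternatively, you can filter at the point where the payload is built. A minimal sketch, assuming message.car_reference exposes the four attributes used in the question; SimpleNamespace only stands in for it here, and build_payload is a hypothetical helper:

```python
from types import SimpleNamespace


def build_payload(car_reference):
    car = {
        'vin': car_reference.vin,
        'brand': car_reference.model_manufacturer,
        'model': car_reference.model_description,
        'color': car_reference.color,
    }
    # Drop only the None values; vin is never None, so it is always kept
    return [{'car': {key: value for key, value in car.items() if value is not None}}]


# Stand-in for message.car_reference with two missing fields
car_reference = SimpleNamespace(
    vin='VIN123',
    model_manufacturer='Acme',
    model_description=None,
    color=None,
)
print(build_payload(car_reference))  # [{'car': {'vin': 'VIN123', 'brand': 'Acme'}}]
```

Building the dict once and filtering it keeps a single list of field mappings instead of one if-statement per optional field.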

Create dictionary based on matching terms from two other dictionaries - Python

I'm trying to compare two large dictionaries that describe the contents of product catalogs. Each dictionary consists of a unique, coded key and a list of terms for each key.
dict1 = {
    "SKU001": ["Plumbing", "Pumps"],
    "SKU002": ["Motors"],
    "SKU003": ["Snow", "Blowers"],
    "SKU004": ["Pnuematic", "Hose", "Pumps"],
    ...
}
dict2 = {
    "FAS001": ["Pnuematic", "Pumps"],
    "GRA001": ["Lawn", "Mowers"],
    "FAS002": ["Servo", "Motors"],
    "FAS003": ["Hose"],
    "GRA002": ["Snow", "Shovels"],
    "GRA003": ["Water", "Pumps"]
    ...
}
I want to create a new dictionary that borrows the keys from dict1 and whose values are a list of keys from dict2 where at least one of their term values match. The ideal end result may resemble this:
match_dict = {
    "SKU001": ["FAS001", "GRA003"],
    "SKU002": ["FAS002"],
    "SKU003": ["GRA002"],
    "SKU004": ["FAS001", "FAS003", "GRA003"],
    ...
}
I'm having issues creating this output though. Is it possible to create a list of keys and assign it as a value to another key? I've made a few attempts using nested loops like below, but the output isn't as desired and I'm unsure if it's even working properly. Any help is appreciated!
matches = {}
for key, values in dict1.items():
    for value in values:
        if value in dict2.values():
            matches[key] = value
print(matches)
This is one possible implementation:
dict1 = {
    "SKU001": ["Plumbing", "Pumps"],
    "SKU002": ["Motors"],
    "SKU003": ["Snow", "Blowers"],
    "SKU004": ["Pnuematic", "Hose", "Pumps"],
}
dict2 = {
    "FAS001": ["Pnuematic", "Pumps"],
    "GRA001": ["Lawn", "Mowers"],
    "FAS002": ["Servo", "Motors"],
    "FAS003": ["Hose"],
    "GRA002": ["Snow", "Shovels"],
    "GRA003": ["Water", "Pumps"]
}
match_dict_test = {
    "SKU001": ["FAS001", "GRA003"],
    "SKU002": ["FAS002"],
    "SKU003": ["GRA002"],
    "SKU004": ["FAS001", "FAS003", "GRA003"],
}
# Find keys for each item in dict2
dict2_reverse = {}
for k, v in dict2.items():
    for item in v:
        dict2_reverse.setdefault(item, []).append(k)
# Build dict of matches
match_dict = {}
for k, v in dict1.items():
    # Keys in dict2 associated with each item
    keys2 = (dict2_reverse.get(item, []) for item in v)
    # Save sorted list of keys from dict2 without repetitions
    match_dict[k] = sorted(set(k2i for k2 in keys2 for k2i in k2))
# Check result
print(match_dict == match_dict_test)
# True
Assuming that dict1 and dict2 can have duplicate value entries, you would need to build an intermediate multi-map dictionary and also handle uniqueness of the expanded value list for each SKU:
mapDict = dict()
for prod,attributes in dict2.items():
    for attribute in attributes:
        mapDict.setdefault(attribute,[]).append(prod)
matchDict = dict()
for sku,attributes in dict1.items():
    for attribute in attributes:
        matchDict.setdefault(sku,set()).update(mapDict.get(attribute,[]))
matchDict = { sku:sorted(prods) for sku,prods in matchDict.items() }
print(matchDict)
{'SKU001': ['FAS001', 'GRA003'], 'SKU002': ['FAS002'], 'SKU003': ['GRA002'], 'SKU004': ['FAS001', 'FAS003', 'GRA003']}
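For comparison, the nested loop from the question can also be fixed directly by testing, for every pair of keys, whether their term lists share any term. This sketch (match_terms is a hypothetical name) performs len(dict1) × len(dict2) comparisons, so the reverse-index approaches above should scale better for large catalogs; it also returns the keys in dict2's order rather than sorted:

```python
def match_terms(dict1, dict2):
    matches = {}
    for key1, terms1 in dict1.items():
        terms1 = set(terms1)
        # isdisjoint is False as soon as the two term lists share at least one term
        matches[key1] = [
            key2 for key2, terms2 in dict2.items()
            if not terms1.isdisjoint(terms2)
        ]
    return matches


dict1 = {
    "SKU001": ["Plumbing", "Pumps"],
    "SKU002": ["Motors"],
    "SKU003": ["Snow", "Blowers"],
    "SKU004": ["Pnuematic", "Hose", "Pumps"],
}
dict2 = {
    "FAS001": ["Pnuematic", "Pumps"],
    "GRA001": ["Lawn", "Mowers"],
    "FAS002": ["Servo", "Motors"],
    "FAS003": ["Hose"],
    "GRA002": ["Snow", "Shovels"],
    "GRA003": ["Water", "Pumps"],
}
print(match_terms(dict1, dict2))
```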

How to remove dictionary's keys and values based on another dictionary?

I wish to remove keys and values in one JSON dictionary based on another JSON dictionary's keys and values. In a sense, I am looking to perform a "subtraction". Let's say I have JSON dictionaries a and b:
a = {
    "my_app":
    {
        "environment_variables":
        {
            "SOME_ENV_VAR":
            [
                "/tmp",
                "tmp2"
            ]
        },
        "variables":
        { "my_var": "1",
          "my_other_var": "2"
        }
    }
}
b = {
    "my_app":
    {
        "environment_variables":
        {
            "SOME_ENV_VAR":
            [
                "/tmp"
            ]
        },
        "variables":
        { "my_var": "1" }
    }
}
Imagine you could do a-b=c where c looks like this:
c = {
    "my_app":
    {
        "environment_variables":
        {
            "SOME_ENV_VAR":
            [
                "tmp2"
            ]
        },
        "variables":
        { "my_other_var": "2" }
    }
}
How can this be done?
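You can loop through your dictionary using for key in list(dictionary): and you can delete keys using del dictionary[key]; I think that's all you need. (Loop over list(dictionary) rather than the dictionary itself, because deleting while iterating directly over a dict raises RuntimeError.) See the documentation for dictionaries: https://docs.python.org/2/tutorial/datastructures.html#dictionaries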
The way you can do it is to:
Create a copy of a -> c;
Iterate over every key, value pair inside b;
For the same top-level keys, check whether the inner keys and values are also the same, and delete them from c;
Remove keys with empty values.
You will need to modify the code if your case is different (for example, if it is not a dict of dicts).
import copy

A = {'environment_variables': {'SOME_ENV_VAR': ['/tmp']},
     'variables': {'my_var': '2', 'someones_var': '1'}}
B = {'environment_variables': {'SOME_ENV_VAR': ['/tmp']},
     'variables': {'my_var': '2'}}
print(A)
print(B)
# deepcopy: a shallow A.copy() shares the inner dicts, so deleting from C would also modify A
C = copy.deepcopy(A)
# INFO: Suppose your max depth is as follows: "A = dict(key:dict(), ...)"
for k0, v0 in B.items():
    # Look for similar outer keys (check if 'vars' or 'env_vars' in A)
    if k0 in C:
        # Look for similar inner (keys, values)
        for k1, v1 in v0.items():
            # If we have e.g. 'my_var' in B and in C and values are the same
            if k1 in C[k0] and v1 == C[k0][k1]:
                del C[k0][k1]
        # Remove empty 'vars', 'env_vars'
        if not C[k0]:
            del C[k0]
print(C)
{'environment_variables': {'SOME_ENV_VAR': ['/tmp']},
 'variables': {'my_var': '2', 'someones_var': '1'}}
{'environment_variables': {'SOME_ENV_VAR': ['/tmp']},
 'variables': {'my_var': '2'}}
{'variables': {'someones_var': '1'}}
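The code above assumes a fixed depth of two levels. A recursive sketch of the same four steps for arbitrary nesting (subtract_nested and _subtract_in_place are hypothetical names); it treats lists the way the question expects, removing the items that also appear in b's list:

```python
import copy


def _subtract_in_place(target, other):
    for key, other_value in other.items():
        if key not in target:
            continue
        value = target[key]
        if isinstance(value, dict) and isinstance(other_value, dict):
            # Check inner keys and values at every depth, not just two levels
            _subtract_in_place(value, other_value)
            remove = not value
        elif isinstance(value, list) and isinstance(other_value, list):
            # Keep only the list items that b does not also contain
            target[key] = [item for item in value if item not in other_value]
            remove = not target[key]
        else:
            remove = value == other_value
        # Drop keys whose value is equal in b or has become empty
        if remove:
            del target[key]


def subtract_nested(a, b):
    # Work on a deep copy so that a itself is never modified
    result = copy.deepcopy(a)
    _subtract_in_place(result, b)
    return result
```

Applied to the a and b from the question, this should yield the expected c with ["tmp2"] and {"my_other_var": "2"}, and applied to A and B above it gives {'variables': {'someones_var': '1'}}.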
The following does what you need:
def subtract(a, b):
    result = {}
    for key, value in a.items():
        if key not in b or b[key] != value:
            if not isinstance(value, dict):
                if isinstance(value, list):
                    # Keep only the items that b's list (if any) does not contain
                    result[key] = [item for item in value if item not in b.get(key, [])]
                else:
                    result[key] = value
                continue
            # Recurse into nested dicts; a key missing from b subtracts nothing
            inner_dict = subtract(value, b.get(key, {}))
            if len(inner_dict) > 0:
                result[key] = inner_dict
    return result
It checks whether both the key and the value are present. It could del items, but I think it is much better to return a new dict with the desired data instead of modifying the original.
c = subtract(a, b)
UPDATE
I have just updated it for the latest version of the data provided in the question. Now it 'subtracts' list values as well.

How do I find an item in an array of dictionaries?

Suppose I have this:
list = [ { 'p1':'v1' } ,{ 'p2':'v2' } ,{ 'p3':'v3' } ]
I need to find p2 and get its value.
You can try the following; it returns all the values for the given key from all dictionaries:
ans = [d[key] for d in list if key in d]
If this is what your actual code looks like (each key is unique), you should just use one dictionary:
things = { 'p1':'v1', 'p2':'v2', 'p3':'v3' }
do_something(things['p2'])
You can convert a list of dictionaries to one dictionary by merging them with update (but this will overwrite duplicate keys):
dict = {}
for item in list:
    dict.update(item)
do_something(dict['p2'])
If that's not possible, you'll need to just loop through them:
for item in list:
    if 'p2' in item:
        do_something(item['p2'])
If you expect multiple results, you can also build up a list:
p2s = []
for item in list:
    if 'p2' in item:
        p2s.append(item['p2'])
Also, I wouldn't recommend actually naming any variables dict or list, since that will cause problems with the built-in dict() and list() functions.
These shouldn't be stored in a list to begin with, they should be stored in a dictionary. Since they're stored in a list, though, you can either search them as they are:
lst = [ { 'p1':'v1' } ,{ 'p2':'v2' } ,{ 'p3':'v3' } ]
p2 = next(d["p2"] for d in lst if "p2" in d)
Or turn them into a dictionary:
dct = {}
any(dct.update(d) for d in lst)
p2 = dct["p2"]
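If p2 might be missing, next() also accepts a default value as its second argument, which avoids the StopIteration that the bare generator version raises. A short sketch:

```python
lst = [{'p1': 'v1'}, {'p2': 'v2'}, {'p3': 'v3'}]

# The extra parentheses are required when the generator is not the only argument
p2 = next((d['p2'] for d in lst if 'p2' in d), None)
p4 = next((d['p4'] for d in lst if 'p4' in d), None)
print(p2, p4)  # v2 None
```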
You can also use this one-liner:
next(filter(lambda x: 'p2' in x, list))['p2']
If you have more than one 'p2', this will pick out the first; if you have none, it will raise StopIteration. (In Python 3, filter returns an iterator, so it cannot be indexed with [0].)
def get_p2(items):
    # Wrapped in a function (hypothetical name) so that return is valid
    for d in items:
        if "p2" in d:
            return d['p2']
If it's a one-off lookup, you can do something like this:
>>> [i['p2'] for i in my_list if 'p2' in i]
['v2']
If you need to look up multiple keys, you should consider converting the list to something that can do key lookups in constant time (such as a dict):
>>> my_list = [ { 'p1':'v1' } ,{ 'p2':'v2' } ,{ 'p3':'v3' } ]
>>> my_dict = dict(item for d in my_list for item in d.items())
>>> my_dict['p2']
'v2'
Start by flattening the list of dictionaries into one dictionary; then you can index it by key to get the value:
{k:v for x in list for k,v in x.items()}['p2']
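Another option from the standard library is collections.ChainMap, which searches the dictionaries in order without copying them into a new dict. Unlike merging with update(), a lookup returns the value from the first dictionary that contains the key, as this short sketch shows:

```python
from collections import ChainMap

my_list = [{'p1': 'v1'}, {'p2': 'v2'}, {'p3': 'v3'}, {'p2': 'later'}]

# ChainMap looks up keys in each mapping in turn, without copying them
lookup = ChainMap(*my_list)
print(lookup['p2'])      # v2, found in the first dict that has 'p2'
print(lookup.get('p4'))  # None
```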
