I have a dictionary whose keys and values are encoded as bytes. It can contain nested dictionaries, and I do not know in advance how deep the nesting goes.
A sample of the data looks like this:
1:{
b'key1':{
b'key11':2022,
b'key12':1,
b'key13':2022,
b'key32':1,
b'key14':b'x86\xe3\x88',
b'key21':b'U_001776',
b'key34':b'\xe6\xb4\xbe\xe9\x81\xa3\xe7\xa4\xbe\xe5\x93\xa1',
b'key65':b'U_001506',
b'key45':b'\xbc',
b'key98':b'1\x81\x88',
b'kwy66':{
b'keyq':b'sometext'
}
}
},
To convert this to strings, I tried this:
def convert_dict(data):
    if isinstance(data,str):
        return data
    elif isinstance(data,bytes):
        return data.decode()
    elif isinstance(data,dict):
        for key,val in data.items():
            if isinstance(key,bytes):
                data[key.decode()] = convert_dict(data[key])
            else:
                data[key] = convert_dict(data[key])
        return data
    elif isinstance(data,list):
        temp_list = []
        for dt in data:
            temp_list.append(convert_dict(dt))
        return temp_list
    else:
        return data
I am getting "dictionary changed size during iteration". Is there a mistake in this code? Please help.
Edit 1:
The data is actually serialized in PHP, and I had to use Python to unserialize it.
I used this to convert it into a dictionary:
from phpserialize import *
temp = loads(serialized_data.encode())
I received a dictionary, but its keys and values are bytes. I had to use serialized_data.encode() because loads only accepts bytes.
I pass this temp to the convert_dict function.
You can't modify the set of keys in a dict while iterating (it's unsafe to modify basically any collection while iterating it, but dicts, unlike some others, need to do some self-checking to avoid crashing if you violate that rule, so while they're at it, they raise an exception rather than silently doing screwy things). So build a new dict and return that instead:
def convert_dict(data):
    if isinstance(data,str):
        return data
    elif isinstance(data,bytes):
        return data.decode()
    elif isinstance(data,dict):
        newdata = {}  # Build a new dict
        for key, val in data.items():
            # Simplify code path by just doing decoding in conditional, insertion unconditional
            if isinstance(key,bytes):
                key = key.decode()
            newdata[key] = convert_dict(val)  # Update new dict (and use the val since items() gives it for free)
        return newdata
    elif isinstance(data,list):
        return [convert_dict(dt) for dt in data]
    else:
        return data
Just for fun, I made some minor modifications to reduce code repetition so most work is done through common paths, and demonstrated simplifying the list case with a listcomp.
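For illustration, the error from the question can be reproduced in a few lines. This is only a minimal sketch; the helper names add_while_iterating and decode_keys_in_place are made up for the example. It also shows that looping over a snapshot of the keys (list(d)) is another way around the error when a single dict really has to be changed in place:

```python
def add_while_iterating(d):
    """Try to insert new keys while looping over the same dict."""
    try:
        for key in d:
            d[key + "_copy"] = d[key]  # grows the dict mid-iteration
    except RuntimeError as exc:
        return str(exc)
    return None


def decode_keys_in_place(d):
    """Hypothetical helper: decode bytes keys of one (non-nested) dict in place."""
    for key in list(d):  # list(d) is a snapshot of the keys, so d may change
        if isinstance(key, bytes):
            d[key.decode()] = d.pop(key)
    return d


print(add_while_iterating({"a": 1}))  # dictionary changed size during iteration
print(decode_keys_in_place({b"x": 1, "y": 2}))
```

Building a new dict, as in the answer above, is usually easier to reason about than the snapshot approach once nesting is involved.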
You can't change a dictionary that you are iterating over. It is better to return a new structure:
def convert(d):
    if isinstance(d, dict):
        return {convert(k): convert(v) for k, v in d.items()}
    if isinstance(d, list):
        return [convert(i) for i in d]
    if isinstance(d, bytes):
        return d.decode()
    return d
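One caveat with both versions: bytes.decode() assumes UTF-8 by default and raises UnicodeDecodeError on invalid input, and a value like b'\xbc' from the sample is not valid UTF-8. A possible variation is sketched below, under the assumption that replacing undecodable bytes with U+FFFD is acceptable. The name decode_nested and its parameters are hypothetical, not taken from the answers above:

```python
def decode_nested(obj, encoding="utf-8", errors="replace"):
    """Return a decoded copy of obj; undecodable bytes follow the `errors` policy."""
    if isinstance(obj, bytes):
        return obj.decode(encoding, errors)
    if isinstance(obj, dict):
        return {
            decode_nested(k, encoding, errors): decode_nested(v, encoding, errors)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [decode_nested(item, encoding, errors) for item in obj]
    return obj  # ints, str, None, ... are returned unchanged


sample = {b"key1": {b"key11": 2022, b"key45": b"\xbc", b"key21": b"U_001776"}}
print(decode_nested(sample))
```

If the PHP side used a different encoding, passing that encoding explicitly is probably a better fix than replacing characters.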
I had to remove some fields from a dictionary, the keys for those fields are on a list. So I wrote this function:
def delete_keys_from_dict(dict_del, lst_keys):
    """
    Delete the keys present in lst_keys from the dictionary.
    Loops recursively over nested dictionaries.
    """
    dict_foo = dict_del.copy()  #Used as iterator to avoid the 'DictionaryHasChanged' error
    for field in dict_foo.keys():
        if field in lst_keys:
            del dict_del[field]
        if type(dict_foo[field]) == dict:
            delete_keys_from_dict(dict_del[field], lst_keys)
    return dict_del
This code works, but it's not very elegant and I'm sure that there is a better solution.
First off, I think your code is working and not inelegant. There's no immediate reason not to use the code you presented.
There are a few things that could be better though:
Comparing the type
Your code contains the line:
if type(dict_foo[field]) == dict:
That can definitely be improved. Generally (see also PEP8) you should use isinstance instead of comparing types:
if isinstance(dict_foo[field], dict):
However that will also return True if dict_foo[field] is a subclass of dict. If you don't want that, you could also use is instead of ==. That will be marginally (and probably unnoticeable) faster.
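To make the difference concrete, here is a small sketch using OrderedDict, which is a dict subclass. The variable name value is just for illustration:

```python
from collections import OrderedDict

value = OrderedDict(a=1)

print(type(value) == dict)      # False: the exact type is OrderedDict
print(type(value) is dict)      # False: same comparison, by identity
print(isinstance(value, dict))  # True: subclasses count as dict
```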
If you also want to allow arbitrary dict-like objects you could go a step further and test if it's a collections.abc.MutableMapping. That will be True for dict and dict subclasses and for all mutable mappings that explicitly implement that interface without subclassing dict, for example UserDict:
>>> from collections.abc import MutableMapping  # Python 3.3+ (importing it from collections fails on 3.10+)
>>> from collections import UserDict  # Python 3.x
>>> # from UserDict import UserDict  # Python 2.x
>>> isinstance(UserDict(), MutableMapping)
True
>>> isinstance(UserDict(), dict)
False
Inplace modification and return value
Typically functions either modify a data structure in place or return a new (modified) data structure. Just to mention a few examples: list.append, dict.clear, dict.update all modify the data structure in place and return None. That makes it easier to keep track of what a function does. That's not a hard rule, and there are always valid exceptions to it. Personally, though, I think a function like this doesn't need to be an exception: I would simply remove the return dict_del line and let it implicitly return None, but YMMV.
Removing the keys from the dictionary
You copied the dictionary to avoid problems when you remove key-value pairs during the iteration. However, as already mentioned by another answer you could just iterate over the keys that should be removed and try to delete them:
for key in keys_to_remove:
    try:
        del dictionary[key]
    except KeyError:
        pass
That has the additional advantage that you don't need to nest two loops (which could be slower, especially if the number of keys that need to be removed is very large).
If you don't like empty except clauses you can also use contextlib.suppress (requires Python 3.4+):
from contextlib import suppress
for key in keys_to_remove:
    with suppress(KeyError):
        del dictionary[key]
Variable names
There are a few variables I would rename because they are just not descriptive or even misleading:
delete_keys_from_dict should probably mention the subdict-handling, maybe delete_keys_from_dict_recursive.
dict_del sounds like a deleted dict. I tend to prefer names like dictionary or dct because the function name already describes what is done to the dictionary.
lst_keys, same there. I'd probably use just keys there. If you want to be more specific something like keys_sequence would make more sense because it accepts any sequence (you just have to be able to iterate over it multiple times), not just lists.
dict_foo, just no...
field isn't really appropriate either, it's a key.
Putting it all together:
As I said before I personally would modify the dictionary in-place and not return the dictionary again. Because of that I present two solutions, one that modifies it in-place but doesn't return anything and one that creates a new dictionary with the keys removed.
The version that modifies in-place (very much like Ned Batchelder's solution):
from collections.abc import MutableMapping
from contextlib import suppress
def delete_keys_from_dict(dictionary, keys):
    for key in keys:
        with suppress(KeyError):
            del dictionary[key]
    for value in dictionary.values():
        if isinstance(value, MutableMapping):
            delete_keys_from_dict(value, keys)
And the solution that returns a new object:
from collections.abc import MutableMapping
def delete_keys_from_dict(dictionary, keys):
    keys_set = set(keys)  # Just an optimization for the "if key in keys" lookup.
    modified_dict = {}
    for key, value in dictionary.items():
        if key not in keys_set:
            if isinstance(value, MutableMapping):
                modified_dict[key] = delete_keys_from_dict(value, keys_set)
            else:
                modified_dict[key] = value  # or copy.deepcopy(value) if a copy is desired for non-dicts.
    return modified_dict
However, it only makes copies of the dictionaries; the other values are not copied. You could easily wrap these in copy.deepcopy (I put a comment in the appropriate place of the code) if you want that.
def delete_keys_from_dict(dict_del, lst_keys):
    for k in lst_keys:
        try:
            del dict_del[k]
        except KeyError:
            pass
    for v in dict_del.values():
        if isinstance(v, dict):
            delete_keys_from_dict(v, lst_keys)
    return dict_del
Since the question requested an elegant way, I'll submit my general-purpose solution to wrangling nested structures. First, install the boltons utility package with pip install boltons, then:
from boltons.iterutils import remap
data = {'one': 'remains', 'this': 'goes', 'of': 'course'}
bad_keys = set(['this', 'is', 'a', 'list', 'of', 'keys'])
drop_keys = lambda path, key, value: key not in bad_keys
clean = remap(data, visit=drop_keys)
print(clean)
# Output:
{'one': 'remains'}
In short, the remap utility is a full-featured, yet succinct approach to handling real-world data structures which are often nested, and can even contain cycles and special containers.
This page has many more examples, including ones working with much larger objects from Github's API.
It's pure-Python, so it works everywhere, and is fully tested in Python 2.7 and 3.3+. Best of all, I wrote it for exactly cases like this, so if you find a case it doesn't handle, you can bug me to fix it right here.
def delete_keys_from_dict(d, to_delete):
    if isinstance(to_delete, str):
        to_delete = [to_delete]
    if isinstance(d, dict):
        for single_to_delete in set(to_delete):
            if single_to_delete in d:
                del d[single_to_delete]
        for k, v in d.items():
            delete_keys_from_dict(v, to_delete)
    elif isinstance(d, list):
        for i in d:
            delete_keys_from_dict(i, to_delete)
d = {'a': 10, 'b': [{'c': 10, 'd': 10, 'a': 10}, {'a': 10}], 'c': 1 }
delete_keys_from_dict(d, ['a', 'c'])  # inplace deletion
print(d)
# Output:
{'b': [{'d': 10}, {}]}
This solution works for dicts and lists inside a given nested dict. The input to_delete can be a list of str to be deleted or a single str.
Please note that if you remove the only key in a dict, you will get an empty dict.
I think the following is more elegant:
def delete_keys_from_dict(dict_del, lst_keys):
    if not isinstance(dict_del, dict):
        return dict_del
    return {
        key: value
        for key, value in (
            (key, delete_keys_from_dict(value, lst_keys))
            for key, value in dict_del.items()
        )
        if key not in lst_keys
    }
Example usage:
test_dict_in = {
1: {1: {0: 2, 3: 4}},
0: {2: 3},
2: {5: {0: 4}, 6: {7: 8}},
}
test_dict_out = {
1: {1: {3: 4}},
2: {5: {}, 6: {7: 8}},
}
assert delete_keys_from_dict(test_dict_in, [0]) == test_dict_out
Since you already need to loop through every element in the dict, I'd stick with a single loop and just make sure to use a set for looking up the keys to delete:
def delete_keys_from_dict(dict_del, the_keys):
    """
    Delete the keys present in the_keys from the dictionary.
    Loops recursively over nested dictionaries.
    """
    # make sure the_keys is a set to get O(1) lookups
    if type(the_keys) is not set:
        the_keys = set(the_keys)
    # iterate over a snapshot: deleting while looping over dict_del.items()
    # directly raises "RuntimeError: dictionary changed size during iteration"
    for k, v in list(dict_del.items()):
        if k in the_keys:
            del dict_del[k]
        if isinstance(v, dict):
            delete_keys_from_dict(v, the_keys)
    return dict_del
This works with dicts containing iterables (list, ...) that may in turn contain dicts. It targets Python 3; for Python 2, unicode should also be excluded from the iteration. There may also be some iterables I'm not aware of that don't work (i.e. that lead to infinite recursion).
from collections.abc import Iterable
def deep_omit(d, keys):
    if isinstance(d, dict):
        for k in keys:
            d.pop(k, None)
        for v in d.values():
            deep_omit(v, keys)
    elif isinstance(d, Iterable) and not isinstance(d, str):
        for e in d:
            deep_omit(e, keys)
    return d
Since nobody posted an iterative version, here is one that could be useful for someone:
def delete_key_from_dict(adict, key):
    stack = [adict]
    while stack:
        elem = stack.pop()
        if isinstance(elem, dict):
            if key in elem:
                del elem[key]
            for k in elem:
                stack.append(elem[k])
This version is probably what you would push to production. The recursive version is elegant and easy to write but it scales badly (by default Python uses a maximum recursion depth of 1000).
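As a rough sketch of how the same stack-based idea could be extended, the variant below accepts several keys and also descends into lists. The name delete_keys_iteratively and the 5000-level test structure are made up for illustration; the depth is simply chosen to exceed the default recursion limit mentioned above:

```python
def delete_keys_iteratively(root, keys):
    """Remove every key in `keys` from root and from all nested dicts/lists, in place."""
    keys = set(keys)
    stack = [root]
    while stack:
        elem = stack.pop()
        if isinstance(elem, dict):
            # intersection() builds a new set, so deleting inside the loop is safe
            for key in keys.intersection(elem):
                del elem[key]
            stack.extend(elem.values())
        elif isinstance(elem, list):
            stack.extend(elem)


# Build a structure nested deeper than the default recursion limit.
deep = current = {}
for _ in range(5000):
    current["child"] = {"secret": 1}
    current = current["child"]

delete_keys_iteratively(deep, ["secret"])

# Walk down with a loop (printing such a deep dict would itself hit the recursion limit).
depth = 0
node = deep
while "child" in node:
    node = node["child"]
    assert "secret" not in node
    depth += 1
print(depth)  # 5000
```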
If you have nested keys as well, here is an elegant solution based on #John La Rooy's answer:
from boltons.iterutils import remap
def sof_solution():
    data = {"user": {"name": "test", "pwd": "******"}, "accounts": ["1", "2"]}
    sensitive = {"user.pwd", "accounts"}
    clean = remap(
        data,
        visit=lambda path, key, value: drop_keys(path, key, value, sensitive)
    )
    print(clean)
def drop_keys(path, key, value, sensitive):
    if len(path) > 0:
        nested_key = f"{'.'.join(path)}.{key}"
        return nested_key not in sensitive
    return key not in sensitive
sof_solution()  # prints {'user': {'name': 'test'}}
Using the awesome code from this post, with one small statement added:
def remove_fields(self, d, list_of_keys_to_remove):
    if not isinstance(d, (dict, list)):
        return d
    if isinstance(d, list):
        return [v for v in (self.remove_fields(v, list_of_keys_to_remove) for v in d) if v]
    return {k: v for k, v in ((k, self.remove_fields(v, list_of_keys_to_remove)) for k, v in d.items()) if k not in list_of_keys_to_remove}
I came here to search for a solution to remove keys from deeply nested Python3 dicts and all solutions seem to be somewhat complex.
Here's a oneliner for removing keys from nested or flat dicts:
nested_dict = {
"foo": {
"bar": {
"foobar": {},
"shmoobar": {}
}
}
}
>>> {'foo': {'bar': {'foobar': {}, 'shmoobar': {}}}}
nested_dict.get("foo", {}).get("bar", {}).pop("shmoobar", None)
>>> {'foo': {'bar': {'foobar': {}}}}
I used .get() to not get KeyError and I also provide empty dict as default value up to the end of the chain. I do pop() for the last element and I provide None as the default there to avoid KeyError.
I have a list of objects, like so:
[
{"title":"cdap_tests", "datacenter":"B1", "count_failed": 1},
{"title":"cdap_tests", "datacenter":"G1", "count_failed": 1},
{"title":"cdap_tests", "datacenter":"GOV1", "count_failed": 1},
{"title":"developer_portal_tests", "datacenter":"B1", "count_failed": 1}
]
and I want to combine the objects that have the same title attribute together like so:
[
{"title":"cdap_tests", "datacenter":"B1,G1,GOV1", "count_failed": 1},
{"title":"developer_portal_tests", "datacenter":"B1", "count_failed": 1}
]
I have tried comparing each one to the others based on their attributes, and appending the string to the other string if the titles were the same. For some reason it is not combining them; I simply get the same data back from the function:
new_data_list = []
for row_to_compare_to in data:
    for row_to_compare_from in data:
        if row_to_compare_from["datacenter"] == row_to_compare_to["datacenter"]:
            pass
        elif row_to_compare_from["title"] == row_to_compare_to["title"]:
            row_to_compare_to["datacenter"] = f"{row_to_compare_from['datacenter']}, {row_to_compare_to['datacenter']}"
            row_to_compare_to["count_failed"] = f"{row_to_compare_from['count_failed']}, {row_to_compare_to['count_failed']}"
    new_data_list.append(row_to_compare_to)
return new_data_list
Could someone point me in the direction of what I am doing wrong? Or maybe a cleaner solution?
The code produces an error because the "count_failed" is not in every dictionary.
If I were starting from scratch, I might substitute the original data list for a dictionary of dictionaries where the key to the outer dictionary is the title of each entry. This would result in code that is easier to read. I might also make the appended data like "B1,G1,GOV1" a list instead ["B1", "G1", "GOV1"].
I'm sure my approach isn't the most efficient or Pythonic, but I believe it works:
new_data_list = []
for raw_dct in data:
    if raw_dct['title'] in [dct['title'] for dct in new_data_list]:
        for new_dct in new_data_list:
            if raw_dct['title'] == new_dct['title']:
                for k, v in raw_dct.items():
                    if k in new_dct.keys():
                        if type(v) is str:
                            if v not in new_dct[k]:
                                new_dct[k] += "," + str(v)
                        if type(v) is int:
                            new_dct[k] += v
                    else:
                        new_dct[k] = v
    else:
        new_data_list.append(raw_dct)
which gives your desired new_data_list that also takes into account the counting of integer attributes like "count_failed".
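The "dictionary keyed by title, with datacenters kept as a list" idea mentioned earlier in this answer could look roughly like the sketch below. The function name merge_by_title is hypothetical, and summing count_failed is an assumption that follows the code above rather than the question's sample output (which keeps it at 1):

```python
def merge_by_title(rows):
    """Group rows by title, collecting datacenters in a list and summing count_failed."""
    merged = {}  # title -> combined entry
    for row in rows:
        entry = merged.setdefault(
            row["title"],
            {"title": row["title"], "datacenter": [], "count_failed": 0},
        )
        if row["datacenter"] not in entry["datacenter"]:
            entry["datacenter"].append(row["datacenter"])
        # .get() tolerates rows that lack a count_failed field
        entry["count_failed"] += row.get("count_failed", 0)
    return list(merged.values())


data = [
    {"title": "cdap_tests", "datacenter": "B1", "count_failed": 1},
    {"title": "cdap_tests", "datacenter": "G1", "count_failed": 1},
    {"title": "cdap_tests", "datacenter": "GOV1", "count_failed": 1},
    {"title": "developer_portal_tests", "datacenter": "B1", "count_failed": 1},
]
for entry in merge_by_title(data):
    print(entry["title"], ",".join(entry["datacenter"]), entry["count_failed"])
```

Unlike the loop above, this does not modify the input rows, and the list can be joined with "," at the very end if a single string is needed.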
I'm trying to traverse a dictionary (which has many strings, dicts, lists of dicts), and compare it against another dictionary.
Here's an example:
data = {
"topic": "Seniors' Health Care Freedom Act of 2007",
"foo": "bar",
"last_update": "2011-08-29T20:47:44Z",
"organisations": [
{
"organization_id": "22973",
"name": "National Health Federation",
"bar": "baz"
},
{
"organization_id": "27059",
"name": "A Christian Perspective on Health Issues"
},
]}
validate = {
"topic": None,
"last_update": "next_update",
"organisations": [
{
"organization_id": None,
"name": None
}
]
}
Essentially, if the item exists in "data", but not in "validate" at the current point, it should be deleted from data.
So in this case, I'd want data["foo"] and data["organisations"][x]["bar"] to be removed from the data dict.
Additionally, if the key in validate has a string value and isn't "None", I want to update the key name in data to that, i.e. "last_update" should become "next_update".
I'm not sure of a good way to do this in Python, my current version removes "foo" but I'm struggling trying to remove nested keys like organisations[x][bar].
This is my current attempt:
def func1(data, validate, parent = None):
    for k, v in sorted(data.items()):
        if not parent:
            if k not in validate:
                data.pop(k, None)
        if isinstance(v, dict):
            func1(v, validate)
        elif isinstance(v, list):
            for val in v:
                func1(val, validate, parent = k)
func1(data, validate)
I tried to use something like this to compare the keys instead but figured it doesn't work well if data has additional keys (appeared to remove wrong keys) since dicts are unsorted so wasn't useful for me:
for (k, v), (k2, v2) in zip(sorted(data.items()), sorted(validate.items())):
I've read similar posts such as How to recursively remove certain keys from a multi-dimensional(depth not known) python dictionary?, but this seems to use a flat set to filter so it doesn't take into account where in the dict the key is located which is important for me - as "last_update" can appear in other lists where I need to keep it.
Here is a simple recursive function. Well, it used to be simple; and then I added tons of checks and now it's an if forest.
def validate_the_data(data, validate):
    for key in list(data.keys()):
        if key not in validate:
            del data[key]
        elif validate[key] is not None:
            if isinstance(data[key], dict):
                validate_the_data(data[key], validate[key])
            elif isinstance(data[key], list):
                for subdata, subvalidate in zip(data[key], validate[key]):
                    if isinstance(subdata, dict) and isinstance(subvalidate, dict):
                        validate_the_data(subdata, subvalidate)
            else:
                data[key] = validate[key]
How it works: if data[key] is a dictionary and key is valid, then we want to check the keys in data[key] against the keys in validate[key]. So we do a recursive call, but instead of putting validate in the recursive call, we put validate[key]. Likewise if data[key] is a list.
Assumptions: The above code will fail if one of the list in data contains elements which are not dictionaries, or if data[key] is a dictionary when validate[key] exists but isn't a dictionary or None, or if data[key] is a list when validate[key] exists but isn't a list or None.
Important note about the if forest: The order of the if/else/if/elif/else matters. In particular, we only execute data[key] = validate[key] in the case where we don't have a list. If validate[key] is a list, then data[key] = validate[key] would result in data[key] becoming the same list, and not a copy of the list, which is most certainly not what you want.
Important note about list(data.keys()): I used the iteration for key in list(data.keys()): and not for key in data: or for key, value in data.items():. Normally this would not be the preferred way of iterating over a dict. But we use del inside the for loop to remove values from the dictionary, which would interfere with the iteration. So we need to get the list of keys before deleting any element, and then use that list to iterate.
Interesting problem! To avoid a multitude of if...else..., you would need to find an approach which allows recursion regardless of the type of incoming values.
So I presume you need the following rules:
1. If any value from data is None in validate, value in data should be preserved
2. If values from data and validate are dictionaries, keep only keys from data if also present in validate, and apply these rules recursively to other keys.
3. If values from data and validate are lists, keep only items from data if also present in validate, and apply these rules recursively to other items.
4. If any value from data is not None in validate and rules (2) and (3) don't apply, value in data should be replaced by value in validate
Here is my suggestion:
def sanitize(data1, data2):
    """Sanitize *data1* depending on *data2*
    """
    # If value2 is None, simply return value1
    if data2 is None:
        return data1
    # Update value1 recursively if both values are dictionaries.
    elif isinstance(data1, dict) and isinstance(data2, dict):
        return {
            key: sanitize(_value, data2.get(key))
            for key, _value in data1.items()
            if key in data2
        }
    # Update value1 recursively if both values are lists.
    elif isinstance(data1, list) and isinstance(data2, list):
        return [
            sanitize(subvalue1, subvalue2)
            for subvalue1, subvalue2
            in zip(data1, data2)
        ]
    # Otherwise, simply return value2.
    return data2
Using your values, you'd get the following output:
> sanitize(data, validate)
{
'topic': "Seniors' Health Care Freedom Act of 2007",
'last_update': 'next_update',
'organisations': [
{
'organization_id': '22973',
'name': 'National Health Federation'
}
]
}
From rule 3, I presumed that you want to delete all list items from data if not present in validate, hence the removal of the second item from "organisations".
If rule 3 should rather be:
If values from data and validate are lists, apply these rules recursively to other items.
Then you can simply replace the zip function by itertools.zip_longest
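A minimal sketch of that zip_longest variant follows. The name sanitize_keep_all is hypothetical, and slicing data2 to the length of data1 is an extra guard added here (so that a longer validate list never pads data1 with None); neither is part of the suggestion above. Note that list items without a counterpart in validate are paired with None and are therefore returned unchanged:

```python
from itertools import zip_longest


def sanitize_keep_all(data1, data2):
    """Like sanitize, but keeps list items that have no counterpart in data2."""
    if data2 is None:
        return data1
    if isinstance(data1, dict) and isinstance(data2, dict):
        return {
            key: sanitize_keep_all(value, data2[key])
            for key, value in data1.items()
            if key in data2
        }
    if isinstance(data1, list) and isinstance(data2, list):
        # Extra items in data1 are paired with None and so returned as-is.
        return [
            sanitize_keep_all(sub1, sub2)
            for sub1, sub2 in zip_longest(data1, data2[:len(data1)])
        ]
    return data2


data = {
    "foo": "bar",
    "organisations": [
        {"organization_id": "22973", "name": "National Health Federation", "bar": "baz"},
        {"organization_id": "27059", "name": "A Christian Perspective on Health Issues", "bar": "qux"},
    ],
}
validate = {"organisations": [{"organization_id": None, "name": None}]}
print(sanitize_keep_all(data, validate))
```

In this run the second organisation keeps its "bar" key, because only the first list item has a matching entry in validate.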
Dictionary and list comprehensions make quick work of the problem -
def from_schema(t, s):
    if isinstance(t, dict) and isinstance(s, dict):
        return { v if isinstance(v, str) else k: from_schema(t[k], v) for (k, v) in s.items() if k in t }
    elif isinstance(t, list) and isinstance(s, list):
        return [ from_schema(v, s[0]) for v in t if s ]
    else:
        return t
A few line breaks might make the comprehensions more... comprehensible -
def from_schema(t, s):
    if isinstance(t, dict) and isinstance(s, dict):
        return \
            { v if isinstance(v, str) else k: from_schema(t[k], v)
              for (k, v) in s.items()
              if k in t
            }
    elif isinstance(t, list) and isinstance(s, list):
        return \
            [ from_schema(v, s[0])
              for v in t
              if s
            ]
    else:
        return t
result = from_schema(data, validate)
print(result)
{
"topic": "Seniors' Health Care Freedom Act of 2007",
"next_update": "2011-08-29T20:47:44Z",
"organisations": [
{
"organization_id": "22973",
"name": "National Health Federation"
},
{
"organization_id": "27059",
"name": "A Christian Perspective on Health Issues"
}
]
}
I have a dict like this one:
exampleDict={'name': 'Example1', 'code': 2, 'price': 23, 'dimensions': [2,2]}
And I want to change the type of dimensions to a string, like this:
exampleDict['dimensions'] = str(exampleDict['dimensions'])
This works just fine. But imagine there are nested dicts inside my exampleDict, and dimensions is a bit far inside.
My guess is to do something recursively. From what I have found here (examples like this one, or this one), they use yield in a recursive function, but I am not sure why it's used.
I was thinking on doing this:
def changeToStringDim(d):
    if 'dimensions' in d:
        d['dimensions'] = str(d['dimensions'])
    for k in d:
        if isinstance(d[k], list):
            for i in d[k]:
                for j in changeToStringDim(i):
                    j['dimensions'] = str(j['dimensions'])
I found it here, but instead of the assignment j['dimensions'] = str(j['dimensions']) it did a yield. I adapted the solution to this, and it works fine on a dict like this example.
Now I am trying to do it in a nested one.
exDict2={'name': 'example1',
'nesting': {'subnesting1': 'sub2',
'coordinates': [41.6769705, 2.288154]},
'price': 123123132}
With the same function but changing it to coordinates:
def changeToStringCoord(d):
    if 'coordinates' in d:
        d['coordinates'] = str(d['coordinates'])
    for k in d:
        if isinstance(d[k], list):
            for i in d[k]:
                for j in changeToStringDim(i):
                    j['coordinates'] = str(j['coordinates'])
And it won't do anything. I have debugged it, and it just goes through name, nesting and price. The isinstance check is not working properly (or it is, and I am not fully understanding how it works).
Code with comments:
def changeNestedListToString(d):
    for k in d:
        # recursive call on dictionary type
        if isinstance(d[k], dict):
            changeNestedListToString(d[k])
        # convert lists to string
        elif isinstance(d[k], list):
            d[k] = str(d[k])
        # leave everything else untouched
Test data:
example = {
'name': 'example1',
'nesting': {
'subnesting1': 'sub2',
'coordinates': [41.6769705, 2.288154]
},
'price': 123123132
}
After calling the function:
{
'price': 123123132,
'name': 'example1',
'nesting': {
'coordinates': '[41.6769705, 2.288154]',
'subnesting1': 'sub2'
}
}
As you can see 'coordinates' has been converted to a string while everything else was left untouched.
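If only one specific key should be converted (as in the question, which targets 'coordinates'), and other lists should stay lists, a variation could look like the sketch below. The name stringify_key is hypothetical; descending into lists of dicts is an added assumption about what the nested data may contain:

```python
def stringify_key(node, target):
    """Replace the value stored under `target` with its str() form, at any depth, in place."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == target:
                # Re-assigning an existing key does not change the dict's size,
                # so it is safe to do while iterating.
                node[key] = str(value)
            else:
                stringify_key(value, target)
    elif isinstance(node, list):
        for item in node:
            stringify_key(item, target)


example = {
    'name': 'example1',
    'nesting': {
        'subnesting1': 'sub2',
        'coordinates': [41.6769705, 2.288154]
    },
    'tags': ['a', 'b'],
    'price': 123123132
}
stringify_key(example, 'coordinates')
print(example['nesting']['coordinates'])  # [41.6769705, 2.288154] (now a str)
print(example['tags'])                    # ['a', 'b'] (still a list)
```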