I have a dictionary whose keys and values are encoded as bytes. It can contain nested dictionaries, and I don't know in advance how deeply nested it can be.
A sample of the data looks like this:
1: {
    b'key1': {
        b'key11': 2022,
        b'key12': 1,
        b'key13': 2022,
        b'key32': 1,
        b'key14': b'x86\xe3\x88',
        b'key21': b'U_001776',
        b'key34': b'\xe6\xb4\xbe\xe9\x81\xa3\xe7\xa4\xbe\xe5\x93\xa1',
        b'key65': b'U_001506',
        b'key45': b'\xbc',
        b'key98': b'1\x81\x88',
        b'kwy66': {
            b'keyq': b'sometext'
        }
    }
},
To convert this into strings, I tried this:
def convert_dict(data):
    if isinstance(data, str):
        return data
    elif isinstance(data, bytes):
        return data.decode()
    elif isinstance(data, dict):
        for key, val in data.items():
            if isinstance(key, bytes):
                data[key.decode()] = convert_dict(data[key])
            else:
                data[key] = convert_dict(data[key])
        return data
    elif isinstance(data, list):
        temp_list = []
        for dt in data:
            temp_list.append(convert_dict(dt))
        return temp_list
    else:
        return data
I am getting "RuntimeError: dictionary changed size during iteration". Is there a mistake in this? Please help.
Edit 1:
The data is actually serialized in PHP, and I had to use Python to unserialize it.
I used this to convert it into a dictionary:
from phpserialize import loads
temp = loads(serialized_data.encode())
I received a dictionary, but its keys and values are bytes. I had to use serialized_data.encode() because loads only accepts bytes.
I am passing this temp to the convert_dict function.
You can't modify the set of keys in a dict while iterating (it's unsafe to modify basically any collection while iterating it, but dicts, unlike some others, need to do some self-checking to avoid crashing if you violate that rule, so while they're at it, they raise an exception rather than silently doing screwy things). So build a new dict and return that instead:
def convert_dict(data):
    if isinstance(data, str):
        return data
    elif isinstance(data, bytes):
        return data.decode()
    elif isinstance(data, dict):
        newdata = {}  # Build a new dict
        for key, val in data.items():
            # Simplify code path by just doing decoding in conditional, insertion unconditional
            if isinstance(key, bytes):
                key = key.decode()
            newdata[key] = convert_dict(val)  # Update new dict (and use the val since items() gives it for free)
        return newdata
    elif isinstance(data, list):
        return [convert_dict(dt) for dt in data]
    else:
        return data
Just for fun, I made some minor modifications to reduce code repetition so most work is done through common paths, and demonstrated simplifying the list case with a listcomp.
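If you need to change the dict in place instead (for example because other code holds a reference to it), a common alternative is to loop over a snapshot of the items, so the dict itself can be modified inside the loop. A minimal sketch, assuming UTF-8 bytes; decode_in_place is a hypothetical name:

```python
def decode_in_place(data):
    """Decode bytes keys/values of nested dicts and lists in place (assumes UTF-8)."""
    if isinstance(data, dict):
        # list(...) copies the items first, so adding/removing keys below is safe
        for key, val in list(data.items()):
            new_val = decode_in_place(val)
            if isinstance(key, bytes):
                del data[key]  # drop the old bytes key
                key = key.decode()
            data[key] = new_val
        return data
    if isinstance(data, list):
        # Slice assignment keeps the same list object
        data[:] = [decode_in_place(item) for item in data]
        return data
    if isinstance(data, bytes):
        return data.decode()
    return data


sample = {b'key1': {b'key21': b'U_001776', b'key11': 2022}, 'tags': [b'a', b'b']}
result = decode_in_place(sample)
print(result is sample)  # True: the original object was modified
print(sample)
```

Building a new dict, as in the answer above, is usually simpler; this variant also changes key order, because each decoded key is re-inserted at the end.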
You can't change a dictionary that you are iterating over. It is better to return a new structure:
def convert(d):
    if isinstance(d, dict):
        return {convert(k): convert(v) for k, v in d.items()}
    if isinstance(d, list):
        return [convert(i) for i in d]
    if isinstance(d, bytes):
        return d.decode()
    return d
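One thing neither version handles: several values in the sample, such as b'\xbc' and b'1\x81\x88', are not valid UTF-8, so a plain .decode() raises UnicodeDecodeError. A minimal sketch that passes an encoding and an error handler through the recursion; UTF-8 and errors="replace" (which substitutes U+FFFD for undecodable bytes) are assumptions, so use whatever encoding the PHP side actually wrote, or "strict" / "backslashreplace" if you prefer to fail loudly or keep the raw bytes visible:

```python
def convert(d, encoding="utf-8", errors="replace"):
    """Recursively decode bytes in dicts/lists; bad bytes are handled per `errors`."""
    if isinstance(d, dict):
        return {convert(k, encoding, errors): convert(v, encoding, errors)
                for k, v in d.items()}
    if isinstance(d, list):
        return [convert(i, encoding, errors) for i in d]
    if isinstance(d, bytes):
        # errors="replace" inserts U+FFFD instead of raising UnicodeDecodeError
        return d.decode(encoding, errors)
    return d


sample = {1: {b'key34': b'\xe6\xb4\xbe\xe9\x81\xa3\xe7\xa4\xbe\xe5\x93\xa1',
              b'key45': b'\xbc',
              b'key11': 2022}}
print(convert(sample))
```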
Related
I am trying to create a dictionary from a list recursively and my code only works when there is only one item in the list. It fails for multiple items and I suspect that this is because the dictionary is being recreated through each instance of the recursion instead of adding to it after the first instance. How can I avoid doing this so that the whole list is converted to a dictionary?
Note: the list is a list of tuples containing two items.
def poncePlanner(restaurantChoices):
    if len(restaurantChoices) == 0:
        return {}
    else:
        name, resto = restaurantChoices[0][0], restaurantChoices[0][1]
        try:
            dic[name] = resto
            poncePlanner(restaurantChoices[1:])
            return dic
        except:
            dic = {name: resto}
            poncePlanner(restaurantChoices[1:])
            return dic
Intended input and output:
>>> restaurantChoice = [("Paige", "Dancing Goats"), ("Fareeda", "Botiwala"),
...                     ("Ramya", "Minero"), ("Jane", "Pancake Social")]
>>> poncePlanner(restaurantChoice)
{'Jane': 'Pancake Social',
'Ramya': 'Minero',
'Fareeda': 'Botiwala',
'Paige': 'Dancing Goats'}
You already have the base case, so you need to define what to do when there is more than one item. Here you take the first tuple, make a dict from it, and merge the result of the recursive call into that dict:
restaurantChoice = [("Paige", "Dancing Goats"), ("Fareeda", "Botiwala"),
                    ("Ramya", "Minero"), ("Jane", "Pancake Social")]
def poncePlanner(restaurantChoice):
    if not restaurantChoice:
        return {}
    head, *rest = restaurantChoice
    return {head[0]: head[1], **poncePlanner(rest)}
poncePlanner(restaurantChoice)
Returning (in Python 3.7+, where dicts preserve insertion order):
{'Paige': 'Dancing Goats',
 'Fareeda': 'Botiwala',
 'Ramya': 'Minero',
 'Jane': 'Pancake Social'}
Since restaurantChoices is already a list of (key, value) pairs, you can simply use the built-in dict function to create the dictionary:
def poncePlanner(restaurantChoices):
    return dict(restaurantChoices)
Without built-in functions, you can also use a simple for-loop to return the desired transformation:
def poncePlanner(restaurantChoices):
    result = {}
    for key, value in restaurantChoices:
        result[key] = value
    return result
If you really want recursion, I would do something like this, but it doesn't make sense because the lists are not nested:
def poncePlanner(restaurantChoices):
    def recursion(i, result):
        if i < len(restaurantChoices):
            key, value = restaurantChoices[i]
            result[key] = value
            recursion(i + 1, result)
        return result
    return recursion(0, {})
Each recursive call does O(1) work, so the whole function runs in O(n) time; however, every item adds a stack frame, so very long lists can exceed Python's default recursion limit of 1000.
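The main problem with the original code is that the dictionary dic is not passed to the deeper recursive calls, and their return values are ignored, so the new contents are never added to the final dictionary. (Because dic is assigned inside the function, it is a local variable: dic[name] = resto raises UnboundLocalError on every call, the bare except swallows it, and each call builds a new dictionary that is then forgotten.)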
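A minimal sketch of the original idea with that fixed: create the dictionary once and pass it down to each recursive call. The extra dic parameter is my addition; it defaults to None rather than {} so that separate top-level calls don't share one dict.

```python
def poncePlanner(restaurantChoices, dic=None):
    # Create the dict only on the outermost call (a {} default would be shared)
    if dic is None:
        dic = {}
    if not restaurantChoices:
        return dic
    name, resto = restaurantChoices[0]
    dic[name] = resto
    # Hand the same dict to the deeper call and return what it returns
    return poncePlanner(restaurantChoices[1:], dic)


restaurantChoice = [("Paige", "Dancing Goats"), ("Fareeda", "Botiwala"),
                    ("Ramya", "Minero"), ("Jane", "Pancake Social")]
print(poncePlanner(restaurantChoice))
```

Like the other recursive versions, it slices the list on every call, which is fine for short lists but copies a lot of data for long ones.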
Let's say I have this dictionary:
obj = {'layerA1': 'string', 'layerA2': {'layerB1': {'layerC1': {}, 'layerC2': {}},
                                        'layerB2': {}}}
I would need to return:
['layerA2', 'layerB1', 'layerC1']
['layerA2', 'layerB1', 'layerC2']
['layerA2', 'layerB2']
It should work regardless of how deep the dictionary gets.
I'm currently trying with some recursive functions, but I can't seem to get it right.
What I currently have is this:
obj = {'layerA1': 'string', 'layerA2': {'layerB1': {'layerC1': {}, 'layerC2': {}},
                                        'layerB2': {}}}
hierarchy_list = []
def find_hierachy(param):
    some_list = []
    for key, value in param.items():
        if type(param[key]) is dict:
            some_list.append(key)
            hierarchy_list.append(some_list)
            find_hierachy(param[key])
find_hierachy(obj)
print(hierarchy_list)
[['layerA2'], ['layerB1', 'layerB2'], ['layerC1', 'layerC2'], ['layerC1', 'layerC2'], ['layerB1', 'layerB2']]
I don't know how to get it to return each hierarchical path of keys.
As you noticed in your code, you need to keep track of the path you have taken so far; this is often referred to as the prefix. By storing the prefix and passing it along, we can keep track of the previous keys. An important thing to keep in mind is that default argument values in Python should be immutable (for example, tuples) unless you know what happens to mutable objects during recursion: a default value is evaluated once, when the function is defined, and is shared by every call.
answer = []
def hierarchy(dictionary, prefix=()):
    for key, value in dictionary.items():
        if isinstance(value, dict) and value:
            hierarchy(value, (*prefix, key))
        else:
            answer.append((*prefix, key))
If you want each path to be a list, you can loop over the answers and convert them to lists, or pass a list as the prefix. The latter requires copying the list for the next level of the hierarchy, which is what [*prefix, key] does: it creates a new list.
obj = {'layerA1': 'string', 'layerA2': {'layerB1': {'layerC1': {}, 'layerC2': {}},
                                        'layerB2': {}}}
if __name__ == '__main__':
    answer = []
    def hierarchy(dictionary, prefix=None):
        prefix = prefix if prefix is not None else []
        for key, value in dictionary.items():
            if isinstance(value, dict) and value:
                hierarchy(value, [*prefix, key])
            else:
                answer.append([*prefix, key])
    hierarchy(obj)
    print(answer)
Output
[['layerA1'], ['layerA2', 'layerB1', 'layerC1'], ['layerA2', 'layerB1', 'layerC2'], ['layerA2', 'layerB2']]
Note:
Type checking can be done using isinstance(obj, type), which is preferred over type(obj) is type because it also accepts subclasses.
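A generator variant that avoids the module-level answer list. To match the question's expected output, it skips keys whose value is not a dict, so 'layerA1' is left out; that filtering rule is my reading of the question, and dict_paths is a hypothetical name:

```python
def dict_paths(dictionary, prefix=()):
    """Yield key paths (as lists) that end at an empty dict."""
    for key, value in dictionary.items():
        if not isinstance(value, dict):
            continue  # e.g. 'layerA1': 'string' is not part of any path
        path = (*prefix, key)  # new tuple; prefix itself is never mutated
        if value:
            yield from dict_paths(value, path)
        else:
            yield list(path)


obj = {'layerA1': 'string', 'layerA2': {'layerB1': {'layerC1': {}, 'layerC2': {}},
                                        'layerB2': {}}}
for path in dict_paths(obj):
    print(path)
# ['layerA2', 'layerB1', 'layerC1']
# ['layerA2', 'layerB1', 'layerC2']
# ['layerA2', 'layerB2']
```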
I have the following piece of code:
payload = [
    {
        'car': {
            'vin': message.car_reference.vin,
            'brand': message.car_reference.model_manufacturer,
            'model': message.car_reference.model_description,
            'color': message.car_reference.color,
        },
    }
]
The only field on message.car_reference that is guaranteed not to be None is vin.
I want the other keys (brand, model, color) to be in the dict only if they have a value.
The payload gets sent to an external API that returns an error if, e.g., color is None.
How do I make it so that keys and values are only added if their value is not None?
What came to mind so far was multiple if statements, but that looks awful and I don't think it's the right way.
This code recursively walks the data structure and deletes, in place, every dict key whose value is None:
def recur_remover(collection):
    if isinstance(collection, list):
        # This allows you to pass in the whole list immediately
        for item in collection:
            recur_remover(item)
    elif isinstance(collection, dict):
        # When you hit a dictionary, this checks if there are nested dictionaries
        to_delete = []
        for key, val in collection.items():
            if val is None:
                to_delete.append(key)
            else:
                recur_remover(collection[key])
        for k in to_delete:
            # delete the unwanted keys after the loop, so the dict isn't mutated while being iterated
            del collection[k]
    else:
        return
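If I understand your problem correctly, you can do this (testing v is not None rather than plain truthiness, so that values such as 0 or '' are kept):
your_car_collection = [{'car': {k: v for k, v in car['car'].items() if v is not None}} for car in your_car_collection]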
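If you'd rather not mutate the structure, here is a small sketch that returns a cleaned copy instead, combining the recursion from the first answer with the is not None test. SimpleNamespace stands in for message.car_reference, and the field values are hypothetical:

```python
from types import SimpleNamespace


def drop_none(obj):
    """Return a copy of obj with None-valued dict entries removed at every level."""
    if isinstance(obj, dict):
        return {k: drop_none(v) for k, v in obj.items() if v is not None}
    if isinstance(obj, list):
        return [drop_none(v) for v in obj]
    return obj


# Hypothetical stand-in for message.car_reference
car_reference = SimpleNamespace(vin="VIN123", model_manufacturer="Volvo",
                                model_description=None, color=None)
payload = drop_none([
    {
        'car': {
            'vin': car_reference.vin,
            'brand': car_reference.model_manufacturer,
            'model': car_reference.model_description,
            'color': car_reference.color,
        },
    }
])
print(payload)  # [{'car': {'vin': 'VIN123', 'brand': 'Volvo'}}]
```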
According to this conversion table, Python ints get written as JSON numbers when serialized using the json module, as I would expect and desire.
I have a dictionary with an integer key and integer value:
>>> d = {1:2}
>>> type(d.items()[0][0])
<type 'int'>
>>> type(d.items()[0][1])
<type 'int'>
When I use the json module to serialize this to a JSON string, the value is written as a number, but the key is written as a string:
>>> json.dumps(d)
'{"1": 2}'
This isn't the behavior I want, and it seems particularly broken since it breaks json.dumps/json.loads round-tripping:
>>> d == json.loads(json.dumps(d))
False
Why does this happen, and is there a way I can force the key to be written as a number?
The simple reason is that JSON does not allow integer keys:
object
    {}
    { members }
members
    pair
    pair , members
pair
    string : value    # Keys *must* be strings.
As to how to get around this limitation - you will first need to ensure that the receiving implementation can handle the technically-invalid JSON. Then you can either replace all of the quote marks or use a custom serializer.
If you really want to, you can check whether keys are convertible to integers again using:
def pythonify(json_data):
    for key, value in json_data.iteritems():
        if isinstance(value, list):
            value = [pythonify(item) if isinstance(item, dict) else item for item in value]
        elif isinstance(value, dict):
            value = pythonify(value)
        try:
            newkey = int(key)
            del json_data[key]
            key = newkey
        except (TypeError, ValueError):  # int('abc') raises ValueError, not TypeError
            pass
        json_data[key] = value
    return json_data
This function will recursively cast all string-keys to int-keys, if possible. If not possible the key-type will remain unchanged.
I adjusted JLT's example slightly: with some of my huge nested dictionaries, that code changed the size of the dictionary during iteration and ended with an exception. Anyhow, credit goes to JLT!
def pythonify(json_data):
    correctedDict = {}
    for key, value in json_data.items():
        if isinstance(value, list):
            value = [pythonify(item) if isinstance(item, dict) else item for item in value]
        elif isinstance(value, dict):
            value = pythonify(value)
        try:
            key = int(key)
        except (TypeError, ValueError):  # keep keys that can't be converted unchanged
            pass
        correctedDict[key] = value
    return correctedDict
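Another option, which needs no manual recursion, is the object_pairs_hook parameter of json.loads: the decoder calls it for every JSON object at any depth, including objects inside lists. A sketch; int_keys is a hypothetical name, and it shares the limitation of the code above that any key int() accepts (such as "01" or " 7") is converted:

```python
import json


def int_keys(pairs):
    """object_pairs_hook: convert keys that parse as ints back to int."""
    result = {}
    for key, value in pairs:
        try:
            key = int(key)
        except ValueError:
            pass  # leave non-numeric keys as strings
        result[key] = value
    return result


d = {1: 2, 3: {4: "x"}, "name": [{5: 6}]}
restored = json.loads(json.dumps(d), object_pairs_hook=int_keys)
print(restored == d)  # True
```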
I'm trying to create a generic function that replaces dots in keys of a nested dictionary. I have a non-generic function that goes 3 levels deep, but there must be a way to do this generically. Any help is appreciated! My code so far:
output = {'key1': {'key2': 'value2', 'key3': {'key4 with a .': 'value4', 'key5 with a .': 'value5'}}}
def print_dict(d):
    new = {}
    for key, value in d.items():
        new[key.replace(".", "-")] = {}
        if isinstance(value, dict):
            for key2, value2 in value.items():
                new[key][key2] = {}
                if isinstance(value2, dict):
                    for key3, value3 in value2.items():
                        new[key][key2][key3.replace(".", "-")] = value3
                else:
                    new[key][key2.replace(".", "-")] = value2
        else:
            new[key] = value
    return new
print print_dict(output)
UPDATE: To answer my own question, I made a solution using the json object_hook:
import json
def remove_dots(obj):
    for key in obj.keys():
        new_key = key.replace(".", "-")
        if new_key != key:
            obj[new_key] = obj[key]
            del obj[key]
    return obj
output = {'key1': {'key2': 'value2', 'key3': {'key4 with a .': 'value4', 'key5 with a .': 'value5'}}}
new_json = json.loads(json.dumps(output), object_hook=remove_dots)
print new_json
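The object_hook above is Python 2 code: in Python 3, obj.keys() is a live view, so modifying the dict inside that loop can raise RuntimeError. A sketch of a Python 3 friendly hook that returns a new dict instead; because json.loads calls the hook on every decoded object, the nesting is still handled for you:

```python
import json


def remove_dots(obj):
    # Return a new dict rather than mutating obj while iterating over it
    return {key.replace(".", "-"): value for key, value in obj.items()}


output = {'key1': {'key2': 'value2',
                   'key3': {'key4 with a .': 'value4', 'key5 with a .': 'value5'}}}
new_json = json.loads(json.dumps(output), object_hook=remove_dots)
print(new_json)
# {'key1': {'key2': 'value2', 'key3': {'key4 with a -': 'value4', 'key5 with a -': 'value5'}}}
```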
Yes, there is a better way:
def print_dict(d):
    new = {}
    for k, v in d.iteritems():
        if isinstance(v, dict):
            v = print_dict(v)
        new[k.replace('.', '-')] = v
    return new
(Edit: It's recursion, more on Wikipedia.)
Actually, all of the answers contain a mistake that may lead to wrong types in the result.
I'll take the answer by #ngenain and improve it a bit below.
My solution takes care of types derived from dict (OrderedDict, defaultdict, etc.) and handles not only list but also set and tuple.
I also do a simple type check at the beginning of the function for the most common types to reduce the number of comparisons (which may give a bit of speed with large amounts of data).
Works for Python 3. Replace obj.items() with obj.iteritems() for Python 2.
def change_keys(obj, convert):
    """
    Recursively goes through the dictionary obj and replaces keys with the convert function.
    """
    if isinstance(obj, (str, int, float)):
        return obj
    if isinstance(obj, dict):
        new = obj.__class__()
        for k, v in obj.items():
            new[convert(k)] = change_keys(v, convert)
    elif isinstance(obj, (list, set, tuple)):
        new = obj.__class__(change_keys(v, convert) for v in obj)
    else:
        return obj
    return new
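One caveat: obj.__class__() builds a defaultdict without its default_factory, so a converted defaultdict no longer behaves like the original. One way around this, sketched below, is to make a shallow copy with copy.copy (which keeps the subclass and, for defaultdict, the factory) and then clear it. That substitution is mine, and it costs an extra shallow copy per dict:

```python
import copy
from collections import OrderedDict, defaultdict


def change_keys(obj, convert):
    """Recursively rebuild obj with every dict key passed through convert."""
    if isinstance(obj, (str, int, float)):
        return obj
    if isinstance(obj, dict):
        # Shallow-copy then clear: keeps the subclass and defaultdict's factory
        new = copy.copy(obj)
        new.clear()
        for k, v in obj.items():
            new[convert(k)] = change_keys(v, convert)
        return new
    if isinstance(obj, (list, set, tuple)):
        return obj.__class__(change_keys(v, convert) for v in obj)
    return obj


dd = defaultdict(list, {'a.b': OrderedDict([('c.d', 1)])})
fixed = change_keys(dd, lambda k: k.replace('.', '-'))
print(type(fixed).__name__, fixed.default_factory)  # defaultdict <class 'list'>
print(type(fixed['a-b']).__name__)                  # OrderedDict
```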
If I understand the needs right, most users want to convert the keys to use them with MongoDB, which does not allow dots in key names.
I used the code by #horejsek, but I adapted it to accept nested dictionaries with lists and a function that replaces the string.
I had a similar problem to solve: I wanted to convert keys from the lowercase-with-underscores convention to camel case and vice versa.
def change_dict_naming_convention(d, convert_function):
    """
    Convert a nested dictionary from one convention to another.
    Args:
        d (dict): dictionary (nested or not) to be converted.
        convert_function (func): function that takes the string in one convention and returns it in the other one.
    Returns:
        Dictionary with the new keys.
    """
    new = {}
    for k, v in d.iteritems():
        new_v = v
        if isinstance(v, dict):
            new_v = change_dict_naming_convention(v, convert_function)
        elif isinstance(v, list):
            new_v = list()
            for x in v:
                # Only recurse into dicts; other list items (strings, numbers) are kept as-is
                new_v.append(change_dict_naming_convention(x, convert_function) if isinstance(x, dict) else x)
        new[convert_function(k)] = new_v
    return new
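The convert_function itself isn't shown in the answer. As an illustration, here is a hypothetical snake_case to camelCase converter plugged into a generic recursive key converter (both function names are mine, and the converter only handles simple lowercase, underscore-separated names):

```python
def snake_to_camel(name):
    """'user_first_name' -> 'userFirstName' (simple lowercase_underscore input)."""
    first, *rest = name.split("_")
    return first + "".join(part.capitalize() for part in rest)


def convert_keys(obj, convert_function):
    """Apply convert_function to every dict key, descending into dicts and lists."""
    if isinstance(obj, dict):
        return {convert_function(k): convert_keys(v, convert_function)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [convert_keys(item, convert_function) for item in obj]
    return obj


data = {"user_name": "ann", "home_address": {"zip_code": "12345"},
        "phone_numbers": [{"country_code": "+1"}, "n/a"]}
print(convert_keys(data, snake_to_camel))
# {'userName': 'ann', 'homeAddress': {'zipCode': '12345'}, 'phoneNumbers': [{'countryCode': '+1'}, 'n/a']}
```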
Here's a simple recursive solution that deals with nested lists and dictionaries.
def change_keys(obj, convert):
    """
    Recursively goes through the dictionary obj and replaces keys with the convert function.
    """
    if isinstance(obj, dict):
        new = {}
        for k, v in obj.iteritems():
            new[convert(k)] = change_keys(v, convert)
    elif isinstance(obj, list):
        new = []
        for v in obj:
            new.append(change_keys(v, convert))
    else:
        return obj
    return new
You have to remove the original key, but you can't do it in the body of the loop over the dict itself, because it will throw RuntimeError: dictionary changed size during iteration.
To solve this, iterate through a copy of the keys, but modify the original object:
def change_keys(obj):
    # list(obj) takes a snapshot of the keys, so obj can be modified in the loop
    for k in list(obj):
        # only recurse into nested dicts (strings also have __getitem__)
        if isinstance(obj[k], dict):
            change_keys(obj[k])
        if '.' in k:
            obj[k.replace('.', '$')] = obj[k]
            del obj[k]
>>> foo = {'foo': {'bar': {'baz.121': 1}}}
>>> change_keys(foo)
>>> foo
{'foo': {'bar': {'baz$121': 1}}}
You can dump everything to a JSON string, replace throughout the whole string, and load the JSON back:
import json
def nested_replace(data, old, new):
    json_string = json.dumps(data)
    replaced = json_string.replace(old, new)
    fixed_json = json.loads(replaced)
    return fixed_json
Or use a one-liner
def short_replace(data, old, new):
    return json.loads(json.dumps(data).replace(old, new))
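Keep in mind that this operates on the serialized text, so it also rewrites values that contain the search string, and replacing characters that are part of the JSON syntax (such as quotes) can produce invalid JSON. A quick check with made-up data:

```python
import json


def short_replace(data, old, new):
    return json.loads(json.dumps(data).replace(old, new))


data = {"version.major": "v1.0", "files": ["a.txt"]}
print(short_replace(data, ".", "-"))
# {'version-major': 'v1-0', 'files': ['a-txt']}
```

If only keys should change, one of the recursive or object_hook approaches is the safer choice.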
jllopezpino's answer works, but only when the top-level object is a dictionary. Here is mine, which works when the original variable is either a list or a dict.
import re
def fix_camel_cases(data):
    def convert(name):
        # https://stackoverflow.com/questions/1175208/elegant-python-function-to-convert-camelcase-to-snake-case
        s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
        return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()
    if isinstance(data, dict):
        new_dict = {}
        for key, value in data.items():
            value = fix_camel_cases(value)
            snake_key = convert(key)
            new_dict[snake_key] = value
        return new_dict
    if isinstance(data, list):
        new_list = []
        for value in data:
            new_list.append(fix_camel_cases(value))
        return new_list
    return data
Here's a one-liner variant of #horejsek's answer using a dict comprehension, for those who prefer it:
def print_dict(d):
    return {k.replace('.', '-'): print_dict(v) for k, v in d.items()} if isinstance(d, dict) else d
I've only tested this in Python 2.7
I am guessing you have the same issue as I have, inserting dictionaries into a MongoDB collection, encountering exceptions when trying to insert dictionaries that have keys with dots (.) in them.
This solution is essentially the same as most other answers here, but it is slightly more compact, and perhaps less readable in that it uses a single statement and calls itself recursively. For Python 3.
def replace_keys(my_dict):
    return {k.replace('.', '(dot)'): replace_keys(v) if type(v) == dict else v for k, v in my_dict.items()}