Recursively replace characters in a dictionary - python

How do I change all dots (.) to underscores in a dict's keys, given an arbitrarily nested dictionary?
What I tried is writing two loops, but then I would be limited to dictionaries nested two levels deep.
This ...
{
    "brown.muffins": 5,
    "green.pear": 4,
    "delicious.apples": {
        "green.apples": 2
    }
}
... should become:
{
    "brown_muffins": 5,
    "green_pear": 4,
    "delicious_apples": {
        "green_apples": 2
    }
}
Is there an elegant way?

You can write a recursive function, like this:
from collections.abc import Mapping

def rec_key_replace(obj):
    if isinstance(obj, Mapping):
        return {key.replace('.', '_'): rec_key_replace(val) for key, val in obj.items()}
    return obj
and when you invoke this with the dictionary shown in the question, you will get a new dictionary with the dots in the keys replaced by underscores:
{'delicious_apples': {'green_apples': 2}, 'green_pear': 4, 'brown_muffins': 5}
Explanation
Here, we just check whether the current object is a Mapping (which covers dict and other dict-like classes); if it is, we build a new dictionary, replacing the dots in each key and calling the function recursively on each value. Anything that is not a mapping is returned unchanged.
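Note that this function does not descend into lists, so dictionaries nested inside lists keep their dotted keys. If your data can contain those, one possible extension (a sketch, reusing the rec_key_replace name from above) is:

```python
from collections.abc import Mapping

def rec_key_replace(obj):
    # Recurse into mappings, replacing '.' with '_' in every key.
    if isinstance(obj, Mapping):
        return {key.replace('.', '_'): rec_key_replace(val)
                for key, val in obj.items()}
    # Also recurse into lists, so dicts nested inside lists are handled too.
    if isinstance(obj, list):
        return [rec_key_replace(item) for item in obj]
    return obj

data = {"fruit.baskets": [{"red.apples": 1}, {"green.pears": 2}]}
print(rec_key_replace(data))
# {'fruit_baskets': [{'red_apples': 1}, {'green_pears': 2}]}
```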

Assuming . is only present in keys and all the dictionary's contents are primitive literals, the really cheap way would be to use str() or repr(), do the replacement, then ast.literal_eval() to get it back:
d = {
    "brown.muffins": 5,
    "green.pear": 4,
    "delicious.apples": {
        "green.apples": 2
    }
}
Result:
>>> import ast
>>> ast.literal_eval(repr(d).replace('.','_'))
{'delicious_apples': {'green_apples': 2}, 'green_pear': 4, 'brown_muffins': 5}
If the dictionary has . outside of keys, we can replace more carefully by using a regular expression to look for strings like 'ke.y': and replace only those bits:
>>> import re
>>> ast.literal_eval(re.sub(r"'(.*?)':", lambda x: x.group(0).replace('.','_'), repr(d)))
{'delicious_apples': {'green_apples': 2}, 'green_pear': 4, 'brown_muffins': 5}
If your dictionary is very complex, with '.' in values and dictionary-like strings and so on, use a real recursive approach. Like I said at the start, though, this is the cheap way.

Related

How to sum equal key values when inserting them into a new dictionary in Python?

I have a dictionary that I got from a .txt file:
dictOne = {
    "AAA": 0,
    "BBB": 1,
    "AAA": 3,
    "BBB": 1,
}
I would like to generate a new dictionary called dictTwo with the sum of values of equal keys. Result:
dictTwo = {
    "AAA": 3,
    "BBB": 2,
}
I prepared the following code, but it raises SyntaxError: invalid syntax:
import json

dictOne = json.loads(text)
dictTwo = {}
for k, v in dictOne.items():
    dictTwo[k] = v += v
Can anyone help me find the error?
Assuming you resolve the duplicate-key issue first (note that a Python dict literal silently keeps only the last value for each repeated key), the summing pattern itself looks like this:
dictOne = {
    "AAA": 0,
    "BBB": 1,
    "AAA": 3,
    "BBB": 1
}
dictTwo = {}
for k, v in dictOne.items():
    if k in dictTwo:
        dictTwo[k] += v
    else:
        dictTwo[k] = v
print(dictTwo)
You can handle this while reading the JSON input.
JSON permits duplicate keys in objects, although it discourages the practice, noting that different JSON processors produce different results for duplicate keys.
Python does not allow duplicate keys in dictionaries, and Python's json module handles duplicate keys in one of the ways noted by the JSON standard: it ignores all but the last value for any such key. However, it gives you a mechanism to do your own processing of objects, in case you want to do something else with duplicate keys (or produce something other than a dictionary).
You do this by providing the object_pairs_hook parameter to json.load or json.loads. That parameter should be a function whose argument is an iterable of (key, value) pairs, where the key is a string and the value is an already processed JSON object. Whatever the function returns will be the value used by json.load for an object literal; it does not need to return a dict.
That implies that the handling of duplicate keys will be the same for every object literal in the JSON input, which is a bit of a limitation, but it may be acceptable in your case.
Here's a simple example:
import json

def key_combiner(pairs):
    rv = {}
    for k, v in pairs:
        if k in rv:
            rv[k] += v
        else:
            rv[k] = v
    return rv
# Sample usage:
# (Note: JSON doesn't allow trailing commas in objects or lists.)
json_data = '''{
    "AAA": 0,
    "BBB": 1,
    "AAA": 3,
    "BBB": 1
}'''
consolidated = json.loads(json_data, object_pairs_hook=key_combiner)
print(consolidated)
This prints {'AAA': 3, 'BBB': 2}.
If I'd known that the values were numbers, I could have used a slightly simpler definition using defaultdict. Writing it the way I did permits combining certain other value types, such as strings or arrays, provided that all the values for the same key in an object are the same type. (Unfortunately, it doesn't allow combining objects, because Python uses | to combine two dicts, instead of +.)
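For reference, that simpler defaultdict variant might look like this (a sketch assuming every value is a number):

```python
import json
from collections import defaultdict

def key_combiner(pairs):
    # defaultdict(int) starts every missing key at 0, so += always works.
    rv = defaultdict(int)
    for k, v in pairs:
        rv[k] += v
    return dict(rv)

data = json.loads('{"AAA": 0, "BBB": 1, "AAA": 3, "BBB": 1}',
                  object_pairs_hook=key_combiner)
print(data)
# {'AAA': 3, 'BBB': 2}
```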
This feature was mostly intended for creating class instances from JSON objects, but it has many other possible uses.

Changing dictionary keys with a function

Suppose a dictionary like
d1 = {
    'a b': 1,
    'c-d': 2,
    'Ef': 3
}
I want to run a function that renames all the keys according to some rules, for example lowercasing and changing spaces and - to _. So as to get the result:
d2 = {
    'a_b': 1,
    'c_d': 2,
    'ef': 3
}
The difference between this question and the other similar questions on this site about renaming dictionary keys, is that here we don't know the columns beforehand. So we want to run a renaming function on all the keys (to normalize them or something like that).
Declare such a function, then use a dictionary comprehension (search for "dict comprehensions") to loop over the dict items and call that transformer.
def transform_key(key):
    return key.lower().replace(" ", "_").replace("-", "_")

d1 = {
    'a b': 1,
    'c-d': 2,
    'Ef': 3
}
d2 = {transform_key(key): value for (key, value) in d1.items()}
print(d2)
outputs
{'a_b': 1, 'c_d': 2, 'ef': 3}

Python set dictionary nested key with dot delineated string

If I have a dictionary that is nested, and I pass in a string like "key1.key2.key3" which would translate to:
myDict["key1"]["key2"]["key3"]
What would be an elegant way to be able to have a method where I could pass on that string and it would translate to that key assignment? Something like
myDict.set_nested('key1.key2.key3', someValue)
Using only builtin stuff:
def set_nested(my_dict, key_string, value):
    """Given `foo`, 'key1.key2.key3', 'something', set foo['key1']['key2']['key3'] = 'something'."""
    # Start off pointing at the original dictionary that was passed in.
    here = my_dict
    # Turn the string of key names into a list of strings.
    keys = key_string.split(".")
    # For every key *before* the last one, we concentrate on navigating through the dictionary.
    for key in keys[:-1]:
        # Try to find here[key]. If it doesn't exist, create it with an empty
        # dictionary. Then, update our `here` pointer to refer to the thing we
        # just found (or created).
        here = here.setdefault(key, {})
    # Finally, set the final key to the given value.
    here[keys[-1]] = value

myDict = {}
set_nested(myDict, "key1.key2.key3", "some_value")
assert myDict == {"key1": {"key2": {"key3": "some_value"}}}
(The function is named set_nested rather than set to avoid shadowing the built-in set.)
This traverses myDict one key at a time, ensuring that each sub-key refers to a nested dictionary.
You could also solve this recursively, but then you risk RecursionError exceptions without any real benefit.
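The same iterative traversal also gives you a matching getter; get_nested here is a hypothetical companion helper, not something the question defines:

```python
def get_nested(my_dict, key_string, default=None):
    # Walk down one key at a time; return `default` if any step is missing.
    here = my_dict
    for key in key_string.split("."):
        if not isinstance(here, dict) or key not in here:
            return default
        here = here[key]
    return here

d = {"key1": {"key2": {"key3": "some_value"}}}
print(get_nested(d, "key1.key2.key3"))    # some_value
print(get_nested(d, "key1.nope", "n/a"))  # n/a
```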
There are a number of existing modules that will already do this, or something very much like it. For example, the jmespath module will resolve jmespath expressions, so given:
>>> mydict={'key1': {'key2': {'key3': 'value'}}}
You can run:
>>> import jmespath
>>> jmespath.search('key1.key2.key3', mydict)
'value'
The jsonpointer module does something similar, although it likes / for a separator instead of ..
Given the number of pre-existing modules I would avoid trying to write your own code to do this.
EDIT: OP's clarification makes it clear that this answer isn't what they're looking for. I'm leaving it up here for people who find it by title.
I implemented a class that did this a while back... it should serve your purposes.
I achieved this by overriding the default getattr/setattr functions for an object.
Check it out! AndroxxTraxxon/cfgutils
This lets you do some code like the following...
from cfgutils import obj

a = obj({
    "b": 123,
    "c": "apple",
    "d": {
        "e": "nested dictionary value"
    }
})
print(a.d.e)
# nested dictionary value

Extracting fields of a list of dictionaries into a new dictionary using glom

I have the following highly simplified structure
elements = [{"id": "1", "counts": [1, 2, 3]},
            {"id": "2", "counts": [4, 5, 6]}]
I'd like to be able to construct, using glom, a new dictionary of the form {<id>: <counts[pos]>}, e.g. for pos = 2:
{"1": 3, "2": 6}
or alternatively a list/tuple of tuples
[("1",3), ("2", 6)]
Using a dict comprehension is easy, but the data structure is more complicated and I'd like to dynamically specify what to extract. The previous example would be the simplest thing I'd like to achieve.
After a while I managed to solve it as follows
from glom import glom, T

elements = [{"id": "1", "counts": [1, 2, 3]},
            {"id": "2", "counts": [4, 5, 6]}]

def extract(elements, pos):
    extracted = glom(elements, ({"elements": [lambda v: (v["id"], v["counts"][pos])]}, T))
    return dict(extracted["elements"])
But this requires a call to dict. A slight variation that skips a dictionary indirection would be
def extract(elements, pos):
    extracted = glom(elements, (([lambda v: {v["id"]: v["counts"][pos]}]), T))
    return {k: v for d in extracted for k, v in d.items()}
Now, I could call glom's merge function on the result of the glom call:
from glom import merge

def extract(elements, pos):
    return merge(glom(elements, (([lambda v: {v["id"]: v["counts"][pos]}]), T)))
I'm rather satisfied with this, but is there a better approach? By better I mean building a single, cleaner spec or callable. Ultimately, I'd like to be able to define at runtime, in a user-friendly way, the values of the dictionary, i.e., v["counts"][pos].
An improvement towards this idea would be to use a callable to be invoked for the value of the internal dictionary
def counts_position(element, **kwargs):
    return element["counts"][kwargs["pos"]]

def extract(elements, func, **kwargs):
    return merge(glom(elements, (([lambda v: {v["id"]: func(v, **kwargs)}]), T)))

extract(elements, counts_position, pos=2)
With this, what's being extracted from each element can be controlled externally.
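For comparison, the same externally-controlled extraction can be written without glom at all; this plain-Python sketch keeps the counts_position callback from above but replaces the glom spec with a dict comprehension:

```python
def counts_position(element, **kwargs):
    return element["counts"][kwargs["pos"]]

def extract(elements, func, **kwargs):
    # Build the id-keyed dict directly; `func` decides what each value is.
    return {e["id"]: func(e, **kwargs) for e in elements}

elements = [{"id": "1", "counts": [1, 2, 3]},
            {"id": "2", "counts": [4, 5, 6]}]
print(extract(elements, counts_position, pos=2))
# {'1': 3, '2': 6}
```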
To convert a list of dicts with id in each one to an id-keyed dict you could use a simple dict comprehension:
{t["id"]: glom.glom(t, "counts.2") for t in elements}
Or, if you want to use glom for that, use glom.Merge along with glom.T:
glom.glom(elements, glom.Merge([{glom.T['id']: 'counts.2'}]))
To avoid lambdas, you could interpolate the pos param into the spec string, e.g. 'counts.%s' % pos.

Temporary names within an expression

I'm looking for a way to name a value within an expression to use it multiple times within that expression. Since the value is found inside the expression, I can't save it as a variable using a typical assign statement. I also want its use to be in the same function as the rest of the expression, so I would rather not break it out into a separate function.
More specifically, I enjoy comprehension. List/dictionary comprehension is my favorite Python feature. I'm trying to use both to coerce a dictionary of untrusted structure into a trusted structure (all fields exist, and their values are of the correct type). Without what I'm looking for, it would look something like this:
{
    ...
    'outer': [{
        ...
        'inner': {
            key: {
                ...
                'foo': {
                    'a': get_foo_from_value(value)['a'],
                    'b': get_foo_from_value(value)['b'],
                    ...
                }
            } for key, value in get_inner_from_outer(outer)
        }
    } for outer in get_outer_from_dictionary(dictionary)]
}
Those function calls are really stand-ins for longer expressions, but the point is that I would like to evaluate get_foo_from_value(value) only once. Ideally there would be something like this:
'foo': {
    'a': foo['a'],
    'b': foo['b'],
    ...
} with get_foo_from_value(value) as foo
So far the options I've come up with are single-item generators and lambda expressions. I'm going to include an example of each as an answer so they can be discussed separately.
lambda solution
'foo': (lambda foo: {
    'a': foo['a'],
    'b': foo['b'],
    ...
})(get_foo_from_value(value))
I feel like this one isn't as readable as it could be. I also don't like creating a lambda that only gets called once. I like that the name appears before it's used, but I don't like the separation of its name and value.
single-item generator solution
This is currently my favorite solution to the problem (I like comprehension, remember).
'foo': next({
    'a': foo['a'],
    'b': foo['b'],
    ...
} for foo in [get_foo_from_value(value)])
I like it because the generator expression matches the rest of the comprehension in the expression, but I'm not a huge fan of the next and having to wrap get_foo_from_value(value) in brackets.
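For completeness: since Python 3.8, an assignment expression (the walrus operator, PEP 572) can name a value inside the expression itself. Dict literals evaluate their entries in order, so a name bound in the first entry is visible to later ones. In this sketch, get_foo_from_value is a hypothetical stand-in for the expensive call in the question:

```python
def get_foo_from_value(value):
    # Hypothetical stand-in for the expensive lookup in the question.
    return {'a': value * 2, 'b': value * 3}

value = 10
result = {
    # `foo` is bound by the first entry and reused by the second,
    # so get_foo_from_value runs only once.
    'a': (foo := get_foo_from_value(value))['a'],
    'b': foo['b'],
}
print(result)
# {'a': 20, 'b': 30}
```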
