Python: behavior of json.dumps on dict

I am trying to override the behavior of dict in json.dumps. For instance, I can order the keys. Thus, I create a class which inherits from dict and override some of its methods.
import json

class A(dict):
    def __iter__(self):
        for i in range(10):
            yield i

    def __getitem__(self, name):
        return None

print json.dumps(A())
But it does not call any of my methods and only gives me {}
There is a way to get the right behavior:
import json

class A(dict):
    def __init__(self):
        dict.__init__(self, {None: None})

    def __iter__(self):
        for i in range(10):
            yield i

    def __getitem__(self, name):
        return None

print json.dumps(A())
Which finally gives {"0": null, "1": null, "2": null, "3": null, "4": null, "5": null, "6": null, "7": null, "8": null, "9": null}
Thus, it is clear that the C implementation of json.dumps somehow tests whether the dict is empty. Unfortunately, I cannot figure out which method is called. First, __getattribute__ does not work, and second, I've overridden just about every method dict defines or could define, without success.
So, could someone explain to me how the C implementation of json.dumps checks whether the dict is empty, and is there a way to override that check? (I find my __init__ trick pretty ugly.)
Thank you.
Edit:
I finally found where this happens in the C code, and it does not look customizable.
_json.c line 2083:
if (open_dict == NULL || close_dict == NULL || empty_dict == NULL) {
    open_dict = PyString_InternFromString("{");
    close_dict = PyString_InternFromString("}");
    empty_dict = PyString_InternFromString("{}");
    if (open_dict == NULL || close_dict == NULL || empty_dict == NULL)
        return -1;
}
if (Py_SIZE(dct) == 0)
    return PyList_Append(rval, empty_dict);
So it looks like Py_SIZE is used to check if the dict is empty. But this is a macro (not a function), which simply reads a field of the Python object.
object.h line 114:
#define Py_REFCNT(ob) (((PyObject*)(ob))->ob_refcnt)
#define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)
#define Py_SIZE(ob) (((PyVarObject*)(ob))->ob_size)
So since it is not a function, it cannot be overridden, and thus its behavior cannot be customized.
Finally, the "non-empty dict trick" is necessary if one wants to customize json.dumps by inheriting from dict (of course, other ways to achieve this are possible).
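Incidentally, overriding __len__ does not help either. A small sketch (Python 3 syntax) showing that the size check reads the real underlying storage, not any overridden method:

```python
import json

class FakeLen(dict):
    """A dict subclass that lies about its size and contents."""
    def __len__(self):
        return 10
    def __iter__(self):
        return iter(range(10))
    def __getitem__(self, key):
        return None

# Still prints {}: the C encoder reads the underlying dict's real size
# (via the Py_SIZE-style macro), and the pure-Python fallback iterates
# dict.items(), which is empty here.
print(json.dumps(FakeLen()))  # {}
```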

Would it be easier to modify the behaviour of the encoder rather than creating a new dict sub class?
class OrderedDictJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if hasattr(obj, 'keys'):
            return {}  # replace your unordered dict with an OrderedDict from collections
        else:
            return super(OrderedDictJSONEncoder, self).default(obj)
And use it like so:
json.dumps(my_dict_to_encode, cls=OrderedDictJSONEncoder)
This seems like the right place to turn an unordered Python dict into an ordered JSON object.

I don't know exactly what the encoder does, but it's not written in C; the Python source for the json package is here: http://hg.python.org/cpython/file/2a872126f4a1/Lib/json
Also if you just want to order the items, there's
json.dumps(A(), sort_keys=True)
Also see this question ("How to perfectly override a dict?") and its first answer, that explains that you should subclass collections.MutableMapping in most cases.
Or just give a subclassed encoder, as aychedee mentioned.
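For completeness, both orderings can be seen on a small example (Python 3 syntax):

```python
import json
from collections import OrderedDict

# sort_keys orders the output alphabetically, regardless of insertion order
print(json.dumps({'b': 2, 'a': 1}, sort_keys=True))   # {"a": 1, "b": 2}

# an OrderedDict is serialized in its insertion order
print(json.dumps(OrderedDict([('b', 2), ('a', 1)])))  # {"b": 2, "a": 1}
```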

Related

Pydantic - How to add a field that keeps changing its name?

I am working with an API that is returning a response that contains fields like this:
{
    "0e933a3c-0daa-4a33-92b5-89d38180a142": someValue
}
Where the field name is a UUID that changes depending on the request (but is not included in the actual request parameters). How do I declare that in a dataclass in Python? It would essentially be str: str, but that would interpret the key as literally "str" instead of a type.
I personally feel the simplest approach would be to create a custom Container dataclass. This would split the dictionary data up into its keys and its values.
The one benefit of this is that you could then access the values by index instead of searching by the random UUID itself, which from what I understand is something you won't be doing at all. So, for example, you could access the first string value as values[0] if you wanted to.
Here is a sample implementation of this:
from dataclasses import dataclass

@dataclass(init=False, slots=True)
class MyContainer:
    ids: list[str]
    # can be annotated as `str: str` or however you desire
    values: list[str]

    def __init__(self, input_data: dict):
        self.ids = list(input_data)
        self.values = list(input_data.values())

    def orig_dict(self):
        return dict(zip(self.ids, self.values))

input_dict = {
    "0e933a3c-0daa-4a33-92b5-89d38180a142": "test",
    "25a82f15-abe9-49e2-b039-1fb608c729e0": "hello",
    "f9b7e20d-3d11-4620-9780-4f500fee9d65": "world !!",
}

c = MyContainer(input_dict)
print(c)

assert c.values[0] == 'test'
assert c.values[1] == 'hello'
assert c.values[2] == 'world !!'
assert c.orig_dict() == input_dict
Output:
MyContainer(ids=['0e933a3c-0daa-4a33-92b5-89d38180a142', '25a82f15-abe9-49e2-b039-1fb608c729e0', 'f9b7e20d-3d11-4620-9780-4f500fee9d65'], values=['test', 'hello', 'world !!'])
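If the response is guaranteed to hold exactly one such key, plain tuple unpacking also extracts it without any dataclass (a minimal sketch, no pydantic involved):

```python
response = {"0e933a3c-0daa-4a33-92b5-89d38180a142": "someValue"}

# unpack the single (key, value) pair;
# raises ValueError if there isn't exactly one entry
(uuid_key, value), = response.items()

assert uuid_key == "0e933a3c-0daa-4a33-92b5-89d38180a142"
assert value == "someValue"
```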

Reducing a function using dictionary

I have two functions that are very similar:
def hier_group(self):
    if self.sku:
        return {f"{self.hierarchic}": f"${self.hierarchic}", "id": "$id", "ix": "$ix"}
    else:
        return {f"{self.hierarchic}": f"${self.hierarchic}", "ix": "$ix"}

def hier_group_merge(self):
    if self.sku:
        return {f"{self.hierarchic}": f"${self.hierarchic}", "id": "$id"}
    else:
        return {f"{self.hierarchic}": f"${self.hierarchic}"}
I am trying to reduce these to one function that has only a single if/else.
The only difference in both functions is "ix": "$ix".
What I am trying to do is the following:
def hier_group(self, ix=True):
    if self.sku:
        return {f"{self.hierarchic}": f"${self.hierarchic}", "id": "$id" f'{',"ix": "$ix"' if ix == True else ""}'}
    else:
        return {f"{self.hierarchic}": f"${self.hierarchic}" f'{',"ix": "$ix"' if ix == True else ""}'}
But it's getting tricky to conditionally include the "ix": "$ix" entry this way.
Build a base dictionary, then add keys as appropriate.
def hier_group(self, ix=True):
    d = {f'{self.hierarchic}': f'${self.hierarchic}'}
    if self.sku:
        d['id'] = '$id'
    if ix:
        d['ix'] = '$ix'
    return d
However, there are many who believe using two functions, rather than having one function behave like two different functions based on a Boolean argument, is preferable.
def hier_group(self):
    d = {f'{self.hierarchic}': f'${self.hierarchic}'}
    if self.sku:
        d['id'] = '$id'
    return d

def hier_group_with_ix(self):
    d = self.hier_group()
    d.update({'ix': '$ix'})
    return d
You might also use a private method that takes an arbitrary list of attribute names.
# No longer needs self, so make it a static method
@staticmethod
def _build_group(attributes):
    return {f'{x}': f'${x}' for x in attributes}

def build_group(self, ix=True):
    attributes = [self.hierarchic]
    if ix:
        attributes.append('ix')
    if self.sku:
        attributes.append('id')
    return self._build_group(attributes)
You will probably ask: why is using a Boolean argument here OK? My justification is that you aren't really altering the control flow of build_group with such an argument; you are using it to build up a list of explicit arguments for the private method. (The dataclass decorator in the standard library takes a similar approach: a number of Boolean-valued arguments indicate whether various methods should be generated automatically.)
You can avoid repeating common parts:
def hier_group(self, ix=True):
    out = {f"{self.hierarchic}": f"${self.hierarchic}"}
    if self.sku:
        out["id"] = "$id"
    if ix:
        out["ix"] = "$ix"
    return out
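Applied to a hypothetical host class (the attribute names mirror the question's), the combined function behaves like both originals:

```python
class Demo:
    """Hypothetical stand-in for the question's class."""
    def __init__(self, hierarchic, sku):
        self.hierarchic = hierarchic
        self.sku = sku

    def hier_group(self, ix=True):
        out = {f"{self.hierarchic}": f"${self.hierarchic}"}
        if self.sku:
            out["id"] = "$id"
        if ix:
            out["ix"] = "$ix"
        return out

d = Demo("brand", sku=True)
assert d.hier_group() == {"brand": "$brand", "id": "$id", "ix": "$ix"}
assert d.hier_group(ix=False) == {"brand": "$brand", "id": "$id"}
```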

Python: unable to access generated functions in a python class

I have a class that contains a nested dictionary that I want to make getters and setters for. I use a depth first search to generate the functions and add them to the class's __dict__ attribute, but when I try to call any of the generated functions, I just get an AttributeError: 'MyClass' object has no attribute 'getA'.
import operator
from functools import reduce

class MyClass:
    def __init__(self):
        self.dictionary = {
            "a": {
                "b": 1,
                "c": 2
            },
            "d": {
                "e": {
                    "f": 3,
                    "g": 4
                }
            }
        }
        self.addGettersSetters()

    def addGettersSetters(self):
        def makegetter(self, keyChain):
            def func():
                return reduce(operator.getitem, keyChain, self.dictionary)
            return func

        def makesetter(self, keyChain):
            def func(arg):
                print("setter ", arg)
                path = self.dictionary
                for i in keyChain[:-1]:
                    path = path[i]
                path[keyChain[-1]] = arg
            return func

        # depth first search of dictionary
        def recurseDict(self, dictionary, keyChain=[]):
            for key, value in dictionary.items():
                keyChain.append(key)
                # capitalize the first letter of each part of the keychain for the function name
                capKeyChain = [i.title().replace(" ", "") for i in keyChain]
                # setter version
                print('set{}'.format("".join(capKeyChain)))
                self.__dict__['set{}'.format("".join(capKeyChain))] = makesetter(self, keyChain)
                # getter version
                print('get{}'.format("".join(capKeyChain)))
                self.__dict__['set{}'.format("".join(capKeyChain))] = makegetter(self, keyChain)
                # recurse down the dictionary chain
                if isinstance(value, dict):
                    recurseDict(self, dictionary=value, keyChain=keyChain)
                # remove the last key for the next iteration
                while keyChain[-1] != key:
                    keyChain = keyChain[:-1]
                keyChain = keyChain[:-1]

        recurseDict(self, self.dictionary)
        print(self.__dict__)

if __name__ == '__main__':
    myclass = MyClass()
    print(myclass.getA())
If you run this code, it outputs the names of all of the generated functions as well as the state of __dict__ after generating them, and then terminates with the AttributeError.
What has me puzzled is that I used another piece of code, which uses essentially the same methodology, as an example for how to generate getters and setters this way. That piece of code works just fine, but mine does not, and I am at a loss as to why. What am I missing here?
For reference I am running Anaconda Python 3.6.3
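For reference, the lookup mechanism itself is not the problem: a function stored in an instance's __dict__ is reachable by normal attribute lookup (it is not a bound method, so it is called without an implicit self). A minimal sketch:

```python
class C:
    pass

c = C()
# functions placed directly in an instance's __dict__ are plain attributes:
# attribute lookup finds them, and they are called without an implicit self
c.__dict__['getA'] = lambda: 42
assert c.getA() == 42
```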

Reverse of `__getitem__`

d[x], where d is a dict, invokes d.__getitem__(x). Is there a way to create a class F so that y = F(x); d[y] would invoke some method in F instead: y.someMethod(d)?
Background: I'm trying to make a dict with "aliased" keys, so that if I have d[a]=42, then d[alias_of_a] would return 42 as well. This is pretty straightforward with the custom __getitem__, for example:
class oneOf(object):
    def __init__(self, *keys):
        self.keys = keys

class myDict(dict):
    def __getitem__(self, item):
        if isinstance(item, oneOf):
            for k in item.keys:
                if k in self:
                    return self[k]
        return dict.__getitem__(self, item)

a = myDict({
    'Alpha': 1,
    'B': 2,
})

print a[oneOf('A', 'Alpha')]
print a[oneOf('B', 'Bravo')]
However, I'm wondering if it could be possible without overriding dict:
a = {
    'Alpha': 1,
    'B': 2,
}

print a[???('A', 'Alpha')]
print a[???('B', 'Bravo')]
If this is not possible, how to make it work the other way round:
a = {
    ???('A', 'Alpha'): 1,
    ???('B', 'Bravo'): 2,
}

print a['A']
print a['Bravo']
What is important to me is that I'd like to avoid extending dict.
This use-case is impossible:
a = {
    'Alpha': 1,
    'B': 2,
}

a[???('A', 'Alpha')]
a[???('B', 'Bravo')]
This is because the dict will first hash the object. In order to force a collision, which will allow overriding equality to take hold, the hashes need to match. But ???('A', 'Alpha') can only hash to one of 'A' or 'Alpha', and if it makes the wrong choice it has failed.
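A minimal sketch of that constraint (AliasKey is a hypothetical name): even with __eq__ matching either spelling, __hash__ must commit to a single hash, so the lookup can miss the stored key's bucket entirely.

```python
class AliasKey:
    def __init__(self, *names):
        self.names = names
    def __eq__(self, other):
        return other in self.names   # would match either spelling...
    def __hash__(self):
        return hash(self.names[0])   # ...but can only hash like one of them

d = {'Alpha': 1}
# hash('A') != hash('Alpha'), so __eq__ is never even consulted
assert AliasKey('A', 'Alpha') not in d
# hashing like 'Alpha' happens to land on the right entry
assert AliasKey('Alpha', 'A') in d
```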
The other use-case has a similar deduction applied to it:
a = {
    ???('A', 'Alpha'): 1,
    ???('B', 'Bravo'): 2,
}

a['A']
a['Bravo']
a['A'] will look up with a different hash to a['Alpha'], so again ???('A', 'Alpha') needs to have both hashes, which is impossible.
You need cooperation from both the keys and the values in order for this to work.
You could in theory use inspect.getouterframes in the __hash__ method to check the values of the dictionary, but this would only work if dictionaries had Python frames. If your intent is to monkey patch a function that sort-of does what you want but not quite, this might (just about) work(ish, sort of).
import inspect

class VeryHackyAnyOfHack:
    def __init__(self, variable_name_hack, *args):
        self.variable_name_hack = variable_name_hack
        self.equal_to = args

    def __eq__(self, other):
        return other in self.equal_to

    def __hash__(self):
        outer_frame = inspect.getouterframes(inspect.currentframe())[1]
        assumed_target_dict = outer_frame[0].f_locals[self.variable_name_hack]
        for item in self.equal_to:
            if item in assumed_target_dict:
                return hash(item)
        # Failure: fall back to the first candidate's hash
        return hash(self.equal_to[0])
This is used like so:
import random

def check_thing_against_dict(item):
    if random.choice([True, False]):
        internal_dict = {"red": "password123"}
    else:
        internal_dict = {"blue": "password123"}
    return internal_dict[item]

myhack = VeryHackyAnyOfHack('internal_dict', "red", "blue")
check_thing_against_dict(myhack)
#>>> 'password123'
Again, the very fact that you have to do this means that in practice it's not possible. It's also a language extension, so this isn't portable.
The built-in dict provides very simple lookup semantics: given a hashable object x, return the object y that x was mapped to previously. If you want multiple keys that map to the same object, you'll need to set that up explicitly:
# First, initialize the dictionary with one key per equivalence class
a = { 'a': 1, 'b': 2 }
# Then, set up any aliases.
a['Alpha'] = a['a']
a['Bravo'] = a['b']
The TransformDict class being considered for inclusion in Python 3.5 would simplify this somewhat by allowing you to replace step 2 with a "secondary" lookup function that would map the given key to its canonical representation prior to the primary lookup. Something like
def key_transform(key):
    if key in {'Alpha', 'Aleph'}:
        return 'a'
    elif key in {'Bravo', 'Beta', 'Beth'}:
        return 'b'
    return key  # pass canonical keys through unchanged

a = TransformDict(key_transform, a=1, b=2)
assert a['Alpha'] is a['a']
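Since TransformDict never made it into the standard library, here is a minimal stand-in sketching the same idea (keys pass through the transform on every store and lookup; this is an illustration, not PEP 455's actual class):

```python
class TransformDict(dict):
    """Minimal sketch: canonicalize keys via a transform function."""
    def __init__(self, transform, **kwargs):
        self._transform = transform
        super().__init__({transform(k): v for k, v in kwargs.items()})

    def __getitem__(self, key):
        return super().__getitem__(self._transform(key))

    def __setitem__(self, key, value):
        super().__setitem__(self._transform(key), value)

def key_transform(key):
    if key in {'Alpha', 'Aleph'}:
        return 'a'
    if key in {'Bravo', 'Beta', 'Beth'}:
        return 'b'
    return key  # canonical keys pass through unchanged

a = TransformDict(key_transform, a=1, b=2)
assert a['Alpha'] == a['a'] == 1
assert a['Beth'] == 2
```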

Using function value as kwargs.get() default

I have a factory function for a model with several foreign keys in my unit tests. I would like for that factory function to be variadic, allowing the user to specify the objects to use as foreign keys as keyword arguments, but calling the relevant factory function to spawn a new one for any that are left out.
I originally wrote something like:
def model_factory(i, **kwargs):
    """Create a new Model for testing"""
    test_model_data = {
        'fk1': kwargs.get('fk1', fk1_factory(i)),
        'fk2': kwargs.get('fk2', fk2_factory(i)),
        'fk3': kwargs.get('fk3', fk3_factory(i)),
    }
    return Model.objects.create(**test_model_data)
but this calls the fkN_factory() methods even if the keyword is present, causing a lot of side effects that are interfering with my tests. My question is whether or not there is a simpler way to do what I intended here without resulting in lots of needless function calls, rather than what I have now, which is more like:
def model_factory(i, **kwargs):
    """Create a new Model for testing"""
    test_model_data = {
        'fk1': kwargs.get('fk1', None),
        'fk2': kwargs.get('fk2', None),
    }
    if kwargs['f1'] is None:
        kwargs['f1'] = fk1_factory(i)
    if kwargs['f2'] is None:
        kwargs['f2'] = fk2_factory(i)
You want to factor out that repeated code in some way. The simplest is:
def get_value(mapping, key, default_func, *args):
    try:
        return mapping[key]
    except KeyError:
        return default_func(*args)

# ...

test_model_data = {
    'fk1': get_value(kwargs, 'fk1', fk1_factory, i),
    'fk2': get_value(kwargs, 'fk2', fk2_factory, i),
    # etc.
}
Almost as simple as your original non-working version.
You could take this even farther:
def map_data(mapping, key_factory_map, *args):
    return {key: get_value(mapping, key, factory, *args)
            for key, factory in key_factory_map.items()}

# …

test_model_data = map_data(kwargs, {
    'fk1': fk1_factory,
    'fk2': fk2_factory,
    # …
}, i)
But I'm not sure that's actually better. (If you have an obvious place to define that key-to-factory mapping out-of-line, it probably is; if not, probably not.)
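For instance, with a stub standing in for the question's fk1_factory, get_value only invokes the factory on a miss, which is exactly the laziness kwargs.get(...) could not provide:

```python
def get_value(mapping, key, default_func, *args):
    try:
        return mapping[key]
    except KeyError:
        return default_func(*args)

calls = []
def fk1_factory(i):  # stub standing in for the question's factory
    calls.append(i)
    return f'made-fk1-{i}'

kwargs = {'fk2': 'provided'}
assert get_value(kwargs, 'fk2', fk1_factory, 7) == 'provided'
assert calls == []  # factory not called for a key that was supplied
assert get_value(kwargs, 'fk1', fk1_factory, 7) == 'made-fk1-7'
assert calls == [7]  # called exactly once, only on the miss
```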
