Subclassing python dictionaries - python

I am using ElasticSearch in my Python application and want to be able to create a reusable dictionary object that represents a query. The JSON structures are described here http://pulkitsinghal.blogspot.co.uk/2012/02/how-to-use-elasticsearch-query-dsl.html and I am using PyES to query the search server. With PyES we can pass a dict object which gets jsonified before sending to the server. I want to create a library of common queries where only the actual query term changes, so I thought I would subclass dict so I could pass in the query term via the constructor, for example, and when the dict gets jsonified I would end up with something like this:
{
    "fields": [
        "name",
        "shortDescription",
        "longDescription"
    ],
    "query": {
        "query_string": {
            "fields": [
                "name"
            ],
            "query": query_term,
            "use_dis_max": true
        }
    }
}
How would I do this? Is it true that only instance members get returned via __dict__? If so, would I have to set up this data structure in the constructor? Is this the best way of doing this, or should I create a class that does not extend dict and just give it a to_dict() method that returns a dictionary in the correct structure?
Answer:
This seems to work fine; any suggestions for making this more 'pythonic' would be appreciated! (Yes, I know there are no docstrings.)
class StandardQuery(object):
    search_fields = ['meta_keywords', 'meta_description',
                     'fields.title.value', 'slug']
    return_fields = ['slug', 'fields.title.value']

    def __init__(self, query):
        self.query = query

    def to_dict(self):
        output = dict()
        # Top-level 'fields' selects what to return; 'query_string' searches
        # over the search fields.
        output['fields'] = self.return_fields
        output['query'] = {'query_string': {'fields': self.search_fields,
                                            'query': self.query,
                                            'use_dis_max': True}}
        return output

If you don't want all of the normal dict behaviour then you should definitely just create a separate class. You can then either give it a to_dict() method or, better, since you really want to convert to JSON, create a custom JSON encoder (and, if required, decoder) to use with the default argument of json.dumps().
In json.dump() and json.dumps() the optional argument default is a callable that should either return a serialized version of the object or raise TypeError (to get the default behaviour).
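For instance, here is a minimal, self-contained sketch of that approach, assuming the query object exposes a to_dict() method as above (the query_encoder name and the trimmed-down StandardQuery are mine):

```python
import json

class StandardQuery(object):
    # Trimmed-down version of the asker's class: only the query term varies.
    def __init__(self, query):
        self.query = query

    def to_dict(self):
        return {'query': {'query_string': {'query': self.query,
                                           'use_dis_max': True}}}

def query_encoder(obj):
    # json.dumps calls this for any object it cannot serialize natively;
    # returning a plain dict lets serialization continue.
    if hasattr(obj, 'to_dict'):
        return obj.to_dict()
    raise TypeError('%r is not JSON serializable' % obj)

payload = json.dumps(StandardQuery('fish'), default=query_encoder)
```

Because default only fires for objects the encoder cannot handle on its own, query objects can sit anywhere inside a larger dict and still be encoded.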

Related

Django validate a python dictionary via DictField

I'm new to the django rest framework and have two basic questions about something I don't fully understand. I have a python function which takes as an input a dictionary and would like to make it available via an API.
def f(d):
    pass
I'm testing a post request via postman sending a json file which should be translated to my python dictionary. The input I'm sending looks like
{
    "test": {
        "name1": {
            "list": ["a",
                     "b",
                     "c"
            ],
            "numbers": [0.0, 0.1]
        }
    }
}
As you can see, the value of "test" is currently a dict with one key, name1, which is itself a dict with two keys, list and numbers.
The view set I wrote for this example looks like this:
class TestViewSet(viewsets.ViewSet):
    def create(self, request):
        serializer = TestSerializer(data=request.data)
        print(type(request.data.get("test")))
        if serializer.is_valid():
            print(serializer.validated_data.get("test"))
            f(serializer.validated_data.get("test"))
and my serializer looks like this:
class TestSerializer(serializers.Serializer):
    test = serializers.DictField()
If I send the json input above I get the printed the following:
<type 'dict'>
None
So I have the following two questions:
As we can see, request.data.get("test") is already the desired type (dict). Why do we want to call the serializer and do validation at all? Validation might cast certain inputs to non-standard types. E.g. validating a decimal via the DecimalField returns an object of type Decimal instead of a plain float. Do I need to recast this after calling the serializer, or how do I know that it won't cause any trouble with functions expecting, for example, a native Python float?
How can I define the serializer in the above example so that I get back the correct object and not None? This dictionary consists of a dictionary with two keys, where one value is a list of decimals and the other a list of strings. How can I write validation for this within the DictField?

Django Rest Framework ModelSerializer create if not exists

I'm using Django Rest Framework 3.6.3. I'm trying to write nested create serializers for one optional argument and one required argument inside a child. I know that I need to override create in the base serializer to tell DRF how to create the nested children - http://www.django-rest-framework.org/api-guide/relations/#writable-nested-serializers.
However, I can't figure out how to get DRF to parse the nested object information without explicitly declaring the child serializer on the parent. Declaring it causes DRF to use it to validate the children, which then returns a validation error that I have no control over, because the child serializer's create method is never called.
Base
Specification(many=True)
OptionalField
RequiredField
The information I pass in is a JSON object:
{
    base_id: 1,
    base_info: 'blah',
    specifications: [
        {
            specification_id: 1,
            optional_info: {
                optional_id: 1,
                optional_stuff: 'blah'
            },
            required_info: {
                required_id: 1,
                required_stuff: 'required'
            }
        }
    ]
}
The BaseCreationSerializer calls its create method. I know that I need to pull out the rest of the information and create it manually. However, I can't figure out how to get the BaseCreationSerializer to parse the data into validated_data without defining specification = SpecificationCreationSerializer(), which then tries to parse that and throws an error. Printing self shows the entire JSON object in data, but validated_data only contains the subset of things that the serializer knows about. How do I parse the appropriate things from data so that I can create the objects on my own?

Django filter JSONField list of dicts

I run Django 1.9 with the new JSONField and have the following Test model:
class Test(TimeStampedModel):
    actions = JSONField()
Let's say the actions JSONField looks like this:
[
    {
        "fixed_key_1": "foo1",
        "fixed_key_2": {
            "random_key_1": "bar1",
            "random_key_2": "bar2"
        }
    },
    {
        "fixed_key_1": "foo2",
        "fixed_key_2": {
            "random_key_3": "bar2",
            "random_key_4": "bar3"
        }
    }
]
I want to be able to filter the foo1 and foo2 keys for every item of the list.
When I do:
>>> Test.objects.filter(actions__1__fixed_key_1="foo2")
The Test is in the queryset. But when I do:
>>> Test.objects.filter(actions__0__fixed_key_1="foo2")
It isn't, which makes sense. I want to do something like:
>>> Test.objects.filter(actions__values__fixed_key_1="foo2")
Or
>>> Test.objects.filter(actions__values__fixed_key_2__values__contains="bar3")
And have the Test in the queryset.
Any idea if this can be done, and how?
If you want to filter your data by one of the fields in your array of dicts, you can try this query:
Test.objects.filter(actions__contains=[{'fixed_key_1': 'foo2'}])
It will list all Test objects that have at least one object in the actions field containing the key fixed_key_1 with the value foo2.
Also it should work for nested lookups, even if you don't know the actual indexes:
Test(actions=[
    {'fixed_key_1': 'foo4', 'fixed_key_3': [
        {'key1': 'foo2'},
    ]},
]).save()
Test.objects.filter(actions__contains=[{'fixed_key_3': [{'key1': 'foo2'}]}])
In simple words, contains will ignore everything else.
Unfortunately, if nested element is an object, you must know key name. Lookup by value won't work in that case.
You should be able to use a __contains lookup for this and pass the queried values as a list, as documented here. The lookup would behave exactly like ArrayField. So, something like this should work:
Test.objects.filter(actions__contains=[{'fixed_key_1': 'foo2'}])
You can use the django-jsonfield package; I guess it's already the one you are using.
from jsonfield import JSONField

class Test(TimeStampedModel):
    actions = JSONField()
So to make a search with a specific property, you can just do this:
def test_filter(**kwargs):
    result = Test.objects.filter(actions__contains=kwargs)
    return result
If you are using PostgreSQL, maybe you can take advantage of PostgreSQL specific model fields.
PS: If you are dealing with a lot of JSON structures, you may want to consider using a NoSQL database.

Mongodb query return type

When I make a query in Mongodb using Mongokit in Python, it returns a json document object. However I need to use the return value as a model type that I have defined. For example, if I have the class:
class User(Document):
    structure = {
        'name': basestring
    }
and make the query
user = db.users.find_one({'name':'Mike'})
I want user to be an object of type User, so that I can embed it into other objects that have fields of type User. However it just returns a json document. Is there a way to cast it or something? This seems like something that should be very intuitive and easy to do.
From what I can see, Mongokit is built on top of pymongo, and pymongo's find has an argument called as_class:
as_class (optional): class to use for documents in the query result (default is document_class)
http://api.mongodb.org/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find
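Why this works: Mongokit's Document is a dict subclass, and pymongo builds each decoded result as an instance of the class you pass. A standalone sketch of the idea, with no MongoDB connection (the query call at the end is commented out, pymongo 2.x style):

```python
# Standalone sketch, no MongoDB required: Mongokit's Document is a dict
# subclass, so pymongo's as_class can build each query result directly
# as a User instance instead of a plain dict.
class User(dict):
    @property
    def name(self):
        return self['name']

# pymongo fills the as_class instance with the decoded key/value pairs,
# roughly equivalent to:
user = User()
user.update({'name': 'Mike'})

# With a live connection the (pymongo 2.x style) call would look like:
# user = db.users.find_one({'name': 'Mike'}, as_class=User)
```

The resulting object is both a dict (so the rest of pymongo keeps working) and a User (so your own methods and embedding logic apply).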

Object schemas/models without ORM, DOM or forms

I've used MongoEngine a lot lately. Apart from the MongoDB integration, I like the idea of defining the structures of entities explicitly. Field definitions make code easier to understand. Also, using those definitions, I can validate objects to catch potential bugs or serialize/deserialize them more accurately.
The problem with MongoEngine is that it is designed specifically to work with a storage engine. The same applies to Django and SQLAlchemy models, which also lack list and set types. My question, then, is: is there an object schema/model library for Python that does automated object validation and serialization, but not object-relational mapping or any other fancy stuff?
Let me give an example.
class Wheel(Entity):
    radius = FloatField(1.0)

class Bicycle(Entity):
    front = EntityField(Wheel)
    back = EntityField(Wheel)

class Owner(Entity):
    name = StringField()
    bicycles = ListField(EntityField(Bicycle))

owner = Owner(name='Eser Aygün', bicycles=[])

bmx = Bicycle()
bmx.front = Wheel()
bmx.back = Wheel()

trek = Bicycle()
trek.front = Wheel(1.2)
trek.back = Wheel(1.2)

owner.bicycles.append(bmx)
owner.bicycles.append(trek)

owner.validate()  # checks the structure recursively
Given the structure, it is also easy to serialize and deserialize objects. For example, owner.jsonify() may return the dictionary
{
    'name': 'Eser Aygün',
    'bicycles': [{
        'front': {
            'radius': 1.0
        },
        'back': {
            'radius': 1.0
        }
    }, {
        'front': {
            'radius': 1.2
        },
        'back': {
            'radius': 1.2
        }
    }]
}
and you can easily convert it back by calling owner.dejsonify(dic).
If anyone is still looking, as python-entities hasn't been updated for a while, there are some good libraries out there:
schematics - https://schematics.readthedocs.org/en/latest/
colander - http://docs.pylonsproject.org/projects/colander/en/latest/
voluptuous - https://pypi.python.org/pypi/voluptuous (more of a validation library)
Check out mongopersist, which uses mongo as a persistence layer for Python objects like ZODB. It does not perform schema validation, but it lets you move objects between Mongo and Python transparently.
For validation or other serialization/deserialization scenarios (e.g. forms), consider colander. Colander project description:
Colander is useful as a system for validating and deserializing data obtained via XML, JSON, an HTML form post or any other equally simple data serialization. It runs on Python 2.6, 2.7 and 3.2. Colander can be used to:
Define a data schema.
Deserialize a data structure composed of strings, mappings, and lists into an arbitrary Python structure after validating the data structure against a data schema.
Serialize an arbitrary Python structure to a data structure composed of strings, mappings, and lists.
What you're describing can be achieved with remoteobjects, which contains a mechanism (called dataobject) to allow you to define the structure of an object such that it can be validated and it can be marshalled easily to and from JSON.
It also includes some functionality for building a REST client library that makes HTTP requests, but the use of this part is not required.
The main remoteobjects distribution does not come with specific StringField or IntegerField types, but it's easy enough to implement them. Here's an example BooleanField from a codebase I maintain that uses remoteobjects:
class BooleanField(dataobject.fields.Field):
    def encode(self, value):
        if value is not None and type(value) is not bool:
            raise TypeError("Requires boolean")
        return super(BooleanField, self).encode(value)
This can then be used in an object definition:
class ThingWithBoolean(dataobject.DataObject):
    my_boolean = BooleanField()
And then:
thing = ThingWithBoolean.from_dict({"my_boolean": True})
thing.my_boolean = "hello"
return json.dumps(thing.to_dict())  # will fail because my_boolean is not a boolean
As I said earlier in a comment, I've decided to invent my own wheel. I started implementing an open-source Python library, Entities, that just does what I wanted. You can check it out from https://github.com/eseraygun/python-entities/.
The library supports recursive and non-recursive collection types (list, set and dict), nested entities and reference fields. It can automatically validate, serialize, deserialize and generate hashable keys for entities of any complexity. (In fact, the de/serialization feature is not complete yet.)
This is how you use it:
from entities import *

class Account(Entity):
    id = IntegerField(group=PRIMARY)      # this field is in primary key group
    iban = IntegerField(group=SECONDARY)  # this is in secondary key group
    balance = FloatField(default=0.0)

class Name(Entity):
    first_name = StringField(group=SECONDARY)
    last_name = StringField(group=SECONDARY)

class Customer(Entity):
    id = IntegerField(group=PRIMARY)
    name = EntityField(Name, group=SECONDARY)
    accounts = ListField(ReferenceField(Account), default=list)

# Create Account objects.
a_1 = Account(1, 111, 10.0)  # __init__() recognizes positional arguments
a_2 = Account(id=2, iban=222, balance=20.0)  # as well as keyword arguments

# Generate a hashable key using the primary key.
print a_1.keyify()  # prints '(1,)'

# Generate a hashable key using the secondary key.
print a_2.keyify(SECONDARY)  # prints '(222,)'

# Create a Customer object.
c = Customer(1, Name('eser', 'aygun'))

# Generate a hashable key using the primary key.
print c.keyify()  # prints '(1,)'

# Generate a hashable key using the secondary key.
print c.keyify(SECONDARY)  # prints "(('eser', 'aygun'),)"

# Try validating an invalid object.
c.accounts.append(123)
try:
    c.validate()  # fails
except ValidationError:
    print 'accounts list is only for Account objects'

# Try validating a valid object.
c.accounts = [a_1, a_2]
c.validate()  # succeeds
