serialize children in marshmallow-sqlalchemy - python

Marshmallow normally serializes nested children (assuming nested schemas are defined). For example:
{
    'id': 2,
    'messages': [
        {
            'id': 1,
            'message': 'foo'
        },
        {
            'id': 2,
            'message': 'bar'
        }
    ]
}
However, marshmallow-sqlalchemy causes children to simply be represented by their primary key. For example:
{
    'id': 2,
    'messages': [
        1,
        2
    ]
}
How can I get marshmallow-sqlalchemy to serialize the child objects? Preferably, I should be able to specify a depth, for example: serialize to 4 layers deep, then use the uid behavior.
Ideally, this should be configurable via schema.dump(), because it should be dynamic based on where the serialization began. In other words, coupling this to the schema itself doesn't make sense. However, if that's the only way to do it, I'm curious to hear a solution like that as well.

You can use a Nested field to do this.
Assuming you have a schema that looks like this:
from marshmallow import Schema, fields

class MessageSchema(Schema):
    msg_id = fields.Integer()
    message = fields.String()
    something = fields.String()

class ListSchema(Schema):
    list_id = fields.Integer()
    messages = fields.Nested(MessageSchema, only=['msg_id', 'message'])
To include only certain fields, use the only parameter when calling Nested (see example above for usage, where only msg_id and message fields are included in the serialized output).
The Marshmallow docs have a more detailed and complete example.
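For illustration, here is a minimal usage sketch; some_list stands in for whatever object you are serializing, and the nesting depth is governed by how far the chain of Nested fields extends (a child field that is not itself Nested stops the recursion at that level):

schema = ListSchema()
result = schema.dump(some_list)  # use result.data on marshmallow 2.x
# Expected shape, with children serialized as objects rather than ids:
# {
#     'list_id': 2,
#     'messages': [
#         {'msg_id': 1, 'message': 'foo'},
#         {'msg_id': 2, 'message': 'bar'}
#     ]
# }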

Related

elasticsearch-dsl in python: How can I return the "inner hits" for my query?

I am currently exploring elasticsearch in python using the elasticsearch_dsl library. I am aware that my Elasticsearch knowledge is currently limited.
I have created a model like so:
from elasticsearch_dsl import Date, Document, InnerDoc, Integer, Object, Text

class Post(InnerDoc):
    text = Text()
    id = Integer()

class User(Document):
    name = Text()
    posts = Object(doc_class=Post)
    signed_up_at = Date()
The data for posts is an array like this:
[
    {
        "text": "Test",
        "id": 2
    }
]
Storing my posts works. However, this seems wrong to me: I specify the "posts" attribute to be a single Post, not a list of Posts.
Querying works, I can:
s = Search(using=client).query("match", posts__text="test")
and it will retrieve, as a result, the User that has a post containing the word.
What I want is to get the user plus all Posts that qualified the user for the result (meaning all posts containing the search phrase). I called these the inner hits, but I am not sure if that is correct.
Help would be highly appreciated!
I tried using "nested" instead of "match" for the query, but that does not work:
[nested] query does not support [posts]
I suspect that this has to do with the fact that my index is specified incorrectly.
I updated my model to this:
class Post(InnerDoc):
    text = Text(analyzer="snowball")
    id = Integer()

class User(Document):
    name = Text()
    posts = Nested(doc_class=Post)
    signed_up_at = Date()
This allows me to do the following query:
GET users/_search
{
    "query": {
        "nested": {
            "path": "posts",
            "query": {
                "match": {
                    "posts.text": "idea"
                }
            },
            "inner_hits": {}
        }
    }
}
This translates to the following elasticsearch-dsl query in python:
s = (
    Search(using=client).query(
        "nested",
        path="posts",
        query=Q("match", **{"posts.text": "idea"}),
        inner_hits={},
    )
)
Inner hits can then be accessed on each hit's meta; roughly like this (a sketch based on elasticsearch-dsl's response API):
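response = s.execute()
for hit in response:
    # The nested posts that matched are exposed on the hit's meta,
    # keyed by the nested path ("posts" here).
    for post in hit.meta.inner_hits.posts:
        print(post.text, post.id)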
Using Nested might be required because of how Elasticsearch represents objects internally (https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html): since lists of Object fields may be flattened, the correct association between a post's text and id can be lost, making it impossible to retrieve complete inner hits.

Django Rest Framework: POST and PUT in a single nested request

I'm using Django Rest Framework to create an object. The JSON contains nested objects as well: an array of objects to create and link to the "main object", and an object that should be partially updated.
JSON looks like this:
{
    "opcitem_set": [
        {
            "comment": "Test comment",
            "grade": "1",
            "name": "1a"
        },
        {
            "comment": "Test comment",
            "grade": "2",
            "name": "1b"
        },
        {
            "description": "Additional test item",
            "comment": "Additional comment",
            "grade": "1",
            "name": "extra_1"
        }
    ],
    "is_right_seat_training": false,
    "checked_as": "FC",
    "date": "2015-10-23",
    "check_reason": "Check ride",
    "opc_program": "2",
    "is_pc": true,
    "questionnaire_test_passed": "Passed",
    "pnf_time": 2,
    "other_comments_complete_crew": "Other comments",
    "other_comments_flying_pilot": "Other comments",
    "is_cat_2_training": false,
    "opc_passed": "Passed",
    "pilot": {
        "pc_valid_to": "2015-10-23",
        "id": 721,
        "email": "jens.nilsson@nextjet.se",
        "total_time": 3120,
        "medical_valid_to": "2015-10-23"
    },
    "pf_time": 2,
    "aircraft_type": "S340",
    "typeratingexaminer": 734
}
The "opcitem_set" contains objects of type OpcItem that should be created and have a ForeignKey to the main object. So far so good, I can do this by overriding the create() method on the ModelSerializer as outlined in http://www.django-rest-framework.org/api-guide/serializers/#writable-nested-representations.
Then we have the case of the "pilot" object. This will always contain an ID and some other fields to PATCH the object with that ID.
The "typeratingexaminer" field is just another "Pilot" object, but it shouldn't be PATCHed, just set as a foreign key.
My question is: can I PATCH (partially update) the "pilot" as well in the create() method, or would that break some sort of design pattern? Since it's really a PATCH and not a POST, should I do it in a separate request after the original request has finished? In that case, can I have a transaction spanning two requests, so that if the second request fails, the first request will be rolled back?
I would love to be able to send only one request from the client instead of splitting it into two. Maybe the JSON could already be separated in the ViewSet and sent to different serializers?
Happy to hear your thoughts about this, I'm a bit lost.
If you are not creating a main object but only nested objects, you should override the .update() method in the serializer and do something like this:
def update(self, instance, validated_data):
    if 'opcitem_set' in validated_data:
        opcitem_set_data = validated_data.pop('opcitem_set')
    if 'pilot' in validated_data:
        pilot_data = validated_data.pop('pilot')
    ...
    for opcitem_data in opcitem_set_data:
        OpcItem.objects.create(main_object=instance, **opcitem_data)
    current_pilot = instance.pilot
    current_pilot.pc_valid_to = pilot_data.get('pc_valid_to', current_pilot.pc_valid_to)
    ...
    current_pilot.save()
    # Update instance as well if you need to
    return instance
If you need to create the main object as well, then you need to override the .create() method. But PATCHing the pilot there is not really a good way to do it.
I would recommend moving away from the serializer's create method and building your logic in the view, where you can make good use of simpler, dumb serializers as needed. You could surely do updates in the create method of your serializer, but then it is suddenly not a serializer anymore; it is more of a controller, so that logic is better placed in the view code by overriding the create or post method. This design lets you keep a single request from the client: you massage the request data in the view and use simple serializers to instantiate or update objects, with embedded data validation where needed.
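As a rough sketch of that view-level approach (the viewset, Pilot model, and PilotSerializer names are hypothetical placeholders; wrapping everything in transaction.atomic also answers the rollback question, since it all happens in one request):

from django.db import transaction
from rest_framework import status, viewsets
from rest_framework.response import Response

class CheckViewSet(viewsets.ModelViewSet):
    def create(self, request, *args, **kwargs):
        data = request.data.copy()
        pilot_data = data.pop('pilot', None)

        # Validate and create the main object (and its opcitem_set).
        serializer = self.get_serializer(data=data)
        serializer.is_valid(raise_exception=True)

        with transaction.atomic():
            instance = serializer.save()
            # Partially update the referenced pilot with a second, dumb serializer.
            if pilot_data is not None:
                pilot = Pilot.objects.get(pk=pilot_data['id'])
                pilot_serializer = PilotSerializer(pilot, data=pilot_data, partial=True)
                pilot_serializer.is_valid(raise_exception=True)
                pilot_serializer.save()

        return Response(serializer.data, status=status.HTTP_201_CREATED)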
If you have models and serializers you could share, we might be able to comment more to the point.

Django serializing Queryset with related entity fields

I'm trying to join 2 entities, get specific fields from them, and return a JSON of that.
I tried writing the following code:
import datetime
result = Foo.objects.all()
result = result.select_related('bar').extra(select={'bar_has_address': "IF(bar.has_address = '', 0, 1)"})
result = result.filter(time__gte=datetime.date.today())
return HttpResponse(serializers.serialize('json', result),mimetype="application/json")
Now I'm only getting JSON containing the fields of Foo, whereas I want Bar's fields as well. Ideally the returned JSON would have specific fields from both entities:
[{
    'name': 'lorem ipsum',      // from Foo
    'has_address': 1,           // from Bar
    'address': 'some address',  // from Bar
    'id': 1,                    // from Foo
}, ... ]
Even with result.values('...') I'm not getting any of Bar's fields.
What am I missing here?
As far as I know, Django's built-in serializers cannot work with related model fields. Take a look at:
DjangoFullSerializers
this answer and suggested serializer
relevant open ticket in django issue tracker
Also see:
Django serialization of inherited model
Serialize django models with reverse One-To-One fields to JSON
Hope that helps.
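As a workaround sketch (assuming Foo has a ForeignKey named bar and the field names from the question), you can skip the serializers module and build the structure yourself; values() can span the relationship with double-underscore lookups:

import datetime
import json
from django.http import HttpResponse

def foo_json(request):
    # Pull fields from Foo and the related Bar in one query.
    rows = (Foo.objects
            .filter(time__gte=datetime.date.today())
            .values('id', 'name', 'bar__has_address', 'bar__address'))
    data = [{
        'id': row['id'],
        'name': row['name'],
        'has_address': 1 if row['bar__has_address'] else 0,
        'address': row['bar__address'],
    } for row in rows]
    return HttpResponse(json.dumps(data), mimetype="application/json")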

mongokit index does not work

I am developing a web application using Flask and MongoDB, and I use (Flask-)MongoKit to define a schema to validate my data.
In my database, there is a collection called "users" (see below) that contains a field "email". I tried to create a unique index on that field as specified in the MongoKit documentation (http://namlook.github.com/mongokit/indexes.html). However, when I check the collection's indexes via the MongoDB shell, there is no "email" index at all.
I found a similar issue on the net: "unique index does not work" (https://github.com/namlook/mongokit/issues/98)
Does someone have any idea why it does not work?
User collection:
from datetime import datetime

@db.register
class User(Model):
    __collection__ = 'users'
    structure = {
        'first_name': basestring,
        'last_name': basestring,
        'email': basestring,
        'password': unicode,
        'registration_date': datetime,
    }
    required_fields = ['first_name', 'last_name', 'email', 'password', 'registration_date']
    default_values = {
        'registration_date': datetime.utcnow,
    }
    # Create a unique index on the "email" field
    indexes = [
        {
            'fields': 'email',  # note: this may be an array
            'unique': True,     # only unique values are allowed
            'ttl': 0,           # create index immediately
        },
    ]
db.users.getIndexes() output:
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "youthmind.users",
        "name" : "_id_"
    }
]
Note that I also tried without 'ttl': 0. I was able to create an index using the following piece of code:
db.users.create_index('email', unique=True)
I think this uses the pymongo Connection object directly.
Thanks in advance for your help.
You are doing it exactly how you should be doing it. Automatic index creation has been removed from MongoKit as of version 0.7.1 (maybe version 0.8?). Here is an issue for it.
The reason behind it is that it would have to call ensureIndex on the collection. The "ensure" part of the name makes it seem like it would check and then create the index only if it doesn't exist, but a developer from Mongo said that it might still wind up (re-)creating the entire index, which could be terribly expensive. The developer also said it should be considered an administrative task instead of a development task.
The workaround is to call create_index yourself for each index in the list you've defined, as part of an upgrade/create script.
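For example, a minimal sketch of such a script, reusing the User document and db handle from the question:

# Create each index declared on the document class via pymongo directly.
for index in User.indexes:
    db.users.create_index(index['fields'], unique=index.get('unique', False))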
Right, you need to use a separate script to recreate the DB with indexes. It will be called when needed, not each time the server runs. Example:
def recreatedb(uri, database_name):
    connection = Connection(uri)
    connection.drop_database(database_name)
    #noinspection PyStatementEffect
    connection[database_name]
    connection.register(_DOCUMENTS)
    for document_name, obj in connection._registered_documents.iteritems():
        obj.generate_index(connection[database_name][obj._obj_class.__collection__])
To prevent using a database without indexes:
def init_engine(uri, database_name):
    global db
    connection = Connection(uri)
    if database_name not in connection.database_names():
        recreatedb(uri, database_name)
    connection.register(_DOCUMENTS)
    db = connection[database_name]
I use Flask-Script, so it was easy to add Marboni's answer as a command to my manage script, which is easy to run.
@manager.command
def setup_indexes():
    """
    Create indexes for all the registered documents.
    """
    for doc in application.db.registered_documents:
        collection = application.db[doc.__collection__]
        doc.generate_index(collection)
I keep my database as a member of the app (application.db) for various admin stuff. Now whenever I add an index or change anything, I run my manager command:
./manage.py setup_indexes
You can read more about the manager module here:
http://flask-script.readthedocs.org/en/latest/

Excluding primary key in Django dumpdata with natural keys

How do you exclude the primary key from the JSON produced by Django's dumpdata when natural keys are enabled?
I've constructed a record that I'd like to "export" so others can use it as a template, by loading it into a separate database with the same schema, without conflicting with other records in the same model.
As I understand Django's support for natural keys, this seems like what NKs were designed to do. My record has a unique name field, which is also used as the natural key.
So when I run:
from django.core import serializers
from myapp.models import MyModel
obj = MyModel.objects.get(id=123)
serializers.serialize('json', [obj], indent=4, use_natural_keys=True)
I would expect an output something like:
[
    {
        "model": "myapp.mymodel",
        "fields": {
            "name": "foo",
            "create_date": "2011-09-22 12:00:00",
            "create_user": [
                "someusername"
            ]
        }
    }
]
which I could then load into another database using loaddata, expecting it to be dynamically assigned a new primary key. Note that my "create_user" field is a FK to Django's auth.User model, which supports natural keys, and it is output as its natural key instead of the integer primary key.
However, what's generated is actually:
[
    {
        "pk": 123,
        "model": "myapp.mymodel",
        "fields": {
            "name": "foo",
            "create_date": "2011-09-22 12:00:00",
            "create_user": [
                "someusername"
            ]
        }
    }
]
which will clearly conflict with and overwrite any existing record with primary key 123.
What's the best way to fix this? I don't want to retroactively change all the auto-generated primary key integer fields to whatever the equivalent natural keys are, since that would cause a performance hit as well as be labor intensive.
Edit: This seems to be a bug that was reported...2 years ago...and has largely been ignored...
Updating the answer for anyone coming across this in 2018 and beyond.
There is a way to omit the primary key through the use of natural keys and unique_together. Taken from the Django documentation on serialization:
You can use this command to test:
python manage.py dumpdata app.model --pks 1,2,3 --indent 4 --natural-primary --natural-foreign > dumpdata.json
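The resulting fixture can then be loaded into the target database; the --database flag is only needed when that target is not your default alias ("other" here is a placeholder):

python manage.py loaddata dumpdata.json --database=other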
Serialization of natural keys
So how do you get Django to emit a natural key when serializing an object? Firstly, you need to add another method – this time to the model itself:
class Person(models.Model):
    objects = PersonManager()

    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)
    birthdate = models.DateField()

    def natural_key(self):
        return (self.first_name, self.last_name)

    class Meta:
        unique_together = (('first_name', 'last_name'),)
That method should always return a natural key tuple – in this example, (first name, last name). Then, when you call serializers.serialize(), you provide use_natural_foreign_keys=True or use_natural_primary_keys=True arguments:
serializers.serialize('json', [book1, book2], indent=2, use_natural_foreign_keys=True, use_natural_primary_keys=True)
When use_natural_foreign_keys=True is specified, Django will use the natural_key() method to serialize any foreign key reference to objects of the type that defines the method.
When use_natural_primary_keys=True is specified, Django will not provide the primary key in the serialized data of this object since it can be calculated during deserialization:
{
    "model": "store.person",
    "fields": {
        "first_name": "Douglas",
        "last_name": "Adams",
        "birth_date": "1952-03-11"
    }
}
The problem with JSON is that you can't omit the pk field, since it will be required when loading the fixture data again. If it is missing, loaddata will fail with:
$ python manage.py loaddata some_data.json
[...]
File ".../django/core/serializers/python.py", line 85, in Deserializer
data = {Model._meta.pk.attname : Model._meta.pk.to_python(d["pk"])}
KeyError: 'pk'
As pointed out in the answer to this question, you can use YAML or XML if you really want to omit the pk attribute, or you can just replace the primary key value with null:
import re
from django.core import serializers
some_objects = MyClass.objects.all()
s = serializers.serialize('json', some_objects, use_natural_keys=True)
# Replace id values with null - adjust the regex to your needs
s = re.sub('"pk": [0-9]{1,5}', '"pk": null', s)
Override the Serializer class in a separate module:
from django.core.serializers.json import Serializer as JsonSerializer
from django.utils.encoding import smart_unicode

class Serializer(JsonSerializer):
    def end_object(self, obj):
        self.objects.append({
            "model": smart_unicode(obj._meta),
            "fields": self._current,
            # The original method adds the pk here
        })
        self._current = None
Register it in Django:
serializers.register_serializer("json_no_pk", "path.to.module.with.custom.serializer")
And use it:
serializers.serialize('json_no_pk', [obj], indent=4, use_natural_keys=True)
