How to deserialize a BSON structure to a marshmallow schema - Python

I'm trying to convert a BSON structure to a schema using the marshmallow library.
Below is the marshmallow schema:
class GeneSchema(Schema):
    """description of class"""
    id_entrez = fields.Integer(required=True, error_messages={'required': "The 'id_entrez' field is required."})
    symbol = fields.String()

    @validates('id_entrez')
    def validate_id_entrez(self, data):
        if data <= 0:
            raise ValidationError("The 'id_entrez' field must be greater than zero.")
Below is the BSON that will be converted to the schema:
[{"symbol": "VAMP4", "_id": {"$oid": "57ae3b175a945932fcbdf41d"}, "id_entrez": 8674}, {"symbol": "CCT5", "_id": {"$oid": "57ae3b175a945932fcbdf41e"}, "id_entrez": 22948}]
Note that the BSON has the "_id" as an ObjectId ("$oid"). This is because the data is the result of a MongoDB query.
Please, does anyone know why this doesn't convert from BSON to the marshmallow schema correctly?
Thank you all!

I don't know if this question is still valid, but I would like to show my solution for how to push an ObjectId into a Marshmallow schema:
I simply use the pre-processing hook pre_load to convert the ObjectId to a string:
@pre_load
def convert_objectid(self, in_data, **kwargs):
    if "_id" in in_data:
        in_data["_id"] = str(in_data["_id"])
    return in_data

You can still use your schema to parse MongoDB output; just ignore the extra "_id" field. If, on the other hand, you do want to parse that "_id", just add an extra non-required field to your schema.

Related

Serializing data with marshmallow if data key is unknown

I'm trying to serialise data where the data key is not known beforehand. The data would look something like this:
{
    "unknown key": ["list", "of", "random", "strings"]
}
Is this something that's possible with marshmallow? I can't seem to find anything in the docs about it.
You can implement a method to dynamically generate a Schema class at runtime.
from marshmallow import Schema as BaseSchema

class Schema(BaseSchema):
    @classmethod
    def from_dict(cls, fields_dict):
        attrs = fields_dict.copy()
        attrs["Meta"] = type(
            "GeneratedMeta", (getattr(cls, "Meta", object),), {"register": False}
        )
        return type("GeneratedSchema", (cls,), attrs)
Usage:
from marshmallow import fields

MySchema = Schema.from_dict({
    "unknown_key": fields.List(fields.Str())
})
Update: Schema.from_dict is now a built-in method in marshmallow 3.0, which will be released soon. See https://stevenloria.com/dynamic-schemas-in-marshmallow/ for usage examples.

flask-marshmallow define schema to convert simple json to complex orm

I have some entities in SQLAlchemy. They are User and the messages they sent, UserMessage:
class User(db.Model):
    __tablename__ = 'User'
    uid = Column(Integer, primary_key=True)
    email = Column(String)

class UserMessage(db.Model):
    __tablename__ = 'UserMessages'
    date = Column(String)
    message = Column(String)
    user_id = Column(Integer, ForeignKey('User.uid'))
    user = relationship('User')
Now, I want to create a Flask resource which lets applications post information in a way that does not map directly to the way the models are actually defined. For example, something like this:
JSON I want to parse:
{
    "date": "12345",
    "user": {
        "email": "test@email.com"
    },
    "message": "I'm a message"
}
All the examples I see for marshmallow involve converting objects to JSON which are structured in the same way as the ORM object. In this example, you would expect the JSON to look more like this:
JSON marshmallow wants:
{
    "user": {
        "email": "test@email.com",
        "messages": {
            "message": "I'm a message"
        },
        "date": "12345"
    }
}
Will marshmallow allow me to define a more arbitrary schema which will translate between this JSON and my internal ORM objects? If so, can someone please point me at documentation or an example?

Peewee ORM JSONField for MySQL

I have a peewee model like so:
class User(peewee.Model):
    name = peewee.CharField(unique=True)
    some_json_data = peewee.CharField()
    requested_at = peewee.DateTimeField(default=datetime.now)  # pass the callable, not datetime.now()
I know that peewee doesn't support a JSONField for a MySQL DB, but anyway, I thought that if I could just convert it to a string format and save it to the DB, I could retrieve it as is.
Let's say, for example, this is the JSONField data that I am writing to the DB:
[
    {
        'name': 'abcdef',
        'address': 'abcdef',
        'lat': 43176757,
        'lng': 42225601
    }
]
When I fetch this (JSONField) data, the output is like so:
u'[{u\'name\': u\'abcdef\', u\'address\': u\'abcdef\', u\'lat\': 43176757, u\'lng\': 42225601\'}]'
Trying a simplejson load of this is giving me an error like so:
JSONDecodeError: Expecting property name enclosed in double quotes:
line 1 column 3 (char 2)
I've tried a json dumps of the data before entering it into the DB to see if that would work, but still I have no luck with it.
I am looking for a solution that involves peewee's custom field options, and I want to stick with MySQL. Can someone guide me?
What's probably happening in your code is Peewee is calling str() (or unicode()) on the value instead of dumping it to JSON, so the Python string representation is being saved to the database. To do JSON manually, just import json and then call json.dumps(obj) when you're setting the field and json.loads(db_value) when you fetch the field.
It looks like there's a Peewee playhouse extension for certain databases (SQLite, PostgreSQL?) that defines a JSONField type -- see the JSONField docs here.
Alternatively, I don't think it'd be hard to define a custom JSONField type which does the json loads/dumps automatically. There's a simple example of this in playhouse/kv.py:
import json
from peewee import TextField

class JSONField(TextField):
    def db_value(self, value):
        return json.dumps(value)

    def python_value(self, value):
        if value is not None:
            return json.loads(value)
Why not use the JSONField field from Peewee's playhouse?
from playhouse.sqlite_ext import *

db = SqliteExtDatabase(':memory:')

class KV(Model):
    key = TextField()
    value = JSONField()

    class Meta:
        database = db

KV.create_table()
It takes care of converting Python objects to JSON, and vice versa:
KV.create(key='a', value={'k1': 'v1'})
KV.get(KV.key == 'a').value  # prints {'k1': 'v1'}
You can query using the JSON keys:
KV.get(KV.value['k1'] == 'v1').key  # prints 'a'
You can easily update JSON keys:
KV.update(value=KV.value.update({'k2': 'v2', 'k3': 'v3'})).execute()  # add keys to the JSON

How to set the JSON encoder in marshmallow?

How do I override the JSON encoder used by the marshmallow library so that it can serialize a Decimal field? I think I can do this by overriding json_module in the base Schema or Meta class, but I don't know how:
https://github.com/marshmallow-code/marshmallow/blob/dev/marshmallow/schema.py#L194
I trawled all the docs and read the code, but I'm not a Python native.
If you want to serialize a Decimal field (and keep the value as a number), you can override the default json module used by Marshmallow in its dumps() call to use simplejson instead.
To do this, just add a class Meta definition to your schema, and specify a json_module property for this class.
Example:
import simplejson

class MySchema(Schema):
    amount = fields.Decimal()

    class Meta:
        json_module = simplejson
Then, to serialize:
my_schema = MySchema()
my_schema.dumps(my_object)
I think the solution is to use marshmallow.fields.Decimal with as_string=True:
This field serializes to a decimal.Decimal object by default. If you
need to render your data as JSON, keep in mind that the json module
from the standard library does not encode decimal.Decimal. Therefore,
you must use a JSON library that can handle decimals, such as
simplejson, or serialize to a string by passing as_string=True.
I had the same issue and I ended up changing the field on the Schema to a string. In my case, since I'm only going to return it in JSON, it really doesn't matter whether it is a string or a decimal.
from marshmallow_sqlalchemy import ModelSchema
from marshmallow import fields

class CurrencyValueSchema(ModelSchema):
    class Meta:
        model = CurrencyValue

    value = fields.String()
My returned JSON:
{
    "currency_values": [
        {
            "id": 9,
            "timestamp": "2016-11-18T23:59:59+00:00",
            "value": "0.944304"
        },
        {
            "id": 10,
            "timestamp": "2016-11-18T23:59:59+00:00",
            "value": "3.392204"
        }
    ]
}

Excluding primary key in Django dumpdata with natural keys

How do you exclude the primary key from the JSON produced by Django's dumpdata when natural keys are enabled?
I've constructed a record that I'd like to "export" so others can use it as a template, by loading it into a separate databases with the same schema without conflicting with other records in the same model.
As I understand Django's support for natural keys, this seems like what NKs were designed to do. My record has a unique name field, which is also used as the natural key.
So when I run:
from django.core import serializers
from myapp.models import MyModel
obj = MyModel.objects.get(id=123)
serializers.serialize('json', [obj], indent=4, use_natural_keys=True)
I would expect an output something like:
[
    {
        "model": "myapp.mymodel",
        "fields": {
            "name": "foo",
            "create_date": "2011-09-22 12:00:00",
            "create_user": [
                "someusername"
            ]
        }
    }
]
which I could then load into another database using loaddata, expecting it to be dynamically assigned a new primary key. Note that my "create_user" field is an FK to Django's auth.User model, which supports natural keys, and it is output as its natural key instead of the integer primary key.
However, what's generated is actually:
[
    {
        "pk": 123,
        "model": "myapp.mymodel",
        "fields": {
            "name": "foo",
            "create_date": "2011-09-22 12:00:00",
            "create_user": [
                "someusername"
            ]
        }
    }
]
which will clearly conflict with and overwrite any existing record with primary key 123.
What's the best way to fix this? I don't want to retroactively change all the auto-generated primary key integer fields to whatever the equivalent natural keys are, since that would cause a performance hit as well as be labor intensive.
Edit: This seems to be a bug that was reported...2 years ago...and has largely been ignored...
Updating the answer for anyone coming across this in 2018 and beyond.
There is a way to omit the primary key: use natural keys together with unique_together. Taken from the Django documentation on serialization:
You can use this command to test:
python manage.py dumpdata app.model --pks 1,2,3 --indent 4 --natural-primary --natural-foreign > dumpdata.json
Serialization of natural keys
So how do you get Django to emit a natural key when serializing an object? Firstly, you need to add another method – this time to the model itself:
class Person(models.Model):
    objects = PersonManager()

    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)
    birthdate = models.DateField()

    def natural_key(self):
        return (self.first_name, self.last_name)

    class Meta:
        unique_together = (('first_name', 'last_name'),)
That method should always return a natural key tuple – in this example, (first name, last name). Then, when you call serializers.serialize(), you provide use_natural_foreign_keys=True or use_natural_primary_keys=True arguments:
serializers.serialize('json', [book1, book2], indent=2, use_natural_foreign_keys=True, use_natural_primary_keys=True)
When use_natural_foreign_keys=True is specified, Django will use the natural_key() method to serialize any foreign key reference to objects of the type that defines the method.
When use_natural_primary_keys=True is specified, Django will not provide the primary key in the serialized data of this object since it can be calculated during deserialization:
{
    "model": "store.person",
    "fields": {
        "first_name": "Douglas",
        "last_name": "Adams",
        "birth_date": "1952-03-11"
    }
}
The problem with JSON is that you can't omit the pk field, since it will be required when loading the fixture data again. If it is missing, loaddata will fail with:
$ python manage.py loaddata some_data.json
[...]
  File ".../django/core/serializers/python.py", line 85, in Deserializer
    data = {Model._meta.pk.attname : Model._meta.pk.to_python(d["pk"])}
KeyError: 'pk'
As pointed out in the answer to this question, you can use YAML or XML if you really want to omit the pk attribute, OR just replace the primary key value with null:
import re
from django.core import serializers

some_objects = MyClass.objects.all()
s = serializers.serialize('json', some_objects, use_natural_keys=True)
# Replace id values with null - adjust the regex to your needs
s = re.sub('"pk": [0-9]{1,5}', '"pk": null', s)
Override the Serializer class in a separate module:
from django.core.serializers.json import Serializer as JsonSerializer
from django.utils.encoding import smart_unicode

class Serializer(JsonSerializer):
    def end_object(self, obj):
        self.objects.append({
            "model": smart_unicode(obj._meta),
            "fields": self._current,
            # The original method adds the pk here
        })
        self._current = None
Register it in Django:
serializers.register_serializer("json_no_pk", "path.to.module.with.custom.serializer")
And use it:
serializers.serialize('json_no_pk', [obj], indent=4, use_natural_keys=True)
