Excluding primary key in Django dumpdata with natural keys

How do you exclude the primary key from the JSON produced by Django's dumpdata when natural keys are enabled?
I've constructed a record that I'd like to "export" so others can use it as a template, by loading it into a separate database with the same schema without conflicting with other records in the same model.
As I understand Django's support for natural keys, this seems like what NKs were designed to do. My record has a unique name field, which is also used as the natural key.
So when I run:
from django.core import serializers
from myapp.models import MyModel
obj = MyModel.objects.get(id=123)
serializers.serialize('json', [obj], indent=4, use_natural_keys=True)
I would expect an output something like:
[
    {
        "model": "myapp.mymodel",
        "fields": {
            "name": "foo",
            "create_date": "2011-09-22 12:00:00",
            "create_user": [
                "someusername"
            ]
        }
    }
]
which I could then load into another database using loaddata, expecting it to be dynamically assigned a new primary key. Note that my "create_user" field is a FK to Django's auth.User model, which supports natural keys, and it is output as its natural key instead of the integer primary key.
However, what's generated is actually:
[
    {
        "pk": 123,
        "model": "myapp.mymodel",
        "fields": {
            "name": "foo",
            "create_date": "2011-09-22 12:00:00",
            "create_user": [
                "someusername"
            ]
        }
    }
]
which will clearly conflict with and overwrite any existing record with primary key 123.
What's the best way to fix this? I don't want to retroactively change all the auto-generated primary key integer fields to whatever the equivalent natural keys are, since that would cause a performance hit as well as be labor intensive.
Edit: This seems to be a bug that was reported...2 years ago...and has largely been ignored...

Updating the answer for anyone coming across this in 2018 and beyond.
There is a way to omit the primary key by combining natural keys with a unique_together constraint. You can test it with this command:
python manage.py dumpdata app.model --pks 1,2,3 --indent 4 --natural-primary --natural-foreign > dumpdata.json
The following is taken from the Django documentation on serialization:
Serialization of natural keys
So how do you get Django to emit a natural key when serializing an object? Firstly, you need to add another method – this time to the model itself:
class Person(models.Model):
    objects = PersonManager()
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)
    birthdate = models.DateField()
    def natural_key(self):
        return (self.first_name, self.last_name)
    class Meta:
        unique_together = (('first_name', 'last_name'),)
That method should always return a natural key tuple – in this example, (first name, last name). Then, when you call serializers.serialize(), you provide use_natural_foreign_keys=True or use_natural_primary_keys=True arguments:
serializers.serialize('json', [book1, book2], indent=2, use_natural_foreign_keys=True, use_natural_primary_keys=True)
When use_natural_foreign_keys=True is specified, Django will use the natural_key() method to serialize any foreign key reference to objects of the type that defines the method.
When use_natural_primary_keys=True is specified, Django will not provide the primary key in the serialized data of this object since it can be calculated during deserialization:
{
    "model": "store.person",
    "fields": {
        "first_name": "Douglas",
        "last_name": "Adams",
        "birthdate": "1952-03-11"
    }
}

The problem with JSON is that you can't omit the pk field, since it is required when the fixture data is loaded again. If it is missing, loaddata fails with:
$ python manage.py loaddata some_data.json
[...]
File ".../django/core/serializers/python.py", line 85, in Deserializer
data = {Model._meta.pk.attname : Model._meta.pk.to_python(d["pk"])}
KeyError: 'pk'
As pointed out in the answer to this question, you can use yaml or xml if you really want to omit the pk attribute OR just replace the primary key value with null.
import re
from django.core import serializers
some_objects = MyClass.objects.all()
s = serializers.serialize('json', some_objects, use_natural_keys=True)
# Replace pk values with null - adjust the regex to your needs
s = re.sub(r'"pk": \d+', '"pk": null', s)
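A regex can also match a "pk" string that happens to appear inside a field value. As an alternative sketch, the serialized string can be parsed with the standard json module so that only each record's own pk is set to null. The helper name null_out_pks and the sample record are illustrative assumptions, not part of Django.

```python
import json


def null_out_pks(serialized):
    """Return the fixture JSON with every top-level "pk" set to null."""
    records = json.loads(serialized)
    for record in records:
        # Only touch the record's own primary key; "fields" is left alone.
        if "pk" in record:
            record["pk"] = None
    return json.dumps(records, indent=4)


# Sample input shaped like the output of serializers.serialize('json', ...).
sample = json.dumps([
    {
        "pk": 123,
        "model": "myapp.mymodel",
        "fields": {
            "name": 'contains "pk": 5 in text',
            "create_user": ["someusername"],
        },
    },
])
cleaned = json.loads(null_out_pks(sample))
```

Here the text inside the name field stays intact, which a pattern-based substitution would not guarantee.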

Override the Serializer class in a separate module:
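from django.core.serializers.json import Serializer as JsonSerializer
# smart_unicode is the name used by older (Python 2 era) Django releases;
# newer versions provide smart_str instead.
from django.utils.encoding import smart_unicode
class Serializer(JsonSerializer):
    def end_object(self, obj):
        # Same as the stock implementation, minus the "pk" entry.
        self.objects.append({
            "model": smart_unicode(obj._meta),
            "fields": self._current,
        })
        self._current = None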
Register it in Django:
serializers.register_serializer("json_no_pk", "path.to.module.with.custom.serializer")
And use it:
serializers.serialize('json_no_pk', [obj], indent=4, use_natural_keys=True)

Related

Sort Django Rest Framework JSON Output

I've been developing a web app on top of Django and I use the Django Rest Framework for my API. There's a model class named Events and my EventsSerializer in DRF is a pretty common serializer without any special configuration. It just dumps data returned by the EventManager.
There is a field "type" in the Event model class.
My json returned now is:
{
events: [
{object1},
{object2},
.....
]
}
as with anything dumped by a DRF API and returned by Django.
For some reason, I need my events objects returned categorized by the "type" field. For example, I need to get this:
{
events: [
type1: [{object1}, {object2},...],
type2: [{object3}, {object4}, ...],
.......
]
}
I have literally searched anything related to that but couldn't find a proper solution. Do you have anything to suggest about that?
Thanks in advance
You can use SerializerMethodField and provide custom serialization logic there:
from rest_framework import serializers
class EventSerializer(serializers.Serializer):
    events = serializers.SerializerMethodField()
    def get_events(self, events):
        # Group the events by their "type" field: {type: [event, ...]}.
        # Each event still has to be serialized (e.g. by a per-event serializer).
        grouped = {}
        for event in events:
            grouped.setdefault(event.type, []).append(event)
        return grouped
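The grouping step itself does not depend on DRF. As a minimal sketch, assuming the events have already been serialized to plain dicts that each carry a "type" key (the sample data and the name group_by_type are hypothetical), it could look like this:

```python
from collections import defaultdict


def group_by_type(events):
    """Group already-serialized event dicts under their "type" value."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["type"]].append(event)
    # Return a plain dict so the result serializes like any other mapping.
    return dict(grouped)


events = [
    {"id": 1, "type": "concert"},
    {"id": 2, "type": "talk"},
    {"id": 3, "type": "concert"},
]
result = {"events": group_by_type(events)}
```

The result maps each type to a list of events, so "events" becomes a JSON object keyed by type rather than a flat array.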
I had a model similar to the following:
class Book(models.Model):
    title = models.CharField(max_length=200)
class Author(models.Model):
    name = models.CharField(max_length=200)
    books = models.ManyToManyField(Book)
The JSON that was being generated for an Author looks like this:
{
    "name": "Sir Arthur C. Clarke",
    "books": [
        {
            "title": "Rendezvous with Rama"
        },
        {
            "title": "Childhood's End"
        }
    ]
}
In the JSON I wanted the books to be sorted by title. Since the books are pulled into the queryset via prefetch_related, adding an order_by to the view's queryset had no effect (the generated SQL didn't have a join to the Books table). The solution I came up with was to override the get method. In my version of the get method, I have the superclass generate the Response and then modify its data (a Python dict) before returning it, as shown below.
I'm not too worried about performance for two reasons:
because of the prefetch_related the join is already being done in Python rather than in the database
In my case the number of Books per Author is relatively small
class AuthorView(RetrieveUpdateAPIView):
    queryset = Author.objects.prefetch_related(
        'books',
    )
    serializer_class = AuthorSerializer
    def get(self, request, *args, **kwargs):
        response = super().get(request, *args, **kwargs)
        def key_func(book_json):
            return book_json.get('title', '')
        books = response.data.get('books', [])
        books = sorted(books, key=key_func)
        response.data['books'] = books
        return response

How to populate a django database with users

I want to populate a Django database using manage.py loaddata initial_data.json, where the JSON file contains the specifications of several objects. My problem is that these objects have a 'user' attribute referencing a Django User object, to indicate which user created them. The model for these objects looks like this:
from django.db import models
from django.contrib.auth.models import User
class MySpecialModel(models.Model):
    name = models.CharField(max_length=200)
    user = models.ForeignKey(User)
The objects in my json will look like this:
[{"fields":{
"name": "some name",
"user": XXX
}
...]
The problem is that I don't know what to write in place of the XXX to indicate the user. I have tried user names, but Django tells me it expects a number where the XXX is. Using a number does not produce any error, but I don't see my database populated. So is there a way to place a Django object in an initial_data.json file?
You should write a user.pk there, it is an integer value.
[{"fields":{
"name": "some name",
"user": 1
}
...]
Obviously, the user with that pk must be created before you import any object with a foreign key to it, so it may be easier to create some (fake or not) users through the admin first and export them with: python manage.py dumpdata auth.User --indent 4 > users.json
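A complete fixture record also carries a "model" label next to "fields". The following is only a sketch of generating such a file with the standard json module: the label myapp.myspecialmodel, the helper build_fixture and the choice to number pks from 1 are assumptions for illustration, and explicit pks like these would overwrite existing rows that use the same values.

```python
import json


def build_fixture(names, user_pk, model_label="myapp.myspecialmodel"):
    """Build loaddata-style records whose "user" field holds the user's integer pk."""
    return [
        {
            "model": model_label,
            "pk": index,
            "fields": {"name": name, "user": user_pk},
        }
        for index, name in enumerate(names, start=1)
    ]


# Assumes a User row with pk 1 already exists in the target database.
fixture = build_fixture(["some name", "another name"], user_pk=1)
fixture_json = json.dumps(fixture, indent=4)
```

The string in fixture_json can be written to initial_data.json and loaded after the users fixture.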

Serializing Django Model and including further information for ForeignKeyField + OneToOneField

Using Django 1.7.
I have a model class Topic that I want to serialize. Topic has an attribute Creator that is a ForeignKey to a class UserProfile. The output from serialization gives me the following string:
'{"fields": {"building": 1, "title": "Foobar", "created": "2015-02-13T16:14:47Z", "creator": 1}, "model": "bb.topic", "pk": 2}'
I want the "creator" key to hold the username associated with the UserProfile (as opposed to now, where it gives me the pk value of the UserProfile). The username is held in a OneToOneField to the User class from django.contrib.auth.models.
I tried to implement a UserProfileManager, but either I have done it incorrectly or the following is not an appropriate strategy:
def get_by_natural_key(self, user, picture, company):
    return self.get(user_id=user.id, user_pk=user.pk, username=user.get_username, picture=picture, company=company)
Finally, I looked around SO and found pointers to this: https://code.google.com/p/wadofstuff/wiki/DjangoFullSerializers but it says it was last updated in 2011 so I am assuming there is something else out there.
Thanks. Any help is much appreciated.
It looks like you haven't implemented all the code needed to serialize the UserProfile with a natural key.
Actually the get_by_natural_key method is called when deserializing an object. If you want it to be serialized with a natural key instead of pk you should provide the natural_key method to the model.
Something like:
from django.contrib.auth.models import User
from django.db import models
class UserProfileManager(models.Manager):
    def get_by_natural_key(self, user, company):
        # "user" is the username emitted by natural_key() below.
        return self.get(user__username=user, company=company)
class UserProfile(models.Model):
    objects = UserProfileManager()
    user = models.OneToOneField(User)
    company = models.CharField(max_length=255)
    def natural_key(self):
        return (self.user.username, self.company)
Now, if you serialize a Topic instance:
from django.core import serializers
obj = serializers.serialize('json', Topic.objects.all(), indent=2, use_natural_foreign_keys=True, use_natural_primary_keys=True)
you should get an output similar to:
{
    "fields": {
        "building": 1,
        "title": "Foobar",
        "created": "2015-02-13T16:14:47Z",
        "creator": [
            "dummy",
            "ACME Inc."
        ]
    },
    "model": "bb.topic", "pk": 2
}
supposing a user named dummy exists with a company named ACME Inc. in its user profile.
Hope this helps.

Serialize Class Django

I'm trying to serialize a class in Django, so that I can get all the available fields on a json file, I would need something like
{"tablename": ["Verbose Name", "ModelType", "RelatedClass", "relatefield"]}
The idea is that most of the objects will only have the verbose name and the model type, but for related fields, it will also have the name of the class that the relation refers to, and a field that I can add manually say on helptext as a default, or I can just leave that out probably and work on it differently.
I need to do this for the class, not for an object. I've tried pickle and jsonpickle, but they don't seem to work as I expected. I'm out of ideas; any input would be greatly appreciated.
Thanks.
Edit: Need to clarify better
class Test(models.Model):
    name = models.CharField(verbose_name="Name")
    email = models.CharField()
    books = models.ForeignKey(Book, verbose_name="Books")
class Book(models.Model):
    name = models.CharField()
Now I just want to serialize Test. No values have to be in it at all, just the class itself, so you'd have:
"name": {"verbose_name": "Name", "type": "CharField"}
"books": {"verbose_name": "Books", "type": "ForeignKey", "related_class": "Book", "related_class": "Country", "related_field": "name"}
I need JSON along those lines to come out, but I won't have to run any queries with data in them, just model information.
Django’s serialization framework provides a mechanism for “translating” Django models into other formats. Usually these other formats will be text-based and used for sending Django data over a wire, but it’s possible for a serializer to handle any format (text-based or not).
from django.core import serializers
data = serializers.serialize("xml", SomeModel.objects.all())
xml example:
<?xml version="1.0" encoding="utf-8"?>
<django-objects version="1.0">
<object pk="123" model="sessions.session">
<field type="DateTimeField" name="expire_date">2013-01-16T08:16:59.844560+00:00</field>
<!-- ... -->
</object>
</django-objects>
json:
[
{
"pk": "4b678b301dfd8a4e0dad910de3ae245b",
"model": "sessions.session",
"fields": {
"expire_date": "2013-01-16T08:16:59.844Z",
...
}
}
]
Deserializing data is also a fairly simple operation:
for obj in serializers.deserialize("xml", data):
    do_something_with(obj)
https://docs.djangoproject.com/en/dev/topics/serialization/
Classes are objects, and so are model fields, so you can easily inspect them. One solution would be to write a custom JSON encoder for model classes (not instances). All the information about the model's fields, verbose names, etc. is stored in YourModelClass._meta.
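A minimal sketch of that inspection follows. It relies on _meta.get_fields(), get_internal_type(), verbose_name and related_model, which Django fields expose; the helper describe_model and the stand-in classes are hypothetical and exist only so the example runs without a Django project.

```python
import json
from types import SimpleNamespace


def describe_model(model):
    """Map each field name of a model class to its verbose name, type and relation target."""
    description = {}
    for field in model._meta.get_fields():
        info = {
            # Reverse relations have no verbose_name, hence getattr with a default.
            "verbose_name": str(getattr(field, "verbose_name", field.name)),
            "type": field.get_internal_type(),
        }
        related = getattr(field, "related_model", None)
        if related is not None:
            info["related_class"] = related.__name__
        description[field.name] = info
    return description


# Stand-ins that mimic only the attributes used above (not real Django classes).
def fake_field(name, verbose_name, type_name, related_model=None):
    return SimpleNamespace(
        name=name,
        verbose_name=verbose_name,
        related_model=related_model,
        get_internal_type=lambda: type_name,
    )


class Book:
    pass


class FakeTestModel:
    _meta = SimpleNamespace(get_fields=lambda: [
        fake_field("name", "Name", "CharField"),
        fake_field("books", "Books", "ForeignKey", related_model=Book),
    ])


schema_json = json.dumps(describe_model(FakeTestModel), indent=2)
```

With a real project, passing the model class itself (for example Test) in place of the stand-in should produce the same kind of mapping, without running any query.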

Loading a Django Fixture containing Natural Keys

How do you load a Django fixture so that models referenced via natural keys don't conflict with pre-existing records?
I'm trying to load such a fixture, but I'm getting IntegrityErrors from my MySQL backend, complaining about Django trying to insert duplicate records, which doesn't make any sense.
As I understand Django's natural key feature, in order to fully support dumpdata and loaddata usage, you need to define a natural_key method in the model, and a get_by_natural_key method in the model's manager.
So, for example, I have two models:
class PersonManager(models.Manager):
    def get_by_natural_key(self, name):
        return self.get(name=name)
class Person(models.Model):
    objects = PersonManager()
    name = models.CharField(max_length=255, unique=True)
    def natural_key(self):
        return (self.name,)
class BookManager(models.Manager):
    def get_by_natural_key(self, title, *person_key):
        person = Person.objects.get_by_natural_key(*person_key)
        # The foreign key field on Book is named "author".
        return self.get(title=title, author=person)
class Book(models.Model):
    objects = BookManager()
    author = models.ForeignKey(Person)
    title = models.CharField(max_length=255)
    def natural_key(self):
        return (self.title,) + self.author.natural_key()
    natural_key.dependencies = ['myapp.Person']
My test database already contains a sample Person and Book record, which I used to generate the fixture:
[
    {
        "pk": null,
        "model": "myapp.person",
        "fields": {
            "name": "bob"
        }
    },
    {
        "pk": null,
        "model": "myapp.book",
        "fields": {
            "author": [
                "bob"
            ],
            "title": "bob's book"
        }
    }
]
I want to be able to take this fixture and load it into any instance of my database to recreate the records, regardless of whether or not they already exist in the database.
However, when I run python manage.py loaddata myfixture.json I get the error:
IntegrityError: (1062, "Duplicate entry '1-1' for key 'myapp_person_name_uniq'")
Why is Django attempting to re-create the Person record instead of reusing the one that's already there?
Turns out the solution requires a very minor patch to Django's loaddata command. Since it's unlikely the Django devs would accept such a patch, I've forked it in my package of various Django admin related enhancements.
The key code change (lines 189-201 of loaddatanaturally.py) simply involves calling get_natural_key() to find any existing pk inside the loop that iterates over the deserialized objects.
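The idea behind that change can be illustrated without Django. The sketch below is not the code from loaddatanaturally.py; it is an in-memory stand-in, with a dict playing the role of the table and the hypothetical helper load_naturally looking up an existing pk by natural key before saving.

```python
def load_naturally(records, table, natural_key_fields):
    """Insert or update records in `table` (pk -> fields), matching on the natural key."""
    # Index existing rows by natural key so that matches can reuse the stored pk.
    by_natural_key = {
        tuple(fields[name] for name in natural_key_fields): pk
        for pk, fields in table.items()
    }
    for record in records:
        key = tuple(record[name] for name in natural_key_fields)
        pk = by_natural_key.get(key)
        if pk is None:
            # No match: allocate a fresh pk, as an auto-increment column would.
            pk = max(table, default=0) + 1
            by_natural_key[key] = pk
        table[pk] = dict(record)
    return table


people = {7: {"name": "bob", "age": 30}}
load_naturally(
    [{"name": "bob", "age": 31}, {"name": "alice", "age": 25}],
    people,
    ["name"],
)
```

The existing "bob" row keeps pk 7 and is updated in place, while "alice" receives a new pk, so no duplicate is inserted for a natural key that is already present.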
Actually, loaddata is not meant to work with data that already exists in the database; it is normally used for the initial load of models.
Look at this question for another way of doing it: Import data into Django model with existing data?
