I have a model like this:
class Place(models.Model):
    name = models.CharField(max_length=80, db_index=True)
    city = models.ForeignKey(City)
    address = models.CharField(max_length=255, db_index=True)
    # and so on
Since I'm importing them from many sources, and users of my website are able to add new Places, I need a way to merge them from an admin interface. The problem is that name is not very reliable, since names can be spelled in many different ways, etc.
I'm used to using something like this:
class Place(models.Model):
    name = models.CharField(max_length=80, db_index=True)  # canonical
    city = models.ForeignKey(City)
    address = models.CharField(max_length=255, db_index=True)
    # and so on

class PlaceName(models.Model):
    name = models.CharField(max_length=80, db_index=True)
    place = models.ForeignKey(Place)
I query like this:
Place.objects.get(placename__name='St Paul\'s Cathedral', city=london)
and merge like this:
class PlaceAdmin(admin.ModelAdmin):
    actions = ('merge', )

    def merge(self, request, queryset):
        main = queryset[0]
        tail = queryset[1:]

        PlaceName.objects.filter(place__in=tail).update(place=main)
        SomeModel1.objects.filter(place__in=tail).update(place=main)
        SomeModel2.objects.filter(place__in=tail).update(place=main)
        # ... etc ...

        for t in tail:
            t.delete()

        self.message_user(request, "%s is merged with other places, now you can give it a canonical name." % main)
    merge.short_description = "Merge places"
As you can see, I have to update every other model that has an FK to Place with the new value. But this is not a very good solution, since I have to add every new model to this list.
How do I "cascade update" all foreign keys to some objects prior to deleting them?
Or maybe there are other solutions to do/avoid merging?
If anyone is interested, here is really generic code for this:
def merge(self, request, queryset):
    main = queryset[0]
    tail = queryset[1:]

    related = main._meta.get_all_related_objects()

    valnames = dict()
    for r in related:
        valnames.setdefault(r.model, []).append(r.field.name)

    for place in tail:
        for model, field_names in valnames.iteritems():
            for field_name in field_names:
                model.objects.filter(**{field_name: place}).update(**{field_name: main})
        place.delete()

    self.message_user(request, "%s is merged with other places, now you can give it a canonical name." % main)
Two libraries now exist with up-to-date model merging functions that incorporate related models:
Django Extensions' merge_model_instances management command.
Django Super Deduper
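Rough usage sketches for both, based on their documentation; exact commands and signatures may differ between versions, and the Place variable names here are made up:

    # Django Extensions: interactive management command that walks you through
    # picking the model and the duplicates to merge:
    #   python manage.py merge_model_instances

    # django-super-deduper: programmatic merge that repoints every relation
    # of the alias objects to the primary one, then deletes the aliases.
    from django_super_deduper.merge import MergedModelInstance

    merged_place = MergedModelInstance.create(primary_place, [duplicate_place_1, duplicate_place_2])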
Tested on Django 1.10. Hope it can serve.
def merge(primary_object, alias_objects, model):
    """Merge 2 or more objects from the same django model.

    The alias objects will be deleted and all the references
    towards them will be replaced by references toward the
    primary object.
    """
    if not isinstance(alias_objects, list):
        alias_objects = [alias_objects]
    if not isinstance(primary_object, model):
        raise TypeError('Only %s instances can be merged' % model)
    for alias_object in alias_objects:
        if not isinstance(alias_object, model):
            raise TypeError('Only %s instances can be merged' % model)

    for alias_object in alias_objects:
        # Get all the related Models and the corresponding field_name
        related_models = [(o.related_model, o.field.name) for o in alias_object._meta.related_objects]
        for (related_model, field_name) in related_models:
            relType = related_model._meta.get_field(field_name).get_internal_type()
            if relType == "ForeignKey":
                qs = related_model.objects.filter(**{field_name: alias_object})
                for obj in qs:
                    setattr(obj, field_name, primary_object)
                    obj.save()
            elif relType == "ManyToManyField":
                qs = related_model.objects.filter(**{field_name: alias_object})
                for obj in qs:
                    mtmRel = getattr(obj, field_name)
                    mtmRel.remove(alias_object)
                    mtmRel.add(primary_object)
        alias_object.delete()
    return True
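A minimal usage sketch of this helper, reusing the Place model from the question (the lookup is made up):

    main = Place.objects.get(pk=1)
    duplicates = list(Place.objects.filter(name__istartswith="st paul").exclude(pk=main.pk))

    # Repoints every ForeignKey/ManyToMany that referenced the duplicates
    # to `main`, then deletes the duplicates.
    merge(main, duplicates, Place)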
Based on the snippet provided in the comments in the accepted answer, I was able to develop the following. This code does not handle GenericForeignKeys. I don't subscribe to their use, as I believe they indicate a problem with the model you are using.
I used to list a lot of code to do this in this answer, but I have updated my code to use django-super-deduper, mentioned here. At the time, django-super-deduper did not handle unmanaged models in a good way. I submitted an issue, and it looks like it will be corrected soon. I also use django-audit-log, and I don't want to merge those records. I kept the signature and the @transaction.atomic() decorator, which is helpful in the event of a problem.
from django.db import transaction
from django.db.models import Model, Field
from django_super_deduper.merge import MergedModelInstance


class MyMergedModelInstance(MergedModelInstance):
    """
    Custom way to handle Issue #11: Ignore models with managed = False.
    Also, ignore auditlog models.
    """
    def _handle_o2m_related_field(self, related_field: Field, alias_object: Model):
        if not alias_object._meta.managed and "auditlog" not in alias_object._meta.model_name:
            return super()._handle_o2m_related_field(related_field, alias_object)

    def _handle_m2m_related_field(self, related_field: Field, alias_object: Model):
        if not alias_object._meta.managed and "auditlog" not in alias_object._meta.model_name:
            return super()._handle_m2m_related_field(related_field, alias_object)

    def _handle_o2o_related_field(self, related_field: Field, alias_object: Model):
        if not alias_object._meta.managed and "auditlog" not in alias_object._meta.model_name:
            return super()._handle_o2o_related_field(related_field, alias_object)


@transaction.atomic()
def merge(primary_object, alias_objects):
    if not isinstance(alias_objects, list):
        alias_objects = [alias_objects]
    MyMergedModelInstance.create(primary_object, alias_objects)
    return primary_object
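If you still want this behind an admin action as in the question, a hedged sketch of the wiring could look like the following (action name and message text are my own):

    from django.contrib import admin

    class PlaceAdmin(admin.ModelAdmin):
        actions = ('merge_places',)

        def merge_places(self, request, queryset):
            places = list(queryset)
            main, tail = places[0], places[1:]
            merge(main, tail)  # the @transaction.atomic helper defined above
            self.message_user(request, "%s is merged with %d other place(s)." % (main, len(tail)))
        merge_places.short_description = "Merge selected places"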
I was looking for a solution to merge records in Django Admin, and found a package that does it (https://github.com/saxix/django-adminactions).
How to use:
Install package:
pip install django-adminactions
Add adminactions to your INSTALLED_APPS:
INSTALLED_APPS = (
    'adminactions',
    'django.contrib.admin',
    'django.contrib.messages',
)
Add actions to admin.py:
from django.contrib.admin import site
import adminactions.actions as actions
actions.add_to_site(site)
Add the service URLs to your urls.py: url(r'^adminactions/', include('adminactions.urls')),
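For reference, a sketch of how that entry might sit next to the admin URLs (this uses the old-style url(); on recent Django versions you would use re_path or path instead):

    from django.conf.urls import include, url
    from django.contrib import admin

    urlpatterns = [
        url(r'^adminactions/', include('adminactions.urls')),
        url(r'^admin/', admin.site.urls),
    ]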
Tried it just now, it works for me.
In my model I need an ID field that is different from the default ID given by Django. I need my IDs in the following format: [year][ascending number]
Example: 2021001, 2021002, 2021003
The IDs shall not be editable, but model entries shall take the year for the ID from a DateTimeField in the model. I am unsure whether I should use a normal Django ID and also create some kind of additional ID for the model, or replace the normal Django ID with a custom ID.
This problem is pretty similar to one I had solved for a previous project of mine. What I had done for this was to simply use the default id for the primary key, while using some extra fields to make the composite identifier needed.
To ensure the uniqueness and the restarting of the count, I made a model which would (only by the logic, no actual constraints) only ever have one row in it. Whenever a new instance of the model which needs this identifier is created, this row is updated in a transaction and its stored value is used.
The implementation of it is as follows:
from django.db import models, transaction
import datetime


class TokenCounter(models.Model):
    counter = models.IntegerField(default=0)
    last_update = models.DateField(auto_now=True)

    @classmethod
    def get_new_token(cls):
        with transaction.atomic():
            token_counter = cls.objects.select_for_update().first()
            if token_counter is None:
                token_counter = cls.objects.create()
            if token_counter.last_update.year != datetime.date.today().year:
                token_counter.counter = 0
            token_counter.counter += 1
            token_counter.save()
            return_value = token_counter.counter
        return return_value

    def save(self, *args, **kwargs):
        if self.pk:
            self.__class__.objects.exclude(pk=self.pk).delete()
        super().save(*args, **kwargs)
Next suppose you need to use this in some other model:
class YourModel(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    yearly_token = models.IntegerField(default=TokenCounter.get_new_token)

    @property
    def token_number(self):
        return '{}{}'.format(self.created_at.year, str(self.yearly_token).zfill(4))

    @classmethod
    def get_from_token(cls, token):
        year = int(token[:4])
        yearly_token = int(token[4:])
        try:
            obj = cls.objects.get(created_at__year=year, yearly_token=yearly_token)
        except cls.DoesNotExist:
            obj = None
        return obj
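A quick, hypothetical usage sketch of the two models above (the token values are illustrative):

    obj = YourModel.objects.create()   # yearly_token comes from TokenCounter.get_new_token
    print(obj.token_number)            # e.g. '20210001' for the first object created in 2021

    same = YourModel.get_from_token('20210001')
    assert same == obj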
Note: This might not be very refined, as the code was written when I was very inexperienced, and there may be many areas where it can be improved. For example, you can add unique_for_year to the yearly_token field:
yearly_token = models.IntegerField(default=TokenCounter.get_new_token, unique_for_year='created_at')
Problem description
Ok, I am trying to add a function defined inside a model (called Highway) to the HttpResponse of that model after serializing with the geojson serializer, without success.
I'm trying to find a solution by going through the source code, as no errors are raised and the property does not appear in the HttpResponse. I might however be complicating things, and hopefully another pair of eyes can help. I'm open to other suggestions, for example updating the Highway each time a Location is added/modified.
The item appears correctly when passing it to the admin site and all other fields (not shown below) work as intended.
PS. I'm quite new to the whole Django system. Thank you!
Django version: 2.1
Relevant links:
https://docs.djangoproject.com/en/2.1/ref/contrib/gis/serializers/
https://github.com/django/django/blob/master/django/contrib/gis/serializers/geojson.py
Minified code
geom.models.py
class Highway(models.Model):
    name = models.CharField(max_length=50, unique=True)
    length = models.IntegerField("Length in meters")
    mline = models.MultiLineStringField("Geometry", srid=4326)

    def cameras_per_km(self):
        # THIS IS THE FUNCTION I AM TRYING TO SEND TO THE GEOJSON SERIALIZER
        count = self.location_set.filter(location__icontains="camera").count()
        return round(count / (self.length / 1000), 1)
    cameras_per_km.short_description = "Cameras/km"

class Location(models.Model):
    location = models.CharField(max_length=30, unique=True)
    highway = models.ForeignKey(Highway, null=True, on_delete=models.SET_NULL)
    point = models.PointField()
geom.admin.py (when passing the func to list_display, the item appears correctly in the admin)
class HighwayAdmin(LeafletGeoAdmin):
    list_display = ('name', 'cameras_per_km')

admin.site.register(
    Highway,
    HighwayAdmin,
)
geom.views.py (this doesn't)
def geojson_highway_all(request):
    geojsonformat = serialize('geojson',
        Highway.objects.all(),
        geometry_field='mline',
        fields=(
            'pk',
            'cameras_per_km'  # <-- I want this to show
        )
    )
    return HttpResponse(geojsonformat)
geom.urls.py
from django.urls import path
from . import views

app_name = 'geom'

urlpatterns = [
    path('highways.geojson', views.geojson_highway_all, name='highways.geojson'),
]
Update: I (Anton vBR) have now rewritten this answer completely, but I think @ruddra deserves some credit. I am however open to alternative solutions. Hopefully this answer can help someone out in the future.
Create a new Serializer based on geojson serializer
geom.views.py
from django.contrib.gis.serializers.geojson import Serializer

class CustomSerializer(Serializer):
    def end_object(self, obj):
        for field in self.selected_fields:
            if field == self.geometry_field or field == 'pk':
                continue
            elif field in self._current.keys():
                continue
            else:
                try:
                    self._current[field] = getattr(obj, field)()
                except AttributeError:
                    pass
        super(CustomSerializer, self).end_object(obj)

geojsonformat = CustomSerializer().serialize(
    Highway.objects.all(),
    geometry_field='mline',
    fields=(
        'pk',
        'cameras_per_km'
    )
)
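Plugged back into the original view, this would look roughly like the following (the content_type argument is my own addition):

    from django.http import HttpResponse

    def geojson_highway_all(request):
        geojsonformat = CustomSerializer().serialize(
            Highway.objects.all(),
            geometry_field='mline',
            fields=('pk', 'cameras_per_km'),
        )
        return HttpResponse(geojsonformat, content_type='application/json')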
Update
I have filed a feature request. The idea is to pass on the IntegrityError produced by the database when unique or unique_together rejects a record that already exists in the database.
I have the following model:
class Compositions(models.Model):
    composer_key = models.ForeignKey(
        Composer,
    )
    composition = models.CharField(
        max_length=383,
    )

    class Meta(object):
        unique_together = (('composer_key', 'composition'), )
Using django-import-export in the admin interface, without providing an id for each entry in the csv file, ... if one pair in the csv file already exists, the procedure will be interrupted with an integrity error:
duplicate key value violates unique constraint "data_compositions_composer_key_id_12f91ce7dbac16bf_uniq"
DETAIL: Key (composer_key_id, composition)=(2, Star Wars) already exists.
The CSV file is the following:
id    composer_key    composition
      1               Hot Stuff
      2               Star Wars
The idea was to use skip_row and implement it in the admin.
admin.py:
class CompositionsResource(resources.ModelResource):
    class Meta:
        model = Compositions
        skip_unchanged = True
        report_skipped = True

class CompositionsAdmin(ImportExportModelAdmin):
    resource_class = CompositionsResource

admin.site.register(Compositions, CompositionsAdmin)
This will not cure the problem, however, because skip_row expects an id in the csv file in order to check whether each row is the same as one very specific database entry.
Considering that this check can be performed by the database when using unique(_together), would it not be effective to catch this error and then return skip_row = True, or alternatively pass on this error?
Only one change is needed, and you can use django-import-export.
models.py
class Compositions(models.Model):
    composer_key = models.ForeignKey(
        Composer,
    )
    composition = models.CharField(
        max_length=383,
        unique=False
    )
    date_created = models.DateTimeField(default=timezone.now)

    class Meta(object):
        unique_together = (('composer_key', 'composition'),)
Override save_instance with a try block, and ignore the error when it fails.
admin.py
class CompositionsResource(resources.ModelResource):
    class Meta:
        model = Compositions
        skip_unchanged = True
        report_skipped = True

    def save_instance(self, instance, using_transactions=True, dry_run=False):
        try:
            super(CompositionsResource, self).save_instance(instance, using_transactions, dry_run)
        except IntegrityError:
            pass

class CompositionsAdmin(ImportExportModelAdmin):
    resource_class = CompositionsResource

admin.site.register(Compositions, CompositionsAdmin)
and import this
from django.db import IntegrityError
A note on the accepted answer: it will give the desired result, but will slam the disk usage and time with large files.
A more efficient approach I've been using (after spending a lot of time going through the docs) is to override skip_row and use a set of tuples as a unique constraint as part of the class. I still override save_instance, as the other answer suggests, to handle IntegrityErrors that get through, of course.
Python sets don't create duplicate entries, so they seem appropriate for this kind of unique index.
class CompositionsResource(resources.ModelResource):
    set_unique = set()

    class Meta:
        model = Composers
        skip_unchanged = True
        report_skipped = True

    def before_import(self, dataset, using_transactions, dry_run, **kwargs):
        # Clear out anything that may be there from a dry_run,
        # such as the admin mixin preview
        self.set_unique = set()

    def skip_row(self, instance, original):
        composer_key = instance.composer_key  # Could also use composer_key_id
        composition = instance.composition
        tuple_unique = (composer_key, composition)

        if tuple_unique in self.set_unique:
            return True
        else:
            self.set_unique.add(tuple_unique)

        return super(CompositionsResource, self).skip_row(instance, original)

    # save_instance override should still go here to pass on IntegrityError
This approach will at least cut down on duplicates encountered within the same dataset. I used it to deal with multiple flat files that were ~60000 lines each, but had lots of repetitive/nested foreign keys. This made that initial data import way faster.
models.py:
class Compositions(models.Model):
    composer_key = models.ForeignKey(
        Composer,
    )
    composition = models.CharField(
        max_length=383,
        unique=False
    )
    date_created = models.DateTimeField(default=timezone.now)

    class Meta(object):
        unique_together = (('composer_key', 'composition'),)
This is a script I have written 'on the fly' for the above model in order to automatically discard duplicate entries. I have saved it to ./project_name/csv.py and import it from the shell after I fill the relevant columns of the file duc.csv with data. The columns should not contain headers, only data.
$./manage.py shell
>>> from project_name import csv
csv.py:
from data.models import Composer, Compositions
import csv
import sys, traceback
from django.utils import timezone

filename = '/path/to/duc.csv'

with open(filename, newline='') as csvfile:
    all_lines = csv.reader(csvfile, delimiter=',', quotechar='"')
    for each_line in all_lines:
        print(each_line)
        try:
            instance = Compositions(
                id=None,
                date_created=timezone.now(),
                composer_key=Composer.objects.get(id=each_line[2]),
                composition=each_line[3]
            )
            instance.save()
            print("Saved composition: {0}".format(each_line[3]))
        except:  # exception type must be inserted here
            exc_type, exc_value, exc_traceback = sys.exc_info()  # debugging mostly
            print(exc_value)
So I needed to add some model-translation support for my DRF API, and I started using django-hvad.
It seems to work well with my Django application, but I am getting some issues with the DRF API.
I am trying to create a simple POST request and I am getting an error:
Accessing a translated field requires that the instance has a translation loaded, or a valid translation in current language (en) loadable from the database
Here are my models, serializers and viewsets:
Model:
class Mission(TranslatableModel):
    translations = TranslatedFields(
        mission=models.CharField(max_length=255, help_text="Mission name"),
    )

    def __unicode__(self):
        return self.lazy_translation_getter('mission', str(self.pk))
Serializer:
class MissionSerializer(serializers.ModelSerializer):
    mission = serializers.CharField(source='mission')

    class Meta:
        model = Mission
Viewset:
class MissionViewSet(viewsets.ModelViewSet):
    queryset = Mission.objects.language().all()
    serializer_class = MissionSerializer
    authentication_classes = (NoAuthentication,)
    permission_classes = (AllowAny,)

    def get_queryset(self):
        # Set language for translations
        user_language = self.request.GET.get('language')
        if user_language:
            translation.activate(user_language)
        return Mission.objects.language().all()
Does anyone know how I can get around this? I am also open to other suggested apps known to work, but I would really like to have this one working.
I got this to work thanks to Spectras here: https://github.com/KristianOellegaard/django-hvad/issues/211
The issue, I guess, is that DRF tries to do some introspection on the model. I do use DRF in a project of mine, on a TranslatableModel; it needs some glue to work properly. I once suggested adding that to hvad, but we concluded that it would be overextending the feature set. Maybe another module some day, but I don't have enough time to maintain both hvad and that.
It's been some time since I implemented it, so here it is as is:
# hvad compatibility for rest_framework - JHA
class TranslatableModelSerializerOptions(serializers.ModelSerializerOptions):
    def __init__(self, meta):
        super(TranslatableModelSerializerOptions, self).__init__(meta)
        # We need this ugly hack as ModelSerializer hardcodes a read_only_fields check
        self.translated_read_only_fields = getattr(meta, 'translated_read_only_fields', ())
        self.translated_write_only_fields = getattr(meta, 'translated_write_only_fields', ())

class HyperlinkedTranslatableModelSerializerOptions(serializers.HyperlinkedModelSerializerOptions):
    def __init__(self, meta):
        super(HyperlinkedTranslatableModelSerializerOptions, self).__init__(meta)
        # We need this ugly hack as ModelSerializer hardcodes a read_only_fields check
        self.translated_read_only_fields = getattr(meta, 'translated_read_only_fields', ())
        self.translated_write_only_fields = getattr(meta, 'translated_write_only_fields', ())

class TranslatableModelMixin(object):
    def get_default_fields(self):
        fields = super(TranslatableModelMixin, self).get_default_fields()
        fields.update(self._get_translated_fields())
        return fields

    def _get_translated_fields(self):
        ret = OrderedDict()
        trans_model = self.opts.model._meta.translations_model
        opts = trans_model._meta

        forward_rels = [field for field in opts.fields
                        if field.serialize and not field.name in ('id', 'master')]

        for trans_field in forward_rels:
            if trans_field.rel:
                raise RuntimeError()
            field = self.get_field(trans_field)
            if field:
                ret[trans_field.name] = field

        for field_name in self.opts.translated_read_only_fields:
            assert field_name in ret
            ret[field_name].read_only = True

        for field_name in self.opts.translated_write_only_fields:
            assert field_name in ret
            ret[field_name].write_only = True

        return ret

    def restore_object(self, attrs, instance=None):
        new_attrs = attrs.copy()
        lang = attrs['language_code']
        del new_attrs['language_code']

        if instance is None:
            # create an empty instance, pre-translated
            instance = self.opts.model()
            instance.translate(lang)
        else:
            # check we are updating the correct translation
            tcache = self.opts.model._meta.translations_cache
            translation = getattr(instance, tcache, None)
            if not translation or translation.language_code != lang:
                # nope, get the translation we are updating, or create it if needed
                try:
                    translation = instance.translations.get_language(lang)
                except instance.translations.model.DoesNotExist:
                    instance.translate(lang)
                else:
                    setattr(instance, tcache, translation)

        return super(TranslatableModelMixin, self).restore_object(new_attrs, instance)

class TranslatableModelSerializer(TranslatableModelMixin, serializers.ModelSerializer):
    _options_class = TranslatableModelSerializerOptions

class HyperlinkedTranslatableModelSerializer(TranslatableModelMixin,
                                             serializers.HyperlinkedModelSerializer):
    _options_class = HyperlinkedTranslatableModelSerializerOptions
From there, you just inherit your serializers from TranslatableModelSerializer or HyperlinkedTranslatableModelSerializer. When POSTing, you should simply add language_code as a normal field as part of your JSON / XML / whatever.
The main trick is in the restore_object method. Object creation needs to include translation loading.
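For illustration, a hedged sketch of what a serializer and a POST payload might look like with these base classes, reusing the Mission model from the question:

    class MissionSerializer(TranslatableModelSerializer):
        class Meta:
            model = Mission

    # Example POST body -- language_code travels as an ordinary field:
    # {
    #     "mission": "Explore the canyon",
    #     "language_code": "en"
    # }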
I would like to reflect on a model and list all its backward generic relations.
My model looks like this:
class Service(models.Model):
    host = models.ForeignKey(Host)
    statuses = generic.GenericRelation(Status)
The Status object looks like this:
class Status(TrackedModel):
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey()

    class Meta:
        verbose_name_plural = 'statuses'
I would like to programmatically learn that statuses is a generic relation for the Service model. Is this possible? Service._meta.fields does not show statuses, but Service._meta.get_all_field_names() does, only it shows other unwanted things too.
I thought that this might be a possible solution, but it seems really messy to me. I'd love to hear of a better one.
from django.db.models.fields import FieldDoesNotExist
from django.contrib.contenttypes import generic

generic_relations = []
for field_name in Service._meta.get_all_field_names():
    try:
        field = Service._meta.get_field(field_name)
    except FieldDoesNotExist:
        continue
    if isinstance(field, generic.GenericRelation):
        generic_relations.append(field)
Thank you!
GenericRelation works similarly to ManyToManyField. You can find it in Service._meta.many_to_many:
filter(lambda f:isinstance(f, generic.GenericRelation), Service._meta.many_to_many)
UPDATE 2021:
To list all the GenericRelations() fields:
print(Service._meta.private_fields)
Output:
[<django.contrib.contenttypes.fields.GenericRelation: statuses>]
Nevertheless, if you have more fields with a GenericRelation() relationship, they will also be shown in the output list.
Check the documentation:
https://docs.djangoproject.com/en/3.2/releases/1.10/#id3
Or you can return all of the fields that have a GenericRelation() field type.
ex:
# models.py
class MyModel(models.Model):
    . . .
    my_model_field = GenericRelation(OtherPolyModel)

    def get_generic_relation_fields(self):
        """
        This function returns all the GenericRelation
        fields needed to return the values that are
        related to a polymorphic model.
        """
        fields = [f.attname for f in self.Meta.model._meta.get_fields()]
        file_fields = []
        for field in fields:
            get_type = self.Meta.model._meta.get_field(field)
            field_type = get_type.__class__.__name__
            if field_type == "GenericRelation":
                file_fields.append(field)
        return file_fields
Output:
['my_model_field']
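If you prefer the isinstance check from the question over comparing class names, a small sketch of an equivalent helper could look like this (the function name is my own):

    from django.contrib.contenttypes.fields import GenericRelation

    def generic_relation_names(model):
        """Return the names of all GenericRelation fields declared on the model."""
        return [
            f.name
            for f in model._meta.get_fields()
            if isinstance(f, GenericRelation)
        ]

    # e.g. generic_relation_names(Service) -> ['statuses']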