Building object models around external data - python

I want to integrate external data into a Django app. Let's say, for example, I want to work with GitHub issues as if they were formulated as normal models within Django. So underneath these objects, I use the GitHub API to retrieve and store data.
In particular, I also want to be able to reference the GitHub issues from models but not the other way around. I.e., I don't intend to modify or extend the external data directly.
The views would use this abstraction to fetch data, but also to follow the references from "normal objects" to properties of the external data. Simple joins would also be nice to have, but clearly there would be limitations.
Are there any examples of how to achieve this in an idiomatic way?
Ideally, this would also be split into a general part that describes the API as a whole, and a declarative part for the classes, similar to how normal ORM classes are described.

If you want to use a Django model-like interface for your GitHub issues, why not use real Django models? You can, for example, add a method such as fetch_data to your model that loads data from the remote API and stores it on the model. That way you only make external requests when you need them, not everywhere in your code. A minimal example looks like this:
import requests
from django.db import models

from .exceptions import GithubAPIError


class GithubRepo(models.Model):
    api_url = models.URLField()  # e.g. https://api.github.com/repos/octocat/Hello-World


class GithubIssue(models.Model):
    issue_id = models.IntegerField()
    repo = models.ForeignKey(GithubRepo, on_delete=models.CASCADE)
    node_id = models.CharField(max_length=100)
    title = models.CharField(max_length=255, null=True, blank=True)
    body = models.TextField(null=True, blank=True)
    # ... other fields ...

    class Meta:
        unique_together = [["issue_id", "repo"]]

    @property
    def url(self):
        return f"{self.repo.api_url}/issues/{self.issue_id}"

    def fetch_data(self):
        response = requests.get(self.url)
        if response.status_code != 200:
            raise GithubAPIError("Something went wrong")
        data = response.json()
        # populate fields from the response
        self.title = data["title"]
        self.body = data["body"]

    def save(self, *args, **kwargs):
        if self.pk is None:  # fetch when the issue is first created
            self.fetch_data()
        super().save(*args, **kwargs)
You can also write a custom Manager for your model that will fetch data every time you call a create method - GithubIssue.objects.create()

The Django way in this case would be to write a custom "db" backend.
This repo looks abandoned, but it can still give you some ideas.

I would suggest just using normal OOP principles (polymorphism, association, etc.) to get a feel similar to real models.
But I'm not sure I would try to simulate the ORM's behavior as closely as possible, because the ORM is specifically designed for database interaction. I would just write my own custom methods.
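A minimal sketch of that plain-OOP approach, assuming a hypothetical IssueClient class and an injected fetch callable (neither comes from the answers above). The client is the general part that knows the API, and the small Issue dataclass is the descriptive part. The fake_fetch function stands in for a real HTTP call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical fetcher signature: takes a URL, returns the decoded JSON payload.
Fetcher = Callable[[str], Dict[str, Any]]


@dataclass(frozen=True)
class Issue:
    """Read-only view of one remote issue; this is not a Django model."""
    number: int
    title: str
    body: str


class IssueClient:
    """Small API layer: builds URLs and turns payloads into Issue objects."""

    def __init__(self, repo_api_url: str, fetch: Fetcher) -> None:
        self.repo_api_url = repo_api_url.rstrip("/")
        self._fetch = fetch
        self._cache: Dict[int, Issue] = {}

    def get(self, number: int) -> Issue:
        # Only hit the remote API the first time an issue is requested.
        if number not in self._cache:
            data = self._fetch(f"{self.repo_api_url}/issues/{number}")
            self._cache[number] = Issue(
                number=number,
                title=data["title"],
                body=data.get("body") or "",
            )
        return self._cache[number]


def fake_fetch(url: str) -> Dict[str, Any]:
    # Stand-in for a real HTTP call such as requests.get(url).json().
    return {"title": f"Issue at {url}", "body": None}


client = IssueClient("https://api.github.com/repos/octocat/Hello-World", fake_fetch)
issue = client.get(1)
print(issue.title)
```

A normal Django model could then store just the issue number and look the issue up through such a client in a property, which keeps the reference one-directional as the question asks.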

Related

Need architectural advice on how to handle and/or avoid circular imports when using Django/DRF

I would like advice regarding an architectural problem I've encountered many times now.
I have a Model Event in events.py:
# models.py
import events.serializers


class Event(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=255, db_index=True)
    ...

    def save(self, *args, **kwargs):
        is_new = not self.pk
        super().save(*args, **kwargs)
        if is_new:
            self.notify_organiser_followers()

    def notify_organiser_followers(self):
        if self.organiser:
            event_data = events.serializers.BaseEventSerializer(self).data
            payload = {
                'title': f'New event by {self.organiser.name}',
                'body': f'{self.organiser.name} has just created a new event: {self.name}',
                'data': {'event': event_data},
            }
            send_fcm_messages(self.organiser.followers.all(), payload)
The model has a serializer called BaseEventSerializer. In the save method, I use notify_organiser_followers and in the process serialize the current event being saved. To do this, I have to import BaseEventSerializer.
Here's how the code of events.serializers looks:
# serializers.py
import events.models


class EventTrackSerializer(serializers.ModelSerializer):
    class Meta:
        model = events.models.EventTrack
        fields = ('id', 'name', 'color')


class BaseEventSerializer(serializers.ModelSerializer):
    event_type = serializers.CharField(source='get_event_type_display')
    locations = serializers.SerializerMethodField()
As you can see, serializers.py has to import models to use them in ModelSerializers. In this step, I end up with an obvious circular import.
The way I solved this was by importing BaseEventSerializer locally in the notify_organiser_followers function:
def notify_organiser_followers(self):
    if self.organiser:
        from events.serializers import BaseEventSerializer
This eliminates the issue, but I would really like to avoid it, especially because I would have to apply the same fix in multiple spots in my repo. Another approach I thought of would be to separate 'regular' serializers and model serializers into separate files. However, this still feels like it's only healing the symptom and not the cause.
What I would like is advice on how to avoid this situation altogether. I had the same problem when importing two different apps that use each other's serializer. E.g. User serializes Events he's attending, Event serializes its attendees. How would you go about decoupling these two models? It seems that model relationships often force me into these situations and avoiding circular imports becomes really hard.
I would also appreciate if you had any larger Django/DRF Github projects showcasing how this was avoided, since this issue keeps popping up for me as soon as my application gets large enough.
There are a few different ways to do this, and you will likely end up combining strategies.
The two I would recommend here are Django signals and a service class.
First, use Django signals to trigger specific actions after a model event. Django has built-in signals for things like a save on a model (post_save, for example), and you can run different logic in response. This lets you decouple your event layer from your model.
As for the service class (the singleton piece):
I think a model's method should not perform actions that are not related to the model itself. It would be best to create some kind of generic service as a singleton, something like
class EventService:
    def do_event_related_logic(self, event):
        ...

event_service = EventService()
and you can simply import the event_service, and use that class for all event related logic. In this case, you could call one of your singleton's methods on a post_save event.
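To show how the two pieces fit together, here is a framework-free sketch. The Signal class and the post_save object below are hypothetical stand-ins written for this example, not Django's own; in a real project you would connect a receiver to django.db.models.signals.post_save instead. The point is that the model only announces the save, while the service owns the side effect, so the model module never has to import serializers.

```python
from typing import Callable, List


class Signal:
    """Tiny stand-in for a Django-style signal: receivers register, senders fire."""

    def __init__(self) -> None:
        self._receivers: List[Callable[..., None]] = []

    def connect(self, receiver):
        self._receivers.append(receiver)
        return receiver  # returning it lets connect() be used as a decorator

    def send(self, sender, **kwargs) -> None:
        for receiver in self._receivers:
            receiver(sender=sender, **kwargs)


post_save = Signal()  # hypothetical stand-in, not django.db.models.signals.post_save


class Event:
    """Model stand-in: it only announces that it was saved."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pk = None

    def save(self) -> None:
        created = self.pk is None
        self.pk = 1  # pretend the database assigned a primary key
        post_save.send(sender=type(self), instance=self, created=created)


class EventService:
    """Owns event-related side effects, so the model stays free of them."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def notify_followers(self, event: Event) -> None:
        # A real implementation would serialize the event and push a notification.
        self.sent.append(f"New event: {event.name}")


event_service = EventService()


@post_save.connect
def on_event_saved(sender, instance, created, **kwargs):
    # Only react to newly created events, mirroring the is_new check in save().
    if sender is Event and created:
        event_service.notify_followers(instance)


event = Event("PyCon")
event.save()
event.save()  # second save is an update, so nothing is sent
print(event_service.sent)
```

With this layout the serializer import lives in the service (or receiver) module, which depends on both models and serializers, while models depend on neither.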

Selecting only active records django query

In my Django project, I have an is_active boolean column in every table of my database. Every time I or the framework accesses the database, I want only the active records to show up. What is the standard way to achieve this? I certainly don't want to check for is_active in every query I make.
The easiest way to do this is to create a custom model manager, like this:
class OnlyActiveManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset().filter(is_active=True)
Then, add it to your models:
class MyModel(models.Model):
    objects = models.Manager()
    active = OnlyActiveManager()
Next, use it like this:
foo = MyModel.active.all()
You can also use it to replace the default manager (called objects), but then you'll have to write custom queries to get the records that are inactive.
You can write a manager class for your model. A sample model manager is given below; for more, refer to the official Django website.
class MediaQuerySet(models.QuerySet):
    def active(self):
        return self.filter(is_active=True)

class MediaManager(models.Manager):
    def get_queryset(self):
        return MediaQuerySet(self.model, using=self._db)

    def active(self):
        return self.get_queryset().active()

class Media(models.Model):
    # ... model fields ...
    objects = MediaManager()
The query should be like
media = Media.objects.active()
You can do this with Django model managers.
Please check the Django documentation for details.

How to add custom field to mongoengine model?

I have a Project model as follows:
class Project(me.Document):
    title = me.StringField(max_length=64, required=True, unique=True)
    # Pass the callable, not its result, so the default is evaluated per document.
    start_date = me.DateTimeField(default=datetime.utcnow)
    end_date = me.DateTimeField(default=datetime.utcnow)
    duration = me.IntField()  # sprint duration
    sequence = me.IntField()

    def __init__(self, *args, **values):
        super().__init__(*args, **values)

    def __str__(self):
        return self.title

    def get_current_sprint(self):
        '''A logic here to calculate the current sprint.'''
And another model, Sprint:
class Sprint(me.Document):
    start_date = me.DateTimeField()
    end_date = me.DateTimeField()
    sequence = me.IntField(required=True, default=0, unique_with='project')
    project = me.ReferenceField('Project')
If I have project instance then I can get current sprint by calling the method as
project.get_current_sprint()
But what I am trying to do is this: whenever a project object is queried, rather than calling a method to get the current sprint, it should have an attribute project.current_sprint that holds the current sprint info.
Is there a way to achieve it?
Any help would be really appreciated.
I think the concept of what you're looking for is called Database References in MongoDB.
In MongoEngine, you would probably create a ReferenceField in your Project model, which would reference a Sprint document.
I am trying to achieve something similar, and while I don't know the entire answer, I'll post what I have identified. What you want is presumably enabled by a query set (which you access through Project.objects). Mongoengine creates one, but allows you to replace it, so that when you call Project.objects.get(...), for instance, it could "prefetch" the sprint relevant to this project. How to do that is probably through the mongoengine query syntax, which I'm not yet familiar with.
In the end, it's possible you'll have to combine properties and a cache to achieve what you want. The queried project would have a dynamic reference to a sprint (say project.sprint), and you could have a property on Project that checks whether this data exists (and if not, queries it).
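The "property plus cache" idea can be sketched with functools.cached_property. The classes below are plain-Python stand-ins written for this example, not mongoengine documents, and the in-memory sprint list replaces what would be a Sprint query; whether cached_property behaves the same on a real Document subclass is an assumption you would need to verify.

```python
from dataclasses import dataclass
from datetime import datetime
from functools import cached_property
from typing import List, Optional


@dataclass
class Sprint:
    sequence: int
    start_date: datetime
    end_date: datetime


class Project:
    """Plain-Python stand-in for the Project document."""

    def __init__(self, title: str, sprints: List[Sprint]) -> None:
        self.title = title
        self._sprints = sprints  # stands in for a query on Sprint filtered by project
        self.lookups = 0  # counts how often the lookup actually runs

    def get_current_sprint(self) -> Optional[Sprint]:
        self.lookups += 1
        now = datetime.now()
        for sprint in self._sprints:
            if sprint.start_date <= now <= sprint.end_date:
                return sprint
        return None

    @cached_property
    def current_sprint(self) -> Optional[Sprint]:
        # Computed on first access, then stored on the instance.
        return self.get_current_sprint()


project = Project(
    "Demo",
    [
        Sprint(1, datetime(1999, 1, 1), datetime(1999, 12, 31)),
        Sprint(2, datetime(2000, 1, 1), datetime(2999, 1, 1)),
    ],
)
print(project.current_sprint.sequence)
print(project.current_sprint is project.current_sprint)
```

The attribute reads like a field (project.current_sprint) but the lookup only runs once per instance; deleting the attribute with del project.current_sprint forces a fresh lookup on the next access.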

Django models and Python properties

I've tried to set up a Django model with a python property, like so:
class Post(models.Model):
    _summary = models.TextField(blank=True)
    body = models.TextField()

    @property
    def summary(self):
        if self._summary:
            return self._summary
        else:
            return self.body

    @summary.setter
    def summary(self, value):
        self._summary = value

    @summary.deleter
    def summary(self):
        self._summary = ''
So far so good, and in the console I can interact with the summary property just fine. But when I try to do anything Django-y with this, like Post(title="foo", summary="bar"), it throws a fit. Is there any way to get Django to play nice with Python properties?
Unfortunately, Django models don't play very nice with Python properties. The way it works, the ORM only recognizes the names of field instances in QuerySet filters.
You won't be able to refer to summary in your filters, instead you'll have to use _summary. This gets messy real quick, for example to refer to this field in a multi-table query, you'd have to use something like
User.objects.filter(post___summary__contains="some string")
See https://code.djangoproject.com/ticket/3148 for more detail on property support.

ForeignKey to abstract class (generic relations)

I'm building a personal project with Django, to train myself (because I love Django, but I miss skills). I have the basic requirements, I know Python, I carefully read the Django book twice if not thrice.
My goal is to create a simple monitoring service, with a Django-based web interface allowing me to check status of my "nodes" (servers). Each node has multiple "services". The application checks the availability of each service for each node.
My problem is that I have no idea how to represent different types of services in my database. I thought of two "solutions" :
A single service model, with a "serviceType" field, and a big mess with the fields. (I have no great experience in database modeling, but this looks... "bad" to me.)
Multiple service models. I like this solution, but then I have no idea how I can reference these DIFFERENT services in the same field.
This is a short excerpt from my models.py file : (I removed everything that is not related to this problem)
from django.db import models

# Create your models here.
class service(models.Model):
    port = models.PositiveIntegerField()

    class Meta:
        abstract = True

class sshService(service):
    username = models.CharField(max_length=64)
    pkey = models.TextField()

class telnetService(service):
    username = models.CharField(max_length=64)
    password = models.CharField(max_length=64)

class genericTcpService(service):
    pass

class genericUdpService(service):
    pass

class node(models.Model):
    name = models.CharField(max_length=64)
    # various fields
    services = models.ManyToManyField(service)
Of course, the line with the ManyToManyField is bogus. I have no idea what to put in place of "*Service". I honestly searched for solutions to this, and I heard of "generic relations" and triple-join tables, but I didn't really understand these things.
Moreover, English is not my native language, so coming to database structure and semantics, my knowledge and understanding of what I read is limited (but that's my problem)
For a start, use Django's multi-table inheritance, rather than the abstract model you have currently.
Your code would then become:
from django.db import models

class Service(models.Model):
    port = models.PositiveIntegerField()

class SSHService(Service):
    username = models.CharField(max_length=64)
    pkey = models.TextField()

class TelnetService(Service):
    username = models.CharField(max_length=64)
    password = models.CharField(max_length=64)

class GenericTcpService(Service):
    pass

class GenericUDPService(Service):
    pass

class Node(models.Model):
    name = models.CharField(max_length=64)
    # various fields
    services = models.ManyToManyField(Service)
On the database level, this will create a 'service' table, the rows of which will be linked via one to one relationships with separate tables for each child service.
The only difficulty with this approach is that when you do something like the following:
node = Node.objects.get(pk=node_id)
for service in node.services.all():
    # Do something with the service
The 'service' objects you access in the loop will be of the parent type.
If you know what child type these will have beforehand, you can just access the child class in the following way:
from django.core.exceptions import ObjectDoesNotExist

try:
    telnet_service = service.telnetservice
except (AttributeError, ObjectDoesNotExist):
    # You chose the wrong child type!
    telnet_service = None
If you don't know the child type beforehand, it gets a bit trickier. There are a few hacky/messy solutions, including a 'serviceType' field on the parent model, but a better way, as Joe J mentioned, is to use a 'subclassing queryset'. The InheritanceManager class from django-model-utils is probably the easiest to use. Read the documentation for it here, it's a really nice little bit of code.
I think one approach that you might consider is a "subclassing queryset". Basically, it allows you to query the parent model and it will return instances of the child models in the result queryset. It would let you do queries like:
models.service.objects.all()
and have it return to you results like the following:
[ <sshServiceInstance>, <telnetServiceInstance>, <telnetServiceInstance>, ...]
For some examples on how to do this, check out the links on the blog post linked below.
http://jazstudios.blogspot.com/2009/10/django-model-inheritance-with.html
However, if you use this approach, you shouldn't declare your service model as abstract as you do in the example. Granted, you will be introducing an extra join, but overall I've found the subclassing queryset to work pretty well for returning a mixed set of objects in a queryset.
Anyway, hope this helps,
Joe
If you are looking for generic foreign key relations you should check the Django contenttypes framework (built into Django). The docs pretty much explain how to use it and how to work with generic relations.
An actual service can only be on one node, right? In that case, why not have a field
node = models.ForeignKey('node', related_name='services')
in the service class?
