An efficient way to save parsed XML content to Django Model - python

This is my first question so I will do my best to conform to the question guidelines. I'm also learning how to code so please ELI5.
I'm working on a django project that parses XML to django models. Specifically Podcast XMLs.
I currently have this code in my model:
    from django.db import models
    import feedparser

    class Channel(models.Model):
        channel_title = models.CharField(max_length=100)

        def __str__(self):
            return self.channel_title

    class Item(models.Model):
        channel = models.ForeignKey(Channel, on_delete=models.CASCADE)
        item_title = models.CharField(max_length=100)

        def __str__(self):
            return self.item_title

    radiolab = feedparser.parse('radiolab.xml')
    if Channel.objects.filter(channel_title = 'Radiolab').exists():
        pass
    else:
        channel_title= radiolab.feed.title
        a = Channel.objects.create(channel_title=channel_title)
        a.save()
    for episode in radiolab.entries:
        item_title = episode.title
        channel_title = Channel.objects.get(channel_title="Radiolab")
        b = Item.objects.create(channel=channel_title, item_title=item_title)
        b.save()
radiolab.xml is a feed I've saved locally from the Radiolab podcast feed.
Because this code runs whenever I run python manage.py runserver, the parsed XML content is sent to my database just as I want, but this happens every time I start the server, which creates duplicate records.
I'd love some help finding a way to make this happen just once, and also a DRY mechanism for adding different feeds so they're parsed and saved to the database, preferably with the feed URL submitted via a form.

If you don't want it to run every time, don't put it in models.py. The only things that belong there are the model definitions themselves.
Code that runs in response to a user action on the site goes in a view. Or, if you want this to be done from the admin site, it should go in the admin.py file.
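A minimal sketch of what such a view might call, assuming the Channel and Item models from the question. import_feed is a hypothetical helper, not part of Django or feedparser. It relies on the manager method get_or_create, which returns an (object, created) tuple, so importing the same feed twice should add nothing the second time. The models are passed in as arguments only to keep the sketch free of project imports.

```python
def import_feed(parsed, channel_model, item_model):
    """Store one parsed feed, creating only the rows that are missing.

    parsed is the result of feedparser.parse(...); channel_model and
    item_model are the Channel and Item model classes from the question.
    Returns the number of newly created items.
    """
    # get_or_create returns (object, created); the channel is reused if present
    channel, _ = channel_model.objects.get_or_create(
        channel_title=parsed.feed.title
    )
    created_count = 0
    for entry in parsed.entries:
        _, was_created = item_model.objects.get_or_create(
            channel=channel, item_title=entry.title
        )
        created_count += int(was_created)
    return created_count
```

In a view this might be called as import_feed(feedparser.parse(url), Channel, Item), where url comes from a submitted form; the form and its field name are assumptions, since the question does not show them.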

Related

Django 3 models.Q - app_label gets displayed inside html

Since I upgraded to Django 3.x I have seen some strange behaviour.
Imagine the following field in your models.py:
content_type = models.ForeignKey(ContentType, limit_choices_to=filter_choice, on_delete=models.CASCADE, null=True, blank=True)
which refers to:
filter_choice = models.Q(app_label='App', model='model_x') | models.Q(app_label='App', model='model_y')
If I now display the content_type field in my HTML templates it looks like this: "App| Model Y", which looks quite stupid. The same goes for the Django admin. Is this a bug? I'm asking because on Django 2.2.7 (the latest version of 2.x) I didn't have this behaviour, and only model_x and model_y were displayed, as expected.
It would be awesome if only model_x and model_y were displayed, without their app labels. Is there any solution for this, maybe a new option that comes with Django 3.x?
Thanks in advance :)
If I now display the content_type field on my html templates it look like this: "App| Model Y" which looks quite stupid.
This is how the __str__ of a ContentType is implemented. Indeed, if we take a look at the source code [GitHub], we see:
    class ContentType(models.Model):
        # …

        def __str__(self):
            return self.app_labeled_name

        # …

        @property
        def app_labeled_name(self):
            model = self.model_class()
            if not model:
                return self.model
            return '%s | %s' % (model._meta.app_label, model._meta.verbose_name)
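If you want to render only the model name, note that Django templates do not allow attribute lookups that start with an underscore, so _meta cannot be reached from a template directly. You can instead use the name property of the ContentType, which returns the verbose name of the model:
    {{ object.content_type.name }}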
It makes sense to include the app label, since the same model name can be used in different apps, hence it is possible that your project has two Model Ys, in two different apps.
Furthermore it is not very common to render a ContentType in the template. Normally this is part of the technical details of your project, that you likely do not want to expose. If you need to show the type of the object in a GenericForeignKey, you can simply follow the GenericForeignKey, and render the ._meta.verbose_name of that object.

Write to Django DB from lists if criteria from database are met?

New to Django, so bear with me. I am working on a simple link aggregator site. I have a script that pulls links and associated info (titles, date, etc) from xml files and stores them as lists. This is a file called scraper.py and is under my project app folder news.
scraper.py generates a series of lists from XML files. The scraper.py code is essentially as follows:
    def MakeLists():
        ###lots of code to get to this point###
        ###returns the following series of lists###
        return Article_date, Article_link, Article_vote, Article_title, Article_publisher
These outputs correspond to my Django models.py file, which is as follows:
    class Article(models.Model):
        title = models.TextField()
        publisher = models.URLField()
        link = models.URLField()
        date = models.DateField()
        pull_date = models.DateTimeField(auto_now=True)
        vote = models.IntegerField(default=1)
And here is the view that makes my home page, with my latest attempt at getting the new scraped data into my db (data is gathered in MakeLists()):
    class ArticleList(ListView):
        model = Article
        context_object_name = 'Articles'
        pull_date = Article.objects.aggregate(Max('pull_date'))

        def get_new_db_stuff(self):
            check_time = datetime.datetime.now()-datetime.timedelta(hours=4)
            if pull_date > check_time: #i.e., more than 4 hours ago
                Article_date, Article_link, Article_vote, Article_title, Article_publisher = MakeLists()
                for i in range(0, len(Article_link)):
                    if Article.object.filter(link=Article_link[i]).exists()==False:
                        a = Article(link=Article_link[i], date=Article_date[i], vote=Article_vote[i],
                                    title=Article_title[i], publisher = Article_publisher[i])
                        a.save()
The issue is that it just doesn't seem to be doing anything: nothing is being written to the DB. There aren't any errors popping up when I runserver or when I click on pages.
Questions:
1. How do I check if anything is being done? i.e., figure out if variables are being created etc.?
2. My thought is that I am botching the query, but how can I troubleshoot that?
This is the wrong place for this kind of logic. Nothing is written because nothing ever calls get_new_db_stuff; a ListView only runs its own methods. It is better to put all of the code in a separate file with the following folder structure:
app_name/management/commands/file_name.py
With this structure, it can either be run with python manage.py file_name or be set up as a cron job (better).
As for the comparison,
    if Article.objects.filter(link=Article_link[i]).exists() == False:
works just fine (note objects, not object as in the question).
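The two decisions the command has to make (is the data old enough to refresh, and which scraped rows are new) can be sketched as plain functions. This is only an illustration: is_stale and rows_to_insert are hypothetical names, and the four-hour window is taken from the question. Note that aggregate(Max('pull_date')) returns a dict such as {'pull_date__max': ...} rather than a datetime, and that the question's test pull_date > check_time is true when the last pull is newer than four hours, not older.

```python
from datetime import datetime, timedelta


def is_stale(last_pull, now, max_age=timedelta(hours=4)):
    """True when nothing has been pulled yet or the last pull is older than max_age."""
    return last_pull is None or now - last_pull > max_age


def rows_to_insert(dates, links, votes, titles, publishers, known_links):
    """Zip the parallel lists from MakeLists() into dicts, skipping stored links."""
    seen = set(known_links)
    rows = []
    for date, link, vote, title, publisher in zip(
        dates, links, votes, titles, publishers
    ):
        if link in seen:
            continue
        seen.add(link)  # also drops duplicates within the same scrape
        rows.append(
            {"date": date, "link": link, "vote": vote,
             "title": title, "publisher": publisher}
        )
    return rows
```

Inside the command's handle() method, known_links could come from Article.objects.values_list('link', flat=True), and each returned dict could be saved with Article(**row).save().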

How to retrieve a single record using icontains in django

I'm trying to query a single record using the following but I get a 500 error.
    cdtools = CdTool.objects.get(product=product_record.id,
                                 platform=platform_record.id, url__icontains=toolname)
models.py
    class CdTool(models.Model):
        product = models.ForeignKey(Product)
        platform = models.ForeignKey(Platform)
        url = models.CharField(max_length=300)
A model instance has a full URL, but in my query 'toolname' is the short form (i.e. google instead of https://google.com), so I'm trying to figure out what the correct query would be here. I'm open to modifying models.py as long as no data migration is involved.

How to handle concurrency with django queryset get method?

I'm using Django's (1.5 with MySQL) select_for_update method to fetch data from one model and serve it to the user on request, but when two users make requests simultaneously it returns the same data to both of them. See the sample code below.
models.py
    class SaveAccessCode(models.Model):
        code = models.CharField(max_length=10)

    class AccessCode(models.Model):
        code = models.CharField(max_length=10)
        state = models.CharField(max_length=10, default='OPEN')
views.py
    def view(request, code):
        # for example code = 234567
        acccess_code = AccessCode.objects.select_for_update().filter(
            code=code, state='OPEN')
        acccess_code.delete()
        SaveAccessCode.objects.create(code=code)
        return
Concurrent requests will generate two SaveAccessCode records with the same code. Please guide me on how to handle this scenario in a better way.
You need to set a flag on the model (here an extra is_locked field) when doing select_for_update, something like:
    obj = qs.first()
    obj.is_locked = True
    obj.save()
and before that you should select like this:
    qs = self.select_for_update().filter(state='OPEN', is_locked=False).order_by('id')
Then, after the user (I presume) has done something with it and saved, set the flag back to is_locked=False and save.
Also make fetch_available_code a @staticmethod.
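A different technique from the flag described above, shown only as a sketch: let a single conditional UPDATE decide which request wins, and check how many rows it changed. The example uses the standard-library sqlite3 module so that it runs on its own; the table names, the 'USED' state and the claim_code helper are assumptions made for illustration, and the row is marked as used rather than deleted as in the question. In Django, QuerySet.update() likewise returns the number of rows it matched, so the same check can be written against AccessCode.objects.filter(code=code, state='OPEN').

```python
import sqlite3


def claim_code(conn, code):
    """Return True only for the caller that flips the row from OPEN to USED."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE access_code SET state = 'USED' "
            "WHERE code = ? AND state = 'OPEN'",
            (code,),
        )
        if cur.rowcount != 1:
            return False  # someone else already claimed this code
        conn.execute("INSERT INTO save_access_code (code) VALUES (?)", (code,))
        return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_code (code TEXT, state TEXT DEFAULT 'OPEN')")
conn.execute("CREATE TABLE save_access_code (code TEXT)")
conn.execute("INSERT INTO access_code (code) VALUES ('234567')")
conn.commit()

first = claim_code(conn, "234567")   # the first request claims the code
second = claim_code(conn, "234567")  # a repeat request finds nothing to claim
```

Because the check and the state change happen in one statement, two requests cannot both see the row as OPEN and both proceed.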

How do I cascade delete in this scenario using MongoEngine?

I have this simple model:
    from mongoengine import *
    from datetime import datetime

    class Person(Document):
        firstname = StringField(required=True)

        @property
        def comments(self):
            return Comment.objects(author=self).all()

    class Comment(Document):
        text = StringField(required=True)
        timestamp = DateTimeField(required=True, default=datetime.now())
        author = ReferenceField('Person', required=True, reverse_delete_rule=CASCADE)

    class Program(Document):
        title = StringField(required=True)
        comments = ListField(ReferenceField('Comment'))

    class Episode(Document):
        title = StringField(required=True)
        comments = ListField(ReferenceField('Comment'))
As you can see, both Programs and Episodes can have comments. Initially, I tried to embed the comments but I seemed to run into a brick wall. So I'm trying Comments as a Document class instead. My question is, how do I model it so that:
When a Person is deleted, so are all their comments
When a Comment is deleted (either directly or indirectly), it is removed from its parent
When a Program or Episode is deleted, so are the Comment objects
I'm used to doing all this manually in MongoDB (and SQLa, for that matter), but I'm new to MongoEngine and I'm struggling a bit. Any help would be awesome!
Not all of these are possible without writing application code to handle the logic. I would write signals to handle some of the edge cases.
The main issue is that global updates and removes aren't handled, so you'd have to make sure every change goes through the API you write in order to keep the database in a clean state.
