Regular, timed editing of a text file in Django - python

Every time a row in a certain table is saved/created in my application, I want a text file on the server to be updated in tandem. I've been thinking that this could either be done each time the model's save() method is called, or perhaps just achieved as a regular job every hour, for example.
I can't see a standard Django-y way of actually implementing this; does anyone have a suggestion, or perhaps a better idea?
Thanks very much

Maybe you can use Django signals to write the model changes to your file.
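For example, a post_save handler could append to the file every time the model is saved. A minimal sketch, assuming a model called MyModel and a file path of your choosing (both are placeholders):

from django.db.models.signals import post_save
from django.dispatch import receiver

from myapp.models import MyModel  # hypothetical model

EXPORT_PATH = "/var/data/mymodel_log.txt"  # hypothetical path

@receiver(post_save, sender=MyModel)
def write_change_to_file(sender, instance, created, **kwargs):
    # Append one line per save; 'created' tells you whether this was an insert.
    with open(EXPORT_PATH, "a") as f:
        f.write("%s %s (created=%s)\n" % (instance.pk, instance, created))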

If you're looking for revision support for your models you could always use django-reversion
https://github.com/etianen/django-reversion
This will keep track of all model changes.
If you want it to run every hour instead of on change, I recommend using django-celery to set up a task
https://github.com/ask/django-celery
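If you go the hourly route, the task itself can just rewrite the file from the current table contents. A rough sketch using a current Celery setup (the model, path and schedule names are assumptions, and the django-celery project linked above predates this exact layout):

# tasks.py
from celery import shared_task

@shared_task
def export_table_to_file():
    from myapp.models import MyModel  # hypothetical model
    with open("/var/data/mymodel_export.txt", "w") as f:  # hypothetical path
        for obj in MyModel.objects.all():
            f.write("%s\n" % obj)

# settings.py -- assuming Celery reads Django settings with the CELERY_ namespace;
# this runs the task at the top of every hour.
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "hourly-export": {
        "task": "myapp.tasks.export_table_to_file",
        "schedule": crontab(minute=0),
    },
}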

Related

Django - Best way to create snapshots of objects

I am currently working on a Django 2+ project involving a blockchain, and I want to make copies of some of my object's states into that blockchain.
Basically, I have a model (say "contract") that has a list of several "signature" objects.
I want to make a snapshot of that contract, with the signatures. What I am basically doing is taking the contract at some point in time (when it's created for example) and building a JSON from it.
My problem is: I want to update that snapshot anytime a signature is added/updated/deleted, and each time the contract is modified.
The intuitive solution would be to override each "delete", "create" and "update" of each of the models involved in that snapshot, and pray that all of them are implemented right and that I didn't forget any. But I think this is not scalable at all, and hard to debug and maintain.
I have thought of a solution that might be more centralized: using a periodical job to get the last update date of my object, compare it to the date of my snapshot, and update the snapshot if necessary.
However with that solution, I can identify changes when objects are modified or created, but not when they are deleted.
So, this is my big question mark: how, with Django, can you identify deletions in relationships, without any prior context, just by looking at the current state of the database? Is there a Django module to record deleted objects? What are your thoughts on my issue?
As I understand your problem, what you need is something like Django's signals framework, which listens for changes to your models and, when a change is detected (and any conditions you care about are met), runs whatever code you attach to it.
This is the most recent documentation:
https://docs.djangoproject.com/en/3.1/topics/signals/
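A minimal sketch of that idea for your case, assuming models named Contract and Signature and a rebuild_snapshot() helper (all of these names are placeholders). Hooking post_delete as well as post_save is what lets you catch deletions:

from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

from myapp.models import Contract, Signature  # hypothetical models
from myapp.snapshots import rebuild_snapshot  # hypothetical helper

@receiver(post_save, sender=Signature)
@receiver(post_delete, sender=Signature)
def signature_changed(sender, instance, **kwargs):
    # Fires on create, update and delete, so deletions are not missed.
    rebuild_snapshot(instance.contract)

@receiver(post_save, sender=Contract)
def contract_changed(sender, instance, **kwargs):
    rebuild_snapshot(instance)

One caveat: QuerySet.update() bypasses post_save, so bulk updates would still need separate handling.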

Communicating with the outside world from within an atomic database transaction

I am implementing an import tool (Django 1.6) that takes a potentially very large CSV file, validates it and depending on user confirmation imports it or not. Given the potential large filesize, the processing of the file is done via flowy (a python wrapper over Amazon's SWF). Each import job is saved in a table in the DB and the workflow, which is quite simple and consists of only one activity, basically calls a method that runs the import and saves all necessary information about the processing of the file in the job's record in the database.
The tricky thing is: We now have to make this import atomic. Either all records are saved or none. But one of the things saved in the import table is the progress of the import, which is calculated based on the position of the file reader:
progress = (raw_data.tell() * 100.0) / filesize
And this progress is used by an AJAX progress bar widget on the client side. So simply adding @transaction.atomic to the method that loops through the file and imports the rows is not a solution, because the progress will only be saved on commit.
The CSV files only contain one type of record and affect a single table. If I could somehow do a transaction only on this table, leaving the job table free for me to update the progress column, it would be ideal. But from what I've found so far it seems impossible. The only solution I could think of so far is opening a new thread and a new database connection inside it every time I need to update the progress. But I keep wondering… will this even work? Isn't there a simpler solution?
One simple approach would be to use the READ UNCOMMITTED transaction isolation level. That could allow dirty reads, which would allow your other processes to see the progress even though the transaction hasn't been committed. However, whether this works or not will be database-dependent. (I'm not familiar with MySQL, but this wouldn't work in PostgreSQL because READ UNCOMMITTED works the same way as READ COMMITTED.)
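If you wanted to experiment with that on MySQL, one way is to change the isolation level on the connection that reads the progress before it queries. This is only a sketch (the table name and job_id are placeholders) and I haven't tried it against Django 1.6:

from django.db import connection

cursor = connection.cursor()
# Applies to subsequent transactions on this connection (MySQL syntax).
cursor.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")
cursor.execute("SELECT progress FROM myapp_jobdata WHERE id = %s", [job_id])
row = cursor.fetchone()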
Regarding your proposed solution: you don't necessarily need a new thread, you really just need a fresh connection to the database. One way to do that in Django might be to take advantage of the multiple database support. I'm imagining something like this:
As described in the documentation, add a new entry to DATABASES with a different name, but the same setup as default. From Django's perspective we are using multiple databases, even though we in fact just want to get multiple connections to the same database.
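A sketch of what that might look like in settings.py (the engine and credentials here are placeholders):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'myapp',
        'USER': 'myapp',
        'PASSWORD': '...',
        'HOST': 'localhost',
    },
}
# Same physical database, but a separate alias gives queries made with
# .using('second_db') their own connection, outside the atomic block.
DATABASES['second_db'] = DATABASES['default'].copy()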
When it's time to update the progress, do something like:
JobData.objects.using('second_db').filter(id=5).update(progress=0.5)
That should take place in its own autocommitted transaction, allowing the progress to be seen by your web server.
Now, does this work? I honestly don't know, I've never tried anything like it!

Django Models: Keep track of activity through related models?

I have something of a master table of Persons. Everything in my Django app somehow relates to one or more People, either directly or through long fk chains. Also, all my models have the standard bookkeeping fields 'created_at' and 'updated_at'. I want to add a field to my Person table called 'last_active_at', mostly for raw SQL ordering purposes.
Creating or editing certain related models produces new timestamps for those objects. I need to somehow update Person.'last_active_at' with those values. Functionally, this isn't too hard to accomplish, but I'm concerned about undue stress on the app.
My two greatest causes of concern are that I'm restricted to a real db field--I can't assign a function to the Person table as a @property--and that one of these 'activity' models receives and processes new instances from a foreign datasource I have no control over, sporadically receiving a lot of data at once.
My first thought was to add a post_save hook to the 'activity' models. Still seems like my best option, but I know nothing about them, how hard they hit the db, etc.
My second thought was to write some sort of script that goes through the day's activity and updates those models overnight. My employers want a 'live'-er stream, though.
My third thought was to modify the post_save algo to check if the 'updated_at' is less than half an hour from the Person's 'last_active_at', and not update the person if true.
Are my thoughts tending in a scalable direction? Are there other approaches I should pursue?
It is said that premature optimization is the root of all evil. You should start with the dumbest implementation (update it every time), then measure and, if needed, replace it with something more efficient.
First of all, let's put a method on Person to update the last_active_at field. That way, all the updating logic is concentrated in one place, and we can easily modify it later.
Signals are quite easy to use: it's just a matter of declaring a function and registering it as a receiver, and it will be run each time the signal is emitted. See the documentation for the full explanation, but here is what it might look like:
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=RelatedModel)
def my_handler(sender, instance, **kwargs):
    # sender is the model class; 'instance' is the object that was just saved
    person = instance.person  # however you reach the Person from this model
    person.update_activity()
As for the updating itself, start with the dumbest way to do it.
def update_activity(self):
    self.last_active_at = now()  # django.utils.timezone.now
    self.save(update_fields=['last_active_at'])
Then measure and decide whether it's a problem or not. If it is a problem, some of the things you can do are:
Check whether the previous update is recent before updating again (see the sketch below). This might be useless if a read from your database is not faster than a write; it's not a problem if you use a cache.
Write it down somewhere for a deferred process to apply later. It doesn't need to be daily: if the problem is that you get 100 updates per second, you can have a script update the database every 10 seconds, or every minute. You can probably find a good performance/freshness trade-off with this technique.
These are just some thoughts based on what you proposed; the right choice depends on the kind of figures you have. Determine what kind of load you'll have and what reaction time is needed for that field, and experiment.
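As a sketch of the first option (the 30-minute window and field handling are assumptions, adjust them to your data):

from datetime import timedelta
from django.utils.timezone import now

ACTIVITY_WINDOW = timedelta(minutes=30)  # assumed threshold

def update_activity(self):
    # Skip the write if the last recorded activity is recent enough.
    if self.last_active_at and now() - self.last_active_at < ACTIVITY_WINDOW:
        return
    self.last_active_at = now()
    self.save(update_fields=['last_active_at'])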

Write custom translation allocation in Pyramid

Pyramid uses gettext *.po files for translations, a very good and stable way to internationalize an application. Its one disadvantage is that it cannot be changed from the app itself. I need some way to give a normal user the ability to change the translations on his own. Django allows changing the file directly and restarts the whole app after the change. I do not have that freedom, because the changes will be quite frequent.
Since I could not find any package that would help me with the task, I decided to override the Localizer. My idea is based on using a Translation Domain, like Zope projects do, and making the Localizer search for a registered domain, falling back to the default translation strategy if none is found.
The problem is that I could not find a good way to plug a custom translation solution into the Localizer itself. All I could think of is to reimplement the get_localizer method and rewrite the whole Localizer. But there are several things that would need to be copy-pasted, such as interpolation of mappings and other tweaks related to translation strings.
I don't know how much you have in there, but I did something similar a while ago and will have to do it again. The implementation is pretty simple...
If you can be sure that all calls are handled through _() or something similar, you can provide your own function. It would look something like this:
def _(val):
    # Look the string up in the database first (a MongoDB-style lookup; 'db' and
    # 'request' are assumed to be available here), falling back to plain gettext.
    row = db.translations.find_one({'key': val, 'locale': request.locale_name})
    if row:
        return row['value']
    return real_gettext(val)
This is pretty simple... then you need something that will dump the database into the file...
But I guess overriding the Localizer makes more sense.. I did it a long time ago, and overriding the function was easier than digging through the code.
The plus side of the Localizer approach is that it will work everywhere. Monkey-patching is pretty cool but it's also pretty ugly. If I had to do it again, I'd provide my own localizer that loads from a database first and falls back to its own value otherwise. The reason for the database is that if someone stops the server and the file hasn't been modified, you won't see the changes.
If a database is more than you need, then the Localizer alone is good enough and you can update the file on every change. When the server restarts it will load the new files... you'll have to compile the catalog first, though.
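A rough sketch of that database-first idea (this is not Pyramid's actual extension point for localizers, just the shape of the fallback; lookup_translation() is a hypothetical helper you'd implement against your database):

from pyramid.i18n import get_localizer

def lookup_translation(key, locale):
    # Hypothetical helper: query your translations table/collection here.
    return None

def db_first_translate(request, tstring, mapping=None):
    # Try the database first, then delegate to the regular gettext-based localizer.
    found = lookup_translation(key=tstring, locale=request.locale_name)
    if found is not None:
        return found
    return get_localizer(request).translate(tstring, mapping=mapping)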

Model to create threaded comments on Google Appengine with Python

I am trying to write a comments model which can be threaded (with no limit on the number of child levels). How do I do this in App Engine? And what is the fastest way to read all the comments?
I am trying to do this in a scalable way so that app engine's new pricing does not kill my startup :)
It depends on how deeply nested you expect threads to get and whether you want to optimize for reading or writing. If we guess that threads are normally quite shallow and that you want to optimize reading all subcomments at all levels of a comment, I think you should store each comment as a separate entity and then put a reference to it in a list property on the parent, the parent's parent, and so forth.
That way, getting all subcomments is always just one call, but writing a new comment is a bit slower, since you have to modify all of its parents.
If you need a fast way to read all comments, keep in the comment model not only a reference to the parent comment but also a reference to the whole topic. Then you will be able to get all comments for a topic with Comment.query(Comment.topic == 12334).fetch().
If you are worried about datastore usage costs, use http://code.google.com/p/appengine-ndb-experiment/. It lets you reduce the amount of data stored and the number of queries. It will shortly replace the current db module.
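A minimal sketch of such a model with ndb (the property and kind names here are assumptions):

from google.appengine.ext import ndb

class Comment(ndb.Model):
    topic = ndb.KeyProperty(kind='Topic')             # the whole discussion
    parent_comment = ndb.KeyProperty(kind='Comment')  # None for top-level comments
    text = ndb.TextProperty()
    created = ndb.DateTimeProperty(auto_now_add=True)

# One query fetches every comment in the topic, at every nesting level;
# the tree can then be rebuilt in memory by following parent_comment.
topic_key = ndb.Key('Topic', 12334)  # the topic being rendered
comments = Comment.query(Comment.topic == topic_key).fetch()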
