I need to write a function that will run in Celery and will take records from a model one at a time, check something, and write data to another model via a OneToOne relationship. There are a lot of rows, so model_name.objects.all() is not appropriate (it would take a lot of memory and time). How do I do this correctly?
You can use an iterator over the queryset (https://docs.djangoproject.com/en/dev/ref/models/querysets/#iterator) so your records are fetched one by one:
model_iterator = your_model.objects.all().iterator()
for record in model_iterator:
    do_something(record)
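A minimal sketch of the full task from the question might look like this. SourceModel, ResultModel, and passes_check are placeholder names for your own models and check logic, and update_or_create needs Django 1.7+:

from celery import shared_task

@shared_task
def check_records():
    # iterator() streams rows from the database cursor instead of
    # loading the whole table into memory at once
    for record in SourceModel.objects.all().iterator():
        if passes_check(record):  # your check goes here
            # write to the model related via a OneToOneField
            ResultModel.objects.update_or_create(
                source=record,
                defaults={'checked': True},
            )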
Suppose I have a QuerySet of all db entries:
all_db_entries = Entry.objects.all()
And then I want to get some specific objects from it by calling get(param=value) (or any other method).
The problem is that the documentation on QuerySet methods says: "These methods do not use a cache. Rather, they query the database each time they're called."
But what I want to achieve is to load all the data once (like doing a SELECT *) and only then do some searches on it. I don't want to query the database every time I call get(), in order to avoid heavy load on it.
You can use values() to convert your resulting queryset into an ordinary Python list of dicts, which you can then use to do searches etc., e.g.:
list(MyModel.objects.values('pk', 'field'))
The list() call evaluates the queryset once; after that, everything happens in memory.
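For instance, to run repeated lookups without touching the database again, you could index the fetched rows by primary key. A sketch; 'field', some_pk, and value are placeholders:

rows = list(MyModel.objects.values('pk', 'field'))  # hits the db once

# index by pk for cheap repeated lookups, no further queries
by_pk = {row['pk']: row for row in rows}
row = by_pk.get(some_pk)

# or search in Python instead of calling .get()/.filter() again
matches = [r for r in rows if r['field'] == value]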
I need to cache a mid-sized queryset (about 500 rows). I had a look at some solutions, django-cache-machine being the most promising.
Since the queryset is pretty much static (it's a table of cities populated in advance that gets updated only by me, and almost never anyway), I just need to serve the same queryset on every request for filtering.
In my search, one detail was really not clear to me: is the cache a sort of singleton object, which is available to every request? By which I mean, if two different users access the same page, and the queryset is evaluated for the first user, does the second one get the cached queryset?
I could not figure out what problem you are exactly facing; what you describe is the classical use case for caching. Memcached and Redis are the two most popular options. You just need to write a method or function that first tries to load the result from the cache and, if it is not there, queries the database. E.g.:
from django.core.cache import cache

def cache_user(userid):
    key = "user_{0}".format(userid)
    value = cache.get(key)
    if value is None:
        value = ...  # fetch the value from the db here
        cache.set(key, value)
    return value
Although for simplicity I have written this as a function, ideally it should be a manager method on the model concerned.
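A sketch of that manager variant, assuming a hypothetical User model and a 5-minute timeout:

from django.core.cache import cache
from django.db import models

class UserManager(models.Manager):
    def get_cached(self, userid):
        key = "user_{0}".format(userid)
        value = cache.get(key)
        if value is None:
            value = self.get(pk=userid)  # cache miss: hit the db
            cache.set(key, value, 300)   # cache for 5 minutes
        return value

# usage, with objects = UserManager() declared on the model:
# user = User.objects.get_cached(42)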
I noticed that there's no guarantee that the database is updated synchronously after calling save() on a model.
I did a simple test by making an AJAX call to the following view:
from django.http import HttpResponse

def save(request, id):
    product = ProductModel.objects.get(id=id)
    product.name = 'New Product Name'
    product.save()
    return HttpResponse('success')
On the client side I wait for the response from the above view and then call a findAll method that retrieves the list of products. The returned list contains the old value for the name of the updated product.
However, if I delay the request for the list of products, it contains the new value, just as it should.
This means that return HttpResponse('success') is fired before the new values are written to the database.
If the above is true, is there a way to return the HTTP response only after the database has been updated?
You should have mentioned App Engine more prominently. I've added it to the tags.
This is very definitely because of your lack of understanding of GAE, rather than anything to do with Django. You should read the GAE documentation on eventual consistency in the datastore, and structure your models and queries appropriately.
Normal Django, running with a standard relational database, would not have this issue.
The view should not return anything before the .save() call has finished its flow.
As for the flow itself, Django's docs describe it quite explicitly:
When you save an object, Django performs the following steps:
1) Emit a pre-save signal. The signal django.db.models.signals.pre_save is sent, allowing any functions listening for that signal to take some customized action.
2) Pre-process the data. Each field on the object is asked to perform any automated data modification that the field may need to perform.
Most fields do no pre-processing — the field data is kept as-is. Pre-processing is only used on fields that have special behavior. For example, if your model has a DateField with auto_now=True, the pre-save phase will alter the data in the object to ensure that the date field contains the current date stamp. (Our documentation doesn’t yet include a list of all the fields with this “special behavior.”)
3) Prepare the data for the database. Each field is asked to provide its current value in a data type that can be written to the database.
Most fields require no data preparation. Simple data types, such as integers and strings, are ‘ready to write’ as a Python object. However, more complex data types often require some modification.
For example, DateField fields use a Python datetime object to store data. Databases don’t store datetime objects, so the field value must be converted into an ISO-compliant date string for insertion into the database.
4) Insert the data into the database. The pre-processed, prepared data is then composed into an SQL statement for insertion into the database.
5) Emit a post-save signal. The signal django.db.models.signals.post_save is sent, allowing any functions listening for that signal to take some customized action.
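To make steps 1 and 5 concrete, here is a minimal receiver sketch. Illustrative only, reusing the question's hypothetical ProductModel:

from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=ProductModel)
def product_saved(sender, instance, created, **kwargs):
    # runs at step 5, after the row has been written
    print("saved product %s (created=%s)" % (instance.pk, created))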
Let me note that the behaviour you're seeing is possible if you've applied the @transaction.commit_on_success decorator to your view, though I don't see it in your code.
More on transactions: https://docs.djangoproject.com/en/1.5/topics/db/transactions/
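For reference, that decorator would look like this on the question's view. A sketch only, reusing the hypothetical ProductModel; commit_on_success is Django <= 1.5 and was replaced by transaction.atomic in 1.6+:

from django.db import transaction
from django.http import HttpResponse

@transaction.commit_on_success  # commits only after the view returns
def save(request, id):
    product = ProductModel.objects.get(id=id)
    product.name = 'New Product Name'
    product.save()
    return HttpResponse('success')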
Let's say I have this data model:
class Workflow(models.Model):
    ...

class Command(models.Model):
    workflow = models.ForeignKey(Workflow)
    ...

class Job(models.Model):
    command = models.ForeignKey(Command)
    ...
Suppose somewhere I want to loop through all the Workflow objects, and for each workflow I want to loop through its Commands, and for each Command I want to loop through each Job. Is there a way to structure this with a single query?
That is, I'd like Workflow.objects.all() to join in its dependent models, so I get a collection that has dependent objects already cached, so workflows[0].command_set.get() doesn't produce an additional query.
Is this possible?
Going the other way around is easy, since you can do
all_jobs = Job.objects.select_related().all()
And neither job.command nor job.command.workflow will produce an additional query.
Not sure if it's possible with a Workflow query.
I think the only way you could do that would be to use django.db.connection and write your own query.
Since this would be iterating over all instances of Job anyway (your ForeignKeys aren't nullable), you could select all Jobs and then group them outside of the ORM, as sketched below.
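A sketch of that grouping approach, assuming the three models above (note that Workflows without any Jobs won't appear):

from collections import defaultdict

# one query; select_related() pulls in command and workflow
jobs = Job.objects.select_related('command__workflow')

jobs_by_command = defaultdict(list)
commands_by_workflow = defaultdict(set)
for job in jobs:
    jobs_by_command[job.command].append(job)
    commands_by_workflow[job.command.workflow].add(job.command)

# now iterate without any further queries
for workflow, commands in commands_by_workflow.items():
    for command in commands:
        for job in jobs_by_command[command]:
            pass  # do something with workflow/command/job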
I am trying to design a tagging system with a model like this:
class Tag(models.Model):
    content = models.CharField(...)
    creator = models.ForeignKey(...)
    used = models.IntegerField()
It is a many-to-many relationship between tags and what's been tagged.
Every time I insert a record into the association table,
Tag.used is incremented by one, and decremented by one in case of deletion.
Tag.used is maintained because I want to quickly answer the question 'How many times has this tag been used?'.
However, this obviously slows insertion down.
Please tell me how to improve this design.
Thanks in advance.
http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
If your database supports materialized (indexed) views, then you might want to create one for this. You can get a large performance boost for frequently run queries that aggregate data, which I think you have here.
Your view would be over a query like:
SELECT TagID, COUNT(*)
FROM YourTable
GROUP BY TagID
The aggregations can be precomputed and stored in the index to minimize expensive computations during query execution.
I don't think it's a good idea to denormalize your data like that.
I think a more elegant solution is to use Django aggregation to track how many times the tag has been used: http://docs.djangoproject.com/en/dev/topics/db/aggregation/
You could attach the used count to your tag object by calling something like this:
from django.db.models import Count

my_tag = Tag.objects.annotate(used=Count('post'))[0]
and then accessing it like this:
my_tag.used
assuming that you have a Post model class that has a ManyToManyField to your Tag class.
You can order the Tags by the named annotated field if needed:
Tag.objects.annotate(used=Count('post')).order_by('-used')
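If you need the counts for every tag in one go, a small sketch (again assuming the hypothetical Post model):

from django.db.models import Count

# {tag content: number of posts using it}, computed in one query
counts = dict(
    Tag.objects.annotate(n=Count('post')).values_list('content', 'n')
)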