Have the Django ORM prepare SQL without executing it - python

I am doing a massive data conversion for data that will end up in a django managed database. For reasons of efficiency and politics, we need to fill the destination database with manually run mass INSERTS.
I would like to have my Django ORM prepare those statements, so I can write them to a file to be run later.
So I need something like this:
foos_50000_or_so = [...]
sql_str = Foo.objects.bulk_create_sql(foos_50000_or_so)  # hypothetical method
with open("pre_preped.sql", 'w') as f:
    f.write(sql_str)
Then we will pass pre_preped.sql to another department and they will play it into the database.
Is there a way to do this?
Is this actually going to save us any time?
ADDED Question: Should I be creating a csv for LOADDATA instead?
(I should note that in the real world, we have more than one model and way more than 50000 objects)

I am not sure of any easy way to get the query from bulk_create, because it executes the query as soon as it is called, as opposed to something like filtering, where you can inspect the queryset's query property.
From a quick scan of the source code, it looks like you can manually build the query using the underlying sql.InsertQuery object, the same way Django does inside bulk_create. https://github.com/django/django/blob/master/django/db/models/query.py#L917 can provide a blueprint on how to do that.
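For example, here is a rough sketch of that approach using Django's internal InsertQuery and its compiler. This is a private API that changes between releases, and the model import, field filtering, and parameter handling are my own illustrative assumptions, not a drop-in solution:

from django.db import connections
from django.db.models import AutoField
from django.db.models.sql import InsertQuery

from myapp.models import Foo  # hypothetical app and model

def bulk_insert_sql(model, objs, using='default'):
    # Build the same InsertQuery that bulk_create builds, but return the
    # compiled (sql, params) pairs instead of executing them.
    connection = connections[using]
    # Skip auto-generated primary keys, as bulk_create does when you leave them unset.
    fields = [f for f in model._meta.concrete_fields if not isinstance(f, AutoField)]
    query = InsertQuery(model)
    query.insert_values(fields, objs)
    return query.get_compiler(connection=connection).as_sql()

foos = [Foo(bar='a'), Foo(bar='b')]  # your 50000-or-so unsaved instances
for statement, params in bulk_insert_sql(Foo, foos):
    # statement still contains %s placeholders; escaping params into a raw .sql
    # file is backend-specific (psycopg2's cursor.mogrify can do it on PostgreSQL).
    print(statement, params)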

Related

Database or text file or Excel file in Django

Suppose you have some fixed data in Django, for example ten rows and five columns.
Is it better to create a database table for it and read it from the database, or is it better to create a dictionary and read the data from the dictionary?
In terms of speed, logic, and so on.
If the database is not a good choice, should I write the data as a dictionary in a Django view, or in a text file, or in an Excel file?
Whichever method is better, please explain why.
It depends upon the application, but if there is doubt, create a model for it and put it in the database. Here's why I say that:
If your data needs to be changed, or if you want to view it, you can easily do so in the Django Admin app.
If your application contains models which relate to this data, you can use a foreign key to reference it, rather than replicating it or using references that aren't enforced by the database.
It makes it much easier to do queries on your whole database if everything is in the database. For example, let's say that you have a table of "houses" and each house has a "color".. but you've stored the list of color names in a dictionary outside the database. Now you want a list of houses that are "Bright Blue". First you have to look in your dictionary to find the id of the color "Bright Blue", then you have to do your database lookup using the id you found. It takes something that would normally be a very simple one-line query in Django and makes it much harder.
By the same logic, if you wanted a list of houses along with their color, this would be a very simple query if done entirely in the database but is extra work if you keep some data elsewhere.
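To make the "houses and colors" example concrete, here is a minimal sketch (model and field names are invented for illustration):

from django.db import models

class Color(models.Model):
    name = models.CharField(max_length=50)

class House(models.Model):
    color = models.ForeignKey(Color, on_delete=models.PROTECT)

# With everything in the database, both lookups stay one query each:
bright_blue_houses = House.objects.filter(color__name="Bright Blue")
houses_with_colors = House.objects.select_related("color")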

Insert statement created by Django ORM at bulk_create

I am kind of new to python and django.
I am using bulk_create to insert a lot of rows, and as a former DBA I would very much like to see what INSERT statements are being executed. I know that for queries you can use .query, but for INSERT statements I can't find a command.
Is there something I'm missing or is there no easy way to see it? (A regular print is fine by me.)
The easiest way is to set DEBUG = True and check connection.queries after executing the query. This stores the raw queries and the time each query takes.
from django.db import connection

MyModel.objects.bulk_create(...)
# With DEBUG = True, every executed query is appended to connection.queries:
print(connection.queries[-1]['sql'])
There's more information in the docs.
A great tool to make this information easily accessible is the django-debug-toolbar.
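If you would rather not turn DEBUG on, Django also ships django.test.utils.CaptureQueriesContext, which records the queries executed inside a with block even when DEBUG is off. A small sketch (objs stands in for your list of unsaved instances):

from django.db import connection
from django.test.utils import CaptureQueriesContext

with CaptureQueriesContext(connection) as ctx:
    MyModel.objects.bulk_create(objs)   # objs: your unsaved MyModel instances
print(ctx.captured_queries[-1]['sql'])  # the generated INSERT statement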

Quickest and cleanest method for inserting multiple rows in Django

I am in a situation where I have to insert multiple records into a Postgres database through an AJAX call, based on a foreign key.
Currently I am using db1.db2_set.create(...) for each record, looping over a list of dictionaries.
Is this the best way to do it? It seems like I'm hitting the database for every insert.
Django hits the database every time save() is called, and create() calls save() for you, so a loop like this:
for attrs in objects:  # objects is your list of dictionaries
    db1.db2_set.create(**attrs)
still issues one query per record. To see exactly how many queries your pages run, this might be useful:
http://djangosnippets.org/snippets/766/
It's a middleware you can add to see how many queries your Django application runs on every page you access.
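As the bulk_create question above shows, you can collapse this into a single INSERT per batch by building the objects in memory first. A minimal sketch, with the child model and its foreign-key field name invented for illustration:

from myapp.models import Child  # hypothetical model behind db1.db2_set

rows = [Child(parent=db1, **attrs) for attrs in objects]  # 'parent' is an assumed FK field name
Child.objects.bulk_create(rows)  # one INSERT per batch instead of one per dictionary

Note that bulk_create bypasses the model's save() method and the pre_save/post_save signals.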

How to create and restore a backup from SqlAlchemy?

I'm writing a Pylons app, and am trying to create a simple backup system where every table is serialized and tarred up into a single file for an administrator to download, and use to restore the app should something bad happen.
I can serialize my table data just fine using the SqlAlchemy serializer, and I can deserialize it fine as well, but I can't figure out how to commit those changes back to the database.
In order to serialize my data I am doing this:
from myproject.model.meta import Session
from sqlalchemy.ext.serializer import loads, dumps
q = Session.query(MyTable)
serialized_data = dumps(q.all())
In order to test things out, I go ahead and truncate MyTable, and then attempt to restore it using serialized_data:
from myproject.model import meta
restore_q = loads(serialized_data, meta.metadata, Session)
This doesn't seem to do anything... I've tried calling Session.commit() after the fact, and individually walking through all the objects in restore_q and adding them, but nothing seems to work.
What am I missing? Or is there a better way to do what I'm aiming for? I don't want to shell out and directly touch the database, since SqlAlchemy supports different database engines.
You have to use the Session.merge() method instead of Session.add() to put each deserialized object back into the session.
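A minimal sketch of the restore step, reusing the Session, meta.metadata, and serialized_data already set up in the question:

from sqlalchemy.ext.serializer import loads

restored = loads(serialized_data, meta.metadata, Session)  # same call as in the question
for obj in restored:
    Session.merge(obj)   # merge() re-attaches each deserialized object to the session
Session.commit()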

Django objects.filter, how "expensive" would this be?

I am trying to make a search view in Django. It is a search form with freetext input + some options to select, so that you can filter on years and so on. This is some of the code I have in the view so far, the part that does the filtering. And I would like some input on how expensive this would be on the database server.
soknad_list = Soknad.objects.all()

if var1:
    soknad_list = soknad_list.filter(pub_date__year=var1)
if var2:
    soknad_list = soknad_list.filter(muncipality__name__exact=var2)
if var3:
    soknad_list = soknad_list.filter(genre__name__exact=var3)

# TEXT SEARCH
stop_word_list = re.compile(STOP_WORDS, re.IGNORECASE)
search_term = '%s' % request.GET['q']
cleaned_search_term = stop_word_list.sub('', search_term)
cleaned_search_term = cleaned_search_term.strip()
if len(cleaned_search_term) != 0:
    soknad_list = soknad_list.filter(Q(dream__icontains=cleaned_search_term) | Q(tags__icontains=cleaned_search_term) | Q(name__icontains=cleaned_search_term) | Q(school__name__icontains=cleaned_search_term))
So what I do is first make a queryset of all objects, then check which variables exist (I fetch these from GET at an earlier point) and filter the results for the ones that do. But this doesn't seem too elegant, and it probably does a lot of queries to achieve the result, so is there a better way to do this?
It does exactly what I want, but I guess there is a better/smarter way to do this. Any ideas?
filter itself doesn't execute a query; nothing hits the database until you explicitly fetch items from the queryset (e.g. with get(), by iterating over it, or by passing it to list()).
You can see the query that will be generated with:
str(soknad_list.query)
(on older Django versions, soknad_list.query.as_sql()[0] gave the same SQL).
You can then put that into your database shell to see how long the query takes, or use EXPLAIN (if your database backend supports it) to see how expensive it is.
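For instance, a small sketch against the question's own queryset (the filter values here are made up; the exact SQL depends on your models and backend):

qs = Soknad.objects.all()
qs = qs.filter(pub_date__year=2008).filter(genre__name__exact="Drama")
print(qs.query)   # a single SELECT ... WHERE ... is built; nothing has hit the database yet
# Prefix that statement with EXPLAIN (or EXPLAIN ANALYZE on PostgreSQL) in your
# database shell to see the execution plan and its estimated cost.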
As Aaron mentioned, you should get hold of the query text that is going to be run against the database and use EXPLAIN (or some other method) to view the query execution plan. Once you have the execution plan for the query you can see what is going on in the database itself. There are a lot of operations that seem very expensive to run through procedural code but are trivial for any database to run, especially if you provide indexes that the database can use to speed up your query.
If I read your question correctly, you're retrieving a result set of all rows in the Soknad table. Once you have these results back you use the filter() method to trim down your results to meet your criteria. From looking at the Django documentation, it looks like this will do an in-memory filter rather than re-query the database (of course, this really depends on which data access layer you're using and not on Django itself).
The most optimal solution would be to use a full-text search engine (Lucene, Ferret, etc.) to handle this for you. If that is not available or practical, the next best option would be to construct a query predicate (WHERE clause) before issuing your query to the database and let the database perform the filtering.
However, as with all things that involve the database, the real answer is 'it depends.' The best suggestion is to try out several different approaches using data that is close to production and benchmark them over at least 3 iterations before settling on a final solution to the problem. It may be just as fast, or even faster, to filter in memory rather than filter in the database.
