Insert statment created by django ORM at bulk_create - python

I am kind of new to python and django.
I am using bulk_create to insert a lot of rows and as a former DBA I would very much like to see what insert statments are being executed. I know that for querys you can use .query but for insert statments I can't find a command.
Is there something I'm missing or is there no easy way to see it? (A regular print is fine by me.)

The easiest way is to set DEBUG = True and check connection.queries after executing the query. This stores the raw queries and the time each query takes.
from django.db import connection
MyModel.objects.bulk_create(...)
print(connection.queries[-1]['sql'])
There's more information in the docs.
A great tool to make this information easily accessible is the django-debug-toolbar.

Related

Autocomplete SQL query in db.execute('') on VScode

When writing SQL query in python (Flask, if that's necessary) execute(), is there a setting or extensions that would recognize SQL keywords like SELECT, UPDATE, and suggest them with IntelliSense or the like?
Right now the query is recognized as in the picture and keywords are not being suggested.
SQL query keywords in VScode are not recognized (the whole query is green)
No, because you're just putting in a string into execute() that is later read by SQLAlchemy (which I assume you're using). These aren't actually python keywords which IntelliSense can predict. However, you can use the SQLAlchemy ORM, or the higher level query object, which does not use SQL keywords but python methods to manipulate your database. Using this, you might find that IntelliSense can find the definition/declaration of the SQLAlchemy method and offer the usual pointers and helpers it does.
There are other advantages of using the higher level query class of SQLAlchemy, a significant one being you are less likely to be subject to SQL injections and attacks. Because you are executing raw SQL with the execute() command and simply putting an id from the session in, an attacker could alter the session value and inject harmful SQL into your application.
Anyway, that's beside the point but I thought it was worth letting you know.

How can I track all SQL query timings and counts in Django?

I'd like to have a Django application record how much time each SQL query took.
The first problem is that SQL queries differ, even when they originate from the same code. That can be solved by normalizing them, so that
SELECT first_name, last_name FROM people WHERE NOW() - birth_date < interval '20' years;
would become something like
SELECT $ FROM people WHERE $ - birth_date < $;
After getting that done, we could just log the normalized query and the query timing to a file, syslog or statsd (for statsd, I'd probably also use a hash of the query as a key, and keep an index of hash->query relations elsewhere).
The bigger problem, however, is figuring out where that action can be performed. The best place for that I could find is this: https://github.com/django/django/blob/b5bacdea00c8ca980ff5885e15f7cd7b26b4dbb9/django/db/backends/util.py#L46 (note: we do use that ancient version of Django, but I'm fine with suggestions that are relevant only to newer versions).
Ideally, I'd like to make this a Django extension, rather than modifying Django source code. Sounds like I can make another backend, inheriting from the one we currently use, and make its CursorWrapper's class execute method record the timing and counter.
Is that the right approach, or should I be using some other primitives, like QuerySet or something?
Django debug toolbar has a panel that shows "SQL queries including time to execute and links to EXPLAIN each query"
http://django-debug-toolbar.readthedocs.io/en/stable/panels.html#sql

get_or_create in Peewee

The paragraph titled Get or create on the peewee documentation says:
While peewee has a get_or_create() method, this should really not be
used outside of tests as it is vulnerable to a race condition. The
proper way to perform a get or create with peewee is to rely on the
database to enforce a constraint.
And then it goes on with an example that only shows the create part, not the get part.
What is the best way to perform a get or create with peewee?
Everything you are doing inside a transaction is atomic.
So as long as you are calling get_or_create() inside a transaction, that paragraph is wrong.

Have django ORM prepare sql, without executing

I am doing a massive data conversion for data that will end up in a django managed database. For reasons of efficiency and politics, we need to fill the destination database with manually run mass INSERTS.
I would like to have my Django ORM prepare those statements, so I can write them to a file to be run later.
So I need somthing like this:
50000_or_so_Foos = [...]
sql_str = Foo.objects.bulk_create_sql(50000_or_so_Foos)
with file("pre_preped.sql", 'w') as f:
f.write(sql_str)
Then we will pass pre_preped.sql to another department and they will play it into the database.
Is there a way to do this?
Is this actually going to save us any time?
ADDED Question: Should I be creating a csv for LOADDATA instead?
(I should note that in the real world, we have more than one model and way more than 50000 objects)
I am not sure of any easy way to get the query from bulk_create, because it executes query when it is called, as opposed to somethign like filtering where you can view the querysets query property.
As I was quickly scanning source code, it looks like you can manually build a query using the sql object, the same way django does in bulk_create. https://github.com/django/django/blob/master/django/db/models/query.py#L917 can provide a blueprint on how to do that.

Parsing SQL Query into a DOM-like tree to enable automatic permutation?

I have a large and complicated sql view that I am attempting to debug. There is a record not showing in the view and I need to determine which clause or join is causing the record to now show up. At the moment I am doing this in a very manual way, removing clauses one at a time and running the query to see if the required row shows up.
I think that it would be great if I could do this programmatically because I end up diving into queries like this about once a fortnight.
Does anybody know if there is a way to parse an SQL query into a tree of objects (for example those in sqlalchemy.sql.expression) so I am able to permuate the tree and execute the results?
If you don't already have the view defined in SQLAlchemy, I don't think it can help you.
You could try something like sqlparse which might get you some of the way there.
You could permute it's output and execute the permutations as raw sql using SQLA.

Categories