Parsing SQL Query into a DOM-like tree to enable automatic permutation? - python

I have a large and complicated sql view that I am attempting to debug. There is a record not showing in the view and I need to determine which clause or join is causing the record to now show up. At the moment I am doing this in a very manual way, removing clauses one at a time and running the query to see if the required row shows up.
I think that it would be great if I could do this programmatically because I end up diving into queries like this about once a fortnight.
Does anybody know if there is a way to parse an SQL query into a tree of objects (for example those in sqlalchemy.sql.expression) so I am able to permuate the tree and execute the results?

If you don't already have the view defined in SQLAlchemy, I don't think it can help you.
You could try something like sqlparse which might get you some of the way there.
You could permute it's output and execute the permutations as raw sql using SQLA.

Related

Insert statment created by django ORM at bulk_create

I am kind of new to python and django.
I am using bulk_create to insert a lot of rows and as a former DBA I would very much like to see what insert statments are being executed. I know that for querys you can use .query but for insert statments I can't find a command.
Is there something I'm missing or is there no easy way to see it? (A regular print is fine by me.)
The easiest way is to set DEBUG = True and check connection.queries after executing the query. This stores the raw queries and the time each query takes.
from django.db import connection
MyModel.objects.bulk_create(...)
print(connection.queries[-1]['sql'])
There's more information in the docs.
A great tool to make this information easily accessible is the django-debug-toolbar.

Get last few records put in table in order of ID with SQLAlchemy

I have a table where I would like to get the last 3 records from in order of when they were added to the database. I have the following:
session.query(User).order_by(User.id.desc()).limit(3)
This gets the last 3 records from the database however they are in the backwards order has I descend the ID. I would like to have the records in order however using multiple order queries doesn't seem to work.
Is there a workaround or solution? Thanks.
I don't see any way to do that other than using a subquery to perform the second order by you want, to generate SQL as suggested by this answer: Get another order after limit with mysql
I believe you'll have to do something like this in SQLAlchemy:
session.query(session.query(User).order_by(User.id.desc()).limit(3)\
.subquery().alias('sUser')).order_by('sUser.id')
This is untested code and I'm pretty sure you'll end up with a series of KeyedTuple objects, not instances of your User class, although there might be a way to fix that.
However, as you're doing this with the ORM, I don't see much of a point in doing that with an SQLAlchemy subquery at the SQL level if you can do it with Python. Something like:
reversed(session.query(User).order_by(User.id.desc()).limit(3).all())
Or even this, if you want a list and not a reverse iterator:
session.query(User).order_by(User.id.desc()).limit(3).all()[::-1]
The truth is I have a similar problem but when I saw how you wrote the command I realized what I needed to do and it worked,
then:
First thing thanks
Second thing I think you need to implement the following command:
desc3user = User.query.order_by(User.id.desc()).limit(3)

Escaping special characters in django-haystack query

I'm trying to use Solr specific syntax with some of my django-haytack queries. For example I'd like to search: "state:Georgia", but haystack sends it to Solr as "state\:Georgia", breaking the query. A Raw query can be used but it seems to need to know the field and query beforehand like so: sqs = SearchQuerySet().filter(author=Raw('state:Georgia')), but I'm not always sure beforehand what the exact field should be. It could be state, collector, material, category or a number of others. Does anyone know a way around this or how I can access the user's actual query?
Or perhaps I'm going down the wrong path and a custom Clean method would be in order?
Thanks

Python: RE vs. Query

I am building a website using Django, and this website uses blocks which are enabled for a certain page.
Right now I use a textfield containing paths were a block is enabled. When a page is requested, Django retrieves all blocks from database and does re.search on the TextField.
However, I was wondering if it is not a better idea to use a separate DB table for block/paths, were each row contains a single path and reference to a block, in terms of overhead.
A seperate DB table is definitely the "right" way to do it, because mysql has to send all the data from your TEXT fields every time you query. As you add more rows and the TEXT fields get bigger, you'll start to notice performance issues and eventually crash the server. Also, you'll be able to use VARCHAR and add a unique index to the paths, making lookups lightning fast.
I am not exactly familiar with Django, but if I am understanding the situation correctly, you should use a table.
In fact this is exactly the kind of use that DB software is designed and optimized for.
No worries. It will actually be faster.
By doing the search yourself, you are trying to implement part of the DB logic on your own. Fun, certainly, but not so fast. :)
Here are some nice links on designing a database:
http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html
http://en.wikipedia.org/wiki/Third_normal_form
Hope this helps. Good luck. :-)

Django objects.filter, how "expensive" would this be?

I am trying to make a search view in Django. It is a search form with freetext input + some options to select, so that you can filter on years and so on. This is some of the code I have in the view so far, the part that does the filtering. And I would like some input on how expensive this would be on the database server.
soknad_list = Soknad.objects.all()
if var1:
soknad_list = soknad_list.filter(pub_date__year=var1)
if var2:
soknad_list = soknad_list.filter(muncipality__name__exact=var2)
if var3:
soknad_list = soknad_list.filter(genre__name__exact=var3)
# TEXT SEARCH
stop_word_list = re.compile(STOP_WORDS, re.IGNORECASE)
search_term = '%s' % request.GET['q']
cleaned_search_term = stop_word_list.sub('', search_term)
cleaned_search_term = cleaned_search_term.strip()
if len(cleaned_search_term) != 0:
soknad_list = soknad_list.filter(Q(dream__icontains=cleaned_search_term) | Q(tags__icontains=cleaned_search_term) | Q(name__icontains=cleaned_search_term) | Q(school__name__icontains=cleaned_search_term))
So what I do is, first make a list of all objects, then I check which variables exists (I fetch these with GET on an earlier point) and then I filter the results if they exists. But this doesn't seem too elegant, it probably does a lot of queries to achieve the result, so is there a better way to this?
It does exactly what I want, but I guess there is a better/smarter way to do this. Any ideas?
filter itself doesn't execute a query, no query is executed until you explicitly fetch items from query (e.g. get), and list( query ) also executes it.
You can see the query that will be generated by using:
soknad_list.query.as_sql()[0]
You can then put that into your database shell to see how long the query takes, or use EXPLAIN (if your database backend supports it) to see how expensive it is.
As Aaron mentioned, you should get a hold of the query text that is going to be run against the database and use an EXPLAIN (or other some method) to view the query execution plan. Once you have a hold of the execution plan for the query you can see what is going on in the database itself. There are a lot of operations that see very expensive to run through procedural code that are very trivial for any database to run, especially if you provide indexes that the database can use for speeding up your query.
If I read your question correctly, you're retrieving a result set of all rows in the Soknad table. Once you have these results back you use the filter() method to trim down your results meet your criteria. From looking at the Django documentation, it looks like this will do an in-memory filter rather than re-query the database (of course, this really depends on which data access layer you're using and not on Django itself).
The most optimal solution would be to use a full-text search engine (Lucene, ferret, etc) to handle this for you. If that is not available or practical the next best option would be to to construct a query predicate (WHERE clause) before issuing your query to the database and let the database perform the filtering.
However, as with all things that involve the database, the real answer is 'it depends.' The best suggestion is to try out several different approaches using data that is close to production and benchmark them over at least 3 iterations before settling on a final solution to the problem. It may be just as fast, or even faster, to filter in memory rather than filter in the database.

Categories