django-tables2 flooding database with queries

django-tables2 flooding database with queries - python

im using django-tables2 in order to show values from a database query. And everythings works fine. Im now using Django-dabug-toolbar and was looking through my pages with it. More out of curiosity than performance needs. When a lokked at the page with the table i saw that the debug toolbar registerd over 300 queries for a table with a little over 300 entries. I dont think flooding the DB with so many queries is a good idea even if there is no performance impact (at least not now). All the data should be coming from only one query.
Why is this happening and how can i reduce the number of queries?

Im posting this as a future reference for myself and other who might have the same problem.
After searching for a bit I found out that django-tables2 was sending a single query for each row. The query was something like SELECT * FROM "table" LIMIT 1 OFFSET 1 with increasing offset.
I reduced the number of sql calls by calling query = list(query) before i create the table and pass the query. By evaluating the query in the python view code the table now seems to work with the evaulated data instead and there is only one database call instead of hundreds.

This was a bug and has been fixed in https://github.com/bradleyayers/django-tables2/issues/427

Related

Super slow SQLAlchemy query to Google Cloud SQL on a very large table

I have a very large (and growing) table of URLs, and I want to query the table to check if an item exists and return that item so I can edit it, or choose to add a new item. The code below works but runs very very slowly and, given the volume of queries I need to perform (several thousand per hour) is creating some issues. I haven't been able to find a better solution than below. I have a good sense of what is happening - it is loading the entire table every time, but there must be a faster way here.
Session = sessionmaker(bind=engine)
formatted_url = "%{}%".format(url)
matching_url = None
with Session.begin() as session:
matching_url = session.query(Link.id).filter(Link.URL.like(formatted_url)).yield_per(200).first()
This works great if the URL exists and is recent, but especially if the URL isn't in the database at all, the process takes as long as one minute.

You are doing a select from table where Linkid like %formatted_url% limit 1;
This needs a full table scan in the database.
If you are lucky, the row is still in memory or cache.
If not, or if it does not exist the database will need that full table scan.
If you are using postgres on cloud SQL, this question will help you to remediate the problem PostgreSQL: Full Text Search - How to search partial words?

Python/Django/MySQL optimisation

Just a logic question really...I have a script that takes rows of data from a CSV, parses the cell values to uniform the data and makes a check on the database that a key/primary value does not exist so as to prevent duplicates! At the moment, the 1st 10-15k entries commit to the DB fairly quick but then it starts really slowing as there are more entries in the DB to check against for duplicates....by the time there are 100k rows in the DB the commit speed is about 1/sec argh...
So my question, is it (pythonically) more efficient to extract and parse the data separately to the DB commit procedure (maybe in a class based script or?? Could I add multiprocessing to the csv parsing or DB commit) and is there a quicker method to check the database for duplicates if i am only cross-referencing 1 table and 1 value??
Much appreciated
Kuda

If the first 10-15k entries worked fine, probably the issue is with the database query. Do you have a suitable index, and is that index used by the database? You can use an EXPLAIN statement to see what the database is doing, whether it's actually using the index for the particular query used by Django.
If the table starts empty, it might also help to run ANALYZE TABLE after the first few thousand rows; the query optimiser might have stale statistics from when the table was empty. To test this hypothesis, you can connect to the database while the script is running, when it starts to slow down, and run ANALYSE TABLE manually. If it immediately speeds up, the problem was indeed stale statistics.
As for optimisation of database commits themselves, it probably isn't an issue in your case (since the first 10k rows perform fine), but one aspect is the round-trips; for every query, it has to go to the database and get the results back. This is especially noticeable if the database is across a network. If you need to speed that up, Django has a bulk_create() method to insert many rows at once. However, if you do that, you'll only get an error for the whole batch of rows if you try to insert duplicates forbidden by the database indexes; you'll then have to find the particular row causing the error using other code.

In Python + MySQL, is it better to use an SSCursor, or to use a paginated Stored Procedure?

I'm on Python 2.7 (sigh) and MySQL 5.6.23.
I need to retrieve a lot of data from MySQL using Python.
I read about SSCursors, and I also saw something about creating a paginated query (stored procedure in my case).
Apparently SSCursors have pluses and minuses, as perhaps do paginated queries.
SSCursors don't appear to like concurrency - when using an SSCursor, you can only have one query in flight at a time (?). Also, they appear to be less than portable?
And with paginated queries, I have to wonder what would happen if relevant tables were modified During the query - would you possibly miss a row or get a duplicate row? Also with paginated queries, I heard that the last page of data takes time proportionate to the entire query, so they might not help me much?
So what's better - SSCursors or Paginated SP's/Queries? And is the above stuff true?
How to efficiently use MySQLDB SScursor? is useful and one of my source documents, but does not appear to address paginated queries or portability.
How to get a row-by-row MySQL ResultSet in python is closer. I added a comment there about the portability of what's discussed.

Parsing SQL Query into a DOM-like tree to enable automatic permutation?

I have a large and complicated sql view that I am attempting to debug. There is a record not showing in the view and I need to determine which clause or join is causing the record to now show up. At the moment I am doing this in a very manual way, removing clauses one at a time and running the query to see if the required row shows up.
I think that it would be great if I could do this programmatically because I end up diving into queries like this about once a fortnight.
Does anybody know if there is a way to parse an SQL query into a tree of objects (for example those in sqlalchemy.sql.expression) so I am able to permuate the tree and execute the results?

If you don't already have the view defined in SQLAlchemy, I don't think it can help you.
You could try something like sqlparse which might get you some of the way there.
You could permute it's output and execute the permutations as raw sql using SQLA.

Django objects.filter, how "expensive" would this be?

I am trying to make a search view in Django. It is a search form with freetext input + some options to select, so that you can filter on years and so on. This is some of the code I have in the view so far, the part that does the filtering. And I would like some input on how expensive this would be on the database server.
soknad_list = Soknad.objects.all()
if var1:
soknad_list = soknad_list.filter(pub_date__year=var1)
if var2:
soknad_list = soknad_list.filter(muncipality__name__exact=var2)
if var3:
soknad_list = soknad_list.filter(genre__name__exact=var3)
# TEXT SEARCH
stop_word_list = re.compile(STOP_WORDS, re.IGNORECASE)
search_term = '%s' % request.GET['q']
cleaned_search_term = stop_word_list.sub('', search_term)
cleaned_search_term = cleaned_search_term.strip()
if len(cleaned_search_term) != 0:
soknad_list = soknad_list.filter(Q(dream__icontains=cleaned_search_term) | Q(tags__icontains=cleaned_search_term) | Q(name__icontains=cleaned_search_term) | Q(school__name__icontains=cleaned_search_term))
So what I do is, first make a list of all objects, then I check which variables exists (I fetch these with GET on an earlier point) and then I filter the results if they exists. But this doesn't seem too elegant, it probably does a lot of queries to achieve the result, so is there a better way to this?
It does exactly what I want, but I guess there is a better/smarter way to do this. Any ideas?

filter itself doesn't execute a query, no query is executed until you explicitly fetch items from query (e.g. get), and list( query ) also executes it.

You can see the query that will be generated by using:
soknad_list.query.as_sql()[0]
You can then put that into your database shell to see how long the query takes, or use EXPLAIN (if your database backend supports it) to see how expensive it is.

As Aaron mentioned, you should get a hold of the query text that is going to be run against the database and use an EXPLAIN (or other some method) to view the query execution plan. Once you have a hold of the execution plan for the query you can see what is going on in the database itself. There are a lot of operations that see very expensive to run through procedural code that are very trivial for any database to run, especially if you provide indexes that the database can use for speeding up your query.
If I read your question correctly, you're retrieving a result set of all rows in the Soknad table. Once you have these results back you use the filter() method to trim down your results meet your criteria. From looking at the Django documentation, it looks like this will do an in-memory filter rather than re-query the database (of course, this really depends on which data access layer you're using and not on Django itself).
The most optimal solution would be to use a full-text search engine (Lucene, ferret, etc) to handle this for you. If that is not available or practical the next best option would be to to construct a query predicate (WHERE clause) before issuing your query to the database and let the database perform the filtering.
However, as with all things that involve the database, the real answer is 'it depends.' The best suggestion is to try out several different approaches using data that is close to production and benchmark them over at least 3 iterations before settling on a final solution to the problem. It may be just as fast, or even faster, to filter in memory rather than filter in the database.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.