Raw query in Django much slower than the same query in Postgres

I'm facing an extremely slow (raw) query in my Django app. Strangely enough, it's not slow when I launch the isolated query from the shell (e.g. python manage.py my_code_query), but it is slow when I run the whole program that contains all my queries (it always "blocks" at the same query; it does eventually complete, but something like 100x slower). It's as if all the queries before the problematic one consume memory and there isn't enough memory left when my query starts. The same query run directly in Postgres has no problem at all.
I read somewhere (Django cursor.execute(QUERY) much slower than running the query in the postgres database) that the work_mem setting in Postgres can cause this, but that post isn't very clear about how to set it from Django. Do I have to make a call through connection.cursor().execute() to set the work_mem parameter? Only once?
Could the problem be something other than the work_mem setting?
Any hint would be much appreciated.
Thanks,
Patrick

Inspired by that post (How can I tell Django to execute all queries using 10MB statement mem?), I made this call before executing my query:
cursor.execute("set work_mem='100MB'") #set statement_mem does not work
It's running quickly now.
--EDIT: Well, that was yesterday. Today it's slow again, and I don't know why.
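For reference, the full sequence looks roughly like this (a minimal sketch; SLOW_SQL is a placeholder for the real raw query, and SET work_mem only affects the connection it is issued on, so it must run on the same connection as the query):

from django.db import connection

def run_slow_query():
    SLOW_SQL = "SELECT 1"  # placeholder for the real raw query
    with connection.cursor() as cursor:
        cursor.execute("SET work_mem = '100MB'")  # session-local setting
        cursor.execute(SLOW_SQL)
        return cursor.fetchall()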

Related

CherryPy with embedded database (SQLite)

I am developing an application that uses CherryPy. Now I need to implement a database, and I would very much like it to be embedded with the app, to save users some headache. The obvious first choice is of course SQLite, seeing how it's part of the standard library.
There seem to be a lot of different takes on this: some say you should never use SQLite in a threaded application, others say it's fine, and the estimates of how many writes per second I can expect vary wildly.
Is using SQLite this way viable, and how slow can I expect writes to the database to be?
If viable, what is the best method of implementing it? Subscribing a connection on each start_thread? Opening a connection every time a page is exposed, as some seem to do?
I've read that setting PRAGMA synchronous=OFF in SQLite can improve performance, at the cost that "if you lose power in the middle of a transaction, your database file might go corrupt." What are the probabilities here? Is this an acceptable choice, perhaps in conjunction with some sort of backup system?
Are there any other embedded databases that would be a better choice?
Should I just give up on this and use a PostgreSQL database at the cost of user convenience?
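For context, here is roughly how I imagined wiring it up (just a sketch: one connection per request, with the synchronous pragma mentioned above; the table name is made up):

import sqlite3

DB_PATH = "app.db"  # database file shipped alongside the app

def get_conn():
    # One connection per request/thread; sqlite3 connections cannot be
    # shared across threads by default (check_same_thread).
    conn = sqlite3.connect(DB_PATH)
    conn.execute("PRAGMA synchronous = OFF")  # faster writes, risks corruption on power loss
    conn.execute("CREATE TABLE IF NOT EXISTS visits (page TEXT)")
    return conn

def record_visit(page):
    conn = get_conn()
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO visits (page) VALUES (?)", (page,))
    conn.close()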
Thanks in advance.

Django 1.7 Migrations hanging

I have a Django migration I am trying to apply. It gets created fine (it's small, only adding a CharField to two different models). However, when I run the actual migrate it hangs (no failure, no success, it just sits).
Through googling I've found that other open connections can interfere with it, so I restarted the DB. However, this DB is connected to continuously running jobs, and new queries sneak in right away. They are small, though, and the last time I tried restarting I THINK I was able to execute my migrate before anything else. Still nothing.
Are there any other known issues that cause something like this?
At least in PostgreSQL you cannot modify tables (even if it's just adding new columns) while there are active transactions touching them: ALTER TABLE needs an exclusive lock on the table, so it waits for those transactions to finish. The easiest workaround for this is usually to:
run the migration script (which will hang)
restart your webserver/wsgi container
When you restart your webserver, all open transactions are aborted (assuming you don't have background processes that also hold transactions open), so as soon as no transactions are blocking your table, the migration will finish.
I was having this same problem today. I discovered that you can clear out any hanging transactions in PostgreSQL by running the following SQL immediately before your migration:
-- View all the current activity
-- SELECT * FROM pg_stat_activity;
-- Terminate connections other than your own (make sure to substitute your own IP address).
-- Note: the column is pid on PostgreSQL 9.2+; on older versions it is called procpid.
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE client_addr <> 'YOUR IP HERE';
This will terminate any connections that aren't yours, which might not be ideal in all circumstances, but works like a charm.
Worth noting for future readers: migrations can also hang when trying to apply a CharField of a size your database setup doesn't accept (this is DB implementation dependent). I was trying to alter a CharField to a max_length greater than 255 and it just hung. Even terminating the connections as described above did not fix it, because a CharField larger than 255 was incorrect for my setup (PostgreSQL).
TL;DR: Ensure your CharField max_length is 255 or less; if you need more, change your CharField to a TextField and that could fix your problem!
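For illustration, a sketch of the kind of change that fixed it for me (the model and field names are made up):

from django.db import models

class Article(models.Model):
    # Before (hung during migrate on my setup):
    #   summary = models.CharField(max_length=1000)
    # After: TextField has no declared length limit and migrated cleanly.
    summary = models.TextField()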

Debugging idle postgres query executed from sqlalchemy

I have a batch query that I run daily on my database. However, it seems to get stuck in an idle state, and I'm having a lot of difficulty debugging what's going on.
The query is an aggregation on a table that is simultaneously being inserted into, which I'm guessing somehow relates to the issue. (The aggregation is on the previous day's data, so the insertions shouldn't affect the results.)
Clues
I'm running this inside a Python script using SQLAlchemy. However, I've set the transaction level to autocommit, so I don't think things are getting wrapped inside a transaction. On the other hand, I don't see the query hang when I run it manually in a SQL terminal.
Querying pg_stat_activity, the query initially shows up with state='active'. After maybe 15 seconds, the state changes to 'idle' and, additionally, xact_start is set to NULL. The waiting flag is never set to true.
Before I figured out the transaction-level autocommit setting for SQLAlchemy, it would instead hang in state 'idle in transaction' rather than 'idle'. And it possibly hangs slightly less frequently since I made that change?
I feel like I'm not equipped to dig any deeper than I have on this. Any feedback, even explaining more about different states and relevant postgres internals without giving a definite answer, would be greatly appreciated.
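For reference, the connection is set up roughly like this (a sketch; the URL and the aggregation SQL are placeholders for the real ones):

from sqlalchemy import create_engine, text

# Placeholder connection URL.
engine = create_engine(
    "postgresql://user:secret@localhost/mydb",
    isolation_level="AUTOCOMMIT",  # run statements outside explicit transactions
)

# Hypothetical aggregation over yesterday's rows; table/column names are made up.
AGG_SQL = text("""
    INSERT INTO daily_stats (day, total)
    SELECT date_trunc('day', created_at), count(*)
    FROM events
    WHERE created_at >= current_date - interval '1 day'
      AND created_at <  current_date
    GROUP BY 1
""")

def run_daily_aggregation():
    with engine.connect() as conn:
        conn.execute(AGG_SQL)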

Python processes and MySQL

I am running several thousand Python processes on multiple servers which go off, look up a website, do some analysis, and then write the results to a central MySQL database.
It all works fine for about 8 hours and then my scripts start to wait for a MySQL connection.
Checking top, it's clear that the MySQL daemon is overloaded: it is using up to 90% of most of the CPUs.
When I stop all my scripts, MySQL continues to use resources for some time afterwards.
I assume it is still updating the indexes? If so, is there any way of determining which indexes it is working on or, if not, what it is actually doing?
Many thanks in advance.
Try enabling the slow query log: http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
Also, take a look at the output of SHOW PROCESSLIST; in the MySQL shell. It should give you some more information.
There are a lot of tweaks that can be done to improve the performance of MySQL. Given your workload, you would probably benefit a lot from MySQL 5.5 or higher, which improved performance on multiprocessor machines. Is the machine in question hitting virtual memory? If it is paging out, then MySQL's performance will be horrible.
My suggestions:
Check your version of MySQL. If possible, upgrade to the latest 5.5 release.
Look at MySQL's config file, my.cnf, and make sure its settings make sense for your machine. MySQL ships example config files for small, medium, large, etc. machines; I think the default setup is for a machine with less than 1 GB of RAM.
As the other answer suggests, turn on slow query logging (see the sketch below).
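For example, both of those checks can be done from Python (a sketch using MySQLdb; the credentials are placeholders, and SET GLOBAL requires the SUPER privilege):

import MySQLdb

# Placeholder credentials.
conn = MySQLdb.connect(host="localhost", user="root", passwd="secret")
cur = conn.cursor()

# Enable the slow query log at runtime (MySQL 5.1+).
cur.execute("SET GLOBAL slow_query_log = 'ON'")
cur.execute("SET GLOBAL long_query_time = 1")  # log statements slower than 1 second

# See what the server is doing right now.
cur.execute("SHOW FULL PROCESSLIST")
for row in cur.fetchall():
    print(row)

conn.close()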

Sometimes can't delete an Oracle database row using Django

I have a unit test which contains the following line of code
Site.objects.get(name="UnitTest").delete()
and this has worked just fine until now. However, that statement is currently hanging. It'll sit there forever trying to execute the delete. If I just say
print Site.objects.get(name="UnitTest")
then it works, so I know that it can retrieve the site. No other program is connected to Oracle, so it's not like there are two developers stepping on each other somehow. I assume that some sort of table lock hasn't been released.
So, short of shutting down the Oracle database and bringing it back up, how do I release that lock, or whatever is blocking me? I'd rather not resort to a database shutdown, because in the future that may be disruptive to some of the other developers.
EDIT: Justin suggested that I look at the DBA_BLOCKERS and DBA_WAITERS tables. Unfortunately, I don't understand these tables at all, and I'm not sure what I'm looking for. So here's the information that seemed relevant to me:
The DBA_WAITERS table has 182 entries with lock type "DML". The DBA_BLOCKERS table has 14 entries whose session ids all correspond to the username used by our application code.
Since this needs to get resolved, I'm going to just restart the web server, but I'd still appreciate any suggestions about what to do if this problem repeats itself. I'm a real novice when it comes to Oracle administration and have mostly just used MySQL in the past, so I'm definitely out of my element.
EDIT #2: It turns out that despite what I thought, another programmer was indeed accessing the database at the same time as me. So what's the best way to detect this in the future? Perhaps I should have shut down my program and then queried the DBA_WAITERS and DBA_BLOCKERS tables to make sure they were empty.
From a separate session, can you query the DBA_BLOCKERS and DBA_WAITERS data dictionary tables and post the results? That will tell you if your session is getting blocked by a lock held by some other session, as well as what other session is holding the lock.
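For example, from a separate session you could run something like this (a sketch using cx_Oracle; the credentials and DSN are placeholders, and the DBA_* views require DBA privileges):

import cx_Oracle

# Placeholder credentials and DSN.
conn = cx_Oracle.connect("system", "secret", "localhost/XE")
cur = conn.cursor()

# Sessions currently holding locks that others are waiting on.
cur.execute("SELECT holding_session FROM dba_blockers")
print("blockers:", [row[0] for row in cur.fetchall()])

# Who is waiting on whom, and for what kind of lock.
cur.execute(
    "SELECT waiting_session, holding_session, lock_type, mode_held "
    "FROM dba_waiters"
)
for waiting, holding, lock_type, mode_held in cur:
    print("session %s waits on session %s (%s, %s)" % (waiting, holding, lock_type, mode_held))

conn.close()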
