I have a batch query that I'm running daily on my database. However, it seems to get stuck in idle state, and I'm having a lot of difficulty debugging what's going on.
The query is an aggregation on a table that is simultaneously being inserted into, which I'm guessing is somehow related to the issue. (The aggregation is on the previous day's data, so the insertions shouldn't affect the results.)
Clues
I'm running this inside a Python script using SQLAlchemy. However, I've set the transaction level to autocommit, so I don't think things are getting wrapped inside a transaction. On the other hand, I don't see the query hang when I run it manually in a SQL terminal.
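In case it matters, the relevant part of the setup looks roughly like this (a simplified sketch, not the exact script; the connection URL, table names, and query are placeholders, with isolation_level="AUTOCOMMIT" being how I set the autocommit level):

from sqlalchemy import create_engine, text

# Placeholder URL; isolation_level="AUTOCOMMIT" sets the autocommit level mentioned above.
engine = create_engine(
    "postgresql+psycopg2://user:password@host/mydb",
    isolation_level="AUTOCOMMIT",
)

with engine.connect() as conn:
    # Placeholder aggregation over the previous day's rows.
    conn.execute(text(
        "INSERT INTO daily_stats (day, total) "
        "SELECT current_date - 1, count(*) FROM events "
        "WHERE created_at >= current_date - 1 AND created_at < current_date"
    ))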
By querying pg_stat_activity, the query initially comes into the database as state='active'. After maybe 15 seconds, the state changes to 'idle' and additionally, the xact_start is set to NULL. The waiting flag is never set to true.
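For reference, this is roughly how I watch it from a second connection (a sketch; the DSN is a placeholder, and the waiting column only exists up to Postgres 9.5 - newer versions expose wait_event instead):

import psycopg2

# Placeholder DSN; run from a second connection while the batch query is active.
conn = psycopg2.connect("dbname=mydb user=me host=localhost")
conn.autocommit = True
cur = conn.cursor()
cur.execute(
    "SELECT pid, state, xact_start, waiting, query "
    "FROM pg_stat_activity "
    "WHERE datname = current_database() "
    "ORDER BY xact_start NULLS LAST"
)
for row in cur.fetchall():
    print(row)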
Before I figured out the autocommit transaction level for SQLAlchemy, it would instead hang in state 'idle in transaction' rather than 'idle'. And it possibly hangs slightly less frequently since making that change?
I feel like I'm not equipped to dig any deeper than I have on this. Any feedback, even explaining more about different states and relevant postgres internals without giving a definite answer, would be greatly appreciated.
Related
My colleague runs a script that pulls data from the DB periodically. He is using the query:
'SELECT url, data FROM table LIMIT {} OFFSET {}'.format(OFFSET, PAGE * OFFSET)
We use Amazon Aurora, and he has his own replica (slave) server, but every time he runs it, the server goes to 98%+.
The table has millions of records.
Would it be better to go for an SQL dump instead of SQL queries for fetching the data?
The options that come to mind are:
SQL dump of selected tables (not sure of the benchmark)
Federated tables based on a certain reference (date, ID, etc.)
Thanks
I'm making some fairly big assumptions here, but from
"without choking it"
I'm guessing you mean that when your colleague runs the SELECT to grab the large amount of data, database performance drops for all other operations - presumably your primary application - while the data is being prepared for export.
You mentioned SQL Dump, so I'm also assuming that this colleague will be satisfied with data that is roughly correct, i.e. it doesn't have to be up-to-the-instant transactionally correct data - just good enough for something like analytics work.
If those assumptions are close, your colleague and your database might benefit from
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
This statement should be used carefully, and almost never in a line-of-business application, but it can help people running big queries against the live database, as long as they fully understand the implications.
To use it, simply start a transaction and put this line before any queries you run.
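From Python, that could look something like this (a sketch using pymysql against a read replica; the connection details, table, and paging values are placeholders - the SET statement applies only to the next transaction on that connection):

import pymysql

# Placeholder connection details for the read replica.
conn = pymysql.connect(host="replica-host", user="reader",
                       password="secret", database="mydb")
try:
    with conn.cursor() as cur:
        # Dirty reads for the next transaction on this connection only.
        cur.execute("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")
        cur.execute("SELECT url, data FROM mytable LIMIT %s OFFSET %s", (1000, 0))
        rows = cur.fetchall()
    conn.commit()
finally:
    conn.close()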
The 'choking'
What you are seeing when your colleague runs a large query is record locking. Your database engine is - quite correctly - set up to provide an accurate view of your data, at any point. So, when a large query comes along the database engine first waits for all write locks (transactions) to clear, runs the large query and holds all future write locks until the query has run.
This actually happens for all transactions, but you only really notice it for the big ones.
What READ UNCOMMITTED does
By setting the transaction isolation level to READ UNCOMMITTED, you are telling the database engine that this transaction doesn't care about write locks and to go ahead and read anyway.
This is known as a 'dirty read', in that the long-running query could well read a table with a write lock on it and will ignore the lock. The data actually read could be the data before the write transaction has completed, or a different transaction could start and modify records before this query gets to it.
The data returned from anything with READ UNCOMMITTED is not guaranteed to be correct in the ACID sense of a database engine, but for some use cases it is good enough.
What the effect is
Your large queries magically run faster and don't lock the database while they are running.
Use with caution and understand what it does before you use it though.
MySQL Manual on transaction isolation levels
I'm facing the problem of an extremely slow (raw) query in my Django app. Strangely enough, it's not slow when I launch the isolated query from the shell (e.g. python manage.py my_code_query), but it is slow when I run the whole program that contains all my queries (it always "blocks" at the same query; it does eventually complete, but it's something like 100x slower). It's as if all the queries before the problematic one are consuming memory and there is not enough memory left when my query starts. The same query run directly in Postgres has no problem at all.
I read somewhere (Django cursor.execute(QUERY) much slower than running the query in the postgres database) that it can be the work_mem setting in Postgres that causes the problem but they are not very clear about the way they set it from Django. Do I have to make a call from my connection.cursor.execute() to set the work_mem parameter? Once only?
Could it be another problem than the work_mem setting?
Any hint will be very appreciated.
Thanks,
Patrick
Inspired by that post (How can I tell Django to execute all queries using 10MB statement mem?), I made this call before executing my cursor:
cursor.execute("set work_mem='100MB'") #set statement_mem does not work
It's running quickly now.
EDIT: Well, that was yesterday. Today it's not running quickly anymore. Don't know why.
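For anyone trying the same thing, the call goes on the same connection that later runs the slow query, roughly like this (a sketch; the table, query, and the 100MB value are placeholders, and work_mem stays set for the rest of that session):

from django.db import connection

with connection.cursor() as cursor:
    # Raise work_mem for this session, then run the slow raw query on the same connection.
    cursor.execute("SET work_mem = '100MB'")
    cursor.execute("SELECT id, payload FROM my_big_table ORDER BY payload")  # placeholder for the slow query
    rows = cursor.fetchall()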
I have a twisted daemon that does some xml feed parsing.
I store my data in PostgreSQL via twisted.enterprise.adbapi, which IIRC wraps psycopg2.
I've run into a few problems with storing data in the database -- duplicate data periodically gets in there.
To be honest, there are some underlying issues with my implementation which should be redone and designed much better. I lack the time and resources to do that though - so we're in 'just keep it running' mode for now.
I think the problem may come from either my usage of deferToThread or from how I've spawned the server at the start.
As a quick overview of the functionality I think is at fault:
Twisted queries Postgres for Accounts that should be analyzed, and sets a block on them:
SELECT
id
FROM
xml_sources
WHERE
timestamp_last_process < ( CURRENT_TIMESTAMP AT TIME ZONE 'UTC' - INTERVAL '4 HOUR' )
AND
is_processing_block IS NULL ;
lock_ids = [ i['id'] for i in results ]
UPDATE xml_sources SET is_processing_block = True WHERE id IN %(lock_ids)s
What I think is happening is that (accidentally) having multiple servers running, or various other issues, results in multiple threads processing this data.
I think this would likely be fixed - or at least ruled out as an issue - if I wrap this quick section in an exclusive table lock.
I've never done table locking through Twisted before, though. Can anyone point me in the right direction?
You can do a SELECT FOR UPDATE instead of a plain SELECT, and that will lock the rows returned by your query. If you actually want table locking you can just issue a LOCK statement, but based on the rest of your question I think you want row locking.
If you are using adbapi, then keep in mind that you will need to use runInteraction if you want to run more than one statement in a transaction. Functions passed to runInteraction will run in a thread, so you may need to use callFromThread or blockingCallFromThread to reach from the database interaction back into the reactor.
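A rough sketch of what that could look like with adbapi, reusing the query from the question (the DSN is a placeholder, the reactor is assumed to be running, and the whole function executes in one transaction on a pool thread):

from twisted.enterprise import adbapi

# Placeholder DSN; each interaction runs in its own thread and transaction.
dbpool = adbapi.ConnectionPool("psycopg2", "dbname=feeds user=feeds")

def claim_sources(txn):
    # SELECT ... FOR UPDATE locks the matching rows until this transaction commits.
    txn.execute(
        "SELECT id FROM xml_sources "
        "WHERE timestamp_last_process < "
        "      (CURRENT_TIMESTAMP AT TIME ZONE 'UTC' - INTERVAL '4 HOUR') "
        "  AND is_processing_block IS NULL "
        "FOR UPDATE"
    )
    lock_ids = [row[0] for row in txn.fetchall()]
    if lock_ids:
        # psycopg2 expands a tuple parameter into (1, 2, 3) for the IN clause.
        txn.execute(
            "UPDATE xml_sources SET is_processing_block = TRUE WHERE id IN %s",
            (tuple(lock_ids),),
        )
    return lock_ids  # the transaction commits when this function returns

d = dbpool.runInteraction(claim_sources)
d.addCallback(lambda ids: print("claimed:", ids))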
However, locking may not be your problem. For one thing, if you are mixing deferToThread and adbapi, something's likely wrong. adbapi is already doing the equivalent of deferToThread for you. You should be able to do everything on the main thread.
You'll have to include a representative example for a more specific answer though. Consider your question: it's basically "Sometimes I get duplicate data, with a self-admittedly problematic implementation, that is big and I can't fix and I also can't show you." This is not a question which it is possible to answer.
I'm using SQLAlchemy as the ORM within an application I've been building for some time.
So far, it's been quite a painless ORM to implement and use, however, a recent feature I'm working on requires a persistent & distributed queue (list & worker) style implementation, which I've built in MySQL and Python.
It's all worked quite well until I tested it in a scaled environment.
I've used InnoDB row-level locking to ensure each row is only read once; while the row is locked, I update an 'in_use' value to make sure that others don't grab the entry.
Since MySQL doesn't offer a "NOWAIT" option like Postgres or Oracle does, I've run into locking issues where worker threads hang and wait for the locked row to become available.
In an attempt to overcome this limitation, I've tried to put all the required processing into a single statement and run it through the ORM's execute() method; however, SQLAlchemy refuses to return the query result.
Here's an example.
My SQL statement is:
SELECT id INTO @update_id FROM myTable WHERE in_use=0 ORDER BY id LIMIT 1 FOR UPDATE;
UPDATE myTable SET in_use=1 WHERE id=@update_id;
SELECT * FROM myTable WHERE id=@update_id;
And I run this code in the console:
engine = create_engine('mysql://<user details>@<server details>/myDatabase', pool_recycle=90, echo=True)
result = engine.execute(sqlStatement)
result.fetchall()
Only to get this result
[]
I'm certain the statement is running since I can see the update take effect in the database, and if I execute through the mysql terminal or other tools, I get the modified row returned.
It just seems to be SQLAlchemy that doesn't want to acknowledge the returned row.
Is there anything specific that needs to be done to ensure that the ORM picks up the response?
Cheers
You have executed 3 queries, and MySQLdb creates a result set for each. You have to fetch the first result, then call cursor.nextset(), fetch the second, and so on.
This answers your question, but won't be useful for you, because it won't solve the locking issue. You have to understand how FOR UPDATE works first: it locks the returned rows until the end of the transaction. To avoid a long lock wait you have to make it as short as possible: SELECT ... FOR UPDATE, UPDATE ... SET in_use=1, COMMIT. You don't actually need to put them into a single SQL statement; 3 execute() calls will be fine too. But you have to commit before the long computation, otherwise the lock will be held too long and updating in_use (an offline lock) is meaningless. And sure, you can do the same thing using the ORM too.
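A sketch of that short-transaction version with SQLAlchemy (the table and column names come from your example; the URL is a placeholder):

from sqlalchemy import create_engine, text

engine = create_engine("mysql://user:password@server/myDatabase", pool_recycle=90)

# One short transaction: grab a free row, mark it in_use, commit immediately
# so the row lock is released before any long-running work starts.
with engine.begin() as conn:
    row = conn.execute(text(
        "SELECT id FROM myTable WHERE in_use=0 ORDER BY id LIMIT 1 FOR UPDATE"
    )).fetchone()
    if row is not None:
        conn.execute(text("UPDATE myTable SET in_use=1 WHERE id = :id"),
                     {"id": row[0]})
# the transaction has committed here; do the long work, then clear in_use afterwards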
I have a unit test which contains the following line of code
Site.objects.get(name="UnitTest").delete()
and this has worked just fine until now. However, that statement is currently hanging. It'll sit there forever trying to execute the delete. If I just say
print Site.objects.get(name="UnitTest")
then it works, so I know that it can retrieve the site. No other program is connected to Oracle, so it's not like there are two developers stepping on each other somehow. I assume that some sort of table lock hasn't been released.
So short of shutting down the Oracle database and bringing it back up, how do I release that lock or whatever is blocking me? I'd like to not resort to a database shutdown because in the future that may be disruptive to some of the other developers.
EDIT: Justin suggested that I look at the DBA_BLOCKERS and DBA_WAITERS tables. Unfortunately, I don't understand these tables at all, and I'm not sure what I'm looking for. So here's the information that seemed relevant to me:
The DBA_WAITERS table has 182 entries with lock type "DML". The DBA_BLOCKERS table has 14 entries whose session ids all correspond to the username used by our application code.
Since this needs to get resolved, I'm going to just restart the web server, but I'd still appreciate any suggestions about what to do if this problem repeats itself. I'm a real novice when it comes to Oracle administration and have mostly just used MySQL in the past, so I'm definitely out of my element.
EDIT #2: It turns out that despite what I thought, another programmer was indeed accessing the database at the same time as me. So what's the best way to detect this in the future? Perhaps I should have shut down my program and then queried the DBA_WAITERS and DBA_BLOCKERS tables to make sure they were empty.
From a separate session, can you query the DBA_BLOCKERS and DBA_WAITERS data dictionary tables and post the results? That will tell you if your session is getting blocked by a lock held by some other session, as well as what other session is holding the lock.
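For example, something along these lines from a second session (a sketch using cx_Oracle; the connection string is a placeholder, and you need a user with access to the DBA_* views):

import cx_Oracle

# Placeholder credentials for a second, administrative session.
conn = cx_Oracle.connect("system/password@dbhost/ORCL")
cur = conn.cursor()

# Who is waiting on whom, and for what kind of lock.
cur.execute(
    "SELECT waiting_session, holding_session, lock_type, mode_held, mode_requested "
    "FROM dba_waiters"
)
for row in cur.fetchall():
    print(row)

# Sessions that currently hold locks others are waiting on.
cur.execute("SELECT holding_session FROM dba_blockers")
print(cur.fetchall())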