SQLAlchemy not returning selected data - python

I'm using SQLAlchemy as the ORM within an application I've been building for some time.
So far, it's been quite a painless ORM to implement and use, however, a recent feature I'm working on requires a persistent & distributed queue (list & worker) style implementation, which I've built in MySQL and Python.
It's all worked quite well until I tested it in a scaled environment.
I've used InnoDB row-level locking to ensure each row is only read once: while the row is locked, I update an 'in_use' value to make sure that others don't grab the entry.
Since MySQL doesn't offer a NOWAIT option like Postgres or Oracle do, I've run into locking issues where worker threads hang and wait for the locked row to become available.
In an attempt to overcome this limitation, I've tried to put all the required processing into a single statement, and run it through the ORM's execute() method, although, SQLAlchemy is refusing to return the query result.
Here's an example.
My SQL statement is:
SELECT id INTO @update_id FROM myTable WHERE in_use=0 ORDER BY id LIMIT 1 FOR UPDATE;
UPDATE myTable SET in_use=1 WHERE id=@update_id;
SELECT * FROM myTable WHERE id=@update_id;
And I run this code in the console:
engine = create_engine('mysql://<user details>@<server details>/myDatabase', pool_recycle=90, echo=True)
result = engine.execute(sqlStatement)
result.fetchall()
Only to get this result
[]
I'm certain the statement is running since I can see the update take effect in the database, and if I execute through the mysql terminal or other tools, I get the modified row returned.
It just seems to be SQLAlchemy that doesn't want to acknowledge the returned row.
Is there anything specific that needs to be done to ensure that the ORM picks up the response?
Cheers

You have executed 3 queries and MySQLdb creates a result set for each. You have to fetch the first result, then call cursor.nextset(), fetch the second, and so on.
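In code that looks roughly like the following; a sketch only, dropping to the raw DBAPI cursor, and assuming your connection allows multi-statement execution (it apparently does, since the UPDATE is taking effect):

raw = engine.raw_connection()
try:
    cur = raw.cursor()
    cur.execute(sqlStatement)   # runs all three statements
    cur.nextset()               # skip the SELECT ... INTO result (no rows)
    cur.nextset()               # skip the UPDATE result (no rows)
    rows = cur.fetchall()       # rows from the final SELECT
    raw.commit()                # also releases the FOR UPDATE lock
finally:
    raw.close()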
This answers your question, but it won't be useful to you, because it won't solve the locking issue. You have to understand how FOR UPDATE works first: it locks the returned rows until the end of the transaction. To avoid long lock waits you have to keep the transaction as short as possible: SELECT ... FOR UPDATE, UPDATE ... SET in_use=1, COMMIT. You don't actually need to put them into a single SQL statement; 3 execute() calls will be fine too. But you have to commit before the long computation, otherwise the lock will be held too long and updating in_use (an offline lock) is meaningless. And of course you can do the same thing using the ORM too.
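A rough sketch of that short claim-and-commit transaction (using text() and engine.begin(); the table and column names are taken from the question, and the exact API spelling may differ slightly across SQLAlchemy versions):

from sqlalchemy import text

with engine.begin() as conn:   # BEGIN ... COMMIT around just the claim
    row = conn.execute(text(
        "SELECT id FROM myTable WHERE in_use=0 "
        "ORDER BY id LIMIT 1 FOR UPDATE"
    )).fetchone()
    if row is not None:
        conn.execute(text("UPDATE myTable SET in_use=1 WHERE id=:id"),
                     {"id": row[0]})
# the commit here releases the row lock before any long-running work begins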

Related

MySQL: How to pull large amount of Data from MySQL without choking it?

My colleague runs a script that pulls data from the db periodically. He is using the query:
'SELECT url, data FROM table LIMIT {} OFFSET {}'.format(OFFSET, PAGE * OFFSET)
We use Amazon Aurora and he has his own slave server, but every time the query runs it touches 98%+.
The table has millions of records.
Would it be better to go for a SQL dump instead of SQL queries for fetching the data?
The options that come to my mind are:
A SQL dump of selected tables (not sure of the benchmark)
Federated tables based on some reference (date, ID, etc.)
Thanks
I'm making some fairly big assumptions here, but from
without choking it
I'm guessing you mean that when your colleague runs the SELECT to grab the large amount of data, the database performance drops for all other operations - presumably your primary application - while the data is being prepared for export.
You mentioned SQL dump, so I'm also assuming that this colleague will be satisfied with data that is roughly correct, i.e. it doesn't have to be up-to-the-instant, transactionally correct data. Just good enough for something like analytics work.
If those assumptions are close, your colleague and your database might benefit from
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
This statement should be used carefully, and almost never in a line-of-business application, but it can help people querying the live database with big queries, as long as you fully understand the implications.
To use it, simply start a transaction and put this line before any queries you run.
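For example, from Python with MySQLdb (a sketch only: the connection details and paging values are placeholders; the query itself comes from the question):

import MySQLdb

PAGE, PAGE_SIZE = 0, 10000   # hypothetical paging values

conn = MySQLdb.connect(host="db-host", user="report", passwd="secret", db="mydb")
cur = conn.cursor()
# applies to the next transaction only
cur.execute("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")
cur.execute("START TRANSACTION")
cur.execute("SELECT url, data FROM `table` LIMIT %s OFFSET %s",
            (PAGE_SIZE, PAGE * PAGE_SIZE))
rows = cur.fetchall()
conn.commit()
conn.close()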
The 'choking'
What you are seeing when your colleague runs a large query is record locking. Your database engine is - quite correctly - set up to provide an accurate view of your data at any point. So, when a large query comes along, the database engine first waits for all write locks (transactions) to clear, runs the large query, and holds all future write locks until the query has run.
This actually happens for all transactions, but you only really notice it for the big ones.
What READ UNCOMMITTED does
By setting the transaction isolation level to READ UNCOMMITTED, you are telling the database engine that this transaction doesn't care about write locks and to go ahead and read anyway.
This is known as a 'dirty read', in that the long-running query could well read a table with a write lock on it and will ignore the lock. The data actually read could be the data before the write transaction has completed, or a different transaction could start and modify records before this query gets to it.
The data returned from anything with READ UNCOMMITTED is not guaranteed to be correct in the ACID sense of a database engine, but for some use cases it is good enough.
What the effect is
Your large queries magically run faster and don't lock the database while they are running.
Use with caution and understand what it does before you use it though.
MySQL Manual on transaction isolation levels

How to avoid a program querying an SQL database before another commits a change

I am using an SQL database to manage the order in which I undertake a series of tasks:
I have an SQL database, and I select from it the row with the lowest value in the priority column where the value in the processed column is "unprocessed". I then immediately change the processed status to work_in_progress and commit the change (while my code goes about working on that row).
I intend to have multiple separate instances of the program interface with the same SQL database (i.e. looking for the lowest-priority unprocessed row).
How do I avoid the situation where two separate programs (i.e. separate connections) query the database concurrently, before the change is committed by the first one? I want each connection to query, update, and commit before the next connection is able to query.
For reference, I am using sqlite3 with Python.
Maybe this is your answer?
Locking a sqlite3 database in Python (re-asking for clarification)
Also if you like tinkering, refer to: Sqlite docs - locking
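For what it's worth, a minimal sketch of the usual sqlite3 approach: start the transaction with BEGIN IMMEDIATE so the write lock is taken before the SELECT, which means another connection cannot claim the same row between your SELECT and UPDATE. The tasks table and id column are made-up names; priority, processed and the status values come from your description:

import sqlite3

conn = sqlite3.connect("tasks.db", isolation_level=None)  # manage transactions manually

def claim_next_task(conn):
    cur = conn.cursor()
    cur.execute("BEGIN IMMEDIATE")   # take the write lock before reading
    try:
        row = cur.execute(
            "SELECT id FROM tasks WHERE processed='unprocessed' "
            "ORDER BY priority LIMIT 1"
        ).fetchone()
        if row is None:
            cur.execute("COMMIT")
            return None
        cur.execute("UPDATE tasks SET processed='work_in_progress' WHERE id=?",
                    (row[0],))
        cur.execute("COMMIT")
        return row[0]
    except Exception:
        cur.execute("ROLLBACK")
        raise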

Debugging idle postgres query executed from sqlalchemy

I have a batch query that I'm running daily on my database. However, it seems to get stuck in idle state, and I'm having a lot of difficulty debugging what's going on.
The query is an aggregation on a table that is simultaneously being inserted into, which I'm guessing somehow relates to the issue. (The aggregation is on the previous day's data, so the insertions shouldn't affect the results.)
Clues
I'm running this inside a Python script using sqlalchemy. However, I've set the transaction level to autocommit, so I don't think things are getting wrapped inside a transaction. On the other hand, I don't see the query hang when I run it manually in a sql terminal.
By querying pg_stat_activity, the query initially comes into the database as state='active'. After maybe 15 seconds, the state changes to 'idle' and additionally, the xact_start is set to NULL. The waiting flag is never set to true.
Before I figured out the autocommit transaction level for sqlalchemy, it would instead hang in the state 'idle in transaction' rather than 'idle'. And it possibly hangs slightly less frequently since making that change?
I feel like I'm not equipped to dig any deeper than I have on this. Any feedback, even explaining more about different states and relevant postgres internals without giving a definite answer, would be greatly appreciated.
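For reference, the pg_stat_activity check described above looks roughly like this from a second connection (monitor_engine is just a separate SQLAlchemy engine; the waiting column exists on the pre-9.6 PostgreSQL versions this behaviour matches):

from sqlalchemy import text

with monitor_engine.connect() as conn:
    for row in conn.execute(text(
        "SELECT pid, state, waiting, xact_start, query FROM pg_stat_activity"
    )):
        print(row)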

Django, innodb and row-level locking

I have a table with data to parse and a worker which takes several records from it, processes them, and saves them back. It also sets a flag to 'parsed'.
Now I want to run several instances of the worker and make sure two workers won't pick the same row to process at once. So I need to block it somehow.
I'm using Django, and from what I read in the MySQL manual it's possible to obtain a row-level lock, but I can't find any example of doing this properly. The only one I found says it's extremely slow :) http://djangosnippets.org/snippets/2039/
I could have another field saying 'lock until', a timestamp updated to now+X minutes after a row has been selected by the worker. This would shorten the duration of the lock (immediate update after select) and would prevent another worker, which would check that a row isn't 'locked', from selecting it, but the problem of locking between the select and the update still exists.
thanks!
ian
The two predominant ways to store data in MySQL are MyISAM and InnoDB. Each has its own pros and cons:
InnoDB recovers from a crash or other unexpected shutdown by replaying its logs.
InnoDB can be run in a mode where it has lower reliability but in some cases higher performance.
InnoDB automatically groups together multiple concurrent inserts and flushes them to disk at the same time.
InnoDB flushes the transaction log after each transaction, greatly improving reliability.
Unlike InnoDB, MyISAM has built-in full-text search
MyISAM is still widely used in web applications as it has traditionally been perceived as faster than InnoDB in situations where most DB access is reads.
While writing/updating data into a InnoDB table, only that particular row is locked, whereas in MyISAM the entire table is locked.
InnoDB provides Full Transaction support.
As far as Django models are concerned, they create MyISAM tables by default. If you need your tables to have row-level locks, you need InnoDB. This page should be a good starting point:
It documents a way to hook into the post_syncdb signal to dynamically issue ALTER TABLE commands to change the engine for the tables. (Note that this was written 4 years ago and may need to be updated for the current version of Django.)
It should be straightforward for you to add metadata to your models, that specify which storage engine to use for each table. Then you can modify the above example to key off of that metadata.
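The hook itself is only a few lines; a rough sketch, using the old post_syncdb signal from that era (newer Django replaces it with post_migrate) and a hypothetical list of table names:

from django.db import connection
from django.db.models.signals import post_syncdb

INNODB_TABLES = ["myapp_entry"]   # hypothetical: tables that need row-level locking

def force_innodb(sender, **kwargs):
    cursor = connection.cursor()
    for table in INNODB_TABLES:
        cursor.execute("ALTER TABLE %s ENGINE=InnoDB" % table)

post_syncdb.connect(force_innodb)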
With a lock, the second worker would just get stuck waiting for the lock to release.
Maybe you could mark entries as "work started on this entry at [timestamp]" before starting to process, and have subsequent workers ignore such rows. You can then have a cron job or similar 'releasing' rows that have a timestamp older than some threshold but are not yet marked as "done" (indicating the worker died or something else went wrong).
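A rough sketch of that idea with the Django ORM, using hypothetical model and field names (Entry, parsed, started_at); the conditional .update() is a single atomic UPDATE, so two workers cannot both claim the same row:

from datetime import timedelta
from django.db.models import Q
from django.utils import timezone

def claim_one_entry():
    # Entry is a hypothetical model with parsed/started_at fields
    stale = timezone.now() - timedelta(minutes=10)
    candidates = Entry.objects.filter(parsed=False).filter(
        Q(started_at__isnull=True) | Q(started_at__lt=stale)
    )[:20]
    for entry in candidates:
        claimed = Entry.objects.filter(
            pk=entry.pk, started_at=entry.started_at
        ).update(started_at=timezone.now())
        if claimed:   # we won the race for this row
            return entry
    return None

The cron-style job is then just an UPDATE that clears started_at on rows older than the threshold that are still not marked as done.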

How to enforce sqlite select for update transaction behavior in sqlalchemy

Yesterday I was working with some sqlalchemy stuff that needed a "select ... for update" concept to avoid a race condition. Adding .with_lockmode('update') to the query works a treat on InnoDB and Postgres, but for sqlite I end up having to sneak in a
if session.bind.name == 'sqlite':
    session.execute('begin immediate transaction')
before doing the select.
This seems to work for now, but it feels like cheating. Is there a better way to do this?
SELECT ... FOR UPDATE OF ... is not supported. This is understandable
considering the mechanics of SQLite in that row locking is redundant
as the entire database is locked when updating any bit of it. However,
it would be good if a future version of SQLite supports it for SQL
interchangeability reasons if nothing else. The only functionality
required is to ensure a "RESERVED" lock is placed on the database if
not already there.
excerpt from
https://www2.sqlite.org/cvstrac/wiki?p=UnsupportedSql
[EDIT] also see https://sqlite.org/isolation.html thanks @michauwilliam.
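One slightly less ad-hoc way to get the same effect is to hook the engine's events so every transaction on a SQLite connection starts with BEGIN IMMEDIATE; a sketch only, based on SQLAlchemy's documented pysqlite transaction recipe rather than anything in the question:

from sqlalchemy import create_engine, event

engine = create_engine("sqlite:///app.db")

if engine.dialect.name == "sqlite":
    @event.listens_for(engine, "connect")
    def _no_implicit_begin(dbapi_connection, connection_record):
        # stop pysqlite from emitting its own BEGIN
        dbapi_connection.isolation_level = None

    @event.listens_for(engine, "begin")
    def _begin_immediate(conn):
        # take the write lock up front, mimicking SELECT ... FOR UPDATE
        conn.execute("begin immediate transaction")  # exec_driver_sql() on 1.4+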
I think you have to synchronize access to the whole database. Normal synchronization mechanisms should also apply here: file locks, process synchronization, etc.
I think a SELECT FOR UPDATE is relevant for SQLite. There is no way to lock the database BEFORE I start to write. By then it's too late. Here is the scenario:
I have two servers and one database queue table. Each server is looking for work, and when it picks up a job, it updates the queue table with an "I got it" marker so the other server doesn't also pick up the same work. I need to leave the record in the queue in case of recovery.
Server 1 reads the first unclaimed item and has it in memory. Server 2 reads the same record and now has it in memory too. Server 1 then updates the record: it locks the database, updates, then unlocks. Server 2 then locks the database, updates, and unlocks. The result is that both servers now work on the same job. The table shows Server 2 has it, and the Server 1 update is lost.
I solved this by creating a lock table. Server 1 begins a transaction and writes to the lock table, which locks the database for writing. Server 2 now tries to begin a transaction and write to the lock table, but is prevented. Server 1 now reads the first queue record and then updates it with the "I got it" code. It then deletes the record it just wrote to the lock table, commits, and releases the lock. Now Server 2 is able to begin its transaction, write to the lock table, read the 2nd queue record, update it with its "I got it" code, delete its lock record, and commit, and the database is available for the next server looking for work.
