I have a table in a database, mapped with the SQLAlchemy ORM (I have a scoped_session variable).
I want multiple instances of my program (not just threads, but also processes on several servers) to be able to work on the same table and NOT work on the same data.
So I have coded a manual "row-lock" mechanism to make sure each row is only handled by one instance. In this method I take a full lock on the table while I "row-lock" rows:
def instance():
    s = scoped_session(sessionmaker(bind=engine))
    engine.execute("LOCK TABLES my_data WRITE")
    rows = s.query(Row_model).filter(Row_model.condition == 1).filter(Row_model.is_locked == 0).limit(10)
    for row in rows:
        row.is_locked = 1
        row.lock_time = datetime.now()
    s.commit()
    engine.execute("UNLOCK TABLES")
    for row in rows:
        manipulate_data(row)
        row.is_locked = 0
    s.commit()

for i in range(10):
    t = threading.Thread(target=instance)
    t.start()
The problem is that while running several instances, some threads crash and each produces this error:
sqlalchemy.exc.DatabaseError: (raised as a result of Query-invoked
autoflush; consider using a session.no_autoflush block if this flush
is occurring prematurely) (DatabaseError) 1205 (HY000): Lock wait
timeout exceeded; try restarting transaction 'UPDATE my_daya SET
row_var = 1}
Where is the catch? What prevents my DB table from unlocking successfully?
Thanks.
Locks are evil. Avoid them. Things go very wrong when errors occur, especially when you mix sessions with raw SQL statements, as you do.
The beauty of the scoped session is that it wraps a database transaction. This transaction makes the modifications to the database atomic, and also takes care of cleaning up when things go wrong.
Use scoped sessions as follows:
with scoped_session(sessionmaker(bind=engine)) as s:
    <ORM actions using s>
It may be some work to rewrite your code so that it becomes properly transactional, but it will be worth it! SQLAlchemy has tricks to help you with that.
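For example, the row-claiming step can often be done with row-level locks (SELECT ... FOR UPDATE) inside a single session transaction instead of LOCK TABLES. This is only a minimal sketch, not the exact rewrite: it reuses engine, Row_model, datetime and manipulate_data from the question, and the skip_locked option assumes a database that supports SKIP LOCKED (MySQL 8.0+ or PostgreSQL 9.5+).

from datetime import datetime
from sqlalchemy.orm import scoped_session, sessionmaker

Session = scoped_session(sessionmaker(bind=engine))  # engine as in the question

def instance():
    s = Session()  # each thread gets its own session from the registry
    try:
        # Row-level locks instead of LOCK TABLES: each worker claims only rows
        # that no other transaction currently holds.
        rows = (s.query(Row_model)
                 .filter(Row_model.condition == 1, Row_model.is_locked == 0)
                 .with_for_update(skip_locked=True)
                 .limit(10)
                 .all())
        for row in rows:
            row.is_locked = 1
            row.lock_time = datetime.now()
        s.commit()            # releases the row locks; the is_locked flags remain

        for row in rows:
            manipulate_data(row)   # from the question
            row.is_locked = 0
        s.commit()
    except Exception:
        s.rollback()          # the transaction cleans up after itself on errors
        raise
    finally:
        Session.remove()

The with_for_update/skip_locked combination does at the row level what the LOCK TABLES/UNLOCK TABLES pair was trying to do for the whole table, and the rollback in the error path means a failure can no longer leave the table locked.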
Related
I wrote an API that was directly using psycopg2 to interface with a PostgreSQL database, but decided to rewrite it to use the SQLAlchemy ORM. For the most part I'm very happy with the transition, but there are a few trickier things I did that have been tough to translate. I was able to build a query to do what I wanted, but I'd much rather handle it with a HybridProperty/HybridMethod or perhaps a custom Comparator (I tried both but couldn't get them to work). I'm fairly new to the SQLAlchemy ORM, so I'd be happy to explore all options, but I'd prefer something in the database model rather than in the API code.
For background, there are several loosely coupled API consumers that have a few mandatory identifying columns that belong to the API and then a LargeBinary column that they can basically do whatever they want with. In the code below, the API consumers need to be able to select messages based on their own identifiers that are not parsed by the API (since each consumer is likely different).
Old code:
select_sql = sql.SQL("""SELECT {field1}
                        FROM {table}
                        WHERE {field2}={f2}
                          AND {field3}={f3}
                          AND {field4}={f4}
                          AND left(encode({field5}, 'hex'), {numChar})={selectBytes} -- Relevant clause
                          AND {field6}=false;
                     """
                     ).format(field1=sql.Identifier("key"),
                              field2=sql.Identifier("f2"),
                              field3=sql.Identifier("f3"),
                              field4=sql.Identifier("f4"),
                              field5=sql.Identifier("f5"),
                              field6=sql.Identifier("f6"),
                              table=sql.Identifier("incoming"),
                              numChar=sql.Literal(len(data['bytes'])),
                              f2=sql.Literal(data['2']),
                              f3=sql.Literal(data['3']),
                              f4=sql.Literal(data['4']),
                              selectBytes=sql.Literal(data['bytes']))
try:
    cur = incoming_conn.cursor()
    cur.execute(select_sql)
    keys = [x[0] for x in cur.fetchall()]
    cur.close()
    return keys, 200
except psycopg2.DatabaseError as error:
    logging.error(error)
    incoming_conn.reset()
    return "Error reading from DB", 500
New code:
try:
    session = Session()
    messages = (
        session.query(IncomingMessage)
        .filter_by(deleted=False)
        .filter_by(f2=data['2'])
        .filter_by(f3=data['3'])
        .filter_by(f4=data['4'])
        .filter(func.left(func.encode(IncomingMessage.payload,  # Relevant clause
                                      'hex'),
                          len(data['bytes'])) == data['bytes'])
    )
    keys = [x.key for x in messages]
    session.close()
    return keys, 200
except exc.OperationalError as error:
    logging.error(error)
    session.close()
    return "Database failure", 500
The problem I kept running into was how to limit the number of stored bytes being compared. I don't think it's really a problem with the Comparator itself, but I feel like there would be a performance cost if I were loading several megabytes just to compare the first eight or so bytes.
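One way to express this in the model, so that only a prefix of the stored bytes ever reaches the comparison, is a hybrid method whose SQL expression takes a substring of the binary column before encoding it. This is only a sketch under assumptions: the class below is a stand-in for the real IncomingMessage, the column is a PostgreSQL bytea, and the incoming value is a lowercase hex string.

from sqlalchemy import Column, Integer, LargeBinary, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_method

Base = declarative_base()

class IncomingMessage(Base):          # stand-in for the real model
    __tablename__ = 'incoming'
    key = Column(Integer, primary_key=True)
    payload = Column(LargeBinary)     # the consumer-defined blob

    @hybrid_method
    def payload_starts_with(self, hex_prefix):
        # Python side: compare the hex encoding of just the leading bytes.
        return self.payload.hex()[:len(hex_prefix)] == hex_prefix

    @payload_starts_with.expression
    def payload_starts_with(cls, hex_prefix):
        # SQL side: substr() the bytea first, so only the leading bytes are
        # encoded and compared instead of the whole (possibly huge) payload.
        prefix = func.substr(cls.payload, 1, len(hex_prefix) // 2)
        return func.encode(prefix, 'hex') == hex_prefix

The query side then reduces to .filter(IncomingMessage.payload_starts_with(data['bytes'])), and the database never has to encode the full payload just to test its first few bytes.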
Context
So I am trying to figure out how to properly override the auto-transaction behavior when using SQLite in Python. When I try to run
cursor.execute("BEGIN;")
.....an assortment of insert statements...
cursor.execute("END;")
I get the following error:
OperationalError: cannot commit - no transaction is active
Which I understand is because SQLite in Python automatically opens a transaction on each modifying statement, which in this case is an INSERT.
Question:
I am trying to speed up my insertions by doing one transaction per several thousand records.
How can I overcome the automatic opening of transactions?
As @CL. said, you have to set the isolation level to None. Code example:
import sqlite3

s = sqlite3.connect("./data.db")
s.isolation_level = None
try:
    c = s.cursor()
    c.execute("begin")
    ...
    c.execute("commit")
except sqlite3.Error:
    c.execute("rollback")
The documentation says:
You can control which kind of BEGIN statements sqlite3 implicitly executes (or none at all) via the isolation_level parameter to the connect() call, or via the isolation_level property of connections.
If you want autocommit mode, then set isolation_level to None.
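Putting the two pieces together for the goal in the question (one transaction per few thousand records), a sketch along these lines should work; the table name "records", its columns, and the rows list of tuples are placeholders:

import sqlite3

conn = sqlite3.connect("./data.db")
conn.isolation_level = None            # stop the module from issuing implicit BEGINs
cur = conn.cursor()

BATCH = 5000
for start in range(0, len(rows), BATCH):        # rows: list of (a, b) tuples (placeholder)
    chunk = rows[start:start + BATCH]
    cur.execute("BEGIN")
    try:
        cur.executemany("INSERT INTO records (a, b) VALUES (?, ?)", chunk)
        cur.execute("COMMIT")
    except sqlite3.Error:
        cur.execute("ROLLBACK")
        raise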
Several processes read the table at the same time, and each process takes one task. Is it possible to avoid using LOCK TABLE in this case?
db.session.execute('LOCK TABLE "Task"')
query = db.session.query(models.Task).order_by(models.Task.ordr).limit(1)
for row in query:
    task = row
    db.session.delete(row)
db.session.commit()
By locking the table you take a pessimistic approach to concurrency.
Alternatively, instead of locking the table, you can be optimistic about things going the right way. I would wrap the code that retrieves a task in a retry loop, with error handling for the case where the commit fails because some other process already removed the very task this process tried to claim.
Something like this, perhaps:
def get_next_task():
    session = ...
    task = None
    while not task:
        try:
            query = session.query(models.Task).order_by(models.Task.ordr).limit(1)
            for row in query:
                task = row
                session.delete(row)
            session.commit()
            if not task:
                return None  # no more tasks found
        except TODO_FIND_PROPER_EXCEPTION_TO_HANDLE as _exc:
            session.rollback()  # reset the failed transaction before retrying
            # or log the statement
    # maybe need to make_transient
    return task
Whether this solution is better will depend on the use case, though.
I'm using SQLAlchemy with a Postgres backend to do a bulk insert-or-update. To try to improve performance, I'm attempting to commit only once every thousand rows or so:
trans = engine.begin()
for i, rec in enumerate(records):
    if i % 1000 == 0:
        trans.commit()
        trans = engine.begin()
    try:
        inserter.execute(...)
    except sa.exceptions.SQLError:
        my_table.update(...).execute()
trans.commit()
However, this isn't working. It seems that when the INSERT fails, it leaves things in a weird state that prevents the UPDATE from happening. Is it automatically rolling back the transaction? If so, can this be stopped? I don't want my entire transaction rolled back in the event of a problem, which is why I'm trying to catch the exception in the first place.
The error message I'm getting, BTW, is "sqlalchemy.exc.InternalError: (InternalError) current transaction is aborted, commands ignored until end of transaction block", and it happens on the update().execute() call.
You're hitting some weird Postgresql-specific behavior: if an error happens in a transaction, it forces the whole transaction to be rolled back. I consider this a Postgres design bug; it takes quite a bit of SQL contortionism to work around in some cases.
One workaround is to do the UPDATE first. Detect whether it actually modified a row by looking at cursor.rowcount; if it didn't modify any rows, the row didn't exist, so do the INSERT. (This will be faster if you update more frequently than you insert, of course.)
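In the implicit-execution style the question already uses, that workaround looks roughly like this; my_table.c.id and the rec dict are placeholders:

# Try the UPDATE first; fall back to INSERT when no row was touched.
result = my_table.update().where(my_table.c.id == rec['id']).values(**rec).execute()
if result.rowcount == 0:      # nothing matched, so the row does not exist yet
    my_table.insert().values(**rec).execute()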
Another workaround is to use savepoints:
SAVEPOINT a;
INSERT INTO ....;
-- on error:
ROLLBACK TO SAVEPOINT a;
UPDATE ...;
-- on success:
RELEASE SAVEPOINT a;
This has a serious problem for production-quality code: you have to detect the error accurately. Presumably you're expecting to hit a unique constraint check, but you may hit something unexpected, and it may be next to impossible to reliably distinguish the expected error from the unexpected one. If this hits the error condition incorrectly, it'll lead to obscure problems where nothing will be updated or inserted and no error will be seen. Be very careful with this. You can narrow down the error case by looking at Postgresql's error code to make sure it's the error type you're expecting, but the potential problem is still there.
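If you drive the savepoints from SQLAlchemy, one way to narrow the check is to inspect the psycopg2 error code attached to the wrapped exception. A sketch, assuming a Connection conn plus the question's inserter and my_table, a placeholder id column, and a savepoint via begin_nested() so the failed INSERT doesn't abort the outer transaction:

import psycopg2.errorcodes
import sqlalchemy as sa

try:
    with conn.begin_nested():            # SAVEPOINT / RELEASE / ROLLBACK TO
        conn.execute(inserter, rec)
except sa.exc.IntegrityError as err:
    # Only treat the expected unique-constraint violation as "row already exists";
    # anything unexpected should propagate instead of being silently swallowed.
    if getattr(err.orig, 'pgcode', None) != psycopg2.errorcodes.UNIQUE_VIOLATION:
        raise
    conn.execute(my_table.update().where(my_table.c.id == rec['id']).values(**rec))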
Finally, if you really want to do batch insert-or-update, you actually want to do many of them in a few commands, not one item per command. This requires trickier SQL: a SELECT nested inside an INSERT, filtering out the right items to insert and update.
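A hedged sketch of that batch shape, using an INSERT ... SELECT ... WHERE NOT EXISTS plus a bulk UPDATE with executemany; the table and column names (my_table, id, value) and the records list of dicts are assumptions, and the race between concurrent workers still needs handling:

from sqlalchemy import text

insert_missing = text("""
    INSERT INTO my_table (id, value)
    SELECT :id, :value
    WHERE NOT EXISTS (SELECT 1 FROM my_table WHERE id = :id)
""")
update_existing = text("UPDATE my_table SET value = :value WHERE id = :id")

with engine.begin() as conn:
    conn.execute(insert_missing, records)    # executemany over the whole batch
    conn.execute(update_existing, records)   # pre-existing rows get the new value;
                                             # freshly inserted ones are rewritten to the same value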
This error is from PostgreSQL. PostgreSQL doesn't allow you to execute further commands in the same transaction once one command has caused an error. To fix this you can use nested transactions (implemented using SQL savepoints) via conn.begin_nested(). Here's something that might work. I made the code use explicit connections, factored out the chunking part, and used context managers to manage the transactions correctly.
from itertools import chain, islice

def chunked(seq, chunksize):
    """Yields items from an iterator in chunks."""
    it = iter(seq)
    while True:
        try:
            first = next(it)
        except StopIteration:
            return
        yield chain([first], islice(it, chunksize - 1))

conn = engine.connect()
for chunk in chunked(records, 1000):
    with conn.begin():
        for rec in chunk:
            try:
                with conn.begin_nested():
                    conn.execute(inserter, ...)
            except sa.exceptions.SQLError:
                conn.execute(my_table.update(...))
This still won't have stellar performance, though, due to nested-transaction overhead. If you want better performance, try to detect which rows will create errors beforehand with a SELECT query and use executemany support (execute can take a list of dicts if all inserts use the same columns). If you need to handle concurrent updates, you'll still need to do error handling, either by retrying or by falling back to one-by-one inserts.
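A sketch of that "check first, then executemany" idea, assuming my_table has a primary-key column id and records is a list of dicts keyed by column name (concurrent writers can still race between the SELECT and the INSERT):

import sqlalchemy as sa

with engine.begin() as conn:
    existing = {row[0] for row in conn.execute(sa.select([my_table.c.id]))}
    to_insert = [rec for rec in records if rec['id'] not in existing]
    to_update = [rec for rec in records if rec['id'] in existing]

    if to_insert:
        conn.execute(my_table.insert(), to_insert)   # one executemany INSERT
    for rec in to_update:
        conn.execute(my_table.update().where(my_table.c.id == rec['id']).values(**rec))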
I have a script that waits until some row in a db is updated:
con = MySQLdb.connect(server, user, pwd, db)
When the script starts, the row's value is "running", and it waits for the value to become "finished".
while True:
    sql = '''select value from table where some_condition'''
    cur = self.getCursor()
    cur.execute(sql)
    r = cur.fetchone()
    cur.close()
    res = r['value']
    if res == 'finished':
        break
    print res
    time.sleep(5)
When I run this script it hangs forever: even though I can see that the value of the row has changed to "finished" when I query the table myself, the script keeps printing "running".
Is there some setting I'm missing?
EDIT: The Python script only queries the table. The update to the table is carried out by a Tomcat webapp, using JDBC, that is set to autocommit.
This is an InnoDB table, right? InnoDB is a transactional storage engine. Setting autocommit to true will probably fix this behavior for you.
conn.autocommit(True)
Alternatively, you could change the transaction isolation level. You can read more about this here:
http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html
The reason for this behavior is that inside a single transaction, reads need to be consistent: all consistent reads within the same transaction read the snapshot established by the first read. Even if your script only reads the table, this is still considered a transaction. This is the default behavior in InnoDB, and you need to change it or run conn.commit() after each read.
This page explains this in more details: http://dev.mysql.com/doc/refman/5.0/en/innodb-consistent-read.html
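Applied to the loop in the question, a sketch with autocommit enabled (so each SELECT sees a fresh snapshot); the connection parameters and the SQL are the question's placeholders, and a DictCursor is assumed since the code reads r['value']:

import time
import MySQLdb
import MySQLdb.cursors

con = MySQLdb.connect(server, user, pwd, db)   # placeholders from the question
con.autocommit(True)                           # no long-lived snapshot between reads

while True:
    cur = con.cursor(MySQLdb.cursors.DictCursor)
    cur.execute('''select value from table where some_condition''')
    r = cur.fetchone()
    cur.close()
    if r and r['value'] == 'finished':
        break
    print(r['value'] if r else None)
    time.sleep(5)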
I worked around this by running
c.execute("""set session transaction isolation level READ COMMITTED""")
early on in my reading session. Updates from other threads do come through now.
In my instance I was keeping connections open for a long time (inside mod_python) and so updates by other processes weren't being seen at all.