PyMongo max_time_ms - python

I would like to use the max_time_ms flag during a find on MongoDB, but I would like to understand how this flag works and how to verify that it is actually working.
pymongo find().max_time_ms(500)
Is there any way to verify?
I tried db.fsyncLock(), but I gather that is applicable only to inserts.
I thought a possible approach would be to insert a very large number of entries and then set max_time_ms(1), so that the query would not have enough time to return results.
Any suggestions?
Thanks.

Passing the max_time_ms option this way
cursor = db.collection.find().max_time_ms(1)
or
cursor = db.collection.find(max_time_ms=1)
sets a time limit for the query; a pymongo.errors.ExecutionTimeout exception is raised when the specified time limit is exceeded.
Since cursors are lazy, the exception is raised only when you access results from the cursor, e.g.
for doc in cursor:
    print(doc)
ExecutionTimeout: operation exceeded time limit
max_time_ms (optional): Specifies a time limit for a query operation. If the specified time is exceeded, the operation will be aborted and pymongo.errors.ExecutionTimeout is raised. Pass this as an alternative to calling max_time_ms() on the cursor.
[Source: Docs]
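
One way to actually see the timeout fire (my own sketch, not from the answer: the database, collection name and document shape below are made up) is to load a collection with enough documents that an unindexed query cannot finish within the limit, then iterate with max_time_ms(1) and catch the exception:

from pymongo import MongoClient
from pymongo.errors import ExecutionTimeout

client = MongoClient()
coll = client.testdb.timeout_demo   # hypothetical database/collection

# Seed enough documents that a full collection scan takes well over 1 ms.
coll.insert_many([{"i": i, "payload": "x" * 100} for i in range(100000)])

try:
    # The regex filter is unindexed, so the server must scan the whole collection.
    for doc in coll.find({"payload": {"$regex": "y$"}}).max_time_ms(1):
        print(doc)
except ExecutionTimeout:
    print("operation exceeded time limit")   # the server aborted the query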

Related

mysql.connector.errors.DatabaseError: 1205 Lock wait timeout

I am trying to resolve a timeout issue with my MySQL database. The error occurs on this statement:
SQLUpdate="UPDATE scoutinfo SET patrolID=1 WHERE patrolID=%s"
It seems this command takes too long to execute, as I receive this error:
mysql.connector.errors.DatabaseError: 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Is there some setting I need to change on MySQL to allow Python to update/delete rows in the database? The database is relatively small (fewer than 25 rows in each table).
SQLPatrolID="SELECT patrolID FROM patrols WHERE patrolname=%s"
mycursor.execute(SQLPatrolID,(DPatrol.get(), ))
myresult=mycursor.fetchall()
PatrolID=myresult[0][0]
print(PatrolID)
SQLUpdate="UPDATE scoutinfo SET patrolID=1 WHERE patrolID=%s"
mycursor.execute(SQLUpdate,(PatrolID, ))
mydb.commit()
print("Success!")
SQLDeletePatrol="DELETE patrolinfo WHERE patrolID=%s"
mycursor.execute(SQLDeletePatrol,(PatrolID, ))
mydb.commit()
Any extra information you require I can happily provide.
No. Your query is not taking too long to execute. It's taking too long to acquire a lock on the tuples you are about to update.
What does this mean? There is another query/transaction updating the exact same records at the same time. It's probably right there in your code, or possibly in a different thread/application; I would guess it is the first case, though.
You can see who's holding a lock by inspecting these tables in information_schema:
INNODB_LOCK_WAITS
INNODB_LOCKS
Or, run the following command:
> show engine innodb status;
This will work if you are using the InnoDB engine, which you probably are.
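
As a rough illustration of how to check this from Python (my own sketch, not part of the answer: the connection parameters are placeholders, and on MySQL 5.x the lock tables live in information_schema, while MySQL 8 moved this information to performance_schema.data_locks):

import mysql.connector

# Placeholder credentials; adjust for your server.
conn = mysql.connector.connect(host="localhost", user="root", password="secret")
cur = conn.cursor()

# Who is waiting on whom (an empty result means there are no lock waits right now).
cur.execute("SELECT * FROM information_schema.INNODB_LOCK_WAITS")
for row in cur.fetchall():
    print(row)

# The full InnoDB status report, which includes the transactions holding locks.
cur.execute("SHOW ENGINE INNODB STATUS")
for _type, _name, status in cur.fetchall():
    print(status)

cur.close()
conn.close()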

Python - SQLAlchemy - MySQL - multiple instances work on same data

I have a table in a database, mapped with the SQLAlchemy ORM module (I have a "scoped_session" variable).
I want multiple instances of my program (not just threads, but also instances running on several servers) to be able to work on the same table and NOT work on the same data.
So I have coded a manual "row-lock" mechanism to make sure each row is handled only once. In this method I take a full lock on the table while I "row-lock" the rows:
# engine, Row_model and manipulate_data are defined elsewhere
from datetime import datetime
import threading
from sqlalchemy.orm import scoped_session, sessionmaker

def instance():
    s = scoped_session(sessionmaker(bind=engine))
    engine.execute("LOCK TABLES my_data WRITE")
    rows = s.query(Row_model).filter(Row_model.condition == 1).filter(Row_model.is_locked == 0).limit(10)
    for row in rows:
        row.is_locked = 1
        row.lock_time = datetime.now()
        s.commit()
    engine.execute("UNLOCK TABLES")
    for row in rows:
        manipulate_data(row)
        row.is_locked = 0
        s.commit()

for i in range(10):
    t = threading.Thread(target=instance)
    t.start()
The problem is that while running several instances, some threads crash and each produces this error:
sqlalchemy.exc.DatabaseError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (DatabaseError) 1205 (HY000): Lock wait timeout exceeded; try restarting transaction 'UPDATE my_data SET row_var = 1'
Where is the catch? What makes my DB table not unlock successfully?
Thanks.
Locks are evil. Avoid them. Things go very bad when errors occur, especially when you mix sessions with raw SQL statements, as you do.
The beauty of the scoped session is that it wraps a database transaction. This transaction makes the modifications to the database atomic, and also takes care of cleaning up when things go wrong.
Use scoped sessions as follows:
with scoped_session(sessionmaker(bind=engine)) as s:
    <ORM actions using s>
It may be some work to rewrite your code so that it becomes properly transactional, but it will be worth it! Sqlalchemy has tricks to help you with that.
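
One concrete way to make the row claim transactional without LOCK TABLES (my own sketch, not spelled out in the answer; it reuses the engine, Row_model and manipulate_data names from the question and assumes InnoDB) is to let each transaction claim its rows with SELECT ... FOR UPDATE, which the ORM exposes as Query.with_for_update():

from contextlib import closing
from datetime import datetime
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)

def claim_and_process():
    with closing(Session()) as s:
        try:
            # Row locks are held only for the duration of this transaction.
            rows = (s.query(Row_model)
                     .filter(Row_model.condition == 1)
                     .filter(Row_model.is_locked == 0)
                     .with_for_update()
                     .limit(10)
                     .all())
            for row in rows:
                row.is_locked = 1
                row.lock_time = datetime.now()
            s.commit()   # releases the row locks; the rows are now flagged as claimed

            for row in rows:
                manipulate_data(row)
                row.is_locked = 0
            s.commit()
        except Exception:
            s.rollback()
            raise

Competing instances then contend only for the rows being claimed rather than the whole table, and the locks are released automatically if anything raises.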

pymongo db.collection.update operationFailure

I have a large collection of documents which I'm trying to update using pymongo's update function. I am finding all documents that fall within a certain polygon and updating all the points found with "update_value".
for element in geomShapeCollection:
    db.collectionName.update({"coordinates": {"$geoWithin": {"$geometry": element["geometry_part"]}}}, {"$set": {"Update_key": update_value}}, multi=True, timeout=False)
For smaller collections this command works as expected. In the largest dataset
the command works for 70-80% of the data and then throws the error:
pymongo.errors.OperationFailure: cursor id '428737620678732339' not
valid at server
The pymongo documentation tells me that this is possibly due to a timeout issue.
Cursors in MongoDB can timeout on the server if they’ve been open for
a long time without any operations being performed on them.
Reading through the pymongo documentation, the find() function has a boolean flag for timeout.
find(spec=None, fields=None, skip=0, limit=0, timeout=True, snapshot=False, tailable=False, _sock=None, _must_use_master=False,_is_command=False)
However the update function appears not to have this:
update(spec, document, upsert=False, manipulate=False, safe=False, multi=False)
Is there any way to set this timeout flag for the update function? Is there any way I can change this so that I do not get this OperationFailure error? Am I correct in assuming this is a timeout error, given that pymongo states it throws this error when
Raised when a database operation fails.
After some research and lots of experimentation I found that it was the outer loop cursor that was causing the error.
for element in geomShapeCollection:
geomShapeCollection is a cursor over a MongoDB collection. Several of its elements cover regions containing large numbers of documents, and because those updates take such a considerable amount of time, the geomShapeCollection cursor times out and is closed by the server.
The problem was not with the update function at all. Adding timeout=False to the outer cursor solves the problem:
for element in db.geomShapeCollectionName.find(timeout=False):
    db.collectionName.update({"coordinates": {"$geoWithin": {"$geometry": element["geometry_part"]}}}, {"$set": {"Update_key": update_value}}, multi=True, timeout=False)

GAE: Is it necessary to call fetch on a query before getting its cursor?

When the following code is executed:
q = MyKind.all()
taskqueue.add(url="/admin/build", params={'cursor': q.cursor()})
I get:
AssertionError: No cursor available.
Why does this happen? Do I need to fetch something first? (I'd rather not; the code is cleaner just to get the query and pass it on.)
I'm using Python on Google App Engine 1.3.5.
Yes, a cursor is only available if you've fetched something; there's no cursor for the first result in the query.
As a workaround, you could wrap the call to cursor() in a try/except and pass on None to the next task if there isn't a cursor available.
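A minimal sketch of that workaround (my own illustration; the empty-string fallback is an assumption, since task params are sent as strings):

from google.appengine.api import taskqueue

q = MyKind.all()
try:
    cursor = q.cursor()
except AssertionError:
    # Nothing has been fetched yet, so there is no cursor to pass along.
    cursor = None

taskqueue.add(url="/admin/build", params={'cursor': cursor or ''})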

How do I efficiently do a bulk insert-or-update with SQLAlchemy?

I'm using SQLAlchemy with a Postgres backend to do a bulk insert-or-update. To try to improve performance, I'm attempting to commit only once every thousand rows or so:
trans = engine.begin()
for i, rec in enumerate(records):
    if i % 1000 == 0:
        trans.commit()
        trans = engine.begin()
    try:
        inserter.execute(...)
    except sa.exceptions.SQLError:
        my_table.update(...).execute()
trans.commit()
However, this isn't working. It seems that when the INSERT fails, it leaves things in a weird state that prevents the UPDATE from happening. Is it automatically rolling back the transaction? If so, can this be stopped? I don't want my entire transaction rolled back in the event of a problem, which is why I'm trying to catch the exception in the first place.
The error message I'm getting, BTW, is "sqlalchemy.exc.InternalError: (InternalError) current transaction is aborted, commands ignored until end of transaction block", and it happens on the update().execute() call.
You're hitting some weird Postgresql-specific behavior: if an error happens in a transaction, it forces the whole transaction to be rolled back. I consider this a Postgres design bug; it takes quite a bit of SQL contortionism to work around in some cases.
One workaround is to do the UPDATE first. Detect if it actually modified a row by looking at cursor.rowcount; if it didn't modify any rows, it didn't exist, so do the INSERT. (This will be faster if you update more frequently than you insert, of course.)
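A rough sketch of that update-first shape with SQLAlchemy Core (my own illustration; the id and val column names and the per-record dict are assumptions, and conn is an open connection):

# Hypothetical columns: my_table has an "id" key column and a "val" data column.
result = conn.execute(
    my_table.update()
            .where(my_table.c.id == rec["id"])
            .values(val=rec["val"])
)
if result.rowcount == 0:
    # Nothing was updated, so the row does not exist yet: insert it instead.
    conn.execute(my_table.insert().values(id=rec["id"], val=rec["val"]))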
Another workaround is to use savepoints:
SAVEPOINT a;
INSERT INTO ....;
-- on error:
ROLLBACK TO SAVEPOINT a;
UPDATE ...;
-- on success:
RELEASE SAVEPOINT a;
This has a serious problem for production-quality code: you have to detect the error accurately. Presumably you're expecting to hit a unique constraint check, but you may hit something unexpected, and it may be next to impossible to reliably distinguish the expected error from the unexpected one. If this hits the error condition incorrectly, it'll lead to obscure problems where nothing will be updated or inserted and no error will be seen. Be very careful with this. You can narrow down the error case by looking at Postgresql's error code to make sure it's the error type you're expecting, but the potential problem is still there.
Finally, if you really want to do batch-insert-or-update, you actually want to do many of them in a few commands, not one item per command. This requires trickier SQL: SELECT nested inside an INSERT, filtering out the right items to insert and update.
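As a sketch of that batched shape (my own illustration, not from the answer; it assumes the incoming rows have already been loaded into a hypothetical staging table keyed by id):

from sqlalchemy import text

with engine.begin() as conn:
    # Update target rows that already exist.
    conn.execute(text("""
        UPDATE my_table AS t
        SET val = s.val
        FROM staging AS s
        WHERE t.id = s.id
    """))
    # Insert only the staging rows that have no match in the target table.
    conn.execute(text("""
        INSERT INTO my_table (id, val)
        SELECT s.id, s.val
        FROM staging AS s
        WHERE NOT EXISTS (SELECT 1 FROM my_table t WHERE t.id = s.id)
    """))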
This error is from PostgreSQL. PostgreSQL doesn't allow you to execute further commands in the same transaction once one command creates an error. To fix this you can use nested transactions (implemented using SQL savepoints) via conn.begin_nested(). Here's something that might work. I made the code use explicit connections, factored out the chunking part, and used context managers to manage transactions correctly.
from itertools import chain, islice

def chunked(seq, chunksize):
    """Yields items from an iterator in chunks."""
    it = iter(seq)
    while True:
        yield chain([it.next()], islice(it, chunksize - 1))  # Python 2; use next(it) on Python 3

conn = engine.connect()  # explicit connection so transactions can be managed on it
for chunk in chunked(records, 1000):
    with conn.begin():
        for rec in chunk:
            try:
                with conn.begin_nested():
                    conn.execute(inserter, ...)
            except sa.exceptions.SQLError:
                conn.execute(my_table.update(...))
This still won't have stellar performance though due to nested transaction overhead. If you want better performance try to detect which rows will create errors beforehand with a select query and use executemany support (execute can take a list of dicts if all inserts use the same columns). If you need to handle concurrent updates, you'll still need to do error handling either via retrying or falling back to one by one inserts.
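
For example (a sketch under the assumptions that each record is a dict with id and val keys and that conflicts only come from existing primary keys), the pre-detection plus executemany approach could look like:

import sqlalchemy as sa

ids = [rec["id"] for rec in records]
existing = {
    row[0]
    for row in conn.execute(sa.select([my_table.c.id]).where(my_table.c.id.in_(ids)))
}

to_insert = [rec for rec in records if rec["id"] not in existing]
to_update = [rec for rec in records if rec["id"] in existing]

if to_insert:
    # executemany: one INSERT statement executed with a list of parameter dicts.
    conn.execute(my_table.insert(), to_insert)

if to_update:
    stmt = (my_table.update()
                    .where(my_table.c.id == sa.bindparam("pk"))
                    .values(val=sa.bindparam("val")))
    conn.execute(stmt, [{"pk": rec["id"], "val": rec["val"]} for rec in to_update])

As the answer notes, concurrent writers can still slip a row in between the SELECT and the INSERT, so retries or a per-row fallback are still needed in that case.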
