I'm new to Python, coming from PHP, and have been bouncing back and forth between the official Python documentation and SQLAlchemy's (which I'm trying to use as easily as Laravel's DB class).
I have this bit of code:
from time import sleep

from sqlalchemy import *

engine = create_engine('mysql://root:pass==#aws.com/db')
db_connection = engine.connect()
meta = MetaData()
video_processing = Table('video_processing', meta, autoload=True, autoload_with=engine)

while True:
    sleep(1)
    stmt = select([video_processing]).where(video_processing.c.finished_processing == 0).where(video_processing.c.in_progress == 0)
    result = db_connection.execute(stmt)
    rows = result.fetchall()
    print(len(rows))
    stmt = None
    result = None
    rows = None
When I execute my statement, I let it run and print out the number of rows that it fetches.
While that's going, I go in and delete rows from my db.
The problem is that even though I'm resetting pretty much everything related to the query that I can think of, every iteration of my loop still prints the same number of fetched rows, despite the underlying data changing.
Any ideas?
The tricky part is that if you use engine.connect(), you need to close the connection with db_connection.close() after each query; otherwise you might not see new data changes.
I ended up bypassing the connection and executing my statement directly on the engine, which makes more sense logically anyway:
result = engine.execute(stmt)
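For reference, here is a minimal sketch of the amended loop (same table and engine as above, SQLAlchemy 1.x-style API). Because each engine.execute() checks a connection out of the pool and gives it back once the result is consumed, no long-lived transaction is held open, so every iteration sees the current data:

from time import sleep
from sqlalchemy import create_engine, MetaData, Table, select

engine = create_engine('mysql://root:pass==#aws.com/db')
meta = MetaData()
video_processing = Table('video_processing', meta, autoload=True, autoload_with=engine)

while True:
    sleep(1)
    stmt = (select([video_processing])
            .where(video_processing.c.finished_processing == 0)
            .where(video_processing.c.in_progress == 0))
    # executed directly on the engine: the connection is checked out, used,
    # and returned per statement, so no stale snapshot accumulates
    rows = engine.execute(stmt).fetchall()
    print(len(rows))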
Related
I wrote an API that was directly using psycopg2 to interface with a PostgreSQL database but decided to re-write it to use the SQLAlchemy ORM. For the most part I'm very happy with the transition, but there are a few trickier things I did that have been tough to translate. I was able to build a query to do what I wanted, but I'd much rather handle it with a HybridProperty/HybridMethod or perhaps a Custom Comparator (tried both but couldn't get them to work). I'm fairly new to SQLAlchemy ORM so I'd be happy to explore all options, but I'd prefer something in the database model rather than the API code.
For background, there are several loosely coupled API consumers that have a few mandatory identifying columns that belong to the API and then a LargeBinary column that they can basically do whatever they want with. In the code below, the API consumers need to be able to select messages based on their own identifiers that are not parsed by the API (since each consumer is likely different).
Old code:
select_sql = sql.SQL("""SELECT {field1}
                        FROM {table}
                        WHERE {field2}={f2}
                        AND {field3}={f3}
                        AND {field4}={f4}
                        AND left(encode({field5}, 'hex'), {numChar})={selectBytes} -- Relevant clause
                        AND {field6}=false;
                     """
                     ).format(field1=sql.Identifier("key"),
                              field2=sql.Identifier("f2"),
                              field3=sql.Identifier("f3"),
                              field4=sql.Identifier("f4"),
                              field5=sql.Identifier("f5"),
                              field6=sql.Identifier("f6"),
                              numChar=sql.Literal(len(data['bytes'])),
                              table=sql.Identifier("incoming"),
                              f2=sql.Literal(data['2']),
                              selectBytes=sql.Literal(data['bytes']),
                              f3=sql.Literal(data['3']),
                              f4=sql.Literal(data['4']))
try:
    cur = incoming_conn.cursor()
    cur.execute(select_sql)
    keys = [x[0] for x in cur.fetchall()]
    cur.close()
    return keys, 200
except psycopg2.DatabaseError as error:
    logging.error(error)
    incoming_conn.reset()
    return "Error reading from DB", 500
New code:
try:
    session = Session()
    messages = (
        session.query(IncomingMessage)
        .filter_by(deleted=False)
        .filter_by(f2=data['2'])
        .filter_by(f3=data['3'])
        .filter_by(f4=data['4'])
        .filter(func.left(func.encode(IncomingMessage.payload,  # Relevant clause
                                      'hex'),
                          len(data['bytes'])) == data['bytes'])
    )
    keys = [x.key for x in messages]
    session.close()
    return keys, 200
except exc.OperationalError as error:
    logging.error(error)
    session.close()
    return "Database failure", 500
The problem I kept running into was how to limit the number of stored bytes being compared. I don't think it's really a problem in the Comparator itself, but I feel like there would be a performance cost if I were loading several megabytes just to compare the first eight or so bytes.
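One way to keep this in the model (only a sketch, not tested against the real schema) is a hybrid method whose SQL expression reuses the same left(encode(...)) comparison, so the prefix check runs in the database and the full LargeBinary payload is never loaded just for the comparison. The method name payload_startswith is illustrative; IncomingMessage.payload and the declarative Base are assumed from the existing model:

from sqlalchemy import func
from sqlalchemy.ext.hybrid import hybrid_method

class IncomingMessage(Base):
    # ... existing columns (key, f2, f3, f4, payload, deleted, ...) ...

    @hybrid_method
    def payload_startswith(self, hex_prefix):
        # Python side, for instances already loaded: compare the hex encoding
        return self.payload.hex().startswith(hex_prefix)

    @payload_startswith.expression
    def payload_startswith(cls, hex_prefix):
        # SQL side: only the first len(hex_prefix) hex characters are compared
        return func.left(func.encode(cls.payload, 'hex'),
                         len(hex_prefix)) == hex_prefix

The query then shrinks to .filter(IncomingMessage.payload_startswith(data['bytes'])) alongside the existing filter_by calls, while the prefix comparison stays in the database.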
I'm using AWS RDS, which I'm accessing with pymysql. I have a python lambda function that inserts a row into one of my tables. I then call cursor.commit() on the pymysql cursor object. Later, my lambda invokes a second lambda; this second lambda (using a different db connection) executes a SELECT to look for the newly-added row. Unfortunately, the row is not found immediately. As a debugging step, I added code like this:
lambda_handler.py
...
uuid_values = [uuid_value] # A single-item list
things = queries.get_things(uuid_values)
# Added for debugging
if not things:
    print('For debugging: things not found.')
    time.sleep(5)
    things = queries.get_things(uuid_values)
    print(f'For debugging: {str(things)}')
return things
queries.py
def get_things(uuid_values):
    # Creates a string of the form 'UUID_TO_BIN(%s), UUID_TO_BIN(%s)' for use in the query below
    format_string = ','.join(['UUID_TO_BIN(%s)'] * len(uuid_values))
    tuple_of_keys = tuple([str(key) for key in uuid_values])
    with db_conn.get_cursor() as cursor:
        # Lightly simplified query
        cursor.execute('''
            SELECT ...
            FROM table1 t1
            JOIN table2 t2 ON t1.id = t2.t1_id
            WHERE
                t1.uuid_value IN ({format_string})
                AND t2.status_id = 1
            '''.format(format_string=format_string),
            tuple_of_keys)
        results = cursor.fetchall()
    db_conn.conn.commit()
    return results
This outputs
'For debugging: things not found.'
'<thing list>'
meaning the row is not found immediately, but is found after a brief delay. I'd rather not leave this delay in when I ship to production. I'm not doing anything with transactions or isolation level, so it's very strange to me that this second query would not find the newly-inserted row. Any idea what might be causing this?
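One possibility worth ruling out (an assumption on my part, since the isolation level isn't shown): MySQL's default REPEATABLE READ gives each transaction a consistent snapshot taken when it first reads, so a connection with an already-open transaction will not see rows committed later by another connection until its own transaction ends. If the second lambda reuses a pooled connection with such an open transaction, ending it before the SELECT would refresh the snapshot, e.g.:

# Hypothetical: end any transaction left open on this connection so the next
# SELECT reads a fresh snapshot (names follow queries.py above).
db_conn.conn.commit()          # or db_conn.conn.rollback()
things = queries.get_things(uuid_values)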
I have a table in a database, mapped with the SQLAlchemy ORM module (I have a "scoped_session" variable).
I want multiple instances of my program (not just threads, but also instances on several servers) to be able to work on the same table and NOT work on the same data.
So I have coded a manual "row-lock" mechanism to make sure each row is handled by only one instance. In this method I take a full lock on the table while I "row-lock" the rows:
import threading
from datetime import datetime

from sqlalchemy.orm import scoped_session, sessionmaker

def instance():
    s = scoped_session(sessionmaker(bind=engine))
    engine.execute("LOCK TABLES my_data WRITE")
    rows = s.query(Row_model).filter(Row_model.condition == 1).filter(Row_model.is_locked == 0).limit(10).all()
    for row in rows:
        row.is_locked = 1
        row.lock_time = datetime.now()
    s.commit()
    engine.execute("UNLOCK TABLES")
    for row in rows:
        manipulate_data(row)
        row.is_locked = 0
    s.commit()

for i in range(10):
    t = threading.Thread(target=instance)
    t.start()
The problem is that while running several instances, some threads crash and each produces this error:
sqlalchemy.exc.DatabaseError: (raised as a result of Query-invoked
autoflush; consider using a session.no_autoflush block if this flush
is occurring prematurely) (DatabaseError) 1205 (HY000): Lock wait
timeout exceeded; try restarting transaction 'UPDATE my_daya SET
row_var = 1}
Where is the catch? What prevents my DB table from unlocking successfully?
Thanks.
Locks are evil. Avoid them. Things go very badly when errors occur, especially when you mix sessions with raw SQL statements, like you do.
The beauty of the scoped session is that it wraps a database transaction. This transaction makes the modifications to the database atomic, and also takes care of cleaning up when things go wrong.
Use scoped sessions as follows:
with scoped_session(sessionmaker(bind=engine)) as s:
<ORM actions using s>
It may be some work to rewrite your code so that it becomes properly transactional, but it will be worth it! SQLAlchemy has tricks to help you with that.
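As one concrete direction (a sketch reusing Row_model and engine from the question, not a drop-in replacement): SELECT ... FOR UPDATE via Query.with_for_update() lets each transaction claim its batch of rows atomically, so the manual LOCK TABLES / UNLOCK TABLES pair goes away and any error rolls everything back:

from datetime import datetime
from sqlalchemy.orm import sessionmaker

# expire_on_commit=False keeps the claimed rows readable after commit/close
Session = sessionmaker(bind=engine, expire_on_commit=False)

def claim_rows(batch_size=10):
    s = Session()
    try:
        # SELECT ... FOR UPDATE: competing instances block on these row locks
        rows = (s.query(Row_model)
                 .filter(Row_model.condition == 1)
                 .filter(Row_model.is_locked == 0)
                 .limit(batch_size)
                 .with_for_update()
                 .all())
        for row in rows:
            row.is_locked = 1
            row.lock_time = datetime.now()
        s.commit()      # releases the row locks; the flags are now visible to others
        return rows
    except Exception:
        s.rollback()    # on error nothing stays locked or half-flagged
        raise
    finally:
        s.close()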
Given this piece of code:
record = session.query(Foo).filter(Foo.id == 1).first()
session.delete(record)
session.flush()
has_record = session.query(Foo).filter(Foo.id == 1).first()
I think the 'has_record' should be None here, but it turns out to be the same row as record.
Did I miss something needed to get the expected result? Or is there any way to make the delete take effect without a commit?
MySQL behaves differently under a similar process:
start transaction;
select * from Foo where id = 1; # Hit one record
delete from Foo where id = 1; # Nothing goes to the disk
select * from Foo where id = 1; # Empty set
commit; # Everything goes to the disk
I made a stupid mistake here. The session I'm using is a routing session, which has master/slave sessions behind it. What probably happened is that the delete was flushed to the master while the query still went to the slave, so of course I could query the record again.
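For reference, with a single non-routing session the original expectation does hold: a flushed delete is visible to later queries in the same session and transaction even before the commit. A minimal sketch, assuming the Foo model and engine from the question:

from sqlalchemy.orm import sessionmaker

session = sessionmaker(bind=engine)()

record = session.query(Foo).filter(Foo.id == 1).first()
session.delete(record)
session.flush()        # the DELETE is sent to the database, still uncommitted

# within the same transaction the row is gone, so this returns None
assert session.query(Foo).filter(Foo.id == 1).first() is None

session.rollback()     # nothing was ever committed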
I'm writing a Python CGI script that will query a MySQL database. I'm using the MySQLdb module. Since the database will be queried repeatedly, I wrote this function....
import MySQLdb

def getDatabaseResult(sqlQuery, connectioninfohere):
    # connect to the database
    vDatabase = MySQLdb.connect(connectioninfohere)
    # create a cursor, execute an SQL statement and get the result as a tuple
    cursor = vDatabase.cursor()
    try:
        cursor.execute(sqlQuery)
    except MySQLdb.Error:
        cursor.close()
        return None
    result = cursor.fetchall()
    cursor.close()
    return result
My question is: is this best practice, or should I reuse my cursor within my functions? For example, which is better...
def callsANewCursorAndConnectionEachTime():
    result1 = getDatabaseResult(someQuery1)
    result2 = getDatabaseResult(someQuery2)
    result3 = getDatabaseResult(someQuery3)
    result4 = getDatabaseResult(someQuery4)
or do away with the getDatabaseResult function altogether and do something like...
def reusesTheSameCursor():
    vDatabase = MySQLdb.connect(connectionInfohere)
    cursor = vDatabase.cursor()
    cursor.execute(someQuery1)
    result1 = cursor.fetchall()
    cursor.execute(someQuery2)
    result2 = cursor.fetchall()
    cursor.execute(someQuery3)
    result3 = cursor.fetchall()
    cursor.execute(someQuery4)
    result4 = cursor.fetchall()
The MySQLdb developer recommends building an application-specific API that does the DB access for you, so that you don't have to worry about MySQL query strings in the application code. It also makes the code a bit more extensible (link).
As for the cursors, my understanding is that it is best to create a cursor per operation/transaction. So a check value -> update value -> read value type of transaction could use the same cursor, but for the next one you would create a new one. This again points in the direction of building an internal API for the DB access instead of having a generic executeSql method.
Also remember to close your cursors, and commit changes to the connection after the queries are done.
Your getDatabaseResult function doesn't need to open a connection for every separate query, though. You can share the connection between the queries as long as you act responsibly with the cursors.
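A minimal sketch of what such an application-specific API could look like (table, column, and method names are purely illustrative, assuming MySQLdb, one shared connection per instance, and a short-lived cursor per operation):

import MySQLdb

class AppDB:
    """Illustrative DB access layer: one shared connection, one cursor per operation."""

    def __init__(self, **connection_info):
        self.conn = MySQLdb.connect(**connection_info)

    def get_user_names(self):
        cursor = self.conn.cursor()            # fresh cursor for this read
        try:
            cursor.execute("SELECT name FROM users")
            return [row[0] for row in cursor.fetchall()]
        finally:
            cursor.close()                     # always close the cursor

    def rename_user(self, user_id, new_name):
        cursor = self.conn.cursor()            # fresh cursor for this write
        try:
            cursor.execute("UPDATE users SET name = %s WHERE id = %s",
                           (new_name, user_id))
            self.conn.commit()                 # commit on the connection after the change
        finally:
            cursor.close()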