Why is a subsequent query not able to find newly-inserted rows? - python

I'm using AWS RDS, which I'm accessing with pymysql. I have a python lambda function that inserts a row into one of my tables. I then call cursor.commit() on the pymysql cursor object. Later, my lambda invokes a second lambda; this second lambda (using a different db connection) executes a SELECT to look for the newly-added row. Unfortunately, the row is not found immediately. As a debugging step, I added code like this:
lambda_handler.py

...
uuid_values = [uuid_value]  # A single-item list
things = queries.get_things(uuid_values)
# Added for debugging
if not things:
    print('For debugging: things not found.')
    time.sleep(5)
    things = queries.get_things(uuid_values)
    print(f'for debugging: {str(things)}')
return things
queries.py

def get_things(uuid_values):
    # Creates a string of the form 'UUID_TO_BIN(%s), UUID_TO_BIN(%s)' for use in the query below
    format_string = ','.join(['UUID_TO_BIN(%s)'] * len(uuid_values))
    tuple_of_keys = tuple([str(key) for key in uuid_values])
    with db_conn.get_cursor() as cursor:
        # Lightly simplified query
        cursor.execute('''
            SELECT ...
            FROM table1 t1
            JOIN table2 t2 ON t1.id = t2.t1_id
            WHERE
                t1.uuid_value IN ({format_string})
                AND t2.status_id = 1
            '''.format(format_string=format_string),
            tuple_of_keys)
        results = cursor.fetchall()
    db_conn.conn.commit()
    return results
This outputs
'For debugging: things not found.'
'<thing list>'
meaning the row is not found immediately, but is found after a brief delay. I'd rather not leave this delay in when I ship to production. I'm not doing anything with transactions or isolation levels, so it's very strange to me that this second query would not find the newly-inserted row. Any idea what might be causing this?
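Not an answer, but a debugging sketch that may help narrow this down (it reuses the db_conn wrapper from queries.py; the variable names assume MySQL 8.0, where tx_isolation was renamed transaction_isolation). Under InnoDB's default REPEATABLE READ, a SELECT issued inside a transaction that was already open before the other connection committed reads from an older snapshot and will not see the new row, so it's worth confirming what the second connection is actually doing:

# Debugging only: print which server this connection is talking to and what
# isolation level / autocommit mode it is using, right before the failing SELECT.
with db_conn.get_cursor() as cursor:
    cursor.execute('SELECT @@hostname, @@transaction_isolation, @@autocommit')
    print(f'For debugging: connection state: {cursor.fetchall()}')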

Related

Problems implementing a python db listener

I'm writing a module for a program that needs to listen for new entries in a db, and execute a function on the event of new rows being posted to this table... aka a trigger.
I have written some code, but it does not work. Here's my logic:
connect to db, query for the newest row, compare that row with variable, if not equal, run function, store newest row to variable, else close. Run every 2 seconds to compare newest row with whatever is stored in the variable/object.
Everything runs fine and pulls the expected results from the db, however I'm getting a "local variable 'last_sent' referenced before assignment" error.
This confuses me for two reasons:
I thought I set last_sent to 'nothing' as a global variable/object before the functions are called.
In order for my comparison logic to work, I can't set last_sent within the sendListener() function before the if/else.
Here's the code.
from Logger import Logger
from sendSMS import sendSMS
from Needles import dbUser, dbHost, dbPassword, pull_stmt
import pyodbc
import time

# set last_sent to something
last_sent = ''

def sendListener():
    # connect to db
    cnxn = pyodbc.connect('UID='+dbUser+';PWD='+dbPassword+';DSN='+dbHost)
    cursor = cnxn.cursor()
    # run query to pull newest row
    cursor.execute(pull_stmt)
    results = cursor.fetchone()
    # if query results different from results stored in last_sent, run function,
    # then set last_sent object to the query results for next comparison.
    if results != last_sent:
        sendSMS()
        last_sent = results
    else:
        cnxn.close()

# a loop to run the check every 2 seconds - so as to lessen cpu usage
def sleepLoop():
    while 0 == 0:
        sendListener()
        time.sleep(2.0)

sleepLoop()
I'm sure there is a better way to implement this.
Here:
if results != last_sent:
    sendSMS()
    last_sent = results
else:
    cnxn.close()
Python sees that you're assigning to last_sent, but it isn't marked as global in this function, so it must be local. Yet you're reading it in results != last_sent before any local assignment has happened, which is why you get the error.
To solve this, mark it as global at the beginning of the function:
def sendListener():
    global last_sent
    ...
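If you'd rather avoid the global altogether, an alternative (my own variation on the code above, not part of the original fix, reusing the names imported in the question) is to pass the previous value in and return the updated one:

def sendListener(last_sent):
    # connect, query, and compare exactly as before, but keep the state local
    cnxn = pyodbc.connect('UID='+dbUser+';PWD='+dbPassword+';DSN='+dbHost)
    try:
        cursor = cnxn.cursor()
        cursor.execute(pull_stmt)
        results = cursor.fetchone()
        if results != last_sent:
            sendSMS()
            last_sent = results
        return last_sent
    finally:
        cnxn.close()

def sleepLoop():
    last_sent = ''
    while True:
        last_sent = sendListener(last_sent)
        time.sleep(2.0)

As a side effect, this also closes the connection on every iteration instead of only when nothing changed.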

Executing MySQL Queries using Python

I'm attempting to run some MySQL queries and output the results in my Python program. I've created this function, which is called with the cursor passed in. However, I'm running into a problem where the code below always returns None / nothing.
Here is what I have:
def showInformation(cursor):
    number_rows = 'SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = "DB"'
    testing = cursor.execute(number_rows)
    print(testing)
When using the cursor object itself, I do not run into any problems:
for row in cursor:
    print(row)
I guess you need:
print(cursor.fetchone())
because you are returning only a count and so you expect one row.
Calling execute() is not supposed to return anything unless multi=True is specified, according to the MySQL documentation. You can only iterate the cursor as you did, or call fetchone() to retrieve one row, fetchall() to retrieve all rows, or fetchmany() to retrieve some of them.
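Putting that together, the function from the question could read the count like this (a sketch; the query string is the one from the question):

def showInformation(cursor):
    number_rows = 'SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = "DB"'
    cursor.execute(number_rows)   # execute() itself does not return the rows
    (count,) = cursor.fetchone()  # COUNT(*) produces exactly one row with one column
    print(count)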

Get new result in infinite loop with SqlAlchemy

I'm new to Python - coming from PHP - and have been bouncing back and forth between Python official documentation and SqlAlchemy (which I'm trying to use as easily as Laravel's DB class)
I have this bit of code:
from time import sleep
from sqlalchemy import *

engine = create_engine('mysql://root:pass==#aws.com/db')
db_connection = engine.connect()
meta = MetaData()
video_processing = Table('video_processing', meta, autoload=True, autoload_with=engine)

while True:
    sleep(1)
    stmt = select([video_processing]).where(video_processing.c.finished_processing == 0).where(video_processing.c.in_progress == 0)
    result = db_connection.execute(stmt)
    rows = result.fetchall()
    print len(rows)
    stmt = None
    result = None
    rows = None
When I execute my statement, I let it run and print out the number of rows that it fetches.
While that's going, I go in and delete rows from my db.
The problem is that even though I'm resetting pretty much everything I can think of that is related to the query, it's still printing out the same number of fetched rows in every iteration of my loop, even though I'm changing the underlying data.
Any ideas?
The tricky part is that if you use engine.connect(), the connection needs to be closed afterwards with db_connection.close(), otherwise you might not see new data changes.
I ended up bypassing the connection and executing my statement directly on the engine, which makes more sense logically anyways:
result = engine.execute(stmt)
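For reference, here is roughly what the loop looks like with that change (a sketch in the SQLAlchemy 1.x style used in the question, where engine.execute() still exists; the connection URL is a placeholder):

from time import sleep
from sqlalchemy import create_engine, MetaData, Table, select

engine = create_engine('mysql://user:password@host/db')  # placeholder URL
meta = MetaData()
video_processing = Table('video_processing', meta, autoload=True, autoload_with=engine)

while True:
    sleep(1)
    stmt = (select([video_processing])
            .where(video_processing.c.finished_processing == 0)
            .where(video_processing.c.in_progress == 0))
    result = engine.execute(stmt)  # checks out a fresh connection from the pool each time
    print(len(result.fetchall()))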

Python psycopg2 - Logging events

I'm using psycopg2, and I have a problem with logging events (executed queries, notifications, errors) to a file. I want to get an effect like in the pgAdmin history window.
For example I'm executing this query:
insert into city(id, name, countrycode, district, population) values (4080,'Savilla', 'ESP', 'andalucia', 1000000)
And in PgAdmin I see effect like this:
Executing query:
insert into city(id, name, countrycode, district, population) values (4080,'Sevilla', 'ESP', 'andalucia', 1000000)
Query executed in 26 ms.
One row affected.
Can I get similar effect using psycopg2?
I tried to use LoggingCursor, but it is not satisfactory for me, because it logs only queries.
Thanks for help.
EDIT:
My code:
conn = psycopg2.extras.LoggingConnection(DSN)
File=open('log.log','a')
File.write('================================')
psycopg2.extras.LoggingConnection.initialize(conn,File)
File.write('\n'+time.strftime("%Y-%m-%d %H:%M:%S") + '---Executing query:\n\t')
q="""insert into city(id, name, countrycode, district, population) values (4080,'Sevilla', 'ESP', 'andalucia', 10000)"""
c=conn.cursor()
c.execute(q)
File.write('\n'+time.strftime("%Y-%m-%d %H:%M:%S") + '---Executing query:\n\t')
q="""delete from city where id = 4080"""
c=conn.cursor()
c.execute(q)
conn.commit()
File.close()
And this is my output log:
================================
2012-12-30 22:42:31---Executing query:
insert into city(id, name, countrycode, district, population) values (4080,'Sevilla', 'ESP', 'andalucia', 10000)
2012-12-30 22:42:31---Executing query:
delete from city where id = 4080
I want to see in the log file information about how many rows were affected and information about errors. Ultimately, I want a complete log file with all events.
From what I can see, you have three requirements that are not fulfilled by the LoggingCursor class:
Query execution time
Number of rows affected
A complete log file with all events.
For the first requirement, take a look at the source code for the MinTimeLoggingConnection class in psycopg2.extras. It sub-classes LoggingConnection and outputs the execution time of queries that exceed a minimum time (note that this needs to be used in conjunction with the MinTimeLoggingCursor).
For the second requirement, the rowcount attribute of the cursor class specifies
the number of rows that the last execute*() produced (for DQL
statements like SELECT) or affected (for DML statements like UPDATE or
INSERT)
Thus it should be possible to create your own type of LoggingConnection and LoggingCursor that includes this additional functionality.
My attempt is as follows. Just replace LoggingConnection with LoggingConnection2 in your code and this should all work. As a side-note, you don't need to create a new cursor for your second query. You can just call c.execute(q) again after you've defined your second query.
import psycopg2
import os
import time
from psycopg2.extras import LoggingConnection
from psycopg2.extras import LoggingCursor
class LoggingConnection2(psycopg2.extras.LoggingConnection):
    def initialize(self, logobj):
        LoggingConnection.initialize(self, logobj)

    def filter(self, msg, curs):
        t = (time.time() - curs.timestamp) * 1000
        return msg + os.linesep + 'Query executed in: {0:.2f} ms. {1} row(s) affected.'.format(t, curs.rowcount)

    def cursor(self, *args, **kwargs):
        kwargs.setdefault('cursor_factory', LoggingCursor2)
        return super(LoggingConnection, self).cursor(*args, **kwargs)

class LoggingCursor2(psycopg2.extras.LoggingCursor):
    def execute(self, query, vars=None):
        self.timestamp = time.time()
        return LoggingCursor.execute(self, query, vars)

    def callproc(self, procname, vars=None):
        self.timestamp = time.time()
        return LoggingCursor.callproc(self, procname, vars)
I'm not sure how to create a complete log of all events, but the notices attribute of the connection class may be of interest.
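For what it's worth, a small sketch of the notices idea (my own addition, reusing the conn and File objects from the code in the question): psycopg2 appends server NOTICE and WARNING messages to conn.notices as plain strings, so they can be copied into the same log file.

c = conn.cursor()
c.execute("drop table if exists no_such_table")  # the server emits a NOTICE for this
for notice in conn.notices:
    File.write(notice)
del conn.notices[:]  # clear the list so the same messages aren't logged twice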
Maybe you can get what you're looking for without writing any code.
There's an option in PostgreSQL itself called "log_min_duration_statement" that might help you out.
You can set it to zero, and every query will be logged, along with its run-time cost. Or you can set it to some positive number, like say, 500, and postgresql will only record queries that take at least 500 ms to run.
You won't get the results of the query in your log file, but you'll get the exact query, including the interpolated bound parameters.
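If editing postgresql.conf isn't convenient, the same parameter can also be set for a single session from Python (a sketch; it assumes the role has the privileges needed to change the setting, and reuses the DSN and the query from the question):

import psycopg2

conn = psycopg2.connect(DSN)
cur = conn.cursor()
# From now on, every statement this session runs is written to the PostgreSQL
# server log together with its duration.
cur.execute("SET log_min_duration_statement = 0")
cur.execute("delete from city where id = 4080")
conn.commit()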
If this works well for you, later on, check out the auto_explain module.
Good luck!
Just take a look at how the LoggingCursor is implemented and write your own cursor subclass: it's very easy.

should I reuse the cursor in the python MySQLdb module

I'm writing a Python CGI script that will query a MySQL database. I'm using the MySQLdb module. Since the database will be queried repeatedly, I wrote this function....
def getDatabaseResult(sqlQuery, connectioninfohere):
    # connect to the database
    vDatabase = MySQLdb.connect(connectioninfohere)
    # create a cursor, execute an SQL statement and get the result as a tuple
    cursor = vDatabase.cursor()
    try:
        cursor.execute(sqlQuery)
    except:
        cursor.close()
        return None
    result = cursor.fetchall()
    cursor.close()
    return result
My question is... Is this the best practice? Or should I reuse my cursor within my functions? For example, which is better...
def callsANewCursorAndConnectionEachTime():
    result1 = getDatabaseResult(someQuery1)
    result2 = getDatabaseResult(someQuery2)
    result3 = getDatabaseResult(someQuery3)
    result4 = getDatabaseResult(someQuery4)
or do away with the getDatabaseResult function altogether and do something like...
def reusesTheSameCursor():
    vDatabase = MySQLdb.connect(connectionInfohere)
    cursor = vDatabase.cursor()
    cursor.execute(someQuery1)
    result1 = cursor.fetchall()
    cursor.execute(someQuery2)
    result2 = cursor.fetchall()
    cursor.execute(someQuery3)
    result3 = cursor.fetchall()
    cursor.execute(someQuery4)
    result4 = cursor.fetchall()
The MySQLdb developer recommends building an application specific API that does the DB access stuff for you so that you don't have to worry about the mysql query strings in the application code. It'll make the code a bit more extendable (link).
As for the cursors, my understanding is that the best thing is to create a cursor per operation/transaction. So a check value -> update value -> read value type of transaction could use the same cursor, but for the next one you would create a new one. This again points in the direction of building an internal API for the db access instead of having a generic executeSql method.
Also remember to close your cursors, and commit changes to the connection after the queries are done.
Your getDatabaseResult function doesn't need to open a new connection for every separate query, though. You can share the connection between the queries as long as you act responsibly with the cursors.
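A rough sketch of that middle ground (run_query is my own name; connectioninfohere and someQuery1/someQuery2 are the placeholders from the question): one connection is opened and shared, while each query still gets its own short-lived cursor.

import MySQLdb

def run_query(connection, sqlQuery):
    cursor = connection.cursor()
    try:
        cursor.execute(sqlQuery)
        return cursor.fetchall()
    finally:
        cursor.close()

vDatabase = MySQLdb.connect(connectioninfohere)  # connect once
try:
    result1 = run_query(vDatabase, someQuery1)
    result2 = run_query(vDatabase, someQuery2)
    vDatabase.commit()  # commit once the related statements are done
finally:
    vDatabase.close()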
