Can I use the same cursor while looping through it? (Python)

I am iterating through a SELECT result, like this:
import MySQLdb

conn = MySQLdb.connect(host="127.0.0.1", user=...)  # and so on
cur = conn.cursor()
cur.execute("SELECT * FROM some_table")
for row in cur:
    # some stuff I'm doing
    # sometimes I need to perform another SELECT here
The question is: can I use cur again inside the for loop, or do I have to create another cursor (or even another connection)?
I guess I am missing some basic knowledge about databases or Python here; I am actually quite new to both, and my attempts to google the answer have failed.
My own guess would be that I have to create another cursor, but I think I actually used it like this for a while before realizing it might be wrong, and it seemed to work. I can't guarantee that now, though, so I just want to make sure.

You have to create a new cursor. Otherwise, cur is now holding the results of your new "inner" select instead of your "outer" one.
This may work anyway, depending on your database library and your luck, but you shouldn't count on it. I'll try to explain below.
You don't need a new connection, however.
So:
cur.execute("SELECT * FROM some_table")
for row in cur:
# some stuff I'm doing
inner_cur = conn.cursor()
inner_cur.execute("SELECT * FROM other_table WHERE column = row[1]")
for inner_row in inner_cur:
# stuff
So, why does it sometimes work?
Well, let's look at what a for row in cur: loop really does under the covers:
temp_iter = iter(cur)
while True:
    try:
        row = next(temp_iter)
    except StopIteration:
        break
    # your code runs here
Now, that iter(cur) calls the __iter__ method on the cursor object. What does that do? That's up to cur, an object of the cursor type provided by your database library.
The obvious implementation is to return some kind of iterator that has a reference to either the cursor object, or to the same row collection that the cursor object is using under the covers. This is what happens when you call iter on a list, for example.
But there's nothing requiring the database library to implement its __iter__ that way. It could create a copy of the row set for the iterator to use. Or, more plausibly, it could make the iterator refer to the current row set… but then change the cursor itself to refer to a different one when you next call execute. If it does that, then the old iterator keeps reading the old row set, and you can get a new iterator that iterates the new row set.
You shouldn't rely on that happening just because a database library is allowed to do that. But you also shouldn't rely on it not happening, of course.
In more concrete terms, imagine this type:
class Cursor(object):
    # other stuff
    def __iter__(self):
        return iter(self.rowset)
    def execute(self, sql, *args):
        self.rowset = self.db.do_the_real_work(sql, *args)
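To make that concrete, here is a runnable toy (not any real DB library) showing why reuse can appear to work with an implementation like the one above: the outer for loop holds an iterator over the old row set, so execute() rebinding self.rowset doesn't disturb it.
class ToyCursor(object):
    def __init__(self, tables):
        self.tables = tables          # dict: table name -> list of rows
        self.rowset = []
    def execute(self, table_name):
        self.rowset = self.tables[table_name]   # rebind, don't mutate
    def __iter__(self):
        return iter(self.rowset)

cur = ToyCursor({"outer": [1, 2, 3], "inner": ["a", "b"]})
cur.execute("outer")
for row in cur:               # this iterator stays bound to [1, 2, 3]
    cur.execute("inner")      # rebinds cur.rowset to ["a", "b"]
    print(row, list(cur))     # prints 1 ['a', 'b'], then 2 ..., then 3 ...
With a library whose __iter__ instead returned an iterator tied to the cursor itself, the outer loop would start walking the inner result set, which is exactly why you shouldn't count on either behavior.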

Related

Is it possible to check a variable's value inside a function when testing?

I have the following function:
def create_tables(tables: list) -> None:
    template = jinja2.Template(
        open("/opt/airflow/etl_scripts/postgres/staging_orientdb_create_tables.sql", "r").read()
    )
    pg_hook = PostgresHook(POSTGRES_CONN_ID)  # this is the Airflow hook for Postgres
    conn = pg_hook.get_conn()
    for table in tables:
        sql = template.render(TABLE_NAME=table)
        with conn.cursor() as cur:
            cur.execute(sql)
        conn.commit()
Is there a solution to check the content of the "execute" or "sql" internal variable?
My test looks like this, but because the function returns nothing, there is nothing I can assert:
from unittest.mock import MagicMock, mock_open

def test_create_tables(mocker):
    pg_mock = MagicMock(name="pg")
    mocker.patch.object(
        PostgresHook,
        "get_conn",
        return_value=pg_mock,
    )
    mock_file_content = "CREATE TABLE IF NOT EXISTS {{table}}"
    mocker.patch("builtins.open", mock_open(read_data=mock_file_content))
    create_tables(tables=["mock_table_1", "mock_table_2"])
Thanks,
You cannot access internal variables of a function from the outside.
In that case, I strongly suggest refactoring your create_tables function, since you already know you want to test a specific part of its algorithm.
You could create a function like get_sql_from_table for example. Then test that separately and mock it out in your test_create_tables.
Although, since all it does is call the template rendering function from an external library, I am not sure if that is even something you should be testing. You should assume/know that they test their functions themselves.
Same goes for the Postgres functions.
But the general advice stands: If you want to verify that a specific part of your code does what you expect it to do, factor it out into its own unit/function and test that separately.
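For example, a minimal sketch of that refactor (get_sql_from_table is a made-up name, and the template is assumed to use {{ TABLE_NAME }} to match the render call):
from unittest.mock import mock_open

import jinja2

TEMPLATE_PATH = "/opt/airflow/etl_scripts/postgres/staging_orientdb_create_tables.sql"

def get_sql_from_table(table: str) -> str:
    # The unit worth testing: turn a table name into a SQL statement.
    template = jinja2.Template(open(TEMPLATE_PATH, "r").read())
    return template.render(TABLE_NAME=table)

def create_tables(tables: list) -> None:
    pg_hook = PostgresHook(POSTGRES_CONN_ID)
    conn = pg_hook.get_conn()
    for table in tables:
        with conn.cursor() as cur:
            cur.execute(get_sql_from_table(table))
        conn.commit()

def test_get_sql_from_table(mocker):
    mocker.patch(
        "builtins.open",
        mock_open(read_data="CREATE TABLE IF NOT EXISTS {{ TABLE_NAME }}"),
    )
    assert get_sql_from_table("mock_table_1") == "CREATE TABLE IF NOT EXISTS mock_table_1"
In test_create_tables itself you would then mock get_sql_from_table out and only assert that it was called once per table.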

How to check with the pyodbc package whether the last Impala statement was a SELECT statement?

I would like to have a function based on the pyodbc package which runs a query against Impala and fetches results if there is something to fetch, and otherwise just executes the statement. Unfortunately, I do not know how to check whether I have something to fetch.
def execute_my_query(connection, query):
    cur = connection.cursor()
    cur.execute(query)
    res = cur.fetchall()
    return res
Unfortunately, if I execute something with no result set, such as:
execute_my_query(con, 'drop table if exists my_schama.my_table')
it fails with an error saying there is no result set to return. So I'd like to check whether there is a result that I should be returning, and skip the fetch if there is no reason to return anything.
In the meantime, I've been able to produce one solution which seems to work in the desired way.
According to the pyodbc documentation, the cursor's description attribute "will be None for operations that do not return rows or if one of the execute methods has not been called".
Note that if you wanted to use the rowcount attribute instead, this would not work with Impala: you get rowcount = -1 even when there is a non-empty result set.
Therefore, one can rewrite the function from the question as:
def execute_my_query(connection, query):
    res = None
    cur = connection.cursor()
    cur.execute(query)
    if cur.description is not None:
        res = cur.fetchall()
    return res
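For example (a hypothetical session, where con is an open pyodbc connection):
# DDL produces no result set, so description is None and we get None back.
assert execute_my_query(con, "drop table if exists my_schama.my_table") is None

# A SELECT always has a result set, even with zero rows, so description
# is not None and we get a (possibly empty) list of rows back.
rows = execute_my_query(con, "select 1")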
That being said, if there is a better way of handling this, I'd still very much like to hear it.

segmentation fault extending psycopg2._psycopg.cursor

This small code snippet results in SIGSEGV (I thought that wasn't possible in a garbage-collected language like Python, but I seem to be an ace at creating new kinds of bugs). The database exists and the connection works. I was trying to extend the psycopg2._psycopg.cursor class with a method that returns query results in dictionary form. What am I doing wrong?
import psycopg2

class dcursor(psycopg2._psycopg.cursor):
    def __init__(self, parent_cursor):
        self = parent_cursor
    def dictfetchall(self):
        "Returns all rows from a cursor as a dict"
        desc = cursor.description
        return [
            dict(zip([col[0] for col in desc], row))
            for row in cursor.fetchall()
        ]

conn = psycopg2.connect("dbname=dbpgs user=openerp")
cur = dcursor(conn.cursor())
cur.execute('select name_related from hr_employee;')
print cur.dictfetchall()
The cursor signature takes a connection as its first argument. The way you have overridden __init__ makes it take a cursor instead; the segfault follows from that. Your class is more a wrapper than a cursor. You are also not calling the base class's __init__, and self = parent_cursor doesn't do anything.
The right way to subclass a cursor taking your example is something like:
class dcursor(psycopg2.extensions.cursor):
    def dictfetchall(self):
        "Returns all rows from a cursor as a dict"
        desc = self.description
        return [
            dict(zip([col[0] for col in desc], row))
            for row in self.fetchall()
        ]

conn = psycopg2.connect("dbname=dbpgs user=openerp")
cur = conn.cursor(cursor_factory=dcursor)
cur.execute('select name_related from hr_employee;')
print cur.dictfetchall()
but see also fog's suggestion about using DictCursor.
It is possible, because psycopg2 is a module written in C, it only exposes its API to Python. You can see the code here: http://github.com/psycopg/psycopg2.git
I guess what you've encountered is a bug in Psycopg. That said, the underscore in the _psycopg package name indicates that the classes defined there are not really meant to be subclassed.
Why don't you define dictfetchall() as a standalone helper function? It doesn't access any internal state of the cursor object, there's no real need to make it a cursor method.
psycopg2 is written in C, and unless you know what you're doing it is possible to cause a SIGSEGV when calling/extending the module. All common functions and methods carefully check their arguments, both to avoid breakage and for security, but there are areas where, right now, the burden of doing the Right Thing is on the client code. You just hit one of those areas: extending the connection or cursor types.
To do this right you need to do some specific work in your __init__ method, as shown here:
https://github.com/psycopg/psycopg2/blob/master/lib/extras.py#L49
Specifically cursor (and connection) are new-style classes and need to be initialized using super() and the full list of parameters passed to __init__. At minimum:
def __init__(self, *args, **kwargs):
    super(DictCursorBase, self).__init__(*args, **kwargs)
I linked that example specifically because it already does what you need, i.e., fetches data and makes it available as dicts. Just use psycopg2.extras.DictCursor (a dict-like row class) or psycopg2.extras.RealDictCursor (a real dict for every row) and you're done.
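For instance, a minimal sketch with RealDictCursor, reusing the connection string from the question:
import psycopg2
import psycopg2.extras

conn = psycopg2.connect("dbname=dbpgs user=openerp")
cur = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
cur.execute("select name_related from hr_employee")
rows = cur.fetchall()  # each row is a dict keyed by column name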
I had the same error, and the solution was to replace psycopg2 with a different version: I replaced version 2.6 with version 2.4 and the problem was fixed.
You can verify this by running the Python interpreter and importing psycopg2.

SQLAlchemy and explicit locking

I have multiple processes that can potentially insert duplicate rows into the database. These inserts do not happen very frequently (a few times every hour) so it is not performance critical.
I've tried an exists() check before doing the insert, like so:
# Assume we're inserting a Camera object, a valid SQLAlchemy ORM object that inherits from declarative_base...
try:
    stmt = exists().where(Camera.id == camera_id)
    exists_result = session.query(Camera).with_lockmode("update").filter(stmt).first()
    if exists_result is None:
        session.add(Camera(...))  # lots of parameters, just assume it works
        session.commit()
except IntegrityError as e:
    session.rollback()
The problem I'm running into is that the exist() check doesn't lock the table, and so there is a chance that multiple processes could attempt to insert the same object at the same time. In such a scenario, one process succeeds with the insert and the others fail with an IntegrityError exception. While this works, it doesn't feel "clean" to me.
I would really like some way of locking the Camera table before doing the exists() check.
Perhaps this might be of interest to you:
https://groups.google.com/forum/?fromgroups=#!topic/sqlalchemy/8WLhbsp2nls
You can lock the tables by executing the SQL directly. I'm not sure what that looks like in Elixir, but in plain SA it'd be something like:
conn = engine.connect()
conn.execute("LOCK TABLES Pointer WRITE")
# do stuff with conn
conn.execute("UNLOCK TABLES")

SQLAlchemy Database Construction & Reuse

There's something I'm struggling to understand with SQLAlchemy from its documentation and tutorials.
I see how to autoload classes from a DB table, and I see how to design a class and create from it (declaratively or using the mapper()) a table that is added to the DB.
My question is how does one write code that both creates the table (e.g. on first run) and then reuses it?
I don't want to have to create the database with one tool or one piece of code and have separate code to use the database.
Thanks in advance,
Peter
create_all() does not do anything if a table exists already, so just call it as soon as you set up your engine or connection.
(Note that if you change your table schema, create_all() will not update it! So you still need "another program" to do that.)
This is the usual pattern:
def createEngine(metadata, dsn, **args):
    engine = create_engine(dsn, **args)
    metadata.create_all(engine)
    return engine

def doStuff(engine):
    res = engine.execute('select * from mytable')
    # etc etc

def main():
    engine = createEngine(metadata, 'sqlite:///:memory:')
    doStuff(engine)

if __name__ == '__main__':
    main()
I think you're perhaps over-thinking the situation. If you want to create the database afresh, you normally just call Base.metadata.create_all() or equivalent, and if you don't want to do that, you don't call it.
You could try calling it every time and handling the exception if it goes wrong, assuming that the database is already set up.
Or you could try querying for a certain table and if that fails, call create_all() to put everything in place.
Every other part of your app should work in the same way whether you perform the db creation or not.
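The "query for a certain table first" idea could look like this (a sketch assuming SQLAlchemy 1.4+, where inspect(engine).has_table() is available, and that Base is your declarative base):
from sqlalchemy import create_engine, inspect

engine = create_engine("sqlite:///:memory:")

# Create the schema only if a known table is missing.
if not inspect(engine).has_table("mytable"):
    Base.metadata.create_all(engine)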
