Segmentation fault extending psycopg2._psycopg.cursor in Python

This small code snippet results in SIGSEGV (I thought this wouldn't be possible in a garbage-collected language like Python, but I seem to be an ace at creating new kinds of bugs), even though the database exists and the connection works. I was trying to extend the psycopg2._psycopg.cursor class with a method that returns query results in dictionary form. What am I doing wrong?
import psycopg2

class dcursor(psycopg2._psycopg.cursor):
    def __init__(self, parent_cursor):
        self = parent_cursor

    def dictfetchall(self):
        "Returns all rows from a cursor as a dict"
        desc = cursor.description
        return [
            dict(zip([col[0] for col in desc], row))
            for row in cursor.fetchall()
        ]

conn = psycopg2.connect("dbname=dbpgs user=openerp")
cur = dcursor(conn.cursor())
cur.execute('select name_related from hr_employee;')
print cur.dictfetchall()

The cursor signature takes a connection as its first argument, but the way you have overridden __init__ makes it take a cursor. Segfault follows. Your class is more a wrapper than a cursor. You are also not calling the base class's __init__, and self=parent_cursor doesn't do anything (it just rebinds the local name).
The right way to subclass a cursor taking your example is something like:
class dcursor(psycopg2.extensions.cursor):
    def dictfetchall(self):
        "Returns all rows from a cursor as a dict"
        desc = self.description
        return [
            dict(zip([col[0] for col in desc], row))
            for row in self.fetchall()
        ]
conn = psycopg2.connect("dbname=dbpgs user=openerp")
cur = conn.cursor(cursor_factory=dcursor)
cur.execute('select name_related from hr_employee;')
print cur.dictfetchall()
but see also fog's suggestion about using DictCursor.

It is possible, because psycopg2 is a module written in C that only exposes its API to Python. You can see the code here: http://github.com/psycopg/psycopg2.git
I guess what you've encountered is a bug in psycopg2. That said, the underscore in the _psycopg module name indicates that the classes defined there are not really meant to be subclassed.
Why don't you define dictfetchall() as a standalone helper function? It doesn't access any internal state of the cursor object, so there's no real need to make it a cursor method.
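For example, a minimal sketch of that helper (it only touches the cursor's public DB-API attributes, so no subclassing is needed):
def dictfetchall(cursor):
    "Returns all rows from a cursor as a list of dicts"
    desc = cursor.description
    return [
        dict(zip([col[0] for col in desc], row))
        for row in cursor.fetchall()
    ]

# usage with a plain cursor:
# cur = conn.cursor()
# cur.execute('select name_related from hr_employee;')
# rows = dictfetchall(cur)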

psycopg2 is written in C, and unless you know what you're doing it is possible to cause a SIGSEGV when calling/extending the module. All common functions and methods carefully check their arguments, both to avoid breakage and for security, but there are areas where, right now, the burden of doing the Right Thing is on the client code. You just hit one of those areas: extending the connection or cursor types.
To do this right you need to do some specific work in your __init__ method, as shown here:
https://github.com/psycopg/psycopg2/blob/master/lib/extras.py#L49
Specifically cursor (and connection) are new-style classes and need to be initialized using super() and the full list of parameters passed to __init__. At minimum:
def __init__(self, *args, **kwargs):
    super(DictCursorBase, self).__init__(*args, **kwargs)
I linked that example specifically because it already does what you need, i.e., fetches data and makes it available as dicts. Just import psycopg2.extras.DictCursor (to use a dict-like row class) or psycopg2.extras.RealDictCursor (to use a real dict for every row) and you're done.
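A minimal usage sketch (connection parameters taken from the question):
import psycopg2
import psycopg2.extras

conn = psycopg2.connect("dbname=dbpgs user=openerp")
# RealDictCursor makes each row a real dict keyed by column name
cur = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
cur.execute('select name_related from hr_employee;')
rows = cur.fetchall()  # e.g. [{'name_related': ...}, ...]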

I had the same error and the solution was to replace psycopg2 with a different version. I replaced version 2.6 with version 2.4 and the problem was fixed.
You can validate this by running the Python interpreter and importing psycopg2.
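For example, a quick sanity check from the interpreter (psycopg2 exposes its version string as __version__):
import psycopg2  # if the import itself crashes, the installed build is at fault
print(psycopg2.__version__)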

Related

Is it possible to check a variable's value inside a function when testing?

I have the following function:
def create_tables(tables: list) -> None:
    template = jinja2.Template(
        open("/opt/airflow/etl_scripts/postgres/staging_orientdb_create_tables.sql", "r").read()
    )
    pg_hook = PostgresHook(POSTGRES_CONN_ID)  # this is the Airflow hook for Postgres
    conn = pg_hook.get_conn()
    for table in tables:
        sql = template.render(TABLE_NAME=table)
        with conn.cursor() as cur:
            cur.execute(sql)
        conn.commit()
Is there a way to check the content of the internal "sql" variable, or what "execute" was called with?
My test looks like this, but because the function returns nothing, I have nothing to assert on:
def test_create_tables(mocker):
    pg_mock = MagicMock(name="pg")
    mocker.patch.object(
        PostgresHook,
        "get_conn",
        return_value=pg_mock
    )
    mock_file_content = "CREATE TABLE IF NOT EXISTS {{table}}"
    mocker.patch("builtins.open", mock_open(read_data=mock_file_content))
    create_tables(tables=["mock_table_1", "mock_table_2"])
Thanks,
You cannot access a function's internal variables from the outside.
I strongly suggest refactoring your create_tables function, then, if you already know that you want to test a specific part of its algorithm.
You could create a function like get_sql_from_table, for example, then test that separately and mock it out in your test_create_tables.
Although, since all it does is call the template-rendering function from an external library, I am not sure that is even something you should be testing. You should assume/know that they test their functions themselves.
The same goes for the Postgres functions.
But the general advice stands: if you want to verify that a specific part of your code does what you expect it to do, factor it out into its own unit/function and test that separately, as in the sketch below.
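A minimal sketch of that refactor (get_sql_from_table is the hypothetical helper named above; taking the template text as an argument even removes the need to mock open):
import jinja2

def get_sql_from_table(template_text: str, table: str) -> str:
    # the SQL-building step from create_tables, isolated so it can be tested
    return jinja2.Template(template_text).render(TABLE_NAME=table)

def test_get_sql_from_table():
    sql = get_sql_from_table("CREATE TABLE IF NOT EXISTS {{ TABLE_NAME }}", "mock_table_1")
    assert sql == "CREATE TABLE IF NOT EXISTS mock_table_1"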

How to receive notices with PyGreSQL?

I'm using PyGreSQL 4.1.1 with Postgres 9.5 and have written some stored functions. I use RAISE with different levels inside the functions for debugging purposes, which works very well in psql, but I haven't found a way to access those messages in Python.
Example:
CREATE OR REPLACE FUNCTION my_function() RETURNS BOOLEAN AS $_$
BEGIN
    RAISE NOTICE 'A notice from my function.';
    RETURN TRUE;
END
$_$ LANGUAGE plpgsql;
My Python code looks like this:
conn = pgdb.connect(database='mydb', user='myself')
cursor = conn.cursor()
cursor.execute("SELECT my_function()")
How can I access the notice (A notice from my function.) after running my_function()?
Thanks to klin's comment, I found a somewhat unclean solution. The pgdb.Connection object stores the underlying pg.Connection object in a private property named _cnx. Thus, you can set the notice receiver like this:
def my_notice_receiver(notice):
    logging.info("Notice: %s", notice)

conn._cnx.set_notice_receiver(my_notice_receiver)
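Putting it together, a sketch that reuses only the pieces shown above (the private _cnx access is, as noted, unclean):
import logging
import pgdb

logging.basicConfig(level=logging.INFO)

conn = pgdb.connect(database='mydb', user='myself')

def my_notice_receiver(notice):
    logging.info("Notice: %s", notice)

conn._cnx.set_notice_receiver(my_notice_receiver)

cursor = conn.cursor()
cursor.execute("SELECT my_function()")  # the receiver fires for the RAISE NOTICE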

Instantiate a class in a module

I come from a Ruby background and I am noticing some differences in Python... In Ruby, when I need to create a helper I usually go for a module, something like the below:
module QueryHelper
  def db_client
    @db ||= DBClient.new
  end

  def query
    db_client.select('whateverquery')
  end
end
In Python, though, I do something like the following:
db_client = DBClient()

def query():
    return db_client.select('whateverquery')
My only worry with the above is that every time I call the query() function it will try to instantiate DBClient() over and over... but based on reading and testing, that does not seem to occur, thanks to Python's module caching on import...
The question is whether the above is bad practice in Python; if so, why, and how can it be improved? Perhaps by evaluating it lazily? Or do you think it's OK as is?
No. DBClient will not be re-instantiated every time you call query(). This is because you've already created the instance of DBClient outside of the query function, at module import time. This means that your current code is fine as is.
If your intention was to create a new instance of DBClient every time query is called, then you should just move the declaration into the query function, like this:
def query():
    db_client = DBClient()
    return db_client.select( ... )
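If the goal was instead the Ruby-style lazy @db ||= behavior, here is a minimal sketch (module-level cache; DBClient is assumed from the question):
_db_client = None

def get_db_client():
    # create the client on first use, then reuse the cached instance
    global _db_client
    if _db_client is None:
        _db_client = DBClient()
    return _db_client

def query():
    return get_db_client().select('whateverquery')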
In short, you would like to add a method to the DBClient class? Why not add it dynamically?
# defining the method to add
def query(self, command):
    return self.select(command)

# Actually adding it to the DBClient class
DBClient.query = query

# Instances now come with the newly added method
db_client = DBClient()

# Using the method
return_command_1 = db_client.query("my_command_1")
return_command_2 = db_client.query("my_command_2")
Credits to Igor Sobreira.

Can I use the same cursor while looping through it?

I am iterating through a SELECT result, like this:
import MySQLdb

conn = MySQLdb.connect(host="127.0.0.1", user=...)  # and so on
cur = conn.cursor()
cur.execute("SELECT * FROM some_table")
for row in cur:
    # some stuff I'm doing
    # sometimes I need to perform another SELECT here
The question is, can I use cur again inside the for loop, or do I have to create another cursor (or even more - another connection)?
I guess I am missing some basic knowledge about databases or Python here... I am actually quite new to both. Also, my attempts to google the answer have failed.
I would even guess myself that I have to create another cursor, but I think I actually used it like this for some time before realizing it might be wrong, and it seemed to work. But I am a bit confused now and can't guarantee it, so I just want to make sure.
You have to create a new cursor. Otherwise, cur is now holding the results of your new "inner" select instead of your "outer" one.
This may work anyway, depending on your database library and your luck, but you shouldn't count on it. I'll try to explain below.
You don't need a new connection, however.
So:
cur.execute("SELECT * FROM some_table")
for row in cur:
# some stuff I'm doing
inner_cur = conn.cursor()
inner_cur.execute("SELECT * FROM other_table WHERE column = row[1]")
for inner_row in inner_cur:
# stuff
So, why does it sometimes work?
Well, let's look at what a for row in cur: loop really does under the covers:
temp_iter = iter(cur)
while True:
    try:
        row = next(temp_iter)
    except StopIteration:
        break
    # your code runs here
Now, that iter(cur) calls the __iter__ method on the cursor object. What does that do? That's up to cur, an object of the cursor type provided by your database library.
The obvious implementation is to return some kind of iterator that has a reference to either the cursor object, or to the same row collection that the cursor object is using under the covers. This is what happens when you call iter on a list, for example.
But there's nothing requiring the database library to implement its __iter__ that way. It could create a copy of the row set for the iterator to use. Or, more plausibly, it could make the iterator refer to the current row set… but then change the cursor itself to refer to a different one when you next call execute. If it does that, then the old iterator keeps reading the old row set, and you can get a new iterator that iterates the new row set.
You shouldn't rely on that happening just because a database library is allowed to do that. But you also shouldn't rely on it not happening, of course.
In more concrete terms, imagine this type:
class Cursor(object):
    # other stuff

    def __iter__(self):
        return iter(self.rowset)

    def execute(self, sql, *args):
        self.rowset = self.db.do_the_real_work(sql, *args)
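With a type like that, the behavior described above becomes concrete; this hypothetical demo assumes the sketch's Cursor is in use:
cur.execute("SELECT * FROM some_table")
it = iter(cur)                # iterator over the *current* rowset
cur.execute("SELECT * FROM other_table")
next(it)                      # still yields rows from some_table: execute()
                              # rebound self.rowset, but the old iterator
                              # keeps the old rowset alive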

Creating pooled objects (Python)

I am writing a script that requires interacting with several databases (not concurrently). To facilitate this, I am maintaining the db-related information (connections etc.) in a dictionary. As an aside, I am using SQLAlchemy for all interaction with the db. I don't know whether that is relevant to this question or not.
I have a function to set up the pool. It looks somewhat like this:
def setupPool():
    global pooled_objects
    for name in NAMES:
        engine = create_engine("postgresql+psycopg2://postgres:pwd@localhost/%s" % name)
        metadata = MetaData(engine)
        conn = engine.connect()
        tbl = Table('my_table', metadata, autoload=True)
        info = {'db_connection': conn, 'table': tbl}
        pooled_objects[name] = info
I am not sure if there are any gotchas in the code above, since I am reusing the same variable names on each iteration, and it's not clear (to me at least) how the underlying references to the resources (connections) are being handled. For example, will creating another engine (to a different db) and assigning it to the engine variable cause the previous instance to be 'harvested' by the GC (since no code is using that reference yet, as the pool is still being set up)?
In short, is the code above OK? And if not, why not, i.e. how may I fix it with respect to the issues mentioned above?
The code you have is perfectly good.
Just because you reuse the same variable name does not mean you are overwriting (or freeing) another object that was assigned to that variable. In fact, you can think of names as temporary labels attached to your objects.
Now, you store the final objects in the global dictionary pooled_objects, which means that until your program is done or you explicitly delete data from there, the GC is not going to free them, as the sketch below illustrates.
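A tiny illustrative sketch of the names-versus-objects point:
pooled = {}

x = object()
pooled['a'] = x    # the dict now also holds a reference to the object
x = object()       # the name 'x' is rebound to a new object...

# ...but the first object is still alive: the dict references it, so the
# GC will not free it until pooled is cleared or goes away.
print(pooled['a'])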
