Using decorators for database access with psycopg2

I am constructing a model that does large parts of its calculations in a Postgresql database (for performance reasons). It looks somewhat like this:
import psycopg2

def sql_func1(conn):
    # prepare some data, crunch some numbers, etc.
    curs = conn.cursor()
    curs.execute("SOME SQL COMMAND")
    conn.commit()  # commit() lives on the connection, not the cursor
    curs.close()

if __name__ == "__main__":
    connection = psycopg2.connect(dbname='name', user='user',
                                  password='pass', host='localhost', port=1234)
    sql_func1(connection)
    sql_func2(connection)
    sql_func3(connection)
    connection.close()
The script uses around 30 individual functions like sql_func1. Obviously it is a little awkward to manage the connection and cursor in each function all the time. Thus I started using a decorator as described here. Now I can simply wrap sql_func1 with a decorator @db_connect and pass the connection from there. However, that means I am opening and closing the connection all the time, which is not good practice either. The psycopg2 FAQ says:
Creating a connection can be slow (think of SSL over TCP) so the best practice is to create a single connection and keep it open as long as required. It is also good practice to rollback or commit frequently (even after a single SELECT statement) to make sure the backend is never left "idle in transaction". See also psycopg2.pool for lightweight connection pooling.
Could you please give me some insight into what would be the ideal practice in my case? Should I rather use a decorator that passes the cursor object instead of the connection? If so, please provide a code sample for the decorator. As I am rather new to programming, please also let me know in case you think my overall approach is wrong.

What about storing the connection in a global variable without closing it in the finally block? Something like this (based on the example you linked):
import psycopg2
from functools import wraps

cnn = None

def with_connection(f):
    @wraps(f)
    def with_connection_(*args, **kwargs):
        global cnn
        if not cnn:
            cnn = psycopg2.connect(DSN)  # DSN defined elsewhere
        try:
            rv = f(cnn, *args, **kwargs)
        except Exception:
            cnn.rollback()
            raise
        else:
            cnn.commit()  # or maybe not
        return rv
    return with_connection_
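If you would rather hand each function a cursor instead of the connection, a minimal sketch along the same lines (untested; DSN is the same placeholder as above, and the decorator name is just illustrative):

import psycopg2
from functools import wraps

cnn = None

def with_cursor(f):
    @wraps(f)
    def with_cursor_(*args, **kwargs):
        global cnn
        if cnn is None:
            cnn = psycopg2.connect(DSN)  # DSN is a placeholder, as above
        curs = cnn.cursor()
        try:
            rv = f(curs, *args, **kwargs)
        except Exception:
            cnn.rollback()
            raise
        else:
            cnn.commit()
        finally:
            curs.close()
        return rv
    return with_cursor_

@with_cursor
def sql_func1(curs):
    curs.execute("SOME SQL COMMAND")

Each wrapped function then only ever sees a short-lived cursor, while the single connection stays open for the life of the script.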


Simple example of working with neo4j python driver?

Is there a simple example of working with the neo4j python driver?
How do I just pass a Cypher query to the driver to run, and get back a cursor?
Reading, for example, this, it seems the demo has a class wrapper with a private member function that gets passed to the session's write method:
session.write_transaction(self._create_and_return_greeting, ...
That function then gets called with a transaction as its first parameter:
def _create_and_return_greeting(tx, message):
which in turn runs the Cypher:
result = tx.run("CREATE (a:Greeting) "
This seems 10X more complicated than it needs to be.
I did just try a simpler:
def raw_query(query, **kwargs):
    neodriver = neo_connect()  # cached dbconn
    with neodriver.session() as session:
        result = session.run(query, **kwargs)
        return result.data()
But this results in a socket error on the query, probably because the session goes out of scope?
[dfcx/__init__] ERROR | Underlying socket connection gone (_ssl.c:2396)
[dfcx/__init__] ERROR | Failed to write data to connection IPv4Address(('neo4j-core-8afc8558-3.production-orch-0042.neo4j.io', 7687)) (IPv4Address(('34.82.120.138', 7687)))
Also, I can't return a cursor/iterator, just the data().
When the session goes out of scope, the query result seems to die with it.
If I manually open and close a session, wouldn't I have the same problems?
Python must be the most popular language this DB is used with; does everyone use a different driver?
Py2neo seems cute, but it's completely lacking ORM wrapper functions for most of the Cypher language features, so you have to drop down to raw Cypher anyway. And I'm not sure it supports **kwargs argument interpolation in the same way.
I guess that big raise should help iron out some kinks :D
Slightly longer version trying to get a working DB wrapper:
import logging
from typing import Union

import neo4j

raw_driver = None  # module-level cache for the driver

def neo_connect() -> Union[neo4j.BoltDriver, neo4j.Neo4jDriver]:
    global raw_driver
    if raw_driver:
        # print('reuse driver')
        return raw_driver
    neoconfig = NEOCONFIG  # config dict loaded elsewhere
    raw_driver = neo4j.GraphDatabase.driver(
        neoconfig['url'],
        auth=(neoconfig['user'], neoconfig['pass']))
    if raw_driver is None:
        raise BaseException("cannot connect to neo4j")
    return raw_driver

def raw_query(query, **kwargs):
    # just get data, no cursor
    neodriver = neo_connect()
    session = neodriver.session()
    # logging.info('neoquery %s', query)
    # with neodriver.session() as session:
    try:
        result = session.run(query, **kwargs)
        data = result.data()
        return data
    except neo4j.exceptions.CypherSyntaxError as err:
        logging.error('neo error %s', err)
        logging.error('failed query: %s', query)
        raise err
    # finally:
    #     logging.info('close session')
    #     session.close()
Update: someone pointed me to this example, which shows another way to use the tx wrapper:
https://github.com/neo4j-graph-examples/northwind/blob/main/code/python/example.py#L16-L21
def raw_query(query, **kwargs):
    neodriver = neo_connect()  # cached dbconn
    with neodriver.session() as session:
        result = session.run(query, **kwargs)
        return result.data()
This is perfectly fine and works as intended on my end.
The error you're seeing says that there is a connection problem, so there must be something going on between the server and the driver that is outside the driver's control.
Also, please note that there is a difference between these ways of running a query:

# 1. auto-commit transaction
with driver.session() as session:
    result = session.run("<SOME CYPHER>")

# 2. managed transaction
def work(tx):
    result = tx.run("<SOME CYPHER>")

with driver.session() as session:
    session.write_transaction(work)
The latter one might be 3 lines longer, and the team working on the drivers has collected some feedback regarding this. However, there are more things to consider here. Firstly, changing the API surface needs careful planning and cannot be done in, say, a patch release. Secondly, there are technical hurdles to overcome. Here are the semantics, anyway:
Auto-commit transaction. Runs only that query as one unit of work.
If you run a new auto-commit transaction within the same session, the previous result will buffer all available records for you (depending on the query, this will consume a lot of memory). This can be avoided by calling result.consume(). However, if the session goes out of scope, the result will be consumed automatically. This means you cannot extract further records from it. Lastly, any error will be raised and needs handling in the application code.
Managed transaction. Runs whatever unit of work you want within that function. A transaction is implicitly started and committed (unless you rollback explicitly) around the function.
If the transaction ends (end of function or rollback), the result will be consumed and become invalid. You'll have to extract all records you need before that.
This is the recommended way of using the driver because it will not raise all errors but handle some internally (where appropriate) and retry the work function (e.g. if the server is only temporarily unavailable). Since the function might be executed multiple times, you must make sure it's idempotent.
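To make both of those concrete, here is a rough sketch (the Cypher and names are illustrative; driver is an existing driver object as in the snippets above):

# Auto-commit transaction: if you run another query in the same session,
# either extract what you need first or call consume() to avoid buffering.
with driver.session() as session:
    result = session.run("MATCH (n) RETURN n.name AS name")
    result.consume()  # discard remaining records instead of buffering them

# Managed transaction: extract all records inside the work function,
# because the result is consumed when the transaction ends.
def get_names(tx):
    result = tx.run("MATCH (p:Person) RETURN p.name AS name")
    return [record["name"] for record in result]

with driver.session() as session:
    names = session.read_transaction(get_names)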
Closing thoughts:
Please remember that Stack Overflow is monitored on a best-effort basis, and what can be perceived as hasty comments may get in the way of getting helpful answers to your questions.

Is it okay to use only one cursor to execute all operations with database?

I have a program which must work with a database (and it can run in a few threads). I wrote a special class which starts like this:
class DataBase:
    def __init__(self):
        self.connection = sqlite3.connect("some_db.db", check_same_thread=False)
        self.cursor = self.connection.cursor()
Is it okay to use such a class? Or should I go another way and use a few cursors? My program works fine now and I haven't noticed any problems, but I'm not sure whether it can lead to big problems in the future.
I have sqlite3 working in classes and I don't need it to be self-ed. I've got it like:
connection = sqlite3.connect("some_db.db")
c = connection.cursor()
Yeah, I think it's fine until you need multiple databases, where all you need to do is name your connections and cursors something different, like:
connection2 = sqlite3.connect("another_db.db")
c2 = connection2.cursor()
You can have them running at the same time too. Just remember to commit and close when you don't need them anymore.
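If several threads really do end up sharing that one connection, a hedged sketch of what serializing access might look like (the lock and the execute method are my additions, not part of sqlite3):

import sqlite3
import threading

class DataBase:
    def __init__(self):
        self.connection = sqlite3.connect("some_db.db", check_same_thread=False)
        self._lock = threading.Lock()  # guards the shared connection

    def execute(self, sql, params=()):
        # one thread at a time; short-lived cursor per call
        with self._lock:
            cursor = self.connection.cursor()
            try:
                cursor.execute(sql, params)
                self.connection.commit()
                return cursor.fetchall()
            finally:
                cursor.close()

A single long-lived cursor shared across threads is where problems tend to show up; a fresh cursor per call is cheap and avoids that.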

Using functions to open/close sqlite db connections?

Being new to both Python and sqlite, I've been playing around with them both recently, trying to figure things out. In particular with sqlite I've learned how to open/close/commit data to a db. But now I'm trying to clean things up a bit so that I can open/close the db via function calls. For instance, I'd like to do something like:
def open_db():
    conn = sqlite3.connect("path")
    c = conn.cursor()

def close_db():
    c.close()
    conn.close()

def create_db():
    open_db()
    c.execute("CREATE STUFF")
    close_db()
Then when I run the program, before I query or write to the table, I could do something like:
open_db()
c.execute('SELECT * DO STUFF')
OR
c.execute('DELETE * DO OTHER STUFF')
conn.commit()
close_db()
I've read about context managers, but I'm not sure I entirely understand what's going on with them. What would be the easiest way to clean up how I open/close my DB connections so I'm not always having to type in the cursor commands?
This is because the connection you define is local to the open_db function. Change it as follows:
def open_db():
    conn = sqlite3.connect("path")
    return conn.cursor()
and then
c = open_db()
c.execute('SELECT * DO STUFF')
It should be noted that writing functions like this purely as a learning exercise might be OK, but generally it's not very useful to write a thin wrapper around a database connectivity API.
I don't know that there is an easy way. As already suggested, if you make the name of a database cursor or connection local to a function, then it will be lost upon exit from that function. The answer might be to write code using the contextlib module (which is included with the Python distribution and documented in the help file); I wouldn't call that easy. The documentation for sqlite3 does mention that connection objects can be used as context managers; I suspect you've already noticed that. I also see that there's some sort of context manager for MySQL, but I haven't used it.
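For what it's worth, a minimal sketch of that contextlib approach (the path and queries are placeholders from the question):

import sqlite3
from contextlib import contextmanager

@contextmanager
def open_db(path="path"):  # "path" is the placeholder from the question
    conn = sqlite3.connect(path)
    try:
        yield conn.cursor()
        conn.commit()  # commit if the block ran without errors
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()

# usage: the cursor is only valid inside the with-block
with open_db() as c:
    c.execute('SELECT * DO STUFF')  # placeholder query

This keeps the open/commit/close housekeeping in one place without needing module-level names for the connection and cursor.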

Retry on deadlock for MySQL / SQLAlchemy

I have searched for quite some time now and can't find a solution to my problem. We are using SQLAlchemy in conjunction with MySQL for our project, and we repeatedly encounter the dreaded error:
1213, 'Deadlock found when trying to get lock; try restarting transaction'.
We would like to try to restart the transaction at most three times in this case.
I have started to write a decorator that does this, but I don't know how to save the session state before the failure and retry the same transaction after it (SQLAlchemy requires a rollback whenever an exception is raised).
My work so far:
import logging
from functools import wraps

from sqlalchemy.exc import OperationalError

def retry_on_deadlock_decorator(func):
    lock_messages_error = ['Deadlock found', 'Lock wait timeout exceeded']

    @wraps(func)
    def wrapper(*args, **kwargs):
        # settings and logger come from the surrounding project
        attempt_count = 0
        while attempt_count < settings.MAXIMUM_RETRY_ON_DEADLOCK:
            try:
                return func(*args, **kwargs)
            except OperationalError as e:
                if any(msg in str(e) for msg in lock_messages_error) \
                        and attempt_count <= settings.MAXIMUM_RETRY_ON_DEADLOCK:
                    logger.error('Deadlock detected. Trying sql transaction once more. Attempts count: %s',
                                 attempt_count + 1)
                else:
                    raise
            attempt_count += 1
    return wrapper
You can't really do that with the Session from the outside. Session would have to support this internally. It would involve saving a lot of private state, so this may not be worth your time.
I completely ditched most ORM stuff in favour of the lower-level SQLAlchemy Core interface. Using that (or even any DB-API interface) you can trivially use your retry_on_deadlock_decorator (see the question above) to make a retry-aware db.execute wrapper.
@retry_on_deadlock_decorator
def deadlock_safe_execute(db, stmt, *args, **kw):
    return db.execute(stmt, *args, **kw)
And instead of
db.execute("UPDATE users SET active=0")
you do
deadlock_safe_execute(db, "UPDATE users SET active=0")
which will retry automatically if a deadlock happens.
Did you use code like this?
try:
    Perform table transaction
    break
except:
    rollback
    delay
    try again to perform table transaction
The only way to truly handle deadlocks is to write your code to expect them. This generally isn't very difficult if your database code is well written. Often you can just put a try/catch around the query execution logic and look for a deadlock when errors occur. If you catch one, the normal thing to do is just attempt to execute the failed query again.
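Spelled out in Python, that advice might look roughly like this (a sketch against a plain DB-API connection; the function names and delay are illustrative):

import time

def execute_with_retry(conn, do_transaction, max_attempts=3, delay=0.5):
    """do_transaction(conn) performs one unit of work; it must be idempotent."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = do_transaction(conn)
            conn.commit()
            return result
        except Exception as err:
            conn.rollback()
            # only retry on deadlocks, and only while attempts remain
            if attempt == max_attempts or 'Deadlock found' not in str(err):
                raise
            time.sleep(delay)  # back off before retrying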
Useful links:
How to Cope with Deadlocks
Understanding Autocommit
Property MySQLConnection.autocommit

How to connect to Cassandra inside a Pylons app?

I created a new Pylons project and would like to use Cassandra as my database server. I plan on using Pycassa to be able to use Cassandra 0.7beta.
Unfortunately, I don't know where to instantiate the connection to make it available in my application.
The goal would be to:
Create a pool when the application is launched
Get a connection from the pool for each request, and make it available to my controllers and libraries (in the context of the request). Ideally, connections would be fetched from the pool "lazily", i.e. only if needed
If a connection has been used, release it when the request has been processed
Additionally, is there something important I should know about this? When I see comments like "Be careful when using a QueuePool with use_threadlocal=True, especially with retries enabled. Synchronization may be required to prevent the connection from changing while another thread is using it.", what does that mean exactly?
Thanks.
Well, I worked a little more. In fact, using a connection manager was probably not a good idea, as this should be the template context. Additionally, opening a connection for each thread is not really a big deal; opening a connection per request would be.
I ended up with just pycassa.connect_thread_local() in app_globals, and there I go.
Okay.
I worked a little, I learned a lot, and I found a possible answer.
Creating the pool
The best place to create the pool seems to be in the app_globals.py file, which is basically a container for objects which will be accessible "throughout the life of the application". Exactly what I want for a pool, in fact.
I just added my init code at the end of the file; it takes its settings from the Pylons configuration file:
"""Creating an instance of the Pycassa Pool"""
kwargs = {}
# Parsing servers
if 'cassandra.servers' in config['app_conf']:
servers = config['app_conf']['cassandra.servers'].split(',')
if len(servers):
kwargs['server_list'] = servers
# Parsing timeout
if 'cassandra.timeout' in config['app_conf']:
try:
kwargs['timeout'] = float(config['app_conf']['cassandra.timeout'])
except:
pass
# Finally creating the pool
self.cass_pool = pycassa.QueuePool(keyspace='Keyspace1', **kwargs)
I could have done better, like moving that into a function or supporting more parameters (pool size, ...), which I'll do.
Getting a connection at each request
Well, there seems to be a simple way: in the file base.py, add something like c.conn = g.cass_pool.get() before calling WSGIController, and something like c.conn.return_to_pool() after. This is simple and works, but it gets a connection from the pool even when the controller doesn't need one. I have to dig a little deeper.
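In base.py, that simple version might look roughly like this (a sketch from the Pylons 1.0 layout; import paths may differ in your version):

from pylons import app_globals as g, tmpl_context as c
from pylons.controllers import WSGIController

class BaseController(WSGIController):
    def __call__(self, environ, start_response):
        c.conn = g.cass_pool.get()  # grab a connection up front
        try:
            return WSGIController.__call__(self, environ, start_response)
        finally:
            c.conn.return_to_pool()  # always release it back to the pool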
Creating a connection manager
I had the simple idea of creating a class which would be instantiated at each request in the base.py file, and which would automatically grab a connection from the pool when requested (and release it afterwards). This is a really simple class:
class LocalManager:
    '''Requests a connection from a Pycassa Pool when needed,
    and releases it at the end of the object's life'''

    def __init__(self, pool):
        '''Class constructor'''
        assert isinstance(pool, Pool)
        self._pool = pool
        self._conn = None

    def get(self):
        '''Grabs a connection from the pool if not already done, and returns it'''
        if self._conn is None:
            self._conn = self._pool.get()
        return self._conn

    def __getattr__(self, key):
        '''It's cooler to write "c.conn" than "c.get()" in the code, isn't it?'''
        if key == 'conn':
            return self.get()
        else:
            return self.__dict__[key]

    def __del__(self):
        '''Releases the connection, if needed'''
        if self._conn is not None:
            self._conn.return_to_pool()
I just added c.cass = LocalManager(g.cass_pool) before calling WSGIController in base.py and del(c.cass) after, and I'm all done.
And it works:
conn = c.cass.conn
cf = pycassa.ColumnFamily(conn, 'TestCF')
print(cf.get('foo'))
\o/
I don't know if this is the best way to do this. If not, please let me know =)
Plus, I still haven't understood the "synchronization" part in the Pycassa source code: whether it is needed in my case, and what I should do to avoid problems.
Thanks.
