Should I pass Database connection or Cursor to a class - python

I'm writing a Python script to move data from a production db to a dev db. I'm using vertica-python (something very similar to pyodbc) for the db connection and Airflow for scheduling.
The script is divided into two files, one for the DAG and one for the actual migration job. I use a try-except-finally block for all SQL execution functions in the migration job:
try:
    # autocommit set to False
    # Execute a SQL script
except DatabaseError:
    # Logging information
    # Rollback
finally:
    # autocommit set back to True
You can see that setting autocommit and rolling back need access to the connection, while executing a SQL script needs access to the cursor. The current solution is to simply create two DB connections in the DAG and pass them to the migration script. But I also read in a Stack Overflow post that I should pass only the cursor:
Python, sharing mysql connection in multiple functions - pass connection or cursor?
My question is: is it possible to pass only the cursor from the DAG to the migration script and still retain the ability to roll back and set autocommit?

Yes, you can change the autocommit setting via the Cursor:
>>> import pyodbc
>>> cnxn = pyodbc.connect("DSN=mssqlLocal")
>>> cnxn.autocommit
False
>>> crsr = cnxn.cursor()
>>> crsr.connection.autocommit = True
>>> cnxn.autocommit
True
>>>
pyodbc also provides commit() and rollback() methods on the Cursor object, but be aware that they affect all cursors created by the same connection, i.e., crsr.rollback() is exactly the same as calling cnxn.rollback().
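So one option is to pass only the cursor and reach the connection through it when needed. Below is a minimal sketch of what the migration function could look like, assuming the driver exposes the DB API cursor.connection attribute and a writable connection.autocommit, as pyodbc does above (the function name, logger, and exception import are illustrative, not from the original post):
import logging
from pyodbc import DatabaseError  # or the equivalent exception from your driver

log = logging.getLogger(__name__)

def run_migration(crsr, sql_script):
    """Hypothetical migration helper; the DAG passes in only the cursor."""
    conn = crsr.connection             # the connection that owns this cursor
    try:
        conn.autocommit = False        # run the script inside an explicit transaction
        crsr.execute(sql_script)
        conn.commit()
    except DatabaseError:
        log.exception("Migration failed, rolling back")
        conn.rollback()
    finally:
        conn.autocommit = True         # restore the previous setting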

Related

How to handle idle in transaction in python for Postgres?

Hello everyone, I have the following issue:
I am trying to run a simple UPDATE query against Postgres using sqlalchemy and psycopg2.
the query is
update = f"""
    UPDATE {requests_table_name}
    SET status = '{status}', {column_db_table} = '{dt.datetime.now()}'
    WHERE request_id = '{request_id}'
"""
and then commit the changes using
cursor.execute(update).commit()
But it throws an error: AttributeError: 'NoneType' object has no attribute 'commit'
My connection string is
engine = create_engine(
    f'postgresql://{self.params["user"]}:{self.params["password"]}@{self.params["host"]}:{self.params["port"]}/{self.params["database"]}')
conn = engine.connect().connection
cursor = conn.cursor()
The other thing is that the cursor always shows as closed: <cursor object at 0x00000253827D6D60; closed: 0>
The connection with the database is ok; I can fetch tables and update them using pandas' to_sql method, but committing via the cursor does not work. It works perfectly with SQL Server but not with Postgres.
In Postgres, however, it creates a PID with the status idle in transaction and Client: ClientRead every time I run cursor.execute(update).commit().
I cannot figure out whether the problem is in the code or in the database.
I tried different methods to initiate a cursor, like raw_connection(), but without result.
I checked for Client: ClientRead with idle in transaction but am not sure how to overcome it.
You have to call commit() on the connection object.
According to the documentation, execute() returns None.
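So, with the question's variables, the minimal fix is a sketch like:
cursor.execute(update)  # execute() returns None, so the call cannot be chained
conn.commit()           # COMMIT is issued on the connection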
Note that even if you use a context manager like this:
with my_connection.cursor() as cur:
    cur.execute('INSERT INTO ..')
You may find your database processes still getting stuck in the idle in transaction state. The COMMIT is handled at the connection level, like @laurenz-albe said, so you need to wrap that too:
with my_connection as conn:
    with conn.cursor() as cur:
        cur.execute('INSERT INTO ..')
It's spelled out clearly in the documentation, but I still managed to overlook it.

How to create a postgres database using SQLAlchemy/Python and avoid session in use error [duplicate]

This is a sample code I'd like to run:
for i in range(1,2000):
    db = create_engine('mysql://root@localhost/test_database')
    conn = db.connect()
    # some simple data operations
    conn.close()
    db.dispose()
Is there a way of running this without getting "Too many connections" errors from MySQL?
I already know I can handle the connection otherwise or have a connection pool. I'd just like to understand how to properly close a connection from sqlalchemy.
Here's how to write that code correctly:
db = create_engine('mysql://root@localhost/test_database')
for i in range(1,2000):
    conn = db.connect()
    # some simple data operations
    conn.close()
db.dispose()
That is, the Engine is a factory for connections as well as a pool of connections, not the connection itself. When you say conn.close(), the connection is returned to the connection pool within the Engine, not actually closed.
If you do want the connection to be actually closed, that is, not pooled, disable pooling via NullPool:
from sqlalchemy.pool import NullPool
db = create_engine('mysql://root@localhost/test_database', poolclass=NullPool)
With the above Engine configuration, each call to conn.close() will close the underlying DBAPI connection.
If OTOH you actually want to connect to different databases on each call, that is, your hardcoded "localhost/test_database" is just an example and you actually have lots of different databases, then the approach using dispose() is fine; it will close out every connection that is not checked out from the pool.
In all of the above cases, the important thing is that the Connection object is closed via close(). If you're using any kind of "connectionless" execution, that is engine.execute() or statement.execute(), the ResultProxy object returned from that execute call should be fully read, or otherwise explicitly closed via close(). A Connection or ResultProxy that's still open will prohibit the NullPool or dispose() approaches from closing every last connection.
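To illustrate that last point, a sketch against the legacy 1.x "connectionless" API used above (some_table is a placeholder; engine.execute() was removed in SQLAlchemy 2.0):
result = engine.execute("SELECT * FROM some_table")  # checks a connection out of the pool
rows = result.fetchall()   # fully reading the ResultProxy releases that connection
result.close()             # explicit close also releases it; safe if already exhausted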
I was trying to figure out how to disconnect from the database for an unrelated problem (you must disconnect before forking).
You need to invalidate the connection in the connection pool too.
In your example:
for i in range(1,2000):
    db = create_engine('mysql://root@localhost/test_database')
    conn = db.connect()
    # some simple data operations
    # session.close() if needed
    conn.invalidate()
    db.dispose()
I use this one:
from sqlalchemy import text

engine = create_engine('...')
with engine.connect() as conn:
    conn.execute(text(f"CREATE SCHEMA IF NOT EXISTS..."))
engine.dispose()
In my case this always works and I am able to close!
So using invalidate() before close() does the trick; otherwise close() alone was not enough:
conn = engine.raw_connection()
conn.get_warnings = True
cur = conn.cursor()  # cursor creation was missing from the original snippet
curSql = xx_tmpsql
myresults = cur.execute(curSql, multi=True)
print("Warnings: #####")
print(cur.fetchwarnings())
for curresult in myresults:
    print(curresult)
    if curresult.with_rows:
        print(curresult.column_names)
        print(curresult.fetchall())
    else:
        print("no rows returned")
cur.close()
conn.invalidate()
conn.close()
engine.dispose()

Connect to SQLite Memory Database from separate file (Python)

The application I am building requires a single SQLite in-memory database that separate routines and threads will need to access. I am having difficulties achieving this.
I understand that:
sqlite3.connect('file:my_db?mode=memory&cache=shared', uri=True)
should create an in-memory database that can be modified and accessed by separate connections.
Here is my test, which returns an error:
"sqlite3.OperationalError: no such table: my_table"
Code below saved as "test_create.py":
import sqlite3

def create_a_table():
    db = sqlite3.connect('file:my_db?mode=memory&cache=shared', uri=True)
    cursor = db.cursor()
    cursor.execute('''
        CREATE TABLE my_table(id INTEGER PRIMARY KEY, some_data TEXT)
    ''')
    db.commit()
    db.close()
The above code is imported into the code below in a separate file:
import sqlite3
import test_create
test_create.create_a_table()
db = sqlite3.connect('file:my_db')
cursor = db.cursor()
# add a row of data
cursor.execute('''INSERT INTO my_table(some_data) VALUES(?)''', ("a bit of data",))
db.commit()
The above code works fine if written in a single file. Can anyone advise how I can keep the code in separate files, which will hopefully allow me to make multiple separate connections?
Note: I don't want to save the database. Thanks.
Edit: If you want to use threading, ensure you enable the following option:
check_same_thread=False
e.g.
db = sqlite3.connect('file:my_db?mode=memory&cache=shared', check_same_thread=False, uri=True)
You opened a named, in-memory database connection with shared cache. Yes, you can share the cache on that database, but only if you use the exact same name. This means you need to use the full URI scheme!
If you connect with db = sqlite3.connect('file:my_db?mode=memory&cache=shared', uri=True), any additional connection within the process can see the same table, provided the original connection is still open, and you don't mind that the table is 'private', in-memory only and not available to other processes or connections that use a different name. When the last connection to the database closes, the table is gone.
So you also need to keep the connection open in the other module for this to work!
For example, if you change the module to use a global connection object:
db = None

def create_a_table():
    global db
    if db is None:
        db = sqlite3.connect('file:my_db?mode=memory&cache=shared', uri=True)
    with db:
        cursor = db.cursor()
        cursor.execute('''
            CREATE TABLE my_table(id INTEGER PRIMARY KEY, some_data TEXT)
        ''')
and then use that module, the table is there:
>>> import test_create
>>> test_create.create_a_table()
>>> import sqlite3
>>> db = sqlite3.connect('file:my_db?mode=memory&cache=shared', uri=True)
>>> with db:
... cursor = db.cursor()
... cursor.execute('''INSERT INTO my_table(some_data) VALUES(?)''', ("a bit of data",))
...
<sqlite3.Cursor object at 0x100d36650>
>>> list(db.cursor().execute('select * from my_table'))
[(1, 'a bit of data')]
Another way to achieve this is to open a database connection in the main code before calling the function; that creates a first connection to the in-memory database, and opening and closing additional connections won't cause the changes to be lost.
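A sketch of that alternative, with the original test_create module left unmodified (anchor is an illustrative name):
import sqlite3
import test_create

# keep one connection open for the lifetime of the program, so the
# shared-cache in-memory database is never dropped
anchor = sqlite3.connect('file:my_db?mode=memory&cache=shared', uri=True)

test_create.create_a_table()   # its own connect/close no longer destroys the table

db = sqlite3.connect('file:my_db?mode=memory&cache=shared', uri=True)
db.execute('''INSERT INTO my_table(some_data) VALUES(?)''', ("a bit of data",))
db.commit()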
From the documentation:
When an in-memory database is named in this way, it will only share its cache with another connection that uses exactly the same name.
If you didn't mean for the database to be just in-memory, and you wanted the table to be committed to disk (to be there next time you open the connection), drop the mode=memory component.
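For example (a sketch; this creates an actual file named my_db in the working directory):
db = sqlite3.connect('file:my_db?cache=shared', uri=True)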

In a database program what do these lines of code mean and do?

In a database program, what do these lines of code mean and do?
conn=sqlite3.connect(filename)
c=conn.cursor()
conn.commit()
You could think of conn = sqlite3.connect(filename) as creating a connection, or a reference, to the database specified by filename. Any action you carry out with conn will be performed on that database.
c = conn.cursor() creates a cursor object, which allows you to carry out SQL queries on the database. It is created via a call on the conn variable, and so is a cursor for that specific database. It is most commonly used for its .execute() method, which runs SQL commands on the database.
conn.commit() 'commits' the changes to the database; that is, when this command is called, any changes that had been made by the cursor will be saved to the database.
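Put together, a minimal sketch (the file and table names here are illustrative):
import sqlite3

conn = sqlite3.connect("example.db")   # connect to (or create) the database file
c = conn.cursor()                      # cursor for running SQL statements
c.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
c.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
conn.commit()                          # save the changes made via the cursor
conn.close()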

Accessing a sqlite database with multiple connections

I want to access the same SQLite database from multiple instances.
I tried that from two Python shells but didn't get really consistent results in seeing new entries on the other connection. Does this actually work, or was it simply a fluke (or a misunderstanding on my side)?
I was using the following code snippets:
>>> import sqlite3
>>> conn = sqlite3.connect("test.db")
>>> conn.cursor().execute("SELECT * from foo").fetchall()
>>> conn.execute("INSERT INTO foo VALUES (1, 2)")
Of course I wasn't always adding new entries.
It's not a fluke, just a misunderstanding of how the connections are being handled. From the docs:
When a database is accessed by multiple connections, and one of the processes modifies the database, the SQLite database is locked until that transaction is committed. The timeout parameter specifies how long the connection should wait for the lock to go away until raising an exception. The default for the timeout parameter is 5.0 (five seconds).
In order to see the changes on the other connection, you will have to commit() the changes from your execute() command. Again, from the docs:
If you don't call this method, anything you did since the last call to commit() is not visible from other database connections. If you wonder why you don't see the data you've written to the database, please check you didn't forget to call this method.
You should also include a commit after any DML statement if autocommit is not enabled on your connection:
>>> import sqlite3
>>> conn = sqlite3.connect("test.db")
>>> conn.cursor().execute("SELECT * from foo").fetchall()
>>> conn.execute("INSERT INTO foo VALUES (1, 2)")
>>> conn.commit()
