I use a Python script that runs various auto-committed SQL queries against an AWS Redshift database using the psycopg2 library. The script is executed manually from my local workstation. The flow is the following:
Create a database connection with psycopg2.connect()
Execute auto-committed queries against the database with execute()
Close connection.
For various reasons the database can become unavailable (network issue, too many queries already running, ...) and it is then better to stop the Python script. At that point I kill the already submitted (and unfinished) queries through a SQL client (SQL Workbench) by retrieving the PIDs associated with those queries. I would like to automate this last step directly in the Python script when the user stops it (Ctrl+C). The flow would be:
Create a database connection with psycopg2.connect()
Execute auto-committed queries against the database with execute()
Store the PID associated with the current query using the info.backend_pid connection attribute
If a KeyboardInterrupt exception is raised, kill the running query using the previously stored PID (see the sketch after this list)
Close connection.
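A rough sketch of that intended flow (hypothetical code: the backend PID is captured right after connecting, a second connection issues pg_cancel_backend because the first one is still busy running the query, and long_running_query plus the connection parameters are placeholders; whether Ctrl+C actually reaches Python while execute() blocks is the point addressed further down):

import psycopg2

conn = psycopg2.connect(host=host, port=port, dbname=database,
                        user=user, password=password, sslmode="require")
conn.set_session(autocommit=True)
pid = conn.info.backend_pid  # PID of the backend serving this connection
cur = conn.cursor()
try:
    cur.execute(long_running_query)  # placeholder for one of the auto-committed queries
except KeyboardInterrupt:
    # the first connection is still busy, so open a second one to kill the query
    killer = psycopg2.connect(host=host, port=port, dbname=database,
                              user=user, password=password, sslmode="require")
    killer.set_session(autocommit=True)
    killer.cursor().execute("select pg_cancel_backend(%s)", (pid,))
    killer.close()
finally:
    conn.close()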
I did some tests in a notebook to check whether I could retrieve the backend_pid information:
import logging

import psycopg2
from psycopg2.extras import LoggingConnection

log = logging.getLogger(__name__)
session = psycopg2.connect(
    connection_factory=LoggingConnection,
    host=host,
    port=port,
    dbname=database,
    user=user,
    password=password,
    sslmode="require",
)
session.initialize(log)
session.set_session(autocommit=True)
query = """
CREATE OR REPLACE FUNCTION janky_sleep (x float) RETURNS bool IMMUTABLE as $$
from time import sleep
sleep(x)
return True
$$ LANGUAGE plpythonu;
"""
cur = session.cursor()
cur.execute(query)
cur.execute("select janky_sleep(60.0)")
I used a sleep function to replicate the behaviour of a query that takes 60 seconds to finish.
I then tried to get the backend_pid as follows:
session.info.backend_pid
The issue is that the session object is already in use by the execute() method (it is running the query), so the backend_pid information only comes back once the session is free again, i.e. when the query has finished.
I thought of spawning a concurrent Python process that would monitor the parent one. Once the parent process is stopped, the child would get the backend_pid through a second database connection and then run the kill query. However, this approach seems overkill.
What would be the correct way to handle this situation?
Thanks
I finally used the approach found in the documentation:
http://initd.org/psycopg/docs/faq.html#faq-interrupt-query. It enables psycopg2 to receive the SIGINT signal (Ctrl+C) and cancel the query that is currently running.
>>> psycopg2.extensions.set_wait_callback(psycopg2.extras.wait_select)
>>> cnn = psycopg2.connect('')
>>> cur = cnn.cursor()
>>> cur.execute("select pg_sleep(10)")
^C
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
QueryCanceledError: canceling statement due to user request
>>> cnn.rollback()
>>> # You can use the connection and cursor again from here
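Applied to the script above, a minimal sketch might look like this (assuming Redshift handles the cancel request the same way stock Postgres does; the connection parameters are the same placeholders as before):

import psycopg2
import psycopg2.extensions
import psycopg2.extras

# Ctrl+C during execute() now cancels the statement server-side instead of being ignored
psycopg2.extensions.set_wait_callback(psycopg2.extras.wait_select)

conn = psycopg2.connect(host=host, port=port, dbname=database,
                        user=user, password=password, sslmode="require")
conn.set_session(autocommit=True)
cur = conn.cursor()
try:
    cur.execute("select janky_sleep(60.0)")  # stands in for a long-running query
except psycopg2.extensions.QueryCanceledError:
    print("Query cancelled by user")
finally:
    conn.close()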
I'm writing a Python script to move data from a production DB to a dev DB. I'm using vertica-python (something very similar to pyodbc) for the DB connection and Airflow for scheduling.
The script is divided into two files, one for DAG and one for the actual migration job. I use try-except-finally block for all SQL execution functions in the migration job:
try:
    # autocommit set to False
    # Execute a SQL script
except DatabaseError:
    # Logging information
    # Rollback
finally:
    # autocommit set to False
You can see that setting autocommit and rolling back need access to the connection, while executing a SQL script needs access to the cursor. The current solution is to simply create two DB connections in the DAG and pass them to the migration script. But I also read in a Stack Overflow post that I should pass only the cursor:
Python, sharing mysql connection in multiple functions - pass connection or cursor?
My question is: Is it possible to only pass the cursor from the DAG to the migration script, and still retain the ability to rollback and setting autocommit?
Yes, you can change the autocommit setting via the Cursor:
>>> import pyodbc
>>> cnxn = pyodbc.connect("DSN=mssqlLocal")
>>> cnxn.autocommit
False
>>> crsr = cnxn.cursor()
>>> crsr.connection.autocommit = True
>>> cnxn.autocommit
True
>>>
pyodbc also provides commit() and rollback() methods on the Cursor object, but be aware that they affect all cursors created by the same connection, i.e., crsr.rollback() is exactly the same as calling cnxn.rollback().
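As a sketch, the migration function from the question could then be written against just the cursor, reaching the connection through crsr.connection as shown above (pyodbc is used here for illustration, and the run_migration name is just illustrative; whether vertica-python exposes the same DB-API cursor.connection extension would need to be checked):

import pyodbc

def run_migration(crsr, sql_script):
    cnxn = crsr.connection       # reach the connection through the cursor
    try:
        cnxn.autocommit = False  # run the script inside a transaction
        crsr.execute(sql_script)
        cnxn.commit()
    except pyodbc.DatabaseError:
        # log the failure, then undo the partial work
        cnxn.rollback()
        raise
    finally:
        cnxn.autocommit = True   # restore autocommit for the rest of the DAG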
Hello everyone, I have the following issue.
I am trying to run a simple UPDATE query using SQLAlchemy and psycopg2 against Postgres.
The query is
update = f"""
UPDATE {requests_table_name}
SET status = '{status}', {column_db_table} = '{dt.datetime.now()}'
WHERE request_id = '{request_id}'
"""
and then commit the changes using
cursor.execute(update).commit()
But it throws the error AttributeError: 'NoneType' object has no attribute 'commit'.
My connection string is
engine = create_engine(
    f'postgresql://{self.params["user"]}:{self.params["password"]}@{self.params["host"]}:{self.params["port"]}/{self.params["database"]}')
conn = engine.connect().connection
cursor = conn.cursor()
The other thing is that the cursor always shows as closed: <cursor object at 0x00000253827D6D60; closed: 0>.
The connection to the database is fine: I can fetch tables and update them using the pandas to_sql method, but committing through the cursor does not work. It works perfectly with SQL Server but not with Postgres.
In Postgres, however, every time I run cursor.execute(update).commit() it creates a PID with the status "idle in transaction" and Client: ClientRead.
I cannot figure out whether the problem is in the code or in the database.
I tried different ways to create a cursor, such as raw_connection(), but without result.
I looked at the Client: ClientRead / idle in transaction state but am not sure how to get past it.
You have to call commit() on the connection object.
According to the documentation, execute() returns None.
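Applied to the snippet in the question, that looks like this (same names as above):

cursor.execute(update)  # execute() returns None, so nothing can be chained onto it
conn.commit()           # commit() belongs to the connection object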
Note that even if you use a context manager like this:
with my_connection.cursor() as cur:
    cur.execute('INSERT INTO ..')
You may find your database processes still getting stuck in the idle in transaction state. The COMMIT is handled at the connection level, as @laurenz-albe said, so you need to wrap that too:
with my_connection as conn:
    with conn.cursor() as cur:
        cur.execute('INSERT INTO ..')
It's spelled out clearly in the documentation, but I still managed to overlook it.
I am trying to write a python script to do a backup of a SQL Server database and then restore it into a new database.
The SQL script itself seems to work fine when run in SQL Server:
BACKUP DATABASE TEST_DB
TO DISK = 'D:/test/test_db.BAK';
However, when I try to run it from a python script it fails:
con = pyodbc.connect('UID=xx; PWD=xxxxxx', driver='{SQL Server}', server=r'xxxxxx', database='TEST_DB')
sql_cursor = con.cursor()
query = ("""BACKUP DATABASE TEST_DB
TO DISK = 'D:/test/test_db.BAK';""")
con.autocommit = True
sql_cursor.execute(query)
con.commit()
First of all, if I don't add the line "con.autocommit = True", it will fail with the message:
Cannot perform a backup or restore operation within a transaction. (3021)
No idea what a transaction is. I read in another post that the line "con.autocommit = True" removes the error, and indeed it does. I have no clue why though.
Finally, when I run the python script with con.autocommit set to True, no errors are thrown, the BAK file can be seen temporarily in the expected location ('D:/test/test_db.BAK'), but when the script finishes running, the BAK file disappears (????). Does anyone know why this is happening?
The solution, as described in this GitHub issue, is to call .nextset() repeatedly after executing the BACKUP statement ...
crsr.execute(backup_statement)
while crsr.nextset():
    pass
... to "consume" the progress messages issued by the BACKUP. If those messages are not consumed before the connection is closed then SQL Server figures that something went wrong and cancels the backup.
SSMS can apparently capture those messages directly, but an ODBC connection must issue calls to the ODBC SQLMoreResults function to retrieve each message, and that's what happens when we call the pyodbc .nextset() method.
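Putting the two pieces together for the script in the question, a sketch could look like this (the connection details are the same placeholders used above):

import pyodbc

con = pyodbc.connect('UID=xx; PWD=xxxxxx', driver='{SQL Server}',
                     server=r'xxxxxx', database='TEST_DB')
con.autocommit = True        # BACKUP cannot run inside a transaction
sql_cursor = con.cursor()
sql_cursor.execute("BACKUP DATABASE TEST_DB TO DISK = 'D:/test/test_db.BAK';")
while sql_cursor.nextset():  # consume the progress messages so SQL Server keeps the .BAK
    pass
con.close()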
This is sample code I'd like to run:
for i in range(1,2000):
    db = create_engine('mysql://root@localhost/test_database')
    conn = db.connect()
    # some simple data operations
    conn.close()
    db.dispose()
Is there a way of running this without getting "Too many connections" errors from MySQL?
I already know I can handle the connection otherwise or have a connection pool. I'd just like to understand how to properly close a connection from sqlalchemy.
Here's how to write that code correctly:
db = create_engine('mysql://root@localhost/test_database')

for i in range(1,2000):
    conn = db.connect()
    # some simple data operations
    conn.close()

db.dispose()
That is, the Engine is a factory for connections as well as a pool of connections, not the connection itself. When you say conn.close(), the connection is returned to the connection pool within the Engine, not actually closed.
If you do want the connection to be actually closed, that is, not pooled, disable pooling via NullPool:
from sqlalchemy.pool import NullPool
db = create_engine('mysql://root@localhost/test_database', poolclass=NullPool)
With the above Engine configuration, each call to conn.close() will close the underlying DBAPI connection.
If OTOH you actually want to connect to different databases on each call, that is, your hardcoded "localhost/test_database" is just an example and you actually have lots of different databases, then the approach using dispose() is fine; it will close out every connection that is not checked out from the pool.
In all of the above cases, the important thing is that the Connection object is closed via close(). If you're using any kind of "connectionless" execution, that is engine.execute() or statement.execute(), the ResultProxy object returned from that execute call should be fully read, or otherwise explicitly closed via close(). A Connection or ResultProxy that's still open will prohibit the NullPool or dispose() approaches from closing every last connection.
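For the "connectionless" execution mentioned above, a small illustration (SQLAlchemy 1.x-style API, which is what this answer refers to; the URL and the trivial SELECT are just examples):

db = create_engine('mysql://root@localhost/test_database')

result = db.execute("SELECT 1")  # connectionless execution checks a connection out of the pool
rows = result.fetchall()         # fully reading the ResultProxy releases the connection...
result.close()                   # ...or close it explicitly

db.dispose()                     # now every pooled connection can really be closed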
I was trying to figure out how to disconnect from the database for an unrelated problem (you must disconnect before forking).
You need to invalidate the connection from the connection pool too.
In your example:
for i in range(1,2000):
    db = create_engine('mysql://root@localhost/test_database')
    conn = db.connect()
    # some simple data operations
    # session.close() if needed
    conn.invalidate()
    db.dispose()
I use this one:
engine = create_engine('...')
with engine.connect() as conn:
    conn.execute(text("CREATE SCHEMA IF NOT EXISTS..."))
engine.dispose()
In my case this always works and I am able to close!
So using invalidate() before close() does the trick; otherwise close() on its own is not enough.
conn = engine.raw_connection()
conn.get_warnings = True
cur = conn.cursor()  # cursor on the raw DBAPI connection
curSql = xx_tmpsql
myresults = cur.execute(curSql, multi=True)
print("Warnings: #####")
print(cur.fetchwarnings())
for curresult in myresults:
    print(curresult)
    if curresult.with_rows:
        print(curresult.column_names)
        print(curresult.fetchall())
    else:
        print("no rows returned")
cur.close()
conn.invalidate()
conn.close()
engine.dispose()
I am writing a script to shut down an Oracle database.
I have the script below:
import cx_Oracle
# need to connect as SYSDBA or SYSOPER
connection = cx_Oracle.connect("/", mode = cx_Oracle.SYSDBA)
# first shutdown() call must specify the mode, if DBSHUTDOWN_ABORT is used,
# there is no need for any of the other steps
connection.shutdown(mode = cx_Oracle.DBSHUTDOWN_IMMEDIATE)
# now close and dismount the database
cursor = connection.cursor()
cursor.execute("alter database close immediate")
cursor.execute("alter database dismount")
# perform the final shutdown call
connection.shutdown(mode = cx_Oracle.DBSHUTDOWN_FINAL)
In this script there is a chance that cursor.execute("alter database close immediate") may run for a long time under unforeseen circumstances. How can I make the script wait on this for 5 minutes and, if it does not complete, take an alternative action such as stopping this command or executing an alternate command?
thanks,
Tanveer
You can configure the Oracle Net layer used by cx_Oracle by creating a sqlnet.ora configuration file with various timeout parameters such as SQLNET.INBOUND_CONNECT_TIMEOUT, SQLNET.RECV_TIMEOUT and SQLNET.SEND_TIMEOUT.
You can read the documentation here, and there are more details in this answer.
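A minimal client-side sqlnet.ora sketch with a 300-second (5-minute) receive timeout (the values are illustrative, and the file goes in the directory your Oracle client reads its network configuration from, e.g. the one pointed to by TNS_ADMIN):

# sqlnet.ora on the machine running the Python script
SQLNET.RECV_TIMEOUT=300
SQLNET.SEND_TIMEOUT=300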