How to reach the Postgres maximum connections limit with SQLAlchemy? - python

This is related to sqlalchemy and pg8000.
I have read everywhere that I should close the ResultProxy object so that the connection can be returned to the pool.
The local test database allows a maximum of 100 connections:
$ psql -h 127.0.0.1 -U postgres
Password for user postgres:
psql (9.5.5, server 9.6.0)
WARNING: psql major version 9.5, server major version 9.6.
Some psql features might not work.
Type "help" for help.
postgres=# show max_connections;
max_connections
-----------------
100
(1 row)
The following test script creates an engine in every loop and does not read nor close the ResultProxy object. It really is as bad as it can get.
The weird thing is, it also does not generate a "too many connections" kind of error. This is really confusing to me. Does SQLAlchemy perform some magic? Or maybe Postgres is actually magic?
#!/usr/bin/env python2.7
from __future__ import print_function

import sqlalchemy


def handle():
    url = 'postgresql+pg8000://{}:{}@{}:{}/{}'
    url = url.format("postgres", "pass", "127.0.0.1", "5432", "usercity")
    conn = sqlalchemy.create_engine(url, client_encoding='utf8')
    meta = sqlalchemy.MetaData(bind=conn, reflect=True)
    table = meta.tables['events']
    clause = table.select()
    result = conn.execute(clause)


if __name__ == '__main__':
    for i in range(2000):
        print(i)
        handle()

No magic, just garbage collection. Since handle() doesn't return anything (or modify global data), there is no way for a reference to the connection or cursor it creates to live beyond the scope of handle(). When they go out of scope, their reference counts drop to 0, and they get deleted (there is no hard guarantee about when this happens, but in practice, in CPython this happens immediately).
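If you prefer not to rely on garbage collection, here is a minimal sketch of the explicit approach (assuming SQLAlchemy 1.x and the same 'events' table from the question): create the engine once, reflect once, and release the connection deterministically.

import sqlalchemy

url = 'postgresql+pg8000://{}:{}@{}:{}/{}'.format(
    "postgres", "pass", "127.0.0.1", "5432", "usercity")
engine = sqlalchemy.create_engine(url, client_encoding='utf8')

meta = sqlalchemy.MetaData()
meta.reflect(bind=engine)            # reflect once instead of on every call
table = meta.tables['events']

def handle():
    with engine.connect() as conn:   # checked back into the pool on exit
        result = conn.execute(table.select())
        rows = result.fetchall()     # exhausting the result also releases it
        return rows

if __name__ == '__main__':
    for i in range(2000):
        handle()
    engine.dispose()                 # close the pooled connections when done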

Related

How to properly kill MySQL/MariaDB connections in Django using custom connectors

I am currently working on a project and I use the MariaDB connector to run the queries.
I can't use ORM so I have to use raw queries.
In general, the system works fine, as expected, but when I run somewhat 'big' queries I get a Too many connections error message.
This has happened to me for both MySQL and MariaDB connectors, but I mainly use MariaDB.
Example of my code (truncated / simplified):
import mariadb

def get_cursor():
    conn = mariadb.connect(
        user="user",
        password="pass",
        host="localhost",
        database="db")
    return conn, conn.cursor(named_tuple=True)

def get_duplicated_variants():
    results = []
    conn_cursor = get_cursor()
    cursor = conn_cursor[1]
    conn = conn_cursor[0]
    try:
        cursor.execute("SELECT * FROM `db`.s_data;")
        columns = [column[0] for column in cursor.description]
        results = []
        for row in cursor.fetchall():
            results.append(dict(zip(columns, row)))
        cursor.close()
        conn.close()
        return results
    except mariadb.Error as e:
        print(f"Error: {e}")
What I've tried:
show status like '%onn%';
And also: show variables like 'max_connections';
So the max_used_connections = 152 and I have 2503 Connections.
I also tried to execute the following query:
SELECT
CONCAT('KILL ', id, ';')
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE `User` = 'user'
AND `Host` = 'localhost'
AND `db` = 'db';
As seen in this question.
But the number of connections is the same after running the query, it does not work.
How could I close the connections properly?
I don't understand why the connections are still active since I use both cursor.close() to close the cursor and conn.close() to close the connection, but the connection is still active apparently.
I know I can increase max_connections with something like: set global max_connections = 500; but I would like to close the connections from the backend after the queries are done.
Any idea?
The API located here for connection close() certainly is clear that the connection is closed.
I realize you said you truncated the code, but seeing your comment on 2,503 connections in a single program certainly makes it seem like you aren't sharing that connection and are creating new connections for each query. I would suggest you inspect the code that you did not include to ensure you are properly storing and reusing that connection which will be expensive to keep recreating.
Finally, I would instead be looking at the state tables with something like netstat to see which connections are really going and where they are coming/going from. It isn't entirely clear to me that you are excluding connections which may be from other entities to/from the DB or that the connection isn't actually getting destroyed. In short, I am somewhat unsure if you are chasing a red herring here. I still think the >2000 connections is something which seems unexpected and you should be chasing that down first as to why so many connections are getting created in the first place, based on the code you provided.
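As a rough sketch of what this answer suggests (the table and credentials come from the question, the structure is illustrative, not the poster's actual code): create one connection, reuse it for every query, and put the cleanup in a finally block so the server-side session is released even when a query fails.

import mariadb

# One connection shared by all queries instead of one per call.
conn = mariadb.connect(user="user", password="pass",
                       host="localhost", database="db")

def get_duplicated_variants():
    cursor = conn.cursor(named_tuple=True)
    try:
        cursor.execute("SELECT * FROM `db`.s_data;")
        columns = [column[0] for column in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        cursor.close()   # only the cursor is closed; the connection is reused

# ... run as many queries as needed, then:
conn.close()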

How to handle idle in transaction in python for Postgres?

Hello everyone, I have the following issue:
I am trying to run a simple UPDATE query using sqlalchemy and psycopg2 for Postgres.
The query is:
update = f"""
UPDATE {requests_table_name}
SET status = '{status}', {column_db_table} = '{dt.datetime.now()}'
WHERE request_id = '{request_id}'
"""
and then commit the changes using
cursor.execute(update).commit()
But it throws an error that AttributeError: 'NoneType' object has no attribute 'commit'
My connection string is
engine = create_engine(
    f'postgresql://{self.params["user"]}:{self.params["password"]}@{self.params["host"]}:{self.params["port"]}/{self.params["database"]}')
conn = engine.connect().connection
cursor = conn.cursor()
The other thing is that cursor is always closed <cursor object at 0x00000253827D6D60; closed: 0>
The connection with the database is ok, I can fetch tables and update them using pandas' to_sql method, but committing with the cursor does not work. It works perfectly with SQL Server but not with Postgres.
In postgres, however, it creates a PID with the status "idle in transaction" and Client: ClientRead, every time I run cursor.execute(update).commit().
I cannot figure out where the problem is, in the code or in the database.
I tried to use different methods to initiate a cursor, like raw_connection(), but without a result.
I checked for Client: ClientRead with idle in transaction but am not sure how to overcome it.
You have to call commit() on the connection object.
According to the documentation, execute() returns None.
Note that even if you use a context manager like this:
with my_connection.cursor() as cur:
    cur.execute('INSERT INTO ..')
You may find your database processes still getting stuck in the idle in transaction state. The COMMIT is handled at the connection level, as @laurenz-albe said, so you need to wrap that too:
with my_connection as conn:
    with conn.cursor() as cur:
        cur.execute('INSERT INTO ..')
It's spelled out clearly in the documentation, but I still managed to overlook it.
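A minimal sketch of the whole pattern (the table and column names here are placeholders, not the poster's schema): commit on the connection, and let psycopg2 bind the parameters rather than interpolating them into the SQL string.

import datetime as dt
import psycopg2

status, request_id = "done", "abc-123"          # placeholder values
conn = psycopg2.connect(host="localhost", dbname="mydb",
                        user="user", password="pass")
try:
    with conn:                                  # commits on success, rolls back on error
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE requests SET status = %s, updated_at = %s "
                "WHERE request_id = %s",
                (status, dt.datetime.now(), request_id),
            )
finally:
    conn.close()                                # `with conn` does not close the connection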

Kill an unfinished committed SQL query with psycopg2 on KeyboardInterrupt

I use a Python script that runs various auto-committed SQL queries over an AWS Redshift database using the psycopg2 library. The script is executed manually from my local workstation. The flow is the following:
Create a database connection with psycopg2.connect()
Execute auto-committed queries over database with execute()
Close connection.
For various reasons, the database can be unavailable (network issue, many queries already running...) and it is better to stop the Python script. At this point, I kill the already committed (and unfinished) queries through a SQL client (SQL Workbench) by retrieving the PID associated with those queries. I would like to automate that last step directly in the Python script when the user stops it (Ctrl+C). The flow would be:
Create a database connection with psycopg2.connect()
Execute auto-committed queries over database with execute()
Store the current PID associated to the query using info.backend_pid Connection attribute
If a KeyboardInterrupt exception is received, kill the running query using the previously stored PID
Close connection.
I did some tests in a notebook to check if I could retrieve the backend_pid information:
import logging

import psycopg2
from psycopg2.extras import LoggingConnection

log = logging.getLogger(__name__)
session = psycopg2.connect(
    connection_factory=LoggingConnection,
    host=host,
    port=port,
    dbname=database,
    user=user,
    password=password,
    sslmode="require",
)
session.initialize(log)
session.set_session(autocommit=True)
query = """
CREATE OR REPLACE FUNCTION janky_sleep (x float) RETURNS bool IMMUTABLE as $$
from time import sleep
sleep(x)
return True
$$ LANGUAGE plpythonu;
"""
cur = session.cursor()
cur.execute(query)
cur.execute("select janky_sleep(60.0)")
I used a sleep function to replicate the behaviour of a query that would take 60s to finish.
When getting the backend_pid as follows:
session.info.backend_pid
The issue is that the session object is already in use by the execute() method (running the query), and the backend_pid information only becomes available when the session is free, i.e. when the query has finished.
I thought of spinning a concurrent Python process that would monitor the parent one. Once the parent process is stopped, the child would get the backend_pid through a second database connection and then run the kill query. However this approach seems overkill.
What would be the correct way to handle this situation?
Thanks
I finally used the approach from the documentation:
http://initd.org/psycopg/docs/faq.html#faq-interrupt-query. It enables psycopg2 to receive the SIGINT signal and cancel the running query.
>>> psycopg2.extensions.set_wait_callback(psycopg2.extras.wait_select)
>>> cnn = psycopg2.connect('')
>>> cur = cnn.cursor()
>>> cur.execute("select pg_sleep(10)")
^C
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
QueryCanceledError: canceling statement due to user request
>>> cnn.rollback()
>>> # You can use the connection and cursor again from here
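As a follow-up, a hedged sketch of how this might look inside a script (the connection parameters are placeholders; depending on the psycopg2 version, Ctrl+C surfaces either as QueryCanceledError or as KeyboardInterrupt after the statement has been cancelled server-side):

import psycopg2
import psycopg2.extensions
import psycopg2.extras

# Let Ctrl+C cancel the statement on the server instead of blocking.
psycopg2.extensions.set_wait_callback(psycopg2.extras.wait_select)

conn = psycopg2.connect(host="localhost", dbname="mydb",
                        user="user", password="pass")
cur = conn.cursor()
try:
    cur.execute("select pg_sleep(60)")          # stand-in for a long query
except (psycopg2.extensions.QueryCanceledError, KeyboardInterrupt):
    conn.rollback()                             # connection is usable again afterwards
finally:
    cur.close()
    conn.close()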

Python SQLite - How to manually BEGIN and END transactions?

Context
So I am trying to figure out how to properly override the auto-transaction when using SQLite in Python. When I try and run
cursor.execute("BEGIN;")
.....an assortment of insert statements...
cursor.execute("END;")
I get the following error:
OperationalError: cannot commit - no transaction is active
Which I understand is because SQLite in Python automatically opens a transaction on each modifying statement, which in this case is an INSERT.
Question:
I am trying to speed up my insertion by doing one transaction per several thousand records.
How can I overcome the automatic opening of transactions?
As @CL. said, you have to set the isolation level to None. Code example:
import sqlite3

s = sqlite3.connect("./data.db")
s.isolation_level = None
try:
    c = s.cursor()
    c.execute("begin")
    ...
    c.execute("commit")
except:
    c.execute("rollback")
The documentation says:
You can control which kind of BEGIN statements sqlite3 implicitly executes (or none at all) via the isolation_level parameter to the connect() call, or via the isolation_level property of connections.
If you want autocommit mode, then set isolation_level to None.
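A short sketch of the batching pattern the question is after (the table name and data are illustrative): with isolation_level set to None, BEGIN and COMMIT are fully manual, so several thousand inserts can share a single transaction.

import sqlite3

conn = sqlite3.connect("./data.db")
conn.isolation_level = None            # disable the implicit BEGIN
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS records (id INTEGER, value TEXT)")

rows = [(i, "value-%d" % i) for i in range(5000)]
try:
    cur.execute("BEGIN")
    cur.executemany("INSERT INTO records VALUES (?, ?)", rows)
    cur.execute("COMMIT")
except sqlite3.Error:
    cur.execute("ROLLBACK")
finally:
    conn.close()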

Accessing a sqlite database with multiple connections

I want to access the same SQLite database from multiple instances.
I tried that from two Python shells but didn't get really consistent results in showing me new entries on the other connection. Does this actually work, or was it simply a fluke (or a misunderstanding on my side)?
I was using the following code snippets:
>>> import sqlite3
>>> conn = sqlite3.connect("test.db")
>>> conn.cursor().execute("SELECT * from foo").fetchall()
>>> conn.execute("INSERT INTO foo VALUES (1, 2)")
Of course I wasn't always adding new entries.
It's not a fluke, just a misunderstanding of how the connections are being handled. From the docs:
When a database is accessed by multiple connections, and one of the processes modifies the database, the SQLite database is locked until that transaction is committed. The timeout parameter specifies how long the connection should wait for the lock to go away until raising an exception. The default for the timeout parameter is 5.0 (five seconds).
In order to see the changes on other connection you will have to commit() the changes from your execute() command. Again, from the docs:
If you don’t call this method, anything you did since the last call to commit() is not visible from other database connections. If you wonder why you don’t see the data you’ve written to the database, please check you didn’t forget to call this method.
You should also call commit() after any DML statement if autocommit is not enabled on your connection.
>>> import sqlite3
>>> conn = sqlite3.connect("test.db")
>>> conn.cursor().execute("SELECT * from foo").fetchall()
>>> conn.execute("INSERT INTO foo VALUES (1, 2)")
>>> conn.commit()
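As a small illustration of the point about commit() (assuming test.db and the foo table from the question already exist): a second connection only sees the insert once the writer commits.

import sqlite3

writer = sqlite3.connect("test.db", timeout=5.0)
reader = sqlite3.connect("test.db", timeout=5.0)

writer.execute("INSERT INTO foo VALUES (1, 2)")
print(reader.execute("SELECT * FROM foo").fetchall())   # uncommitted row is not visible

writer.commit()
print(reader.execute("SELECT * FROM foo").fetchall())   # now the new row appears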
