SQLAlchemy memory hog on select statement - python

According to the SQLAlchemy documentation, select statements are treated as iterables in for loops, so a select that returns a massive number of rows should not use excessive memory.
I am finding that the following statement on a MySQL table:
for row in my_connections.execute(MyTable.__table__.select()):
    yield row
This does not seem to hold: I exhaust available memory and begin thrashing before the first row is yielded. What am I doing wrong?

The basic MySQLdb cursor fetches the entire query result at once from the server.
This can consume a lot of memory and time.
Use MySQLdb.cursors.SSCursor when you want to make a huge query and
pull results from the server one at a time.
Therefore, try passing connect_args={'cursorclass': MySQLdb.cursors.SSCursor}
when creating the engine:
from sqlalchemy import create_engine, MetaData
import MySQLdb.cursors
engine = create_engine('mysql://root:zenoss@localhost/e2', connect_args={'cursorclass': MySQLdb.cursors.SSCursor})
meta = MetaData(engine, reflect=True)
conn = engine.connect()
s = meta.tables['my_table'].select()  # 'my_table' is a placeholder for the reflected table name
rs = conn.execute(s.execution_options(stream_results=True))
See http://www.sqlalchemy.org/trac/ticket/1089
Note that using SSCursor locks the table until the fetch is complete. This affects other cursors using the same connection: two cursors from the same connection cannot read from the table concurrently.
However, cursors from different connections can read from the same table concurrently.
Here is some code demonstrating the problem:
import MySQLdb
import MySQLdb.cursors as cursors
import threading
import logging
import config

logger = logging.getLogger(__name__)
query = 'SELECT * FROM huge_table LIMIT 200'

def oursql_conn():
    import oursql
    conn = oursql.connect(
        host=config.HOST, user=config.USER, passwd=config.PASS,
        db=config.MYDB)
    return conn

def mysqldb_conn():
    conn = MySQLdb.connect(
        host=config.HOST, user=config.USER,
        passwd=config.PASS, db=config.MYDB,
        cursorclass=cursors.SSCursor)
    return conn

def two_cursors_one_conn():
    """Two SSCursors can not use one connection concurrently"""
    def worker(conn):
        cursor = conn.cursor()
        cursor.execute(query)
        for row in cursor:
            logger.info(row)

    conn = mysqldb_conn()
    threads = [threading.Thread(target=worker, args=(conn, ))
               for n in range(2)]
    for t in threads:
        t.daemon = True
        t.start()
    # Second thread may hang or raise OperationalError:
    #   File "/usr/lib/pymodules/python2.7/MySQLdb/cursors.py", line 289, in _fetch_row
    #     return self._result.fetch_row(size, self._fetch_type)
    # OperationalError: (2013, 'Lost connection to MySQL server during query')
    for t in threads:
        t.join()

def two_cursors_two_conn():
    """Two SSCursors from independent connections can use the same table concurrently"""
    def worker():
        conn = mysqldb_conn()
        cursor = conn.cursor()
        cursor.execute(query)
        for row in cursor:
            logger.info(row)

    threads = [threading.Thread(target=worker) for n in range(2)]
    for t in threads:
        t.daemon = True
        t.start()
    for t in threads:
        t.join()

logging.basicConfig(level=logging.DEBUG,
                    format='[%(asctime)s %(threadName)s] %(message)s',
                    datefmt='%H:%M:%S')
two_cursors_one_conn()
two_cursors_two_conn()
Note that oursql is an alternative set of MySQL bindings for Python. oursql cursors are true server-side cursors which fetch rows lazily by default. With oursql installed, if you change
conn = mysqldb_conn()
to
conn = oursql_conn()
then two_cursors_one_conn() runs without hanging or raising an exception.
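For completeness, here is a hedged sketch (mine, not part of the answer above) of another way to keep memory flat with MySQLdb: batching reads from an SSCursor with fetchmany(), trading one round trip per row for one per batch. The connection parameters and table name reuse the placeholders from earlier.

import MySQLdb
import MySQLdb.cursors as cursors

conn = MySQLdb.connect(host='localhost', user='root', passwd='zenoss',
                       db='e2', cursorclass=cursors.SSCursor)
cursor = conn.cursor()
cursor.execute('SELECT * FROM huge_table')
while True:
    rows = cursor.fetchmany(1000)  # pull 1000 rows per round trip
    if not rows:
        break
    for row in rows:
        pass  # process each row here
cursor.close()
conn.close()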

Related

Postgres connections in many threads

I need advice on a special case.
I have a program like this:
data = [...]
multithread.Pool(n, data)

def slow_function(data):
    db = psycopg2.connect(credentials)
    cursor = db.cursor()
    new_data = realy_slow_func()
    some_query = "some update query"
    cursor.execute(some_query)
Is opening a new connection in each thread safe? It doesn't matter if it's slow or if faster approaches exist.
Threads are necessary because realy_slow_func() is slow.
The database credentials are the same for each thread.
I am using psycopg2
You should use a connection pool, which creates a set of connections and reuses them across your threads. I would also suggest using a ThreadPool so that the number of threads running at a time equals the number of connections available in the DB connection pool, but for the scope of this question I will talk about the DB connection pool.
I have not tested the code, but this is how it would look. You first create a connection pool, get a connection from it within your thread, and release the connection once complete. You could also manage getting and releasing the connection outside of the thread, pass the connection as a parameter, and release it once the thread completes; see the thread-pool sketch after the code below.
ThreadedConnectionPool is the class used to create the pool; as the name suggests, it works with threads.
From docs:
A connection pool that works with the threading module.
Note This pool class can be safely used in multi-threaded applications.
import psycopg2
from psycopg2 import pool

postgreSQL_pool = psycopg2.pool.ThreadedConnectionPool(1, 20, user="postgres",
                                                       password="pass##29",
                                                       host="127.0.0.1",
                                                       port="5432",
                                                       database="postgres_db")
data = [...]
multithread.Pool(n, data)

def slow_function(data):
    db = postgreSQL_pool.getconn()
    cursor = db.cursor()
    new_data = realy_slow_func()
    some_query = "some update query"
    cursor.execute(some_query)
    cursor.close()
    postgreSQL_pool.putconn(db)
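To make the variant mentioned above concrete, here is a hedged sketch (untested; realy_slow_func is the question's placeholder, and the UPDATE statement and table name are hypothetical) of a ThreadPool sized to match the connection pool, with each task releasing its connection in a finally block:

from multiprocessing.pool import ThreadPool

POOL_SIZE = 20  # matches the maxconn of the ThreadedConnectionPool above

def slow_function(item):
    db = postgreSQL_pool.getconn()
    try:
        cursor = db.cursor()
        new_data = realy_slow_func()
        cursor.execute("UPDATE my_table SET value = %s WHERE id = %s",
                       (new_data, item))
        db.commit()
        cursor.close()
    finally:
        postgreSQL_pool.putconn(db)  # always return the connection to the pool

with ThreadPool(POOL_SIZE) as tp:
    tp.map(slow_function, data)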
Source: https://pynative.com/psycopg2-python-postgresql-connection-pooling/
Docs: https://www.psycopg.org/docs/pool.html

multithreading in mysql and python

I am learning Python and MySQL, and I want to use multithreading to write to a MySQL database.
When I try to use multiple threads it raises an error like "connection not found", but with a single thread it works fine, just slowly (about 40 rows/s).
Please help me with this, and if I am doing it wrong, please let me know if there is a better way. Thanks.
import mysql.connector
from queue import Queue
from threading import Thread

mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="list"
)

def do_stuff(q):
    while True:
        mycursor = mydb.cursor()
        a = q.get()
        sql = "INSERT INTO demo1 (id, name, price, tmp) VALUES (%s, %s, %s, %s)"
        val = (a[0], a[1], a[3], a[2])
        mycursor.execute(sql, val)
        mydb.commit()
        q.task_done()

q = Queue(maxsize=0)
num_threads = 1  # if I try more than 1 it throws "IndexError: bytearray index out of range"
for i in range(num_threads):
    worker = Thread(target=do_stuff, args=(q,))
    worker.setDaemon(True)
    worker.start()

def strt():
    mycursor = mydb.cursor()
    sql = "SELECT * FROM demo ORDER BY id"
    mycursor.execute(sql)
    myresult = mycursor.fetchall()
    for x in myresult:
        q.put(x)

strt()
In order to run the transactions concurrently, you have to open a connection for each thread. A single connection is used by one thread at a time and does not let the others connect.
This does not create a bottleneck when connections are managed by a pool: whenever a connection is free, it can be handed out again.
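A minimal sketch of that advice applied to the question's code (my adaptation, untested): move connect() into the worker so each thread owns its own connection.

import mysql.connector
from queue import Queue
from threading import Thread

def do_stuff(q):
    # each worker opens its own connection instead of sharing the global one
    mydb = mysql.connector.connect(host="localhost", user="root",
                                   password="", database="list")
    mycursor = mydb.cursor()
    while True:
        a = q.get()
        sql = "INSERT INTO demo1 (id, name, price, tmp) VALUES (%s, %s, %s, %s)"
        mycursor.execute(sql, (a[0], a[1], a[3], a[2]))
        mydb.commit()
        q.task_done()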

Does MySQL database connections pool expire if it was not used within long amount of time or if application which used it collapsed?

I am connecting to a MySQL (MariaDB) database from a Python script using MySQLConnectionPool, with a context manager to handle connections from the pool. I wonder whether the pool can expire if it is not used for a long time, or if my program crashes. I've found that a connection to a MySQL db expires, so it is released even if you forgot or were unable to close it in your program; what's the situation with connection pools?
from contextlib import contextmanager
import mysql.connector
from mysql.connector.errors import Error
from mysql.connector import pooling

SQL_CONN_POOL = pooling.MySQLConnectionPool(
    pool_name="mysqlpool",
    pool_size=1,
    user=DB_USER,
    password=DB_PASS,
    host=DB_HOST,
    database=DATABASE,
    auth_plugin=DB_PLUGIN
)

@contextmanager
def mysql_connection_from_pool() -> "conn":
    conn_pool = SQL_CONN_POOL  # get connection from the pool, all the rest is the same
    _conn = conn_pool.get_connection()
    try:
        yield _conn
    except (Exception, Error) as ex:
        # if an error happened, all changes made during the connection are rolled back:
        _conn.rollback()
        # this statement re-raises the error to let it be handled in the outer scope:
        raise
    else:
        # if everything is fine, commit all changes to save them in the db:
        _conn.commit()
    finally:
        # actually returns the connection to the pool, rather than closing it
        _conn.close()

@contextmanager
def mysql_curs_from_pool() -> "curs":
    with mysql_connection_from_pool() as _conn:
        _curs = _conn.cursor()
        try:
            yield _curs
        finally:
            _curs.close()
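A short usage sketch for the context managers above (mine, not from the question); commit and rollback are handled by the connection manager, so the caller only works with the cursor:

with mysql_curs_from_pool() as curs:
    curs.execute("SELECT NOW()")
    print(curs.fetchone())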
Yes, it can time out. There are two server-side timeout configurations: see wait_timeout and interactive_timeout.
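For reference, a hedged sketch showing how to inspect those two server variables from Python; 28800 seconds (8 hours) is MySQL's usual default for wait_timeout, and DB_USER and friends are the placeholders from the question:

import mysql.connector

conn = mysql.connector.connect(user=DB_USER, password=DB_PASS,
                               host=DB_HOST, database=DATABASE)
cur = conn.cursor()
cur.execute("SHOW VARIABLES LIKE 'wait_timeout'")
print(cur.fetchone())  # e.g. ('wait_timeout', '28800')
cur.execute("SHOW VARIABLES LIKE 'interactive_timeout'")
print(cur.fetchone())
cur.close()
conn.close()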

PYTHON MYSQL doesn't work second time

I am receiving JSON data (from another Python script) to put into a MySQL database. The code works fine the first time, but the second time I get this error:
raise errors.OperationalError("MySQL Connection not available.")
mysql.connector.errors.OperationalError: MySQL Connection not available.
For troubleshooting I am always sending the same data, but it still errors the second time.
Based on suggestions found on forums, I also tried placing cur = mydb.cursor() in different places, but I have never been able to get this code to work a second time.
Here is my code:
import mysql.connector
import json

mydb = mysql.connector.connect(
    host="localhost",
    user="***",
    passwd="***",
    database="***"
)

def DATA_REPARTITION(Topic, jsonData):
    if Topic == "test":
        #print ("Start")
        INSERT_DEBIT(jsonData)

def INSERT_DEBIT(jsonData):
    cur = mydb.cursor()
    # Read json from MQTT
    print("Start read data to insert")
    json_Dict = json.loads(jsonData)
    debit = json_Dict['debit']
    print("I send")
    print(debit)
    # Insert into DB Table
    sql = "INSERT INTO debit (data_debit) VALUES (%s)"
    val = (debit,)
    cur.execute(sql, val)
    mydb.commit()
    print(cur.rowcount, "record inserted.")
    cur.close()
    mydb.close()
Thanks for your help!
You only open your database connection once, at the start of the script, and you close that connection after making the first insert. Hence, the second and subsequent inserts fail. You should create a helper function which returns a database connection, and call it each time you want to do DML:
def getConnection():
    mydb = mysql.connector.connect(
        host="localhost",
        user="***",
        passwd="***",
        database="***")
    return mydb

def INSERT_DEBIT(jsonData):
    mydb = getConnection()
    cur = mydb.cursor()
    # Read json from MQTT
    # rest of your code here...
    cur.close()
    mydb.close()

Python Database connection Close

Using the code below leaves me with an open connection; how do I close it?
import pyodbc
conn = pyodbc.connect('DRIVER=MySQL ODBC 5.1 driver;SERVER=localhost;DATABASE=spt;UID=who;PWD=testest')
csr = conn.cursor()
csr.close()
del csr
Connections have a close method as specified in PEP-249 (Python Database API Specification v2.0):
import pyodbc
conn = pyodbc.connect('DRIVER=MySQL ODBC 5.1 driver;SERVER=localhost;DATABASE=spt;UID=who;PWD=testest')
csr = conn.cursor()
csr.close()
conn.close() #<--- Close the connection
Since the pyodbc connection and cursor are both context managers, nowadays it would be more convenient (and preferable) to write this as:
import pyodbc
conn = pyodbc.connect('DRIVER=MySQL ODBC 5.1 driver;SERVER=localhost;DATABASE=spt;UID=who;PWD=testest')
with conn:
    crs = conn.cursor()
    do_stuff()
    # conn.commit() will automatically be called when Python leaves the outer `with` statement
    # Neither crs.close() nor conn.close() will be called upon leaving the `with` statement!!
See https://github.com/mkleehammer/pyodbc/issues/43 for an explanation for why conn.close() is not called.
Note that unlike the original code, this causes conn.commit() to be called. Use the outer with statement to control when you want commit to be called.
Also note that regardless of whether or not you use the with statements, per the docs,
Connections are automatically closed when they are deleted (typically when they go out of scope) so you should not normally need to call [conn.close()], but you can explicitly close the connection if you wish.
and similarly for cursors (my emphasis):
Cursors are closed automatically when they are deleted (typically when they go out of scope), so calling [csr.close()] is not usually necessary.
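To illustrate the quoted behavior, a small sketch (my illustration, not from the docs): once query() returns, conn and csr go out of scope and pyodbc closes them when the objects are garbage-collected.

import pyodbc

def query():
    conn = pyodbc.connect('DRIVER=MySQL ODBC 5.1 driver;SERVER=localhost;'
                          'DATABASE=spt;UID=who;PWD=testest')
    csr = conn.cursor()
    csr.execute("SELECT 1")
    return csr.fetchall()  # conn and csr are released after this frame exits

rows = query()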
You can wrap the whole connection in a context manager, like the following:
from contextlib import contextmanager
import pyodbc
import sys

@contextmanager
def open_db_connection(connection_string, commit=False):
    connection = pyodbc.connect(connection_string)
    cursor = connection.cursor()
    try:
        yield cursor
    except pyodbc.DatabaseError as err:
        # log the error, roll back, and re-raise for the caller
        sys.stderr.write(str(err))
        cursor.execute("ROLLBACK")
        raise
    else:
        if commit:
            cursor.execute("COMMIT")
        else:
            cursor.execute("ROLLBACK")
    finally:
        connection.close()
Then do something like this wherever you need a database connection:

with open_db_connection("...") as cursor:
    # Your code here
The connection will close when you leave the with block. This will also roll back the transaction if an exception occurs, or if you didn't open the block using with open_db_connection("...", commit=True).
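For example, a hedged variant of the usage above that keeps its changes (the table and column names are hypothetical):

with open_db_connection("...", commit=True) as cursor:
    cursor.execute("UPDATE accounts SET active = 1 WHERE id = ?", 42)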
You might try turning off pooling, which is enabled by default. See this discussion for more information.
import pyodbc
pyodbc.pooling = False
conn = pyodbc.connect('DRIVER=MySQL ODBC 5.1 driver;SERVER=localhost;DATABASE=spt;UID=who;PWD=testest')
csr = conn.cursor()
csr.close()
del csr
You can define a DB class as below. Also, as andrewf suggested, use a context manager for cursor access; I'd define it as a member function.
This way the connection stays open across multiple transactions from the app code, saving unnecessary reconnections to the server.
from contextlib import contextmanager
import pyodbc

class MS_DB():
    """ Collection of helper methods to query the MS SQL Server database.
    """
    def __init__(self, username, password, host, port=1433, initial_db='dev_db'):
        self.username = username
        self._password = password
        self.host = host
        self.port = str(port)
        self.db = initial_db
        conn_str = 'DRIVER=ODBC Driver 13 for SQL Server;SERVER=' + \
                   self.host + ';PORT=' + self.port + ';DATABASE=' + \
                   self.db + ';UID=' + self.username + ';PWD=' + \
                   self._password + ';'
        print('Connected to DB:', conn_str)
        self._connection = pyodbc.connect(conn_str)
        pyodbc.pooling = False  # note: pooling must be set before the first connect to take effect

    def __repr__(self):
        return f"MS-SQLServer('{self.username}', <password hidden>, '{self.host}', '{self.port}', '{self.db}')"

    def __str__(self):
        return f"MS-SQLServer Module for STP on {self.host}"

    def __del__(self):
        self._connection.close()
        print("Connection closed.")

    @contextmanager
    def cursor(self, commit: bool = False):
        """
        A context manager style of using a DB cursor for database operations.
        This function should be used for any database queries or operations that
        need to be done.

        :param commit:
            A boolean value that says whether to commit any database changes to the database. Defaults to False.
        :type commit: bool
        """
        cursor = self._connection.cursor()
        try:
            yield cursor
        except pyodbc.DatabaseError as err:
            print("DatabaseError {} ".format(err))
            cursor.rollback()
            raise err
        else:
            if commit:
                cursor.commit()
        finally:
            cursor.close()

ms_db = MS_DB(username='my_user', password='my_secret', host='hostname')
with ms_db.cursor() as cursor:
    cursor.execute("SELECT @@version;")
    print(cursor.fetchall())
According to pyodbc documentation, connections to the SQL server are not closed by default. Some database drivers do not close connections when close() is called in order to save round-trips to the server.
To close your connection when you call close() you should set pooling to False:
import pyodbc
pyodbc.pooling = False
The most common way to handle connections, if the language does not have a self-closing construct like using in .NET, is a try -> finally to close the objects. It's possible that pyodbc has some form of automatic closing, but here is the code I use just in case:
conn = cursor = None
try:
    conn = pyodbc.connect('DRIVER=MySQL ODBC 5.1 driver;SERVER=localhost;DATABASE=spt;UID=who;PWD=testest')
    cursor = conn.cursor()
    # ... do stuff ...
finally:
    try: cursor.close()
    except: pass
    try: conn.close()
    except: pass
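As a hedged alternative sketch, contextlib.closing from the standard library gives the same guarantee as the try/finally above with less boilerplate:

from contextlib import closing
import pyodbc

with closing(pyodbc.connect('DRIVER=MySQL ODBC 5.1 driver;SERVER=localhost;'
                            'DATABASE=spt;UID=who;PWD=testest')) as conn:
    with closing(conn.cursor()) as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchall())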
