I need advice in a special case.
I have a program like this:
data = [...]
multithread.Pool(n, data)

def slow_function(data):
    db = psycopg2.connect(credentials)
    cursor = db.cursor()
    new_data = realy_slow_func()
    some_query = "some update query"
    cursor.execute(some_query)
Is opening a new connection in each thread safe? It doesn't matter if it's slow or if faster approaches exist.
Threads are necessary because realy_slow_func() is slow.
The database credentials are the same for each thread.
I am using psycopg2
You should use a connection pool, which will create a pool of connections and reuse them across your threads. I would also suggest using a thread pool, so that the number of threads running at a time equals the number of connections available in the DB connection pool; but for the scope of this question, I will talk about the DB connection pool.
I have not tested the code, but this is how it would look. You first create a connection pool, get a connection from it within your thread, and release the connection once you are done. You could also get and release the connection outside of the thread, pass the connection as a parameter, and release it once the thread completes.
Note that ThreadedConnectionPool is the class used to create the pool; as the name suggests, it works with threads.
From the docs:
A connection pool that works with the threading module.
Note: This pool class can be safely used in multi-threaded applications.
import psycopg2
from psycopg2 import pool

postgreSQL_pool = psycopg2.pool.ThreadedConnectionPool(1, 20,
                                                       user="postgres",
                                                       password="pass##29",
                                                       host="127.0.0.1",
                                                       port="5432",
                                                       database="postgres_db")

data = [...]
multithread.Pool(n, data)  # pseudocode from the question: run slow_function over data with n threads

def slow_function(data):
    db = postgreSQL_pool.getconn()   # borrow a connection from the pool
    cursor = db.cursor()
    new_data = realy_slow_func()
    some_query = "some update query"
    cursor.execute(some_query)
    db.commit()                      # commit the update before handing the connection back
    cursor.close()
    postgreSQL_pool.putconn(db)      # return the connection to the pool
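To combine this with the thread-pool suggestion above, here is a rough, untested sketch using concurrent.futures, with the executor sized to the pool's upper bound so no thread waits long for a free connection; realy_slow_func, data, and the query are the placeholders from the question:

from concurrent.futures import ThreadPoolExecutor

MAX_CONNS = 20  # must not exceed the maxconn passed to ThreadedConnectionPool above

def slow_function(item):
    new_data = realy_slow_func()      # the slow work from the question
    db = postgreSQL_pool.getconn()    # borrow a connection from the pool
    try:
        cursor = db.cursor()
        cursor.execute("some update query")
        db.commit()                   # make the update visible
        cursor.close()
    finally:
        postgreSQL_pool.putconn(db)   # always return the connection, even on error

with ThreadPoolExecutor(max_workers=MAX_CONNS) as executor:
    executor.map(slow_function, data)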
Source: https://pynative.com/psycopg2-python-postgresql-connection-pooling/
Docs: https://www.psycopg.org/docs/pool.html
Related
I am trying to retrieve data from a database for use in an API context. However, I noticed that conn.close() was taking a relatively long time to execute (in this context conn is a connection from a MySQL connection pool). Since closing the connection does not block the API's ability to return data, I figured I would use asyncio to close the connection asynchronously so it wouldn't block the data being returned.
import asyncio

async def get_data(stuff):
    conn = api.db.get_connection()
    cursor = conn.cursor(dictionary=True)
    data = execute_query(stuff, conn, cursor)
    cursor.close()
    asyncio.ensure_future(close_conn(conn))  # schedule the close without awaiting it
    return data

async def close_conn(conn):
    conn.close()

results = asyncio.run(get_data(stuff))
However, despite the fact that asyncio.ensure_future(close_conn(conn)) is not blocking (I put timing statements in to see how long everything was taking, and the ones before and after this call differed by about 1 ms), the actual result isn't returned until close_conn has completed. (I verified this with timing statements: the difference between when get_data reaches its return statement and when the line after results = asyncio.run(get_data(stuff)) runs is about 200 ms.)
So my question is: how do I make this code close the connection in the background, so I am free to go ahead and process the data without having to wait for it?
Since conn.close() is not a coroutine, it blocks the event loop when close_conn is scheduled. If you want to do what you described, use an async SQL client and do await conn.close().
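Alternatively, if you must keep the blocking client, a minimal sketch (my addition, not part of the answer above) is to push the blocking close() onto a worker thread so the event loop stays responsive; asyncio.to_thread requires Python 3.9+:

import asyncio

async def close_conn(conn):
    # conn.close() is a blocking call, so run it in a thread
    # instead of directly on the event loop.
    await asyncio.to_thread(conn.close)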
You could try using an asynchronous context manager (an async with statement).
async def get_data(stuff):
    async with api.db.get_connection() as conn:
        cursor = conn.cursor(dictionary=True)
        data = execute_query(stuff, conn, cursor)
        cursor.close()
        asyncio.ensure_future(close_conn(conn))
        return data

results = asyncio.run(get_data(stuff))
If that doesn't work with the SQL client you are using, try aiosqlite.
https://github.com/omnilib/aiosqlite
import aiosqlite
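A minimal sketch of the aiosqlite approach (the database file, table, and query are placeholders, not from the question):

import asyncio
import aiosqlite

async def get_data():
    # the connection is opened and closed asynchronously,
    # so closing it never blocks the event loop
    async with aiosqlite.connect("example.db") as db:
        async with db.execute("SELECT * FROM some_table") as cursor:
            return await cursor.fetchall()

results = asyncio.run(get_data())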
I am connecting to a MySQL (MariaDB) database from a Python script using MySQLConnectionPool. I use a context manager to handle connections from the pool. I wonder whether the pool can expire if it is not used for a long time, or if my program crashes. I've found that a plain connection to the MySQL db expires, so it is released even if you forgot (or were not able) to close the connection in your program; what's the situation with connection pools?
from contextlib import contextmanager
import mysql.connector
from mysql.connector.errors import Error
from mysql.connector import pooling

SQL_CONN_POOL = pooling.MySQLConnectionPool(
    pool_name="mysqlpool",
    pool_size=1,
    user=DB_USER,
    password=DB_PASS,
    host=DB_HOST,
    database=DATABASE,
    auth_plugin=DB_PLUGIN
)

@contextmanager
def mysql_connection_from_pool() -> "conn":
    conn_pool = SQL_CONN_POOL  # get a connection from the pool, all the rest is the same
    _conn = conn_pool.get_connection()
    try:
        yield _conn
    except (Exception, Error):
        # if an error happened, all changes made during the connection are rolled back:
        _conn.rollback()
        # re-raise the error to let it be handled in the outer scope:
        raise
    else:
        # if everything is fine, commit all changes to save them in the db:
        _conn.commit()
    finally:
        # this actually returns the connection to the pool, rather than closing it
        _conn.close()

@contextmanager
def mysql_curs_from_pool() -> "curs":
    with mysql_connection_from_pool() as _conn:
        _curs = _conn.cursor()
        try:
            yield _curs
        finally:
            _curs.close()
Yes, it can time out. There are two timeout configurations.
See wait_timeout and interactive_timeout.
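For example, you can read both values through the same pool; this is an untested sketch reusing the question's context manager:

# Read the server-side idle timeouts that decide when an unused
# connection is dropped by the server.
with mysql_connection_from_pool() as conn:
    cursor = conn.cursor()
    cursor.execute("SHOW VARIABLES LIKE 'wait_timeout'")
    print(cursor.fetchone())        # e.g. ('wait_timeout', '28800')
    cursor.execute("SHOW VARIABLES LIKE 'interactive_timeout'")
    print(cursor.fetchone())
    cursor.close()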
Things I would like to Achieve:
I want to store my MongoDB connections as 3-4 threads in memory, making a pool of connections. I don't want to create a connection every time my core functions run. If I have a pool of connections, I can take a connection from the pool, use it, and release it back to the pool; this is the typical use case.
What I have tried:
I thought of creating a daemon process in which threads would be created according to the number of workers. But the question is: how can I keep the connections always alive, so that whenever I need one I can use it and release it back?
Links I have referred to
I read that MongoDB has an internal connection pooling mechanism and that I can use it by setting maxPoolSize=200, from https://api.mongodb.com/python/current/faq.html#how-does-connection-pooling-work-in-pymongo. But in my case a db connection would still be opened for each process, and I can't afford that, so I would like to avoid it; instead I would like to create and keep some connections alive when my server is booted (roughly sketched after this list).
https://stackoverflow.com/a/14700365 for variable sharing between processes; using this I am able to communicate between Python scripts.
https://stackoverflow.com/a/14299004 for the concurrent.futures library in Python; using this I am able to create a pool.
https://realpython.com/intro-to-python-threading/ for various threading-related libraries in Python.
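For reference, what that FAQ describes boils down to sharing one MongoClient across threads, roughly like this (a sketch only; the URI, database, and collection names are placeholders):

from pymongo import MongoClient
from concurrent.futures import ThreadPoolExecutor

# One client per process; PyMongo keeps its own pool of sockets behind it,
# and the client is safe to share across threads.
client = MongoClient("mongodb://user:password@host:27017/db", maxPoolSize=4)

def worker(doc_id):
    # each call borrows a socket from the client's internal pool
    return client["db"]["collection"].find_one({"_id": doc_id})

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(worker, range(10)))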
I am not able to combine the above programs.
Question
Am I doing the right thing?
Do we have any other ways to pool the connections and store them in RAM so that I can access them as and when my core script needs a db connection?
I don't want to do the same via sockets, because connecting over a socket may also become an overhead (I feel so, but I am not sure).
In the following script I tried to create threads with the help of a thread pool and create some connections. I am able to do so, but I am not sure how I can store these in memory so that I can access a connection each time.
import threading
import time
import logging
import configparser
from pymongo import MongoClient

logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)-9s) %(message)s',)

class ThreadPool(object):
    def __init__(self):
        super(ThreadPool, self).__init__()
        self.active = []
        self.lock = threading.Lock()

    def makeActive(self, name):
        with self.lock:
            self.active.append(name)
            logging.debug('Running: %s', self.active)

    def makeInactive(self, name):
        with self.lock:
            self.active.remove(name)
            logging.debug('Running: %s', self.active)

def f(s, pool):
    logging.debug('Waiting to join the pool')
    with s:
        name = threading.currentThread().getName()
        config = configparser.ConfigParser()
        config.read('.env')
        url = config['mongoDB']['url']
        port = config['mongoDB']['port']
        user = config['mongoDB']['user']
        password = config['mongoDB']['password']
        db = config['mongoDB']['db']
        connectionString = 'mongodb://' + user + ':' + password + '@' + url + ':' + port + '/' + db
        pool.makeActive(name)
        conn = MongoClient(connectionString)
        logging.debug(conn)
        # time.sleep(0.5)
        pool.makeInactive(name)

if __name__ == '__main__':
    pool = ThreadPool()
    s = threading.Semaphore(2)
    for i in range(10):
        t = threading.Thread(target=f, name='thread_' + str(i), args=(s, pool))
        t.daemon = True
        t.start()
I am building a threaded class to run MySQL queries using Python and MySQLdb. I don't understand why running these queries threaded is slower than running them non-threaded. Here's my code to show what I'm doing.
First, here's the non-threaded function.
def testQueryDo(query_list):
    db = MySQLdb.connect('localhost', 'user', 'pass', 'db_name')
    cursor = db.cursor()
    q_list = query_list
    for each in q_list:
        cursor.execute(each)
        results = cursor.fetchall()
    db.close()
Here's my threaded class:
class queryThread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.db = MySQLdb.connect('localhost', 'user', 'pass', 'db_name')
        self.cursor = self.db.cursor()

    def run(self):
        cur_query = self.queue.get()
        self.cursor.execute(cur_query)
        results = self.cursor.fetchall()
        self.db.close()
        self.queue.task_done()
And here's the handler:
def queryHandler(query_list):
    queue = Queue.Queue()
    for query in query_list:
        queue.put(query)
    total_queries = len(query_list)
    for query in range(total_queries):
        t = queryThread(queue)
        t.setDaemon(True)
        t.start()
    queue.join()
I'm not sure why this threaded code is running slower. What's interesting is that if I use the same code but do something simple like adding numbers, the threaded code is significantly faster.
I understand that I must be missing something completely obvious; however, any support would be much appreciated!
You're starting N threads, each of which creates its own connection to MySQL, and you're using a synchronous queue to deliver the queries to the threads. Each thread blocks on queue.get() (acquiring an exclusive lock) to get a query, runs it on its own connection, and then calls task_done(), which lets the next thread proceed. So while thread 1 is working, N-1 threads are doing nothing. This overhead of lock acquisition and release, plus the additional overhead of serially creating and closing several connections to the database, adds up.
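For comparison, here is a rough, untested sketch (Python 2 style, to match the question) of the more usual pattern: a small, fixed number of worker threads, each reusing one connection for many queries pulled from the queue, instead of one thread and one connection per query.

import threading
import Queue
import MySQLdb

NUM_WORKERS = 4  # a handful of long-lived workers instead of one thread per query

def worker(queue):
    # one connection per worker, reused for every query it pulls off the queue
    db = MySQLdb.connect('localhost', 'user', 'pass', 'db_name')
    cursor = db.cursor()
    while True:
        query = queue.get()
        if query is None:        # sentinel: no more work for this worker
            queue.task_done()
            break
        cursor.execute(query)
        cursor.fetchall()        # drain the result so the next execute() can run
        queue.task_done()
    db.close()

def queryHandler(query_list):
    queue = Queue.Queue()
    for query in query_list:
        queue.put(query)
    for _ in range(NUM_WORKERS):
        queue.put(None)          # one sentinel per worker
    threads = [threading.Thread(target=worker, args=(queue,)) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    queue.join()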
As per SQLAlchemy, select statements are treated as iterables in for loops. The effect is that a select statement that would return a massive number of rows does not use excessive memory.
I am finding that the following statement on a MySQL table:
for row in my_connections.execute(MyTable.__table__.select()):
    yield row
Does not seem to follow this, as I overflow available memory and begin thrashing before the first row is yielded. What am I doing wrong?
The basic MySQLdb cursor fetches the entire query result at once from the server.
This can consume a lot of memory and time.
Use MySQLdb.cursors.SSCursor when you want to make a huge query and
pull results from the server one at a time.
Therefore, try passing connect_args={'cursorclass': MySQLdb.cursors.SSCursor}
when creating the engine:
from sqlalchemy import create_engine, MetaData
import MySQLdb.cursors

engine = create_engine('mysql://root:zenoss@localhost/e2',
                       connect_args={'cursorclass': MySQLdb.cursors.SSCursor})
meta = MetaData(engine, reflect=True)
conn = engine.connect()

# `s` is the select statement from the question, e.g.:
s = MyTable.__table__.select()
rs = s.execution_options(stream_results=True).execute()
See http://www.sqlalchemy.org/trac/ticket/1089
Note that using SSCursor locks the table until the fetch is complete. This affects other cursors using the same connection: two cursors from the same connection cannot read from the table concurrently.
However, cursors from different connections can read from the same table concurrently.
Here is some code demonstrating the problem:
import MySQLdb
import MySQLdb.cursors as cursors
import threading
import logging
import config

logger = logging.getLogger(__name__)
query = 'SELECT * FROM huge_table LIMIT 200'

def oursql_conn():
    import oursql
    conn = oursql.connect(
        host=config.HOST, user=config.USER, passwd=config.PASS,
        db=config.MYDB)
    return conn

def mysqldb_conn():
    conn = MySQLdb.connect(
        host=config.HOST, user=config.USER,
        passwd=config.PASS, db=config.MYDB,
        cursorclass=cursors.SSCursor)
    return conn

def two_cursors_one_conn():
    """Two SSCursors can not use one connection concurrently"""
    def worker(conn):
        cursor = conn.cursor()
        cursor.execute(query)
        for row in cursor:
            logger.info(row)

    conn = mysqldb_conn()
    threads = [threading.Thread(target=worker, args=(conn, ))
               for n in range(2)]
    for t in threads:
        t.daemon = True
        t.start()

    # Second thread may hang or raise OperationalError:
    #   File "/usr/lib/pymodules/python2.7/MySQLdb/cursors.py", line 289, in _fetch_row
    #     return self._result.fetch_row(size, self._fetch_type)
    #   OperationalError: (2013, 'Lost connection to MySQL server during query')
    for t in threads:
        t.join()

def two_cursors_two_conn():
    """Two SSCursors from independent connections can use the same table concurrently"""
    def worker():
        conn = mysqldb_conn()
        cursor = conn.cursor()
        cursor.execute(query)
        for row in cursor:
            logger.info(row)

    threads = [threading.Thread(target=worker) for n in range(2)]
    for t in threads:
        t.daemon = True
        t.start()
    for t in threads:
        t.join()

logging.basicConfig(level=logging.DEBUG,
                    format='[%(asctime)s %(threadName)s] %(message)s',
                    datefmt='%H:%M:%S')
two_cursors_one_conn()
two_cursors_two_conn()
Note that oursql is an alternative set of MySQL bindings for Python. oursql cursors are true server-side cursors which fetch rows lazily by default. With oursql installed, if you change
conn = mysqldb_conn()
to
conn = oursql_conn()
then two_cursors_one_conn() runs without hanging or raising an exception.