I recently had a Python 2.7.x project where I needed to use mysql.connector to execute multiple, semicolon-delimited statements in one query. This is explained nicely in this post.
However, I needed to use mysql.connector with Twisted for my current project, which means using Twisted's excellent enterprise.adbapi module to make my new blocking database connection non-blocking.
config = {"user": username, "password": password, "host": hostname,
"database": database_name, "raise_on_warnings": True}
cp = adbapi.ConnectionPool("mysql.connector", **config)
My test statements are defined below. I apologize that they are a bit of a frivolous example, but I know the results I expect, and it should be enough to verify that I'm getting results for multiple statements.
statement1 = "SELECT * FROM queue WHERE id = 27;"
statement2 = "SELECT * FROM order WHERE id = 1;"
statement_list = [statement1, statement2]
statements = " ".join(statement_list)
The problem comes when I try to execute the ConnectionPool method .runQuery():
def _print_result(result):
    if result:
        print("this is a result")
        print(result)
    else:
        print("no result")
    reactor.stop()
d = cp.runQuery(statements, multi=True)
d.addBoth(_print_result)
This gets me the following result:
this is a result [Failure instance: Traceback: <class 'mysql.connector.errors.InterfaceError'>: No result set to fetch from.
How can I use Twisted's adbapi module to get the results that I know are there?
So, it turns out that when using adbapi.ConnectionPool.runQuery(), the default behavior is to fetch the query's results with the cursor's fetchall() method. However, when using mysql.connector with multi=True, this doesn't work, even without Twisted. Instead, one needs to iterate over the returned result set and call fetchall() on each member of the set.
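For reference, this is what that iteration looks like with plain mysql.connector, outside Twisted (a minimal sketch reusing the config and statements defined above):

import mysql.connector

conn = mysql.connector.connect(**config)
cur = conn.cursor()
# With multi=True, execute() returns an iterator that yields one
# result object per statement; only some of them carry rows.
for res in cur.execute(statements, multi=True):
    if res.with_rows:
        print(res.fetchall())
cur.close()
conn.close()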
So, the way I solved this was with the following subclass.
from twisted.enterprise import adbapi

class NEWadbapiConnectionPool(adbapi.ConnectionPool):

    def __init__(self, dbapiName, *connargs, **connkw):
        adbapi.ConnectionPool.__init__(self, dbapiName, *connargs, **connkw)

    def runMultiQuery(self, *args, **kw):
        return self.runInteraction(self._runMultiQuery, *args, **kw)

    def _runMultiQuery(self, trans, *args, **kw):
        result = trans.execute(*args, **kw)
        result_list = []
        for item in result:
            if item.with_rows:
                result_list.append(item.fetchall())
        return result_list
So now I create the following:
def _print_result(result):
    if result:
        print("this is a result")
        print(result)
    else:
        print("no result")
    reactor.stop()
cp = NEWadbapiConnectionPool("mysql.connector", **config)
d = cp.runMultiQuery(statements, multi=True)
d.addBoth(_print_result)
and get a list of the results for each statement.
I hope someone else finds this useful.
runQuery() always expects results. The right way to do this is to call runOperation(), which does not fetch results.
If you want to use .runQuery(), it expects results to fetch, so you need to return something (note that MySQL itself does not support RETURNING; the example below assumes a database like PostgreSQL):
dbpool.runQuery(
    "UPDATE something SET col1=true WHERE some_id=123 RETURNING *"
)
.runOperation() does not expect results:
dbpool.runOperation(
    "UPDATE something SET col1=true WHERE some_id=123"
)
Related
I'm really new to Python; currently I'm trying to understand how it works, and I'm building small programs for practice.
I have a table containing the details of the users, called luzer.
I want to update the password field (called JELSZO in the table), which is a VARBINARY(1000).
The MySQL connection works via a pool.
So, I use this function to execute SQL queries and commit updates, etc.:
def execute(self, sql, args=None, commit=False):
    """ Execute a SQL statement """
    # get connection from connection pool
    conn = self.pool.get_connection()
    cursor = conn.cursor()
    if args:
        cursor.execute(sql, args)
    else:
        cursor.execute(sql)
    if commit is True:
        conn.commit()
        self.close(conn, cursor)
        return None
    else:
        res = cursor.fetchall()
        self.close(conn, cursor)
        return res
And this is how I try to update the password field (JELSZO):
sql_update_query = "Update luzer SET JELSZO = %s where AZON = %s" #the AZON is the userid in the table.
pas2 = testing(MySQLPool, sql_update_query, (jelszoid1, loginid, ), True) #if the commit = True then it should run the conn.commit() above.
It runs without any error, but when I check whether the update was committed successfully, I see that nothing happened.
The password is a binary string (generated using a Fernet key).
I would really appreciate any ideas about what could be going wrong here.
I solved this bloody issue. I used a separate, differently named function for it.
def update(request, bid, *args, **kwargs):
    mysql_pool = MySQLPool()
    query = sql_update_query
    result = mysql_pool.execute(query, (jelszoid1, loginid,), True)
    print('RESULT : \n', result)
    return result
while the original function (def testing, now renamed to selection) stayed as it was:
def selection(request, bid, *args, **kwargs):
    mysql_pool = MySQLPool()
    query = query1
    result = mysql_pool.execute(query, (loginid,), False)
    print('RESULT : \n', result)
    return result
As you can see, the only difference is in the parameters of the function call.
I was being silly: I disregarded the basic SQL rule that SELECT queries do not need a commit, while data-modifying statements do.
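To illustrate the rule (a minimal sketch; conn and cursor stand for any DB-API connection and cursor):

# a SELECT returns rows and needs no commit
cursor.execute("SELECT JELSZO FROM luzer WHERE AZON = %s", (loginid,))
rows = cursor.fetchall()

# a data-modifying statement must be committed to take effect
cursor.execute("UPDATE luzer SET JELSZO = %s WHERE AZON = %s", (jelszoid1, loginid))
conn.commit()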
My question basically is: is there a best-practice approach to DB interaction, and am I doing something silly/wrong in the code below that is costing processing time?
My program pulls data from a website and writes to a SQL database. Speed is very important, and I want to be able to refresh the data as quickly as possible. I've tried a number of ways and I feel it's still way too slow, i.e. it could be much better with a better approach/design to interaction with the db, and I'm sure I'm making all sorts of mistakes. I can download the data to memory very quickly, but the writes to the db take much, much longer.
The 3 main approaches I've tried are:
1. Threads that pull the data and populate a list of SQL commands; when the threads complete, run the SQL in the main thread.
2. Threads that pull data and push to SQL (as per the code below).
3. Threads that pull data and populate a queue, with separate thread(s) polling the queue and pushing to the db.
Code as below:
import re
import MySQLdb as mydb

class DatabaseUtility():

    def __init__(self):
        """set db parameters"""

    def updateCommand(self, cmd):
        """run SQL commands and return number of matched rows"""
        try:
            self.cur.execute(cmd)
            return int(re.search('Rows matched: (\d+)', self.cur._info).group(1))
        except Exception, e:
            print ('runCmd error: ' + str(e))
            print ('With SQL: ' + cmd)
            return 0

    def addCommand(self, cmd):
        """write SQL command to db"""
        try:
            self.cur.execute(cmd)
            return self.cur.rowcount
        except Exception, e:
            print ('runCmd error: ' + str(e))
            print ('With SQL: ' + cmd)
            return 0
I've created a class that instantiates a db connection and is called as below:
from Queue import Queue
from threading import Thread
import urllib2
import json
from databasemanager import DatabaseUtility as dbU
from datalinks import getDataLink, allDataLinks

numThreads = 3
q = Queue()
dbu = dbU()

class OddScrape():

    def __init__(self, name, q):
        self.name = name
        self.getOddsData(self.name, q)

    def getOddsData(self, i, q):
        """Worker thread - parse each datalink and update / insert to db"""
        while True:
            # get datalink, create db connection
            self.dbu = dbU()
            matchData = q.get()
            # load data link using urllib2 and do a bunch of stuff
            # to parse the data to the required format
            # try to update in db and insert if not found
            sql = "sql to update %s" % (params)
            update = self.dbu.updateCommand(sql)
            if update < 1:
                sql = "sql to insert %s" % (params)
                self.dbu.addCommand(sql)
            q.task_done()
            self.dbu.dbConClose()
            print eventlink

def threadQ():
    # set up some threads
    for i in range(numThreads):
        worker = Thread(target=OddScrape, args=(i, q,))
        worker.start()
    # get urldata for all matches required and add to q
    matchids = dbu.runCommand("sql code to determine scope of urls")
    for match in matchids:
        sql = "sql code to get url data %s" % match
        q.put(dbu.runCommand(sql))
    q.join()
I've also added an index to the table I'm writing to, which seemed to help a tiny bit but not noticeably:
CREATE INDEX `idx_oddsdata_bookid_datalinkid`
ON `dbname`.`oddsdata` (bookid, datalinkid) COMMENT '' ALGORITHM DEFAULT LOCK DEFAULT;
Multiple threads implies multiple connections. Although getting a connection is "fast" in MySQL, it is not instantaneous. I do not know the relative speed of getting a connection versus running a query, but I doubt your multi-threaded idea will win.
Could you show us examples of the actual queries (SQL, not Python code) you need to run? We may have suggestions on combining queries, improved indexes, etc. Please provide SHOW CREATE TABLE, too. (You mentioned a CREATE INDEX, but it is useless out of context.)
It looks like you are doing a multi-step process that could be collapsed into INSERT ... ON DUPLICATE KEY UPDATE ....
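For illustration, the worker's UPDATE-then-INSERT pair could collapse into a single upsert along these lines (a sketch only; the odds column is hypothetical, since the actual schema wasn't shown):

# hypothetical upsert: insert a new row, or update `odds` when a row
# with the same unique (bookid, datalinkid) key already exists
sql = ("INSERT INTO oddsdata (bookid, datalinkid, odds) "
       "VALUES (%s, %s, %s) "
       "ON DUPLICATE KEY UPDATE odds = VALUES(odds)")
self.dbu.cur.execute(sql, (bookid, datalinkid, odds))

Note this requires a UNIQUE (or primary) key on (bookid, datalinkid); the index shown above is not unique, so it would have to be redefined as one.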
I am running into the dreaded MySQL "Commands out of Sync" error when using a custom DB library and Celery.
The library is as follows:
import pymysql
import pymysql.cursors
from furl import furl
from flask import current_app

class LegacyDB:
    """Db
    Legacy Database connectivity library
    """
    def __init__(self, app):
        with app.app_context():
            self.rc = current_app.config['RAVEN']
            self.logger = current_app.logger
            self.data = {}
            # setup Mysql
            try:
                uri = furl(current_app.config['DBCX'])
                self.dbcx = pymysql.connect(
                    host=uri.host,
                    user=uri.username,
                    passwd=uri.password,
                    db=str(uri.path.segments[0]),
                    port=int(uri.port),
                    cursorclass=pymysql.cursors.DictCursor
                )
            except:
                self.rc.captureException()

    def query(self, sql, params=None, TTL=36):
        # INPUT 1 : SQL query
        # INPUT 2 : Parameters
        # INPUT 3 : Time To Live
        # OUTPUT  : Array of result
        # check that we're still connected to the
        # database before we fire off the query
        try:
            db_cursor = self.dbcx.cursor()
            if params:
                self.logger.debug("%s : %s" % (sql, params))
                db_cursor.execute(sql, params)
                self.dbcx.commit()
            else:
                self.logger.debug("%s" % sql)
                db_cursor.execute(sql)
            self.data = db_cursor.fetchall()
            if self.data == None:
                self.data = {}
            db_cursor.close()
        except Exception as ex:
            if ex[0] == "2006":
                db_cursor.close()
                self.connect()
                db_cursor = self.dbcx.cursor()
                if params:
                    db_cursor.execute(sql, params)
                    self.dbcx.commit()
                else:
                    db_cursor.execute(sql)
                self.data = db_cursor.fetchall()
                db_cursor.close()
            else:
                self.rc.captureException()
        return self.data
The purpose of the library is to work alongside SQLAlchemy whilst I migrate a legacy database schema from a C++-based system to a Python based system.
All configuration is done via a Flask application, and the app.config['DBCX'] value reads the same as a SQLAlchemy string ("mysql://user:pass@host:port/dbname"), allowing me to easily switch over in future.
I have a number of tasks that run INSERT statements via Celery, all of which use this library. As you can imagine, the main reason for running Celery is to increase throughput on this application; however, I seem to be hitting an issue with threading in my library or the application, as after a while (around 500 processed messages) I see the following in the logs:
Stacktrace (most recent call last):
File "legacy/legacydb.py", line 49, in query
self.dbcx.commit()
File "pymysql/connections.py", line 662, in commit
self._read_ok_packet()
File "pymysql/connections.py", line 643, in _read_ok_packet
raise OperationalError(2014, "Command Out of Sync")
I'm obviously doing something wrong to hit this error; however, it doesn't seem to matter whether MySQL has autocommit enabled or disabled, or where I place my connection.commit() call.
If I leave out the connection.commit() then I don't get anything inserted into the database.
I've recently moved from MySQLdb to PyMySQL and the occurrences appear to be less frequent. However, given that these are simple INSERT commands and not complicated SELECTs (there aren't even any foreign key constraints on this database!), I'm struggling to work out where the issue is.
As things stand at present, I am unable to use executemany as I cannot prepare the statements in advance (I am pulling data from a "firehose" message queue and storing it locally for later processing).
First of all, make sure that the Celery thingamajig uses its own connection(s), since:
>>> pymysql.threadsafety
1
which means: "threads may share the module but not connections".
Is __init__ called once, or once per worker? If only once, you need to move the initialisation.
How about lazily initialising the connection in a thread-local variable the first time query is called?
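A minimal sketch of that idea (the class is illustrative, not the asker's actual library; connection arguments are just passed through):

import threading
import pymysql

class ThreadLocalDB(object):
    """Each thread lazily opens, then reuses, its own connection."""

    def __init__(self, **conn_kwargs):
        # store the connection arguments; do not connect eagerly
        self._conn_kwargs = conn_kwargs
        self._local = threading.local()

    def _connection(self):
        # first query() call in this thread opens a dedicated connection
        if not hasattr(self._local, 'conn'):
            self._local.conn = pymysql.connect(**self._conn_kwargs)
        return self._local.conn

    def query(self, sql, params=None):
        conn = self._connection()
        cursor = conn.cursor()
        cursor.execute(sql, params)
        conn.commit()
        data = cursor.fetchall()
        cursor.close()
        return data

Since each worker thread gets its own connection, no two threads can interleave commands on the same socket, which is what "Commands out of Sync" usually indicates.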
We have a little bit of a complicated setup:
In our normal code, we connect manually to a MySQL db. We're doing this because I guess the connections Django normally uses are not threadsafe? So we let Django make the connection, extract the information from it, and then use a MySQLdb connection to do the actual querying.
Our code is largely an update process, so we have autocommit turned off to save time.
For ease of creating test data, I created django models that represent the tables, and use them to create rows to test on. So I have functions like:
def make_thing(**overrides):
    fields = deepcopy(DEFAULT_THING)
    fields.update(overrides)
    s = Thing(**fields)
    s.save()
    transaction.commit(using='ourdb')
    reset_queries()
    return s
However, it doesn't seem to actually be committing! After I make an object, I later have code that executes raw SQL against the MySQLdb connection:
def get_information(self, value):
    print self.api.rawSql("select count(*) from thing")[0][0]
    query = 'select info from thing where column = %s' % value
    return self.api.rawSql(query)[0][0]
This print statement prints 0! Why?
Also, if I turn autocommit off, I get
TransactionManagementError: This is forbidden when an 'atomic' block is active.
when we try to alter the autocommit level later.
EDIT: I also just tried https://groups.google.com/forum/#!topic/django-users/4lzsQAWYwG0, which did not help.
EDIT2: I checked from a shell against the database: the commit is working, it's just not getting picked up. I've tried setting the transaction isolation level, but it isn't helping. I should add that a function further up from get_information uses this decorator:
def single_transaction(fn):
    from django.db import transaction
    from django.db import connection

    def wrapper(*args, **kwargs):
        prior_autocommit = transaction.get_autocommit()
        transaction.set_autocommit(False)
        connection.cursor().execute('set transaction isolation level read committed')
        connection.cursor().execute("SELECT @@session.tx_isolation")
        try:
            result = fn(*args, **kwargs)
            transaction.commit()
            return result
        finally:
            transaction.set_autocommit(prior_autocommit)
            django.db.reset_queries()
            gc.collect()

    wrapper.__name__ = fn.__name__
    return wrapper
I have a weird issue, which is probably easy to resolve.
I have a class Database with an __init__ and an executeDictMore method (among others).
class Database():

    def __init__(self, database, server, login, password):
        self.database = database
        my_conv = { FIELD_TYPE.LONG: int }
        self.conn = MySQLdb.Connection(user=login, passwd=password, db=self.database, host=server, conv=my_conv)
        self.cursor = self.conn.cursor()

    def executeDictMore(self, query):
        self.cursor.execute(query)
        data = self.cursor.fetchall()
        if data == None:
            return None
        result = []
        for d in data:
            desc = self.cursor.description
            dict = {}
            for (name, value) in zip(desc, d):
                dict[name[0]] = value
            result.append(dict)
        return result
Then I instantiate this class in a file db_functions.py:
from Database import Database
db = Database()
And I call the executeDictMore method from a function of db_functions:
def test(id):
    query = "SELECT * FROM table WHERE table_id=%s;" % (id)
    return db.executeDictMore(query)
Now comes the weird part.
If I import db_functions and call db_functions.test(id) from a Python console:
import db_functions
t = db_functions.test(12)
it works just fine.
But if I do the same thing from another Python file, I get the following error:
AttributeError: Database instance has no attribute 'executeDictMore'
I really don't understand what is going on here. I don't think I have another Database class interfering. And I append the folder containing the modules to sys.path, so it should import the right module anyway.
If someone has an idea, it's very welcome.
You have another Database module or package in your path somewhere, and it is getting imported instead.
To diagnose where that other module is living, add:
import Database
print Database.__file__
before the from Database import Database line; it'll print the filename of the module. You'll have to rename one or the other module to not conflict.
You could at least try to avoid SQL injection. Python provides such neat ways to do so:
def executeDictMore(self, query, data=None):
    self.cursor.execute(query, data)
and
def test(id):
    query = "SELECT * FROM table WHERE table_id=%s"
    return db.executeDictMore(query, (id,))
are the ways to do so.
Sorry, this should rather be a comment, but an answer allows for better formatting. I am aware that it doesn't answer your question...
You should insert (not append) into your sys.path if you want it first in the search path:
sys.path.insert(0, '/path/to/your/Database/class')
I'm not too sure what is wrong, but you could try passing the database object to the function as an argument, like db_functions.test(db, 12), with db being an instance of your Database class.
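A quick sketch of that suggestion, combined with the parameterised executeDictMore from the answer above (it simply moves instance creation to the caller):

def test(db, id):
    query = "SELECT * FROM table WHERE table_id=%s"
    return db.executeDictMore(query, (id,))

# caller side:
db = Database(database_name, server, login, password)
result = test(db, 12)

This also sidesteps the module-level instantiation in db_functions.py, which is where a shadowed Database import would bite.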