I am trying to write a multi-threaded Python application in which a single SQLite connection is shared among threads. I am unable to get this to work. The real application is a CherryPy web server, but the following simple code demonstrates my problem.
What change or changes do I need to make to run the sample code below successfully?
When I run this program with THREAD_COUNT set to 1 it works fine and my database is updated as I expect (that is, letter "X" is added to the text value in the SectorGroup column).
When I run it with THREAD_COUNT set to anything higher than 1, all threads but one terminate prematurely with SQLite-related exceptions. Different threads throw different exceptions (with no discernible pattern), including:
OperationalError: cannot start a transaction within a transaction
(occurs on the UPDATE statement)
OperationalError: cannot commit - no transaction is active
(occurs on the .commit() call)
InterfaceError: Error binding parameter 0 - probably unsupported type.
(occurs on the UPDATE and the SELECT statements)
IndexError: tuple index out of range
(this one has me completely puzzled; it occurs on the statement group = rows[0][0] or '', but only when multiple threads are running)
Here is the code:
import sqlite3
from threading import Thread, current_thread

CONNECTION = sqlite3.connect('./database/mydb', detect_types=sqlite3.PARSE_DECLTYPES, check_same_thread=False)
CONNECTION.row_factory = sqlite3.Row

def commands(start_id):
    # Loop over 100 records, read the SectorGroup column, and write it back with "X" appended.
    for inv_id in range(start_id, start_id + 100):
        rows = CONNECTION.execute('SELECT SectorGroup FROM Investment WHERE InvestmentID = ?;', [inv_id]).fetchall()
        if rows:
            group = rows[0][0] or ''
            msg = '{} inv {} = {}'.format(current_thread().name, inv_id, group)
            print(msg)
            CONNECTION.execute('UPDATE Investment SET SectorGroup = ? WHERE InvestmentID = ?;', [group + 'X', inv_id])
            CONNECTION.commit()

if __name__ == '__main__':
    THREAD_COUNT = 10
    for i in range(THREAD_COUNT):
        t = Thread(target=commands, args=(i * 100,))
        t.start()
It's not safe to share a connection between threads; at the very least you need to use a lock to serialize access. Do also read http://docs.python.org/2/library/sqlite3.html#multithreading, as older SQLite versions have more issues still.
The check_same_thread option appears deliberately under-documented in that respect; see http://bugs.python.org/issue16509.
You could use a connection per thread instead, or look to SQLAlchemy for a connection pool (and a very efficient unit-of-work and queuing system to boot).
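A minimal sketch of the connection-per-thread approach (reusing the path and settings from your example; the get_connection helper itself is hypothetical) could use threading.local so each thread lazily opens its own connection:

import sqlite3
import threading

_local = threading.local()

def get_connection():
    # Each thread gets its own lazily created connection.
    conn = getattr(_local, 'connection', None)
    if conn is None:
        conn = sqlite3.connect('./database/mydb', detect_types=sqlite3.PARSE_DECLTYPES)
        conn.row_factory = sqlite3.Row
        _local.connection = conn
    return conn

Inside commands() you would then call get_connection() instead of touching the shared CONNECTION object.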
I ran into the SQLite threading problem when writing a simple WSGI server for fun and learning.
WSGI is multi-threaded by nature when running under Apache.
The following code seems to work for me:
import sqlite3
import threading

class LockableCursor:
    def __init__(self, cursor):
        self.cursor = cursor
        self.lock = threading.Lock()

    def execute(self, arg0, arg1=None):
        # If arg1 is given, arg0 is a fetch mode ('all' or 'one') and arg1 is the query;
        # otherwise arg0 is a query that returns no result.
        with self.lock:
            self.cursor.execute(arg1 if arg1 else arg0)
            if arg1:
                if arg0 == 'all':
                    return self.cursor.fetchall()
                elif arg0 == 'one':
                    return self.cursor.fetchone()
def dictFactory(cursor, row):
    aDict = {}
    for iField, field in enumerate(cursor.description):
        aDict[field[0]] = row[iField]
    return aDict

class Db:
    def __init__(self, app):
        self.app = app

    def connect(self):
        self.connection = sqlite3.connect(self.app.dbFileName, check_same_thread=False, isolation_level=None)  # Will create the db if nonexistent
        self.connection.row_factory = dictFactory
        self.cs = LockableCursor(self.connection.cursor())
Example of use:
if not ok and self.user:  # Not logged out
    # Get role data for any later use
    userIdsRoleIds = self.cs.execute('all', 'SELECT role_id FROM users_roles WHERE user_id == {}'.format(self.user['id']))
    for userIdRoleId in userIdsRoleIds:
        self.userRoles.append(self.cs.execute('one', 'SELECT name FROM roles WHERE id == {}'.format(userIdRoleId['role_id'])))
Another example:
self.cs.execute('CREATE TABLE users (id INTEGER PRIMARY KEY, email_address, password, token)')
self.cs.execute('INSERT INTO users (email_address, password) VALUES ("{}", "{}")'.format(self.app.defaultUserEmailAddress, self.app.defaultUserPassword))

# Create roles table and insert default role
self.cs.execute('CREATE TABLE roles (id INTEGER PRIMARY KEY, name)')
self.cs.execute('INSERT INTO roles (name) VALUES ("{}")'.format(self.app.defaultRoleName))

# Create users_roles table and assign default role to default user
self.cs.execute('CREATE TABLE users_roles (id INTEGER PRIMARY KEY, user_id, role_id)')
defaultUserId = self.cs.execute('one', 'SELECT id FROM users WHERE email_address = "{}"'.format(self.app.defaultUserEmailAddress))['id']
defaultRoleId = self.cs.execute('one', 'SELECT id FROM roles WHERE name = "{}"'.format(self.app.defaultRoleName))['id']
self.cs.execute('INSERT INTO users_roles (user_id, role_id) VALUES ({}, {})'.format(defaultUserId, defaultRoleId))
Complete program using this construction downloadable at:
http://www.josmith.org/
N.B. The code above is experimental; there may be (fundamental) issues when using this with (many) concurrent requests, e.g. as part of a WSGI server. Performance is not critical for my application. The simplest thing probably would have been to just use MySQL, but I like to experiment a little, and the zero-installation aspect of SQLite appealed to me. If anyone thinks the code above is fundamentally flawed, please react, as my purpose is to learn. If not, I hope this is useful for others.
I'm guessing here, but it looks like the reason why you are doing this is a performance concern.
Python threads aren't going to help performance in any meaningful way for this use case. Instead, use SQLite transactions, which are very fast.
If you do all your updates in a transaction, you'll find an order of magnitude speedup.
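For illustration, a hedged sketch (reusing the table and column names from the question; start_id is assumed to be defined) that batches all the updates into a single transaction:

import sqlite3

conn = sqlite3.connect('./database/mydb')
with conn:  # opens one transaction; commits on success, rolls back on error
    for inv_id in range(start_id, start_id + 100):
        conn.execute("UPDATE Investment SET SectorGroup = IFNULL(SectorGroup, '') || ? "
                     "WHERE InvestmentID = ?", ('X', inv_id))

One commit for the whole batch avoids paying the per-commit fsync cost a hundred times over.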
I'm really new to Python and currently I'm trying to understand how it works by building small programs for practice.
I have a table called luzer that contains the details of the users. I want to update the password field (called JELSZO in the table), which is a VARBINARY(1000). The MySQLdb connection works via a pool.
So I use this def to execute SQL queries, commit updates, etc.:
def execute(self, sql, args=None, commit=False):
    """Execute a SQL statement."""
    # Get a connection from the connection pool.
    conn = self.pool.get_connection()
    cursor = conn.cursor()
    if args:
        cursor.execute(sql, args)
    else:
        cursor.execute(sql)
    if commit is True:
        conn.commit()
        self.close(conn, cursor)
        return None
    else:
        res = cursor.fetchall()
        self.close(conn, cursor)
        return res
And this is how I try to update the password field (JELSZO):

sql_update_query = "UPDATE luzer SET JELSZO = %s WHERE AZON = %s"  # AZON is the user id in the table
pas2 = testing(MySQLPool, sql_update_query, (jelszoid1, loginid,), True)  # if commit=True, this should run the conn.commit() above
It runs without any error, but when I check whether it committed the update successfully, I see that nothing happened.
The password is a binary string (generated using a Fernet key).
I would really appreciate any ideas about what could go wrong here.
I solved this bloody issue. I used a separate, differently named def for it:
def update(request, bid, *args, **kwargs):
    mysql_pool = MySQLPool()
    query = sql_update_query
    result = mysql_pool.execute(query, (jelszoid1, loginid,), True)
    print('RESULT : \n', result)
    return result
while the original function (def testing, now renamed to selection) stayed as it was:
def selection(request, bid, *args, **kwargs):
    mysql_pool = MySQLPool()
    query = query1
    result = mysql_pool.execute(query, (loginid,), False)
    print('RESULT : \n', result)
    return result
As you can see, the only difference is in the parameters of the function call.
I had disregarded the basic SQL rule that SELECT queries do not need a commit, while UPDATEs do.
I have the following Python Code [pyhdb] to connect to SAP HANA Express:
Is there an error in my code, or does it have something to do with the SYSTEM user?
The error message is:
Could not find table/view TABLE in schema APP: line 1 col 19 (at pos 18)
import os
import random
import platform
from constant import *
import pyhdb

def is_rpi():
    return 'arm' in platform.uname()[4]

if is_rpi():
    import Adafruit_DHT

def read_dht():
    if is_rpi():
        sensor = Adafruit_DHT.DHT22
        humidity, temperature = Adafruit_DHT.read_retry(sensor, DHT_PIN)
        if humidity is not None and temperature is not None:
            print('Temp={0:0.1f}*C Humidity={1:0.1f}%'.format(temperature, humidity))
            return int(humidity), int(temperature)
        else:
            return None, None
    else:
        return random.randint(20, 30), random.randint(40, 70)

if __name__ == '__main__':
    connection = pyhdb.connect(host=SAP_HOST, port=39015, user=SAP_USER, password=SAP_PWD)
    cursor = connection.cursor()
    temp, humi = read_dht()
    query = "INSERT INTO \"{}\".\"{}\" VALUES('{}', {}, {}, '{}')".format(
        SAP_SCHEMA, SAP_TABLE, DEVICE_ID, temp, humi, ROOM_NAME)
    print("Executing query: ", query)
    cursor.execute(query)
    print("New row count: ", cursor.rowcount)
    connection.close()
And here is the constant code:
DHT_PIN = 4
DEVICE_ID = '0ada9de4-bc4f-4e53-990a-cbcfccaed4c4'
ROOM_NAME = 'room 101'
SAP_HOST = 'hxehost'
SAP_USER = 'SYSTEM'
SAP_PWD = 'XXXXXXXXXXXX'
SAP_SCHEMA = 'APP'
SAP_TABLE = 'TABLE'
The error message
Could not find table/view TABLE in schema APP
points to the fact that the table does not exist. To check whether the table is known to the system, you could, for example, run the SQL statement
SELECT * FROM TABLES WHERE SCHEMA_NAME='APP' AND TABLE_NAME='TABLE';
which would lead to an empty result set for a non-existing table.
In case of an authorization problem you could rather expect an error like
insufficient privilege: Not authorized
Regarding the question about checking the authorization, you might want to take a look at the system views EFFECTIVE_PRIVILEGES and EFFECTIVE_ROLES, or GRANTED_PRIVILEGES and GRANTED_ROLES, respectively (refer to the SAP HANA Security Guide). Generally, a privilege can be granted either to a user directly or via a role. Roles can contain other roles, which might make tracing an authorization a bit more complex.
However, in your specific case, you could probably try the rather simple SQL query:
SELECT * FROM "PUBLIC"."EFFECTIVE_PRIVILEGES"
WHERE USER_NAME='SYSTEM' AND SCHEMA_NAME='APP' AND PRIVILEGE='INSERT';
(Depending on your scenario, you might also want to check for the UPDATE privilege.)
Please allow me to add the remark that the INSERT statement from your example probably needs to be explicitly committed to take effect, as the connection defaults to autocommit=False, if I remember correctly.
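For example (a sketch reusing the cursor and connection from your script):

cursor.execute(query)
connection.commit()  # without this, the INSERT is rolled back when the connection closes
connection.close()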
The user SYSTEM did not have enough privileges to insert into the table. Solved; thanks to everyone.
My question basically is: is there a best-practice approach to DB interaction, and am I doing something silly or wrong below that is costing processing time?
My program pulls data from a website and writes it to a SQL database. Speed is very important, and I want to be able to refresh the data as quickly as possible. I've tried a number of ways, and I feel it's still way too slow; it could surely be much better with a better approach to and design of the DB interaction, and I'm sure I'm making all sorts of mistakes. I can download the data to memory very quickly, but the writes to the DB take much, much longer.
The three main approaches I've tried are:
1. Threads that pull the data and populate a list of SQL commands; when the threads complete, run the SQL in the main thread.
2. Threads that pull data and push to SQL (as per the code below).
3. Threads that pull data and populate a queue, with separate thread(s) polling the queue and pushing to the DB.
Code as below:
import re
import MySQLdb as mydb

class DatabaseUtility():
    def __init__(self):
        """set db parameters"""

    def updateCommand(self, cmd):
        """Run a SQL command and return the number of matched rows."""
        try:
            self.cur.execute(cmd)
            return int(re.search(r'Rows matched: (\d+)', self.cur._info).group(1))
        except Exception as e:
            print('runCmd error: ' + str(e))
            print('With SQL: ' + cmd)
            return 0

    def addCommand(self, cmd):
        """Write a SQL command to the db."""
        try:
            self.cur.execute(cmd)
            return self.cur.rowcount
        except Exception as e:
            print('runCmd error: ' + str(e))
            print('With SQL: ' + cmd)
            return 0
I've created a class that instantiates a db connection and is called as below:
from Queue import Queue
from threading import Thread
import urllib2
import json

from databasemanager import DatabaseUtility as dbU
from datalinks import getDataLink, allDataLinks

numThreads = 3
q = Queue()
dbu = dbU()

class OddScrape():
    def __init__(self, name, q):
        self.name = name
        self.getOddsData(self.name, q)

    def getOddsData(self, i, q):
        """Worker thread - parse each datalink and update / insert to db"""
        while True:
            # Get a datalink and create a db connection.
            self.dbu = dbU()
            matchData = q.get()
            # Load the data link using urllib2 and do a bunch of stuff
            # to parse the data into the required format.
            # Try to update in the db; insert if not found.
            sql = "sql to update %s" % (params)
            update = self.dbu.updateCommand(sql)
            if update < 1:
                sql = "sql to insert" % (params)
                self.dbu.addCommand(sql)
            q.task_done()
            self.dbu.dbConClose()

def threadQ():
    # Set up some threads.
    for i in range(numThreads):
        worker = Thread(target=OddScrape, args=(i, q,))
        worker.start()
    # Get url data for all required matches and add it to the queue.
    matchids = dbu.runCommand("sql code to determine scope of urls")
    for match in matchids:
        sql = "sql code to get url data %s" % match
        q.put(dbu.runCommand(sql))
    q.join()
I've also added an index to the table I'm writing to, which seemed to help a tiny bit but not noticeably:
CREATE INDEX `idx_oddsdata_bookid_datalinkid`
ON `dbname`.`oddsdata` (bookid, datalinkid) COMMENT '' ALGORITHM DEFAULT LOCK DEFAULT;
Multiple threads imply multiple connections. Although getting a connection is "fast" in MySQL, it is not instantaneous. I do not know the relative speed of getting a connection versus running a query, but I doubt your multi-threaded idea will win.
Could you show us examples of the actual queries (SQL, not Python code) you need to run? We may have suggestions on combining queries, improved indexes, etc. Please provide the SHOW CREATE TABLE output, too. (You mentioned a CREATE INDEX, but it is useless out of context.)
It looks like you are doing a multi-step process that could be collapsed into INSERT ... ON DUPLICATE KEY UPDATE ....
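A hedged sketch of what that could look like (the odds column and the Python variables are hypothetical placeholders; it also requires a UNIQUE index on (bookid, datalinkid), not just the plain index you created):

sql = ("INSERT INTO oddsdata (bookid, datalinkid, odds) "
       "VALUES (%s, %s, %s) "
       "ON DUPLICATE KEY UPDATE odds = VALUES(odds)")
cur.execute(sql, (bookid, datalinkid, odds))

That single statement replaces the update-then-insert-if-no-match round trip, and it is atomic on the server side.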
Using Flask, I'm curious to know if SQLAlchemy is still the best way to go for querying my database with raw SQL (a direct SELECT x FROM table WHERE ...) instead of using the ORM, or if there is a simpler yet powerful alternative.
Thanks for your replies.
I use SQLAlchemy for direct queries all the time.
Primary advantage: it gives you the best protection against SQL injection attacks. SQLAlchemy does the Right Thing whatever parameters you throw at it.
I find it works wonders for adjusting the generated SQL based on conditions as well. Displaying a result set with multiple filter controls above it? Just build your query in a set of if/elif/else constructs and you know your SQL will be golden still.
Here is an excerpt from some live code (older SA version, so syntax could differ a little):
# Pull start and end dates from form
# ...
# Build a constraint if `start` and / or `end` have been set.
created = None
if start and end:
    created = sa.sql.between(msg.c.create_time_stamp,
                             start.replace(hour=0, minute=0, second=0),
                             end.replace(hour=23, minute=59, second=59))
elif start:
    created = (msg.c.create_time_stamp >=
               start.replace(hour=0, minute=0, second=0))
elif end:
    created = (msg.c.create_time_stamp <=
               end.replace(hour=23, minute=59, second=59))

# More complex `from_` object built here, elided for example
# [...]

# Final query build
query = sa.select([unit.c.eli_uid], from_obj=[from_])
query = query.column(count(msg.c.id).label('sent'))
query = query.where(current_store)
if created:
    query = query.where(created)
The code where this comes from is a lot more complex, but I wanted to highlight the date range code here. If I had to build the SQL using string formatting, I'd probably have introduced a SQL injection hole somewhere as it is much easier to forget to quote values.
After I worked on a small project of mine, I decided to try just using MySQLdb, without SQLAlchemy.
It works fine and it's quite easy to use. Here's an example (I created a small class that handles all the database work):
import MySQLdb
from MySQLdb.cursors import DictCursor

class DatabaseBridge():
    def __init__(self, *args, **kwargs):
        kwargs['cursorclass'] = DictCursor
        self.cnx = MySQLdb.connect(**kwargs)
        self.cnx.autocommit(True)
        self.cursor = self.cnx.cursor()

    def query_all(self, query, *args):
        self.cursor.execute(query, *args)
        return self.cursor.fetchall()

    def find_unique(self, query, *args):
        rows = self.query_all(query, *args)
        if len(rows) == 1:
            return rows[0]
        return None

    def execute(self, query, params):
        self.cursor.execute(query, params)
        return self.cursor.rowcount

    def get_last_id(self):
        return self.cnx.insert_id()

    def close(self):
        self.cursor.close()
        self.cnx.close()

database = DatabaseBridge(**{
    'user': 'user',
    'passwd': 'password',
    'db': 'my_db'
})

rows = database.query_all("SELECT id, name, email FROM users WHERE is_active = %s AND project = %s", (1, "My First Project"))
(It's a dumb example).
It works like a charm, BUT you have to take these into consideration:
Multithreading is not supported! That's OK as long as you don't work with threads or multiprocessing in Python.
You won't have all the advantages of SQLAlchemy (e.g. the database-to-class (model) mapping and query generation (select, where, order_by, etc.)). This is the key point in deciding how you want to work with your database.
But, on the other hand, and like SQLAlchemy, there is protection against SQL injection attacks:
A basic query would be like this:
cursor.execute("SELECT * FROM users WHERE data = %s" % "Some value")  # THIS IS DANGEROUS
But you should do:
cursor.execute("SELECT * FROM users WHERE data = %s", ("Some value",))  # This is secure!
See the difference? Read again ;)
The difference is that I replaced % with ,: we pass the values as arguments to execute(), and the driver escapes them. When using %, the string is formatted before it ever reaches the database, so the arguments aren't escaped, enabling SQL injection attacks!
The final word here is that it depends on your usage and what you plan to do with your project. For me, SQLAlchemy was overkill (it's a basic shell script!), so MySQLdb was perfect.
I made a table using SQLAlchemy and forgot to add a column. I basically want to do this:
users.addColumn('user_id', ForeignKey('users.user_id'))
What's the syntax for this? I couldn't find it in the docs.
I have the same problem, and the thought of using a migration library just for this trivial thing makes me tremble. Anyway, this is my attempt so far:
from sqlalchemy import Column, String

def add_column(engine, table_name, column):
    column_name = column.compile(dialect=engine.dialect)
    column_type = column.type.compile(engine.dialect)
    engine.execute('ALTER TABLE %s ADD COLUMN %s %s' % (table_name, column_name, column_type))

column = Column('new_column_name', String(100), primary_key=True)
add_column(engine, table_name, column)
Still, I don't know how to get primary_key=True into the raw SQL request.
This is referred to as database migration (SQLAlchemy doesn't support migrations out of the box). You can look at using sqlalchemy-migrate to help in these kinds of situations, or you can just ALTER TABLE through your chosen database's command-line utility.
See this section of the SQLAlchemy documentation: http://docs.sqlalchemy.org/en/latest/core/metadata.html#altering-schemas-through-migrations
Alembic is the latest software to offer this type of functionality and is made by the same author as SQLAlchemy.
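For reference, a minimal Alembic migration for the column from the question might look like this (a sketch; it assumes an already configured Alembic environment, and the file name is hypothetical):

# versions/xxxx_add_user_id_to_users.py
from alembic import op
import sqlalchemy as sa

def upgrade():
    # Add the forgotten self-referencing foreign-key column.
    op.add_column('users', sa.Column('user_id', sa.Integer, sa.ForeignKey('users.user_id')))

def downgrade():
    op.drop_column('users', 'user_id')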
I have a database called "ncaaf.db" built with sqlite3 and a table called "games". So I would cd into the same directory at my Linux command prompt and run
sqlite3 ncaaf.db
alter table games add column q4 float;
and that is all it takes! Just make sure you update your definitions in your sqlalchemy code.
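The matching change on the SQLAlchemy side might look like this (a sketch; the rest of the games table definition is assumed):

from sqlalchemy import Table, Column, Float, MetaData

metadata = MetaData()
games = Table('games', metadata,
              # ... existing column definitions ...
              Column('q4', Float))  # mirrors the column added in the sqlite3 shell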
from sqlalchemy import create_engine
engine = create_engine('sqlite:///db.sqlite3')
engine.execute('alter table table_name add column column_name String')
I had the same problem, so I ended up just writing my own function in raw SQL. If you are using SQLite3, this might be useful.
If you then add the column to your class definition at the same time, it seems to do the trick.
import sqlite3

def add_column(database_name, table_name, column_name, data_type):
    connection = sqlite3.connect(database_name)
    cursor = connection.cursor()
    if data_type == "Integer":
        data_type_formatted = "INTEGER"
    elif data_type == "String":
        data_type_formatted = "VARCHAR(100)"
    base_command = "ALTER TABLE '{table_name}' ADD COLUMN '{column_name}' '{data_type}'"
    sql_command = base_command.format(table_name=table_name, column_name=column_name, data_type=data_type_formatted)
    cursor.execute(sql_command)
    connection.commit()
    connection.close()
I recently had this same issue, so I took a pointer from AlexP's earlier answer. The problem was in getting the new column into my program's metadata. Using SQLAlchemy's append_column functionality had some unexpected downstream effects ('str' object has no attribute 'dialect impl'). I corrected this by adding the column with DDL (a MySQL database in this case) and then reflecting the table back from the DB into my metadata.
The code is roughly as follows (modified slightly from what I have in order to reduce it to its minimal essence; I apologize for any mistakes - if present, they should be minor):
try:
    # Use back quotes as a protection against SQL injection attacks. Can we do more?
    common.qry_engine.execute('ALTER TABLE %s ADD COLUMN %s %s' %
                              ('`' + self.tbl.schema + '`.`' + self.tbl.name + '`',
                               '`' + self.outputs[new_col] + '`', 'VARCHAR(50)'))
except exc.SQLAlchemyError as msg:
    raise GRError(desc='Unable to physically add derived column to table. Contact support.',
                  data=str(self.outputs), other_info=str(msg))

try:  # Refresh the metadata to show the new column
    self.tbl = sqlalchemy.Table(self.tbl.name, self.tbl.metadata, extend_existing=True, autoload=True)
except exc.SQLAlchemyError as msg:
    raise GRError(desc='Unable to establish metadata for new column. Contact support.',
                  data=str(self.outputs), other_info=str(msg))
Yes, you can.
Install sqlalchemy-migrate (pip install sqlalchemy-migrate) and use its Table and Column classes in your script, calling the column's create() method:
from sqlalchemy import String, MetaData, create_engine
from migrate.versioning.schema import Table, Column

db_engine = create_engine(app.config.get('SQLALCHEMY_DATABASE_URI'))
db_meta = MetaData(bind=db_engine)
table = Table('table_name', db_meta)
col = Column('new_column_name', String(20), default='foo')
col.create(table)
Just continuing the simple approach proposed by chasmani, with a little improvement:
'''
simple migration
columns to add:
    last_status_change = Column(BigInteger, default=None)
    last_complete_phase = Column(String, default=None)
    complete_percentage = Column(DECIMAL, default=0.0)
'''
import sqlite3

from config import APP_STATUS_DB
from sqlalchemy import types

def add_column(database_name: str, table_name: str, column_name: str, data_type: types, default=None):
    ret = False
    # Quote the default value in the DDL unless it is numeric.
    if default is not None:
        try:
            float(default)
            ddl = "ALTER TABLE '{table_name}' ADD COLUMN '{column_name}' '{data_type}' DEFAULT {default}"
        except (ValueError, TypeError):
            ddl = "ALTER TABLE '{table_name}' ADD COLUMN '{column_name}' '{data_type}' DEFAULT '{default}'"
    else:
        ddl = "ALTER TABLE '{table_name}' ADD COLUMN '{column_name}' '{data_type}'"
    sql_command = ddl.format(table_name=table_name, column_name=column_name,
                             data_type=data_type.__name__, default=default)
    try:
        connection = sqlite3.connect(database_name)
        cursor = connection.cursor()
        cursor.execute(sql_command)
        connection.commit()
        connection.close()
        ret = True
    except Exception as e:
        print(e)
        ret = False
    return ret

add_column(APP_STATUS_DB, 'procedures', 'last_status_change', types.BigInteger)
add_column(APP_STATUS_DB, 'procedures', 'last_complete_phase', types.String)
add_column(APP_STATUS_DB, 'procedures', 'complete_percentage', types.DECIMAL, 0.0)
If you are using Docker:
1. Go to the terminal of the container holding your DB.
2. Get into the db: psql -U usr [YOUR_DB_NAME]
3. Now you can alter tables using raw SQL: alter table [TABLE_NAME] add column [COLUMN_NAME] [TYPE]
Note that you will need to have mounted your DB for the changes to persist between builds.
Adding the column "manually" (not using Python or SQLAlchemy) is perhaps the easiest?
Same problem over here. What I will do is iterate over the db and add each entry to a new database with the extra column, then delete the old db and rename the new one to take its place.