I am trying to select all the records from a sqlite db I have with sqlalchemy, loop over each one, and do an update on it. I am doing this because I need to reformat every record in my name column.
Here is the code I am using to do a simple test:
def loadDb(name):
    sqlite3.connect(name)
    engine = create_engine('sqlite:///' + dbPath(), echo=False)
    metadata = MetaData(bind=engine)
    return metadata

db = database("dealers.db")
metadata = db.loadDb()

dealers = Table('dealers', metadata, autoload=True)
dealer = dealers.select().order_by(asc(dealers.c.id)).execute()

for d in dealer:
    u = dealers.update(dealers.c.id == d.id)
    u.execute(name="hi")
    break
I'm getting the error:
sqlalchemy.exc.OperationalError: (OperationalError) database table is locked u'UPDATE dealers SET name=? WHERE dealers.id = ?' ('hi', 1)
I'm very new to sqlalchemy and I'm not sure what this error means or how to fix it. This seems like it should be a really simple task, so I know I am doing something wrong.
With SQLite, you can't update the database while you are still performing the select. You need to force the select query to finish and store all of the data, then perform your loop. I think this would do the job (untested):
dealer = list(dealers.select().order_by(asc(dealers.c.id)).execute())
Another option would be to make a slightly more complicated SQL statement so that the loop executes inside the database instead of in Python. That will certainly give you a big performance boost.
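For example, if the name reformatting can be expressed in SQL (upper-casing here is only a stand-in; the actual transformation you need is an assumption on my part), the whole loop collapses into one in-database UPDATE. A minimal sketch using the same bound Table:

from sqlalchemy import func

# Single UPDATE executed inside SQLite; func.upper is only an example
# transformation, substitute whatever reformatting your names need.
u = dealers.update().values(name=func.upper(dealers.c.name))
u.execute()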
I am executing a number of SELECT queries on a postgres database using psycopg2, but I am getting ERROR: Out of shared memory. It suggests that I should increase max_locks_per_transaction, but this confuses me because each SELECT query is operating on only one table, and max_locks_per_transaction is already set to 512, 8 times the default.
I am using TimescaleDB, which could be the reason for a larger than normal number of locks (one for each chunk rather than one for each table, maybe), but this still can't explain running out when so many are allowed. I'm assuming what is happening here is that all the queries are being run as part of one transaction.
The code I am using looks something like the following.
db = DatabaseConnector(**connection_params)
tables = db.get_table_list()
for table in tables:
    result = db.query(f"""SELECT a, b, COUNT(c) FROM {table} GROUP BY a, b""")
    print(result)
Where db.query is defined as:
def query(self, sql):
    with self._connection.cursor() as cur:
        cur.execute(sql)
        return_value = cur.fetchall()
    return return_value
and self._connection is:
self._connection = psycopg2.connect(**connection_params)
Do I need to explicitly end the transaction in some way to free up locks? And how can I go about doing this in psycopg2? I would have assumed that there was an implicit end to the transaction when the cursor is closed on __exit__. I know that if I were inserting or deleting rows I would use COMMIT at the end, but it seems strange to commit when I am not changing the table.
UPDATE: When I explicitly open and close the connection in the loop, the error does not show. However, I assume there is a better way to end the transaction after each SELECT than this.
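For reference, a sketch of one way to release the locks without reopening the connection each time (assuming the same query helper as above): closing the cursor does not end the transaction in psycopg2; the connection keeps the transaction, and its locks, open until commit() or rollback() is called. Either enable autocommit or end the transaction after each fetch:

# Option 1: don't hold a transaction open between statements.
self._connection.autocommit = True

# Option 2: end the implicit transaction after each SELECT so its locks are released.
def query(self, sql):
    with self._connection.cursor() as cur:
        cur.execute(sql)
        return_value = cur.fetchall()
    self._connection.commit()  # rollback() works just as well for a read-only query
    return return_value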
I've been running a Jupyter Notebook with Python3 code for quite some time. It uses a combination of pyodbc and SQLAlchemy to connect to a few SQL Server databases on my company intranet. The purpose of the notebook is to pull data from an initial SQL Server database and store it in memory as a Pandas dataframe. The file then extracts specific values from one of the columns and sends that list of values through to two different SQL Server databases to pull back a mapping list.
All of this has been working great until I decided to rewrite the raw SQL queries as SQLAlchemy Core statements. I've gone through the process of validating that the SQLAlchemy queries compile to match the raw SQL queries. However, the queries run unimaginably slowly. For instance, the initial raw SQL query runs in 25 seconds and the same query rewritten in SQLAlchemy Core runs in 15 minutes! The remaining queries didn't finish even after letting them run for 2 hours.
This could have something to do with how I'm reflecting the existing tables. I even took some time to override the ForeignKey and primary_key on the tables hoping that'd help improve performance. No dice.
I also know "if it ain't broke, don't fix it." But SQLAlchemy just looks so much nicer than a nasty block of hard coded SQL.
Can anyone explain why the SQLAlchemy queries are running so slowly? The SQLAlchemy docs don't give much insight. I'm running SQLAlchemy version 1.2.11.
import sqlalchemy
sqlalchemy.__version__
'1.2.11'
Here are the relevant lines. I'm excluding the imports for brevity, but in case anyone needs to see them I'll be happy to supply them.
engine_dr2 = create_engine("mssql+pyodbc://{}:{}@SER-DR2".format(usr, pwd))
conn = engine_dr2.connect()
metadata_dr2 = MetaData()
bv = Table('BarVisits', metadata_dr2, autoload=True, autoload_with=engine_dr2, schema='livecsdb.dbo')
bb = Table('BarBillsUB92Claims', metadata_dr2, autoload=True, autoload_with=engine_dr2, schema='livecsdb.dbo')
mask = data['UCRN'].str[:2].isin(['DA', 'DB', 'DC'])
dr2 = data.loc[mask, 'UCRN'].unique().tolist()
sql_dr2 = select([bv.c.AccountNumber.label('Account_Number'),
                  bb.c.UniqueClaimReferenceNumber.label('UCRN')])
sql_dr2 = sql_dr2.select_from(bv.join(bb, and_(bb.c.SourceID == bv.c.SourceID,
                                               bb.c.BillingID == bv.c.BillingID)))
sql_dr2 = sql_dr2.where(bb.c.UniqueClaimReferenceNumber.in_(dr2))
mapping_list = pd.read_sql(sql_dr2, conn)
conn.close()
The raw SQL query that should match sql_dr2 and runs lickety split is here:
"""SELECT Account_Number = z.AccountNumber, UCRN = y.UniqueClaimReferenceNumber
FROM livecsdb.dbo.BarVisits z
INNER JOIN
livecsdb.dbo.BarBillsUB92Claims y
ON
y.SourceID = z.SourceID
AND
y.BillingID = z.BillingID
WHERE
y.UniqueClaimReferenceNumber IN ({0})""".format(', '.join(["'" + acct + "'" for acct in dr2]))
The list dr2 typically contains upwards of 70,000 elements. Again, the raw SQL handles this in one minute or less. The SQLAlchemy rewrite has been running for 8+ hours now and still not done.
Update
Additional information is provided below. I don't own the database or the tables, and they contain protected health information, so it's not something I can share directly, but I'll see about making some mock data.
tables = ['BarVisits', 'BarBillsUB92Claims']

for t in tables:
    print(insp.get_foreign_keys(t))
# []
# []

for t in tables:
    print(insp.get_indexes(t))
# [{'name': 'BarVisits_SourceVisit', 'unique': False, 'column_names': ['SourceID', 'VisitID']}]
# []

for t in tables:
    print(insp.get_pk_constraint(t))
# {'constrained_columns': ['BillingID', 'SourceID'], 'name': 'mtpk_visits'}
# {'constrained_columns': ['BillingID', 'BillNumberID', 'ClaimID', 'ClaimInsuranceID', 'SourceID'], 'name': 'mtpk_insclaim'}
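(For context: insp isn't defined in the snippet above; presumably it is SQLAlchemy's inspector built from the same engine, something like the sketch below.)

from sqlalchemy import inspect

# Assumption: the inspector used above was created from the same engine.
insp = inspect(engine_dr2)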
Thanks in advance for any insight.
I figured out how to make the query run fast but have no idea why it's needed with this server and not any others.
Taking
sql_dr2 = str(sql_dr2.compile(dialect=mssql.dialect(), compile_kwargs={"literal_binds": True}))
and sending that through
pd.read_sql(sql_dr2, conn)
performs the query in about 2 seconds.
Again, I have no idea why that works but it does.
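One plausible explanation (my assumption, not something confirmed above): with roughly 70,000 values, in_(dr2) compiles to roughly 70,000 bound parameters, and shipping that many parameters through pyodbc is dramatically slower than inlining them as literals, which is exactly what literal_binds does. If you would rather keep bound parameters, a hedged alternative is to run the query in batches and concatenate the results:

import pandas as pd

# Sketch: batch the IN list so each statement carries a modest number of
# bound parameters; chunk_size is an arbitrary choice.
chunk_size = 1000
frames = []
for i in range(0, len(dr2), chunk_size):
    chunk = dr2[i:i + chunk_size]
    stmt = (select([bv.c.AccountNumber.label('Account_Number'),
                    bb.c.UniqueClaimReferenceNumber.label('UCRN')])
            .select_from(bv.join(bb, and_(bb.c.SourceID == bv.c.SourceID,
                                          bb.c.BillingID == bv.c.BillingID)))
            .where(bb.c.UniqueClaimReferenceNumber.in_(chunk)))
    frames.append(pd.read_sql(stmt, conn))
mapping_list = pd.concat(frames, ignore_index=True)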
I'm connecting to an Oracle database from sqlalchemy and I want to know when the tables in the database were created. I can access this information through the SQL Developer application, so I know that it is stored somewhere, but I don't know if it's possible to get this information from sqlalchemy.
Also, if it's not possible, how should I be getting it?
SQLAlchemy doesn't provide anything to help you get that information. You have to query the database yourself, with something like:
with engine.begin() as c:
    result = c.execute("""
        SELECT created
        FROM dba_objects
        WHERE object_name = <<your table name>>
          AND object_type = 'TABLE'
    """)
I'm trying to populate a couple databases with psycopg2 within a server I am not the root user of (don't know if it's relevant or not). My code looks like
import json
from psycopg2 import connect
cors = connect(user='jungal01', dbname='course')
req = connect(user="jungal01", dbname='requirement')
core = cors.cursor()
reqs = req.cursor()
with open('gened.json') as gens:
    geneds = json.load(gens)

for i in range(len(geneds)):
    core.execute('''insert into course (number, description, title)
                    values({0}, {1}, {2});'''.format(geneds[i]["number"], geneds[i]['description'], geneds[i]['title']))
    reqs.execute('''insert into requirement (fulfills)
                    values({0});'''.format(geneds[i]['fulfills']))

db.commit()
When I execute the code, I get the above psycopg2 error. I know that these particular databases exist, but I just can't figure out why it won't connect to my databases. (Side quest: I am also unsure about that commit statement. Should it be in the for loop, or outside of it? Is it supposed to be database specific?)
First, db is not a defined variable, so your code shouldn't run completely anyway.
You said: "\list on this server is a bunch of databases full of usernames, of which my username is one."
Then the following is how you should connect: to a database, not a table. The regular pattern is to give the database name and then the user/pass.
A "schema" is a loose term in relational databases. Both tables and databases have schemas, but you seem to be expecting to connect to a table, not a database.
So, try this code with an attempt at fixing your indentation and SQL injection problem -- See this documentation
Note that you first must have created the two tables in the database you are connecting to.
import json
from psycopg2 import connect
username = 'jungal01'
conn = connect(dbname=username, user=username)
cur = conn.cursor()
with open('gened.json') as gens:
    geneds = json.load(gens)

for g in geneds:
    cur.execute('''insert into course (number, description, title)
                   values(%(number)s, %(description)s, %(title)s);''', g)
    cur.execute('''insert into requirement (fulfills)
                   values(%(fulfills)s);''', g)

conn.commit()
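If the two tables don't exist yet, something along these lines would create them; the column types are assumptions based on the JSON fields, so adjust as needed:

# Assumed column types; adjust to match your actual data.
cur.execute('''create table if not exists course (
                   number      text,
                   description text,
                   title       text
               );''')
cur.execute('''create table if not exists requirement (
                   fulfills    text
               );''')
conn.commit()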
Allen, you said: "in postgres, tables are databases." That's wrong. Your error message results from this misunderstanding. You want to connect to a database, and insert into a table that exists in that database. You're trying to insert into a database -- a nonsensical operation.
Make sure you are giving the catalog name as the database name, not a schema under the catalog.
The term "catalog" is confusing and quite unnecessary. More details here: What's the difference between a catalog and a schema in a relational database?
I'm trying to get the last inserted row id from an sqlalchemy insert with sqlite. It appears this should be easy, but I haven't managed to figure it out yet and didn't find anything in my searches so far. I'm new to python and I know there are some similar posts, so I hope this is not a repeat. Below is a simple sample script:
from sqlalchemy import *
engine = create_engine('sqlite:///myTest.db', echo=False)
metadata = MetaData()
dbTable1 = Table('dbTable1', metadata,
                 Column('ThisNum', Integer),
                 Column('ThisString', String))
metadata.create_all(engine)
dbConn = engine.connect()

insertList = {}
insertList['ThisNum'] = 1
insertList['ThisString'] = 'test'

insertStatement = dbTable1.insert().values(insertList)
lastInsertID = dbConn.execute(insertStatement).inserted_primary_key
The value returned is empty. I get the same result using
lastInsertID = dbConn.execute(insertStatement).last_inserted_ids()
I can get the last rowid using a separate statement after the insert:
lastInsertID = dbConn.execute('SELECT last_insert_rowid();')
But this would not guarantee the database had not been accessed in between the executions, so the returned ID might not be correct. Lastly, I tried executing the insert and select statements in one execution, for instance:
lastInsertID = dbConn.execute('INSERT INTO "dbTable1" ("ThisNum", "ThisString") VALUES (1, "test"); SELECT last_insert_rowid();')
But this gives the error: sqlite3.Warning: You can only execute one statement at a time.
Any help would be appreciated. Thanks.
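For what it's worth, a hedged sketch of one likely fix: inserted_primary_key comes back empty here most probably because dbTable1 declares no primary key column; with an explicit primary key (the column name 'id' below is just an illustration), the same call returns the id generated by that exact INSERT, with no race against other connections:

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine('sqlite:///myTest.db', echo=False)
metadata = MetaData()

# Same table as above plus an explicit primary key column (name is illustrative).
dbTable1 = Table('dbTable1', metadata,
                 Column('id', Integer, primary_key=True),
                 Column('ThisNum', Integer),
                 Column('ThisString', String))
metadata.create_all(engine)

dbConn = engine.connect()
result = dbConn.execute(dbTable1.insert().values(ThisNum=1, ThisString='test'))
lastInsertID = result.inserted_primary_key[0]  # id generated by this INSERT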