I'm attempting to update around 500k rows in a SQLite database. I can create them rather quickly, but the update appears to hang indefinitely without any error message. (An insert of the same size took 35 seconds; this update has been running for over 12 hours.)
The portion of my code that does the updating is:
for line in result:
    if --- blah blah blah ---:
        stuff
    else:
        counter = 1
        print("Starting to append result_list...")
        result_list = []
        for line in result:
            result_list.append((str(line),counter))
            counter += 1
        sql = 'UPDATE BRFSS2015 SET ' + col[1] + \
        ' = ? where row_id = ?'
        print("Executing SQL...")
        c.executemany(sql, result_list)
    print("Committing.")
    conn.commit()
It prints "Executing SQL..." and presumably attempts the executemany, and that's where it's stuck. The variable "result" is a list of records and works as far as I can tell, because the insert statement works and is basically the same.
Am I misusing executemany? I see many threads on executemany(), but all of them as far as I can tell are getting an error message, not just hanging indefinitely.
For reference, the full code I have is below. Basically I'm trying to convert an ASCII file to a sqlite database. I know I could technically insert all columns at the same time, but the machines I have access to are all limited to 32bit Python and they run out of memory (this file is quite large, close to 1GB of text).
import pandas as pd
import sqlite3
ascii_file = r'c:\Path\to\file.ASC_'
sqlite_file = r'c:\path\to\sqlite.db'
conn = sqlite3.connect(sqlite_file)
c = conn.cursor()
# Taken from https://www.cdc.gov/brfss/annual_data/2015/llcp_varlayout_15_onecolumn.html
raw_list = [[1,"_STATE",2],
[17,"FMONTH",2],
... many other values here
[2154,"_AIDTST3",1],]
col_list = []
for col in raw_list:
    begin = (col[0] - 1)
    col_name = col[1]
    end = (begin + col[2])
    col_list.append([(begin, end,), col_name,])
for col in col_list:
    print(col)
    col_specification = [col[0]]
    print("Parsing...")
    data = pd.read_fwf(ascii_file, colspecs=col_specification)
    print("Done")
    result = data.iloc[:,[0]]
    result = result.values.flatten()
    sql = '''CREATE table if not exists BRFSS2015
    (row_id integer NOT NULL,
    ''' + col[1] + ' text)'
    print(sql)
    c.execute(sql)
    conn.commit()
    sql = '''ALTER TABLE
    BRFSS2015 ADD COLUMN ''' + col[1] + ' text'
    try:
        c.execute(sql)
        print(sql)
        conn.commit()
    except Exception as e:
        print("Error Happened instead")
        print(e)
    counter = 1
    result_list = []
    for line in result:
        result_list.append((counter, str(line)))
        counter += 1
    if '_STATE' in col:
        counter = 1
        result_list = []
        for line in result:
            result_list.append((counter, str(line)))
            counter += 1
        sql = 'INSERT into BRFSS2015 (row_id,' + col[1] + ')'\
        + 'values (?,?)'
        c.executemany(sql, result_list)
    else:
        counter = 1
        print("Starting to append result_list...")
        result_list = []
        for line in result:
            result_list.append((str(line),counter))
            counter += 1
        sql = 'UPDATE BRFSS2015 SET ' + col[1] + \
        ' = ? where row_id = ?'
        print("Executing SQL...")
        c.executemany(sql, result_list)
    print("Committing.")
    conn.commit()
    print("Comitted... moving on to next column...")
For each row to be updated, the database has to search for that row. (This is not necessary when inserting.) If there is no index on the row_id column, then the database has to go through the entire table for each update.
It would be a better idea to insert entire rows at once. If that is not possible, create an index on row_id, or better, declare it as INTEGER PRIMARY KEY.
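As a rough illustration of the second suggestion, here is a minimal sketch. It assumes an in-memory database and a made-up table `demo` with columns `a` and `b` (not the asker's real schema), and a much smaller row count. Declaring row_id as INTEGER PRIMARY KEY makes it an alias for SQLite's rowid, so each UPDATE ... WHERE row_id = ? is a direct lookup instead of a table scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
c = conn.cursor()

# INTEGER PRIMARY KEY makes row_id an alias for SQLite's internal rowid,
# so a lookup by row_id does not need to scan the whole table.
c.execute("CREATE TABLE demo (row_id INTEGER PRIMARY KEY, a TEXT, b TEXT)")

n_rows = 1000  # hypothetical row count, far smaller than the real file

# First column: plain inserts, as in the question.
c.executemany("INSERT INTO demo (row_id, a) VALUES (?, ?)",
              [(i, "a%d" % i) for i in range(1, n_rows + 1)])

# Second column: one UPDATE per row, each located through the primary key.
c.executemany("UPDATE demo SET b = ? WHERE row_id = ?",
              [("b%d" % i, i) for i in range(1, n_rows + 1)])
conn.commit()

filled = c.execute("SELECT COUNT(*) FROM demo WHERE b IS NOT NULL").fetchone()[0]
print(filled)
```

For a table that already exists without a primary key, creating an index on row_id before running the updates should have a similar effect.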
I currently have a list of approximately 10,000 ids. I need to update all rows in a MySQL table whose id is in the inactive_ids list shown below, setting their Active column to 'No'.
I am using the mysql.connector Python library.
When I run the code below, each iteration of the for loop takes about 0.7 seconds. That's about a 2 hour run time for all 10,000 ids. Is there a quicker way to do this?
# inactive_ids are unique strings something like shown below
# inactive_ids = ['a9okeoko', 'sdfhreaa', 'xsdfasy', ..., 'asdfad']
# initialize connection
mydb = mysql.connector.connect(
user="REMOVED",
password="REMOVED",
host="REMOVED",
database="REMOVED"
)
# initialize cursor
mycursor = mydb.cursor(buffered=True)
# Function to execute multiple lines
def alter(state, msg, count):
    result = mycursor.execute(state, multi=True)
    result.send(None)
    print(str(count), ': ', msg, result)
    count += 1
    return count
# Try to execute, throw exception if fails
try:
    count = 0
    for Id in inactive_ids:
        # SAVE THE QUERY AS STRING
        sql_update = "UPDATE test_table SET Active = 'No' WHERE NoticeId = '" + Id + "'"
        # ALTER
        count = alter(sql_update, "done", count)
    # commits all changes to the database
    mydb.commit()
except Exception as e:
    mydb.rollback()
    raise e
Do it with a single query that uses IN (...) instead of multiple queries.
placeholders = ','.join(['%s'] * len(inactive_ids))
sql_update = f"""
UPDATE test_table
SET Active = 'No'
WHERE NoticeId IN ({placeholders})
"""
mycursor.execute(sql_update, inactive_ids)
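If the id list grows much larger, a single IN (...) list may run into statement-size or parameter-count limits, so it can help to send it in batches. The sketch below shows that idea under some assumptions: it uses the standard-library sqlite3 module as a stand-in so that it runs on its own (sqlite3 uses ? placeholders where mysql.connector uses %s), and the helper `chunked`, the batch size of 500, and the generated ids are all made up for illustration.

```python
import sqlite3

def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
cur = conn.cursor()
cur.execute("CREATE TABLE test_table (NoticeId TEXT PRIMARY KEY, Active TEXT)")

all_ids = ["id%05d" % i for i in range(2000)]  # made-up ids
cur.executemany("INSERT INTO test_table VALUES (?, 'Yes')", [(i,) for i in all_ids])

inactive_ids = all_ids[:1200]  # made-up subset to deactivate

for batch in chunked(inactive_ids, 500):
    # One placeholder per id in this batch ("%s" instead of "?" with mysql.connector).
    placeholders = ",".join(["?"] * len(batch))
    cur.execute(
        "UPDATE test_table SET Active = 'No' WHERE NoticeId IN (%s)" % placeholders,
        batch,
    )
conn.commit()  # a single commit after all batches

inactive_count = cur.execute(
    "SELECT COUNT(*) FROM test_table WHERE Active = 'No'").fetchone()[0]
print(inactive_count)
```

Only the placeholder string is interpolated into the SQL text; the ids themselves are still passed as parameters.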
I am pretty new to Python development. I have a long Python script that "clones" a database and adds additional stored functions and procedures. Cloning here means copying only the schema of the DB. These steps work fine.
My question is about executing inserts with pymysql:
I have to copy the contents of some tables into the new DB. I don't get any SQL error. If I debug or print the generated INSERT INTO command, it is correct (I've tested it in an SQL editor). The insert seems to execute correctly, because the result reports the exact row count, but all rows are missing from the destination table in the destination DB.
(Of course, the DB_* variables have been defined!)
import pymysql
liveDbConn = pymysql.connect(DB_HOST, DB_USER, DB_PWD, LIVE_DB_NAME)
testDbConn = pymysql.connect(DB_HOST, DB_USER, DB_PWD, TEST_DB_NAME)
tablesForCopy = ['role', 'permission']
for table in tablesForCopy:
    with liveDbConn.cursor() as liveCursor:
        # Get name of columns
        liveCursor.execute("DESCRIBE `%s`;" % (table))
        columns = '';
        for column in liveCursor.fetchall():
            columns += '`' + column[0] + '`,'
        columns = columns.strip(',')
        # Get and convert values
        values = ''
        liveCursor.execute("SELECT * FROM `%s`;" % (table))
        for result in liveCursor.fetchall():
            data = []
            for item in result:
                if type(item)==type(None):
                    data.append('NULL')
                elif type(item)==type('str'):
                    data.append("'"+item+"'")
                elif type(item)==type(datetime.datetime.now()):
                    data.append("'"+str(item)+"'")
                else: # for numeric values
                    data.append(str(item))
            v = '(' + ', '.join(data) + ')'
            values += v + ', '
        values = values.strip(', ')
        print("### table: %s" % (table))
        testDbCursor = testDbConn.cursor()
        testDbCursor.execute("INSERT INTO `" + TEST_DB_NAME + "`.`" + table + "` (" + columns + ") VALUES " + values + ";")
        print("Result: {}".format(testDbCursor._result.message))
liveDbConn.close()
testDbConn.close()
Result is:
### table: role
Result: b"'Records: 16 Duplicates: 0 Warnings: 0"
### table: permission
Result: b'(Records: 222 Duplicates: 0 Warnings: 0'
What am I doing wrong? Thanks!
You have 2 main issues here:
You don't call conn.commit() (here that would be testDbConn.commit(), since the inserts go through testDbConn). Changes to the database are not persisted until they are committed. Statements that modify data need a commit; a plain SELECT does not.
Your query is open to SQL Injection. This is a serious problem.
Table names cannot be parameterized, so there's not much we can do about that, but you'll want to parameterize your values. I've made multiple corrections to the code in relation to type checking as well as parameterization.
for table in tablesForCopy:
    with liveDbConn.cursor() as liveCursor:
        liveCursor.execute("SELECT * FROM `%s`;" % (table))
        name_of_columns = [item[0] for item in liveCursor.description]
        insert_list = []
        for result in liveCursor.fetchall():
            data = []
            for item in result:
                if item is None: # test identity against the None singleton
                    data.append(None) # the driver sends None as SQL NULL
                elif isinstance(item, str): # Use isinstance to check type
                    data.append(item)
                elif isinstance(item, datetime.datetime):
                    data.append(item.strftime('%Y-%m-%d %H:%M:%S'))
                else: # for numeric values
                    data.append(str(item))
            insert_list.append(data)
    testDbCursor = testDbConn.cursor()
    # Quote each column name with backticks; the values use plain %s placeholders
    columns = ', '.join('`{}`'.format(name) for name in name_of_columns)
    placeholders = ', '.join(['%s'] * len(name_of_columns))
    testDbCursor.executemany("INSERT INTO `{}`.`{}` ({}) VALUES ({})".format(
        TEST_DB_NAME,
        table,
        columns,
        placeholders),
        insert_list)
    testDbConn.commit()
If you are using psycopg2 (the PostgreSQL driver) rather than pymysql, note from this GitHub thread that its executemany does not work as expected; it sends each entry as a single query. There you'll need to use execute_batch:
from psycopg2.extras import execute_batch
execute_batch(testDbCursor,
"INSERT INTO `{}.{}` ({}) VALUES ({})".format(TEST_DB_NAME,
table,
name_of_columns,
placeholders),
insert_list)
testDbConn.commit()
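To illustrate the commit point from the answer above, here is a small sketch using the standard-library sqlite3 module as a stand-in for pymysql (an assumption made so the example runs without a MySQL server; the table and file names are made up). The insert reports a row count, yet a second connection does not see the row until the first one commits.

```python
import os
import sqlite3
import tempfile

# A throwaway database file so that two separate connections can share it.
db_path = os.path.join(tempfile.mkdtemp(), "commit_demo.db")

writer = sqlite3.connect(db_path)
reader = sqlite3.connect(db_path)

writer.execute("CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT)")
writer.commit()

# The INSERT succeeds and reports a row count, but it is still inside an
# open transaction on the writer connection.
cur = writer.execute("INSERT INTO role (name) VALUES (?)", ("admin",))
rows_reported = cur.rowcount

seen_before_commit = reader.execute("SELECT COUNT(*) FROM role").fetchall()[0][0]
writer.commit()
seen_after_commit = reader.execute("SELECT COUNT(*) FROM role").fetchall()[0][0]

print(rows_reported, seen_before_commit, seen_after_commit)

writer.close()
reader.close()
```

Closing a connection without committing discards the pending changes, which matches the symptom described in the question.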
How to insert data into a table using Python pymysql
Find my solution below
import pymysql
import datetime
# Create a connection object
dbServerName = "127.0.0.1"
port = 8889
dbUser = "root"
dbPassword = ""
dbName = "blog_flask"
# charSet = "utf8mb4"
conn = pymysql.connect(host=dbServerName, user=dbUser, password=dbPassword,db=dbName, port= port)
try:
    # Create a cursor object
    cursor = conn.cursor()
    # Insert rows into the MySQL Table
    now = datetime.datetime.utcnow()
    my_datetime = now.strftime('%Y-%m-%d %H:%M:%S')
    cursor.execute('INSERT INTO posts (post_id, post_title, post_content, \
    filename,post_time) VALUES (%s,%s,%s,%s,%s)',(5,'title2','description2','filename2',my_datetime))
    conn.commit()
except Exception as e:
    print("Exception occurred:{}".format(e))
finally:
    conn.close()
I am moving data from MySQL to MSSQL, but I have a problem with the INSERT INTO statement when a value contains a ' character.
For the export I used the code below:
import pymssql
import mysql.connector
conn = pymssql.connect(host='XXX', user='XXX',
password='XXX', database='XXX')
sqlcursor = conn.cursor()
cnx = mysql.connector.connect(user='root',password='XXX',
database='XXX')
cursor = cnx.cursor()
sql= "SELECT Max(ID) FROM XXX;"
cursor.execute(sql)
row=cursor.fetchall()
maxID = str(row)
maxID = maxID.replace("[(", "")
maxID = maxID.replace(",)]", "")
AMAX = int(maxID)
LC = 1
while LC <= AMAX:
    LCC = str(LC)
    sql= "SELECT * FROM XX where ID ='"+ LCC +"'"
    cursor.execute(sql)
    result = cursor.fetchall()
    data = str(result)
    data = data.replace("[(","")
    data = data.replace(")]","")
    data = data.replace("None","NULL")
    #print(row)
    si = "insert into [XXX].[dbo].[XXX] select " + data
    #print(si)
    #sys.exit("stop")
    try:
        sqlcursor.execute(si)
        conn.commit()
    except Exception:
        print("-----------------------")
        print(si)
    LC = LC + 1
print('Import done | total count:', LC)
It works fine until one of my values contains a ':
'N', '0000000000', **"test string'S nice company"**
I would like to avoid splitting the data into columns and then checking each one for a ', as my table has about 500 fields.
Is there a smart way of replacing ' with ''?
Answer:
Added SET QUOTED_IDENTIFIER OFF to the insert statement:
si = "SET QUOTED_IDENTIFIER OFF insert into [TechAdv].[dbo].[aem_data_copy] select " + data
In MSSQL, with SET QUOTED_IDENTIFIER OFF, double quotes delimit string literals, so a value containing a single quote can be wrapped in double quotes. Inside a single-quoted literal, a single quote is escaped by doubling it.
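A different approach from the accepted answer, which avoids quoting by hand altogether, is to pass each row as query parameters and let the driver handle apostrophes and NULLs. The sketch below assumes the standard-library sqlite3 module as a stand-in for the MSSQL target so it can run on its own; the table `target` and the sample row are made up, and with pymssql the placeholder would be %s rather than ?.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MSSQL target
cur = conn.cursor()
cur.execute("CREATE TABLE target (flag TEXT, code TEXT, company TEXT, note TEXT)")

# A made-up source row containing an apostrophe and a NULL (None).
row = ("N", "0000000000", "test string'S nice company", None)

# One placeholder per column; the driver quotes values and maps None to NULL.
placeholders = ", ".join(["?"] * len(row))  # "%s" instead of "?" for pymssql
cur.execute("INSERT INTO target VALUES (%s)" % placeholders, row)
conn.commit()

stored = cur.execute("SELECT * FROM target").fetchone()
print(stored)
```

Because the placeholder list is built from the row length, this works the same way for a table with about 500 columns, without inspecting individual values.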
I am trying to fetch records at regular intervals from a database table that keeps growing. I am using Python and its pyodbc package to fetch the records. How can I position the cursor just after the last row that was fetched, so that each fetch returns only the newly inserted records?
To explain more,
my table has 100 records and they are fetched.
after an interval the table has 200 records and I want to fetch rows from 101 to 200. And so on.
Is there a way with pyodbc cursor?
Or any other suggestion would be very helpful.
Below is the code I am trying:
#!/usr/bin/python
import pyodbc
import csv
import time
conn_str = (
"DRIVER={PostgreSQL Unicode};"
"DATABASE=postgres;"
"UID=userid;"
"PWD=database;"
"SERVER=localhost;"
"PORT=5432;"
)
conn = pyodbc.connect(conn_str)
cursor = conn.cursor()
def fetch_table(**kwargs):
    qry = kwargs['qrystr']
    try:
        #cursor = conn.cursor()
        cursor.execute(qry)
        all_rows = cursor.fetchall()
        rowcnt = cursor.rowcount
        rownum = cursor.description
        #return (rowcnt, rownum)
        return all_rows
    except pyodbc.ProgrammingError as e:
        print ("Exception occured as :", type(e) , e)
def poll_db():
    for i in [1, 2]:
        stmt = "select * from my_database_table"
        rows = fetch_table(qrystr = stmt)
        print("***** For i = " , i , "******")
        for r in rows:
            print("ROW-> ", r)
        time.sleep(10)
poll_db()
conn.close()
I don't think you can use pyodbc, or any other odbc package, to find "new" rows. But if there is a 'timestamp' column in your database, or if you can add such a column (some databases allow for it to be automatically populated as the time of insertion so you don't have to change the insert queries) then you can change your query to select only the rows whose timestamp is greater than the previous timestamp. And you can keep changing the prev_timestamp variable on each iteration.
import datetime
def poll_db():
    prev_timestamp = None
    for i in [1, 2]:
        if prev_timestamp is None:
            stmt = "select * from my_database_table"
        else:
            # convert your timestamp str to match the database's format;
            # the value must be quoted to be a valid SQL literal
            stmt = "select * from my_database_table where timestamp > '" + str(prev_timestamp) + "'"
        rows = fetch_table(qrystr = stmt)
        prev_timestamp = datetime.datetime.now()
        print("***** For i = " , i , "******")
        for r in rows:
            print("ROW-> ", r)
        time.sleep(10)
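If the table has (or can be given) a monotonically increasing key, a variation on the same idea is to remember the largest key seen so far instead of a timestamp, which avoids depending on the client clock. This is a sketch under assumptions: the table `events`, its columns, and the helper `fetch_new_rows` are hypothetical, and sqlite3 is used as a stand-in for pyodbc so the example is self-contained (both use ? placeholders).

```python
import sqlite3

def fetch_new_rows(conn, last_seen_id):
    """Return rows with id greater than last_seen_id, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    if rows:
        last_seen_id = rows[-1][0]  # largest id fetched so far
    return rows, last_seen_id

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")

# First poll: two rows exist.
conn.executemany("INSERT INTO events (payload) VALUES (?)", [("a",), ("b",)])
first_batch, mark = fetch_new_rows(conn, 0)

# Second poll: only the row inserted since the first poll is returned.
conn.executemany("INSERT INTO events (payload) VALUES (?)", [("c",)])
second_batch, mark = fetch_new_rows(conn, mark)

print(first_batch, second_batch, mark)
```

In a real poller the two calls would sit inside the loop with time.sleep between them, carrying `mark` from one iteration to the next.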
I am trying to create a program using Python 3.1 and Sqlite3. The program will open a text file, read parameters to pass to a select query, and write the result to a text file. I am getting stuck on the cursor.execute(query) statement. I may be doing everything incorrectly. Any help would be appreciated.
import sqlite3
# Connect to database and test
#Make sure the database is in the same folder as the python script folder
conn = sqlite3.connect("nnhs.sqlite3")
if (conn):
    print ("Connection successful")
else:
    print ("Connection not successful")
# Create a cursor to execute SQL queries
cursor = conn.cursor()
# Read data from a file
data = []
infile = open ("patient_in.txt", "r")
for line in infile:
    line = line.rstrip("\n")
    line = line.strip()
    seq = line.split(' ')
    seq[5] = int(seq[5])
    seq = tuple (seq)
    data.append(seq)
infile.close()
# Check that the data has been read correctly
print
print ("Check that the data was read from file")
print (data)
# output file
outfile = open("patient_out.txt", "w")
# select statement
query = "SELECT DISTINCT patients.resnum, patients.facnum, patients.sex, patients.age, patients.rxmed, icd9_1.resnum, icd9_1.code "
query += "from patients "
query += "INNER JOIN icd9 as icd9_1 on (icd9_1.resnum = patients.resnum) AND (icd9_1.code LIKE ':6%') "
query += "INNER JOIN icd9 as icd9_2 on (icd9_2.resnum = patients.resnum) AND (icd9_2.code LIKE ':6%') "
query += "(where patients.age >= :2) AND (patients.age <= :3) "
query += "AND patients.sex = :1 "
query += "AND (patients.rxmed >= :4) AND (patients.rxmed <= :5) "
query += "ORDER BY patients.resnum;"
result = cursor.execute(query)
for row in result:
    ResultNumber = row[0]
    FacNumber = row[1]
    Sex = row[2]
    Age = row[3]
    RxMed = row[4]
    ICDResNum = row[5]
    ICDCode = row[6]
    outfile.write("Patient Id Number: " + str(ResultNumber) + "\t" + " ICD Res Num: " + str(ICDResNum) + "\t" + " Condition: " + str(ICDCode) + "\t" + " Fac ID Num: " + str(FacNumber) + "\t" + " Patient Sex: " + str(Sex) + "\t" + " Patient Age: " + str(Age) + "\t" +" Number of Medications: " + str(RxMed) + "\t" + "\n")
# Close the cursor
cursor.close()
# Close the connection
con.close()
You have read multiple rows of query parameters and stored them in data and then ... nothing. data is a misleading name. Let's call it queries instead.
You presumably want to iterate over queries and perform one query for each row in queries. So do that: for query_row in queries: .....
Also let's rename query to sql.
You'll need result = cursor.execute(sql, query_row)
You'll also need to decide whether you want to have a different output file for each query_row, or have only one file with a field (or sub-heading) to distinguish what info comes from what query_row.
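The loop described above might look roughly like the following sketch. It assumes a cut-down, made-up `patients` table in an in-memory database and uses positional ? placeholders for brevity; the names `queries`, `sql`, and `query_row` follow the renaming suggested in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (resnum INTEGER, sex TEXT, age INTEGER)")
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)",
                 [(1, "F", 70), (2, "M", 82), (3, "F", 91)])

# Each tuple is one set of parameters, as if read from patient_in.txt.
queries = [("F", 65, 80), ("M", 80, 90)]

sql = ("SELECT resnum FROM patients "
       "WHERE sex = ? AND age >= ? AND age <= ? ORDER BY resnum")

# Run the same statement once per parameter row and keep the results apart.
results = {}
for query_row in queries:
    results[query_row] = conn.execute(sql, query_row).fetchall()

print(results)
```

Keying the results by query_row is one way to keep track of which output belongs to which set of parameters if everything goes into a single file.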
Update about parameter passing with sqlite3
It appears not to be documented, but if you use numbered place holders, you can supply a tuple of arguments -- you don't need to supply a dict. The following example presumes a database blah.db with an empty table created by
create table foo (id int, name text, amt int);
>>> import sqlite3
>>> conn = sqlite3.connect('blah.db')
>>> curs = conn.cursor()
>>> sql = 'insert into foo values(:1,:2,:1);'
>>> curs.execute(sql, (42, 'bar'))
<sqlite3.Cursor object at 0x01E3D520>
>>> result = curs.execute('select * from foo;')
>>> print list(result)
[(42, u'bar', 42)]
>>> curs.execute(sql, {'1':42, '2':'bar'})
<sqlite3.Cursor object at 0x01E3D520>
>>> result = curs.execute('select * from foo;')
>>> print list(result)
[(42, u'bar', 42), (42, u'bar', 42)]
>>> curs.execute(sql, {1:42, 2:'bar'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
sqlite3.ProgrammingError: You did not supply a value for binding 1.
>>>
Update 2a You have a bug in this line of your SQL (and the following one):
INNER JOIN icd9 as icd9_1 on (icd9_1.resnum = patients.resnum) AND (icd9_1.code LIKE ':6%')
A placeholder inside a quoted string literal is not a placeholder at all: SQLite sees the literal text ':6%' and no value is bound to it, so the pattern never contains your string. The db interface quotes supplied strings for you, so the placeholder must stand on its own. If your parameter is the Python string "XYZ", you want ... LIKE 'XYZ%'). What you should do is have ... LIKE :6) in your SQL, and pass e.g. user_input[5].rstrip("%") + "%" (ensures exactly 1 %) as the parameter.
Update 2b You can of course use a dictionary for the parameters, as documented, but it would improve the legibility considerably if you used meaningful names instead of digits.
For example, ... LIKE :code) instead of the above, and pass e.g. {'code': user_input[5].rstrip("%") + "%", .....} as the second arg of execute()
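A minimal sketch of the named-parameter form of LIKE, assuming a made-up `icd9` table in an in-memory database and a hypothetical `user_code` value. The second query is included only to show that a placeholder written inside quotes is treated as literal text by SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE icd9 (resnum INTEGER, code TEXT)")
conn.executemany("INSERT INTO icd9 VALUES (?, ?)",
                 [(1, "250.00"), (2, "250.13"), (3, "401.9")])

user_code = "250%"  # hypothetical user input, possibly with a trailing %

# The wildcard belongs to the bound value, not to the SQL text.
params = {"code": user_code.rstrip("%") + "%"}  # ensures exactly one trailing %
matches = conn.execute(
    "SELECT resnum FROM icd9 WHERE code LIKE :code ORDER BY resnum", params
).fetchall()

# Inside quotes, ':code%' is just a string literal, so nothing is bound
# and no row matches.
literal = conn.execute(
    "SELECT resnum FROM icd9 WHERE code LIKE ':code%'"
).fetchall()

print(matches, literal)
```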
:2? These are placeholders for parameters, which you're not giving it. Take a look at the module docs for how to call execute with a tuple of parameters:
http://docs.python.org/library/sqlite3.html#sqlite3.Cursor.execute
Yeah, it looks like you need to add the parameters. Also, changing the line query += "(where patients.age >= :2) AND (patients.age <= :3) " to query += "where (patients.age >= :2) AND (patients.age <= :3) " might help.
Here is a slightly more python-ish way of writing your code... Although PEP8 might say otherwise...
import sqlite3
with sqlite3.connect("nnhs.sqlite3") as conn:
    cursor = conn.cursor()
    data = []
    with open("patient_in.txt", "r") as infile:
        for line in infile:
            line = line.rstrip("\n")
            line = line.strip()
            seq = line.split(' ')
            seq[5] = int(seq[5])
            seq = tuple (seq)
            data.append(seq)
    print(data)
    with open("patient_out.txt", "w") as outfile:
        # The placeholders stand on their own (no quotes), and "where" precedes the parenthesis
        query = """SELECT DISTINCT patients.resnum, patients.facnum,patients.sex, patients.age, patients.rxmed, icd9_1.resnum, icd9_1.code
        from patients
        INNER JOIN icd9 as icd9_1 on (icd9_1.resnum = patients.resnum) AND (icd9_1.code LIKE :6)
        INNER JOIN icd9 as icd9_2 on (icd9_2.resnum = patients.resnum) AND (icd9_2.code LIKE :6)
        where (patients.age >= :2) AND (patients.age <= :3)
        AND patients.sex = :1
        AND (patients.rxmed >= :4) AND (patients.rxmed <= :5)
        ORDER BY patients.resnum"""
        # sex, age_lowerbound, etc. are not defined here; take them from a row of data,
        # and include the trailing % in the value passed for :6
        variables = {'1':sex, '2':age_lowerbound, '3':age_upperbound, '4':rxmed_lowerbound, '5':rxmed_upperbound, '6':icd9_1}
        cursor.execute(query, variables)
        for row in cursor.fetchall():
            outfile.write("Patient Id Number: {0}\t ICD Res Num: {1}\t Condition: {2}\t Fac ID Num: {3}\t Patient Sex: {4}\t Patient Age: {5}\t Number of Medications: {6}\n".format(*row))