I am trying to create a program using Python 3.1 and sqlite3. The program opens a text file, reads parameters to pass to a SELECT query, and writes a text file with the results. I am getting stuck on the cursor.execute(query) statement. I may be doing everything incorrectly. Any help would be appreciated.
import sqlite3
# Connect to database and test
#Make sure the database is in the same folder as the python script folder
conn = sqlite3.connect("nnhs.sqlite3")
if conn:
    print("Connection successful")
else:
    print("Connection not successful")
# Create a cursor to execute SQL queries
cursor = conn.cursor()
# Read data from a file
data = []
infile = open("patient_in.txt", "r")
for line in infile:
    line = line.rstrip("\n")
    line = line.strip()
    seq = line.split(' ')
    seq[5] = int(seq[5])
    seq = tuple(seq)
    data.append(seq)
infile.close()
# Check that the data has been read correctly
print()
print ("Check that the data was read from file")
print (data)
# output file
outfile = open("patient_out.txt", "w")
# select statement
query = "SELECT DISTINCT patients.resnum, patients.facnum, patients.sex, patients.age, patients.rxmed, icd9_1.resnum, icd9_1.code "
query += "from patients "
query += "INNER JOIN icd9 as icd9_1 on (icd9_1.resnum = patients.resnum) AND (icd9_1.code LIKE ':6%') "
query += "INNER JOIN icd9 as icd9_2 on (icd9_2.resnum = patients.resnum) AND (icd9_2.code LIKE ':6%') "
query += "(where patients.age >= :2) AND (patients.age <= :3) "
query += "AND patients.sex = :1 "
query += "AND (patients.rxmed >= :4) AND (patients.rxmed <= :5) "
query += "ORDER BY patients.resnum;"
result = cursor.execute(query)
for row in result:
    ResultNumber = row[0]
    FacNumber = row[1]
    Sex = row[2]
    Age = row[3]
    RxMed = row[4]
    ICDResNum = row[5]
    ICDCode = row[6]
    outfile.write("Patient Id Number: " + str(ResultNumber) + "\t ICD Res Num: " + str(ICDResNum) + "\t Condition: " + str(ICDCode) + "\t Fac ID Num: " + str(FacNumber) + "\t Patient Sex: " + str(Sex) + "\t Patient Age: " + str(Age) + "\t Number of Medications: " + str(RxMed) + "\n")
# Close the cursor
cursor.close()
# Close the connection
conn.close()
You have read multiple rows of query parameters and stored them in data and then ... nothing. data is a misleading name. Let's call it queries instead.
You presumably want to iterate over queries and perform one query for each row in queries. So do that: for query_row in queries: .....
Also let's rename query to sql.
You'll need result = cursor.execute(sql, query_row)
You'll also need to decide whether you want to have a different output file for each query_row, or have only one file with a field (or sub-heading) to distinguish what info comes from what query_row.
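For the single-file option, a minimal sketch (assuming the placeholder fixes discussed further down, so each tuple in queries lines up with the SQL's parameters):
# A sketch, not a drop-in fix: assumes the SQL placeholders have been repaired
# as discussed below and that each tuple read from the file matches them.
queries = data  # the parameter rows read from patient_in.txt
with open("patient_out.txt", "w") as outfile:
    for query_row in queries:
        outfile.write("--- parameters: %r ---\n" % (query_row,))
        for row in cursor.execute(sql, query_row):
            outfile.write("\t".join(str(field) for field in row) + "\n")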
Update about parameter passing with sqlite3
It appears not to be documented, but if you use numbered place holders, you can supply a tuple of arguments -- you don't need to supply a dict. The following example presumes a database blah.db with an empty table created by
create table foo (id int, name text, amt int);
>>> import sqlite3
>>> conn = sqlite3.connect('blah.db')
>>> curs = conn.cursor()
>>> sql = 'insert into foo values(:1, :2, :1);'
>>> curs.execute(sql, (42, 'bar'))
<sqlite3.Cursor object at 0x01E3D520>
>>> result = curs.execute('select * from foo;')
>>> print(list(result))
[(42, 'bar', 42)]
>>> curs.execute(sql, {'1': 42, '2': 'bar'})
<sqlite3.Cursor object at 0x01E3D520>
>>> result = curs.execute('select * from foo;')
>>> print(list(result))
[(42, 'bar', 42), (42, 'bar', 42)]
>>> curs.execute(sql, {1: 42, 2: 'bar'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.ProgrammingError: You did not supply a value for binding 1.
>>>
Update 2a You have a bug in this line of your SQL (and the following one):
INNER JOIN icd9 as icd9_1 on (icd9_1.resnum = patients.resnum) AND (icd9_1.code LIKE ':6%')
If your parameter is the Python string "XYZ", that does not produce ... LIKE 'XYZ%'). Because :6 sits inside quotes, it is part of a string literal rather than a placeholder, so the parameter is never bound and the query matches against the literal text ':6%'. You can't splice it into the quotes either; the db interface always quotes the supplied string for you. What you should do is write ... LIKE :6) in your SQL, and pass e.g. user_input[5].rstrip("%") + "%" (which ensures exactly one trailing %) as the parameter.
Update 2b You can of course use a dictionary for the parameters, as documented, but it would improve the legibility considerably if you used meaningful names instead of digits.
For example, ... LIKE :code) instead of the above, and pass e.g. {'code': user_input[5].rstrip("%"), .....} as the second arg of execute()
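Put together, it might look like this (a sketch; the assumed user_input order is sex, age bounds, rxmed bounds, code prefix, which is not stated in the original post):
# A sketch with named placeholders; the mapping from user_input indexes to
# meanings is an assumption for illustration.
sql = ("SELECT DISTINCT patients.resnum, patients.facnum, patients.sex, "
       "patients.age, patients.rxmed, icd9_1.resnum, icd9_1.code "
       "FROM patients "
       "INNER JOIN icd9 AS icd9_1 ON icd9_1.resnum = patients.resnum "
       "AND icd9_1.code LIKE :code "
       "WHERE patients.sex = :sex "
       "AND patients.age BETWEEN :age_lo AND :age_hi "
       "AND patients.rxmed BETWEEN :rx_lo AND :rx_hi "
       "ORDER BY patients.resnum")
params = {'sex': user_input[0],
          'age_lo': user_input[1], 'age_hi': user_input[2],
          'rx_lo': user_input[3], 'rx_hi': user_input[4],
          'code': str(user_input[5]).rstrip('%') + '%'}  # exactly one trailing %
result = cursor.execute(sql, params)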
:2? These are placeholders for parameters, which you aren't supplying. Take a look at the module docs for how to call execute() with a tuple of parameters:
http://docs.python.org/library/sqlite3.html#sqlite3.Cursor.execute
Yeah, it looks like you need to add the parameters. Also, changing the line query += "(where patients.age >= :2) AND (patients.age <= :3) " to query += "where (patients.age >= :2) AND (patients.age <= :3) " might help.
Here is a slightly more Pythonic way of writing your code, although PEP 8 might say otherwise...
import sqlite3

with sqlite3.connect("nnhs.sqlite3") as conn:
    cursor = conn.cursor()
    data = []
    with open("patient_in.txt", "r") as infile:
        for line in infile:
            seq = line.strip().split(' ')
            seq[5] = int(seq[5])
            data.append(tuple(seq))
    print(data)
    query = """SELECT DISTINCT patients.resnum, patients.facnum, patients.sex, patients.age, patients.rxmed, icd9_1.resnum, icd9_1.code
               FROM patients
               INNER JOIN icd9 AS icd9_1 ON (icd9_1.resnum = patients.resnum) AND (icd9_1.code LIKE :6)
               INNER JOIN icd9 AS icd9_2 ON (icd9_2.resnum = patients.resnum) AND (icd9_2.code LIKE :6)
               WHERE (patients.age >= :2) AND (patients.age <= :3)
               AND patients.sex = :1
               AND (patients.rxmed >= :4) AND (patients.rxmed <= :5)
               ORDER BY patients.resnum"""
    with open("patient_out.txt", "w") as outfile:
        for row_params in data:
            # the % belongs in the parameter, not in the SQL
            variables = {'1': row_params[0], '2': row_params[1], '3': row_params[2],
                         '4': row_params[3], '5': row_params[4],
                         '6': str(row_params[5]).rstrip('%') + '%'}
            cursor.execute(query, variables)
            for row in cursor.fetchall():
                outfile.write("Patient Id Number: {0}\t ICD Res Num: {5}\t Condition: {6}\t Fac ID Num: {1}\t Patient Sex: {2}\t Patient Age: {3}\t Number of Medications: {4}\n".format(*row))
Related
I created a Python script with a date argument that extracts data from a file (4.2 MB) to populate a table; when I run it, it shows this error:
File "./insert_pru_data.py", line 136, in <module>
importYear(year)
File "./insert_pru_data.py", line 124, in importYear
SQLrequest += "(" + ", ".join(data_to_insert[i]) + "),\n"
MemoryError
My Code:
def importYear(year):
    go = True
    if isAlreadyInserted(year):
        if replace == False:
            print("data for year " + year + " already inserted, action cancelled")
            go = False
        else:
            print("data for year " + year + " already inserted, the data will be replaced")
            deleteData(year)
    if go:
        data_to_insert = getDataToInsert(data)
        SQLrequest = "INSERT INTO my_table (date_h, day, area, h_type, act, dir, ach) VALUES\n"
        i = 0
        print(data_to_insert)
        while i < len(data_to_insert) - 1:
            data_to_insert[i] = ["None" if element == None else element for element in data_to_insert[i]]
            SQLrequest += "(" + ", ".join(data_to_insert[i]) + "),\n"
            i += 1
        SQLrequest += "(" + ", ".join(data_to_insert[len(data_to_insert) - 1]) + ");"
        with psycopg2.connect(connString) as conn:  # open the database connection
            with conn.cursor() as cur:
                cur.execute(SQLrequest)
                cur.execute("COMMIT")
                cur.close()

importYear(year)
Can someone help me figure out how to solve this problem?
Firstly, avoid constructing an SQL query like this; sooner or later, one of the values to be inserted will have something like a quote and then everything will break. It's one of the more common security problems on the internet (SQL injection).
The cur.execute() function can take two arguments - the query (with placeholders) and then the values to be inserted:
cur.execute("insert into tbl (a, b) values (%s, %s)", (1, 2))
Rather than inserting all the data at once, read them from the file in groups of 100 or 1000 or something; small enough to fit into memory easily, large enough that there aren't too many round-trips.
There is an execute_values() function which does exactly what you want; you give it a query and a list of tuples:
execute_values(cur, "insert into tbl (a, b) values %s", [(1, 2), (3, 4)])
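Putting the batching and execute_values() together might look like this (a sketch; read_rows() is a hypothetical helper that yields one parsed tuple per line of the file, and the column list is taken from the question):
from psycopg2.extras import execute_values

# A sketch: insert in batches of 1000 rows; read_rows() is hypothetical.
insert_sql = ("insert into my_table "
              "(date_h, day, area, h_type, act, dir, ach) values %s")
batch = []
for row in read_rows():
    batch.append(row)
    if len(batch) >= 1000:  # small enough for memory, few round-trips
        execute_values(cur, insert_sql, batch)
        batch = []
if batch:  # flush the final partial batch
    execute_values(cur, insert_sql, batch)
conn.commit()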
Sorry if this is a noob question, but I am trying to dump a psycopg2 dictionary directly into a JSON string. I do get a return value in the browser, but it isn't formatted like most of the other JSON examples I see. The idea is to dump the result of a SELECT statement into a JSON string and unbundle it on the other end to add to a database on the client side. The code and a sample of the return are below. Is there a better way to do this operation with json and psycopg2?
# initializing variables
location_connection = location_cursor = 0
sql_string = coordinate_return = data = ""
# opening connection and setting cursor
location_connection = psycopg2.connect("dbname='' user='' password=''")
location_cursor = location_connection.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
# setting sql string and executing query
sql_string = "select * from " + tablename + " where left(alphacoordinate," + str(len(coordinate)) + ") = '" + coordinate + "' order by alphacoordinate;"
location_cursor.execute(sql_string)
data = json.dumps(location_cursor.fetchall())
# closing database connection
location_connection.close()
# returning coordinate string
return data
sample return
"[{\"alphacoordinate\": \"nmaan-001-01\", \"xcoordinate\":
3072951151886, \"planetarydiameter\": 288499, \"planetarymass\":
2.020936938e+27, \"planetarydescription\": \"PCCGQAAA\", \"planetarydescriptionsecondary\": 0, \"moons\": 1"\"}]"
You could create the JSON string directly in Postgres using row_to_json:
# setting sql string and executing query
sql_string = "select row_to_json(" + tablename + ") from " + tablename + " where left(alphacoordinate," + str(len(coordinate)) + ") = '" + coordinate + "' order by alphacoordinate;"
location_cursor.execute(sql_string)
data = location_cursor.fetchall()
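If you would rather get a single JSON array back, json_agg can build it server-side (a sketch, assuming Postgres 9.3+; it also passes coordinate as a parameter rather than concatenating it into the SQL):
# A sketch: json_agg aggregates the rows into one JSON array server-side.
sql_string = ("select json_agg(t) from (select * from " + tablename +
              " where left(alphacoordinate, %s) = %s"
              " order by alphacoordinate) t;")
location_cursor.execute(sql_string, (len(coordinate), coordinate))
row = location_cursor.fetchone()
data = json.dumps(row['json_agg'])  # RealDictCursor keys the column 'json_agg'
return data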
I am pretty new to Python development. I have a long Python script that "clones" a database and adds additional stored functions and procedures. Clone means copying only the schema of the DB. These steps work fine.
My question is about the pymysql insert execution:
I have to copy some table contents into the new DB. I don't get any SQL error. If I debug or print the created INSERT INTO command, it is correct (I've tested it in an SQL editor). The insert execution appears correct because the result contains the exact row count... but all rows are missing from the destination table in the destination DB...
(Of course the DB_* variables have been defined!)
import datetime

import pymysql

liveDbConn = pymysql.connect(DB_HOST, DB_USER, DB_PWD, LIVE_DB_NAME)
testDbConn = pymysql.connect(DB_HOST, DB_USER, DB_PWD, TEST_DB_NAME)
tablesForCopy = ['role', 'permission']
for table in tablesForCopy:
    with liveDbConn.cursor() as liveCursor:
        # Get the column names
        liveCursor.execute("DESCRIBE `%s`;" % (table))
        columns = ''
        for column in liveCursor.fetchall():
            columns += '`' + column[0] + '`,'
        columns = columns.strip(',')
        # Get and convert the values
        values = ''
        liveCursor.execute("SELECT * FROM `%s`;" % (table))
        for result in liveCursor.fetchall():
            data = []
            for item in result:
                if type(item) == type(None):
                    data.append('NULL')
                elif type(item) == type('str'):
                    data.append("'" + item + "'")
                elif type(item) == type(datetime.datetime.now()):
                    data.append("'" + str(item) + "'")
                else:  # for numeric values
                    data.append(str(item))
            v = '(' + ', '.join(data) + ')'
            values += v + ', '
        values = values.strip(', ')
    print("### table: %s" % (table))
    testDbCursor = testDbConn.cursor()
    testDbCursor.execute("INSERT INTO `" + TEST_DB_NAME + "`.`" + table + "` (" + columns + ") VALUES " + values + ";")
    print("Result: {}".format(testDbCursor._result.message))
liveDbConn.close()
testDbConn.close()
Result is:
### table: role
Result: b"'Records: 16 Duplicates: 0 Warnings: 0"
### table: permission
Result: b'(Records: 222 Duplicates: 0 Warnings: 0'
What am I doing wrong? Thanks!
You have 2 main issues here:
You don't use conn.commit() (which would be either be liveDbConn.commit() or testDbConn.commit() here). Changes to the database will not be reflected without committing those changes. Note that all changes need committing but SELECT, for example, does not.
Your query is open to SQL Injection. This is a serious problem.
Table names cannot be parameterized, so there's not much we can do about that, but you'll want to parameterize your values. I've made multiple corrections to the code in relation to type checking as well as parameterization.
for table in tablesForCopy:
    with liveDbConn.cursor() as liveCursor:
        liveCursor.execute("SELECT * FROM `%s`;" % (table))
        name_of_columns = [item[0] for item in liveCursor.description]
        insert_list = []
        for result in liveCursor.fetchall():
            data = []
            for item in result:
                if item is None:  # test identity against the None singleton
                    data.append(None)  # pymysql binds None as SQL NULL
                elif isinstance(item, str):  # use isinstance to check type
                    data.append(item)
                elif isinstance(item, datetime.datetime):
                    data.append(item.strftime('%Y-%m-%d %H:%M:%S'))
                else:  # for numeric values
                    data.append(str(item))
            insert_list.append(data)
    testDbCursor = testDbConn.cursor()
    columns = ', '.join('`{}`'.format(name) for name in name_of_columns)
    placeholders = ', '.join(['%s'] * len(name_of_columns))
    testDbCursor.executemany(
        "INSERT INTO `{}`.`{}` ({}) VALUES ({})".format(
            TEST_DB_NAME, table, columns, placeholders),
        insert_list)
    testDbConn.commit()
From this github thread, I notice that executemany does not work as expected in psycopg2; it instead sends each entry as a single query. You'll need to use execute_batch:
from psycopg2.extras import execute_batch

execute_batch(testDbCursor,
              "INSERT INTO {}.{} ({}) VALUES ({})".format(TEST_DB_NAME,
                                                          table,
                                                          columns,
                                                          placeholders),
              insert_list)
testDbConn.commit()
How to insert data into a table using Python pymysql
My solution is below:
import pymysql
import datetime

# Create a connection object
dbServerName = "127.0.0.1"
port = 8889
dbUser = "root"
dbPassword = ""
dbName = "blog_flask"
# charSet = "utf8mb4"
conn = pymysql.connect(host=dbServerName, user=dbUser, password=dbPassword,
                       db=dbName, port=port)
try:
    # Create a cursor object
    cursor = conn.cursor()
    # Insert a row into the MySQL table
    now = datetime.datetime.utcnow()
    my_datetime = now.strftime('%Y-%m-%d %H:%M:%S')
    cursor.execute('INSERT INTO posts (post_id, post_title, post_content, '
                   'filename, post_time) VALUES (%s, %s, %s, %s, %s)',
                   (5, 'title2', 'description2', 'filename2', my_datetime))
    conn.commit()
except Exception as e:
    print("Exception occurred: {}".format(e))
finally:
    conn.close()
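If you need to insert several rows in one call, the same pattern works with cursor.executemany (a sketch; the extra rows here are made up for illustration):
# A sketch: executemany binds each tuple in turn against the same posts table.
rows = [
    (6, 'title3', 'description3', 'filename3', my_datetime),
    (7, 'title4', 'description4', 'filename4', my_datetime),
]
cursor.executemany('INSERT INTO posts (post_id, post_title, post_content, '
                   'filename, post_time) VALUES (%s, %s, %s, %s, %s)', rows)
conn.commit()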
I'm attempting to update around 500k rows in a SQLite database. I can create them rather quickly, but when updating, it seems to hang indefinitely with no error message. (An insert of the same size took 35 seconds; this update has been running for over 12 hours.)
The portion of my code that does the updating is:
for line in result:
    if --- blah blah blah ---:
        stuff
    else:
        counter = 1
        print("Starting to append result_list...")
        result_list = []
        for line in result:
            result_list.append((str(line), counter))
            counter += 1
        sql = 'UPDATE BRFSS2015 SET ' + col[1] + \
              ' = ? where row_id = ?'
        print("Executing SQL...")
        c.executemany(sql, result_list)
        print("Committing.")
        conn.commit()
It prints "Executing SQL..." and presumably attempts the executemany and that's where its stuck. The variable "result" is a list of records and is working as far as I can tell because the insert statement is working and it is basically the same.
Am I misusing executemany? I see many threads on executemany(), but all of them as far as I can tell are getting an error message, not just hanging indefinitely.
For reference, the full code I have is below. Basically I'm trying to convert an ASCII file to a sqlite database. I know I could technically insert all columns at the same time, but the machines I have access to are all limited to 32bit Python and they run out of memory (this file is quite large, close to 1GB of text).
import pandas as pd
import sqlite3

ascii_file = r'c:\Path\to\file.ASC_'
sqlite_file = r'c:\path\to\sqlite.db'

conn = sqlite3.connect(sqlite_file)
c = conn.cursor()

# Taken from https://www.cdc.gov/brfss/annual_data/2015/llcp_varlayout_15_onecolumn.html
raw_list = [[1, "_STATE", 2],
            [17, "FMONTH", 2],
            ... many other values here
            [2154, "_AIDTST3", 1], ]

col_list = []
for col in raw_list:
    begin = (col[0] - 1)
    col_name = col[1]
    end = (begin + col[2])
    col_list.append([(begin, end,), col_name, ])

for col in col_list:
    print(col)
    col_specification = [col[0]]
    print("Parsing...")
    data = pd.read_fwf(ascii_file, colspecs=col_specification)
    print("Done")
    result = data.iloc[:, [0]]
    result = result.values.flatten()
    sql = '''CREATE table if not exists BRFSS2015
             (row_id integer NOT NULL,
             ''' + col[1] + ' text)'
    print(sql)
    c.execute(sql)
    conn.commit()
    sql = '''ALTER TABLE
             BRFSS2015 ADD COLUMN ''' + col[1] + ' text'
    try:
        c.execute(sql)
        print(sql)
        conn.commit()
    except Exception as e:
        print("Error Happened instead")
        print(e)
    counter = 1
    result_list = []
    for line in result:
        result_list.append((counter, str(line)))
        counter += 1
    if '_STATE' in col:
        counter = 1
        result_list = []
        for line in result:
            result_list.append((counter, str(line)))
            counter += 1
        sql = 'INSERT into BRFSS2015 (row_id,' + col[1] + ')' \
              + 'values (?,?)'
        c.executemany(sql, result_list)
    else:
        counter = 1
        print("Starting to append result_list...")
        result_list = []
        for line in result:
            result_list.append((str(line), counter))
            counter += 1
        sql = 'UPDATE BRFSS2015 SET ' + col[1] + \
              ' = ? where row_id = ?'
        print("Executing SQL...")
        c.executemany(sql, result_list)
        print("Committing.")
        conn.commit()
    print("Committed... moving on to next column...")
For each row to be updated, the database has to search for that row. (This is not necessary when inserting.) If there is no index on the row_id column, then the database has to go through the entire table for each update.
It would be a better idea to insert entire rows at once. If that is not possible, create an index on row_id, or better, declare it as INTEGER PRIMARY KEY.
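A quick sketch of both options, using the names from the code above:
# Option 1: index row_id so each UPDATE can find its row without a full scan.
c.execute("CREATE INDEX IF NOT EXISTS idx_brfss2015_row_id ON BRFSS2015 (row_id)")
conn.commit()

# Option 2 (better): declare row_id as INTEGER PRIMARY KEY when creating the
# table, so it aliases SQLite's rowid and lookups are direct. "_STATE" here is
# just the first column from the question's raw_list.
c.execute("""CREATE TABLE IF NOT EXISTS BRFSS2015
             (row_id INTEGER PRIMARY KEY, _STATE text)""")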
I am moving data from MySQL to Postgres and my code is as below:
import os, re, time, codecs, glob, sqlite3
from StringIO import StringIO
import psycopg2, MySQLdb, datetime, decimal
from datetime import date
import gc

tables = (['table1', 27],)

conn = psycopg2.connect("dbname='xxx' user='xxx' host='localhost' password='xxx' ")
curpost = conn.cursor()
db = MySQLdb.connect(host="127.0.0.1", user="root", passwd="root", unix_socket='/var/mysql/mysql.sock', port=3306)
cur = db.cursor()
cur.execute('use xxx;')

for t in tables:
    print t
    curpost.execute("truncate table " + t[0])
    cur.execute("select * from " + t[0])
    a = ','.join('%s' for i in range(t[1]))
    qry = "insert into " + t[0] + " values ( " + a + " )"
    print qry
    i = 0
    while True:
        rows = cur.fetchmany(5000)
        if not rows:
            break
        string = ''
        for row in rows:
            string = string + ('|'.join([str(x) for x in row])) + "\n"
        curpost.copy_from(StringIO(string), t[0], sep="|", null="None")
        i += curpost.rowcount
        print i, " loaded"
        curpost.connection.commit()
        del string, row, rows
        gc.collect()
curpost.close()
cur.close()
For small tables, the code runs fine. However, for the larger ones (3.6 million records), the moment the MySQL execute (cur.execute("select * from " + t[0])) runs, memory utilization on the machine zooms. This happens even though I have used fetchmany and records should only come in batches of 5000. I have tried with 500 records as well and it's the same. For large tables it seems that fetchmany is not working as documented...
Edit: I added garbage collection and del statements. Still the memory keeps on bloating until all records have been processed.
Any ideas?
Sorry if I am wrong; you've said that you don't want to change the query. But just in case you have no choice, you can try replacing this fragment:
cur.execute("select * from "+ t[0] )
a = ','.join( '%s' for i in range(t[1]) )
qry = "insert into " + t[0] + " values ( " + a +" )"
print qry
i = 0
while True:
rows = cur.fetchmany(5000)
to this one:
a = ','.join('%s' for i in range(t[1]))
qry = "insert into " + t[0] + " values ( " + a + " )"
print qry
i = 0
while True:
    cur.execute("select * from " + t[0] + " LIMIT " + str(i) + ", 5000")
    rows = cur.fetchall()
    if not rows:
        break