I created a Python script with a date argument that extracts data from a file (4.2 MB) to feed a table; when I run it, it shows this error:
File "./insert_pru_data.py", line 136, in <module>
importYear(year)
File "./insert_pru_data.py", line 124, in importYear
SQLrequest += "(" + ", ".join(data_to_insert[i]) + "),\n"
MemoryError
My Code:
def importYear(year):
    go = True
    if isAlreadyInserted(year):
        if replace == False:
            print("data for year " + year + " already inserted, action cancelled")
            go = False
        else:
            print("data for year " + year + " already inserted, the data will be replaced")
            deleteData(year)
    if go:
        data_to_insert = getDataToInsert(data)
        SQLrequest = "INSERT INTO my_table (date_h, day, area, h_type, act, dir, ach) VALUES\n"
        i = 0
        print(data_to_insert)
        while i < len(data_to_insert) - 1:
            data_to_insert[i] = ["None" if element == None else element for element in data_to_insert[i]]
            SQLrequest += "(" + ", ".join(data_to_insert[i]) + "),\n"
        SQLrequest += "(" + ", ".join(data_to_insert[len(data_to_insert) - 1]) + ");"
        with psycopg2.connect(connString) as conn:  # Open the database connection
            with conn.cursor() as cur:
                cur.execute(SQLrequest)
                cur.execute("COMMIT")
                cur.close()
importYear(year)
Can someone help me solve this problem?
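The immediate cause of the MemoryError is that, as posted, the while loop never increments i, so SQLrequest grows without bound. Beyond that, avoid constructing an SQL query like this; sooner or later, one of the values to be inserted will contain something like a quote and then everything will break. It's one of the more common security problems on the internet (SQL injection).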
The cur.execute() function can take two arguments - the query (with placeholders) and then the values to be inserted:
cur.execute("insert into tbl (a, b) values (%s, %s)", (1, 2))
Rather than inserting all the data at once, read them from the file in groups of 100 or 1000 or something; small enough to fit into memory easily, large enough that there aren't too many round-trips.
psycopg2.extras provides an execute_values() function which does exactly what you want; you give it a cursor, a query with a single %s for the VALUES list, and a list of tuples:
execute_values(cur, "insert into tbl (a, b) values %s", [(1, 2), (3, 4)])
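A minimal sketch of the batching idea above, assuming a psycopg2 connection and the placeholder table tbl (a, b) used in the examples. The helper names batched and insert_in_batches are hypothetical, and rows can be a generator that reads the file lazily so the whole data set never sits in memory at once.

```python
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert (a, b) tuples into the placeholder table `tbl`, one batch at a time."""
    # Imported here so that batched() stays usable without psycopg2 installed.
    from psycopg2.extras import execute_values

    with conn.cursor() as cur:
        for chunk in batched(rows, batch_size):
            # execute_values expands the single %s into one VALUES group per tuple.
            execute_values(cur, "insert into tbl (a, b) values %s", chunk)
    conn.commit()
```

The batch size of 1000 is only a starting point; the right value depends on row width and network latency.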
conn = sqlite3.connect('business_database.db')
c = conn.cursor()
c.execute("INSERT INTO business VALUES(self.nob_text_input.text, self.post_text_input.text, self.descrip_text_input.text )")
conn.commit()
conn.close()
I want to add records into my database using the TextInput in kivy hence the 'self.post_text_input.text' etc, but I get this error:
OperationalError: no such column: self.nob_text_input.text
I tried putting the columns next to table name in the query:
c.execute("INSERT INTO business(column1, column2,column3) VALUES(self.nob_text_input.text....)
But I still get the same error.
Turning my comment into a more detailed answer.
If you're trying to use the values of the variables (self.nob_text_input.text and friends) in the string, you need to embed those values in the string.
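One way is to use a format string (text values need their own quotes inside the SQL):
"INSERT INTO business VALUES('%s', '%s', '%s')" % (self.nob_text_input.text, self.post_text_input.text, self.descrip_text_input.text)
And another is to just concatenate the strings:
"INSERT INTO business VALUES('" + self.nob_text_input.text + "', '" + self.post_text_input.text + "', '" + self.descrip_text_input.text + "')"
Both forms break as soon as the text contains a quote character, so the more robust option is to pass the values separately and let sqlite3 bind them to ? placeholders:
c.execute("INSERT INTO business VALUES(?, ?, ?)", (self.nob_text_input.text, self.post_text_input.text, self.descrip_text_input.text))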
I want to insert data into a table created with SQLite. Most of the code converts my arrays into one string for con.execute(). Maybe this is the problem? Is there a better way? No error is returned.
def add_row(table, columns, values):
    con = sql_connection("database.db")
    cursorObj = con.cursor()
    # column list to string
    if isinstance(columns, list) == True:
        columns = ", ".join(columns)
    # wrap each string in list with '' and convert whole list to string
    if isinstance(values, list) == True:
        for i in range(0, len(values)):
            if isinstance(values[i], str) == True:
                values[i] = "'" + values[i] + "'"
        values = ", ".join(values)
    try:
        cmd = "insert into " + table + "(" + columns + ") values (" + values + ")"
        print(cmd)
        cursorObj.execute(cmd)
    except sqlite3.Error as e:
        print("An error occurred:", e.args[0])
add_row("Stocks", ["symbol", "name"], ["TEST", "test"])
print(cmd) output:
insert into Stocks (symbol, name) values ('TEST', 'test')
CLARIFICATION: I'm not worried about security concerns. It will only ever be used locally.
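I don't recommend trying to generalize this function for arbitrary tables and columns. Write one small function per statement instead; using the connection as a context manager also commits the insert, which the code shown in the question never does.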
def add_symbol_and_name(symbol, name):
    with sqlite3.connect("database.db") as con:
        cursor = con.cursor()
        cursor.execute("insert into Stocks (symbol, name) values (?, ?)",
                       (symbol, name))
Anything more dynamic than this opens you up to SQL injection attacks.
I figured it out. Note what others have said: such a general function should not be used online, as it makes the database vulnerable to SQL injection attacks. However, my database is, and will always be, local.
def add_row(table, columns, values):
    valueArr = []
    if (len(columns) == len(values)) == False:
        print("Values and columns must be of equal length")
        return
    columns = ", ".join(columns)
    for value in values:
        valueArr.append("?")
    valueArr = ", ".join(valueArr)
    with sqlite3.connect("database.db") as con:
        try:
            cursor = con.cursor()
            cmd = "insert into " + table + " (" + columns + ") values (" + valueArr + ")"
            print("cmd", cmd)
            cursor.execute(cmd,
                           values)
        except sqlite3.Error as e:
            print("An error occurred:", e.args[0])
add_row("Stocks", ["symbol", "name", "exchange"], ["AAPL", "Apple", "NASDAQ"])
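If a general helper like this is still wanted, one way to narrow the injection risk is to check the table and column names against the database's own schema before splicing them into the statement. The following is only a sketch under that assumption; it takes an open connection instead of a file name (a change made here for illustration) and raises ValueError rather than printing.

```python
import sqlite3


def add_row(con, table, columns, values):
    """Insert one row after checking table and column names against the schema."""
    if len(columns) != len(values):
        raise ValueError("Values and columns must be of equal length")

    # The table name is looked up as a bound value, so it is never executed as SQL.
    found = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if found is None:
        raise ValueError("unknown table: %r" % (table,))

    # PRAGMA table_info returns one row per column; index 1 is the column name.
    quoted_table = '"' + table.replace('"', '""') + '"'
    known = {row[1] for row in con.execute("PRAGMA table_info(%s)" % quoted_table)}
    unknown = [col for col in columns if col not in known]
    if unknown:
        raise ValueError("unknown columns: %r" % (unknown,))

    column_sql = ", ".join('"' + col.replace('"', '""') + '"' for col in columns)
    marks = ", ".join("?" for _ in values)
    con.execute(
        "INSERT INTO %s (%s) VALUES (%s)" % (quoted_table, column_sql, marks),
        values,
    )


# Usage against an in-memory database (schema assumed for the example).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Stocks (symbol TEXT, name TEXT, exchange TEXT)")
add_row(con, "Stocks", ["symbol", "name", "exchange"], ["AAPL", "Apple", "NASDAQ"])
con.commit()
```

The values themselves still go through ? placeholders; only identifiers, which cannot be bound as parameters, are interpolated after validation.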
I'm attempting to update around 500k rows in a SQLite database. I can create them rather quickly, but when I'm updating, it seems to be indefinitely hung, but I don't get an error message. (An insert of the same size took 35 seconds, this update has been at it for over 12 hours).
The portion of my code that does the updating is:
for line in result:
    if --- blah blah blah ---:
        stuff
    else:
        counter = 1
        print("Starting to append result_list...")
        result_list = []
        for line in result:
            result_list.append((str(line),counter))
            counter += 1
        sql = 'UPDATE BRFSS2015 SET ' + col[1] + \
              ' = ? where row_id = ?'
        print("Executing SQL...")
        c.executemany(sql, result_list)
    print("Committing.")
    conn.commit()
It prints "Executing SQL..." and presumably attempts the executemany, and that's where it's stuck. The variable "result" is a list of records and is working as far as I can tell, because the insert statement works and is basically the same.
Am I misusing executemany? I see many threads on executemany(), but all of them as far as I can tell are getting an error message, not just hanging indefinitely.
For reference, the full code I have is below. Basically I'm trying to convert an ASCII file to a sqlite database. I know I could technically insert all columns at the same time, but the machines I have access to are all limited to 32bit Python and they run out of memory (this file is quite large, close to 1GB of text).
import pandas as pd
import sqlite3
ascii_file = r'c:\Path\to\file.ASC_'
sqlite_file = r'c:\path\to\sqlite.db'
conn = sqlite3.connect(sqlite_file)
c = conn.cursor()
# Taken from https://www.cdc.gov/brfss/annual_data/2015/llcp_varlayout_15_onecolumn.html
raw_list = [[1,"_STATE",2],
[17,"FMONTH",2],
... many other values here
[2154,"_AIDTST3",1],]
col_list = []
for col in raw_list:
    begin = (col[0] - 1)
    col_name = col[1]
    end = (begin + col[2])
    col_list.append([(begin, end,), col_name,])
for col in col_list:
    print(col)
    col_specification = [col[0]]
    print("Parsing...")
    data = pd.read_fwf(ascii_file, colspecs=col_specification)
    print("Done")
    result = data.iloc[:,[0]]
    result = result.values.flatten()
    sql = '''CREATE table if not exists BRFSS2015
    (row_id integer NOT NULL,
    ''' + col[1] + ' text)'
    print(sql)
    c.execute(sql)
    conn.commit()
    sql = '''ALTER TABLE
    BRFSS2015 ADD COLUMN ''' + col[1] + ' text'
    try:
        c.execute(sql)
        print(sql)
        conn.commit()
    except Exception as e:
        print("Error Happened instead")
        print(e)
    counter = 1
    result_list = []
    for line in result:
        result_list.append((counter, str(line)))
        counter += 1
    if '_STATE' in col:
        counter = 1
        result_list = []
        for line in result:
            result_list.append((counter, str(line)))
            counter += 1
        sql = 'INSERT into BRFSS2015 (row_id,' + col[1] + ')'\
              + 'values (?,?)'
        c.executemany(sql, result_list)
    else:
        counter = 1
        print("Starting to append result_list...")
        result_list = []
        for line in result:
            result_list.append((str(line),counter))
            counter += 1
        sql = 'UPDATE BRFSS2015 SET ' + col[1] + \
              ' = ? where row_id = ?'
        print("Executing SQL...")
        c.executemany(sql, result_list)
    print("Committing.")
    conn.commit()
    print("Committed... moving on to next column...")
For each row to be updated, the database has to search for that row. (This is not necessary when inserting.) If there is no index on the row_id column, then the database has to go through the entire table for each update.
It would be a better idea to insert entire rows at once. If that is not possible, create an index on row_id, or better, declare it as INTEGER PRIMARY KEY.
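A small sketch of that suggestion, using an in-memory database and made-up values in place of the real BRFSS data. Declaring row_id as INTEGER PRIMARY KEY makes it an alias for SQLite's rowid, so each UPDATE finds its row directly instead of scanning the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# row_id as INTEGER PRIMARY KEY: "WHERE row_id = ?" becomes a direct lookup.
c.execute("CREATE TABLE BRFSS2015 (row_id INTEGER PRIMARY KEY, _STATE TEXT)")
c.executemany(
    "INSERT INTO BRFSS2015 (row_id, _STATE) VALUES (?, ?)",
    [(i, str(i % 50)) for i in range(1, 10001)],
)

# Add a second column and fill it row by row, as in the question.
c.execute("ALTER TABLE BRFSS2015 ADD COLUMN FMONTH TEXT")
c.executemany(
    "UPDATE BRFSS2015 SET FMONTH = ? WHERE row_id = ?",
    [(str(i % 12 + 1), i) for i in range(1, 10001)],
)
conn.commit()

# For a table that already exists without a key, an index has a similar effect:
# c.execute("CREATE INDEX IF NOT EXISTS idx_row_id ON BRFSS2015 (row_id)")
```

With the key in place, the cost of each update no longer grows with the number of rows already in the table.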
When trying to execute the following:
def postToMySQL(date,data,date_column_name,data_column_name,table):
    cursor = conn.cursor ()
    sql = "\"\"\"INSERT INTO " + table + " (" + date_column_name + ", " + data_column_name + ") VALUES(%s, %s)" + "\"\"\"" #+ ", " + "(" + date + ", " + data + ")"
    cursor.execute(sql,(date,data))
I get this error:
_mysql_exceptions.ProgrammingError: (1064, 'You have an error in your SQL syntax... near:
\'"""INSERT INTO natgas (Date, UK) VALUES(\'2012-05-01 13:00:34\', \'59.900\')"""\' at line 1')
I'm puzzled as to where the syntax is wrong, because the following hardcoded example works fine:
def postUKnatgastoMySQL(date, UKnatgas):
    cursor = conn.cursor ()
    cursor.execute("""INSERT INTO natgas (Date, UK)VALUES(%s, %s)""", (date, UKnatgas))
Can you spot the error?
Alternately, could you tell me how to pass parameters to the field list as well as the value list?
Thanks a lot!
Those triple quotes are a way of representing a string in python. They aren't supposed to be part of the actual query.
On another note, be very sure you trust your input with this approach. Look up SQL Injection.
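On the second part of the question: placeholders only carry values, so table and column names have to be spliced into the string itself. A hedged sketch of one way to do that is to accept only plain identifiers and then build the statement with ordinary quotes. The helper name build_insert_sql and the identifier rule are assumptions for illustration, not part of MySQLdb.

```python
import re

# Only plain names made of letters, digits and underscores are accepted.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def build_insert_sql(table, date_column_name, data_column_name):
    """Return an INSERT statement with %s placeholders for the two values."""
    for name in (table, date_column_name, data_column_name):
        if not _IDENTIFIER.match(name):
            raise ValueError("unsafe identifier: %r" % (name,))
    # Ordinary quotes delimit the Python string; no triple quotes end up inside it.
    return "INSERT INTO `%s` (`%s`, `%s`) VALUES (%%s, %%s)" % (
        table, date_column_name, data_column_name)


sql = build_insert_sql("natgas", "Date", "UK")
# With a MySQLdb cursor this would then be run as:
# cursor.execute(sql, (date, data))
```

The doubled %%s survives the string formatting as %s, which the driver then fills with the bound values.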
\'"""INSERT INTO natgas (Date, UK) VALUES(\'2012-05-01 13:00:34\',
\'59.900\')"""\' at line 1')
This is obviously not a valid SQL command. You need to get the backslashes out of there; you are probably escaping stuff you shouldn't.
The triple quotes, for example, are certainly unnecessary there.
I am trying to create a program using Python 3.1 and Sqlite3. The program will open a text file and read parameters to pass for the select query and output a text file with the result. I am getting stuck on the cursor.execute(query) statement. I may be doing everything incorrect. Any help would be appreciated.
import sqlite3
# Connect to database and test
#Make sure the database is in the same folder as the python script folder
conn = sqlite3.connect("nnhs.sqlite3")
if (conn):
    print ("Connection successful")
else:
    print ("Connection not successful")
# Create a cursor to execute SQL queries
cursor = conn.cursor()
# Read data from a file
data = []
infile = open ("patient_in.txt", "r")
for line in infile:
    line = line.rstrip("\n")
    line = line.strip()
    seq = line.split(' ')
    seq[5] = int(seq[5])
    seq = tuple (seq)
    data.append(seq)
infile.close()
# Check that the data has been read correctly
print
print ("Check that the data was read from file")
print (data)
# output file
outfile = open("patient_out.txt", "w")
# select statement
query = "SELECT DISTINCT patients.resnum, patients.facnum, patients.sex, patients.age, patients.rxmed, icd9_1.resnum, icd9_1.code "
query += "from patients "
query += "INNER JOIN icd9 as icd9_1 on (icd9_1.resnum = patients.resnum) AND (icd9_1.code LIKE ':6%') "
query += "INNER JOIN icd9 as icd9_2 on (icd9_2.resnum = patients.resnum) AND (icd9_2.code LIKE ':6%') "
query += "(where patients.age >= :2) AND (patients.age <= :3) "
query += "AND patients.sex = :1 "
query += "AND (patients.rxmed >= :4) AND (patients.rxmed <= :5) "
query += "ORDER BY patients.resnum;"
result = cursor.execute(query)
for row in result:
    ResultNumber = row[0]
    FacNumber = row[1]
    Sex = row[2]
    Age = row[3]
    RxMed = row[4]
    ICDResNum = row[5]
    ICDCode = row[6]
    outfile.write("Patient Id Number: " + str(ResultNumber) + "\t" + " ICD Res Num: " + str(ICDResNum) + "\t" + " Condition: " + str(ICDCode) + "\t" + " Fac ID Num: " + str(FacNumber) + "\t" + " Patient Sex: " + str(Sex) + "\t" + " Patient Age: " + str(Age) + "\t" +" Number of Medications: " + str(RxMed) + "\t" + "\n")
# Close the cursor
cursor.close()
# Close the connection
conn.close()
You have read multiple rows of query parameters and stored them in data and then ... nothing. data is a misleading name. Let's call it queries instead.
You presumably want to iterate over queries and perform one query for each row in queries. So do that: for query_row in queries: .....
Also let's rename query to sql.
You'll need result = cursor.execute(sql, query_row)
You'll also need to decide whether you want to have a different output file for each query_row, or have only one file with a field (or sub-heading) to distinguish what info comes from what query_row.
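The loop described above might look like the sketch below. It uses a made-up miniature of the patients table in memory, three parameters per query row, and plain ? placeholders instead of the numbered ones; all of that is assumed for illustration. Results go into one list with a sub-heading per query row, which corresponds to the single-output-file option.

```python
import sqlite3

# Hypothetical miniature of the patients table, for illustration only.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE patients (resnum INTEGER, sex TEXT, age INTEGER)")
cursor.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(1, "F", 70), (2, "M", 82), (3, "F", 91)],
)

# Each tuple is one line of query parameters: (sex, min age, max age).
queries = [("F", 65, 80), ("F", 85, 95)]

sql = ("SELECT resnum, sex, age FROM patients "
       "WHERE sex = ? AND age >= ? AND age <= ? ORDER BY resnum")

output_lines = []
for query_row in queries:
    # A sub-heading keeps the results of different query rows apart.
    output_lines.append("Query: %r" % (query_row,))
    for row in cursor.execute(sql, query_row):
        output_lines.append("Patient Id Number: %d\tSex: %s\tAge: %d" % row)
conn.close()
```

Writing output_lines to patient_out.txt at the end, or opening one file per query_row inside the loop, covers the two output choices mentioned.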
Update about parameter passing with sqlite3
It appears not to be documented, but if you use numbered placeholders, you can supply a tuple of arguments -- you don't need to supply a dict. The following example presumes a database blah.db with an empty table created by
create table foo (id int, name text, amt int);
>>> import sqlite3
>>> conn = sqlite3.connect('blah.db')
>>> curs = conn.cursor()
>>> sql = 'insert into foo values(:1,:2,:1);'
>>> curs.execute(sql, (42, 'bar'))
<sqlite3.Cursor object at 0x01E3D520>
>>> result = curs.execute('select * from foo;')
>>> print list(result)
[(42, u'bar', 42)]
>>> curs.execute(sql, {'1':42, '2':'bar'})
<sqlite3.Cursor object at 0x01E3D520>
>>> result = curs.execute('select * from foo;')
>>> print list(result)
[(42, u'bar', 42), (42, u'bar', 42)]
>>> curs.execute(sql, {1:42, 2:'bar'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
sqlite3.ProgrammingError: You did not supply a value for binding 1.
>>>
Update 2a You have a bug in this line of your SQL (and the following one):
INNER JOIN icd9 as icd9_1 on (icd9_1.resnum = patients.resnum) AND (icd9_1.code LIKE ':6%')
Because :6 sits inside single quotes, SQLite treats ':6%' as an ordinary string literal rather than a placeholder, so no parameter is ever bound there and the pattern matches only codes that literally start with ":6". If your parameter is the Python string "XYZ", you want the effect of ... LIKE 'XYZ%'). What you should do is have ... LIKE :6) in your SQL, and pass e.g. user_input[5].rstrip("%") + "%" (ensures exactly one trailing %) as the parameter.
Update 2b You can of course use a dictionary for the parameters, as documented, but it would improve the legibility considerably if you used meaningful names instead of digits.
For example, ... LIKE :code) instead of the above, and pass e.g. {'code': user_input[5].rstrip("%") + "%", .....} as the second arg of execute()
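A short sketch of the named-parameter form, with a hypothetical stand-in for the icd9 table and made-up codes. The wildcard is added to the bound value, not written into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
# Hypothetical stand-in for the icd9 table.
cursor.execute("CREATE TABLE icd9 (resnum INTEGER, code TEXT)")
cursor.executemany(
    "INSERT INTO icd9 VALUES (?, ?)",
    [(1, "250.0"), (2, "401.9"), (3, "250.5")],
)

user_code = "250%%"  # input that may or may not already end in %
params = {"code": user_code.rstrip("%") + "%"}  # exactly one trailing %

# The placeholder stands alone; the wildcard travels inside the bound value.
matches = cursor.execute(
    "SELECT resnum, code FROM icd9 WHERE code LIKE :code ORDER BY resnum",
    params,
).fetchall()
conn.close()
```

Here matches holds the two rows whose code starts with 250.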
:2? These are placeholders for parameters, which you're not giving it. Take a look at the module docs for how to call execute with a tuple of parameters:
http://docs.python.org/library/sqlite3.html#sqlite3.Cursor.execute
Yeah, it looks like you need to add the parameters. Also, changing the line query += "(where patients.age >= :2) AND (patients.age <= :3) " to query += "where (patients.age >= :2) AND (patients.age <= :3) " might help.
Here is a slightly more python-ish way of writing your code... Although PEP8 might say otherwise...
import sqlite3
with sqlite3.connect("nnhs.sqlite3") as conn:
    cursor = conn.cursor()
    data = []
    with open("patient_in.txt", "r") as infile:
        for line in infile:
            line = line.rstrip("\n")
            line = line.strip()
            seq = line.split(' ')
            seq[5] = int(seq[5])
            seq = tuple(seq)
            data.append(seq)
    print(data)
    with open("patient_out.txt", "w") as outfile:
        # Placeholders must stand alone (LIKE :6, not LIKE ':6%'), and "where" goes outside the parentheses.
        query = """SELECT DISTINCT patients.resnum, patients.facnum, patients.sex, patients.age, patients.rxmed, icd9_1.resnum, icd9_1.code
            from patients
            INNER JOIN icd9 as icd9_1 on (icd9_1.resnum = patients.resnum) AND (icd9_1.code LIKE :6)
            INNER JOIN icd9 as icd9_2 on (icd9_2.resnum = patients.resnum) AND (icd9_2.code LIKE :6)
            where (patients.age >= :2) AND (patients.age <= :3)
            AND patients.sex = :1
            AND (patients.rxmed >= :4) AND (patients.rxmed <= :5)
            ORDER BY patients.resnum"""
        # sex, age_lowerbound, ..., icd9_1 are not defined in this snippet; fill them from one row of data.
        variables = {'1':sex, '2':age_lowerbound, '3':age_upperbound, '4':rxmed_lowerbound, '5':rxmed_upperbound, '6':icd9_1.rstrip('%') + '%'}
        cursor.execute(query, variables)
        for row in cursor.fetchall():
            # Indices follow the SELECT order so each label matches its column.
            outfile.write("Patient Id Number: {0}\t ICD Res Num: {5}\t Condition: {6}\t Fac ID Num: {1}\t Patient Sex: {2}\t Patient Age: {3}\t Number of Medications: {4}\n".format(*row))