Using ? within INSERT INTO of an executemany() in Python with SQLite - python

I am trying to submit data to a SQLite database through Python with executemany(). I am reading data from a JSON file and then placing it into the database. My problem is that the JSON creation is not under my control, and depending on who I get the file from, the order of values is not the same each time. The keys are correct, so they correlate with the keys in the database, but I can't just toss the values at the executemany() function and have the data appear in the correct columns each time.
Here is what I need to be able to do.
keyTuple = (name, address, telephone)
listOfTuples = [(name1, address1, telephone1),
                (name2, address2, telephone2),
                (...)]
cur.executemany("INSERT INTO myTable(?,?,?)", keyTuple,
                "VALUES(?,?,?)", listOfTuples)
The problem I have is that some JSON files have the order "name, telephone, address" or some other order. I need to be able to input my keyTuple into the INSERT portion of the command so I can keep my relations straight no matter what order the JSON file comes in, without having to completely rebuild listOfTuples. I know there has got to be a way, but what I have written doesn't match the right syntax for the INSERT portion. The VALUES line works just fine; it uses each element in listOfTuples.
Sorry if I am not asking with the correct verbiage. FNG here and this is my first post. I have looked all over the web, but it only produces examples of using ? in the VALUES portion, never in the INSERT INTO portion.

You cannot use SQL parameters (?) for table/column names.
But when you already have the column names in the correct order, you can simply join them so they can be spliced into the SQL command string:
>>> keyTuple = ("name", "address", "telephone")
>>> "INSERT INTO MyTable(" + ",".join(keyTuple) + ")"
'INSERT INTO MyTable(name,address,telephone)'
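Putting the two pieces together, a complete sketch might look like this. The table, records, and column whitelist here are made up for illustration; the key point is that column names are validated against a known set before being joined into the SQL text, while the values still go through ? placeholders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE MyTable (name TEXT, address TEXT, telephone TEXT)")

# JSON-like records whose key order varies from file to file.
records = [
    {"telephone": "555-0100", "name": "Alice", "address": "1 Main St"},
    {"name": "Bob", "address": "2 Oak Ave", "telephone": "555-0101"},
]

key_tuple = ("name", "address", "telephone")
# Whitelist the column names before splicing them into the SQL string,
# since they cannot be passed as ? parameters.
allowed = {"name", "address", "telephone"}
assert set(key_tuple) <= allowed

sql = "INSERT INTO MyTable({}) VALUES ({})".format(
    ",".join(key_tuple), ",".join("?" * len(key_tuple))
)
# Pull the values out of each dict in key_tuple order, so the JSON's
# own key order no longer matters.
rows = [tuple(rec[k] for k in key_tuple) for rec in records]
cur.executemany(sql, rows)
conn.commit()
print(cur.execute("SELECT name FROM MyTable ORDER BY name").fetchall())
# → [('Alice',), ('Bob',)]
```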

Try this.
Example: if you have a table named products with the following fields:
Prod_Name Char( 30 )
UOM Char( 10 )
Reference Char( 10 )
Const Float
Price Float
list_products = [('Garlic', '5 Gr.', 'Can', 1.10, 2.00),
('Beans', '8 On.', 'Bag', 1.25, 2.25),
('Apples', '1 Un.', 'Unit', 0.25, 0.30),
]
c.executemany('Insert Into products Values (?,?,?,?,?)', list_products )
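Since the original question is really about key order, it may be worth noting that sqlite3 also supports named placeholders (:name), and executemany() accepts a sequence of dicts. With those, the key order of each record is irrelevant. A sketch using the same hypothetical products table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute(
    "CREATE TABLE products "
    "(Prod_Name TEXT, UOM TEXT, Reference TEXT, Const REAL, Price REAL)"
)

# The dicts deliberately use different key orders; named placeholders
# match by key, so the order does not matter.
dict_products = [
    {"Prod_Name": "Garlic", "UOM": "5 Gr.", "Reference": "Can",
     "Const": 1.10, "Price": 2.00},
    {"Price": 2.25, "Prod_Name": "Beans", "Reference": "Bag",
     "UOM": "8 On.", "Const": 1.25},
]
c.executemany(
    "INSERT INTO products VALUES (:Prod_Name, :UOM, :Reference, :Const, :Price)",
    dict_products,
)
conn.commit()
```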


sqlite3.OperationalError: table A has X columns but Y values were supplied

Firstly, I've read all the similar questions and applied the solutions listed there: (1) SQLite with Python "Table has X columns but Y were supplied", (2) "Table has X columns but Y values were supplied" error, (3) sqlite3.OperationalError: table card_data has 11 columns but 10 values were supplied, (4) sqlite3.OperationalError: table book has 6 columns but 5 values were supplied, (5) Sqlite3 Table users has 7 columns but 6 values were supplied. Yet none of them worked out.
I am creating 26 tables and successfully inserting data using this code:
im.execute("""CREATE TABLE IF NOT EXISTS C_Socket7 (Date_Requested, Time_Requested, Time_Passed, Energy_Consumption, Cumulative_Consumption, Hourly_Consumption, Daily_Consumption, Weekly_Consumption, Monthly_Consumption)""")
string = """INSERT INTO {} VALUES ('{}','{}','{}','{}','{}','{}','{}','{}')""".format('C_'+tables[7], tstr, dict4DB['timePassed'], dict4DB['Socket7Consp'],DummyConsumptions['Cumulative'], DummyConsumptions['Hourly'], DummyConsumptions['Daily'], DummyConsumptions['Weekly'], DummyConsumptions['Monthly'])
and this code:
im.execute("""CREATE TABLE IF NOT EXISTS A_Status (Time_Requested, Socket0Status, Socket1Status, Socket2Status, Socket3Status, Socket4Status, Socket5Status, Socket6Status, Socket7Status)""")
string = """INSERT INTO {} VALUES ('{}','{}','{}','{}','{}','{}','{}','{}','{}')""".format('A_'+tables[7], tstr, dict4DB['Socket0Stat'], dict4DB['Socket1Stat'],dict4DB['Socket2Stat'], dict4DB['Socket3Stat'], dict4DB['Socket4Stat'], dict4DB['Socket5Stat'], dict4DB['Socket6Stat'], dict4DB['Socket7Stat'])
But when it comes to this table:
im.execute("""CREATE TABLE IF NOT EXISTS A_Environment (Date_Requested, Time_Requested, Voltage, Frequency, In_Temperature, In_Humidity, Ext_Temperature, Ext_Humidity, Door_Switch, Door_Relay)""")
string = """INSERT INTO A_Environment(Date_Requested, Time_Requested, Voltage, Frequency, In_Temperature, In_Humidity, Ext_Temperature, Ext_Humidity, Door_Switch, Door_Relay) VALUES ('{}','{}', '{}','{}','{}','{}','{}','{}','{}','{}')""".format(d_str, h_str, dict4DB['Voltage'], dict4DB['Frequency'],dict4DB['InTemp'], dict4DB['InHumidity'], dict4DB['ExtTemp'], dict4DB['ExtHumidity'], dict4DB['DoorSwitch'], dict4DB['DoorRelay'])
It gives the error:
sqlite3.OperationalError: table A_Environment has 10 columns but 9 values were supplied
When I check the database using "DB Browser for SQLite", I can't see the column names. When I create columns manually using the interface, I get the same error.
Here is how it looks from DB Browser (screenshot not included).
It was working fine when the structure was like this:
im.execute("""CREATE TABLE IF NOT EXISTS A_Environment (Time_Requested, Voltage, Frequency, In_Temperature, In_Humidity, Ext_Temperature, Ext_Humidity, Door_Switch, Door_Relay)""")
string = """INSERT INTO {} VALUES ('{}','{}','{}','{}','{}','{}','{}','{}','{}')""".format('A_'+tables[6], tstr, dict4DB['Voltage'], dict4DB['Frequency'],dict4DB['InTemp'], dict4DB['InHumidity'], dict4DB['ExtTemp'], dict4DB['ExtHumidity'], dict4DB['DoorSwitch'], dict4DB['DoorRelay'])
I've replaced 'tstr' with 'd_str' and 'h_str', having separated date and time into different columns.
I've tried these:
Created the columns manually. Got the same result.
Dropped the table and let it create itself using "CREATE TABLE IF NOT EXISTS"; it created the table, but again with no columns in it. Added the columns manually; that did not work.
Deleted the whole database and let it create itself from scratch. Got the same result.
I also changed the values provided to the .format() call to str(d_str) and str(h_str); that did not work either.
I was planning to create a minimal working code snippet to see which value cannot be read, but after seeing that the columns are not being created, I thought the problem is not about the data being written.
I've read the documentation but could not find something I can use.
Here is a screenshot of the tables from "DB Browser for SQLite" (screenshot not included).
What do you recommend? What should I do?
PS: SQLite is running on Raspberry Pi, Rasbian OS, Python 3.7, SQLite3.
You should never use .format() (or % formatting, or f-strings) to generate SQL statements. Your issue possibly stems (hard to tell, since we don't see your input data) from an unescaped value being wrongly interpreted by SQLite; in effect, you're your own SQL injection vulnerability.
Instead use the placeholder system provided by your database, e.g. like this for Sqlite:
im.execute(
    """
    INSERT INTO
        A_Environment(
            Date_Requested,
            Time_Requested,
            Voltage,
            Frequency,
            In_Temperature,
            In_Humidity,
            Ext_Temperature,
            Ext_Humidity,
            Door_Switch,
            Door_Relay
        )
    VALUES
        (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    """,
    (
        d_str,
        h_str,
        dict4DB["Voltage"],
        dict4DB["Frequency"],
        dict4DB["InTemp"],
        dict4DB["InHumidity"],
        dict4DB["ExtTemp"],
        dict4DB["ExtHumidity"],
        dict4DB["DoorSwitch"],
        dict4DB["DoorRelay"],
    ),
)
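A minimal demonstration of why the placeholders matter, using an in-memory SQLite database and only a subset of the columns for brevity (the value here is made up): a value containing a quote or comma breaks string-formatted SQL, but is harmless as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
im = conn.cursor()
im.execute("CREATE TABLE A_Environment (Date_Requested, Time_Requested, Voltage)")

# This value would wreck a '{}'-formatted statement, but binds cleanly.
tricky = "12,5' V"
im.execute(
    "INSERT INTO A_Environment (Date_Requested, Time_Requested, Voltage) "
    "VALUES (?, ?, ?)",
    ("2023-01-01", "12:00", tricky),
)
conn.commit()
```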

How to use variable column name in filter in Django ORM?

I have two tables, BloodBank(id, name, phone, address) and BloodStock(id, a_pos, b_pos, a_neg, b_neg, bloodbank_id). I want to fetch all the columns from the two tables where the column named by a variable (say bloodgroup, holding a value like a_pos or a_neg) has a value greater than 0. How can I write the ORM for this?
SQL query is written like this to get the required results.
sql="select * from public.bloodbank_bloodbank as bb, public.bloodbank_bloodstock as bs where bs."+blood+">0 and bb.id=bs.bloodbank_id order by bs."+blood+" desc;"
cursor = connection.cursor()
cursor.execute(sql)
bloodbanks = cursor.fetchall()
You could be more specific in your question, but I believe you have a variable called blood which contains the string name of the column, and that the columns a_pos, b_pos, etc. are numeric.
You can use a dictionary to create keyword arguments from strings:
filter_dict = {'bloodstock__' + blood + '__gt': 0}
bloodbanks = Bloodbank.objects.filter(**filter_dict)
This will get you Bloodbank objects that have a related bloodstock with a greater than zero value in the bloodgroup represented by the blood variable.
Note that the way I have written this, you don't get the bloodstock columns selected, and you may get duplicate bloodbanks. If you want to eliminate duplicate bloodbanks, you can add .distinct() to your query. The bloodstocks are available for each bloodbank instance using .bloodstock_set.all().
The ORM will generate SQL using a join. Alternatively, you can do an EXISTS in the where clause and no join.
from django.db.models import Exists, OuterRef
filter_dict = {blood + '__gt': 0}
exists = Exists(Bloodstock.objects.filter(
    bloodbank_id=OuterRef('id'),
    **filter_dict
))
bloodbanks = Bloodbank.objects.filter(exists)
There will be no need for a .distinct() in this case.
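Since blood ends up inside a filter keyword argument built from a string, it is worth validating it against a whitelist of known column names first. A framework-free sketch of that pattern (the whitelist is assumed from the question's column list):

```python
# Assumed whitelist of valid blood-group column names from the question.
ALLOWED_GROUPS = {"a_pos", "b_pos", "a_neg", "b_neg"}

def blood_filter(blood):
    """Build the dynamic filter kwargs, rejecting unknown column names."""
    if blood not in ALLOWED_GROUPS:
        raise ValueError("unknown blood group: %r" % blood)
    return {"bloodstock__" + blood + "__gt": 0}

# Usage: Bloodbank.objects.filter(**blood_filter(blood))
print(blood_filter("a_pos"))  # → {'bloodstock__a_pos__gt': 0}
```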

Write dictionary with tuple containing parameters as unique value for features into PostgreSQL table

In Python 2.7, I have a dictionary with features' IDs as keys.
There are thousands of features.
Each feature has a single value, but this value is a tuple containing 6 parameters for the feature (for example: size, color, etc.).
On the other hand, I have a PostgreSQL table in a database where these feature parameters must be saved.
The features' IDs are already set in the table (as well as other information about these features).
The IDs are unique (they are random, thus not serial, but unique numbers).
There are 6 empty columns with the names "param1", "param2", "param3", ..., "param6".
I already have a tuple containing these names:
columns = ("param1", "param2", "param3", ..., "param6")
The code I have doesn't work for saving these parameters in their respective columns for each feature:
# "view" is the dictionary with features's ID as keys()
# and their 6 params stored in values().
values = [view[i] for i in view.keys()]
columns = ("param1","param2","param3","param4","param5","param6")
conn = psycopg2.connect("dbname=mydb user=username password=password")
curs = conn.cursor()
curs.execute("DROP TABLE IF EXISTS mytable;")
curs.execute("CREATE TABLE IF NOT EXISTS mytable (LIKE originaltable including defaults including constraints including indexes);")
curs.execute("INSERT INTO mytable SELECT * from originaltable;")
insertstatmnt = 'INSERT INTO mytable (%s) values %s'
alterstatement = ('ALTER TABLE mytable '+
'ADD COLUMN param1 text,'+
'ADD COLUMN param2 text,'+
'ADD COLUMN param3 real,'+
'ADD COLUMN param4 text,'+
'ADD COLUMN param5 text,'+
'ADD COLUMN param6 text;'
)
curs.execute(alterstatement) # It's working up to this point.
curs.execute(insertstatmnt, (psycopg2.extensions.AsIs(','.join(columns)), tuple(values))) # The problem seems to be here.
conn.commit() # Making change to DB !
curs.close()
conn.close()
Here's the error I have:
curs.execute(insert_statement, (psycopg2.extensions.AsIs(','.join(columns)), tuple(values)))
ProgrammingError: INSERT has more expressions than target columns
I must be missing something.
How to do that properly?
To build the statement with '%s' the way I think you want, you just need to change a couple of things.
Ignoring c.execute(), this statement is by no means wrong, but it does not return what you are looking for. Using my own version, this is what I got with that statement. I also ignored psycopg2.extensions.AsIs() because it is just an adapter conforming to the ISQLQuote protocol, useful for objects whose string representation is already valid as an SQL representation.
>>> values = [i for i in range(0, 5)]  # since I don't know the keys, I just made up values
>>> insertstatmnt, (','.join(columns), tuple(values))
('INSERT INTO mytable (%s) values %s', ('param1,param2,param3,param4,param5,param6', (0, 1, 2, 3, 4)))
As you can see, what you entered returns a tuple with the values.
>>> insertstatmnt % (','.join(columns), tuple(values))
'INSERT INTO mytable (param1,param2,param3,param4,param5,param6) values (0, 1, 2, 3, 4)'
Whereas, this returns a string that is more likely to be read by the SQL. The values obviously do not match the specified ones. I believe the problem you have lies within creating your string.
Reference for pycopg2: http://initd.org/psycopg/docs/extensions.html
As I took the syntax of the psycopg2 command from this thread:
Insert Python Dictionary using Psycopg2
and as my values dictionary doesn't follow exactly the same structure as the mentioned example (I also have one key as the ID, like in that example, but mine has only one corresponding value, a tuple containing my 6 parameters, thus nested one level deeper instead of 6 values corresponding directly to the keys), I need to loop through all features and execute one SQL statement per feature:
[curs.execute(insertstatmnt, (psycopg2.extensions.AsIs(', '.join(columns)), i)) for i in tuple(values)]
This is working.
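Instead of one execute() per feature, all the rows can also be collapsed into a single multi-row INSERT; psycopg2 additionally ships a helper for exactly this, psycopg2.extras.execute_values(). A dependency-free sketch of building such a statement by hand (the dictionary contents here are made up; the real code would pass sql and params to curs.execute()):

```python
# {feature_id: (param1, ..., param6)} — illustrative stand-in data.
view = {
    101: ("a", "b", 1.5, "d", "e", "f"),
    202: ("g", "h", 2.5, "j", "k", "m"),
}
columns = ("param1", "param2", "param3", "param4", "param5", "param6")

# One "(%s, ..., %s)" group per row, all values flattened into one list.
row_tpl = "(" + ", ".join(["%s"] * len(columns)) + ")"
sql = "INSERT INTO mytable ({}) VALUES {}".format(
    ", ".join(columns), ", ".join([row_tpl] * len(view))
)
params = [p for feature_id in sorted(view) for p in view[feature_id]]
# curs.execute(sql, params)  # run against the real connection
```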

How can django produce this SQL?

I have the following SQL query that returns what I need:
SELECT sensors_sensorreading.*, MAX(sensors_sensorreading.timestamp) AS "last"
FROM sensors_sensorreading
GROUP BY sensors_sensorreading.chipid
In words: get the last sensor reading entry for each unique chipid.
But I cannot seem to figure out the correct Django ORM statement to produce this query. The best I could come up with is:
SensorReading.objects.values('chipid').annotate(last=Max('timestamp'))
But if i inspect the raw sql it generates:
>>> print connection.queries[-1:]
[{u'time': u'0.475', u'sql': u'SELECT
"sensors_sensorreading"."chipid",
MAX("sensors_sensorreading"."timestamp") AS "last" FROM
"sensors_sensorreading" GROUP BY "sensors_sensorreading"."chipid"'}]
As you can see, it almost generates the correct SQL, except Django selects only the chipid field and the aggregate "last" (but I need all the table fields returned instead).
Any idea how to return all fields?
Assuming you also have other fields in the table besides chipid and timestamp, then I would guess this is the SQL you actually need:
select * from (
    SELECT *, row_number() over (partition by chipid order by timestamp desc) as RN
    FROM sensors_sensorreading
) X where RN = 1
This will return the latest rows for each chipid with all the data that is in the row.
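The window-function SQL can be checked quickly against SQLite (3.25+ supports row_number()); the table and values below are made up to mirror the question. In Django, this query would typically be run via SensorReading.objects.raw(...), since it is not expressible as a plain annotate().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE sensors_sensorreading (chipid TEXT, timestamp TEXT, value REAL)"
)
cur.executemany(
    "INSERT INTO sensors_sensorreading VALUES (?, ?, ?)",
    [
        ("A", "2023-01-01 10:00", 1.0),
        ("A", "2023-01-01 11:00", 2.0),  # latest reading for chip A
        ("B", "2023-01-01 09:00", 3.0),  # only reading for chip B
    ],
)
latest = cur.execute(
    """
    SELECT chipid, timestamp, value FROM (
        SELECT *, row_number() OVER
            (PARTITION BY chipid ORDER BY timestamp DESC) AS rn
        FROM sensors_sensorreading
    ) WHERE rn = 1
    ORDER BY chipid
    """
).fetchall()
print(latest)  # → [('A', '2023-01-01 11:00', 2.0), ('B', '2023-01-01 09:00', 3.0)]
```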

SQL multiple inserts with Python

UPDATE
After passing execute() a list of rows as per Nathan's suggestion, below, the code executes further but still gets stuck on the execute function. The error message reads:
query = query % db.literal(args)
TypeError: not all arguments converted during string formatting
So it still isn't working. Does anybody know why there is a type error now?
END UPDATE
I have a large mailing list in .xls format. I am using python with xlrd to retrieve the name and email from the xls file into two lists. Now I want to put each name and email into a mysql database. I'm using MySQLdb for this part. Obviously I don't want to do an insert statement for every list item.
Here's what I have so far.
from xlrd import open_workbook, cellname
import MySQLdb
dbname = 'h4h'
host = 'localhost'
pwd = 'P#ssw0rd'
user = 'root'
book = open_workbook('h4hlist.xls')
sheet = book.sheet_by_index(0)
mailing_list = {}
name_list = []
email_list = []
for row in range(sheet.nrows):
    """name is in the 0th col. email is the 4th col."""
    name = sheet.cell(row, 0).value
    email = sheet.cell(row, 4).value
    if name and email:
        mailing_list[name] = email

for n, e in sorted(mailing_list.iteritems()):
    name_list.append(n)
    email_list.append(e)
db = MySQLdb.connect(host=host, user=user, db=dbname, passwd=pwd)
cursor = db.cursor()
cursor.execute("""INSERT INTO mailing_list (name,email) VALUES (%s,%s)""",
(name_list, email_list))
The problem occurs when the cursor executes. This is the error: _mysql_exceptions.OperationalError: (1241, 'Operand should contain 1 column(s)'). I tried putting my query into a var initially, but then it just barfed up a message about passing a tuple to execute().
What am I doing wrong? Is this even possible?
The list is huge and I definitely can't afford to put the insert into a loop. I looked at using LOAD DATA INFILE, but I really don't understand how to format the file or the query and my eyes bleed when I have to read MySQL docs. I know I could probably use some online xls to mysql converter, but this is a learning exercise for me as well. Is there a better way?
You need to give executemany() a list of rows. You don't need to break the name and email out into separate lists; just create one list with both of the values in it.
rows = []
for row in range(sheet.nrows):
    """name is in the 0th col. email is the 4th col."""
    name = sheet.cell(row, 0).value
    email = sheet.cell(row, 4).value
    rows.append((name, email))
db = MySQLdb.connect(host=host, user=user, db=dbname, passwd=pwd)
cursor = db.cursor()
cursor.executemany("""INSERT INTO mailing_list (name,email) VALUES (%s,%s)""", rows)
Update: as #JonClements mentions, it should be executemany() not execute().
To fix TypeError: not all arguments converted during string formatting - you need to use the cursor.executemany(...) method, as this accepts an iterable of tuples (more than one row), while cursor.execute(...) expects the parameter to be a single row value.
After the command is executed, you need to ensure that the transaction is committed to make the changes active in the database by using db.commit().
If you are interested in high-performance of the code, this answer may be better.
Compared to the executemany() method, the execute() call below will be much faster:
INSERT INTO mailing_list (name,email) VALUES ('Jim','jim@yahoo.com'),('Lucy','Lucy@gmail.com')
You can easily modify the answer from #Nathan Villaescusa and get the new code.
cursor.execute("""INSERT INTO mailing_list (name,email) VALUES {}""".format(",".join(str(i) for i in rows)))
here is my own test result:
executemany(): 10,000 runs take 220 seconds
execute(): 10,000 runs take 12 seconds
The speed difference will be about 15 times.
Taking up the idea of #PengjuZhao, it should work to simply add one single placeholder for all values to be passed. The difference to #PengjuZhao's answer is that the values are passed as a second parameter to the execute() function, which should be safe against injection attacks because they are only evaluated at runtime (in contrast to .format()).
cursor.execute("""INSERT INTO mailing_list (name,email) VALUES (%s)""", ",".join(str(i) for i in rows))
Only if this does not work properly, try the approach below.
#PengjuZhao's answer shows that executemany() either has a strong Python overhead or uses multiple execute() statements where this is not needed; otherwise executemany() would not be so much slower than a single execute() statement.
Here is a function that puts #NathanVillaescusa's and #PengjuZhao's answers into a single execute() approach.
The solution builds a dynamic number of placeholders to be added to the SQL statement. It is a manually built execute() statement with multiple "%s" placeholders, which likely outperforms the executemany() statement.
For example, with 2 columns, inserting 100 rows:
execute(): 200 times "%s" (dependent on the number of rows)
executemany(): just 2 times "%s" (independent of the number of rows)
There is a chance that this solution has the high speed of #PengjuZhao's answer without risking injection attacks.
Prepare parameters of the function:
You will store your values in 1-dimensional numpy arrays arr_name and arr_email, which are then converted into a flat list of values interleaved row by row (name, email, name, email, ...). Alternatively, you can use the approach of #NathanVillaescusa.
from itertools import chain
listAllValues = list(chain.from_iterable(zip(arr_name, arr_email)))
column_names = 'name, email'
table_name = 'mailing_list'
Get sql query with placeholders:
The numRows = int(len(listAllValues) / numColumns) line simply avoids having to pass the number of rows. If you insert 6 values in listAllValues with 2 columns, this obviously makes 6/2 = 3 rows.
def getSqlInsertMultipleRowsInSqlTable(table_name, column_names, listAllValues):
    numColumns = len(column_names.split(","))
    numRows = int(len(listAllValues) / numColumns)
    placeholdersPerRow = "(" + ", ".join(["%s"] * numColumns) + ")"
    placeholders = ", ".join([placeholdersPerRow] * numRows)
    sqlInsertMultipleRowsInSqlTable = "insert into `{table}` ({columns}) values {values};".format(
        table=table_name, columns=column_names, values=placeholders)
    return sqlInsertMultipleRowsInSqlTable
strSqlQuery = getSqlInsertMultipleRowsInSqlTable(table_name, column_names, listAllValues)
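As a quick sanity check, here is the generated statement for two columns and two rows (the function is reproduced so the snippet runs standalone; table and data are made up from the question's mailing-list example):

```python
def getSqlInsertMultipleRowsInSqlTable(table_name, column_names, listAllValues):
    """Build an INSERT with one (%s, ...) group per row of values."""
    numColumns = len(column_names.split(","))
    numRows = int(len(listAllValues) / numColumns)
    placeholdersPerRow = "(" + ", ".join(["%s"] * numColumns) + ")"
    placeholders = ", ".join([placeholdersPerRow] * numRows)
    return "insert into `{table}` ({columns}) values {values};".format(
        table=table_name, columns=column_names, values=placeholders)

# Two columns, two rows -> four values in the flat list.
sql = getSqlInsertMultipleRowsInSqlTable(
    "mailing_list", "name, email",
    ["Jim", "jim@yahoo.com", "Lucy", "Lucy@gmail.com"])
print(sql)  # → insert into `mailing_list` (name, email) values (%s, %s), (%s, %s);
```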
Execute strSqlQuery
Final step:
db = MySQLdb.connect(host=host, user=user, db=dbname, passwd=pwd)
cursor = db.cursor()
cursor.execute(strSqlQuery, listAllValues)
This solution hopefully avoids the risk of injection attacks present in #PengjuZhao's answer, since it fills the SQL statement only with placeholders instead of values. The values are passed separately in listAllValues; at this point, strSqlQuery contains only placeholders:
cursor.execute(strSqlQuery, listAllValues)
The execute() statement gets the SQL statement with "%s" placeholders and the list of values as two separate parameters, as is done in #NathanVillaescusa's answer. I am still not sure whether this fully avoids injection attacks; my understanding is that injection can only occur if the values are put directly into the SQL statement. Please comment if I am wrong.
