Check if row exists in SQLite with Python

I want to know if a row already exists in one of my tables, in this case coll. In order to do this I played around with SQLite in the shell a little and stumbled upon SELECT EXISTS(SELECT 1 FROM coll WHERE ceeb="1234"). In SQLite this works perfectly and returns either a 0 or a 1, which is exactly what I wanted. So, with code in hand, I wrote up a quick Python script to see if I could get this to work for me before sticking it into my program. This is what I came up with:
import sqlite3
conn = sqlite3.connect('stu.db')
c = conn.cursor()
sceeb = int(raw_input(":> "))
ceeb_exists = c.execute('SELECT EXISTS(SELECT 1 FROM coll WHERE ceeb="%d" LIMIT 1)' % sceeb)
print ceeb_exists
Instead of assigning ceeb_exists a 1 or a 0, it gives me output that looks like <sqlite3.Cursor object at 0x01DF6860>. What am I doing wrong here?

The execution of a query always results in 0 or more rows. You'd need to fetch those rows; a SELECT EXISTS query results in 1 row, so you'd need to fetch that row.
Rows always consist of 1 or more columns, here you get one, so you could use tuple assignment (note the , comma after ceeb_exists):
c.execute('SELECT EXISTS(SELECT 1 FROM coll WHERE ceeb="%d" LIMIT 1)' % sceeb)
ceeb_exists, = c.fetchone()
However, using an EXISTS query is a bit redundant here; you could just test if there is any row returned. You should also use query parameters to avoid a SQL injection attack (you are asking a user to give you the ceeb value, so that is easily hijacked):
c.execute('SELECT 1 FROM coll WHERE ceeb=? LIMIT 1', (sceeb,))
ceeb_exists = c.fetchone() is not None
cursor.fetchone() returns None if there is no row available to fetch, the is not None test turns that into True or False.
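Putting the pieces together, a minimal end-to-end sketch (assuming the same stu.db database and coll table as in the question):
import sqlite3

conn = sqlite3.connect('stu.db')
c = conn.cursor()
sceeb = int(raw_input(":> "))
# The ? placeholder lets the driver bind the value safely.
c.execute('SELECT 1 FROM coll WHERE ceeb=? LIMIT 1', (sceeb,))
ceeb_exists = c.fetchone() is not None  # True if a matching row was fetched
print ceeb_exists
conn.close()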

.execute() returns a cursor object, as you can see.
To print the results of the query you need to iterate over the cursor:
for result in ceeb_exists:
    print result

Related

SQLalchemy rowcount always -1 for statements

I was playing around with SQLAlchemy and Microsoft SQL Server to get the hang of the functions when I came across a strange behavior. I was taught that the rowcount attribute on the result proxy object tells how many rows were affected by executing a statement. However, when I select or insert single or multiple rows in my test database, I always get -1. How can this be, and how can I fix it to reflect reality?
from sqlalchemy import MetaData, Table, insert, select

# engine is assumed to have been created with create_engine() beforehand
connection = engine.connect()
metadata = MetaData()

# Ex1: select statement for all values
student = Table('student', metadata, autoload=True, autoload_with=engine)
stmt = select([student])
result_proxy = connection.execute(stmt)
results = result_proxy.fetchall()
print(result_proxy.rowcount)

# Ex2: inserting single values
stmt = insert(student).values(firstname='Severus', lastname='Snape')
result_proxy = connection.execute(stmt)
print(result_proxy.rowcount)

# Ex3: inserting multiple values
stmt = insert(student)
values_list = [{'firstname': 'Rubius', 'lastname': 'Hagrid'},
               {'firstname': 'Minerva', 'lastname': 'McGonogall'}]
result_proxy = connection.execute(stmt, values_list)
print(result_proxy.rowcount)
Running each block separately, the print call outputs -1 every time. Ex1 successfully fetches all rows, and both insert statements successfully write the data to the database.
According to the following issue, the rowcount attribute isn't always to be trusted. Is that true here as well? And if so, how can I compensate with a COUNT statement in a SQLAlchemy transaction?
PDO::rowCount() returning -1
The single-row INSERT … VALUES ( … ) is trivial: If the statement succeeds then one row was affected, and if it fails (throws an error) then zero rows were affected.
For a multi-row INSERT simply perform it inside a transaction and rollback if an error occurs. Then the number of rows affected will either be zero or len(values_list).
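For example, a minimal sketch of the multi-row case (reusing the student table and values_list from the question):
# On success all rows are committed, so the count is len(values_list);
# on any error the rollback leaves zero rows affected.
with engine.connect() as conn:
    trans = conn.begin()
    try:
        conn.execute(insert(student), values_list)
        trans.commit()
        rows_affected = len(values_list)
    except Exception:
        trans.rollback()
        rows_affected = 0
print(f"{rows_affected} row(s) affected")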
To get the number of rows that a SELECT will return, wrap the select query in a SELECT count(*) query and run that first, for example:
import sqlalchemy as sa

# `Parent` stands in for whatever mapped table you are selecting from.
select_stmt = sa.select([Parent])
count_stmt = sa.select([sa.func.count(sa.text("*"))]).select_from(
    select_stmt.alias("s")
)
with engine.connect() as conn:
    conn.execution_options(isolation_level="SERIALIZABLE")
    rows_found = conn.execute(count_stmt).scalar()
    print(f"{rows_found} row(s) found")
    results = conn.execute(select_stmt).fetchall()
    for item in results:
        print(item.id)

How to run "if not" statement before "else":

I am using Sqlite3 and Python. Here is some sample code:
test
-----------------
amount | date
query = "SELECT SUM (column1) FROM test WHERE date BETWEEN '"+blah+"' AND '"+blah+"'"
c.execute(query)
data = c.fetchone()
if not data:
amountsum = 0
else:
amountsum = data[0]
print(amountsum)
The problem is that only the else: branch ever runs. If data is None, it does not set amountsum to 0 either. How can I make this work?
In this case, data will never be None, due to the aggregating query. SELECT SUM(...) FROM table will always return exactly one row. However, the SUM can be None in SQLite, if there are no rows in the table, so that should be taken into account:
query = "SELECT SUM (column1) FROM test WHERE ..."
c.execute(query)
data = c.fetchone()
amountsum = data[0] or 0
(A sidenote: you seem to be creating your SQL query using string concatenation, which is a potential SQL injection vulnerability. Consider using parameterized queries instead.)
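For example, a sketch of the same query with placeholders (start_date and end_date are hypothetical names for the two bounds the original code concatenated in):
query = "SELECT SUM(column1) FROM test WHERE date BETWEEN ? AND ?"
c.execute(query, (start_date, end_date))  # the driver binds the values safely
data = c.fetchone()
amountsum = data[0] or 0
print(amountsum)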

Parameterized Python SQLite3 query is returning the first parameter

I'm trying to make a query to a SQLite database from a python script. However, whenever I use parameterization it just returns the first parameter, which is column2. The desired result is for it to return the value held in column2 on the row where column1 is equal to row1.
conn = sqlite3.connect('path/to/database')
c = conn.cursor()
c.execute('SELECT ? from table WHERE column1 = ? ;', ("column2","row1"))
result = c.fetchone()[0]
print(result)
It prints
>>column2
Whenever I run this using concatenated strings, it works fine.
conn = sqlite3.connect('path/to/database')
c = conn.cursor()
c.execute('SELECT ' + column2 + ' from table WHERE column1 = ' + row1 + ';')
result = c.fetchone()[0]
print(result)
And it prints:
>>desired data
Any idea why this is happening?
This behaves as designed.
The mechanism that parameterized queries provide is meant to pass literal values to the query, not meta information such as column names.
One thing to keep in mind is that the database must be able to parse the parameterized query string without having the parameter at hand: obviously, a column name cannot be used as a parameter under that constraint.
For your use case, the only possible solution is to concatenate the column name into the query string, as shown in your second example. If the parameter comes from outside your code, be sure to properly validate it before that (for example, by checking it against a fixed list of values).
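For example, a minimal sketch of such validation (ALLOWED_COLUMNS is a hypothetical whitelist you would define to match your schema):
import sqlite3

# Hypothetical whitelist: only identifiers listed here may be interpolated.
ALLOWED_COLUMNS = {"column1", "column2"}

def fetch_value(c, column, key):
    if column not in ALLOWED_COLUMNS:
        raise ValueError("unexpected column name: %r" % (column,))
    # The identifier is interpolated only after validation;
    # the value still goes through a regular ? placeholder.
    c.execute('SELECT "%s" FROM "table" WHERE column1 = ?' % column, (key,))
    return c.fetchone()[0]

conn = sqlite3.connect('path/to/database')
print(fetch_value(conn.cursor(), "column2", "row1"))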

Python foreach not looping properly

I'm writing a script that formats a bunch of csv files into one csv file.
To do this, I'm using a couple of cursor tables in python via sqlite.
Here is my code. Currently I'm just trying to get every row in gsap that is associated with a code that is in gsap_locs to print:
data = c.execute("SELECT * from gsap_locs")
for row in data:
    print row[0]
    d2 = c.execute("select date, cardtype, volume, transactions from gsap where gsaploc=?", (row[0],))
    for r2 in d2:
        print r2
However, my code is only returning one row. I know the problem isn't in the first for, because when I take out everything after print row[0] it prints all of the values from the first select.
Why does it break out of my first for after my second for runs, without satisfying the conditions of the first for?
You are missing the fetchall or fetchone instructions.
It's a common pitfall: we assume that execute() has done the job of getting the data, but you still have to fetch it.
To retrieve data after executing a SELECT statement, you can either treat the cursor as an iterator, call the cursor’s fetchone() method to retrieve a single matching row, or call fetchall() to get a list of the matching rows.
import sqlite3
conn = sqlite3.connect('gasp.sqlite')
c = conn.cursor()
c.execute("SELECT * FROM gsap_locs")
rows = c.fetchall()
for row in rows:
    print row[0]
    c.execute("select * from gsap where loc=?", (row[0],))
    d2 = c.fetchall()
    for r2 in d2:
        print r2
conn.close()
A cursor can only track one result set at a time: calling execute() again discards the rows still pending from the previous query. You might want to keep the results of the first operation in memory, for example by calling tuple on it:
data = tuple(c.execute("SELECT * from gsap_locs"))
for row in data:
    ...
Be sure to have enough memory to hold all the results from the first query.
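An alternative sketch that sidesteps the buffering entirely: give the inner query its own cursor, so each loop iterates over its own result set (same schema assumed as in the question):
c1 = conn.cursor()
c2 = conn.cursor()
c1.execute("SELECT * FROM gsap_locs")
for row in c1:  # c1 keeps its pending rows...
    print row[0]
    # ...while c2 runs the inner query without disturbing them
    c2.execute("select date, cardtype, volume, transactions from gsap where gsaploc=?", (row[0],))
    for r2 in c2:
        print r2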

SQL multiple inserts with Python

UPDATE
After passing execute() a list of rows as per Nathan's suggestion, below, the code executes further but still gets stuck on the execute function. The error message reads:
query = query % db.literal(args)
TypeError: not all arguments converted during string formatting
So it still isn't working. Does anybody know why there is a type error now?
END UPDATE
I have a large mailing list in .xls format. I am using python with xlrd to retrieve the name and email from the xls file into two lists. Now I want to put each name and email into a mysql database. I'm using MySQLdb for this part. Obviously I don't want to do an insert statement for every list item.
Here's what I have so far.
from xlrd import open_workbook, cellname
import MySQLdb

dbname = 'h4h'
host = 'localhost'
pwd = 'P@ssw0rd'
user = 'root'

book = open_workbook('h4hlist.xls')
sheet = book.sheet_by_index(0)
mailing_list = {}
name_list = []
email_list = []

for row in range(sheet.nrows):
    """name is in the 0th col. email is the 4th col."""
    name = sheet.cell(row, 0).value
    email = sheet.cell(row, 4).value
    if name and email:
        mailing_list[name] = email

for n, e in sorted(mailing_list.iteritems()):
    name_list.append(n)
    email_list.append(e)

db = MySQLdb.connect(host=host, user=user, db=dbname, passwd=pwd)
cursor = db.cursor()
cursor.execute("""INSERT INTO mailing_list (name,email) VALUES (%s,%s)""",
               (name_list, email_list))
The problem occurs when the cursor executes. This is the error: _mysql_exceptions.OperationalError: (1241, 'Operand should contain 1 column(s)'). I tried putting my query into a var initially, but then it just barfed up a message about passing a tuple to execute().
What am I doing wrong? Is this even possible?
The list is huge and I definitely can't afford to put the insert into a loop. I looked at using LOAD DATA INFILE, but I really don't understand how to format the file or the query and my eyes bleed when I have to read MySQL docs. I know I could probably use some online xls to mysql converter, but this is a learning exercise for me as well. Is there a better way?
You need to give executemany() a list of rows. You don't need to break the name and email out into separate lists; just create one list of tuples with both values in each.
rows = []
for row in range(sheet.nrows):
    """name is in the 0th col. email is the 4th col."""
    name = sheet.cell(row, 0).value
    email = sheet.cell(row, 4).value
    rows.append((name, email))

db = MySQLdb.connect(host=host, user=user, db=dbname, passwd=pwd)
cursor = db.cursor()
cursor.executemany("""INSERT INTO mailing_list (name,email) VALUES (%s,%s)""", rows)
Update: as @JonClements mentions, it should be executemany() not execute().
To fix TypeError: not all arguments converted during string formatting - you need to use the cursor.executemany(...) method, as this accepts an iterable of tuples (more than one row), while cursor.execute(...) expects the parameter to be a single row value.
After the command is executed, you need to ensure that the transaction is committed to make the changes active in the database by using db.commit().
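For instance, a minimal sketch reusing rows and db from the answer above:
cursor.executemany("""INSERT INTO mailing_list (name,email) VALUES (%s,%s)""", rows)
db.commit()  # persist the transaction so the rows are visible to other connections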
If you are interested in the performance of the code, this answer may be better.
Compared to the executemany() method, the single execute() below will be much faster:
INSERT INTO mailing_list (name,email) VALUES ('Jim','jim@yahoo.com'),('Lucy','Lucy@gmail.com')
You can easily modify the answer from #Nathan Villaescusa and get the new code.
cursor.execute("""INSERT INTO mailing_list (name,email) VALUES (%s)""".format(",".join(str(i) for i in rows))
Here is my own test result:
executemany(): 10000 runs take 220 seconds
execute(): 10000 runs take 12 seconds
The speed difference is roughly 18 times (220/12).
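If you want to reproduce such a comparison, here is a rough benchmark sketch (hypothetical test data; absolute timings depend heavily on the server, network, and autocommit settings):
import time
import MySQLdb

db = MySQLdb.connect(host='localhost', user='root', db='h4h', passwd='P@ssw0rd')
cursor = db.cursor()
rows = [('name%d' % i, 'user%d@example.com' % i) for i in range(1000)]

# Driver-level batching: one executemany() call.
start = time.time()
cursor.executemany("INSERT INTO mailing_list (name,email) VALUES (%s,%s)", rows)
db.commit()
print("executemany: %.2fs" % (time.time() - start))

# Single multi-row statement (no escaping here: trusted test data only!).
start = time.time()
values = ",".join("('%s','%s')" % r for r in rows)
cursor.execute("INSERT INTO mailing_list (name,email) VALUES " + values)
db.commit()
print("single execute: %.2fs" % (time.time() - start))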
Taking up the idea of @PengjuZhao, it should work to simply add one single placeholder for all values to be passed. The difference from @PengjuZhao's answer is that the values are passed as a second parameter to the execute() function, which should be safe against an injection attack because this is only evaluated during runtime (in contrast to ".format()").
cursor.execute("""INSERT INTO mailing_list (name,email) VALUES (%s)""", ",".join(str(i) for i in rows))
Only if this does not work properly, try the approach below.
@PengjuZhao's answer shows that executemany() either has a strong Python overhead or it uses multiple execute() statements where this is not needed; otherwise executemany() would not be so much slower than a single execute() statement.
Here is a function that puts @NathanVillaescusa's and @PengjuZhao's answers into a single execute() approach.
The solution builds a dynamic number of placeholders to be added to the SQL statement. It is a manually built execute() statement with multiple "%s" placeholders, which likely outperforms the executemany() statement.
For example, at 2 columns, inserting 100 rows:
execute(): 200 times "%s" (dependent on the number of rows)
executemany(): just 2 times "%s" (independent of the number of rows).
There is a chance that this solution has the high speed of @PengjuZhao's answer without risking injection attacks.
Prepare parameters of the function:
You will store your values in the 1-dimensional numpy arrays arr_name and arr_email, which are then converted into one flat list of values, row by row. Alternatively, you can use the approach of @NathanVillaescusa.
from itertools import chain

# Interleave the two arrays row by row:
# [name_0, email_0, name_1, email_1, ...]
listAllValues = list(chain.from_iterable(zip(arr_name, arr_email)))
column_names = 'name, email'
table_name = 'mailing_list'
Get sql query with placeholders:
The line numRows = int(len(listAllValues)/numColumns) simply avoids having to pass the number of rows explicitly. If you insert 6 values in listAllValues at 2 columns, this makes 6/2 = 3 rows, obviously.
def getSqlInsertMultipleRowsInSqlTable(table_name, column_names, listAllValues):
    numColumns = len(column_names.split(","))
    numRows = int((len(listAllValues)) / numColumns)
    placeholdersPerRow = "(" + ', '.join(['%s'] * numColumns) + ")"
    placeholders = ', '.join([placeholdersPerRow] * numRows)
    sqlInsertMultipleRowsInSqlTable = "insert into `{table}` ({columns}) values {values};".format(
        table=table_name, columns=column_names, values=placeholders)
    return sqlInsertMultipleRowsInSqlTable

strSqlQuery = getSqlInsertMultipleRowsInSqlTable(table_name, column_names, listAllValues)
Execute strSqlQuery
Final step:
db = MySQLdb.connect(host=host, user=user, db=dbname, passwd=pwd)
cursor = db.cursor()
cursor.execute(strSqlQuery, listAllValues)
This solution is hopefully free of the injection risk in @PengjuZhao's answer, since it fills the SQL statement only with placeholders instead of values. The values are passed separately in listAllValues, at the point where strSqlQuery contains only placeholders:
cursor.execute(strSqlQuery, listAllValues)
The execute() statement gets the SQL statement with %s placeholders and the list of values as two separate parameters, as is done in @NathanVillaescusa's answer. I am still not sure whether this avoids injection attacks entirely. It is my understanding that injection attacks can only occur if the values are put directly into the SQL statement; please comment if I am wrong.
