DBAPI syntax with pd.read_sql_query() call - python

I want to read all of the tables contained in a database into pandas data frames. This answer does what I want to accomplish, but I'd like to use the DBAPI syntax with the ? instead of the %s, per the documentation. However, I ran into an error. I thought this answer may address the problem, but I'm now posting my own question because I can't figure it out.
Minimal example
import pandas as pd
import sqlite3
pd.__version__ # 0.19.1
sqlite3.version # 2.6.0
excon = sqlite3.connect('example.db')
c = excon.cursor()
c.execute('''CREATE TABLE stocks
(date text, trans text, symbol text, qty real, price real)''')
c.execute("INSERT INTO stocks VALUES ('2006-01-05', 'BUY', 'RHAT', 100, 35.14)")
c.execute('''CREATE TABLE bonds
(date text, trans text, symbol text, qty real, price real)''')
c.execute("INSERT INTO bonds VALUES ('2015-01-01', 'BUY', 'RSOCK', 90, 23.11)")
data = pd.read_sql_query('SELECT * FROM stocks', excon)
# >>> data
# date trans symbol qty price
# 0 2006-01-05 BUY RHAT 100.0 35.14
But when I include a ? or a (?) as below, I get the error message pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT * FROM (?)': near "?": syntax error.
Problem code
c.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = c.fetchall()
# >>> tables
# [('stocks',), ('bonds',)]
table = tables[0]
data = pd.read_sql_query("SELECT * FROM ?", excon, params=table)
It's probably something trivial that I'm missing, but I'm not seeing it!

The problem is that you're trying to use parameter substitution for a table name, which is not possible. There's an issue on GitHub that discusses this. The relevant part is at the very end of the thread, in a comment by #jorisvandenbossche:
Parameter substitution is not possible for the table name AFAIK.
The thing is, in sql there is often a difference between string
quoting, and variable quoting (see eg
https://sqlite.org/lang_keywords.html the difference in quoting
between string and identifier). So you are filling in a string, which
is for sql something else as a variable name (in this case a table
name).

Parameter substitution is essential to prevent SQL Injection from unsafe user-entered values.
In this particular example you are sourcing table names directly from the database's own metadata, which is already safe, so it's OK to just use normal string formatting to construct the query, but still good to wrap the table names in quotes.
If you are sourcing user-entered table names, you can also parameterize them first before using them in your normal python string formatting.
e.g.
# assume this is user-entered:
table = '; select * from members; DROP members --'
c.execute("SELECT name FROM sqlite_master WHERE type='table' and name = ?;", excon, params=table )
tables = c.fetchall()
In this case the user has entered some malicious input intended to cause havoc, and the parameterized query will cleanse it and the query will return no rows.
If the user entered a clean table e.g. table = 'stocks' then the above query would return that same name back to you, through the wash, and it is now safe.
Then it is fine to continue with normal python string formatting, in this case using f-string style:
table = tables[0]
data = pd.read_sql_query(f"""SELECT * FROM "{table}" ;""", excon)
Referring back to your original example, my first step above is entirely unnecessary. I just provided it for context. It is unnecessary, because there is no user input so you could just do something like this to get a dictionary of dataframes for every table.
c.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = c.fetchall()
# >>> tables
# [('stocks',), ('bonds',)]
dfs = dict()
for t in tables:
dfs[t] = pd.read_sql_query(f"""SELECT * FROM "{t}" ;""", excon)
Then you can fetch the dataframe from the dictionary using the tablename as the key.

Related

Python's sqlite3 module: get query with escaped parameters [duplicate]

I'm trying to debug a SQL statement generated with sqlite3 python module...
c.execute("SELECT * FROM %s WHERE :column = :value" % Photo.DB_TABLE_NAME, {"column": column, "value": value})
It is returning no rows when I do a fetchall()
When I run this directly on the database
SELECT * FROM photos WHERE album_id = 10
I get the expected results.
Is there a way to see the constructed query to see what the issue is?
To actually answer your question, you can use the set_trace_callback of the connection object to attach the print function; this will make all queries get printed when they are executed. Here is an example in action:
# Import and connect to database
import sqlite3
conn = sqlite3.connect('example.db')
# This attaches the tracer
conn.set_trace_callback(print)
# Get the cursor, execute some statement as an example
c = conn.cursor()
c.execute("CREATE TABLE stocks (symbol text)")
t = ('RHAT',)
c.execute("INSERT INTO stocks VALUES (?)", t)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
print(c.fetchone())
This produces the output:
CREATE TABLE stocks (symbol text)
BEGIN
INSERT INTO stocks VALUES ('RHAT')
SELECT * FROM stocks WHERE symbol='RHAT'
('RHAT',)
the problem here is that the string values are automatically embraced with single quotes. You can not dynamically insert column names that way.
Concerning your question, I'm not sure about sqlite3, but in MySQLdb you can get the final query as something like (I am currently not at a computer to check):
statement % conn.literal(query_params)
You can only use substitution parameters for row values, not column or table names.
Thus, the :column in SELECT * FROM %s WHERE :column = :value is not allowed.

Column does not exist (SQLAlchemy / PostgreSQL): Trouble with quotation marks

Im having trouble with a postgresql query using SQLAlchemy.
I created some large tables using this line of code:
frame.to_sql('Table1', con=engine, method='multi', if_exists='append')
It worked fine. Now, when I want to query data out of it, my first problem is that I have to use quotation marks for each table and column name and I dont really know why, maybe somebody can help me out there.
That is not my main problem though. My main problem is, that when querying the data, all numerical WHERE conditions work fine, but not the ones with Strings in the column data. I get an error that the column does not exist. Im using:
df = pd.read_sql_query('SELECT "variable1", "variable2" FROM "Table1" WHERE "variable1" = 123 AND "variable2" = "abc" ', engine)
I think it might be a problem that I use "abc" instead of 'abc', but I cant change it because of the ' signs in the argument of the query. If I change those ' to " then the Column names and Table names are not detected correctly (because of the problem before that they have to be in quotation marks).
This is the error message:
ProgrammingError: (psycopg2.errors.UndefinedColumn) ERROR: COLUMN »abc« does not exist
LINE 1: ...er" FROM "Table1" WHERE "variable2" = "abc"
And there is an arrow pointing to the first quotation mark of the "abc".
Im new to SQL and I would really appreciate if someone could point me in the right direction.
"Most" SQL dialects (notable exceptions being MS SQL Server and MS Access) strictly differentiate between
single quotes: for string literals, e.g., WHERE thing = 'foo'
double quotes: for object (table, column) names, e.g., WHERE "some col" = 123
PostgreSQL throws in the added wrinkle that table/column names are forced to lower case if they are not (double-)quoted and then uses case-sensitive matching, so if your table is named Table1 then
SELECT * FROM Table1 will fail because PostgreSQL will look for table1, but
SELECT * FROM "Table1" will succeed.
The way to avoid confusion in your query is to use query parameters instead of string literals:
# set up test environment
with engine.begin() as conn:
conn.exec_driver_sql('DROP TABLE IF EXISTS "Table1"')
conn.exec_driver_sql('CREATE TABLE "Table1" (variable1 int, variable2 varchar(50))')
df1 = pd.DataFrame([(123, "abc"), (456, "def")], columns=["variable1", "variable2"])
df1.to_sql("Table1", engine, index=False, if_exists="append")
# test .read_sql_query() with parameters
import sqlalchemy as sa
sql = sa.text('SELECT * FROM "Table1" WHERE variable1 = :v1 AND variable2 = :v2')
param_dict = {"v1": 123, "v2": "abc"}
df2 = pd.read_sql_query(sql, engine, params=param_dict)
print(df2)
"""
variable1 variable2
0 123 abc
"""
It should be: AND "variable2" = 'abc'.
You cannot quote strings/literals with ", as PostgreSQL will interpret it as a database object. Btw. you do not need to wrap table names and and columns with double quotes unless it is extremely necessary, e.g. case sensitive object names, names containing spaces, etc. Imho it is a bad practice and on the long run only leads to confusion. So your query could be perfectly written as follows:
SELECT variable1, variable2
FROM table1
WHERE variable1 = 123 AND variable2 = 'abc';
Keep in mind that it also applies for other objects, like tables or indexes.
CREATE TABLE Table1 (id int) - nice.
CREATE TABLE "Table1" (id int) - not nice.
CREATE TABLE "Table1" ("id" int) - definitely not nice ;)
In case you want to remove the unnecessary double quotes from your table name:
ALTER TABLE "Table1" RENAME TO table1;
Demo: db<>fiddle

Set Sqlite query results as variables [duplicate]

This question already has answers here:
How can I get dict from sqlite query?
(16 answers)
Closed 4 years ago.
Issue:
Hi, right now I am making queries to sqlite and assigning the result to variables like this:
Table structure: rowid, name, something
cursor.execute("SELECT * FROM my_table WHERE my_condition = 'ExampleForSO'")
found_record = cursor.fetchone()
record_id = found_record[0]
record_name = found_record[1]
record_something = found_record[2]
print(record_name)
However, it's very possible that someday I have to add a new column to the table. Let's put the example of adding that column:
Table structure: rowid, age, name, something
In that scenario, if we run the same code, name and something will be assigned wrongly and the print will not get me the name but the age, so I have to edit the code manually to fit the current index. However, I am working now with tables of more than 100 fields for a complex UI and doing this is tiresome.
Desired output:
I am wondering if there is a better way to catch results by using dicts or something like this:
Note for lurkers: The next snipped is made up code that does not works, do not use it.
cursor.execute_to(my_dict,
'''SELECT rowid as my_dict["id"],
name as my_dict["name"],
something as my_dict["something"]
FROM my_table WHERE my_condition = "ExampleForSO"''')
print(my_dict['name'])
I am probably wrong with this approach, but that's close to what I want. That way if I don't access the results as an index, and if add a new column, no matter where it's, the output would be the same.
What is the correct way to achieve it? Is there any other alternatives?
You can use namedtuple and then specify connection.row_factory in sqlite. Example:
import sqlite3
from collections import namedtuple
# specify my row structure using namedtuple
MyRecord = namedtuple('MyRecord', 'record_id record_name record_something')
con = sqlite3.connect(":memory:")
con.isolation_level = None
con.row_factory = lambda cursor, row: MyRecord(*row)
cur = con.cursor()
cur.execute("CREATE TABLE my_table (record_id integer PRIMARY KEY, record_name text NOT NULL, record_something text NOT NULL)")
cur.execute("INSERT INTO my_table (record_name, record_something) VALUES (?, ?)", ('Andrej', 'This is something'))
cur.execute("INSERT INTO my_table (record_name, record_something) VALUES (?, ?)", ('Andrej', 'This is something too'))
cur.execute("INSERT INTO my_table (record_name, record_something) VALUES (?, ?)", ('Adrika', 'This is new!'))
for row in cur.execute("SELECT * FROM my_table WHERE record_name LIKE 'A%'"):
print(f'ID={row.record_id} NAME={row.record_name} SOMETHING={row.record_something}')
con.close()
Prints:
ID=1 NAME=Andrej SOMETHING=This is something
ID=2 NAME=Andrej SOMETHING=This is something too
ID=3 NAME=Adrika SOMETHING=This is new!

Using instance variables in SQLite3 update?

Ok so basically I'm trying to update an existing SQLite3 Database with instance variables (typ and lvl)
#Set variables
typ = 'Test'
lvl = 6
#Print Databse
print("\nHere's a listing of all the records in the table:\n")
for row in cursor.execute("SELECT rowid, * FROM fieldmap ORDER BY rowid"):
print(row)
#Update Info
sql = """
UPDATE fieldmap
SET buildtype = typ, buildlevel = lvl
WHERE rowid = 11
"""
cursor.execute(sql)
#Print Databse
print("\nHere's a listing of all the records in the table:\n")
for row in cursor.execute("SELECT rowid, * FROM fieldmap ORDER BY rowid"):
print(row)
As an Error I'm getting
sqlite3.OperationalError: no such column: typ
Now I basically know the problem is that my variable is inserted with the wrong syntax but I can not for the life of me find the correct one. It works with strings and ints just fine like this:
sql = """
UPDATE fieldmap
SET buildtype = 'house', buildlevel = 3
WHERE rowid = 11
"""
But as soon as I switch to the variables it throws the error.
Your query is not actually inserting the values of the variables typ and lvl into the query string. As written the query is trying to reference columns named typ and lvl, but these don't exist in the table.
Try writing is as a parameterised query:
sql = """
UPDATE fieldmap
SET buildtype = ?, buildlevel = ?
WHERE rowid = 11
"""
cursor.execute(sql, (typ, lvl))
The ? acts as a placeholder in the query string which is replaced by the values in the tuple passed to execute(). This is a secure way to construct the query and avoids SQL injection vulnerabilities.
Hey I think you should use ORM to manipulate with SQL database.
SQLAlchemy is your friend. I use that with SQLite, MySQL, PostgreSQL. It is fantastic.
That can make you get away from this syntax error since SQL does take commas and quotation marks as importance.
For hard coding, you may try this:
sql = """
UPDATE fieldmap
SET buildtype = '%s', buildlevel = 3
WHERE rowid = 11
""" % (house)
This can solve your problem temporarily but not for the long run. ORM is your friend.
Hope this could be helpful!

sqlite3 with python - query debugging - print the final query

I'm trying to debug a SQL statement generated with sqlite3 python module...
c.execute("SELECT * FROM %s WHERE :column = :value" % Photo.DB_TABLE_NAME, {"column": column, "value": value})
It is returning no rows when I do a fetchall()
When I run this directly on the database
SELECT * FROM photos WHERE album_id = 10
I get the expected results.
Is there a way to see the constructed query to see what the issue is?
To actually answer your question, you can use the set_trace_callback of the connection object to attach the print function; this will make all queries get printed when they are executed. Here is an example in action:
# Import and connect to database
import sqlite3
conn = sqlite3.connect('example.db')
# This attaches the tracer
conn.set_trace_callback(print)
# Get the cursor, execute some statement as an example
c = conn.cursor()
c.execute("CREATE TABLE stocks (symbol text)")
t = ('RHAT',)
c.execute("INSERT INTO stocks VALUES (?)", t)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
print(c.fetchone())
This produces the output:
CREATE TABLE stocks (symbol text)
BEGIN
INSERT INTO stocks VALUES ('RHAT')
SELECT * FROM stocks WHERE symbol='RHAT'
('RHAT',)
the problem here is that the string values are automatically embraced with single quotes. You can not dynamically insert column names that way.
Concerning your question, I'm not sure about sqlite3, but in MySQLdb you can get the final query as something like (I am currently not at a computer to check):
statement % conn.literal(query_params)
You can only use substitution parameters for row values, not column or table names.
Thus, the :column in SELECT * FROM %s WHERE :column = :value is not allowed.

Categories