Pandas read_sql_query using multiple AND statements - python

I am trying to build a SQL query with Python pandas. I have tried different approaches, but I always get the following error:
Incorrect number of bindings supplied. The current statement uses 6, and there are 3 supplied.
My code is as follows. What am I doing wrong?
import sqlite3
import pandas as pd

conn = sqlite3.connect(my_db)
df = pd.read_sql_query(
    con=conn,
    sql="""SELECT * FROM {table}
    WHERE #var1 = (?)
    AND #var2 = (?)
    AND #var3 = (?)
    ;""".format(table=table),
    params=(value1, value2, value3),
)

As you've said, #var1, #var2 and #var3 are all column names.
However, SQLite interprets the # symbol as the start of a named parameter, i.e. a placeholder for a value you will supply later in your code.
So your SQL statement expects 6 values: one for each of the three (?)s and one for each of the three #vars. You are only supplying 3 values (for the (?)s), which is why the error occurs.
I would recommend renaming your columns so that they do not contain '#'; that leaves less room for errors.
See this question for further clarification.

SQLite interprets the # symbol, like ?, as a parameter placeholder (see the "Parameters" section of the SQLite expression documentation). If #var1 is the name of a column, it must be escaped by surrounding it with backticks:
df = pd.read_sql_query(
    con=conn,
    sql="""SELECT * FROM {table}
    WHERE `#var1` = (?)
    AND `#var2` = (?)
    AND `#var3` = (?)""".format(table=table),
    params=(value1, value2, value3),
)
I agree with AdiC, though: it would be more convenient to rename your columns so that they do not use characters with special meaning.
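A runnable sketch of the escaping fix against a throwaway in-memory database. The table name items and its rows are hypothetical. It quotes the column names with SQL-standard double quotes; SQLite also accepts the backticks used above. Once the identifiers are quoted, the statement contains only the three ? markers, so the three supplied values match.

```python
import sqlite3

import pandas as pd

# Hypothetical table whose column names start with '#'
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE items ("#var1" INTEGER, "#var2" TEXT, "#var3" REAL)')
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "a", 0.5), (2, "b", 1.5), (1, "a", 2.5)],
)

# Quoted identifiers are column names, not parameters, so the statement
# has exactly three ? markers for the three values in params
query = """
    SELECT * FROM items
    WHERE "#var1" = ?
      AND "#var2" = ?
      AND "#var3" = ?
"""
df = pd.read_sql_query(query, conn, params=(1, "a", 2.5))
print(df)
```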

Related

use string as columns definition for DataFrame(cursor.fetchall(),columns

I would like to use a string as column names for pandas DataFrame.
The problem is that the pandas DataFrame constructor interprets the string variable as a single column name instead of multiple ones, hence the error:
ValueError: 1 columns passed, passed data had 11 columns
The first part of my code is intended to get the column names from the MySQL database I am about to query:
cursor1.execute ("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
for colsTableMysql in cursor1.fetchall():
    colsTable = colsTableMysql[0]
colsTable = "'" + colsTable.replace(",", "','") + "'"
The second part uses the created variable "colsTable" :
cursor = connection.cursor()
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN ("+emplazamientos+")")
tabla = pd.DataFrame(cursor.fetchall(),columns=[colsTable])
#tabla = exec("pd.DataFrame(cursor.fetchall(),columns=["+colsTable+"])")
#tabla = pd.DataFrame(cursor.fetchall())
I have tried other approaches, such as using exec(). In that case there is no error, but no data comes back either: print(tabla) prints None.
Is there any direct way of passing the columns dynamically as a string to a pandas DataFrame?
Thanks in advance
I am going to answer my question since I've already found the way.
The first part of my code is intended to get the column names from the MySQL database table I am about to query:
cursor1.execute ("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
for colsTableMysql in cursor1.fetchall():
    colsTable = colsTableMysql[0]
colsTable = "'" + colsTable.replace(",", "','") + "'"
The second part uses the created variable "colsTable" as input in the statement to define the columns.
cursor = connection.cursor()
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN ("+emplazamientos+")")
tabla = eval("pd.DataFrame(cursor.fetchall(),columns=["+colsTable+"])")
With eval, the string is parsed and evaluated as a Python expression.
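A sketch of an eval-free alternative, assuming the goal is simply a list of column names: the columns argument of pd.DataFrame accepts a list, so splitting the unquoted GROUP_CONCAT string on commas, or reading the names from the DB-API cursor.description attribute after the query runs, gives the same result without building and evaluating code. (exec() always returns None, which is why the earlier attempt printed None.) An in-memory sqlite3 database stands in for MySQL so the sketch runs anywhere; the table contents and the list-valued emplazamientos are hypothetical, and a MySQL driver would use %s markers instead of ?.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for or_red.nomen_prefix (sqlite3 instead of MySQL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nomen_prefix (C_emp TEXT, prefix TEXT, descr TEXT)")
conn.executemany(
    "INSERT INTO nomen_prefix VALUES (?, ?, ?)",
    [("A1", "p1", "first"), ("B2", "p2", "second"), ("C3", "p3", "third")],
)

# Option 1: split a comma-separated string like the raw GROUP_CONCAT result
cols_string = "C_emp,prefix,descr"
columns_from_string = cols_string.split(",")

# Option 2: take the query's own column names from cursor.description
emplazamientos = ["A1", "C3"]  # hypothetical list of C_emp values
placeholders = ", ".join("?" for _ in emplazamientos)
cursor = conn.cursor()
cursor.execute(
    f"SELECT * FROM nomen_prefix WHERE C_emp IN ({placeholders}) ORDER BY C_emp",
    emplazamientos,
)
columns_from_cursor = [desc[0] for desc in cursor.description]

tabla = pd.DataFrame(cursor.fetchall(), columns=columns_from_cursor)
print(tabla)
```

Reading cursor.description has the extra benefit that the names always line up with the columns the query actually returned.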

Parameterized Python SQLite3 query is returning the first parameter

I'm trying to make a query to a SQLite database from a python script. However, whenever I use parameterization it just returns the first parameter, which is column2. The desired result is for it to return the value held in column2 on the row where column1 is equal to row1.
conn = sqlite3.connect('path/to/database')
c = conn.cursor()
c.execute('SELECT ? from table WHERE column1 = ? ;', ("column2","row1"))
result = c.fetchone()[0]
print(result)
It prints
>>column2
Whenever I run this using concatenated strings, it works fine.
conn = sqlite3.connect('path/to/database')
c = conn.cursor()
c.execute('SELECT ' + column2 + ' from table WHERE column1 = ' + row1 + ';')
result = c.fetchone()[0]
print(result)
And it prints:
>>desired data
Any idea why this is happening?
This behaves as designed.
The mechanism that parameterized queries provide is meant to pass literal values to the query, not meta information such as column names.
Keep in mind that the database must be able to parse the parameterized query string without knowing the parameter values, so a column name cannot be supplied as a parameter: the engine has to know which columns the statement refers to before any values are bound. In your first query, the ? in the select list is simply bound to the string literal 'column2', which is why that string comes back.
For your use case, the only option is to concatenate the column name into the query string, as in your second example. If the column name comes from outside your code, validate it first (for example, by checking it against a fixed list of allowed names).
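A sketch of the validate-then-concatenate approach, with one addition: the row value is still bound through a ? marker, and only the column name, after being checked against a fixed allow-list, is formatted into the SQL text. The table my_table, the allow-list, and the helper fetch_value are hypothetical; an in-memory sqlite3 database makes the sketch runnable.

```python
import sqlite3

# Hypothetical table standing in for the question's `table`
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column1 TEXT, column2 TEXT, column3 TEXT)")
conn.execute("INSERT INTO my_table VALUES ('row1', 'desired data', 'other data')")

# Fixed allow-list of column names callers may select (hypothetical)
ALLOWED_COLUMNS = {"column2", "column3"}


def fetch_value(conn, column, key):
    """Return `column` from the row whose column1 equals `key`, or None."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column name: {column!r}")
    # Only the validated identifier goes into the SQL text (double-quoted);
    # the row value is still bound through the ? marker.
    sql = f'SELECT "{column}" FROM my_table WHERE column1 = ?'
    row = conn.execute(sql, (key,)).fetchone()
    return None if row is None else row[0]


print(fetch_value(conn, "column2", "row1"))  # prints: desired data

# Binding the column name instead reproduces the question's output
bound = conn.execute(
    "SELECT ? FROM my_table WHERE column1 = ?", ("column2", "row1")
).fetchone()[0]
print(bound)  # prints: column2
```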

Syntax error when inserting into 2003 MDB file with Pyodbc

I am trying to write Access 2003 .mdb files from scratch. I already have a file with the tables and column names (112 columns). In my attempt, I read rows from a pandas DataFrame (named sections in my code) and append them to the .mdb file. But when I run an INSERT INTO statement through pyodbc, I get this error:
ProgrammingError: ('42000', "[42000] [Microsoft][Driver ODBC Microsoft Access] Expression syntax error 'Equatorial-TB-BG-CA_IRI-1.0_SNP-1.0_ACA-0_ESAL-1000'. (-3100) (SQLExecDirectW)")
Here is my code:
for k in range(len(sections)):
    cols = tuple(list(sections.columns))
    vals = tuple(list(sections.iloc[k]))
    action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.format(columns = str(cols).replace("'",""), values = str(vals).replace("'",""))
    cursor.execute(action)
conn.commit()
Does anyone know why I am having this kind of problem?
Actually, this is not an Access-specific error but a general SQL error: your string literals are not enclosed in quotes. The Access engine therefore treats them as field names, and the hyphens make it worse, because the engine reads them as subtraction expressions.
To demonstrate the issue, see below, with sample values filling in for your unknown ones. Notice that the string items passed in VALUES are not quoted:
sections_columns = ['database', 'tool']
cols = tuple(list(sections_columns))
sections_vals = ['ms-access', 'pandas']
vals = tuple(list(sections_vals))
action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.\
format(columns = str(cols).replace("'",""), values = str(vals).replace("'",""))
print(action)
# INSERT INTO SECTIONS (database, tool) VALUES (ms-access, pandas)
Now, you could keep the single quotes that you strip from str(vals):
action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.\
format(columns = str(cols).replace("'",""), values = str(vals))
print(action)
# INSERT INTO SECTIONS (database, tool) VALUES ('ms-access', 'pandas')
But even better, consider parameterizing the query with qmark placeholders and passing the values as parameters (second argument of cursor.execute(query, params)). This avoids any need to quote or unquote string or numeric values:
# MOVED OUTSIDE LOOP AS UNCHANGING OBJECTS
cols = tuple(sections.columns) # REMOVED UNNEEDED list()
qmarks = tuple(['?' for i in cols]) # NEW OBJECT
# STRIP QUOTES FROM str(qmarks) TOO, OTHERWISE '?' IS A STRING LITERAL, NOT A MARKER
action = 'INSERT INTO SECTIONS {columns} VALUES {qmarks}'.\
    format(columns = str(cols).replace("'",""), qmarks = str(qmarks).replace("'",""))
# INSERT INTO SECTIONS (col1, col2, col3, ...) VALUES (?, ?, ?...)
for k in range(len(sections)):
    vals = list(sections.iloc[k]) # REMOVED tuple()
    cursor.execute(action, vals) # EXECUTE PARAMETERIZED QUERY
conn.commit()
Better still, avoid the loop entirely: call executemany with DataFrame.values.tolist() and a single prepared statement:
# PREPARED STATEMENT
cols = tuple(sections.columns)
qmarks = tuple(['?' for i in cols])
action = 'INSERT INTO SECTIONS {columns} VALUES {qmarks}'.\
    format(columns = str(cols).replace("'",""), qmarks = str(qmarks).replace("'",""))
# EXECUTE PARAMETERIZED QUERY
cursor.executemany(action, sections.values.tolist())
conn.commit()
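A self-contained sketch of the executemany approach. It builds the column list and the marker list with ", ".join(...), which sidesteps the quoting pitfalls of str(tuple): str(('?', '?')) is "('?', '?')", i.e. string literals rather than markers, and a one-element tuple gains a trailing comma. An in-memory sqlite3 database stands in for the Access file because sqlite3 and pyodbc both use ? markers; the sample DataFrame is hypothetical and holds only string columns. With pyodbc you would use your own connection and cursor instead.

```python
import sqlite3

import pandas as pd

# Hypothetical DataFrame standing in for `sections` (string columns only)
sections = pd.DataFrame(
    {"db_name": ["ms-access", "sqlite"], "tool": ["pandas", "pyodbc"]}
)

# In-memory sqlite3 stands in for the Access file; both drivers use ? markers
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SECTIONS (db_name TEXT, tool TEXT)")

# Build "(col1, col2)" and "(?, ?)" with join instead of str(tuple)
columns = list(sections.columns)
column_list = ", ".join(columns)
placeholders = ", ".join("?" for _ in columns)
action = f"INSERT INTO SECTIONS ({column_list}) VALUES ({placeholders})"
# INSERT INTO SECTIONS (db_name, tool) VALUES (?, ?)

# One parameter row per DataFrame row; the driver quotes 'ms-access' itself
cursor = conn.cursor()
cursor.executemany(action, sections.values.tolist())
conn.commit()

print(pd.read_sql_query("SELECT * FROM SECTIONS", conn))
```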

Pandas read_sql with where clause using "in"

Help!
I need to query a table with an "in" clause, where the SQL looks like this:
select * from some_table where some_field in (?)
I originally took a naive approach and tried this:
in_items = [1,2,3,4]
df = pd.read_sql(MY_SQL_STATEMENT, con=con, params=[in_items])
This did not work; it throws the following error:
The SQL contains 1 parameter markers, but 4 parameters were supplied
Where I'm stuck is figuring out how to pass a list of items as a single parameter.
I can do a string concatenation approach, something like:
MY_SQL = 'select * from tableA where fieldA in ({})'.format(
    ','.join([str(x) for x in in_items]))
df = pd.read_sql(MY_SQL, con=con)
I would rather avoid this approach if possible. Does anybody know of a way to pass a list of values as a single parameter?
I'm also open to a cleverer way to do this. :)
Simply string-format the placeholders, then pass your params to pandas.read_sql. Note that the placeholder marker depends on the DB-API driver: pyodbc and sqlite3 use the qmark ?, while others such as psycopg2 and PyMySQL use %s. The code below assumes the former:
in_items = [1,2,3,4]
MY_SQL = 'select * from tableA where fieldA in ({})'\
    .format(', '.join(['?' for _ in in_items]))
# select * from tableA where fieldA in (?, ?, ?, ?)
df = pd.read_sql(MY_SQL, con=con, params=in_items)  # one value per ? marker
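A runnable version of the placeholder approach, using an in-memory sqlite3 database with a hypothetical tableA. Note that params receives the flat list in_items, one value per ? marker; with sqlite3, the nested [in_items] form fails because it supplies a single list-valued parameter instead of four values.

```python
import sqlite3

import pandas as pd

# Hypothetical tableA with a few rows, one of which (5) is not in the list
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (fieldA INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?)",
    [(1, "one"), (2, "two"), (3, "three"), (5, "five")],
)

in_items = [1, 2, 3, 4]
# One ? marker per item: "?, ?, ?, ?"
placeholders = ", ".join("?" for _ in in_items)
sql = f"SELECT * FROM tableA WHERE fieldA IN ({placeholders}) ORDER BY fieldA"

# Pass the flat list so each marker gets exactly one value
df = pd.read_sql(sql, con=conn, params=in_items)
print(df)
```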
Using sqlite3, this worked for me:
cell_list = pd.read_excel('.table_with_entries.xlsx')
list_of_entries_to_retrive = (cell_list['entries']).tolist()
conn = sqlite3.connect('DataBase.db')
queryString = 'SELECT * FROM table WHERE attribute IN (\'{}\');'.format('\',\''.join([_ for _ in list_of_entries_to_retrive]))
df = pd.read_sql(queryString, con=conn)
It did not work this way:
df = pd.read_sql(queryString, con=conn, params=[list_of_entries_to_retrive])
Thanks

passing string arguments to filter database rows in python

I have written the function below to filter a column in a SQL query. The function takes a string argument that should be used in the WHERE clause:
def summaryTable(machineid):
    df = pd.read_sql(""" SELECT fld_ATM FROM [003_tbl_ATM_Tables]
    WHERE (LINK <> 1) AND (fld_ATM =('machineid')) ;
    """, connection)
    connection.close()
    return df
The function returns an empty DataFrame. I know the query itself is correct, because I get the expected data when I hardcode the machine id.
Use params to pass a tuple of parameters including machineid to read_sql. pyodbc replaces the ? character in your query with parameters from the tuple, in order. Their values will be safely substituted at runtime. This avoids dangerous string formatting issues which may result in SQL injection.
df = pd.read_sql(""" SELECT fld_ATM FROM [003_tbl_ATM_Tables]
WHERE (LINK <> 1) AND (fld_ATM = ?) ;
""", connection, params=(machineid,))
You need to pass machineid to the query using params:
# ? is the placeholder style used by pyodbc. Some use %s, for example.
query = """ SELECT fld_ATM FROM [003_tbl_ATM_Tables]
WHERE (LINK <> 1) AND (fld_ATM = ?) ;
"""
data_df = pd.read_sql_query(query, engine, params=(machineid, ))
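A minimal sketch of the whole function with the value bound through params. Two assumptions: the connection is passed in and closed by the caller, so the function can be called more than once (the original closes the connection after its first call); and an in-memory sqlite3 database with made-up rows stands in for the real database, since it also uses ? markers and accepts the bracketed table name.

```python
import sqlite3

import pandas as pd


def summaryTable(connection, machineid):
    """Return fld_ATM rows for one machine id, excluding rows where LINK = 1."""
    query = """
        SELECT fld_ATM FROM [003_tbl_ATM_Tables]
        WHERE (LINK <> 1) AND (fld_ATM = ?)
    """
    # machineid is bound to the ? marker by the driver, not pasted into the SQL
    return pd.read_sql(query, connection, params=(machineid,))


# Hypothetical data: ATM1 appears twice, once with LINK = 1
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE [003_tbl_ATM_Tables] (fld_ATM TEXT, LINK INTEGER)")
conn.executemany(
    "INSERT INTO [003_tbl_ATM_Tables] VALUES (?, ?)",
    [("ATM1", 0), ("ATM1", 1), ("ATM2", 0)],
)

df = summaryTable(conn, "ATM1")
print(df)
```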
