I have an SQLite database containing a table with a column whose name includes a "-" / minus sign. The database was created from Python 3.6 with pandas (SQLAlchemy as the engine). The table and the column are created without problem. However, when I want to build a query on this table, I don't know how to escape the "-" character. Here is a short example:
#imports
import numpy as np
import pandas as pd
from sqlalchemy import create_engine
#create df
df = pd.DataFrame(np.random.rand(10,2),columns=['Column1','Prob-Column'])
# create engine to connect to db
engine = create_engine('sqlite://')
#create table in db
df.to_sql('my_table',engine,if_exists='replace')
# variables
vals = '(?)'
fil = ('key',)
# create sql string
sq = 'SELECT * FROM {t} WHERE {c1} IN {vals} GROUP BY {c2}'\
    .format(t='my_table', c1='Column1', c2='Prob-Column', vals=vals)
#write query to pandas df
df = pd.read_sql_query(sq, engine, params=fil)
The traceback is as follows:
OperationalError: (sqlite3.OperationalError) no such column: Prob [SQL: 'SELECT * FROM my_table WHERE Column1 IN (?) GROUP BY Prob-Column'] [parameters: ('key',)] (Background on this error at: http://sqlalche.me/e/e3q8)
Here is the solution. SQLite parses the unquoted Prob-Column as the expression Prob - Column (a subtraction of two columns), hence the "no such column: Prob" error. The column name just needs double quotes around it, i.e. on the inside of the single quotes, such that c2='"Prob-Column"'. Anyway, hope this helps someone else.
#imports
import numpy as np
import pandas as pd
from sqlalchemy import create_engine
#create df
df = pd.DataFrame(np.random.rand(10,2),columns=['Column1','Prob-Column'])
# create engine to connect to db
engine = create_engine('sqlite://')
#create table in db
df.to_sql('my_table',engine,if_exists='replace')
# variables
vals = '(?)'
fil = ('key',)
# create sql string
sq = 'SELECT * FROM {t} WHERE {c1} IN {vals} GROUP BY {c2}'\
    .format(t='my_table', c1='Column1', c2='"Prob-Column"', vals=vals)
#write query to pandas df
df = pd.read_sql_query(sq, engine, params=fil)
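For what it's worth, double quotes are the SQL-standard way to quote identifiers, and SQLite also accepts two compatibility styles, square brackets and backticks, so any of the following would work here. A minimal sketch against the same my_table and engine as above:
# all three identifier-quoting styles are equivalent in SQLite
sq1 = 'SELECT "Prob-Column" FROM my_table'  # SQL standard
sq2 = 'SELECT [Prob-Column] FROM my_table'  # SQL Server / MS Access style
sq3 = 'SELECT `Prob-Column` FROM my_table'  # MySQL style
df = pd.read_sql_query(sq1, engine)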
I'm looking to do a simple read_sql in pandas, using a variable in the query to extract data from SQL Server, like so:
import pyodbc as cnn
import pandas as pd
cursor = cnn.Cursor
cnxn = cnn.connect('DRIVER={SQL Server};SERVER=SQLSERVER;DATABASE=DATABASE')
x = "FirstName"
tableResult = pd.read_sql(("SELECT * FROM TABLE where COLUMN = ?"),cnxn,index_col=None, coerce_float=True, params=x)
I get the following error however:
Execution failed on sql 'SELECT * FROM TABLE where COLUMN = ?': ('The SQL contains 1 parameter markers, but 9 parameters were supplied', 'HY000')
What am I doing wrong here?
I also tried this, which runs, but I don't know how to actually grab the results here:
tableResult2 = cnxn.execute("SELECT * FROM TABLE where COLUMN = ?", x)
It would be like this:
import pyodbc as cnn
import pandas as pd
cnxn = cnn.connect('DRIVER={SQL Server};SERVER=SQLSERVER;DATABASE=DATABASE')
x = "FirstName"
# note the trailing space before the closing quote: without it the two
# fragments would concatenate to "...TABLEwhere..."
query = ("SELECT * FROM TABLE "
         f"where COLUMN = '{x}'")
tableResult = pd.read_sql(query, cnxn)
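As an aside, the original error happened because params=x passes the bare string "FirstName", which gets iterated character by character (9 characters, hence "9 parameters were supplied"). If you'd rather keep the query parameterized instead of interpolating, a one-element list should work, and the results of the cnxn.execute variant can be grabbed with fetchall(). A quick sketch, assuming the same cnxn as above:
# keep the ? placeholder and wrap the value in a list
tableResult = pd.read_sql("SELECT * FROM TABLE where COLUMN = ?", cnxn, params=[x])
# or fetch rows directly from the raw connection
rows = cnxn.execute("SELECT * FROM TABLE where COLUMN = ?", x).fetchall()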
The dataframe contains a string column holding dates in the format 'YYYY-MM-DD', and I am inserting it into a varchar(10) column in a SQL table with pandas' to_sql method. The resulting table has the dates formatted as 'MMM-DD-YYY' (abbreviated month name, truncated to 10 characters). Any ideas?
Here is a workable example, hard to test against your database of course :)
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=False)
df = pd.DataFrame(['2022-10-18', '2022-09-07'], columns=['dates'])
# parse the strings, then keep only the date part so that to_sql
# writes plain 'YYYY-MM-DD' values rather than full timestamps
df['dates'] = pd.to_datetime(df['dates'], yearfirst=True).dt.date
df.to_sql('my_dates', con=engine, index=False)
engine.execute("SELECT dates FROM my_dates").fetchall()
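One caveat: Engine.execute() on the last line only exists in SQLAlchemy 1.x; it was removed in 2.0. A sketch of the equivalent check under SQLAlchemy 2.0, assuming the same engine as above:
from sqlalchemy import text
with engine.connect() as conn:
    rows = conn.execute(text("SELECT dates FROM my_dates")).fetchall()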
I have an Access table called "Cell_list" with a key column called "Cell_#". I want to read the table into a dataframe, but only the rows whose keys appear in a Python list "cell_numbers".
I tried several variations on:
import pyodbc
import pandas as pd
cell_numbers = [1,3,7]
cnn_str = r'Driver={Microsoft Access Driver (*.mdb,*.accdb)};DBQ=C:\folder\myfile.accdb;'
conn = pyodbc.connect(cnn_str)
query = ('SELECT * FROM Cell_list WHERE Cell_# in '+tuple(cell_numbers))
df = pd.read_sql(query, conn)
But no matter what I try I get a syntax error.
How do I do this?
Consider the best practice of parameterization, which is supported in pandas.read_sql:
# PREPARED STATEMENT, NO DATA
query = (
    'SELECT * FROM Cell_list '
    'WHERE [Cell_#] IN (?, ?, ?)'
)
# RUN SQL WITH BOUND PARAMS
df = pd.read_sql(query, conn, params=cell_numbers)
Consider even dynamic qmark placeholders, built from the length of cell_numbers (note: a plain string, not a list, so the f-string renders cleanly):
qmarks = ', '.join('?' for _ in cell_numbers)  # '?, ?, ?'
query = (
    'SELECT * FROM Cell_list '
    f'WHERE [Cell_#] IN ({qmarks})'
)
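The dynamically built query then runs exactly as before:
df = pd.read_sql(query, conn, params=cell_numbers)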
Convert (join) cell_numbers to text:
cell_text = '(' + ','.join(str(n) for n in cell_numbers) + ')'  # '(1,3,7)'
and concatenate this.
The finished SQL should read (you may need brackets around the weird field name Cell_#):
SELECT * FROM Cell_list WHERE [Cell_#] IN (1,3,7)
I am trying to pass a list like the one below to a SQL query:
x = ['1000000000164774783','1000000000253252111']
I am using sqlalchemy and pyodbc to connect to SQL Server:
import pandas as pd
from pandas import Series,DataFrame
import pyodbc
import sqlalchemy
cnx=sqlalchemy.create_engine("mssql+pyodbc://Omnius:MainBrain1#172.31.163.135:1433/Basis?driver=/opt/microsoft/sqlncli/lib64/libsqlncli-11.0.so.1790.0")
I tried various ways of string-formatting the SQL query; below is one of them:
xx = ', '.join(x)
sql = "select * from Pretty_Txns where Send_Customer in (%s)" % xx
df = pd.read_sql(sql,cnx)
All of them seem to convert the values into a numeric format, (1000000000164774783, 1000000000253252111), instead of ('1000000000164774783', '1000000000253252111'), and hence the query fails, as the datatype of Send_Customer is varchar(50) in SQL:
ProgrammingError: (pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][SQL Server Native Client 11.0]
[SQL Server]Error converting data type varchar to numeric. (8114) (SQLExecDirectW)')
[SQL: 'select * from Pretty_Txns where Send_Customer in (1000000000164774783, 1000000000253252111)']
As stated in the comment to the other answer, that approach can fail for a variety of reasons. What you really want to do is create an SQL statement with the required number of parameter placeholders and then use the params= parameter of .read_sql_query() to supply the values:
x = ['1000000000164774783','1000000000253252111']
placeholders = ','.join('?' for i in range(len(x))) # '?,?'
sql = f"select * from Pretty_Txns where Send_Customer in ({placeholders})"
df = pd.read_sql_query(sql, cnx, params=x)
Here's the SQL query you need:
sql = f"select * from Pretty_Txns where Send_Customer in {tuple(x)}"
df = pd.read_sql(sql,cnx)
Used the below approach and it worked fine:
sql = "select * from Pretty_Txns where Send_Customer in %s" % str(tuple(x))
df = pd.read_sql(sql,cnx)
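For reference, the reason str(tuple(x)) works where the plain join did not: the repr of a tuple of strings keeps the quotes around each element, so the varchar values stay quoted in the final SQL. A quick illustration:
x = ['1000000000164774783', '1000000000253252111']
str(tuple(x))  # "('1000000000164774783', '1000000000253252111')" -- quoted
', '.join(x)   # '1000000000164774783, 1000000000253252111' -- unquoted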
Making sqlalchemy and pyodbc work with pandas read_sql() is a hairy and messy thing. After much frustration and bumping into various solutions and documentation from pandas and psycopg, here's the correct (so far) way to do a query with a named parameter:
import pandas as pd
import psycopg2
# import pyodbc
import sqlalchemy
from sqlalchemy import text  # this is crucial
cnx = sqlalchemy.create_engine(...)
x = ['1000000000164774783', '1000000000253252111']
sql = "select * from Pretty_Txns where Send_Customer in (:id);"  # named parameter
# note how the sql string is cast with text() and a key-value pair
# is passed for the named parameter 'id'
df = pd.read_sql(text(sql), cnx, params={'id': x})
df.head()
I've made it work with a PostgreSQL database; I hope it isn't too different for MySQL.
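If a single :id placeholder ever gives trouble with a multi-element list, SQLAlchemy also offers an "expanding" bind parameter designed for IN clauses, which renders one placeholder per list element at execution time. A sketch under the same setup (cnx and x as above, SQLAlchemy 1.2+); note the IN clause takes the bare parameter, without parentheses:
from sqlalchemy import text, bindparam
stmt = text("select * from Pretty_Txns where Send_Customer in :ids").bindparams(
    bindparam('ids', expanding=True)
)
df = pd.read_sql(stmt, cnx, params={'ids': x})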
I am trying to run a MySQL query that has a text wildcard in it, as demonstrated below:
import sqlalchemy
import pandas as pd
#connect to mysql database
engine = sqlalchemy.create_engine('mysql://user:#localhost/db?charset=utf8')
conn = engine.connect()
#read sql into pandas dataframe
mysql_statement = """SELECT * FROM table WHERE field LIKE '%part%'; """
df = pd.read_sql(mysql_statement, con=conn)
When run, I get the error shown below, related to formatting.
TypeError: not enough arguments for format string
How can I use a wild card when reading MySQL with Pandas?
You can use the text() function from sqlalchemy:
from sqlalchemy import create_engine, text
import pandas as pd
mysql_statement = """SELECT * FROM table WHERE field LIKE '%part%'; """
df = pd.read_sql(text(mysql_statement), con=conn)
Under the hood, pandas uses whatever SQL engine you've given it to parse the statement. In your case that's sqlalchemy, so you need to figure out how it handles %. It might be as easy as escaping it by doubling the percent signs, LIKE '%%part%%', as sketched below.
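A minimal sketch of that escaping approach, assuming the same conn as in the question (with a format-style DBAPI such as mysqlclient or pymysql, a doubled %% is passed through as a literal %):
mysql_statement = """SELECT * FROM table WHERE field LIKE '%%part%%'; """
df = pd.read_sql(mysql_statement, con=conn)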
In the case of psycopg, you use the params variable like this:
mysql_statement = """SELECT * FROM table WHERE field LIKE %s; """
df = pd.read_sql(mysql_statement, con=conn, params=("%part%",))
You need to use the params option of the pandas read_sql method:
# literal pattern passed as a bound parameter
mysql_statement = """SELECT * FROM table WHERE field LIKE %s; """
df = pd.read_sql(mysql_statement, con=conn, params=("%part%",))
# same query with the pattern built from a variable
val = 'part'
mysql_statement = """SELECT * FROM table WHERE field LIKE %s; """
df = pd.read_sql(mysql_statement, con=conn, params=('%' + val + '%',))
select * from _table_ where "_column_name_" like 'somethin%' order by ...
Putting double quotes around _column_name_ solved this problem for me.