I have an Access table called "Cell_list" with a key column called "Cell_#". I want to read the table into a DataFrame, but only the rows whose key matches one of the values in a Python list, "cell_numbers".
I tried several variations on:
import pyodbc
import pandas as pd
cell_numbers = [1,3,7]
cnn_str = r'Driver={Microsoft Access Driver (*.mdb,*.accdb)};DBQ=C:\folder\myfile.accdb;'
conn = pyodbc.connect(cnn_str)
query = ('SELECT * FROM Cell_list WHERE Cell_# in '+tuple(cell_numbers))
df = pd.read_sql(query, conn)
But no matter what I try I get a syntax error.
How do I do this?
Consider the best practice of parameterization, which pandas.read_sql supports through its params argument:
# PREPARED STATEMENT, NO DATA
query = (
'SELECT * FROM Cell_list '
'WHERE [Cell_#] IN (?, ?, ?)'
)
# RUN SQL WITH BOUND PARAMS
df = pd.read_sql(query, conn, params=cell_numbers)
Better still, build the qmark placeholders dynamically from the length of cell_numbers:
qmarks = ', '.join('?' for _ in cell_numbers)   # a plain string such as '?, ?, ?'
query = (
'SELECT * FROM Cell_list '
f'WHERE [Cell_#] IN ({qmarks})'
)
df = pd.read_sql(query, conn, params=cell_numbers)
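To try the dynamic-placeholder approach without an Access file, here is a minimal sketch that uses an in-memory SQLite database as a stand-in. It assumes SQLite behaves like Access for this purpose (it also takes qmark placeholders and accepts square-bracket identifiers such as [Cell_#]); the table contents are made up for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the Access file (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cell_list ([Cell_#] INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO Cell_list ([Cell_#], label) VALUES (?, ?)",
    [(n, f"cell {n}") for n in range(1, 10)],
)

cell_numbers = [1, 3, 7]

# One "?" per requested value, e.g. "?, ?, ?" for three items.
qmarks = ", ".join("?" for _ in cell_numbers)
query = f"SELECT * FROM Cell_list WHERE [Cell_#] IN ({qmarks})"

# The values travel separately from the SQL text, so nothing needs quoting.
df = pd.read_sql(query, conn, params=cell_numbers)
print(df)
```

Because the placeholder count is derived from the list, the same code works for any number of cell numbers.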
Alternatively, convert (join) cell_numbers to text:
cell_text = '(1,3,7)'
and concatenate it onto the query.
The finished SQL should read as follows (you may need square brackets around the unusual field name Cell_#):
SELECT * FROM Cell_list WHERE [Cell_#] IN (1,3,7)
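If you take the concatenation route, cell_text can be built from the list instead of typed by hand. The int() coercion is an added safeguard, assuming the keys are integers: splicing values into SQL text is only safe for trusted numeric data, so parameterization remains the preferable option.

```python
cell_numbers = [1, 3, 7]

# Coerce each item to int so only digits can end up in the SQL text.
cell_text = "(" + ",".join(str(int(n)) for n in cell_numbers) + ")"
query = "SELECT * FROM Cell_list WHERE [Cell_#] IN " + cell_text

print(query)  # SELECT * FROM Cell_list WHERE [Cell_#] IN (1,3,7)
```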
I would like to use a string as the column names for a pandas DataFrame.
The problem is that the DataFrame interprets the string variable as a single column instead of multiple ones, hence the error:
ValueError: 1 columns passed, passed data had 11 columns
The first part of my code is intended to get the column names from the MySQL database I am about to query:
cursor1.execute("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
for colsTableMysql in cursor1.fetchall():
    colsTable = colsTableMysql[0]
colsTable = "'" + colsTable.replace(",", "','") + "'"
The second part uses the variable "colsTable" created above:
cursor = connection.cursor()
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN ("+emplazamientos+")")
tabla = pd.DataFrame(cursor.fetchall(),columns=[colsTable])
#tabla = exec("pd.DataFrame(cursor.fetchall(),columns=["+colsTable+"])")
#tabla = pd.DataFrame(cursor.fetchall())
I have tried other approaches, such as exec(). In that case there is no error, but there is no output either: print(tabla) shows None (exec() always returns None).
Is there any direct way of passing the columns dynamically as a string to a pandas DataFrame?
Thanks in advance
I am going to answer my own question, since I have found a way.
The first part of my code is intended to get the column names from the MySQL database table I am about to query:
cursor1.execute("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
for colsTableMysql in cursor1.fetchall():
    colsTable = colsTableMysql[0]
colsTable = "'" + colsTable.replace(",", "','") + "'"
The second part uses the created variable "colsTable" as input in the statement to define the columns.
cursor = connection.cursor()
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN ("+emplazamientos+")")
tabla = eval("pd.DataFrame(cursor.fetchall(),columns=["+colsTable+"])")
With eval(), the string is parsed and evaluated as a Python expression, so the quoted names in colsTable become a literal list of column names.
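eval() works, but pandas only needs a plain list of names, so no code has to be built from strings. Two eval-free options are sketched here against an in-memory SQLite table that is a hypothetical stand-in for or_red.nomen_prefix (SQLite has no INFORMATION_SCHEMA, so the GROUP_CONCAT result is hard-coded): split the comma-separated string, or read the names from cursor.description, which PEP 249 defines as a sequence of 7-item sequences whose first item is the column name.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for or_red.nomen_prefix.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE nomen_prefix (C_emp TEXT, prefix TEXT, descr TEXT)")
connection.executemany(
    "INSERT INTO nomen_prefix VALUES (?, ?, ?)",
    [("A1", "p1", "first"), ("B2", "p2", "second")],
)

# Option 1: the raw GROUP_CONCAT string (simulated here) split into a list.
colsTable = "C_emp,prefix,descr"
columns = colsTable.split(",")

cursor = connection.cursor()
cursor.execute("SELECT * FROM nomen_prefix WHERE C_emp IN (?, ?)", ["A1", "B2"])
rows = cursor.fetchall()
tabla = pd.DataFrame(rows, columns=columns)

# Option 2: take the names straight from the executed cursor.
names = [desc[0] for desc in cursor.description]
tabla2 = pd.DataFrame(rows, columns=names)
print(tabla2)
```

Option 2 has the advantage that the names always match the query actually run, even if the table's columns change.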
I am using the HANA Python connector's Cursor.execute(sql, hanaparams) method, whose parameters are the SQL statement and hanaparams. My query looks like this:
"SELECT * FROM TABLE WHERE COLUMN1 IN (?)", and my parameters are VALUE1, VALUE2, passed as a list/tuple.
I am unable to retrieve the result set, whereas when I run the query in HANA with the input parameters hard-coded, it runs perfectly fine.
I am following this tutorial: https://developers.sap.com/tutorials/hana-clients-python.html
Any pointers on how I should pass multiple values in the params?
Something as simple as this seems to work fine. The number of ? markers must equal the number of parameters you pass; in your case the single ? only takes VALUE1.
from hdbcli import dbapi
conn = dbapi.connect(
    key='HDBKEY'
)
cursor = conn.cursor()
parameters = [11, "2020-12-24"]
params = '?,'*len(parameters)
params2 = params[0:-1]
sql_command2 = f"SELECT {params2} FROM DUMMY;"
cursor.execute(sql_command2, parameters)
rows = cursor.fetchall()
for row in rows:
    for col in row:
        print("%s" % col, end=" ")
    print(" ")
cursor.close()
conn.close()
So instead of SELECT * FROM TABLE WHERE COLUMN1 IN (?), it should be SELECT * FROM TABLE WHERE COLUMN1 IN (?, ?), with one ? per value.
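The same idea can be wrapped in a small reusable helper that emits one ? per value and passes the values as the parameter sequence. hdbcli needs a live HANA system, so this sketch uses sqlite3, which also uses ? markers, as a stand-in; the helper select_in, the table my_table and its data are hypothetical. The call shape cursor.execute(sql, parameters) matches the hdbcli call in the answer.

```python
import sqlite3


def select_in(cursor, table, column, values):
    """Run SELECT * ... WHERE column IN (?, ?, ...) with one marker per value.

    table and column are interpolated into the SQL text, so they must come
    from trusted code, never from user input; only the values are bound.
    """
    if not values:
        return []  # "IN ()" is invalid SQL, so short-circuit an empty list
    markers = ", ".join("?" * len(values))  # e.g. "?, ?"
    sql = f"SELECT * FROM {table} WHERE {column} IN ({markers})"
    cursor.execute(sql, list(values))
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE my_table (column1 TEXT, amount INTEGER)")
cur.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("VALUE1", 10), ("VALUE2", 20), ("VALUE3", 30)],
)

rows = select_in(cur, "my_table", "column1", ("VALUE1", "VALUE2"))
print(rows)
```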
I have a DataFrame df and I want to execute a query that inserts all of its values into a table. Basically, I am trying to run the equivalent of the following query:
INSERT INTO mytable
SELECT *
FROM mydataframe
For that I have the following code:
import pyodbc
import pandas as pd
# `pass` is a reserved word in Python, so the password needs another variable name
connection = pyodbc.connect('Driver={' + driver + '};'
                            'Server=' + server + ';'
                            'UID=' + user + ';'
                            'PWD=' + password + ';')
cursor = connection.cursor()
query = 'SELECT * FROM [myDB].[dbo].[myTable]'
df = pd.read_sql_query(query, connection)
sql = 'INSERT INTO [dbo].[new_date] SELECT * FROM :x'
cursor.execute(sql, x=df)
connection.commit()
However, I am getting the following error:
TypeError: execute() takes no keyword arguments
Does anyone know what I am doing wrong?
For a raw DB-API insert from pandas, consider DataFrame.to_numpy() with executemany, which avoids any explicit Python-level loop. However, the append query must list its columns explicitly. Adjust the columns and qmark placeholders below to match the data frame's columns.
# PREPARED STATEMENT
sql = '''INSERT INTO [dbo].[new_date] (Col1, Col2, Col3, ...)
VALUES (?, ?, ?, ...)
'''
# EXECUTE PARAMETERIZED QUERY
cursor.executemany(sql, df.to_numpy().tolist())
connection.commit()
(And by the way, it is best practice generally in SQL queries to always explicitly reference columns and avoid SELECT * for code readability, maintainability, and even performance.)
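A self-contained sketch of the executemany pattern, assuming sqlite3 in place of pyodbc/SQL Server so it runs anywhere; the new_date columns and data are made up. The column list and qmarks are both derived from df.columns, so they always match the frame, and to_numpy().tolist() turns NumPy scalars into plain Python values that DB-API drivers accept.

```python
import sqlite3

import pandas as pd

# Hypothetical data standing in for the frame read from [myDB].[dbo].[myTable].
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"], "score": [0.5, 1.5, 2.5]})

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE new_date (id INTEGER, name TEXT, score REAL)")

# Explicit column list and one "?" per column, both derived from the frame.
columns = ", ".join(f"[{col}]" for col in df.columns)
qmarks = ", ".join("?" for _ in df.columns)
sql = f"INSERT INTO new_date ({columns}) VALUES ({qmarks})"

# to_numpy().tolist() yields a list of rows made of plain Python values.
cursor.executemany(sql, df.to_numpy().tolist())
conn.commit()

print(cursor.execute("SELECT COUNT(*) FROM new_date").fetchone()[0])  # 3
```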
I had some issues connecting pandas to SQL Server too, but I found this solution for writing my df:
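import pandas as pd
import pyodbc  # DBAPI driver used by the mssql+pyodbc dialect
import sqlalchemy
# "@" (not "#") separates the credentials from the host; the driver name must be
# URL-encoded, e.g. ODBC+Driver+17+for+SQL+Server
engine = sqlalchemy.create_engine('mssql+pyodbc://{0}:{1}@{2}:{3}/{4}?driver={5}'.format(username,password,server,port,bdName,driver))
df.to_sql("TableName", con=engine, if_exists="append")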
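For a quick local check of the if_exists="append" behaviour, DataFrame.to_sql also accepts a plain sqlite3 connection; with SQL Server you would pass the SQLAlchemy engine instead. The table name and data here are hypothetical.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"Col1": [1, 2], "Col2": ["x", "y"]})
conn = sqlite3.connect(":memory:")

# The first call creates TableName; the second appends instead of failing.
df.to_sql("TableName", con=conn, index=False, if_exists="append")
df.to_sql("TableName", con=conn, index=False, if_exists="append")

print(pd.read_sql("SELECT COUNT(*) AS n FROM TableName", conn))
```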
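Below is my favourite solution, which includes an UPSERT. It uses PostgreSQL's INSERT ... ON CONFLICT syntax with %s placeholders (as in psycopg2), so it will not run unchanged on SQL Server: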
df_columns = list(df)
columns = ','.join(df_columns)
values = 'VALUES({})'.format(','.join(['%s' for col in df_columns]))
update_list = ['{} = EXCLUDED.{}'.format(col, col) for col in df_columns]
update_str = ','.join(update_list)
insert_stmt = "INSERT INTO {} ({}) {} ON CONFLICT ([your_pkey_here]) DO UPDATE SET {}".format(table, columns, values, update_str)
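The snippet stops at building insert_stmt. A runnable sketch of the same ON CONFLICT ... DO UPDATE SET col = EXCLUDED.col pattern follows, assuming SQLite 3.24 or later as a stand-in for PostgreSQL: SQLite accepts the same UPSERT clause but uses ? markers instead of psycopg2's %s. The table, the key column id and the data are hypothetical, and this version leaves the key column out of the SET list.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "val": ["new-1", "new-2"]})
table = "my_table"  # hypothetical table name
pkey = "id"         # hypothetical conflict target (primary key)

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, val TEXT)")
cursor.execute(f"INSERT INTO {table} VALUES (1, 'old-1')")

df_columns = list(df)
columns = ",".join(df_columns)
values = "VALUES({})".format(",".join(["?" for _ in df_columns]))
# On conflict, overwrite every non-key column with the incoming row's value.
update_str = ",".join(
    "{0} = EXCLUDED.{0}".format(col) for col in df_columns if col != pkey
)
insert_stmt = "INSERT INTO {} ({}) {} ON CONFLICT ({}) DO UPDATE SET {}".format(
    table, columns, values, pkey, update_str
)

cursor.executemany(insert_stmt, df.to_numpy().tolist())
conn.commit()
print(cursor.execute(f"SELECT * FROM {table} ORDER BY id").fetchall())
# [(1, 'new-1'), (2, 'new-2')]
```

Row 1 already existed and is updated; row 2 is new and is inserted.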
cursor.execute does not accept keyword arguments. One way of doing the insert is the following snippet:
cols = ",".join("[" + str(i) + "]" for i in df.columns.tolist())
# Insert DataFrame records one by one, using ? markers consistently.
for i, row in df.iterrows():
    sql = "INSERT INTO [dbo].[new_date] (" + cols + ") VALUES (" + "?," * (len(row) - 1) + "?)"
    cursor.execute(sql, tuple(row.tolist()))  # tolist() gives plain Python values
connection.commit()
Here you iterate through the rows, inserting each one into the table.
Thank you for your answers :) but I used the following code to solve my problem:
import urllib.parse
import pandas as pd
import sqlalchemy
params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=servername;DATABASE=database;UID=user;PWD=pass")
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
# Read with the existing pyodbc connection, write through the SQLAlchemy engine
df = pd.read_sql_query(query, connection)
df.to_sql(name='new_table', con=engine, index=False, if_exists='append')
I am trying to write Access 2003 .mdb files from scratch. I already have a file with the tables and column names (112 columns). In my attempt, I read rows from a pandas DataFrame (named sections in my code) and append them to the .mdb file. But when using the pyodbc INSERT INTO syntax, I get this error:
ProgrammingError: ('42000', "[42000] [Microsoft][Driver ODBC Microsoft Access] Expression syntax error 'Equatorial-TB-BG-CA_IRI-1.0_SNP-1.0_ACA-0_ESAL-1000'. (-3100) (SQLExecDirectW)")
Here is my code:
for k in range(len(sections)):
    cols = tuple(list(sections.columns))
    vals = tuple(list(sections.iloc[k]))
    action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.format(columns = str(cols).replace("'",""), values = str(vals).replace("'",""))
    cursor.execute(action)
conn.commit()
Does anyone know why I am having this kind of problem?
Actually, this is not an Access-specific error but a general SQL error: your string literals are not enclosed in quotes. The Access engine therefore treats them as field names, and the hyphens make it worse, since the engine reads them as subtraction expressions.
To demonstrate the issue, here is a version with placeholder values standing in for your unknown data. Notice that the string items passed in VALUES are not quoted:
sections_columns = ['database', 'tool']
cols = tuple(list(sections_columns))
sections_vals = ['ms-access', 'pandas']
vals = tuple(list(sections_vals))
action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.\
format(columns = str(cols).replace("'",""), values = str(vals).replace("'",""))
print(action)
# INSERT INTO SECTIONS (database, tool) VALUES (ms-access, pandas)
Now, you could keep the single quotes that you strip from str(vals):
action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.\
format(columns = str(cols).replace("'",""), values = str(vals))
print(action)
# INSERT INTO SECTIONS (database, tool) VALUES ('ms-access', 'pandas')
But even better, consider parameterizing the query with qmark placeholders and passing the values as parameters (second argument of cursor.execute(query, params)). This avoids any need to quote or unquote string or numeric values:
# MOVED OUTSIDE LOOP AS UNCHANGING OBJECTS
cols = tuple(sections.columns)           # REMOVED UNNEEDED list()
qmarks = tuple(['?' for i in cols])      # NEW OBJECT
# STRIP QUOTES FROM qmarks TOO, OTHERWISE '?' IS A STRING LITERAL, NOT A PLACEHOLDER
action = 'INSERT INTO SECTIONS {columns} VALUES {qmarks}'.\
format(columns = str(cols).replace("'",""), qmarks = str(qmarks).replace("'",""))
# INSERT INTO SECTIONS (col1, col2, col3, ...) VALUES (?, ?, ?...)
for k in range(len(sections)):
    vals = sections.iloc[k].tolist()     # PLAIN PYTHON VALUES, NO tuple() NEEDED
    cursor.execute(action, vals)         # EXECUTE PARAMETERIZED QUERY
conn.commit()
Better still, avoid the loop entirely by passing DataFrame.values.tolist() to executemany with the prepared statement:
# PREPARED STATEMENT
cols = tuple(sections.columns)
qmarks = tuple(['?' for i in cols])
action = 'INSERT INTO SECTIONS {columns} VALUES {qmarks}'.\
format(columns = str(cols).replace("'",""), qmarks = str(qmarks).replace("'",""))
# EXECUTE PARAMETERIZED QUERY
cursor.executemany(action, sections.values.tolist())
conn.commit()
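One caveat with building the clauses from str(tuple(...)): a one-column frame renders as ('?',) with a trailing comma, and column names containing spaces or hyphens break once the quotes are stripped. A sketch that builds both clauses with str.join and bracket-quotes the names instead; it runs against sqlite3 (which also accepts [name]) rather than Access, with a hypothetical two-column SECTIONS table.

```python
import sqlite3

import pandas as pd

# Hypothetical frame; note the hyphenated string that broke the original query.
sections = pd.DataFrame(
    {"Section ID": ["Equatorial-TB-BG-CA_IRI-1.0_SNP-1.0_ACA-0_ESAL-1000"], "ESAL": [1000]}
)

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE SECTIONS ([Section ID] TEXT, [ESAL] INTEGER)")

# "[Section ID], [ESAL]" and "?, ?": correct for any number of columns.
columns = ", ".join(f"[{col}]" for col in sections.columns)
qmarks = ", ".join("?" for _ in sections.columns)
action = f"INSERT INTO SECTIONS ({columns}) VALUES ({qmarks})"

cursor.executemany(action, sections.values.tolist())
conn.commit()
print(cursor.execute("SELECT * FROM SECTIONS").fetchall())
```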
Help!
I need to query a table with an "in" clause, where the SQL looks like this:
select * from some_table where some_field in (?)
I originally took a naive approach and tried this:
in_items = [1,2,3,4]
df = pd.read_sql(MY_SQL_STATEMENT, con=con, params=[in_items])
Which did not work, it throws the following error:
The SQL contains 1 parameter markers, but 4 parameters were supplied
Where I'm stuck is figuring out how to pass a list of items as a single parameter.
I can do a string concatenation approach, something like:
MY_SQL = 'select * from tableA where fieldA in ({})'.format(
    ','.join([str(x) for x in in_items]))
df = pd.read_sql(MY_SQL, con=con)
I would rather avoid this approach if possible. Does anybody know of a way to pass a list of values as a single parameter?
I'm also open to a cleverer way of doing this. :)
Simply string-format the placeholders, then pass your params into pandas.read_sql. Note that the placeholder marker depends on the DB-API driver: pyodbc and sqlite3 use the qmark ?, while many others (e.g. psycopg2, pymysql) use %s. The code below assumes the former:
in_items = [1,2,3,4]
MY_SQL = 'select * from tableA where fieldA in ({})'\
.format(', '.join(['?' for _ in in_items]))
# select * from tableA where fieldA in (?, ?, ?, ?)
df = pd.read_sql(MY_SQL, con=con, params=in_items)   # flat list: one value per ?
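On passing the whole list as one parameter: standard DB-API drivers bind one scalar per marker, so per-item placeholders are the portable answer. With SQLite specifically, one workaround is to send the list as a single JSON string and unpack it in SQL with the json_each table-valued function. This sketch assumes an SQLite build with JSON support (built in by default since 3.38) and uses hypothetical tableA data.

```python
import json
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableA (fieldA INTEGER, other TEXT)")
con.executemany("INSERT INTO tableA VALUES (?, ?)", [(i, f"row {i}") for i in range(1, 8)])

in_items = [1, 2, 3, 4]

# A single "?" receives the whole list, serialized as the JSON text "[1, 2, 3, 4]".
MY_SQL = "select * from tableA where fieldA in (select value from json_each(?))"
df = pd.read_sql(MY_SQL, con=con, params=[json.dumps(in_items)])
print(df)
```

This ties the query to SQLite's JSON functions, so it is a database-specific trade-off rather than a general replacement for the placeholder approach.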
For me, using sqlite3, this worked:
entries_df = pd.read_excel('.table_with_entries.xlsx')
list_of_entries_to_retrive = entries_df['entries'].tolist()
conn = sqlite3.connect('DataBase.db')
queryString = 'SELECT * FROM table WHERE attribute IN (\'{}\');'.format('\',\''.join([_ for _ in list_of_entries_to_retrive]))
df = pd.read_sql(queryString, con=conn)
It did not work this way (the query string has no ? placeholders for the list to bind to):
df = pd.read_sql(queryString, con=conn, params=[list_of_entries_to_retrive])
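The quoted-join version breaks as soon as an entry contains an apostrophe (e.g. O'Brien), and it is open to SQL injection. A sketch of the placeholder variant for string entries, with a hypothetical in-memory table and a hard-coded list standing in for the Excel column:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries_table (attribute TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO entries_table VALUES (?, ?)",
    [("alpha", 1), ("O'Brien", 2), ("gamma", 3)],
)

# Stand-in for entries_df['entries'].tolist() read from the Excel file.
list_of_entries_to_retrive = ["alpha", "O'Brien"]

placeholders = ", ".join("?" for _ in list_of_entries_to_retrive)
queryString = f"SELECT * FROM entries_table WHERE attribute IN ({placeholders})"

# Pass the list itself (not wrapped in another list): one value per "?".
df = pd.read_sql(queryString, con=conn, params=list_of_entries_to_retrive)
print(df)
```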
Thanks