INSERT INTO SELECT based on a dataframe - python

I have a dataframe df and I want to execute a query to insert all the values from the dataframe into a table. Basically I am trying to do a load equivalent to the following query:
INSERT INTO mytable
SELECT *
FROM mydataframe
For that I have the following code:
import pyodbc
import pandas as pd
connection = pyodbc.connect('Driver={' + driver + '};'
                            'Server=' + server + ';'
                            'UID=' + user + ';'
                            'PWD=' + pwd + ';')
cursor = connection.cursor()
query = 'SELECT * FROM [myDB].[dbo].[myTable]'
df = pd.read_sql_query(query, connection)
sql = 'INSERT INTO [dbo].[new_date] SELECT * FROM :x'
cursor.execute(sql, x=df)
connection.commit()
However, I am getting the following error:
TypeError: execute() takes no keyword arguments
Does anyone know what I am doing wrong?

For a raw DB-API insert query from Pandas, consider DataFrame.to_numpy() with executemany and avoid looping at the application level. However, explicit columns must be used in the append query. Adjust the columns and qmark parameter placeholders below to correspond to your data frame columns.
# PREPARED STATEMENT
sql = '''INSERT INTO [dbo].[new_date] (Col1, Col2, Col3, ...)
VALUES (?, ?, ?, ...)
'''
# EXECUTE PARAMETERIZED QUERY
cursor.executemany(sql, df.to_numpy().tolist())
connection.commit()
(And by the way, it is generally best practice in SQL queries to explicitly reference columns and avoid SELECT *, for code readability, maintainability, and even performance.)
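If the insert is slow for a large DataFrame, pyodbc also exposes a fast_executemany flag on the cursor (supported by the SQL Server ODBC drivers). A minimal sketch, assuming the same connection/cursor as above and three hypothetical column names:
# hypothetical column names; adjust to your table and DataFrame
sql = '''INSERT INTO [dbo].[new_date] (Col1, Col2, Col3)
         VALUES (?, ?, ?)'''
cursor.fast_executemany = True  # send parameter sets in batches instead of one round trip per row
cursor.executemany(sql, df[['Col1', 'Col2', 'Col3']].to_numpy().tolist())
connection.commit()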

I had some issues connecting pandas to SQL Server too, but this solution worked for me to write my df:
import pyodbc
import sqlalchemy
engine = sqlalchemy.create_engine('mssql+pyodbc://{0}:{1}@{2}:{3}/{4}?driver={5}'.format(username, password, server, port, bdName, driver))
df.to_sql("TableName", con=engine, if_exists="append")

See below my favourite solution, with UPSERT statement included.
df_columns = list(df)
columns = ','.join(df_columns)
values = 'VALUES({})'.format(','.join(['%s' for col in df_columns]))
update_list = ['{} = EXCLUDED.{}'.format(col, col) for col in df_columns]
update_str = ','.join(update_list)
insert_stmt = "INSERT INTO {} ({}) {} ON CONFLICT ([your_pkey_here]) DO UPDATE SET {}".format(table, columns, values, update_str)

cursor.execute does not accept keyword arguments. One way of doing the insert is with the following code snippet:
cols = ", ".join("[" + str(i) + "]" for i in df.columns.tolist())
# Insert DataFrame records one by one.
for i, row in df.iterrows():
    sql = "INSERT INTO [dbo].[new_date] (" + cols + ") VALUES (" + "?, " * (len(row) - 1) + "?)"
    cursor.execute(sql, tuple(row))
Here you are iterating through each row and inserting it into the table.

Thank you for your answers :) but I used the following code to solve my problem:
import urllib.parse
import sqlalchemy
import pandas as pd

params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=servername;DATABASE=database;UID=user;PWD=pass")
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
engine.connect()
query = query  # the SELECT from the question above
df = pd.read_sql_query(query, connection)  # connection is the pyodbc connection from the question
df.to_sql(name='new_table', con=engine, index=False, if_exists='append')
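If the append turns out to be slow on larger frames, the mssql+pyodbc dialect also accepts a fast_executemany flag on create_engine (SQLAlchemy 1.3+); a sketch with the same params string as above:
engine = sqlalchemy.create_engine(
    "mssql+pyodbc:///?odbc_connect=%s" % params,
    fast_executemany=True,  # batches the inserts that to_sql issues under the hood
)
df.to_sql(name='new_table', con=engine, index=False, if_exists='append')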

Related

use python variable to read specific rows from access table using sqlalchemy

I have an Access table called "Cell_list" with a key column called "Cell_#". I want to read the table into a dataframe, but only the rows whose indices match those specified in a Python list "cell_numbers".
I tried several variations on:
import pyodbc
import pandas as pd
cell_numbers = [1,3,7]
cnn_str = r'Driver={Microsoft Access Driver (*.mdb,*.accdb)};DBQ=C:\folder\myfile.accdb;'
conn = pyodbc.connect(cnn_str)
query = ('SELECT * FROM Cell_list WHERE Cell_# in '+tuple(cell_numbers))
df = pd.read_sql(query, conn)
But no matter what I try I get a syntax error.
How do I do this?
Consider the best practice of parameterization, which is supported in pandas.read_sql:
# PREPARED STATEMENT, NO DATA
query = (
'SELECT * FROM Cell_list '
'WHERE [Cell_#] IN (?, ?, ?)'
)
# RUN SQL WITH BOUND PARAMS
df = pd.read_sql(query, conn, params=cell_numbers)
Consider even dynamic qmark placeholders dependent on the length of cell_numbers:
qmarks = ', '.join('?' for _ in cell_numbers)
query = (
    'SELECT * FROM Cell_list '
    f'WHERE [Cell_#] IN ({qmarks})'
)
df = pd.read_sql(query, conn, params=cell_numbers)
Convert (join) cell_numbers to text:
cell_text = '(1,3,7)'
and concatenate this.
The finished SQL should read (you may need brackets around the weird field name Cell_#):
SELECT * FROM Cell_list WHERE [Cell_#] IN (1,3,7)
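If you do go the concatenation route, a minimal sketch of the join (only advisable for trusted, numeric values; the parameterized version above is safer):
cell_numbers = [1, 3, 7]
cell_text = '(' + ','.join(str(n) for n in cell_numbers) + ')'  # '(1,3,7)'
query = 'SELECT * FROM Cell_list WHERE [Cell_#] IN ' + cell_text
df = pd.read_sql(query, conn)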

Insert data from pandas into SQL db - keys don't fit columns

I have a database table with around 10 columns. Sometimes I need to insert a row which has only 3 of the required columns; the rest are not in the dict.
The data to be inserted is a dictionary named row :
(this insert is to avoid duplicates)
row = {'keyword':'abc','name':'bds'.....}
df = pd.DataFrame([row]) # df looks good, I see columns and 1 row.
engine = getEngine()
connection = engine.connect()
df.to_sql('temp_insert_data_index', connection, if_exists ='replace',index=False)
result = connection.execute(('''
INSERT INTO {t} SELECT * FROM temp_insert_data_index
ON CONFLICT DO NOTHING''').format(t=table_name))
connection.close()
Problem: when I don't have all columns in the row (dict), it inserts the dict's fields by position (a 3-key dict goes into the first 3 columns) rather than into the matching columns. (I expect the keys in the dict to map to the DB columns.)
Why ?
Consider explicitly naming the columns to be inserted in the INSERT INTO and SELECT clauses, which is best practice for SQL append queries. Doing so, the dynamic query should work for all or a subset of columns. Below uses an f-string (available in Python 3.6+) for interpolation into the larger SQL query:
# APPEND TO STAGING TEMP TABLE
df.to_sql('temp_insert_data_index', connection, if_exists='replace', index=False)
# STRING OF COMMA SEPARATED COLUMNS
cols = ", ".join(df.columns)
sql = (
f"INSERT INTO {table_name} ({cols}) "
f"SELECT {cols} FROM temp_insert_data_index "
"ON CONFLICT DO NOTHING"
)
result = connection.execute(sql)
connection.close()
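One caveat: on SQLAlchemy 1.4+/2.0, connection.execute() no longer accepts a plain string, so the raw statement generally needs to be wrapped in text(); a sketch with the same sql as above:
from sqlalchemy import text

result = connection.execute(text(sql))
connection.commit()  # 2.0-style connections no longer autocommit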

How To Pass Multiple Values As Parameter In Cursor.Execute Method Of HANA Python Connector

I am using the HANA Python Connector's cursor.execute(sql, hanaparams) method; the parameters to this method are the SQL statement and hanaparams. My query is like this:
"SELECT * FROM TABLE WHERE COLUMN1 IN (?)" and my parameters are VALUE1, VALUE2, as a list/tuple.
I am unable to retrieve the result set, whereas when I run this in HANA with the query and input parameters hard coded, it runs perfectly fine.
I am following this tutorial https://developers.sap.com/tutorials/hana-clients-python.html
Any pointers on how I should pass multiple values in the params?
Something simple like this seems to work just fine. The count of ? must equal the count of parameters you have. In your case it binds only VALUE1.
from hdbcli import dbapi

conn = dbapi.connect(key='HDBKEY')
cursor = conn.cursor()

parameters = [11, "2020-12-24"]
params = '?,' * len(parameters)
params2 = params[0:-1]
sql_command2 = f"SELECT {params2} FROM DUMMY;"
cursor.execute(sql_command2, parameters)
rows = cursor.fetchall()
for row in rows:
    for col in row:
        print("%s" % col, end=" ")
    print(" ")
cursor.close()
conn.close()
So instead of SELECT * FROM TABLE WHERE COLUMN1 IN(?) it should be SELECT * FROM TABLE WHERE COLUMN1 IN(?, ?)
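Applied to the IN clause from the question, the placeholders can be generated from the length of the value list; a sketch, with MY_TABLE/COLUMN1 as placeholder names:
values = ["VALUE1", "VALUE2"]
placeholders = ", ".join("?" for _ in values)
sql = f"SELECT * FROM MY_TABLE WHERE COLUMN1 IN ({placeholders})"
cursor.execute(sql, values)
rows = cursor.fetchall()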

Pandas read_sql with where clause using "in"

Help!
I need to query a table with an "in" clause, where the SQL looks like this:
select * from some_table where some_field in (?)
I originally took a naive approach and tried this:
in_items = [1,2,3,4]
df = pd.read_sql(MY_SQL_STATEMENT, con=con, params=[in_items])
Which did not work, it throws the following error:
The SQL contains 1 parameter markers, but 4 parameters were supplied
Where I'm stuck is figuring out how to pass a list of items as a single parameter.
I can do a string concatenation approach, something like:
MY_SQL = 'select * from tableA where fieldA in ({})'.format(
    ','.join([str(x) for x in in_items]))
df = pd.read_sql(MY_SQL, con=con)
I would rather avoid this approach if possible. Does anybody know of a way to pass a list of values as a single parameter?
I'm also open to a possibly more cleverer way to do this. :)
Simply string-format the placeholders, then pass your params into pandas.read_sql. Do note, placeholder markers depend on the DB-API: pyodbc/sqlite3 use qmarks ?, while most others use %s. Below assumes the former marker:
in_items = [1,2,3,4]
MY_SQL = 'select * from tableA where fieldA in ({})'\
.format(', '.join(['?' for _ in in_items]))
# select * from tableA where fieldA in (?, ?, ?, ?)
df = pd.read_sql(MY_SQL, con=con, params=[in_items])
For me, using sqlite3, it worked this way:
import sqlite3
import pandas as pd

list_of_entries_to_retrive = pd.read_excel('.table_with_entries.xlsx')
list_of_entries_to_retrive = list_of_entries_to_retrive['entries'].tolist()
conn = sqlite3.connect('DataBase.db')
queryString = 'SELECT * FROM table WHERE attribute IN (\'{}\');'.format('\',\''.join([_ for _ in list_of_entries_to_retrive]))
df = pd.read_sql(queryString, con=conn)
It did not work this way:
df = pd.read_sql(queryString, con=conn, params=[list_of_entries_to_retrive])
Thanks
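For what it's worth, the parameterized form can also work with sqlite3, as long as one qmark is generated per list entry and the flat list is passed as params; a sketch with the same variable names as above:
placeholders = ', '.join('?' for _ in list_of_entries_to_retrive)
queryString = f'SELECT * FROM table WHERE attribute IN ({placeholders})'
df = pd.read_sql(queryString, con=conn, params=list_of_entries_to_retrive)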

Using Python, how to take multiple tables returned from a SQL query?

I am trying to use Python to call a SQL query, with pyodbc.
It worked fine in the following way:
import pyodbc
import pandas.io.sql as psql
server_name = 'My_Server'
database_name = 'My_DB'
conn = pyodbc.connect("driver={SQL Server};server=" + server_name + ";database=" + database_name + ";trusted_connection=true")
sql_command = """ EXEC MY_DB.dbo.some_proc"""
df = psql.read_frame(sql_command, conn)
It was ok when some_proc returns only one table. But what can I do if some_proc returns multiple tables, e.g. two tables?
Many thanks.
Borrowed from Stored Procedure Multiple Tables - PYODBC - Python
Ensure you have SET NOCOUNT ON in the stored procedure or none of this will work.
The following creates a list of dataframes where each index is a table returned from the stored procedure.
sql = f"EXEC dbo.StoredProcedure '{param1}', '{param2}'"
cur = cnxn.cursor()
df_list = []
# get First result
rows = cur.execute(sql).fetchall()
columns = [column[0] for column in cur.description]
df_list.append(pd.DataFrame.from_records(rows, columns=columns))
# check for more results
while (cur.nextset()):
rows = cur.fetchall()
columns = [column[0] for column in cur.description]
df_list.append(pd.DataFrame.from_records(rows, columns=columns))
cur.close()
Then reference df_list[0].head() etc
