psycopg2 - insert into variable columns using extras.execute_batch - python

I am inserting a pandas dataframe into Postgres using psycopg2.
Below is my code:
...
import psycopg2.extras as extras
tuples = [tuple(x) for x in df.to_numpy()]
cols = ','.join(list(column_list))
query = "INSERT INTO %s(%s) VALUES (%%s,%%s,%%s,%%s,%%s)" % (table , cols)
extras.execute_batch(cursor, query, tuples, page_size = 100)
...
This works!
Here, I convert the dataframe into a list of tuples, and the %%s placeholders take these values at runtime when extras.execute_batch is executed.
The problem is that I need to hardcode %%s once per column.
In this example there are 5 columns, hence I am using %%s,%%s,%%s,%%s,%%s.
Is there a way to make this variable?
Here is what I tried:
...
tuples = [tuple(x) for x in df.to_numpy()]
cols = ','.join(list(column_list))
vals_frame = len(column_list) * """%%s,"""
vals_frame = vals_frame[:-1]
print('vals_frame: ',vals_frame)
query = "INSERT INTO %s(%s) VALUES("+vals_frame+")" % (table , cols)
extras.execute_batch(cursor, query, tuples, page_size = 100)
...
This prints:
vals_frame:  %%s,%%s,%%s,%%s,%%s
which is what I want, but I get below error while creation of query:
TypeError: not all arguments converted during string formatting
How to get past this?
I have tried:
vals_frame = len(column_list) * """\%\%s,"""
vals_frame = len(column_list) * """\\%%s,"""
but this does not seem to work. Can someone help?

The problem is the location of the %. Because of operator precedence, % binds tighter than +. So in:
query = "INSERT INTO %s(%s) VALUES("+vals_frame+")" % (table , cols)
the % operator applies only to the final string ")", which contains no format specifiers to consume the two arguments, hence the TypeError. Here are some alternatives to consider:
query = "INSERT INTO %s(%s) VALUES(" % (table, cols) + vals_frame + ")"
query = ("INSERT INTO %s(%s) VALUES(" + vals_frame + ")") % (table, cols)
query = "INSERT INTO %s(%s) VALUES(%s)" % (table, cols, vals_frame)
Note that the first variant never passes vals_frame through %, so there you must build it from single %s; in the other two the doubled %% collapses to % during formatting.
Alternatively, avoid the precedence problem entirely with an f-string (again building vals_frame from single %s, since no %-formatting pass will collapse the %%):
query = f"INSERT INTO {table}({cols}) VALUES({vals_frame});"
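As a runnable sketch, a small helper that derives the placeholder list directly from the column list (the table and column names here are purely illustrative):

```python
def build_insert(table, columns):
    """Build an INSERT statement with one %s placeholder per column."""
    placeholders = ",".join(["%s"] * len(columns))
    return "INSERT INTO {}({}) VALUES ({})".format(
        table, ",".join(columns), placeholders)

query = build_insert("my_table", ["a", "b", "c"])
# query == "INSERT INTO my_table(a,b,c) VALUES (%s,%s,%s)"
```

If the table or column names can ever come from untrusted input, psycopg2's `psycopg2.sql` module (`sql.SQL`, `sql.Identifier`) is the safe way to compose identifiers into a query.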

Related

INSERT INTO SELECT based on a dataframe

I have a dataframe df and I want to execute a query to insert all the values from the dataframe into a table. Basically I am trying to do the equivalent of the following query:
INSERT INTO mytable
SELECT *
FROM mydataframe
For that I have the following code:
import pyodbc
import pandas as pd
connection = pyodbc.connect('Driver={' + driver + '};'
                            'Server=' + server + ';'
                            'UID=' + user + ';'
                            'PWD=' + password + ';')  # 'pass' is a reserved word in Python
cursor = connection.cursor()
query = 'SELECT * FROM [myDB].[dbo].[myTable]'
df = pd.read_sql_query(query, connection)
sql = 'INSERT INTO [dbo].[new_date] SELECT * FROM :x'
cursor.execute(sql, x=df)
connection.commit()
However, I am getting the following error:
TypeError: execute() takes no keyword arguments
Does anyone know what I am doing wrong?
For a raw DB-API insert query from Pandas, consider DataFrame.to_numpy() with executemany and avoid looping at the application level. However, explicit columns must be used in the append query. Adjust the columns and qmark (?) parameter placeholders below to correspond to the data frame's columns.
# PREPARED STATEMENT
sql = '''INSERT INTO [dbo].[new_date] (Col1, Col2, Col3, ...)
VALUES (?, ?, ?, ...)
'''
# EXECUTE PARAMETERIZED QUERY
cursor.executemany(sql, df.to_numpy().tolist())
connection.commit()
(And by the way, it is best practice generally in SQL queries to always explicitly reference columns and avoid SELECT * for code readability, maintainability, and even performance.)
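The same executemany pattern, sketched with the stdlib sqlite3 module so it runs without a database server (the table, columns, and rows are made up; the row list stands in for df.to_numpy().tolist()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_date (col1 TEXT, col2 INTEGER, col3 REAL)")

# Stand-in for df.to_numpy().tolist(): one tuple per row, in column order.
rows = [("a", 1, 1.5), ("b", 2, 2.5)]
conn.executemany("INSERT INTO new_date (col1, col2, col3) VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM new_date").fetchone()[0]
# count == 2
```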
I had some issues connecting pandas with SQL Server too, but this solution worked for me to write my df:
import pyodbc
import sqlalchemy
engine = sqlalchemy.create_engine('mssql+pyodbc://{0}:{1}@{2}:{3}/{4}?driver={5}'.format(username, password, server, port, bdName, driver))
df.to_sql("TableName", con=engine, if_exists="append")
See below my favourite solution, with an UPSERT statement included.
df_columns = list(df)
columns = ','.join(df_columns)
values = 'VALUES({})'.format(','.join(['%s' for col in df_columns]))
update_list = ['{} = EXCLUDED.{}'.format(col, col) for col in df_columns]
update_str = ','.join(update_list)
insert_stmt = "INSERT INTO {} ({}) {} ON CONFLICT ([your_pkey_here]) DO UPDATE SET {}".format(table, columns, values, update_str)
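Putting that together as a runnable sketch that only builds the statement (the table, columns, and conflict target are illustrative; the resulting string would then be passed to cursor.execute with the row values):

```python
def build_upsert(table, df_columns, pkey):
    """Build a PostgreSQL INSERT ... ON CONFLICT DO UPDATE statement."""
    columns = ",".join(df_columns)
    values = "VALUES({})".format(",".join(["%s"] * len(df_columns)))
    update_str = ",".join("{0} = EXCLUDED.{0}".format(c) for c in df_columns)
    return "INSERT INTO {} ({}) {} ON CONFLICT ({}) DO UPDATE SET {}".format(
        table, columns, values, pkey, update_str)

stmt = build_upsert("my_table", ["id", "name"], "id")
# stmt == "INSERT INTO my_table (id,name) VALUES(%s,%s) ON CONFLICT (id) "
#         "DO UPDATE SET id = EXCLUDED.id,name = EXCLUDED.name"
```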
cursor.execute does not accept keyword arguments. One way of doing the insert is with the snippet below.
cols = "`,`".join([str(i) for i in df.columns.tolist()])
# Insert DataFrame records one by one.
for i, row in df.iterrows():
    sql = "INSERT INTO `[dbo].[new_date]` (`" + cols + "`) VALUES (" + "?," * (len(row) - 1) + "?)"
    cursor.execute(sql, tuple(row))
Here you iterate through each row and insert it into the table.
Thank you for your answers :) but I used the following code to solve my problem:
params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=servername;DATABASE=database;UID=user;PWD=pass")
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
engine.connect()
query = 'SELECT * FROM [myDB].[dbo].[myTable]'
df = pd.read_sql_query(query, engine)
df.to_sql(name='new_table',con=engine, index=False, if_exists='append')

Variable Substitution in SQLite using Python

I am trying to figure out how to combine two SQLite queries. Both of them work fine independently but when I put them together with AND they do not work. I think the problem is that I do not know how to pass the variables properly.
First query that works:
var1 = 10
mylist = ['A', 'B', 'C', 'AB', 'AC']
c.execute("SELECT * FROM my_table WHERE column1=(?) ORDER BY RANDOM() LIMIT 1", (mylist[2],))
This line also works:
params = [5,0,1]
query = ("SELECT * FROM my_table WHERE column2 NOT IN (%s)" % ','.join('?' * len(params)))
c.execute(query, params)
I have been trying to combine these two statements without success:
query = ("SELECT * FROM my_table WHERE column2 NOT IN (%s) AND column1=(?)" % ','.join('?' * len(params)))
c.execute(query, params, mylist[2])
In case anyone finds this helpful, my final solution looked like this:
query = ("SELECT * FROM my_table WHERE column1 = (?) AND column2 NOT IN (%s) ORDER BY RANDOM() LIMIT 1" % ','.join('?' * len(params)))
c.execute(query, [mylist[2]] + params)
The second parameter of execute() must be a single sequence containing all the SQL parameters.
So you have to construct one list with the values from both original lists, ordered to match the placeholders in the query:
c.execute(query, params + [mylist[2]])
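A runnable sketch of the combined query against an in-memory sqlite3 database (the table and data are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE my_table (column1 TEXT, column2 INTEGER)")
c.executemany("INSERT INTO my_table VALUES (?, ?)",
              [("A", 3), ("C", 0), ("C", 7)])

params = [5, 0, 1]
mylist = ['A', 'B', 'C', 'AB', 'AC']
# One ? per NOT IN value, plus the fixed ? for column1.
query = ("SELECT * FROM my_table WHERE column1 = (?) AND column2 NOT IN (%s)"
         % ",".join("?" * len(params)))
rows = c.execute(query, [mylist[2]] + params).fetchall()
# rows == [("C", 7)]  -- ("C", 0) is excluded by NOT IN
```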

MYSQL: how to insert statement without specifying col names or question marks?

I have a list of tuples of which i'm inserting into a Table.
Each tuple has 50 values. How do I insert without having to specify the column names or how many ? placeholders there are?
col1 is an auto increment column so my insert stmt starts in col2 and ends in col51.
current code:
l = [(1,2,3,.....),(2,4,6,.....),(4,6,7,.....)...]
for tup in l:
cur.execute(
"""insert into TABLENAME(col2,col3,col4.........col50,col51) VALUES(?,?,?,.............)
""")
want:
insert into TABLENAME(col*) VALUES(*)
MySQL's syntax for INSERT is documented here: http://dev.mysql.com/doc/refman/5.7/en/insert.html
There is no wildcard syntax like you show. The closest thing is to omit the column names:
INSERT INTO MyTable VALUES (...);
But I don't recommend doing that. It works only if you are certain you're going to specify a value for every column in the table (even the auto-increment column), and your values are guaranteed to be in the same order as the columns of the table.
You should learn to use code to build the SQL query based on arrays of values in your application. Here's a Python example of the way I do it. Suppose you have a dict of column: value pairs called data_values.
placeholders = ['%s'] * len(data_values)
sql_template = """
INSERT INTO MyTable ({columns}) VALUES ({placeholders})
"""
sql = sql_template.format(
    columns=','.join(data_values.keys()),
    placeholders=','.join(placeholders)
)
cur = db.cursor()
cur.execute(sql, list(data_values.values()))
Example code to put before your loop:
cols = "("
for x in range(2, 52):
    cols = cols + "col" + str(x) + ","
cols = cols[:-1] + ")"
Inside your loop:
for tup in l:
    # note: this interpolates the tuple directly rather than binding parameters
    cur.execute("insert into TABLENAME " + cols + " VALUES {0}".format(tup))
This is off the top of my head with no error checking

How to iterate over Postgresql rows in a Python script?

I'm writing a script which selects from a DB table and iterates over the rows.
In MySQL I would do:
import MySQLdb
db_mysql=MySQLdb.Connect(user=...,passwd=...,db=..., host=...)
cur = db_mysql.cursor(MySQLdb.cursors.DictCursor)
cur.execute ("""SELECT X,Y,Z FROM tab_a""")
for row in cur.fetchall():
    do things...
But I don't know how to do it in PostgreSQL.
Basically this question could be how to translate the above MySQL code to work with PostgreSQL.
This is what I have so far (I am using PyGreSQL).
import pg
pos = pg.connect(dbname=...,user=...,passwd=...,host=..., port=...)
pos.query("""SELECT X,Y,Z FROM tab_a""")
How do I iterate over the query results?
Retrieved from http://www.pygresql.org/contents/tutorial.html, which you should read.
q = db.query('select * from fruits')
q.getresult()
The result is a Python list of tuples; each tuple contains a row. You just need to iterate over the list and iterate over or index the tuple.
I think it is the same: you must create a cursor, call a fetch, and iterate just like in MySQL:
import pgdb
pos = pgdb.connect(database=..., user=..., password=..., host=..., port=...)
sel = "select version() as x, current_timestamp as y, current_user as z"
cursor = pos.cursor()
cursor.execute(sel)
columns_descr = cursor.description
rows = cursor.fetchall()
for row in rows:
    x, y, z = row
    print('variables:')
    print('%s\t%s\t%s' % (x, y, z))
    print('\nrow:')
    print(row)
    print('\ncolumns:')
    for i in range(len(columns_descr)):
        print('-- %s (%s) --' % (columns_descr[i][0], columns_descr[i][1]))
        print('%s' % (row[i]))
    # this will work with PyGreSQL >= 5.0
    print('\n testing named tuples')
    print('%s\t%s\t%s' % (row.x, row.y, row.z))
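The cursor protocol is the same across DB-API drivers, so here is a runnable sketch using the stdlib sqlite3 module (the table and data are made up); a cursor can be iterated directly, which avoids holding the whole result set in memory:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tab_a (x INTEGER, y INTEGER, z INTEGER)")
cur.executemany("INSERT INTO tab_a VALUES (?, ?, ?)", [(1, 2, 3), (4, 5, 6)])

cur.execute("SELECT x, y, z FROM tab_a")
results = []
for row in cur:          # cursors are iterable; cur.fetchall() also works
    x, y, z = row
    results.append(x + y + z)
# results == [6, 15]
```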

Parameterizing 'SELECT IN (...)' queries

I want to use MySQLdb to create a parameterized query such as:
serials = ['0123456', '0123457']
c.execute('''select * from table where key in %s''', (serials,))
But what ends up being sent to the DBMS is:
select * from table where key in ("'0123456'", "'0123457'")
Is it possible to create a parameterized query like this? Or do I have to loop myself and build up a result set?
Note: executemany(...) won't work for this - it'll only return the last result:
>>> c.executemany('''select * from table where key in (%s)''',
[ (x,) for x in serials ] )
2L
>>> c.fetchall()
((1, '0123457', 'faketestdata'),)
Final solution adapted from Gareth's clever answer:
# Assume check above for case where len(serials) == 0
query = '''select * from table where key in ({0})'''.format(
','.join(["%s"] * len(serials)))
c.execute(query, tuple(serials)) # tuple() for case where len == 1
You want something like this, I think (note that MySQLdb uses %s placeholders, not ?):
query = 'select * from table where key in (%s)' % ','.join(['%s'] * len(serials))
c.execute(query, serials)
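A runnable sketch of the IN-clause technique using the stdlib sqlite3 module, which uses ? placeholders (placeholder style varies by driver: %s for MySQLdb and psycopg2, ? for sqlite3 and pyodbc); the table and data are made up, and the empty-list case mentioned above is guarded explicitly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE t (key TEXT, data TEXT)")
c.executemany("INSERT INTO t VALUES (?, ?)",
              [("0123456", "a"), ("0123457", "b"), ("9999999", "c")])

serials = ["0123456", "0123457"]
assert serials  # guard the empty case: "IN ()" is a syntax error
query = ("select * from t where key in ({0}) order by key"
         .format(",".join(["?"] * len(serials))))
rows = c.execute(query, tuple(serials)).fetchall()
# rows == [("0123456", "a"), ("0123457", "b")]
```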
