We need to do bulk updates of many rows in our Postgres DB, and want to use the SQL syntax below. How do we do that using psycopg2?
UPDATE table_to_be_updated
SET msg = update_payload.msg
FROM (VALUES %(update_payload)s) AS update_payload(id, msg)
WHERE table_to_be_updated.id = update_payload.id
RETURNING *
Attempt 1 - Passing values
We need to pass a nested iterable format to the psycopg2 query. For the update_payload, I've tried passing a list of lists, a list of tuples, and a tuple of tuples. They all fail with various errors.
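(One way to see what psycopg2 would actually send for a given parameter is cursor.mogrify(); a small diagnostic sketch, assuming an open connection with hypothetical credentials:)
import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical connection, diagnostic only
cur = conn.cursor()

update_payload = [("a", "first"), ("b", "second")]
# mogrify() returns the byte string psycopg2 would execute; a Python list/tuple
# of tuples is adapted to ARRAY/record syntax rather than the bare row list
# that FROM (VALUES ...) expects, which is why these attempts fail.
print(cur.mogrify("FROM (VALUES %(update_payload)s)", {"update_payload": update_payload}))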
Attempt 2 - Writing custom class with __conform__
I've tried to write a custom class that we can use for these operations, which would return
(VALUES (row1_col1, row1_col2), (row2_col1, row2_col2), (...))
I've coded it up like the following, based on the instructions here, but it's clear that I'm doing something wrong. For instance, with this approach I'd have to handle quoting of all values inside the table myself, which would be cumbersome and error-prone.
class ValuesTable(list):
    def __init__(self, *args, **kwargs):
        super(ValuesTable, self).__init__(*args, **kwargs)

    def __repr__(self):
        data_in_sql = ""
        for row in self:
            str_values = ", ".join([str(value) for value in row])
            data_in_sql += "({})".format(str_values)
        return "(VALUES {})".format(data_in_sql)

    def __conform__(self, proto):
        return self.__repr__()

    def getquoted(self):
        return self.__repr__()

    def __str__(self):
        return self.__repr__()
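(A side note on the quoting concern: psycopg2.extensions.adapt() can produce a properly quoted literal for each value, so a helper along these lines would avoid hand-rolled escaping. This is only a rough sketch of that idea, not the approach recommended in the answers below:)
from psycopg2.extensions import adapt

def values_sql(rows):
    # Build a "(VALUES (...), (...))" fragment, delegating quoting/escaping
    # of each value to psycopg2's own adapters.
    quoted_rows = []
    for row in rows:
        # some adapters need a connection first: adapt(value).prepare(conn)
        quoted = ", ".join(adapt(value).getquoted().decode() for value in row)
        quoted_rows.append("({})".format(quoted))
    return "(VALUES {})".format(", ".join(quoted_rows))

# values_sql([("a", "it's msg")]) -> "(VALUES ('a', 'it''s msg'))"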
EDIT: If doing a bulk update can be done in a faster/cleaner way using another syntax than the one in my original question, then I'm all ears!
Requirements:
Postgres table, consisting of the fields id and msg (and potentially other fields)
Python data containing new values for msg
Postgres table should be updated via psycopg2
Example Table
CREATE TABLE einstein(
    id CHAR(5) PRIMARY KEY,
    msg VARCHAR(1024) NOT NULL
);
Test data
INSERT INTO einstein VALUES ('a', 'empty');
INSERT INTO einstein VALUES ('b', 'empty');
INSERT INTO einstein VALUES ('c', 'empty');
Python Program
A hypothetical, self-contained example program with quotations from a famous physicist.
import sys
import psycopg2
from psycopg2.extras import execute_values
def print_table(con):
    cur = con.cursor()
    cur.execute("SELECT * FROM einstein")
    rows = cur.fetchall()
    for row in rows:
        print(f"{row[0]} {row[1]}")


def update(con, einstein_quotes):
    cur = con.cursor()
    execute_values(cur, """UPDATE einstein
                           SET msg = update_payload.msg
                           FROM (VALUES %s) AS update_payload (id, msg)
                           WHERE einstein.id = update_payload.id""", einstein_quotes)
    con.commit()


def main():
    con = None
    einstein_quotes = [("a", "Few are those who see with their own eyes and feel with their own hearts."),
                       ("b", "I have no special talent. I am only passionately curious."),
                       ("c", "Life is like riding a bicycle. To keep your balance you must keep moving.")]
    try:
        con = psycopg2.connect("dbname='stephan' user='stephan' host='localhost' password=''")
        print_table(con)
        update(con, einstein_quotes)
        print("rows updated:")
        print_table(con)
    except psycopg2.DatabaseError as e:
        print(f'Error {e}')
        sys.exit(1)
    finally:
        if con:
            con.close()


if __name__ == '__main__':
    main()
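(As a side note, execute_values() also accepts template and page_size arguments, which is handy if the rows arrive as dictionaries rather than tuples. A sketch against the same einstein table, with placeholder message text:)
dict_quotes = [{"id": "a", "msg": "new text for a"},
               {"id": "b", "msg": "new text for b"}]
# template describes how each dict row is rendered into the VALUES list
execute_values(cur,
               """UPDATE einstein
                  SET msg = update_payload.msg
                  FROM (VALUES %s) AS update_payload (id, msg)
                  WHERE einstein.id = update_payload.id""",
               dict_quotes,
               template="(%(id)s, %(msg)s)",
               page_size=100)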
Prepared Statements Alternative
import sys
import psycopg2
from psycopg2.extras import execute_batch
def print_table(con):
    cur = con.cursor()
    cur.execute("SELECT * FROM einstein")
    rows = cur.fetchall()
    for row in rows:
        print(f"{row[0]} {row[1]}")


def update(con, einstein_quotes, page_size):
    cur = con.cursor()
    cur.execute("PREPARE updateStmt AS UPDATE einstein SET msg=$1 WHERE id=$2")
    execute_batch(cur, "EXECUTE updateStmt (%(msg)s, %(id)s)", einstein_quotes, page_size=page_size)
    cur.execute("DEALLOCATE updateStmt")
    con.commit()


def main():
    con = None
    einstein_quotes = ({"id": "a", "msg": "Few are those who see with their own eyes and feel with their own hearts."},
                       {"id": "b", "msg": "I have no special talent. I am only passionately curious."},
                       {"id": "c", "msg": "Life is like riding a bicycle. To keep your balance you must keep moving."})
    try:
        con = psycopg2.connect("dbname='stephan' user='stephan' host='localhost' password=''")
        print_table(con)
        update(con, einstein_quotes, 100)  # choose some meaningful page_size here
        print("rows updated:")
        print_table(con)
    except psycopg2.DatabaseError as e:
        print(f'Error {e}')
        sys.exit(1)
    finally:
        if con:
            con.close()


if __name__ == '__main__':
    main()
Output
The programs above would output the following to the debug console:
a empty
b empty
c empty
rows updated:
a Few are those who see with their own eyes and feel with their own hearts.
b I have no special talent. I am only passionately curious.
c Life is like riding a bicycle. To keep your balance you must keep moving.
Short answer: use execute_values(curs, sql, args); see the docs.
For those looking for a short, straightforward answer, here is sample code to update users in bulk:
from psycopg2.extras import execute_values
sql = """
update users u
set
name = t.name,
phone_number = t.phone_number
from (values %s) as t(id, name, phone_number)
where u.id = t.id;
"""
rows_to_update = [
(2, "New name 1", '+923002954332'),
(5, "New name 2", '+923002954332'),
]
curs = conn.cursor() # Assuming you already got the connection object
execute_values(curs, sql, rows_to_update)
If you're using a uuid primary key and haven't registered the uuid data type in psycopg2 (i.e. the uuid stays a Python string), you can always use the condition u.id = t.id::uuid.
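(A sketch of what that looks like in the statement above, assuming the same users table:)
sql = """
    update users u
    set
        name = t.name,
        phone_number = t.phone_number
    from (values %s) as t(id, name, phone_number)
    where u.id = t.id::uuid;   -- cast the string id back to uuid for the join
"""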
Related
I have an SQLite db called clients.db with a table called prices. Within the table I have the columns ['date', 'xyz', 'abc', 'sta', 'vert']. I am accessing the database from Python 3.
I can get a specific number easily enough using:
conn = sqlite3.connect('clients.db')
c = conn.cursor()
c.execute('''SELECT "xyz" FROM prices WHERE date=?''', ('2019-01-07', ))
conn.close()
print(c.fetchone()[0])
This returns 1902 as expected.
However, when I try the below, instead of the expected number I get xyz.
conn = sqlite3.connect('clients.db')
c = conn.cursor()
c.execute('''SELECT ? FROM prices WHERE date=?''', ('xyz', '2019-01-07', ))
conn.close()
print(c.fetchone()[0])
and when I add =? I get sqlite3.OperationalError: near "=": syntax error:
conn = sqlite3.connect('clients.db')
c = conn.cursor()
c.execute('''SELECT =? FROM prices WHERE date=?''', ('xyz', '2019-01-07', ))
conn.close()
print(c.fetchone()[0])
From Python documentation:
Instead, use the DB-API’s parameter substitution. Put ? as a
placeholder wherever you want to use a value, and then provide a tuple
of values as the second argument to the cursor’s execute() method.
You need to use the ? placeholder for values, but for column names you have to use string formatting.
I have created a class, inserted some dummy rows, and run the select query mentioned in the question.
import sqlite3


class Database(object):
    def __init__(self):
        self.conn = sqlite3.connect('clients.db')
        self.c = self.conn.cursor()

    def create_table(self):
        try:
            self.c.execute('''CREATE TABLE prices (date text, xyz text, abc text, sta text, vert text)''')
        except:
            pass

    def insert_dummy_rows(self):
        values = [('2019-01-07', 'xyz1', 'abc1', 'sta1', 'vert1'),
                  ('2019-01-07', 'xyz2', 'abc2', 'sta2', 'vert2'),
                  ('2019-01-08', 'xyz3', 'abc3', 'sta3', 'vert3'),
                  ]
        self.c.executemany('INSERT INTO prices VALUES (?,?,?,?,?)', values)
        self.conn.commit()

    def close_connection(self):
        self.conn.close()

    def get_single_row(self):
        t = ('2019-01-07',)
        query = "SELECT {} FROM prices WHERE date=?".format('xyz')
        self.c.execute(query, t)
        return self.c.fetchone()[0]


if __name__ == '__main__':
    db = Database()
    db.create_table()
    db.insert_dummy_rows()
    print(db.get_single_row())
Output:
xyz1
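(One caveat worth adding: because identifiers can't be bound as ? parameters, it's safer to check the column name against a known whitelist before formatting it into the statement. A small sketch, reusing a cursor c and the prices table from above:)
allowed_columns = {'xyz', 'abc', 'sta', 'vert'}

def get_single_value(c, column, date):
    # Column names cannot be bound with ?, so validate against a known list
    # before formatting the name into the SQL text.
    if column not in allowed_columns:
        raise ValueError("unexpected column: %r" % column)
    c.execute("SELECT {} FROM prices WHERE date=?".format(column), (date,))
    return c.fetchone()[0]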
I am trying to insert info from a pandas DataFrame into a database table by using a function that I wrote:
def insert(table_name="", name="", genere="", year=1, impd_rating=float(1)):
    conn = psycopg2.connect("dbname='database1' user='postgres' password='postgres333' host='localhost' port=5433 ")
    cur = conn.cursor()
    cur.execute("INSERT INTO %s VALUES %s,%s,%s,%s" % (table_name, name, genere, year, impd_rating))
    conn.commit()
    conn.close()
When I try to use this function like this:
b = 0
for row in DF['id']:
    insert(impd_rating=float(DF['idbm_rating'][b]),
           year=int(DF['year'][b]),
           name=str(DF['name'][b]),
           genere=str(DF['genere'][b]),
           table_name='test_movies')
    b = b + 1
I get the following syntax error:
SyntaxError: invalid syntax
PS D:\tito\scripts\database training> python .\postgres_script.py
Traceback (most recent call last):
  File ".\postgres_script.py", line 56, in <module>
    insert(impd_rating=float(DF['idbm_rating'][b]), year=int(DF['year'][b]), name=str(DF['name'][b]), genere=str(DF['genere'][b]), table_name='test_movies')
  File ".\postgres_script.py", line 15, in insert
    cur.execute("INSERT INTO %s VALUES %s,%s,%s,%s" % (table_name, name, genere, year, impd_rating))
psycopg2.ProgrammingError: syntax error at or near "Avatar"
LINE 1: INSERT INTO test_movies VALUES Avatar,action,2009,7.9
I also tried to change the string substitution method from %s to .format(), but I got the same error.
The error message is explicit: this SQL command is wrong at Avatar: INSERT INTO test_movies VALUES Avatar,action,2009,7.9. Quite simply, values must be enclosed in parentheses, and character strings must be quoted, so the correct SQL is:
INSERT INTO test_movies VALUES ('Avatar','action',2009,7.9)
But building a full SQL command by concatenating parameters is bad practice (*); only the table name should be directly inserted into the command, because it is not a SQL parameter. The correct way is to use a parameterized query:
cur.execute("INSERT INTO %s VALUES (?,?,?,?)" % (table_name,) ,(name ,genere , year,impd_rating)))
(*) It was the cause of numerous SQL injection flaws, because if one of the parameters contains a semicolon (;), whatever comes after it could be interpreted as a new command.
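(For illustration, a hedged example of how a crafted value slips through when the command is built by string concatenation; the value here is made up:)
# Concatenating untrusted input straight into the SQL text:
name = "x'; DROP TABLE test_movies; --"
cmd = "INSERT INTO test_movies VALUES ('%s','action',2009,7.9)" % name
print(cmd)
# INSERT INTO test_movies VALUES ('x'; DROP TABLE test_movies; --','action',2009,7.9)
# The quote inside the value ends the string literal early and the rest is
# parsed as new SQL. Passing name as a bound parameter instead keeps it a
# harmless literal.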
Pandas has a DataFrame method for this, to_sql:
# Only needs to be executed once.
conn=psycopg2.connect("dbname='database1' user='postgres' password='postgres333' host='localhost' port=5433 ")
df.to_sql('test_movies', con=conn, if_exists='append', index=False)
This should hopefully get you going in the right direction.
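(One caveat, as far as I know: recent pandas versions expect con to be a SQLAlchemy engine/connection or a sqlite3 connection rather than a raw psycopg2 connection, so a sketch with an engine might look like this, reusing the credentials from the question:)
from sqlalchemy import create_engine
import pandas as pd

# Build an engine from the same credentials used above.
engine = create_engine("postgresql+psycopg2://postgres:postgres333@localhost:5433/database1")
df.to_sql('test_movies', con=engine, if_exists='append', index=False)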
In your original query
INSERT INTO %s VALUES %s,%s,%s,%s
there is an SQL problem: you need parentheses around the values, i.e. it should be VALUES (%s, %s, %s, %s). On top of that, the table name cannot be merged in as a parameter, or it would be escaped as a string, which is not what you want.
You can use the psycopg2 (version 2.7+) sql module to merge the table name into the query, with placeholders for the values:
from psycopg2 import sql
query = sql.SQL("INSERT INTO {} VALUES (%s, %s, %s, %s)").format(
    sql.Identifier('test_movies'))
cur.execute(query, ('Avatar','action',2009,7.9))
This makes both merging the table name and passing the arguments to the query secure.
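(The same module can compose column names too, which is handy if those also vary; a small sketch, with the column names assumed from the question:)
from psycopg2 import sql

cols = ['name', 'genere', 'year', 'impd_rating']
query = sql.SQL("INSERT INTO {} ({}) VALUES ({})").format(
    sql.Identifier('test_movies'),
    sql.SQL(', ').join(map(sql.Identifier, cols)),
    sql.SQL(', ').join(sql.Placeholder() for _ in cols))
cur.execute(query, ('Avatar', 'action', 2009, 7.9))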
Hello mohamed mahrous,
First install the psycopg2 package to access the PostgreSQL database.
Try the code below:
import psycopg2

conn = psycopg2.connect("dbname='database1' user='postgres' password='postgres333' host='localhost' port=5433 ")
cur = conn.cursor()

def insert(table_name, name, genere, year, impd_rating):
    query = "INSERT INTO "+table_name+"(name,genere,year,impd_rating) VALUES(%s,%s,%s,%s)"
    try:
        print(query)
        cur.execute(query, (name, genere, year, impd_rating))
    except Exception as e:
        print("Not execute...")
    conn.commit()

b = 0
for row in DF['id']:
    insert(impd_rating=float(DF['idbm_rating'][b]), year=int(DF['year'][b]), name=str(DF['name'][b]), genere=str(DF['genere'][b]), table_name='test_movies')
    b = b + 1
conn.close()
Example,
import psycopg2

conn = psycopg2.connect("dbname='database1' user='postgres' password='postgres333' host='localhost' port=5433 ")
cur = conn.cursor()

def insert(table_name, name, genere, year, impd_rating):
    query = "INSERT INTO "+table_name+"(name,genere,year,impd_rating) VALUES(%s,%s,%s,%s)"
    try:
        print(query)
        cur.execute(query, (name, genere, year, impd_rating))
    except Exception as e:
        print("Not execute")
    conn.commit()

b = 0
for row in DF['id']:
    insert(impd_rating="7.0", year="2017", name="Er Ceo Vora Mayur", genere="etc", table_name="test_movies")
    b = b + 1
conn.close()
I hope my answer is helpful.
If you have any questions, please comment.
I found a solution to my issue by using SQLAlchemy and the pandas to_sql method.
Thanks for the help, everyone.
from sqlalchemy import *
import pandas as pd


def connect(user, password, db, host='localhost', port=5433):
    '''Returns a connection and a metadata object'''
    # We connect with the help of the PostgreSQL URL, e.g.
    # postgresql://federer:grandestslam@localhost:5432/tennis
    url = 'postgresql://{}:{}@{}:{}/{}'
    url = url.format(user, password, host, port, db)

    # The return value of create_engine() is our connection object
    con = create_engine(url, client_encoding='utf8')

    # We then bind the connection to MetaData()
    meta = MetaData(bind=con, reflect=True)

    return con, meta


con, meta = connect('postgres', 'postgres333', 'database1')

movies = Table('test', meta,
               Column('id', Integer, primary_key=True),
               Column('name', String),
               Column('genere', String),
               Column('year', Integer),
               Column('idbm_rating', REAL))

meta.create_all(con)

DF = pd.read_csv('new_movies.txt', sep=' ', engine='python')
DF.columns = ('id', 'name', 'genere', 'year', 'idbm_rating')
DF.to_sql('movies', con=con, if_exists='append', index=False)
I'm working on an IRC bot, forked from a modular bot called Skybot.
There are two other modules that make use of the sqlite3 database by default; they have both been removed and their tables dropped, so I know that the issue is somewhere in what I'm doing.
I only call 3 db.execute() statements in the whole thing and they're all immediately committed. This thing isn't getting hammered with queries either, but the lock remains.
Relevant code:
def db_init(db):
    db.execute("create table if not exists searches"
               "(search_string UNIQUE PRIMARY KEY,link)")
    db.commit()
    return db


def get_link(db, inp):
    row = db.execute("select link from searches where"
                     " search_string=lower(?) limit 1",
                     (inp.lower(),)).fetchone()
    db.commit()
    return row


def store_link(db, stub, search):
    db.execute("insert into searches (search_string, link) VALUES (?, ?)", (search.lower(), stub))
    db.commit()
    return stub
If the script only has to touch db_init() and get_link() it breezes through, but if it needs to call store_link() while the database is unlocked, it will do the insert but doesn't seem to commit it in a way that future calls to get_link() can read until the bot restarts.
The bot's db.py:
import os
import sqlite3
def get_db_connection(conn, name=''):
    "returns an sqlite3 connection to a persistent database"
    if not name:
        name = '%s.%s.db' % (conn.nick, conn.server)
    filename = os.path.join(bot.persist_dir, name)
    return sqlite3.connect(filename, isolation_level=None)

bot.get_db_connection = get_db_connection
I did adjust the isolation_level myself; that argument was originally timeout=10. I am fairly stumped.
EDIT: The usages of get_db_connection():
main.py (main loop):
def run(func, input):
    args = func._args

    if 'inp' not in input:
        input.inp = input.paraml

    if args:
        if 'db' in args and 'db' not in input:
            input.db = get_db_connection(input.conn)
        if 'input' in args:
            input.input = input
        if 0 in args:
            out = func(input.inp, **input)
        else:
            kw = dict((key, input[key]) for key in args if key in input)
            out = func(input.inp, **kw)
    else:
        out = func(input.inp)
    if out is not None:
        input.reply(unicode(out))

...

def start(self):
    uses_db = 'db' in self.func._args
    db_conns = {}
    while True:
        input = self.input_queue.get()

        if input == StopIteration:
            break

        if uses_db:
            db = db_conns.get(input.conn)
            if db is None:
                db = bot.get_db_connection(input.conn)
                db_conns[input.conn] = db
            input.db = db

        try:
            run(self.func, input)
        except:
            traceback.print_exc()
Pass conn into your functions along with db, as mentioned. If you wrote the code yourself, you'll know where the database actually is. Conventionally you would do something like:
db = sqlite3.connect('database.db')
conn = db.cursor()
Then for general usage:
db.execute("...")
conn.commit()
Hence, in your case:
def db_init(conn, db):
    db.execute("create table if not exists searches"
               "(search_string UNIQUE PRIMARY KEY,link)")
    conn.commit()
    return db


def get_link(conn, db, inp):
    row = db.execute("select link from searches where"
                     " search_string=lower(?) limit 1",
                     (inp.lower(),)).fetchone()
    conn.commit()
    return row


def store_link(conn, db, stub, search):
    db.execute("insert into searches (search_string, link) VALUES (?, ?)", (search.lower(), stub))
    conn.commit()
    return stub
Given that you have set the isolation_level to autocommit:
sqlite3.connect(filename, isolation_level=None)
there is no need whatsoever for the commit statements in your code.
Edit:
Wrap your execute statements in try statements so that you at least have a chance of finding out what is going on, e.g.:
import sqlite3


def get_db(name=""):
    if not name:
        name = "db1.db"
    return sqlite3.connect(name, isolation_level=None)

connection = get_db()
cur = connection.cursor()

try:
    cur.execute("create table if not exists searches"
                "(search_string UNIQUE PRIMARY KEY,link)")
except sqlite3.Error as e:
    print 'Searches create Error '+str(e)

try:
    cur.execute("insert into searches (search_string, link) VALUES (?, ?)", ("my search", "other"))
except sqlite3.Error as e:
    print 'Searches insert Error '+str(e)

cur.execute("select link from searches where search_string=? limit 1", ["my search"])
s_data = cur.fetchone()
print 'Result:', s_data
I apologize in advance for asking such a basic question, but I am new to SQLite3 and having trouble getting started. I am trying to build a database with one table. I used the following code to build the table.
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('''CREATE TABLE mytable
             (start, end, score)''')
but whenever I try to update or access the table, it seems that it doesn't exist, or maybe it exists in a different database. I also tried creating a table called example.mytable, but I got the error:
sqlite3.OperationalError: unknown database example
What am I missing?
Thanks
I think that a commit is needed after inserts (schema changes such as new tables should commit automatically). I would suggest adding the full path to your database as well, to make sure you are accessing the same location next time round.
Here is an extension on your code:
import sqlite3


def create():
    try:
        c.execute("""CREATE TABLE mytable
                     (start, end, score)""")
    except:
        pass

def insert():
    c.execute("""INSERT INTO mytable (start, end, score)
                 values(1, 99, 123)""")

def select(verbose=True):
    sql = "SELECT * FROM mytable"
    recs = c.execute(sql)
    if verbose:
        for row in recs:
            print(row)

db_path = r'C:\Users\Prosserc\Documents\Geocoding\test.db'
conn = sqlite3.connect(db_path)
c = conn.cursor()
create()
insert()
conn.commit()  # commit needed
select()
c.close()
Output:
(1, 99, 123)
After closing the program if I log onto the SQLite database the data is still there.
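(A related convenience, in case it's useful: an sqlite3 connection can be used as a context manager, which commits on success and rolls back on an exception; note it does not close the connection. A small sketch against the same table and path:)
import sqlite3

db_path = r'C:\Users\Prosserc\Documents\Geocoding\test.db'
conn = sqlite3.connect(db_path)
with conn:
    conn.execute("INSERT INTO mytable (start, end, score) values (2, 100, 456)")
# committed here if the block succeeded; conn itself is still open
conn.close()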
import sqlite3
import pandas as pd

con = None

def getConnection():
    databaseFile = "./test.db"
    global con
    if con is None:
        con = sqlite3.connect(databaseFile)
    return con

def createTable(con):
    try:
        c = con.cursor()
        c.execute("""CREATE TABLE IF NOT EXISTS Movie
                     (start, end, score)""")
    except Exception as e:
        pass

def insert(con):
    c = con.cursor()
    c.execute("""INSERT INTO Movie (start, end, score)
                 values(1, 99, 123)""")

def queryExec():
    con = getConnection()
    createTable(con)
    insert(con)
    # r = con.execute("""SELECT * FROM Movie""")
    result = pd.read_sql_query("select * from Movie;", con)
    return result

r = queryExec()
print(r)
I am trying to imitate one of the code examples from Collective Intelligence and stumbled on some unknown error. All the methods work fine inside the class except the one for retrieving a query.
import sqlite3


class Connect:
    # Create the database
    def __init__(self, dbname):
        self.con = sqlite3.connect(dbname)

    # Add the row
    def create(self, date, name, age):
        self.con.execute('create table Profile(date text, name text,age real)')

    # Lock in the changes
    def commit(self):
        self.con.commit()

    # Retrieve the details
    def getentry(self, table, field, value):
        cursor = self.con.execute(
            "select * from %s where %s = '%s'" % (table, field, value))
        result_set = cursor.fetchall()
        print result_set
Working example:
To create DB
C = Connect('test.db')
To add rows
C.create('2013-03-06','Joy',34)
To make the changes and lock the file
C.commit()
Getting the row
C.getentry(Profile,name,'Joy')
Error : NameError: name 'Profile' is not defined
Then, with the arguments quoted:
C.getentry('Profile','name','Joy')
Result = [ ]
The problem is in Connect.create. It creates the table but does not populate it, as the comment above its definition appears to imply it should. You need to update it to something like the following:
def create(self, date, name, age):
    self.con.execute('create table Profile(date text, name text,age real)')
    self.con.execute("insert into Profile values('{0}','{1}','{2}')"
                     .format(date, name, age))
    return
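(As a side note, a sketch of the same insert using sqlite3's ? placeholders, which avoids the quoting and injection issues of building the statement with .format(); this is just the standard DB-API pattern, not the book's code:)
def create(self, date, name, age):
    self.con.execute('create table Profile(date text, name text, age real)')
    # Bound parameters: sqlite3 handles quoting/escaping of the values.
    self.con.execute("insert into Profile values(?, ?, ?)", (date, name, age))
    return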