I'm comparing some of the features of Postgres clients for compatibility and I'm having difficulty getting prepared statements to work in psycopg2. The Node.js pg package allows me to do the following, where providing a name (insert-values) prepares the query server-side:
for (let rows = 0; rows < 10; rows++) {
  // Providing a 'name' field allows for prepared statements / bind variables
  const query = {
    name: "insert-values",
    text: "INSERT INTO my_table VALUES($1, $2, $3, $4);",
    values: [Date.now() * 1000, Date.now(), "node pg prep statement", rows],
  }
  const preparedStatement = await client.query(query)
}
In Python, I'm doing something like this using psycopg2:
# insert 10 records
for x in range(10):
    now = dt.datetime.utcnow()
    date = dt.datetime.now().date()
    cursor.execute("""
        INSERT INTO trades
        VALUES (%s, %s, %s, %s);
        """, (now, date, "python example", x))

# commit records
connection.commit()
Is there any way to create prepared statements in Python?
Edit: I'm using the samples from the QuestDB documentation.
There is no prepared statement support in psycopg2, even in 2021. Yes, you can issue PREPARE and then run the named query with parameters, but there is no driver-level support in psycopg2 the way you find it in Java JDBC or Rust Postgres drivers.
If you write loops of INSERT statements, the full statement text is sent and parsed by the database on every iteration, which adds measurable I/O and CPU overhead for big loops.
As far as I know, there is no support for "magically" preparing statements. However, you can execute SQL PREPARE and EXECUTE statements with execute().
You probably want to read the section on fast execution helpers in the manual.
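For illustration, here is a minimal sketch of the PREPARE/EXECUTE route through plain execute() calls, reusing the connection, cursor, and trades table from the question (the statement name ins_trade is made up):

import datetime as dt

# Prepared once on the server for this connection
cursor.execute("""
    PREPARE ins_trade AS
    INSERT INTO trades VALUES ($1, $2, $3, $4);
""")

# Each iteration now sends only the short EXECUTE text
# (psycopg2 still interpolates the values client-side before sending it)
for x in range(10):
    cursor.execute("EXECUTE ins_trade (%s, %s, %s, %s)",
                   (dt.datetime.utcnow(), dt.datetime.now().date(), "python example", x))

cursor.execute("DEALLOCATE ins_trade")
connection.commit()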
Why not?:
date = dt.datetime.now().date()
insert_sql = """INSERT INTO trades
                VALUES (%s, %s, %s, %s)"""

# insert 10 records
for x in range(10):
    now = dt.datetime.utcnow()
    cursor.execute(insert_sql, (now, date, "python example", x))

# commit records
connection.commit()
It works out to the same thing. The query string is built once and then run multiple times with a different parameter for x. As @Ture Pålsson pointed out, you can also combine the INSERTs using the fast execution helpers.
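For example, a minimal sketch using psycopg2.extras.execute_values, reusing the cursor, connection, and trades table from the snippets above; nothing is prepared server-side here, but the ten INSERTs collapse into one round trip:

import datetime as dt
from psycopg2.extras import execute_values

date = dt.datetime.now().date()
rows = [(dt.datetime.utcnow(), date, "python example", x) for x in range(10)]

# One multi-row INSERT instead of ten single-row statements
execute_values(cursor, "INSERT INTO trades VALUES %s", rows)
connection.commit()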
Related
Are prepared statements written differently in psycopg3 than psycopg2?
I am upgrading from psycopg2 to psycopg3, and the following works in psycopg2 but doesn't work in psycopg3.
The variables name and age will be populated with data, and sql is a variable that is connected to the database we use, so my problem is within the query itself and not the Python syntax:
def test(name, age=None, alive=False):
    sql.execute("""prepare test as UPDATE table AS x
                   SET var = $1
                   FROM table1 as y
                   WHERE y.name = x.name and y.age = $2""")
    sql.execute("""execute test (%s, %s)""", (name, age))
This throws an error saying could not determine the datatype of parameter $1. Looking at the docs and GitHub of psycopg, there's no info regarding this matter.
I am inserting thousands of rows; timing and speed are very important. I have found through benchmarking that Postgres can ingest my rows faster using execute() instead of executemany().
This works well for me:
...
def insert(self, table, columns, values):
    conn = self.connectionPool.getconn()
    conn.autocommit = True
    try:
        with conn.cursor() as cursor:
            query = (
                f'INSERT INTO {table} ({columns}) '
                f'VALUES {values} '
                f'ON CONFLICT DO NOTHING;'
            ).replace('[', '').replace(']', '')  # Notice the replace x2 to get rid of the list brackets
            print(query)
            cursor.execute(query)
    finally:
        cursor.close()
        self.connectionPool.putconn(conn)
...
self.insert('types', 'name, created_at', rows)
After the double replace, printing query returns something like this and the rows are ingested:
INSERT INTO types (name, created_at) VALUES ('TIMER', '2022-04-09 03:19:49'), ('Sequence1', '2022-04-09 03:19:49') ON CONFLICT DO NOTHING;
Is my approach secure? Is there a more pythonic implementation using execute?
No, this isn’t secure or even reliable – Python repr isn’t compatible with PostgreSQL string syntax (try some strings with single quotes, newlines, or backslashes).
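To see why, here is what the bracket-stripping f-string from the question produces for a value containing a single quote (a made-up value, purely to illustrate):

rows = [("O'Brien", '2022-04-09 03:19:49')]
query = (f'INSERT INTO types (name, created_at) '
         f'VALUES {rows} '
         f'ON CONFLICT DO NOTHING;').replace('[', '').replace(']', '')
print(query)
# INSERT INTO types (name, created_at) VALUES ("O'Brien", '2022-04-09 03:19:49') ON CONFLICT DO NOTHING;
# repr() picked double quotes for O'Brien, which PostgreSQL parses as an identifier, so the INSERT fails.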
Consider passing array parameters instead and using UNNEST:
cursor.execute(
    "INSERT INTO types (name, created_at)"
    " SELECT name, created_at FROM UNNEST (%(names)s, %(created_ats)s) AS t (name, created_at)",
    {
        'names': ['TIMER', 'Sequence1', ...],
        'created_ats': ['2022-04-09 03:19:49', ...],
    })
This is the best solution, as the query doesn’t depend on the parameters (can be prepared and cached, statistics can be easily grouped, makes the absence of SQL injection vulnerability obvious, can easily log queries without data).
Failing that, build a query that’s only dynamic in the number of parameters, like VALUES ((%s, %s, ...), (%s, %s, ...), ...). Note that PostgreSQL has a parameter limit, so you might need to produce these in batches.
Failing that, use psycopg2.sql.Literal.
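A minimal sketch of that last option, assuming the types table and an existing cursor from the question; every value is wrapped in psycopg2.sql.Literal, so quoting is handled by the library rather than by repr():

from psycopg2 import sql

rows = [('TIMER', '2022-04-09 03:19:49'),
        ('Sequence1', '2022-04-09 03:19:49')]

# Identifiers and literals are escaped by psycopg2.sql when the query is composed
query = sql.SQL("INSERT INTO {} (name, created_at) VALUES {} ON CONFLICT DO NOTHING").format(
    sql.Identifier('types'),
    sql.SQL(', ').join(
        sql.SQL('({})').format(sql.SQL(', ').join(sql.Literal(v) for v in row))
        for row in rows
    ),
)
cursor.execute(query)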
My MySQL table schema is:
CREATE DATABASE test_db;
USE test_db;
CREATE TABLE test_table (
    id INT AUTO_INCREMENT,
    last_modified DATETIME NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;
When I run the following benchmark script, I get:
b1: 20.5559301376
b2: 0.504406929016
from timeit import timeit
import MySQLdb

ids = range(1000)
query_1 = "update test_table set last_modified=UTC_TIMESTAMP() where id=%(id)s"
query_2 = "update test_table set last_modified=UTC_TIMESTAMP() where id in (%s)" % ", ".join(('%s', ) * len(ids))

db = MySQLdb.connect(host="localhost", user="some_user", passwd="some_pwd", db="test_db")

def b1():
    curs = db.cursor()
    curs.executemany(query_1, [{"id": i} for i in ids])

def b2():
    curs = db.cursor()
    curs.execute(query_2, ids)

print "b1: %s" % str(timeit(lambda: b1(), number=30))
print "b2: %s" % str(timeit(lambda: b2(), number=30))

db.close()
Why is there such a large difference between executemany and the IN clause?
I'm using Python 2.6.6 and MySQL-python 1.2.3.
The only relevant question I could find was - Why is executemany slow in Python MySQLdb?, but it isn't really what I'm after.
executemany repeatedly goes back and forth to the MySQL server, which then needs to parse the query, perform it, and return results. This is perhaps 10 times as slow as doing everything in a single SQL statement, even if it is more complex.
However, for INSERT, the driver documentation says that executemany will do the smart thing and construct a single multi-row INSERT for you, thereby being efficient.
Hence, IN(1,2,3,...) is much more efficient than UPDATE;UPDATE;UPDATE...
If you have a sequence of ids, then even better would be to say WHERE id BETWEEN 1 and 1000. This is because it can simply scan the rows rather than looking up each one from scratch. (I am assuming id is indexed, probably as the PRIMARY KEY.)
Also, you are probably running with the settings that make each insert/update/delete into its own "transaction". This adds a lot of overhead to each UPDATE. And it is probably not desirable in this case. I suspect you want the entire 1000-row update to be atomic.
Bottom line: Use executemany only for (a) INSERTs or (b) statements that must be run individually.
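A short sketch of those two suggestions combined (one statement, one transaction), assuming MySQLdb and the db connection from the question's script:

curs = db.cursor()
curs.execute(
    "UPDATE test_table SET last_modified = UTC_TIMESTAMP() "
    "WHERE id BETWEEN %s AND %s",
    (1, 1000))
db.commit()  # a single commit, so the whole 1000-row update is atomic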
I've read the advice here about using parametrized execute call to do all the SQL escaping for you, but this seems to work only when you know the number of columns in advance.
I'm looping over CSV files, one for each table, and populating a local DB for testing purposes. Each table has different numbers of columns, so I can't simply use:
sql = "INSERT INTO TABLE_A VALUES (%s, %s)"
cursor.execute(sql, (val1, val2))
I can build up an sql statement as a string quite flexibly, but this doesn't give me the use of cursor.execute's SQL-escaping facilities, so if the input contains apostrophes or similar, it fails.
It seems like there should be a simple way to do this. Is there?
If you know the number of parameters, you can create a list of them:
count = ...
sql = "INSERT INTO ... VALUES(" + ",".join(count * ["?"]) + ")"
params = []
for i in ...:
    params += ['whatever']
cursor.execute(sql, params)
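Applied to the CSV scenario from the question, a hedged sketch (it assumes sqlite3 as the local test database, a made-up file name, and that the CSV header row holds trusted column names matching the target table):

import csv
import sqlite3

conn = sqlite3.connect("test.db")
cursor = conn.cursor()

with open("table_a.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)  # column names from the first CSV row
    placeholders = ",".join(["?"] * len(header))
    sql = "INSERT INTO TABLE_A ({}) VALUES ({})".format(",".join(header), placeholders)
    cursor.executemany(sql, reader)  # each remaining CSV row becomes one parameter tuple

conn.commit()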
I've been trying to insert a large string into a MySQL database using Python's mysql.connector. The problem I'm having is that long strings are getting cut off at some point when using prepared statements. I'm currently using the MySQL Connector/Python that is available on MySQL.com. I used the following code to duplicate the problem I'm having.
db = mysql.connector.connect(**creditials)
cursor = db.cursor()

value = []
for x in range(0, 2000):
    value.append(str(x+1))
value = " ".join(value)

cursor.execute("""
    CREATE TABLE IF NOT EXISTS test (
        pid VARCHAR(50),
        name VARCHAR(120),
        data LONGTEXT,
        PRIMARY KEY(pid)
    )
    """)
db.commit()

#this works as expected
print("Test 1")
cursor.execute("REPLACE INTO test (pid, name, data) VALUES ('try 1', 'Description', '{0}')".format(value))
db.commit()
cursor.close()

#this does not work
print("Test 2")
cursor = db.cursor(prepared=True)
cursor.execute("""REPLACE INTO test (pid, name, data) VALUE (?, ?, ?)""", ('try 2', 'Description2', value))
db.commit()
cursor.close()
Test 1 works as expected and stores all the numbers up to 2000, but Test 2 gets cut off right after number 65. I would rather use prepared statements than try to sanitize incoming strings myself. Any help is appreciated.
Extra information:
Computer: Windows 7 64 bit
Python: Tried on both python 3.4 and 3.3
MYSQL: 5.6.17 (Came with WAMP)
Library: MySQL Connector/Python
When the MySQL Connector driver processes prepared statements, it uses a lower-level binary protocol to communicate values to the server individually. As such, it tells the server whether each value is an INT, a VARCHAR, TEXT, etc. It's not particularly smart about it, and this "behavior" is the result: it sees that the value is a Python string and tells MySQL that it's a VARCHAR value. The VARCHAR type has a length limit that restricts the amount of data sent to the server. What's worse, the interaction between the long value and the limited data type length can yield some strange behavior.
Ultimately, you have a few options:
Use a file-like object for your string
MySQL Connector treats files and file-like objects as BLOBs and TEXTs (depending on whether the file is open in binary or non-binary mode, respectively). You can leverage this to get the behavior you desire.
from io import StringIO  # file-like wrapper, so the driver sends the value as TEXT
...
cursor = db.cursor(prepared=True)
cursor.execute("""REPLACE INTO test (pid, name, data) VALUES (?, ?, ?)""",
               ('try 2', 'Description', StringIO(value)))
cursor.close()
db.commit()
Don't use MySQL Connector prepared statements
If you don't pass the prepared=True argument when creating your cursor, the driver will generate full, valid SQL statements for each execution. You're not really losing much by avoiding MySQL prepared statements in this context. You do need to pass your SQL statements in a slightly different form to get proper placeholder sanitization behavior.
cursor = db.cursor()
cursor.execute("""REPLACE INTO test (pid, name, data) VALUES (%s, %s, %s)""",
               ('try 2', 'Description', value))
cursor.close()
db.commit()
Use another MySQL driver
There are a couple of other Python MySQL drivers:
MySQLdb
oursql