PostGres - Insert list of tuples using execute instead of executemany - python

I am inserting thousands of rows, timing and speed is very important. I have found through benchmarking that postgres can ingest my rows faster using execute() instead of executemany()
This works well for me:
...
def insert(self, table, columns, values):
conn = self.connectionPool.getconn()
conn.autocommit = True
try:
with conn.cursor() as cursor:
query = (
f'INSERT INTO {table} ({columns}) '
f'VALUES {values} '
f'ON CONFLICT DO NOTHING;'
).replace('[', '').replace(']', '') # Notice the replace x2 to get rid of the list brackets
print(query)
cursor.execute(query)
finally:
cursor.close()
self.connectionPool.putconn(conn)
...
self.insert('types', 'name, created_at', rows)
After the double replace, printing query returns something like this and the rows are ingested:
INSERT INTO types (name, created_at) VALUES ('TIMER', '2022-04-09 03:19:49'), ('Sequence1', '2022-04-09 03:19:49') ON CONFLICT DO NOTHING;
Is my approach secure? Is there a more pythonic implementation using execute?

No, this isn’t secure or even reliable – Python repr isn’t compatible with PostgreSQL string syntax (try some strings with single quotes, newlines, or backslashes).
Consider passing array parameters instead and using UNNEST:
cursor.execute(
"INSERT INTO types (name, created_at)"
" SELECT name, created_at FROM UNNEST (%(names)s, %(created_ats)s) AS t",
{
'names': ['TIMER', 'Sequence1', ...],
'created_ats': ['2022-04-09 03:19:49', ...],
})
This is the best solution, as the query doesn’t depend on the parameters (can be prepared and cached, statistics can be easily grouped, makes the absence of SQL injection vulnerability obvious, can easily log queries without data).
Failing that, build a query that’s only dynamic in the number of parameters, like VALUES ((%s, %s, ...), (%s, %s, ...), ...). Note that PostgreSQL has a parameter limit, so you might need to produce these in batches.
Failing that, use psycopg2.sql.Literal.

Related

Inserting arrays into databases

I am trying to write a general function that will insert a line of data into a table in a database but I am trying to write an array of unknown length. I am aiming to just be able to call this function in any programand write a line of data of any length to the table (assuming the table and the array are the same length.
I have tried adding the array like it is a singular peice of data.
import sqlite3
def add2Db(dbName, tableName, data):
connection = sqlite3.connect(dbName)
cur = connection.cursor()
cur.execute("INSERT INTO "+ tableName +" VALUES (?)", (data))
connection.commit()
connection.close()
add2Db("items.db", "allItems", (1, "chair", 5, 4))
This just crashes and gives me an error saying it has 4 columns but only one value was supplied.
SQLite does not support arrays - you have to convert to a TEXT using ','.join() to join your array items into a single string and pass that.
Source: SQLite website
https://www.sqlite.org/datatype3.html
I'm not a Python programmer, but I've been doing SQL a long time. I even wrote my own ORM. My advice is do not write your own SQL query builder. There's a myriad of subtle issues and especially security issues. I elaborate on a few of them below.
Instead, use a well-established SQL Query Builder or ORM. They've already dealt with these issues. Here's an example using SQLAlchemy.
from datetime import date
from sqlalchemy import create_engine, MetaData
# Connect to the database with debugging on.
engine = create_engine('sqlite:///test.sqlite', echo=True)
conn = engine.connect()
# Read the schemas from the database
meta = MetaData()
meta.reflect(bind=engine)
# INSERT INTO users (name, birthday, state, country) VALUES (?, ?, ?, ?)
users = meta.tables['users']
conn.execute(
users.insert().values(name="Yarrow Hock", birthday=date(1977, 1, 23), state="NY", country="US")
)
SQLAlchemy can do the entire range of SQL operations and will work with different SQL variants. You also get type safety.
conn.execute(
users.insert().values(name="Yarrow Hock", birthday="in the past", state="NY", country="US")
)
sqlalchemy.exc.StatementError: (exceptions.TypeError) SQLite Date type only accepts Python date objects as input. [SQL: u'INSERT INTO users (name, birthday, state, country) VALUES (?, ?, ?, ?)']
insert into table values (...) relies on column definition order
This relies on the order columns were defined in the schema. This leaves two problems. First is a readability problem.
add2Db(db, 'some_table', (1, 39, 99, 45, 'papa foxtrot', 0, 42, 0, 6)
What does any of that mean? A reader can't tell. They have to go digging into the schema and count columns to figure out what each value means.
Second is a maintenance problem. If, for any reason, the schema is altered and the column order is not exactly the same, this can lead to some extremely difficult to find bugs. For example...
create table users ( name text, birthday date, state text, country text );
vs
create table users ( name text, birthday date, country text, state text );
add2Db(db, 'users', ('Yarrow Hock', date(1977, 1, 23), 'NY', 'US'));
That insert will silently "work" with either column order.
You can fix this by passing in a dictionary and using the keys for column names.
add2Db(db, 'users', (name="Yarrow Hock", birthday=date(1977, 1, 23), state="NY", country="US"));
Then we'd produce a query like:
insert into users
(name, birthday, state, country)
values (?, ?, ?, ?)
This leads to the next and much bigger problem.
SQL Injection Attack
Now this opens up a new problem. If we simply stick the table and column names into the query that leaves us open to one of the most common security holes, a SQL Injection Attack. That's where someone can craft a value which when naively used in a SQL statement causes the query to do something else. Like Little Bobby Tables.
While the ? protects against SQL Injection for values, it's still possible to inject via the column names. There's no guarantee the column names can be trusted. Maybe they came from the parameters of a web form?
Protecting table and column names is complicated and easy to get wrong.
The more SQL you write the more likely you're vulnerable to an injection attack.
You have to write code for everything else.
Ok, you've done insert. Now update? select? Don't forget about subqueries, group by, unions, joins...
If you want to write a SQL query builder, cool! If, instead, you have a job to do using SQL, writing yet another SQL query builder is not your job.
It's harder for anyone else to understand.
There's a good chance that any given Python programmer knows how SQLAlchemy works, and there's plenty of tutorials and documentation if they don't. There's no chance they know about your home-rolled SQL functions, and you have to write all the tutorials and docs.
You shouldn't try to write your own ORMs without an argumented need. You will have a lot of problems, for example here's quick 25 reasons not to.
Instead use any popular orm that is proven. I recommend using SQLAlchemy as a go to outside of Django. Using it you can map a dict of values to insert it into a model just like insert(schema_name).values(**dict_name) (here's an example of insert/update).
Change your function to this:
def add2Db(dbName, tableName, data):
num_qs = len(data)
qm = ','.join(list('?' * num_qs))
query = """
INSERT INTO {table}
VALUES ({qms})
""".format(table=tableName,
qms=qm)
connection = sqlite3.connect(dbName)
cur = connection.cursor()
cur.execute(query, data)
connection.commit()
connection.close()

SQLITE3 Insert Error

I'm definitely new to SQL but I feel like inserting is pretty simple. I can't figure out the issue
def insert(title, name):
time = datetime.now()
conn = sqlite3.connect('test.db')
c = conn.cursor()
query = """INSERT INTO test ('{}', '{}', '{}')""".format(title, name, time)
c.execute(query)
conn.commit()
When I pass the following:
insert(1, 2)
I get the error:
OperationalError: near "'1'": syntax error
All fields are text if that helps.
Thanks in advance
You have not properly formatted your insert statement.
Right now you're specifying column names in the parens. To specify values, you need to use the VALUES keyword. You don't have to specify column names if your providing values for all columns, but you do need to include VALUES.
Do not use string concatenation to build queries. Instead, use parameterized queries, which allows the database driver to escape any user input that could otherwise lead to an injection attack.
query = 'INSERT INTO test (title, name, time) VALUES (?, ?, ?)'
c.execute(query, (title, name, time))

SQLite execute statement for variable-length rows

I've read the advice here about using parametrized execute call to do all the SQL escaping for you, but this seems to work only when you know the number of columns in advance.
I'm looping over CSV files, one for each table, and populating a local DB for testing purposes. Each table has different numbers of columns, so I can't simply use:
sql = "INSERT INTO TABLE_A VALUES (%s, %s)"
cursor.execute(sql, (val1, val2))
I can build up an sql statement as a string quite flexibly, but this doesn't give me the use of cursor.execute's SQL-escaping facilities, so if the input contains apostrophes or similar, it fails.
It seems like there should be a simple way to do this. Is there?
If you know the number of parameters, you can create a list of them:
count = ...
sql = "INSERT INTO ... VALUES(" + ",".join(count * ["?"]) + ")"
params = []
for i in ...:
params += ['whatever']
cursor.execute(sql, params)

dynamic table mysqldb python string/int issue

I am receiving an error when trying to write data to a database table when using a variable for the table name that I do not get when using a static name. For some reason on the line where I insert, if I insert an integer as the column values the code runs and the table is filled, however, if I try to use a string I get a SQL syntax error
cursor = db.cursor()
cursor.execute('DROP TABLE IF EXISTS %s' %data[1])
sql ="""CREATE TABLE %s (IP TEXT, AVAILIBILITY INT)""" %data[1]
cursor.execute(sql)
for key in data[0]:
cur_ip = key.split(".")[3]
cursor.execute("""INSERT INTO %s VALUES (%s,%s)""" %(data[1],key,data[0][key]))
db.commit()
the problem is where I have %(data[1], key, data[0][key]) any ideas?
It's a little hard to analyse your problem when you don't post the actual error, and since we have to guess what your data actually is. But some general points as advise:
Using a dynamic table name is often not way DB-systems want to be used. Try thinking if the problem could be used by using a static table name and adding an additional key column to your table. Into that field you can put what you did now as a dynamic table name. This way the DB might be able to better optimize your queries, and your queries are less likely to get errors (no need to create extra tables on the fly for once, which is not a cheap thing to do. Also you would not have a need for dynamic DROP TABLE queries, which could be a security risk.
So my advice to solve your problem would be to actually work around it by trying to get rid of dynamic table names altogether.
Another problem you have is that you are using python string formatting and not parameters to the query itself. That is a security problem in itself (SQL-Injections), but also is the problem of your syntax error. When you use numbers, your expression evaluates to
INSERT INTO table_name VALUES (100, 200)
Which is valid SQL. But with strings you get
INSERT INTO table_name VALUES (Some Text, some more text)
which is not valid (since you have no quotes ' around the strings.
To get rid of your syntax problem and of the sql-injection-problem, don't add the values to the string, pass them as a list to execute():
cursor.execute("INSERT INTO table_name VALUES (%s,%s)", (key, data[0][key]))
If you must have a dynamic table name, put that in your query string first (e.g. with % formatting), and give the actual values for your query as parameters as above (since I cannot imagine that execute will accept the table name as a parameter).
To put it in some simple sample code. Right now you are trying to do it like this:
# don't do this, this won't even work!
table_name = 'some_table'
user_name = 'Peter Smith'
user_age = 47
query = "INSERT INTO %s VALUES (%s, %s)" % (table_name, user_name, user_age)
cursor.execute(query)
That creates query
INSERT INTO some_table VALUES (Peter Smith, 100)
Which cannot work, because of the unquoted string. So you needed to do:
# DON'T DO THIS, it's bad!
query = "INSERT INTO %s VALUES ('%s', %s)" % (table_name, user_name, user_age)
That's not a good idea, because you need to know where to put quotes and where not (which you will mess up at some point). Even worse, imagine a user named named Connor O'Neal. You would get a syntax error:
INSERT INTO some_table VALUES ('Connor O'Neal', 100)
(This is also the way sql-injections are used to crush your system / steal your data). So you would also need to take care of escaping the values that are strings. Getting more complicated.
Leave those problems to python and mysql, by passing the date (not the table name) as arguments to execute!
table_name = 'some_table'
user_name = 'Peter Smith'
user_age = 47
query = "INSERT INTO " + table_name + " VALUES (%s, %s)"
cursor.execute(query, (user_name, user_age))
This way you can even pass datetime objects directly. There are other ways to put the data than using %s, take a look at this examples http://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-execute.html (that is python3 used there, I don't know which you use - but except of the print statements it should work with python2 as well, I think).

Python mysql connector prepared INSERT statement cutting off text

I've been trying to insert a large string into an MySQL database using pythons mysql.connector. The problem I'm having is that long strings are getting cut off at some point when using prepared statements. I'm currently using MySQL Connector/Python that is available on MySQL.com. I used the following code do duplicate the problem I'm having.
db = mysql.connector.connect(**creditials)
cursor = db.cursor()
value = []
for x in range(0, 2000):
value.append(str(x+1))
value = " ".join(value)
cursor.execute("""
CREATE TABLE IF NOT EXISTS test (
pid VARCHAR(50),
name VARCHAR(120),
data LONGTEXT,
PRIMARY KEY(pid)
)
""")
db.commit()
#this works as expected
print("Test 1")
cursor.execute("REPLACE INTO test (pid, name, data) VALUES ('try 1', 'Description', '{0}')".format(value))
db.commit()
cursor.close()
#this does not work
print("Test 2")
cursor = db.cursor(prepared=True)
cursor.execute("""REPLACE INTO test (pid, name, data) VALUE (?, ?, ?)""", ('try 2', 'Description2', value))
db.commit()
cursor.close()
Test 1 works as expected and stores all the numbers up to 2000, but test 2 get cut off right after number 65. I would rather use prepared statements than trying to sanitize incoming strings myself. Any help appreciated.
Extra information:
Computer: Windows 7 64 bit
Python: Tried on both python 3.4 and 3.3
MYSQL: 5.6.17 (Came with WAMP)
Library: MySQL Connector/Python
When MySQL Connector driver processes prepared statements, it's using a lower-level binary protocol to communicate values to the server individually. As such, it's telling the server whether the values are INTs or VARCHARs or TEXT, etc. It's not particularly smart about it, and this "behavior" is the result. In this case, it sees that the value is a Python string value and tells MySQL that it's a VARCHAR value. The VARCHAR value has a string length limit that affects the amount of data be sent to the server. What's worse, the interaction between the long value and the limited data type length can yield some strange behavior.
Ultimately, you have a few options:
Use a file-link object for your string
MySQL Connector treats files and file-like objects as BLOBs and TEXTs (depending on whether the file is open in binary or non-binary mode, respectively). You can leverage this to get the behavior you desire.
import StringIO
...
cursor = db.cursor(prepared=True)
cursor.execute("""REPLACE INTO test (pid, name, data) VALUES (?, ?, ?)""",
('try 2', 'Description', StringIO.String(value)))
cursor.close()
db.commit()
Don't use MySQL Connector prepared statements
If you don't use the prepared=True clause to your cursor creation statement, it will generate full valid SQL statements for each execution. You're not really losing too much by avoiding MySQL prepared statements in this context. You do need to pass your SQL statements in a slightly different form to get proper placeholder sanitization behavior.
cursor = db.cursor()
cursor.execute("""REPLACE INTO test (pid, name, data) VALUES (%s, %s, %s)""",
('try 2', 'Description', value))
cursor.close()
db.commit()
Use another MySQL driver
There are a couple different Python MySQL drivers:
MySQLdb
oursql

Categories