How can I batch together insert statements using Python and MySQL

I have a MySQL database, and I am using Python with the mysql.connector library.
My problem is that I have a list of 21 True or False values which I want to insert into a table. At first I looped over the list and wrote one row at a time, which was very slow: each operation took about 0.3 s.
My question, specifically, is: how can I use the library to perform batch insert statements?
class DatabaseOperations:
    def __init__(self, *args):  # Establish the db connection
        for i in range(len(args)):  # Cycle through passed arguments and assign them to variables
            if i == 0:  # Assign the first argument as db_connection_host
                self.host = args[0]
            elif i == 1:  # Assign the second argument as db_connection_user
                self.user = args[1]
            elif i == 2:  # Assign the third argument as db_connection_password
                self.password = args[2]
            elif i == 3:  # Assign the fourth argument as db_connection_database
                self.database = args[3]
            elif i == 4:  # Assign the fifth argument as localhost_ip
                self.localhost = args[4]
        self.connection_check = True
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.socket.connect((self.localhost, gvl.global_port_database))
        self.db_connection = mysql.connector.pooling.MySQLConnectionPool(
            pool_name="db_pool",
            pool_size=5,
            pool_reset_session=True,
            host=self.host,
            user=self.user,
            password=self.password,
            database=self.database)
My first attempt at making things faster was to use a connection pool instead of just a single cursor. My confusion lies in how to actually perform the batch operations. My idea is to create a list of 'write_operations' and pass it along to something like this:
def batch_write_table(self, list_of_writes):
    try:
        db_connection = self.db_connection.get_connection()
        db_cursor = db_connection.cursor()
        db_cursor.execute(
            f"INSERT INTO table ("
            f"localhost_ip, "
            f"plc_ip, "
            f"address_name, "
            f"address_number, "
            f"address_value, "
            f"timestamp) "
            f"VALUES("
            f"'{localhost_ip}', "
            f"'{plc_ip}', "
            f"'{modbus_name}', "
            f"'{modbus_number}', "
            f"'{modbus_value}', "
            f"'{timestamp}');"
        )
    except Exception as e:
        print(f'Error => {e}')
I do not understand how to create singular write-operation objects, or whether I am simply supposed to create a list of values and pass it to some batch_write function. Help and clarification are greatly appreciated!
EDIT:
Added how I'm writing from the main file
for i in range(len(_list)):
    print('\ni -> ', i)
    print('_list[i] -> ', _list[i])
    match i:
        case 0:
            print(f'case 0 timestamp -> {timestamp_generator()}')
            write_table(
                data
            )
Essentially there are 21 possible cases, as the list is 21 items long.

MySQL Connector supports batch inserts through the cursor's executemany method. The documentation states that this is a genuine batch insert, that is, it generates a single insert statement with multiple values clauses, like this:
INSERT INTO tbl (col1, col2) VALUES (1, 2), (3, 4), (5, 6)
as opposed to generating multiple single insert statements.
Your code ought to look like this:
# Collect the values for each insert in a list of tuples, where each
# tuple contains the values for a single row
list_of_arg_tuples = [
    (localhost_ip1, plc_ip1, modbus_name1, modbus_number1, modbus_value1, timestamp1),
    (localhost_ip2, plc_ip2, modbus_name2, modbus_number2, modbus_value2, timestamp2),
    ...
]
# Use DB-API parameter substitution to ensure that values are correctly quoted.
# Don't use f-strings or similar for this, see
# https://stackoverflow.com/questions/902408/how-to-use-variables-in-sql-statement-in-python
stmt = """INSERT INTO table (localhost_ip, plc_ip, address_name, address_number, address_value, timestamp) VALUES (%s, %s, %s, %s, %s, %s)"""
db_cursor.executemany(stmt, list_of_arg_tuples)
# Commit after execution. For large numbers of inserts, insert and commit in
# batches (say 10000 per batch, but you should measure performance and adjust).
db_connection.commit()
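Putting this together with your connection pool, a minimal sketch of batch_write_table might look like the following (the table and column names are taken from your snippet; the error handling is illustrative, not prescriptive):
def batch_write_table(self, list_of_writes):
    # list_of_writes: a list of 6-tuples, one per row, e.g.
    # (localhost_ip, plc_ip, address_name, address_number, address_value, timestamp)
    stmt = ("INSERT INTO table "
            "(localhost_ip, plc_ip, address_name, address_number, address_value, timestamp) "
            "VALUES (%s, %s, %s, %s, %s, %s)")
    db_connection = self.db_connection.get_connection()
    try:
        db_cursor = db_connection.cursor()
        db_cursor.executemany(stmt, list_of_writes)  # one round trip for all rows
        db_connection.commit()
    except Exception as e:
        db_connection.rollback()
        print(f'Error => {e}')
    finally:
        db_connection.close()  # returns the connection to the pool
Then, in your main loop, instead of calling write_table once per case, append each row's tuple to a list and make a single batch_write_table(rows) call after the loop.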

Related

Database in python - index issue

for page in range(1, pages + 1):
    ...

def append_organizator(organizator, organizatorzy=[]):
    organizatorzy.append(organizator)
    for i in organizatorzy:
        try:
            query = "INSERT INTO stypendia (organizator) values(%s)"
            values = []
            values.append(organizatorzy.pop())
            cursor.execute(query, values)
            conn.commit()
        except:
            pass

def append_type(rodzaj, rodzaje=[]):
    rodzaje.append(rodzaj)
    for i in rodzaje:
        try:
            query = "INSERT INTO stypendia (rodzaj) values(%s)"
            values = []
            values.append(rodzaje.pop())
            cursor.execute(query, values)
            conn.commit()
        except:
            pass
Those are two functions that insert the data scraped from a website into the database.
The program iterates through all available pages on the site, and the scraped data is inserted into the database.
As you can see on the screenshot, the title is inserted 7 times (the number of pages), then the organizator again 7 times, etc.
How can I solve this problem and have everything at the same indexes?
You need to combine the insert operations - each insert creates a new row. You should also just pass the parameters directly, without the array; it really isn't needed.
This example only handles two parameters (the same as your code above). Add additional parameters as needed and adjust the insert statement.
def append(organizator: str, rodzaj: str):
    try:
        query = "INSERT INTO stypendia (organizator, rodzaj) values(%s, %s)"
        values = (organizator, rodzaj)
        cursor.execute(query, values)
        conn.commit()
    except:
        pass

# The organization of this loop assumes the order of returned data is
# consistent: each "rodzaj" is at the same index as its "organizator"
# (as the original code assumes)
organizator = doc.find_all(class_='organizator-title')
rodzaj = doc.find_all('div', class_='fleft', string="Rodzaj:")
for i in range(min(len(organizator), len(rodzaj))):
    o = organizator[i].text.strip().replace('\n', '').replace('\r', '')
    r = rodzaj[i].find_next().text.strip().replace('\n', '').replace('\r', '')
    append(o, r)

python-mysql-connector: I need to speed up the time it takes to update multiple items in mySQL table

I currently have a list of IDs, approximately 10,000 of them. I need to update all rows in the MySQL table whose id is in the inactive_ids list that you see below, changing their active status to 'No', which is a column in the table.
I am using the mysql.connector Python library.
When I run the code below, each iteration of the for loop takes about 0.7 seconds to execute. That's about a 2 hour run time for all 10,000 IDs to be changed. Is there a more optimal/quicker way to do this?
# inactive_ids are unique strings, something like shown below
# inactive_ids = ['a9okeoko', 'sdfhreaa', 'xsdfasy', ..., 'asdfad']

# initialize connection
mydb = mysql.connector.connect(
    user="REMOVED",
    password="REMOVED",
    host="REMOVED",
    database="REMOVED"
)

# initialize cursor
mycursor = mydb.cursor(buffered=True)

# Function to execute multiple lines
def alter(state, msg, count):
    result = mycursor.execute(state, multi=True)
    result.send(None)
    print(str(count), ': ', msg, result)
    count += 1
    return count

# Try to execute, throw exception if it fails
try:
    count = 0
    for Id in inactive_ids:
        # SAVE THE QUERY AS STRING
        sql_update = "UPDATE test_table SET Active = 'No' WHERE NoticeId = '" + Id + "'"
        # ALTER
        count = alter(sql_update, "done", count)
    # commits all changes to the database
    mydb.commit()
except Exception as e:
    mydb.rollback()
    raise e
Do it with a single query that uses IN (...) instead of multiple queries.
placeholders = ','.join(['%s'] * len(inactive_ids))
sql_update = f"""
UPDATE test_table
SET Active = 'No'
WHERE NoticeId IN ({placeholders})
"""
mycursor.execute(sql_update, inactive_ids)
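If the list ever grows much larger, one common refinement (a sketch; the chunk size is arbitrary and should be measured and tuned) is to chunk the IDs and commit once at the end:
CHUNK_SIZE = 1000  # arbitrary; measure and adjust
for start in range(0, len(inactive_ids), CHUNK_SIZE):
    chunk = inactive_ids[start:start + CHUNK_SIZE]
    placeholders = ','.join(['%s'] * len(chunk))
    mycursor.execute(
        f"UPDATE test_table SET Active = 'No' WHERE NoticeId IN ({placeholders})",
        chunk
    )
mydb.commit()  # persist all chunks in one transaction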

Bulk update of rows in Postgres DB using psycopg2

We need to do bulk updates of many rows in our Postgres DB, and want to use the SQL syntax below. How do we do that using psycopg2?
UPDATE table_to_be_updated
SET msg = update_payload.msg
FROM (VALUES %(update_payload)s) AS update_payload(id, msg)
WHERE table_to_be_updated.id = update_payload.id
RETURNING *
Attempt 1 - Passing values
We need to pass a nested iterable format to the psycopg2 query. For the update_payload, I've tried passing a list of lists, list of tuples, and tuples of tuples. It all fails with various errors.
Attempt 2 - Writing custom class with __conform__
I've tried to write a custom class that we can use for these operations, which would return
(VALUES (row1_col1, row1_col2), (row2_col1, row2_col2), (...))
I've coded it up like this following the instructions here, but it's clear that I'm doing something wrong. For instance, with this approach I'll have to handle the quoting of all values inside the table myself, which would be cumbersome and error-prone.
class ValuesTable(list):
    def __init__(self, *args, **kwargs):
        super(ValuesTable, self).__init__(*args, **kwargs)

    def __repr__(self):
        data_in_sql = ""
        for row in self:
            str_values = ", ".join([str(value) for value in row])
            data_in_sql += "({})".format(str_values)
        return "(VALUES {})".format(data_in_sql)

    def __conform__(self, proto):
        return self.__repr__()

    def getquoted(self):
        return self.__repr__()

    def __str__(self):
        return self.__repr__()
EDIT: If doing a bulk update can be done in a faster/cleaner way using another syntax than the one in my original question, then I'm all ears!
Requirements:
Postgres table, consisting of the fields id and msg (and potentially other fields)
Python data containing new values for msg
Postgres table should be updated via psycopg2
Example Table
CREATE TABLE einstein(
    id CHAR(5) PRIMARY KEY,
    msg VARCHAR(1024) NOT NULL
);
Test data
INSERT INTO einstein VALUES ('a', 'empty');
INSERT INTO einstein VALUES ('b', 'empty');
INSERT INTO einstein VALUES ('c', 'empty');
Python Program
Hypothetical, self-contained example program with quotations of a famous physicist.
import sys
import psycopg2
from psycopg2.extras import execute_values

def print_table(con):
    cur = con.cursor()
    cur.execute("SELECT * FROM einstein")
    rows = cur.fetchall()
    for row in rows:
        print(f"{row[0]} {row[1]}")

def update(con, einstein_quotes):
    cur = con.cursor()
    execute_values(cur, """UPDATE einstein
                           SET msg = update_payload.msg
                           FROM (VALUES %s) AS update_payload (id, msg)
                           WHERE einstein.id = update_payload.id""", einstein_quotes)
    con.commit()

def main():
    con = None
    einstein_quotes = [("a", "Few are those who see with their own eyes and feel with their own hearts."),
                       ("b", "I have no special talent. I am only passionately curious."),
                       ("c", "Life is like riding a bicycle. To keep your balance you must keep moving.")]
    try:
        con = psycopg2.connect("dbname='stephan' user='stephan' host='localhost' password=''")
        print_table(con)
        update(con, einstein_quotes)
        print("rows updated:")
        print_table(con)
    except psycopg2.DatabaseError as e:
        print(f'Error {e}')
        sys.exit(1)
    finally:
        if con:
            con.close()

if __name__ == '__main__':
    main()
Prepared Statements Alternative
import sys
import psycopg2
from psycopg2.extras import execute_batch

def print_table(con):
    cur = con.cursor()
    cur.execute("SELECT * FROM einstein")
    rows = cur.fetchall()
    for row in rows:
        print(f"{row[0]} {row[1]}")

def update(con, einstein_quotes, page_size):
    cur = con.cursor()
    cur.execute("PREPARE updateStmt AS UPDATE einstein SET msg=$1 WHERE id=$2")
    execute_batch(cur, "EXECUTE updateStmt (%(msg)s, %(id)s)", einstein_quotes, page_size=page_size)
    cur.execute("DEALLOCATE updateStmt")
    con.commit()

def main():
    con = None
    einstein_quotes = ({"id": "a", "msg": "Few are those who see with their own eyes and feel with their own hearts."},
                       {"id": "b", "msg": "I have no special talent. I am only passionately curious."},
                       {"id": "c", "msg": "Life is like riding a bicycle. To keep your balance you must keep moving."})
    try:
        con = psycopg2.connect("dbname='stephan' user='stephan' host='localhost' password=''")
        print_table(con)
        update(con, einstein_quotes, 100)  # choose some meaningful page_size here
        print("rows updated:")
        print_table(con)
    except psycopg2.DatabaseError as e:
        print(f'Error {e}')
        sys.exit(1)
    finally:
        if con:
            con.close()

if __name__ == '__main__':
    main()
Output
The above program would output the following to the debug console:
a empty
b empty
c empty
rows updated:
a Few are those who see with their own eyes and feel with their own hearts.
b I have no special talent. I am only passionately curious.
c Life is like riding a bicycle. To keep your balance you must keep moving.
Short answer: use execute_values(curs, sql, args) - see the docs.
For those looking for a short, straightforward answer, here is sample code to update users in bulk:
from psycopg2.extras import execute_values

sql = """
    update users u
    set
        name = t.name,
        phone_number = t.phone_number
    from (values %s) as t(id, name, phone_number)
    where u.id = t.id;
"""
rows_to_update = [
    (2, "New name 1", '+923002954332'),
    (5, "New name 2", '+923002954332'),
]
curs = conn.cursor()  # Assuming you already got the connection object
execute_values(curs, sql, rows_to_update)
conn.commit()  # persist the updates
If you're using a uuid for the primary key and haven't registered the uuid data type in psycopg2 (keeping the uuid as a Python string), you can always use the condition u.id = t.id::uuid.
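Alternatively, psycopg2 ships an adapter you can register once so that Python uuid.UUID values round-trip without the cast (a short sketch):
import uuid
from psycopg2.extras import register_uuid

register_uuid()  # adapt uuid.UUID to/from the Postgres uuid type

# After this, uuid.UUID values can be passed directly in rows_to_update,
# and the plain condition u.id = t.id works without an explicit ::uuid cast.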

Python: update/delete before insert

Every time I execute my Python (SQL) code, it just keeps adding more and more rows to my table; it just keeps growing. Somehow I need to UPDATE or DELETE before INSERT'ing data into the table, but I don't know how.
Here is my code:
import MySQLdb
from calculationmethod import Method

class dbcclib(Method):
    def __str__(self):
        """Return a string representation of the object."""
        return "Density matrix of %s" % (self.data)

    def __repr__(self):
        """Return a representation of the object."""
        return 'Density matrix("%s")' % (self.data)

    def push(self):
        # Open database connection
        dbhost1 = raw_input("Enter database host: ")
        dbport = int(raw_input("Enter database port: "))
        dbuser = raw_input("Enter database user: ")
        dbpass = raw_input("Enter database password: ")
        dbname = raw_input("Enter database: ")
        db = MySQLdb.connect(host=dbhost1, port=dbport, user=dbuser, passwd=dbpass, db=dbname)
        # prepare a cursor object using cursor() method
        cur = db.cursor()
        # Insert the coordinate rows
        cur.executemany("""
            INSERT INTO
                cord
                (x, y, z)
            VALUES
                (%s, %s, %s)
            """, self.data.vibstate)
        db.commit()
        # disconnect from server
        db.close()
        print "Baigta"  # "Finished" in Lithuanian
Let me now define my "mess". In this example I have a 2D array, say:
a = [[1,2,3],[3,2,1]]
and I then INSERT it a couple of times into my table, which ends up looking like this:
columns:  x  y  z
          1  2  3
          3  2  1
          1  2  3
          3  2  1
It duplicates every time: every execution adds more and more rows. So I need to get rid of that duplication.
If you need to get rid of duplicate data, you should use a PK and UPSERT (insert if not exists, else update). If you need a clean db each time, just run a TRUNCATE command before the insert.
In MySQL, use the following structure:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
If you don't want to modify the existing row and just need to get rid of duplicates, use INSERT IGNORE instead:
INSERT IGNORE INTO table (a,b,c) VALUES (1,2,3);
Remember to create a unique constraint over all 3 fields:
ALTER TABLE table ADD UNIQUE INDEX unq (a, b, c)
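Applied to the push() method above, a minimal sketch of the INSERT IGNORE variant (assuming the unique index over (x, y, z) has been created) would be:
# duplicate (x, y, z) rows are silently skipped thanks to the unique index
cur.executemany("""
    INSERT IGNORE INTO
        cord
        (x, y, z)
    VALUES
        (%s, %s, %s)
    """, self.data.vibstate)
db.commit()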

Inserting JSON into MySQL using Python

I have a JSON object in Python. I am using the Python DB-API and SimpleJson. I am trying to insert the JSON into a MySQL table.
At the moment I am getting errors, and I believe it is due to the single quotes ' ' in the JSON objects.
How can I insert my JSON object into MySQL using Python?
Here is the error message I get:
error: uncaptured python exception, closing channel
<twitstream.twitasync.TwitterStreamPOST connected at
0x7ff68f91d7e8> (<class '_mysql_exceptions.ProgrammingError'>:
(1064, "You have an error in your SQL syntax; check the
manual that corresponds to your MySQL server version for
the right syntax to use near ''favorited': '0',
'in_reply_to_user_id': '52063869', 'contributors':
'NULL', 'tr' at line 1")
[/usr/lib/python2.5/asyncore.py|read|68]
[/usr/lib/python2.5/asyncore.py|handle_read_event|390]
[/usr/lib/python2.5/asynchat.py|handle_read|137]
[/usr/lib/python2.5/site-packages/twitstream-0.1-py2.5.egg/
twitstream/twitasync.py|found_terminator|55] [twitter.py|callback|26]
[build/bdist.linux-x86_64/egg/MySQLdb/cursors.py|execute|166]
[build/bdist.linux-x86_64/egg/MySQLdb/connections.py|defaulterrorhandler|35])
Another error for reference
error: uncaptured python exception, closing channel
<twitstream.twitasync.TwitterStreamPOST connected at
0x7feb9d52b7e8> (<class '_mysql_exceptions.ProgrammingError'>:
(1064, "You have an error in your SQL syntax; check the manual
that corresponds to your MySQL server version for the right
syntax to use near 'RT #tweetmeme The Best BlackBerry Pearl
Cell Phone Covers http://bit.ly/9WtwUO''' at line 1")
[/usr/lib/python2.5/asyncore.py|read|68]
[/usr/lib/python2.5/asyncore.py|handle_read_event|390]
[/usr/lib/python2.5/asynchat.py|handle_read|137]
[/usr/lib/python2.5/site-packages/twitstream-0.1-
py2.5.egg/twitstream/twitasync.py|found_terminator|55]
[twitter.py|callback|28] [build/bdist.linux-
x86_64/egg/MySQLdb/cursors.py|execute|166] [build/bdist.linux-
x86_64/egg/MySQLdb/connections.py|defaulterrorhandler|35])
Here is a link to the code that I am using http://pastebin.com/q5QSfYLa
#!/usr/bin/env python

try:
    import json as simplejson
except ImportError:
    import simplejson
import twitstream
import MySQLdb

USER = ''
PASS = ''
USAGE = """%prog"""

conn = MySQLdb.connect(host = "",
                       user = "",
                       passwd = "",
                       db = "")

# Define a function/callable to be called on every status:
def callback(status):
    twitdb = conn.cursor ()
    twitdb.execute ("INSERT INTO tweets_unprocessed (text, created_at, twitter_id, user_id, user_screen_name, json) VALUES (%s,%s,%s,%s,%s,%s)",(status.get('text'), status.get('created_at'), status.get('id'), status.get('user', {}).get('id'), status.get('user', {}).get('screen_name'), status))
    # print status
    # print "%s:\t%s\n" % (status.get('user', {}).get('screen_name'), status.get('text'))

if __name__ == '__main__':
    # Call a specific API method from the twitstream module:
    # stream = twitstream.spritzer(USER, PASS, callback)
    twitstream.parser.usage = USAGE
    (options, args) = twitstream.parser.parse_args()
    if len(args) < 1:
        args = ['Blackberry']
    stream = twitstream.track(USER, PASS, callback, args, options.debug, engine=options.engine)
    # Loop forever on the streaming call:
    stream.run()
Use json.dumps(json_value) to convert your JSON object (a Python object) into a JSON string that you can insert into a text field in MySQL:
http://docs.python.org/library/json.html
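For example, a minimal sketch against the tweets_unprocessed table from the question (a parameterized query lets the driver handle the quoting; twitdb and conn are the cursor and connection names from the question's code):
import json

# serialize the whole status dict to a JSON string before inserting
twitdb.execute(
    "INSERT INTO tweets_unprocessed (json) VALUES (%s)",
    (json.dumps(status),)
)
conn.commit()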
To expand on the other answers:
Basically you need to make sure of two things:
That you have room for the full amount of data that you want to insert in the field that you are trying to place it. Different database field types can fit different amounts of data.
See: MySQL String Datatypes. You probably want the "TEXT" or "BLOB" types.
That you are safely passing the data to the database. Some ways of passing data can cause the database to "look" at the data, and it will get confused if the data looks like SQL. It's also a security risk. See: SQL Injection
The solution for #1 is to check that the database is designed with the correct field type.
The solution for #2 is to use parameterized (bound) queries. For instance, instead of:
# Simple, but naive, method.
# Notice that you are passing in 1 large argument to db.execute()
db.execute("INSERT INTO json_col VALUES (" + json_value + ")")
Better, use:
# Correct method. Uses parameter/bind variables.
# Notice that you are passing in 2 arguments to db.execute()
db.execute("INSERT INTO json_col VALUES (%s)", (json_value,))
Hope this helps. If so, let me know. :-)
If you are still having a problem, then we will need to examine your syntax more closely.
The most straightforward way to insert a Python map into a MySQL JSON field:
import json

python_map = { "foo": "bar", "baz": [ "biz", "buzz" ] }
sql = "INSERT INTO your_table (json_column_name) VALUES (%s)"
cursor.execute( sql, (json.dumps(python_map),) )
You should be able to insert into a text or blob column easily:
db.execute("INSERT INTO json_col VALUES (%s)", (json_value,))
You need to get a look at the actual SQL string; try something like this:
sqlstr = "INSERT INTO tweets_unprocessed (text, created_at, twitter_id, user_id, user_screen_name, json) VALUES (%s,%s,%s,%s,%s,%s)"
args = (status.get('text'), status.get('created_at'), status.get('id'), status.get('user', {}).get('id'), status.get('user', {}).get('screen_name'), status)
print "about to execute %s with %s" % (sqlstr, args)
twitdb.execute(sqlstr, args)
I imagine you are going to find some stray quotes, brackets or parentheses in there.
# (assumed context for this snippet: bottle's route/request/response/abort,
# datetime, a MySQL `db` connection, and bson's json_util/dumps)
@route('/shoes', method='POST')
def createorder():
    cursor = db.cursor()
    data = request.json
    p_id = request.json['product_id']
    p_desc = request.json['product_desc']
    color = request.json['color']
    price = request.json['price']
    p_name = request.json['product_name']
    q = request.json['quantity']
    createDate = datetime.now().isoformat()
    print (createDate)
    response.content_type = 'application/json'
    print(data)
    if not data:
        abort(400, 'No data received')
    sql = "insert into productshoes (product_id, product_desc, color, price, product_name, quantity, createDate) values (%s, %s, %s, %s, %s, %s, %s)"
    print (sql)
    try:
        # Execute dml and commit changes
        cursor.execute(sql, (p_id, p_desc, color, price, p_name, q, createDate))
        db.commit()
        cursor.close()
    except:
        # Rollback changes
        db.rollback()
    return dumps(("OK"), default=json_util.default)
One example of how to add a JSON file into MySQL using Python: convert the JSON file into a single SQL insert. If there are several JSON objects, it is better to make only one INSERT call than to call INSERT INTO once per object.
# import Python's JSON lib
import json

# use JSON loads to create a list of records
test_json = json.loads('''
[
    {
        "COL_ID": "id1",
        "COL_INT_VAULE": 7,
        "COL_BOOL_VALUE": true,
        "COL_FLOAT_VALUE": 3.14159,
        "COL_STRING_VAULE": "stackoverflow answer"
    },
    {
        "COL_ID": "id2",
        "COL_INT_VAULE": 10,
        "COL_BOOL_VALUE": false,
        "COL_FLOAT_VALUE": 2.71828,
        "COL_STRING_VAULE": "http://stackoverflow.com/"
    },
    {
        "COL_ID": "id3",
        "COL_INT_VAULE": 2020,
        "COL_BOOL_VALUE": true,
        "COL_FLOAT_VALUE": 1.41421,
        "COL_STRING_VAULE": "GIRL: Do you drink? PROGRAMMER: No. GIRL: Have Girlfriend? PROGRAMMER: No. GIRL: Then how do you enjoy life? PROGRAMMER: I am Programmer"
    }
]
''')

# create a nested list of the records' values
values = [list(x.values()) for x in test_json]
# print(values)

# get the column names
columns = [list(x.keys()) for x in test_json][0]

# value string for the SQL string
values_str = ""

# enumerate over the records' values
for i, record in enumerate(values):
    # declare empty list for values
    val_list = []
    # append each value to a new list of values
    for v, val in enumerate(record):
        if type(val) == str:
            val = "'{}'".format(val.replace("'", "''"))
        val_list += [ str(val) ]
    # put parentheses around each record string
    values_str += "(" + ', '.join( val_list ) + "),\n"

# remove the last comma and end SQL with a semicolon
values_str = values_str[:-2] + ";"

# concatenate the SQL string
table_name = "json_data"
sql_string = "INSERT INTO %s (%s)\nVALUES\n%s" % (
    table_name,
    ', '.join(columns),
    values_str
)

print("\nSQL string:\n\n")
print(sql_string)
output:
SQL string:
INSERT INTO json_data (COL_ID, COL_INT_VAULE, COL_BOOL_VALUE, COL_FLOAT_VALUE, COL_STRING_VAULE)
VALUES
('id1', 7, True, 3.14159, 'stackoverflow answer'),
('id2', 10, False, 2.71828, 'http://stackoverflow.com/'),
('id3', 2020, True, 1.41421, 'GIRL: Do you drink? PROGRAMMER: No. GIRL: Have Girlfriend? PROGRAMMER: No. GIRL: Then how do you enjoy life? PROGRAMMER: I am Programmer.');
The error may be due to an overflow of the size of the field in which you are trying to insert your JSON. Without any code, it is hard to help you.
Have you considered a NoSQL database system such as CouchDB, which is a document-oriented database relying on the JSON format?
Here's a quick tip if you want to write some inline code, say for a small JSON value, without importing json.
You can escape quotes in SQL by doubling them, i.e. use '' or "" to enter ' or ".
Sample Python code (not tested):
q = 'INSERT INTO `table`(`db_col`) VALUES ("{k:""some data"";}")'
db_connector.execute(q)
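That said, a parameter placeholder sidesteps the escaping entirely by letting the driver quote the value (a sketch, assuming db_connector is a DB-API cursor as above):
q = 'INSERT INTO `table`(`db_col`) VALUES (%s)'
db_connector.execute(q, ('{k:"some data";}',))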
