I have a Python script that uses the Psycopg2 library to insert data in a Postgres database. The tables were created using Django migrations. Although the database is used by Django, it is part of a data analysis system and it will also be accessed and manipulated using Psycopg2. Everything is running on Ubuntu. A simplified version of the database is presented below.
Django receives zip files via POST requests and adds the corresponding entries to the Upload table. Each entry stores the zip file location. Inside each Upload zip there are Session zips, which in turn contain CSV files with the relevant data. The Session zips and CSV files are not referenced in the database, but their contents are inserted by the aforementioned script. In my complete system there are more tables, but to present the problem a Data table per Session suffices: each session zip should have 2 CSVs, one for Data and another for Session metadata. The cycle of the script is presented below.
So basically, for each Upload zip, its sessions are extracted and inserted one by one into the database. The data for a Session and the corresponding Data are inserted in a single transaction. Since Data rows have foreign keys referencing Session, those constraints have to be deferred; they are declared DEFERRABLE INITIALLY DEFERRED. The Session id primary key to be used is calculated by incrementing the greatest existing Session id value.
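To illustrate what deferral does and does not do, here is a minimal sketch. It is not taken from my system: it assumes SQLite through Python's built-in sqlite3 module as a stand-in for Postgres, and the table names session and data are hypothetical. A deferred foreign key lets the child row be inserted before its parent, but the check still runs at commit time, so a child row pointing at a Session id that never gets inserted makes the commit fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.execute("CREATE TABLE session (id INTEGER PRIMARY KEY, metadata TEXT)")
conn.execute(
    "CREATE TABLE data ("
    " id INTEGER PRIMARY KEY,"
    " data TEXT,"
    " session_id INTEGER REFERENCES session(id) DEFERRABLE INITIALLY DEFERRED)"
)

# Child row first, parent second: accepted, because the check waits for COMMIT.
conn.execute("INSERT INTO data (data, session_id) VALUES ('Data 1...', 1)")
conn.execute("INSERT INTO session (id, metadata) VALUES (1, 'Session data...')")
conn.commit()
rows_after_good_commit = conn.execute("SELECT COUNT(*) FROM data").fetchall()[0][0]

# Child row referencing a session id that is never inserted.
conn.execute("INSERT INTO data (data, session_id) VALUES ('Data 2...', 99)")
try:
    conn.commit()
    failed_at_commit = False
except sqlite3.IntegrityError:
    # The INSERT itself was accepted; the violation only surfaces here.
    failed_at_commit = True
    conn.rollback()

rows_after_bad_commit = conn.execute("SELECT COUNT(*) FROM data").fetchall()[0][0]
conn.close()
print(rows_after_good_commit, failed_at_commit, rows_after_bad_commit)
```

In other words, a foreign key violation reported at commit is compatible with the constraint being deferred; it only says that the referenced id did not exist when the transaction ended.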
Sometimes though, the received data is corrupt or incomplete, and the transaction commit fails, as it should. The problem is that after a single one of those failures, every time new Session insertions are attempted, the transactions fail with the error message stating that the foreign key constraint between Session and Data is violated as if the fields were not deferred!
The system still receives and inserts new entries in Upload, but the problem inserting new sessions persists. If I destroy the database and recreate it everything works perfectly well until a single one of those transactions fails, after which once again no more sessions can be inserted due to foreign key violations.
What could be causing this behavior? Apparently, due to a failed transaction, the fields no longer behave as deferred as they are defined.
I understand that my text is very long, but it was the best way I found to express my problem. I thank in advance anyone who takes the time to read it and possibly share their expertise.
The software versions are Postgres 10.12; Psycopg 2.8.5; Django 2.2.12; Python 3.6.9; Ubuntu 18.04.
UPDATE - MINIMAL REPRODUCIBLE EXAMPLE
The steps to fully reproduce my problem are listed below. Many are unnecessary or too obvious for some of you, but I opted to include everything so that anyone who so wishes can follow. This will have to be adapted if different software is used. I modified my example system so that it is completely independent from Django.
A - Enter your Ubuntu system
B - Install the software (some of this may be unnecessary)
sudo apt update
sudo apt install python3-pip python3-dev libpq-dev postgresql postgresql-contrib
C - In your Linux home directory create an alt_so_reprex directory and cd to it
mkdir alt_so_reprex
cd alt_so_reprex
D - Create a virtual environment
virtualenv venv
source venv/bin/activate
pip install psycopg2
E - Create the 5 scripts listed below. In each one, replace {YOUR_USERNAME} with your Linux username. Then make each one executable:
chmod +x 1_user_and_db.sh 2_db_tables.py 3_insert_uploads.py 4_create_test_files.sh 5_insert_sessions.py
Script 1: 1_user_and_db.sh
#!/bin/bash
# Create user if it does not exist
# -tA makes psql print only the bare value, so the comparison with '1' can succeed
if [ "$( sudo -u postgres -H -- psql -tAc "SELECT 1 FROM pg_roles WHERE rolname='{YOUR_USERNAME}'" )" != '1' ]
then
    sudo -u postgres -H -- psql -c "CREATE USER {YOUR_USERNAME} WITH PASSWORD 'password';";
fi
# Create the PgSQL database (IF EXISTS avoids an error the first time this runs)
sudo -u postgres -H -- psql -c "DROP DATABASE IF EXISTS test_db;";
sudo -u postgres -H -- psql -c "CREATE DATABASE test_db;";
sudo -u postgres -H -- psql -d test_db -c "ALTER ROLE {YOUR_USERNAME} SET client_encoding TO 'utf8';";
sudo -u postgres -H -- psql -d test_db -c "ALTER ROLE {YOUR_USERNAME} SET default_transaction_isolation TO 'read committed';";
sudo -u postgres -H -- psql -d test_db -c "ALTER ROLE {YOUR_USERNAME} SET timezone TO 'UTC';";
sudo -u postgres -H -- psql -d test_db -c "GRANT ALL PRIVILEGES ON DATABASE test_db TO {YOUR_USERNAME};";
# Show database
sudo -u postgres -H -- psql -d test_db -c "\l";
Script 2: 2_db_tables.py (based on a @snakecharmerb contribution - thanks)
#!/usr/bin/env python3
import psycopg2
# TABLE CREATION
reprex_upload = """CREATE TABLE Reprex_Upload (
    id BIGSERIAL PRIMARY KEY,
    zip_file VARCHAR(128),
    processed BOOLEAN DEFAULT FALSE
) """
reprex_session = """CREATE TABLE Reprex_Session (
    id BIGSERIAL PRIMARY KEY,
    metadata VARCHAR(128),
    upload_id BIGINT REFERENCES Reprex_Upload ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
) """
reprex_data = """CREATE TABLE Reprex_Data (
    id BIGSERIAL PRIMARY KEY,
    data VARCHAR(128),
    session_id BIGINT REFERENCES Reprex_Session ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
)"""
print("Creating tables...")
with psycopg2.connect(dbname='test_db', user='{YOUR_USERNAME}', host='localhost', password='password') as conn:
    cur = conn.cursor()
    cur.execute(reprex_upload)
    cur.execute(reprex_session)
    cur.execute(reprex_data)
    conn.commit()
Script 3: 3_insert_uploads.py
#!/usr/bin/env python3
import psycopg2
from psycopg2 import sql
DATABASE = 'test_db'
USER = '{YOUR_USERNAME}'
PASSWORD = 'password'
conn = None
cur = None
try:
    conn = psycopg2.connect(database=DATABASE, user=USER, password=PASSWORD)
    cur = conn.cursor()
    cur.execute(sql.SQL("INSERT INTO reprex_upload VALUES (DEFAULT, 'uploads/ok_upload.zip', DEFAULT)"))
    cur.execute(sql.SQL("INSERT INTO reprex_upload VALUES (DEFAULT, 'uploads/bad_upload.zip', DEFAULT)"))
    cur.execute(sql.SQL("INSERT INTO reprex_upload VALUES (DEFAULT, 'uploads/ok_upload.zip', DEFAULT)"))
    conn.commit()
except (Exception, psycopg2.Error) as err:
    print("Exception/Error:", err)
finally:
    # closing database conn.
    if cur:
        cur.close()
    if conn:
        conn.close()
        print("PostgreSQL conn is closed")
Script 4: 4_create_test_files.sh
#!/bin/bash
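# -p avoids an error on reruns; stop if cd fails so rm never runs in the wrong directory
mkdir -p uploads
cd uploads || exit 1
rm -f ./*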
{ echo "metadata"; echo "Session data..."; } > 123_Session.csv
{ echo "data"; echo "Data 1..."; } > 123_Data.csv
zip 123_session.zip 123_Data.csv 123_Session.csv
zip ok_upload.zip 123_session.zip
rm 123_session.zip
zip 123_session.zip 123_Session.csv
zip bad_upload.zip 123_session.zip
rm 123*
Script 5: 5_insert_sessions.py
#!/usr/bin/env python3
import psycopg2
from psycopg2 import sql
import csv
from zipfile import ZipFile
import os
import shutil
import sys
MEDIA_ROOT_DIR = '/home/{YOUR_USERNAME}/alt_so_reprex/'
EXTRACTED_UPLOADS_DIR = '/home/{YOUR_USERNAME}/alt_so_reprex/extracted_uploads/'
EXTRACTED_SESSIONS_DIR = '/home/{YOUR_USERNAME}/alt_so_reprex/extracted_sessions/'
DATABASE = 'test_db'
USER = '{YOUR_USERNAME}'
PASSWORD = 'password'
def insert_csv(filepath, message, table, num_args, foreign_key):
    with open(filepath, 'r') as f:
        reader = csv.reader(f)
        next(reader)  # Skip the header row
        count = 0
        print(message)
        arguments_format = sql.SQL(', ').join(sql.Placeholder() * (num_args - 1))
        print('The arguments format is:', arguments_format.as_string(connection))
        for row in reader:
            row.append(foreign_key)
            cursor.execute(
                sql.SQL('INSERT INTO {} VALUES (DEFAULT, {})').format(sql.Identifier(table), arguments_format), row)
            count += 1
        print(count, 'record(s) will be inserted into %s table' % table)
def get_unprocessed_uploaded_zips():
    conn = None
    cur = None
    try:
        conn = psycopg2.connect(database=DATABASE, user=USER, password=PASSWORD)
        cur = conn.cursor()
        query = "SELECT * FROM reprex_upload WHERE processed=FALSE"
        cur.execute(query)
        res = cur.fetchall()
        # return true and res
        return True, res
    except (Exception, psycopg2.Error) as err:
        # return false and err message
        print("Exception/Error:", err)
        return False, None
    finally:
        # closing database conn.
        if cur:
            cur.close()
        if conn:
            conn.close()
            print("PostgreSQL conn is closed")
# COALESCE is used for the first insertion ever, where a NULL would be returned
def get_last_session_id():
    conn = None
    cur = None
    try:
        conn = psycopg2.connect(database=DATABASE, user=USER, password=PASSWORD)
        cur = conn.cursor()
        query = "SELECT COALESCE(MAX(id), 0) FROM reprex_session"
        cur.execute(query)
        result = cur.fetchone()
        # return true and results
        return True, result[0]
    except (Exception, psycopg2.Error) as err:
        # return false and err message
        print("Exception/Error:", err)
        return False, None
    finally:
        # closing database conn.
        if cur:
            cur.close()
        if conn:
            conn.close()
            print("PostgreSQL conn is closed")
# get all entries in Upload which are unprocessed
query_success, results = get_unprocessed_uploaded_zips()
if query_success is False:
    sys.exit()
uploaded_zips = 0
for unprocessed_upload in results:
    uploaded_zips += 1
    print('\n\t' + '### UNPROCESSED UPLOAD ' + str(uploaded_zips) + ' ###\n')
    # The id field is the first one
    upload_zip_id = unprocessed_upload[0]
    # The zip_file field is the second one
    upload_zip_path = unprocessed_upload[1]
    print(upload_zip_path)
    # The filename will be the second part of the filepath
    upload_zip_name = upload_zip_path.split('/')[1]
    print(upload_zip_name)
    print(upload_zip_path)
    # The full filepath
    upload_zip_full_path = MEDIA_ROOT_DIR + upload_zip_path
    print(upload_zip_full_path)
    if upload_zip_full_path.endswith('.zip'):
        print('There is a new upload zip file: ' + upload_zip_full_path)
        # the folder name will be the file name minus the .zip extension
        upload_zip_folder_name = upload_zip_name.split('.')[0]
        upload_zip_folder_path = EXTRACTED_UPLOADS_DIR + upload_zip_folder_name
        # Create a ZipFile Object and load the received zip file in it
        with ZipFile(upload_zip_full_path, 'r') as zipObj:
            # Extract all the contents of zip file to the referred directory
            zipObj.extractall(upload_zip_folder_path)
        inserted_sessions = 0
        # Iterate over all session files inserting data in database
        for session_zip in os.scandir(upload_zip_folder_path):
            inserted_sessions += 1
            print('\n\t\t' + '### INSERTING SESSION ' + str(inserted_sessions) + ' ###\n')
            if session_zip.path.endswith('.zip') and session_zip.is_file():
                print('There is a new session zip file: ' + session_zip.name + '\n' + 'Located in: ' + session_zip.path)
                # the folder name will be the file name minus the .zip extension
                session_zip_folder_name = session_zip.name.split('.')[0]
                session_zip_folder_path = EXTRACTED_SESSIONS_DIR + session_zip_folder_name
                # Create a ZipFile Object and load the received zip file in it
                with ZipFile(session_zip, 'r') as zipObj:
                    # Extract all the contents of zip file to the referred directory
                    zipObj.extractall(session_zip_folder_path)
                session_file_path = session_zip_folder_path + '/' + \
                    session_zip_folder_name.replace('session', 'Session.csv')
                data_file_path = session_zip_folder_path + '/' + \
                    session_zip_folder_name.replace('session', 'Data.csv')
                # get the latest session id and increase it by one
                query_success, last_session_id = get_last_session_id()
                if query_success is False:
                    sys.exit()
                session_id = last_session_id + 1
                print('The session ID will be: ', session_id)
                connection = None
                cursor = None
                try:
                    # open a new database connection
                    connection = psycopg2.connect(database=DATABASE, user=USER, password=PASSWORD)
                    cursor = connection.cursor()
                    # First insert the Session file -> Link entry to Upload entry (upload_zip_id)
                    insert_csv(session_file_path, 'PROCESSING SESSION!\n', 'reprex_session', 3,
                               upload_zip_id)
                    # Then insert the Data file -> Link entry to Session entry (session_id)
                    insert_csv(data_file_path, 'PROCESSING DATA!\n', 'reprex_data', 3, session_id)
                    # modify the Upload entry to processed
                    update = "UPDATE reprex_upload SET processed=TRUE WHERE id=%s"
                    cursor.execute(update, (upload_zip_id,))
                    # make all changes or none
                    connection.commit()
                except (Exception, psycopg2.Error) as error:
                    # print error message
                    if connection:
                        print('ERROR:', error)
                finally:
                    # closing database connection.
                    if cursor:
                        cursor.close()
                    if connection:
                        connection.close()
                        print("PostgreSQL connection is closed")
                # Remove folder with extracted content - this erases the csv files
                try:
                    shutil.rmtree(session_zip_folder_path)
                except OSError as e:
                    print("Error: %s" % e.strerror)
        # Remove folder with extracted content - this erases the session zips
        try:
            shutil.rmtree(upload_zip_folder_path)
        except OSError as e:
            print("Error: %s " % e.strerror)
F - Run the 5 scripts in order. You will verify that the "bad upload" causes the second "good upload" to not be inserted due to a foreign key violation. More uploads can be inserted with script 3 but no more sessions can be inserted. If you manually delete the "bad upload" you will verify that you still cannot insert more sessions due to foreign key violation. But if you recreate the database starting over from script 1 you can insert "good sessions" again. If you delete the "bad upload" from the upload directory you can insert as many sessions as you like. But after a single error, there are always foreign key violations, as if the constraints were not deferred.
I changed the original title of this question since I have now found that the issue is not in any case caused by Django. I also changed the database model to an even simpler one than I originally presented, and changed the original text to reflect this. I also removed the Django tag.
The particular error in this example can be trivially avoided by checking for the existence of the correct CSVs inside the ZIPs, but in my real system, other errors can occur. What I need is a solution to the apparent change of behavior in the constraints.
I know I am being extremely verbose, I thank you for your patience and your help.
With the help of a colleague, I have found the solution to my problem.
The failed transaction with the "bad_upload" internally increments the nextval of the sequence. So after one failed transaction the id to be used for the next Session will be not the current maximum id + 1, but the current maximum + 2.
To prevent this kind of issue, the correct way of obtaining the next Session id is to execute the Session insertion with:
cursor.execute(sql.SQL('INSERT INTO {} VALUES (DEFAULT, {}) RETURNING id').format(sql.Identifier(table), arguments_format), row)
Then get the id to be used with (fetchone() returns a row tuple, so take its first element):
session_id = cursor.fetchone()[0]
And then use that Session id when inserting into the Data table.
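Put together, the approach could look roughly like the sketch below. The function name insert_session_and_data and its parameters are my own illustration, not part of the script above. It assumes a DB-API cursor that uses %s placeholders (as Psycopg2 does) and the reprex_session and reprex_data tables from the example. Since fetchone() returns a row tuple, the id is taken from its first element.

```python
def insert_session_and_data(cursor, metadata, upload_id, data_rows):
    """Insert one Session row and its Data rows, using the id Postgres assigned.

    Hypothetical helper: `cursor` is any DB-API cursor with %s placeholders,
    for example one obtained from a psycopg2 connection.
    """
    # Let the sequence pick the id and ask the server to hand it back.
    cursor.execute(
        "INSERT INTO reprex_session VALUES (DEFAULT, %s, %s) RETURNING id",
        (metadata, upload_id),
    )
    session_id = cursor.fetchone()[0]  # fetchone() yields a tuple like (id,)

    # Every Data row references the id that was actually assigned.
    for value in data_rows:
        cursor.execute(
            "INSERT INTO reprex_data VALUES (DEFAULT, %s, %s)",
            (value, session_id),
        )
    return session_id
```

The caller would still commit once after the Upload row has been marked as processed, so the whole session remains a single transaction.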
According to the Postgres documentation: "To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used and will not be returned again. This is true even if the surrounding transaction later aborts, or if the calling query ends up not using the value."
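The arithmetic behind the failure can be reproduced without a database. The following is only a toy simulation: FakeSequence and insert_session are hypothetical names, and the list stands in for the committed rows of the Session table. It is not how Postgres is implemented; it only mimics the rule quoted above that a fetched value is never returned again.

```python
import itertools


class FakeSequence:
    """Toy stand-in for a Postgres sequence: a fetched value is never reused."""

    def __init__(self):
        self._counter = itertools.count(1)

    def nextval(self):
        return next(self._counter)


seq = FakeSequence()
committed_session_ids = []  # ids of the rows that survived their transaction


def insert_session(commit):
    new_id = seq.nextval()  # consumed even if the transaction is rolled back
    if commit:
        committed_session_ids.append(new_id)
    return new_id


insert_session(commit=True)   # gets id 1 and is committed
insert_session(commit=False)  # gets id 2, then the transaction fails

guessed_id = max(committed_session_ids, default=0) + 1  # what MAX(id) + 1 predicts
actual_id = insert_session(commit=True)                 # what the sequence assigns
print(guessed_id, actual_id)
```

Here guessed_id is 2 while the row really receives 3, so Data rows inserted with the guessed id reference a Session that does not exist, and the deferred foreign key check fails at commit.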
So this ended up being a consequence of how Postgres sequences are documented to behave, not a Psycopg problem. Many thanks to everyone who helped with this issue.
I'm having problems returning auto-incremented ID columns from a MySQL database using MySQLdb python library.
I have something like:
sql = """INSERT INTO %s (%s) VALUES (\"%s\")""" %(tbl, colsf, valsf)
try:
    cursor.execute(sql)
    id = cursor.lastrowid
    db.close()
except:
    print "Failed to add to MySQL database: \n%s" %sql
    print sys.exc_info()
    db.close()
    exit()
However the lastrowid command seems to be returning incorrect values. For instance, I've tried printing out various id columns from the MySQL command line which shows them to be empty, but the lastrowid value keeps increasing by 1 every time the python script is run. Any ideas?
It turned out that the values weren't being committed to the MySQL database; adding a db.commit() call solved the problem.
sql = """INSERT INTO %s (%s) VALUES (\"%s\")""" %(tbl, colsf, valsf)
try:
    cursor.execute(sql)
    id = cursor.lastrowid
    cursor.close()
    db.commit()
    db.close()
except:
    print "Failed to add to MySQL database: \n%s" %sql
    print sys.exc_info()
    db.close()
    exit()
I'm currently writing a chat bot with plugin functionality and at the moment, I'm working on a permission system.
However, my insert query into my database somehow doesn't work. If I run it by hand, it works flawlessly.
Here's that piece of code... hopefully you can see what I'm trying to do here:
def dothis(message):
    if message.content.split()[1].lower() == "op":
        user = get_member_by_name(message, message.content.split()[2])
        pmcon = None  # so the finally block works even if connect() fails
        try:
            pmcon = mdb.connect(db_server, db_user, db_pass, db_name)
            pmcur = pmcon.cursor()
            pmcur.execute("INSERT INTO users (username,userid,hasop) VALUES (\'{}\',\'{}\',{})".format(message.content.split()[2], user.id, "TRUE"))
        except mdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])
        finally:
            if pmcon:
                pmcon.close()
I already tried putting the query in a string and let it be printed out, but I don't see an error.
Am I doing something wrong?
If your database connection is not configured to autocommit, you need to commit your statements:
pmcon.commit()
(after the execute statement.)
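A common shape for this is to commit on success and roll back on failure. The sketch below is an illustration under assumptions, not the asker's code: it uses the built-in sqlite3 module so it can run anywhere, the add_op_user function and the users table layout are hypothetical, and it binds the values as parameters instead of formatting them into the SQL string. With MySQLdb the placeholder would be %s rather than ?.

```python
import os
import sqlite3
import tempfile


def add_op_user(conn, username, userid):
    """Insert an operator row, committing on success and rolling back on error."""
    cur = conn.cursor()
    try:
        cur.execute(
            "INSERT INTO users (username, userid, hasop) VALUES (?, ?, ?)",
            (username, userid, True),
        )
        conn.commit()  # without this, closing the connection discards the row
    except sqlite3.Error:
        conn.rollback()
        raise
    finally:
        cur.close()


db_path = os.path.join(tempfile.mkdtemp(), "bot.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE users (username TEXT, userid TEXT, hasop BOOLEAN)")
conn.commit()
add_op_user(conn, "alice", "42")
conn.close()

# Reopen the database to confirm the row survived the connection.
check = sqlite3.connect(db_path)
stored = check.execute("SELECT username, userid, hasop FROM users").fetchall()
check.close()
print(stored)
```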
I've been trying to automate the installation of an Open Street Map Server since no one has published one yet and the task is pretty tedious. In order to do this I'm dealing with PostgreSQL databases in a script, which I left Python in charge of.
Here's the situation: Basically I'm running python scripts dealing with the database throughout bash code. I'm trying to make the install as user friendly as possible, part of that is automating the PostgreSQL setup. I prompt the user, in bash, for a password they would like to use for the postgres database that already comes with PostgreSQL. I then send their password as a command line argument to a Python script.
This is the part of the script I'm having problems with:
import psycopg2
import sys
con = None
code = sys.argv[1]
try:
    con = psycopg2.connect(database='postgres', user='postgres')
    cur = con.cursor()
    cur.execute("ALTER USER postgres WITH PASSWORD '%s'" % code)
Basically: On the bottom line where I change the password for the postgres database, it doesn't actually work. I know this because later I am prompted in my bash script to enter the password and it results in an authentication failure.
I'm pretty new to this, so if anyone has some good advice, it would be greatly appreciated.
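Please use the code below; it generates a random password and updates the user with it. Note that, unlike the code in the question, it calls con.commit() after the ALTER USER statement.
NOTE: For this code to work, the readwrite1 user has to exist in the database before you run it.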
import random
import string
import sys

import psycopg2

# password generation
def password_generator(password_length):
    # characters the password may contain
    characters = string.ascii_letters + string.digits + '!@#$%^&*()'
    password = ''.join(random.choice(characters) for i in range(password_length))
    return password

# define a function that handles and parses psycopg2 exceptions
def print_psycopg2_exception(err):
    err_type, err_obj, traceback = sys.exc_info()
    # get the line number when exception occurred
    line_num = traceback.tb_lineno
    # print the connect() error
    print("\npsycopg2 ERROR:", err, "on line number:", line_num)
    print("psycopg2 traceback:", traceback, "-- type:", err_type)
    # psycopg2 extensions.Diagnostics object attribute
    print("\nextensions.Diagnostics:", err.diag)
    # print the pgcode and pgerror exceptions
    print("pgerror:", err.pgerror)
    print("pgcode:", err.pgcode, "\n")

def update_password():
    password = password_generator(10)
    con = None  # so the finally block works even if connect() fails
    try:
        con = psycopg2.connect(host="host here", database="dbhere", user="username", password="password")
        cur = con.cursor()
        cur.execute("alter user readwrite1 with password %(password)s;", {'password': password})
        # without the commit, the change is rolled back when the connection closes
        con.commit()
    except psycopg2.Error as err:
        # pass exception to function
        print_psycopg2_exception(err)
        sys.exit(1)
    finally:
        print('password is: ', password)
        if con:
            con.close()
    return password
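As a side note on the password generation step: the random module is not designed for security-sensitive values, and the standard library provides the secrets module for that purpose. A minimal sketch of the same generator built on it follows. The names generate_password and ALPHABET are mine, and the character set is only an example.

```python
import secrets
import string

# Example character set; adjust it to whatever the target system accepts.
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*()"


def generate_password(length):
    """Build a password by drawing each character from a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


password = generate_password(10)
print(len(password))
```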
I'm using python-mysql; attached below is the code snippet I'm using to insert into a database table. For some reason, the code is not populating any rows in the database. No exceptions are raised, and the SELECT queries work fine. When I copy the statement inside execute and run it in phpMyAdmin, the database is populated fine.
import MySQLdb as mdb
try:
    con = mdb.connect(host='localhost', user='', passwd='', db='indoor')
    cur = con.cursor()
    cur.execute("INSERT INTO locationdata VALUES('1','1','1','1','1','1')")
    numrows = cur.execute("SELECT * FROM locationdata")
    print str(numrows) + " : total Rows"
    print cur.fetchone()
    if con.open:
        print "Hello DB"
except mdb.Error, e:
    print "Error %d: %s" % (e.args[0], e.args[1])
Any ideas what am I missing?