PyMySQL throws 'BrokenPipeError' after making frequent reads - python

I have written a script to help me work with a database. Specifically, I am trying to work with files on disk and add the result of this work to my database. I have copied the code below, but removed most of the logic which isn't related to my database to try to keep this question broad and helpful.
I used the code to operate on the files and add the result to the database, overwriting any files with the same identifier as the one I was working on. Later, I modified the script to ignore documents which have already been added to the database, and now whenever I run it I get an error:
pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")
It seems like the server is rejecting the requests, possibly because I have written my code poorly? I have noticed that the error always occurs at the same place in the list of files, which doesn't change. If I re-run the code, replacing the file list with a list of only the file on which the program crashes, it works fine. This makes me think that after making a certain number of requests, the database just bottoms out.
I'm using Python 3 and MySQL Community Edition Version 14.14 on OS X.
Code (stripped of stuff that doesn't have to do with the database):
import pymysql
# Stars for user-specific stuff
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='*******',
                             db='*******',
                             use_unicode=True,
                             charset="utf8mb4",
                             )
cursor = connection.cursor()
f_arr = # An array of all of my data objects
def convertF(file_):
    # General layout: Try to work with input and add the result to DB. The work can raise an exception
    # If the record already exists in the DB, ignore it
    # Elif the work was already done and the result is on disk, put it on the database
    # Else do the work and put it on the database - this can raise exceptions
    # Except: Try another way to do the work, and put the result in the database. This can raise an error
    # Second (nested) except: Add the record to the database with indicator that the work failed
    # This worked before I added the initial check on whether or not the record already exists in the database. Now, for some reason, I get the error:
    # pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")
    # I'm pretty sure that I have written code to work poorly with the database. I had hoped to finish this task quickly instead of efficiently.
    try:
        # Find record in DB, if text exists just ignore the record
        rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = '{0}'".format(file_["name"]))
        length = cursor.fetchall()[0][0] # Gets the length
        if length != None and length > 4:
            pass
        elif ( "work already finished on disk" ):
            # get "result_text" from disk
            cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
            cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
            connection.commit()
        else:
            # do work to get result_text
            cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
            cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
            connection.commit()
    except:
        try:
            # Alternate method of work to get result_text
            cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
            cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
            connection.commit()
        except:
            # Since the job can't be done, tell the database
            cmd = "UPDATE table SET text = %s, hascontent = 0 WHERE name = %s"
            cursor.execute(cmd, ( "NO CONTENT", file_["name"]) )
            connection.commit()
for file in f_arr:
    convertF(file)

MySQL Server Has Gone Away
This problem is described extensively at http://dev.mysql.com/doc/refman/5.7/en/gone-away.html. The usual cause is that the server has closed the connection for whatever reason, and the usual remedy is to retry the query, or to reconnect and retry.
The reason this breaks your script, however, is the way the code is written. See below.
Possibly because I have written my code poorly?
Since you asked.
rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = '{0}'".format(file_["name"]))
This is a bad habit. The manual explicitly warns you against doing this, to avoid SQL injection. The correct way is
rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = %s", (file_["name"],))
The second problem with the above code is that you don't need to check whether a value exists before you try to update it. You can delete the above line and its associated if/else and jump straight to the update. Besides, your elif and else seem to do exactly the same thing. So your code can just be
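try:
    cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
    # The driver escapes bound parameters itself, so result_text is passed as-is;
    # wrapping it in pymysql.escape_string() as well would escape it twice.
    cursor.execute(cmd, (result_text, file_["name"]))
    connection.commit()
except: # <-- next problem.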
And we come to the next problem. Never catch generic exceptions like this. You should always catch specific exceptions such as TypeError, AttributeError, etc. When catching generic exceptions is unavoidable, you should at least log them.
For example, here you could catch connection errors and attempt to reconnect to the database. Then the script will not stop executing when the "server has gone away" problem happens.

I solved the same error, in a case where I was doing bulk inserts, by reducing the number of rows inserted in one command.
Even though the maximum number of rows for a bulk insert was much higher, I still got this kind of error.
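Splitting a bulk insert into smaller batches could look roughly like this. The helper names chunked and insert_in_batches and the default batch size of 500 are assumptions for illustration, not values taken from this thread; a workable batch size depends on the row size and the server configuration.

```python
def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_in_batches(connection, sql, rows, batch_size=500):
    """Send `rows` in several small executemany() calls instead of one big one."""
    cursor = connection.cursor()
    try:
        for batch in chunked(rows, batch_size):
            cursor.executemany(sql, batch)
            connection.commit()  # commit per batch, so a failure loses one batch at most
    finally:
        cursor.close()


# The batching itself can be checked without a database:
batches = list(chunked([(n,) for n in range(7)], 3))
```

Here insert_in_batches expects any DB-API connection and an INSERT statement written with that driver's placeholder style.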

Related

Inserting JPEG-filenames into PostgreSQL table using Psycopg2 causes "not all arguments converted during string formatting" error

I'm trying to fill a PostgreSQL table (psycopg2, Python) with the filenames I have in a specific folder. I have created a function that should do the trick, but I get the error:
not all arguments converted during string formatting,
when I run my function. I did a test run and called the function in the following way:
insert_file_names_into_database(["filename1_without_extension", "filename2_without_extension"]),
and I had no problems and the INSERT worked fine. If I did the following:
insert_file_names_into_database(["filename1.extension", "filename2.extension"]),
Then I get the error above. So the problem seems to be the "." character (e.g. image.jpg) which causes the SQL INSERT to fail. I tried to consult the Psycopg2 docs about this, but I found no examples relating to this specific case.
How should I edit this piece of code so I can get it to work even with "." characters in the filenames?
def insert_file_names_into_database(file_name_list):
    """ insert multiple filenames into the table """
    sql = "INSERT INTO mytable(filename) VALUES(%s)"
    conn = None
    try:
        # read database configuration
        # connect to the PostgreSQL database
        conn = psycopg2.connect(
            host="localhost",
            database="mydatabase",
            user="myusername",
            password="mypassword")
        # create a new cursor
        cur = conn.cursor()
        # execute the INSERT statement
        cur.executemany(sql, file_name_list)
        # commit the changes to the database
        conn.commit()
        # close communication with the database
        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
Solved it myself already. I knew I should be using tuples when working with the INSERT, but my function worked fine with a list of strings without the "." characters.
The solution I got working was to convert the list of strings into a list of tuples like so:
tuple_file_name = [tuple((file_name,)) for file_name in file_name_list]
So for example if:
file_name_list = ["filename1.jpg", "filename2.jpg"]
Then giving this as input to my function fails. But by making it a list of tuples:
tuple_file_name = [tuple((file_name,)) for file_name in file_name_list]
print(tuple_file_name)
[('filename1.jpg',), ('filename2.jpg',)]
Now the function accepts the input tuple_file_name and the filenames are saved into the SQL table.
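As a side note, (file_name,) is already a one-element tuple, so the extra tuple(...) call is not needed. A shorter form of the same conversion, with the variable name rows chosen here for illustration:

```python
file_name_list = ["filename1.jpg", "filename2.jpg"]

# executemany() expects one parameter sequence per row. A bare string is
# itself a sequence of characters, so each filename is wrapped in a tuple.
rows = [(file_name,) for file_name in file_name_list]
```

The resulting list can be passed to cur.executemany(sql, rows) in place of file_name_list.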

Except not running as expected and postgresql issues

I'm creating a simple app to connect to a postgresql database, query it to list all the tables, and query each of the table to output some information. My code is running but I need to fix some issues of the part below. df_dbtables is my pandas dataframe of db schemas and tables.
for index, row in df_dbtables.iterrows():
    try:
        schema_table = row['schema'] + "." + row['table']
        cur.execute("SELECT type,stylename FROM %s" % schema_table)
        rows = cur.fetchall()
        for row in rows:
            data.append({"Type" : row[0], "Stylename" : row[1]})
    except:
        continue
Issue #1:
My first table runs perfectly. But the second table doesn't have a type field, so it runs into this PostgreSQL error: psycopg2.errors.UndefinedColumn: column "type" does not exist. The code then hits the except, which tells it to continue. The problem is that after hitting the except once, all my other table queries hit the except too, even though they have type and stylename fields. How can I properly ignore this error and continue to the next iteration? Also, what is the best way to output SQL errors using try/except?
Issue #2:
Once the above issue is fixed, I would like to know how to prevent this: if one field doesn't exist and the query runs into a SQL error, the other field is ignored too (even if it exists), because the code goes to the except. For example: the script queries table X, which has no type field; when it hits the except, it ignores the table's stylename data as well.
Improvement:
I've tried many ways to parameterize the SQL query. I know that the way I used is very prone to SQL injection, but the correct ways just don't work.
Tried these methods and others but couldn't run them successfully.
APP
My next step is creating a Flask app for this code. So, if you have a solution that uses Flask it will be welcome.
Code updated on July 12th, but still with the same issues:
for index, row in df_dbtables.iterrows():
    try:
        schema_table = row['schema'] + "." + row['table']
        cur = con.cursor()
        cur.execute("SELECT type,stylename FROM %s" % schema_table)
        for r in rs:
            data.append({"Type" : r[0], "Stylename" : r[1]})
    #except psycopg2.OperationalError: traceback.print_exc()
    except: continue
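No answer to this question is included here, so what follows is only a possible direction. In PostgreSQL, once a statement fails inside a transaction, later statements on that connection are rejected until the transaction is rolled back, which would explain Issue #1 if the connection is in psycopg2's default non-autocommit mode. The sketch below rolls back after each failure and records the error instead of discarding it. The names collect_styles and quote_ident are hypothetical, and the demo runs on the standard-library sqlite3 module so that it works without a server; with psycopg2 you would presumably pass psycopg2.Error and build the table name with psycopg2.sql.Identifier rather than a hand-written quoting helper.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier by doubling embedded double quotes (hypothetical helper)."""
    return '"' + name.replace('"', '""') + '"'


def collect_styles(con, tables, db_error):
    """Return (data, skipped) for a list of (schema, table) pairs."""
    data, skipped = [], []
    for schema, table in tables:
        # Identifiers cannot be bound as query parameters, so they are quoted instead.
        query = "SELECT type, stylename FROM {0}.{1}".format(
            quote_ident(schema), quote_ident(table))
        cur = con.cursor()
        try:
            cur.execute(query)
            for type_, stylename in cur.fetchall():
                data.append({"Type": type_, "Stylename": stylename})
        except db_error as exc:
            # End the failed transaction so the next table starts clean,
            # and keep the message so the failure is visible.
            con.rollback()
            skipped.append((schema + "." + table, str(exc)))
        finally:
            cur.close()
    return data, skipped


# Demo on SQLite, whose default schema is called "main".
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (type TEXT, stylename TEXT)")
con.execute("CREATE TABLE t2 (stylename TEXT)")  # no "type" column
con.execute("CREATE TABLE t3 (type TEXT, stylename TEXT)")
con.execute("INSERT INTO t1 VALUES ('line', 'road')")
con.execute("INSERT INTO t3 VALUES ('fill', 'park')")
con.commit()
data, skipped = collect_styles(
    con, [("main", "t1"), ("main", "t2"), ("main", "t3")], sqlite3.Error)
con.close()
```

For Issue #2, one option might be to look up which columns each table has (for example in information_schema.columns) and select only those, instead of relying on the error.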

Python multiple MySQL Inserts

I'm trying to do multiple inserts on a MySQL db like this:
p = 1
orglist = buildjson(buildorgs(p, p))
while (orglist is not None):
    for org in orglist:
        sid = org['sid']
        try:
            sql = "INSERT INTO `Orgs` (`sid`) VALUES (\"{0}\");".format(sid)
            cursor.execute(sql)
            print("Added {0}".format(org['title']))
        except Exception as bug:
            print(bug)
    conn.commit()
    conn.close()
    p += 1
    orglist = buildjson(buildorgs(p, p))
However, I keep getting a bunch of errors like 2055: Lost connection to MySQL server at 'localhost:3306', system error: 9 Bad file descriptor
How can I correctly do multiple inserts at once, so I don't have to commit after every single insert? Also, can I do conn.close() only after the while loop, or is it better to keep it where it is?
This may be related to this question and/or this question. A couple ideas from the answers to those questions which you might try:
Try closing the cursor before closing the connection (cursor.close() before conn.close()). I don't know whether you should close the cursor before or after conn.commit(), so try both.
If you're using the Oracle MySQL connector, try using PyMySQL instead; several people said that that fixed this problem for them.
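Another thing worth checking, going by the question itself: conn.close() runs inside the while loop, so every insert after the first pass is sent over a connection that has already been closed. A rough sketch of one way to restructure it, with one executemany() and one commit per page and the close left to the caller after the loop. The function name insert_sids and the placeholder argument are assumptions for illustration; the demo uses the standard-library sqlite3 module (placeholder ?) so it runs without a server, while MySQL drivers use %s.

```python
import sqlite3


def insert_sids(conn, pages, placeholder="%s"):
    """Insert the sid of every org on every page; commit once per page."""
    # Only the placeholder token is formatted in; the values themselves are bound.
    sql = "INSERT INTO Orgs (sid) VALUES ({0})".format(placeholder)
    cursor = conn.cursor()
    total = 0
    try:
        for orglist in pages:
            rows = [(org["sid"],) for org in orglist]
            cursor.executemany(sql, rows)
            conn.commit()
            total += len(rows)
    finally:
        cursor.close()
    return total


# Demo with SQLite standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orgs (sid TEXT)")
pages = [[{"sid": "a1"}, {"sid": "b2"}], [{"sid": "c3"}]]
inserted = insert_sids(conn, pages, placeholder="?")
count = conn.execute("SELECT COUNT(*) FROM Orgs").fetchone()[0]
conn.close()  # closed once, after all pages are done
```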

2 Try / Except statement doesn't work in normal order but works when codes are flipped

Good day guys, I hope to get a little advice on this. I can't seem to get these 2 TRY/EXCEPT statements to run in the order I want them to. However, they work great if I put STEP2 first, then STEP1.
The current code prints only:
Transferred: x rows.
If flipped, it prints both:
Unfetched: x rows.
Transferred: x rows.
I tried:
Assigning individual cur.close() and db.commit() calls, as per the examples here; that didn't work either. (Side question: Should I be closing/committing them individually nevertheless? Is that a general good practice or context-based?)
Using cur.rowcount for Step 2 as well, since I thought the problem might be on the SQL side; the problem still persists.
Did a search on SO and couldn't find any similar case.
Running on Python 2.7. Code:
import MySQLdb
import os
#Initiate connection to database.
db = MySQLdb.connect(host="localhost",user="AAA",passwd="LETMEINYO",db="sandbox")
cur = db.cursor()
#Declare variables.
viewvalue = "1"
mainreplace = (
    "INSERT INTO datalog "
    "SELECT * FROM cachelog WHERE viewcount = %s; "
    "DELETE FROM cachelog WHERE viewcount = %s; "
    % (viewvalue, viewvalue)
)
balance = (
    "SELECT COUNT(*) FROM cachelog "
    "WHERE viewcount > 1"
)
#STEP 1: Copy and delete old data then print results.
try:
    cur.execute(mainreplace)
    transferred = cur.rowcount
    print "Transferred: %s rows." %(transferred)
except:
    pass
#STEP 2: Check for unfetched data and print results.
try:
    cur.execute(balance)
    unfetched = cur.fetchone()
    print "Unfetched: %s rows." % (unfetched)
except:
    pass
#Confirm and close connection.
cur.close()
db.commit()
db.close()
Pardon any of my un-Pythonic ways as I am still very much a beginner. Any advice is much appreciated, thank you!
You have two glaring un-Pythonic bits of code: the use of a bare except: without saying which exception you want to catch, and using pass in that except block so the exception is entirely ignored!
The problem with code like that is that if something goes wrong, you'll never see the error message, so you can't find out what's wrong.
The problem is perhaps that your "mainreplace" query deletes everything from the "cachelog" table, so the "balance" query after it has no rows, so fetchone() fails, throws an exception and the line after it is never executed. Or maybe something completely different, hard to tell from here.
If you didn't have that try/except there, you would have had a nice error message and you wouldn't have had to ask this question.
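A small sketch of what that could look like: catch only the driver's error class and log the traceback instead of passing silently. It is written for Python 3 and uses the standard-library sqlite3 module as a stand-in so that it runs without a server; the helper name run_step is made up, and with MySQLdb the class to catch would presumably be MySQLdb.Error and the placeholder %s. The two statements are also sent one per execute() call here, which is a choice made for this sketch rather than a confirmed fix for the question.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cachelog")


def run_step(cur, label, query, params, db_error):
    """Execute one statement; log the full traceback instead of hiding a failure."""
    try:
        cur.execute(query, params)
        return True
    except db_error:
        log.exception("%s failed", label)
        return False


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cachelog (id INTEGER, viewcount INTEGER)")
cur.execute("CREATE TABLE datalog (id INTEGER, viewcount INTEGER)")
cur.executemany("INSERT INTO cachelog VALUES (?, ?)", [(1, 1), (2, 1), (3, 3)])

# STEP 1: copy, then delete, with the value bound as a parameter.
copied = run_step(cur, "copy",
                  "INSERT INTO datalog SELECT * FROM cachelog WHERE viewcount = ?",
                  (1,), sqlite3.Error)
deleted = run_step(cur, "delete",
                   "DELETE FROM cachelog WHERE viewcount = ?", (1,), sqlite3.Error)
transferred = cur.rowcount  # rows removed by the DELETE

# A deliberately broken statement: the error is logged, not swallowed.
broken = run_step(cur, "typo", "SELECT COUNT(*) FROM cachelogg", (), sqlite3.Error)

# STEP 2: fetchone() returns a tuple, so take its first column.
cur.execute("SELECT COUNT(*) FROM cachelog WHERE viewcount > 1")
unfetched = cur.fetchone()[0]
conn.commit()
conn.close()
```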

Error checking with MySQLdb

I'm having trouble finding any information on how to do error checking on MySQLdb. I have been trying to do a simple update command for a MySQL database and it simply is not working. No matter how I change the terms, or the type of variables I submit to it.
Here are some of my (commented out) attempts:
timeid = twitseek['max_id']
#timeup = "UPDATE `timeid` set `timestamp`='" + str(timeid) + "';"
#print timeup
#c.execute(timeup)
#timeup = "UPDATE timeid SET timestamp=\"" + str(timeid) + "\"";
#timeup = "UPDATE timeid set timestamp = '500';"
timeup = 500
c.execute("""UPDATE timeid SET timestamp = %d;""", timeup)
#c.execute(timeup)
All I want to do is upload the value of timeid to the timestamp column's first value (or any value) in the table timeid.
Nothing I do seems to work and I've been sitting here for literally hours trying countless iterations.
You seem to be missing an obligatory call to .commit() on your connection object to commit your change.
# Your cursor is c
# We don't see your connection object, but assuming it is conn...
c.execute("""UPDATE timeid SET timestamp = %d;""", timeup)
conn.commit()
The above method will produce valid SQL, but you don't get the security benefit of prepared statements this way. The proper method to pass in parameters is to use %s as the placeholder, whatever the Python type of the value, and pass in a tuple of parameters:
c.execute("UPDATE timeid SET timestamp = %s;", (timeup,))
conn.commit()
From the MySQLdb FAQ:
Starting with 1.2.0, MySQLdb disables autocommit by default, as
required by the DB-API standard (PEP-249). If you are using InnoDB
tables or some other type of transactional table type, you'll need to
do connection.commit() before closing the connection, or else none of
your changes will be written to the database.
Conversely, you can also use connection.rollback() to throw away any
changes you've made since the last commit.
As far as error checking goes, a failed connection or a syntactically invalid query will throw an exception. So you would want to wrap it in a try/except as is common in Python.
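Putting the pieces together, a hedged sketch of an update with basic error checking: bind the value, commit on success, roll back and re-raise on a database error, and return the affected row count so that an UPDATE which changed nothing is noticeable. The function name set_timestamp is made up for illustration, and the demo uses the standard-library sqlite3 module (placeholder ?) so it runs without a server; with MySQLdb the error class would presumably be MySQLdb.Error and the placeholder %s.

```python
import sqlite3


def set_timestamp(conn, value, db_error, placeholder="%s"):
    """Update timeid.timestamp and return the number of rows changed."""
    cur = conn.cursor()
    try:
        cur.execute("UPDATE timeid SET timestamp = " + placeholder, (value,))
        conn.commit()
        return cur.rowcount
    except db_error:
        conn.rollback()  # throw away the partial change, then report the error
        raise
    finally:
        cur.close()


# Demo with SQLite standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timeid (timestamp INTEGER)")
conn.execute("INSERT INTO timeid VALUES (0)")
conn.commit()

changed = set_timestamp(conn, 500, sqlite3.Error, placeholder="?")
stored = conn.execute("SELECT timestamp FROM timeid").fetchone()[0]

# With the table gone, the failure surfaces as an exception with a message.
conn.execute("DROP TABLE timeid")
try:
    set_timestamp(conn, 501, sqlite3.Error, placeholder="?")
    error_message = None
except sqlite3.Error as exc:
    error_message = str(exc)
conn.close()
```

A return value of 0 would mean the statement ran but changed no rows, for example because the table is empty.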
