Getting error when INSERT into MySQL - python

_mysql_exceptions.Warning: Incorrect string value: '\xE7\xB9\x81\xE9\xAB\x94...' for column 'html' at row 1
def getSource(theurl, moved = 0):
    if moved == 1:
        theurl = urllib2.urlopen(theurl).geturl()
    urlReq = urllib2.Request(theurl)
    urlReq.add_header('User-Agent', random.choice(agents))
    urlResponse = urllib2.urlopen(urlReq)
    htmlSource = urlResponse.read()
    return htmlSource

new_u = Url(source_url = source_url, source_url_short = source_url_short, source_url_hash = source_url_hash, html = htmlSource)
new_u.save()
Why is this happening?
I am basically downloading the source of a page and then saving it to a database using Django.
It only happens sometimes; other times it works fine.
Edit: it seems like I have to set the database to UTF-8? What is the command to do that?

You basically need to ensure a proper string encoding: the string you provide to Django is not UTF-8 encoded, and therefore some characters cannot be stored.
Some helpful advice on how to find the encoding of the requested page can be found here: urllib2 read to Unicode
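A minimal sketch of that approach, assuming the Python 2 urllib2 setup from the question and that the server declares a charset in its Content-Type header (the Url fields are the ones from the question):

import urllib2

urlResponse = urllib2.urlopen(urlReq)
raw = urlResponse.read()
# Use the charset the server declared; fall back to UTF-8 if it declared none.
charset = urlResponse.headers.getparam('charset') or 'utf-8'
# Decode to unicode (dropping undecodable bytes) so Django can store it as UTF-8.
htmlSource = raw.decode(charset, 'ignore')
new_u = Url(source_url = source_url, source_url_short = source_url_short, source_url_hash = source_url_hash, html = htmlSource)
new_u.save()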

There are two ways to go if you want to alter the character set in MySQL.
The first is the default for the database, see MySQL Alter database,
and the second is per-table: MySQL Alter Table.
The database gives the default charset for, I believe, new tables. This
can be overridden on a per-table basis, which you need to do since you
already have tables. "utf8" is a supported character set.
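A minimal sketch of the corresponding statements, run through a cursor (the database name mydb and the table name myapp_url are placeholders; any MySQL client works just as well):

import MySQLdb

conn = MySQLdb.connect(host='localhost', user='user', passwd='pass', db='mydb')
cur = conn.cursor()
# Change the default charset used for tables created from now on.
cur.execute("ALTER DATABASE mydb CHARACTER SET utf8 COLLATE utf8_general_ci")
# Convert an existing table (and its text columns) to utf8.
cur.execute("ALTER TABLE myapp_url CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci")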
Also have a look at Blog about UTF8 with django and MySQL.

Related

Python error - Unicode/Ascii problems with value pulled out of MySql database

This has been asked a million times, but every single thing I have tried hasn't worked, and the existing answers are all for slightly different issues. I'm losing my mind over it!
I have a Python script which pulls data from a MySQL database - all works well.
I believe the information in the database is correct. I am trying to parse multiple records into Word documents, which is why I am not too bothered about accuracy; even if the bad characters are simply removed, that is fine.
Database information:
The charset of the database is UTF-8, and the field I am working with is VARCHAR.
I am using the mysql.connector Python module to connect.
However, I am getting errors, and I've realised it's because of values with invisible Unicode/control characters in them, such as this:
The value of this item is "DOMAINoardroom" (it actually contains a \x08 backspace character, as the bytes below show).
I have tried:
text = order[11].encode().decode("utf-8")
text = order[11].encode("ascii", errors="ignore").decode()
text = str(order[11].encode("utf8", errors="ignore"))
The last one does work; however, it outputs b'DOMAIN\x08oardroom' because the value is still bytes.
I can print(text) to the screen without error. However, when I try to output it to a Word document (using the docx module), it produces an error:
table = document.add_table(rows=total_orders*2, cols=1)
row = table.rows[0].cells
row[0].text = row_text
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
I am not particularly fussy over how it handles the unicode, e.g. remove it if needed, but I just need it to parse without error.
Any thoughts or advice here?
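One way to satisfy the "no NULL bytes or control characters" constraint is to strip control characters (such as the \x08 backspace above) before assigning the text. A minimal sketch, where the regex and the clean_for_docx helper are illustrative assumptions:

import re

# ASCII control characters (including \x08 backspace), except tab and newline.
_CONTROL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]')

def clean_for_docx(value):
    # Decode bytes first if needed, then drop the characters python-docx rejects.
    if isinstance(value, bytes):
        value = value.decode('utf-8', errors='ignore')
    return _CONTROL_CHARS.sub('', value)

row[0].text = clean_for_docx(order[11])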

Inserting HTML into mysql Database showing Type Error [duplicate]

This question already has an answer here:
How do I use SQL parameters with python?
(1 answer)
Closed 5 years ago.
I am developing a web app using Flask. At some point, I have to insert a certain HTML snippet into a MySQL database:
<h3>Welcome!</h3>
<p>Some text</p>
When I insert it into the database (as it is returned by Flask's render_template function), it looks like this:
\n\n<h3>Welcome!</h3>\n\n\n\n<p>Some text</p>
I get the following error:
TypeError: ProgrammingError(1064, "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '\\\\\\\\n\\\\n<h3>Welcome!</h3>\\\\n\\\\n\\\\n\\\\n<p>Some text' at line 1") is not JSON serializable
First, I don't understand what 'JSON serializable' means, and I want to know what I am doing wrong. I have already tried removing the line breaks (\n), but it still shows the same error. Why? I am thankful for any answer you can provide.
Two solutions that are commonly used when writing HTML to a database are to:
1) Simply convert the database field type to BLOB so it will accept binary data, and then encode the HTML to binary (example below).
2) Leave the database field as a text field, but base64-encode the data so that the database will not complain about illegal characters.
# Example for case 1.
# Note that you need to make sure the database field is a BLOB:
html = '<h3>Welcome!</h3>\n<p>Some text</p>'
binhtml = html.encode()  # avoid shadowing the built-in bin()
# Pass the value as a query parameter so the driver escapes it.
dbhandle.execute('INSERT INTO script (various fields, binhtml) VALUES (..., %s)', (binhtml,))
# When you read back the data, remember to decode.
dbhandle.execute('SELECT binhtml FROM script WHERE...')
resultset = dbhandle.fetchall()
htmlresult = resultset[0][0].decode()  # fetchall() returns rows of tuples
# Example for case 2.
# Database field can be a text/varchar type because base64 ensures it will work.
import base64
html = '<h3>Welcome!</h3>\n<p>Some text</p>'
# Convert HTML into base64-encoded *text* so it can be stored in a text field.
encoded = base64.b64encode(html.encode()).decode()
# Do the database INSERT.
...
# Retrieve the stored text from the database and convert back to HTML.
dbhandle.execute('SELECT encodedhtml FROM script WHERE...')
resultset = dbhandle.fetchall()
htmlresult = base64.b64decode(resultset[0][0]).decode()
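Since the question was closed as a duplicate of "How do I use SQL parameters with python?", the more direct fix is to pass the HTML as a query parameter rather than pasting it into the SQL string, so the driver escapes the newlines and quotes for you. A minimal sketch (the cursor, connection, and table/column names are placeholders):

html = '\n\n<h3>Welcome!</h3>\n\n\n\n<p>Some text</p>'
# Let the driver escape the value; never interpolate it into the SQL yourself.
cursor.execute("INSERT INTO script (html) VALUES (%s)", (html,))
connection.commit()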

Python MySQL connector returns bytearray instead of regular string value

I am loading data from one table into pandas and then inserting that data into a new table. However, instead of a normal string value I am seeing a bytearray.
bytearray(b'TM16B0I8') when it should be TM16B0I8
What am I doing wrong here?
My code:
engine_str = 'mysql+mysqlconnector://user:pass@localhost/db'
engine = sqlalchemy.create_engine(engine_str, echo=False, encoding='utf-8')
connection = engine.connect()
th_df = pd.read_sql('select ticket_id, history_date', con=connection)
for row in th_df.to_dict(orient="records"):
    var_ticket_id = row['ticket_id']
    var_history_date = row['history_date']
    query = 'INSERT INTO new_table(ticket_id, history_date)....'
For some reason the Python MySQL connector only returns bytearrays (more info in How return str from mysql using mysql.connector?), but you can decode them into Unicode strings with:
var_ticket_id = row['ticket_id'].decode()
var_history_date = row['history_date'].decode()
Make sure you are using the right collation and encoding. I happened to be using UTF8MB4_BIN for one of my website's db tables; changing it to utf8mb4_general_ci did the trick.
Producing a bytearray is now the expected behaviour.
It changed with mysql-connector-python 8.0.24 (2021-04-20). According to the v8.0.24 release notes, "Binary columns were returned as strings instead of 'bytes' or 'bytearray'" behaviour was a bug that was fixed in that release.
So producing a Python bytearray is the correct behaviour if the database column is a binary type (e.g. BINARY or VARBINARY). Previously it produced a Python string, but now it produces a bytearray.
So either change the data type in the database to a non-binary type, or convert the bytearray to a string in your code. If the column is nullable, you'll have to check for that first, since attempting to invoke the decode() method on None would produce an error. You'll also have to be sure the bytes represent a valid string in the character encoding being used for the decoding/conversion.
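A minimal sketch of that conversion, assuming the bytes are UTF-8 and the column may be NULL (the to_str helper name is illustrative):

def to_str(value):
    # Binary columns come back as bytes/bytearray; NULL comes back as None.
    if value is None:
        return None
    if isinstance(value, (bytes, bytearray)):
        return value.decode('utf-8')
    return value

var_ticket_id = to_str(row['ticket_id'])
var_history_date = to_str(row['history_date'])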
Much easier: see How to return str from MySQL using mysql.connector?
Adding mysql-connector-python==8.0.17 to requirements.txt resolved this issue for me ("pip install mysql-connector-python" from a terminal).

pickle unicode strings with non-ascii characters to mysql in django

Consider that I have a dictionary that I want to store in the db using Python's pickle.
My question is: which django models' field should I use?
So far I've been using a CharField, but there seems to be an error:
I pickle a u'\xe9' (i.e. 'é'), and I get:
Incorrect string value: '\xE1, ist...' for column 'edition' at row 1
(the ,"ist..." was because I have more text after the 'É').
I'm using
import cPickle

data = dict()
data['foo'] = input_that_has_the_caracter
to_save_in_db = cPickle.dumps(data)
Should I use a binary field and pickle with a protocol that uses binary? Because I have to change the db in order to do that, so it is better to be sure first...
You should check whether you are using a proper encoding for your table AND column in your database backend (I'm assuming MySQL, since your error message seems to come from it). In MySQL, columns can have a different encoding than the table. See if it's UTF-8.
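A quick way to check, as a sketch (the connection details and the table name mytable are placeholders), is to ask MySQL for the collation of each column:

import MySQLdb

conn = MySQLdb.connect(host='localhost', user='user', passwd='pass', db='mydb')
cur = conn.cursor()
# The Collation value (e.g. utf8_general_ci vs latin1_swedish_ci) reveals the charset.
cur.execute("SHOW FULL COLUMNS FROM mytable")
for col in cur.fetchall():
    print("%s: %s" % (col[0], col[2]))  # column name, collation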

How to solve this double encoding?

I'm developing a website using Python to preprocess requests and a MySQL database to store information.
All my tables are utf8, and I also use utf8 as the Content-Type.
I have this code to establish connection to the db:
database_connection = MySQLdb.connect(host = database_host, user = database_username, passwd = database_password, db = database_name, use_unicode = True)
cursor = database_connection.cursor()
cursor.execute("""SET NAMES utf8;""");
cursor.execute("""SET CHARACTER SET utf8;""");
cursor.execute("""SET character_set_connection=utf8;""");
Running a simple test on my GoDaddy hosting, printing the result of a simple SELECT query like this:
print results.encode("utf-8")
shows a double-encoded string (so each non-ASCII character is transformed into two different special characters). But if I leave out the encode call, it gives an encoding error for each non-ASCII letter.
It sounds as though results contains a Unicode string that was incorrectly decoded from a byte string coming from the database. I.e. when you read the data from the database, it decoded the byte string as Latin-1 rather than the UTF-8 it really is.
So if you fix the decoding of the database contents, then you should be in business.
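If that diagnosis is right, the usual repair is to undo the wrong step: re-encode the mis-decoded text as Latin-1 to recover the original bytes, then decode those bytes as UTF-8. A minimal sketch, assuming results holds the value from the query above:

# Latin-1 maps each of the first 256 code points back to the same byte,
# so this recovers the raw bytes, which are then decoded as the UTF-8 they really are.
fixed = results.encode('latin-1').decode('utf-8')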
I use something like this, which I found on the internet during one of my own encoding hunts. You can keep chaining encoding styles until you find a fit.
Also, as others said, try fixing the source first. This hack is just to figure out what encoding is actually being returned. Hope this helps.
# this method is a simple recursive hack that is going to find a compatible encoding for the problematic field
# does not guarantee a successful encoding match. If no match is found, an error code will be returned: ENC_ERR
def findencoding(field, level):
    print "level: " + str(level)
    try:
        if level == 0:
            field = field.encode('cp1252')
        elif level == 1:
            field = field.encode('cp1254')
        else:
            return "ENC_ERR"
    except Exception:
        field = findencoding(field, level + 1)
    return field
