UnicodeError with SqlAlchemy + Firebird + FDB - python

I'm trying to display results from a Firebird 3.x database, but I get:
File "/...../Envs/pos/lib/python3.6/site-packages/fdb/fbcore.py", line 479, in b2u
    return st.decode(charset)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 9: invalid continuation byte
This happens even though I set utf-8 everywhere:
# -*- coding: utf-8 -*-
import os
os.environ["PYTHONIOENCODING"] = "utf8"
from sqlalchemy import *

SERVIDOR = "localhost"
BASEDATOS_1 = "db.fdb"
PARAMS = dict(
    user="SYSDBA",
    pwd="masterkey",
    host="localhost",
    port=3050,
    path=BASEDATOS_1,
    charset='utf-8'
)
firebird = create_engine("firebird+fdb://%(user)s:%(pwd)s@%(host)s:%(port)d/%(path)s?charset=%(charset)s" % PARAMS, encoding=PARAMS['charset'])

def select(eng, sql):
    with eng.connect() as con:
        return eng.execute(sql)

for row in select(firebird, "SELECT * from clientes"):
    print(row)

I had the same problem.
In my situation the database was not in UTF-8.
After setting the correct charset in the connection string it worked: ?charset=ISO8859_1
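For example, with the engine URL from the question it is just a matter of swapping the charset parameter (a minimal sketch; ISO8859_1 is only correct if that is the character set your database was actually created with):

PARAMS['charset'] = 'ISO8859_1'  # assumption: the database really is ISO8859_1, not UTF-8
firebird = create_engine(
    "firebird+fdb://%(user)s:%(pwd)s@%(host)s:%(port)d/%(path)s?charset=%(charset)s" % PARAMS
)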

I would try to use the module unidecode.
Your script is crashing when it tries to convert, so this module can help you. As the module documentation says:
The module exports a single function that takes an Unicode object
(Python 2.x) or string (Python 3.x) and returns a string (that can be
encoded to ASCII bytes in Python 3.x)
First install it with pip, then try this:
import unidecode
...
if type(line) is unicode:
    line = unidecode.unidecode(line)
I hope it solves your problem.
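For reference, a minimal sketch of how this could be wired into the question's loop (assuming Python 3, where each fetched text value is already a str):

import unidecode

for row in select(firebird, "SELECT * from clientes"):
    # transliterate non-ASCII text to a close ASCII approximation (lossy)
    print([unidecode.unidecode(col) if isinstance(col, str) else col for col in row])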

Related

Python Encoding NLTK - 'charmap' codec can't encode character

import pypyodbc
from pypyodbc import *
import nltk
from nltk import *
import csv
import sys
import codecs
import re
#connect to the database
conn = pypyodbc.connect('Driver={Microsoft Access Driver (*.Mdb)};\
DBQ=C:\\TextData.mdb')
#create a cursor to control the database with
cur = conn.cursor()
cur.execute('''SELECT Text FROM MessageCreationDate WHERE Tags LIKE 'GHS - %'; ''')
TextSet = cur.fetchall()
ghsWordList = []
TextWords = list(TextSet)
for row in TextWords:
    message = re.split('\W+', str(row))
    for eachword in message:
        if eachword.isalpha():
            ghsWordList.append(eachword.lower())
print(ghsWordList)
When I run this code, it's giving me an error:
'charmap' codec can't encode character '\u0161' in position 2742: character maps to <undefined>
I've looked at a number of other answers to similar questions here and googled extensively; however, I am not well versed enough in Python or character encoding to know where I need to use the codecs module to change the character set being used to present/append/create the list.
Could someone not only help me with the code but also point me in the direction of some good reading material for understanding this sort of thing?
If you are using Python 2.x, add the following lines to your code:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
Note: if you are using Python 3.x, reload is not a built-in; it is imp.reload(), so an import needs to be added for my solution to work. I don't develop in 3.x, so my suggestion is:
from imp import reload
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
Place this ahead of all your other imports.
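A minimal sketch of what the top of the script would then look like (placement only; nothing else in the script changes, Python 2.x):

# encoding workaround first, before any other imports
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

import pypyodbc
import nltk
import csv
import re
# ...the rest of the original imports and code follow unchanged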

Python output replaces non ASCII characters with �

I am using Python 2.7 to read data from a MySQL table. In MySQL the name looks like this:
Garasa, Ángel.
But when I print it in Python the output is
Garasa, �ngel
The character set name in MySQL is utf8.
This is my Python code:
# coding: utf-8
import MySQLdb
connection = MySQLdb.connect(
    host="localhost", user="root", passwd="root", db="jmdb")
cursor = connection.cursor()
cursor.execute("select * from actors where actorid=672462;")
data = cursor.fetchall()
for row in data:
    print "IMDB Name=", row[4]
    wiki = ("".join(row[4]))
    print wiki
I have tried decoding it, but get errors such as:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc1 in position 8:
invalid start byte
I have read about decoding and UTF-8 but couldn't find a solution.
Get the Mysql driver to return Unicode strings instead. This means that you don't have to deal with decoding in your code.
Simply set use_unicode=True in the connection parameters. If the table has been set with a specific encoding then set the charset attribute accordingly.
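For example (a sketch based on the connection from the question; pick the charset that matches how the table was actually created):

# ask the driver to decode for us and return unicode objects
connection = MySQLdb.connect(host="localhost", user="root", passwd="root",
                             db="jmdb", use_unicode=True,
                             charset="utf8")  # or e.g. "latin1" if that is the table's real encoding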
I think the right character mapping in your case is cp1252:
>>> s = 'Garasa, Ángel.'
>>> s.decode('utf-8')
Traceback (most recent call last):
File "<pyshell#63>", line 1, in <module>
s.decode('utf-8')
File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc1 in position 8: invalid start byte
>>> s.decode('cp1252')
u'Garasa, \xc1ngel.'
>>>
>>> print s.decode('cp1252')
Garasa, Ángel.
EDIT: It could also be latin-1:
>>> s.decode('latin-1')
u'Garasa, \xc1ngel.'
>>> print s.decode('latin-1')
Garasa, Ángel.
The cp1252 and latin-1 code pages coincide for all codes except the range 128 to 159.
Quoting from this source (latin-1):
The Windows-1252 codepage coincides with ISO-8859-1 for all codes
except the range 128 to 159 (hex 80 to 9F), where the little-used C1
controls are replaced with additional characters including all the
missing characters provided by ISO-8859-15
And this one (cp1252):
This character encoding is a superset of ISO 8859-1, but differs from
the IANA's ISO-8859-1 by using displayable characters rather than
control characters in the 80 to 9F (hex) range.
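Applied to the loop from the question, the decode might look like this (a sketch; cp1252 is an assumption about how the bytes were actually stored):

for row in data:
    # assumption: the raw bytes coming back from MySQLdb are cp1252/latin-1 encoded
    name = row[4].decode('cp1252')
    print "IMDB Name=", name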

Using postgres thru ODBC in python 2.7

I have installed Postgres.app and started it.
I have pip installed pypyodbc
I have copied the hello world lines from the pypyodbc docs and received the error below. Any ideas what the issue might be?
Here is my code:
from __future__ import print_function
import pypyodbc
import datetime
conn = pypyodbc.connect("DRIVER={psqlOBDC};SERVER=localhost")
And I receive this error:
File "/ob/pkg/python/dan27/lib/python2.7/site-packages/pypyodbc.py", line 975, in ctrl_err
err_list.append((from_buffer_u(state), from_buffer_u(Message), NativeError.value))
File "/ob/pkg/python/dan27/lib/python2.7/site-packages/pypyodbc.py", line 482, in UCS_dec
uchar = buffer.raw[i:i + ucs_length].decode(odbc_decoding)
File "/ob/pkg/python/dan27/lib/python2.7/encodings/utf_32.py", line 11, in decode
return codecs.utf_32_decode(input, errors, True)
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-1: truncated data
What am I doing wrong?
Do I need to somehow initialize the DB / tables first? It is a weird error if that is the issue.
I copied your code on my Fedora machine and it worked once I changed the connection string to something like:
conn = pypyodbc.connect("Driver={PostgreSQL};Server=IP address;Port=5432;Database=myDataBase;Uid=myUsername;Pwd=myPassword;")
You can find more connection strings for PostgreSQL and ODBC at: https://connectionstrings.com/postgresql-odbc-driver-psqlodbc/
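A quick end-to-end check could then look like this (a sketch; server, database and credentials are placeholders to replace with your own):

import pypyodbc

conn = pypyodbc.connect("Driver={PostgreSQL};Server=localhost;Port=5432;"
                        "Database=myDataBase;Uid=myUsername;Pwd=myPassword;")
cur = conn.cursor()
cur.execute("SELECT version();")
print(cur.fetchone())
conn.close()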

Insert record of utf-8 character (Chinese, Arabic, Japanese.. etc) into GAE datastore programmatically with python

I just want to build a simple UI translation feature in GAE (using the Python SDK).
def add_translation(self, pid=None):
    trans = Translation()
    trans.tlang = db.Key("agtwaW1kZXNpZ25lcnITCxILQXBwTGFuZ3VhZ2UY8aIEDA")
    trans.ttype = "UI"
    trans.transid = "ui-about"
    trans.content = "关于我们"
    trans.put()
This results in an encoding error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
How do I correctly insert content containing Unicode (UTF-8) characters?
Use the u prefix:
>>> s=u"关于我们"
>>> print s
关于我们
Or explicitly, stating the encoding:
>>> s=unicode('אדם מתן', 'utf8')
>>> print s
אדם מתן
Read more at the Unicode HOWTO page in the python documentation site.
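Applied to the code in the question, only the assignment changes (and the source file needs a # -*- coding: utf-8 -*- declaration at the top):

trans.content = u"关于我们"  # unicode literal instead of a plain byte string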

Why am I getting this UnicodeEncodeError when I am inserting into the MySQL database?

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 2: ordinal not in range(128)
I changed my database default to be utf-8, and not "latin"... but this error still occurs. Why?
This is in my.cnf. Am I doing this wrong? I just want EVERYTHING TO BE UTF-8.
init_connect='SET collation_connection = utf8_general_ci'
init_connect='SET NAMES utf8'
default-character-set=utf8
character-set-server = utf8
collation-server = utf8_general_ci
default-character-set=utf8
MySQLdb.connect(read_default_*) options won't set the character set from default-character-set. You will need to set this explicitly:
MySQLdb.connect(..., charset='utf8')
Or use the equivalent setting in your Django DATABASES settings.
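If you are going through Django, that would look roughly like this (a sketch; everything except the OPTIONS line is a placeholder for your existing settings):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mydb',            # placeholder
        'USER': 'myuser',          # placeholder
        'PASSWORD': 'mypassword',  # placeholder
        'OPTIONS': {'charset': 'utf8'},  # passed straight to MySQLdb.connect
    }
}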
If you get an exception from Python then it's nothing to do with MySQL -- the error happens before the expression is sent to MySQL. I would presume that the MySQLdb driver doesn't handle unicode.
If you are dealing with the raw MySQLdb interface this will be somewhat annoying (database wrappers like SQLAlchemy will handle this stuff for you), but you might want to create a function like this:
def exec_sql(conn_or_cursor, sql, *args, **kw):
    # accept either a connection or a cursor
    if hasattr(conn_or_cursor, 'cursor'):
        cursor = conn_or_cursor.cursor()
    else:
        cursor = conn_or_cursor
    cursor.execute(_convert_utf8(sql),
                   *(_convert_utf8(a) for a in args),
                   **dict((n, _convert_utf8(v)) for n, v in kw.iteritems()))
    return cursor

def _convert_utf8(value):
    # encode unicode values to UTF-8 byte strings; leave everything else alone
    if isinstance(value, unicode):
        return value.encode('utf8')
    else:
        return value
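Hypothetical usage (the table and column names are invented for illustration; the unicode SQL string gets encoded to UTF-8 before it reaches MySQLdb):

cur = exec_sql(conn, u"INSERT INTO people (name) VALUES ('Caf\xe9')")  # value inlined only for brevity
conn.commit()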
