Pymssql, How to use it to read unicode data from MSSQL2008

I'm using pymssql 1.0.2 and FreeTDS 0.82.7 on Ubuntu 10.10, and I have an MSSQL 2008 server running on Windows 7.
I can connect to the server from Ubuntu using pymssql and FreeTDS, but I can't get Unicode data out of the database. The database collation is Cyrillic_General_CI_AS.
My freetds.conf file looks like this:
[mssql2008]
host=10.0.0.34
port=1433
tds version=7.0
My code looks like this:
import pymssql

conn = pymssql.connect(host=r'10.0.0.34\mssql2008', user='***', password='***', database='eoffice', as_dict=True, charset='iso-8859-1')
crms = conn.cursor()
crms.execute('SELECT cc_Name FROM tblHR_CodeClass')
for row in crms.fetchall():
    # Show only the first row as a test
    print u"Succeeded! Test data: " + row['cc_Name']
    break
Expected result is: "Өмнөговь аймаг"
Actual result is: "ªìíºãîâü àéìàã"
When I use the 'UTF-8' charset, the fetchall() call throws an error saying that the data is outside the range of the code page and can't be read as UTF-8.
How can I get the Unicode data exactly as it is stored in the MSSQL database?
Any help would be appreciated!
Regards,
Orgil

Is it really Unicode data? I.e., is the cc_Name column varchar or nvarchar? It sounds like it's varchar--in which case, try using cp1251 or windows-1251 as the charset instead of iso-8859-1.
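
The suggestion above can be checked without a database. The following is a minimal Python 3 sketch that takes the garbled output from the question, recovers the raw bytes, and decodes them as cp1251. The final step rests on an assumption that is not stated in the question: cp1251 has no code points for Ө/ө, so the data may have been stored with Є/є in those positions. The MONGOLIAN_FIXUP table is hypothetical and only illustrates that idea.

```python
# The garbled text from the question ("ªìíºãîâü àéìàã"), written with escapes.
garbled = "\xaa\xec\xed\xba\xe3\xee\xe2\xfc \xe0\xe9\xec\xe0\xe3"

# ISO-8859-1 maps each byte to exactly one code point, so encoding with it
# recovers the raw bytes the server sent.
raw = garbled.encode("iso-8859-1")

# Decode the same bytes with the Cyrillic code page instead.
as_cp1251 = raw.decode("cp1251")
print(as_cp1251)  # Ємнєговь аймаг

# Assumption: Є/є stand in for Ө/ө, which cp1251 cannot represent.
MONGOLIAN_FIXUP = str.maketrans({"\u0404": "\u04e8", "\u0454": "\u04e9"})
fixed = as_cp1251.translate(MONGOLIAN_FIXUP)
print(fixed)  # Өмнөговь аймаг
```

If the column is nvarchar instead, this substitution should not be needed, and the driver and FreeTDS charset settings are the more likely place to look.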

Related

Python psycopg2 not in utf-8

I use Python to connect to my PostgreSQL database like this:
conn=psycopg2.connect(database="fedour", user="fedpur", password="***", host="127.0.0.1", port="5432")
That works without problems.
But when I run a query and print the rows from the cursor, I get something like this:
"Fran\xc3\xa7ois" instead of "François", and it causes problems when I try to create an XML document from the data.
I think it comes from my encoding, but I haven't found a solution. I tried encode('utf-8'), but it doesn't work.
I have also seen something like this, but it's only for MySQL:
MySQLdb.connect(charset='utf8', init_command='SET NAMES UTF8')
Can you help me? Thanks

Make sure you're using the right encoding by running print conn.encoding. If you need to, you can set the encoding with conn.set_client_encoding('UNICODE') or conn.set_client_encoding('UTF8').
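
The escaped form in the question, "Fran\xc3\xa7ois", is how a UTF-8 byte string looks when its repr is shown, which suggests the bytes themselves are fine and only need decoding. A minimal Python 3 sketch of that step, using plain Python and no database:

```python
# The value shown in the question, as a bytes literal.
raw = b"Fran\xc3\xa7ois"

# Decoding turns bytes into text.
text = raw.decode("utf-8")
print(text)  # François

# Encoding goes the other way (text -> bytes), so it cannot repair a value
# that is already bytes; it just reproduces the same byte string.
round_trip = text.encode("utf-8")
```

On Python 2, psycopg2 can also be asked to return unicode objects directly by registering its UNICODE typecaster with psycopg2.extensions.register_type(psycopg2.extensions.UNICODE); whether that is needed depends on the Python and psycopg2 versions in use.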

Does the value get inserted correctly when the query is executed from the command prompt?
If yes, then the problem is with your cursor execution. Try cur.execute(u"querystring") (the u prefix makes the query a unicode string rather than a byte string).
If no, then you need to configure Postgres to use UTF-8. See the character encoding documentation for Postgres for how to set the default character encoding of a database and how on-the-fly conversion from one encoding to another works.

Problems inserting utf8 data to PostgreSQL with Python

I am reading Scandinavian-language websites with a web crawler and want to insert the text into my PostgreSQL database.
Originally I created my PostgreSQL database with UTF-8 encoding, then manually tried to insert one of the problematic characters like this:
Insert into name (surname) VALUES ('Børre');
This was done in the Windows psql shell.
This gave me the following error: ERROR: invalid byte sequence for encoding "UTF8": 0x9b. After some googling I changed the client encoding to latin1, and the statement then succeeded. The server encoding is still utf8.
When I do the same insert through my Python script, the name appears in my database as B°rre. If I change the client encoding back to utf8, I also get entries with wrong special characters.
My Python script is UTF-8 encoded, and it prints the name correctly.
Insert statement:
con = psycopg2.connect(*database details*)
print("Opened database successfully")
cur = con.cursor()
#INSERT NAME
query = "INSERT INTO name (surname) VALUES (%s) RETURNING id"
data = ('børre',)  # trailing comma makes this a one-element tuple
cur.execute(query,data)
As previously stated, print(personObject.surname) gives 'Børre'
If I try the following:
query = "INSERT INTO name (surname) VALUES (%s) RETURNING id"
data = ('børre'.encode('utf-8'),)  # one-element tuple holding bytes
cur.execute(query,data)
I get the following in my database:
\x62c383c2b8727265

psycopg2 doesn't parse PostgreSQL queries; it just converts the arguments you give it into their PostgreSQL representation.
If you give it an array of bytes, it will convert it to a PostgreSQL BYTEA literal, and
data = ('børre'.encode('utf-8')) gets you a bytes object.
So don't do that; use a string.
The code fragment you have at the top should work.
In the stored value I see ø encoded as hex c383c2b8, which decodes from UTF-8 as the two characters Ã and ¸. It looks to me like Python thinks your script is not written in UTF-8, but in some other code page.
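
The hex value from the question can be reproduced in a few lines, which may make the double encoding easier to see. This is a Python 3 sketch of one way the bytes could have come about, not a reconstruction of the asker's actual environment. The cp850 lines at the end assume the Windows console was using code page 850, which the question does not state.

```python
name = "børre"

# Correct UTF-8 encoding: ø becomes the two bytes c3 b8.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes.hex())  # 62c3b8727265

# Misread those bytes as Latin-1, then encode the result as UTF-8 again.
misread = utf8_bytes.decode("latin-1")
double_encoded = misread.encode("utf-8")
print(double_encoded.hex())  # 62c383c2b8727265, the value from the question

# Assumption: the Windows console used code page 850.
console_byte = "ø".encode("cp850")
print(console_byte)  # b'\x9b', the byte named in the psql error
shown = b"\xf8".decode("cp850")
print(shown)  # °, how a Latin-1 ø byte is displayed under cp850
```

If the cp850 assumption holds, B°rre may be a display artifact of the console and not wrong data in the table, so it is worth checking the stored value with a client whose encoding is known.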

Use the client_encoding keyword, e.g.:
conn = psycopg2.connect("dbname='foo' user='dbuser' password='mypass' client_encoding='utf8'")

MySQLdb can't initialize character set utf-8 error

I am trying to insert some Arabic words into the arabic_word column of the hanswehr2 table in my MariaDB database using the MySQLdb driver.
I was getting a latin-1 encode error. After reading around, I found out that the MySQLdb driver defaults to latin-1 and that I had to explicitly set utf-8 as my charset in the mariadb.connect() call. Source.
The entire database is set to utf-8.
Code:
def insert_into_db(arabic_word, definition):
    try:
        conn = mariadb.connect('localhost', 'root', 'xyz1234passwd', 'hans_wehr', charset='utf-8', use_unicode=True)
        conn.autocommit(True)
        cur = conn.cursor()
        cur.execute("INSERT INTO hanswehr2 (arabic_word , definition) VALUES (%s,%s)", (arabic_word, definition,))
    except mariadb.Error, e:
        print e
        sys.exit(1)
However now I get the following error:
/usr/bin/python2.7 /home/heisenberg/hans_wehr/main.py
Total lines 87672
(2019, "Can't initialize character set utf-8 (path: /usr/share/mysql/charsets/)")
Process finished with exit code 1
I have told the Python MySQL driver to use the utf-8 character set, but it seems to ignore this.
Any input would be highly appreciated.

The charset alias for UTF-8 in MySQL is utf8 (no hyphen).
See https://dev.mysql.com/doc/refman/5.5/en/charset-charsets.html for available charsets.
Note: if you need non-BMP Unicode code points, such as emoji, use utf8mb4 for both the connection charset and the character set of the varchar columns.
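
Because Python and MySQL spell the same encodings differently, one option is to translate the name before passing it to connect(). The helper below is a hypothetical sketch: mysql_charset_name and its lookup table are not part of MySQLdb, and the table covers only the encodings mentioned on this page.

```python
import codecs

# Hypothetical lookup table (not part of MySQLdb): Python's canonical codec
# name -> MySQL character set name.
_PYTHON_TO_MYSQL = {
    "utf-8": "utf8mb4",  # use "utf8" on servers that predate utf8mb4
    "cp1251": "cp1251",
}


def mysql_charset_name(python_name):
    """Translate a Python codec name into the name MySQL expects."""
    # codecs.lookup normalizes aliases, e.g. "UTF8" -> "utf-8".
    canonical = codecs.lookup(python_name).name
    try:
        return _PYTHON_TO_MYSQL[canonical]
    except KeyError:
        raise ValueError("no MySQL charset known for %r" % python_name)


print(mysql_charset_name("utf-8"))  # utf8mb4
```

The returned value would then be passed as the charset argument of the connect() call from the question, in place of 'utf-8'.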

Character sets and collations are related but different: the character set determines how characters are encoded, and the collation determines how they are compared and sorted for a specific language.
https://softwareengineering.stackexchange.com/questions/95048/what-is-the-difference-between-collation-and-character-set
I think you need to specify them when creating your database table or in the connection string. See:
store arabic in SQL database
More on the Python MySQL connection:
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlconnection-set-charset-collation.html

How to solve this double encoding?

I'm developing a website using Python to preprocess requests and a MySQL database to store information.
All my tables are utf8 and I also use utf8 as Content-type.
I have this code to establish connection to the db:
database_connection = MySQLdb.connect(host = database_host, user = database_username, passwd = database_password, db = database_name, use_unicode = True)
cursor = database_connection.cursor()
cursor.execute("""SET NAMES utf8;""")
cursor.execute("""SET CHARACTER SET utf8;""")
cursor.execute("""SET character_set_connection=utf8;""")
When I run a simple test on my GoDaddy hosting and print the results of a simple SELECT query like this:
print results.encode("utf-8")
I get a double-encoded string (every non-ASCII character is turned into two different special characters). But if I leave out the encode call, I get an encoding error for each non-ASCII letter.

It sounds as though results contains a Unicode string that was incorrectly decoded from a byte string coming from the database. I.e. when you read the data from the database, it decoded the byte string as Latin-1 rather than the UTF-8 it really is.
So if you fix the decoding of the database contents, then you should be in business.
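
To illustrate the diagnosis, here is a minimal Python 3 sketch of what "decoded as Latin-1 instead of UTF-8" looks like and how it can be reversed. The helper name fix_latin1_mojibake is hypothetical, and the sample word is borrowed from another question on this page. It only works when the wrong decode really was Latin-1; fixing the connection charset remains the proper solution.

```python
def fix_latin1_mojibake(text):
    """Re-decode text whose UTF-8 bytes were wrongly decoded as Latin-1."""
    # Latin-1 maps code points 0-255 straight back to bytes 0-255, so this
    # recovers the original bytes, which are then decoded correctly.
    return text.encode("latin-1").decode("utf-8")


# "François" after a wrong Latin-1 decode: ç shows up as two characters.
broken = "Fran\u00c3\u00a7ois"
print(broken)                       # FranÃ§ois
print(fix_latin1_mojibake(broken))  # François
```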

I use something like this, which I found on the internet during one of my own encoding hunts. You can keep chaining encodings until you find one that fits.
Also, as others said, try fixing the source first. This hack is just for figuring out which encoding is actually being returned. Hope this helps.
# this method is a simple recursive hack that is going to find a compatible encoding for the problematic field
# does not guarantee successful encoding match. If no match is found, an error code will be returned: ENC_ERR
def findencoding(field, level):
    print "level: " + str(level)
    try:
        if(level == 0):
            field = field.encode('cp1252')
        elif(level == 1):
            field = field.encode('cp1254')
        else:
            return "ENC_ERR"
    except Exception:
        field = findencoding(field,level+1)
    return field

Getting error when INSERT into MySQL

_mysql_exceptions.Warning: Incorrect string value: '\xE7\xB9\x81\xE9\xAB\x94...' for column 'html' at row 1
def getSource(theurl, moved = 0):
    if moved == 1:
        theurl = urllib2.urlopen(theurl).geturl()
    urlReq = urllib2.Request(theurl)
    urlReq.add_header('User-Agent',random.choice(agents))
    urlResponse = urllib2.urlopen(urlReq)
    htmlSource = urlResponse.read()
    return htmlSource
new_u = Url(source_url = source_url, source_url_short = source_url_short, source_url_hash = source_url_hash, html = htmlSource)
new_u.save()
Why is this happening?
I am basically downloading the source of a page and then saving it to a database using Django.
It only happens sometimes; at other times it works fine.
Edit: it seems like I have to set the database to UTF-8? What is the command to do that?

You basically need to ensure a proper string encoding. In other words, the string you pass to Django is not UTF-8 encoded, and therefore some characters can't be resolved.
Some helpful advice on how to find the encoding of the requested page can be found here: urllib2 read to Unicode

There are two ways to go if you want to alter the character set in MySQL.
The first is the database default (see MySQL Alter database), and the second is per table (see MySQL Alter Table).
The database default applies, I believe, to new tables. It can be overridden on a per-table basis, which you need to do since you already have tables. "utf8" is a supported character set.
Also have a look at Blog about UTF8 with django and MySQL.
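
A quick check of the bytes quoted in the warning supports changing the table's character set. The Python 3 sketch below decodes them as UTF-8 and then shows that they have no Latin-1 representation. That the html column currently uses latin1 is an assumption; the question does not say which character set it has.

```python
# The bytes quoted in the MySQL warning.
raw = b"\xE7\xB9\x81\xE9\xAB\x94"

# They are valid UTF-8 and decode to two Chinese characters.
text = raw.decode("utf-8")
print(text)  # 繁體

# Assumption: the column uses latin1, which cannot hold these characters.
try:
    text.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
print(fits_latin1)  # False
```

If that is the cause, a statement of the form ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 converts an existing table, where tbl_name is a placeholder for the real table name.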
