Python 2.7 - Pandas UnicodeEncodeError with data from pyodbc

I'm trying to pull data from SQL Server using pyodbc and load it into a dataframe, then export it to an HTML file, but I keep receiving the following Unicode error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 15500: ordinal not in range(128)
Here is my current setup (encoding instructions per docs):
cnxn = pyodbc.connect('DSN=Planning;UID=USER;PWD=PASSWORD;')
cnxn.setdecoding(pyodbc.SQL_CHAR, encoding='cp1252', to=unicode)
cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='cp1252', to=unicode)
cnxn.setdecoding(pyodbc.SQL_WMETADATA, encoding='cp1252', to=unicode)
cnxn.setencoding(str, encoding='utf-8')
cnxn.setencoding(unicode, encoding='utf-8')
cursor = cnxn.cursor()
with open('Initial Dataset.sql') as f:
    initial_query = f.read()
cursor.execute(initial_query)
columns = [column[0] for column in cursor.description]
initial_data = cursor.fetchall()
i_df = pd.DataFrame.from_records(initial_data, columns=columns)
i_df.to_html('initial.html')
An odd but useful point to note is that when I try to export a CSV:
i_df.to_csv('initial.csv')
I get the same error. However, when I add:
i_df.to_csv('initial.csv', encoding='utf-8')
It works. Can someone help me understand this encoding issue?
Side note: I've also tried using a sqlalchemy connection and pandas.read_sql() and the same error persists.

The second answer on this question seems to be an acceptable workaround, except that Python 2.x users must use io, so:
import io
html = df.to_html()
with io.open("mypage.html", "w", encoding="utf-8") as file:
    file.write(html)
An encoding option for to_html() was not included in the latest release, but it looks like the next version of pandas will have one; see the docs (line 2228).
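Once that option ships, the io workaround should reduce to a single call. A hedged sketch, assuming the encoding parameter lands as drafted in those docs:
# Assumes a pandas release that includes the to_html() encoding option
# referenced above; on current releases, use the io.open workaround instead.
i_df.to_html('initial.html', encoding='utf-8')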

Related

Error importing data to a model in Django

I'm trying to import data from a csv file to a Django model. I'm using the manage.py shell for it with the following code:
>>> import csv
>>> import os
>>> path = "C:\\Users\Lia Love\Downloads"
>>> os.chdir(path)
>>> from catalog.models import ProductosBase
>>> with open('FarmaciasGob.csv') as csvfile:
...     reader = csv.DictReader(csvfile)
...     for row in reader:
...         p = Country(country=row['Country'], continent=row['Continent'])
...         p.save()
...
>>>
>>> exit()
I get the following error message at a given point of the dataset:
UnicodeDecodeError: "charmap" codec can´t decode byte 0x81 in position 7823: character maps to (undefined)
For what I could find, it seems to be a problem with the "latin" encoding of the csv file.
Inspecting the csv, I don´t see nothing special about the specific row where it get´s the error. I´m able to import about 2200 rows before this one, all with latin characters.
Any clues?
Assuming you are on Python 3, this is an issue with the character encoding of your file. Most likely the encoding is 'utf-8', but it could also be 'utf-16', 'utf-16le', 'cp1252', or 'cp437', all of which are also commonly used. In Python 3, you can specify the encoding of the file on open:
with open('FarmaciasGob.csv', encoding='utf-8') as csvfile:
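If you are not sure which codec the file actually uses, a small trial loop can narrow it down. A sketch only: the candidate list mirrors the encodings named above, with 'latin-1' appended last because it accepts every byte value, including the 0x81 from the traceback:
# Try each candidate codec until one decodes the whole file without error.
candidates = ['utf-8', 'utf-16', 'utf-16le', 'cp1252', 'cp437', 'latin-1']
for enc in candidates:
    try:
        with open('FarmaciasGob.csv', encoding=enc) as csvfile:
            csvfile.read()
        print('decodes cleanly as', enc)
        break
    except UnicodeDecodeError:
        continue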

Ignore encoding errors with csv writer and MySQLdb

I've been trying to dump a full MySQL table to csv with Python, and I had it working, but now that table has a "description" column and it's a piece of hell because there are encoding issues everywhere.
After trying tons and tons of things read from other posts, I'm giving up on those characters; I want to skip them directly and avoid those errors.
test.py:
import MySQLdb, csv, codecs
dbConn = MySQLdb.connect(dbServer, dbUser, dbPass, dbName, charset='utf8')
cur = dbConn.cursor()
def createFile(self):
    SQLview = 'SELECT fruit_type, fruit_qty, fruit_price, fruit_description FROM fruits'
    cur.execute(SQLview)
    with codecs.open('fruits.csv', 'wb', encoding='utf8', errors='ignore') as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerows(cur)
Still getting that error from the function:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd1' in position 27: ordinal not in range(128)
Any idea if I can skip that error and still write the rest of the data from the DB query?
PS: The line where the function crashes is:
csv_writer.writerows(cur)
Don't know if that's useful info for someone.
Finally solved it. I changed:
import csv
to:
import unicodecsv as csv
and changed:
csv_writer = csv.writer(csv_file)
to:
csv_writer = csv.writer(csv_file, encoding='utf-8')
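For reference, a minimal sketch of the whole export rewritten around unicodecsv (table, column, and connection names as in the question). One detail worth noting: unicodecsv encodes each field itself and writes bytes, so the file can be opened in plain binary mode instead of through codecs.open:
import MySQLdb
import unicodecsv as csv

# Connection parameters as in the question; substitute your own.
dbConn = MySQLdb.connect(dbServer, dbUser, dbPass, dbName, charset='utf8')
cur = dbConn.cursor()
cur.execute('SELECT fruit_type, fruit_qty, fruit_price, fruit_description FROM fruits')

# unicodecsv handles the unicode -> utf-8 step per field, so no codecs.open.
with open('fruits.csv', 'wb') as csv_file:
    csv_writer = csv.writer(csv_file, encoding='utf-8')
    csv_writer.writerows(cur)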

UnicodeEncode error in python when uploading CSV file to postgres DB

I keep getting a UnicodeEncodeError when trying to upload a CSV file into a Postgres DB using Python 2.7.
First I create the file in CSV format. The file has non-Latin characters, which is why, when I generate it, I encode the second column (the one that has strings):
writer = csv.writer(response, dialect='excel')
writer.writerow(tuple(corresponding_data[btn]["columns"].split(',')))
for row in rows:
    field_1 = row[0]
    field_2 = row[1].encode(encoding='UTF-8')
    fields = [field_1, field_2]
    writer.writerows([fields])
The file is created without errors. When I open it in Excel I see that there are some values like: Dajï¿ï¿
In order to upload the file and save it in a table in Postgres, I use the Python module called CSVKit.
This is what I do:
import codecs
f = codecs.open(absolute_base_file, 'rb', encoding='utf-8')
delimiter = ","
no_header_row = False
try:
    csv_table = table.Table.from_csv(f, name=table_name_temp, no_header_row=no_header_row, delimiter=delimiter)
Although I specify the encoding, I keep getting an error:
<type 'exceptions.UnicodeEncodeError'>
I don't know what else to try here.
EDITED
After checking the values in the DB, I see they don't really have any non-Latin characters, but there are values with whitespace that ends up Unicode-escaped when I save them.
I think this is what is causing the issue.
You can try using unicodecsv instead of the built-in csv.
In the end I flattened the values before writing them into the CSV.
I used the unidecode module as follows:
from unidecode import unidecode
for row in rows:
field_1 = row[0]
field_2 = unidecode(row[1]).encode(encoding='UTF-8') # LINE CHANGED
fields = [field_1, field_2]
writer.writerows([fields])
return response
Although not a permanent solution, this solved my issue for now.
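For anyone wondering what the flattening does: unidecode transliterates non-ASCII characters to their closest ASCII approximation, so the subsequent .encode() can no longer fail. The example below is the one from unidecode's own documentation; note that the data is altered, not preserved:
from unidecode import unidecode

# Accented/caron letters are replaced by plain ASCII look-alikes.
print(unidecode(u'ko\u017eu\u0161\u010dek'))  # prints 'kozuscek'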

Error in reading csv file: 'utf-8' codec can't decode

While running the code to merge (basically an inner join) two csv files, I am facing an error while reading one of the csv files. My code:
import csv
import pandas as pd
s1 = pd.read_csv(".../noun.csv")
s2 = pd.read_csv(".../verb.csv")
merged = s1.merge(s2, on=("userID", "sentID"), how="inner")
merged.to_excel(".../merge1.xlsx", index=False)
Error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 5: invalid start byte
An example of my content is:
verb file
userID sentID verb
['3477' 1 ['am', 'were', 'having', 'attended', 'stopped']
['3477' 2 ['felt', 'thrusting']
noun file
userID sentID Sentences
['3477' 1 Thursday,
['3477' 1 November
You can use a library that attempts to detect the encoding, for example cchardet:
pip install cchardet
If you use Python 2.x you also need a backport of the csv library, since the backport supports Unicode natively while Python 2's csv does not:
pip install backports.csv
Then in your code you can do something like this:
import cchardet
import io
from backports import csv

# detect encoding
with io.open(filename, mode="rb") as f:
    data = f.read()
detect = cchardet.detect(data)
encoding_ = detect['encoding']

# retrieve data
with io.open(filename, encoding=encoding_) as csvfile:
    reader = csv.reader(csvfile, ...)
    ...
I don't know pandas well, but you can do something like this:
# retrieve data
s1 = pd.read_csv(".../noun.csv", encoding=encoding_)

Encoding CSV lists in python

I need some help with the encoding of a list. I'm new to Python, sorry.
First, I'm using Python 2.7.3
I have two lists (entidad & valores), and I need to get them encoded or something like that.
My code:
import urllib
from bs4 import BeautifulSoup
import csv
sock = urllib.urlopen("http://www.fatm.com.es/Datos_Equipo.asp?Cod=01HU0010")
htmlSource = sock.read()
sock.close()
soup = BeautifulSoup(htmlSource)
form = soup.find("form", {'id': "FORM1"})
table = form.find("table")
entidad = [item.text.strip() for item in table.find_all('td')]
valores = [item.get('value') for item in form.find_all('input')]
valores.remove('Imprimir')
valores.remove('Cerrar')
header = entidad
values = valores
print values
out = open('tomate.csv', 'w')
w = csv.writer(out)
w.writerow(header)
w.writerow(values)
out.close()
The log:
UnicodeEncodeError: 'ascii' codec can't encode character
Any ideas? Thanks in advance!
You should encode your data to utf-8 manually; csv.writer doesn't do it for you:
w.writerow([s.encode("utf-8") for s in header])
w.writerow([s.encode("utf-8") for s in values])
#w.writerow(header)
#w.writerow(values)
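One caveat with the comprehensions above: valores is built from item.get('value'), and inputs without a value attribute yield None, on which .encode() would raise an AttributeError. If that can happen in your form, a guarded variant (a sketch, not part of the original answer) is safer:
# Replace None with an empty unicode string before encoding.
w.writerow([(s or u"").encode("utf-8") for s in header])
w.writerow([(s or u"").encode("utf-8") for s in values])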
This appears to be the same type of problem as the one found here: UnicodeEncodeError in csv writer in Python
UnicodeEncodeError in csv writer in Python
Today I was writing a program that generates a csv file after some processing. But I got the following error while trying it on some test data:
writer.writerow(csv_li)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbf' in position 5: ordinal not in range(128)
I looked into the documentation of the csv module in Python and found a class named UnicodeWriter. So I changed my code to:
writer = UnicodeWriter(open("filename.csv", "wb"))
Then I tried to run it again. It got rid of the previous UnicodeEncodeError but ran into another error:
self.writer.writerow([s.encode("utf-8") for s in row])
AttributeError: 'int' object has no attribute 'encode'
So, before writing the list, I had to convert every value to a string:
row = [str(item) for item in row]
I think this line could be added to the writerow function of the UnicodeWriter class.
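A minimal sketch of that suggestion, applied to the writerow method of the UnicodeWriter recipe from the Python 2 csv docs (only the coercion line is new; the encode line follows the quoted traceback):
def writerow(self, row):
    # Coerce every value to unicode first; ints have no .encode() method.
    row = [unicode(item) for item in row]
    self.writer.writerow([s.encode("utf-8") for s in row])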
