I've been trying to export a full MySQL table to CSV with Python, and I had it working well, but now the table has a "description" column and it's a piece of hell: there are encoding issues everywhere.
After trying tons of things read from other posts, I'm giving up on those characters; I want to skip them directly and avoid the errors.
test.py:
import MySQLdb, csv, codecs

dbConn = MySQLdb.connect(dbServer, dbUser, dbPass, dbName, charset='utf8')
cur = dbConn.cursor()

def createFile(self):
    SQLview = 'SELECT fruit_type, fruit_qty, fruit_price, fruit_description FROM fruits'
    cur.execute(SQLview)
    with codecs.open('fruits.csv', 'wb', encoding='utf8', errors='ignore') as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerows(cur)
I'm still getting this error from the function:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd1' in position 27: ordinal not in range(128)
Any idea if I can skip that error and still write the rest of the data from the DB query?
PS: The line where the function crashes is:
csv_writer.writerows(cur)
Don't know if that's useful info for someone.
Finally solved it.
changed:
import csv
to:
import unicodecsv as csv
changed:
csv_writer = csv.writer(csv_file)
to:
csv_writer = csv.writer(csv_file,encoding='utf-8')
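For reference, here's a minimal sketch of the whole fixed script with those two changes applied (dbServer, dbUser, dbPass and dbName are the question's placeholders):

import MySQLdb
import unicodecsv as csv  # drop-in csv replacement that handles unicode

dbConn = MySQLdb.connect(dbServer, dbUser, dbPass, dbName, charset='utf8')
cur = dbConn.cursor()

def createFile():
    SQLview = 'SELECT fruit_type, fruit_qty, fruit_price, fruit_description FROM fruits'
    cur.execute(SQLview)
    # plain open() in binary mode: unicodecsv encodes each row to bytes
    # itself, so codecs.open() is no longer needed
    with open('fruits.csv', 'wb') as csv_file:
        csv_writer = csv.writer(csv_file, encoding='utf-8')
        csv_writer.writerows(cur)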
Related
I'm trying to pull data from SQL Server using pyodbc and load it into a dataframe, then export it to an HTML file, but I keep receiving the following Unicode error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 15500: ordinal not in range(128)
Here is my current setup (encoding instructions per docs):
cnxn = pyodbc.connect('DSN=Planning;UID=USER;PWD=PASSWORD;')
cnxn.setdecoding(pyodbc.SQL_CHAR, encoding='cp1252', to=unicode)
cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='cp1252', to=unicode)
cnxn.setdecoding(pyodbc.SQL_WMETADATA, encoding='cp1252', to=unicode)
cnxn.setencoding(str, encoding='utf-8')
cnxn.setencoding(unicode, encoding='utf-8')
cursor = cnxn.cursor()
with open('Initial Dataset.sql') as f:
    initial_query = f.read()
cursor.execute(initial_query)
columns = [column[0] for column in cursor.description]
initial_data = cursor.fetchall()
i_df = pd.DataFrame.from_records(initial_data, columns=columns)
i_df.to_html('initial.html')
An odd but useful point to note is that when I try to export a CSV:
i_df.to_csv('initial.csv')
I get the same error, however when I add:
i_df.to_csv('initial.csv', encoding='utf-8')
It works. Can someone help me understand this encoding issue?
Side note: I've also tried using a sqlalchemy connection and pandas.read_sql() and the same error persists.
The second answer on this question seems to be an acceptable workaround, except that Python 2.x users must use io, so:
import io
html = df.to_html()
with io.open("mypage.html", "w", encoding="utf-8") as file:
    file.write(html)
It was not included in the latest release, but it looks like the next version of pandas will have an encoding option for to_html(); see the docs (line 2228).
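As for understanding the error itself: on Python 2, any implicit unicode-to-bytes conversion goes through the ASCII codec, which appears to be what to_html() and to_csv() fall back to when no encoding is given. A minimal sketch reproducing the failure in isolation:

# Python 2: converting unicode to bytes defaults to the ASCII codec
text = u'\u2019'             # RIGHT SINGLE QUOTATION MARK
print(text.encode('utf-8'))  # fine: '\xe2\x80\x99'
str(text)                    # UnicodeEncodeError: 'ascii' codec can't encode...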
I'm trying to import data from a CSV file to a Django model. I'm using the manage.py shell for it with the following code:
>>> import csv
>>> import os
>>> path = "C:\\Users\Lia Love\Downloads"
>>> os.chdir(path)
>>> from catalog.models import ProductosBase
>>> with open('FarmaciasGob.csv') as csvfile:
...     reader = csv.DictReader(csvfile)
...     for row in reader:
...         p = Country(country=row['Country'], continent=row['Continent'])
...         p.save()
...
>>>
>>> exit()
I get the following error message at a given point of the dataset:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 7823: character maps to <undefined>
From what I could find, it seems to be a problem with the "latin" encoding of the CSV file.
Inspecting the CSV, I don't see anything special about the specific row where it gets the error. I'm able to import about 2200 rows before this one, all with Latin characters.
Any clues?
Assuming you are on Python 3, this is an issue with the character encoding of your file. Most likely the encoding is 'utf-8', but it could also be 'utf-16', 'utf-16le', 'cp1252', or 'cp437', all of which are also commonly used. In Python 3, you can specify the encoding of the file when you open it:
with open('FarmaciasGob.csv', encoding='utf-8') as csvfile:
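If you're not sure which of those encodings the file actually uses, one hedged approach is to probe the common candidates until one decodes cleanly (the candidate list below is an assumption, not exhaustive):

# Sketch: try likely encodings; the first that decodes without error wins.
# Note cp437 maps all 256 byte values, so it acts as a last-resort fallback.
candidates = ['utf-8', 'utf-16', 'utf-16le', 'cp1252', 'cp437']
for enc in candidates:
    try:
        with open('FarmaciasGob.csv', encoding=enc) as csvfile:
            csvfile.read()
        print('decoded cleanly as', enc)
        break
    except UnicodeDecodeError:
        continue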
I tried to parse an HTML table into CSV using Python with the following script:
from bs4 import BeautifulSoup
import requests
import csv
csvFile = open('log.csv', 'w', newline='')
writer = csv.writer(csvFile)
def parse():
    html = requests.get('https://en.wikipedia.org/wiki/Comparison_of_text_editors')
    bs = BeautifulSoup(html.text, 'lxml')
    table = bs.select_one('table.wikitable')
    rows = table.select('tr')
    for row in rows:
        csvRow = []
        for cell in row.findAll(['th', 'td']):
            csvRow.append(cell.getText())
        writer.writerow(csvRow)
        print(csvRow)

parse()
csvFile.close()
This code output a cleanly formatted CSV file with no encoding issues.
All was fine until it reached Enrico Tröger's Geany: my script was unable to write ö into the CSV file, so I tried this:
csvRow.append(cell.text.encode('ascii', 'replace')) instead of: csvRow.append(cell.getText())
That got rid of the error, except each table cell was then wrapped in b''. So, how can I get a cleanly formatted CSV file without encoding issues (like in the first screenshot), with all non-ASCII symbols replaced or ignored (like in the second screenshot), using my script?
Change this one:
csvFile = open('log.csv', 'w', newline='')
To this one:
csvFile = open('log.csv', 'w', newline='', encoding='utf8')
csv module documentation:
Since open() is used to open a CSV file for reading, the file will by default be decoded into unicode using the system default encoding (see locale.getpreferredencoding()). To decode a file using a different encoding, use the encoding argument of open:
import csv
with open('some.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
The same applies to writing in something other than the system default encoding: specify the encoding argument when opening the output file.
I suppose your system default encoding is not utf8.
You can check it like this:
import locale
locale.getpreferredencoding()
Hope it helps!
Looks like the csv module expects strings, not bytes, so you could decode your bytes before passing them:
cell.text.encode('ascii', 'replace').decode('ascii')
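Applied in the script above, the row-building line would become something like this (a sketch that keeps the 'replace' behaviour from the question, so non-ASCII characters come out as '?'):

# encode/decode round-trip: strips non-ASCII, hands the csv module a str
csvRow.append(cell.getText().encode('ascii', 'replace').decode('ascii'))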
I keep getting a UnicodeEncodeError when trying to upload a CSV file into a Postgres DB using Python 2.7.
First I create the file in CSV format. The file has non-Latin characters, which is why I download it and encode the second column, which contains strings:
writer = csv.writer(response, dialect='excel')
writer.writerow(tuple(corresponding_data[btn]["columns"].split(',')))
for row in rows:
    field_1 = row[0]
    field_2 = row[1].encode(encoding='UTF-8')
    fields = [field_1, field_2]
    writer.writerows([fields])
The file is created without errors. When I open it in Excel I see that there are some values like: Dajï¿ï¿
In order to upload the file and save it in a table in Postgres, I use the Python module CSVKit.
This is what I do:
import codecs
f = codecs.open(absolute_base_file, 'rb', encoding='utf-8')
delimiter = ","
no_header_row = False
try:
    csv_table = table.Table.from_csv(f, name=table_name_temp, no_header_row=no_header_row, delimiter=delimiter)
Although I specify the encoding I keep getting an error:
<type 'exceptions.UnicodeEncodeError'>
I don't know what else to try here.
EDITED
After checking the values in the DB, I see they don't really have any non-Latin characters, but there are values with whitespace, and the whitespace seems to get Unicode-escaped when I save them.
I think this is what is causing the issue.
You can try using unicodecsv instead of the built-in csv.
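A minimal sketch of that suggestion, assuming rows of unicode values as in the question ('output.csv' is a placeholder name):

import unicodecsv

# unicodecsv encodes each value on write, so no manual .encode() per field
with open('output.csv', 'wb') as f:  # binary mode on Python 2
    writer = unicodecsv.writer(f, encoding='utf-8')
    writer.writerows(rows)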
In the end I flattened the values before writing them into the CSV.
I used the unidecode module as follows:
from unidecode import unidecode
for row in rows:
    field_1 = row[0]
    field_2 = unidecode(row[1]).encode(encoding='UTF-8')  # LINE CHANGED
    fields = [field_1, field_2]
    writer.writerows([fields])
return response
Although not a permanent solution, this solved my issue for now.
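For context, unidecode transliterates non-ASCII text to an ASCII approximation, so values are flattened rather than dropped. A quick illustration:

from unidecode import unidecode
print(unidecode(u'\xd1'))    # Ñ -> 'N'
print(unidecode(u'Tröger'))  # -> 'Troger'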
I need some help with the encoding of a list. I'm new to Python, sorry.
First, I'm using Python 2.7.3.
I have two lists (entidad and valores), and I need to get them encoded or something like that.
My code:
import urllib
from bs4 import BeautifulSoup
import csv
sock = urllib.urlopen("http://www.fatm.com.es/Datos_Equipo.asp?Cod=01HU0010")
htmlSource = sock.read()
sock.close()
soup = BeautifulSoup(htmlSource)
form = soup.find("form", {'id': "FORM1"})
table = form.find("table")
entidad = [item.text.strip() for item in table.find_all('td')]
valores = [item.get('value') for item in form.find_all('input')]
valores.remove('Imprimir')
valores.remove('Cerrar')
header = entidad
values = valores
print values
out = open('tomate.csv', 'w')
w = csv.writer(out)
w.writerow(header)
w.writerow(values)
out.close()
the log: UnicodeEncodeError: 'ascii' codec can't encode character
any ideas? Thanks in advance!!
You should encode your data to UTF-8 manually; csv.writer doesn't do it for you:
w.writerow([s.encode("utf-8") for s in header])
w.writerow([s.encode("utf-8") for s in values])
#w.writerow(header)
#w.writerow(values)
This appears to be the same type of problem as was found here: UnicodeEncodeError in csv writer in Python
Today I was writing a program that generates a CSV file after some processing, but I got the following error while trying it on some test data:
writer.writerow(csv_li)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbf' in position 5: ordinal not in range(128)
I looked into the documentation of the csv module in Python and found a class named UnicodeWriter, so I changed my code to:
writer = UnicodeWriter(open("filename.csv", "wb"))
Then I tried to run it again. It got rid of the previous UnicodeEncodeError but ran into another error:
self.writer.writerow([s.encode("utf-8") for s in row])
AttributeError: 'int' object has no attribute 'encode'
So, before writing the list, I had to change every value to a string:
row = [str(item) for item in row]
I think this line can be added to the writerow function of the UnicodeWriter class.
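For reference, a sketch of that tweak applied to the UnicodeWriter recipe from the Python 2 csv docs. One caveat: str() would itself raise UnicodeEncodeError on a unicode value containing non-ASCII characters, so unicode() is the safer coercion here:

def writerow(self, row):
    row = [unicode(item) for item in row]  # coerce ints, floats, ... to text
    self.writer.writerow([s.encode("utf-8") for s in row])
    data = self.queue.getvalue()
    data = data.decode("utf-8")
    data = self.encoder.encode(data)
    self.stream.write(data)
    self.queue.truncate(0)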