I need some help with the encoding of a list; sorry, I'm new to Python.
First, I'm using Python 2.7.3.
I have two lists (entidad & valores), and I need to get them encoded, or something like that.
My code:
import urllib
from bs4 import BeautifulSoup
import csv
sock = urllib.urlopen("http://www.fatm.com.es/Datos_Equipo.asp?Cod=01HU0010")
htmlSource = sock.read()
sock.close()
soup = BeautifulSoup(htmlSource)
form = soup.find("form", {'id': "FORM1"})
table = form.find("table")
entidad = [item.text.strip() for item in table.find_all('td')]
valores = [item.get('value') for item in form.find_all('input')]
valores.remove('Imprimir')
valores.remove('Cerrar')
header = entidad
values = valores
print values
out = open('tomate.csv', 'w')
w = csv.writer(out)
w.writerow(header)
w.writerow(values)
out.close()
The log: UnicodeEncodeError: 'ascii' codec can't encode character
Any ideas? Thanks in advance!
You should encode your data to UTF-8 manually; csv.writer doesn't do it for you:
w.writerow([s.encode("utf-8") for s in header])
w.writerow([s.encode("utf-8") for s in values])
#w.writerow(header)
#w.writerow(values)
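In Python 3 this particular error largely disappears, because csv.writer accepts text directly when the file is opened with an explicit encoding. A minimal sketch of both approaches (the header cells here are hypothetical stand-ins for the scraped values):

```python
import csv
import io

# hypothetical cells containing non-ASCII characters
header = [u'Equipo', u'Categor\xeda']

# Python 2 approach: encode each cell to UTF-8 bytes before writing
encoded = [s.encode('utf-8') for s in header]
assert encoded[1] == b'Categor\xc3\xada'

# Python 3 approach: open the file with an encoding and write text directly
with io.open('tomate.csv', 'w', encoding='utf-8', newline='') as out:
    csv.writer(out).writerow(header)
```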
This appears to be the same type of problem as the one found here:
UnicodeEncodeError in csv writer in Python
Today I was writing a program that generates a csv file after some processing. But I got the following error while trying on some test data:
writer.writerow(csv_li) UnicodeEncodeError: 'ascii' codec can't encode
character u'\xbf' in position 5: ordinal not in range(128)
I looked into the documentation of the csv module in Python and found a class named UnicodeWriter. So I changed my code to
writer = UnicodeWriter(open("filename.csv", "wb"))
Then I tried to run it again. It got rid of the previous UnicodeEncodeError but ran into another error:
self.writer.writerow([s.encode("utf-8") for s in row]) AttributeError:
'int' object has no attribute 'encode'
So, before writing the list, I had to change every value to string.
row = [str(item) for item in row]
I think this line could be added to the writerow function of the UnicodeWriter class.
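The two fixes (stringify, then encode) can be combined in one comprehension. A sketch with made-up row data; on Python 2 the resulting list can be passed straight to csv.writer:

```python
# a hypothetical mixed row: ints, unicode text, floats
row = [42, u'caf\xe9', 3.14]

# stringify each value first, then encode to UTF-8 bytes
encoded = [('%s' % item).encode('utf-8') for item in row]

assert encoded[0] == b'42'
assert encoded[1] == b'caf\xc3\xa9'
```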
Related
I've written a script in Python that uses POST requests to scrape the JSON content from a webpage. When I run the script, I get the result in the console as expected. However, I encounter an issue when I try to write the same content to a csv file.
When I try like:
with open ("outputContent.csv","w",newline="") as f:
I encounter the following error:
Traceback (most recent call last):
File "C:\Users\WCS\AppData\Local\Programs\Python\Python36-32\all_reviews_grabber.py", line 27, in <module>
writer.writerow([nom,ville,region])
File "C:\Users\WCS\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufb02' in position 16: character maps to <undefined>
When I try the following, the script does produce a csv file with data:
with open ("outputContent.csv","w",newline="",encoding="utf-8") as f:
But the csv file contains some illegible content, such as:
Beijingshì
Xinjiangwéiwúerzìzhìqu
Shà nghaishì
Qingpuqu
Shà nghaishì
Xúhuìqu
Putuóqu
This is my script so far:
import csv
import requests
from bs4 import BeautifulSoup
baseUrl = "https://fr-vigneron.gilbertgaillard.com/importer"
postUrl = "https://fr-vigneron.gilbertgaillard.com/importer/ajax"
with requests.Session() as s:
    req = s.get(baseUrl)
    sauce = BeautifulSoup(req.text, "lxml")
    token = sauce.select_one("input[name='_token']")['value']
    payload = {
        'data': 'country=0&type=0&input_search=',
        '_token': token
    }
    res = s.post(postUrl, data=payload)
    with open("outputContent.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(['nom', 'ville', 'region'])
        for item in res.json():
            nom = item['prospect_nom']
            ville = item['prospect_ville']
            region = item['prospect_region']
            print(nom, ville, region)
            writer.writerow([nom, ville, region])
How can I write the content in the right way in a csv file?
Take a look at this - http://www.pgbovine.net/unicode-python-errors.htm
Check your default encoding in your interpreter:
import sys
sys.stdout.encoding
An old version of Python can also cause this error.
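To check these values yourself (the exact output depends on your platform and Python version):

```python
import sys

# encoding used when print writes to the console (may be None when piped)
print(sys.stdout.encoding)

# the implicit codec: 'ascii' on Python 2, 'utf-8' on Python 3
print(sys.getdefaultencoding())
```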
Would using pandas to parse and then write alleviate the issue?
import pandas as pd
import requests
from bs4 import BeautifulSoup
baseUrl = "https://fr-vigneron.gilbertgaillard.com/importer"
postUrl = "https://fr-vigneron.gilbertgaillard.com/importer/ajax"
with requests.Session() as s:
    req = s.get(baseUrl)
    sauce = BeautifulSoup(req.text, "lxml")
    token = sauce.select_one("input[name='_token']")['value']
    payload = {
        'data': 'country=0&type=0&input_search=',
        '_token': token
    }
    res = s.post(postUrl, data=payload)

jsonObj = res.json()
results = pd.DataFrame()
for item in jsonObj:
    nom = item['prospect_nom']
    ville = item['prospect_ville']
    region = item['prospect_region']
    #print(nom, ville, region)
    temp_df = pd.DataFrame([[nom, ville, region]], columns=['nom', 'ville', 'region'])
    results = results.append(temp_df)
results = results.reset_index(drop=True)
results.to_csv("outputContent.csv", index=False)
The code works correctly, as long as the print statement is removed*.
The corrupted data that you are seeing appears because you are decoding the file data as cp1252, rather than UTF-8, when you view it.
>>> s = 'Xinjiangwéiwúerzìzhìqu'
>>> encoded = s.encode('utf-8')
>>> encoded.decode('cp1252')
'Xinjiangwéiwúerzìzhìqu'
If you are viewing the data by opening the csv file in Python, ensure that you specify UTF-8 encoding** when you open it:
open('outputContent.csv', 'r', encoding='utf-8'...
If you are opening the file with an application such as Excel, ensure that you specify that the encoding is UTF-8 when opening it.
If you don't specify an encoding the default cp1252 encoding will be used to decode the data in the file, and you will see garbage data.
* print will automatically use the default encoding, so you'll get an exception if it tries to encode characters which can't be encoded as cp1252.
** It may also be worth trying the 'utf-8-sig' encoding, which is a Microsoft-specific version of UTF-8 that inserts a byte-order-mark or BOM (b'\xef\xbb\xbf') at the beginning of encoded strings, but is otherwise identical to UTF-8.
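The BOM behaviour is easy to verify; a short sketch using one of the city names from above:

```python
# 'utf-8-sig' prepends the BOM on encode and strips it on decode
s = u'Beijingsh\xec'
encoded = s.encode('utf-8-sig')

assert encoded.startswith(b'\xef\xbb\xbf')   # BOM at the front
assert encoded[3:] == s.encode('utf-8')      # the rest is plain UTF-8
assert encoded.decode('utf-8-sig') == s      # round-trips cleanly
```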
I'm trying to import data from a csv file to a Django model. I'm using the manage.py shell for it with the following code:
>>> import csv
>>> import os
>>> path = "C:\\Users\Lia Love\Downloads"
>>> os.chdir(path)
>>> from catalog.models import ProductosBase
>>> with open('FarmaciasGob.csv') as csvfile:
...     reader = csv.DictReader(csvfile)
...     for row in reader:
...         p = Country(country=row['Country'], continent=row['Continent'])
...         p.save()
...
>>>
>>> exit()
I get the following error message at a given point of the dataset:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 7823: character maps to <undefined>
From what I could find, it seems to be a problem with the "latin" encoding of the csv file.
Inspecting the csv, I don't see anything special about the specific row where it gets the error. I'm able to import about 2200 rows before this one, all with latin characters.
Any clues?
Assuming you are on Python 3, this is an issue with the character encoding of your file. Most likely the encoding is 'utf-8', but it could also be 'utf-16', 'utf-16le', 'cp1252', or 'cp437', all of which are also commonly used. In Python 3, you can specify the encoding of the file on the open:
with open('FarmaciasGob.csv', encoding='utf-8') as csvfile:
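If you are not sure which encoding the file uses, one blunt approach is to try the common candidates in order until one decodes cleanly. This detect_encoding helper is a hypothetical sketch, not a Django or stdlib API; note that permissive codecs such as cp437 map every byte, so they should come last:

```python
def detect_encoding(path, candidates=('utf-8', 'utf-16', 'cp1252', 'cp437')):
    """Return the first candidate encoding that decodes the whole file."""
    for enc in candidates:
        try:
            with open(path, encoding=enc) as f:
                f.read()
            return enc
        except (UnicodeDecodeError, UnicodeError):
            continue
    return None
```

Once the encoding is known, pass it to open() as shown above.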
I've been trying to export a full MySQL table to csv with Python, and it worked well, but now that table has a "description" column and it's a piece of hell because there are encoding issues everywhere.
After trying tons and tons of things read in other posts, I now give up on those characters; I want to skip them directly and avoid those errors.
test.py:
import MySQLdb, csv, codecs

dbConn = MySQLdb.connect(dbServer, dbUser, dbPass, dbName, charset='utf8')
cur = dbConn.cursor()

def createFile(self):
    SQLview = 'SELECT fruit_type, fruit_qty, fruit_price, fruit_description FROM fruits'
    cur.execute(SQLview)
    with codecs.open('fruits.csv', 'wb', encoding='utf8', errors='ignore') as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerows(cur)
Still getting that error from the function:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd1' in position 27: ordinal not in range(128)
Any idea if I can skip that error and still write the rest of the data from the DB query?
PS: The line that crashes is:
csv_writer.writerows(cur)
Don't know if that's useful info for someone.
Finally solved it. I changed:
import csv
to:
import unicodecsv as csv
changed:
csv_writer = csv.writer(csv_file)
to:
csv_writer = csv.writer(csv_file,encoding='utf-8')
I'm scraping a £ value in Python, and when I try to write it into an Excel sheet the process breaks and I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)
The £ sign prints without any error in the cmd prompt. Could someone suggest how I can write the value (£1,750) into my sheet (with or without the £ sign)? Many thanks...
import requests
from bs4 import BeautifulSoup as soup
import csv

outputfilename = 'Ed_Streets2.csv'
outputfile = open(outputfilename, 'wb')
writer = csv.writer(outputfile)
writer.writerow(['Rateable Value'])

url = 'https://www.saa.gov.uk/search/?SEARCHED=1&ST=&SEARCH_TERM=city+of+edinburgh%2C+EDINBURGH&ASSESSOR_ID=&SEARCH_TABLE=valuation_roll_cpsplit&PAGE=0&DISPLAY_COUNT=100&TYPE_FLAG=CP&ORDER_BY=PROPERTY_ADDRESS&H_ORDER_BY=SET+DESC&ORIGINAL_SEARCH_TERM=city+of+edinburgh&DRILL_SEARCH_TERM=BOSWALL+PARKWAY%2C+EDINBURGH&DD_TOWN=EDINBURGH&DD_STREET=BOSWALL+PARKWAY#results'

session = requests.Session()  # not shown in the original snippet
response = session.get(url)
html = soup(response.text, 'lxml')
prop_link = html.find_all("a", class_="pagelink button small")

for link in prop_link:
    prop_url = base_url + (link["href"])  # base_url is defined elsewhere in the original script
    response = session.get(prop_url)
    prop = soup(response.content, "lxml")
    RightBlockData = prop.find_all("div", class_="columns small-7 cell")
    Rateable_Value = RightBlockData[0].get_text().strip()
    print(Rateable_Value)
    writer.writerow([Rateable_Value])
You need to encode your unicode object into bytes explicitly. Otherwise, your system will automatically try to encode it using the ascii codec, which fails on non-ascii characters. So, this:
Rateable_Value = Rateable_Value.encode('utf8')
before you
writer.writerow([Rateable_Value])
Should do the trick.
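The difference is easy to see in isolation. In Python 3 notation (where all string literals are unicode), the pound sign encodes fine as UTF-8 but not as ASCII:

```python
pound = u'\xa3'  # the £ character, U+00A3

# ASCII has no code points above 127, so this encode fails
try:
    pound.encode('ascii')
    raise AssertionError('expected UnicodeEncodeError')
except UnicodeEncodeError:
    pass

# UTF-8 represents it as a two-byte sequence
assert pound.encode('utf-8') == b'\xc2\xa3'
```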
I'm currently running into an issue when trying to write data into a file from an API GET request. The error is the following message: "UnicodeEncodeError: 'ascii' codec can't encode character u'\xe2' in position 1: ordinal not in range(128)"
I know this means I must encode the text as UTF-8, but I'm not sure how to do it. This is the code that I have so far:
import urllib2
import json

def moviesearch(query):
    title = query
    api_key = ""
    f = open('movie_ID_name.txt', 'w')
    for i in range(1, 15, 1):
        api_key = "http://api.themoviedb.org/3/search/movie?api_key=b4a53d5c860f2d09852271d1278bec89&query=" + title + "&page=" + str(i)
        json_obj = urllib2.urlopen(api_key)
        json_obj.encode('utf-8')
        data = json.load(json_obj)
        for item in data['results']:
            f.write("<" + str(item['id']) + ", " + str(item['title']) + '>\n')
    f.close()

moviesearch("life")
When I run this I get the following error: AttributeError: addinfourl instance has no attribute 'encode'
What can I do to solve this?
Thanks in advance!
Encoding/decoding only makes sense on things like byte strings or unicode strings. The strings in the data dictionary are Unicode, which is good, since this makes your life easy. Just encode the value as UTF-8:
import urllib2
import json

def moviesearch(query):
    title = query
    api_key = ""
    with open('movie_ID_name.txt', 'w') as f:
        for i in range(1, 15, 1):
            api_key = "http://api.themoviedb.org/3/search/movie?api_key=b4a53d5c860f2d09852271d1278bec89&query=" + title + "&page=" + str(i)
            json_obj = urllib2.urlopen(api_key)
            data = json.load(json_obj)
            for item in data['results']:
                f.write("<" + str(item['id']) + ", " + item['title'].encode('utf-8') + '>\n')

moviesearch("life")
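For comparison, on Python 3 the manual encode is unnecessary: open the file with encoding='utf-8' and write str objects directly, since the file object encodes on write. A sketch using a hypothetical in-memory record in place of the live API response:

```python
# hypothetical record standing in for data['results'] from the API
results = [{'id': 550, 'title': u'Am\xe9lie'}]

with open('movie_ID_name.txt', 'w', encoding='utf-8') as f:
    for item in results:
        # no .encode() needed: the file object encodes on write
        f.write("<%d, %s>\n" % (item['id'], item['title']))
```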