UnicodeEncodeError: 'ascii' codec can't encode character '\xa9' python 3

This is my code:
import urllib.request
imglinks = ["http://www.katytrailweekly.com/Files/MalibuPokeMatt_©Marple_449-EDITED_15920174118.jpg"]
for link in imglinks:
    filename = link.split('/')[-1]
    urllib.request.urlretrieve(link, filename)
It gives me the error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xa9'
How do I solve this? I tried using .encode('utf-8'), but it gives me:
TypeError: cannot use a string pattern on a bytes-like object

The problem here is not the encoding itself but passing a correctly encoded URL to urllib.request.
You need to quote the URL as follows:
import urllib.request
import urllib.parse
imglinks = ["http://www.katytrailweekly.com/Files/MalibuPokeMatt_©Marple_449-EDITED_15920174118.jpg"]
for link in imglinks:
    link = urllib.parse.quote(link, safe=':/')  # <- here
    filename = link.split('/')[-1]
    urllib.request.urlretrieve(link, filename)
This way your © symbol is percent-encoded as %C2%A9, which is what the web server expects.
The safe parameter is set to ':/' so that quote does not also escape the : after http.
It is up to you to modify the code so the file is saved under its original filename. ;)
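One way to keep the original filename is to take it from the link before quoting. This is only a minimal sketch, assuming the file should be saved in the current directory; the helper names quote_link and download are made up for illustration and are not part of urllib:

```python
import urllib.parse
import urllib.request


def quote_link(link):
    """Return (percent-encoded URL, original filename) for a link."""
    # Take the filename from the unquoted link so it keeps the © character.
    filename = link.split('/')[-1]
    # Leave ':' and '/' alone so the scheme and path separators survive.
    quoted = urllib.parse.quote(link, safe=':/')
    return quoted, filename


def download(link):
    """Fetch the quoted URL and save it under the original filename."""
    quoted, filename = quote_link(link)
    urllib.request.urlretrieve(quoted, filename)
    return filename
```

Calling download(link) for each entry in imglinks would then replace the loop body. Note that with safe=':/' any ?, = or & in a query string would be escaped as well, so this sketch only suits plain path URLs like the one in the question.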

Related

problem of urlretrieve cannot get image from url contains unicode string

I write a python script to retrieve the image from url:
url = 'https://uploads0.wikiart.org/images/albrecht-durer/watermill-at-the-montaсa.jpg'
urllib.request.urlretrieve(url, STYLE_IMAGE_UPLOAD + "wikiart" + "/" + url)
When I run it, I get this message:
UnicodeEncodeError: 'ascii' codec can't encode character '\u0441' in position 49: ordinal not in range(128)
I think the problem comes from the image URL:
'https://uploads0.wikiart.org/images/albrecht-durer/watermill-at-the-monta\u0441a.jpg',
How can I fix this problem?
The URL contains a non-ASCII character (a Cyrillic letter that looks like a Latin "c").
Escape this character using the urllib.parse.quote function:
url = 'https://uploads0.wikiart.org' + urllib.parse.quote('/images/albrecht-durer/watermill-at-the-montaсa.jpg')
urllib.request.urlretrieve(url, '/tmp/watermill.jpg')
Don't put the entire URL in the quote function, otherwise it would escape the colon (":") in "https://".
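If the URL arrives as a single string, splitting it first avoids hard-coding the host. The following is a rough sketch rather than a definitive recipe; quote_url_path is a hypothetical helper name, and it assumes the path is not already percent-encoded (an existing % would be escaped a second time):

```python
import urllib.parse


def quote_url_path(url):
    """Percent-encode only the path component of a URL."""
    parts = urllib.parse.urlsplit(url)
    # quote() keeps '/' by default, so path separators are preserved.
    quoted_path = urllib.parse.quote(parts.path)
    return urllib.parse.urlunsplit(parts._replace(path=quoted_path))


url = quote_url_path(
    'https://uploads0.wikiart.org/images/albrecht-durer/watermill-at-the-monta\u0441a.jpg'
)
print(url)
```

The scheme and host pass through untouched, so the colon in "https://" is never escaped.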

Can't find the directory no idea why

import requests
test = requests.get("https://www.hipstercode.com/")
outfile = open("./settings.txt", "w")
test.encoding = 'ISO-8859-1'
outfile.write(str(test.text))
The error that I'm getting is:
File "C:/Users/Bamba/PycharmProjects/Requests/Requests/Requests.py", line 8, in <module>
outfile.write(str(test.text))
File "C:\Users\Bamba\AppData\Local\Programs\Python\Python35\lib\encodings\cp1255.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xef' in position 0: character maps to <undefined>
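So it looks like the response contains a character that cannot be encoded in cp1255, the default encoding your system uses for text files.
If writing the bytes yourself is OK for you, open the file in binary mode and encode the text explicitly: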
import requests
test = requests.get("https://www.hipstercode.com/")
outfile = open("./settings.txt", "wb")
outfile.write(test.text.encode('ISO-8859-1'))
If you get an error while encoding, the text simply cannot be encoded losslessly. The options you have are described in the str.encode docs: https://docs.python.org/3/library/stdtypes.html#str.encode
For example, you can use
outfile.write(test.text.encode('ISO-8859-1', 'replace'))
to handle errors while keeping most of the meaning of text that contains characters that don't fit ISO-8859-1.
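To see what the different error handlers do, here is a small sketch. The sample string is made up for illustration; it uses the euro sign because that character has no ISO-8859-1 equivalent:

```python
text = "price: \u20ac5"  # the euro sign does not exist in ISO-8859-1

# Each handler decides what to emit for a character the codec cannot encode.
for handler in ("replace", "ignore", "xmlcharrefreplace", "backslashreplace"):
    print(handler, text.encode("ISO-8859-1", handler))

try:
    text.encode("ISO-8859-1")  # the default handler is 'strict'
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)
```

'replace' substitutes a question mark, 'ignore' drops the character, and the other two keep the information in an escaped form.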

'UnicodeEncodeError: 'ascii' codec' Error when try to write £ sign into excel sheet using python

I'm scraping a £ value in python and when I try to write it into an excel sheet the process breaks and I get the following error
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)
The £ sign prints without any error in the cmd prompt. Could someone suggest how I can write the value (£1,750) into my sheet (with or without the £ sign)? Many thanks.
import requests
from bs4 import BeautifulSoup as soup
import csv
outputfilename = 'Ed_Streets2.csv'
outputfile = open(outputfilename, 'wb')
writer = csv.writer(outputfile)
writer.writerow(['Rateable Value'])
url = 'https://www.saa.gov.uk/search/?SEARCHED=1&ST=&SEARCH_TERM=city+of+edinburgh%2C+EDINBURGH&ASSESSOR_ID=&SEARCH_TABLE=valuation_roll_cpsplit&PAGE=0&DISPLAY_COUNT=100&TYPE_FLAG=CP&ORDER_BY=PROPERTY_ADDRESS&H_ORDER_BY=SET+DESC&ORIGINAL_SEARCH_TERM=city+of+edinburgh&DRILL_SEARCH_TERM=BOSWALL+PARKWAY%2C+EDINBURGH&DD_TOWN=EDINBURGH&DD_STREET=BOSWALL+PARKWAY#results'
response = session.get(url)
html = soup(response.text, 'lxml')
prop_link = html.find_all("a", class_="pagelink button small")
for link in prop_link:
    prop_url = base_url+(link["href"])
    response = session.get(prop_url)
    prop = soup(response.content,"lxml")
    RightBlockData = prop.find_all("div", class_="columns small-7 cell")
    Rateable_Value = RightBlockData[0].get_text().strip()
    print (Rateable_Value)
    writer.writerow([Rateable_Value])
You need to encode your unicode object into bytes explicitly. Otherwise, Python 2 will automatically try to encode it using the ascii codec, which fails on non-ASCII characters. So, adding this:
Rateable_Value = Rateable_Value.encode('utf8')
before you call
writer.writerow([Rateable_Value])
should do the trick.
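On Python 3 the csv module works with text rather than bytes, so the manual encode step is not needed; the encoding is chosen when the file is opened instead. A minimal sketch, assuming Python 3; write_values is a name chosen for illustration, and the header text is taken from the question:

```python
import csv


def write_values(path, values):
    """Write one value per row to a UTF-8 encoded CSV file."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as outputfile:
        writer = csv.writer(outputfile)
        writer.writerow(['Rateable Value'])
        for value in values:
            writer.writerow([value])
```

If the file is meant to be opened in Excel, encoding='utf-8-sig' (which writes a byte order mark) may help it detect the encoding.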

Python, UnicodeEncodeError: 'charmap' codec can't encode characters in position

I want to write the HTML of a website to a file I created. Though I decode it as utf-8, it still raises an error like this. When I use print(data1), the HTML is printed properly. I am using Python 3.5.0.
import re
import urllib.request
city = input("city name")
url = "http://www.weather-forecast.com/locations/"+city+"/forecasts/latest"
data = urllib.request.urlopen(url).read()
data1 = data.decode("utf-8")
f = open("C:\\Users\\Gopal\\Desktop\\test\\scrape.txt","w")
f.write(data1)
You've opened a file with the default system encoding:
f = open("C:\\Users\\Gopal\\Desktop\\test\\scrape.txt", "w")
You need to specify your encoding explicitly:
f = open("C:\\Users\\Gopal\\Desktop\\test\\scrape.txt", "w", encoding='utf8')
See the open() function documentation:
In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
On your system, the default is a codec that cannot handle your data.
f = open("C:\\Users\\Gopal\\Desktop\\test\\scrape.txt","w",encoding='utf8')
f.write(data1)
This should work; it did for me.
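To check which default your own system would pick, and to confirm that an explicit encoding avoids the problem, something like the following sketch can be used. The temporary path and the sample text are assumptions made for illustration:

```python
import locale
import os
import tempfile

# The encoding open() falls back to in text mode when none is given.
print(locale.getpreferredencoding(False))


def save_text(path, text):
    """Write text as UTF-8 regardless of the platform default."""
    with open(path, "w", encoding="utf8") as f:
        f.write(text)


path = os.path.join(tempfile.mkdtemp(), "scrape.txt")
save_text(path, "caf\u00e9 \u2603")  # characters that many legacy codecs reject
```

Reading the file back later needs the same encoding='utf8' argument, for the same reason.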

Python encoding issue w/ ascii to utf-8

I'm currently running into an issue when trying to write data to a file from an API GET request. The error is the following message: "UnicodeEncodeError: 'ascii' codec can't encode character u'\xe2' in position 1: ordinal not in range(128)"
I know this means I must convert the text from ascii to utf-8, but I'm not sure how to do this. This is the code that I have so far:
import urllib2
import json
def moviesearch(query):
    title = query
    api_key = ""
    f = open('movie_ID_name.txt', 'w')
    for i in range(1,15,1):
        api_key = "http://api.themoviedb.org/3/search/movie?api_key=b4a53d5c860f2d09852271d1278bec89&query="+title+"&page="+str(i)
        json_obj = urllib2.urlopen(api_key)
        json_obj.encode('utf-8')
        data = json.load(json_obj)
        for item in data['results']:
            f.write("<"+str(item['id'])+", "+str(item['title'])+'>\n')
    f.close()
moviesearch("life")
When I run this I get the following error: AttributeError: addinfourl instance has no attribute 'encode'
What can I do to solve this?
Thanks in advance!
Encoding/decoding only makes sense on things like byte strings or unicode strings. The strings in the data dictionary are Unicode, which is good, since this makes your life easy. Just encode the value as UTF-8:
import urllib2
import json
def moviesearch(query):
    title = query
    api_key = ""
    with open('movie_ID_name.txt', 'w') as f:
        for i in range(1,15,1):
            api_key = "http://api.themoviedb.org/3/search/movie?api_key=b4a53d5c860f2d09852271d1278bec89&query="+title+"&page="+str(i)
            json_obj = urllib2.urlopen(api_key)
            data = json.load(json_obj)
            for item in data['results']:
                f.write("<"+str(item['id'])+", "+item['title'].encode('utf-8')+'>\n')
moviesearch("life")
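The code above targets Python 2. On Python 3, where str is already Unicode, the usual approach is to let the file object do the encoding. A minimal sketch of just the writing step, assuming the results are a list of dicts with id and title keys as in the code above; write_titles is a hypothetical helper, not part of any library:

```python
def write_titles(path, items):
    """Write '<id, title>' lines to a UTF-8 encoded text file."""
    # The file object encodes each string as UTF-8 on write.
    with open(path, 'w', encoding='utf-8') as f:
        for item in items:
            f.write("<{}, {}>\n".format(item['id'], item['title']))
```

No call to .encode() is needed here; passing encoded bytes to a file opened in text mode would raise a TypeError on Python 3.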
