'utf-8' codec can't decode byte 0xa0 in position 12387 - python

My Code:
import re
import urllib.request
url="https://www.google.com/search?sxsrf="
stock=input("Enter your stock: ") # Enter your stock: FB
url=url+stock
print(url) # https://www.google.com/search?sxsrf=FB
data=urllib.request.urlopen(url).read()
data1=data.decode("utf-8")
My Error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 12387:
invalid start byte

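The data isn't UTF-8-encoded; it's ISO-8859-1, as the Content-Type response header shows. Byte 0xa0 is a non-breaking space in ISO-8859-1, but it is not a valid start byte in UTF-8.
>>> import urllib.request
>>> url="https://www.google.com/search?sxsrf=FB"
>>> d = urllib.request.urlopen(url)
>>> dict(d.getheaders())['Content-Type']
'text/html; charset=ISO-8859-1'
>>> data1 = d.read().decode('iso-8859-1')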

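Rather than hard-coding the charset, it can be read from the Content-Type header value. The following is a minimal sketch, not the answerer's code: `decode_body` is a hypothetical helper, and the UTF-8 fallback and `errors="replace"` are assumptions about what is acceptable when the header names no charset.

```python
from email.message import Message


def decode_body(raw, content_type, fallback="utf-8"):
    """Decode raw response bytes using the charset named in a Content-Type value.

    `fallback` is an assumption: it is used only when the header has no charset.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or fallback
    # errors="replace" keeps going on bad bytes instead of raising.
    return raw.decode(charset, errors="replace")


# Byte 0xa0 is the one that broke the UTF-8 decode in the question.
text = decode_body(b"caf\xe9\xa0", "text/html; charset=ISO-8859-1")
print(ascii(text))
```

With a live response, the header value would come from `d.headers.get("Content-Type", "")`; `d.headers.get_content_charset()` returns the same charset directly.
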
Related

'charmap' codec can't decode byte 0x9d in position 4836: character maps to <undefined>

I am trying to figure out this error that pops up from this code:
filename = os.path.join(os.path.expanduser("~"), "data", "blogs",
                        "1005545.male.25.Engineering.Sagittarius.xml")
#filename = open('C:/Users/spenc/data/blogs/1005545.male.25.Engineering.Sagittarius.xml',
#encoding='utf-8', errors = 'ignore')
all_posts = []
allPosts = []
with open(filename) as inf:
    postStart = False
    post = []
    for line in inf:
        line = line.strip()
        if line == "<post>":
            postStart = True
        elif line == "</post>":
            postStart = False
            allPosts.append("\n".join(post))
            post =[]
        elif postStart:
            post.append(line)
print(allPosts[0])
print(len(allPosts))
filename.close()
and get this error:
File "D:\Anaconda-Python\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4836: character maps to <undefined> here
I am just trying to resolve the encoding error so that the code can print the first post and the number of posts, but it keeps failing at the allPosts.append line. I am not sure of any workaround, or whether there is a newer way of doing this. I was following a textbook, but I can't continue with the chapter until this is worked out.
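The traceback shows the file being decoded with cp1252, the default that open() picks up on that Windows setup, and cp1252 has no character for byte 0x9d. A minimal sketch of one possible fix follows, assuming the file is really UTF-8 (the question does not confirm this). `read_posts` is a hypothetical name, and the sample file is made up for the demonstration.

```python
import os
import tempfile


def read_posts(path, encoding="utf-8"):
    """Collect the text between <post> and </post> lines, decoding explicitly."""
    posts, current, in_post = [], [], False
    # Passing encoding= avoids the platform default (cp1252 in the traceback).
    with open(path, encoding=encoding) as inf:
        for line in inf:
            line = line.strip()
            if line == "<post>":
                in_post = True
            elif line == "</post>":
                in_post = False
                posts.append("\n".join(current))
                current = []
            elif in_post:
                current.append(line)
    return posts


# Made-up sample: a closing curly quote is E2 80 9D in UTF-8, so it contains 0x9d.
sample = "<post>\nHe said \u201chi\u201d\n</post>\n".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False, suffix=".xml") as tmp:
    tmp.write(sample)
posts = read_posts(tmp.name)
os.remove(tmp.name)
print(len(posts), ascii(posts[0]))
```

The with block already closes the file, so no separate close() call is needed; calling close() on the path string would raise AttributeError.
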

How can I replace 'æ' 'ø' and 'å' in a text without error: 'ascii' codec can't decode byte 0xc3 in position 0

I am making a program that is supposed to open a text file and replace the letters 'æ', 'ø', and 'å' (Danish text) with 'ae', 'oe', and 'aa'.
I need to run the program from the Mac terminal.
I tried using the replace() function, and tried writing:
# -*- coding: utf-8 -*-
#!/usr/bin/env python
at the beginning of the file.
But I keep getting this error:
File "replace.py", line 20, in replace_nonascii
word = word.replace('å', 'aa')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
Any suggestions? I have tried googling this for days and have no clue how to fix it.
Here is my program:
filepath = input('insert path for text')
with codecs.open(filepath, 'r', encoding = 'utf8') as file_object:
    filename_cont['text1'] = file_object.read()
def replace_nonascii(word):
    word = word.lower()
    word = word.replace('å', 'aa')
    word = word.replace('æ', 'ae')
    word = word.strip('/-.,?!')
    print(word)
for text in filename_cont:
    newtext = filename_cont[text]
    for word in newtext.split():
        replace_nonascii(word)
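The error message suggests the script runs under Python 2, where 'å' in the source is a byte string while the text read through codecs.open is Unicode, so replace() attempts an implicit ASCII decode of the literal. That is an inference from the traceback, not something the question states. Under Python 2 the usual remedy is Unicode literals such as u'å'. The sketch below shows the same replacement in Python 3, where all string literals are text; the `DANISH` table and the sample word are made up for illustration, and the 'ø' mapping is added because the question asks for it.

```python
# Mapping of Danish letters to ASCII digraphs (escapes keep this file ASCII-only).
DANISH = {
    "\u00e6": "ae",  # ae ligature
    "\u00f8": "oe",  # o with stroke
    "\u00e5": "aa",  # a with ring
}


def replace_nonascii(word):
    """Lower-case a word, transliterate Danish letters, strip punctuation."""
    word = word.lower()
    for src, dst in DANISH.items():
        word = word.replace(src, dst)
    return word.strip("/-.,?!")


# Made-up sample word containing all three letters.
result = replace_nonascii("Bl\u00e5b\u00e6rgr\u00f8d!")
print(result)
```
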

Python, PyGame UnicodeDecodeError

Python, PyGame UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
I'm aware of the answers to similar questions, but none of them solved my problem.
This is my code:
# coding=utf-8
W = "─│"
ENCODING = "utf-8"
def maze():
    tr_list = pygame.sprite.Group()
    count_i = 0
    count_j = 0
    f = codecs.open("files/ma.txt", mode="r+", encoding=ENCODING)
    # Open file as f
    read = f.read().splitlines()
    f.close()
    for line in read:
        for m in line:
            if m in W:
                if m == '│':
                    tr_list.add(MazeV(count_j, count_i))
                elif m == '─':
                    tr_list.add(MazeH(count_j, count_i))
            count_j += ADD
        count_i += ADD
    return tr_list
This is the error when I run the code:
File "/Users/user/Documents/Pact/Main.py", line 637, in <module>
main()
File "/Users/user/Documents/Pact/Main.py", line 121, in main
wall_list = maze() # Set up the maze
File "/Users/user/Documents/Pact/Main.py", line 493, in maze
if i in WALL: # If wall
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0:
ordinal not in range(128)
I tried encoding and decoding to many formats, but the problem stays the same. Is there anything that I can do?
This is ma.txt:
ma.txt
Thanks in advance
Try decoding each line; it may help:
for line in read:
    for m in line.decode(ENCODING):
        ...
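Lines read through codecs.open are already decoded, so another cause worth checking is the byte-string literal on the other side of the `in` test: under Python 2, comparing a Unicode character against "─│" forces an implicit ASCII decode of the literal, and a u"─│" literal avoids that. This is an inference from the traceback rather than a confirmed diagnosis. The sketch below shows the same membership test in Python 3, where both sides are text. `count_walls` is a hypothetical stand-in for the sprite-building loop, and the sample maze is made up.

```python
import io

H_WALL = "\u2500"  # box-drawing horizontal line
V_WALL = "\u2502"  # box-drawing vertical line
WALLS = H_WALL + V_WALL


def count_walls(stream):
    """Count horizontal and vertical wall characters in a decoded text stream."""
    horizontal = vertical = 0
    for line in stream.read().splitlines():
        for ch in line:
            if ch in WALLS:  # text compared with text, no implicit decode
                if ch == V_WALL:
                    vertical += 1
                else:
                    horizontal += 1
    return horizontal, vertical


# Made-up two-line maze, stored as UTF-8 bytes as it would be on disk.
raw = "\u2500\u2502\u2500\n\u2502 \u2502\n".encode("utf-8")
stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
counts = count_walls(stream)
print(counts)
```
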

'charmap' codec can't decode byte 0x9d

I'm making a program in Python that converts input JSON files to XML. It worked well for the first 560 files, but then this happened:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1871: character maps to <undefined>
This is my code:
# -*- coding: utf-8 -*-
##IMPORT
import codecs
import string
import sys
from src.json2xml import Json2xml
import unicodedata
import os
##FUNCTION
def fn_conversione(f):
    data = Json2xml.fromjsonfile('json//' + f).data
    data_object = Json2xml(data)
    output = data_object.json2xml() #xml output
    return (output)
def fn_letturaFile():
    filenamelist = []
    path = './json'
    for filename in os.listdir(path):
        filenamelist.append(filename)
    return (filenamelist)
def fn_createXml(filenamejson, content):
    path = './xml'
    filenamexml = filenamejson.replace('.json', '.xml')
    f = open("./xml/" + filenamexml, "w+", encoding="utf8")
    f.write(content)
    f.close()
    return("scritto")
nomeFile = fn_letturaFile()
for i in range (len(nomeFile)):
    contenuto = fn_conversione(nomeFile[i])
    if contenuto != None:
        fn_createXml(nomeFile[i], contenuto)
    print(i, "/", len(nomeFile))
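A 'charmap' decode error usually means a file was opened with the platform's default encoding instead of the one it was written in. One way to check is to read the JSON yourself with an explicit encoding. The sketch below assumes the input files are UTF-8 and that the failure happens while reading the file, neither of which the question confirms. `load_json` is a hypothetical helper, and whether Json2xml accepts data loaded this way is not verified here.

```python
import json
import os
import tempfile


def load_json(path, encoding="utf-8"):
    """Parse a JSON file, decoding it explicitly instead of using the locale default."""
    with open(path, encoding=encoding) as fh:
        return json.load(fh)


# Made-up sample containing a closing curly quote (bytes E2 80 9D in UTF-8).
payload = {"quote": "\u201chello\u201d"}
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".json", delete=False
) as tmp:
    json.dump(payload, tmp, ensure_ascii=False)

data = load_json(tmp.name)
os.remove(tmp.name)
print(ascii(data["quote"]))
```
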

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9'

I use Python to get JSON data from the Bing API:
accountKeyEnc = base64.b64encode(accountKey + ':' + accountKey)
headers = {'Authorization': 'Basic ' + accountKeyEnc}
req = urllib2.Request(bingUrl, headers = headers)
response = urllib2.urlopen(req)
content = response.read()
data = json.loads(content)
for i in range(0,6):
    print data["d"]["results"][i]["Description"]
But I got this error:
print data["d"]["results"][0]["Description"]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 11: ordinal not in range(128)
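Your problem is that the Bing API returns Unicode text, and print then converts it to ASCII implicitly because you never encode it explicitly. ASCII has no mapping for characters such as é (u'\xe9'), so the conversion fails. Encode the string explicitly to an encoding your terminal supports before printing, for example with .encode('utf-8'). Also prefix your constant strings with u so that they are treated as Unicode strings, and see if that helps.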
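The failure can be reproduced without the API by encoding a string that contains é. The following is a small Python 3 sketch rather than the asker's Python 2 code; `to_bytes` is a hypothetical helper, and the choice of `backslashreplace` as the fallback is an assumption about what output is acceptable.

```python
def to_bytes(text, encoding="ascii"):
    """Encode text, escaping characters the target encoding cannot represent."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # Same error class as in the question; fall back to \x escapes.
        return text.encode(encoding, errors="backslashreplace")


description = "caf\u00e9"  # contains the character u'\xe9' from the traceback
as_ascii = to_bytes(description)
as_utf8 = to_bytes(description, "utf-8")
print(as_ascii, as_utf8)
```
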
