import requests
test = requests.get("https://www.hipstercode.com/")
outfile = open("./settings.txt", "w")
test.encoding = 'ISO-8859-1'
outfile.write(str(test.text))
The error that I'm getting is:
File "C:/Users/Bamba/PycharmProjects/Requests/Requests/Requests.py", line 8, in <module>
outfile.write(str(test.text))
File "C:\Users\Bamba\AppData\Local\Programs\Python\Python35\lib\encodings\cp1255.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xef' in position 0: character maps to <undefined>
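So it looks like the response contains something that can't be encoded in cp1255, the default encoding your file was opened with (see the traceback).
If an explicit encoding is OK for you, open the file in binary mode and encode the text yourself: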
import requests
test = requests.get("https://www.hipstercode.com/")
outfile = open("./settings.txt", "wb")
outfile.write(test.text.encode('ISO-8859-1'))
If you get an error while encoding, the text simply cannot be encoded losslessly in that encoding. The options you have are described in the encode docs: https://docs.python.org/3/library/stdtypes.html#str.encode
For example, you can use
outfile.write(test.text.encode('ISO-8859-1', 'replace'))
to handle errors without losing most of the sense of text containing something that doesn't fit ISO-8859-1.
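To make the error-handler options concrete, here is a minimal sketch. The sample string and the temporary file are assumptions for illustration, not taken from the question; the handlers themselves ('replace', 'xmlcharrefreplace', 'backslashreplace') are the ones listed in the str.encode docs. The last part shows the alternative of opening the file with an encoding that covers all of Unicode.

```python
import os
import tempfile

# Hypothetical sample: "é" fits ISO-8859-1, the euro sign does not.
sample = "caf\u00e9 \u20ac"

# 'replace' substitutes "?" for anything the codec cannot represent.
replaced = sample.encode("ISO-8859-1", "replace")

# These two handlers keep the code point recoverable from the output.
xml_refs = sample.encode("ISO-8859-1", "xmlcharrefreplace")
escaped = sample.encode("ISO-8859-1", "backslashreplace")

# Alternative: open the file in text mode with an explicit UTF-8 encoding.
path = os.path.join(tempfile.mkdtemp(), "settings.txt")
with open(path, "w", encoding="utf-8") as outfile:
    outfile.write(sample)
with open(path, encoding="utf-8") as infile:
    restored = infile.read()
```

Which handler is appropriate depends on whether the file is meant for people (where "?" may be acceptable) or for later processing (where a recoverable escape is usually safer).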
Related
I am trying to load a utf-8 encoded json file using python's json module. The file contains several right quotation marks, encoded as E2 80 9D. When I call
json.load(f, encoding='utf-8')
I receive the message:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 212068: character maps to <undefined>
How can I convince the json module to decode this properly?
EDIT: Here's a minimal example:
[
    {
        "aQuote": "“A quote”"
    }
]
There is no encoding parameter in the signature of json.load (in Python 3). The solution is simply to pass the encoding to open instead:
with open(filename, encoding='utf-8') as f:
    x = json.load(f)
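A self-contained sketch of the same fix, assuming Python 3. The temporary file stands in for the JSON file from the question and its name is made up; the content mirrors the minimal example, with the right quotation mark stored as the bytes E2 80 9D.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the question's file: UTF-8 JSON with curly quotes.
path = os.path.join(tempfile.mkdtemp(), "quotes.json")
with open(path, "wb") as f:
    f.write('[{"aQuote": "\u201cA quote\u201d"}]'.encode("utf-8"))

# Pass the encoding to open(), not to json.load().
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
```

Without the encoding argument, open falls back to the platform default (a 'charmap' codec such as cp1252 on many Windows setups), which is where the original error comes from.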
I have this text.ucs file which I am trying to decode using python.
file = open('text.ucs', 'r')
content = file.read()
print content
My result is
\xf\xe\x002\22
I tried decoding with utf-16 and utf-8:
content.decode('utf-16')
and got this error:
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python27\lib\encodings\utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 32-33: illegal encoding
Please let me know if I am missing anything or my approach is wrong
Edit: Screenshot has been asked
The string is encoded as UTF-16-BE (big endian). This works:
content.decode("utf-16-be")
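As a rough illustration of why the byte order matters, here is a small sketch written for Python 3. The helper name decode_utf16 and its default parameter are hypothetical; it relies only on the standard codecs constants and bytes.decode. The generic 'utf-16' codec needs a byte order mark (BOM) to pick the right order, so data without one has to be decoded with an explicit 'utf-16-be' or 'utf-16-le'.

```python
import codecs

def decode_utf16(raw, default="utf-16-be"):
    """Decode UTF-16 bytes, using the BOM if present, else `default`.

    Hypothetical helper for illustration; not part of the standard library.
    """
    if raw.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        # The generic 'utf-16' codec reads the BOM and strips it.
        return raw.decode("utf-16")
    # No BOM: the byte order has to be stated explicitly.
    return raw.decode(default)

# Big-endian data without a BOM, like the file in the question appears to be.
big_endian = "text".encode("utf-16-be")

# Little-endian data that announces its byte order with a BOM.
with_bom = codecs.BOM_UTF16_LE + "text".encode("utf-16-le")
```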
As I understand it, you are using Python 2.x, but the encoding parameter of the built-in open was only added in Python 3.x. I am not an expert in Python 2.x, but io.open accepts it there; for example, try:
import io
file = io.open('text.ucs', 'r', encoding='utf-16-be')  # use the file's actual encoding
content = file.read()
print content
Note that you need to import the io module first.
In Python 3, you can specify which encoding to use with the encoding argument of open:
with open('text.ucs', 'r', encoding='utf-16') as f:
    text = f.read()
Your string needs to be decoded as UTF-8. This is what I did to decode it (Python 3):
f = open('text.ucs', 'r', encoding='utf-8')
print(f.read())
This is my code:
import urllib.request
imglinks = ["http://www.katytrailweekly.com/Files/MalibuPokeMatt_©Marple_449-EDITED_15920174118.jpg"]
for link in imglinks:
    filename = link.split('/')[-1]
    urllib.request.urlretrieve(link, filename)
It gives me the error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xa9'
How do I solve this? I tried using .encode('utf-8'), but it gives me:
TypeError: cannot use a string pattern on a bytes-like object
The problem here is not the encoding itself but passing the URL to urllib.request in the correct encoding.
You need to quote the url as follows:
import urllib.request
import urllib.parse
imglinks = ["http://www.katytrailweekly.com/Files/MalibuPokeMatt_©Marple_449-EDITED_15920174118.jpg"]
for link in imglinks:
    link = urllib.parse.quote(link, safe=':/')  # <- here
    filename = link.split('/')[-1]
    urllib.request.urlretrieve(link, filename)
This way your © symbol is encoded as %C2%A9, as the web server wants.
The safe parameter is specified to prevent quote from also escaping the : after http.
It is up to you to modify the code to save the file with the correct original filename. ;)
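One possible way to keep the original filename, sketched without any network access: take the filename before quoting the link. The helper name quote_link is hypothetical and not part of urllib; urllib.parse.quote and its safe parameter are the real calls used in the answer.

```python
import urllib.parse

def quote_link(link):
    """Return (quoted_url, original_filename) for a URL with non-ASCII characters.

    Hypothetical helper for illustration; not part of urllib.
    """
    # Take the filename first, so it keeps the readable "©".
    filename = link.split("/")[-1]
    # Percent-encode everything except ":" and "/" (UTF-8 is quote's default).
    quoted = urllib.parse.quote(link, safe=":/")
    return quoted, filename

url, name = quote_link(
    "http://www.katytrailweekly.com/Files/MalibuPokeMatt_©Marple_449-EDITED_15920174118.jpg"
)
# urllib.request.urlretrieve(url, name) would then fetch and save the image.
```

If only the quoted link is available, urllib.parse.unquote on its last path segment recovers the same filename.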
I have the following data container, which is constantly being updated:
data = []
for val, track_id in zip(values, list(track_ids)):
    # below the threshold
    if val < threshold:
        # structure data as dictionary
        pre_data = {"artist": sp.track(track_id)['artists'][0]['name'], "track": sp.track(track_id)['name'], "feature": filter_name, "value": val}
        data.append(pre_data)
# write to file
with open('db/json/' + user + '_' + product + '_' + filter_name + '.json', 'w') as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
but I am getting a lot of errors like this:
    json.dump(data,f, ensure_ascii=False, indent=4, sort_keys=True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 190, in dump
    fp.write(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)
Is there a way I can get rid of this encoding problem once and for all?
I was told that this would do it:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
but many people do not recommend it.
I use Python 2.7.10.
Any clues?
When you write to a file that was opened in text mode, Python encodes the string for you. The default encoding is ascii, which generates the error you see; there are a lot of characters that can't be encoded to ASCII.
The solution is to open the file in a different encoding. In Python 2 you must use the codecs module, in Python 3 you can add the encoding= parameter directly to open. utf-8 is a popular choice since it can handle all of the Unicode characters, and for JSON specifically it's the standard; see https://en.wikipedia.org/wiki/JSON#Data_portability_issues.
import codecs
with codecs.open('db/json/' + user + '_' + product + '_' + filter_name + '.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
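For comparison, a minimal Python 3 sketch of the same idea, where open takes the encoding directly and codecs is not needed. The record and the temporary file name are made up for illustration; only the json.dump arguments are taken from the question.

```python
import json
import os
import tempfile

# Hypothetical record containing a right single quotation mark (U+2019).
data = [{"artist": "Example Artist", "track": "Don\u2019t Stop", "value": 0.1}]

path = os.path.join(tempfile.mkdtemp(), "tracks.json")

# Python 3: the built-in open accepts encoding= directly.
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)

# Reading back with the same encoding returns the original data.
with open(path, encoding="utf-8") as f:
    restored = json.load(f)
```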
Your object has unicode strings, and Python 2.x's support for unicode can be a bit spotty. First, let's make a short example that demonstrates the problem:
>>> obj = {"artist":u"Björk"}
>>> import json
>>> with open('deleteme', 'w') as f:
...     json.dump(obj, f, ensure_ascii=False)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 190, in dump
    fp.write(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 3: ordinal not in range(128)
From the json.dump help text:
If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
instance consisting of ASCII characters only. If ``ensure_ascii`` is
``False``, some chunks written to ``fp`` may be ``unicode`` instances.
This usually happens because the input contains unicode strings or the
``encoding`` parameter is used. Unless ``fp.write()`` explicitly
understands ``unicode`` (as in ``codecs.getwriter``) this is likely to
cause an error.
Ah! There is the solution. Either use the default ensure_ascii=True and get ascii escaped unicode characters or use the codecs module to open the file with the encoding you want. This works:
>>> import codecs
>>> with codecs.open('deleteme', 'w', encoding='utf-8') as f:
...     json.dump(obj, f, ensure_ascii=False)
...
>>>
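The difference between the two ensure_ascii settings can be seen without touching a file. This is a small sketch for Python 3 that uses json.dumps on the same example object; the variable names are only for illustration.

```python
import json

obj = {"artist": "Bj\u00f6rk"}

# Default (ensure_ascii=True): non-ASCII characters are escaped,
# so the output is pure ASCII and any file encoding can hold it.
escaped = json.dumps(obj)

# ensure_ascii=False: the character is kept as-is, so the file must be
# opened with an encoding that can represent it (for example UTF-8).
literal = json.dumps(obj, ensure_ascii=False)
```

Both strings decode back to the same object, so the choice only affects how the file looks on disk.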
Why not encode the specific string instead? Try calling the .encode('utf-8') method on the string that is raising the exception.
I have a GIF file (or any image format) in unicode form:
>>> data
u'GIF89a,\x000\x00\ufffd\ufffd\x00\x00\x00\x00\ufffd\ufffd\ufff...
I want to write this to file:
>>> f = open('file.gif', 'wb')
>>> f.write(data)
But I get an error:
UnicodeEncodeError at /image
'ascii' codec can't encode characters in position 10-11: ordinal not in range(128)
How do I do this?
Try this (note that the file is opened in binary mode):
utf8data = data.encode('UTF-8')
with open('file.gif', 'wb') as f:
    f.write(utf8data)
You must encode the unicode string to bytes explicitly:
f.write(data.encode('utf-8'))
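One caveat worth checking before writing: the \ufffd characters in the question's data are U+FFFD replacement characters, which usually means the raw bytes were already decoded lossily somewhere upstream. The sketch below, written for Python 3 with made-up bytes, suggests why encoding such a string does not bring the original bytes back, while keeping the data as bytes (or a Latin-1 round trip) does.

```python
# Made-up bytes standing in for the start of an image file.
raw = b"GIF89a,\x000\x00\xff\xff\x00"

# Decoding binary data as text with errors='replace' turns invalid bytes into U+FFFD.
lossy = raw.decode("utf-8", errors="replace")

# Encoding that string again does not reproduce the original bytes.
reencoded = lossy.encode("utf-8")

# Latin-1 maps every byte to exactly one code point, so this round trip is lossless.
roundtrip = raw.decode("latin-1").encode("latin-1")
```

If the image arrives as bytes in the first place, the simplest approach is to never decode it and write those bytes directly to a file opened with 'wb'.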