UnicodeDecodeError when reading .txt file - python

This may be a very basic fix, but I've dived through every example online trying to sort this out. I'm loading in a text file with Python 3.4 like so:
text = open("/Users/Stu/python/extext.txt")
text = unidecode(text)
text = open(text, "r").read()
and then I get thrown this error:
Traceback (most recent call last):
File "/Users/Stu/Twitter Python/Victoria.py", line 46, in <module>
short_pos = unidecode(short_pos)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/unidecode/__init__.py", line 37, in unidecode
for char in string:
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 4645: ordinal not in range(128)
I'm assuming that it's finding a character that it can't decode, but all there is in this doc is English and basic punctuation. Any support you guys could give would be greatly appreciated.
Cheers!

This seemed to allow me to read the text:
short_pos = open("/Users/Stu/Twitter Python/short_reviews/positive1.txt","r", encoding = "latin-1").read()
Thanks for everyone's support!
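If the file looks like plain English but still fails to decode, it can help to locate the offending bytes first. The following is a minimal sketch, not part of the original answer: the helper name `find_non_ascii` and the sample bytes are made up for illustration. For a real file, the bytes would come from `open(path, "rb").read()`.

```python
def find_non_ascii(data: bytes, context: int = 10):
    """Return (offset, hex value, nearby bytes) for every non-ASCII byte."""
    hits = []
    for offset, value in enumerate(data):
        if value > 0x7F:  # anything above 127 is outside ASCII
            start = max(0, offset - context)
            hits.append((offset, hex(value), data[start:offset + context]))
    return hits


# Hypothetical sample standing in for the real file contents.
# 0xf3 is the byte reported in the traceback; in latin-1 it is "ó".
sample = b"A moving interpretaci\xf3n of the role"

hits = find_non_ascii(sample)
for offset, value, nearby in hits:
    print(offset, value, nearby)
```

Seeing the bytes around each hit usually makes it obvious which word contains the unexpected character, and therefore which encoding is plausible.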

Related

Unable to encode a unicode into a .txt file in python

While messing around with Python, I tried writing a program that fetches the content of Pastebin URLs and saves each one's content to its own file, but I got an error.
This is the code:
import requests
file = open("file.txt", "r", encoding="utf-8").readlines()
for line in file:
    link = line.rstrip("\n")
    n_link = link.replace("https://pastebin.com/", "https://pastebin.com/raw/")
    pastebin = n_link.replace("https://pastebin.com/raw/", "")
    r = requests.get(n_link, timeout=3)
    x = open(f"{pastebin}.txt", "a+")
    x.write(r.text)
    x.close
I get the following error:
Traceback (most recent call last):
File "C:\Users\Lenovo\Desktop\Py\Misc. Scripts\ai.py", line 9, in <module>
x.write(r.text)
File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2694' in position 9721: character maps to <undefined>
Can somebody help?
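You’re off to a good start by reading the input file as UTF-8. The only thing missing is to do the same for the output file. Without an explicit encoding, open() falls back to the platform default, which is cp1252 here according to the traceback: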
x = open(f"{pastebin}.txt", "a+", encoding="utf-8")
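To see why the explicit encoding matters, here is a small self-contained sketch. It assumes nothing about the real Pastebin content: the sample text and the temporary output path are made up, and only the character U+2694 is taken from the traceback.

```python
import os
import tempfile

text = "Crossed swords: \u2694"  # the character named in the traceback

# cp1252 has no mapping for U+2694, which is what the original error reports.
try:
    "\u2694".encode("cp1252")
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True

# Hypothetical output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "paste.txt")

# Writing with an explicit UTF-8 encoding works for any Unicode text.
with open(path, "a+", encoding="utf-8") as out:
    out.write(text)

# Read it back with the same encoding to confirm the round trip.
with open(path, "r", encoding="utf-8") as f:
    round_trip = f.read()

print(cp1252_failed, round_trip == text)
```

Using a with block also closes the file automatically, which the bare `x.close` in the question (no parentheses, so the method is never called) does not do.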

Python read text

I am simply trying to read a text file that has 4000+ lines of nouns, all in a single column, and I’m getting an error:
Traceback (most recent call last):
File "/private/var/mobile/Library/Mobile Documents/iCloud~com~omz-software~Pythonista3/Documents/nouns.py", line 4, in <module>
for i in nouns_file:
File "/var/containers/Bundle/Application/107074CD-03B1-4FB3-809A-CBD44D6CF245/Pythonista3.app/Frameworks/Py3Kit.framework/pylib/encodings/ascii.py", line 27, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2241: ordinal not in range(128)
With code:
with open("nounlist.txt", "r") as nouns_file:
    for i in nouns_file:
        print(i)
I’m not sure what’s causing this. I would think that it would just output all of the nouns from my nounlist.txt file.
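One likely cause, offered as a guess rather than a confirmed diagnosis: the traceback shows the file being decoded as ASCII, and byte 0xc3 is the first byte of the two-byte UTF-8 sequences for characters such as "é" or "ü". If the file really is UTF-8 (an assumption), passing the encoding explicitly should work. The sketch below builds a hypothetical stand-in for nounlist.txt so that it runs on its own.

```python
import os
import tempfile

# Hypothetical stand-in for nounlist.txt: UTF-8 text with one accented noun.
path = os.path.join(tempfile.mkdtemp(), "nounlist.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("apple\ncaf\u00e9\nzebra\n")

# In UTF-8, "é" is stored as the two bytes 0xc3 0xa9.
assert "\u00e9".encode("utf-8") == b"\xc3\xa9"

nouns = []
# Name the encoding instead of relying on the platform default.
with open(path, "r", encoding="utf-8") as nouns_file:
    for line in nouns_file:
        nouns.append(line.rstrip("\n"))

print(nouns)
```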

Encode/decode documents to base64 dynamically

How do I encode PDF and Word files in a folder to base64, then decode them and save them into the same folder?
The PDF and Word files are generated dynamically through a web service.
I would like to use Python to do so.
I used this, but it gives the error
Traceback (most recent call last):
File "sample.py", line 7, in <module>
base64.encode(open("hello.pdf"), open("hello1.b64", "w"))
File "C:\Python34\lib\base64.py", line 496, in encode
s = input.read(MAXBINSIZE)
File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1340: character maps to <undefined>
base64.encode(open("hello.pdf"), open("hello1.b64", "w"))
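Use the base64 module, which is included in the standard library. The error comes from opening the files in text mode: base64.encode() expects binary file objects, so open the input with "rb" and the output with "wb". See the module's documentation in the Python standard library reference.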
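A minimal sketch of the binary-mode approach follows. The helper names `encode_file` and `decode_file`, the temporary folder, and the fake PDF bytes are assumptions made for illustration; only `base64.encode()` and `base64.decode()` come from the standard library.

```python
import base64
import os
import tempfile


def encode_file(src_path, b64_path):
    """Base64-encode src_path into b64_path, both opened in binary mode."""
    with open(src_path, "rb") as src, open(b64_path, "wb") as dst:
        base64.encode(src, dst)


def decode_file(b64_path, dst_path):
    """Decode a base64 file back into its original bytes."""
    with open(b64_path, "rb") as src, open(dst_path, "wb") as dst:
        base64.decode(src, dst)


# Hypothetical folder and file standing in for the generated documents.
folder = tempfile.mkdtemp()
original = os.path.join(folder, "hello.pdf")
with open(original, "wb") as f:
    # Includes 0x9d, the byte that broke the text-mode read in the question.
    f.write(b"%PDF-1.4\n\x9d\x81\xff binary payload")

encoded = os.path.join(folder, "hello1.b64")
restored = os.path.join(folder, "hello_restored.pdf")

encode_file(original, encoded)
decode_file(encoded, restored)

with open(original, "rb") as a, open(restored, "rb") as b:
    same = a.read() == b.read()
print(same)
```

To process a whole folder, the same two helpers could be called in a loop over `os.listdir(folder)`, filtering on the file extension.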

While reading file on Python, I got a UnicodeDecodeError. What can I do to resolve this?

This is one of my own projects. This will later help benefit other people in a game I am playing (AssaultCube). Its purpose is to break down the log file and make it easier for users to read.
I kept getting this issue. Anyone know how to fix this? Currently, I am not planning to write/create the file. I just want this error to be fixed.
The line that triggered the error is a blank line (it stopped on line 66346).
This is what the relevant part of my script looks like:
log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r')
for line in log:
and the exception is:
Traceback (most recent call last):
File "C:\Users\Owner\Desktop\Exodus Logs\Log File Translater.py", line 159, in <module>
main()
File "C:\Users\Owner\Desktop\Exodus Logs\Log File Translater.py", line 7, in main
for line in log:
File "C:\Python32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3074: character maps to <undefined>
Try:
enc = 'utf-8'
log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r', encoding=enc)
If that doesn't work, try:
enc = 'utf-16'
log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r', encoding=enc)
You could also try:
enc = 'iso-8859-15'
or:
enc = 'cp437'
which is very old, but it has "ü" at 0x81, which would fit the string "üßer" that I found on the AssaultCube homepage.
If all of these encodings are wrong, try contacting the AssaultCube developers or, as mentioned in a comment, have a look at https://pypi.python.org/pypi/chardet
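The trial-and-error approach above can be wrapped in a small helper. This is only a sketch: the function name `read_with_fallback`, the default list of encodings, and the sample log bytes are assumptions, not anything taken from AssaultCube. Note that a decode that succeeds is not proof that the encoding is correct; cp437 maps every possible byte, so it never fails and belongs last in the list.

```python
import os
import tempfile


def read_with_fallback(path, encodings=("utf-8", "cp1252", "cp437")):
    """Try each encoding in order; return (text, encoding) for the first that decodes."""
    last_error = None
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next candidate
    if last_error is None:
        raise ValueError("no encodings were given")
    raise last_error


# Hypothetical log line: in cp437, 0x81 is "ü" and 0xe1 is "ß".
path = os.path.join(tempfile.mkdtemp(), "serverlog.txt")
with open(path, "wb") as f:
    f.write(b"\x81\xe1er joined the game\n")

text, enc = read_with_fallback(path)
print(enc, text)
```

Here utf-8 rejects the file because 0x81 cannot start a character, cp1252 rejects it because 0x81 is unmapped there, and cp437 is the first candidate that decodes it.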

UnicodeDecodeError reading string in CSV

I'm having a problem reading some characters in Python.
I have a CSV file in UTF-8 format. Reading it works until the script reaches this value:
Preußen Münster-Kaiserslautern II
I get this error:
Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/__init__.py", line 515, in __call__
handler.get(*groups)
File "/Users/fermin/project/gae/cuotastats/controllers/controllers.py", line 50, in get
f.name = unicode( row[1])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
I tried to use Unicode functions and convert string to Unicode, but I haven't found the solution. I tried to use sys.setdefaultencoding('utf8') but that doesn't work either.
Try the unicode_csv_reader() generator described in the csv module docs.
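That recipe is for Python 2, where the csv module works on byte strings and each cell has to be decoded from UTF-8 afterwards. For comparison, here is a sketch of the Python 3 equivalent, where the file is decoded when it is opened and csv.reader yields str values directly. The temporary file and its single row are made up for illustration; only the team name comes from the question.

```python
import csv
import os
import tempfile

# Hypothetical CSV file containing the value from the question.
path = os.path.join(tempfile.mkdtemp(), "matches.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(["1", "Preu\u00dfen M\u00fcnster-Kaiserslautern II"])

# Decode while opening; newline="" is what the csv docs recommend for csv files.
with open(path, "r", encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))

print(rows[0][1])
```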
