Unicode Decode Error while trying to extract data from text file - python

I am writing code to extract specific lines from a text file into an output file. Here is what my code looks like:
with open('output.txt', 'w') as outfile:
    with open('testfile.txt') as infile:
        for line in infile:
            if 'Emotion' in line or 'PANS.RESP' in line:
                outfile.write(line)
                outfile.write('-----------------------------------------------------')
I keep getting this error:
Traceback (most recent call last):
File "/Users/Sam/Desktop/readstuff.py", line 3, in <module>
for line in infile:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 154: ordinal not in range(128)
What should I do?
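Byte 0xe2 is the first byte of many three-byte UTF-8 sequences (curly quotes and dashes, for example), so the file is probably UTF-8 and Python is falling back to ASCII because of the locale settings. Here is a minimal sketch, assuming the file really is UTF-8. The helper name `extract_lines` and its `keywords` parameter are made up for illustration:

```python
def extract_lines(src, dst, keywords, encoding="utf-8"):
    """Copy every line of src that contains any keyword into dst."""
    # Unlike the question's code, a newline is appended so the
    # separator sits on its own line.
    separator = "-----------------------------------------------------\n"
    matched = 0
    # An explicit encoding stops open() from using the locale default.
    with open(src, encoding=encoding) as infile, \
            open(dst, "w", encoding=encoding) as outfile:
        for line in infile:
            if any(keyword in line for keyword in keywords):
                outfile.write(line)
                outfile.write(separator)
                matched += 1
    return matched
```

It could be called as `extract_lines('testfile.txt', 'output.txt', ['Emotion', 'PANS.RESP'])`. If that still raises a UnicodeDecodeError, this time from the 'utf-8' codec, the file is in some other encoding and a candidate such as cp1252 is worth trying.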

Related

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 886: invalid start byte: jsonlines

I am trying to read lines from a jsonl file, but I am getting the following error.
Traceback (most recent call last):
File "insertion_script.py", line 12, in <module>
for line in f.iter():
File "C:\Users\Administrator\Anaconda3\lib\site-packages\jsonlines\jsonlines.py", line 204, in iter
skip_empty=skip_empty)
File "C:\Users\Administrator\Anaconda3\lib\site-packages\jsonlines\jsonlines.py", line 143, in read
lineno, line = next(self._line_iter)
File "C:\Users\Administrator\Anaconda3\lib\codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 886: invalid start byte
import jsonlines

BH_data = []
with jsonlines.open('2401659.jsonl','r') as f:
    for line in f.iter():
        BH_data.append(line)
The error implies that your data is not actually UTF-8. Byte 0xA3 is the pound sterling sign (£) in the Windows-1252 code page (cp1252), which makes that a likely candidate. You should try:
import codecs
with codecs.open('2401659.jsonl','r',encoding='cp1252') as jfile:
    with jsonlines.Reader(jfile) as f:
        # same loop as in the question
        for line in f.iter():
            BH_data.append(line)
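The claim about 0xA3 can be checked directly, and the same idea works with only the standard library. This is a minimal sketch, assuming the file is cp1252-encoded with one JSON document per line. `read_jsonl` is a hypothetical helper, not part of the jsonlines package:

```python
import json

# 0xA3 is the pound sign in cp1252, but it is not a valid start byte in UTF-8.
assert b"\xa3".decode("cp1252") == "\u00a3"


def read_jsonl(path, encoding="cp1252"):
    """Return a list with one parsed object per non-empty line of the file."""
    records = []
    with open(path, "r", encoding=encoding) as jfile:
        for line in jfile:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

Note that cp1252 leaves a few byte values undefined, so a file in yet another encoding can still raise a UnicodeDecodeError here.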

LookupError: unknown encoding: utf8r

When I try this code:
f = open("xronia.txt", "r")
for x in f:
    print(x)
I always get this error:
Traceback (most recent call last):
File "C:\Users\Desktop\PYTHON\Προγραμματισμός Σταύρος\disekta.py", line 2, in <module>
lines=fo.readlines()
File "C:\Users\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1253.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0xff in position 0: character maps to <undefined>
I have tried to use encoding='utf8', but it didn't work. The file is an Excel file saved as .txt (as I read on a website). I am new to this, so any help is appreciated.
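Byte 0xff at position 0 is consistent with a UTF-16 byte order mark (FF FE), which is what Excel typically writes when saving as "Unicode Text". That is a guess about this file, not something the traceback proves. Here is a minimal sketch that checks for a byte order mark before choosing an encoding. `sniff_bom` and `read_lines` are hypothetical helper names, and the cp1253 fallback is simply the default codec seen in the traceback:

```python
import codecs


def sniff_bom(path, default="cp1253"):
    """Guess an encoding from a leading byte order mark, else return default."""
    with open(path, "rb") as f:
        head = f.read(4)
    # UTF-32 must be tested before UTF-16: its LE mark starts with FF FE too.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return default


def read_lines(path):
    """Read all lines using the encoding suggested by the byte order mark."""
    with open(path, "r", encoding=sniff_bom(path)) as f:
        return f.read().splitlines()
```

Separately, the "unknown encoding: utf8r" in the title suggests the encoding name itself was mistyped at some point. Valid spellings include 'utf-8' and 'utf8'.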

Python read text

I am simply trying to read a text file that has 4000+ lines of nouns, all in a single column, and I’m getting an error:
Traceback (most recent call last):
File "/private/var/mobile/Library/Mobile Documents/iCloud~com~omz-software~Pythonista3/Documents/nouns.py", line 4, in <module>
for i in nouns_file:
File "/var/containers/Bundle/Application/107074CD-03B1-4FB3-809A-CBD44D6CF245/Pythonista3.app/Frameworks/Py3Kit.framework/pylib/encodings/ascii.py", line 27, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2241: ordinal not in range(128)
With code:
with open("nounlist.txt", "r") as nouns_file:
    for i in nouns_file:
        print(i)
I’m not sure what’s causing this. I would think that it would just output all of the nouns from my nounlist.txt file.
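One way to see what is causing this is to read the file as raw bytes and look at the data around the reported offset. Byte 0xc3 is the first byte of many two-byte UTF-8 sequences (accented Latin letters such as é), so UTF-8 is a reasonable guess, though only a guess. Here is a minimal sketch. `show_decode_problem` is a hypothetical helper that relies only on the standard attributes of UnicodeDecodeError:

```python
def show_decode_problem(path, encoding="ascii", context=10):
    """Return (offset, surrounding bytes) for the first undecodable byte, or None."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte the codec rejected.
        lo = max(exc.start - context, 0)
        return exc.start, raw[lo:exc.start + context]
    return None
```

If the returned bytes contain pairs such as `\xc3\xa9`, opening the file with `open("nounlist.txt", "r", encoding="utf-8")` is likely to work.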

How to read and understand the .hcc file with Python?

I have a .hcc file that I am trying to read, but I am getting an error.
This is what I tried:
chardetect 2016.hcc
2016.hcc: windows-1253 with confidence 0.2724130248827703
I have tried the following:
>>> with open("2016.hcc","r",encoding="windows-1253") as f:
...     print(f.read())
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Python35\lib\encodings\cp1253.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9c in position 232: character maps to <undefined>
Then I tried without specifying an encoding:
>>> with open("2016.hcc","r") as f:
...     print(f.read())
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Python35\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 284: character maps to <undefined>
After opening the file in binary mode, I was able to read it, but none of the content was understandable.
Here is the sample file: 2016.hcc
Please let me know how I can do that.
**UPDATED ATTEMPT:**
>>> with open("2016.hcc","r",encoding="utf-16") as f:
...     print(f.read())
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Python35\lib\codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
File "C:\Python35\lib\encodings\utf_16.py", line 61, in _buffer_decode
codecs.utf_16_ex_decode(input, errors, 0, final)
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 15390-15391: illegal encoding
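A chardet confidence of about 0.27 is low, and three different codecs all fail at different offsets, which hints that the file may be a binary format rather than encoded text. That is an assumption about this file. Here is a minimal sketch for checking it. `looks_binary` and `hexdump` are hypothetical helpers, and the 30% threshold is an arbitrary choice:

```python
def looks_binary(data, threshold=0.30):
    """Heuristic: True if data has NUL bytes or many non-text control bytes."""
    if not data:
        return False
    if b"\x00" in data:
        return True
    # Control bytes other than tab, LF, form feed, CR, and ESC rarely occur in text.
    text_controls = {9, 10, 12, 13, 27}
    suspicious = sum(1 for b in data if b < 32 and b not in text_controls)
    return suspicious / len(data) > threshold


def hexdump(data, width=16):
    """Format bytes as offset, hex values, and printable ASCII, one row per line."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{width * 3}} {ascii_part}")
    return "\n".join(rows)
```

Reading the first few hundred bytes with `open("2016.hcc", "rb")` and passing them to both functions shows whether there is a recognizable header or readable text. UTF-16 text also contains NUL bytes, so a True result from `looks_binary` is a hint and not proof. If the file turns out to be binary, no text encoding will make it readable, and the format's own documentation or producing software is the place to look.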

CSV import error

I'm trying to read a CSV file and keep getting the following error. Can anyone point me in the right direction to resolve it?
import csv
with open('apple.csv') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print("1")
Here is the output:
Traceback (most recent call last):
File "/Users/arch/Desktop/ark/ark.py", line 4, in <module>
for row in reader:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 263: ordinal not in range(128)
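Byte 0xc2 is the first byte of two-byte UTF-8 sequences such as the non-breaking space (C2 A0), so the file is likely UTF-8 while Python is defaulting to ASCII. This is a minimal sketch, assuming UTF-8. `read_csv_rows` is a hypothetical helper, and `newline=""` follows the csv module's documented recommendation for files passed to `csv.reader`:

```python
import csv


def read_csv_rows(path, encoding="utf-8"):
    """Return all rows of a CSV file as lists of strings."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(path, newline="", encoding=encoding) as csvfile:
        return list(csv.reader(csvfile))
```

If the file was exported from a spreadsheet program, `encoding="utf-8-sig"` is a reasonable alternative because it also strips a leading byte order mark when one is present.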
