Unable to splitlines from a file (Python 3) - python

I'm doing what I always do when I read a file.
The code:
f= open(filename,'r')
t= f.read().splitlines()
print(t)
But I'm getting a UnicodeDecodeError and I don't know why.
The error:
Traceback (most recent call last):
File "try.py", line 21, in <module>
t= f.read().splitlines()
File "/Users/jamilaldani/miniconda3/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 307: invalid start byte

As described in a few places on this site, and in AndreaConte's comment, this is likely a file saved in an encoding other than UTF-8.
This answer may help: https://stackoverflow.com/a/19706723/70131,
as may this one if you're willing to lose some data: https://stackoverflow.com/a/12468274/70131
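
A minimal sketch of both approaches, assuming the file is Windows-1252 (cp1252) text, where byte 0x96 is an en dash. That encoding is only a guess, and `read_lines` and the sample file are hypothetical names used for illustration:

```python
import os
import tempfile


def read_lines(path, encoding="cp1252", errors="strict"):
    """Read a text file with an explicit encoding and return its lines."""
    with open(path, "r", encoding=encoding, errors=errors) as f:
        return f.read().splitlines()


# Build a small sample file containing byte 0x96, which is invalid as UTF-8.
sample = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample, "wb") as f:
    f.write(b"pages 10\x9620\nsecond line\n")

# Option 1: decode with the encoding the file (presumably) uses.
lines = read_lines(sample, encoding="cp1252")

# Option 2: keep UTF-8 but substitute undecodable bytes (this loses data).
lossy = read_lines(sample, encoding="utf-8", errors="replace")

# ascii() shows non-ASCII characters as escapes, so the result is unambiguous.
print(ascii(lines))
print(ascii(lossy))
```

Option 1 keeps every character but is only right if the guessed encoding is right. Option 2 never fails, but each bad byte becomes U+FFFD and the original character cannot be recovered.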

Related

error Traceback (most recent call last): after reading .txt file in Python

This is my first time trying to program in Python.
with open('/Users/solidaneziri/Downloads/Data_Exercise_1.txt') as infile:
    for line in infile:
        print(line.split()[0])
This is the code I wrote to read the file. It compiled and ran the first time, but since then I keep getting this error and I don't know how to fix it:
/usr/bin/python3 /Users/solidaneziri/IdeaProjects/Abgabe1/src/BonusAufgabe/aufgabe1.py
Traceback (most recent call last):
File "/Users/solidaneziri/IdeaProjects/Abgabe1/src/BonusAufgabe/aufgabe1.py", line 2, in <module>
for line in infile:
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9f in position 190: invalid start byte
I replaced the accented letters with plain ones and it worked. This is not the optimal solution, but it worked.
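
Instead of editing the file by hand, one option is to try a few candidate encodings in order. The sketch below is only an illustration: `decode_with_fallback` is a hypothetical helper, and the candidate list is an assumption (Mac OS Roman is included because the traceback comes from macOS, and byte 0x9f is "ü" in that encoding).

```python
def decode_with_fallback(data, encodings=("utf-8", "mac_roman", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so it never fails,
    # but the result may contain the wrong characters.
    return data.decode("latin-1"), "latin-1"


raw = b"f\x9fr 42\n"  # hypothetical sample: 0x9f is invalid as UTF-8
text, used = decode_with_fallback(raw)
first_fields = [line.split()[0] for line in text.splitlines()]

# ascii() shows non-ASCII characters as escapes, so the result is unambiguous.
print(used, ascii(first_fields))
```

The order of candidates matters: a single-byte encoding accepts nearly any input, so a clean decode does not prove the guess is correct. Check the decoded text by eye.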

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 886: invalid start byte: jsonlines

I am trying to read lines from a jsonl file, but I am getting the following error.
Traceback (most recent call last):
File "insertion_script.py", line 12, in <module>
for line in f.iter():
File "C:\Users\Administrator\Anaconda3\lib\site-packages\jsonlines\jsonlines.py", line 204, in iter
skip_empty=skip_empty)
File "C:\Users\Administrator\Anaconda3\lib\site-packages\jsonlines\jsonlines.py", line 143, in read
lineno, line = next(self._line_iter)
File "C:\Users\Administrator\Anaconda3\lib\codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 886: invalid start byte
import jsonlines

BH_data = []
with jsonlines.open('2401659.jsonl', 'r') as f:
    for line in f.iter():
        BH_data.append(line)
The implication is that your data is not actually in UTF-8. 0xA3 happens to be the British pound sterling symbol (£) in Windows code page 1252. You should try:
import codecs
with codecs.open('2401659.jsonl', 'r', encoding='cp1252') as jfile:
    with jsonlines.Reader(jfile) as f:
        for line in f.iter():  # same loop as in the question
            BH_data.append(line)
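
The same idea can be sketched with only the standard library, which makes it easy to test without the jsonlines package. This assumes the file really is cp1252; the sample file and its contents are hypothetical:

```python
import json
import os
import tempfile

# Hypothetical sample: one JSON object per line, saved as cp1252,
# so each pound sign is stored as the single byte 0xA3.
path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")
with open(path, "wb") as f:
    f.write('{"price": "\u00a35"}\n{"price": "\u00a37"}\n'.encode("cp1252"))

records = []
with open(path, "r", encoding="cp1252") as f:
    for line in f:
        if line.strip():  # skip blank lines
            records.append(json.loads(line))

# json.dumps escapes non-ASCII characters by default, so this prints safely.
print(json.dumps(records))
```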

How to avoid a UTF-8 decoding error

I've got a problem reading text files. When I start the program and give it a file, it throws an error:
Traceback (most recent call last):
File "c:/Users/Marcin/Desktop/python/graf_menu.py", line 38, in <module>
main_func()
File "c:/Users/Marcin/Desktop/python/graf_menu.py", line 32, in main_func
read_file()
File "c:/Users/Marcin/Desktop/python/graf_menu.py", line 15, in read_file
for i in f.read():
File "C:\Users\Marcin\AppData\Local\Programs\Python\Python38-32\lib\codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb3 in position 19: invalid start byte
My code has a line with encoding="UTF-8". How can I solve the problem? The code is below:
files = input("File name: ")
try:
    with open(files, "r", encoding="UTF-8") as f:
        for i in f.read():
            print(i, end='')
except FileNotFoundError:
    print("FileNotFoundError")
There is nothing wrong with the program itself. You are getting this error because the file is not encoded as UTF-8, yet you are asking Python to decode it as UTF-8. You have to either convert the contents of the file to UTF-8 or specify a different encoding (the one that the file actually uses) in the call to open.
This file is not encoded as UTF-8; try encoding="iso-8859-1" instead.
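
A short sketch of why the choice of encoding matters for the byte in the traceback. Which encoding the asker's file actually uses is unknown; the two Central European candidates here are only a guess:

```python
import unicodedata

byte = b"\xb3"

# Strict UTF-8 rejects a lone 0xb3, which is the error in the traceback.
try:
    byte.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Single-byte encodings all accept the byte but map it to different characters.
names = {
    enc: unicodedata.name(byte.decode(enc))
    for enc in ("iso-8859-1", "iso-8859-2", "cp1250")
}

print(utf8_ok)
for enc, name in names.items():
    print(enc, "->", name)
```

Decoding as iso-8859-1 never raises, so the error disappears either way, but the text is only correct if that is the encoding the file was written in.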

Problems reading files in Python: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 168: invalid continuation byte

I try to open the text file and it does not work
with open('quiz.txt') as f:
    lines = f.readlines()
Traceback (most recent call last):
File "<pyshell#35>", line 2, in <module>
lines=f.readlines()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 168: invalid continuation byte
An invalid continuation byte means the file is not valid UTF-8: it is either text saved in another encoding or a binary file.
with open('quiz.txt', 'rb') as f:
    lines = f.readlines()
will open the file in bytes mode.
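Another possibility is that you are running this from your shell, where a relative file name is looked up only in the current working directory (although a missing file would normally raise FileNotFoundError rather than UnicodeDecodeError).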
import os
os.chdir('/path/to/your/file/excluding/file/name')
with open('quiz.txt', 'rb') as f:
    lines = f.readlines()
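
Once the bytes are in hand, one way to narrow down the cause is to look at the byte that fails and its surroundings. This is a rough diagnostic sketch: `describe_decode_error` is a hypothetical helper, the sample data is made up, and the Mac OS Roman guess is an assumption based on the macOS path in the traceback.

```python
def describe_decode_error(data, encoding="utf-8", context=10):
    """Return None if data decodes cleanly, else details of the first failure."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        return {
            "position": exc.start,
            "byte": hex(data[exc.start]),
            "reason": exc.reason,
            "context": data[lo:exc.start + context],
        }
    return None


# Hypothetical sample; for a real file, read the bytes with open(path, 'rb').
data = b"What\xd5s the capital of France?\n"
info = describe_decode_error(data)
print(info)

# Assumption: in Mac OS Roman, 0xd5 is a right single quotation mark,
# so a curly apostrophe is one plausible explanation for this byte.
guess = data.decode("mac_roman")
```

If the context around the failing byte looks like ordinary text, the file is more likely text in a legacy encoding than a binary file.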

Python fails with parsing file using re

I have a file that is mostly ASCII, but some non-ASCII characters appear occasionally. I want to parse this file and extract the lines that are marked in a certain way. Previously I used sed for this, but now I need to do the same in Python. (Of course I can still use os.system, but I'm hoping for something more convenient.)
I'm doing the following:
import re
p = re.compile(r".*STATWAH ([0-9]*):([0-9]*):([0-9 ]*):([0-9 ]*) STATWAH.*")
f = open("capture_8_8_8__1_2_3.log", encoding="ascii")
fl = filter(lambda line: p.match(line), f)
len(list(fl))
On the last line I get the following error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 2227: ordinal not in range(128)
If I remove the encoding parameter from the second line, i.e. use the default encoding, which is utf-8 here, the error is the following:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/codecs.py", line 313, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 2227: invalid start byte
Could you please help me: what can I do here, other than calling sed from Python?
UPD.
Thanks to @Wooble I found the answer.
The correct code looks like this:
import re
p = re.compile(rb".*STATWAH ([0-9]*):([0-9]*):([0-9 ]*):([0-9 ]*) STATWAH.*")
f = open("capture_8_8_8__1_2_3.log", "rb")
fl = filter(lambda line: p.match(line), f)
len(list(fl))
I opened the file in binary mode and compiled the regex from a bytes pattern.
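
If str results are preferred over bytes, an alternative (not from the thread, only a sketch) is to keep text mode and tell `open` what to do with undecodable bytes. The sample log below is hypothetical:

```python
import os
import re
import tempfile

# Same pattern as in the question, written as a raw str pattern.
p = re.compile(r".*STATWAH ([0-9]*):([0-9]*):([0-9 ]*):([0-9 ]*) STATWAH.*")

# Hypothetical sample log containing one undecodable byte (0x81).
path = os.path.join(tempfile.mkdtemp(), "sample.log")
with open(path, "wb") as f:
    f.write(b"noise \x81 here\n")
    f.write(b"x STATWAH 1:2:3:4 STATWAH y\n")

# errors="replace" turns each non-ASCII byte into U+FFFD instead of raising.
with open(path, encoding="ascii", errors="replace") as f:
    matches = [m.groups() for m in map(p.match, f) if m]

print(matches)
```

This works here because the marked lines themselves are plain ASCII. The replaced bytes are lost, so binary mode remains the safer choice when the non-ASCII content matters.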
