invalid continuation byte when reading file - python

Here is my code:
m_data = pd.read_table(m_path, sep='::', header=None, names=mnames)
It results in the error:
'utf-8' codec can't decode byte 0xe9 in position 3114: invalid continuation byte
I have specified an encoding in my code:
m_data = pd.read_table(m_path, sep='::', header=None, names=mnames,encoding='utf-8')
But the problem still exists. What should I do then?

'utf-8' codec can't decode byte 0xe9 in position 3114: invalid continuation byte
This error message means the file is not valid UTF-8, so you should not read it with the utf-8 codec.
The file may be in another encoding, such as utf-16, gbk or latin-1 (0xe9 is "é" in latin-1, for example).
If you still get a message like this after trying a few likely encodings, I suggest the chardet package.
It is very easy to use:
import chardet
with open("your_file", mode="rb") as f:
    print(chardet.detect(f.read(2000)))
rb means the file is read in binary mode, as raw bytes.
2000 is the number of bytes to sample for detection. In general, the larger the sample, the more accurate the result.
chardet - pypi
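If installing chardet is not an option, the "try a few likely encodings" step can be sketched with the standard library alone. This is a minimal sketch, not a detector: the function name `decode_with_fallback`, the candidate list and the sample bytes are all assumptions made for illustration. A decode that succeeds does not prove the guess is right, and latin-1 accepts any byte sequence, so it belongs last.

```python
def decode_with_fallback(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (encoding, text) for the first candidate that decodes raw without error."""
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes, try the next one
    raise ValueError("none of the candidate encodings could decode the data")


# Hypothetical sample: 0xe9 is "é" in cp1252/latin-1, but in UTF-8 it starts
# a three-byte sequence, so the following "l" is an invalid continuation byte.
sample = b"Am\xe9lie::2001"
encoding, text = decode_with_fallback(sample)
print(encoding, text)
```

Once a plausible encoding is found, the same name can be passed as the encoding argument of pd.read_table.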

Related

UBlox NAV_PVT message: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5

Does anybody know how to decode the NAV_PVT message in python?
I tried UTF-8, but I get this error message:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 0: invalid start byte
I can't find the right decode format.
You should read the file as binary, because it is binary. u-blox has good documentation on its various formats and protocols; check it.
E.g. https://www.u-blox.com/sites/default/files/products/documents/u-blox8-M8_ReceiverDescrProtSpec_%28UBX-13003221%29.pdf page 332. Is this what you are looking for?
If you are using a library, check its documentation as well. I assume that either you mixed up the binary protocol with the ASCII (NMEA) one, or you are simply using the binary protocol.
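As a rough illustration of "read it as binary": the byte 0xb5 in the error is the first of the two UBX sync bytes (0xb5 0x62), so the data has to be unpacked with struct rather than decoded as text. The sketch below assumes one complete frame is already in memory and follows the frame layout in the protocol specification (sync, class, ID, little-endian length, payload, two checksum bytes). The function names are made up for this example, only the first few NAV-PVT fields are unpacked, and the frame at the end is synthetic test data, not receiver output.

```python
import struct

UBX_SYNC = b"\xb5\x62"   # every UBX frame starts with these two bytes
NAV_PVT = (0x01, 0x07)   # message class and ID of UBX-NAV-PVT


def ubx_checksum(body):
    """8-bit Fletcher checksum over class, ID, length and payload."""
    ck_a = ck_b = 0
    for byte in body:
        ck_a = (ck_a + byte) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    return bytes([ck_a, ck_b])


def parse_ubx_frame(frame):
    """Split one complete UBX frame into (class, id, payload)."""
    if frame[:2] != UBX_SYNC:
        raise ValueError("missing UBX sync bytes")
    msg_class, msg_id, length = struct.unpack_from("<BBH", frame, 2)
    end = 6 + length
    if len(frame) < end + 2 or frame[end:end + 2] != ubx_checksum(frame[2:end]):
        raise ValueError("truncated frame or bad checksum")
    return msg_class, msg_id, frame[6:end]


def nav_pvt_time(payload):
    """Unpack only the leading time fields of a NAV-PVT payload."""
    itow, year, month, day, hour, minute, second = struct.unpack_from("<IHBBBBB", payload, 0)
    return {"iTOW": itow, "date": (year, month, day), "time": (hour, minute, second)}


# Synthetic 92-byte payload for demonstration only.
payload = struct.pack("<IHBBBBB", 1000, 2024, 1, 2, 3, 4, 5).ljust(92, b"\x00")
body = bytes(NAV_PVT) + struct.pack("<H", len(payload)) + payload
frame = UBX_SYNC + body + ubx_checksum(body)

msg_class, msg_id, parsed = parse_ubx_frame(frame)
if (msg_class, msg_id) == NAV_PVT:
    print(nav_pvt_time(parsed))
```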

I am facing an error while I try to load a file in Python 3

f = open(path,'r',encoding='utf8')
This is the code I'm trying to run but it outputs 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte as the error. What might be the reason for this?
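Your code already uses UTF-8 ('utf8' and 'utf-8' are aliases for the same codec), so changing the spelling will not fix it. The error means the file is not encoded in UTF-8: 0x80 can never be the first byte of a UTF-8 sequence. Find out which encoding the file really uses, or open it in binary mode ('rb') if it is not a text file at all.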
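A quick way to see what is going on is to look at the first raw bytes of the file instead of decoding them. The sketch below is only illustrative: `leading_bytes` is a hypothetical helper, and the pickle at the end is just one example of a binary format that happens to start with 0x80, not a claim about what this particular file contains.

```python
import pickle


def leading_bytes(path, count=16):
    """Return the first `count` raw bytes of a file, without decoding them."""
    with open(path, "rb") as f:
        return f.read(count)


# Bytes 0x80-0xBF are UTF-8 continuation bytes, so none of them can start a character.
try:
    b"\x80abc".decode("utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)

# Example of binary data that begins with 0x80: a pickle written with protocol 2 or later.
blob = pickle.dumps({"a": 1}, protocol=2)
print(blob[:2])
```

If leading_bytes(path) shows something that is clearly not text, the file needs a format-specific reader rather than a different text encoding.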

'utf-8' codec can't decode byte 0xa0 in position 24: invalid start byte

I am trying to read a csv file using the following lines of Python code:
crimes = pd.read_csv('C:/Users/usuario1/Desktop/python/csv/001 Boston crimes/crime.csv', encoding = 'utf8')
crimes.head(5)
But I am getting a decode error as follows:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 24: invalid start byte
What is going wrong?
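Your file is probably not encoded in UTF-8, or it contains bytes that are not valid UTF-8. You can try other encodings like ISO-8859-1. But it is best to check your file encoding first. To do so, something like the following can help:
with open('Your/file/path') as f:
    print(f)
This prints the file object, including the encoding Python used to open it. Note that this is the platform default, not an encoding detected from the file's contents.
Or you can open the CSV in a text editor; in many editors, File -> Save As shows the current encoding.
If those don't help, you can skip rows that fail to parse by using on_bad_lines='skip' (pandas 1.3 and later; older versions used error_bad_lines=False). Note that this skips malformed rows and does not by itself handle decoding errors; for those, pandas 1.3 and later also accept encoding_errors='replace'.
crimes = pd.read_csv('Your/file/path', encoding='utf8', on_bad_lines='skip')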
Hope these will help
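To make the "try another encoding" advice concrete without depending on the original crime.csv, here is a small standard-library sketch. The sample file contents and the helper name `read_rows` are invented for illustration. The byte 0xa0 is a non-breaking space in ISO-8859-1 and cp1252, which is a common reason for this exact error in files exported from spreadsheets.

```python
import csv
import os
import tempfile


def read_rows(path, encoding, errors="strict"):
    """Read a CSV file into a list of rows, decoding it with the given encoding."""
    with open(path, newline="", encoding=encoding, errors=errors) as f:
        return list(csv.reader(f))


# Hypothetical sample file containing a single 0xa0 byte.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(b"id,district\n1,B2\xa0North\n")
    sample_path = tmp.name

try:
    read_rows(sample_path, "utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)

# Decoding as ISO-8859-1 keeps the character; errors="replace" swaps it for U+FFFD.
latin_rows = read_rows(sample_path, "iso-8859-1")
replaced_rows = read_rows(sample_path, "utf-8", errors="replace")
os.remove(sample_path)
print(latin_rows)
print(replaced_rows)
```

The encoding name that works here can be passed to pd.read_csv through its encoding argument in the same way.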

Python 3 UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

I'm implementing this notebook on Windows with Python 3.5.3 and got the following error on the load_vectors() call. I've tried different solutions posted but none worked.
<ipython-input-86-dd4c123b0494> in load_vectors(loc)
1 def load_vectors(loc):
2 return (load_array(loc+'.dat'),
----> 3 pickle.load(open(loc+'_words.pkl','rb')),
4 pickle.load(open(loc+'_idx.pkl','rb')))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
I solved this issue by copying and pasting the entire csv file into text and reading it with:
with open(self.path + "/review_collection.txt", "r", encoding="utf-8") as f:
    read = f.read().splitlines()
    for row in read:
        print(row)
You should probably pass an encoding to pickle.load, e.g. pickle.load(f, encoding='latin1'), but please make sure the data in your file is consistent with that encoding.
By default, pickle decodes 8-bit strings pickled by Python 2 as 'ASCII', which fails here. Instead you can tell it explicitly which encoding to use. See the pickle documentation.
If latin1 doesn't solve it, try encoding='bytes' and then decode all the keys and values later on.
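The three behaviours can be reproduced with a tiny hand-written pickle. The byte string below is an assumption made for this sketch: it imitates what Python 2 would write at protocol 2 for the str '\xe2\x80\x99' (the UTF-8 bytes of a right single quotation mark), and it is not taken from the notebook's files.

```python
import pickle

# PROTO 2, SHORT_BINSTRING of length 3, BINPUT 0, STOP (hand-written for illustration).
PY2_PICKLE = b"\x80\x02U\x03\xe2\x80\x99q\x00."

# Default: Python 3 decodes Python 2 str objects as ASCII and fails.
try:
    pickle.loads(PY2_PICKLE)
except UnicodeDecodeError as exc:
    default_error = str(exc)
print(default_error)

# latin1 never fails, but maps each byte to one character (mojibake for UTF-8 data).
as_latin1 = pickle.loads(PY2_PICKLE, encoding="latin1")

# bytes leaves the data undecoded, so the real encoding can be applied afterwards.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")
as_text = as_bytes.decode("utf-8")
print(repr(as_latin1), repr(as_bytes), repr(as_text))
```

pickle.load accepts the same encoding argument as pickle.loads, so the same choice applies when reading from an open file.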
I got the same error as well. I realized that I had copied and pasted text from a file that had left and right double quotes (curly quotes). Once I changed them to standard straight double quotes (") the issue was fixed!
See this link for the difference between the quotes: https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

Recode bytes which cannot be decoded in utf-8 in python

I am reading in from txt files, and there is one byte that is causing decoding issues:
with open(input_filename_and_director, 'rb') as f:
    r = unicodecsv.reader(f, delimiter="|")
Results in an error message:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 26: invalid continuation byte
Is there any way to specify how I want these bytes handled (i.e. to read this byte in as another character)?
Depending upon what you want, try using unicodecsv.reader(f, delimiter="|", errors='replace') or unicodecsv.reader(f, delimiter="|", errors='ignore'). unicodecsv passes the errors parameter through to the decoder. See help(unicode) for more information.
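The effect of the two error handlers can be seen with the standard library alone. This sketch does not use unicodecsv; it decodes the bytes first and then parses them with the built-in csv module, which is an assumed Python 3 equivalent rather than the answer's exact method. The sample bytes and the name `read_pipe_rows` are made up: a lone 0xc3 followed by "|" reproduces the "invalid continuation byte" error under strict decoding.

```python
import csv
import io


def read_pipe_rows(raw, errors):
    """Decode raw bytes as UTF-8 with the given error handler, then parse '|'-delimited rows."""
    text = raw.decode("utf-8", errors=errors)
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="|"))


# Hypothetical input: 0xc3 starts a two-byte sequence, but "|" is not a continuation byte.
raw = b"caf\xc3|bar\n"

replaced = read_pipe_rows(raw, "replace")  # bad byte becomes U+FFFD
ignored = read_pipe_rows(raw, "ignore")    # bad byte is dropped
print(replaced)
print(ignored)
```

Both handlers lose information, so they suit cases where the stray byte is noise; if it is a real character, decoding with the file's actual encoding is the better fix.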
