Can't read data using read_csv due to encoding errors - python

So, I am facing a huge issue. I am trying to read a CSV file that uses '|' as the delimiter. If I use utf-8 or utf-8-sig as the encoding, I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte
but if I use the unicode_escape encoding, I get this error:
UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 13: \ at end of string
Is it an issue with the dataset?

It worked after I re-saved the file with 'Save with Encoding - UTF-8' in the Sublime Text editor. I think the data had some encoding issues.
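For anyone who wants to do the same re-save from Python instead of an editor, here is a minimal sketch. The function name `reencode_to_utf8` and the default source encoding `cp1252` are my assumptions, not something from the question; the real source encoding has to be known or guessed.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Decode the bytes of src with source_encoding and write them to dst as UTF-8.

    source_encoding is an assumption: if it is wrong, the text may come out
    garbled, or decoding may raise UnicodeDecodeError.
    """
    # Work on raw bytes so that no newline translation happens.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

After that, `pd.read_csv(dst, sep='|', encoding='utf-8')` should no longer hit the decode error, provided the guessed source encoding was right.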

Related

Trying to load a binary-encoded csv file in python

I am trying to load a csv file that is binary-encoded in python. When using pd.read_csv(), I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfe in position 16: invalid start byte
I have tried adding encoding='utf-8' and specifying delimiters, but that did not help.
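One way to investigate an error like this is to look at the raw bytes around the reported position before guessing an encoding. A small sketch follows; `bytes_around` is a hypothetical helper name, and it assumes the position in the error message is counted from the start of the data you pass in, which may not hold if the reader decodes the file in chunks.

```python
def bytes_around(data: bytes, position: int, width: int = 8) -> bytes:
    """Return the bytes from `width` before to `width` after `position`."""
    start = max(0, position - width)
    return data[start:position + width + 1]
```

For example, `bytes_around(open(path, 'rb').read(), 16)` shows what sits next to the offending byte, which often hints at whether the file is text in another encoding or not text at all.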

unicode error occurs when trying to get data from a memoryview

I have a function that returns objects as memoryview, as follows:
data:[(<memory at 0x000001C665563E80>,)]
data[0]:(<memory at 0x000001C665563E80>,)
data[0][0]:<memory at 0x000001C665563E80>
In an attempt to see the data contained in the memoryview, I changed the encoding to iso-8859-1 and used .tobytes(), but both resulted in an empty string, as follows:
iso-8859-1:
tobytes:b''
I also used base64.b64encode(data[0][0]) and the result was base64_data:b''
Please let me know how to extract the data contained in memoryview objects.
NOTE: I am using the Windows operating system.
Errors received:
UnicodeEncodeError: 'charmap' codec can't encode character '\x93'
UnicodeEncodeError: 'charmap' codec can't encode character '\x89'
Attempts to solve this issue:
str(data[0][0], 'iso-8859-1')  #> caused: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 1: invalid start byte, and UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91
running `chcp 65001`  #> did not solve it
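A minimal sketch of how the contents of a memoryview can be inspected without printing raw non-ASCII characters to the console. `describe_memoryview` is a hypothetical helper, and iso-8859-1 is taken from the attempt above rather than known to be the right encoding. If `nbytes` comes back as 0, the view really is empty and the problem is upstream of the decoding.

```python
def describe_memoryview(mv, encoding="iso-8859-1"):
    """Return the size, raw bytes and decoded text of a memoryview."""
    raw = mv.tobytes()  # copy the buffer contents into a bytes object
    return {
        "nbytes": mv.nbytes,          # 0 means there is nothing to decode
        "raw": raw,
        "text": raw.decode(encoding),  # iso-8859-1 maps every byte, so this never fails
    }
```

Printing `ascii(result["text"])` or `result["raw"]` shows escapes such as `\x93` instead of the characters themselves, which avoids relying on the console code page.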

UBlox NAV_PVT message: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5

Does anybody know how to decode the NAV_PVT message in python?
I tried UTF-8, but I get this error message:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 0: invalid start byte
I can't find the right decode format.
You should read the file as binary, because it is binary. u-blox has good documentation on its various formats and protocols; check it.
E.g. https://www.u-blox.com/sites/default/files/products/documents/u-blox8-M8_ReceiverDescrProtSpec_%28UBX-13003221%29.pdf page 332. Is this what you are looking for?
Or, if you are using a library, you should check its documentation. I assume that either you mixed up the binary and ASCII versions, or you are just using the binary protocol.
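To illustrate reading the data as binary: the 0xb5 byte at position 0 matches the first UBX sync character (a UBX frame starts with 0xB5 0x62, followed by class, id, a little-endian 2-byte length, the payload and a 2-byte checksum). Below is a rough sketch of splitting one frame and verifying its checksum. The function names are mine, and the layout should be checked against the protocol specification linked above before relying on it.

```python
import struct

UBX_SYNC = b"\xb5\x62"


def ubx_checksum(body: bytes) -> bytes:
    """8-bit Fletcher checksum over class, id, length and payload."""
    ck_a = ck_b = 0
    for byte in body:
        ck_a = (ck_a + byte) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    return bytes((ck_a, ck_b))


def parse_ubx_frame(frame: bytes):
    """Return (msg_class, msg_id, payload) for one complete UBX frame."""
    if frame[:2] != UBX_SYNC:
        raise ValueError("frame does not start with the UBX sync bytes")
    if len(frame) < 8:
        raise ValueError("frame too short for header and checksum")
    msg_class, msg_id, length = struct.unpack_from("<BBH", frame, 2)
    end = 6 + length
    if len(frame) < end + 2:
        raise ValueError("frame is shorter than its declared length")
    if frame[end:end + 2] != ubx_checksum(frame[2:end]):
        raise ValueError("checksum mismatch")
    return msg_class, msg_id, frame[6:end]
```

NAV-PVT should show up as class 0x01, id 0x07; the individual payload fields then have to be unpacked with `struct` according to the table in the specification.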

'utf-8' codec can't decode byte 0xa0 in position 24: invalid start byte

I am trying to read a csv file using the following lines of Python code:
crimes = pd.read_csv('C:/Users/usuario1/Desktop/python/csv/001 Boston crimes/crime.csv', encoding = 'utf8')
crimes.head(5)
But I am getting a decode error as follows:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 24: invalid start byte
What is going wrong?
Maybe your file is not encoded in UTF-8, or it contains bytes that are not valid UTF-8. You can try other encodings such as ISO-8859-1, but it is best to check your file's encoding first. As a first look, you can print the file object:
with open('Your/file/path') as f:
    print(f)
This prints the file object's details, including the encoding Python used to open it. Note that this is the platform default, not necessarily the encoding the file was actually written in.
Or you can open the csv in a text editor; File -> Save As usually shows the current encoding.
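If those don't help, you can skip the rows that are causing problems by using `error_bad_lines=False` (replaced by `on_bad_lines='skip'` in pandas 1.3 and later). Note that this only skips malformed rows, such as rows with too many fields; it does not by itself get around a decoding error.
crimes = pd.read_csv('Your/file/path', encoding='utf8', on_bad_lines='skip')
Hope this helps.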
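As a sketch of the "try other encodings" suggestion: the helper below tries a list of candidate encodings against the raw bytes and returns the first one that decodes without error. The name `first_working_encoding` and the candidate list are assumptions of mine. Decoding cleanly does not prove the guess is correct, and iso-8859-1 accepts every byte, so it has to come last.

```python
def first_working_encoding(raw: bytes,
                           candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return the first candidate encoding that decodes raw without error."""
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes, try the next
        return encoding
    raise ValueError("none of the candidate encodings could decode the data")
```

The result can then be passed on, for example `pd.read_csv(path, encoding=first_working_encoding(open(path, 'rb').read()))`.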

Unicode Using sqlite3 in Python 2.7.3

I'm trying to insert into a table, but it seems that the file I opened has non-ascii characters in it. This is the error I got:
sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
So after doing some research, I tried putting this in my code:
encode("utf8","ignore")
Which then gave me this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 9: ordinal not in range(128)
So then I tried using the codecs library and opened the file like this:
codecs.open(fileName, encoding='utf-8')
which gave me this error:
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 0: invalid start byte
Then instead of utf-8, I used utf-16 to see if that would do anything and I got this error:
raise UnicodeError,"UTF-16 stream does not start with BOM"
UnicodeError: UTF-16 stream does not start with BOM
I'm all out of ideas...
Also I'm using Ubuntu, if it helps.
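A possible direction, sketched in Python 3 rather than the 2.7 from the question: byte 0x92 is the right single quotation mark in Windows-1252, so the file may be cp1252 rather than UTF-8 or UTF-16. That is a guess based on the byte value only. The table name `lines` and the helper `insert_lines` are made up for illustration.

```python
import sqlite3


def insert_lines(raw: bytes, encoding="cp1252"):
    """Decode raw with the assumed encoding and store each line as Unicode text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lines (content TEXT)")
    text = raw.decode(encoding)  # bytes -> str before anything reaches sqlite3
    conn.executemany(
        "INSERT INTO lines (content) VALUES (?)",
        [(line,) for line in text.splitlines()],
    )
    conn.commit()
    return conn
```

The point is to decode to Unicode once, right after reading the bytes, and to hand only Unicode strings to sqlite3, which is what the ProgrammingError message recommends.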
