Unicode using sqlite3 in Python 2.7.3

I'm trying to insert into a table, but it seems that the file I opened has non-ascii characters in it. This is the error I got:
sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
So after doing some research, I tried putting this in my code:
encode("utf8","ignore")
Which then gave me this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 9: ordinal not in range(128)
So then I tried using the codecs library to open the file like this:
codecs.open(fileName, encoding='utf-8')
which gave me this error:
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 0: invalid start byte
Then instead of utf-8, I used utf-16 to see if that would do anything and I got this error:
raise UnicodeError,"UTF-16 stream does not start with BOM"
UnicodeError: UTF-16 stream does not start with BOM
I'm all out of ideas...
Also I'm using Ubuntu, if it helps.
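Since 0x92 is the Windows-1252 right single quote (an invalid byte in both ASCII and UTF-8), the file is probably cp1252-encoded rather than UTF-8. Decoding it as cp1252 and passing Unicode strings to sqlite3 would sidestep the text_factory warning. A self-contained sketch (the file name and table are hypothetical; io.open behaves the same on 2.7 and 3.x):

```python
import io
import sqlite3

# Hypothetical input file containing a cp1252 curly apostrophe (0x92).
with open("notes.txt", "wb") as f:
    f.write(b"Bob\x92s data\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")

# Decode as cp1252 while reading, so every row is a Unicode string.
with io.open("notes.txt", encoding="cp1252") as f:
    for line in f:
        conn.execute("INSERT INTO notes (body) VALUES (?)",
                     (line.rstrip("\n"),))
conn.commit()
```

If the file's real encoding is unknown, trying cp1252 first and falling back to latin-1 is a common heuristic on data that came from Windows tools.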

Related

Python 3 can't decode certain characters when reading a file

I have some super simple code trying to open a file, but it contains some Chinese/Arabic characters which I believe are stopping me from being able to open it. I'm not sure how to modify the file in order to allow it to open these characters. My code is simply
a_file = open("test2.txt")
lines = a_file.readlines()
print(lines)
and my error message is
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2948: character maps to <undefined>
How do I fix this? Thanks!
The error message
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2948: character maps to <undefined>
is telling you that the bytes in the file cannot be decoded using the system's default encoding (the "'charmap' codec can't decode" message typically appears on Windows systems, where the default is a legacy 8-bit encoding).
If the file contains Chinese or Arabic characters, the correct encoding to use when opening the file is more likely UTF-8 or UTF-16.
Note that the ISO-8859-1 / latin-1 encoding will decode any bytes without error, but the result may be meaningless, because it's an 8-bit encoding that can only represent 256 characters.
>>> s = '你好,世界'
>>> bs = s.encode('utf-8')
>>> bs.decode('ISO-8859-1')
'ä½\xa0å¥½ï¼\x8cä¸\x96ç\x95\x8c'

UBlox NAV_PVT message: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5

Does anybody know how to decode the NAV_PVT message in python?
I tried UTF-8, but I get this error message:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 0: invalid start byte
I can't find the right decode format.
You should read the file as binary, because it is binary: NAV-PVT is a message from u-blox's binary UBX protocol. u-blox has good documentation on its various formats/protocols; check them.
E.g. https://www.u-blox.com/sites/default/files/products/documents/u-blox8-M8_ReceiverDescrProtSpec_%28UBX-13003221%29.pdf, page 332. Is this what you are looking for?
If you are using a library, you should check its documentation too. I assume you either mixed up the binary and ASCII versions, or you are simply using the binary protocol.
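The 0xb5 in the error is in fact the first UBX sync byte, so the data was never text to begin with. A minimal sketch of splitting one UBX frame, following the framing described in the u-blox protocol spec (sync bytes 0xB5 0x62, class, id, little-endian length, payload, two-byte Fletcher checksum); the function name and the sample frame are my own:

```python
import struct

def parse_ubx_frame(frame):
    """Split one UBX frame into (msg_class, msg_id, payload)."""
    if frame[:2] != b'\xb5\x62':
        raise ValueError("not a UBX frame")
    msg_class, msg_id, length = struct.unpack_from('<BBH', frame, 2)
    payload = frame[6:6 + length]
    # Fletcher checksum covers class, id, length, and payload.
    ck_a = ck_b = 0
    for byte in frame[2:6 + length]:
        ck_a = (ck_a + byte) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    if (ck_a, ck_b) != tuple(frame[6 + length:8 + length]):
        raise ValueError("bad checksum")
    return msg_class, msg_id, payload
```

NAV-PVT frames have class 0x01 and id 0x07, so after this split you would unpack the payload fields (iTOW, year, lon, lat, ...) with further struct.unpack calls per the spec.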

Can't read data using read_csv due to encoding errors

So, I am facing a huge issue. I am trying to read a csv file which has '|' as the delimiter. If I use utf-8 or utf-8-sig as the encoding, I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte
but if I use the unicode_escape encoding, I get this error:
UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 13: \ at end of string
Is it an issue with the dataset?
It worked after I used 'Save with Encoding - UTF-8' in Sublime Text. I think the data had some encoding issues.
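When re-saving the file by hand is not an option, looping over candidate encodings often works. A standard-library sketch (the helper name and the candidate list are assumptions; you would pass the winning encoding on to pandas.read_csv(..., sep='|', encoding=enc)):

```python
import csv
import io

def sniff_and_read(raw_bytes, encodings=("utf-8-sig", "cp1252", "latin-1")):
    """Try candidate encodings in order; parse '|'-delimited rows with the
    first one that decodes cleanly. latin-1 goes last: it never raises,
    but it may silently mis-map non-Latin text."""
    for enc in encodings:
        try:
            text = raw_bytes.decode(enc)
        except UnicodeDecodeError:
            continue
        return list(csv.reader(io.StringIO(text), delimiter="|")), enc
    raise ValueError("no candidate encoding decoded the data")
```

A byte like 0xc0 at position 0 also suggests the file may simply not be UTF-8 at all (cp1252 or latin-1 are common culprits for Windows-exported CSVs).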

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2860: ordinal not in range(128)

I have the following record from JSON file that is giving me the error-
{"categoryId":"mpc-pc-optimization",
"categoryName":"PC Optimization",
"productMap":
{"mpp-aol-computer-checkup":"AOL Computer Checkup®",
"mpp-assist-by-aol-free-scan":"Assist by AOL Free Scan",
"mpp-mybenefits":"Monthly Statement of Benefits",
"mpp-perfectspeed":"PerfectSpeed",
"mpp-system-checkup":"System Checkup™","mpp-system-mechanic":"System Mechanic®"}}
The portion with the ™ and ® symbols is causing the error.
How do I fix it?
The error comes from the ™ (trademark symbol), which is not part of the ASCII character set.
The byte 0xe2 is 11100010 in binary, which is outside the ASCII range of 0-127 (at most 01111111 in binary).
The problem is that you are trying to decode with ascii when you should instead decode with a Unicode encoding such as UTF-8.
You could use a try/except block to catch the exception and handle it by decoding as UTF-8:
try:
    value = unicode(my_json_string, "ascii")
except UnicodeError:
    value = unicode(my_json_string, "utf-8")
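In Python 3, where unicode() no longer exists, the same idea is simply to decode the raw bytes as UTF-8 before parsing. A short sketch using a trimmed, hypothetical one-entry version of the record above:

```python
import json

# Hypothetical one-entry version of the record above, as raw UTF-8 bytes.
raw = '{"mpp-system-checkup": "System Checkup\u2122"}'.encode("utf-8")

record = json.loads(raw.decode("utf-8"))  # decode the bytes, then parse
```

Recent versions of json.loads also accept UTF-8 bytes directly, but the explicit decode makes the encoding assumption visible.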

Python 3.4 decode HEX string

I'm facing issues decoding the following byte string in Python 3.4:
b'"\x00\x08\x00\x83\x80\x00\x00\x00\x86\x11\x1dBA\x8c\xdb\xc0\\p\xfe#NR09G06654\x00\x00\x00'
I'm trying with a simple:
data = b'"\x00\x08\x00\x83\x80\x00\x00\x00\x86\x11\x1dBA\x8c\xdb\xc0\\p\xfe#NR09G06654\x00\x00\x00'
print(data.decode('ascii'))
But I am getting the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 4: ordinal not in range(128)
I have also tried to change to UTF-8
print(data.decode('utf-8'))
But with no success as the error is:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 4: invalid start byte
I have no clue what the problem could be.
There are many communication protocols for GPS devices. A lot of devices use NMEA0183, but that is a plain text protocol and this is clearly not plain text.
If you're not running ms-windows, you should check if your GPS is supported by gpsd. It translates the signals from the GPS into something understandable. It has Python bindings available.
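In the meantime, since the buffer is binary framing with an embedded text field (the serial-number-looking NR09G06654 run), one pragmatic option is to pull out only the printable-ASCII runs rather than decoding the whole buffer; a sketch (the minimum run length of 4 is an arbitrary assumption):

```python
import re

data = (b'"\x00\x08\x00\x83\x80\x00\x00\x00\x86\x11\x1dBA'
        b'\x8c\xdb\xc0\\p\xfe#NR09G06654\x00\x00\x00')

# Find runs of 4 or more printable ASCII bytes; everything else is
# treated as binary framing/fields and skipped.
runs = re.findall(rb'[\x20-\x7e]{4,}', data)
# runs == [b'#NR09G06654']
```

The remaining bytes would need the device's protocol spec to interpret; they are fields, not text.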
