I'm getting a UnicodeDecodeError: 'ascii' codec can't decode byte 0x84 in position 24245: ordinal not in range(128) on multiple files, and in every case the position given is basically the end of the file. chardet.detect() gives me ASCII as the codec with 1.0 confidence.
Does anyone know what encoding this is likely to be? The files were written on Windows, so I assume that has something to do with it.
Edit: Removed hex dump.
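One way to narrow this down is to look at how the offending byte decodes under a few encodings that are common on Windows. The sketch below is only illustrative: the candidate list is an assumption, and the right answer depends on what character the file's author intended at that position.

```python
# The byte reported in the traceback above.
suspect = b"\x84"

# Candidate codecs are an assumption: typical Windows ANSI and OEM code pages,
# plus latin-1, which maps every byte and therefore never fails.
candidates = ("cp1252", "cp437", "cp850", "latin-1")

decoded = {name: suspect.decode(name) for name in candidates}

for name, char in decoded.items():
    # ascii() escapes non-ASCII characters so the output prints on any console.
    print(f"{name:8} -> {ascii(char)}")
```

If the surrounding text makes a low double quotation mark plausible, cp1252 is a reasonable guess; if an "ä" fits better, one of the OEM code pages is more likely.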
Related
I used sed = np.loadtxt("file.dat") to read a .dat file that contains several signals. However, I got UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1406: character maps to <undefined>. How can I solve this issue?
I am not sure how to read the .dat file, which contains the vibration signal.
I have the binary contents posted below and would like to convert them into a readable text string. I found some related questions suggesting that the binary contents must be decoded as UTF-8 or ASCII. However, I tried both and got the following errors, respectively:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 54: invalid start byte
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 54: ordinal not in range(128)
Please let me know how to decode the binary contents posted below.
Binary contents:
b"MM\x00*\x00\x00\x00\x08\x00\x12\x01\x00\x00\x03\x00\x00\x00\x01\x00$\x00\x00\x01\x01\x00\x03\x00\x00\x00\x01\x00%\x00\x00\x01\x02\x00\x03\x00\x00\x00\x01\x00 \x00\x00\x01\x03\x00\x03\x00\x00\x00\x01\x80\xb2\x00\x00\x01\x06\x00\x03\x00\x00\x00\x01\x00\x01\x00\x00\x01\x11\x00\x04\x00\x00\x00\x01\x00\x00\x01\xee\x01\x15\x00\x03\x00\x00\x00\x01\x00\x01\x00\x00\x01\x16\x00\x03\x00\x00\x00\x01\x00%\x00\x00\x01\x17\x00\x04\x00\x00\x00\x01\x00\x00\x00 \x01\x1a\x00\x05\x00\x00\x00\x01\x00\x00\x00\xe8\x01\x1b\x00\x05\x00\x00\x00\x01\x00\x00\x00\xf0\x01(\x00\x03\x00\x00\x00\x01\x00\x01\x00\x00\x01S\x00\x03\x00\x00\x00\x01\x00\x03\x00\x00\x83\x0e\x00\x0c\x00\x00\x00\x03\x00\x00\x00\xf8\x84\x82\x00\x0c\x00\x00\x00\x06\x00\x00\x01\x10\x87\xaf\x00\x03\x00\x00\x004\x00\x00\x01#\x87\xb0\x00\x0c\x00\x00\x00\x03\x00\x00\x01\xa8\x87\xb1\x00\x02\x00\x00\x00.\x00\x00\x01\xc0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01#0\x03\x80T\x81\xa0\x00#/\xfb\x9e\xac\xf4S\x07\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00A&\xb9\xf8\x04\xce4\x9bAYYY\xd7\xe3ZB\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x01\x00\x00\x00\x0c\x04\x00\x00\x00\x00\x01\x00\x01\x04\x01\x00\x00\x00\x01\x00\x01\x08\x02\x00\x00\x00\x01\x00\x01\x04\x02\x87\xb1\x00%\x00\x00\x08\x01\x87\xb1\x00\x06\x00&\x08\x06\x00\x00\x00\x01#\x8e\x08\x08\x00\x00\x00\x01\x00\x01\x08\t\x87\xb0\x00\x01\x00\x00\x08\n\x87\xb0\x00\x01\x00\x01\x08\r\x87\xb0\x00\x01\x00\x02\x0c\x00\x00\x00\x00\x01\x0f\x11\x0c\x04\x00\x00\x00\x01#)AXT\xa6#\x00\x00\x00AXT\xa6#\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00Popular Visualisation Pseudo Mercator|WGS 84|\x00x^\xed\xc3\x01\t\x00\x00\x0c\x03\xa05_\xb5G[\x8e\x83\x82\xbd\xa4\xaa\xaa\xaa\xaa\xaa\x0f\x0e]\x13|'"
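The first four bytes of the dump, MM\x00*, match the big-endian TIFF signature, so the payload may be image data rather than encoded text, in which case no text codec will turn it into a readable string. The sketch below parses only the first ten bytes copied from the dump; treating the data as TIFF is an inference from those bytes, not something the question confirms.

```python
import struct

# First ten bytes copied from the dump above.
header = b"MM\x00*\x00\x00\x00\x08\x00\x12"

# TIFF header layout: 2-byte byte-order mark, 16-bit magic number (42),
# 32-bit offset of the first image file directory (IFD).
# "MM" means big-endian, hence the ">" format prefix.
byte_order, magic, ifd_offset = struct.unpack(">2sHI", header[:8])

# The IFD begins with a 16-bit count of directory entries.
(entry_count,) = struct.unpack(">H", header[ifd_offset:ifd_offset + 2])

looks_like_tiff = byte_order == b"MM" and magic == 42
print(looks_like_tiff, ifd_offset, entry_count)
```

If the check holds, writing the bytes to a file opened in binary mode and opening it with an image library is likely to be more useful than calling .decode().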
I have a function that returns objects as memoryview instances, as follows:
data:[(<memory at 0x000001C665563E80>,)]
data[0]:(<memory at 0x000001C665563E80>,)
data[0][0]:<memory at 0x000001C665563E80>
In an attempt to see the data contained in the memoryview, I changed the encoding to iso-8859-1 and used .tobytes(), but both resulted in an empty string, as follows:
iso-8859-1:
tobytes:b''
I also used base64.b64encode(data[0][0]), and the result was base64_data:b''.
Please let me know how to extract the data contained in a memoryview object.
Note: I am using the Windows operating system.
Errors received:
UnicodeEncodeError: 'charmap' codec can't encode character '\x93'
UnicodeEncodeError: 'charmap' codec can't encode character '\x89'
Attempts to solve this issue:
str(data[0][0],'iso-8859-1')#> caused:UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 1: invalid start byte and UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91
running: `chcp 65001` #>did not solve it
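A minimal sketch of inspecting a memoryview without depending on the console's encoding follows. The sample buffer is hypothetical and stands in for data[0][0]. It is also only an assumption that the UnicodeEncodeError comes from printing decoded text to a console whose code page lacks those characters.

```python
# Hypothetical stand-in for data[0][0]; the real object comes from the asker's function.
view = memoryview(b"\x00\x93\x89ab")

# Copy the buffer into an immutable bytes object.
raw = view.tobytes()

# The length and a hex dump are pure ASCII, so they print on any console.
hex_dump = raw.hex()
print(len(raw), hex_dump)

# iso-8859-1 maps every byte value to a code point, so this decode cannot raise.
text = raw.decode("iso-8859-1")

# ascii() escapes non-ASCII characters, which sidesteps console encoding errors.
safe_repr = ascii(text)
print(safe_repr)
```

If len(raw) is 0 for the real object, the buffer itself is empty and the issue lies upstream of any decoding step.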
I am a real beginner at Python, but I have spent hours on this one line and can't get anywhere without fixing it.
cadastro_2019_10= pd.read_csv("inf_cadastral_fi_20191015.csv",delimiter=";")[["CNPJ_FUNDO","DENOM_SOCIAL","CLASSE"]]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc9 in position 49: invalid continuation byte
cadastro_2019_10= pd.read_csv("inf_cadastral_fi_20191015.csv",delimiter=";")[["CNPJ_FUNDO","DENOM_SOCIAL","CLASSE"]]
again:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc9 in position 388: invalid continuation byte
Figure out what encoding the CSV file uses; it does not seem to be UTF-8. If it is latin1, for example, you can try read_csv(..., encoding="latin1").
If you are on a UNIX system, you can use the file command to try to detect the encoding.
I found that I had to add encoding='cp1252'.
Thank you for your time.
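For anyone who wants to see why UTF-8 fails on this kind of file while cp1252 works, here is a small sketch that uses only the standard library. The sample row and the helper name first_working_encoding are made up for illustration; only the column names come from the question.

```python
import csv
import io

# Hypothetical sample row. 0xC9 is "É" in both cp1252 and latin1, but in UTF-8
# it starts a two-byte sequence, so the following "D" is an invalid continuation byte.
raw = (
    b"CNPJ_FUNDO;DENOM_SOCIAL;CLASSE\n"
    b"00.000.000/0000-00;FUNDO DE CR\xc9DITO;Renda Fixa\n"
)

def first_working_encoding(data, candidates=("utf-8", "cp1252", "latin1")):
    """Return the first candidate that decodes data without error, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None

encoding = first_working_encoding(raw)
rows = list(csv.DictReader(io.StringIO(raw.decode(encoding)), delimiter=";"))
print(encoding, rows[0]["DENOM_SOCIAL"])
```

A successful decode does not prove the encoding is the right one (latin1 accepts any byte sequence), so it is worth checking that accented words look correct. The chosen name can then be passed to pd.read_csv(..., encoding=encoding).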
I have the following record from a JSON file that is giving me the error:
{"categoryId":"mpc-pc-optimization",
"categoryName":"PC Optimization",
"productMap":
{"mpp-aol-computer-checkup":"AOL Computer Checkup®",
"mpp-assist-by-aol-free-scan":"Assist by AOL Free Scan",
"mpp-mybenefits":"Monthly Statement of Benefits",
"mpp-perfectspeed":"PerfectSpeed",
"mpp-system-checkup":"System Checkup™","mpp-system-mechanic":"System Mechanic®"}}
The highlighted portion is causing the error.
How do I fix it?
The error comes from the ™ (trademark symbol), which is not an ASCII character.
The byte 0xe2 is 11100010 in binary (226 in decimal), which is above the largest ASCII value, 127 (01111111 in binary).
The problem is that you are trying to decode with ASCII; you should instead decode with a Unicode encoding such as UTF-8.
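To make the byte arithmetic concrete, this short check shows how the trademark symbol is encoded in UTF-8 and why the ASCII codec rejects its first byte. It is a sketch in Python 3 syntax and assumes the JSON file is stored as UTF-8.

```python
symbol = "\u2122"  # the trademark sign

# UTF-8 encodes this character as three bytes, the first of which is 0xe2.
encoded = symbol.encode("utf-8")
first_byte = encoded[0]
print(encoded, hex(first_byte), bin(first_byte))

# ASCII only covers values 0 through 127, so decoding fails on the first byte.
try:
    encoded.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    print(exc)

# Decoding with UTF-8 recovers the original character.
round_trip = encoded.decode("utf-8")
```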
You could use a try/except block to catch the exception and then handle it by decoding as UTF-8.
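# Python 2: unicode() decodes a byte string with the given codec.
try:
    value = unicode(my_json_string, "ascii")
except UnicodeError:
    # Fall back to UTF-8 when the data contains non-ASCII bytes.
    value = unicode(my_json_string, "utf-8")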
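The snippet above relies on Python 2's unicode() built-in. A rough Python 3 equivalent is sketched below, assuming the file is UTF-8. Every ASCII byte sequence is also valid UTF-8, so decoding as UTF-8 directly covers both cases and the fallback is not strictly needed. The sample bytes are a shortened stand-in for the record in the question.

```python
import json

# Shortened stand-in for the record, as raw bytes read from a UTF-8 file.
raw = '{"mpp-system-checkup":"System Checkup\u2122"}'.encode("utf-8")

# Decode explicitly as UTF-8 instead of relying on a default codec.
record = json.loads(raw.decode("utf-8"))

product_name = record["mpp-system-checkup"]
# ascii() keeps the output printable even on consoles that lack the symbol.
print(ascii(product_name))
```

When reading from disk, opening the file with open(path, encoding="utf-8") and passing the handle to json.load has the same effect.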