Unable to decode byte string from MQTT - python

I have an MQTT broker which is receiving data from an external publisher. When a subscriber receives this data, it arrives as a byte string. Here is an example:
payload = b'\xd1\x04\x1c\x00\x00\x00A8000_CP805x_VG_BF1702056637\x01\xe8\x03\x8f\x03\x01\x00'
The problem is in decoding this payload.
When I try to decode it with different encodings, I get:
encodings = ['utf-7', 'utf-8', 'utf-8-sig',
'utf-16', 'utf-16-be', 'utf-16-le',
'utf-32', 'utf-32-be', 'utf-32-le',
'ASCII', 'latin-1', 'iso-8859-1']
for enc in encodings:
    try:
        print('[' + enc + ']: \t\t' + b'\xd1\x04\x1c\x00\x00\x00A8000_CP805x_VG_BF1702056637\x01\xe8\x03\x8f\x03\x01\x00'.decode(enc))
    except Exception as e:
        print('[' + enc + ']: \t\t' + str(e))
# Output:
#[utf-7]: 'utf7' codec can't decode byte 0xd1 in position 0: unexpected special character
#[utf-8]: 'utf-8' codec can't decode byte 0xd1 in position 0: invalid continuation byte
#[utf-8-sig]: 'utf-8' codec can't decode byte 0xd1 in position 0: invalid continuation byte
#[utf-16]: 'utf-16-le' codec can't decode byte 0x00 in position 40: truncated data
#[utf-16-be]: 'utf-16-be' codec can't decode byte 0x00 in position 40: truncated data
#[utf-16-le]: 'utf-16-le' codec can't decode byte 0x00 in position 40: truncated data
#[utf-32]: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)
#[utf-32-be]: 'utf-32-be' codec can't decode bytes in position 0-3: code point not in range(0x110000)
#[utf-32-le]: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)
#[ASCII]: 'ascii' codec can't decode byte 0xd1 in position 0: ordinal not in range(128)
#[latin-1]: ÑA8000_CP805x_VG_BF1702056637è
#[iso-8859-1]: ÑA8000_CP805x_VG_BF1702056637è
None of these are acceptable, and I am at a loss as to what I can do. I'm guessing the problem is that I'm using the wrong encoding, but I unfortunately have no documentation on the payload.
Additional information:
The 'A8000_CP805x_VG_BF1702056637' of the byte string is a reference to the device sending the data. I am not sure if this is part of the problem.
This is only one of the payloads. There are also much larger payloads, where I get similarly unreadable results, and which also contain readable (non-byte) strings in between the bytes.
Any and all help is welcome.
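The payload may not be encoded text at all; it looks more like a packed binary record. Below is a minimal sketch of one possible reading, inferred only from the single sample above and not from any documentation. It assumes little-endian fields: the four bytes after the first two hold the value 28, which matches the length of the device reference, so they are treated as a length prefix. The variable names and the split of the trailing 7 bytes are guesses.

```python
import struct

payload = b'\xd1\x04\x1c\x00\x00\x00A8000_CP805x_VG_BF1702056637\x01\xe8\x03\x8f\x03\x01\x00'

# Assumed header: a uint16 of unknown meaning, then a uint32 length prefix.
first_field, name_length = struct.unpack_from('<HI', payload, 0)

# The device reference is plain ASCII and follows the header directly.
name_start = struct.calcsize('<HI')
name_end = name_start + name_length
device_name = payload[name_start:name_end].decode('ascii')

# Guess: the remaining 7 bytes are one uint8 followed by three uint16 values.
trailer = payload[name_end:]
trailer_fields = struct.unpack('<BHHH', trailer)

print(first_field, name_length, device_name, trailer_fields)
```

If the larger payloads follow the same pattern (a length prefix before each readable string), the same approach could be applied field by field, but that would need to be checked against more samples.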

Related

How to decode python byte code to ASCII? (Selenium. Getting xml from network response)

How to decode Python bytecode to ASCII?
I extract data with Selenium from a network response. I should get XML.
Getting: ['b'\xa5\xff\xff\xc7\x88\xe4\xb4\xd7\x03\xa0\x11:|\xce\xdb\xb7\x0f\xf1\xdf\xfc\x1f\xdb\x93\x91^\xbc\xa3\xdd\xc2\x02V\x00\xba$\xbd\x10\xd2\xd0E\xf2\x90\xb6\xca\xee\x10\xbf\xbf_\xbf\xfc\xef?\xe9\x13{H\xf1\xa1\xa0\x00\x1c\x01(\x80\x1c\x81\x02(s\xe7Z\xf3\xb3N\xf5L\xdc>\xe7\x8f\xbbwl\xbf\x99\x91\xd4O\xde\xb4,\xf3PH\x02L1\x00\xc98\xc3,\x13!\x82\xc6\xc2\xa6Bd"k\xcb\x9d(\xb9\x13%WQr\x15%W\xb1\xe5J\t\x9e:\x8a\x03\x99\x06H\xd0\x8f\xd8\xfe\x9f9\xbc\xfc\x157\x111\xd7\x15\xaab\xfb\xe8;\xab\xee\xfc\x9b\xeeu\x10<d\x04\x06Y\xa8\xd7\x9f\x11...
Code:
...
for request in driver.requests:
    if request.response:
        text_file.write(str(request.response.body))
I've tried:
decoded = request.response.body.decode('ascii')
or request.response.body.decode('utf-8') or cp1251/1252
I get:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa5 in position 0: ordinal not in range(128)
The response should be XML (~1.5 MB), as shown in the attached photo.
If I use:
decoded = base64.b64decode(request.response.body)
I'm getting something like: b'T#\x00\xad\x9a\xb5\xba\xfa3u\xca\x84PG\xbd\x8a\xab\x1f\xcdcJ%\r\xd4\xff\x0c$)\x9a>.... which is not what is expected.
Combining decoded = base64.b64decode(request.response.body).decode('ascii') also doesn't help:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 0: ordinal not in range(128)
Help me, please.
It's because of the header 'Content-Encoding': 'br'.
Installing brotli helped. Also deleting
This message helped a lot
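A minimal sketch of the idea: the body is compressed, so it has to be decompressed according to the Content-Encoding header before it can be decoded as text. The helper name decompress_body is hypothetical, and the brotli module is the third-party package assumed to be installed; gzip and deflate are handled with the standard library.

```python
import gzip
import zlib

try:
    import brotli  # third-party package, assumed installed for 'br' bodies
except ImportError:
    brotli = None


def decompress_body(body: bytes, content_encoding: str) -> bytes:
    """Undo the HTTP Content-Encoding of a response body (hypothetical helper)."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "br":
        if brotli is None:
            raise RuntimeError("the brotli package is required for 'br' bodies")
        return brotli.decompress(body)
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        return zlib.decompress(body)
    # No (or unknown) encoding: return the bytes unchanged.
    return body


# Demo with gzip, which needs no extra install.
compressed = gzip.compress(b"<root>ok</root>")
xml_text = decompress_body(compressed, "gzip").decode("utf-8")
print(xml_text)
```

In the Selenium case, the header value would come from the response headers of each request; how exactly to read it depends on the library version, so treat that part as an assumption.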

E-Mail decode issue: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 135: invalid start byte

E-Mail clients decode messages correctly. So I assume there must also be a way to decode emails with Python correctly.
I use the built-in Python email library to process incoming emails.
import email
...
email_message = email.message_from_file(fp)
email_message.is_multipart() # => False
email_message.get_content_type() # 'text/plain'
to_decode = email_message.get_payload(decode=True)
charset = email_message.get_content_charset()
# charset is utf-8
to_decode.decode(charset)
Exception:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 135: invalid start byte
This is a part of the string within to_decode variable.
b'Dzie\\u0144 dobry,\n\nniestety w podany'
I figured out by trial and error that I can do the following.
test = b'Dzie\\u0144 dobry,\n\nniestety w podany'
test.decode('unicode-escape')
>> output: 'Dzień dobry,\n\nniestety w podany'
Which is correct. But I think there must be a better way instead of guessing. How is my email client doing this?
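One thing mail clients tend to do is decode leniently instead of failing on a single bad byte. A minimal sketch of that behaviour with the standard library, assuming the message is parsed from bytes with the modern policy: get_content() decodes a text part with the declared charset and replaces undecodable bytes rather than raising. The sample message below is made up for illustration, and this does not by itself explain the literal \u0144 escapes in the payload above.

```python
import email
from email import policy

# Made-up message: declares utf-8 but contains a stray 0xa0 byte.
raw = (
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Dzien dobry,\xa0niestety\n"
)

# policy.default yields an EmailMessage, which provides get_content().
msg = email.message_from_bytes(raw, policy=policy.default)

# The bad byte becomes U+FFFD instead of raising UnicodeDecodeError.
text = msg.get_content()
print(repr(text))
```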

How to fix UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d : character maps to <undefined>?

I am using curl to get data.
import os
cmd = "curl --data \"action=getdata\" https:localhost:8070"
print(cmd)
data = os.popen(cmd).read()
The line above produces an error UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 565334: character maps to <undefined>.
When I debugged using breakpoints, the command os.popen generates a large corpus of text and when it goes to read() the error arises in file cp1252.py in IncrementalDecoder class. I tried doing,
data = os.popen(cmd).read().encode('utf-8').decode('ascii')
and
data = os.popen(cmd).read().encode().decode('utf-8')
But the error persists. How can we solve this?
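The error is raised inside read(), because os.popen decodes the command's output with the platform default encoding (cp1252 here) before any later .encode() or .decode() call can run. A minimal sketch of one way around it, assuming the output is really UTF-8: capture raw bytes with subprocess and decode them explicitly. The helper name run_and_decode is hypothetical, and the demo runs a small Python child process instead of curl so that it is self-contained.

```python
import subprocess
import sys


def run_and_decode(args, encoding="utf-8"):
    """Run a command, capture stdout as bytes, and decode it explicitly."""
    completed = subprocess.run(args, capture_output=True, check=True)
    # errors="replace" keeps going if a few bytes are not valid in `encoding`.
    return completed.stdout.decode(encoding, errors="replace")


# Demo: the child writes the UTF-8 bytes of a right double quotation mark,
# whose last byte is 0x9d, the byte cp1252 cannot map.
demo_args = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(bytes([0xE2, 0x80, 0x9D]))",
]
output = run_and_decode(demo_args)
print(repr(output))
```

For the curl case, the argument list would be the curl command split into separate items instead of one shell string.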

I got an error decoding from binary to ascii

When I use the following code:
import requests
def googleSearch(qu):
    with requests.session() as c:
        url = 'https://www.google.com'
        qu = {'q': qu}
        urllink = requests.get(url, params=qu)
        x=urllink.url
        return x
x=googleSearch('translation')
print(x)
import urllib.request
site=urllib.request.urlopen(x)
bytes=site.read()
"artificial limit of size: "
"bytes=bytes[0:6000]"
text=bytes.decode("utf8")
print (text)
I got the following errors (running the program again and again):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 6116: invalid continuation byte
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 6143: invalid continuation byte
etc.
So I suppose the "site" file is too big.
When I limit the size of the file to 6000 bytes there is no error.
What is happening? Should I slice the file and treat each slice separately?
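The size is probably not the cause: the offending byte sits at position 6116 or later, so a 6000-byte slice simply stops before reaching it. A more likely explanation is that the page is not UTF-8. A minimal sketch, assuming the server declares its charset in the Content-Type header: read the charset from that header and decode with it. The helper names are hypothetical and the sample bytes are made up; with urllib, the response's headers object offers get_content_charset() directly.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_page(raw, content_type):
    """Decode page bytes with the declared charset, replacing bad bytes."""
    charset = charset_from_content_type(content_type)
    return raw.decode(charset, errors="replace")


# Made-up sample: 0xf3 is 'ó' in ISO-8859-1 but invalid on its own in UTF-8.
sample = b"traducci\xf3n"
page_text = decode_page(sample, "text/html; charset=ISO-8859-1")
print(page_text)
```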

'utf-8' codec can't decode byte 0xff in position 0: invalid start byte. What should I do?

It doesn't decode properly. In fact, it says what's in the title. What does it mean? What should I do?
im1_bytes = client_socket.recv(int_size)
im1_str = im1_bytes.decode('utf-8')
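The byte 0xff never occurs in valid UTF-8, so the received data is not UTF-8 text. If it is image data, as the name im1_bytes suggests (an assumption; JPEG files, for example, start with 0xff 0xd8), it should be kept as bytes. A minimal sketch of turning such bytes into a string only when a string is really needed, using base64; the sample bytes are made up.

```python
import base64

# Made-up stand-in for the received data: a JPEG-style header plus filler.
image_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + b"rest of the image data"

# base64 maps arbitrary bytes to ASCII text and back without loss.
as_text = base64.b64encode(image_bytes).decode("ascii")
restored = base64.b64decode(as_text)

print(as_text)
print(restored == image_bytes)
```

If the goal is just to save or display the image, the bytes can be written to a file opened in binary mode with no decoding step at all.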
