I have a Python string of bytes data. An example string looks like this:
string = "b'\xabVJ-K\xcd+Q\xb2R*.M*N.\xcaLJU\xd2QJ\xceH\xcc\xcbK\xcd\x01\x89\x16\xe4\x97\xe8\x97d&g\xa7\x16Y\x85\x06\xbb8\xeb\x02\t\xa5Z\x00'"
It is a string, it is not bytes. I wish to convert it to bytes. Normal approaches (like encode) yield this:
b'\\xabVJ-K\\xcd+Q\\xb2R*.M*N.\\xcaLJU\\xd2QJ\\xceH\\xcc\\xcbK\\xcd\\x01\\x89\\x16\\xe4\\x97\\xe8\\x97d&g\\xa7\\x16Y\\x85\\x06\\xbb8\\xeb\\x02\\t\\xa5Z\\x00'
which leads to issues (note all the extra backslashes).
I've looked through 10+ potential answers to this question on SO and only one of them works, and it's a solution I'd prefer not to use, for obvious reasons:
this_works = eval(string)
Is there any way to get this to work without eval? Other potential solutions I've tried, that failed:
Option 1
Option 2
Option 3
I assume that you have a Python-like string representation in the variable s:
s = r"b'\xabVJ-K\xcd+Q\xb2R*.M*N.\xcaLJU\xd2QJ\xceH\xcc\xcbK\xcd\x01\x89\x16\xe4\x97\xe8\x97d&g\xa7\x16Y\x85\x06\xbb8\xeb\x02\t\xa5Z\x00'"
Yes, if you eval this, you get a real Python bytes object.
But you can parse it with the ast module instead:
import ast
s = r"b'\xabVJ-K\xcd+Q\xb2R*.M*N.\xcaLJU\xd2QJ\xceH\xcc\xcbK\xcd\x01\x89\x16\xe4\x97\xe8\x97d&g\xa7\x16Y\x85\x06\xbb8\xeb\x02\t\xa5Z\x00'"
tree = ast.parse(s)
value = tree.body[0].value.value
print(type(value), value)
This will output your bytes object:
<class 'bytes'> b'\xabVJ-K\xcd+Q\xb2R*.M*N.\xcaLJU\xd2QJ\xceH\xcc\xcbK\xcd\x01\x89\x16\xe4\x97\xe8\x97d&g\xa7\x16Y\x85\x06\xbb8\xeb\x02\t\xa5Z\x00'
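A shorter variant of the same idea uses ast.literal_eval, which parses the text but accepts only Python literals (strings, bytes, numbers, containers and so on), so it never calls anything. A sketch; the helper name parse_bytes_repr and its bytes-only check are my own additions for illustration:

```python
import ast

# Raw string: the backslash escapes are literal characters, exactly as in s above
s = r"b'\xabVJ-K\xcd+Q\xb2R*.M*N.\xcaLJU\xd2QJ\xceH\xcc\xcbK\xcd\x01\x89\x16\xe4\x97\xe8\x97d&g\xa7\x16Y\x85\x06\xbb8\xeb\x02\t\xa5Z\x00'"


def parse_bytes_repr(text: str) -> bytes:
    """Hypothetical helper: turn the text of a bytes literal back into bytes."""
    value = ast.literal_eval(text)  # evaluates literals only, never function calls
    if not isinstance(value, bytes):
        raise ValueError(f"not a bytes literal: {text!r}")
    return value


value = parse_bytes_repr(s)
print(type(value), value[:4])
```

Unlike eval, input such as "__import__('os')" makes literal_eval raise ValueError instead of running it.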
In Python 3, I have a str like this, which is exactly the literal representation of bytes data:
'8\x81p\x925\x00\x003dx\x91P\x00x\x923\x00\x00\x91Pd\x00\x921d\x81p1\x00\x00'
I would like to convert it to real bytes:
b'8\x81p\x925\x00\x003dx\x91P\x00x\x923\x00\x00\x91Pd\x00\x921d\x81p1\x00\x00'
I tried to use .encode() on the str data, but the result added many \xc2 bytes:
b'8\xc2\x81p\xc2\x925\x00\x003dx\xc2\x91P\x00x\xc2\x923\x00\x00\xc2\x91Pd\x00\xc2\x921d\xc2\x81p1\x00\x00'.
I also tried:
import ast
ast.literal_eval("b'8\x81p\x925\x00\x003dx\x91P\x00x\x923\x00\x00\x91Pd\x00\x921d\x81p1\x00\x00'")
The result is:
ValueError: source code string cannot contain null bytes
How can I convert the str input to bytes that are exactly the following?
b'8\x81p\x925\x00\x003dx\x91P\x00x\x923\x00\x00\x91Pd\x00\x921d\x81p1\x00\x00'
You are on the right track with the encode function already. Just try with this encoding:
>>> '8\x81p\x925\x00\x003dx\x91P\x00x\x923\x00\x00\x91Pd\x00\x921d\x81p1\x00\x00'.encode('raw_unicode_escape')
b'8\x81p\x925\x00\x003dx\x91P\x00x\x923\x00\x00\x91Pd\x00\x921d\x81p1\x00\x00'
I took it from this table in Python's codecs documentation
Edit: I just found it needs raw_unicode_escape instead of unicode_escape
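For text like this, where every character is below U+0100, the 'latin-1' codec gives the same result, because it maps code points U+0000 to U+00FF one-to-one onto byte values 0 to 255. A sketch, assuming the input never contains a character above U+00FF: latin-1 raises UnicodeEncodeError for such a character, while raw_unicode_escape would emit a \uXXXX escape sequence instead.

```python
text = '8\x81p\x925\x00\x003dx\x91P\x00x\x923\x00\x00\x91Pd\x00\x921d\x81p1\x00\x00'

# latin-1 maps each code point U+0000..U+00FF to the byte with the same value
data = text.encode('latin-1')

# raw_unicode_escape agrees for this input, since every character is below U+0100
same = text.encode('raw_unicode_escape')
print(data == same)                  # True

# The mapping is reversible, so decoding with latin-1 restores the original str
print(data.decode('latin-1') == text)  # True
```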
What's the correct way to convert bytes to a hex string in Python 3?
I see claims of a bytes.hex method and bytes.decode codecs, and I have tried other possible functions of least astonishment, to no avail. I just want my bytes as hex!
Since Python 3.5 this is finally no longer awkward:
>>> b'\xde\xad\xbe\xef'.hex()
'deadbeef'
and reverse:
>>> bytes.fromhex('deadbeef')
b'\xde\xad\xbe\xef'
This also works with the mutable bytearray type.
Reference: https://docs.python.org/3/library/stdtypes.html#bytes.hex
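A short sketch of that round trip with bytearray. Both bytes.fromhex and bytearray.fromhex skip spaces between the digit pairs, which helps with spaced-out hex dumps:

```python
buf = bytearray(b'\xde\xad')
buf.append(0xbe)          # bytearray is mutable, unlike bytes
buf += b'\xef'            # extends buf in place

print(buf.hex())          # deadbeef

# fromhex is also a classmethod on bytearray, and it skips spaces between pairs
restored = bytearray.fromhex('de ad be ef')
print(restored == buf)    # True
```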
Use the binascii module:
>>> import binascii
>>> binascii.hexlify('foo'.encode('utf8'))
b'666f6f'
>>> binascii.unhexlify(_).decode('utf8')
'foo'
See this answer:
Python 3.1.1 string to hex
Python has bytes-to-bytes standard codecs that perform convenient transformations like quoted-printable (fits into 7-bit ASCII), base64 (fits into alphanumerics), hex escaping, and zlib and bz2 compression. In Python 2, you could do:
b'foo'.encode('hex')
In Python 3, str.encode / bytes.decode are strictly for bytes<->str conversions. Instead, you can do this, which works across Python 2 and Python 3 (s/encode/decode/g for the inverse):
import codecs
codecs.getencoder('hex')(b'foo')[0]
Starting with Python 3.4, there is a less awkward option:
codecs.encode(b'foo', 'hex')
These misc codecs are also accessible inside their own modules (base64, zlib, bz2, uu, quopri, binascii); the API is less consistent, but for compression codecs it offers more control.
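A minimal sketch of the codecs.encode / codecs.decode pair with a few of these codecs ('hex', 'base64', 'zlib' and 'bz2' are standard codec aliases in Python 3.4+). The round trips are the point here, not the exact encoded forms:

```python
import codecs

data = b'foo'

hexed = codecs.encode(data, 'hex')
print(hexed)                        # b'666f6f'
print(codecs.decode(hexed, 'hex'))  # b'foo'

# The same pair of functions drives the other bytes-to-bytes codecs
for name in ('base64', 'zlib', 'bz2'):
    encoded = codecs.encode(data, name)
    assert codecs.decode(encoded, name) == data, name
    print(name, len(encoded), 'bytes encoded')
```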
New in Python 3.8, bytes.hex() accepts a separator argument and an optional bytes_per_sep count (positive values group from the right, negative values from the left), as in these examples:
>>> value = b'\xf0\xf1\xf2'
>>> value.hex('-')
'f0-f1-f2'
>>> value.hex('_', 2)
'f0_f1f2'
>>> b'UUDDLRLRAB'.hex(' ', -4)
'55554444 4c524c52 4142'
https://docs.python.org/3/library/stdtypes.html#bytes.hex
binascii.hexlify() converts bytes to a bytes object containing the ASCII hex digits, so each input byte becomes two ASCII characters. If you want a true str, you can .decode("ascii") the result.
I included a snippet that illustrates it.
import binascii
with open("addressbook.bin", "rb") as f: # or any binary file like '/bin/ls'
    in_bytes = f.read()
print(in_bytes) # b'\n\x16\n\x04'
hex_bytes = binascii.hexlify(in_bytes)
print(hex_bytes) # b'0a160a04' which is twice as long as in_bytes
hex_str = hex_bytes.decode("ascii")
print(hex_str) # 0a160a04
From the hex string "0a160a04" you can get back to the bytes with binascii.unhexlify("0a160a04"), which returns b'\n\x16\n\x04'.
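The same steps without a file, using the four example bytes from the comments above as an in-memory stand-in for the file contents:

```python
import binascii

in_bytes = b'\n\x16\n\x04'                  # stand-in for f.read()

hex_bytes = binascii.hexlify(in_bytes)      # b'0a160a04', twice as long as in_bytes
hex_str = hex_bytes.decode('ascii')         # '0a160a04' as a real str

# unhexlify accepts an ASCII-only str as well as bytes, and reverses the conversion
assert binascii.unhexlify(hex_str) == in_bytes
print(hex_str)
```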
import codecs
codecs.getencoder('hex_codec')(b'foo')[0]
This works in Python 3.3, where you need "hex_codec" instead of "hex".
You can also use the %02x format specifier, which formats an integer as a zero-padded, two-digit hex value. For example:
>>> foo = b"tC\xfc}\x05i\x8d\x86\x05\xa5\xb4\xd3]Vd\x9cZ\x92~'6"
>>> res = ""
>>> for b in foo:
... res += "%02x" % b
...
>>> print(res)
7443fc7d05698d8605a5b4d35d56649c5a927e2736
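Building the string with += in a loop works; a join over a generator does the same in one expression. A sketch comparing it with format() and bytes.hex() on the same input:

```python
foo = b"tC\xfc}\x05i\x8d\x86\x05\xa5\xb4\xd3]Vd\x9cZ\x92~'6"

# Iterating over bytes yields ints, so each one can be formatted as two hex digits
res = ''.join('%02x' % b for b in foo)
res_fmt = ''.join(format(b, '02x') for b in foo)  # same thing with format()

print(res)                              # 7443fc7d05698d8605a5b4d35d56649c5a927e2736
print(res == res_fmt == foo.hex())      # True
```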
OK, the following answer is slightly beyond-scope if you only care about Python 3, but this question is the first Google hit even if you don't specify the Python version, so here's a way that works on both Python 2 and Python 3.
I'm also interpreting the question to be about converting bytes to the str type: that is, bytes-y on Python 2, and Unicode-y on Python 3.
Given that, the best approach I know is:
import six
bytes_to_hex_str = lambda b: ' '.join('%02x' % i for i in six.iterbytes(b))
The following assertion will be true for either Python 2 or Python 3, assuming you haven't activated the unicode_literals future in Python 2:
assert bytes_to_hex_str(b'jkl') == '6a 6b 6c'
(Or you can use ''.join() to omit the space between the bytes, etc.)
If you want to convert b'\x61' to 97 or '0x61', you can try this:
[python3.5]
>>> from struct import unpack
>>> temp = unpack('B', b'\x61')[0]  # convert bytes to unsigned int
>>> temp
97
>>> hex(temp)  # convert int to a hexadecimal string
'0x61'
Reference: https://docs.python.org/3.5/library/struct.html
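For a single byte, indexing the bytes object already gives the int, and int.from_bytes handles multi-byte values without a struct format string. A sketch; the two-byte value and the byte orders shown are my choice for illustration:

```python
data = b'\x61'

print(data[0])                          # 97: indexing bytes yields an int
print(hex(data[0]))                     # 0x61
print(int.from_bytes(data, 'big'))      # 97

# Multi-byte example: byte order matters once there is more than one byte
two = b'\x01\x02'
print(int.from_bytes(two, 'big'), int.from_bytes(two, 'little'))  # 258 513
```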
I have a column in my pandas dataframe which stores bytes. I believe the bytes are getting converted to a string when I put it in the dataframe because dataframe doesn't support actual bytes as a dtype. So instead of the column values being b'1a2b', it ends up getting wrapped in a string like this: "b'1a2b'".
I'm passing these values into a method that expects bytes. When I pass it like this ParseFromString("b'1a2b'"), I get the error message:
TypeError: memoryview: a bytes-like object is required, not 'str'
I'm not sure whether encode or decode works in this case, or whether there is some other way to convert this wrapped value into bytes. (I'm using Python 3.)
Since these values are in a dataframe, I can use a helper method during the conversion process from string-->bytes--> protocol buffer since the actual dataframe might not be able to store it as bytes. For example, my_dataframe.apply(_helper_method_convert_string_to_bytes_to_protobuf).
So the problem seems to be that you are unable to extract the byte object from the string. When you pass the string to the function, which is expecting a byte object like b'1a2b', it throws an error. My suggestion would be to try wrapping your string in an eval function. Like:
a = "b'1a2b'"
b = eval(a)
b is what you want. You haven't shared the code for your function, so I'm unable to amend the actual code for you.
You can take a few approaches here, noting that eval() is considered bad practice and it is best to avoid this where possible.
Store your byte representation as a string and encode() on call to function
Extract the byte representation out of your string, then call encode() before passing it to the function
If possible, it would be best to just store your bytes as 1a2b when importing the data. If that's not possible, you could use a regex to extract the contents of the string between b'' and pass the result to encode().
import re
string = "b'1a2b'"
re.search(r"(?<=').*(?=')", string).group().encode()
Output:
#b'1a2b'
type(re.search(r"(?<=').*(?=')", string).group().encode())
#<class 'bytes'>
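One caveat on the regex approach: .encode() turns the captured text into bytes character by character, so it only gives the intended value when the contents are plain ASCII with no escape sequences. If the stored strings can contain escapes such as \x00, ast.literal_eval is a safer alternative to eval() here, because it evaluates only literals. A sketch; string_repr_to_bytes is a hypothetical helper name, and you would apply it per value, e.g. my_dataframe['col'].apply(string_repr_to_bytes) with your own column name in place of 'col':

```python
import ast
import re


def string_repr_to_bytes(text: str) -> bytes:
    """Hypothetical helper: convert a value like "b'1a2b'" back to real bytes."""
    value = ast.literal_eval(text)  # only literals are evaluated, nothing is executed
    if not isinstance(value, bytes):
        raise ValueError(f"expected a bytes literal, got {text!r}")
    return value


plain = "b'1a2b'"
escaped = r"b'\x00\x01'"  # raw string: the backslashes are part of the stored text

# Both approaches agree on plain ASCII contents
regex_plain = re.search(r"(?<=').*(?=')", plain).group().encode()
print(regex_plain == string_repr_to_bytes(plain))      # True

# With escapes, the regex approach keeps the backslashes as literal characters
regex_escaped = re.search(r"(?<=').*(?=')", escaped).group().encode()
print(regex_escaped)                                    # b'\\x00\\x01'
print(string_repr_to_bytes(escaped))                    # b'\x00\x01'
```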
I want to generate bytes sequence containing string length and string content.
For example, for string 'hello' I want to get b'\x05hello'
After reading some docs, I wrote a function:
def LenAndStrBytes(strdata):
    return bytearray([len(strdata)&0xFF])+strdata if strdata!=[] else 0
Questions:
I'm new to Python programming, and I wonder what the best Python practices are for concatenating different types of data, like an int and something iterable like a bytearray.
Did I write my function optimally?
Well, as larsmans points out, it depends on your usage. If you can get the result with clear code that meets the constraints of your context, that is good practice.
No need for &0xFF, bytearray checks to ensure values between 0 and 255.
>>> strdata = 'hello'
>>> bytearray([len(strdata)]) + strdata if strdata else bytearray()
bytearray(b'\x05hello')
And you could also
import struct
bytearray(struct.pack('B%ds' % len(strdata), len(strdata), strdata))
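The snippets above are Python 2, where 'hello' is already a byte string. In Python 3, str and bytes cannot be concatenated, so the text has to be encoded first. A sketch, assuming UTF-8 and a one-byte length prefix (so at most 255 encoded bytes); len_and_str_bytes is a hypothetical name:

```python
import struct


def len_and_str_bytes(strdata: str) -> bytes:
    """Hypothetical helper: one length byte followed by the UTF-8 encoded text."""
    data = strdata.encode('utf-8')
    # The prefix counts encoded bytes, which can exceed the character count
    if len(data) > 255:
        raise ValueError('too long for a one-byte length prefix')
    return bytes([len(data)]) + data


print(len_and_str_bytes('hello'))                        # b'\x05hello'

# struct gives the same result: 'B' is an unsigned char, '%ds' a fixed-size bytes field
data = 'hello'.encode('utf-8')
print(struct.pack('B%ds' % len(data), len(data), data))  # b'\x05hello'
```

Unlike the version above, an empty string here yields a single zero length byte rather than an empty result.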
Are you trying to serialise binary data before you write it to a file (or send it over a network?)
Did you perhaps mean to use the pickle module for data serialisation instead?
I have a dictionary of dictionaries in Python:
d = {"a11y_firesafety.html":{"lang:hi": {"div1": "http://a11y.in/a11y/idea/a11y_firesafety.html:hi"}, "lang:kn": {"div1": "http://a11y.in/a11ypi/idea/a11y_firesafety.html:kn"}}}
I have this in a JSON file and I encoded it using json.dumps(). Now when I decode it using json.loads() in Python I get a result like this:
temp = {u'a11y_firesafety.html': {u'lang:hi': {u'div1': u'http://a11y.in/a11ypi/idea/a11y_firesafety.html:hi'}, u'lang:kn': {u'div1': u'http://a11y.in/a11ypi/idea/a11y_firesafety.html:kn'}}}
My problem is with the "u", which marks a Unicode string, in front of every item in my temp (dictionary of dictionaries). How do I get rid of that "u"?
Why do you care about the 'u' characters? They're just a visual indicator; unless you're actually using the result of str(temp) in your code, they have no effect on your code. For example:
>>> test = u"abcd"
>>> test == "abcd"
True
If they do matter for some reason, and you don't care about consequences like not being able to use this code in an international setting, then you could pass in a custom object_hook (see the json docs here) to produce dictionaries with string contents rather than unicode.
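A sketch of how object_hook works: json.loads calls the hook once for every decoded JSON object, innermost first, and uses its return value in place of that dict. The hook below is hypothetical and targets Python 3, where the u prefix no longer appears because json.loads already returns str; it converts keys and string values to UTF-8 bytes, the closest Python 3 analogue of Python 2's byte strings.

```python
import json


def strings_to_bytes(obj: dict) -> dict:
    """Hypothetical object_hook: encode keys and str values as UTF-8 bytes."""
    # Nested objects were already converted by earlier calls, so they arrive as dicts;
    # strings inside JSON arrays are not touched, because lists never reach this hook
    return {
        key.encode('utf-8'): value.encode('utf-8') if isinstance(value, str) else value
        for key, value in obj.items()
    }


text = '{"a11y_firesafety.html": {"lang:hi": {"div1": "http://a11y.in/a11ypi/idea/a11y_firesafety.html:hi"}}}'
print(json.loads(text))  # plain str keys and values on Python 3
result = json.loads(text, object_hook=strings_to_bytes)
print(result)
```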
You could also use this:
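import fileinput

# Strip u" and u' prefixes textually; runs on Python 2 and 3
with open("out.txt", "a") as fout:
    for line in fileinput.input("in.txt"):
        cleaned = line.replace("u\"", "\"").replace("u'", "'")
        fout.write(cleaned)  # line already ends with a newline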
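When printed from Python 2, decoded JSON responses from typical websites show these two prefixes: u' and u".
This snippet gets rid of both of them. It may not be required, since the prefix doesn't hinder any logical processing, as mentioned by a previous commenter. Note that the replacement is purely textual, so it also changes any value that happens to contain u" or u' (for example, a word ending in "u" just before a quote).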
There is no "unicode" encoding, since unicode is a separate data type. I don't see why unicode would be a problem, since you can always convert it to a byte string with, e.g., foo.encode('utf-8').
However, if you really want to have string objects upfront you should probably create your own decoder class and use it while decoding JSON.