Converting Address List to Hash 160 [duplicate] - python

What's the correct way to convert bytes to a hex string in Python 3?
I see claims of a bytes.hex method and bytes.decode codecs, and have tried other possible functions of least astonishment, to no avail. I just want my bytes as hex!

Since Python 3.5 this is finally no longer awkward:
>>> b'\xde\xad\xbe\xef'.hex()
'deadbeef'
and reverse:
>>> bytes.fromhex('deadbeef')
b'\xde\xad\xbe\xef'
This also works with the mutable bytearray type.
Reference: https://docs.python.org/3/library/stdtypes.html#bytes.hex
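The two calls above can be wrapped so that bytes, bytearray, and memoryview inputs are all handled the same way. This is only a sketch assuming Python 3.5 or later; the names to_hex and from_hex are made up for illustration.

```python
def to_hex(data):
    """Return the lowercase hex string for any bytes-like object."""
    # bytes() copies bytearray/memoryview input, so .hex() is always available
    return bytes(data).hex()


def from_hex(text):
    """Inverse of to_hex: parse a hex string back into bytes."""
    return bytes.fromhex(text)


print(to_hex(bytearray(b'\xde\xad\xbe\xef')))       # deadbeef
print(from_hex('deadbeef') == b'\xde\xad\xbe\xef')  # True
```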

Use the binascii module:
>>> import binascii
>>> binascii.hexlify('foo'.encode('utf8'))
b'666f6f'
>>> binascii.unhexlify(_).decode('utf8')
'foo'
See this answer:
Python 3.1.1 string to hex

Python has bytes-to-bytes standard codecs that perform convenient transformations like quoted-printable (fits into 7bits ascii), base64 (fits into alphanumerics), hex escaping, gzip and bz2 compression. In Python 2, you could do:
b'foo'.encode('hex')
In Python 3, str.encode / bytes.decode are strictly for bytes<->str conversions. Instead, you can do this, which works across Python 2 and Python 3 (s/encode/decode/g for the inverse):
import codecs
codecs.getencoder('hex')(b'foo')[0]
Starting with Python 3.4, there is a less awkward option:
codecs.encode(b'foo', 'hex')
These misc codecs are also accessible inside their own modules (base64, zlib, bz2, uu, quopri, binascii); the API is less consistent, but for compression codecs it offers more control.
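As a rough illustration of the bytes-to-bytes codecs described above, the following sketch (assuming Python 3.4 or later) encodes the same input with two of them and reverses each with codecs.decode. The sample value is arbitrary.

```python
import codecs

data = b'foo'

# bytes-to-bytes codecs: the result is bytes, not str
hex_bytes = codecs.encode(data, 'hex')     # b'666f6f'
b64_bytes = codecs.encode(data, 'base64')  # b'Zm9v\n' (note the trailing newline)

# codecs.decode reverses each transformation
assert codecs.decode(hex_bytes, 'hex') == data
assert codecs.decode(b64_bytes, 'base64') == data

# decode to str when a text result is wanted
hex_str = hex_bytes.decode('ascii')
print(hex_str)  # 666f6f
```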

New in Python 3.8: you can pass a separator argument (and optionally the number of bytes per separator) to the hex method, as in this example:
>>> value = b'\xf0\xf1\xf2'
>>> value.hex('-')
'f0-f1-f2'
>>> value.hex('_', 2)
'f0_f1f2'
>>> b'UUDDLRLRAB'.hex(' ', -4)
'55554444 4c524c52 4142'
https://docs.python.org/3/library/stdtypes.html#bytes.hex

The method binascii.hexlify() will convert bytes to a bytes representing the ascii hex string. That means that each byte in the input will get converted to two ascii characters. If you want a true str out then you can .decode("ascii") the result.
I have included a snippet that illustrates it.
import binascii
with open("addressbook.bin", "rb") as f: # or any binary file like '/bin/ls'
    in_bytes = f.read()
print(in_bytes) # b'\n\x16\n\x04'
hex_bytes = binascii.hexlify(in_bytes)
print(hex_bytes) # b'0a160a04' which is twice as long as in_bytes
hex_str = hex_bytes.decode("ascii")
print(hex_str) # 0a160a04
From the hex string "0a160a04" you can get back to the bytes with binascii.unhexlify("0a160a04"), which gives back b'\n\x16\n\x04'.

import codecs
codecs.getencoder('hex_codec')(b'foo')[0]
This works in Python 3.3 (note "hex_codec" instead of "hex").

You can use the format specifier %02x, which formats an integer as a two-digit, zero-padded hex value. For example:
>>> foo = b"tC\xfc}\x05i\x8d\x86\x05\xa5\xb4\xd3]Vd\x9cZ\x92~'6"
>>> res = ""
>>> for b in foo:
...     res += "%02x" % b
...
>>> print(res)
7443fc7d05698d8605a5b4d35d56649c5a927e2736
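The same formatting can be written as a single join instead of repeated string concatenation. A minimal sketch, assuming Python 3 (where iterating over bytes yields integers); the sample value is just the first five bytes of the example above.

```python
data = b"tC\xfc}\x05"

# Each integer is formatted as two lowercase hex digits, zero-padded
res = "".join("%02x" % b for b in data)
print(res)  # 7443fc7d05
```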

OK, the following answer is slightly beyond-scope if you only care about Python 3, but this question is the first Google hit even if you don't specify the Python version, so here's a way that works on both Python 2 and Python 3.
I'm also interpreting the question to be about converting bytes to the str type: that is, bytes-y on Python 2, and Unicode-y on Python 3.
Given that, the best approach I know is:
import six
bytes_to_hex_str = lambda b: ' '.join('%02x' % i for i in six.iterbytes(b))
The following assertion will be true for either Python 2 or Python 3, assuming you haven't activated the unicode_literals future in Python 2:
assert bytes_to_hex_str(b'jkl') == '6a 6b 6c'
(Or you can use ''.join() to omit the space between the bytes, etc.)
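If adding six as a dependency is not an option, wrapping the input in bytearray is one possible substitute, since iterating a bytearray yields integers on both Python 2.6+ and Python 3. This is a sketch of that alternative, not the approach from the answer above; the sep parameter is an addition for illustration.

```python
def bytes_to_hex_str(b, sep=' '):
    # bytearray(b) iterates as integers on both Python 2.6+ and Python 3
    return sep.join('%02x' % i for i in bytearray(b))


print(bytes_to_hex_str(b'jkl'))          # 6a 6b 6c
print(bytes_to_hex_str(b'jkl', sep=''))  # 6a6b6c
```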

If you want to convert b'\x61' to 97 or '0x61', you can try this:
[python3.5]
>>> from struct import unpack
>>> temp = unpack('B', b'\x61')[0]  # convert bytes to unsigned int
>>> temp
97
>>> hex(temp)  # convert int to its hexadecimal string representation
'0x61'
Reference: https://docs.python.org/3.5/library/struct.html
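For inputs longer than one byte, int.from_bytes (available since Python 3.2) avoids having to pick a struct format code. A minimal sketch; the byte order has to be chosen by the caller, and 'big' here is only an example.

```python
value = int.from_bytes(b'\x61', byteorder='big')
print(value)       # 97
print(hex(value))  # 0x61

# Multi-byte input: the bytes are read as one big-endian integer
wide = int.from_bytes(b'\xde\xad', byteorder='big')
print(hex(wide))   # 0xdead
```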

Related

convert python string of byte data to bytes

I have a Python string of bytes data. An example string looks like this:
string = "b'\xabVJ-K\xcd+Q\xb2R*.M*N.\xcaLJU\xd2QJ\xceH\xcc\xcbK\xcd\x01\x89\x16\xe4\x97\xe8\x97d&g\xa7\x16Y\x85\x06\xbb8\xeb\x02\t\xa5Z\x00'"
It is a string, not bytes. I wish to convert it to bytes. Normal approaches (like encode) yield this:
b'\\xabVJ-K\\xcd+Q\\xb2R*.M*N.\\xcaLJU\\xd2QJ\\xceH\\xcc\\xcbK\\xcd\\x01\\x89\\x16\\xe4\\x97\\xe8\\x97d&g\\xa7\\x16Y\\x85\\x06\\xbb8\\xeb\\x02\\t\\xa5Z\\x00'
which leads to issues (note the addition of all the extra slashes).
I've looked through 10+ potential answers to this question on SO and only one of them works, and it's a solution I'd prefer not to use, for obvious reasons:
this_works = eval(string)
Is there any way to get this to work without eval? Other potential solutions I've tried, that failed:
Option 1
Option 2
Option 3
I assume that you have a Python-style string representation in the variable s:
s = r"b'\xabVJ-K\xcd+Q\xb2R*.M*N.\xcaLJU\xd2QJ\xceH\xcc\xcbK\xcd\x01\x89\x16\xe4\x97\xe8\x97d&g\xa7\x16Y\x85\x06\xbb8\xeb\x02\t\xa5Z\x00'"
Yes, if you eval this, you get a real Python bytes object.
But you can also parse it with the ast module:
import ast
s = r"b'\xabVJ-K\xcd+Q\xb2R*.M*N.\xcaLJU\xd2QJ\xceH\xcc\xcbK\xcd\x01\x89\x16\xe4\x97\xe8\x97d&g\xa7\x16Y\x85\x06\xbb8\xeb\x02\t\xa5Z\x00'"
tree = ast.parse(s)
value = tree.body[0].value.value
print(type(value), value)
This will output your bytes object:
<class 'bytes'> b'\xabVJ-K\xcd+Q\xb2R*.M*N.\xcaLJU\xd2QJ\xceH\xcc\xcbK\xcd\x01\x89\x16\xe4\x97\xe8\x97d&g\xa7\x16Y\x85\x06\xbb8\xeb\x02\t\xa5Z\x00'
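A shorter variant of the same idea is ast.literal_eval, which parses the string and returns the literal's value in one call. It only accepts Python literals, so it does not execute arbitrary code the way eval does, and it does not depend on AST node attribute names, which have changed between Python versions. The sample string below is a shortened stand-in for the one in the question.

```python
import ast

s = r"b'\xabVJ-K\xcd+Q\x00'"  # shortened sample of the repr-style string

value = ast.literal_eval(s)
print(type(value), len(value))  # <class 'bytes'> 9
```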

how to keep the raw format of hexdump in python2? [duplicate]

I need to load the third column of this text file as a hex string:
http://www.netmite.com/android/mydroid/1.6/external/skia/emoji/gmojiraw.txt
>>> open('gmojiraw.txt').read().split('\n')[0].split('\t')[2]
'\\xF3\\xBE\\x80\\x80'
How do I open the file so that I get the third column as a hex string:
'\xF3\xBE\x80\x80'
I also tried binary mode and hex mode, with no success.
You can:
Remove the \x-es
Use .decode('hex') on the resulting string
Code:
>>> '\\xF3\\xBE\\x80\\x80'.replace('\\x', '').decode('hex')
'\xf3\xbe\x80\x80'
Note the appropriate interpretation of backslashes. When the string representation is '\xf3' it means it's a single-byte string with the byte value 0xF3. When it's '\\xf3', which is your input, it means a string consisting of 4 characters: \, x, f and 3
Quick'n'dirty reply
your_string.decode('string_escape')
>>> a='\\xF3\\xBE\\x80\\x80'
>>> a.decode('string_escape')
'\xf3\xbe\x80\x80'
>>> len(_)
4
Bonus info
>>> u='\uDBB8\uDC03'
>>> u.decode('unicode_escape')
Some trivia
What's interesting, is that I have Python 2.6.4 on Karmic Koala Ubuntu (sys.maxunicode==1114111) and Python 2.6.5 on Gentoo (sys.maxunicode==65535); on Ubuntu, the unicode_escape-decode result is \uDBB8\uDC03 and on Gentoo it's u'\U000fe003', both correctly of length 2. Unless it's something fixed between 2.6.4 and 2.6.5, I'm impressed the 2-byte-per-unicode-character Gentoo version reports the correct character.
If you are using Python2.6+ here is a safe way to use eval
>>> from ast import literal_eval
>>> item='\\xF3\\xBE\\x80\\x80'
>>> literal_eval("'%s'"%item)
'\xf3\xbe\x80\x80'
After stripping out the "\x" as in Eli's answer, you can get the integer value with:
int("F3BE8080",16)
If you trust the source, you can use eval('"%s"' % data)
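The answers above target Python 2, where str.decode('hex') and the 'string_escape' codec exist. On Python 3 neither is available on str, so the same conversion might be sketched as follows. This is an assumption about how one would port it, not something taken from the answers.

```python
text = '\\xF3\\xBE\\x80\\x80'  # 16 characters: backslash, x, F, 3, ...

# Option 1: strip the \x markers and parse the remaining hex digits
raw1 = bytes.fromhex(text.replace('\\x', ''))

# Option 2: interpret the escapes, then map code points 0-255 to bytes
raw2 = text.encode('ascii').decode('unicode_escape').encode('latin-1')

print(raw1)          # b'\xf3\xbe\x80\x80'
print(raw1 == raw2)  # True
print(len(raw1))     # 4
```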

Py2/3 compatibility for string input to dll used with ctypes

I am creating a python wrapper around a dll and am trying to make it compatible with both Python 2 and 3. Some of the functions in the dll only take in bytes and return bytes. This is fine on Py2 as I can just deal with strings, but on Py3 I need to convert unicode to bytes for input then bytes to unicode for the output.
For example:
import ctypes
from ctypes import util
path = util.find_library(lib)
dll = ctypes.windll.LoadLibrary(path)
def some_function(str_input):
    # Will need to convert string to bytes in the case of Py3
    bytes_output = dll.some_function(str_input)
    return bytes_output # Want this to be str (bytes/unicode for py2/3)
What is the best way to ensure compatibility here? Would it be fine just to use sys.version_info and encode/decode appropriately or what is the most accepted way to ensure compatibility between versions in this case?
I'd generally avoid hard checking for Python interpreter versions.
You might find this document helpful:
http://python-future.org/compatible_idioms.html#strings-and-bytes
Also, note that you can use this import for unicode literals:
from __future__ import unicode_literals
And for byte strings:
# Python 2 and 3
s = b'This must be a byte-string'
As for the best way to convert string to bytes:
https://stackoverflow.com/a/7585619/295246
The recommended way of converting a String to Bytes in Python 3 (as extracted from the link above) is to do as follows:
>>> a = 'some words'
>>> b = a.encode('utf-8')
>>> print(b)
b'some words'
>>> c = b.decode('utf-8')
>>> print(c)
some words
>>> isinstance(b, bytes)
True
>>> isinstance(b, str)
False
>>> isinstance(c, str)
True
>>> isinstance(c, bytes)
False
You could also do bytes(a, 'utf-8'), but the aforementioned way is more Pythonic (because you can inversely decode the same way from bytes back to str).
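One way to keep version checks out of each wrapper function is to test the type of the value rather than the interpreter version. The helpers below are a sketch under assumptions: the names to_bytes and to_text are hypothetical, and UTF-8 is only a default guess, since the DLL may expect a different encoding.

```python
def to_bytes(value, encoding='utf-8'):
    """Return bytes for passing to the DLL; text is encoded, bytes pass through."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


def to_text(value, encoding='utf-8'):
    """Return text from DLL output; bytes are decoded, text passes through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(to_bytes('some words'))  # b'some words' on Python 3
print(to_text(b'some words'))  # some words
```

Note that on Python 2, bytes is an alias for str, so to_text returns a unicode object there; if a native Python 2 str is wanted for output, the decode step would have to be skipped on that version.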

Raw unicode literal that is valid in Python 2 and Python 3?

Apparently the ur"" syntax has been disabled in Python 3. However, I need it! "Why?", you may ask. Well, I need the u prefix because it is a unicode string and my code needs to work on Python 2. As for the r prefix, maybe it's not essential, but the markup format I'm using requires a lot of backslashes and it would help avoid mistakes.
Here is an example that does what I want in Python 2 but is illegal in Python 3:
tamil_letter_ma = u"\u0bae"
marked_text = ur"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma
After coming across this problem, I found http://bugs.python.org/issue15096 and noticed this quote:
It's easy to overcome the limitation.
Would anyone care to offer an idea about how?
Related: What exactly do "u" and "r" string flags do in Python, and what are raw string literals?
Why don't you just use raw string literal (r'....'), you don't need to specify u because in Python 3, strings are unicode strings.
>>> tamil_letter_ma = "\u0bae"
>>> marked_text = r"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma
>>> marked_text
'\\aம\\bthe Tamil\\cletter\\dMa\\e'
To make it also work in Python 2.x, add the following Future import statement at the very beginning of your source code, so that all the string literals in the source code become unicode.
from __future__ import unicode_literals
The preferred way is to drop the u'' prefix and use from __future__ import unicode_literals as @falsetru suggested. But in your specific case, you could abuse the fact that "ascii-only string" % unicode returns Unicode:
>>> tamil_letter_ma = u"\u0bae"
>>> marked_text = r"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma
>>> marked_text
u'\\a\u0bae\\bthe Tamil\\cletter\\dMa\\e'
Unicode strings are the default in Python 3.x, so using r alone will produce the same as ur in Python 2.

How to define a binary string in Python in a way that works with both py2 and py3?

I am writing a module that is supposed to work in both Python 2 and 3 and I need to define a binary string.
Usually this would be something like data = b'abc', but this code fails on Python 2.5 with invalid syntax.
How can I write the above code in a way that will work in all versions of Python 2.5+?
Note: this has to be binary (it can contain any kind of characters, 0xFF), this is very important.
I would recommend the following:
from six import b
That requires the six module, of course.
If you don't want that, here's another version:
import sys
if sys.version < '3':
    def b(x):
        return x
else:
    import codecs
    def b(x):
        return codecs.latin_1_encode(x)[0]
More info.
These solutions (essentially the same) work, are clean, as fast as you are going to get, and can support all 256 byte values (which none of the other solutions here can).
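The reason the Python 3 branch can represent every byte value is that Latin-1 maps code points 0 to 255 one-to-one onto byte values 0 to 255. The sketch below shows only that branch, run on Python 3, with an arbitrary sample string.

```python
import codecs


def b(x):
    # latin-1 maps code points 0-255 one-to-one onto byte values 0-255
    return codecs.latin_1_encode(x)[0]


data = b('abc\xff')
print(data)       # b'abc\xff'
print(len(data))  # 4
```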
If the string only has ASCII characters, call encode. This will give you a str in Python 2 (just like b'abc'), and a bytes in Python 3:
'abc'.encode('ascii')
If not, rather than putting binary data in the source, create a data file, open it with 'rb' and read from it.
You could store the data base64-encoded.
First step would be to transform into base64:
>>> import base64
>>> base64.b64encode(b"\x80\xFF")
b'gP8='
This is to be done once, and using the b or not depends on the version of Python you use for it.
In the second step, you put this byte string into a program without the b.
Then it is ensured that it works in py2 and py3.
import base64
x = 'gP8='
base64.b64decode(x.encode("latin1"))
gives you a str '\x80\xff' in 2.6 (should work in 2.5 as well) and a b'\x80\xff' in 3.x.
As an alternative to the two steps above, you can do the same with hex data:
import binascii
x = '80FF'
binascii.unhexlify(x) # `bytes()` in 3.x, `str()` in 2.x
