How to convert hex str to an hex bytearray [duplicate] - python

I have a long sequence of hex digits in a string, such as
000000000000484240FA063DE5D0B744ADBED63A81FAEA390000C8428640A43D5005BD44
only much longer, several kilobytes. Is there a builtin way to convert this to a bytes object in python 2.6/3?

result = bytes.fromhex(some_hex_string)

Works in Python 2.7 and higher including python3:
result = bytearray.fromhex('deadbeef')
Note: There seems to be a bug with the bytearray.fromhex() function in Python 2.6. The python.org documentation states that the function accepts a string as an argument, but when applied, the following error is thrown:
>>> bytearray.fromhex('B9 01EF')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: fromhex() argument 1 must be unicode, not str`

You can do this with the hex codec. ie:
>>> s='000000000000484240FA063DE5D0B744ADBED63A81FAEA390000C8428640A43D5005BD44'
>>> s.decode('hex')
'\x00\x00\x00\x00\x00\x00HB#\xfa\x06=\xe5\xd0\xb7D\xad\xbe\xd6:\x81\xfa\xea9\x00\x00\xc8B\x86#\xa4=P\x05\xbdD'

Try the binascii module
from binascii import unhexlify
b = unhexlify(myhexstr)

import binascii
binascii.a2b_hex(hex_string)
Thats the way I did it.

Related

How do I encode hexadecimal to base64 in python?

If I try to do:
from base64 import b64encode
b64encode('ffffff')
I get this error:
Traceback (most recent call last):
File "<pyshell#13>", line 1, in <module>
base64.b64encode('ffffff')
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/base64.py", line 58, in b64encode
encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'
Because it said bytes-like object I then tried this:
b64encode(bytes('ffffff'))
Which failed.
Traceback (most recent call last):
File "<pyshell#10>", line 1, in <module>
b64encode(bytes('ffffff'))
TypeError: string argument without an encoding
Finally, using the .encode('utf-8') function:
b64encode('ffffff'.encode('utf-8'))
With incorrect output b'ZmZmZmZm', the correct base64 encoding is ////.
I already know how to decode b64 to hex so don't say how to do that.
Edit: This question got flagged for being the same as converting hex strings to hex bytes. This involves base64.
To fully go from the string ffffff to base64 of the hex value, you need to run it through some encoding and decoding, using the codecs module:
import codecs
# Convert string to hex
hex = codecs.decode('ffffff', 'hex')
# Encode as base64 (bytes)
codecs.encode(hex, 'base64')
For an odd-length string like 0xfffff you need to put a zero at the beginning of the hex string (0x0fffff), otherwise python will give you an error.
Here's an alternative to using codecs.
This one is a bit less readable, but works great and hopefully teaches you how codecs, hex and integers work. (word of caution, works on odd lengths, but will ignore the odd byte-string-representation)
import struct
s = 'ffffff'
b''.join([struct.pack('B', int(''.join(x), 16)) for x in zip(s[0::2], s[1::2])])
Which should give you b'\xff\xff\xff'.
Your main problem is probably that you think 'ffffff' represents the values 255, 255, 255. Which they don't. They're still in a string format with the letters ff. Subsequently you need to parse/convert the string representation of hex, into actual hex. We can do this by first passing the string through int() which can intemperate hex in string representation format.
You will need to convert each pair of ff individually by doing int('ff', 16) which tells Python to intemperate the string as a base-16 integer (hex-numbers).
And then convert that integer into a bytes like object representing that integer. That's where struct.pack comes in. It's meant for exactly this.
struct.pack('B', 255) # 255 is given to us by int('ff', 16)
Essentially, 'B' tells Python to pack the value 255 into a 1-byte-object, in this case, that gives us b'\xff' which is your end goal. Now, do this for every 2-pair of letters in your original data.
This is more of a manual approach where you'll iterate over 2 characters in the string at a time, and use the above description to bundle them into what you expect them to be. Or just use codecs, either way works.
Expanded version of the above oneliner:
import struct
hex_string = 'ffffff'
result = b''
for pair in zip(hex_string[0::2], hex_string[1::2]):
value = int(''.join(pair), 16)
result += struct.pack('B', value)
At the very least, I hope this explains how hex works on a practical level. And how the computer interpenetrates hour humanly readable version of bits and bytes.

Python formatted string literals for objects

one of a very cool new feature of Python3.6 is the implementation of Formatted string literals (https://docs.python.org/3.6/whatsnew/3.6.html#whatsnew36-pep498).
Unfortunately, it does not behave like the well known format() function:
>> a="abcd"
>> print(f"{a[:2]}")
>> 'ab'
As you see, slicing is possible (actually all python functions on string).
But format() will not work with slicing:
>> print("{a[:2]}".format(a="abcd")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
Is there a way to get the functionality of the new formatted string literals on string objects??
>> string_object = "{a[:2]}" # may also be comming from a file
>> # some way to get the result 'ab' with 'string_object'
The str.format syntax does not and will not support the full range of expressions that the newer f-strings will. You'll have to manually evaluate the slice expression outside of the string and supply it to the format function instead:
a = "abcd"
string_object = "{a}".format(a = a[:2])
It should also be noted there are subtle differences between the syntax allowed by f-strings and str.format, so that the former is not strictly a superset of the latter.
Nope, str.format tries to cast the indexes to a str first before applying them, that's why you get that error; it tries to index the string with str indices:
a = "abcd"
>>> a[:'2']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: slice indices must be integers or None or have an __index__ method
It really isn't meant for cases like that; "{a[::]}".format(a=a) would probably be evaluated as a[:':'] too I'd guess.
This is one of the reasons f-strings came about, in order to support any Python expressions' desire to be formatted.

base64.encodestring failing in python 3

The following piece of code runs successfully on a python 2 machine:
base64_str = base64.encodestring('%s:%s' % (username,password)).replace('\n', '')
I am trying to port it over to Python 3 but when I do so I encounter the following error:
>>> a = base64.encodestring('{0}:{1}'.format(username,password)).replace('\n','')
Traceback (most recent call last):
File "/auto/pysw/cel55/python/3.4.1/lib/python3.4/base64.py", line 519, in _input_type_check
m = memoryview(s)
TypeError: memoryview: str object does not have the buffer interface
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/auto/pysw/cel55/python/3.4.1/lib/python3.4/base64.py", line 548, in encodestring
return encodebytes(s)
File "/auto/pysw/cel55/python/3.4.1/lib/python3.4/base64.py", line 536, in encodebytes
_input_type_check(s)
File "/auto/pysw/cel55/python/3.4.1/lib/python3.4/base64.py", line 522, in _input_type_check
raise TypeError(msg) from err
TypeError: expected bytes-like object, not str
I tried searching examples for encodestring usage but not able to find a good document. Am I missing something obvious? I am running this on RHEL 2.6.18-371.11.1.el5
You can encode() the string (to convert it to byte string) , before passing it into base64.encodestring . Example -
base64_str = base64.encodestring(('%s:%s' % (username,password)).encode()).decode().strip()
To expand on Anand's answer (which is quite correct), Python 2 made little distinction between "Here's a string which I want to treat like text" and "Here's a string which I want to treat like a sequence of 8-bit byte values". Python 3 firmly distinguishes the two, and doesn't let you mix them up: the former is the str type, and the latter is the bytes type.
When you Base64 encode a string, you're not actually treating the string as text, you're treating it as a series of 8-bit byte values. That's why you're getting an error from base64.encodestring() in Python 3: because that is an operation that deals with the string's characters as 8-bit bytes, and so you should pass it a paramter of type bytes rather than a parameter of type str.
Therefore, to convert your str object into a bytes object, you have to call its encode() method to turn it into a set of 8-bit byte values, in whatever Unicode encoding you have chosen to use. (Which should be UTF-8 unless you have a very specific reason to choose something else, but that's another topic).
In Python 3 encodestring docs says:
def encodestring(s):
"""Legacy alias of encodebytes()."""
import warnings
warnings.warn("encodestring() is a deprecated alias, use encodebytes()", DeprecationWarning, 2)
return encodebytes(s)
Here is working code for Python 3.5.1, it also shows how to url encode:
def _encodeBase64(consumer_key, consumer_secret):
"""
:type consumer_key: str
:type consumer_secret: str
:rtype str
"""
# 1. URL encode the consumer key and the consumer secret according to RFC 1738.
dummy_param_name = 'bla'
key_url_encoded = urllib.parse.urlencode({dummy_param_name: consumer_key})[len(dummy_param_name) + 1:]
secret_url_encoded = urllib.parse.urlencode({dummy_param_name: consumer_secret})[len(dummy_param_name) + 1:]
# 2. Concatenate the encoded consumer key, a colon character “:”, and the encoded consumer secret into a single string.
credentials = '{}:{}'.format(key_url_encoded, secret_url_encoded)
# 3. Base64 encode the string from the previous step.
bytes_base64_encoded_credentials = base64.encodebytes(credentials.encode('utf-8'))
return bytes_base64_encoded_credentials.decode('utf-8').replace('\n', '')
(I am sure it could be more concise, I am new to Python...)
Also see: http://pythoncentral.io/encoding-and-decoding-strings-in-python-3-x/

IronPython define utf-16 character in string

I want to define utf-16 (LE) characters by their number.
An example is 'LINEAR B SYLLABLE B028 I'.
When I escape this character by u'\U00010001' I receive u'\u0001'.
Really,
>>> u'\U00010001' == u'\u0001'
True
If I use unichr() I get errors too:
>>> unichr(0x10001)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 65536 is not in required range
How can I define utf-16 characters in my Python app?
IronPython 2.7
You could try using a named literal:
print "\N{LINEAR B SYLLABLE B038 E}"
If the other methods work on cpython but not ironpython please open an ironpython issue with a minimal test case.

How to create python bytes object from long hex string?

I have a long sequence of hex digits in a string, such as
000000000000484240FA063DE5D0B744ADBED63A81FAEA390000C8428640A43D5005BD44
only much longer, several kilobytes. Is there a builtin way to convert this to a bytes object in python 2.6/3?
result = bytes.fromhex(some_hex_string)
Works in Python 2.7 and higher including python3:
result = bytearray.fromhex('deadbeef')
Note: There seems to be a bug with the bytearray.fromhex() function in Python 2.6. The python.org documentation states that the function accepts a string as an argument, but when applied, the following error is thrown:
>>> bytearray.fromhex('B9 01EF')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: fromhex() argument 1 must be unicode, not str`
You can do this with the hex codec. ie:
>>> s='000000000000484240FA063DE5D0B744ADBED63A81FAEA390000C8428640A43D5005BD44'
>>> s.decode('hex')
'\x00\x00\x00\x00\x00\x00HB#\xfa\x06=\xe5\xd0\xb7D\xad\xbe\xd6:\x81\xfa\xea9\x00\x00\xc8B\x86#\xa4=P\x05\xbdD'
Try the binascii module
from binascii import unhexlify
b = unhexlify(myhexstr)
import binascii
binascii.a2b_hex(hex_string)
Thats the way I did it.

Categories