Hex string to signed int in Python

How do I convert a hex string to a signed int in Python 3?
The best I can come up with is
h = '9DA92DAB'
b = bytes(h, 'utf-8')
ba = binascii.a2b_hex(b)
print(int.from_bytes(ba, byteorder='big', signed=True))
Is there a simpler way? Unsigned is so much easier: int(h, 16)
BTW, the origin of the question is itunes persistent id - music library xml version and iTunes hex version

In n-bit two's complement, bits have value:
bit 0 = 2^0
bit 1 = 2^1
bit n-2 = 2^(n-2)
bit n-1 = -2^(n-1)
But bit n-1 has value 2^(n-1) when unsigned, so the number is 2^n too high. Subtract 2^n if bit n-1 is set:
def twos_complement(hexstr, bits):
    value = int(hexstr, 16)
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value

print(twos_complement('FFFE', 16))
print(twos_complement('7FFF', 16))
print(twos_complement('7F', 8))
print(twos_complement('FF', 8))
Output:
-2
32767
127
-1
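(For comparison, on Python 3 the int.from_bytes route from the question also collapses to a one-liner:)
h = '9DA92DAB'
print(int.from_bytes(bytes.fromhex(h), byteorder='big', signed=True))  # -1649594965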

import struct
For Python 3 (with help from the comments):
h = '9DA92DAB'
struct.unpack('>i', bytes.fromhex(h))
For Python 2:
h = '9DA92DAB'
struct.unpack('>i', h.decode('hex'))
or if it is little endian:
h = '9DA92DAB'
struct.unpack('<i', h.decode('hex'))
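For completeness, the Python 3 little-endian form would look like this (a sketch; struct.unpack returns a tuple, hence the trailing comma):
h = '9DA92DAB'
value, = struct.unpack('<i', bytes.fromhex(h))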

Here's a general function you can use for hex of any size:
# hex string to signed integer
def htosi(val):
    uintval = int(val, 16)
    bits = 4 * (len(val) - 2)  # assumes val carries a '0x' prefix
    if uintval >= 1 << (bits - 1):
        uintval -= 1 << bits
    return uintval
And to use it:
h = str(hex(-5))
h2 = str(hex(-13589))
x = htosi(h)
x2 = htosi(h2)
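Note that because of the len(val) - 2 term, the function assumes the string carries a '0x' prefix, as produced by hex(). A quick check against the value from the original question (a sketch, assuming a 32-bit value):
print(htosi('0x9DA92DAB'))  # -1649594965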

This works for 16-bit signed ints; you can extend it to 32-bit ints (see the sketch after this block). It uses the basic definition of two's complement signed numbers. Also note that XOR with all ones (0xffff) is the same as a bitwise NOT.
# convert to unsigned
x = int('ffbf', 16)  # example (-65)
# check sign bit
if (x & 0x8000) == 0x8000:
    # if set, invert and add one to get the negative value, then add the negative sign
    x = -((x ^ 0xffff) + 1)
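A minimal 32-bit sketch of the same idea, using the value from the original question (other widths would need different masks):
x = int('9DA92DAB', 16)             # convert to unsigned
if (x & 0x80000000) == 0x80000000:  # check sign bit
    # if set, invert and add one to get the magnitude, then negate
    x = -((x ^ 0xffffffff) + 1)
print(x)  # -1649594965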

It's a very late answer, but here's a function to do the above. This will extend for whatever length you provide. Credit for portions of this to another SO answer (I lost the link, so please provide it if you find it).
def hex_to_signed(source):
    """Convert a hex string to a signed integer value.

    This assumes that source is the proper length, and the sign bit
    is the first bit in the first byte of the correct length.

    hex_to_signed("F") should return -1.
    hex_to_signed("0F") should return 15.
    """
    if not isinstance(source, str):
        raise ValueError("string type required")
    if 0 == len(source):
        raise ValueError("string is empty")
    sign_bit_mask = 1 << (len(source) * 4 - 1)
    other_bits_mask = sign_bit_mask - 1
    value = int(source, 16)
    return -(value & sign_bit_mask) | (value & other_bits_mask)
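For example, the calls from the docstring, plus the value from the original question, would behave like this:
print(hex_to_signed("F"))         # -1
print(hex_to_signed("0F"))        # 15
print(hex_to_signed("9DA92DAB"))  # -1649594965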

Related

Converting struct format string to range of allowable int values

The Python struct library has a number of format characters, each corresponding to a C type ("h": int16, "H": uint16).
Is there a simple way to go from a format string (e.g. "h", "H", etc.) to the range of possible values (e.g. -32768 to 32767, 0 to 65535, etc.)?
I see the struct library provides calcsize, but what I really want is something like calcrange.
Is there a built-in solution, or an elegant solution I am neglecting? I am also open to third party libraries.
I have made a DIY calcrange below, but it only covers a limited number of possible format strings and makes some non-generalizable assumptions.
from struct import calcsize
from typing import Tuple

def calcrange(fmt: str) -> Tuple[int, int]:
    """Calculate the min and max possible value of a given struct format string."""
    size: int = calcsize(fmt)
    unsigned_max = int("0x" + "FF" * size, 16)
    if fmt.islower():
        # Signed case
        min_ = -1 * int("0x80" + "00" * (calcsize(fmt) - 1), 16)
        return min_, unsigned_max + min_
    # Unsigned case
    return 0, unsigned_max
The math can be simplified. If b is the bit width, then unsigned values range from 0 to 2^b - 1, and signed values from -2^(b-1) to 2^(b-1) - 1. It only works for the integer types.
Here's the simplified version:
import struct

def calcrange(intcode):
    b = struct.calcsize(intcode) * 8
    if intcode.islower():
        return -2**(b-1), 2**(b-1)-1
    else:
        return 0, 2**b-1

for code in 'bBhHiIlLqQnN':
    s, e = calcrange(code)
    print(f'{code} {s:26,} to {e:26,}')
Output:
b -128 to 127
B 0 to 255
h -32,768 to 32,767
H 0 to 65,535
i -2,147,483,648 to 2,147,483,647
I 0 to 4,294,967,295
l -2,147,483,648 to 2,147,483,647
L 0 to 4,294,967,295
q -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
Q 0 to 18,446,744,073,709,551,615
n -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
N 0 to 18,446,744,073,709,551,615

Determine 8 bit modulo 256 checksum from ASCII string

I want to determine the 8 bit modulo 256 checksum of an ASCII string. I know the formula is:
checksum = (sum of bytes)%256
How can I do this in Python (i.e., by manipulating bytes)? If I start with the string '1c03e8', I should get 0x94. The main problem is that I'm not sure how to find the sum of the bytes of an ASCII string. Here is the main idea of what I'm looking for:
https://www.scadacore.com/tools/programming-calculators/online-checksum-calculator/
It has CheckSum8 Modulo 256
I have tried:
component = ('1c03e8')
checksum = []
for i in range(len(component)):
    checksum.append(int(float(component[i].encode("hex"))))
print checksum
print hex(int(sum(checksum)%256))
This however gives me 0x52.
You need to encode the string as ASCII because, as you said, it's an ASCII string.
Example, quick-and-dirty solution:
print(hex(sum('1c03e8'.encode('ascii')) % 256))
def calc_checksum(s):
    sum = 0
    for c in s:
        sum += ord(c)
    sum = sum % 256
    return '%2X' % (sum & 0xFF)

print calc_checksum('1c03e8'.encode('ascii'))
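A Python 3 version of the same function (a sketch; the name calc_checksum_py3 is just for illustration):
def calc_checksum_py3(s):
    # sum the ASCII codes of the characters, keep the low 8 bits
    return '%02X' % (sum(s.encode('ascii')) % 256)

print(calc_checksum_py3('1c03e8'))  # prints 94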
Try the checksum-calculator package.
Step 1:
pip install checksum-calculator
Step 2:
from checksum_calculator import *
inputString = "your_string"
outputString = inputString.encode('utf-8').hex()
print(compute_checksum8_xor(outputString))
print(compute_checksum8_mod256(outputString))
print(compute_checksum8_2s_complement(outputString))

Ascii string of bytes packed into bitmap/bitstring back to string?

I have a string that is packed such that each character was originally an unsigned byte but is stored as 7 bits and then packed into an unsigned byte array. I'm trying to find a quick way to unpack this string in Python. The function I wrote using the bitstring module works, but it is very slow. It seems like something like this should not be so slow, but I'm probably doing it very inefficiently...
This seems like something that is probably trivial but I just don't know what to use, maybe there is already a function that will unpack the string?
from bitstring import BitArray

def unpackString(raw):
    msg = ''
    bits = BitArray(bytes=raw)
    mask = BitArray('0b01111111')
    i = 0
    while 1:
        try:
            iByte = (bits[i:i + 8] & mask).int
            # value of 0 denotes a line break
            if iByte == 0:
                msg += '\n'
            elif iByte >= 32 and iByte <= 126:
                msg += chr(iByte)
            i += 7
        except:
            break
    return msg
This took me a while to figure out, as your solution seems to ignore the first bit of data. Given the input byte of 129 (0b10000001) I would expect to see 64 '1000000' printed by the following, but your code produces 1 '0000001' -- ignoring the first bit.
bs = b'\x81' # one byte string, whose value is 129 (0x81)
arr = BitArray(bs)
mask = BitArray('0b01111111')
byte = (arr[0:8] & mask).int
print(byte, repr("{:07b}".format(byte)))
Simplest solution would be to modify your solution to use bitstring.ConstBitStream -- I got an order of magnitude speed increase with the following.
from bitstring import ConstBitStream

def unpack_bitstream(raw):
    num_bytes, remainder = divmod(len(raw) * 8 - 1, 7)
    bitstream = ConstBitStream(bytes=raw, offset=1)  # use offset to ignore leading bit
    msg = b''
    for _ in range(num_bytes):
        byte = bitstream.read("uint:7")
        if not byte:
            msg += b'\n'
        elif 32 <= byte <= 126:
            msg += bytes((byte,))
            # msg += chr(byte)  # python 2
    return msg
However, this can be done quite easily using only the standard library. This makes the solution more portable and, in the instances I tried, faster by another order of magnitude (I didn't try the cythonised version of bitstring).
def unpack_bytes(raw, zero_replacement=ord("\n")):
    # use - 1 to ignore leading bit
    num_bytes, remainder = divmod(len(raw) * 8 - 1, 7)
    i = int.from_bytes(raw, byteorder="big")
    # i = int(raw.encode("hex"), 16)  # python 2
    if remainder:
        # remainder means there are unused trailing bits, so remove these
        i >>= remainder
    msg = []
    for _ in range(num_bytes):
        byte = i & 127
        if not byte:
            msg.append(zero_replacement)
        elif 32 <= byte <= 126:
            msg.append(byte)
        i >>= 7
    msg.reverse()
    return bytes(msg)
    # return b"".join(chr(c) for c in msg)  # python 2
I've used python 3 to create these methods. If you're using python 2 then there are a number of adjustments you'll need to make. I've added these as comments after the line they are intended to replace and marked them python 2.
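As a quick sanity check of unpack_bytes, here is a hand-built input (constructed only for illustration): a leading 0 bit, the 7-bit codes for 'A' (1000001) and 'B' (1000010), and one trailing pad bit pack into the two bytes 0x41 0x84:
assert unpack_bytes(bytes([0x41, 0x84])) == b'AB'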

python: unpack IBM 32-bit floating point

I was reading a binary file in python like this:
from struct import unpack
ns = 1000
f = open("binary_file", 'rb')
while True:
    data = f.read(ns * 4)
    if data == '':
        break
    unpacked = unpack(">%sf" % ns, data)
    print str(unpacked)
Then I realized that unpack(">f", str) is for unpacking IEEE floating point numbers, while my data are IBM 32-bit floating point numbers.
My question is:
How can I implement my own unpack for IBM 32-bit floating point numbers?
I don't mind using like ctypes to extend python to get better performance.
EDIT: I did some searching:
http://mail.scipy.org/pipermail/scipy-user/2009-January/019392.html
This looks very promising, but I want it to be more efficient: there are potentially tens of thousands of loop iterations.
EDIT: posted answer below. Thanks for the tip.
I think I understood it:
first unpack the string to unsigned 4 byte integer, and then use this function:
def ibm2ieee(ibm):
    """
    Converts an IBM floating point number into IEEE format.
    :param: ibm - 32 bit unsigned integer: unpack('>L', f.read(4))
    """
    if ibm == 0:
        return 0.0
    sign = ibm >> 31 & 0x01
    exponent = ibm >> 24 & 0x7f
    mantissa = (ibm & 0x00ffffff) / float(pow(2, 24))
    return (1 - 2 * sign) * mantissa * pow(16, exponent - 64)
Thanks for all who helped!
IBM Floating Point Architecture, how to encode and decode:
http://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture
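As a quick check, the worked example from that article, 0xC276A000, should decode to -118.625:
print(ibm2ieee(0xC276A000))  # -118.625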
My solution:
I wrote a class. I think that this way it can be a bit faster, because it uses a Struct object, so the unpack format is compiled only once.
EDIT: it also unpacks all the values at once, and unpacking can be an expensive operation.
from struct import Struct

class StructIBM32(object):
    """
    see example in:
    http://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture#An_Example

    >>> import struct
    >>> c = StructIBM32(1)
    >>> bit = '11000010011101101010000000000000'
    >>> c.unpack(struct.pack('>L', int(bit, 2)))
    [-118.625]
    """
    def __init__(self, size):
        self.p24 = float(pow(2, 24))
        self.unpack32int = Struct(">%sL" % size).unpack

    def unpack(self, data):
        int32 = self.unpack32int(data)
        return [self.ibm2ieee(i) for i in int32]

    def ibm2ieee(self, int32):
        if int32 == 0:
            return 0.0
        sign = int32 >> 31 & 0x01
        exponent = int32 >> 24 & 0x7f
        mantissa = (int32 & 0x00ffffff) / self.p24
        return (1 - 2 * sign) * mantissa * pow(16, exponent - 64)

if __name__ == "__main__":
    import doctest
    doctest.testmod()
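A sketch of how the class could slot into the read loop from the original question (assuming the file holds whole groups of ns big-endian IBM floats; the length check guards against a short final read):
ns = 1000
conv = StructIBM32(ns)
with open("binary_file", "rb") as f:
    while True:
        data = f.read(ns * 4)
        if len(data) < ns * 4:
            break
        values = conv.unpack(data)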

Is there a faster way to convert an arbitrary large integer to a big endian sequence of bytes?

I have this Python code to do this:
from struct import pack as _pack

def packl(lnum, pad = 1):
    if lnum < 0:
        raise RangeError("Cannot use packl to convert a negative integer "
                         "to a string.")
    count = 0
    l = []
    while lnum > 0:
        l.append(lnum & 0xffffffffffffffffL)
        count += 1
        lnum >>= 64
    if count <= 0:
        return '\0' * pad
    elif pad >= 8:
        lens = 8 * count % pad
        pad = ((lens != 0) and (pad - lens)) or 0
        l.append('>' + 'x' * pad + 'Q' * count)
        l.reverse()
        return _pack(*l)
    else:
        l.append('>' + 'Q' * count)
        l.reverse()
        s = _pack(*l).lstrip('\0')
        lens = len(s)
        if (lens % pad) != 0:
            return '\0' * (pad - lens % pad) + s
        else:
            return s
This takes approximately 174 usec to convert 2**9700 - 1 to a string of bytes on my machine. If I'm willing to use the Python 2.7 and Python 3.x specific bit_length method, I can shorten that to 159 usecs by pre-allocating the l array to be the exact right size at the very beginning and using l[something] = syntax instead of l.append.
Is there anything I can do that will make this faster? This will be used to convert large prime numbers used in cryptography as well as some (but not many) smaller numbers.
Edit
This is currently the fastest option in Python < 3.2; it takes about half the time in either direction compared to the accepted answer:
import binascii

def packl(lnum, padmultiple=1):
    """Packs the lnum (which must be convertable to a long) into a
    byte string 0 padded to a multiple of padmultiple bytes in size. 0
    means no padding whatsoever, so that packing 0 results in an empty
    string. The resulting byte string is the big-endian two's
    complement representation of the passed in long."""
    if lnum == 0:
        return b'\0' * padmultiple
    elif lnum < 0:
        raise ValueError("Can only convert non-negative numbers.")
    s = hex(lnum)[2:]
    s = s.rstrip('L')
    if len(s) & 1:
        s = '0' + s
    s = binascii.unhexlify(s)
    if (padmultiple != 1) and (padmultiple != 0):
        filled_so_far = len(s) % padmultiple
        if filled_so_far != 0:
            s = b'\0' * (padmultiple - filled_so_far) + s
    return s

def unpackl(bytestr):
    """Treats a byte string as a sequence of base 256 digits
    representing an unsigned integer in big-endian format and converts
    that representation into a Python integer."""
    return int(binascii.hexlify(bytestr), 16) if len(bytestr) > 0 else 0
In Python 3.2 the int class has to_bytes and from_bytes functions that can accomplish this much more quickly than the method given above.
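For example, on Python 3.2+ the round trip is just (a minimal sketch):
n = 2**9700 - 1
raw = n.to_bytes((n.bit_length() + 7) // 8, byteorder='big')
assert int.from_bytes(raw, byteorder='big') == n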
Here is a solution calling the Python/C API via ctypes. Currently, it uses NumPy, but if NumPy is not an option, it could be done purely with ctypes.
import numpy
import ctypes

PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray
PyLong_AsByteArray.argtypes = [ctypes.py_object,
                               numpy.ctypeslib.ndpointer(numpy.uint8),
                               ctypes.c_size_t,
                               ctypes.c_int,
                               ctypes.c_int]

def packl_ctypes_numpy(lnum):
    a = numpy.zeros(lnum.bit_length()//8 + 1, dtype=numpy.uint8)
    PyLong_AsByteArray(lnum, a, a.size, 0, 1)
    return a
On my machine, this is 15 times faster than your approach.
Edit: Here is the same code using ctypes only and returning a string instead of a NumPy array:
import ctypes

PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray
PyLong_AsByteArray.argtypes = [ctypes.py_object,
                               ctypes.c_char_p,
                               ctypes.c_size_t,
                               ctypes.c_int,
                               ctypes.c_int]

def packl_ctypes(lnum):
    a = ctypes.create_string_buffer(lnum.bit_length()//8 + 1)
    PyLong_AsByteArray(lnum, a, len(a), 0, 1)
    return a.raw
This is another two times faster, totalling to a speed-up factor of 30 on my machine.
For completeness and for future readers of this question:
Starting in Python 3.2, there are functions int.from_bytes() and int.to_bytes() that perform the conversion between bytes and int objects in a choice of byte orders.
I suppose you really should just be using numpy, which I'm sure has something or other built in for this. It might also be faster to hack around with the array module. But I'll take a stab at it anyway.
IMX, creating a generator and using a list comprehension and/or built-in summation is faster than a loop that appends to a list, because the appending can be done internally. Oh, and 'lstrip' on a large string has got to be costly.
Also, some style points: special cases aren't special enough; and you appear not to have gotten the memo about the new x if y else z construct. :) Although we don't need it anyway. ;)
from struct import pack as _pack

Q_size = 64
Q_bitmask = (1L << Q_size) - 1L

def quads_gen(a_long):
    while a_long:
        yield a_long & Q_bitmask
        a_long >>= Q_size

def pack_long_big_endian(a_long, pad = 1):
    if a_long < 0:
        raise ValueError("Cannot use packl to convert a negative integer "
                         "to a string.")
    qs = list(quads_gen(a_long))[::-1]
    # Pack the first one separately so we can lstrip nicely.
    first = _pack('>Q', qs[0]).lstrip('\x00')
    rest = _pack('>%sQ' % (len(qs) - 1), *qs[1:])
    count = len(first) + len(rest)
    # A little math trick that depends on Python's behaviour of modulus
    # for negative numbers - but it's well-defined and documented
    return '\x00' * (-count % pad) + first + rest
Just wanted to post a follow-up to Sven's answer (which works great). The opposite operation, going from an arbitrarily long bytes object to a Python integer object, requires the following (because there is no PyLong_FromByteArray() C API function that I can find):
import binascii

def unpack_bytes(stringbytes):
    # binascii.hexlify will be obsolete in python3 soon
    # They will add a .tohex() method to bytes class
    # Issue 3532 bugs.python.org
    return int(binascii.hexlify(stringbytes), 16)
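On Python 3 the same conversion is available directly on int (a sketch; the name unpack_bytes_py3 is just for illustration):
def unpack_bytes_py3(stringbytes):
    return int.from_bytes(stringbytes, byteorder='big')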
