I was reading a binary file in Python like this:
from struct import unpack

ns = 1000
f = open("binary_file", 'rb')
while True:
    data = f.read(ns * 4)
    if data == '':
        break
    unpacked = unpack(">%sf" % ns, data)
    print str(unpacked)
when I realized that unpack(">f", str) is for unpacking IEEE floating point numbers, while my data is IBM 32-bit floating point numbers.
My question is:
How can I implement my unpack to handle IBM 32-bit floating point numbers?
I don't mind using something like ctypes to extend Python to get better performance.
EDIT: I did some searching:
http://mail.scipy.org/pipermail/scipy-user/2009-January/019392.html
This looks very promising, but I want it to be more efficient: there are potentially tens of thousands of loop iterations.
EDIT: posted answer below. Thanks for the tip.
I think I understood it:
first unpack the string to an unsigned 4-byte integer, and then use this function:
def ibm2ieee(ibm):
    """
    Converts an IBM floating point number into IEEE format.
    :param: ibm - 32 bit unsigned integer: unpack('>L', f.read(4))
    """
    if ibm == 0:
        return 0.0
    sign = ibm >> 31 & 0x01
    exponent = ibm >> 24 & 0x7f
    mantissa = (ibm & 0x00ffffff) / float(pow(2, 24))
    return (1 - 2 * sign) * mantissa * pow(16, exponent - 64)
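As a quick check, this round-trips the -118.625 bit pattern from the Wikipedia article linked below:

from struct import pack, unpack

# IBM single-precision bit pattern for -118.625 (the Wikipedia example),
# packed so ibm2ieee sees the same unsigned integer unpack('>L', ...) yields
raw = pack('>L', int('11000010011101101010000000000000', 2))
print ibm2ieee(unpack('>L', raw)[0])  # -118.625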
Thanks for all who helped!
IBM Floating Point Architecture, how to encode and decode:
http://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture
My solution:
I wrote a class. I think it can be a bit faster this way, because it uses a Struct object, so the unpack format is compiled only once.
EDIT: also because it unpacks size * 4 bytes all at once, and each call to unpack is a relatively expensive operation.
from struct import Struct

class StructIBM32(object):
    """
    see example in:
    http://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture#An_Example

    >>> import struct
    >>> c = StructIBM32(1)
    >>> bit = '11000010011101101010000000000000'
    >>> c.unpack(struct.pack('>L', int(bit, 2)))
    [-118.625]
    """
    def __init__(self, size):
        self.p24 = float(pow(2, 24))
        self.unpack32int = Struct(">%sL" % size).unpack
    def unpack(self, data):
        int32 = self.unpack32int(data)
        return [self.ibm2ieee(i) for i in int32]
    def ibm2ieee(self, int32):
        if int32 == 0:
            return 0.0
        sign = int32 >> 31 & 0x01
        exponent = int32 >> 24 & 0x7f
        mantissa = (int32 & 0x00ffffff) / self.p24
        return (1 - 2 * sign) * mantissa * pow(16, exponent - 64)

if __name__ == "__main__":
    import doctest
    doctest.testmod()
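And a sketch of how the class might replace the loop from the question (assuming, as the original loop does, that the file size is an exact multiple of ns * 4 bytes, since the compiled format is fixed-size):

ns = 1000
reader = StructIBM32(ns)
f = open("binary_file", 'rb')
while True:
    data = f.read(ns * 4)
    if data == '':
        break
    print str(reader.unpack(data))
f.close()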
I have the following byte array
>>> string_ba
bytearray(b'4\x00/\t\xb5')
which is easily converted to hex string with the next 2 lines:
hex_string = [chr(x).encode('hex') for x in string_ba]
hex_string = ''.join(hex_string)
that return
>>> hex_string.lower()
'34002f09b5'
which is expected. (This is an RFID card signature)
I convert this to decimal by doing the above and then converting the hex string to a decimal string (padded with zeroes) with the next line. I have a limit of 10 characters in the string, so I'm forced to remove the first 2 characters of the hex string to be able to convert it to an at most 10-character decimal number.
dec_string = str(int(hex_string[2:], 16)).zfill(10)
>>> dec_string
'0003082677'
which is correct, as I tested this with an online converter (hex: 002f09b5, dec: 3082677)
The question is, if there's a way to skip converting from bytearray to hex_string, to obtain a decimal string. In other words to go straight from bytearray to dec_string
This will be running on Python 2.7.15.
>>> sys.version
'2.7.15rc1 (default, Apr 15 2018, 21:51:34) \n[GCC 7.3.0]'
I've tried removing the first element from bytearray and then converting it to string directly and joining. But this does not provide the desired result.
string_ba = string_ba[1:]
test_string = [str(x) for x in string_ba]
test_dec_string = ''.join(test_string).zfill(10)
>>> test_dec_string
'0000479181'
To repeat the question is there a way to go straight from bytearray to decimal string
You can use the struct library to convert a bytearray to a decimal value. https://codereview.stackexchange.com/questions/142915/converting-a-bytearray-into-an-integer may help you.
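For reference, here is a sketch of a direct route that skips the intermediate hex string entirely, treating the bytes as base-256 digits (Python 2.7, where reduce is a builtin; the first byte is dropped as in the question):

string_ba = bytearray(b'4\x00/\t\xb5')
# accumulate bytes left to right: each step shifts by one base-256 digit
dec_string = str(reduce(lambda acc, b: acc * 256 + b, string_ba[1:], 0)).zfill(10)
print(dec_string)  # '0003082677'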
A number (let's call it X) consisting of n digits (in a base, let's refer to it as B) is written as:
D[n-1] D[n-2] D[n-3] ... D[2] D[1] D[0] (where each D[i] is a digit in base B)
and its value can be computed based on the formula:
V(X) = Σ(i=0..n-1) B^i * D[i] (notice that in this example the number is traversed from right to left - the traversing sense doesn't affect the final value).
As an example, number 2468 (base 10) = 10^0 * 8 + 10^1 * 6 + 10^2 * 4 + 10^3 * 2 (= 8 + 60 + 400 + 2000).
An ASCII string is actually a number in base 256 (0x100), where each char (byte) is a digit.
Here's an alternative based on the above:
It only performs mathematical operations on integers (the conversion to string is done only at the end)
The right-to-left traversing sense (from above) works well with the restriction (the final decimal number must fit in a fixed number of digits, and in case of overflow the most significant ones are to be ignored)
The algorithm is simple: starting from the right, compute the partial number value until reaching the maximum allowed value or exhausting the number digits (string chars)
code.py:
#!/usr/bin/env python

import sys

DEFAULT_MAX_DIGITS = 10

def convert(array, max_digits=DEFAULT_MAX_DIGITS):
    max_val = 10 ** max_digits
    number_val = 0
    for idx, digit in enumerate(reversed(array)):
        cur_val = 256 ** idx * digit
        if number_val + cur_val >= max_val:
            break
        number_val += cur_val
    return str(number_val).zfill(max_digits)

def main():
    b = bytearray("4\x00/\t\xb5")
    print("b: {:}\n".format(repr(b)))
    for max_digits in range(6, 15, 2):
        print("Conversion of b (with max {:02d} digits): {:}{:s}".format(
            max_digits, convert(b, max_digits=max_digits),
            " (!!! Default case - required in the question)" if max_digits == DEFAULT_MAX_DIGITS else ""
        ))

if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()
Output:
(py_064_02.07.15_test0) e:\Work\Dev\StackOverflow\q054091895>"e:\Work\Dev\VEnvs\py_064_02.07.15_test0\Scripts\python.exe" code.py
Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)] on win32
b: bytearray(b'4\x00/\t\xb5')
Conversion of b (with max 06 digits): 002485
Conversion of b (with max 08 digits): 03082677
Conversion of b (with max 10 digits): 0003082677 (!!! Default case - required in the question)
Conversion of b (with max 12 digits): 223341382069
Conversion of b (with max 14 digits): 00223341382069
Here is the example which is bothering me:
>>> x = decimal.Decimal('0.0001')
>>> print x.normalize()
>>> print x.normalize().to_eng_string()
0.0001
0.0001
Is there a way to have engineering notation for representing milli (1e-3) and micro (1e-6)?
Here's a function that does things explicitly, and also has support for using SI suffixes for the exponent:
import math

def eng_string( x, format='%s', si=False):
    '''
    Returns float/int value <x> formatted in a simplified engineering format -
    using an exponent that is a multiple of 3.

    format: printf-style string used to format the value before the exponent.

    si: if true, use SI suffix for exponent, e.g. k instead of e3, n instead of
    e-9 etc.

    E.g. with format='%.2f':
          1.23e-08 => 12.30e-9
               123 => 123.00
            1230.0 => 1.23e3
        -1230000.0 => -1.23e6

    and with si=True:
            1230.0 => 1.23k
        -1230000.0 => -1.23M
    '''
    sign = ''
    if x < 0:
        x = -x
        sign = '-'
    exp = int( math.floor( math.log10( x)))
    exp3 = exp - ( exp % 3)
    x3 = x / ( 10 ** exp3)

    if si and exp3 >= -24 and exp3 <= 24 and exp3 != 0:
        exp3_text = 'yzafpnum kMGTPEZY'[ ( exp3 - (-24)) / 3]
    elif exp3 == 0:
        exp3_text = ''
    else:
        exp3_text = 'e%s' % exp3

    return ( '%s'+format+'%s') % ( sign, x3, exp3_text)
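For example, on Python 2 (the suffix lookup above relies on / being integer division):

>>> eng_string(1.23e-08, format='%.2f', si=True)
'12.30n'
>>> eng_string(-1230000.0, format='%.2f', si=True)
'-1.23M'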
EDIT:
Matplotlib implements an engineering formatter, so one option is to use Matplotlib's formatter directly, e.g.:
import matplotlib as mpl
formatter = mpl.ticker.EngFormatter()
formatter(10000)
result: '10 k'
Original answer:
Based on Julian Smith's excellent answer (and this answer), I changed the function to improve on the following points:
Python3 compatible (integer division)
Compatible for 0 input
Rounding to significant number of digits, by default 3, no trailing zeros printed
so here's the updated function:
import math
def eng_string( x, sig_figs=3, si=True):
    """
    Returns float/int value <x> formatted in a simplified engineering format -
    using an exponent that is a multiple of 3.

    sig_figs: number of significant figures

    si: if true, use SI suffix for exponent, e.g. k instead of e3, n instead of
    e-9 etc.
    """
    x = float(x)
    sign = ''
    if x < 0:
        x = -x
        sign = '-'
    if x == 0:
        exp = 0
        exp3 = 0
        x3 = 0
    else:
        exp = int(math.floor(math.log10( x )))
        exp3 = exp - ( exp % 3)
        x3 = x / ( 10 ** exp3)
        x3 = round( x3, -int( math.floor(math.log10( x3 )) - (sig_figs-1)) )
        if x3 == int(x3):  # prevent from displaying .0
            x3 = int(x3)

    if si and exp3 >= -24 and exp3 <= 24 and exp3 != 0:
        exp3_text = 'yzafpnum kMGTPEZY'[ exp3 // 3 + 8]
    elif exp3 == 0:
        exp3_text = ''
    else:
        exp3_text = 'e%s' % exp3

    return ( '%s%s%s') % ( sign, x3, exp3_text)
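For example:

>>> eng_string(0.0001)
'100u'
>>> eng_string(-1234567)
'-1.23M'
>>> eng_string(0)
'0'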
The decimal module is following the Decimal Arithmetic Specification, which states:
This is outdated - see below
to-scientific-string – conversion to numeric string
[...]
The coefficient is first converted to a string in base ten using the characters 0 through 9 with no leading zeros (except if its value is zero, in which case a single 0 character is used).
Next, the adjusted exponent is calculated; this is the exponent, plus the number of characters in the converted coefficient, less one. That is, exponent+(clength-1), where clength is the length of the coefficient in decimal digits.
If the exponent is less than or equal to zero and the adjusted exponent is greater than or equal to -6, the number will be converted
to a character form without using exponential notation.
[...]
to-engineering-string – conversion to numeric string
This operation converts a number to a string, using engineering
notation if an exponent is needed.
The conversion exactly follows the rules for conversion to scientific
numeric string except in the case of finite numbers where exponential
notation is used. In this case, the converted exponent is adjusted to be a multiple of three (engineering notation) by positioning the decimal point with one, two, or three characters preceding it (that is, the part before the decimal point will range from 1 through 999).
This may require the addition of either one or two trailing zeros.
If after the adjustment the decimal point would not be followed by a digit then it is not added. If the final exponent is zero then no indicator letter and exponent is suffixed.
Examples:
For each abstract representation [sign, coefficient, exponent] on the left, the resulting string is shown on the right.
Representation     String
[0,123,1]          "1.23E+3"
[0,123,3]          "123E+3"
[0,123,-10]        "12.3E-9"
[1,123,-12]        "-123E-12"
[0,7,-7]           "700E-9"
[0,7,1]            "70"
Or, in other words:
>>> for n in (10 ** e for e in range(-1, -8, -1)):
... d = Decimal(str(n))
... print d.to_eng_string()
...
0.1
0.01
0.001
0.0001
0.00001
0.000001
100E-9
I realize that this is an old thread, but it does come near the top of a search for python engineering notation and it seems prudent to have this information located here.
I am an engineer who likes the "engineering 101" engineering units. I don't even like designations such as 0.1uF, I want that to read 100nF. I played with the Decimal class and didn't really like its behavior over the range of possible values, so I rolled a package called engineering_notation that is pip-installable.
pip install engineering_notation
From within Python:
>>> from engineering_notation import EngNumber
>>> EngNumber('1000000')
1M
>>> EngNumber(1000000)
1M
>>> EngNumber(1000000.0)
1M
>>> EngNumber('0.1u')
100n
>>> EngNumber('1000m')
1
This package also supports comparisons and other simple numerical operations.
https://github.com/slightlynybbled/engineering_notation
The «full» quote shows what is wrong!
The decimal module is indeed following the proprietary (IBM) Decimal Arithmetic Specification.
Quoting this IBM specification in its entirety clearly shows what is wrong with decimal.to_eng_string() (emphasis added):
to-engineering-string – conversion to numeric string
This operation converts a number to a string, using engineering
notation if an exponent is needed.
The conversion exactly follows the rules for conversion to scientific
numeric string except in the case of finite numbers where exponential
notation is used. In this case, the converted exponent is adjusted to be a multiple of three (engineering notation) by positioning the decimal point with one, two, or three characters preceding it (that is, the part before the decimal point will range from 1 through 999). This may require the addition of either one or two trailing zeros.
If after the adjustment the decimal point would not be followed by a digit then it is not added. If the final exponent is zero then no indicator letter and exponent is suffixed.
This proprietary IBM specification actually admits to not applying engineering notation to numbers whose exponent is small enough that no exponential notation is needed at all: those are printed in plain positional notation instead! This is obviously incorrect behaviour for which a Python bug report was opened.
Solution
from math import floor, log10

def powerise10(x):
    """Returns x as a*10**b with 0 <= a < 10"""
    if x == 0:
        return 0, 0
    Neg = x < 0
    if Neg:
        x = -x
    a = 1.0 * x / 10**(floor(log10(x)))
    b = int(floor(log10(x)))
    if Neg:
        a = -a
    return a, b

def eng(x):
    """Return a string representing x in an engineer friendly notation"""
    a, b = powerise10(x)
    if -3 < b < 3:
        return "%.4g" % x
    a = a * 10**(b % 3)
    b = b - b % 3
    return "%.4gE%s" % (a, b)
Source: https://code.activestate.com/recipes/578238-engineering-notation/
Test result
>>> eng(0.0001)
100E-6
Like the answers above, but a bit more compact:
from math import log10, floor

def eng_format(x, precision=3):
    """Returns string in engineering format, i.e. 100.1e-3"""
    x = float(x)  # inplace copy
    if x == 0:
        a, b = 0, 0
    else:
        sgn = 1.0 if x > 0 else -1.0
        ax = abs(x)  # keep x signed for the no-exponent branch below
        a = sgn * ax / 10**(floor(log10(ax)))
        b = int(floor(log10(ax)))
    if -3 < b < 3:
        return ("%." + str(precision) + "g") % x
    else:
        a = a * 10**(b % 3)
        b = b - b % 3
        return ("%." + str(precision) + "gE%s") % (a, b)
Trial:
In [10]: eng_format(-1.2345e-4,precision=5)
Out[10]: '-123.45E-6'
How do I convert a hex string to a signed int in Python 3?
The best I can come up with is
import binascii

h = '9DA92DAB'
b = bytes(h, 'utf-8')
ba = binascii.a2b_hex(b)
print(int.from_bytes(ba, byteorder='big', signed=True))
Is there a simpler way? Unsigned is so much easier: int(h, 16)
BTW, the origin of the question is itunes persistent id - music library xml version and iTunes hex version
In n-bit two's complement, bits have value:
bit 0 = 2^0
bit 1 = 2^1
bit n-2 = 2^(n-2)
bit n-1 = -2^(n-1)
But bit n-1 has value 2^(n-1) when unsigned, so the number is 2^n too high. Subtract 2^n if bit n-1 is set:
def twos_complement(hexstr, bits):
    value = int(hexstr, 16)
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value

print(twos_complement('FFFE', 16))
print(twos_complement('7FFF', 16))
print(twos_complement('7F', 8))
print(twos_complement('FF', 8))
Output:
-2
32767
127
-1
import struct
For Python 3 (with comments' help):
h = '9DA92DAB'
struct.unpack('>i', bytes.fromhex(h))
For Python 2:
h = '9DA92DAB'
struct.unpack('>i', h.decode('hex'))
or if it is little endian:
h = '9DA92DAB'
struct.unpack('<i', h.decode('hex'))
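For instance, with the hex string from the question on Python 3 (note that unpack returns a tuple):

>>> import struct
>>> struct.unpack('>i', bytes.fromhex('9DA92DAB'))[0]
-1649857109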
Here's a general function you can use for hex of any size:
import math

# hex string to signed integer
# assumes val carries a '0x' prefix, as produced by hex()
def htosi(val):
    uintval = int(val, 16)
    bits = 4 * (len(val) - 2)
    if uintval >= math.pow(2, bits - 1):
        uintval = int(0 - (math.pow(2, bits) - uintval))
    return uintval
And to use it:
h = str(hex(-5))
h2 = str(hex(-13589))
x = htosi(h)
x2 = htosi(h2)
This works for 16-bit signed ints; you can extend it for 32-bit ints. It uses the basic definition of two's complement signed numbers. Also note that XOR with all ones is the same as a bitwise NOT.
# convert to unsigned
x = int('ffbf', 16)  # example (-65)
# check sign bit
if (x & 0x8000) == 0x8000:
    # if set, invert and add one to get the negative value, then add the negative sign
    x = -((x ^ 0xffff) + 1)
It's a very late answer, but here's a function to do the above. This will extend for whatever length you provide. Credit for portions of this to another SO answer (I lost the link, so please provide it if you find it).
def hex_to_signed(source):
    """Convert a hex string to a signed integer.

    This assumes that source is the proper length, and the sign bit
    is the first bit in the first byte of the correct length.

    hex_to_signed("F") should return -1.
    hex_to_signed("0F") should return 15.
    """
    if not isinstance(source, str):
        raise ValueError("string type required")
    if 0 == len(source):
        raise ValueError("string is empty")
    sign_bit_mask = 1 << (len(source)*4 - 1)
    other_bits_mask = sign_bit_mask - 1
    value = int(source, 16)
    return -(value & sign_bit_mask) | (value & other_bits_mask)
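For instance, with the docstring examples and the iTunes-style hex string from the question:

>>> hex_to_signed("F")
-1
>>> hex_to_signed("0F")
15
>>> hex_to_signed("9DA92DAB")
-1649857109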
I want to implement the IDEA algorithm in Python. Python integers have no size limit, but I need to limit the number of bits in an integer, for example to do a cyclic left shift. What do you advise?
One way is to use the BitVector library.
Example of use:
>>> from BitVector import BitVector
>>> bv = BitVector(intVal = 0x13A5, size = 32)
>>> print bv
00000000000000000001001110100101
>>> bv << 6 #does a cyclic left shift
>>> print bv
00000000000001001110100101000000
>>> bv[0] = 1
>>> print bv
10000000000001001110100101000000
>>> bv << 3 #cyclic shift again, should be more apparent
>>> print bv
00000000001001110100101000000100
An 8-bit mask with a cyclic left shift:
shifted = number << 1
overflowed = (shifted & 0x100) >> 8
shifted &= 0xFF
result = overflowed | shifted
You should be able to make a class that does this for you. With a bit more of the same, it can shift an arbitrary amount out of an arbitrary sized value.
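A minimal sketch of such a helper, generalized to an arbitrary width (the rol name and interface are my own, not from any library):

def rol(value, shift, width):
    """Cyclic left shift of a width-bit value by shift bits."""
    shift %= width
    mask = (1 << width) - 1
    # bits shifted past the top wrap around to the bottom
    return ((value << shift) | (value >> (width - shift))) & mask

print rol(0b10000001, 1, 8) == 0b00000011  # True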
The bitstring module might be of help. This example creates a 22-bit bitstring and rotates the bits 3 to the right:
>>> from bitstring import BitArray
>>> a = BitArray(22) # creates 22-bit zeroed bitstring
>>> a.uint = 12345 # set the bits with an unsigned integer
>>> a.bin # view the binary representation
'0b0000000011000000111001'
>>> a.ror(3) # rotate to the right
>>> a.bin
'0b0010000000011000000111'
>>> a.uint # and back to the integer representation
525831
If you want the low 32 bits of a number, you can use binary-and like so:
>>> low32 = (1 << 32) - 1
>>> n = 0x12345678
>>> m = ((n << 20) | (n >> 12)) & low32
>>> "0x%x" % m
'0x67812345'
I have this Python code to convert an arbitrarily large non-negative integer to a big-endian string of bytes:
from struct import pack as _pack

def packl(lnum, pad=1):
    if lnum < 0:
        raise ValueError("Cannot use packl to convert a negative integer "
                         "to a string.")
    count = 0
    l = []
    while lnum > 0:
        l.append(lnum & 0xffffffffffffffffL)
        count += 1
        lnum >>= 64
    if count <= 0:
        return '\0' * pad
    elif pad >= 8:
        lens = 8 * count % pad
        pad = ((lens != 0) and (pad - lens)) or 0
        l.append('>' + 'x' * pad + 'Q' * count)
        l.reverse()
        return _pack(*l)
    else:
        l.append('>' + 'Q' * count)
        l.reverse()
        s = _pack(*l).lstrip('\0')
        lens = len(s)
        if (lens % pad) != 0:
            return '\0' * (pad - lens % pad) + s
        else:
            return s
This takes approximately 174 usec to convert 2**9700 - 1 to a string of bytes on my machine. If I'm willing to use the Python 2.7 and Python 3.x specific bit_length method, I can shorten that to 159 usecs by pre-allocating the l array to be the exact right size at the very beginning and using l[something] = syntax instead of l.append.
Is there anything I can do that will make this faster? This will be used to convert large prime numbers used in cryptography as well as some (but not many) smaller numbers.
Edit
This is currently the fastest option in Python < 3.2, it takes about half the time either direction as the accepted answer:
import binascii

def packl(lnum, padmultiple=1):
    """Packs the lnum (which must be convertible to a long) into a
    byte string 0 padded to a multiple of padmultiple bytes in size. 0
    means no padding whatsoever, so that packing 0 results in an empty
    string. The resulting byte string is the big-endian two's
    complement representation of the passed in long."""
    if lnum == 0:
        return b'\0' * padmultiple
    elif lnum < 0:
        raise ValueError("Can only convert non-negative numbers.")
    s = hex(lnum)[2:]
    s = s.rstrip('L')
    if len(s) & 1:
        s = '0' + s
    s = binascii.unhexlify(s)
    if (padmultiple != 1) and (padmultiple != 0):
        filled_so_far = len(s) % padmultiple
        if filled_so_far != 0:
            s = b'\0' * (padmultiple - filled_so_far) + s
    return s

def unpackl(bytestr):
    """Treats a byte string as a sequence of base 256 digits
    representing an unsigned integer in big-endian format and converts
    that representation into a Python integer."""
    return int(binascii.hexlify(bytestr), 16) if len(bytestr) > 0 else 0
In Python 3.2 the int class has to_bytes and from_bytes methods that can accomplish this much more quickly than the method given above.
Here is a solution calling the Python/C API via ctypes. Currently, it uses NumPy, but if NumPy is not an option, it could be done purely with ctypes.
import numpy
import ctypes

PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray
PyLong_AsByteArray.argtypes = [ctypes.py_object,
                               numpy.ctypeslib.ndpointer(numpy.uint8),
                               ctypes.c_size_t,
                               ctypes.c_int,
                               ctypes.c_int]

def packl_ctypes_numpy(lnum):
    a = numpy.zeros(lnum.bit_length()//8 + 1, dtype=numpy.uint8)
    PyLong_AsByteArray(lnum, a, a.size, 0, 1)
    return a
On my machine, this is 15 times faster than your approach.
Edit: Here is the same code using ctypes only and returning a string instead of a NumPy array:
import ctypes

PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray
PyLong_AsByteArray.argtypes = [ctypes.py_object,
                               ctypes.c_char_p,
                               ctypes.c_size_t,
                               ctypes.c_int,
                               ctypes.c_int]

def packl_ctypes(lnum):
    a = ctypes.create_string_buffer(lnum.bit_length()//8 + 1)
    PyLong_AsByteArray(lnum, a, len(a), 0, 1)
    return a.raw
This is another two times faster, totalling to a speed-up factor of 30 on my machine.
For completeness and for future readers of this question:
Starting in Python 3.2, there are functions int.from_bytes() and int.to_bytes() that perform the conversion between bytes and int objects in a choice of byte orders.
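A short sketch of those built-ins on the number from the question:

n = 2**9700 - 1
# bit_length() rounded up to whole bytes, mirroring packl's tight packing
b = n.to_bytes((n.bit_length() + 7) // 8, byteorder='big')
assert int.from_bytes(b, byteorder='big') == n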
I suppose you really should just be using numpy, which I'm sure has something or other built in for this. It might also be faster to hack around with the array module. But I'll take a stab at it anyway.
IMX, creating a generator and using a list comprehension and/or built-in summation is faster than a loop that appends to a list, because the appending can be done internally. Oh, and 'lstrip' on a large string has got to be costly.
Also, some style points: special cases aren't special enough; and you appear not to have gotten the memo about the new x if y else z construct. :) Although we don't need it anyway. ;)
from struct import pack as _pack

Q_size = 64
Q_bitmask = (1L << Q_size) - 1L

def quads_gen(a_long):
    while a_long:
        yield a_long & Q_bitmask
        a_long >>= Q_size

def pack_long_big_endian(a_long, pad=1):
    # assumes a_long > 0
    if a_long < 0:
        raise ValueError("Cannot use packl to convert a negative integer "
                         "to a string.")
    qs = list(quads_gen(a_long))[::-1]  # reversed() needs a sequence, not a generator
    # Pack the first one separately so we can lstrip nicely.
    first = _pack('>Q', qs[0]).lstrip('\x00')
    rest = _pack('>%sQ' % (len(qs) - 1), *qs[1:])
    count = len(first) + len(rest)
    # A little math trick that depends on Python's behaviour of modulus
    # for negative numbers - but it's well-defined and documented
    return '\x00' * (-count % pad) + first + rest
Just wanted to post a follow-up to Sven's answer (which works great). The opposite operation - going from an arbitrarily long bytes object to a Python integer object - requires the following (because there is no PyLong_FromByteArray() C API function that I can find):
import binascii

def unpack_bytes(stringbytes):
    # binascii.hexlify will be obsolete in python3 soon
    # They will add a .tohex() method to bytes class
    # Issue 3532 bugs.python.org
    return int(binascii.hexlify(stringbytes), 16)