Python - reading 10 bit integers from a binary file - python

I have a binary file containing a stream of 10-bit integers. I want to read it and store the values in a list.
It is working with the following code, which reads my_file and fills pixels with integer values:
file = open("my_file", "rb")
pixels = []
new10bitsByte = ""
try:
byte = file.read(1)
while byte:
bits = bin(ord(byte))[2:].rjust(8, '0')
for bit in reversed(bits):
new10bitsByte += bit
if len(new10bitsByte) == 10:
pixels.append(int(new10bitsByte[::-1], 2))
new10bitsByte = ""
byte = file.read(1)
finally:
file.close()
It doesn't seem very elegant to read the bytes into bits, and read it back into "10-bit" bytes. Is there a better way to do it?
With 8 or 16 bit integers I could just use file.read(size) and convert the result to an int directly. But here, as each value is stored in 1.25 bytes, I would need something like file.read(1.25)...

Here's a generator that does the bit operations without using text string conversions. Hopefully, it's a little more efficient. :)
To test it, I write all the numbers in range(1024) to a BytesIO stream, which behaves like a binary file.
from io import BytesIO
def tenbitread(f):
''' Generate 10 bit (unsigned) integers from a binary file '''
while True:
b = f.read(5)
if len(b) == 0:
break
n = int.from_bytes(b, 'big')
#Split n into 4 10 bit integers
t = []
for i in range(4):
t.append(n & 0x3ff)
n >>= 10
yield from reversed(t)
# Make some test data: all the integers in range(1024),
# and save it to a byte stream
buff = BytesIO()
maxi = 1024
n = 0
for i in range(maxi):
n = (n << 10) | i
#Convert the 40 bit integer to 5 bytes & write them
if i % 4 == 3:
buff.write(n.to_bytes(5, 'big'))
n = 0
# Rewind the stream so we can read from it
buff.seek(0)
# Read the data in 10 bit chunks
a = list(tenbitread(buff))
# Check it
print(a == list(range(maxi)))
output
True
Doing list(tenbitread(buff)) is the simplest way to turn the generator output into a list, but you can easily iterate over the values instead, eg
for v in tenbitread(buff):
or
for i, v in enumerate(tenbitread(buff)):
if you want indices as well as the data values.
Here's a little-endian version of the generator which gives the same results as your code.
def tenbitread(f):
''' Generate 10 bit (unsigned) integers from a binary file '''
while True:
b = f.read(5)
if not len(b):
break
n = int.from_bytes(b, 'little')
#Split n into 4 10 bit integers
for i in range(4):
yield n & 0x3ff
n >>= 10
We can improve this version slightly by "un-rolling" that for loop, which lets us get rid of the final masking and shifting operations.
def tenbitread(f):
''' Generate 10 bit (unsigned) integers from a binary file '''
while True:
b = f.read(5)
if not len(b):
break
n = int.from_bytes(b, 'little')
#Split n into 4 10 bit integers
yield n & 0x3ff
n >>= 10
yield n & 0x3ff
n >>= 10
yield n & 0x3ff
n >>= 10
yield n
This should give a little more speed...

As there is no direct way to read a file x-bit by x-bit in Python, we have to read it byte by byte. Following MisterMiyagi and PM 2Ring's suggestions I modified my code to read the file by 5 byte chunks (i.e. 40 bits) and then split the resulting string into 4 10-bit numbers, instead of looping over the bits individually. It turned out to be twice as fast as my previous code.
file = open("my_file", "rb")
pixels = []
exit_loop = False
try:
while not exit_loop:
# Read 5 consecutive bytes into fiveBytesString
fiveBytesString = ""
for i in range(5):
byte = file.read(1)
if not byte:
exit_loop = True
break
byteString = format(ord(byte), '08b')
fiveBytesString += byteString[::-1]
# Split fiveBytesString into 4 10-bit numbers, and add them to pixels
pixels.extend([int(fiveBytesString[i:i+10][::-1], 2) for i in range(0, 40, 10) if len(fiveBytesString[i:i+10]) > 0])
finally:
file.close()

Adding a Numpy based solution suitable for unpacking large 10-bit packed byte buffers like the ones you might receive from AVT and FLIR cameras.
This is a 10-bit version of #cyrilgaudefroy's answer to a similar question; there you can also find a Numba alternative capable of yielding an additional speed increase.
import numpy as np
def read_uint10(byte_buf):
data = np.frombuffer(byte_buf, dtype=np.uint8)
# 5 bytes contain 4 10-bit pixels (5x8 == 4x10)
b1, b2, b3, b4, b5 = np.reshape(data, (data.shape[0]//5, 5)).astype(np.uint16).T
o1 = (b1 << 2) + (b2 >> 6)
o2 = ((b2 % 64) << 4) + (b3 >> 4)
o3 = ((b3 % 16) << 6) + (b4 >> 2)
o4 = ((b4 % 4) << 8) + b5
unpacked = np.reshape(np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1), 4*o1.shape[0])
return unpacked
Reshape can be omitted if returning a buffer instead of a Numpy array:
unpacked = np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1).tobytes()
Or if image dimensions are known it can be reshaped directly, e.g.:
unpacked = np.reshape(np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1), (1024, 1024))
If the use of the modulus operator appears confusing, try playing around with:
np.unpackbits(np.array([255%64], dtype=np.uint8))
Edit: It turns out that the Allied Vision Mako-U cameras employ a different ordering than the one I originally suggested above:
o1 = ((b2 % 4) << 8) + b1
o2 = ((b3 % 16) << 6) + (b2 >> 2)
o3 = ((b4 % 64) << 4) + (b3 >> 4)
o4 = (b5 << 2) + (b4 >> 6)
So you might have to test different orders if images come out looking wonky initially for your specific setup.

Related

CRC32 calculation in Python without using libraries

I have been trying to get my head around CRC32 calculations without much success, the values that I seem to get do not match what I should get.
I am aware that Python has libraries that are capable of generating these checksums (namely zlib and binascii) but I do not have the luxury of being able to use them as the CRC functionality do not exist on the micropython.
So far I have the following code:
import binascii
import zlib
from array import array
poly = 0xEDB88320
table = array('L')
for byte in range(256):
crc = 0
for bit in range(8):
if (byte ^ crc) & 1:
crc = (crc >> 1) ^ poly
else:
crc >>= 1
byte >>= 1
table.append(crc)
def crc32(string):
value = 0xffffffffL
for ch in string:
value = table[(ord(ch) ^ value) & 0x000000ffL] ^ (value >> 8)
return value
teststring = "test"
print "binascii calc: 0x%08x" % (binascii.crc32(teststring) & 0xffffffff)
print "zlib calc: 0x%08x" % (zlib.crc32(teststring) & 0xffffffff)
print "my calc: 0x%08x" % (crc32(teststring))
Then I get the following output:
binascii calc: 0xd87f7e0c
zlib calc: 0xd87f7e0c
my calc: 0x2780810c
The binascii and zlib calculations agree where as my one doesn't. I believe the calculated table of bytes is correct as I have compared it to examples available on the net. So the issue must be the routine where each byte is calculated, could anyone point me in the correct direction?
Thanks in advance!
I haven't looked closely at your code, so I can't pinpoint the exact source of the error, but you can easily tweak it to get the desired output:
import binascii
from array import array
poly = 0xEDB88320
table = array('L')
for byte in range(256):
crc = 0
for bit in range(8):
if (byte ^ crc) & 1:
crc = (crc >> 1) ^ poly
else:
crc >>= 1
byte >>= 1
table.append(crc)
def crc32(string):
value = 0xffffffffL
for ch in string:
value = table[(ord(ch) ^ value) & 0xff] ^ (value >> 8)
return -1 - value
# test
data = (
'',
'test',
'hello world',
'1234',
'A long string to test CRC32 functions',
)
for s in data:
print repr(s)
a = binascii.crc32(s)
print '%08x' % (a & 0xffffffffL)
b = crc32(s)
print '%08x' % (b & 0xffffffffL)
print
output
''
00000000
00000000
'test'
d87f7e0c
d87f7e0c
'hello world'
0d4a1185
0d4a1185
'1234'
9be3e0a3
9be3e0a3
'A long string to test CRC32 functions'
d2d10e28
d2d10e28
Here are a couple more tests that verify that the tweaked crc32 gives the same result as binascii.crc32.
from random import seed, randrange
print 'Single byte tests...',
for i in range(256):
s = chr(i)
a = binascii.crc32(s) & 0xffffffffL
b = crc32(s) & 0xffffffffL
assert a == b, (repr(s), a, b)
print('ok')
seed(42)
print 'Multi-byte tests...'
for width in range(2, 20):
print 'Width', width
r = range(width)
for n in range(1000):
s = ''.join([chr(randrange(256)) for i in r])
a = binascii.crc32(s) & 0xffffffffL
b = crc32(s) & 0xffffffffL
assert a == b, (repr(s), a, b)
print('ok')
output
Single byte tests... ok
Multi-byte tests...
Width 2
Width 3
Width 4
Width 5
Width 6
Width 7
Width 8
Width 9
Width 10
Width 11
Width 12
Width 13
Width 14
Width 15
Width 16
Width 17
Width 18
Width 19
ok
As discussed in the comments, the source of the error in the original code is that this CRC-32 algorithm inverts the initial crc buffer, and then inverts the final buffer contents. So value is initialised to 0xffffffff instead of zero, and we need to return value ^ 0xffffffff, which can also be written as ~value & 0xffffffff, i.e. invert value and then select the low-order 32 bits of the result.
If using binary data where the crc is chained over multiple buffers I used the following (using the OPs table):
def crc32(data, crc=0xffffffff):
for b in data:
crc = table[(b ^ crc) & 0xff] ^ (crc >> 8)
return crc
One can XOR the final result with -1 to agree with the online calculators.
crc = crc32(b'test')
print('0x{:08x}'.format(crc))
crc = crc32(b'te')
crc = crc32(b'st', crc)
print('0x{:08x}'.format(crc))
print('xor: 0x{:08x}'.format(crc ^ 0xffffffff))
output
0x278081f3
0x278081f3
xor: 0xd87f7e0c

Ascii string of bytes packed into bitmap/bitstring back to string?

I have a string that is packed such that each character was originally an unsigned byte but is stored as 7 bits and then packed into an unsigned byte array. I'm trying to find a quick way to unpack this string in Python but the function I wrote that uses the bitstring module works well but is very slow. It seems like something like this should not be so slow but I'm probably doing it very inefficiently...
This seems like something that is probably trivial but I just don't know what to use, maybe there is already a function that will unpack the string?
from bitstring import BitArray
def unpackString(raw):
msg = ''
bits = BitArray(bytes=raw)
mask = BitArray('0b01111111')
i = 0
while 1:
try:
iByte = (bits[i:i + 8] & mask).int
# value of 0 denotes a line break
if iByte == 0:
msg += '\n'
elif iByte >= 32 and iByte <= 126:
msg += chr(iByte)
i += 7
except:
break
return msg
This took me a while to figure out, as your solution seems to ignore the first bit of data. Given the input byte of 129 (0b10000001) I would expect to see 64 '1000000' printed by the following, but your code produces 1 '0000001' -- ignoring the first bit.
bs = b'\x81' # one byte string, whose value is 129 (0x81)
arr = BitArray(bs)
mask = BitArray('0b01111111')
byte = (arr[0:8] & mask).int
print(byte, repr("{:07b}".format(byte)))
Simplest solution would be to modify your solution to use bitstring.ConstBitStream -- I got an order of magnitude speed increase with the following.
from bitstring import ConstBitStream
def unpack_bitstream(raw):
num_bytes, remainder = divmod(len(raw) * 8 - 1, 7)
bitstream = ConstBitStream(bytes=raw, offset=1) # use offset to ignore leading bit
msg = b''
for _ in range(num_bytes):
byte = bitstream.read("uint:7")
if not byte:
msg += b'\n'
elif 32 <= byte <= 126:
msg += bytes((byte,))
# msg += chr(byte) # python 2
return msg
However, this can be done quite easily using only the standard library. This makes the solution more portable and, in the instances I tried, faster by another order of magnitude (I didn't try the cythonised version of bitstring).
def unpack_bytes(raw, zero_replacement=ord("\n")):
# use - 1 to ignore leading bit
num_bytes, remainder = divmod(len(raw) * 8 - 1, 7)
i = int.from_bytes(raw, byteorder="big")
# i = int(raw.encode("hex"), 16) # python 2
if remainder:
# remainder means there are unused trailing bits, so remove these
i >>= remainder
msg = []
for _ in range(num_bytes):
byte = i & 127
if not byte:
msg.append(zero_replacement)
elif 32 <= byte <= 126:
msg.append(byte)
i >>= 7
msg.reverse()
return bytes(msg)
# return b"".join(chr(c) for c in msg) # python 2
I've used python 3 to create these methods. If you're using python 2 then there are a number of adjustments you'll need to make. I've added these as comments after the line they are intended to replace and marked them python 2.

how to convert bytes to np.array

I'm read a buffer of bytes from data recorded through my computer's microphone (2 channels) using pyaudio example taken from site.
import pyaudio
import wave
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
channels=CHANNELS,
rate=RATE,
input=True,
frames_per_buffer=CHUNK)
print("* recording")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
data = stream.read(CHUNK)
frames.append(data)
print("* done recording")
print frames
frames looks like this:
['\x00\xfd\xff\xff.....\xfc\xff\xff', '\xff\xfc\xff\xff......\xfc\xff\xff', ... ]
or if I change CHUNK = 1:
['\x00\xfd\xff\xff', '\xff\xfc\xff\xff', '\x00\xfd\xcc\xcf']
though of course much longer. I suspect that the bytes are interleaved for each channel so I think I need to break them out in pairs of two.
What I'd like is an array like this:
np.array([
[123, 43],
[3, 433],
[43, 66]
])
where the first column is the values from the first channel, and the second from the second channel. how do I go about interpreting these encoded values (with CHUNK set to a reasonable value like 1024)?
UPDATE:
I'm quite confused. I used the below to change the format list of strings into a single string of space-separated hex values, but there appears to be an odd number of them...which wouldn't happen if there are two values, one for each channel (would be even number):
fms = ''.join(frames)
fms_string = ''.join( [ "%02X " % ord( x ) for x in fms ] ).strip()
fms_list = fms_string.split(" ")
print len(fms_list) # this prints an ODD number...
UPDATE 2:
I tried a simpler route and tried this:
import array
fstring = ''.join(frames)
wave_nums = array.array('h', fstring) # this correctly returns list of ints!
print len(wave_nums)
I tried this for different recording times and got the following (confusing results):
RECORD_SECONDS = 2 ---> len(wave_nums) is 132300 (132300 / 44100 = 3 seconds of frames)
RECORD_SECONDS = 4 ---> len(wave_nums) is 308700 (308700 / 44100 = 7 seconds of frames)
RECORD_SECONDS = 5 ---> len(wave_nums) is 396900 (396900 / 44100 = 9 seconds of frames)
which implies that I'm getting a number of frames consistent with 2*(number of seconds recording) - 1 seconds...how is this possible?
Based on a quick glance of the portaudio source it looks like the channels are in fact interleaved
You can use a join to flatten the list, calculate the left and right values (you set them to be 16 bits long), and then zip the list with itself.
joined = ''.join(frames).encode('latin-1')
left = map(lambda m, l: (m << 8) + l, joined[0::4], joined[1::4])
right = map(lambda m, l: (m << 8) + l, joined[2::4], joined[3::4])
zipped = zip(left, right)
On python 2.x, the encode latin1 trick doesn't work, so you'll need to do
joined = ''.join(frames)
joined = map(ord, joined)
left = map(lambda m, l: (m << 8) + l, joined[0::4], joined[1::4])
right = map(lambda m, l: (m << 8) + l, joined[2::4], joined[3::4])
zipped = zip(left, right)
This has something to do with python 2.x's preference for ascii strings vs unicode.
Update:
w.r.t the odd number of bytes, read might have tried to read too many bytes ahead and failed silently, only returning whatever it had at the moment. You should always receive a multiple of CHUNK bytes from read under normal conditions, so unless your join function has an error, something is wrong on their end. Try it with mine and see what happens.
The simplest answer appears to be this:
import array
f = ''.join(frames)
nums = array.array('h', f)
left = nums[1::2]
right = nums[0::2]
#Dylan's answer is also good but a bit more verbose and also the values are unsigned, where wav values are signed.
Also changing CHUNK to a value of 1225 is best since 44100 is a multiple of 1225, and no frames are lost as a result of rounding error.

Hex string to signed int in Python

How do I convert a hex string to a signed int in Python 3?
The best I can come up with is
h = '9DA92DAB'
b = bytes(h, 'utf-8')
ba = binascii.a2b_hex(b)
print(int.from_bytes(ba, byteorder='big', signed=True))
Is there a simpler way? Unsigned is so much easier: int(h, 16)
BTW, the origin of the question is itunes persistent id - music library xml version and iTunes hex version
In n-bit two's complement, bits have value:
bit 0 = 20
bit 1 = 21
bit n-2 = 2n-2
bit n-1 = -2n-1
But bit n-1 has value 2n-1 when unsigned, so the number is 2n too high. Subtract 2n if bit n-1 is set:
def twos_complement(hexstr, bits):
value = int(hexstr, 16)
if value & (1 << (bits - 1)):
value -= 1 << bits
return value
print(twos_complement('FFFE', 16))
print(twos_complement('7FFF', 16))
print(twos_complement('7F', 8))
print(twos_complement('FF', 8))
Output:
-2
32767
127
-1
import struct
For Python 3 (with comments' help):
h = '9DA92DAB'
struct.unpack('>i', bytes.fromhex(h))
For Python 2:
h = '9DA92DAB'
struct.unpack('>i', h.decode('hex'))
or if it is little endian:
h = '9DA92DAB'
struct.unpack('<i', h.decode('hex'))
Here's a general function you can use for hex of any size:
import math
# hex string to signed integer
def htosi(val):
uintval = int(val,16)
bits = 4 * (len(val) - 2)
if uintval >= math.pow(2,bits-1):
uintval = int(0 - (math.pow(2,bits) - uintval))
return uintval
And to use it:
h = str(hex(-5))
h2 = str(hex(-13589))
x = htosi(h)
x2 = htosi(h2)
This works for 16 bit signed ints, you can extend for 32 bit ints. It uses the basic definition of 2's complement signed numbers. Also note xor with 1 is the same as a binary negate.
# convert to unsigned
x = int('ffbf', 16) # example (-65)
# check sign bit
if (x & 0x8000) == 0x8000:
# if set, invert and add one to get the negative value, then add the negative sign
x = -( (x ^ 0xffff) + 1)
It's a very late answer, but here's a function to do the above. This will extend for whatever length you provide. Credit for portions of this to another SO answer (I lost the link, so please provide it if you find it).
def hex_to_signed(source):
"""Convert a string hex value to a signed hexidecimal value.
This assumes that source is the proper length, and the sign bit
is the first bit in the first byte of the correct length.
hex_to_signed("F") should return -1.
hex_to_signed("0F") should return 15.
"""
if not isinstance(source, str):
raise ValueError("string type required")
if 0 == len(source):
raise valueError("string is empty")
sign_bit_mask = 1 << (len(source)*4-1)
other_bits_mask = sign_bit_mask - 1
value = int(source, 16)
return -(value & sign_bit_mask) | (value & other_bits_mask)

Is there a faster way to convert an arbitrary large integer to a big endian sequence of bytes?

I have this Python code to do this:
from struct import pack as _pack
def packl(lnum, pad = 1):
if lnum < 0:
raise RangeError("Cannot use packl to convert a negative integer "
"to a string.")
count = 0
l = []
while lnum > 0:
l.append(lnum & 0xffffffffffffffffL)
count += 1
lnum >>= 64
if count <= 0:
return '\0' * pad
elif pad >= 8:
lens = 8 * count % pad
pad = ((lens != 0) and (pad - lens)) or 0
l.append('>' + 'x' * pad + 'Q' * count)
l.reverse()
return _pack(*l)
else:
l.append('>' + 'Q' * count)
l.reverse()
s = _pack(*l).lstrip('\0')
lens = len(s)
if (lens % pad) != 0:
return '\0' * (pad - lens % pad) + s
else:
return s
This takes approximately 174 usec to convert 2**9700 - 1 to a string of bytes on my machine. If I'm willing to use the Python 2.7 and Python 3.x specific bit_length method, I can shorten that to 159 usecs by pre-allocating the l array to be the exact right size at the very beginning and using l[something] = syntax instead of l.append.
Is there anything I can do that will make this faster? This will be used to convert large prime numbers used in cryptography as well as some (but not many) smaller numbers.
Edit
This is currently the fastest option in Python < 3.2, it takes about half the time either direction as the accepted answer:
def packl(lnum, padmultiple=1):
"""Packs the lnum (which must be convertable to a long) into a
byte string 0 padded to a multiple of padmultiple bytes in size. 0
means no padding whatsoever, so that packing 0 result in an empty
string. The resulting byte string is the big-endian two's
complement representation of the passed in long."""
if lnum == 0:
return b'\0' * padmultiple
elif lnum < 0:
raise ValueError("Can only convert non-negative numbers.")
s = hex(lnum)[2:]
s = s.rstrip('L')
if len(s) & 1:
s = '0' + s
s = binascii.unhexlify(s)
if (padmultiple != 1) and (padmultiple != 0):
filled_so_far = len(s) % padmultiple
if filled_so_far != 0:
s = b'\0' * (padmultiple - filled_so_far) + s
return s
def unpackl(bytestr):
"""Treats a byte string as a sequence of base 256 digits
representing an unsigned integer in big-endian format and converts
that representation into a Python integer."""
return int(binascii.hexlify(bytestr), 16) if len(bytestr) > 0 else 0
In Python 3.2 the int class has to_bytes and from_bytes functions that can accomplish this much more quickly that the method given above.
Here is a solution calling the Python/C API via ctypes. Currently, it uses NumPy, but if NumPy is not an option, it could be done purely with ctypes.
import numpy
import ctypes
PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray
PyLong_AsByteArray.argtypes = [ctypes.py_object,
numpy.ctypeslib.ndpointer(numpy.uint8),
ctypes.c_size_t,
ctypes.c_int,
ctypes.c_int]
def packl_ctypes_numpy(lnum):
a = numpy.zeros(lnum.bit_length()//8 + 1, dtype=numpy.uint8)
PyLong_AsByteArray(lnum, a, a.size, 0, 1)
return a
On my machine, this is 15 times faster than your approach.
Edit: Here is the same code using ctypes only and returning a string instead of a NumPy array:
import ctypes
PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray
PyLong_AsByteArray.argtypes = [ctypes.py_object,
ctypes.c_char_p,
ctypes.c_size_t,
ctypes.c_int,
ctypes.c_int]
def packl_ctypes(lnum):
a = ctypes.create_string_buffer(lnum.bit_length()//8 + 1)
PyLong_AsByteArray(lnum, a, len(a), 0, 1)
return a.raw
This is another two times faster, totalling to a speed-up factor of 30 on my machine.
For completeness and for future readers of this question:
Starting in Python 3.2, there are functions int.from_bytes() and int.to_bytes() that perform the conversion between bytes and int objects in a choice of byte orders.
I suppose you really should just be using numpy, which I'm sure has something or other built in for this. It might also be faster to hack around with the array module. But I'll take a stab at it anyway.
IMX, creating a generator and using a list comprehension and/or built-in summation is faster than a loop that appends to a list, because the appending can be done internally. Oh, and 'lstrip' on a large string has got to be costly.
Also, some style points: special cases aren't special enough; and you appear not to have gotten the memo about the new x if y else z construct. :) Although we don't need it anyway. ;)
from struct import pack as _pack
Q_size = 64
Q_bitmask = (1L << Q_size) - 1L
def quads_gen(a_long):
while a_long:
yield a_long & Q_bitmask
a_long >>= Q_size
def pack_long_big_endian(a_long, pad = 1):
if lnum < 0:
raise RangeError("Cannot use packl to convert a negative integer "
"to a string.")
qs = list(reversed(quads_gen(a_long)))
# Pack the first one separately so we can lstrip nicely.
first = _pack('>Q', qs[0]).lstrip('\x00')
rest = _pack('>%sQ' % len(qs) - 1, *qs[1:])
count = len(first) + len(rest)
# A little math trick that depends on Python's behaviour of modulus
# for negative numbers - but it's well-defined and documented
return '\x00' * (-count % pad) + first + rest
Just wanted to post a follow-up to Sven's answer (which works great). The opposite operation - going from arbitrarily long bytes object to Python Integer object requires the following (because there is no PyLong_FromByteArray() C API function that I can find):
import binascii
def unpack_bytes(stringbytes):
#binascii.hexlify will be obsolete in python3 soon
#They will add a .tohex() method to bytes class
#Issue 3532 bugs.python.org
return int(binascii.hexlify(stringbytes), 16)

Categories