I have a file with big-endian binary data. There are two numeric fields: the first is 8 bytes long and the second 12 bytes long. How can I unpack the two numbers?
I am using the Python module struct (https://docs.python.org/2/library/struct.html), and it works for the first field:
num1 = struct.unpack('>Q',payload[0:8])
but I don't know how I can unpack the second number. If I treat it as char(12), then I get something like '\x00\xe3AC\x00\x00\x00\x06\x00\x00\x00\x01'.
Thanks.
I think you should create a new 16-byte string for the second number: fill the first 4 bytes with zeros and the last 12 bytes with the bytes that hold your number.
Then decode that bytestring with unpack using the format >QQ, say into the variables numHI and numLO. The final number is then number = numHI * 2**64 + numLO. As far as I remember, integers in Python can be (almost) as large as you wish, so you will have no problems with overflow. That's only a rough idea; please comment if you have problems writing it in actual Python code, and I'll edit my answer to provide more help.
Note that ** is Python's power operator. Don't use math.pow here: it returns a float, which cannot represent a 96-bit value exactly. Alternatively, you can use a bit shift, but it needs parentheses because + binds more tightly than <<: number = (numHI << 64) + numLO.
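A minimal Python 3 sketch of the padding approach described above, assuming the 12-byte field directly follows the 8-byte one as in the question. The sample payload is hypothetical (8 zero bytes followed by the 12 bytes quoted in the question), and the helper name unpack_be_uint96 is illustrative. int.from_bytes, which exists only in Python 3, is used as a cross-check.

```python
import struct


def unpack_be_uint96(field):
    """Decode a 12-byte big-endian unsigned integer by left-padding it to 16 bytes."""
    if len(field) != 12:
        raise ValueError("expected exactly 12 bytes, got %d" % len(field))
    padded = b"\x00" * 4 + field          # leading zero bytes don't change the value
    num_hi, num_lo = struct.unpack(">QQ", padded)
    # Parenthesized on purpose: "num_hi << 64 + num_lo" would mean num_hi << (64 + num_lo).
    return (num_hi << 64) + num_lo


# Hypothetical payload: an 8-byte field of zeros, then the 12 bytes from the question.
payload = b"\x00" * 8 + b"\x00\xe3AC\x00\x00\x00\x06\x00\x00\x00\x01"

num1 = struct.unpack(">Q", payload[0:8])[0]   # unpack always returns a tuple
num2 = unpack_be_uint96(payload[8:20])

# Python 3 only: the built-in conversion gives the same result without padding.
assert num2 == int.from_bytes(payload[8:20], "big")
```

Padding keeps the answer's ">QQ" idea intact; an equally valid split would be ">QI" (8 + 4 bytes) combined as (hi << 32) + lo.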
I am working on a program that does a lot of hashing, and in one of the steps I take the result of hashlib's ripemd160 hash and convert it into an integer. The lines are:
ripe_fruit = new('ripemd160', sha256(key.to_der()).digest())
key_hash160 = struct.unpack("<Q", ripe_fruit.digest())[0]
It gives me the error:
struct.error: unpack requires a buffer of 8 bytes
I tried changing the format to L and other things, but they didn't work. How do I fix this?
RIPEMD-160 returns 160 bits, or 20 bytes. struct can't unpack a single integer larger than 8 bytes. You have two options, and the right one depends on what exactly you're trying to do.
If your algorithm is looking for just some of the bytes of the hash, you can take the first or last 8 bytes and unpack those.
key_hash160 = struct.unpack("<Q", ripe_fruit.digest()[:8])[0]
If you need a 160-bit integer, you first have to decide how it's represented: is it little endian, big endian, or something in between? Then you can break the digest into its 20 bytes and combine them into one number. Assuming little endian, based on the < in your question, you can do something like:
key_parts = struct.unpack("B" * 20, ripe_fruit.digest())
key_hash160 = 0
for b in key_parts[::-1]:
    key_hash160 <<= 8
    key_hash160 |= b
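For comparison, a Python 3 sketch showing that the loop above, a three-field struct unpack ("<QQI" covers 8 + 8 + 4 = 20 bytes), and int.from_bytes all produce the same little-endian integer. It uses a SHA-1 digest as a stand-in 20-byte value, because hashlib.new('ripemd160') may be unavailable depending on how Python's OpenSSL was built; that substitution is an assumption made only to keep the example runnable.

```python
import hashlib
import struct

# Stand-in 20-byte digest (SHA-1 output is also 160 bits).
digest = hashlib.sha1(b"example").digest()

# The loop from the answer: start from the last (most significant) byte.
key_parts = struct.unpack("B" * 20, digest)
key_hash160 = 0
for b in key_parts[::-1]:
    key_hash160 <<= 8
    key_hash160 |= b

# Same value from three little-endian fields: lowest 8 bytes, next 8, top 4.
lo, mid, hi = struct.unpack("<QQI", digest)
via_struct = lo | (mid << 64) | (hi << 128)

# Python 3 built-in equivalent.
via_from_bytes = int.from_bytes(digest, "little")
```

int.from_bytes is the simplest choice on Python 3; the struct version is shown because it stays within the module the question already uses.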
Reverse the bits of a given 32-bit unsigned integer.
For example, given input 43261596 (represented in binary as
00000010100101000001111010011100), return 964176192 (represented in
binary as 00111001011110000010100101000000).
This does not work:
def reverseBits(self, n):
    return int(bin(n)[:1:-1], 2)
Your problem is in assuming that Python's bin produces 32-bit-aligned output. It doesn't; it outputs the smallest number of bits needed, with no leading zeros. Python 3's int type has an unbounded number of bits, and even in Python 2, int automatically promotes to long if it overflows the bounds of int (which are those of C's long, not C's int).
If you want it to act like a specific width, the easiest way is to use formatting tools with more control (which will also simplify your slice operation).
For example, by formatting to a fixed width of 32 characters, padded with zeros, you get the desired result:
>>> int('{:032b}'.format(43261596)[::-1], 2)
964176192
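A sketch wrapping the fixed-width idea in a reusable function, plus a shift-based variant that avoids strings entirely (which may be closer to what the exercise expects). The names reverse_bits and reverse_bits_loop and the width parameter are illustrative, not from the question.

```python
def reverse_bits(n, width=32):
    """Reverse the low `width` bits of n by formatting to a fixed width first."""
    if not 0 <= n < (1 << width):
        raise ValueError("n must fit in %d unsigned bits" % width)
    return int(format(n, "0%db" % width)[::-1], 2)


def reverse_bits_loop(n, width=32):
    """Same result with shifts only: move the lowest bit of n into result each step."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (n & 1)
        n >>= 1
    return result
```

Both return 964176192 for the input 43261596 from the question.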
The answer is in the output of bin():
>>> bin(12345)
'0b11000000111001'
As you can see, it outputs only 14 binary digits, because it drops any leading zeros. Why does it do this? Python doesn't use a fixed size for integers like many other languages: ints are arbitrary precision and grow to as many bytes as the value needs, so there is no natural width to pad to.
So instead of 00000000000000001111111111111111 becoming 11111111111111110000000000000000, it becomes 1111111111111111.
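A small illustration of the 16-ones example above. str.zfill is shown here as one more way to restore the leading zeros before reversing; it gives the same padded string as the '{:032b}' format for non-negative numbers.

```python
n = 0b00000000000000001111111111111111   # 65535; Python keeps no leading zeros

unpadded = bin(n)[2:]                # '1111111111111111': only 16 digits
padded = bin(n)[2:].zfill(32)        # zeros restored up to 32 digits

wrong = int(unpadded[::-1], 2)       # reversing 16 ones changes nothing
right = int(padded[::-1], 2)         # the ones move to the top 16 bits
```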
I'm currently trying to take integer arrays that actually represent other data types and convert them into the correct datatype.
So, for example, if I had the integer array [1196773188, 542327116], I discover from some other function that this integer array represents a string, convert it, and realize it represents the string "DOUGLAS". The first number translates to the hexadecimal number 0x47554F44 and the second to 0x2053414C. Using a hex-to-string converter, these correspond to the strings 'GUOD' and ' SAL' (with a leading space) respectively, which spell "DOUGLAS" when each group of 4 bytes is read in little-endian order. The letters being backwards within individual elements of the array likely stems from the bytes being stored in little-endian order, although I might be mistaken on that.
These integer arrays could represent a number of datatypes, including strings, booleans, and floats.
I need to use Python 2.7, so I unfortunately can't use the bytes function.
Is there a simple way to convert an integer array to its corresponding datatype?
It seems that the struct module is the best way to go when converting between different types like this:
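import struct

bufferstr = ""
dougarray = [1196773188, 542327116]
for num in dougarray:
    # "<" packs little-endian explicitly instead of relying on the native byte order
    bufferstr += struct.pack("<i", num)
print bufferstr  # Result is 'DOUGLAS ' (the last byte, 0x20, is a space)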
From this point on, we can easily convert those bytes to any datatype we want using struct.unpack():
print struct.unpack("<f", bufferstr[0:4])  # Result is (54607.265625,)
The catch is that each unpack call must be given exactly the number of bytes its format needs (4 bytes for "f" or "i"), so you have to slice the buffer accordingly. Thank you all for the suggestions!
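The same idea as a Python 3 sketch, where struct.pack returns bytes. Packing all values in one call, with a repeat count in the format string, is a choice made here rather than part of the answer; "<" pins the byte order to little endian.

```python
import struct

doug_array = [1196773188, 542327116]
count = len(doug_array)

# Pack every 32-bit int in one call, little-endian regardless of the host machine.
raw = struct.pack("<%di" % count, *doug_array)

# Reinterpret the same bytes as text (note the trailing space: the last byte is 0x20)...
text = raw.decode("ascii")

# ...or as 32-bit floats; each "f" consumes exactly 4 bytes.
as_floats = struct.unpack("<%df" % count, raw)
```

Here raw is b'DOUGLAS ' and as_floats[0] is 54607.265625, matching the Python 2 result above.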
I'm fairly weak with structs but I have a feeling they're the best way to do this. I have a large string of binary data and need to pull 32 of those chars, starting at a specific index, and store them as an int. What is the best way to do this?
Since I need to start at an initial position, I have been playing with struct.unpack_from(). Based on the format table here, I thought the 'i' format, being 4 bytes, was exactly what I needed, but the code below prints "(825307441,)" where I was expecting either the binary, decimal or hex form. Can anyone explain what 825307441 represents?
Also is there a method of extracting the data in a similar fashion but returning it in a list instead of a tuple? Thank you
st = "1111111111111111111111111111111"
test = struct.unpack_from('i',st,0)
print test
Just use int
>>> st = "1111111111111111111111111111111"
>>> int(st,2)
2147483647
>>> int(st[1:4],2)
7
You can slice the string any way you want to get the bits you need. Passing 2 to int tells it that the string is a number in base 2 (binary).
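On what 825307441 means, sketched in Python 3: struct reads raw bytes, not binary digits, so 'i' took the first four characters '1111', whose byte values are all 0x31 (the ASCII code of '1'), and 0x31313131 is 825307441. Because all four bytes are equal, byte order makes no difference in this particular case. The start index below is a hypothetical example value.

```python
import struct

st = "1" * 31                      # the same 31-character string as in the question
raw = st.encode("ascii")           # Python 3: struct works on bytes, not str

# 'i' reads 4 raw bytes: the characters '1111' = 0x31 0x31 0x31 0x31.
value = struct.unpack_from("<i", raw, 0)[0]   # 0x31313131, i.e. 825307441

# To read up to 32 characters starting at an index as a binary number, use int().
start = 0                          # hypothetical starting index
as_binary = int(st[start:start + 32], 2)

# struct always returns a tuple; wrap it in list() if you need a list.
pair = list(struct.unpack_from("<2i", raw, 4))
```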
I need to rewrite a Python script in Objective-C. It's not that hard, since Python is easily readable, but this piece of code is giving me some trouble.
import struct

def str_to_a32(b):
    if len(b) % 4:
        # pad to multiple of 4
        b += '\0' * (4 - len(b) % 4)
    return struct.unpack('>%dI' % (len(b) / 4), b)
What is this function supposed to do?
I'm not positive, but I'm using the documentation to take a stab at it.
Looking at the docs, we're going to return a tuple based on the format string:
Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
The item coming in (b) is probably a byte buffer (represented as a string); in the examples, bytes are written with the \x escape, which consumes the next two characters as hex.
It appears the format string is
'>%dI' % (len(b) / 4)
The % operator substitutes the number len(b) / 4 for %d in the format string, so if the length of b is 32, the format string becomes
`>8I`
The first part of the format string is >, which the documentation says is setting the byte order to big-endian and size to standard.
The I says each item is an unsigned int with size 4 (docs), and the 8 in front of it means it is repeated 8 times, so it is equivalent to:
>IIIIIIII
So I think this is saying: take this byte buffer, make sure it's a multiple of 4 by appending as many 0x00s as is necessary, then unpack that into a tuple with as many unsigned integers as there are blocks of 4 bytes in the buffer.
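To make the walkthrough concrete, a Python 3 sketch of the same function (the padding must be bytes there, and // is used for integer division). The second function, str_to_a32_manual, is a hypothetical helper that performs the same big-endian decoding with explicit shifts, which may be easier to translate to Objective-C; it is not part of the original script.

```python
import struct


def str_to_a32(b):
    """Python 3 sketch of the function above: bytes -> tuple of big-endian uint32."""
    if len(b) % 4:
        b += b"\0" * (4 - len(b) % 4)    # pad with 0x00 up to a multiple of 4
    fmt = ">%dI" % (len(b) // 4)          # e.g. 8 bytes -> ">2I"
    return struct.unpack(fmt, b)


def str_to_a32_manual(b):
    """Same result with explicit shifts, closer to what a C/Objective-C port would do."""
    if len(b) % 4:
        b += b"\0" * (4 - len(b) % 4)
    return tuple(
        (b[i] << 24) | (b[i + 1] << 16) | (b[i + 2] << 8) | b[i + 3]
        for i in range(0, len(b), 4)
    )


words = str_to_a32(b"\x00\x00\x00\x01ab")   # 6 bytes, padded to 8
```

Here words is (1, 0x61620000): the bytes 'a' (0x61) and 'b' (0x62) end up in the high bytes of the second word because of the big-endian order and the trailing zero padding.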
Looks like it's supposed to take an input array of bytes represented as a string and unpack them as big-endian (the ">") unsigned ints (the "I"). The formatting codes are explained in http://docs.python.org/2/library/struct.html
This takes a string and converts it into a tuple of unsigned integers. If you look at the Python struct documentation, you will see how it works. In a nutshell, it handles conversions between Python values and C structs represented as Python strings, for handling binary data stored in files (unceremoniously copied from the link provided).
In your case, the function takes a string, b, and pads it with zero bytes so that its length is a multiple of the size of an unsigned int (4 bytes, see link), and then converts it into a tuple of integers using the big-endian representation of the characters. That is the '>' part; the 'I' part says to use unsigned integers.