Unpacking ripemd160 result in python - python

I am working on a program which does a lot of hashing, and in one of the steps I take a result of hashlib's ripemd160 hash and convert it into an integer. The lines are:
ripe_fruit = new('ripemd160', sha256(key.to_der()).digest())
key_hash160 = struct.unpack("<Q", ripe_fruit.digest())[0]
It gives me the error:
struct.error: unpack requires a buffer of 8 bytes
I tried changing the value to L and other things, but they didn't work. How do I fix this?

RIPEMD-160 returns 160 bits, or 20 bytes. struct doesn't know how to unpack integers larger than 8 bytes. You have two options and the right one depends on what exactly you're trying to do.
If your algorithm is looking for just some of the bytes of the hash, you can take the first or last 8 bytes and unpack those.
key_hash160 = struct.unpack("<Q", ripe_fruit.digest()[:8])[0]
If you need a 160-bit integer, you first have to decide how it's represented. Is it little endian or big endian or something in between? Then you can break the digest into 20 individual bytes and build one number from them. Assuming little endian based on the < in your question, you can do something like:
key_parts = struct.unpack("B" * 20, ripe_fruit.digest())
key_hash160 = 0
for b in key_parts[::-1]:
    key_hash160 <<= 8
    key_hash160 |= b
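If you are on Python 3, int.from_bytes does the same little-endian conversion in one call. A minimal sketch, with bytes(range(20)) standing in for the real 20-byte digest:
import struct

digest = bytes(range(20))   # stand-in for ripe_fruit.digest(); any 20-byte value works

# Python 3's int.from_bytes does the little-endian conversion in one call:
key_hash160 = int.from_bytes(digest, byteorder='little')

# Same result as the manual loop above:
manual = 0
for b in struct.unpack("B" * 20, digest)[::-1]:
    manual = (manual << 8) | b
assert manual == key_hash160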

Related

Pack into C types and obtain the binary value back

I'm using the following code to pack an integer into an unsigned short,
raw_data = 40
# Pack into little endian
data_packed = struct.pack('<H', raw_data)
Now I'm trying to unpack the result as follows. I use utf-16-le since the data is encoded as little-endian.
def get_bin_str(data):
    bin_asc = binascii.hexlify(data)
    result = bin(int(bin_asc.decode("utf-16-le"), 16))
    trimmed_res = result[2:]
    return trimmed_res
print(get_bin_str(data_packed))
Unfortunately, it throws the following error,
result = bin(int(bin_asc.decode("utf-16-le"), 16)) ValueError: invalid
literal for int() with base 16: '㠲〰'
How do I properly decode the bytes in little-endian to binary data properly?
Use unpack to reverse what you packed. The data isn't UTF-encoded so there is no reason to use UTF encodings.
>>> import struct
>>> data_packed = struct.pack('<H', 40)
>>> data_packed.hex() # the two little-endian bytes are 0x28 (40) and 0x00 (0)
'2800'
>>> data = struct.unpack('<H',data_packed)
>>> data
(40,)
unpack returns a tuple, so index it to get the single value
>>> data = struct.unpack('<H',data_packed)[0]
>>> data
40
To print in binary, use string formatting. Either of these works well; bin() doesn't let you specify the number of binary digits to display, and the 0b prefix needs to be removed if not desired.
>>> format(data,'016b')
'0000000000101000'
>>> f'{data:016b}'
'0000000000101000'
You have not said what you are trying to do, so let's assume your goal is to educate yourself. (If you are trying to pack data that will be passed to another program, the only reliable test is to check if the program reads your output correctly.)
Python does not have an "unsigned short" type, so the output of struct.pack() is a byte array. To see what's in it, just print it:
>>> data_packed = struct.pack('<H', 40)
>>> print(data_packed)
b'(\x00'
What's that? Well, the character (, which is decimal 40 in the ASCII table, followed by a null byte. If you had used a number that does not map to a printable ASCII character, you'd see something less surprising:
>>> struct.pack("<H", 11)
b'\x0b\x00'
Where 0b is 11 in hex, of course. Wait, I specified "little-endian", so why is my number on the left? The answer is, it's not. Python prints the byte string left to right because that's how English is written, but that's irrelevant. If it helps, think of strings as growing upwards: From low memory locations to high memory. The least significant byte comes first, which makes this little-endian.
Anyway, you can also look at the bytes directly:
>>> print(data_packed[0])
40
Yup, it's still there. But what about the bits, you say? For this, use bin() on each of the bytes separately:
>>> bin(data_packed[0])
'0b101000'
>>> bin(data_packed[1])
'0b0'
The two set bits you see are worth 32 and 8. Your number was less than 256, so it fits entirely in the low byte of the short you constructed.
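If it helps, Python 3's int.from_bytes makes the two possible interpretations of those bytes explicit:
>>> int.from_bytes(data_packed, 'little')
40
>>> int.from_bytes(data_packed, 'big')
10240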
What's wrong with your unpacking code?
Just for fun let's see what your sequence of transformations in get_bin_str was doing.
>>> binascii.hexlify(data_packed)
b'2800'
Um, all right. Not sure why you converted to hex digits, but now you have 4 bytes, not two. (28 is the number 40 written in hex, the 00 is for the null byte.) In the next step, you call decode and tell it that these 4 bytes are actually UTF-16; there's just enough for two unicode characters, let's take a look:
>>> b'2800'.decode("utf-16-le")
'㠲〰'
In the next step Python finally notices that something is wrong, but by then it does not make much difference because you are pretty far away from the number 40 you started with.
To correctly read your data as a UTF-16 character, call decode directly on the byte string you packed.
>>> data_packed.decode("utf-16-le")
'('
>>> ord('(')
40

base64 string length calculation when encoding unsigned integers only

I am trying to figure out estimates for how many unsigned integer numbers I can encode with 5 characters of base64, 6 characters, and so on.
Through a programmatic approach I found out that I can encode
2^28 - 1 = 268,435,455
with 6 characters and
2^35 - 1 = 34,359,738,367
with 7 characters.
(-1 because I start at uint 1)
I am struggling to generalize this though, since I would assume it starts at 2^8 = 256 but I don't get how I end up at 28 and 35.
This is my implementation in Go
func Shorten(num uint64) string {
    buf := make([]byte, binary.MaxVarintLen64)
    n := binary.PutUvarint(buf, num)
    b := buf[:n]
    encoded := base64.URLEncoding.EncodeToString(b)
    return strings.Replace(encoded, "=", "", -1)
}
Also
0 -> AA
128 -> gAE
16384 -> gIAB
2097152 -> gICAAQ
268435456 -> gICAgAE
So it looks like it's going up in increments of 7 bits: 2^7, 2^14, 2^21, etc. But why 7?
A byte is 8 bits and therefore has 256 possible values. Base64 uses 64 different characters to encode with, so each character carries 6 bits. How many 8-bit bytes can you fit in 6 bits? Zero if you're rounding, or 3/4 of one if you aren't. When you start talking about encoding integers, however, your numbers do not appear to make sense. Are you talking about integers written in ASCII? With 6 base64 characters you have 36 bits to play with, so if you mean binary 32-bit unsigned integers you can encode one at a time, any of 2**32 different values, with 4 bits wasted. With ASCII decimal digits you'd have room for 4 characters, so only 10,000 different possibilities (0 to 9999).
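As a quick sanity check of that fixed-width (non-varint) arithmetic in Python: a 32-bit unsigned integer packed as 4 raw bytes base64-encodes to 6 meaningful characters, i.e. 36 bits of capacity for 32 bits of data:
>>> import base64, struct
>>> base64.urlsafe_b64encode(struct.pack('<I', 2**32 - 1))
b'_____w=='
That's 6 characters plus two '=' padding characters, with 4 of the 36 bits unused.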
You are getting unexpected results because you're using Go varints, which are not encoded as regular binary integers. Some IPython output for you:
In [22]: base64.b64encode((128).to_bytes(1,'little'))
Out[22]: b'gA=='
Because 128 can be encoded in a single 8-bit byte, it is only 2 characters plus some padding. And look at this:
In [3]: base64.b64decode('gAE=')
Out[3]: b'\x80\x01'
In [4]: int.from_bytes(_,'little')
Out[4]: 384
So as you can see, PutUvarint isn't just encoding an integer of variable length, it's encoding a variable integer, i.e. one encoded so that it can be decoded without knowing in advance what size it is. The source for Go's varint package describes this process: Go uses 7 bits of each byte to hold actual integer data, and the most significant bit of each byte is a flag saying whether more data is yet to come. 128 is just the most significant bit of one byte set, which is why it spills into a second varint byte.
So basically you're encoding twice with this approach. To store a given integer as a varint you need roughly 8/7 of the bytes the raw integer uses, and base64-encoding that result then needs another 8/6 on top. Depending on what you're doing with the base64, you can likely determine how many bytes you're dealing with without resorting to Go varints, and then the calculation is just the 8/6 conversion (which is 4/3; I left it in bits to match the varint process more closely).
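A minimal Python sketch of both halves of that, if it helps (put_uvarint mirrors the 7-bits-plus-continuation-flag layout described above; max_uvarint_bits is just the resulting capacity estimate, not anything from the Go source):
def put_uvarint(n):
    # 7 data bits per byte, high bit set on every byte except the last
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def max_uvarint_bits(base64_chars):
    payload_bytes = (base64_chars * 6) // 8   # whole varint bytes that fit
    return payload_bytes * 7                  # each byte carries 7 data bits

print(put_uvarint(128).hex())                      # 8001 -> base64 gAE, as in the table above
print([max_uvarint_bits(n) for n in (5, 6, 7)])    # [21, 28, 35]
Six characters give 36 bits, enough for 4 whole varint bytes and hence 28 data bits; seven characters give 42 bits, 5 varint bytes, 35 data bits, which is where the 2^28 and 2^35 limits come from.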

Python unpack binary data, numeric of length 12

I have a file with big endian binaries. There are two numeric fields. The first has length 8 and the second length 12. How can I unpack the two numbers?
I am using the Python module struct (https://docs.python.org/2/library/struct.html) and it works for the first field
num1 = struct.unpack('>Q',payload[0:8])
but I don't know how I can unpack the second number. If I treat it as char(12), then I get something like '\x00\xe3AC\x00\x00\x00\x06\x00\x00\x00\x01'.
Thanks.
I think you should create a new byte string of length 16 for the second number: fill the last 12 bytes with the bytes that hold your number, and the first 4 with zeros.
Then decode the byte string with unpack using the format >QQ, into, say, numHI and numLO. You then get the final number with: number = numHI * 2^64 + numLO*. AFAIR integers in Python can be (almost) as large as you wish, so you will have no problems with overflow. That's only a rough idea; please comment if you have problems writing the actual Python code and I'll then edit my answer to provide more help.
*^ is the math power here; in Python use 2**64 rather than math.pow, which returns a float and would lose precision at this size. Alternatively, you can use a bit shift: number = (numHI << 64) + numLO.
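For example, a rough sketch of that padding approach next to Python 3's int.from_bytes; the 12 bytes are hard-coded here from the value shown in the question, but in your code they would come from the payload slice after the first field:
import struct

big12 = b'\x00\xe3AC\x00\x00\x00\x06\x00\x00\x00\x01'   # the 12-byte field

# Pad with 4 zero bytes on the left (the most significant side for big-endian),
# unpack as two 64-bit halves, then combine:
num_hi, num_lo = struct.unpack('>QQ', b'\x00' * 4 + big12)
number = (num_hi << 64) + num_lo

# On Python 3, int.from_bytes gives the same result directly:
assert number == int.from_bytes(big12, 'big')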

str_to_a32 - What does this function do?

I need to rewrite some Python script in Objective-C. It's not that hard since Python is easily readable, but this piece of code puzzles me a bit.
def str_to_a32(b):
    if len(b) % 4:
        # pad to multiple of 4
        b += '\0' * (4 - len(b) % 4)
    return struct.unpack('>%dI' % (len(b) / 4), b)
What is this function supposed to do?
I'm not positive, but I'm using the documentation to take a stab at it.
Looking at the docs, we're going to return a tuple based on the format string:
Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
The item coming in (b) is probably a byte buffer (represented as a string) - looking at the examples they are represented with the \x escape, which consumes the next two characters as hex.
It appears the format string is
'>%dI' % (len(b) / 4)
The % and %d are going to put a number into the format string, so if the length of b is 32 the format string becomes
>8I
The first part of the format string is >, which the documentation says is setting the byte order to big-endian and size to standard.
The I says it will be an unsigned int with size 4 (docs), and the 8 in front of it means it will be repeated 8 times.
>IIIIIIII
So I think this is saying: take this byte buffer, make sure it's a multiple of 4 by appending as many 0x00s as is necessary, then unpack that into a tuple with as many unsigned integers as there are blocks of 4 bytes in the buffer.
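A quick interactive check of that reading, under Python 2 (which the '\0' string padding and len(b) / 4 division assume); the inputs here are made up:
>>> str_to_a32('\x00\x00\x00\x01ABCD')
(1, 1094861636)
>>> str_to_a32('\x01')          # padded to '\x01\x00\x00\x00' before unpacking
(16777216,)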
Looks like it's supposed to take an input array of bytes represented as a string and unpack them as big-endian (the '>') unsigned ints (the 'I'). The format codes are explained in http://docs.python.org/2/library/struct.html
This takes a string and converts it into a tuple of unsigned integers. If you look at the Python struct documentation you will see how it works. In a nutshell, it handles conversions between Python values and C structs represented as Python strings, for handling binary data stored in files (unceremoniously copied from the link provided).
In your case, the function takes a string b, adds some extra characters so that its length is a multiple of the standard size of an unsigned int (see link), and then converts it into a tuple of integers using the big-endian representation of the characters. This is the '>' part; the I part says to use unsigned integers.

Read 14 bit number from 2 bytes

I am trying to decode the run-length-encoding described in this specification here.
it says:
There may be 1, 2, 3, or 4 bytes per count. The first two bits of the first count byte contain 0, 1, 2, or 3, indicating that the count is contained in 1, 2, 3, or 4 bytes. Then the rest of the byte (6 bits) represents the six most significant bits of the count. The next byte, if present, represents decreasing significance
I have successfully read the first 2 bits for the length, but am unable to figure out how to get the value encoded in the next 14 bits.
Here's how I got the length:
number_of_bytes = (firstbyte >> 6) + 1
It seems that the data is big endian. I have tried bit shifting and unpacking and repacking with different endianness but I can't get the numbers I expect.
To get the 6 least significant bits, use
firstbyte & 0b111111
so to get the 14-bit value:
((firstbyte & 0b111111) << 8) + secondbyte
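Generalizing that to counts stored in 1 to 4 bytes, a sketch assuming Python 3 bytes indexing (read_count is just an illustrative name, not part of any library):
def read_count(data, offset=0):
    # Top two bits of the first byte give the byte count (1-4);
    # the remaining 6 bits are the most significant part of the count,
    # with each following byte less significant (big-endian).
    first = data[offset]
    number_of_bytes = (first >> 6) + 1
    value = first & 0b111111
    for b in data[offset + 1 : offset + number_of_bytes]:
        value = (value << 8) | b
    return value, offset + number_of_bytes

print(read_count(b'\x3f'))        # 1-byte count: (63, 1)
print(read_count(b'\x41\x02'))    # 2-byte count: ((1 << 8) | 2, 2) == (258, 2)
For the 2-byte case this reduces to the expression above.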
