I am trying to decode the run-length encoding described in this specification.
It says:
There may be 1, 2, 3, or 4 bytes per count. The first two bits of the first count byte contain 0, 1, 2, or 3, indicating that the count is contained in 1, 2, 3, or 4 bytes. Then the rest of the byte (6 bits) represents the six most significant bits of the count. The next byte, if present, represents decreasing significance.
I have successfully read the first 2 bits for the length, but am unable to figure out how to get the value encoded in the next 14 bits.
Here's how I got the length:
number_of_bytes = (firstbyte >> 6) + 1
It seems that the data is big-endian. I have tried bit shifting and unpacking and repacking with different endiannesses, but I can't get the numbers I expect.
To get the 6 least significant bits, use
firstbyte & 0b111111
so to get a 14-bit value:
((firstbyte & 0b111111) << 8) + secondbyte
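Putting those two pieces together, a minimal sketch of a general decoder (assuming data is a bytes object with the count starting at pos; read_count is my name, not from the spec):

def read_count(data, pos=0):
    first = data[pos]
    n = (first >> 6) + 1        # top two bits: total number of count bytes
    count = first & 0b111111    # low six bits: most significant part
    for i in range(1, n):       # continuation bytes in decreasing significance
        count = (count << 8) | data[pos + i]
    return count, n             # the count and how many bytes it consumed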
I'm trying to pack an epoch together with a string, and I can't figure out why the number of bytes depends on the order I do the packing in. Unfortunately I'm not getting any replies on the relevant microcontroller forum: https://forum.pycom.io/topic/6761/struct
If I convert a character to bytes I get 1 byte:
>>> bytes=struct.pack('s', 'F'); print(bytes, len(bytes))
b'F' 1
If I convert an epoch to bytes I get 4 bytes:
>>> bytes=struct.pack('I', 1611017052); print(bytes, len(bytes))
b'\\+\x06`' 4
How come when I do both together I get 8 instead of 4+1=5?
>>> bytes=struct.pack('sI', 'F', 1611017052); print(bytes, len(bytes))
b'F\x00\x00\x00\\+\x06`' 8
But this way I get the 5 I expect:
>>> bytes=struct.pack('Is', 1611017052, 'F'); print(bytes, len(bytes))
b'\\+\x06`F' 5
Why does a different packing order give a different number of bytes?
It is a question of alignment. By default, struct packs according to standard C conventions. This means that integers which are 4 bytes long are aligned on 4-byte boundaries. The start of the structure buffer is already aligned, so putting an integer first causes no padding. Putting a character first changes the offset to 1, so 3 bytes of padding are needed to reach a 4-byte boundary for a following integer.
This sort of padding is usually desirable, as some architectures may not allow the CPU to do an unaligned integer access, and others may do so much more slowly than an aligned access.
You can override the alignment by beginning the format with =:
bytes = struct.pack('=sI', b'F', 1611017052)
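You can watch the padding appear and disappear with struct.calcsize:

>>> import struct
>>> struct.calcsize('sI')   # char, 3 bytes of padding, then the aligned int
8
>>> struct.calcsize('Is')   # the int is already aligned, so no padding
5
>>> struct.calcsize('=sI')  # '=' turns alignment off
5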
I am trying to figure out estimates for how many unsigned integers I can encode with 5 characters of base64, 6 characters, and so on.
Through a programmatic approach I found out that I can encode
2^28 - 1 = 268,435,455
with 6 characters and
2^35 - 1 = 34,359,738,367
with 7 characters.
(-1 because I start at uint 1)
I am struggling to generalize this though, since I would assume it starts at 2^8 = 256 but I don't get how I end up at 28 and 35.
This is my implementation in Go:
// imports: encoding/base64, encoding/binary, strings
func Shorten(num uint64) string {
    buf := make([]byte, binary.MaxVarintLen64)  // varint scratch buffer
    n := binary.PutUvarint(buf, num)            // varint-encode num
    b := buf[:n]
    encoded := base64.URLEncoding.EncodeToString(b)
    return strings.Replace(encoded, "=", "", -1)  // strip '=' padding
}
Also
0 -> AA
128 -> gAE
16384 -> gIAB
2097152 -> gICAAQ
268435456 -> gICAgAE
So it looks like it's going up in increments of 7 bits: 2^7, 2^14, 2^21, etc. But why 7?
A byte is 8 bits and therefore has 256 possible values. Base64 uses 64 different characters, so each character encodes 6 bits. How many 8-bit bytes fit in one 6-bit character? Zero if you're rounding, 3/4 if you aren't.
When you start talking about encoding integers, however, your numbers do not appear to make sense. Are you talking about integers written in ASCII? With 6 base64 characters you have 36 bits to play with, so if you mean binary 32-bit unsigned integers you can encode one at a time, any of 2**32 different possibilities, with 4 bits wasted. With ASCII you'd have room for 4 characters, so 10,000 different possibilities (0 to 9999).
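If the underlying question is just "how many unpadded base64 characters do n bytes take", that is ceil(8n/6); a tiny sketch (b64_chars is a hypothetical helper name):

>>> import math
>>> def b64_chars(n_bytes):
...     return math.ceil(n_bytes * 8 / 6)  # 6 payload bits per character
...
>>> [b64_chars(n) for n in range(1, 6)]
[2, 3, 4, 6, 7]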
You are getting unexpected results because you're using Go varints, which are not encoded as regular binary integers. Some IPython output for you:
In [22]: base64.b64encode((128).to_bytes(1,'little'))
Out[22]: b'gA=='
Because 128 can be encoded in a single 8-bit byte, it is only 2 characters plus some padding. And look at this:
In [3]: base64.b64decode('gAE=')
Out[3]: b'\x80\x01'
In [4]: int.from_bytes(_,'little')
Out[4]: 384
So as you can see, PutUvarint isn't just encoding an integer of variable length, it's encoding a variable integer, i.e. one encoded so that it can be decoded without knowing its size in advance. The source code for Go's varint module describes the process: Go uses 7 bits of each byte to hold actual integer data, and the most significant bit is a flag saying whether more data is yet to come. 128 is exactly the most significant bit of one byte set, which is why it spills into a second varint byte.
So basically you're encoding twice. To store a given integer as a varint you need the number of bytes the integer occupies times 8/7, and to base64-encode that result you need that value times 8/6. Depending on what you're doing with the base64, you can likely determine how many bytes you're playing with without resorting to Go varints, and then the calculation is just the 8/6 conversion (which is 4/3; I left it in bits to match the varint process more closely).
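To make the double encoding concrete, here is a rough Python sketch of that 7-bits-plus-continuation-flag scheme (uvarint_encode is my name for it, not Go's):

import base64

def uvarint_encode(n):
    # Each output byte carries 7 data bits; the high bit means "more follows".
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

print(uvarint_encode(128).hex())                      # 8001
print(base64.urlsafe_b64encode(uvarint_encode(128)))  # b'gAE=' ('gAE' unpadded)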
I am working on a program which does a lot of hashing, and in one of the steps I take a result of hashlib's ripemd160 hash and convert it into an integer. The lines are:
from hashlib import new, sha256
import struct
ripe_fruit = new('ripemd160', sha256(key.to_der()).digest())
key_hash160 = struct.unpack("<Q", ripe_fruit.digest())[0]
It gives me the error:
struct.error: unpack requires a buffer of 8 bytes
I tried changing the format to L and other things, but they didn't work. How do I fix this?
RIPEMD-160 returns 160 bits, or 20 bytes. struct doesn't know how to unpack integers larger than 8 bytes. You have two options, and the right one depends on what exactly you're trying to do.
If your algorithm is looking for just some of the bytes of the hash, you can take the first or last 8 bytes and unpack those.
key_hash160 = struct.unpack("<Q", ripe_fruit.digest()[:8])[0]
If you need a 160-bit integer, you first have to decide how it's represented: is it little-endian, big-endian, or something in between? Then you can break the array into 20 bytes and calculate one number from them. Assuming little-endian based on the < in your question, you can do something like:
key_parts = struct.unpack("B" * 20, ripe_fruit.digest())
key_hash160 = 0
for b in key_parts[::-1]:
key_hash160 <<= 8
key_hash160 |= b
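For what it's worth, on Python 3 the whole loop collapses to a single call with the same little-endian result:

key_hash160 = int.from_bytes(ripe_fruit.digest(), 'little')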
Given this example in Python
sample = '5PB37L2CH5DUDWN2SUOYE6LJPYCJBFM5N2FGVEHF7HD224UR52KB===='
a = base64.b32decode(sample)
b = base64.b32encode(a)
why is it that
sample != b ?
But where
sample = '5PB37L2CH5DUDWN2SUOYE6LJPYCJBFM5N2FGVEHF7HD224UR52KBAAAA'
then
sample == b
The first sample you got there is invalid base64.
Taken from Wikipedia:
When the number of bytes to encode is not divisible by 3 (that is, if there are only one or two bytes of input for the last block), then the following action is performed: Add extra bytes with value zero so there are three bytes, and perform the conversion to base64. If there was only one significant input byte, only the first two base64 digits are picked, and if there were two significant input bytes, the first three base64 digits are picked. '=' characters might be added to make the last block contain four base64 characters.
http://en.wikipedia.org/wiki/Base64#Examples
Edit: taken from RFC 4648:
Special processing is performed if fewer than 24 bits are available
at the end of the data being encoded. A full encoding quantum is
always completed at the end of a quantity. When fewer than 24 input
bits are available in an input group, bits with value zero are added
(on the right) to form an integral number of 6-bit groups. Padding
at the end of the data is performed using the '=' character.
4 times 8 bits (the ='s at the end of your sample) is more than 24 bits, so they are at the least unnecessary. (Not sure what datatype sample is, but find out and take its size times the number of characters divided by 24.)
About your particular sample:
Base encoding reads in 24-bit chunks and only needs '=' padding characters at the end of the encoded string to make whatever was left after splitting it into 24-bit chunks come out "of size 24", so it can be parsed by the decoder.
Since the ===='s at the end of your string amount to more than 24 bits, they are useless, hence: invalid.
First, let's be clear: your question is about base32, not base64.
Your original sample is a bit too long. There are four = padding characters at the end, implying at least 20 bits of padding. The number of bits must be a multiple of 8, so it's really 24 bits. The encoding for B in base32 is 1, which means one of the padding bits is set. This is a violation of the spec, which says all padding bits must be clear. The decode drops the bit completely, and the re-encode produces the proper value A instead of B.
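As the question itself shows, b32decode accepts the non-canonical input without complaint, so if you need to reject it, a round-trip check is a simple sketch (is_canonical_b32 is just an illustrative name):

import base64

def is_canonical_b32(s):
    # Decode and re-encode; stray padding bits do not survive the round trip.
    return base64.b32encode(base64.b32decode(s)).decode() == s

print(is_canonical_b32('5PB37L2CH5DUDWN2SUOYE6LJPYCJBFM5N2FGVEHF7HD224UR52KB===='))  # False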
I have been having some real trouble with this for a while. I am receiving a string of binary data in Python and am having trouble unpacking and interpreting only a 5-bit subset (not an entire byte) of the data. It seems like whatever method comes to mind simply fails miserably.
Let's say I have two bytes of packed binary data, and I would like to interpret the first 10 bits within the 16. How could I convert this to 2 integers representing 5 bits each?
Use bitmasks and bitshifting:
>>> example = 0x1234 # Hexadecimal example; 2 bytes, 4660 decimal.
>>> bin(example) # Show as binary digits
'0b1001000110100'
>>> example & 31         # Grab the 5 least significant bits
20
>>> bin(example & 31) # Same, now represented as binary digits
'0b10100'
>>> (example >> 5) & 31 # Grab the next 5 bits (shift right 5 times first)
17
>>> bin(example >> 5 & 31)
'0b10001'
The trick here is to know that 31 is a 5-bit bitmask:
>>> bin(31)
'0b11111'
>>> 0b11111
31
>>> example & 0b11111
20
As you can see, you could also just use the 0b binary number literal notation if you find that easier to work with.
See the Python Wiki on bit manipulation for more background info.
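Tying this back to the original two-byte question: whether the two 5-bit fields sit at the top or the bottom of the 16 bits depends on your wire format, but assuming they start at the most significant end, a sketch:

import struct

raw = b'\x12\x34'                    # two bytes of packed data (example input)
(value,) = struct.unpack('>H', raw)  # big-endian 16-bit unsigned; adjust as needed
first5 = (value >> 11) & 0b11111     # bits 15..11
second5 = (value >> 6) & 0b11111     # bits 10..6
print(first5, second5)               # 2 8 for this input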