I am trying to derive a function that will return least significant 4 bytes of a SHA1 hash.
For example, given the following SHA1 hash:
f552bfdca00b431eb0362ee5f638f0572c97475f (hex digest)
\xf5R\xbf\xdc\xa0\x0bC\x1e\xb06.\xe5\xf68\xf0W,\x97G_ (digest)
I need to derive a function that extracts the least significant 4 bytes from the hash and produces:
E456406 (hex)
239428614 (dec)
So far I have tried solutions as described in Get the x Least Significant Bits from a String in Python and Reading least significant bits in Python.
However, the resulting value is not the same.
If you want the least significant 4 bytes (8 hex characters) from the SHA-1 value f552bfdca00b431eb0362ee5f638f0572c97475f, use a binary mask:
sha1_value = 0xf552bfdca00b431eb0362ee5f638f0572c97475f
mask = 0xffffffff
least_four = sha1_value & mask # 0x2c97475f
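If you are starting from the raw digest bytes rather than an integer, here is a small sketch of the same idea in Python (my addition; it uses bytes.fromhex on the hex digest from the question):
hex_digest = 'f552bfdca00b431eb0362ee5f638f0572c97475f'
digest = bytes.fromhex(hex_digest)               # the 20 raw digest bytes
least_four = int.from_bytes(digest[-4:], 'big')  # last 4 bytes, big-endian
print(hex(least_four))                           # 0x2c97475f
print(least_four)                                # 748111711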
I am trying to figure out estimates for how many unsigned integers I can encode with 5 characters of base64, 6 characters, and so on.
Through a programmatic approach I found out that I can encode
2^28 - 1 = 268,435,455
with 6 characters and
2^35 - 1 = 34,359,738,367
with 7 characters.
(-1 because I start at uint 1)
I am struggling to generalize this though, since I would assume it starts at 2^8 = 256 but I don't get how I end up at 28 and 35.
This is my implementation in Go:
func Shorten(num uint64) string {
buf := make([]byte, binary.MaxVarintLen64)
n := binary.PutUvarint(buf, num)
b := buf[:n]
encoded := base64.URLEncoding.EncodeToString(b)
return strings.Replace(encoded, "=", "", -1)
}
Also
0 -> AA
128 -> gAE
16384 -> gIAB
2097152 -> gICAAQ
268435456 -> gICAgAE
So it looks like it's going up in increments of 7: 2^7, 2^14, 2^21, etc., but why 7?
A byte is 8 bits and therefore has 256 possible values. Base64 uses 64 different characters to encode data, so each character carries 6 bits. How many 8-bit bytes fit in 6 bits? Zero if you're rounding down, or 3/4 of one if you aren't.
When you start talking about encoding integers, however, your numbers do not appear to make sense. Are you talking about integers written in ASCII? With 6 base64 characters you have 36 bits to play with, so if you mean binary 32-bit unsigned integers you can encode exactly one at a time, but it can be any of them, for 2**32 different possibilities, with 4 bits wasted. With ASCII digits, 36 bits is only room for 4 characters, so it would be 10000 different possibilities (0 to 9999).
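As a quick sanity check of that capacity figure, a tiny sketch I added:
# n base64 characters carry 6*n bits, so they can distinguish 64**n values.
for n in (5, 6, 7):
    print(n, 64 ** n, 2 ** (6 * n))  # both columns print the same number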
You are getting unexpected results because you're using Go varints, which are not encoded as regular binary integers. Some IPython output for you:
In [22]: base64.b64encode((128).to_bytes(1,'little'))
Out[22]: b'gA=='
Because 128 can be encoded in a single 8-bit byte, it is only 2 characters plus some padding. And look at this:
In [3]: base64.b64decode('gAE=')
Out[3]: b'\x80\x01'
In [4]: int.from_bytes(_,'little')
Out[4]: 384
So as you can see, PutUvarint isn't just encoding an integer of variable length, it's encoding a variable integer, i.e. one that has been encoded in a way that it can be decoded without knowing in advance what size it is. If you look at the source code for Go's varint package, it describes this process: Go uses 7 bits of each byte to hold actual integer data, and the most significant bit is a flag indicating whether or not there is more data yet to come. 128 is just the most significant bit of a single byte set.
So basically you're encoding twice, given the way you're accomplishing this task. If you have a given integer, encoding it as a varint takes the number of bytes the integer occupies times 8/7 to store the value; then you base64-encode that result, so you need that value times 8/6 to store it. Depending on what you're doing with the base64, you can likely determine how many bytes you're dealing with without resorting to Go varints, and then the calculation would just be the 8/6 conversion (which is 4/3; I left it in bits to match the varint process more closely).
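Here is a rough Python sketch (the put_uvarint and shorten helpers are my own names, not library functions) that mimics the Go code and shows where the 28 and 35 come from:
import base64

def put_uvarint(num):
    # 7 data bits per byte; the high bit means "more bytes follow",
    # which is the scheme Go's binary.PutUvarint uses.
    out = bytearray()
    while num >= 0x80:
        out.append((num & 0x7F) | 0x80)
        num >>= 7
    out.append(num)
    return bytes(out)

def shorten(num):
    # Rough equivalent of the Go Shorten function from the question.
    return base64.urlsafe_b64encode(put_uvarint(num)).decode().rstrip('=')

print(shorten(128))          # 'gAE', matching the table above
# 4 varint bytes hold 4*7 = 28 data bits and need 6 base64 characters;
# 5 varint bytes hold 35 data bits and need 7 characters.
print(len(shorten(2**28 - 1)), len(shorten(2**28)))   # 6 7
print(len(shorten(2**35 - 1)), len(shorten(2**35)))   # 7 8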
I'm trying to write an implementation of SHA-256 in Python 3. My version is supposed to take in a hexadecimal encoding and output the corresponding hash value. I've used https://en.wikipedia.org/wiki/SHA-2#Pseudocode as a guide.
My function works well for most inputs, but sometimes it gives an output that is only 63 hex characters long (instead of 64). My function uses 32-bit binary strings.
I think I have found the problem, in the last step of the algorithm the binary addition
h4 := h4 + e (or another h-vector and corresponding letter)
yields a binary number that is too small. The last thing I do is to use hex() and I should get a string of 8 characters. In this example I only get 7.
out4 = hex(int(h4,2))[2:]
One problematic input is e5e5e5
It gives
"10110101111110101011010101101100" for h4 and "01010001000011100101001001111111" for e
so the addition gives "00000111000010010000011111101011"
and out4 = 70907eb.
What should I do in these cases?
I should get a string of 8 characters
Why do you think so? hex doesn't allow you to specify the length of the output to begin with, so, for example, if the correct output is eight zero characters, hex will just return 0x0 - the shortest representation possible.
I'm guessing the correct output should begin with zero, but hex is cutting it off. Use format strings to specify the length of output:
In [1]: f'{0:08x}'
Out[1]: '00000000' # lowercase hexadecimal digits (x), zero-padded (0) to a minimum width of 8 characters
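Applied to the addition result from the question, a quick sketch (my addition, assuming h4 now holds the 32-bit binary string shown above):
h4 = '00000111000010010000011111101011'
out4 = f'{int(h4, 2):08x}'
print(out4)   # '070907eb' - eight characters; the leading zero is preserved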
Here is a snippet for generating a password.
I have 2 questions about this; could you please help me understand it?
urandom(6): the help for urandom says it returns n random bytes suitable for cryptographic use. That is to say, it will return 6 bytes. Are they 6 ASCII characters?
ord(c): this gets the decimal value of each of those bytes. Why convert to a decimal value here?
Help for urandom:
def urandom(n): # real signature unknown; restored from __doc__
"""
urandom(n) -> str
Return n random bytes suitable for cryptographic use.
"""
return ""
Python script:
from os import urandom
letters = "ABCDEFGHJKLMNPRSTUVWXYZ"
password = "".join(letters[ord(c) % len(letters)] for c in urandom(6))
urandom returns bytes; each byte (each c in the loop) is a value between 0 and 255. The sample code uses that value and the modulo operator (%) to convert it into a value between 0 and 22, so that it can pick one of the 23 letters (I, O, and Q are excluded so they are not confused with numbers).
Note that it is not a perfectly balanced algorithm as it would favour the first 3 letters (A, B, and C) more, because 256 is not divisible by 23 and 256 % 23 is 3.
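A quick sketch (my addition) that makes the bias visible by counting how the 256 possible byte values map onto the 23 letters:
letters = 'ABCDEFGHJKLMNPRSTUVWXYZ'              # 23 letters
counts = {ch: 0 for ch in letters}
for byte_value in range(256):                     # every possible byte value
    counts[letters[byte_value % len(letters)]] += 1
print(counts['A'], counts['D'])                   # 12 11 - A, B and C come up slightly more often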
The ord() function takes a string containing a single character and returns its Unicode code point.
ex.
ord("A") => 65
ord("£") => 163
It is not used to get the decimal base of a byte as you mentioned, but rather its Unicode code point (its place in the Unicode table).
P.S.: Even though it returns the Unicode code point, that doesn't mean its range equals the size of the Unicode table; your Python interpreter may not support such large character sets under normal circumstances.
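Note that the snippet in the question targets Python 2, where urandom returns a str. A sketch of the same idea for Python 3 (my addition), where iterating over bytes already yields integers, so ord() is not needed:
from os import urandom

letters = 'ABCDEFGHJKLMNPRSTUVWXYZ'
# In Python 3, each b below is already an int in the range 0-255.
password = ''.join(letters[b % len(letters)] for b in urandom(6))
print(password)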
var=hashlib.md5(str(random.random())).hexdigest()[:16]
I was reading some Python code when I came across the above line.
Can anybody explain to me what it does?
The line creates a random 16 character hex string.
random.random() produces a random float value in the range [0.0, 1.0).
>>> import random
>>> random.random()
0.845295579640289
str() produces a string version of that random value.
>>> str(0.845295579640289)
'0.84529557964'
hashlib.md5() creates a MD5 message digest hash object, initialised with the string value.
>>> hashlib.md5('0.84529557964')
<md5 HASH object # 0x10074c530>
The hexdigest() method then produces the hash digest, expressed in hexadecimal. The MD5 algorithm produces 16 bytes of information; expressed as hexadecimal, that means 32 characters are produced:
>>> hashlib.md5('0.84529557964').hexdigest()
'5180b52225eac65bee1d6419e28ef397'
The [:16] slice picks out the first 16 characters. This step halves the digest to just the first 16 characters out of 32:
>>> '5180b52225eac65bee1d6419e28ef397'[:16]
'5180b52225eac65b'
All in all, a rather verbose, inefficient and insecure way of producing a random 16 character hex value. I'd use os.urandom() instead, encoding to hex:
>>> import os
>>> os.urandom(8).encode('hex')
'a8cb7b56d476b556'
This produces a random 8-byte string value, which when expressed as hex, also produces 16 hex characters, entirely random.
My crypto-fu isn't that great, but I have the impression that the latter form is cryptographically stronger than taking half of an MD5 hash of a string of a floating point pseudo-random value.
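As a side note, .encode('hex') only exists on Python 2 strings. A sketch of the same idea on Python 3 (my addition, assuming Python 3.6+ for the secrets module):
import os
import secrets

# Either line produces 16 entirely random hex characters:
print(os.urandom(8).hex())      # bytes.hex() replaces .encode('hex')
print(secrets.token_hex(8))     # from the standard secrets module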
MD5 is a hashing algorithm which generates a 128-bit checksum, expressed as a 32-digit hex number in text format.
So hashlib.md5(str(random.random())).hexdigest() will give you this number as a string, and
[:16] will extract the first 16 digits of that hash and store them in var.
Read these references for more details:
Python Md5
Md5 Hash
Given this example in Python
sample = '5PB37L2CH5DUDWN2SUOYE6LJPYCJBFM5N2FGVEHF7HD224UR52KB===='
a = base64.b32decode(sample)
b = base64.b32encode(a)
why is it that
sample != b ?
But when
sample = '5PB37L2CH5DUDWN2SUOYE6LJPYCJBFM5N2FGVEHF7HD224UR52KBAAAA'
then
sample == b
The first sample you got there is invalid base64.
Taken from the wiki:
When the number of bytes to encode is not divisible by 3 (that is, if there are only one or two bytes of input for the last block), then the following action is performed: Add extra bytes with value zero so there are three bytes, and perform the conversion to base64. If there was only one significant input byte, only the first two base64 digits are picked, and if there were two significant input bytes, the first three base64 digits are picked. '=' characters might be added to make the last block contain four base64 characters.
http://en.wikipedia.org/wiki/Base64#Examples
Edit:
Taken from RFC 4648:
Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a quantity. When fewer than 24 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 6-bit groups. Padding at the end of the data is performed using the '=' character.
4 times 8 bits (the ='s at the end of your sample) is more than 24 bits, so they are at the very least unnecessary. (Not sure what datatype sample is, but find out and take its size times the number of characters divided by 24.)
About your particular sample:
Base encoding reads in 24-bit chunks and only needs '=' padding characters at the end of the encoded string to make whatever was left of the string, after splitting it into 24-bit chunks, be "of size 24" so it can be parsed by the decoder.
Since the ===='s at the end of your string amount to more than 24 bits, they are useless, hence: invalid...
First, let's be clear: your question is about base32, not base64.
Your original sample is a bit too long. There are 4 '=' padding characters at the end, meaning at least 20 bits of padding. Since the number of data bits must be a multiple of 8, the padding is really 24 bits. The encoding for B in base32 is 1, which means one of the padding bits is set. This is a violation of the spec, which says all the padding bits must be clear. The decoder drops the bit completely, and re-encoding produces the proper value A instead of B.
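To see that concretely, here is a small round-trip sketch (my addition):
import base64

sample = '5PB37L2CH5DUDWN2SUOYE6LJPYCJBFM5N2FGVEHF7HD224UR52KB===='
a = base64.b32decode(sample)       # the stray padding bit in the final 'B' is dropped
b = base64.b32encode(a).decode()

print(b.endswith('52KA===='))      # True: re-encoding yields A where the input had B
print(b == sample)                 # False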