I'm trying to write an implementation of SHA-256 in Python 3. My version is supposed to take in a hexadecimal encoding and output the corresponding hash value. I've used https://en.wikipedia.org/wiki/SHA-2#Pseudocode as a guide.
My function works well for most inputs, but sometimes it gives an output that is only 63 hex characters long (instead of 64). My function uses 32-bit binary strings.
I think I have found the problem: in the last step of the algorithm, the binary addition
h4 := h4 + e (or another h-vector and corresponding letter)
yields a binary number that is too small. The last thing I do is to use hex() and I should get a string of 8 characters. In this example I only get 7.
out4 = hex(int(h4,2))[2:]
One problematic input is e5e5e5
It gives
"10110101111110101011010101101100" for h4 and "01010001000011100101001001111111" for e
so the addition gives "00000111000010010000011111101011"
and out4 = 70907eb.
What should I do in these cases?
"I should get a string of 8 characters"
Why do you think so? hex() has no way to specify the length of its output; it always returns the shortest representation possible. For example, if the correct output is eight zero digits, hex() returns 0x0.
I'm guessing the correct output should begin with a zero, but hex() is cutting it off. Use a format string to specify the length of the output:
In [1]: f'{0:08x}'
Out[1]: '00000000' # lowercase hexadecimal (x) digits that must fit into at least 8 characters, prefixed with zero (08) as needed
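As a minimal sketch of how this applies to the addition step in the question: the sum can be reduced modulo 2**32 and then formatted to a fixed width of 8 hex digits. The helper name add32_hex and the constant MASK32 are made up for illustration and are not from the original code.

```python
MASK32 = 0xFFFFFFFF  # keeps only the low 32 bits (hypothetical constant name)


def add32_hex(a_bits, b_bits):
    """Add two 32-bit binary strings modulo 2**32 and return 8 hex digits."""
    total = (int(a_bits, 2) + int(b_bits, 2)) & MASK32
    # '08x' pads with leading zeros to at least 8 lowercase hex digits
    return f"{total:08x}"


h4 = "10110101111110101011010101101100"
e = "01010001000011100101001001111111"
print(add32_hex(h4, e))  # 070907eb
```

With the values from the question this keeps the leading zero that hex() dropped.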
Update with more information:
The actual situation is that I have this LONG STRING in environment A, and I need to copy and paste it into environment B.
UNFORTUNATELY, environments A and B are not connected (no mutual access), so I'm looking for a way to encode and decode the string. Otherwise, for more files, I would have to type each string in by hand, which is slow and not reproducible.
Any suggestions or tool recommendations?
Many thanks!
I'm facing a weird problem: encoding a SUPER LONG binary string into a simple form, like several digits.
Say there is a long string consisting only of 1s and 0s, e.g. "110...011", with a length of 1,000 to 100,000 digits or even more, and I would like to encode this STRING into something that has fewer digits/chars. Then I need to convert it back to the original STRING.
Currently I am trying to use the hex / int methods in Python to 'compress' this string and 'decompress' it back to its original form.
An example would be:
1. Input string: '110011110110011'
'''
def Bi_to_Hex_Int(input_str, method):
    input_two = str(input_str)
    if method == 'hex':
        # base 2 -> base 16
        result = hex(int(input_two, 2))
    elif method == 'int':
        # base 2 -> base 10
        result = int(input_two, 2)
    else:
        raise ValueError("method must be 'hex' or 'int'")
    print("input_bi length", len(input_two), "\n output hex length", len(str(result)), '\n method: {}'.format(method))
    return result

gene = '110011110110011'  # the input string from above
res_16 = Bi_to_Hex_Int(gene, 'hex')
# res_16 == '0x67b3'
res_10 = Bi_to_Hex_Int(gene, 'int')
# res_10 == 26547
'''
Then I can reverse it back:
'''
def HexInt_to_bi(input_str, method):
    if method == 'hex':
        # base 16 -> base 2, dropping the '0b' prefix
        back_two = bin(int(input_str, 16))[2:]
    elif method == 'int':
        # base 10 -> base 2, dropping the '0b' prefix
        back_two = bin(int(input_str))[2:]
    else:
        raise ValueError("method must be 'hex' or 'int'")
    print("input_hex length", len(str(input_str)), "\n output bi length", len(back_two))
    return back_two

hexback_two = HexInt_to_bi(res_16, 'hex')
intback_two = HexInt_to_bi(res_10, 'int')
'''
BUT this does have a problem: I tried a string of around 500 digits (101010...0001), and the best 'compressed' result is around 127 characters, using hex.
So is there a better way to further 'compress' the string to fewer digits?
**For example, can a 5,000-digit string of 1s and 0s be compressed to something like 50 or 100 digits/chars (or even fewer)?**
If you want it that simple: one hex character encodes 4 binary digits (2 ^ 4 = 16). The compression ratio you want is about 50 to 100 times. For a ratio of 50 you need 50 binary digits to be encoded in 1 character, which means you need an alphabet of 2 ^ 50 different characters to encode every combination. That is quite a lot.
If you accept a lower ratio, you may try base64. Its ratio is 6 to 1: each base64 character encodes 6 bits.
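The arithmetic above can be checked with a short calculation. This is only a sketch: chars_needed is a hypothetical helper, and it gives a lower bound for arbitrary bit strings that ignores any padding a real encoding adds.

```python
import math


def chars_needed(n_bits, alphabet_size):
    """Minimum characters needed to represent any bit string of n_bits bits."""
    return math.ceil(n_bits / math.log2(alphabet_size))


for name, size in [("hex", 16), ("base64", 64)]:
    print(name, chars_needed(5000, size))
# hex 1250
# base64 834
```

A plain change of alphabet therefore cannot bring 5,000 arbitrary digits anywhere near 50 or 100 characters; only input with exploitable redundancy can shrink that far.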
Otherwise you have to come up with a more complex algorithm: splitting your string into blocks, looking for similar blocks, encoding them with different symbols, building a map of those symbols, etc.
It's probably easier to compress your string with a compression tool and then return a base64 representation of the result.
If the task allows, you may store the whole strings somewhere and give them short unique names, so that instead of compressing and decompressing you store and retrieve strings by name.
This probably doesn't produce the absolutely shortest string you can get, but it's trivially easy using the facilities built into Python. There is no need to convert the characters into a binary format first; zlib compression will turn an input with only 2 different characters into something compact.
Encoding:
import zlib
import base64
result = base64.b64encode(zlib.compress(input_str.encode()))
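Decoding is the same two steps in reverse. A minimal round-trip sketch, assuming the input is an ASCII string of 0s and 1s; the function names encode_bits and decode_bits are made up for illustration:

```python
import base64
import zlib


def encode_bits(bit_string):
    """Compress a string of '0'/'1' characters into a base64 text token."""
    compressed = zlib.compress(bit_string.encode("ascii"))
    return base64.b64encode(compressed).decode("ascii")


def decode_bits(token):
    """Reverse encode_bits: base64-decode, then decompress."""
    compressed = base64.b64decode(token)
    return zlib.decompress(compressed).decode("ascii")


original = "110011110110011" * 100
token = encode_bits(original)
assert decode_bits(token) == original
print(len(original), "->", len(token))
```

How much shorter the token is depends entirely on how repetitive the input is; random-looking bit strings compress far less than the repeated pattern used here.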
If the counts of 0s and 1s differ significantly, then you can use enumerative coding to get the shortest representation.
If the string consists only of 0 and 1 digits, then you can pack eight digits into one byte. You will also need to keep track of how many digits there are past the last multiple of eight, since the last byte may be representing fewer than eight digits.
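A minimal sketch of that packing, under the assumption that the digit count is stored alongside the bytes; pack_bits and unpack_bits are hypothetical names. Unlike the int()/bin() round trip, this also preserves leading zeros.

```python
def pack_bits(bit_string):
    """Pack a '0'/'1' string into bytes; return (digit_count, data)."""
    n = len(bit_string)
    # Pad on the right with zeros up to the next multiple of eight
    padded = bit_string.ljust(-(-n // 8) * 8, "0")
    data = int(padded, 2).to_bytes(len(padded) // 8, "big") if padded else b""
    return n, data


def unpack_bits(n, data):
    """Reverse pack_bits using the stored digit count n."""
    if n == 0:
        return ""
    bits = format(int.from_bytes(data, "big"), f"0{len(data) * 8}b")
    return bits[:n]  # drop the padding digits in the last byte


n, data = pack_bits("110011110110011")
print(n, data.hex())         # 15 cf66
print(unpack_bits(n, data))  # 110011110110011
```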
The output value does not include the leading 0s. Can someone help me fix the problem?
def bitwiseOR(P, Q):
    return bin(P | Q)
bitwiseOR(0b01010111, 0b00111000)
OUTPUT: '0b1111111'
The leading zeroes are just for representation, so you can use the Format Specification Mini-Language to display them as you wish:
Format string:
# includes the 0b prefix
0{length} pads with leading zeroes so that the total length is length
def bitwiseOR(P, Q, length=10):
    return format(P | Q, f'#0{length}b')
x = bitwiseOR(0b01010111, 0b00111000)
# 0b01111111
print(x)
Leading zeros are a property of the string you produce, not the number. So, for example, if you're looking for a way to make the following two calls produce different results, that's not possible (see footnote 1):
bitwiseOR(0b01010111, 0b00111000)
bitwiseOR( 0b1010111, 0b111000)
However, if you can provide the number of digits separately, then you can do this using the format() function. It accepts a second argument which lets you customize how the number is printed out using the format spec. Based on that spec, you can print a number padded with zeros to a given width like this:
>>> format(127, '#010b')
'0b01111111'
Here the code consists of four pieces:
# means apply the 0b prefix at the beginning
0 means pad with leading zeros
10 means the total length of the resulting string should be at least 10 characters
b means to print the number in binary
You can tweak the format code to produce your desired string length, or even take the length from a variable.
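For instance, taking the number of binary digits from a variable could look like this. It is a small sketch; to_binary and its digits parameter are illustrative names, not part of any library.

```python
def to_binary(value, digits):
    """Format value as 0b-prefixed binary with at least `digits` digits."""
    # The width in the format spec counts the '0b' prefix, hence digits + 2
    return format(value, f"#0{digits + 2}b")


print(to_binary(0b01010111 | 0b00111000, 8))  # 0b01111111
print(to_binary(0b1010111 | 0b111000, 8))     # 0b01111111
print(to_binary(5, 4))                        # 0b0101
```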
Footnote 1: Well... technically there is a way to make Python re-read its own source code and possibly produce different results that way, but that's not useful in any real program; it's only useful if you want to learn something about how the Python interpreter works.
x = 64
var_in_bin_format = bin(64)
print(var_in_bin_format)
#Output
#0b1000000
#Desired Output -- > should always be in 8 bit format
#0b01000000
def call_another_api(var_in_bin_format):
    pass
In Python, I need to call an API that expects its parameter to always be in 8-bit format, regardless of the value of the decimal number.
I am not that good at bit manipulation, so I am wondering if there is something I can do here.
How can I do this? I cannot use the format() function as it will convert the value into a string representation and the API that I am calling will alert me that it is not in the correct format.
Even though you say that you can't use format() because it returns a string, I'm going to post this because that's also what bin() does. bin(x) is equivalent to format(x, '#b'). I'd guess that you haven't added the '#', which means you won't have '0b' leading the value.
The Python 3 documentation for bin() actually gives a pretty strong hint about how you might do this, using format instead.
If you know that the value passed will not be negative, you can use the format string '#010b':
format(x, '#010b')
Breaking this down:
'b' means that the number will be a string binary representation.
'10' means that the entire string will be at least 10 characters long: two for '0b' and 8 for the value.
'0' makes it pad with '0' instead of ' '.
'#' will add the '0b' prefix, as done by bin().
Note that this assumes that the number is an integer in the range [0, 255]. Integers outside this range will generate valid representations, but will not match the format expected, and may have a leading '-'. Objects of type float can not be converted with the 'b' format code. I assume that these are not problems, given what your intended output is, but it might be a good idea to add an explicit check to throw a ValueError if the value is less than 0 or greater than 255.
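The explicit check suggested above might look like the following sketch. The function name to_8bit_binary is hypothetical, and raising ValueError for non-integers as well is a simplifying assumption.

```python
def to_8bit_binary(value):
    """Return value as a '0b'-prefixed, 8-digit binary string."""
    if not isinstance(value, int) or not 0 <= value <= 255:
        raise ValueError(f"expected an integer in [0, 255], got {value!r}")
    return format(value, "#010b")


print(to_8bit_binary(64))   # 0b01000000
print(to_8bit_binary(255))  # 0b11111111
```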
If you're in Python 3.6+, you could also use f-strings:
f'{x:#010b}'
It is not possible to represent every decimal number in 8 bits. Only the numbers from 0 to 255 fit into 8 bits.
Say I have 2 hex values that were taken in from keyboard input.
E.g. val1 = 0x000000 and val2 = 0xFF0000
and I do,
print(hex(val1))
print(hex(val2))
I get the right output for val2 but val1 is just 0x0, how do I get it to print the whole value?
By whole value, I mean, if the inputted value is 0x000000, I want to output the value as 0x000000 and not 0x0.
Use the format() built-in function to show the hex representation to the expected level of precision:
>>> format(53, '08x')
'00000035'
>>> format(1234567, '08x')
'0012d687'
The format code 08x means "eight lowercase hex digits padded with leading zeros".
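To get the 0x prefix from the question as well, the '#' flag can be added to the format code. A short sketch, where hex_fixed is a hypothetical helper and the default of 6 digits is taken from the 0x000000 example:

```python
def hex_fixed(value, digits=6):
    """Format value as 0x-prefixed hex with at least `digits` hex digits."""
    # '#' adds the 0x prefix; the width counts the prefix, hence digits + 2
    return format(value, f"#0{digits + 2}x")


print(hex_fixed(0x000000))  # 0x000000
print(hex_fixed(0xFF0000))  # 0xff0000
# Uppercase digits with a lowercase prefix need the prefix written by hand
print(f"0x{0xFF0000:06X}")  # 0xFF0000
```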
You can pad the hex string to the specified number of characters by using the ljust method.
Whenever the string is shorter than the specified width, it appends the specified filler character to extend it.
In your example, hex(0x0).ljust(8, '0') == "0x000000".
Strings that are already long enough are preserved, so 0xFF0000 will still work.
Be aware that ljust pads on the right, so this is only safe when the value is zero or already fills the whole width: hex(0x1).ljust(8, '0') gives "0x100000", which is a different number.
print(hex(0x000000).ljust(8, '0')) # Prints 0x000000
print(hex(0xFF0000).ljust(8, '0')) # Prints 0xff0000 (hex() uses lowercase digits)
A couple of important things to note that have bitten me in the past:
Make sure your width includes the length of the leading "0x"
This is because the ljust method operates on raw text and doesn't realize it's a hex string
If you give a width value shorter than you need, the strings won't be filled up enough and will have different lengths.
In other words len(hex(0x000000).ljust(4, '0')) != len(hex(0xFF0000).ljust(4, '0')) because you need a width of 8 characters to fit both cases
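One more thing worth checking with ljust is which side the padding lands on. The comparison below is a sketch with an arbitrarily chosen value of 0x1; it pads the same value on the right with ljust and on the left (after the prefix) with rjust:

```python
value = 0x1

# ljust appends zeros on the right, which changes the number
right_padded = hex(value).ljust(8, "0")
# Padding the digits on the left, after the prefix, keeps the number intact
left_padded = "0x" + format(value, "x").rjust(6, "0")

print(right_padded, int(right_padded, 16) == value)  # 0x100000 False
print(left_padded, int(left_padded, 16) == value)    # 0x000001 True
```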
You say the user typed this input in on the keyboard. That means you already started with the strings you want, but you parsed integers out of the input and threw away the strings.
Don't do that. Keep the strings. Parse integers too if you need to do math, but keep the strings. Then you can just do
print(original_input)
without having to go through a reconstruction process and guess at how many leading zeros were originally input.
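A minimal sketch of that idea, with a hypothetical helper parse_hex_input that returns both the text as typed and the parsed integer (the literal "0x000000" stands in for the keyboard input):

```python
def parse_hex_input(text):
    """Return the text exactly as typed (minus whitespace) and its integer value."""
    original = text.strip()
    return original, int(original, 16)


original_input, value = parse_hex_input("0x000000")
print(original_input)  # 0x000000
print(value + 1)       # 1
```

The integer is used for arithmetic, and the stored string is used whenever the value has to be shown again in the form the user entered.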
Given this example in Python
sample = '5PB37L2CH5DUDWN2SUOYE6LJPYCJBFM5N2FGVEHF7HD224UR52KB===='
a = base64.b32decode(sample)
b = base64.b32encode(a)
why is it that
sample != b ?
BUT where
sample = '5PB37L2CH5DUDWN2SUOYE6LJPYCJBFM5N2FGVEHF7HD224UR52KBAAAA'
then
sample == b
the first sample you got there is invalid base64.
taken from wiki:
When the number of bytes to encode is not divisible by 3 (that is, if there are only one or two bytes of input for the last block), then the following action is performed: Add extra bytes with value zero so there are three bytes, and perform the conversion to base64. If there was only one significant input byte, only the first two base64 digits are picked, and if there were two significant input bytes, the first three base64 digits are picked. '=' characters might be added to make the last block contain four base64 characters.
http://en.wikipedia.org/wiki/Base64#Examples
edit:
taken from RFC 4648:
Special processing is performed if fewer than 24 bits are available
at the end of the data being encoded. A full encoding quantum is
always completed at the end of a quantity. When fewer than 24 input
bits are available in an input group, bits with value zero are added
(on the right) to form an integral number of 6-bit groups. Padding
at the end of the data is performed using the '=' character.
4 times 8 bits (the ='s at the end of your sample) is more than 24 bits, so they are at the least unnecessary. (Not sure what datatype sample is, but find out and take its size times the number of characters divided by 24.)
about your particular sample:
base-encoding reads in 24bit chunks and only needs '=' padding characters at the end of the base'd string to make whatever was left of the string after splitting it into 24bit chunks be "of size 24" so it can be parsed by the decoder.
since the ===='s at the end of your string amount to more than 24bits they are useless, hence: invalid...
First, let's be clear: your question is about base32, not base64.
Your original sample is a bit too long. There are 4 = padding characters at the end, meaning at least 20 bits of padding. The number of data bits must be a multiple of 8, so it's really 24 bits of padding. The encoding for B in base32 is 1, which means one of the padding bits is set. This is a violation of the spec, which says all the padding bits must be clear. The decoder drops the bit completely, and the encoder produces the proper value A instead of B.
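The stray bit can be made visible by unpacking the last group of the sample by hand. This is only a sketch: split_last_group is a hypothetical helper, and ALPHABET is the standard base32 alphabet from RFC 4648.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # RFC 4648 base32 alphabet


def split_last_group(group):
    """Split one base32 group into (payload bytes, leftover padding bits)."""
    data_chars = group.rstrip("=")
    value = 0
    for ch in data_chars:
        value = (value << 5) | ALPHABET.index(ch)  # 5 bits per character
    total_bits = 5 * len(data_chars)
    spare_bits = total_bits % 8  # bits that do not fill a whole byte
    payload = (value >> spare_bits).to_bytes(total_bits // 8, "big")
    leftover = value & ((1 << spare_bits) - 1)
    return payload, leftover


payload, leftover = split_last_group("52KB====")
print(payload.hex(), leftover)    # ee94 1
print(base64.b32encode(payload))  # b'52KA===='
```

The four characters 52KB carry 20 bits, of which 16 are data and 4 are padding; the leftover value of 1 is the set padding bit, and re-encoding the two payload bytes yields A in that position.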