Understanding RSA cell padding from textbook explanation - python

I've been working on an RSA encryption/decryption program for a school assignment, and I've actually gotten the whole thing working. The one thing I want to make sure I understand is the padding. The book states that after we turn the string of characters into a string of digits (A = 00, and Z = 25) we then need to determine the size of the blocks and add dummy characters to the end.
The book states:
Next, we divide this string into equally sized blocks of 2N digits,
where 2N is the largest even number such that the number 2525 ... 25
with 2N digits does not exceed n.
It doesn't tell me where it gets 25 from, so I deduced that it was the index of the last character (Z in this case) of our actual alphabet of characters.
So here is my Python3 implementation (fair warning it is somewhat cringe-worthy):
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def __determineSize__(message, n):
    if (n < len(alphabet) - 1):
        raise Exception("n is not sufficiently large")
    buffer = ""
    for i in range(0, n, 2):
        buffer += str(len(alphabet) - 1) #+= "25" in this case
        if (int(buffer) > n):
            groupSize = len(buffer) - 2
            return groupSize
It starts with 25 (len(alphabet) = 26, 26 - 1 = 25). If it is not larger than n, we increase it to 2525. If it is larger at this point, we stop because we know we've gone too far, and we return the length 2, because the length 4 is too large.
This is how I understood it, and it works, but it doesn't seem right. Did I interpret this correctly or am I completely off base? If I am, can someone set me straight? (I'm not asking for code, because it's for an assignment and I don't want to plagiarise anyone, so if anyone could just tell me what I'm supposed to do in simple English or show me in pseudo code that would be great.)
Like always, thanks to everyone in advance!
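One reading of the textbook rule can be sketched as follows. This is only an illustration under the assumption that "25" is the two-digit code of the last letter (Z); the names `block_digits` and `max_code` are made up for this sketch and do not come from the book.

```python
def block_digits(n: int, max_code: int = 25) -> int:
    """Return 2N: the largest even digit count such that the number
    formed by repeating the two-digit code `max_code` does not exceed n."""
    pair = f"{max_code:02d}"          # "25" for the alphabet A..Z
    if int(pair) > n:
        raise ValueError("n is too small to hold even one character")
    candidate = pair
    # Keep appending "25" while the longer number still fits under n.
    while int(candidate + pair) <= n:
        candidate += pair
    return len(candidate)


print(block_digits(2537))  # 4, because 2525 <= 2537 < 252525
```

With n = 2537 (for example 43 * 59), each block would hold four digits, that is, two letters.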

Related

Numeric ID to very short unique strings

I have rather long IDs 1000000000109872 and would like to represent them as strings.
However all the libraries for Rust I've found such as hash_ids and block_id produce strings that are way bigger.
Ideally I'd like 4 to maybe 5 characters, numbers are okay but only uppercase letters. Doesn't need to be cryptographically secure as long as it's unique.
Is there anything that fits my needs?
I've tried this website: https://v2.cryptii.com/decimal/base64 and for 1000000000109872 I get 4rSw, this is very short which is great. But it's not uppercase.
This is the absolute best you can do if you want to guarantee no collisions without having any specific guarantees on the range of the inputs beyond "unsigned int" and you want it to be stateless:
def base_36(n: int) -> str:
    if not isinstance(n, int):
        raise TypeError("Check out https://mypy.readthedocs.io/")
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n < 10:
        return str(n)
    if n < 36:
        return chr(n - 10 + ord('A'))
    return base_36(n // 36) + base_36(n % 36)
print(base_36(1000000000109872)) # 9UGXNOTWDS
If you're willing to avoid collisions by keeping track of id allocations, you can of course do much better:
ids: dict[int, int] = {}
def stateful_id(n: int) -> str:
    return base_36(ids.setdefault(n, len(ids)))
print(stateful_id(1000000000109872)) # 0
print(stateful_id(1000000000109454)) # 1
print(stateful_id(1000000000109872)) # 0
or if some parts of the ID can be safely truncated:
MAGIC_NUMBER = 1000000000000000
def truncated_id(n: int) -> str:
    if n < MAGIC_NUMBER:
        raise ValueError(f"IDs must be >= {MAGIC_NUMBER}")
    return base_36(n - MAGIC_NUMBER)
print(truncated_id(1000000000109872)) # 2CS0
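Since the whole point is that the short string stays unique, it may help to see that a base-36 string can be turned back into the original number. The sketch below is a self-contained illustration; `encode_base36`, `decode_base36` and `DIGITS` are names invented here, and the iterative encoder is simply an alternative to the recursive one above.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode_base36(n: int) -> str:
    """Encode a non-negative integer using digits 0-9 and uppercase A-Z."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, 36)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def decode_base36(s: str) -> int:
    """Invert encode_base36; the built-in int() accepts bases up to 36."""
    return int(s, 36)


short = encode_base36(109872)
print(short, decode_base36(short))  # 2CS0 109872
```

Because decoding recovers the input exactly, two different IDs can never share the same string.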
Short Answer: Impossible.
Long Answer: You're asking to represent up to 10^16 distinct values using only 36^5 possible strings (5 uppercase chars).
Each uppercase/number char is one of 36 cases (10 digits + 26 letters). But 36^5 = 60,466,176 is far less than 10^16, so that cannot work.
Since 36^10 < 10^16 < 36^11, you'll need at least 11 uppercase chars to represent your (10^16) long IDs.
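The counting argument above can be checked with exact integer arithmetic. This is a small sketch; the function name `min_chars` and its parameters are assumptions made for illustration.

```python
def min_chars(num_ids: int, alphabet_size: int = 36) -> int:
    """Smallest string length whose number of possible strings
    (alphabet_size ** length) is at least num_ids."""
    length = 1
    capacity = alphabet_size
    while capacity < num_ids:
        capacity *= alphabet_size
        length += 1
    return length


print(min_chars(10**16))  # 11
print(min_chars(36**5))   # 5, the most IDs that five characters can cover
```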
As you already stated that there is even a checksum inside the original ID, I assume the new representation should contain all of its data.
In this case, your question is strongly related to lossless compression and information content.
Information content says that every data contains a certain amount of information. Information can be measured in bits.
The sad news is that no matter what, you cannot magically reduce your data to fewer bits. It will always keep the same number of bits. You can change the representation to store those bits as compactly as possible, but you cannot reduce the number.
You might think of jpg or compressed movies, that are stored very compact; the problem there is they are lossy. They discard information not perceived by the human eye/ear and just delete them.
In your case, there is no trickery possible. You will always have a smallest and a largest ID that you handed out. And all the IDs between your smallest and largest ID have to be distinguishable.
Now some math. If you know the amount of possible states of your data (e.g. the amount of distinguishable IDs), you can compute the required information content like this: log2(N), where N is the number of possible states.
So let's say you have 1000000 different IDs, that would mean you need log2(1000000) = 19.93 bits to represent those IDs. You will never be able to reduce this number to anything less.
Now to actually represent them: You say you want to store them in a string of 26 different uppercase letters or 10 different digits. This is called a base36 encoding.
Each digit of this can carry log2(36) = 5.17 bits of information. Therefore, to store your 1000000 different IDs, you need at least 19.93/5.17 = 3.85 digits.
This is exactly what @Samwise's answer shows you. His answer is the mathematically optimal way to encode this. You will never get better than his answer. And the number of digits will always grow if the number of possible IDs you want to represent grows. There's just no mathematical way around that.
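The figures quoted above can be reproduced with `math.log2`. This is only a sketch; `bits_needed` and `min_base36_digits` are hypothetical helper names, and because the calculation uses floating point it may be off by one when the number of states is an exact power of 36.

```python
import math


def bits_needed(num_states: int) -> float:
    """Information content, in bits, of one choice among num_states."""
    return math.log2(num_states)


def min_base36_digits(num_states: int) -> int:
    """Round the fractional digit count up to whole base-36 digits."""
    return math.ceil(math.log2(num_states) / math.log2(36))


print(round(bits_needed(1_000_000), 2))  # 19.93
print(round(math.log2(36), 2))           # 5.17
print(min_base36_digits(1_000_000))      # 4
```

The fractional result of roughly 3.85 digits has to be rounded up, so one million IDs need four base-36 characters.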

Converting binary to decimal using only Boolean and logic comparisons

I am taking a Python Certification class and have taken two practice exams to prepare for the timed exam I will be scheduling soon. However, there is limited interaction with professors and the discussion board is mostly students. I have a question that has been on both practice exams, so I imagine it will be on the real exam as well, and I cannot seem to wrap my head around how to solve it. There is no way in the class to see how to solve coding problems you have gotten incorrect, which is a major disappointment, as that would help me in the future. I know there are built-in functions for solving binary/decimal conversions, but the professor wants this done using Boolean logic and numerical comparisons as we are still in the early stages of the course. If anyone could assist in "walking" through the why's of the answer I would greatly appreciate it. Thank you.
number = 1101
You may modify the lines of code above, but don't move them! When you
Submit your code, we'll change these lines to assign different values
to the variables.
The number above represents a binary number. It will always be up to
eight digits, and all eight digits will always be either 1 or 0.
The string gives the binary representation of a number. In binary,
each digit of that string corresponds to a power of 2. The far left
digit represents 128, then 64, then 32, then 16, then 8, then 4, then
2, and then finally 1 at the far right.
So, to convert the number to a decimal number, you want to (for
example) add 128 to the total if the first digit is 1, 64 if the
second digit is 1, 32 if the third digit is 1, etc.
For example, 00001101 is the number 13: there is a 0 in the 128s
place, 64s place, 32s place, 16s place, and 2s place. There are 1s in
the 8s, 4s, and 1s place. 8 + 4 + 1 = 13.
Note that although we use 'if' a lot to describe this problem, this
can be done entirely with boolean logic and numerical comparisons.
Print the number that results from this conversion.
number = "00001101" #in Python, leading zeros are not permitted, so use a string
total = 0 #this var will keep track of the number in decimal form
index = len(number)-1 #eg 1100 has 4 digits and the max power is 3, 2^3.
for str_digit in number: #for each digit (as a string) in the number,
    #total += int(str_digit)* 2**index #add the value (0 or 1) multiplied by 2 raised to the index power
    if int(str_digit): #either 'if 0' or 'if 1'
        total += 2**index #add 2 raised to the index power
    index -= 1 # decrease the index
print(total)
Note that the line if int(str_digit): is actually redundant if you use the commented line total += int(str_digit)* 2**index instead, but I included it because your question specified that you want to test the Boolean value.
This line is the same as if 0: or if 1: which is the same as if False: or if True:.
All you need is this:
int(number, base=2)
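If the exercise really does keep `number` as an integer such as 1101 and forbids `if`, loops and built-in conversions, one possible approach is to pick out each decimal digit with `//` and `%`, compare it with 1, and multiply the resulting boolean by that place's weight. This is a sketch of that idea, not the course's official solution; the function name is made up.

```python
def binary_digits_to_decimal(number: int) -> int:
    """Treat the decimal digits of `number` (each 0 or 1, at most eight
    of them) as binary digits. True counts as 1 and False as 0 when
    multiplied, so each comparison switches its weight on or off."""
    return (
        (number // 10_000_000 % 10 == 1) * 128
        + (number // 1_000_000 % 10 == 1) * 64
        + (number // 100_000 % 10 == 1) * 32
        + (number // 10_000 % 10 == 1) * 16
        + (number // 1_000 % 10 == 1) * 8
        + (number // 100 % 10 == 1) * 4
        + (number // 10 % 10 == 1) * 2
        + (number % 10 == 1) * 1
    )


print(binary_digits_to_decimal(1101))  # 13
```

For 1101 only the 8s, 4s and 1s comparisons are true, giving 8 + 4 + 1 = 13.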

Python: Setting up a binary-number string converter, then indexing the result

I have a bit of a challenge before me.
Currently I'm trying to accomplish this process:
Feed a decimal, or any number really, into a binary converter
Now that we possess a binary string, we must measure the length of the string. (as in, numstr="10001010" - I want the return to count the characters and return "8")
Finally, I need to extract a section of said string, if I want to cut out the first half of the string "10001010" and save both halves, I want the return to read "1000" and "1010"
Current Progress:
newint=input("Enter a number:")
newint2= int(newint)
binStr=""
while newint2>0:
    binStr= str(newint2%2) + binStr
    newint2= newint2//2
print (binStr)
newint = input("Enter a binary number:")
temp=newint
power = 0
number = 0
while len(temp) > 0:
    bit=int(temp[-1])
    number = number + bit * 2 ** power
    power+=1
    temp = temp[:-1]
print(number)
//This works for integer values, how do I get it to also work for decimal values, where the integer is either there or 0 (35.45 or 0.4595)?
This is where I'm lost, I'm not sure what the best way to attempt this next step would be.
Once I convert my decimal or integer into binary representation, how can I cut my string by varying lengths? Let's say my binary representation is 100 characters, and I want to cut out lengths that are 10% the total length, so I get 10 blocks of 10 characters, or blocks that are 20% total length so I have 5 blocks of 20 characters.
Any advice is appreciated, I'm a super novice and this has been a steep challenge for me.
Strings can be divided up through slice notation.
a='101010101010'
>>>a[0]
'1'
>>>a[0:5]
'10101'
>>>a[0:int(len(a)/2)]
'101010'
That's something you should read up on, if you're getting into Python.
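Putting the three steps from the question together (convert, measure, split), a minimal sketch could look like this. It handles non-negative integers only, and `split_halves` is a name chosen purely for illustration.

```python
def split_halves(n: int):
    """Return the binary string of n, its length, and its two halves."""
    bits = format(n, "b")      # same digits as bin(n) without the "0b" prefix
    middle = len(bits) // 2    # integer division keeps the index an int
    return bits, len(bits), bits[:middle], bits[middle:]


print(split_halves(138))  # ('10001010', 8, '1000', '1010')
```

When the length is odd, the second half ends up one character longer than the first.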
Here is my suggestion, based on answer from What's the best way to split a string into fixed length chunks and work with them in Python? :
def chunkstring(string, percent):
    length = int(len(string) * percent / 100)
    return ([string[0+i:length+i] for i in range(0, len(string), length)])
# Define a string with 100 characters
a = '0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789'
# Split this string by percentage of total length
print(chunkstring(a, 10)) # 10 %
print(chunkstring(a, 20)) # 20 %

Trying to generate numbers and divide them down but they aren't dividing to the very end(Clarified)

Sorry I'm reposting but I wasn't clear at all at what I was trying to do so hopefully this helps
I'm trying to take any number that can turn into a simplified radical(x) and then turn it into 1 of the two parts of the simplified radical. I'm trying to take x which in the case is 248 and then turn it into the number outside the square root by dividing it until it can't be divided anymore.
I'm taking a big number, 248 in this example, and dividing it by whole numbers until I reach the smallest possible result, but it isn't working, and I've spent a while trying to figure out what's wrong without success. I'm also new to Python, so I'm more or less just experimenting. To be clear, I'm not decrementing; I'm dividing to get a lower number. The problem is that it doesn't pick the next smaller number that it divided down to but instead sticks with what it had.
import math
import random
import time
x = float(248)
k = int(1000)
while True:
    time.sleep(.00000000001)
    y = float(x / random.randint(1, (x-1))) #Selects numbers smaller than x and then divides it by itself
    g =(float(y).is_integer()) #Makes sure the divisions don't cause decimal numbers
    print(int(k)) #Just showing info
    if (bool(g) == 1) and ( y <= (x/10)) : #Checks if the divisions are small enough and if they are whole numbers
        if int(k) >= (x) :
            time.sleep(.1) #Decides what numbers are small enough to divide
        if int(k) > (y) : #It also checks if the number it divided is smaller than the previous and if it is than it uses that instead, only problem is that it isn't doing that
            k = y
            print(int(k))
I'm trying to get the number 2 (in this case), but sometimes I get it and sometimes the result is bigger.
Clarification: I'm trying to make a script that simplifies radical numbers and I should be getting 2 from 248.
You really need to design your algorithm before you start coding. This has several problems that cause you trouble.
This is a straight number theory application. Quit using float. You have the modulus operator to check divisibility.
Flailing around with random numbers doesn't help. Go through the integers in some methodical way.
You need to identify the largest integer whose square divides the given number. This part is relatively simple, if you don't mind a brute-force approach. I've used four test integers.
import math
for target in [248, 243, 700, 1001]:
    start = int(math.sqrt(target))
    for root in range(start, 0, -1): # start with the largest possible root and work down.
        if target % (root*root) == 0: # is the target divisible by the square?
            break
    print(target, "=", root, "sqrt", target // (root*root)) # Python 3: print() and integer division
Output:
248 = 2 sqrt 62
243 = 9 sqrt 3
700 = 10 sqrt 7
1001 = 1 sqrt 1001
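The same brute-force search can be wrapped in a function that returns both parts of the simplified radical. This is a sketch under the assumption that the input is a positive integer; `simplify_radical` is a hypothetical name, and `math.isqrt` is used so that no floating-point square root is involved.

```python
import math


def simplify_radical(n: int) -> tuple[int, int]:
    """Return (outside, inside) such that sqrt(n) == outside * sqrt(inside)
    and `outside` is the largest integer whose square divides n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Try candidate roots from largest to smallest; root 1 always succeeds.
    for root in range(math.isqrt(n), 0, -1):
        if n % (root * root) == 0:
            return root, n // (root * root)


print(simplify_radical(248))  # (2, 62)
```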

Need help understanding a short python code

I want to emphasize that this is not a request to complete my homework or job: I am studying the LZW algorithm for GIF file compression by reading someone's code on GitHub, and got confused by a code block here:
class DataBlock(object):
    def __init__ (self):
        self.bitstream = bytearray()
        self.pos = 0
    def encode_bits (self, num, size):
        """
        Given a number *num* and a length in bits *size*, encode *num*
        as a *size* length bitstring at the current position in the bitstream.
        """
        string = bin(num)[2:]
        string = '0'*(size - len(string)) + string
        for digit in reversed(string):
            if len(self.bitstream) * 8 <= self.pos:
                self.bitstream.append(0)
            if digit == '1':
                self.bitstream[-1] |= 1 << self.pos % 8
            self.pos += 1
What I cannot understand is the for loop in the function encode_bits():
for digit in reversed(string):
    if len(self.bitstream) * 8 <= self.pos:
        self.bitstream.append(0)
    if digit == '1':
        self.bitstream[-1] |= 1 << self.pos % 8
    self.pos += 1
Here is my guess (depend on his comment):
The function encode_bits() will turn an input integer num into a binary string of length size (padding zeroes at left if needed) and reverse the string, and append the digits to bitstream one by one. Hence
suppose s=DataBlock(), then s.encode_bits(3, 3) would first turn 3 into 011 (padding a zero at left to make it length 3) and reverse it to 110, and then append 110 to self.bitstream, hence the result should be bytearray('110'). But when I run the code the result is bytearray(b'\x03'), not what I expected. Furthermore, \x03 is one byte, not 3 bits, which conflicts with his comment. I cannot understand why.
I forgot to add that his code runs and gives correct output hence there's something wrong in my understanding.
Try looking at it this way:
You will be given a bytearray object (call it x for the moment).
How many bytes are in the object? (Obviously, it's just len(x).)
How many bits are in the object? (This is an exercise; please calculate the answer.)
Once you've done that, suppose we start with no (zero) bytes in x, i.e., x is a bytearray(b''). How many bytes do we need to add (x.append(...)) in order to store three bits? What if we want to store eight bits? What if we want to store ten bits?
Again, these are exercises. Calculate the answers and you should be enlightened.
(Incidentally, this technique, of compressing some number of sub-objects into some larger space, is called packing. In mathematics the problem is generalized, while in computers it is often more limited.)
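To make the packing idea concrete, here is a standalone sketch that mirrors the loop in the question: bits are written least-significant first, and a new zero byte is appended only when the current one is full. The function `pack_lsb_first` and its `(num, size)` pair input are inventions for this illustration, not part of the original GitHub code.

```python
def pack_lsb_first(codes):
    """Pack (num, size) pairs into bytes, lowest bit of each num first."""
    out = bytearray()
    pos = 0                               # total number of bits written so far
    for num, size in codes:
        for i in range(size):
            if pos // 8 == len(out):      # current byte is full (or none exists yet)
                out.append(0)
            if (num >> i) & 1:            # bit i of num, starting from the low end
                out[pos // 8] |= 1 << (pos % 8)
            pos += 1
    return out


print(pack_lsb_first([(3, 3)]))          # bytearray(b'\x03')
print(len(pack_lsb_first([(1023, 10)]))) # 2
```

Three bits still occupy one whole byte because a bytearray cannot hold a fraction of a byte; the unused upper bits simply stay 0 until later codes fill them.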
