Python: Setting up a binary-number string converter, then indexing the result

I have a bit of a challenge before me.
Currently I'm trying to accomplish this process:
Feed a decimal, or any number really, into a binary converter.
Now that we have a binary string, measure its length (for numstr="10001010", I want to count the characters and get 8).
Finally, extract a section of that string: if I cut the string "10001010" in half and save both halves, I want to get "1000" and "1010".
Current progress:
newint = input("Enter a number:")
newint2 = int(newint)
binStr = ""
while newint2 > 0:
    binStr = str(newint2 % 2) + binStr
    newint2 = newint2 // 2
print(binStr)
newint = input("Enter a binary number:")
temp = newint
power = 0
number = 0
while len(temp) > 0:
    bit = int(temp[-1])
    number = number + bit * 2 ** power
    power += 1
    temp = temp[:-1]
print(number)
This works for integer values. How do I get it to also work for decimal values, where the integer part is either present or 0 (35.45 or 0.4595)?
This is where I'm lost, I'm not sure what the best way to attempt this next step would be.
Once I convert my decimal or integer into binary representation, how can I cut my string by varying lengths? Let's say my binary representation is 100 characters, and I want to cut out lengths that are 10% the total length, so I get 10 blocks of 10 characters, or blocks that are 20% total length so I have 5 blocks of 20 characters.
Any advice is appreciated, I'm a super novice and this has been a steep challenge for me.

Strings can be divided up through slice notation.
>>> a = '101010101010'
>>> a[0]
'1'
>>> a[0:5]
'10101'
>>> a[0:len(a) // 2]
'101010'
That's something you should read up on, if you're getting into Python.
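Applied to the example from the question, slicing at the midpoint gives both halves. This is only a minimal sketch; the helper name split_in_half is made up for illustration, and for odd lengths the second half ends up one character longer.

```python
def split_in_half(bits: str) -> tuple[str, str]:
    """Return the first and second half of a string (hypothetical helper)."""
    middle = len(bits) // 2          # integer division, so odd lengths round down
    return bits[:middle], bits[middle:]


numstr = "10001010"
print(len(numstr))                   # 8
first, second = split_in_half(numstr)
print(first, second)                 # 1000 1010
```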

Here is my suggestion, based on an answer to "What's the best way to split a string into fixed length chunks and work with them in Python?":
def chunkstring(string, percent):
    # Chunk length as a percentage of the total length (at least 1 character)
    length = max(1, int(len(string) * percent / 100))
    return [string[i:i + length] for i in range(0, len(string), length)]
# Define a string with 100 characters
a = '0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789'
# Split this string by percentage of total length
print(chunkstring(a, 10)) # 10 %
print(chunkstring(a, 20)) # 20 %
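The question also asks about values with a fractional part, such as 35.45 or 0.4595, which none of the code above handles. One common approach, sketched here under assumptions rather than as a definitive answer, is to convert the whole part as before and then repeatedly double the fractional part, taking the integer part of each result as the next bit. The function name to_binary and the max_fraction_bits cut-off are made up for this example; the cut-off is needed because most decimal fractions have no finite binary expansion.

```python
def to_binary(value: float, max_fraction_bits: int = 16) -> str:
    """Convert a non-negative number to a binary string such as '101.11'."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    whole = int(value)               # part before the decimal point
    fraction = value - whole         # part after the decimal point
    whole_bits = bin(whole)[2:]      # bin(0) is '0b0', so this yields '0' for 0
    fraction_bits = ""
    # Doubling shifts the next binary digit in front of the point.
    while fraction > 0 and len(fraction_bits) < max_fraction_bits:
        fraction *= 2
        bit = int(fraction)
        fraction_bits += str(bit)
        fraction -= bit
    return whole_bits + ("." + fraction_bits if fraction_bits else "")


print(to_binary(5.75))    # 101.11
print(to_binary(0.625))   # 0.101
```

The resulting string can be measured with len() and cut into pieces with the same slicing as any other string; whether the "." should count as a character is up to you.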

Related

Numeric ID to very short unique strings

I have rather long IDs 1000000000109872 and would like to represent them as strings.
However all the libraries for Rust I've found such as hash_ids and block_id produce strings that are way bigger.
Ideally I'd like 4 to maybe 5 characters, numbers are okay but only uppercase letters. Doesn't need to be cryptographically secure as long as it's unique.
Is there anything that fits my needs?
I've tried this website: https://v2.cryptii.com/decimal/base64 and for 1000000000109872 I get 4rSw, this is very short which is great. But it's not uppercase.

This is the absolute best you can do if you want to guarantee no collisions without having any specific guarantees on the range of the inputs beyond "unsigned int" and you want it to be stateless:
def base_36(n: int) -> str:
    if not isinstance(n, int):
        raise TypeError("Check out https://mypy.readthedocs.io/")
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n < 10:
        return str(n)
    if n < 36:
        return chr(n - 10 + ord('A'))
    return base_36(n // 36) + base_36(n % 36)
print(base_36(1000000000109872))  # 9UGXNOTWDS
If you're willing to avoid collisions by keeping track of id allocations, you can of course do much better:
ids: dict[int, int] = {}
def stateful_id(n: int) -> str:
    return base_36(ids.setdefault(n, len(ids)))
print(stateful_id(1000000000109872))  # 0
print(stateful_id(1000000000109454))  # 1
print(stateful_id(1000000000109872))  # 0
or if some parts of the ID can be safely truncated:
MAGIC_NUMBER = 1000000000000000
def truncated_id(n: int) -> str:
    if n < MAGIC_NUMBER:
        raise ValueError(f"IDs must be >= {MAGIC_NUMBER}")
    return base_36(n - MAGIC_NUMBER)
print(truncated_id(1000000000109872))  # 2CS0

Short Answer: Impossible.
Long Answer: You're asking to represent 10^16 distinct values using only 36^5 combinations (5 characters, each a digit or an uppercase letter).
Each such character is one of 36 cases (10 digits + 26 letters). But 36^5 = 60,466,176 is far less than 10^16 (it is even less than 10^9), so that cannot work.
Since 36^10 < 10^16 < 36^11, you'll need at least 11 such characters to represent every ID below 10^16.
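The comparison above can be checked with plain integer arithmetic. The helper below is a small sketch, not something from the answer: min_chars is a made-up name, and it simply multiplies the alphabet size until the number of available combinations covers the number of distinct values.

```python
def min_chars(distinct_values: int, alphabet_size: int = 36) -> int:
    """Smallest string length whose combinations cover all distinct values."""
    length = 1
    capacity = alphabet_size         # combinations available with 1 character
    while capacity < distinct_values:
        capacity *= alphabet_size    # one more character multiplies the capacity
        length += 1
    return length


print(36 ** 5)              # 60466176
print(min_chars(10 ** 16))  # 11
```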

As you already stated that there is even a checksum inside the original ID, I assume the new representation should contain all of its data.
In this case, your question is strongly related to lossless compression and information content.
Information theory says that every piece of data contains a certain amount of information, which can be measured in bits.
The sad news is that no matter what, you cannot magically reduce your data to fewer bits. It will always keep the same number of bits. You can change the representation to store those bits as compactly as possible, but you cannot reduce their number.
You might think of jpg or compressed movies, that are stored very compact; the problem there is they are lossy. They discard information not perceived by the human eye/ear and just delete them.
In your case, there is no trickery possible. You will always have a smallest and a largest ID that you handed out. And all the IDs between your smallest and largest ID have to be distinguishable.
Now some math. If you know the amount of possible states of your data (e.g. the amount of distinguishable IDs), you can compute the required information content like this: log2(N), where N is the number of possible states.
So let's say you have 1000000 different IDs, that would mean you need log2(1000000) = 19.93 bits to represent those IDs. You will never be able to reduce this number to anything less.
Now to actually represent them: you say you want to store them in a string made up of 26 different uppercase letters and 10 different digits. This is called a base36 encoding.
Each digit of this can carry log2(36) = 5.17 bits of information. Therefore, to store your 1000000 different IDs, you need at least 19.93/5.17 = 3.85 digits, which means 4 digits in practice.
This is exactly what @Samwise's answer shows you. His answer is the mathematically optimal way to encode this. You will never get better than his answer. And the number of digits will always grow as the number of possible IDs you want to represent grows. There's just no mathematical way around that.
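The numbers in this answer can be reproduced with math.log2. This is just a sketch of the arithmetic; the variable names are chosen for illustration, and because floating-point logarithms can be slightly off for exact powers of the base, an integer check is included at the end.

```python
from math import ceil, log2

n_ids = 1_000_000                    # number of distinguishable IDs (example value)
bits_needed = log2(n_ids)            # information content in bits
bits_per_char = log2(36)             # bits carried by one base36 character
chars_needed = ceil(bits_needed / bits_per_char)

print(round(bits_needed, 2))         # 19.93
print(round(bits_per_char, 2))       # 5.17
print(chars_needed)                  # 4

# Integer cross-check: 3 characters are too few, 4 are enough.
print(36 ** 3 < n_ids <= 36 ** 4)    # True
```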

How to convert a floating-point number to a fixed-width string?

I tried to find this question answered, but I haven't found anything related.
I have a variable that can be in a format like 50000.45 or in a format like 0.01.
I need to write this variable in a label that is 4 digits wide.
What is the best way to fit the label showing only the most significant digits?
To better explain, I would like to have for example:
for 50000.45: 50000
for 4786.847: 4786
for 354.5342: 354.5
for 11.43566: 11.43
and for 0.014556: 0.0145
Possibly without having to do:
if ... < variable < ...:
    round(variable, xx)
for all cases.

In order to keep only a set number of digits, you can scale the number down to a pure fraction (i.e. 0 <= n < 1), then cut off the trailing digits. You can do it like this:
from math import log10
number = 123.456789
n_digits = 4
log = int(log10(number) + 1)
number /= 10**log # convert the number into only decimals
number = int(number*10**n_digits)/10**n_digits # keep the n first digits
number *= 10**log # multiply the number back
Or a more compact form:
from math import log10
number = 123.456789
n_digits= 4
log = int(log10(number) + 1) - n_digits
number = int(number/10**log)*10**log
[Edit] A simpler way is to use Python's built-in round() function:
from math import log10
number = round(number, n_digits-int(log10(number))-1)
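The round() approach can be wrapped in a small function. This is a sketch under a few assumptions, not the answerer's exact method: round_significant is a made-up name, it uses floor() instead of int() so that values below 1 also keep the full number of significant digits (int() truncates toward zero, which costs one digit there), and it handles 0 and negative values explicitly. Note that it rounds rather than truncates, so 4786.847 becomes 4787.0, not 4786 as in the question's examples.

```python
from math import floor, log10


def round_significant(value: float, n_digits: int = 4) -> float:
    """Round value to n_digits significant digits (hypothetical helper)."""
    if value == 0:
        return 0.0                           # log10(0) is undefined
    exponent = floor(log10(abs(value)))      # position of the leading digit
    return round(value, n_digits - 1 - exponent)


for sample in (50000.45, 4786.847, 354.5342, 11.43566, 0.014556):
    print(sample, "->", round_significant(sample))
```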

Why do I have to change integers to strings in order to iterate them in Python?

First of all, I have only recently started to learn Python on codeacademy.com and this is probably a very basic question, so thank you for the help and please forgive my lack of knowledge.
The function below takes a positive integer as input and returns the sum of that number's digits. What I don't understand is why I have to change the type of the input into str first, and then back into an integer, in order to add the number's digits to each other. Could someone help me out with an explanation please? The code works fine for the exercise, but I feel I am missing the big picture here.
def digit_sum(n):
    num = 0
    for i in str(n):
        num += int(i)
    return num

Integers are not sequences of digits. They are just (whole) numbers, so they can't be iterated over.
By turning the integer into a string, you created a sequence of digits (characters), and a string can be iterated over. It is no longer a number, it is now text.
See it as a representation; you could also have turned the same number into hexadecimal text, or octal text, or binary text. It would still be the same numerical value, just written down differently in text.
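To illustrate the point about representations, here is a short sketch using only built-ins. The value 1234 is an arbitrary example; each string is a different way of writing the same number, and int() with the matching base recovers it.

```python
n = 1234

decimal_text = str(n)            # '1234'
hex_text = format(n, "x")        # '4d2'
octal_text = format(n, "o")      # '2322'
binary_text = format(n, "b")     # '10011010010'

# All four strings describe the same numerical value.
print(int(decimal_text, 10), int(hex_text, 16), int(octal_text, 8), int(binary_text, 2))
```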
Iteration over a string works, and gives you single characters, which for a number means that each character is also a digit. The code takes that character and turns it back into a number with int(i).
You don't have to use that trick. You could also use maths:
def digit_sum(n):
    total = 0
    while n:
        n, digit = divmod(n, 10)
        total += digit
    return total
This uses a while loop, and repeatedly divides the input number by ten (keeping the remainder) until 0 is reached. The remainders are summed, giving you the digit sum. So 1234 is turned into 123 and 4, then 12 and 3, etc.

Let's take the number 12345.
I need the digits 1, 2, 3, 4, 5 from that number, and then I sum them up.
So how do you get the individual digits? One mathematical way is what @Martijn Pieters showed.
Another is to convert the number into a string, which makes it iterable.
This is one of the many ways to do it:
>>> sum(map(int, list(str(12345))))
15
The list() function breaks a string into individual characters, so I needed a string. Once I have all the digits as individual characters, I can convert them into integers and add them up.

Pad an integer to a given length with one 0 in front and some at the end

I need to manipulate a number as follows,
inputs
1
23456
6674321
outputs
01000000
02345600
06674321
Put simply, it adds a zero in front of the number and, if the result is still shorter than eight characters, adds 0s to the end. It should be a number, not a string. Is there a simple way to get this done without casting from string to int or int to string?
A Sagemath code I tried is as follows. It only adds zeros to the front to pad the number to 8 characters. I need to modify this as I mentioned.
for num in range(1,25):
    s=randrange(2^16)
    r=mod((s-1)*503,randrange(2^32-1))
    print "%08d" % (r)

To pad on the right means to multiply the number by an appropriate power of 10. The power 6-floor(log(x,10)) does the job here, since you want 1000000 to not be padded.
for x in range(1, 101):
    print '%08d' % (x*10^(6-floor(log(x,10))))
This assumes that x is in a range where such padding is possible at all: that is, an integer between 1 and 9999999.
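For readers working in plain Python rather than Sage, the same idea can be sketched with integer arithmetic only, which also avoids any floating-point logarithm. This is an illustration, not part of the answer above: pad_number is a made-up name, and note that in plain Python ^ means XOR, so the power operator would be ** (the loop below sidesteps it entirely). The leading zero only exists in the formatted output, since an integer itself cannot carry one.

```python
def pad_number(x: int) -> str:
    """Right-pad x with zeros to 7 digits, then format with one leading zero."""
    if not 1 <= x <= 9_999_999:
        raise ValueError("x must be an integer between 1 and 9999999")
    while x < 1_000_000:     # fewer than 7 digits: append a zero
        x *= 10
    return f"{x:08d}"        # zero-fill on the left to 8 characters


for value in (1, 23456, 6674321):
    print(pad_number(value))
```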

Understanding RSA cell padding from textbook explanation

I've been working on an RSA encryption/decryption assignment for school, and I've actually gotten the whole thing working. The one thing I want to make sure I understand is the padding. The book states that after we turn the string of characters into a string of digits (A = 00, and Z = 25) we then need to determine the size of the blocks and add dummy characters to the end.
The book states:
Next, we divide this string into equally sized blocks of 2N digits,
where 2N is the largest even number such that the number 2525 ... 25
with 2N digits does not exceed n.
It doesn't tell me where it gets 25 from, so I deduced that it was the index of the last character (Z in this case) of our actual key of characters.
So here is my Python3 implementation (fair warning it is somewhat cringe-worthy):
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def __determineSize__(message, n):
    if (n < len(alphabet) - 1):
        raise Exception("n is not sufficiently large")
    buffer = ""
    for i in range(0, n, 2):
        buffer += str(len(alphabet) - 1) #+= "25" in this case
        if (int(buffer) > n):
            groupSize = len(buffer) - 2
            return groupSize
It starts with 25 (len(alphabet) = 26, 26 - 1 = 25). If that is not larger than n, we increase it to 2525. If it is larger at this point, we stop because we know we've gone too far, and we return the length 2, because the length 4 is too large.
This is how I understood it, and it works, but it doesn't seem right. Did I interpret this correctly or am I completely off base? If I am, can someone set me straight? (I'm not asking for code: because it's for an assignment, I don't want to plagiarise anyone, so if anyone could just tell me what I'm supposed to do in simple English or show me in pseudocode, that would be great.)
Like always, thanks to everyone in advance!
