I am trying to convert alphanumeric string with maximum length of 40 characters to an integer as small as possible so that we can easily save and retrieve from database. I am not aware if there is any python method existing for it or any simple algorithms we can use. To be specific my string will have only characters 0-9 and a-g. So kindly help with any suggestions on how we can uniquely convert from string to int and vice versa. I am using Python 2.7 on Cent os 6.5
This is not that difficult:
def str2int(s, chars):
i = 0
for c in reversed(s):
i *= len(chars)
i += chars.index(c)
return i
def int2str(i, chars):
s = ""
while i:
s += chars[i % len(chars)]
i //= len(chars)
return s
Example:
>>> chars = "".join(str(n) for n in range(10)) + "abcdefg"
>>> str2int("0235abg02", chars)
14354195089
>>> int2str(_, chars)
'0235abg02'
Basically if you want to encode n characters into an integer you interpret it as base-n.
There are 17 symbols in your input, so you can treat is as a base-17 number:
>>> int('aga0',17)
53924
For the reverse conversion, there are lots of solutions over here.
Improving on the above answers:
# The location of a character in the string matters.
chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
charsLen = len(chars)
def numberToStr(num):
s = ""
while num:
s = self.chars[num % charsLen] + s
num //= charsLen
return s # Or e.g. "s.zfill(10)"
Can handle strings with leading 0s:
def strToNumber(numStr):
num = 0
for i, c in enumerate(reversed(numStr)):
num += chars.index(c) * (charsLen ** i)
return num
Related
I want to convert a string (composed of alphanumeric characters) into an integer and then convert this integer back into a string:
string --> int --> string
In other words, I want to represent an alphanumeric string by an integer.
I found a working solution, which I included in the answer, but I do not think it is the best solution, and I am interested in other ideas/methods.
Please don't tag this as duplicate just because a lot of similar questions already exist, I specifically want an easy way of transforming a string into an integer and vice versa.
This should work for strings that contain alphanumeric characters, i.e. strings containing numbers and letters.
Here's what I have so far:
First define an string
m = "test123"
string -> bytes
mBytes = m.encode("utf-8")
bytes -> int
mInt = int.from_bytes(mBytes, byteorder="big")
int -> bytes
mBytes = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")
bytes -> string
m = mBytes.decode("utf-8")
All together
m = "test123"
mBytes = m.encode("utf-8")
mInt = int.from_bytes(mBytes, byteorder="big")
mBytes2 = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")
m2 = mBytes2.decode("utf-8")
print(m == m2)
Here is an identical reusable version of the above:
class BytesIntEncoder:
#staticmethod
def encode(b: bytes) -> int:
return int.from_bytes(b, byteorder='big')
#staticmethod
def decode(i: int) -> bytes:
return i.to_bytes(((i.bit_length() + 7) // 8), byteorder='big')
If you're using Python <3.6, remove the optional type annotations.
Test:
>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'
>>> BytesIntEncoder.encode(b)
23755444588720691
>>> BytesIntEncoder.decode(_)
b'Test123'
>>> _.decode()
'Test123'
Recall that a string can be encoded to bytes, which can then be encoded to an integer. The encodings can then be reversed to get the bytes followed by the original string.
This encoder uses binascii to produce an identical integer encoding to the one in the answer by charel-f. I believe it to be identical because I extensively tested it.
Credit: this answer.
from binascii import hexlify, unhexlify
class BytesIntEncoder:
#staticmethod
def encode(b: bytes) -> int:
return int(hexlify(b), 16) if b != b'' else 0
#staticmethod
def decode(i: int) -> int:
return unhexlify('%x' % i) if i != 0 else b''
If you're using Python <3.6, remove the optional type annotations.
Quick test:
>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'
>>> BytesIntEncoder.encode(b)
23755444588720691
>>> BytesIntEncoder.decode(_)
b'Test123'
>>> _.decode()
'Test123'
Assuming the character set is merely alphanumeric, i.e. a-z A-Z 0-9, this requires 6 bits per character. As such, using an 8-bit byte-encoding is theoretically an inefficient use of memory.
This answer converts the input bytes into a sequence of 6-bit integers. It encodes these small integers into one large integer using bitwise operations. Whether this actually translates into real-world storage efficiency is measured by sys.getsizeof, and is more likely for larger strings.
This implementation customizes the encoding for the choice of character set. If for example you were working with just string.ascii_lowercase (5 bits) rather than string.ascii_uppercase + string.digits (6 bits), the encoding would be correspondingly efficient.
Unit tests are also included.
import string
class BytesIntEncoder:
def __init__(self, chars: bytes = (string.ascii_letters + string.digits).encode()):
num_chars = len(chars)
translation = ''.join(chr(i) for i in range(1, num_chars + 1)).encode()
self._translation_table = bytes.maketrans(chars, translation)
self._reverse_translation_table = bytes.maketrans(translation, chars)
self._num_bits_per_char = (num_chars + 1).bit_length()
def encode(self, chars: bytes) -> int:
num_bits_per_char = self._num_bits_per_char
output, bit_idx = 0, 0
for chr_idx in chars.translate(self._translation_table):
output |= (chr_idx << bit_idx)
bit_idx += num_bits_per_char
return output
def decode(self, i: int) -> bytes:
maxint = (2 ** self._num_bits_per_char) - 1
output = bytes(((i >> offset) & maxint) for offset in range(0, i.bit_length(), self._num_bits_per_char))
return output.translate(self._reverse_translation_table)
# Test
import itertools
import random
import unittest
class TestBytesIntEncoder(unittest.TestCase):
chars = string.ascii_letters + string.digits
encoder = BytesIntEncoder(chars.encode())
def _test_encoding(self, b_in: bytes):
i = self.encoder.encode(b_in)
self.assertIsInstance(i, int)
b_out = self.encoder.decode(i)
self.assertIsInstance(b_out, bytes)
self.assertEqual(b_in, b_out)
# print(b_in, i)
def test_thoroughly_with_small_str(self):
for s_len in range(4):
for s in itertools.combinations_with_replacement(self.chars, s_len):
s = ''.join(s)
b_in = s.encode()
self._test_encoding(b_in)
def test_randomly_with_large_str(self):
for s_len in range(256):
num_samples = {s_len <= 16: 2 ** s_len,
16 < s_len <= 32: s_len ** 2,
s_len > 32: s_len * 2,
s_len > 64: s_len,
s_len > 128: 2}[True]
# print(s_len, num_samples)
for _ in range(num_samples):
b_in = ''.join(random.choices(self.chars, k=s_len)).encode()
self._test_encoding(b_in)
if __name__ == '__main__':
unittest.main()
Usage example:
>>> encoder = BytesIntEncoder()
>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'
>>> encoder.encode(b)
3908257788270
>>> encoder.decode(_)
b'Test123'
so I needed transfer a dictionary in terms of numbers,
it may look kinda ugly but it's efficient in the way that every char (english letters) is exactly 2 numbers but it's capable of transfering any kind of unicode char
import json
myDict = {
"le key": "le Valueue",
2 : {
"heya": 1234569,
"3": 4
},
'Α α, Β β, Γ γ' : 'שלום'
}
def convertDictToNum(toBeConverted):
return int(''.join([(lambda c: c if len(c) ==2 else '0'+c )(str(ord(c) - 26)) for c in str(json.dumps(toBeConverted))]))
def loadDictFromNum(toBeDecoded):
toBeDecoded = str(toBeDecoded)
return json.loads(''.join([chr(int(toBeDecoded[cut:cut + 2]) + 26) for cut in range(0, len(toBeDecoded), 2)]))
numbersDict = convertDictToNum(myDict)
print(numbersDict)
# 9708827506817595083206088....
recoveredDict = loadDictFromNum(numbersDict)
print(recoveredDict)
# {'le key': 'le Valueue', '2': {'heya': 1234569, '3': 4}, 'Α α, Β β, Γ γ': 'שלום'}
I want to be able to generate 12 character long chain, of hexadecimal, BUT with no more than 2 identical numbers duplicate in the chain: 00 and not 000
Because, I know how to generate ALL possibilites, including 00000000000 to FFFFFFFFFFF, but I know that I won't use all those values, and because the size of the file generated with ALL possibilities is many GB long, I want to reduce the size by avoiding the not useful generated chains.
So my goal is to have results like 00A300BF8911 and not like 000300BF8911
Could you please help me to do so?
Many thanks in advance!
if you picked the same one twice, remove it from the choices for a round:
import random
hex_digits = set('0123456789ABCDEF')
result = ""
pick_from = hex_digits
for digit in range(12):
cur_digit = random.sample(hex_digits, 1)[0]
result += cur_digit
if result[-1] == cur_digit:
pick_from = hex_digits - set(cur_digit)
else:
pick_from = hex_digits
print(result)
Since the title mentions generators. Here's the above as a generator:
import random
hex_digits = set('0123456789ABCDEF')
def hexGen():
while True:
result = ""
pick_from = hex_digits
for digit in range(12):
cur_digit = random.sample(hex_digits, 1)[0]
result += cur_digit
if result[-1] == cur_digit:
pick_from = hex_digits - set(cur_digit)
else:
pick_from = hex_digits
yield result
my_hex_gen = hexGen()
counter = 0
for result in my_hex_gen:
print(result)
counter += 1
if counter > 10:
break
Results:
1ECC6A83EB14
D0897DE15E81
9C3E9028B0DE
CE74A2674AF0
9ECBD32C003D
0DF2E5DAC0FB
31C48E691C96
F33AAC2C2052
CD4CEDADD54D
40A329FF6E25
5F5D71F823A4
You could also change the while true loop to only produce a certain number of these based on a number passed into the function.
I interpret this question as, "I want to construct a rainbow table by iterating through all strings that have the following qualities. The string has a length of 12, contains only the characters 0-9 and A-F, and it never has the same character appearing three times in a row."
def iter_all_strings_without_triplicates(size, last_two_digits = (None, None)):
a,b = last_two_digits
if size == 0:
yield ""
else:
for c in "0123456789ABCDEF":
if a == b == c:
continue
else:
for rest in iter_all_strings_without_triplicates(size-1, (b,c)):
yield c + rest
for s in iter_all_strings_without_triplicates(12):
print(s)
Result:
001001001001
001001001002
001001001003
001001001004
001001001005
001001001006
001001001007
001001001008
001001001009
00100100100A
00100100100B
00100100100C
00100100100D
00100100100E
00100100100F
001001001010
001001001011
...
Note that there will be several hundred terabytes' worth of values outputted, so you aren't saving much room compared to just saving every single string, triplicates or not.
import string, random
source = string.hexdigits[:16]
result = ''
while len(result) < 12 :
idx = random.randint(0,len(source))
if len(result) < 3 or result[-1] != result[-2] or result[-1] != source[idx] :
result += source[idx]
You could extract a random sequence from a list of twice each hexadecimal digits:
digits = list('1234567890ABCDEF') * 2
random.shuffle(digits)
hex_number = ''.join(digits[:12])
If you wanted to allow shorter sequences, you could randomize that too, and left fill the blanks with zeros.
import random
digits = list('1234567890ABCDEF') * 2
random.shuffle(digits)
num_digits = random.randrange(3, 13)
hex_number = ''.join(['0'] * (12-num_digits)) + ''.join(digits[:num_digits])
print(hex_number)
You could use a generator iterating a window over the strings your current implementation yields. Sth. like (hex_str[i:i + 3] for i in range(len(hex_str) - window_size + 1)) Using len and set you could count the number of different characters in the slice. Although in your example it might be easier to just compare all 3 characters.
You can create an array from 0 to 255, and use random.sample with your list to get your list
I am not really familiar with Python yet.
I have a string like "11223300". Now I want to increase the last byte of that string ( from "00" to "FF"). I tried to convert the string into an integer ( integer=int(string,16)) and increase it, and convert it back later, but that does not work for me. Maybe one of you guys has a better idea.
string = "11223300"
counter = int(string, 16)
for i in range(255):
counter = counter + 1
IV = hex(counter)
Now I want to convert the IV from hex into a string
Thanks!
You can use format to convert int to your hex string, which will not keep the 0x prefix:
string = "11223300"
counter = int(string, 16)
for i in range(255):
counter = counter + 1
IV = format(counter, 'X')
print(IV)
Output:
112233FF
The function below takes a string and increases it by n from a given charset
For example, in the charset ['a','b','c'] :
"aaa" + 1 = "aab"
"aac" + 1 = "aba"
"acc" + 2 = "bab"
def str_increaser(mystr, charset, n_increase):
# Replaces a char in given string & index
def replace_chr_in_str(mystr, index, char):
mystr = list(mystr)
mystr[index] = char
return ''.join(mystr)
# Increases the char last (n) and possibly its left neighboor (n-1).
def local_increase(mystr, charset):
l_cs = len(charset)
# increasing 1 to last char if it's not the last char in charset (e.g. 'z' in lowercase ascii).
if (charset.index(mystr[-1]) < l_cs - 1):
mystr = replace_chr_in_str(mystr, -1, charset[charset.index(mystr[-1]) + 1])
# else, reset last char to the first one in charset and increase the its left char, using a reducted string for recursion
else:
mystr = replace_chr_in_str(mystr, -1, charset[0])
mystr = local_increase(mystr[:-1], charset) + mystr[-1]
return mystr
# Case if input = "zz...zz" in an ascii lowercase charset for instance
if (mystr == charset[-1] * len(mystr)):
print("str_increaser(): Input already max in charset")
else:
for i in range(n_increase):
mystr = local_increase(mystr, charset)
return mystr
Here's an exemple :
# In bash : $ man ascii
# charset = map(chr, range(97, 123)) + map(chr, range(65, 91))
import string
charset = string.lowercase + string.uppercase
print(str_increaser("RfZ", charset, 2)) # outputs "Rgb"
This function might be used to get all permutations in some charsets.
So are you literally wanting to change the last two bits of the string to "FF"? If so, easy
string = "11223300"
modified_string = string[:-2] + "FF"
Edit from comments
hex_str = "0x11223300"
for i in range(256):
hex_int = int(hex_str, 16)
new_int = hex_int + 0x01
print(hex(hex_int))
hex_str = str(hex(new_int))
I'm trying to make an encryption function that encrypts plaintext messages but the problem is that if i input a key too large that goes past 'Z' then it goes onto greater unicode values.
My code:
def encrypt(var1,var2):
var3 = ""
for i in range(0, len(var1)):
if ord(var1[i])>64 and ord(var1[i])<90:
var3=var3+chr((ord(var1[i])+var2))
elif ord(var1[i])+var2>90:
???
else:
continue
return(var3)
How do I get it to loop 'Z' back to 'A'. I think I have to make an if statement like this but I'm not sure what to put into it.
elif ord(var1[i])+var2>90:
???
Here is my one! Im using the modulus operator to wrap around every 26 numbers (the number of letter between a-z). I also handle upper on lowercase separately.
def encrypt(data, shift):
result = ''
for c in data:
c_num = ord(c)
# is the letter lower case a - z?
if (c_num >= ord('a')) and (c_num <= ord('z')):
# get the letter number from 0 - 26
c_num = c_num - ord('a')
# shift the number
c_num += shift
# wrap the number every 26 numbers
c_num = c_num % 26
# now increase a by the new amount
c_num += ord('a')
result += chr(c_num)
# is the letter upper case A - Z?
elif (c_num >= ord('A')) and (c_num <= ord('Z')):
# get the letter number from 0 - 26
c_num = c_num - ord('A')
# shift the number
c_num += shift
# wrap the number every 26 numbers
c_num = c_num % 26
# now increase a by the new amount
c_num += ord('A')
result += chr(c_num)
return result
encrypt('aAbB', 2)
'cCdD'
encrypt('afZz', 2)
'chBb'
Here is the code golf version using list comprehension just for fun!
def encrypt(data, shift):
return ''.join([chr(((ord(c) - ord('a') + shift) % 26) + ord('a')) if ord(c) in range(ord('a'), ord('z')+1) else chr(((ord(c) - ord('A') + shift) % 26) + ord('A')) for c in data])
A straight-forward way would be to check if you have passed beyond Z, and modify the character in that case:
...
if var1[i] >= 'A' and var1[i] <= 'Z':
translated_char = chr(ord(var1[i])+var2)
if translated_char > 'Z':
# If the resulting character is beyond Z,
# we go 26 characters back
translated_char = chr(ord(translated_char)-26)
# Append the translated character to the output string
var3 += translated_char
...
You may want to consider more descriptive variable names -- you'll thank yourself if you revisit your code after two months :-)
I would recommend using the modulus operator to do what you are wanting. In python that is the % character. In modulus math. X % Y tells us what the remainder of X / Y is. For example. 27 % 26 is 1. Using this you can get your wrap around that you want. Here is a sample bit of code to encrypt a single character
def encrypt_character( valToEncrypt, keyVal ):
# Update the character to be our standard Alphabet mapping
# A -> 0; B->1 ... Z -> 25
x = ord(valToEncrypt) - ord('A')
# Perform the Encryption
retVal = ( x + keyVal ) % 26
# Translate back to the standard ASCII mapping of the character
# for display in python and translate it back into a string
retVal = chr(retVal + ord('A'))
return retVal
# end encrypt_character
Now if we feed the character "A" Into our encryption algorithm with a key of 13 we get "N" as shown:
>>> encrypt_character("A", 13)
'N'
The decrypt algorithm is very similar, except you do subtraction instead of addtion
def decrypt_character( valToDecrypt, keyVal ):
# Update the character to be our standard Alphabet mapping
# A -> 0; B->1 ... Z -> 25
x = ord(valToDecrypt) - ord('A')
retVal = ( x - keyVal ) % 26
# Translate back to the standard ASCII mapping of the character
# for display in python and translate it back into a string
retVal = chr(retVal + ord('A'))
return retVal
To encrypt a string you can use the following function:
from re import sub
def encrypt_message( message, key ):
# Convert the message text into a plain text with all spaces and
# punctuation removed.
plainText = sub(r'[^A-Z]', '', message.upper())
cipherText = ""
charIndex = 0
# Encrypt the message 1 character at a time
while charIndex < len(plainText):
cipherText += \
encrypt_character( plainText[charIndex], key)
charIndex += 1
return cipherText
This function can be called:
>>> encrypt_message("HELLO World!", key=23)
'EBIILTLOIA'
The decryption function is very similar to the encryption function, except it calls the decrypt utility instead of the encrypt utility.
I'm new to Python, coming from Java and C. How can I increment a char? In Java or C, chars and ints are practically interchangeable, and in certain loops, it's very useful to me to be able to do increment chars, and index arrays by chars.
How can I do this in Python? It's bad enough not having a traditional for(;;) looper - is there any way I can achieve what I want to achieve without having to rethink my entire strategy?
In Python 2.x, just use the ord and chr functions:
>>> ord('c')
99
>>> ord('c') + 1
100
>>> chr(ord('c') + 1)
'd'
>>>
Python 3.x makes this more organized and interesting, due to its clear distinction between bytes and unicode. By default, a "string" is unicode, so the above works (ord receives Unicode chars and chr produces them).
But if you're interested in bytes (such as for processing some binary data stream), things are even simpler:
>>> bstr = bytes('abc', 'utf-8')
>>> bstr
b'abc'
>>> bstr[0]
97
>>> bytes([97, 98, 99])
b'abc'
>>> bytes([bstr[0] + 1, 98, 99])
b'bbc'
"bad enough not having a traditional for(;;) looper"?? What?
Are you trying to do
import string
for c in string.lowercase:
...do something with c...
Or perhaps you're using string.uppercase or string.letters?
Python doesn't have for(;;) because there are often better ways to do it. It also doesn't have character math because it's not necessary, either.
Check this: USING FOR LOOP
for a in range(5):
x='A'
val=chr(ord(x) + a)
print(val)
LOOP OUTPUT: A B C D E
I came from PHP, where you can increment char (A to B, Z to AA, AA to AB etc.) using ++ operator. I made a simple function which does the same in Python. You can also change list of chars to whatever (lowercase, uppercase, etc.) is your need.
# Increment char (a -> b, az -> ba)
def inc_char(text, chlist = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
# Unique and sort
chlist = ''.join(sorted(set(str(chlist))))
chlen = len(chlist)
if not chlen:
return ''
text = str(text)
# Replace all chars but chlist
text = re.sub('[^' + chlist + ']', '', text)
if not len(text):
return chlist[0]
# Increment
inc = ''
over = False
for i in range(1, len(text)+1):
lchar = text[-i]
pos = chlist.find(lchar) + 1
if pos < chlen:
inc = chlist[pos] + inc
over = False
break
else:
inc = chlist[0] + inc
over = True
if over:
inc += chlist[0]
result = text[0:-len(inc)] + inc
return result
There is a way to increase character using ascii_letters from string package which ascii_letters is a string that contains all English alphabet, uppercase and lowercase:
>>> from string import ascii_letters
>>> ascii_letters[ascii_letters.index('a') + 1]
'b'
>>> ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
Also it can be done manually;
>>> letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> letters[letters.index('c') + 1]
'd'
def doubleChar(str):
result = ''
for char in str:
result += char * 2
return result
print(doubleChar("amar"))
output:
aammaarr
For me i made the fallowing as a test.
string_1="abcd"
def test(string_1):
i = 0
p = ""
x = len(string_1)
while i < x:
y = (string_1)[i]
i=i+1
s = chr(ord(y) + 1)
p=p+s
print(p)
test(string_1)