Issue while trying to convert ascii characters to a binary value [duplicate] - python

Using this code to take a string and convert it to binary:
bin(reduce(lambda x, y: 256*x+y, (ord(c) for c in 'hello'), 0))
this outputs:
0b110100001100101011011000110110001101111
Which, if I put it into this site (on the right-hand side), gives me my message of hello back. I'm wondering what method it uses. I know I could split the binary string into groups of 8 and then match each group to the corresponding bin(ord(character)) value, or do it some other way. I'm really looking for something simpler.

For ASCII characters in the range [ -~] on Python 2:
>>> import binascii
>>> bin(int(binascii.hexlify('hello'), 16))
'0b110100001100101011011000110110001101111'
In reverse:
>>> n = int('0b110100001100101011011000110110001101111', 2)
>>> binascii.unhexlify('%x' % n)
'hello'
In Python 3.2+:
>>> bin(int.from_bytes('hello'.encode(), 'big'))
'0b110100001100101011011000110110001101111'
In reverse:
>>> n = int('0b110100001100101011011000110110001101111', 2)
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()
'hello'
To support all Unicode characters in Python 3:
def text_to_bits(text, encoding='utf-8', errors='surrogatepass'):
    bits = bin(int.from_bytes(text.encode(encoding, errors), 'big'))[2:]
    return bits.zfill(8 * ((len(bits) + 7) // 8))
def text_from_bits(bits, encoding='utf-8', errors='surrogatepass'):
    n = int(bits, 2)
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode(encoding, errors) or '\0'
Here's a single-source Python 2/3 compatible version:
import binascii
def text_to_bits(text, encoding='utf-8', errors='surrogatepass'):
    bits = bin(int(binascii.hexlify(text.encode(encoding, errors)), 16))[2:]
    return bits.zfill(8 * ((len(bits) + 7) // 8))
def text_from_bits(bits, encoding='utf-8', errors='surrogatepass'):
    n = int(bits, 2)
    return int2bytes(n).decode(encoding, errors)
def int2bytes(i):
    hex_string = '%x' % i
    n = len(hex_string)
    # unhexlify needs an even number of hex digits, so pad odd lengths
    return binascii.unhexlify(hex_string.zfill(n + (n & 1)))
Example
>>> text_to_bits('hello')
'0110100001100101011011000110110001101111'
>>> text_from_bits('110100001100101011011000110110001101111') == u'hello'
True

Built-in only python
Here is a pure python method for simple strings, left here for posterity.
def string2bits(s=''):
    # one zero-padded 8-bit string per character
    return [bin(ord(x))[2:].zfill(8) for x in s]
def bits2string(b=None):
    return ''.join([chr(int(x, 2)) for x in b])
s = 'Hello, World!'
b = string2bits(s)
s2 = bits2string(b)
print('String:')
print(s)
print('\nList of Bits:')
for x in b:
    print(x)
print('\nString:')
print(s2)
String:
Hello, World!
List of Bits:
01001000
01100101
01101100
01101100
01101111
00101100
00100000
01010111
01101111
01110010
01101100
01100100
00100001
String:
Hello, World!

I'm not sure how you think you can do it other than character-by-character -- it's inherently a character-by-character operation. There is certainly code out there to do this for you, but there is no "simpler" way than doing it character-by-character.
First, you need to strip the 0b prefix, and left-zero-pad the string so its length is divisible by 8, to make dividing the bitstring up into characters easy:
bitstring = bitstring[2:]
bitstring = -len(bitstring) % 8 * '0' + bitstring
Then you divide the string up into blocks of eight binary digits, convert them to ASCII characters, and join them back into a string:
string_blocks = (bitstring[i:i+8] for i in range(0, len(bitstring), 8))
string = ''.join(chr(int(char, 2)) for char in string_blocks)
If you actually want to treat it as a number, you still have to account for the fact that the leftmost character will be at most seven digits long if you want to go left-to-right instead of right-to-left.
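Putting the two steps above together, a minimal sketch might look like the following. The function name bits_to_text is made up for illustration, and it assumes every 8-bit block maps to a single character (plain ASCII or Latin-1), which is not true for multi-byte encodings such as UTF-8.

```python
def bits_to_text(bitstring):
    """Decode a bin()-style bit string into text, 8 bits per character."""
    # Strip an optional 0b prefix, as produced by bin().
    if bitstring.startswith("0b"):
        bitstring = bitstring[2:]
    # Left-pad with zeros so the length is a multiple of 8.
    bitstring = "0" * (-len(bitstring) % 8) + bitstring
    blocks = (bitstring[i:i + 8] for i in range(0, len(bitstring), 8))
    return "".join(chr(int(block, 2)) for block in blocks)


print(bits_to_text("0b110100001100101011011000110110001101111"))  # hello
```

The padding step matters because bin() drops leading zeros, so the first character often arrives with only seven digits.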

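This is my way to solve your task:
bits = "0b110100001100101011011000110110001101111"
# drop the 0b prefix and put back the leading zero that bin() stripped
bits = "0" + bits[2:]
message = ""
while bits != "":
    # decode the next 8 bits, then remove them from the front
    message = message + chr(int(bits[:8], 2))
    bits = bits[8:]
print(message)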

If you don't want to import any modules, you can use this to convert a text file to binary:
with open("Test1.txt", "r") as File1:
    St = (' '.join(format(ord(x), 'b') for x in File1.read()))
StrList = St.split(" ")
And you can use this to convert it back to a string:
# StrOrgMsg is the space-separated binary text, e.g. St from above
StrOrgList = StrOrgMsg.split(" ")
StrMsg = ""
for StrValue in StrOrgList:
    if(StrValue != ""):
        StrMsg += chr(int(str(StrValue),2))
print(StrMsg)
Hope that is helpful; I've used this with some custom encryption to send data over TCP.

Are you looking for the code to do it or understanding the algorithm?
Does this do what you need? Specifically a2b_uu and b2a_uu? There are LOTS of other options in there in case those aren't what you want.
(NOTE: Not a Python guy but this seemed like an obvious answer)

Convert binary to its equivalent character.
k=7
dec=0
new=[]
item=[x for x in input("Enter 8bit binary number with , separator").split(",")]
for i in item:
    # accumulate the value of one 8-bit group, most significant bit first
    for j in i:
        if(j=="1"):
            dec=2**k+dec
            k=k-1
        else:
            k=k-1
    new.append(dec)
    dec=0
    k=7
print(new)
for i in new:
    print(chr(i),end="")
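The manual power-of-two loop above does the same job as the built-in int(text, 2). As a rough sketch of that shortcut (the function name binary_groups_to_text is invented here, and it takes the comma-separated text as an argument instead of reading it with input()):

```python
def binary_groups_to_text(groups):
    """Turn comma-separated binary numbers into the characters they encode."""
    # int(text, 2) parses a base-2 string; empty groups are skipped.
    codes = [int(group, 2) for group in groups.split(",") if group.strip()]
    return "".join(chr(code) for code in codes)


print(binary_groups_to_text("01001000,01101001"))  # Hi
```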

This is a spruced up version of J.F. Sebastian's. Thanks for the snippets though J.F. Sebastian.
import binascii, sys
def goodbye():
    sys.exit("\n"+"*"*43+"\n\nGood Bye! Come use again!\n\n"+"*"*43+"")
while __name__=='__main__':
    print "[A]scii to Binary, [B]inary to Ascii, or [E]xit:"
    var1=raw_input('>>> ')
    if var1=='a':
        string=raw_input('String to convert:\n>>> ')
        convert=bin(int(binascii.hexlify(string), 16))
        i=2
        truebin=[]
        while i!=len(convert):
            truebin.append(convert[i])
            i=i+1
        convert=''.join(truebin)
        print '\n'+'*'*84+'\n\n'+convert+'\n\n'+'*'*84+'\n'
    if var1=='b':
        binary=raw_input('Binary to convert:\n>>> ')
        n = int(binary, 2)
        done=binascii.unhexlify('%x' % n)
        print '\n'+'*'*84+'\n\n'+done+'\n\n'+'*'*84+'\n'
    if var1=='e':
        aus=raw_input('Are you sure? (y/n)\n>>> ')
        if aus=='y':
            goodbye()

Related

convert string to binary and take one's complement

I'm trying to convert a string to binary, take the one's complement, and then display the string again. I have seen a couple of related posts, such as here and here, and I'm following the code that was posted here. When I run the code below, it fails with AttributeError: 'bytes' object has no attribute 'encode'. I'm using Python 3.6.
The code is:
import binascii
def text_to_bits(text, encoding='utf-8', errors='surrogatepass'):
    bits = bin(int(binascii.hexlify(text.encode(encoding, errors)), 16))[2:]
    return bits.zfill(8 * ((len(bits) + 7) // 8))
def text_from_bits(bits, encoding='utf-8', errors='surrogatepass'):
    n = int(bits, 2)
    return int2bytes(n).decode(encoding, errors)
def int2bytes(i):
    hex_string = '%x' % i
    n = len(hex_string)
    return binascii.unhexlify(hex_string.zfill(n + (n & 1)))
your_string='hello'
b=your_string.encode('ascii', 'strict')
text_to_bits(b)
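Is there a way, after converting it to binary, to take the one's complement of it and display the string again?
There is no need to encode your string to bytes first: text_to_bits already calls text.encode() itself, which is why passing it a bytes object raises the AttributeError.
Your functions work very well.
Look at the code below: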
your_string='hello'
#b=your_string.encode('ascii', 'strict')
b = text_to_bits(your_string)
print(b)
t = text_from_bits(b)
print(t)
Result:
0110100001100101011011000110110001101111
hello
If supporting ASCII is enough, you can do this:
a="Hello World!"
b="".join(bin(ord(x)^255)[2:] for x in a)
print(b)
c="".join(chr(int(b[x:x+8],2)^255) for x in range(0,len(b),8))
print(c)
101101111001101010010011100100111001000011011111101010001001000010001101100100111001101111011110
Hello World!
Since ASCII codes are below 128, someASCII ^ 255 (the one's complement) is always going to be an 8-bit number (the most significant bit gets set). bin() prepends a 0b prefix, which is what the [2:] gets rid of.
If you need it for generic bytes, some padding magic has to be applied, like
b="".join(("0000000"+bin(ord(x))[2:])[-8:] for x in a)
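For text that is not plain ASCII, one way to sidestep the padding problem is to work on the encoded bytes and format each flipped byte with a fixed width of 8. This is only a sketch: the two function names are made up for the example, and UTF-8 is an assumed default encoding.

```python
def ones_complement_bits(text, encoding="utf-8"):
    """Return the bitwise-inverted bytes of text as a string of 0s and 1s."""
    # "08b" always yields 8 digits, so no manual padding is needed.
    return "".join(format(byte ^ 0xFF, "08b") for byte in text.encode(encoding))


def text_from_complement_bits(bits, encoding="utf-8"):
    """Invert ones_complement_bits: flip each 8-bit group back and decode."""
    data = bytes(int(bits[i:i + 8], 2) ^ 0xFF for i in range(0, len(bits), 8))
    return data.decode(encoding)


flipped = ones_complement_bits("Hello World!")
print(flipped)
print(text_from_complement_bits(flipped))  # Hello World!
```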
I feel like you could do this more easily:
st = "hello world"
my_binary = ' '.join(format(ord(x), 'b') for x in st)
print(my_binary)
original = ''.join(chr(int(X[:8], 2)) for X in my_binary.split())
print(original)
References:
Convert Binary String to String
Convert String to Binary
Then just take the two's complement of the string with something like:
def binary_str_twos(bin_str):
    twos = []
    first_one = True
    # two's complement: scanning from the right, copy bits up to and
    # including the first 1, then flip every bit after that
    for char in reversed(bin_str):
        if char == ' ':
            twos.append(char)
        elif char == '1':
            twos.append('1' if first_one else '0')
            if first_one:
                first_one = False
        else:
            twos.append('0' if first_one else '1')
    return ''.join(reversed(twos))
Note that this isn't as efficient as working with binary only.
-- Edit --
Working in 8-bit binary without spaces:
st = "hello world"
my_binary = ''.join(format(ord(x), '08b') for x in st)
print(my_binary)
original = ''.join(chr(int(my_binary[i:i+8], 2)) for i in range(0, len(my_binary), 8))
print(original)
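As a point of comparison with the character-by-character approach, both complements can also be computed with integer arithmetic on the whole bit string. The sketch below assumes a bit string without spaces whose length is the intended bit width; the function names are invented for the example.

```python
def ones_complement(bits):
    """Flip every bit of a fixed-width bit string."""
    width = len(bits)
    mask = (1 << width) - 1
    return format(int(bits, 2) ^ mask, "0{}b".format(width))


def twos_complement(bits):
    """Negate a fixed-width bit string modulo 2**width."""
    width = len(bits)
    mask = (1 << width) - 1
    return format(-int(bits, 2) & mask, "0{}b".format(width))


print(ones_complement("01101000"))  # 10010111
print(twos_complement("01101000"))  # 10011000
```

Note that the original question asked for the one's complement, which is the plain bit flip; the two's complement is that value plus one.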

Python: Reversibly encode alphanumeric string to integer

I want to convert a string (composed of alphanumeric characters) into an integer and then convert this integer back into a string:
string --> int --> string
In other words, I want to represent an alphanumeric string by an integer.
I found a working solution, which I included in the answer, but I do not think it is the best solution, and I am interested in other ideas/methods.
Please don't tag this as duplicate just because a lot of similar questions already exist, I specifically want an easy way of transforming a string into an integer and vice versa.
This should work for strings that contain alphanumeric characters, i.e. strings containing numbers and letters.
Here's what I have so far:
First define a string
m = "test123"
string -> bytes
mBytes = m.encode("utf-8")
bytes -> int
mInt = int.from_bytes(mBytes, byteorder="big")
int -> bytes
mBytes = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")
bytes -> string
m = mBytes.decode("utf-8")
All together
m = "test123"
mBytes = m.encode("utf-8")
mInt = int.from_bytes(mBytes, byteorder="big")
mBytes2 = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")
m2 = mBytes2.decode("utf-8")
print(m == m2)
Here is an equivalent reusable version of the above:
class BytesIntEncoder:
    @staticmethod
    def encode(b: bytes) -> int:
        return int.from_bytes(b, byteorder='big')
    @staticmethod
    def decode(i: int) -> bytes:
        return i.to_bytes(((i.bit_length() + 7) // 8), byteorder='big')
If you're using Python <3.6, remove the optional type annotations.
Test:
>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'
>>> BytesIntEncoder.encode(b)
23755444588720691
>>> BytesIntEncoder.decode(_)
b'Test123'
>>> _.decode()
'Test123'
Recall that a string can be encoded to bytes, which can then be encoded to an integer. The encodings can then be reversed to get the bytes followed by the original string.
This encoder uses binascii to produce an identical integer encoding to the one in the answer by charel-f. I believe it to be identical because I extensively tested it.
Credit: this answer.
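from binascii import hexlify, unhexlify
class BytesIntEncoder:
    @staticmethod
    def encode(b: bytes) -> int:
        return int(hexlify(b), 16) if b != b'' else 0
    @staticmethod
    def decode(i: int) -> bytes:
        if i == 0:
            return b''
        hex_string = '%x' % i
        # unhexlify needs an even number of hex digits, so pad odd lengths
        return unhexlify(hex_string.zfill(len(hex_string) + (len(hex_string) & 1)))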
If you're using Python <3.6, remove the optional type annotations.
Quick test:
>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'
>>> BytesIntEncoder.encode(b)
23755444588720691
>>> BytesIntEncoder.decode(_)
b'Test123'
>>> _.decode()
'Test123'
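One caveat that applies to both int-based encoders above: an integer has no notion of leading zero bytes, so input that starts with b'\x00' does not survive the round trip. A possible workaround, sketched here with made-up helper names, is to prepend a non-zero sentinel byte before converting. This changes the resulting integers, so it is not compatible with the values shown above.

```python
def bytes_to_int(data):
    """Encode bytes as an int, keeping leading zero bytes recoverable."""
    # Prefix a 0x01 sentinel byte so the most significant byte is never zero.
    return int.from_bytes(b"\x01" + data, byteorder="big")


def int_to_bytes(number):
    """Reverse bytes_to_int by dropping the sentinel byte again."""
    raw = number.to_bytes((number.bit_length() + 7) // 8, byteorder="big")
    return raw[1:]


# Without the sentinel, the leading zero byte is lost:
plain = int.from_bytes(b"\x00abc", "big")
print(plain.to_bytes((plain.bit_length() + 7) // 8, "big"))  # b'abc'
# With the sentinel, it is preserved:
print(int_to_bytes(bytes_to_int(b"\x00abc")))  # b'\x00abc'
```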
Assuming the character set is merely alphanumeric, i.e. a-z A-Z 0-9, this requires 6 bits per character. As such, using an 8-bit byte-encoding is theoretically an inefficient use of memory.
This answer converts the input bytes into a sequence of 6-bit integers. It encodes these small integers into one large integer using bitwise operations. Whether this actually translates into real-world storage efficiency is measured by sys.getsizeof, and is more likely for larger strings.
This implementation customizes the encoding for the choice of character set. If, for example, you were working with just string.ascii_lowercase (5 bits) rather than string.ascii_uppercase + string.digits (6 bits), the encoding would be correspondingly more efficient.
Unit tests are also included.
import string
class BytesIntEncoder:
    def __init__(self, chars: bytes = (string.ascii_letters + string.digits).encode()):
        num_chars = len(chars)
        translation = ''.join(chr(i) for i in range(1, num_chars + 1)).encode()
        self._translation_table = bytes.maketrans(chars, translation)
        self._reverse_translation_table = bytes.maketrans(translation, chars)
        self._num_bits_per_char = (num_chars + 1).bit_length()
    def encode(self, chars: bytes) -> int:
        num_bits_per_char = self._num_bits_per_char
        output, bit_idx = 0, 0
        for chr_idx in chars.translate(self._translation_table):
            output |= (chr_idx << bit_idx)
            bit_idx += num_bits_per_char
        return output
    def decode(self, i: int) -> bytes:
        maxint = (2 ** self._num_bits_per_char) - 1
        output = bytes(((i >> offset) & maxint) for offset in range(0, i.bit_length(), self._num_bits_per_char))
        return output.translate(self._reverse_translation_table)
# Test
import itertools
import random
import unittest
class TestBytesIntEncoder(unittest.TestCase):
    chars = string.ascii_letters + string.digits
    encoder = BytesIntEncoder(chars.encode())
    def _test_encoding(self, b_in: bytes):
        i = self.encoder.encode(b_in)
        self.assertIsInstance(i, int)
        b_out = self.encoder.decode(i)
        self.assertIsInstance(b_out, bytes)
        self.assertEqual(b_in, b_out)
        # print(b_in, i)
    def test_thoroughly_with_small_str(self):
        for s_len in range(4):
            for s in itertools.combinations_with_replacement(self.chars, s_len):
                s = ''.join(s)
                b_in = s.encode()
                self._test_encoding(b_in)
    def test_randomly_with_large_str(self):
        for s_len in range(256):
            num_samples = {s_len <= 16: 2 ** s_len,
                           16 < s_len <= 32: s_len ** 2,
                           s_len > 32: s_len * 2,
                           s_len > 64: s_len,
                           s_len > 128: 2}[True]
            # print(s_len, num_samples)
            for _ in range(num_samples):
                b_in = ''.join(random.choices(self.chars, k=s_len)).encode()
                self._test_encoding(b_in)
if __name__ == '__main__':
    unittest.main()
Usage example:
>>> encoder = BytesIntEncoder()
>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'
>>> encoder.encode(b)
3908257788270
>>> encoder.decode(_)
b'Test123'
I needed to transfer a dictionary as a number.
It may look kind of ugly, but it's compact in the sense that every character of the JSON text (English letters included) becomes exactly two digits, and it can still carry any Unicode character, because json.dumps escapes non-ASCII characters as \uXXXX by default. Note that this relies on ord(c) - 26 staying within two digits, i.e. on every character of the JSON text having a code between 26 and 125.
import json
myDict = {
    "le key": "le Valueue",
    2 : {
        "heya": 1234569,
        "3": 4
    },
    'Α α, Β β, Γ γ' : 'שלום'
}
def convertDictToNum(toBeConverted):
    return int(''.join([(lambda c: c if len(c) ==2 else '0'+c )(str(ord(c) - 26)) for c in str(json.dumps(toBeConverted))]))
def loadDictFromNum(toBeDecoded):
    toBeDecoded = str(toBeDecoded)
    return json.loads(''.join([chr(int(toBeDecoded[cut:cut + 2]) + 26) for cut in range(0, len(toBeDecoded), 2)]))
numbersDict = convertDictToNum(myDict)
print(numbersDict)
# 9708827506817595083206088....
recoveredDict = loadDictFromNum(numbersDict)
print(recoveredDict)
# {'le key': 'le Valueue', '2': {'heya': 1234569, '3': 4}, 'Α α, Β β, Γ γ': 'שלום'}

Integer to string conversion of fixed length [duplicate]

How do I pad a numeric string with zeroes to the left, so that the string has a specific length?
To pad strings:
>>> n = '4'
>>> print(n.zfill(3))
004
To pad numbers:
>>> n = 4
>>> print(f'{n:03}') # Preferred method, python >= 3.6
004
>>> print('%03d' % n)
004
>>> print(format(n, '03')) # python >= 2.6
004
>>> print('{0:03d}'.format(n)) # python >= 2.6 + python 3
004
>>> print('{foo:03d}'.format(foo=n)) # python >= 2.6 + python 3
004
>>> print('{:03d}'.format(n)) # python >= 2.7 + python3
004
String formatting documentation.
Just use the rjust method of the string object.
This example creates a 10-character length string, padding as necessary:
>>> s = 'test'
>>> s.rjust(10, '0')
'000000test'
Besides zfill, you can use general string formatting:
print(f'{number:05d}') # (since Python 3.6), or
print('{:05d}'.format(number)) # or
print('{0:05d}'.format(number)) # or (explicit 0th positional arg. selection)
print('{n:05d}'.format(n=number)) # or (explicit `n` keyword arg. selection)
print(format(number, '05d'))
Documentation for string formatting and f-strings.
For Python 3.6+ using f-strings:
>>> i = 1
>>> f"{i:0>2}" # Works for both numbers and strings.
'01'
>>> f"{i:02}" # Works only for numbers.
'01'
For Python 2.6 to Python 3.5:
>>> "{:0>2}".format("1") # Works for both numbers and strings.
'01'
>>> "{:02}".format(1) # Works only for numbers.
'01'
Those standard format specifiers are [[fill]align][minimumwidth] and [0][minimumwidth].
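To make the [[fill]align][minimumwidth] layout above concrete, here is a rough sketch that assembles such a spec from its parts and hands it to the built-in format(). The helper name pad_value and its parameters are invented for this example.

```python
def pad_value(value, width, fill="0", align=">"):
    """Pad value to width using a [fill][align][width] format spec."""
    spec = "{}{}{}".format(fill, align, width)
    return format(value, spec)


print(pad_value(4, 2))               # 04
print(pad_value(3, 5, "#", "<"))     # 3####
print(pad_value("ab", 6, "*", "^"))  # **ab**
```

Because the fill and alignment are explicit, the same spec works for both numbers and strings, which is the point made above about "{:0>2}".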
>>> '99'.zfill(5)
'00099'
>>> '99'.rjust(5,'0')
'00099'
if you want the opposite:
>>> '99'.ljust(5,'0')
'99000'
str(n).zfill(width) will work with strings, ints, floats... and is Python 2.x and 3.x compatible:
>>> n = 3
>>> str(n).zfill(5)
'00003'
>>> n = '3'
>>> str(n).zfill(5)
'00003'
>>> n = '3.0'
>>> str(n).zfill(5)
'003.0'
What is the most pythonic way to pad a numeric string with zeroes to the left, i.e., so the numeric string has a specific length?
str.zfill is specifically intended to do this:
>>> '1'.zfill(4)
'0001'
Note that it is specifically intended to handle numeric strings as requested, and moves a + or - to the beginning of the string:
>>> '+1'.zfill(4)
'+001'
>>> '-1'.zfill(4)
'-001'
Here's the help on str.zfill:
>>> help(str.zfill)
Help on method_descriptor:
zfill(...)
    S.zfill(width) -> str
    Pad a numeric string S with zeros on the left, to fill a field
    of the specified width. The string S is never truncated.
Performance
This is also the most performant of alternative methods:
>>> min(timeit.repeat(lambda: '1'.zfill(4)))
0.18824880896136165
>>> min(timeit.repeat(lambda: '1'.rjust(4, '0')))
0.2104538488201797
>>> min(timeit.repeat(lambda: f'{1:04}'))
0.32585487607866526
>>> min(timeit.repeat(lambda: '{:04}'.format(1)))
0.34988890308886766
To best compare apples to apples for the % method (note it is actually slower), which will otherwise pre-calculate:
>>> min(timeit.repeat(lambda: '1'.zfill(0 or 4)))
0.19728074967861176
>>> min(timeit.repeat(lambda: '%04d' % (0 or 1)))
0.2347015216946602
Implementation
With a little digging, I found the implementation of the zfill method in Objects/stringlib/transmogrify.h:
static PyObject *
stringlib_zfill(PyObject *self, PyObject *args)
{
    Py_ssize_t fill;
    PyObject *s;
    char *p;
    Py_ssize_t width;
    if (!PyArg_ParseTuple(args, "n:zfill", &width))
        return NULL;
    if (STRINGLIB_LEN(self) >= width) {
        return return_self(self);
    }
    fill = width - STRINGLIB_LEN(self);
    s = pad(self, fill, 0, '0');
    if (s == NULL)
        return NULL;
    p = STRINGLIB_STR(s);
    if (p[fill] == '+' || p[fill] == '-') {
        /* move sign to beginning of string */
        p[0] = p[fill];
        p[fill] = '0';
    }
    return s;
}
Let's walk through this C code.
It first parses the argument positionally, meaning it doesn't allow keyword arguments:
>>> '1'.zfill(width=4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: zfill() takes no keyword arguments
It then checks if it's the same length or longer, in which case it returns the string.
>>> '1'.zfill(0)
'1'
zfill calls pad (this pad function is also called by ljust, rjust, and center as well). This basically copies the contents into a new string and fills in the padding.
static inline PyObject *
pad(PyObject *self, Py_ssize_t left, Py_ssize_t right, char fill)
{
    PyObject *u;
    if (left < 0)
        left = 0;
    if (right < 0)
        right = 0;
    if (left == 0 && right == 0) {
        return return_self(self);
    }
    u = STRINGLIB_NEW(NULL, left + STRINGLIB_LEN(self) + right);
    if (u) {
        if (left)
            memset(STRINGLIB_STR(u), fill, left);
        memcpy(STRINGLIB_STR(u) + left,
               STRINGLIB_STR(self),
               STRINGLIB_LEN(self));
        if (right)
            memset(STRINGLIB_STR(u) + left + STRINGLIB_LEN(self),
                   fill, right);
    }
    return u;
}
After calling pad, zfill moves any originally preceding + or - to the beginning of the string.
Note that for the original string to actually be numeric is not required:
>>> '+foo'.zfill(10)
'+000000foo'
>>> '-foo'.zfill(10)
'-000000foo'
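For readers who find the C hard to follow, the same logic can be approximated in a few lines of Python. This is an illustrative re-statement of the walkthrough above, not CPython's actual code path, and the name zfill_sketch is invented.

```python
def zfill_sketch(s: str, width: int) -> str:
    """Approximate str.zfill: left-pad with zeros, keeping a sign in front."""
    # Already wide enough: return the string unchanged, never truncated.
    if len(s) >= width:
        return s
    fill = width - len(s)
    # A leading + or - stays first; the zeros go between it and the rest.
    if s[:1] in ("+", "-"):
        return s[0] + "0" * fill + s[1:]
    return "0" * fill + s


print(zfill_sketch("1", 4))      # 0001
print(zfill_sketch("+1", 4))     # +001
print(zfill_sketch("-foo", 10))  # -000000foo
```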
For the ones who came here to understand and not just a quick answer.
I do these especially for time strings:
hour = 4
minute = 3
"{:0>2}:{:0>2}".format(hour,minute)
# prints 04:03
"{:0>3}:{:0>5}".format(hour,minute)
# prints '004:00003'
"{:0<3}:{:0<5}".format(hour,minute)
# prints '400:30000'
"{:$<3}:{:#<5}".format(hour,minute)
# prints '4$$:3####'
"0" is the fill character used for padding (the default is a space), and "2" is the minimum width
">" right-aligns the value, so the "0" fill characters end up on the left of the string
":" introduces the format_spec
When using Python >= 3.6, the cleanest way is to use f-strings with string formatting:
>>> s = f"{1:08}" # inline with int
>>> s
'00000001'
>>> s = f"{'1':0>8}" # inline with str
>>> s
'00000001'
>>> n = 1
>>> s = f"{n:08}" # int variable
>>> s
'00000001'
>>> c = "1"
>>> s = f"{c:0>8}" # str variable
>>> s
'00000001'
I would prefer formatting with an int, since only then the sign is handled correctly:
>>> f"{-1:08}"
'-0000001'
>>> f"{1:+08}"
'+0000001'
>>> f"{'-1':0>8}"
'000000-1'
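The same sign question comes up with the string methods mentioned earlier in this thread. As a small side-by-side sketch (the labels are mine, and the f-strings require Python 3.6+):

```python
samples = [
    ("f-string on int", f"{-1:08}"),
    ("sign-aware '=' align", format(-1, "0=8")),
    ("str.zfill", "-1".zfill(8)),
    ("str.rjust", "-1".rjust(8, "0")),
    ("'>' align on str", f"{'-1':0>8}"),
]
for label, result in samples:
    print(f"{label:22} {result}")
```

The first three keep the sign in front of the zeros; rjust and the '>' alignment treat the sign as ordinary text and pad to its left.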
For numbers:
i = 12
print(f"{i:05d}")
Output
00012
width = 10
x = 5
print "%0*d" % (width, x)
> 0000000005
See the print documentation for all the exciting details!
Update for Python 3.x (7.5 years later)
That last line should now be:
print("%0*d" % (width, x))
I.e. print() is now a function, not a statement. Note that I still prefer the Old School printf() style because, IMNSHO, it reads better, and because, um, I've been using that notation since January, 1980. Something ... old dogs .. something something ... new tricks.
I am adding how to use an int taken from the length of a string within an f-string, because it didn't appear to be covered:
>>> pad_number = len("this_string")
>>> pad_number
11
>>> s = f"{1:0{pad_number}}"
>>> s
'00000000001'
For zip codes saved as integers:
>>> a = 6340
>>> b = 90210
>>> print '%05d' % a
06340
>>> print '%05d' % b
90210
Quick timing comparison:
setup = '''
from random import randint
def test_1():
    num = randint(0,1000000)
    return str(num).zfill(7)
def test_2():
    num = randint(0,1000000)
    return format(num, '07')
def test_3():
    num = randint(0,1000000)
    return '{0:07d}'.format(num)
def test_4():
    num = randint(0,1000000)
    return format(num, '07d')
def test_5():
    num = randint(0,1000000)
    return '{:07d}'.format(num)
def test_6():
    num = randint(0,1000000)
    return '{x:07d}'.format(x=num)
def test_7():
    num = randint(0,1000000)
    return str(num).rjust(7, '0')
'''
import timeit
print timeit.Timer("test_1()", setup=setup).repeat(3, 900000)
print timeit.Timer("test_2()", setup=setup).repeat(3, 900000)
print timeit.Timer("test_3()", setup=setup).repeat(3, 900000)
print timeit.Timer("test_4()", setup=setup).repeat(3, 900000)
print timeit.Timer("test_5()", setup=setup).repeat(3, 900000)
print timeit.Timer("test_6()", setup=setup).repeat(3, 900000)
print timeit.Timer("test_7()", setup=setup).repeat(3, 900000)
> [2.281613943830961, 2.2719342631547077, 2.261691106209631]
> [2.311480238815406, 2.318420542148333, 2.3552384305184493]
> [2.3824197456864304, 2.3457239951596485, 2.3353268829498646]
> [2.312442972404032, 2.318053102249902, 2.3054072168069872]
> [2.3482314132374853, 2.3403386400002475, 2.330108825844775]
> [2.424549090688892, 2.4346475296851438, 2.429691196530058]
> [2.3259756401716487, 2.333549212826732, 2.32049893822186]
I've made different tests of different repetitions. The differences are not huge, but in all tests, the zfill solution was fastest.
If you're looking to pad a float and limit the number of decimal places at the same time (with f-strings):
>>> a = 4.432
>>> f'{a:04.1f}'
'04.4'
f'{a:04.1f}' means one digit after the decimal point, left-padded with zeros to 4 characters in total (the decimal point counts towards the width).
This works too:
h = 2
m = 7
s = 3
print("%02d:%02d:%02d" % (h, m, s))
so output will be: "02:07:03"
You could also repeat "0", prepend it to str(n) and get the rightmost width slice. Quick and dirty little expression.
def pad_left(n, width, pad="0"):
    return ((pad * width) + str(n))[-width:]
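One thing to be aware of with the slice-based expression above: unlike zfill, it truncates values that are already wider than the requested width. A quick demonstration, repeating the function so the snippet runs on its own:

```python
def pad_left(n, width, pad="0"):
    # Slice-based padding from the answer above.
    return ((pad * width) + str(n))[-width:]


print(pad_left(7, 3))       # 007
print(pad_left(12345, 3))   # 345   (leading digits are cut off)
print(str(12345).zfill(3))  # 12345 (zfill never truncates)
```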
Another approach would be to use a list comprehension with a condition checking for lengths. Below is a demonstration:
# input list of strings that we want to prepend zeros to
In [71]: list_of_str = ["101010", "10101010", "11110", "0000"]
In [72]: desired_len = 8
# prepend zeros to bring each string up to desired_len, if it is shorter than that
In [83]: ["0"*(desired_len-len(s)) + s if len(s) < desired_len else s for s in list_of_str]
Out[83]: ['00101010', '10101010', '00011110', '00000000']
I made a function:
def PadNumber(number, n_pad, add_prefix=None):
    number_str = str(number)
    paded_number = number_str.zfill(n_pad)
    if add_prefix:
        paded_number = add_prefix+paded_number
    print(paded_number)
PadNumber(99, 4)
PadNumber(1011, 8, "b'")
PadNumber('7BEF', 6, "#")
The output:
0099
b'00001011
#007BEF

Python efficient obfuscation of string

I need to obfuscate lines of Unicode text to slow down those who may want to extract them. Ideally this would be done with a built in Python module or a small add-on library; the string length will be the same or less than the original; and the "unobfuscation" be as fast as possible.
I have tried various character swaps and XOR routines, but they are slow. Base64 and hex encoding increase the size considerably. To date the most efficient method I've found is compressing with zlib at the lowest setting (1). Is there a better way?
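For reference, the zlib approach mentioned above can be sketched like this. The function names are made up for the example. Note that a zlib stream carries a few bytes of header and checksum, so very short lines come out longer than they went in.

```python
import zlib


def obfuscate(text):
    """Compress UTF-8 text with zlib at its fastest level (1)."""
    return zlib.compress(text.encode("utf-8"), 1)


def unobfuscate(blob):
    """Reverse obfuscate()."""
    return zlib.decompress(blob).decode("utf-8")


line = "The same phrase, the same phrase, the same phrase. " * 4
blob = obfuscate(line)
print(len(line.encode("utf-8")), len(blob))
print(unobfuscate(blob) == line)  # True
```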
How about the old ROT13 trick?
Python 3:
>>> import codecs
>>> x = 'some string'
>>> y = codecs.encode(x, 'rot13')
>>> y
'fbzr fgevat'
>>> codecs.decode(y, 'rot13')
'some string'
Python 2:
>>> x = 'some string'
>>> y = x.encode('rot13')
>>> y
'fbzr fgevat'
>>> y.decode('rot13')
u'some string'
For a unicode string:
>>> x = u'國碼'
>>> print x
國碼
>>> y = x.encode('unicode-escape').encode('rot13')
>>> print y
\h570o\h78op
>>> print y.decode('rot13').decode('unicode-escape')
國碼
This uses a simple, fast encryption scheme on bytes objects.
# For Python 3 - strings are Unicode, print is a function
def obfuscate(byt):
    # Use same function in both directions. Input and output are bytes
    # objects.
    mask = b'keyword'
    lmask = len(mask)
    return bytes(c ^ mask[i % lmask] for i, c in enumerate(byt))
def test(s):
    data = obfuscate(s.encode())
    print(len(s), len(data), data)
    newdata = obfuscate(data).decode()
    print(newdata == s)
simple_string = 'Just plain ASCII'
unicode_string = ('sensei = \N{HIRAGANA LETTER SE}\N{HIRAGANA LETTER N}'
                  '\N{HIRAGANA LETTER SE}\N{HIRAGANA LETTER I}')
test(simple_string)
test(unicode_string)
Python 2 version:
# For Python 2
mask = 'keyword'
nmask = [ord(c) for c in mask]
lmask = len(mask)
def obfuscate(s):
    # Use same function in both directions. Input and output are
    # Python 2 strings, ASCII only.
    return ''.join([chr(ord(c) ^ nmask[i % lmask])
                    for i, c in enumerate(s)])
def test(s):
    data = obfuscate(s.encode('utf-8'))
    print len(s), len(data), repr(data)
    newdata = obfuscate(data).decode('utf-8')
    print newdata == s
simple_string = u'Just plain ASCII'
# both literals need the u prefix so that \N{...} is interpreted
unicode_string = (u'sensei = \N{HIRAGANA LETTER SE}\N{HIRAGANA LETTER N}'
                  u'\N{HIRAGANA LETTER SE}\N{HIRAGANA LETTER I}')
test(simple_string)
test(unicode_string)
It depends on the size of your input: if it's over 1K, then using numpy is about 60x faster (it runs in less than 2% of the time of the naïve Python code).
import time
import numpy as np
mask = b'We are the knights who say "Ni"!'
mask_length = len(mask)
def mask_python(val: bytes) -> bytes:
    return bytes(c ^ mask[i % mask_length] for i, c in enumerate(val))
def mask_numpy(val: bytes) -> bytes:
    arr = np.frombuffer(val, dtype=np.int8)
    length = len(val)
    # repeat the mask often enough to cover the input, then trim it to size
    np_mask = np.tile(np.frombuffer(mask, dtype=np.int8), round(length/mask_length+0.5))[:length]
    masked = arr ^ np_mask
    return masked.tobytes()
value = b'0123456789'
for i in range(9):
    start_py = time.perf_counter()
    masked_py = mask_python(value)
    end_py = time.perf_counter()
    start_np = time.perf_counter()
    masked_np = mask_numpy(value)
    end_np = time.perf_counter()
    assert masked_py == masked_np
    print(f"{i+1} {len(value)} {end_py-start_py} {end_np-start_np}")
    value = value * 10
Note: I'm a novice with numpy, if anyone has any comments on my code I would be very happy to hear about it in comments.
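If pulling in numpy isn't an option, one stdlib-only alternative is to XOR the whole buffer at once as a big integer instead of byte by byte. This is a suggestion of mine, not something benchmarked in this thread, so measure it on your own data before relying on it. The function name xor_mask and the default mask are placeholders.

```python
def xor_mask(data, mask=b"keyword"):
    """XOR data with a repeating mask; applying it twice restores data."""
    if not data:
        return b""
    # Repeat the mask until it covers the input, then trim to the same length.
    repeats = len(data) // len(mask) + 1
    key_stream = (mask * repeats)[:len(data)]
    mixed = int.from_bytes(data, "big") ^ int.from_bytes(key_stream, "big")
    # The output always has exactly as many bytes as the input.
    return mixed.to_bytes(len(data), "big")


secret = xor_mask("Just plain ASCII".encode("utf-8"))
print(len(secret))                       # 16
print(xor_mask(secret).decode("utf-8"))  # Just plain ASCII
```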
Use codecs with hex encoding, like this:
>>> import codecs
>>> codecs.encode(b'test/jimmy', 'hex')
b'746573742f6a696d6d79'
>>> codecs.decode(b'746573742f6a696d6d79', 'hex')
b'test/jimmy'

How do I pad a string with zeroes?

How do I pad a numeric string with zeroes to the left, so that the string has a specific length?
To pad strings:
>>> n = '4'
>>> print(n.zfill(3))
004
To pad numbers:
>>> n = 4
>>> print(f'{n:03}') # Preferred method, python >= 3.6
004
>>> print('%03d' % n)
004
>>> print(format(n, '03')) # python >= 2.6
004
>>> print('{0:03d}'.format(n)) # python >= 2.6 + python 3
004
>>> print('{foo:03d}'.format(foo=n)) # python >= 2.6 + python 3
004
>>> print('{:03d}'.format(n)) # python >= 2.7 + python3
004
String formatting documentation.
Just use the rjust method of the string object.
This example creates a 10-character length string, padding as necessary:
>>> s = 'test'
>>> s.rjust(10, '0')
>>> '000000test'
Besides zfill, you can use general string formatting:
print(f'{number:05d}') # (since Python 3.6), or
print('{:05d}'.format(number)) # or
print('{0:05d}'.format(number)) # or (explicit 0th positional arg. selection)
print('{n:05d}'.format(n=number)) # or (explicit `n` keyword arg. selection)
print(format(number, '05d'))
Documentation for string formatting and f-strings.
For Python 3.6+ using f-strings:
>>> i = 1
>>> f"{i:0>2}" # Works for both numbers and strings.
'01'
>>> f"{i:02}" # Works only for numbers.
'01'
For Python 2.6 to Python 3.5:
>>> "{:0>2}".format("1") # Works for both numbers and strings.
'01'
>>> "{:02}".format(1) # Works only for numbers.
'01'
Those standard format specifiers are [[fill]align][minimumwidth] and [0][minimumwidth].
>>> '99'.zfill(5)
'00099'
>>> '99'.rjust(5,'0')
'00099'
if you want the opposite:
>>> '99'.ljust(5,'0')
'99000'
str(n).zfill(width) will work with strings, ints, floats... and is Python 2.x and 3.x compatible:
>>> n = 3
>>> str(n).zfill(5)
'00003'
>>> n = '3'
>>> str(n).zfill(5)
'00003'
>>> n = '3.0'
>>> str(n).zfill(5)
'003.0'
What is the most pythonic way to pad a numeric string with zeroes to the left, i.e., so the numeric string has a specific length?
str.zfill is specifically intended to do this:
>>> '1'.zfill(4)
'0001'
Note that it is specifically intended to handle numeric strings as requested, and moves a + or - to the beginning of the string:
>>> '+1'.zfill(4)
'+001'
>>> '-1'.zfill(4)
'-001'
Here's the help on str.zfill:
>>> help(str.zfill)
Help on method_descriptor:
zfill(...)
S.zfill(width) -> str
Pad a numeric string S with zeros on the left, to fill a field
of the specified width. The string S is never truncated.
Performance
This is also the most performant of alternative methods:
>>> min(timeit.repeat(lambda: '1'.zfill(4)))
0.18824880896136165
>>> min(timeit.repeat(lambda: '1'.rjust(4, '0')))
0.2104538488201797
>>> min(timeit.repeat(lambda: f'{1:04}'))
0.32585487607866526
>>> min(timeit.repeat(lambda: '{:04}'.format(1)))
0.34988890308886766
To best compare apples to apples for the % method (note it is actually slower), which will otherwise pre-calculate:
>>> min(timeit.repeat(lambda: '1'.zfill(0 or 4)))
0.19728074967861176
>>> min(timeit.repeat(lambda: '%04d' % (0 or 1)))
0.2347015216946602
Implementation
With a little digging, I found the implementation of the zfill method in Objects/stringlib/transmogrify.h:
static PyObject *
stringlib_zfill(PyObject *self, PyObject *args)
{
Py_ssize_t fill;
PyObject *s;
char *p;
Py_ssize_t width;
if (!PyArg_ParseTuple(args, "n:zfill", &width))
return NULL;
if (STRINGLIB_LEN(self) >= width) {
return return_self(self);
}
fill = width - STRINGLIB_LEN(self);
s = pad(self, fill, 0, '0');
if (s == NULL)
return NULL;
p = STRINGLIB_STR(s);
if (p[fill] == '+' || p[fill] == '-') {
/* move sign to beginning of string */
p[0] = p[fill];
p[fill] = '0';
}
return s;
}
Let's walk through this C code.
It first parses the argument positionally, meaning it doesn't allow keyword arguments:
>>> '1'.zfill(width=4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: zfill() takes no keyword arguments
It then checks if it's the same length or longer, in which case it returns the string.
>>> '1'.zfill(0)
'1'
zfill calls pad (this pad function is also called by ljust, rjust, and center). It copies the contents into a new string and fills in the padding.
static inline PyObject *
pad(PyObject *self, Py_ssize_t left, Py_ssize_t right, char fill)
{
    PyObject *u;
    if (left < 0)
        left = 0;
    if (right < 0)
        right = 0;
    if (left == 0 && right == 0) {
        return return_self(self);
    }
    u = STRINGLIB_NEW(NULL, left + STRINGLIB_LEN(self) + right);
    if (u) {
        if (left)
            memset(STRINGLIB_STR(u), fill, left);
        memcpy(STRINGLIB_STR(u) + left,
               STRINGLIB_STR(self),
               STRINGLIB_LEN(self));
        if (right)
            memset(STRINGLIB_STR(u) + left + STRINGLIB_LEN(self),
                   fill, right);
    }
    return u;
}
After calling pad, zfill moves any originally preceding + or - to the beginning of the string.
Note that the original string is not required to actually be numeric:
>>> '+foo'.zfill(10)
'+000000foo'
>>> '-foo'.zfill(10)
'-000000foo'
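To tie the walkthrough together, here is a rough pure-Python model of that logic. `zfill_sketch` is a hypothetical name, and this only mirrors the behavior described above; it is not how CPython actually implements it.

```python
def zfill_sketch(s, width):
    """Illustrative pure-Python model of the zfill logic walked through above."""
    if len(s) >= width:
        return s  # already wide enough: returned unchanged, never truncated
    fill = width - len(s)
    if s[:1] in ("+", "-"):
        # Same result as padding on the left and then moving the sign
        # to the beginning of the string.
        return s[0] + "0" * fill + s[1:]
    return "0" * fill + s

print(zfill_sketch("1", 4))      # 0001
print(zfill_sketch("-1", 4))     # -001
print(zfill_sketch("+foo", 10))  # +000000foo
```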
For those who came here to understand, and not just for a quick answer.
I do this especially for time strings:
hour = 4
minute = 3
"{:0>2}:{:0>2}".format(hour,minute)
# prints 04:03
"{:0>3}:{:0>5}".format(hour,minute)
# prints '004:00003'
"{:0<3}:{:0<5}".format(hour,minute)
# prints '400:30000'
"{:$<3}:{:#<5}".format(hour,minute)
# prints '4$$:3####'
":" introduces the format_spec
"0" is the fill character used for padding; the default is a space
">" right-aligns the value, so the fill characters go on the left ("<" left-aligns, so they go on the right)
"2" is the minimum width of the field
When using Python >= 3.6, the cleanest way is to use f-strings with string formatting:
>>> s = f"{1:08}" # inline with int
>>> s
'00000001'
>>> s = f"{'1':0>8}" # inline with str
>>> s
'00000001'
>>> n = 1
>>> s = f"{n:08}" # int variable
>>> s
'00000001'
>>> c = "1"
>>> s = f"{c:0>8}" # str variable
>>> s
'00000001'
I would prefer formatting with an int, since only then the sign is handled correctly:
>>> f"{-1:08}"
'-0000001'
>>> f"{1:+08}"
'+0000001'
>>> f"{'-1':0>8}"
'000000-1'
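If the value only exists as a string, there are two ways around the misplaced sign shown in the last example. This is a small sketch; the variable name `text` is just for illustration.

```python
text = "-1"

plain_fill = f"{text:0>8}"       # fill and align know nothing about signs
with_zfill = text.zfill(8)       # zfill moves the sign to the front
as_number = f"{int(text):08}"    # convert first, then format as a number

print(plain_fill)  # 000000-1
print(with_zfill)  # -0000001
print(as_number)   # -0000001
```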
For numbers:
i = 12
print(f"{i:05d}")
Output
00012
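For what it's worth, the leading 0 in a spec like 05d is shorthand for the fill character "0" combined with the "=" alignment, which puts the padding between the sign and the digits. A short sketch comparing the spellings (the negative value is only there to make the difference visible):

```python
i = 12

print(format(i, "05d"))    # 00012
print(format(i, "0=5d"))   # 00012  (explicit fill "0" with "=" alignment)
print(format(-i, "05d"))   # -0012
print(format(-i, "0>5d"))  # 00-12  (">" pads without regard to the sign)
```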
width = 10
x = 5
print "%0*d" % (width, x)
> 0000000005
See the print documentation for all the exciting details!
Update for Python 3.x (7.5 years later)
That last line should now be:
print("%0*d" % (width, x))
I.e. print() is now a function, not a statement. Note that I still prefer the Old School printf() style because, IMNSHO, it reads better, and because, um, I've been using that notation since January, 1980. Something ... old dogs .. something something ... new tricks.
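For comparison, here is a sketch of the same variable-width padding in the three common styles. The variable names `old_style`, `new_style` and `f_string` are mine; all three should give the same string.

```python
width = 10
x = 5

old_style = "%0*d" % (width, x)            # * takes the width from the tuple
new_style = "{:0{w}d}".format(x, w=width)  # nested replacement field for the width
f_string = f"{x:0{width}d}"                # same idea inside an f-string

print(old_style)  # 0000000005
```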
I am adding how to use an int taken from the length of a string as the width within an f-string, because it didn't appear to be covered:
>>> pad_number = len("this_string")
>>> pad_number
11
>>> s = f"{1:0{pad_number}}"
>>> s
'00000000001'
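One place a computed width comes in handy is padding a whole batch of numbers to the width of the largest one. A minimal sketch, with made-up sample data:

```python
numbers = [3, 42, 1234]

# Number of digits in the largest value decides the common width.
width = len(str(max(numbers)))
labels = [f"{n:0{width}}" for n in numbers]

print(labels)  # ['0003', '0042', '1234']
```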
For zip codes saved as integers:
>>> a = 6340
>>> b = 90210
>>> print '%05d' % a
06340
>>> print '%05d' % b
90210
Quick timing comparison:
setup = '''
from random import randint
def test_1():
    num = randint(0,1000000)
    return str(num).zfill(7)
def test_2():
    num = randint(0,1000000)
    return format(num, '07')
def test_3():
    num = randint(0,1000000)
    return '{0:07d}'.format(num)
def test_4():
    num = randint(0,1000000)
    return format(num, '07d')
def test_5():
    num = randint(0,1000000)
    return '{:07d}'.format(num)
def test_6():
    num = randint(0,1000000)
    return '{x:07d}'.format(x=num)
def test_7():
    num = randint(0,1000000)
    return str(num).rjust(7, '0')
'''
import timeit
print timeit.Timer("test_1()", setup=setup).repeat(3, 900000)
print timeit.Timer("test_2()", setup=setup).repeat(3, 900000)
print timeit.Timer("test_3()", setup=setup).repeat(3, 900000)
print timeit.Timer("test_4()", setup=setup).repeat(3, 900000)
print timeit.Timer("test_5()", setup=setup).repeat(3, 900000)
print timeit.Timer("test_6()", setup=setup).repeat(3, 900000)
print timeit.Timer("test_7()", setup=setup).repeat(3, 900000)
> [2.281613943830961, 2.2719342631547077, 2.261691106209631]
> [2.311480238815406, 2.318420542148333, 2.3552384305184493]
> [2.3824197456864304, 2.3457239951596485, 2.3353268829498646]
> [2.312442972404032, 2.318053102249902, 2.3054072168069872]
> [2.3482314132374853, 2.3403386400002475, 2.330108825844775]
> [2.424549090688892, 2.4346475296851438, 2.429691196530058]
> [2.3259756401716487, 2.333549212826732, 2.32049893822186]
I ran several tests with different numbers of repetitions. The differences are not huge, but in all tests, the zfill solution was fastest.
If you're looking to pad a float, and limit the number of decimal places at the same time (with f-strings):
a = 4.432
>> 4.432
a = f'{a:04.1f}'
>> '04.4'
f'{a:04.1f}' translates to: format as a float with 1 digit after the decimal point, and left-pad with zeros until the result is 4 characters in total (the decimal point counts as one of them).
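A few more variations on the same idea, as a sketch (the extra widths and the negative value are my own examples). Note that the width always covers the whole result, including the decimal point and any sign:

```python
value = 4.432

one_decimal = f"{value:04.1f}"    # 4 characters in total, 1 decimal place
two_decimals = f"{value:07.2f}"   # 7 characters in total, 2 decimal places
negative = f"{-value:07.2f}"      # the sign counts toward the width too

print(one_decimal)   # 04.4
print(two_decimals)  # 0004.43
print(negative)      # -004.43
```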
This works too:
h = 2
m = 7
s = 3
print("%02d:%02d:%02d" % (h, m, s))
The output will be "02:07:03".
You could also repeat "0", prepend it to str(n) and get the rightmost width slice. Quick and dirty little expression.
def pad_left(n, width, pad="0"):
    return ((pad * width) + str(n))[-width:]
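One thing to be aware of with the slice: unlike zfill, it cuts off leading digits when the number is longer than the width. A quick sketch showing the difference (the sample values are mine):

```python
def pad_left(n, width, pad="0"):
    # The slice-based expression from above.
    return ((pad * width) + str(n))[-width:]

print(pad_left(7, 3))        # 007
print(pad_left(12345, 3))    # 345    (leading digits are cut off)
print(str(12345).zfill(3))   # 12345  (zfill never truncates)
```

So the expression is fine when you know the input fits, and risky otherwise.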
Another approach would be to use a list comprehension with a condition checking for lengths. Below is a demonstration:
# input list of strings that we want to prepend zeros to
In [71]: list_of_str = ["101010", "10101010", "11110", "0000"]
# target length for every string
In [72]: desired_len = 8
# prepend zeros to bring each string up to desired_len, if the string is shorter than that
In [83]: ["0"*(desired_len-len(s)) + s if len(s) < desired_len else s for s in list_of_str]
Out[83]: ['00101010', '10101010', '00011110', '00000000']
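Since zfill already leaves strings alone when they are long enough, the length check can probably be dropped. A sketch of the same demonstration, assuming a target length of 8 as in the comments above:

```python
list_of_str = ["101010", "10101010", "11110", "0000"]
desired_len = 8  # assumed target length, taken from the comments above

# zfill pads short strings and returns longer ones unchanged.
padded = [s.zfill(desired_len) for s in list_of_str]
print(padded)  # ['00101010', '10101010', '00011110', '00000000']
```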
I made a function:
def PadNumber(number, n_pad, add_prefix=None):
    number_str = str(number)
    paded_number = number_str.zfill(n_pad)
    if add_prefix:
        paded_number = add_prefix+paded_number
    print(paded_number)
PadNumber(99, 4)
PadNumber(1011, 8, "b'")
PadNumber('7BEF', 6, "#")
The output :
0099
b'00001011
#007BEF
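If you would rather reuse the result than print it, a variant that returns the string could look like this. It is only a sketch; `pad_number` and its `prefix` parameter are hypothetical names, not part of the function above.

```python
def pad_number(number, n_pad, prefix=""):
    """Return the zero-padded string (with optional prefix) instead of printing it."""
    return prefix + str(number).zfill(n_pad)

print(pad_number(99, 4))           # 0099
print(pad_number(1011, 8, "b'"))   # b'00001011
print(pad_number("7BEF", 6, "#"))  # #007BEF
```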
