How do I pad a numeric string with zeroes to the left, so that the string has a specific length?
To pad strings:
>>> n = '4'
>>> print(n.zfill(3))
004
To pad numbers:
>>> n = 4
>>> print(f'{n:03}') # Preferred method, python >= 3.6
004
>>> print('%03d' % n)
004
>>> print(format(n, '03')) # python >= 2.6
004
>>> print('{0:03d}'.format(n)) # python >= 2.6 + python 3
004
>>> print('{foo:03d}'.format(foo=n)) # python >= 2.6 + python 3
004
>>> print('{:03d}'.format(n)) # python >= 2.7 + python3
004
String formatting documentation.
Just use the rjust method of the string object.
This example creates a 10-character length string, padding as necessary:
>>> s = 'test'
>>> s.rjust(10, '0')
'000000test'
Besides zfill, you can use general string formatting:
print(f'{number:05d}') # (since Python 3.6), or
print('{:05d}'.format(number)) # or
print('{0:05d}'.format(number)) # or (explicit 0th positional arg. selection)
print('{n:05d}'.format(n=number)) # or (explicit `n` keyword arg. selection)
print(format(number, '05d'))
Documentation for string formatting and f-strings.
For Python 3.6+ using f-strings:
>>> i = 1
>>> f"{i:0>2}" # Works for both numbers and strings.
'01'
>>> f"{i:02}" # Works only for numbers.
'01'
For Python 2.6 to Python 3.5:
>>> "{:0>2}".format("1") # Works for both numbers and strings.
'01'
>>> "{:02}".format(1) # Works only for numbers.
'01'
Those standard format specifiers are [[fill]align][minimumwidth] and [0][minimumwidth].
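The two forms behave differently with negative numbers. With an explicit '>' alignment the sign is just another character. The 0 flag instead pads after the sign, which is the same as '=' alignment. A small sketch:

```python
n = -1

# "0>4": fill with '0', right-align, width 4; the '-' is treated like any other character
right_aligned = f"{n:0>4}"

# "04": the 0 flag turns on sign-aware zero padding, equivalent to "0=4"
sign_aware = f"{n:04}"
explicit_eq = f"{n:0=4}"

print(right_aligned, sign_aware, explicit_eq)  # 00-1 -001 -001
```

If a numeric string might carry a sign, convert it to int first, or use str.zfill, which also moves the sign to the front.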
>>> '99'.zfill(5)
'00099'
>>> '99'.rjust(5,'0')
'00099'
If you want the opposite (padding on the right):
>>> '99'.ljust(5,'0')
'99000'
str(n).zfill(width) will work with strings, ints, floats... and is Python 2.x and 3.x compatible:
>>> n = 3
>>> str(n).zfill(5)
'00003'
>>> n = '3'
>>> str(n).zfill(5)
'00003'
>>> n = 3.0
>>> str(n).zfill(5)
'003.0'
What is the most pythonic way to pad a numeric string with zeroes to the left, i.e., so the numeric string has a specific length?
str.zfill is specifically intended to do this:
>>> '1'.zfill(4)
'0001'
Note that it is specifically intended to handle numeric strings as requested, and moves a + or - to the beginning of the string:
>>> '+1'.zfill(4)
'+001'
>>> '-1'.zfill(4)
'-001'
Here's the help on str.zfill:
>>> help(str.zfill)
Help on method_descriptor:
zfill(...)
S.zfill(width) -> str
Pad a numeric string S with zeros on the left, to fill a field
of the specified width. The string S is never truncated.
Performance
This is also the fastest of the alternative methods:
>>> import timeit
>>> min(timeit.repeat(lambda: '1'.zfill(4)))
0.18824880896136165
>>> min(timeit.repeat(lambda: '1'.rjust(4, '0')))
0.2104538488201797
>>> min(timeit.repeat(lambda: f'{1:04}'))
0.32585487607866526
>>> min(timeit.repeat(lambda: '{:04}'.format(1)))
0.34988890308886766
To compare apples to apples with the % method, which may otherwise be pre-calculated at compile time, force run-time evaluation with an or expression (note that % is actually slower):
>>> min(timeit.repeat(lambda: '1'.zfill(0 or 4)))
0.19728074967861176
>>> min(timeit.repeat(lambda: '%04d' % (0 or 1)))
0.2347015216946602
Implementation
With a little digging, I found the implementation of the zfill method in Objects/stringlib/transmogrify.h:
static PyObject *
stringlib_zfill(PyObject *self, PyObject *args)
{
    Py_ssize_t fill;
    PyObject *s;
    char *p;
    Py_ssize_t width;
    if (!PyArg_ParseTuple(args, "n:zfill", &width))
        return NULL;
    if (STRINGLIB_LEN(self) >= width) {
        return return_self(self);
    }
    fill = width - STRINGLIB_LEN(self);
    s = pad(self, fill, 0, '0');
    if (s == NULL)
        return NULL;
    p = STRINGLIB_STR(s);
    if (p[fill] == '+' || p[fill] == '-') {
        /* move sign to beginning of string */
        p[0] = p[fill];
        p[fill] = '0';
    }
    return s;
}
Let's walk through this C code.
It first parses the argument positionally, meaning it doesn't allow keyword arguments:
>>> '1'.zfill(width=4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: zfill() takes no keyword arguments
It then checks if it's the same length or longer, in which case it returns the string.
>>> '1'.zfill(0)
'1'
zfill then calls pad (which ljust, rjust, and center also use). pad copies the contents into a new string and fills in the padding.
static inline PyObject *
pad(PyObject *self, Py_ssize_t left, Py_ssize_t right, char fill)
{
    PyObject *u;
    if (left < 0)
        left = 0;
    if (right < 0)
        right = 0;
    if (left == 0 && right == 0) {
        return return_self(self);
    }
    u = STRINGLIB_NEW(NULL, left + STRINGLIB_LEN(self) + right);
    if (u) {
        if (left)
            memset(STRINGLIB_STR(u), fill, left);
        memcpy(STRINGLIB_STR(u) + left,
               STRINGLIB_STR(self),
               STRINGLIB_LEN(self));
        if (right)
            memset(STRINGLIB_STR(u) + left + STRINGLIB_LEN(self),
                   fill, right);
    }
    return u;
}
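For readers who prefer Python to C, here is a rough sketch of what pad does and how rjust and ljust reduce to it. pad_sketch, rjust_sketch and ljust_sketch are hypothetical names used for illustration. This is not CPython code:

```python
def pad_sketch(s: str, left: int, right: int, fill: str) -> str:
    """Rough Python analogue of stringlib's pad(): clamp negative amounts
    to 0, then surround s with `left` and `right` copies of `fill`."""
    left = max(left, 0)
    right = max(right, 0)
    if left == 0 and right == 0:
        return s  # the C code returns self unchanged here
    return fill * left + s + fill * right


def rjust_sketch(s: str, width: int, fill: str = " ") -> str:
    # right-justify: all of the padding goes on the left
    return pad_sketch(s, width - len(s), 0, fill)


def ljust_sketch(s: str, width: int, fill: str = " ") -> str:
    # left-justify: all of the padding goes on the right
    return pad_sketch(s, 0, width - len(s), fill)


print(rjust_sketch("99", 5, "0"))  # 00099
print(ljust_sketch("99", 5, "0"))  # 99000
```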
After calling pad, zfill moves any originally preceding + or - to the beginning of the string.
Note that the original string does not actually need to be numeric:
>>> '+foo'.zfill(10)
'+000000foo'
>>> '-foo'.zfill(10)
'-000000foo'
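The following is a pure-Python rendition of the zfill logic walked through above. It is an illustrative sketch (zfill_sketch is not a CPython name) that mirrors the C code, including the quirk that only the original first character is checked for a sign:

```python
def zfill_sketch(s: str, width: int) -> str:
    """Illustrative Python version of the C zfill logic above."""
    if len(s) >= width:
        return s  # never truncates
    fill = width - len(s)
    padded = "0" * fill + s
    # The C code inspects p[fill], i.e. the original first character.
    if padded[fill:fill + 1] in ("+", "-"):
        # move the sign to index 0 and put a '0' where it used to be
        padded = padded[fill] + padded[1:fill] + "0" + padded[fill + 1:]
    return padded


print(zfill_sketch("-1", 4))     # -001
print(zfill_sketch("+foo", 10))  # +000000foo
```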
For the ones who came here to understand and not just a quick answer.
I do these especially for time strings:
hour = 4
minute = 3
"{:0>2}:{:0>2}".format(hour,minute)
# returns '04:03'
"{:0>3}:{:0>5}".format(hour,minute)
# returns '004:00003'
"{:0<3}:{:0<5}".format(hour,minute)
# returns '400:30000'
"{:$<3}:{:#<5}".format(hour,minute)
# returns '4$$:3####'
"0" is the fill character used for the padding (the default is a space)
">" right-aligns the value, so all the fill characters go on its left
"2" is the minimum width of the field
":" introduces the format_spec
When using Python >= 3.6, the cleanest way is to use f-strings with string formatting:
>>> s = f"{1:08}" # inline with int
>>> s
'00000001'
>>> s = f"{'1':0>8}" # inline with str
>>> s
'00000001'
>>> n = 1
>>> s = f"{n:08}" # int variable
>>> s
'00000001'
>>> c = "1"
>>> s = f"{c:0>8}" # str variable
>>> s
'00000001'
I would prefer formatting with an int, since only then the sign is handled correctly:
>>> f"{-1:08}"
'-0000001'
>>> f"{1:+08}"
'+0000001'
>>> f"{'-1':0>8}"
'000000-1'
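The fill, align and width pieces of a format_spec can themselves come from variables through nested replacement fields. A sketch follows. The hhmm helper and its defaults are illustrative and do not come from any library:

```python
def hhmm(hour: int, minute: int, fill: str = "0", align: str = ">", width: int = 2) -> str:
    # fill, align and width are spliced into the format_spec via nested fields,
    # so the defaults build the spec "0>2"
    return f"{hour:{fill}{align}{width}}:{minute:{fill}{align}{width}}"


print(hhmm(4, 3))               # 04:03
print(hhmm(4, 3, "$", "<", 3))  # 4$$:3$$
```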
For numbers:
i = 12
print(f"{i:05d}")
Output
00012
width = 10
x = 5
print "%0*d" % (width, x)
> 0000000005
See the print documentation for all the exciting details!
Update for Python 3.x (7.5 years later)
That last line should now be:
print("%0*d" % (width, x))
I.e. print() is now a function, not a statement. Note that I still prefer the Old School printf() style because, IMNSHO, it reads better, and because, um, I've been using that notation since January, 1980. Something ... old dogs .. something something ... new tricks.
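For comparison, the * in %0*d takes the width from the argument tuple. With str.format or f-strings, a nested replacement field has the same effect. The sketch below shows the three spellings side by side:

```python
width = 10
x = 5

old_style = "%0*d" % (width, x)            # '*' reads the width from the tuple
new_style = f"{x:0{width}d}"               # nested field supplies the width (3.6+)
via_format = "{0:0{1}d}".format(x, width)  # str.format with explicit indices

print(old_style, new_style, via_format)  # 0000000005 0000000005 0000000005
```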
Here is how to use an int computed from the length of a string as the width within an f-string, since it didn't appear to be covered:
>>> pad_number = len("this_string")
>>> pad_number
11
>>> s = f"{1:0{pad_number}}"
>>> s
'00000000001'
For zip codes saved as integers:
>>> a = 6340
>>> b = 90210
>>> print '%05d' % a
06340
>>> print '%05d' % b
90210
Quick timing comparison:
setup = '''
from random import randint
def test_1():
    num = randint(0,1000000)
    return str(num).zfill(7)
def test_2():
    num = randint(0,1000000)
    return format(num, '07')
def test_3():
    num = randint(0,1000000)
    return '{0:07d}'.format(num)
def test_4():
    num = randint(0,1000000)
    return format(num, '07d')
def test_5():
    num = randint(0,1000000)
    return '{:07d}'.format(num)
def test_6():
    num = randint(0,1000000)
    return '{x:07d}'.format(x=num)
def test_7():
    num = randint(0,1000000)
    return str(num).rjust(7, '0')
'''
import timeit
print(timeit.Timer("test_1()", setup=setup).repeat(3, 900000))
print(timeit.Timer("test_2()", setup=setup).repeat(3, 900000))
print(timeit.Timer("test_3()", setup=setup).repeat(3, 900000))
print(timeit.Timer("test_4()", setup=setup).repeat(3, 900000))
print(timeit.Timer("test_5()", setup=setup).repeat(3, 900000))
print(timeit.Timer("test_6()", setup=setup).repeat(3, 900000))
print(timeit.Timer("test_7()", setup=setup).repeat(3, 900000))
> [2.281613943830961, 2.2719342631547077, 2.261691106209631]
> [2.311480238815406, 2.318420542148333, 2.3552384305184493]
> [2.3824197456864304, 2.3457239951596485, 2.3353268829498646]
> [2.312442972404032, 2.318053102249902, 2.3054072168069872]
> [2.3482314132374853, 2.3403386400002475, 2.330108825844775]
> [2.424549090688892, 2.4346475296851438, 2.429691196530058]
> [2.3259756401716487, 2.333549212826732, 2.32049893822186]
I ran these tests with several different repetition counts. The differences are small, but zfill was the fastest in every run.
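Timings aside, it is worth confirming that the seven candidates produce identical strings for the benchmark's input range. The following is a quick Python 3 check. The candidates list and the all_agree helper are illustrative names and are not part of the original benchmark:

```python
from random import randint

# The seven methods from the benchmark above, as plain functions of num
candidates = [
    lambda num: str(num).zfill(7),
    lambda num: format(num, '07'),
    lambda num: '{0:07d}'.format(num),
    lambda num: format(num, '07d'),
    lambda num: '{:07d}'.format(num),
    lambda num: '{x:07d}'.format(x=num),
    lambda num: str(num).rjust(7, '0'),
]


def all_agree(num: int) -> bool:
    # True when every candidate returns the same string for num
    return len({f(num) for f in candidates}) == 1


print(all(all_agree(randint(0, 1000000)) for _ in range(10000)))  # True
```

They only agree for non-negative numbers. For -5, rjust gives '00000-5' while the others give '-000005'.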
If you're looking to pad a float and limit the number of decimal places at the same time (with f-strings):
>>> a = 4.432
>>> f'{a:04.1f}'
'04.4'
In f'{a:04.1f}', .1f means fixed-point with 1 digit after the decimal point, and 04 means zero-pad on the left to a total width of 4 characters (the decimal point counts toward the width).
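Here are a few more variations. They show that the width counts every character, including the sign and the decimal point, and that it is only a minimum:

```python
a = 4.432

one_decimal = f'{a:04.1f}'    # '04.4'
two_decimals = f'{a:06.2f}'   # '004.43'
negative = f'{-a:06.1f}'      # '-004.4' (zero padding goes after the sign)
too_narrow = f'{a:03.1f}'     # '4.4'    (width is a minimum, nothing is cut off)

print(one_decimal, two_decimals, negative, too_narrow)
```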
This works too:
h = 2
m = 7
s = 3
print("%02d:%02d:%02d" % (h, m, s))
So the output will be: "02:07:03"
You could also repeat "0", prepend it to str(n) and get the rightmost width slice. Quick and dirty little expression.
def pad_left(n, width, pad="0"):
    return ((pad * width) + str(n))[-width:]
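Be aware of one quirk in this slice-based version. When str(n) is already longer than width, the negative slice keeps only the rightmost width characters, so the value is silently truncated. zfill never truncates. A quick demonstration:

```python
def pad_left(n, width, pad="0"):
    return ((pad * width) + str(n))[-width:]


print(pad_left(42, 5))       # 00042
print(pad_left(123456, 3))   # 456     <- truncated by the slice
print(str(123456).zfill(3))  # 123456  (zfill leaves long strings alone)
```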
Another approach would be to use a list comprehension with a condition checking for lengths. Below is a demonstration:
# input list of strings that we want to prepend zeros to
In [71]: list_of_str = ["101010", "10101010", "11110", "0000"]
In [72]: desired_len = 8
# prepend zeros to bring each string to desired_len, if it is shorter than that
In [83]: ["0"*(desired_len-len(s)) + s if len(s) < desired_len else s for s in list_of_str]
Out[83]: ['00101010', '10101010', '00011110', '00000000']
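str.zfill already leaves strings that are long enough untouched, so the same result needs no explicit length check. Here is a sketch that reuses the list above:

```python
list_of_str = ["101010", "10101010", "11110", "0000"]
desired_len = 8

# zfill pads shorter strings and returns longer ones unchanged
padded = [s.zfill(desired_len) for s in list_of_str]
print(padded)  # ['00101010', '10101010', '00011110', '00000000']
```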
I made a function:
def PadNumber(number, n_pad, add_prefix=None):
    number_str = str(number)
    padded_number = number_str.zfill(n_pad)
    if add_prefix:
        padded_number = add_prefix + padded_number
    print(padded_number)
PadNumber(99, 4)
PadNumber(1011, 8, "b'")
PadNumber('7BEF', 6, "#")
The output:
0099
b'00001011
#007BEF
Related
I want to convert a string (composed of alphanumeric characters) into an integer and then convert this integer back into a string:
string --> int --> string
In other words, I want to represent an alphanumeric string by an integer.
I found a working solution, which I included in the answer, but I do not think it is the best solution, and I am interested in other ideas/methods.
Please don't tag this as duplicate just because a lot of similar questions already exist, I specifically want an easy way of transforming a string into an integer and vice versa.
This should work for strings that contain alphanumeric characters, i.e. strings containing numbers and letters.
Here's what I have so far:
First, define a string:
m = "test123"
string -> bytes
mBytes = m.encode("utf-8")
bytes -> int
mInt = int.from_bytes(mBytes, byteorder="big")
int -> bytes
mBytes = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")
bytes -> string
m = mBytes.decode("utf-8")
All together
m = "test123"
mBytes = m.encode("utf-8")
mInt = int.from_bytes(mBytes, byteorder="big")
mBytes2 = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")
m2 = mBytes2.decode("utf-8")
print(m == m2)
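The same round trip can be wrapped into two reusable functions. The names str_to_int and int_to_str are illustrative, and UTF-8 is assumed as the default encoding. One caveat applies: leading zero bytes, such as a string starting with '\x00', do not survive the round trip, because they contribute nothing to the integer:

```python
def str_to_int(s: str, encoding: str = "utf-8") -> int:
    # text -> bytes -> big-endian integer
    return int.from_bytes(s.encode(encoding), byteorder="big")


def int_to_str(i: int, encoding: str = "utf-8") -> str:
    # integer -> minimal big-endian bytes -> text
    return i.to_bytes((i.bit_length() + 7) // 8, byteorder="big").decode(encoding)


for m in ["test123", "Test123", "", "héllo wörld"]:
    assert int_to_str(str_to_int(m)) == m

print(str_to_int("Test123"))  # 23755444588720691
```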
Here is an identical reusable version of the above:
class BytesIntEncoder:
    @staticmethod
    def encode(b: bytes) -> int:
        return int.from_bytes(b, byteorder='big')
    @staticmethod
    def decode(i: int) -> bytes:
        return i.to_bytes(((i.bit_length() + 7) // 8), byteorder='big')
The type annotations are optional. Note that int.from_bytes and int.to_bytes require Python 3.2 or later.
Test:
>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'
>>> BytesIntEncoder.encode(b)
23755444588720691
>>> BytesIntEncoder.decode(_)
b'Test123'
>>> _.decode()
'Test123'
Recall that a string can be encoded to bytes, which can then be encoded to an integer. The encodings can then be reversed to get the bytes followed by the original string.
This encoder uses binascii to produce an identical integer encoding to the one in the answer by charel-f. I believe it to be identical because I extensively tested it.
Credit: this answer.
from binascii import hexlify, unhexlify
class BytesIntEncoder:
    @staticmethod
    def encode(b: bytes) -> int:
        return int(hexlify(b), 16) if b != b'' else 0
    @staticmethod
    def decode(i: int) -> bytes:
        if i == 0:
            return b''
        hex_str = '%x' % i
        # unhexlify needs an even number of hex digits
        return unhexlify(hex_str.zfill(len(hex_str) + len(hex_str) % 2))
If you're using Python 2, remove the optional type annotations.
Quick test:
>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'
>>> BytesIntEncoder.encode(b)
23755444588720691
>>> BytesIntEncoder.decode(_)
b'Test123'
>>> _.decode()
'Test123'
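To back up the claim that the binascii version matches the int.from_bytes version, here is a small randomized comparison. It is a sketch, and the four helper names are illustrative. It includes the even-length hex padding, without which unhexlify fails whenever the first byte is below 0x10:

```python
import os
from binascii import hexlify, unhexlify


def encode_hex(b: bytes) -> int:
    return int(hexlify(b), 16) if b else 0


def encode_from_bytes(b: bytes) -> int:
    return int.from_bytes(b, byteorder='big')


def decode_hex(i: int) -> bytes:
    if i == 0:
        return b''
    h = '%x' % i
    return unhexlify('0' * (len(h) % 2) + h)  # pad to an even number of digits


def decode_to_bytes(i: int) -> bytes:
    return i.to_bytes((i.bit_length() + 7) // 8, byteorder='big')


for n in range(1, 33):
    for _ in range(50):
        b = os.urandom(n)
        assert encode_hex(b) == encode_from_bytes(b)
        # both decoders drop leading zero bytes in the same way
        assert decode_hex(encode_hex(b)) == decode_to_bytes(encode_from_bytes(b))
```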
Assuming the character set is merely alphanumeric, i.e. a-z A-Z 0-9, this requires 6 bits per character. As such, using an 8-bit byte-encoding is theoretically an inefficient use of memory.
This answer converts the input bytes into a sequence of 6-bit integers. It encodes these small integers into one large integer using bitwise operations. Whether this actually translates into real-world storage savings can be checked with sys.getsizeof, and savings are more likely for larger strings.
This implementation customizes the encoding for the choice of character set. If, for example, you were working with just string.ascii_lowercase (5 bits) rather than string.ascii_letters + string.digits (6 bits), the encoding would be correspondingly more efficient.
Unit tests are also included.
import string
class BytesIntEncoder:
    def __init__(self, chars: bytes = (string.ascii_letters + string.digits).encode()):
        num_chars = len(chars)
        translation = ''.join(chr(i) for i in range(1, num_chars + 1)).encode()
        self._translation_table = bytes.maketrans(chars, translation)
        self._reverse_translation_table = bytes.maketrans(translation, chars)
        self._num_bits_per_char = (num_chars + 1).bit_length()
    def encode(self, chars: bytes) -> int:
        num_bits_per_char = self._num_bits_per_char
        output, bit_idx = 0, 0
        for chr_idx in chars.translate(self._translation_table):
            output |= (chr_idx << bit_idx)
            bit_idx += num_bits_per_char
        return output
    def decode(self, i: int) -> bytes:
        maxint = (2 ** self._num_bits_per_char) - 1
        output = bytes(((i >> offset) & maxint) for offset in range(0, i.bit_length(), self._num_bits_per_char))
        return output.translate(self._reverse_translation_table)
# Test
import itertools
import random
import unittest
class TestBytesIntEncoder(unittest.TestCase):
    chars = string.ascii_letters + string.digits
    encoder = BytesIntEncoder(chars.encode())
    def _test_encoding(self, b_in: bytes):
        i = self.encoder.encode(b_in)
        self.assertIsInstance(i, int)
        b_out = self.encoder.decode(i)
        self.assertIsInstance(b_out, bytes)
        self.assertEqual(b_in, b_out)
        # print(b_in, i)
    def test_thoroughly_with_small_str(self):
        for s_len in range(4):
            for s in itertools.combinations_with_replacement(self.chars, s_len):
                s = ''.join(s)
                b_in = s.encode()
                self._test_encoding(b_in)
    def test_randomly_with_large_str(self):
        for s_len in range(256):
            num_samples = {s_len <= 16: 2 ** s_len,
                           16 < s_len <= 32: s_len ** 2,
                           s_len > 32: s_len * 2,
                           s_len > 64: s_len,
                           s_len > 128: 2}[True]
            # print(s_len, num_samples)
            for _ in range(num_samples):
                b_in = ''.join(random.choices(self.chars, k=s_len)).encode()
                self._test_encoding(b_in)
if __name__ == '__main__':
    unittest.main()
Usage example:
>>> encoder = BytesIntEncoder()
>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'
>>> encoder.encode(b)
3908257788270
>>> encoder.decode(_)
b'Test123'
So I needed to transfer a dictionary in terms of numbers.
It may look kinda ugly, but it's efficient in that every character of the JSON text (English letters included) takes exactly 2 digits. It can still transfer any kind of Unicode character, because json.dumps escapes non-ASCII characters as \uXXXX by default:
import json
myDict = {
    "le key": "le Valueue",
    2 : {
        "heya": 1234569,
        "3": 4
    },
    'Α α, Β β, Γ γ' : 'שלום'
}
def convertDictToNum(toBeConverted):
    # json.dumps escapes non-ASCII as \uXXXX, so every char is in ' '..'~' (32..126);
    # subtracting 27 maps that range to 5..99, i.e. exactly two digits per char
    return int(''.join([(lambda c: c if len(c) == 2 else '0' + c)(str(ord(c) - 27)) for c in json.dumps(toBeConverted)]))
def loadDictFromNum(toBeDecoded):
    toBeDecoded = str(toBeDecoded)
    return json.loads(''.join([chr(int(toBeDecoded[cut:cut + 2]) + 27) for cut in range(0, len(toBeDecoded), 2)]))
numbersDict = convertDictToNum(myDict)
print(numbersDict)
# 9607817405807494073105078....
recoveredDict = loadDictFromNum(numbersDict)
print(recoveredDict)
# {'le key': 'le Valueue', '2': {'heya': 1234569, '3': 4}, 'Α α, Β β, Γ γ': 'שלום'}
Using this code to take a string and convert it to binary:
bin(reduce(lambda x, y: 256*x+y, (ord(c) for c in 'hello'), 0))
this outputs:
0b110100001100101011011000110110001101111
If I put this into this site (on the right-hand side), I get my message of hello back. I'm wondering what method it uses. I know I could split the binary string into groups of 8 and then match each group to the corresponding value of bin(ord(character)), or do it some other way. I'm really looking for something simpler.
For ASCII characters in the range [ -~] on Python 2:
>>> import binascii
>>> bin(int(binascii.hexlify('hello'), 16))
'0b110100001100101011011000110110001101111'
In reverse:
>>> n = int('0b110100001100101011011000110110001101111', 2)
>>> binascii.unhexlify('%x' % n)
'hello'
In Python 3.2+:
>>> bin(int.from_bytes('hello'.encode(), 'big'))
'0b110100001100101011011000110110001101111'
In reverse:
>>> n = int('0b110100001100101011011000110110001101111', 2)
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()
'hello'
To support all Unicode characters in Python 3:
def text_to_bits(text, encoding='utf-8', errors='surrogatepass'):
    bits = bin(int.from_bytes(text.encode(encoding, errors), 'big'))[2:]
    return bits.zfill(8 * ((len(bits) + 7) // 8))
def text_from_bits(bits, encoding='utf-8', errors='surrogatepass'):
    n = int(bits, 2)
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode(encoding, errors) or '\0'
Here's a single-source Python 2/3 compatible version:
import binascii
def text_to_bits(text, encoding='utf-8', errors='surrogatepass'):
    bits = bin(int(binascii.hexlify(text.encode(encoding, errors)), 16))[2:]
    return bits.zfill(8 * ((len(bits) + 7) // 8))
def text_from_bits(bits, encoding='utf-8', errors='surrogatepass'):
    n = int(bits, 2)
    return int2bytes(n).decode(encoding, errors)
def int2bytes(i):
    hex_string = '%x' % i
    n = len(hex_string)
    return binascii.unhexlify(hex_string.zfill(n + (n & 1)))
Example
>>> text_to_bits('hello')
'0110100001100101011011000110110001101111'
>>> text_from_bits('110100001100101011011000110110001101111') == u'hello'
True
Built-in only python
Here is a pure python method for simple strings, left here for posterity.
def string2bits(s=''):
    return [bin(ord(x))[2:].zfill(8) for x in s]
def bits2string(b=None):
    return ''.join([chr(int(x, 2)) for x in b])
s = 'Hello, World!'
b = string2bits(s)
s2 = bits2string(b)
print('String:')
print(s)
print('\nList of Bits:')
for x in b:
    print(x)
print('\nString:')
print(s2)
String:
Hello, World!
List of Bits:
01001000
01100101
01101100
01101100
01101111
00101100
00100000
01010111
01101111
01110010
01101100
01100100
00100001
String:
Hello, World!
I'm not sure how you think you can do it other than character-by-character -- it's inherently a character-by-character operation. There is certainly code out there to do this for you, but there is no "simpler" way than doing it character-by-character.
First, you need to strip the 0b prefix and left-zero-pad the string so its length is divisible by 8. This makes dividing the bitstring up into characters easy:
bitstring = bitstring[2:]
bitstring = -len(bitstring) % 8 * '0' + bitstring
Then you divide the string up into blocks of eight binary digits, convert them to ASCII characters, and join them back into a string:
string_blocks = (bitstring[i:i+8] for i in range(0, len(bitstring), 8))
string = ''.join(chr(int(char, 2)) for char in string_blocks)
If you actually want to treat it as a number, you still have to account for the fact that the leftmost character will be at most seven digits long if you want to go left-to-right instead of right-to-left.
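The following sketch puts those steps together in both directions. It assumes every character has a code point below 256 (ASCII or Latin-1), so that it fits in 8 bits. The function names are illustrative:

```python
def text_to_bitstring(text: str) -> str:
    # 8 zero-padded bits per character, so chunk boundaries stay aligned
    return ''.join(format(ord(c), '08b') for c in text)


def bitstring_to_text(bitstring: str) -> str:
    if bitstring.startswith('0b'):
        bitstring = bitstring[2:]
    # left-pad to a multiple of 8, as described above
    bitstring = -len(bitstring) % 8 * '0' + bitstring
    blocks = (bitstring[i:i + 8] for i in range(0, len(bitstring), 8))
    return ''.join(chr(int(block, 2)) for block in blocks)


print(text_to_bitstring('hello'))  # 0110100001100101011011000110110001101111
print(bitstring_to_text('0b110100001100101011011000110110001101111'))  # hello
```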
This is my way to solve your task:
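bits = "0b110100001100101011011000110110001101111"
bits = bits[2:]
# bin() drops leading zeros, so left-pad to a multiple of 8
bits = bits.zfill(-(-len(bits) // 8) * 8)
message = ""
while bits != "":
    i = chr(int(bits[:8], 2))
    message = message + i
    bits = bits[8:]
print(message)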
If you don't want to import any modules, you can use this to convert a text file to binary:
with open("Test1.txt", "r") as File1:
    St = ' '.join(format(ord(x), 'b') for x in File1.read())
StrList = St.split(" ")
And you can use this to convert it back to a string:
StrOrgMsg = St  # the space-separated bit string to decode, e.g. the one built above
StrOrgList = StrOrgMsg.split(" ")
StrMsg = ""
for StrValue in StrOrgList:
    if StrValue != "":
        StrMsg += chr(int(StrValue, 2))
print(StrMsg)
Hope that is helpful. I've used this with some custom encryption to send data over TCP.
Are you looking for the code to do it or understanding the algorithm?
Does the binascii module do what you need? Specifically a2b_uu and b2a_uu? There are LOTS of other options in there in case those aren't what you want.
(NOTE: Not a Python guy but this seemed like an obvious answer)
Convert binary to its equivalent character.
k = 7
dec = 0
new = []
item = [x for x in input("Enter 8-bit binary numbers separated by ,: ").split(",")]
for i in item:
    for j in i:
        if j == "1":
            dec = 2**k + dec
            k = k - 1
        else:
            k = k - 1
    new.append(dec)
    dec = 0
    k = 7
print(new)
for i in new:
    print(chr(i), end="")
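The inner loop above accumulates 2**k for every '1' bit, which is exactly what int(block, 2) already does. Below is a compact sketch of the same conversion. bits_to_chars is a hypothetical helper name, and it takes the comma-separated string directly instead of calling input():

```python
def bits_to_chars(csv_bits: str) -> str:
    # each comma-separated block is parsed as base 2, then mapped to a character
    return ''.join(chr(int(block.strip(), 2))
                   for block in csv_bits.split(',') if block.strip())


print(bits_to_chars("01101000,01101001"))  # hi
```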
This is a spruced up version of J.F. Sebastian's. Thanks for the snippets though J.F. Sebastian.
import binascii, sys
def goodbye():
    sys.exit("\n"+"*"*43+"\n\nGood Bye! Come use again!\n\n"+"*"*43+"")
while __name__=='__main__':
    print "[A]scii to Binary, [B]inary to Ascii, or [E]xit:"
    var1=raw_input('>>> ')
    if var1=='a':
        string=raw_input('String to convert:\n>>> ')
        convert=bin(int(binascii.hexlify(string), 16))
        i=2
        truebin=[]
        while i!=len(convert):
            truebin.append(convert[i])
            i=i+1
        convert=''.join(truebin)
        print '\n'+'*'*84+'\n\n'+convert+'\n\n'+'*'*84+'\n'
    if var1=='b':
        binary=raw_input('Binary to convert:\n>>> ')
        n = int(binary, 2)
        done=binascii.unhexlify('%x' % n)
        print '\n'+'*'*84+'\n\n'+done+'\n\n'+'*'*84+'\n'
    if var1=='e':
        aus=raw_input('Are you sure? (y/n)\n>>> ')
        if aus=='y':
            goodbye()
I need to obfuscate lines of Unicode text to slow down those who may want to extract them. Ideally this would be done with a built in Python module or a small add-on library; the string length will be the same or less than the original; and the "unobfuscation" be as fast as possible.
I have tried various character swaps and XOR routines, but they are slow. Base64 and hex encoding increase the size considerably. To date the most efficient method I've found is compressing with zlib at the lowest setting (1). Is there a better way?
How about the old ROT13 trick?
Python 3:
>>> import codecs
>>> x = 'some string'
>>> y = codecs.encode(x, 'rot13')
>>> y
'fbzr fgevat'
>>> codecs.decode(y, 'rot13')
'some string'
Python 2:
>>> x = 'some string'
>>> y = x.encode('rot13')
>>> y
'fbzr fgevat'
>>> y.decode('rot13')
u'some string'
For a unicode string:
>>> x = u'國碼'
>>> print x
國碼
>>> y = x.encode('unicode-escape').encode('rot13')
>>> print y
\h570o\h78op
>>> print y.decode('rot13').decode('unicode-escape')
國碼
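In Python 3 there is no str.encode('rot13'), and unicode_escape produces bytes, so the Unicode trick above needs an extra decode/encode step. Below is a sketch of that version. It relies on the escaped form being pure ASCII, which unicode_escape guarantees. The result is much longer than the input (six characters per CJK character), which works against the length requirement in the question:

```python
import codecs
x = '國碼'
# str -> ASCII-only escape sequences like \u570b -> ROT13 on the letters
escaped = x.encode('unicode_escape').decode('ascii')
y = codecs.encode(escaped, 'rot13')
print(y)  # \h570o\h78op
# Reverse: undo ROT13, then turn the escape sequences back into characters
restored = codecs.decode(y, 'rot13').encode('ascii').decode('unicode_escape')
print(restored == x)  # True
```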
This uses a simple, fast obfuscation scheme (XOR with a repeating key) on bytes objects.
# For Python 3 - strings are Unicode, print is a function
def obfuscate(byt):
    # Use same function in both directions. Input and output are bytes
    # objects.
    mask = b'keyword'
    lmask = len(mask)
    return bytes(c ^ mask[i % lmask] for i, c in enumerate(byt))
def test(s):
    data = obfuscate(s.encode())
    print(len(s), len(data), data)
    newdata = obfuscate(data).decode()
    print(newdata == s)
simple_string = 'Just plain ASCII'
unicode_string = ('sensei = \N{HIRAGANA LETTER SE}\N{HIRAGANA LETTER N}'
                  '\N{HIRAGANA LETTER SE}\N{HIRAGANA LETTER I}')
test(simple_string)
test(unicode_string)
Python 2 version:
# For Python 2
mask = 'keyword'
nmask = [ord(c) for c in mask]
lmask = len(mask)
def obfuscate(s):
    # Use same function in both directions. Input and output are
    # Python 2 byte strings (str); any byte value is fine.
    return ''.join([chr(ord(c) ^ nmask[i % lmask])
                    for i, c in enumerate(s)])
def test(s):
    data = obfuscate(s.encode('utf-8'))
    print len(s), len(data), repr(data)
    newdata = obfuscate(data).decode('utf-8')
    print newdata == s
simple_string = u'Just plain ASCII'
unicode_string = (u'sensei = \N{HIRAGANA LETTER SE}\N{HIRAGANA LETTER N}'
                  u'\N{HIRAGANA LETTER SE}\N{HIRAGANA LETTER I}')
test(simple_string)
test(unicode_string)
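In the Python 3 version, itertools.cycle can replace the index arithmetic because it repeats the key for you. This is the same repeating-key XOR, just spelled differently. The name xor_cycle is hypothetical:

```python
from itertools import cycle
def xor_cycle(byt: bytes, mask: bytes = b'keyword') -> bytes:
    # zip stops at the end of byt; cycle(mask) repeats the key indefinitely
    return bytes(c ^ k for c, k in zip(byt, cycle(mask)))
original = 'sensei = せんせい'
data = xor_cycle(original.encode())
print(len(data), xor_cycle(data).decode())
```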
It depends on the size of your input: if it's over 1K, using numpy is about 60x faster (it runs in less than 2% of the time of the naïve Python code).
import time
import numpy as np
mask = b'We are the knights who say "Ni"!'
mask_length = len(mask)
def mask_python(val: bytes) -> bytes:
    return bytes(c ^ mask[i % mask_length] for i, c in enumerate(val))
def mask_numpy(val: bytes) -> bytes:
    arr = np.frombuffer(val, dtype=np.int8)
    length = len(val)  # use the argument, not the global `value`
    # Repeat the mask enough times to cover the input, then trim it to length
    np_mask = np.tile(np.frombuffer(mask, dtype=np.int8), round(length/mask_length+0.5))[:length]
    masked = arr ^ np_mask
    return masked.tobytes()
value = b'0123456789'
for i in range(9):
    start_py = time.perf_counter()
    masked_py = mask_python(value)
    end_py = time.perf_counter()
    start_np = time.perf_counter()
    masked_np = mask_numpy(value)
    end_np = time.perf_counter()
    assert masked_py == masked_np
    print(f"{i+1} {len(value)} {end_py-start_py} {end_np-start_np}")
    value = value * 10
Note: I'm a novice with numpy, if anyone has any comments on my code I would be very happy to hear about it in comments.
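Another option that stays within the standard library is to XOR the whole buffer as one big integer via int.from_bytes. This is a different technique from the numpy code above, offered as a sketch. The name mask_bigint is hypothetical, and timings will vary, so measure it with the same benchmark loop before relying on it:

```python
mask = b'We are the knights who say "Ni"!'
def mask_bigint(val: bytes) -> bytes:
    n = len(val)
    # Repeat the mask until it is at least n bytes long, then trim to exactly n
    full_mask = (mask * (n // len(mask) + 1))[:n]
    x = int.from_bytes(val, 'big') ^ int.from_bytes(full_mask, 'big')
    # to_bytes(n, ...) restores any leading zero bytes, so the length is preserved
    return x.to_bytes(n, 'big')
value = b'0123456789' * 100
masked = mask_bigint(value)
print(mask_bigint(masked) == value)  # True: XOR is its own inverse
```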
Use codecs with hex encoding, like:
>>> codecs.encode(b'test/jimmy', 'hex')
b'746573742f6a696d6d79'
>>> codecs.decode(b'746573742f6a696d6d79', 'hex')
b'test/jimmy'
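Hex encoding doubles the size (two characters per byte), which the question wanted to avoid. In Python 3.5+ the same conversion is also available directly on bytes:

```python
data = b'test/jimmy'
encoded = data.hex()              # '746573742f6a696d6d79'
decoded = bytes.fromhex(encoded)  # b'test/jimmy'
print(len(data), len(encoded))    # 10 20
```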
I'm new to Python, coming from Java and C. How can I increment a char? In Java or C, chars and ints are practically interchangeable, and in certain loops it's very useful to me to be able to increment chars and index arrays by chars.
How can I do this in Python? It's bad enough not having a traditional for(;;) looper - is there any way I can achieve what I want to achieve without having to rethink my entire strategy?
In Python 2.x, just use the ord and chr functions:
>>> ord('c')
99
>>> ord('c') + 1
100
>>> chr(ord('c') + 1)
'd'
Python 3.x makes this more organized and interesting, due to its clear distinction between bytes and unicode. By default, a "string" is unicode, so the above works (ord receives Unicode chars and chr produces them).
But if you're interested in bytes (such as for processing some binary data stream), things are even simpler:
>>> bstr = bytes('abc', 'utf-8')
>>> bstr
b'abc'
>>> bstr[0]
97
>>> bytes([97, 98, 99])
b'abc'
>>> bytes([bstr[0] + 1, 98, 99])
b'bbc'
"bad enough not having a traditional for(;;) looper"?? What?
Are you trying to do
import string
for c in string.lowercase:
    ...do something with c...
Or perhaps you're using string.uppercase or string.letters? (In Python 3 these are string.ascii_lowercase, string.ascii_uppercase and string.ascii_letters.)
Python doesn't have for(;;) because there are often better ways to do it, and it doesn't have character math because that isn't necessary either.
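If you do need a numeric index for each letter, enumerate() provides it without any character arithmetic. Here is a small Python 3 sketch:

```python
import string
# Map each letter to its position: {'a': 0, 'b': 1, ..., 'z': 25}
position = {c: i for i, c in enumerate(string.ascii_lowercase)}
for i, c in enumerate(string.ascii_lowercase[:3]):
    print(i, c)  # 0 a, then 1 b, then 2 c
print(position['z'])  # 25
```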
Check this, using a for loop:
x = 'A'
for a in range(5):
    val = chr(ord(x) + a)
    print(val)
Loop output (one letter per line): A B C D E
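If you want the letters as one string rather than printed one per line, you can feed the same chr(ord(...) + offset) arithmetic into str.join. A minimal sketch:

```python
start = 'A'
# Code points ord('A') .. ord('A') + 4, converted back to characters
letters = ''.join(chr(ord(start) + a) for a in range(5))
print(letters)  # ABCDE
```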
I came from PHP, where you can increment a char (A to B, Z to AA, AA to AB, etc.) using the ++ operator. I made a simple function that does the same in Python. You can also change the list of chars to whatever you need (lowercase, uppercase, etc.).
import re
# Increment char (A -> B, Z -> AA, AZ -> BA)
def inc_char(text, chlist='ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    # Unique and sort
    chlist = ''.join(sorted(set(str(chlist))))
    chlen = len(chlist)
    if not chlen:
        return ''
    text = str(text)
    # Remove all chars not in chlist (re.escape guards against regex
    # metacharacters such as ']' or '^' in chlist)
    text = re.sub('[^' + re.escape(chlist) + ']', '', text)
    if not len(text):
        return chlist[0]
    # Increment, carrying over from the rightmost char
    inc = ''
    over = False
    for i in range(1, len(text)+1):
        lchar = text[-i]
        pos = chlist.find(lchar) + 1
        if pos < chlen:
            inc = chlist[pos] + inc
            over = False
            break
        else:
            inc = chlist[0] + inc
            over = True
    if over:
        inc += chlist[0]
    result = text[0:-len(inc)] + inc
    return result
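You can also treat the PHP-style sequence (A, B, ..., Z, AA, AB, ...) as a bijective base-26 number, like spreadsheet column names: convert it to an integer, add one, and convert back. This is a different approach from inc_char above. The sketch uses hypothetical helper names and assumes the text contains only A-Z:

```python
import string
ALPHABET = string.ascii_uppercase  # assumption: A-Z, like inc_char's default chlist
def to_number(text: str) -> int:
    """Read text as a bijective base-26 numeral: A=1, ..., Z=26, AA=27."""
    n = 0
    for ch in text:
        n = n * len(ALPHABET) + ALPHABET.index(ch) + 1
    return n
def to_text(n: int) -> str:
    """Inverse of to_number for n >= 1 (returns '' for 0)."""
    digits = []
    while n > 0:
        n, rem = divmod(n - 1, len(ALPHABET))
        digits.append(ALPHABET[rem])
    return ''.join(reversed(digits))
def increment(text: str) -> str:
    return to_text(to_number(text) + 1)
print(increment('A'), increment('Z'), increment('AZ'), increment('ZZ'))  # B AA BA AAA
```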
You can also increment a character using ascii_letters from the string module. ascii_letters is a string containing all English letters, lowercase followed by uppercase:
>>> from string import ascii_letters
>>> ascii_letters[ascii_letters.index('a') + 1]
'b'
>>> ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
It can also be done manually:
>>> letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> letters[letters.index('c') + 1]
'd'
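Both lookups raise IndexError for the last character ('Z' here). If you want wraparound instead, take the index modulo the length. The helper next_letter is hypothetical:

```python
from string import ascii_letters
def next_letter(c, letters=ascii_letters):
    # Modulo wraps the last letter back around to the first one
    return letters[(letters.index(c) + 1) % len(letters)]
print(next_letter('a'), next_letter('z'), next_letter('Z'))  # b A a
```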
def doubleChar(text):
    result = ''
    for char in text:
        result += char * 2
    return result
print(doubleChar("amar"))
Output:
aammaarr
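Building the result with += works, but a more compact and common spelling joins a generator instead and produces the same output. The name double_char is hypothetical:

```python
def double_char(text):
    # Repeat each character twice and concatenate the pieces
    return ''.join(ch * 2 for ch in text)
print(double_char("amar"))  # aammaarr
```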
I made the following as a test:
string_1 = "abcd"
def test(string_1):
    i = 0
    p = ""
    x = len(string_1)
    while i < x:
        y = string_1[i]
        i = i + 1
        s = chr(ord(y) + 1)
        p = p + s
    print(p)
test(string_1)
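To shift every character of a string at once, with 'z' wrapping around to 'a', str.maketrans plus str.translate is a common alternative to the while loop. The sketch below only shifts lowercase ASCII letters and passes everything else through unchanged:

```python
import string
lower = string.ascii_lowercase
# Translation table: a->b, b->c, ..., y->z, z->a
shift_one = str.maketrans(lower, lower[1:] + lower[0])
print("abcd".translate(shift_one))  # bcde
print("xyz!".translate(shift_one))  # yza!
```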