What's actually happening when I convert an int to a string? - python

I understand it's easy to convert an int to a string by using the built-in function str(). However, what's actually happening? I understand it may point to the __str__ method of the int object, but how does it then compute the "informal" string representation? I tried looking at the source and didn't find a lead; any help appreciated.

Python repeatedly divides the int by 10 and uses % 10 to get the decimal digits one by one.
Just to make sure we're looking at the right code, here's the function Python 2.7 uses to convert ints to strings:
static PyObject *
int_to_decimal_string(PyIntObject *v) {
    char buf[sizeof(long)*CHAR_BIT/3+6], *p, *bufend;
    long n = v->ob_ival;
    unsigned long absn;
    p = bufend = buf + sizeof(buf);
    absn = n < 0 ? 0UL - n : n;
    do {
        *--p = '0' + (char)(absn % 10);
        absn /= 10;
    } while (absn);
    if (n < 0)
        *--p = '-';
    return PyString_FromStringAndSize(p, bufend - p);
}
This allocates enough space to store the characters of the string, then fills the digits in one by one, starting at the end. When it's done with the digits, it sticks a - sign on the front if the number is negative and constructs a Python string object from the characters. Translating that into Python, we get the following:
def int_to_decimal_string(n):
    enough = 27  # assumption: sizeof(long)*CHAR_BIT/3+6 for an 8-byte long
    chars = [None] * enough  # enough room for any C long's string representation
    abs_n = abs(n)
    i = 0
    while True:
        i += 1
        chars[-i] = str(abs_n % 10)  # chr(ord('0') + abs_n % 10) is more accurate
        abs_n //= 10
        if not abs_n:
            break
    if n < 0:
        i += 1
        chars[-i] = '-'
    return ''.join(chars[-i:])

Internally, the int object is stored in two's complement representation, as in C (at least while the value fits in that range; Python automatically switches to another representation when it no longer fits).
To get the string representation you have to turn that number into a string (and a string is merely an immutable sequence of characters). The algorithm is simple arithmetic: divide the number by 10 (integer division) and keep the remainder; adding that remainder to the character code of '0' gives you the units digit. Repeat with the quotient until it reaches zero. It's as simple as that.
This approach works with any integer representation, but of course it is more efficient to call the ltoa C library function, or equivalent C code, where possible than to code it in Python.
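The steps described above can be sketched in pure Python. This is only an illustration of the idea, not CPython's actual implementation, and the name `digits_to_string` is made up for this example.

```python
def digits_to_string(n):
    """Build the decimal string of an int by repeated division by 10."""
    if n == 0:
        return "0"
    negative = n < 0
    n = abs(n)
    chars = []
    while n:
        n, remainder = divmod(n, 10)               # quotient and last digit
        chars.append(chr(ord("0") + remainder))    # digit value -> character
    if negative:
        chars.append("-")
    # Digits were produced least significant first, so reverse them.
    return "".join(reversed(chars))


print(digits_to_string(-1234))  # -1234
```

Because the digits come out in reverse order, the C version fills its buffer from the end, while this sketch simply reverses the list at the end.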

When you call str() on an object, it calls its class's __str__ magic method.
For example:
class NewThing:
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return self.name
From there you can call str() on the object, or use it directly in strings.
>>> thing = NewThing("poop")
>>> print(thing)
poop
More info on magic methods here
Not sure if this is what you wanted, but I can't comment yet to ask clarifying questions.
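To tie this back to the original question, here is a small Python 3 sketch showing that built-in ints go through the same protocol as user-defined classes. The `Temperature` class is a hypothetical example, not something from the question.

```python
class Temperature:
    """Hypothetical class used only to illustrate __str__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return f"{self.degrees} degrees"


t = Temperature(21)
print(str(t))          # str() looks up __str__ on the object's type
print(f"It is {t}")    # with no format spec, formatting falls back to __str__ here

# Built-in ints follow the same protocol.
print(str(42) == type(42).__str__(42))
```

For int, the `__str__` slot is implemented in C, which is where the digit-by-digit conversion actually happens.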

Related

Python recursion (format issue)

Write a recursive function replace_digit(n, d, r) which replaces each occurrence of digit d in the number n by r.
replace_digit(31242154125, 1, 0) => 30242054025
My code is as such
def replace_digit(n, d, r):
    y=str(n)
    if len(y)==0:
        return ''
    else:
        if y[0]== str(d):
            return str(r) + replace_digit(str(n)[1:],d,r)
        else:
            return y[0]+ replace_digit(str(n)[1:],d,r)
However, the answer I get is in a string format. Any idea how to convert into an integer format? I have been stuck for quite some time on this :(
If your recursive function must return an integer, then return integers. You can always convert the returned integer back into a string for recursive calls.
You'll have to stop when you run out of digits before calling, so only recurse if there are 2 or more characters in y.
However, this approach has a big problem: leading zeros are dropped when converting to int():
>>> int('025')
25
You have two options here:
Pad the number when you convert to a string (using str.zfill() or format(), and use the length of the value you passed into the recursive call).
Recurse from the end. This would also allow you to not use strings.
Here is an approach using zero-padding:
def replace_digit(n, d, r):
    nstr = str(n)
    first, rest = nstr[0], nstr[1:]
    if rest:
        rest = str(replace_digit(rest, d, r)).zfill(len(rest))
    if first == str(d):
        first = str(r)
    return int(first + rest)
Note that you always want to separate out the first character from the tail anyway, so I used variables for both.
This way, you can use if rest: to guard against recursing when there are no digits left, and you can call str() on the return value. The function returns the int() conversion of the (possibly replaced first value) with the updated rest value.
Demo:
>>> replace_digit(31242154125, 1, 0)
30242054025
Recursing from the opposite end would not have problems with zeros, except if the input value was 0 to begin with. However, you could instead use division and modulo operations to work on the integer value directly:
number % 10 gives you the right-most digit, as an integer.
number // 10 gives you the remaining numbers, again as integer.
You could combine the two operations into one using the divmod() function. Personally, I don't do so, as I don't think it particularly improves readability, and using the operators is slightly faster when using CPython.
You can re-combine the recursive call result with the (possibly replaced) last digit by multiplying the returned value by 10 again:
def replace_digit(n, d, r):
    head, last = n // 10, n % 10
    if head:
        head = replace_digit(head, d, r)
    if last == d:
        last = r
    return (head * 10) + last
This works for any natural number, including 0:
>>> replace_digit(0, 1, 0)
0
>>> replace_digit(0, 0, 1)
1
>>> replace_digit(31242154125, 1, 0)
30242054025
>>> replace_digit(31242154125, 4, 9)
31292159125
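For comparison, here is a sketch of the same integer-only recursion written with divmod(), as mentioned above. The name `replace_digit_divmod` is made up for this example; the behaviour is intended to match the version using // and %.

```python
def replace_digit_divmod(n, d, r):
    """Replace every digit d in the non-negative integer n with the digit r."""
    head, last = divmod(n, 10)  # same pair as (n // 10, n % 10)
    if head:
        # Only recurse while there are digits left in front of the last one.
        head = replace_digit_divmod(head, d, r)
    if last == d:
        last = r
    return head * 10 + last


print(replace_digit_divmod(31242154125, 1, 0))  # 30242054025
```

Whether this reads better than the two separate operators is a matter of taste.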

Display the middle elements of a string

Recently I started learning to program, and after finishing my first tutorial I am trying to tackle some problems from codewars.com.
"You are going to be given a word. Your job is to return the middle character of the word. If the word's length is odd, return the middle character. If the word's length is even, return the middle 2 characters."
Here is my solution:
def get_middle(n):
    if len(n) % 2 == 0:
        return n[(len(n)/2) - 1] and n[(len(n)/2)]
    else:
        return n[(len(n)/2) + 0.5]
Unfortunately when executing the function with for example "abc" I always get:
Traceback (most recent call last) <ipython-input-24-46429b2608e5> in <module>
----> 1 print(get_middle("abc"))
<ipython-input-23-56ccbf5e17f7> in get_middle(n)
3 return n[(len(n)/2) - 1] and n[(len(n)/2)]
4 else:
----> 5 return n[(len(n)/2) + 1]
TypeError: string indices must be integers
I don't understand why I always get this kind of error. Aren't all my string indices integers?
I know there are a lot of different solutions out there, but I really would like to know why mine isn't working the way I intended it to.
Thanks in advance!
In Python, there are two kinds of division: integer division and float division.
print(4 / 2)
---> 2.0
print(4 // 2)
---> 2
In Python 2, dividing one integer by another with / produced an integer.
Because Python does not declare data types in advance, you could not tell from the expression alone whether / would produce an integer or a float.
Since floats lose precision, it's not advisable to use them in integer calculations.
To solve this problem, Python 3 (and from __future__ import division in Python 2) changed the meaning of / and provides a separate integer division operator, //.
Now, / always performs float division, and
// performs integer (floor) division.
def get_middle(n):
    if len(n) % 2 == 0:
        # "a and b" would return only b, so slice out both middle characters
        return n[len(n)//2 - 1 : len(n)//2 + 1]
    else:
        return n[len(n)//2]
The issue with your code is that division with / automatically produces a float, and Python complains when a float is used as an index. The simple solution is to add a second / symbol to the division so the result stays an integer. Note also that n[i] and n[j] evaluates to just n[j], so the even case needs a slice to return both characters:
def get_middle(n):
    if len(n) % 2 == 0:
        return n[(len(n)//2) - 1 : (len(n)//2) + 1]
    else:
        return n[len(n)//2]
Try math.floor:
import math
def get_middle(value):
    length = len(value)
    if length % 2 == 0:
        # even length, pick the middle 2 characters
        start = length // 2 - 1
        end = length // 2 + 1
    else:
        # odd length, pick the middle character
        start = math.floor(length // 2)
        end = start + 1
    return value[start:end]
A suggestion if you are learning programming: try to break down your steps rather than doing it all in one line. It helps a lot when trying to understand the error messages.
If you divide an odd integer by 2 with the / operator, you get a float. This float must be explicitly converted to an integer before it is used as an index.
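The points made in the answers above can be checked directly in Python 3. The snippet below is a minimal sketch; the helper name `middle` is made up for this example and is just one way to write the slice.

```python
word = "abc"

print(type(len(word) / 2))   # true division always yields a float
print(type(len(word) // 2))  # floor division of two ints yields an int

try:
    word[len(word) / 2]      # a float is not a valid string index
except TypeError as exc:
    print("TypeError:", exc)

# "and" returns one of its operands, never both of them.
print("a" and "b")           # b


def middle(word):
    """Return the middle character, or the middle two for even lengths."""
    half = len(word) // 2
    start = half - 1 if len(word) % 2 == 0 else half
    return word[start:half + 1]


print(middle("abc"), middle("abcd"))  # b bc
```

Using a slice for both cases avoids the `and` pitfall, because a slice can return one or two characters from the same expression.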

Convert from decimal to any base number in Python

This function takes in any base-10 integer and returns the string representation of that number in its specified base-32 form:
def encodeN(n,N,D="0123456789qwertyuiopasdfghjklzxc"):
    return (encodeN(n//N,N)+D[n%N]).lstrip("0") if n>0 else "0"
Example:
print (encodeN(16002,32))
Output:
ya2
But I have a problem with writing a decoding function from base-32 back to base-10. How can I write it? Can I enter custom nonstandard characters to extend the base-n?
You could cheat:
tmap = str.maketrans('qwertyuiopasdfghjklzxc', 'abcdefghijklmnopqrstuv')
result = int(inputvalue.translate(tmap), 32)
Demo:
>>> tmap = str.maketrans('qwertyuiopasdfghjklzxc', 'abcdefghijklmnopqrstuv')
>>> inputvalue = 'ya2'
>>> int(inputvalue.translate(tmap), 32)
16002
int() is perfectly capable of translating arbitrary bases back to integer values; all you need to do is use the standard progression of letters. The str.translate() call above maps your custom progression to the standard.
Otherwise, take each character from your input string, starting at the left, map that to an integer from your character map, and multiply by the base N each time:
def decodeN(n, N, D={c: i for i, c in enumerate("0123456789qwertyuiopasdfghjklzxc")}):
    result = 0
    for c in n:
        result = (result * N) + D[c]
    return result
This is the slower option; str.translate() and int() both use optimised C code to do their jobs, and will always be faster than a pure-python approach.
Translating that to a recursive version to match your encodeN() implementation:
def decodeN(n, N, D={c: i for i, c in enumerate("0123456789qwertyuiopasdfghjklzxc")}):
    return decodeN(n[:-1], N) * N + D[n[-1]] if n else 0
With the same recursive structure, you could write:
def encodeN(n,N,D="0123456789qwertyuiopasdfghjklzxc"):
    return (encodeN(n//N,N)+D[n%N]).lstrip("0") if n>0 else "0"
def decodeN(n,N,D="0123456789qwertyuiopasdfghjklzxc"):
    return decodeN(n[:-1],N) * N + D.index(n[-1]) if n else 0
It seems to work fine:
print(encodeN(16002, 32))
# "ya2"
print(decodeN("ya2", 32))
# 16002
print(all(decodeN(encodeN(x, b), b) == x for b in range(2, 33) for x in range(10000)))
# True
print(all(encodeN(decodeN(str(x),32), 32) == str(x) for b in range(2, 33) for x in range(10000)))
# True
It's not very efficient though. Using a dict like MartijnPieters would be a better idea than using str.index.
>>> import string
>>> len(string.printable)
100
Judging by this, you could have up to base 100 with no issues like duplicating characters or changing the encoding. But if we take out the whitespace characters (space, \t, \n, \r, \x0b, \x0c), we get 94.
Beyond this you would have to resort to some kind of custom rules, such as duplicating characters or prefixing them.
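To illustrate the part of the question about custom characters, here is a sketch of an encoder and decoder that take the alphabet as a parameter, so the base is simply the alphabet's length. The names `encode`, `decode` and `ALPHABET_94` are made up for this example, and the code assumes non-negative integers and an alphabet without duplicates.

```python
import string


def encode(n, alphabet):
    """Encode a non-negative int using the characters of `alphabet` as digits."""
    base = len(alphabet)
    if base < 2:
        raise ValueError("alphabet needs at least two characters")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return alphabet[0]
    digits = []
    while n:
        n, rem = divmod(n, base)
        digits.append(alphabet[rem])
    return "".join(reversed(digits))


def decode(text, alphabet):
    """Inverse of encode(): map each character back to its digit value."""
    values = {ch: i for i, ch in enumerate(alphabet)}
    if len(values) != len(alphabet):
        raise ValueError("alphabet must not contain duplicate characters")
    result = 0
    for ch in text:
        result = result * len(alphabet) + values[ch]
    return result


# The 94 printable, non-whitespace ASCII characters.
ALPHABET_94 = string.digits + string.ascii_letters + string.punctuation

print(encode(16002, "0123456789qwertyuiopasdfghjklzxc"))  # ya2
print(decode(encode(123456789, ALPHABET_94), ALPHABET_94))  # 123456789
```

The dictionary lookup in `decode` avoids the repeated linear search that str.index would perform for every character.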

Python: How to convert a string of zeros and ones to binary [duplicate]

I'd simply like to convert a base-2 binary number string into an int, something like this:
>>> '11111111'.fromBinaryToInt()
255
Is there a way to do this in Python?
You use the built-in int() function, and pass it the base of the input number, i.e. 2 for a binary number:
>>> int('11111111', 2)
255
Here is documentation for Python 2, and for Python 3.
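A few extra cases may be worth knowing when using int() with a base. This is a small sketch of standard int() behaviour, using made-up sample strings.

```python
print(int("11111111", 2))    # 255
print(int("0b11111111", 2))  # 255, the 0b prefix is accepted when the base is 2
print(int("0b11111111", 0))  # 255, base 0 infers the base from the prefix

try:
    int("10201", 2)          # '2' is not a valid binary digit
except ValueError as exc:
    print("ValueError:", exc)
```

Catching ValueError is the usual way to reject strings that are not valid in the requested base.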
Just type 0b11111111 in python interactive interface:
>>> 0b11111111
255
Another way to do this is by using the bitstring module:
>>> from bitstring import BitArray
>>> b = BitArray(bin='11111111')
>>> b.uint
255
Note that the unsigned integer (uint) is different from the signed integer (int):
>>> b.int
-1
Your question is really asking for the unsigned integer representation; this is an important distinction.
The bitstring module isn't a requirement, but it has lots of performant methods for turning input into and from bits into other forms, as well as manipulating them.
Using int with base is the right way to go. I used to do this before I found int takes base also. It is basically a reduce applied on a list comprehension of the primitive way of converting binary to decimal ( e.g. 110 = 2**0 * 0 + 2 ** 1 * 1 + 2 ** 2 * 1)
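from functools import reduce  # reduce is not a built-in in Python 3
binstr = '11111111'  # example input
add = lambda x,y : x + y
reduce(add, [int(x) * 2 ** y for x, y in zip(list(binstr), range(len(binstr) - 1, -1, -1))])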
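The same manual conversion can be written without computing powers of two, by shifting the running total left for each new bit (Horner's method). This is a sketch for illustration; the name `binary_to_int` is made up, and int(s, 2) remains the practical choice.

```python
def binary_to_int(bits):
    """Convert a string of '0'/'1' characters to an int, most significant bit first."""
    result = 0
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        result = result * 2 + int(bit)  # shift left by one place, then add the new bit
    return result


print(binary_to_int("11111111"))  # 255
print(binary_to_int("110"))       # 6
```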
If you wanna know what is happening behind the scene, then here you go.
class Binary():
    def __init__(self, binNumber):
        self._binNumber = binNumber
        self._binNumber = self._binNumber[::-1]
        self._binNumber = list(self._binNumber)
        self._x = [1]
        self._count = 1
        self._change = 2
        self._amount = 0
        print(self._ToNumber(self._binNumber))
    def _ToNumber(self, number):
        self._number = number
        for i in range (1, len (self._number)):
            self._total = self._count * self._change
            self._count = self._total
            self._x.append(self._count)
        self._deep = zip(self._number, self._x)
        for self._k, self._v in self._deep:
            if self._k == '1':
                self._amount += self._v
        return self._amount
mo = Binary('101111110')
Here's another concise way to do it not mentioned in any of the above answers:
>>> eval('0b' + '11111111')
255
Admittedly, it's probably not very fast, and it's a very very bad idea if the string is coming from something you don't have control over that could be malicious (such as user input), but for completeness' sake, it does work.
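If the appeal of this approach is reusing Python's own literal parsing, ast.literal_eval gives the same result for a binary literal without executing arbitrary code. This is a sketch only; the function name `parse_binary` is made up, and int(text, 2) is still the simpler option.

```python
import ast


def parse_binary(text):
    """Parse a string of binary digits via a 0b literal, without running code."""
    value = ast.literal_eval("0b" + text)
    if not isinstance(value, int):
        raise ValueError("not a binary literal")
    return value


print(parse_binary("11111111"))  # 255

try:
    parse_binary("1 + __import__('os').getpid()")
except (ValueError, SyntaxError) as exc:
    print("rejected:", type(exc).__name__)
```

Unlike eval, literal_eval only accepts Python literals, so the function call in the second input is rejected rather than executed.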
A recursive Python implementation of the reverse direction (int to a list of bits):
def int2bin(n):
    return int2bin(n >> 1) + [n & 1] if n > 1 else [n]
If you are using Python 3.6 or later, you can use an f-string to do the conversion:
Binary to decimal:
>>> print(f'{0b1011010:#0}')
90
>>> bin_2_decimal = int(f'{0b1011010:#0}')
>>> bin_2_decimal
90
Binary to octal, hexadecimal, etc.:
>>> f'{0b1011010:#o}'
'0o132' # octal
>>> f'{0b1011010:#x}'
'0x5a' # hexadecimal
>>> f'{0b1011010:#0}'
'90' # decimal
Pay attention to the two pieces of information separated by the colon: the value on the left and the format specifier on the right.
In this way, you can convert between any of {binary, octal, hexadecimal, decimal} by changing the right side of the colon:
:#b -> converts to binary
:#o -> converts to octal
:#x -> converts to hexadecimal
:#0 -> converts to decimal as above example
Try changing left side of colon to have octal/hexadecimal/decimal.
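The same format specifiers also work with the built-in format() function, which can be handy when the value is not a literal. A short sketch, using 90 as the sample value from above:

```python
value = 0b1011010  # the literal is already the int 90

print(format(value, "d"))   # 90
print(format(value, "#b"))  # 0b1011010
print(format(value, "#o"))  # 0o132
print(format(value, "#x"))  # 0x5a
print(format(value, "b"))   # 1011010, without the prefix
```

The # flag only controls whether the 0b, 0o or 0x prefix is included; the letter selects the base.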
For a large matrix (10**5 rows and up), it is better to use a vectorized matrix multiplication. Pass in all rows and columns in one shot. It is extremely fast, and there is no looping in Python here. I originally designed it for converting many binary columns (for example, 0/1 flags for about 10 different genre columns in MovieLens) into a single integer for each example row.
import numpy as np

def BitsToIntAFast(bits):
    m,n = bits.shape  # number of rows, number of bit columns
    a = 2**np.arange(n)[::-1]  # -1 reverses array of powers of 2 of same length as bits
    return bits @ a  # matrix-vector product: one integer per row
For the record to go back and forth in basic python3:
a = 10
bin(a)
# '0b1010'
int(bin(a), 2)
# 10
eval(bin(a))
# 10

Can a string ever get shorter when converted to upper/lowercase?

A string may get longer (in terms of Unicode codepoints) when converted to upper or lower case. For example, 'ß'.upper() evaluates to 'SS'. But are there strings that get shorter? That is, does there exist a string s such that the expression
len(s.lower()) < len(s) or len(s.upper()) < len(s)
evaluates to True?
I think this may be implementation dependent. I'll answer based on the CPython source.
It seems to me that there are two possible situations where calling lower on a string can make it shorter.
1. Some combination of two Unicode points next to one another gets converted into one Unicode point.
2. A single Unicode point gets converted into an empty string.
We can determine whether case 1 is possible by examining the type signature of the internal lowercase conversion function. Here it is in Objects/unicodectype.c.
int _PyUnicode_ToLowerFull(Py_UCS4 ch, Py_UCS4 *res)
{
    const _PyUnicode_TypeRecord *ctype = gettyperecord(ch);
    if (ctype->flags & EXTENDED_CASE_MASK) {
        int index = ctype->lower & 0xFFFF;
        int n = ctype->lower >> 24;
        int i;
        for (i = 0; i < n; i++)
            res[i] = _PyUnicode_ExtendedCase[index + i];
        return n;
    }
    res[0] = ch + ctype->lower;
    return 1;
}
I don't 100% understand this code, but I observe that the first parameter ch is a single Unicode point. Since it operates only on individual characters and not character combinations, it seems like case 1 is ruled out; combinations of code points won't get turned into a smaller sequence.
With that out of the way, we can determine whether case 2 ever occurs by just iterating up to sys.maxunicode and seeing if any single value has a length of zero after lowering.
>>> import sys
>>> unicode_chars = list(map(chr, range(sys.maxunicode+1)))
>>> [x for x in unicode_chars if len(x.lower()) == 0]
[]
Looks like case 2 is also busted.
We can apply the above logic to upper as well. For case 1, the implementation for _PyUnicode_ToUpperFull is nearly identical to its lower counterpart; and for case 2, the corresponding list comprehension likewise returns an empty list.
Conclusion
Nope, lower and upper never make anything shorter.
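The brute-force check described above can be combined into one loop over every code point. This is a sketch of that check for CPython; the variable names are made up, and the result reflects whichever Unicode version your interpreter ships with.

```python
import sys

shortest = 1        # shortest result of lower()/upper() seen for any single character
longer_upper = []   # characters whose uppercase form is more than one code point

for code in range(sys.maxunicode + 1):
    ch = chr(code)
    shortest = min(shortest, len(ch.lower()), len(ch.upper()))
    if len(ch.upper()) > 1:
        longer_upper.append(ch)

print(shortest)                         # stays 1 if no character maps to an empty string
print("ß" in longer_upper, "ß".upper())
```

The loop visits a little over a million code points, so it takes a moment to run.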
