What is the largest value Python 3's int can hold?

I read that sys.maxsize is the largest value Python 3's int can hold.
However, it seems not to be the case; I can put in a much bigger number and it still does not overflow.
What is the limit that int can hold in Python 3? I am asking because I am converting a string to an integer, and I am wondering whether I need to worry about the possibility of overflow during the conversion.

From the docs what's new page:
The sys.maxint constant was removed, since there is no longer a limit
to the value of integers. However, sys.maxsize can be used as an
integer larger than any practical list or string index. It conforms to
the implementation’s “natural” integer size and is typically the same
as sys.maxint in previous releases on the same platform (assuming the
same build options).

Related

In Python, what is `sys.maxsize`?

I assumed that this number (2^63 - 1) was the maximum value Python could handle, or store as a variable. But these commands seem to be working fine:
>>> sys.maxsize
9223372036854775807
>>> a=sys.maxsize + 1
>>> a
9223372036854775808
So is there any significance at all? Can Python handle arbitrarily large numbers, if computation resources permit?
Note, here's the print-out of my version:
>>> sys.version
'3.5.2 |Anaconda custom (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]'
Python can handle arbitrarily large integers in computation. Any integer too big to fit in 64 bits (or whatever the underlying hardware limit is) is handled in software. For that reason, Python 3 doesn't have a sys.maxint constant.
The value sys.maxsize, on the other hand, reports the platform's pointer size, and that limits the size of Python's data structures such as strings and lists.
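For example, arithmetic well past sys.maxsize just works (a quick sketch; the exact bit count depends on the build):
import sys

big = sys.maxsize ** 4        # far beyond any 64-bit word
print(big > sys.maxsize)      # True
print(big.bit_length())       # roughly 252 on a 64-bit build; still an exact int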
Documentation for sys.maxsize:
An integer giving the maximum value a variable of type Py_ssize_t can take. It's usually 2**31 - 1 on a 32-bit platform and 2**63 - 1 on a 64-bit platform. (Python 3 docs)
The largest positive integer supported by the platform's Py_ssize_t type, and thus the maximum size lists, strings, dicts, and many other containers can have. (Python 2 docs)
What is Py_ssize_t?
It is an index type (number type for indexing things, like lists). It is the signed version of size_t (from the C language).
We don't use an ordinary Python int for this, because Python's int is unbounded.
We don't use C's size_t either, because we want to support negative indexing; in Python we can do my_list[-4:]. So Py_ssize_t covers a signed range of both negative and positive values.
The _t stands for type, to inform developers that size_t is a type name, not a variable. Just a convention.
So what is the effect of having a limit on Py_ssize_t? Why does this limit list, strings, dict size?
There is no way to index a list with an element larger than this. A list cannot get bigger than this, because indexing only accepts a Py_ssize_t.
In the dictionary case, Py_ssize_t is used as the hash. Python doesn't use linked lists in its dictionary implementation; it uses open addressing/probing, where if a collision is found, we follow a systematic procedure to find another slot for the key and value. So you can't have more than sys.maxsize entries in a dictionary in Python.
In all practical cases (on 64-bit machines, i.e. probably you), you will run out of memory before you max out Py_ssize_t. Trying dict.fromkeys(range(sys.maxsize + 5)) never got there; it just slowed my computer down.
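To see the Py_ssize_t limit in action without exhausting memory, you can try indexing with a value that doesn't fit (a small sketch; the exact error message may vary across CPython versions):
import sys

try:
    'abc'[sys.maxsize + 1]    # index doesn't fit in a Py_ssize_t
except IndexError as e:
    print(e)                  # e.g. "cannot fit 'int' into an index-sized integer"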

In Python, how can I translate *(1+(int*)&x)?

This question is a follow-up to this one. In Sun's math library (in C), the expression
*(1+(int*)&x)
is used to retrieve the high word of the floating-point number x. Here, the OS is assumed to be 64-bit, with little-endian representation.
I am wondering how to translate the C expression above into Python. The difficulty here is how to translate the '&' and '*' in the expression. By the way, maybe Python has some built-in function that retrieves the high word of a floating-point number?
You can do this more easily with struct:
import struct

def get_high_word(x):
    # Pack x as a little-endian IEEE 754 double, keep the four most
    # significant bytes, and reinterpret them as a 32-bit integer.
    high_word = struct.pack('<d', x)[4:8]
    return struct.unpack('<i', high_word)[0]
Here, high_word is a bytes object (or a str in 2.x) consisting of the four most significant bytes of x in little endian order (using IEEE 64-bit floating point format). We then unpack it back into a 32-bit integer (which is returned in a singleton tuple, hence the [0]).
This always uses little-endian for everything, regardless of your platform's underlying endianness. If you need to use native endianness, replace the < with = (and use > or ! to force big endian). It also guarantees 64-bit doubles and 32-bit ints, which C does not. You can remove that guarantee as well, but there is no good reason to do so since it makes your question nonsensical.
While this could be done with pointer arithmetic, it would involve messing around with ctypes and the conversion from Python float to C float would still be relatively expensive. The struct code is much easier to read.
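As a quick sanity check, the high word of 1.0 should be 0x3FF00000 (sign bit 0, biased exponent 0x3FF, top mantissa bits all zero):
>>> get_high_word(1.0)
1072693248
>>> hex(get_high_word(1.0))
'0x3ff00000'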

Maximum and Minimum values for ints

How do I represent minimum and maximum values for integers in Python? In Java, we have Integer.MIN_VALUE and Integer.MAX_VALUE.
See also: What is the maximum float in Python?
Python 3
In Python 3, this question doesn't apply. The plain int type is unbounded.
However, you might actually be looking for information about the current interpreter's word size, which will be the same as the machine's word size in most cases. That information is still available in Python 3 as sys.maxsize, which is the maximum value representable by a signed word. Equivalently, it's the size of the largest possible list or in-memory sequence.
Generally, the maximum value representable by an unsigned word will be sys.maxsize * 2 + 1, and the number of bits in a word will be math.log2(sys.maxsize * 2 + 2). See this answer for more information.
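For example (a small sketch; the values shown assume a typical 64-bit build):
import sys, math

unsigned_max = sys.maxsize * 2 + 1          # e.g. 2**64 - 1
word_bits = math.log2(sys.maxsize * 2 + 2)  # e.g. 64.0
print(unsigned_max, word_bits)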
Python 2
In Python 2, the maximum value for plain int values is available as sys.maxint:
>>> sys.maxint # on my system, 2**63-1
9223372036854775807
You can calculate the minimum value with -sys.maxint - 1 as shown in the docs.
Python seamlessly switches from plain to long integers once you exceed this value. So most of the time, you won't need to know it.
If you just need a number that's bigger than all others, you can use
float('inf')
in similar fashion, a number smaller than all others:
float('-inf')
This works in both Python 2 and 3.
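A common use is as an initial sentinel when scanning for a minimum (an illustrative sketch):
smallest = float('inf')
for v in [3.5, -2.0, 7.1]:
    if v < smallest:
        smallest = v
print(smallest)   # -2.0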
The sys.maxint constant has been removed from Python 3.0 onward; use sys.maxsize instead.
Integers
PEP 237: Essentially, long renamed to int. That is, there is only one built-in integral type, named int; but it behaves mostly like the old long type.
...
The sys.maxint constant was removed, since there is no longer a limit to the value of integers. However, sys.maxsize can be used as an integer larger than any practical list or string index. It conforms to the implementation’s “natural” integer size and is typically the same as sys.maxint in previous releases on the same platform (assuming the same build options).
For Python 3, it is
import sys
max_size = sys.maxsize
min_size = -sys.maxsize - 1
In Python 2, integers automatically switch from a fixed-size int representation to a variable-width long representation once you pass the value sys.maxint, which is either 2**31 - 1 or 2**63 - 1 depending on your platform. Notice the L that gets appended here:
>>> 9223372036854775807
9223372036854775807
>>> 9223372036854775808
9223372036854775808L
From the Python manual:
Numbers are created by numeric literals or as the result of built-in functions and operators. Unadorned integer literals (including binary, hex, and octal numbers) yield plain integers unless the value they denote is too large to be represented as a plain integer, in which case they yield a long integer. Integer literals with an 'L' or 'l' suffix yield long integers ('L' is preferred because 1l looks too much like eleven!).
Python tries very hard to pretend its integers are mathematical integers and are unbounded. It can, for instance, calculate a googol with ease:
>>> 10**100
10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000L
You may use 'inf' like this:
import math
bool_true = 0 < math.inf
bool_false = 0 < -math.inf
Refer: math — Mathematical functions
If you want the max for array or list indices (equivalent to size_t in C/C++), you can use numpy:
import numpy as np

np.iinfo(np.intp).max
This is the same as sys.maxsize; the advantage is that you don't need to import sys just for this.
If you want max for native int on the machine:
np.iinfo(np.intc).max
You can look at the other available types in the docs.
For floats you can also use sys.float_info.max.
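Putting that together (a sketch; the equality with sys.maxsize assumes a typical build where np.intp matches Py_ssize_t):
import sys
import numpy as np

print(np.iinfo(np.intp).max == sys.maxsize)  # True on typical builds
print(np.iinfo(np.intc).max)                 # native C int max, e.g. 2**31 - 1
print(sys.float_info.max)                    # largest finite float, e.g. 1.7976931348623157e+308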
sys.maxsize is not actually the maximum integer value supported. You can double maxsize, or multiply it by itself, and it remains a valid and correct value.
However, if you try sys.maxsize ** sys.maxsize, it will hang your machine for a significant amount of time. As many have pointed out, a byte or bit size is not really relevant here, because it practically doesn't exist: Python just happily expands its integers when it needs more memory. So in general there is no limit.
Now, if you're talking about packing or storing integers in a safe way where they can later be retrieved with integrity, then of course that is relevant. I'm really not sure about packing, but I know Python's pickle module handles those things well. String representations obviously have no practical limit.
So really, the bottom line is: what is your application's limit? What does it require for numeric data? Use that limit instead of Python's practically nonexistent integer limit.
I rely heavily on commands like this.
python -c 'import sys; print(sys.maxsize)'
Max int returned: 9223372036854775807
For more references on 'sys', see:
https://docs.python.org/3/library/sys.html
https://docs.python.org/3/library/sys.html#sys.maxsize
The code given below will help you. For the maximum value you can use sys.maxsize, and for the minimum you can negate the same value and use it.
import sys
ni = sys.maxsize
print(ni)

floats inside tuples changing values when accessed

So I have a list of tuples of two floats each. Each tuple represents a range. I am going through another list of floats, which represent values to be fitted into the ranges. All of these floats are < 1 but positive, so precision matters. One of my tests to determine if a value fits into a range is failing when it should pass. If I print the value and the range that is causing problems, I can tell this much:
curValue = 0.00145000000671
range = (0.0014500000067055225, 0.0020968749796738849)
The conditional that is failing is:
if curValue > range[0] and ... blah :
# do some stuff
From the values given by curValue and range, the test should clearly pass (don't worry about what is in the conditional). Now, if I print explicitly what the value of range[0] is I get:
range[0] = 0.00145000000671
Which would explain why the test is failing. So my question, then, is why the float is changing when it is accessed. It has decimal values available up to a certain precision when part of a tuple, and a different precision when accessed alone. Why would this be? What can I do to ensure my data maintains a consistent amount of precision across my calculations?
The float doesn't change. The built-in numeric types are all immutable. The cause of what you're observing is that:
print range[0] uses str on the float, which (up until very recent versions of Python) printed less digits of a float.
Printing a tuple (be it with repr or str) uses repr on the individual items, which gives a much more accurate representation (again, this isn't true anymore in recent releases which use a better algorithm for both).
As for why the condition doesn't work out the way you expect, it's probably the usual culprit, the limited precision of floats. Try print repr(curValue), repr(range[0]) to see what Python decided was the closest possible representation of your float literal.
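For example, on the older interpreters this answer describes (a sketch; exact output depends on your Python version):
>>> x = 0.0014500000067055225
>>> str(x)     # str truncates to fewer significant digits
'0.00145000000671'
>>> repr(x)    # repr round-trips the stored value
'0.0014500000067055225'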
On modern PCs, floats aren't infinitely precise. Even if you enter pi as a constant to 100 decimals, only a few of them will be stored accurately. The same is happening to you: a Python float is a double-precision value with 53 bits of mantissa, which limits your precision (and in unexpected ways, because it works in base 2).
Please note, 0.00145000000671 isn't the exact value as stored by Python. Python only displays a few decimals of the complete stored float when you use print. If you want to see exactly how Python stores the float, use repr.
If you want better precision, use the decimal module.
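A minimal sketch of the decimal approach, constructing Decimals from strings so they don't inherit the float rounding error:
from decimal import Decimal

lo = Decimal('0.0014500000067055225')
curValue = Decimal('0.00145000000671')
print(curValue > lo)   # True: exact decimal comparison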
It isn't changing per se. Python is doing its best to store the data as a float, but the number is too precise for a float, so Python rounds it before it is even accessed (in the very process of storing it). Funny how something so small is such a big pain.
You need to use an arbitrary-precision fixed-point module like Simple Python Fixed Point or the decimal module.
Not sure it would work in this case, because I don't know whether Python is limiting precision in the output or in the storage itself, but you could try doing:
if curValue - range[0] > 0 and...
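If the underlying goal is a tolerance-based comparison, making the tolerance explicit is the usual fix. A sketch using math.isclose (available since Python 3.5); the names here mirror the question:
import math

curValue = 0.00145000000671
range_lo = 0.0014500000067055225

if curValue > range_lo or math.isclose(curValue, range_lo, rel_tol=1e-9):
    print("value fits the lower bound")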

How do languages such as Python overcome C's Integral data limits?

While doing some random experimentation with a factorial program in C, Python, and Scheme, I came across this fact:
In C, using the 'unsigned long long' data type, the largest factorial I can print is that of 65, which is '9223372036854775808', i.e. 19 digits, as specified here.
In Python, I can find the factorial of a number as large as 999, which consists of many more than 19 digits.
How does CPython achieve this? Does it use a data type like 'octaword'?
I might be missing some fundamental facts here. So, I would appreciate some insights and/or references to read. Thanks!
UPDATE: Thank you all for the explanation. Does that mean CPython is using the GNU Multi-Precision library (or some other similar library)?
UPDATE 2: I was looking for Python's 'bignum' implementation in the sources. Where exactly is it? It's here: http://svn.python.org/view/python/trunk/Objects/longobject.c?view=markup. Thanks Baishampayan.
It's called Arbitrary Precision Arithmetic. There's more here: http://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic
Looking at the Python source code, it seems the long type (at least in pre-Python 3 code) is defined in longintrepr.h like this -
/* Long integer representation.
The absolute value of a number is equal to
SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
Negative numbers are represented with ob_size < 0;
zero is represented by ob_size == 0.
In a normalized number, ob_digit[abs(ob_size)-1] (the most significant
digit) is never zero. Also, in all cases, for all valid i,
0 <= ob_digit[i] <= MASK.
The allocation function takes care of allocating extra memory
so that ob_digit[0] ... ob_digit[abs(ob_size)-1] are actually available.
CAUTION: Generic code manipulating subtypes of PyVarObject has to be
aware that longs abuse ob_size's sign bit.
*/
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
The actual usable interface of the long type is then defined in longobject.h by creating a new type PyLongObject like this -
typedef struct _longobject PyLongObject;
And so on.
There is more stuff happening inside longobject.c, you can take a look at those for more details.
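You can also inspect the digit parameters from Python itself (output shown is typical for a 64-bit CPython; the exact fields vary slightly by version):
>>> import sys
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)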
Data types such as int in C are directly mapped (more or less) to the data types supported by the processor. So the limits on C's int are essentially the limits imposed by the processor hardware.
But one can implement one's own int data type entirely in software. You can, for example, use an array of digits as your underlying representation. Maybe like this (in Java):
class MyInt {
    private int[] digits;

    public MyInt(int noOfDigits) {
        digits = new int[noOfDigits];
    }
}
Once you do that, you can use this class to store integers containing as many digits as you want, as long as you don't run out of memory.
Perhaps Python is doing something like this inside its virtual machine. You may want to read this article on Arbitrary Precision Arithmetic to get the details.
Not octaword. CPython implements a bignum structure to store arbitrary-precision numbers.
Python assigns to long integers (all ints in Python 3) just as much space as they need -- an array of "digits" (base being a power of 2) allocated as needed.
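You can observe that allocation from Python with sys.getsizeof (a sketch; exact byte counts vary by version and platform):
import sys

for n in (0, 2**30, 2**60, 2**120):
    print(n.bit_length(), sys.getsizeof(n))   # memory use grows with bit length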
