How do languages such as Python overcome C's Integral data limits? - python

While doing some random experimentation with a factorial program in C, Python and Scheme. I came across this fact:
In C, using 'unsigned long long' data type, the largest factorial I can print is of 65. which is '9223372036854775808' that is 19 digits as specified here.
In Python, I can find the factorial of a number as large as 999 which consists of a large number of digits, much more than 19.
How does CPython achieve this? Does it use a data type like 'octaword' ?
I might be missing some fundamental facts here. So, I would appreciate some insights and/or references to read. Thanks!
UPDATE: Thank you all for the explanation. Does that means, CPython is using the GNU Multi-precision library (or some other similar library)?
UPDATE 2: I am looking for Python's 'bignum' implementation in the sources. Where exactly it is? Its here at http://svn.python.org/view/python/trunk/Objects/longobject.c?view=markup. Thanks Baishampayan.

It's called Arbitrary Precision Arithmetic. There's more here: http://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic

Looking at the Python source code, it seems the long type (at least in pre-Python 3 code) is defined in longintrepr.h like this -
/* Long integer representation.
The absolute value of a number is equal to
SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
Negative numbers are represented with ob_size < 0;
zero is represented by ob_size == 0.
In a normalized number, ob_digit[abs(ob_size)-1] (the most significant
digit) is never zero. Also, in all cases, for all valid i,
0 <= ob_digit[i] <= MASK.
The allocation function takes care of allocating extra memory
so that ob_digit[0] ... ob_digit[abs(ob_size)-1] are actually available.
CAUTION: Generic code manipulating subtypes of PyVarObject has to
aware that longs abuse ob_size's sign bit.
*/
struct _longobject {
PyObject_VAR_HEAD
digit ob_digit[1];
};
The actual usable interface of the long type is then defined in longobject.h by creating a new type PyLongObject like this -
typedef struct _longobject PyLongObject;
And so on.
There is more stuff happening inside longobject.c, you can take a look at those for more details.

Data types such as int in C are directly mapped (more or less) to the data types supported by the processor. So the limits on C's int are essentially the limits imposed by the processor hardware.
But one can implement one's own int data type entirely in software. You can for example use an array of digits as your underlying representation. May be like this:
class MyInt {
private int [] digits;
public MyInt(int noOfDigits) {
digits = new int[noOfDigits];
}
}
Once you do that you may use this class and store integers containing as many digits as you want, as long as you don't run out memory.
Perhaps Python is doing something like this inside its virtual machine. You may want to read this article on Arbitrary Precision Arithmetic to get the details.

Not octaword. It implemented bignum structure to store arbitary-precision numbers.

Python assigns to long integers (all ints in Python 3) just as much space as they need -- an array of "digits" (base being a power of 2) allocated as needed.

Related

Understanding memory allocation for large integers in Python

How does Python allocate memory for large integers?
An int type has a size of 28 bytes and as I keep increasing the value of the int, the size increases in increments of 4 bytes.
Why 28 bytes initially for any value as low as 1?
Why increments of 4 bytes?
PS: I am running Python 3.5.2 on a x86_64 (64 bit machine). Any pointers/resources/PEPs on how the (3.0+) interpreters work on such huge numbers is what I am looking for.
Code illustrating the sizes:
>>> a=1
>>> print(a.__sizeof__())
28
>>> a=1024
>>> print(a.__sizeof__())
28
>>> a=1024*1024*1024
>>> print(a.__sizeof__())
32
>>> a=1024*1024*1024*1024
>>> print(a.__sizeof__())
32
>>> a=1024*1024*1024*1024*1024*1024
>>> a
1152921504606846976
>>> print(a.__sizeof__())
36
Why 28 bytes initially for any value as low as 1?
I believe #bgusach answered that completely; Python uses C structs to represent objects in the Python world, any objects including ints:
struct _longobject {
PyObject_VAR_HEAD
digit ob_digit[1];
};
PyObject_VAR_HEAD is a macro that when expanded adds another field in the struct (field PyVarObject which is specifically used for objects that have some notion of length) and, ob_digits is an array holding the value for the number. Boiler-plate in size comes from that struct, for small and large Python numbers.
Why increments of 4 bytes?
Because, when a larger number is created, the size (in bytes) is a multiple of the sizeof(digit); you can see that in _PyLong_New where the allocation of memory for a new longobject is performed with PyObject_MALLOC:
/* Number of bytes needed is: offsetof(PyLongObject, ob_digit) +
sizeof(digit)*size. Previous incarnations of this code used
sizeof(PyVarObject) instead of the offsetof, but this risks being
incorrect in the presence of padding between the PyVarObject header
and the digits. */
if (size > (Py_ssize_t)MAX_LONG_DIGITS) {
PyErr_SetString(PyExc_OverflowError,
"too many digits in integer");
return NULL;
}
result = PyObject_MALLOC(offsetof(PyLongObject, ob_digit) +
size*sizeof(digit));
offsetof(PyLongObject, ob_digit) is the 'boiler-plate' (in bytes) for the long object that isn't related with holding its value.
digit is defined in the header file holding the struct _longobject as a typedef for uint32:
typedef uint32_t digit;
and sizeof(uint32_t) is 4 bytes. That's the amount by which you'll see the size in bytes increase when the size argument to _PyLong_New increases.
Of course, this is just how CPython has chosen to implement it. It is an implementation detail and as such you wont find much information in PEPs. The python-dev mailing list would hold implementation discussions if you can find the corresponding thread :-).
Either way, you might find differing behavior in other popular implementations, so don't take this one for granted.
It's actually easy. Python's int is not the kind of primitive you may be used to from other languages, but a full fledged object, with its methods and all the stuff. That is where the overhead comes from.
Then, you have the payload itself, the integer that is being represented. And there is no limit for that, except your memory.
The size of a Python's int is what it needs to represent the number plus a little overhead.
If you want to read further, take a look at the relevant part of the documentation:
Integers have unlimited precision

In Python, how can I translate *(1+(int*)&x)?

This question is a follow-up of this one. In Sun's math library (in C), the expression
*(1+(int*)&x)
is used to retrieve the high word of the floating point number x. Here, the OS is assumed 64-bit, with little-endian representation.
I am thinking how to translate the C expression above into Python? The difficulty here is how to translate the '&', and '*' in the expression. Btw, maybe Python has some built-in function that retrieves the high word of a floating point number?
You can do this more easily with struct:
high_word = struct.pack('<d', x)[4:8]
return struct.unpack('<i', high_word)[0]
Here, high_word is a bytes object (or a str in 2.x) consisting of the four most significant bytes of x in little endian order (using IEEE 64-bit floating point format). We then unpack it back into a 32-bit integer (which is returned in a singleton tuple, hence the [0]).
This always uses little-endian for everything, regardless of your platform's underlying endianness. If you need to use native endianness, replace the < with = (and use > or ! to force big endian). It also guarantees 64-bit doubles and 32-bit ints, which C does not. You can remove that guarantee as well, but there is no good reason to do so since it makes your question nonsensical.
While this could be done with pointer arithmetic, it would involve messing around with ctypes and the conversion from Python float to C float would still be relatively expensive. The struct code is much easier to read.

How to store a number with certain precision?

What I want to do is to calculate square root of a number N and than we have to store this square root with P correct decimal values.For example
Lets,N=2 and P=100
//I want to store square root till P correct decimal values
like storing
1.41421356237309504880...till 100 values
EDIT - Initially I asked question for specifically c++ but as told by vsoftco that it is almost impossible for c++ to do this without boost.So I have tagged python too and not removed c++ tag in hope for correct answer in C++
C++ uses floating-point types (such as float, double etc) to store floating-point values and perform floating point arithmetic. The size of these types (which directly influences their precision) is implementation-defined (usually you won't get more than 128 bit of precision). Also, precision is a relative term: when your numbers grow, you have less and less precision.
So, to answer your question: it is impossible to store a number with arbitrary precision using standard C++ types. You need to use a multi-precision library for that, e.g. Boost.Multiprecision.
Example code using Boost.Multiprecison:
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <iostream>
int main()
{
using namespace boost::multiprecision;
typedef number<cpp_dec_float<100> > cpp_dec_float_100; // 100 decimal places
cpp_dec_float_100 N = 2;
cpp_dec_float_100 result = sqrt(N); // calls boost::multiprecision::sqrt
std::cout << result.str() << std::endl; // displays the result as a string
}
If you use Python, you can make use of decimal module:
from decimal import *
getcontext().prec = 100
result = Decimal.sqrt(Decimal(2))
print("Decimal.sqrt(2): {0}".format(result))
Well, there is a proposed extension to c++ that defines a decimal type family described in http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2009/n2849.pdf. There is some support for it in modern gcc version, however it's still not full, even in 4.9.2
Maybe it will be feasible for you.

Possible to store Python ints in less than 12 bytes?

Is it possible to make Python use less than 12 bytes for an int?
>>> x=int()
>>> x
0
>>> sys.getsizeof(x)
12
I am not a computer specialist but isn't 12 bytes excessive?
The smallest int I want to store is 0, the largest int 147097614, so I shouldn't really need more than 4 bytes.
(There is probably something I misunderstand here as I couldn't find an answer anywhere on the net. Keep that in mind.)
In python, ints are objects just like everything else. Because of that, there is a little extra overhead just associated with the fact that you're using an object which has some associated meta-data.
If you're going to use lots of ints, and it makes sense to lay them out in an array-like structure, you should look into numpy. Numpy ndarray objects will have a little overhead associated with them for the various pieces of meta-data that the array objects keep track of, but the actual data is stored as the datatype you specify (e.g. numpy.int32 for a 4-byte integer.)
Thus, if you have:
import numpy as np
a = np.zeros(5000,dtype=np.int32)
The array will take only slightly more than 4*5000 = 20000 bytes of your memory
Size of an integer object includes the overhead of maintaining other object information along with its value. The additional information can include object type, reference count and other implementation-specific details.
If you store many integers and want to optimize the space spent, use the array module, specifically arrays constructed with array.array('i').
Integers in python are objects, and are therefore stored with extra overhead.
You can read more information about it here
The integer type in cpython is stored in a structure like so:
typedef struct {
PyObject_HEAD
long ob_ival;
} PyIntObject;
PyObject_HEAD is a macro that expands out into a reference count and a pointer to the type object.
So you can see that:
long ob_ival - 4 bytes for a long.
Py_ssize_t ob_refcnt - I would assume to size_t here is 4 bytes.
PyTypeObject *ob_type - Is a pointer, so another 4 bytes.
12 bytes in total!

Maximum and Minimum values for ints

How do I represent minimum and maximum values for integers in Python? In Java, we have Integer.MIN_VALUE and Integer.MAX_VALUE.
See also: What is the maximum float in Python?.
Python 3
In Python 3, this question doesn't apply. The plain int type is unbounded.
However, you might actually be looking for information about the current interpreter's word size, which will be the same as the machine's word size in most cases. That information is still available in Python 3 as sys.maxsize, which is the maximum value representable by a signed word. Equivalently, it's the size of the largest possible list or in-memory sequence.
Generally, the maximum value representable by an unsigned word will be sys.maxsize * 2 + 1, and the number of bits in a word will be math.log2(sys.maxsize * 2 + 2). See this answer for more information.
Python 2
In Python 2, the maximum value for plain int values is available as sys.maxint:
>>> sys.maxint # on my system, 2**63-1
9223372036854775807
You can calculate the minimum value with -sys.maxint - 1 as shown in the docs.
Python seamlessly switches from plain to long integers once you exceed this value. So most of the time, you won't need to know it.
If you just need a number that's bigger than all others, you can use
float('inf')
in similar fashion, a number smaller than all others:
float('-inf')
This works in both python 2 and 3.
The sys.maxint constant has been removed from Python 3.0 onward, instead use sys.maxsize.
Integers
PEP 237: Essentially, long renamed to int. That is, there is only one built-in integral type, named int; but it behaves mostly like the old long type.
...
The sys.maxint constant was removed, since there is no longer a limit to the value of integers. However, sys.maxsize can be used as an integer larger than any practical list or string index. It conforms to the implementation’s “natural” integer size and is typically the same as sys.maxint in previous releases on the same platform (assuming the same build options).
For Python 3, it is
import sys
max_size = sys.maxsize
min_size = -sys.maxsize - 1
In Python integers will automatically switch from a fixed-size int representation into a variable width long representation once you pass the value sys.maxint, which is either 231 - 1 or 263 - 1 depending on your platform. Notice the L that gets appended here:
>>> 9223372036854775807
9223372036854775807
>>> 9223372036854775808
9223372036854775808L
From the Python manual:
Numbers are created by numeric literals or as the result of built-in functions and operators. Unadorned integer literals (including binary, hex, and octal numbers) yield plain integers unless the value they denote is too large to be represented as a plain integer, in which case they yield a long integer. Integer literals with an 'L' or 'l' suffix yield long integers ('L' is preferred because 1l looks too much like eleven!).
Python tries very hard to pretend its integers are mathematical integers and are unbounded. It can, for instance, calculate a googol with ease:
>>> 10**100
10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000L
You may use 'inf' like this:
import math
bool_true = 0 < math.inf
bool_false = 0 < -math.inf
Refer: math — Mathematical functions
If you want the max for array or list indices (equivalent to size_t in C/C++), you can use numpy:
np.iinfo(np.intp).max
This is same as sys.maxsize however advantage is that you don't need import sys just for this.
If you want max for native int on the machine:
np.iinfo(np.intc).max
You can look at other available types in doc.
For floats you can also use sys.float_info.max.
sys.maxsize is not the actually the maximum integer value which is supported. You can double maxsize and multiply it by itself and it stays a valid and correct value.
However, if you try sys.maxsize ** sys.maxsize, it will hang your machine for a significant amount of time. As many have pointed out, the byte and bit size does not seem to be relevant because it practically doesn't exist. I guess python just happily expands it's integers when it needs more memory space. So in general there is no limit.
Now, if you're talking about packing or storing integers in a safe way where they can later be retrieved with integrity then of course that is relevant. I'm really not sure about packing but I know python's pickle module handles those things well. String representations obviously have no practical limit.
So really, the bottom line is: what is your applications limit? What does it require for numeric data? Use that limit instead of python's fairly nonexistent integer limit.
I rely heavily on commands like this.
python -c 'import sys; print(sys.maxsize)'
Max int returned: 9223372036854775807
For more references for 'sys' you should access
https://docs.python.org/3/library/sys.html
https://docs.python.org/3/library/sys.html#sys.maxsize
code given below will help you.
for maximum value you can use sys.maxsize and for minimum you can negate same value and use it.
import sys
ni=sys.maxsize
print(ni)

Categories