In C, C++, and Java, an integer has a certain range. One thing I realized in Python is that I can calculate really large integers such as pow(2, 100). The equivalent code in C would clearly overflow, since on a 32-bit architecture the unsigned integer type ranges from 0 to 2^32 - 1. How is it possible for Python to calculate these large numbers?
Basically, big numbers in Python are stored in arrays of 'digits'. 'Digits' is in quotes because each 'digit' can actually be quite a big number on its own.
You can check the details of the implementation in longintrepr.h and longobject.c:
There are two different sets of parameters: one set for 30-bit digits,
stored in an unsigned 32-bit integer type, and one set for 15-bit
digits with each digit stored in an unsigned short. The value of
PYLONG_BITS_IN_DIGIT, defined either at configure time or in pyport.h,
is used to decide which digit size to use.
/* Long integer representation.
The absolute value of a number is equal to
SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
Negative numbers are represented with ob_size < 0;
zero is represented by ob_size == 0.
In a normalized number, ob_digit[abs(ob_size)-1] (the most significant
digit) is never zero. Also, in all cases, for all valid i,
0 <= ob_digit[i] <= MASK.
The allocation function takes care of allocating extra memory
so that ob_digit[0] ... ob_digit[abs(ob_size)-1] are actually available.
*/
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
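You can watch this digit array grow from Python itself with sys.getsizeof, which reports the object's total size in bytes. A quick sketch (the exact byte counts depend on your CPython version and digit size; each extra 30-bit digit typically adds 4 bytes):
import sys

# Each extra 30-bit digit grows the int object by sizeof(digit) bytes
# (4 bytes on a typical 64-bit build; the base size varies by version).
for e in (10, 40, 70, 100):
    print(f"2**{e} uses {sys.getsizeof(1 << e)} bytes")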
How is it possible for Python to calculate these large numbers?
How is it possible for you to calculate these large numbers if you only have the 10 digits 0-9? Well, you use more than one digit!
Bignum arithmetic works the same way, except the individual "digits" are not 0-9 but 0-4294967295 or 0-18446744073709551615.
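A minimal Python sketch of the idea, using base-2**30 "digits" the way a 30-bit CPython build does (illustrative only, not CPython's actual code):
SHIFT = 30            # bits per "digit", as on a 30-bit-digit build
BASE = 1 << SHIFT
MASK = BASE - 1

def add(a, b):
    # Schoolbook addition of non-negative numbers stored as
    # little-endian lists of base-2**30 digits.
    if len(a) < len(b):
        a, b = b, a
    out, carry = [], 0
    for i in range(len(a)):
        s = a[i] + (b[i] if i < len(b) else 0) + carry
        out.append(s & MASK)   # the low 30 bits form this digit
        carry = s >> SHIFT     # the rest carries into the next digit
    if carry:
        out.append(carry)
    return out

print(add([MASK, 1], [1]))   # (2**31 - 1) + 1 == 2**31 == [0, 2] in base 2**30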
Bitwise not (~) is well-defined in languages that define a specific bit length and format for ints. Since ints in Python 3 can be any length, they by definition have a variable number of bits. Internally, I believe Python uses at least 28 bytes to store an int, but of course those bytes aren't what bitwise not is defined on.
How does Python define bitwise not:
Is the bit length a function of the int size, the native platform, or something else?
Does Python sign extend, zero extend, or do something else?
Python integers emulate 2's-complement representation with an "infinite" number of sign bits (all 0 bits for >= 0, all 1 bits for < 0).
All bitwise integer operations maintain this useful illusion.
For example, 0 is an infinite number of 0 bits, and ~0 is an infinite number of 1 bits - which is -1 in 2's-complement notation.
>>> ~0
-1
It's generally true that ~i = -i-1, which, in English, can be read as "the 1's-complement of i is the 2's-complement of i minus 1".
For right shifts of integers, Python fills with more copies of the sign bit.
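You can check all of this at the interpreter; masking with & picks out a finite window of the conceptually infinite bit string:
>>> ~5                 # one's complement: ~i == -i - 1
-6
>>> -1 >> 100          # right shift keeps copying the sign bit
-1
>>> bin(-1 & 0xFF)     # view the low 8 of the "infinite" 1 bits
'0b11111111'
>>> bin(-2 & 0xFF)     # two's complement of 2 within an 8-bit window
'0b11111110'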
In Python one can handle very large integers (for instance, uuid.uuid4().int.bit_length() gives 128), but the largest integer type the C API documentation offers is long long, which is a 64-bit int on most platforms.
I would love to be able to get a C int128 from a PyLong, but it seems there is no tooling for this. PyLong_AsLongLong for instance cannot handle python integers bigger than 2**64.
Is there some documentation I missed, and it is actually possible?
Is it currently not possible, but does some workaround exist? (I would love to use the tooling available in the Python C API for long long with int128, for instance a PyLong_AsInt128AndOverflow function.)
Is it a planned feature in a forthcoming Python release?
There are a couple of different ways you can access the level of precision you want.
Some systems have a long long wider than 64 bits. Notice that the article you link says "at least 64 bits", so it's worth checking sizeof(long long) first, in case there's nothing further to do.
Assuming that is not what you are working with, you'll have to look closer at the raw PyLongObject, which is actually a typedef of the private _longobject structure.
The raw bits are accessible through the ob_digit field, with the length given by ob_size. The data type of the digits, and the actual number of bits they hold, is given by the typedef digit and the macro PYLONG_BITS_IN_DIGIT. The latter must be smaller than 8 * sizeof(digit), larger than 8, and a multiple of 5 (so 30 or 15, depending on how your build was done).
Luckily for you, there is an "undocumented" method in the C API that will copy the bytes of the number for you: _PyLong_AsByteArray. The comment in longobject.h reads:
/* _PyLong_AsByteArray: Convert the least-significant 8*n bits of long
v to a base-256 integer, stored in array bytes. Normally return 0,
return -1 on error.
If little_endian is 1/true, store the MSB at bytes[n-1] and the LSB at
bytes[0]; else (little_endian is 0/false) store the MSB at bytes[0] and
the LSB at bytes[n-1].
If is_signed is 0/false, it's an error if v < 0; else (v >= 0) n bytes
are filled and there's nothing special about bit 0x80 of the MSB.
If is_signed is 1/true, bytes is filled with the 2's-complement
representation of v's value. Bit 0x80 of the MSB is the sign bit.
Error returns (-1):
+ is_signed is 0 and v < 0. TypeError is set in this case, and bytes
isn't altered.
+ n isn't big enough to hold the full mathematical value of v. For
example, if is_signed is 0 and there are more digits in the v than
fit in n; or if is_signed is 1, v < 0, and n is just 1 bit shy of
being large enough to hold a sign bit. OverflowError is set in this
case, but bytes holds the least-significant n bytes of the true value.
*/
You should be able to get a UUID with something like
PyLongObject *mylong;    /* assumed to point to an existing Python int */
unsigned char myuuid[16];

if (_PyLong_AsByteArray(mylong, myuuid, sizeof(myuuid), 1, 0) < 0) {
    /* OverflowError or TypeError has been set; handle it here */
}
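For comparison, the documented Python-level counterpart of this conversion is int.to_bytes (with int.from_bytes going the other way), which performs the same base-256 serialization without touching private APIs:
import uuid

n = uuid.uuid4().int                # a 128-bit Python int
raw = n.to_bytes(16, "little")      # little-endian, like the C call above
assert int.from_bytes(raw, "little") == n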
What is the time complexity of bin(n) in Python, where n is a decimal number (integer)?
For example:
if n = 10, then bin(n) = '0b1010'. What is happening behind the scenes here? How much time does it take to convert from decimal to binary? What is the Big O notation for this?
What is the time complexity of bin(n) in Python, where n is a decimal number (integer)?
How much time does it take to convert from decimal to binary?
There's no conversion of the number n from decimal to binary, because the internal representation is already binary. An integer value is represented as an array of fixed-width digits (30-bit digits on a typical build, as described above; for example, if the value fits in a single digit, the array contains one element). Hence, to show it in binary form one only needs to print the bits out from the highest to the lowest.
If you look into the source code for bin(), or more specifically the macro WRITE_DIGITS(p) in longobject.c, you will see the following loop:
for (i = 0; i < size_a; ++i) {
...
}
size_a stands for the size of the number a, i.e. the length of its digit array, for which a binary representation must be found.
Then, within the for loop there's a while loop in which the bits of the i-th digit of a are extracted and saved into the string representation p:
...
do {
...
cdigit = (char)(accum & (base - 1));
cdigit += (cdigit < 10) ? '0': 'a' - 10;
*--p = cdigit;
...
} while (...);
...
The inner loop runs until all the bits of the current digit have been extracted. After that, the outer loop moves on to the next digit, and the inner loop extracts its bits in the same way, and so on. The total number of iterations equals the number of bits in the binary representation of the given number n, which is log(n). Hence, the time complexity of bin() for a number n is O(log n).
It is log(n). Think of simple division: we repeatedly divide the number by 2, recording remainders, until the quotient becomes 0, which takes about log2(n) steps.
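A pure-Python sketch of that repeated-division idea (for illustration; CPython itself works digit by digit as shown above):
def to_binary(n):
    # Build bin(n) for n >= 0 by repeated division by 2:
    # one loop iteration per output bit, so O(log n) iterations.
    if n == 0:
        return "0b0"
    bits = []
    while n:
        bits.append("01"[n & 1])   # the remainder is the lowest bit
        n >>= 1                    # integer division by 2
    return "0b" + "".join(reversed(bits))

assert to_binary(10) == bin(10) == "0b1010"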
I found some sample code to extract temperature from the Texas Instruments Sensor Tag on github:
https://github.com/msaunby/ble-sensor-pi/blob/master/sensortag/sensor_calcs.py
I don't understand what the following code does:
tosigned = lambda n: float(n-0x10000) if n>0x7fff else float(n)
How I read the above piece of code:
if n > 0x7fff: n = float(n - 0x10000)
else: n = float(n)
Basically what is happening is that the two's-complement value n is converted to float. Why should this only happen when the value of n is greater than 0x7fff? If the value is 0x7fff or smaller, we just convert n to float. Why? I don't understand this.
The sample code from Texas Instruments can be found here:
http://processors.wiki.ti.com/index.php/SensorTag_User_Guide#SensorTag_Android_Development
Why is the return value divided by 128.0 in this function from the TI sample code?
private double extractAmbientTemperature(BluetoothGattCharacteristic c) {
int offset = 2;
return shortUnsignedAtOffset(c, offset) / 128.0;
}
I did ask this to the developer, but didn't get a reply.
On disk and in memory, integers are stored with a certain bit width. Modern Python's ints allow us to ignore most of that detail, because they can magically expand to whatever size is necessary, but sometimes when we get values from disk or other systems we have to think about how they are actually stored.
The positive values of a 16-bit signed integer are stored in the range 0x0001-0x7fff, and its negative values in the range 0x8000-0xffff. If this value was read in some way that didn't already account for the sign bit (perhaps as an unsigned integer, or as part of a longer integer, or assembled from two bytes), then we need to recover the sign.
How? Well, if the value is over 0x7fff we know that it should be negative, and negative values are stored as two's complement. So we simply subtract 0x10000 from it and we get the negative value.
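You can see the recovery at the 16-bit boundary values:
tosigned = lambda n: float(n - 0x10000) if n > 0x7fff else float(n)

print(tosigned(0x0001))   # 1.0      (small positive value, unchanged)
print(tosigned(0x7fff))   # 32767.0  (largest positive 16-bit value)
print(tosigned(0x8000))   # -32768.0 (sign bit set: most negative value)
print(tosigned(0xffff))   # -1.0     (all bits set: -1 in two's complement)
As for the division by 128.0 in the TI code: that is most likely fixed-point scaling. The sensor appears to report temperature in units of 1/128 degree Celsius, so dividing the raw register value by 128.0 yields degrees.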
So you're converting a raw 16-bit value into a signed number. In Python, negative numbers are simply displayed with a minus sign, so you can ignore how they're actually represented in memory. But in the raw hex data, the sign is encoded in the value itself (the high bit), so the shift by 0x10000 is put in to convert correctly.
You can play with this yourself using the Python interpreter:
tosigned = lambda n: float(n-0x10000) if n>0x7fff else float(n)
print(tosigned(0x3fff))
versus:
unsigned = lambda n: float(n)
Check this out to learn more:
http://www.swarthmore.edu/NatSci/echeeve1/Ref/BinaryMath/NumSys.html
So, CPython (2.4) has some interesting behaviour when the length of something gets close to 1<<32 (the limit of a 32-bit int).
r = xrange(1<<30)
assert len(r) == 1<<30
is fine, but:
r = xrange(1<<32)
assert len(r) == 1<<32
ValueError: xrange object size cannot be reported
__len__() should return 0 <= outcome
Alex's wowrange has this behaviour as well: wowrange(1<<32).l is fine, but len(wowrange(1<<32)) is bad. I'm guessing there is some overflow behaviour (the value being read as negative) going on here.
What exactly is happening here? (this is pretty well-solved below!)
How can I get around it? Longs?
(My specific application is random.sample(xrange(1<<32), ABUNCH), if people want to tackle that question directly!)
CPython assumes that lists fit in memory. This extends to objects that behave like lists, such as xrange. Essentially, the len function expects the __len__ method to return something convertible to Py_ssize_t, which won't happen if the number of logical elements is too large, even if those elements don't actually exist in memory.
You'll find that
xrange((1 << 31) - 1)
is the last one that behaves as you want. This is because the maximum signed 32-bit integer is 2^31 - 1. (Note the parentheses: 1 << 31 - 1 would actually parse as 1 << 30, since - binds tighter than <<.)
1 << 32 is not a positive signed 32-bit integer (Python 2's int data type), so that's why you're getting that error.
In Python 2.6, I can't even do xrange(1 << 32) or xrange(1 << 31) without getting an error, much less len on the result.
Edit: If you want a little more detail...
1 << 31 represents the number 0x80000000 which in 2's complement representation is the lowest representable negative number (-1 * 2^31) for a 32-bit int. So yes, due to the bit-wise representation of the numbers you're working with, it's actually becoming negative.
For a 32-bit 2's complement number, 0x7FFFFFFF is the highest representable integer (2^31 - 1) before you "overflow" into negative numbers.
Further reading, if you're interested.
Note that when you see something like 2147483648L in the prompt, the "L" at the end signifies that the value is now being represented as a Python "long integer", which is arbitrary-precision rather than a fixed 64 bits.
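To make the wraparound concrete, here is the same trick as the tosigned lambda above, widened to 32 bits (a sketch for illustration):
def to_int32(n):
    # Interpret the low 32 bits of n as a signed 32-bit integer.
    n &= 0xFFFFFFFF                  # keep only the low 32 bits
    return n - 0x100000000 if n > 0x7FFFFFFF else n

print(to_int32((1 << 31) - 1))   # 2147483647  (0x7FFFFFFF, the maximum)
print(to_int32(1 << 31))         # -2147483648 (0x80000000, the minimum)
print(to_int32(1 << 32))         # 0 (the low 32 bits are all zero)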
1<<32 does not fit in a signed 32-bit integer, so when it is forced into one, the result wraps around.