Time complexity of bin() in Python

Time complexity of bin() in Python - python

What is the time complexity of bin(n) in Python, where n is a decimal number (integer)?
For example:
if n = 10, then bin(n) = 0b1010, so here what is happening behind the scenes? How much time does it take to convert from decimal to binary? What is Big O notation for this ?

what is the time complexity of bin(n) in python, where n is decimal number (integer) ?
How much time it takes to convert from decimal to binary ?
There's no conversion for number n from decimal to binary because the inner representation is already binary. An integer value is represented as an array of 64-bit values (for example, if the value is lower than 2^64 - 1 then the array contains one element). Hence, to show it in the binary form one needs to print out from the highest bit to the lowest one.
If you look into the source code for bin() or more specifically macro #define WRITE_DIGITS(p) link here, you will see the following loop:
for (i = 0; i < size_a; ++i) {
...
}
size_a stands for a size of number a (size of the array of 64-bit integers) for which a binary representation must be found.
Then, within the for loop there's a while in which bits from i-th digit of a are extracted and saved into a string representation p:
...
do {
...
cdigit = (char)(accum & (base - 1));
cdigit += (cdigit < 10) ? '0': 'a' - 10;
*--p = cdigit;
...
} while (...);
...
The inner loop is executed until all the bits from the current digit are extracted. After this, the outer loops moves to the next digit and so all bits are extracted from it again the inner loop. And so on.
But the number of iterations is equal to a number of bits in the binary representation of a given number n which is log(n). Hence, the time complexity of bin() for a number n is O(log(n))

It is log(n). Think about the simple division, we divide the number by 2 until the remainder becomes 0 or 1 just like tree traversal.

Related

Smallest i with 1/i == 1/(i+1)?

Someone reverse-sorted by 1/i instead of the usual -i and it made me wonder: What is the smallest positive integer case where that fails*? I think it must be where two consecutive integers i and i+1 have the same reciprocal float. The smallest I found is i = 6369051721119404:
i = 6369051721119404
print(1/i == 1/(i+1))
import math
print(math.log2(i))
Output (Try it online!):
True
52.50000001100726
Note that it's only a bit larger than 252.5 (which I only noticed after finding the number with other ways) and 52 is the mantissa bits stored by float, so maybe that is meaningful.
So: What is the smallest where it fails? Is it the one I found? And is there a meaningful explanation for why?
* Meaning it fails to sort correctly, for example sorted([6369051721119404, 6369051721119405], key=lambda i: 1/i) doesn't reverse that list, as both key values are the same.

Suppose i is between 2^n and 2^(n+1) for some n.
Then 1/i is between 2^(-n-1) and 2^-n. Its representation as double-precision floating point is 1.xxx...xxx * 2^(-n-1), where there are 52 x's. The smallest difference that can be expressed at that magnitude is 2^-52 * 2^(-n-1) = 2^(-n-53).
1/i and 1/(i+1) may get rounded to the same number if the difference between them is at most 2^(-n-53). Solving for i: 1/i - 1/(i+1) = 2^(-n-53) ==> i(i+1) = 2^(n+53). The solution is approximately i = 2^((n+53) / 2). This matches our range for i if n = 52. Then i = 2^52.5.
Starting at this value of i there is a possibility to get the same value for 1/i and 1/(i+1). But it won't happen for every i as it depends on how the numbers are rounded. We can start searching at that point, and as you discovered shortly after that we'll find the first occurrence.
Note: We also need to rule out n=51 as it seems to have solutions that fall in range. However, the only integer i that falls in the range 2^51 to 2^52 and is at least the minimal value calculated above is 2^52 itself, which can be ruled out. For larger values we need to switch to n=52 as above.

How to create Python fixed length bits?

I wish to do bitwise negation in Python.
My expectation:
negate(0001) => 1110
But Python's ~0b0001 returns -0b10. It seems Python truncate 1110 into -0b10.
How to keep the leading bits?
Moreover, why
bin(~0b1) yields -0b10?
How many bits are reserved for that datatype?

Python uses arbitrary precision arithmetic, so you don't have to worry about the number of bits used. Also it returns -0b10 for bin(~0b1), because it understands that the result is -2 and represents the number as it is 10 and keeps the sign in the front (only for the negative numbers).
But we can represent the number as we like using format function, like this
def negate(number, bits = 32):
return format(~number & 2 ** bits - 1, "0{}b".format(bits))
print(negate(1))
# 11111111111111111111111111111110
print(negate(1, bits = 4))
# 1110
Or, as suggested by eryksun,
def negate(number, bits = 32):
return "{:0{}b}".format(~number & 2 ** bits - 1, bits)

Python acts as if its integers have infinitely many bits. As such, if you use ~ on one, the string representation can't start with an infinite number of 1s, or generating the string would never terminate. Instead, Python chooses to represent it as a negative number as it would be using two's compliment. If you want to restrict the integer to a number of bits, & it against an appropriate mask:
>>> bin((~1) & 0b1111)
'0b1110'

How does python represent such large integers?

In C, C++, and Java, an integer has a certain range. One thing I realized in Python is that I can calculate really large integers such as pow(2, 100). The same equivalent code, in C, pow(2, 100) would clearly cause an overflow since in 32-bit architecture, the unsigned integer type ranges from 0 to 2^32-1. How is it possible for Python to calculate these large numbers?

Basically, big numbers in Python are stored in arrays of 'digits'. That's quoted, right, because each 'digit' could actually be quite a big number on its own. )
You can check the details of implementation in longintrepr.h and longobject.c:
There are two different sets of parameters: one set for 30-bit digits,
stored in an unsigned 32-bit integer type, and one set for 15-bit
digits with each digit stored in an unsigned short. The value of
PYLONG_BITS_IN_DIGIT, defined either at configure time or in pyport.h,
is used to decide which digit size to use.
/* Long integer representation.
The absolute value of a number is equal to
SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
Negative numbers are represented with ob_size < 0;
zero is represented by ob_size == 0.
In a normalized number, ob_digit[abs(ob_size)-1] (the most significant
digit) is never zero. Also, in all cases, for all valid i,
0 <= ob_digit[i] <= MASK.
The allocation function takes care of allocating extra memory
so that ob_digit[0] ... ob_digit[abs(ob_size)-1] are actually available.
*/
struct _longobject {
PyObject_VAR_HEAD
digit ob_digit[1];
};

How is it possible for Python to calculate these large numbers?
How is it possible for you to calculate these large numbers if you only have the 10 digits 0-9? Well, you use more than one digit!
Bignum arithmetic works the same way, except the individual "digits" are not 0-9 but 0-4294967296 or 0-18446744073709551616.

What is the nature of the round off error here?

Can someone help me unpack what exactly is going on under the hood here?
>>> 1e16 + 1.
1e+16
>>> 1e16 + 1.1
1.0000000000000002e+16
I'm on 64-bit Python 2.7. For the first, I would assume that since there is only a precision of 15 for float that it's just round-off error. The true floating-point answer might be something like
10000000000000000.999999....
And the decimal just gets lopped of. But the second result makes me question this understanding and can't 1 be represented exactly? Any thoughts?
[Edit: Just to clarify. I'm not in any way suggesting that the answers are "wrong." Clearly, they're right, because, well they are. I'm just trying to understand why.]

It's just rounding as close as it can.
1e16 in floating hex is 0x4341c37937e08000.
1e16+2 is 0x4341c37937e08001.
At this level of magnitude, the smallest difference in precision that you can represent is 2. Adding 1.0 exactly rounds down (because typically IEEE floating point math will round to an even number). Adding values larger than 1.0 will round up to the next representable value.

10^16 = 0x002386f26fc10000 is exactly representable as a double precision floating point number. The next representable number is 1e16+2. 1e16+1 is correctly rounded to 1e16, and 1e16+1.1 is correctly rounded to 1e16+2. Check the output of this C program:
#include <stdio.h>
#include <math.h>
#include <stdint.h>
int main()
{
uint64_t i = 10000000000000000ULL;
double a = (double)i;
double b = nextafter(a,1.0e20); // next representable number
printf("I=0x%016llx\n",i); // 10^16 in hex
printf("A=%a (%.4f)\n",a,a); // double representation
printf("B=%a (%.4f)\n",b,b); // next double
}
Output:
I=0x002386f26fc10000
A=0x1.1c37937e08p+53 (10000000000000000.0000)
B=0x1.1c37937e08001p+53 (10000000000000002.0000)

Let's decode some floats, and see what's actually going on! I'm going to use Common Lisp, which has a handy function for getting at the significand (a.k.a mantissa) and exponent of a floating-point number without needing to twiddle any bits. All floats used are IEEE double-precision floats.
> (integer-decode-float 1.0d0)
4503599627370496
-52
1
That is, if we consider the value stored in the significand as an integer, it is the maximum power of 2 available (4503599627370496 = 2^52), scaled down (2^-52). (It isn't stored as 1 with an exponent of 0 because it's simpler for the significand to never have zeros on the left, and this allows us to skip representing the leftmost 1 bit and have more precision. Numbers not in this form are called denormal.)
Let's look at 1e16.
> (integer-decode-float 1d16)
5000000000000000
1
1
Here we have the representation (5000000000000000) * 2^1. Note that the significand, despite being a nice round decimal number, is not a power of 2; this is because 1e16 is not a power of 2. Every time you multiply by 10, you are multiplying by 2 and 5; multiplying by 2 is just incrementing the exponent, but multiplying by 5 is an "actual" multiplication, and here we've multiplied by 5 16 times.
5000000000000000 = 10001110000110111100100110111111000001000000000000000 (base 2)
Observe that this is a 53-bit binary number, as it should be since double floats have a 53-bit significand.
But the key to understanding the situation is that the exponent is 1. (The exponent being small is an indication that we are getting close to the limits of precision.) This means that the float value is 2^1 = 2 times this significand.
Now, what happens when we try to represent adding 1 to this number? Well, we need to represent 1 at the same scale. But the smallest change we can make in this number is exactly 2, because the least significant bit of the significand has value 2!
That is, if we increment the significand, making the smallest possible change, we get
5000000000000001 = 10001110000110111100100110111111000001000000000000001 (base 2)
and when we apply the exponent, we get 2 * 5000000000000001 = 10000000000000002, which is exactly the value you observed. You can only have either 10000000000000000 or 10000000000000002, and 10000000000000001.1 is closer to the latter.
(Note that the issue here isn't even that decimal numbers aren't exact in binary! There's no binary "repeating decimals" here, and there's plenty of 0 bits on the right end of the significand — it's just that your input neatly falls just beyond the lowest bit.)

With numpy, you can see the next larger and smaller representable IEEE floating point number:
>>> import numpy as np
>>> huge=1e100
>>> tiny=1e-100
>>> np.nextafter(1e16,huge)
10000000000000002.0
>>> np.nextafter(1e16,tiny)
9999999999999998.0
So:
>>> (np.nextafter(1e16,huge)-np.nextafter(1e16,tiny))/2.0
2.0
And:
>>> 1.1>2.0/2
True
Therefore 1e16 + 1.1 is correctly rounded to the next larger IEEE representable number of 10000000000000002.0
As is:
>>> 1e16+1.0000000000000005
1.0000000000000002e+16
and 1e16-(something slightly larger than 1) is rounded down by 2 to the next smaller IEEE number:
>>> 1e16-1.0000000000000005
9999999999999998.0
Keep in mind that 32 bit vs 64 bit Python is irrelevant. It is the size of the IEEE format used that matters. Also keep in mind that the larger the magnitude of the number, the epsilon value (the spread between the two next larger and smaller IEEE values basically) changes.
You can see this in bits as well:
>>> def f_to_bits(f): return struct.unpack('<Q', struct.pack('<d', f))[0]
...
>>> def bits_to_f(bits): return struct.unpack('<d', struct.pack('<Q', bits))[0]
...
>>> bits_to_f(f_to_bits(1e16)+1)
1.0000000000000002e+16
>>> bits_to_f(f_to_bits(1e16)-1)
9999999999999998.0

Safest way to convert float to integer in python?

Python's math module contain handy functions like floor & ceil. These functions take a floating point number and return the nearest integer below or above it. However these functions return the answer as a floating point number. For example:
import math
f=math.floor(2.3)
Now f returns:
2.0
What is the safest way to get an integer out of this float, without running the risk of rounding errors (for example if the float is the equivalent of 1.99999) or perhaps I should use another function altogether?

All integers that can be represented by floating point numbers have an exact representation. So you can safely use int on the result. Inexact representations occur only if you are trying to represent a rational number with a denominator that is not a power of two.
That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
To quote from Wikipedia,
Any integer with absolute value less than or equal to 224 can be exactly represented in the single precision format, and any integer with absolute value less than or equal to 253 can be exactly represented in the double precision format.

Use int(your non integer number) will nail it.
print int(2.3) # "2"
print int(math.sqrt(5)) # "2"

You could use the round function. If you use no second parameter (# of significant digits) then I think you will get the behavior you want.
IDLE output.
>>> round(2.99999999999)
3
>>> round(2.6)
3
>>> round(2.5)
3
>>> round(2.4)
2

Combining two of the previous results, we have:
int(round(some_float))
This converts a float to an integer fairly dependably.

That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
This post explains why it works in that range.
In a double, you can represent 32bit integers without any problems. There cannot be any rounding issues. More precisely, doubles can represent all integers between and including 253 and -253.
Short explanation: A double can store up to 53 binary digits. When you require more, the number is padded with zeroes on the right.
It follows that 53 ones is the largest number that can be stored without padding. Naturally, all (integer) numbers requiring less digits can be stored accurately.
Adding one to 111(omitted)111 (53 ones) yields 100...000, (53 zeroes). As we know, we can store 53 digits, that makes the rightmost zero padding.
This is where 253 comes from.
More detail: We need to consider how IEEE-754 floating point works.
1 bit 11 / 8 52 / 23 # bits double/single precision
[ sign | exponent | mantissa ]
The number is then calculated as follows (excluding special cases that are irrelevant here):
-1sign × 1.mantissa ×2exponent - bias
where bias = 2exponent - 1 - 1, i.e. 1023 and 127 for double/single precision respectively.
Knowing that multiplying by 2X simply shifts all bits X places to the left, it's easy to see that any integer must have all bits in the mantissa that end up right of the decimal point to zero.
Any integer except zero has the following form in binary:
1x...x where the x-es represent the bits to the right of the MSB (most significant bit).
Because we excluded zero, there will always be a MSB that is one—which is why it's not stored. To store the integer, we must bring it into the aforementioned form: -1sign × 1.mantissa ×2exponent - bias.
That's saying the same as shifting the bits over the decimal point until there's only the MSB towards the left of the MSB. All the bits right of the decimal point are then stored in the mantissa.
From this, we can see that we can store at most 52 binary digits apart from the MSB.
It follows that the highest number where all bits are explicitly stored is
111(omitted)111. that's 53 ones (52 + implicit 1) in the case of doubles.
For this, we need to set the exponent, such that the decimal point will be shifted 52 places. If we were to increase the exponent by one, we cannot know the digit right to the left after the decimal point.
111(omitted)111x.
By convention, it's 0. Setting the entire mantissa to zero, we receive the following number:
100(omitted)00x. = 100(omitted)000.
That's a 1 followed by 53 zeroes, 52 stored and 1 added due to the exponent.
It represents 253, which marks the boundary (both negative and positive) between which we can accurately represent all integers. If we wanted to add one to 253, we would have to set the implicit zero (denoted by the x) to one, but that's impossible.

If you need to convert a string float to an int you can use this method.
Example: '38.0' to 38
In order to convert this to an int you can cast it as a float then an int. This will also work for float strings or integer strings.
>>> int(float('38.0'))
38
>>> int(float('38'))
38
Note: This will strip any numbers after the decimal.
>>> int(float('38.2'))
38

math.floor will always return an integer number and thus int(math.floor(some_float)) will never introduce rounding errors.
The rounding error might already be introduced in math.floor(some_large_float), though, or even when storing a large number in a float in the first place. (Large numbers may lose precision when stored in floats.)

Another code sample to convert a real/float to an integer using variables.
"vel" is a real/float number and converted to the next highest INTEGER, "newvel".
import arcpy.math, os, sys, arcpy.da
.
.
with arcpy.da.SearchCursor(densifybkp,[floseg,vel,Length]) as cursor:
for row in cursor:
curvel = float(row[1])
newvel = int(math.ceil(curvel))

Since you're asking for the 'safest' way, I'll provide another answer other than the top answer.
An easy way to make sure you don't lose any precision is to check if the values would be equal after you convert them.
if int(some_value) == some_value:
some_value = int(some_value)
If the float is 1.0 for example, 1.0 is equal to 1. So the conversion to int will execute. And if the float is 1.1, int(1.1) equates to 1, and 1.1 != 1. So the value will remain a float and you won't lose any precision.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.