Numerical comparison in Python

I'm working with very large numbers such as: 632382 to the power of 518061.
When I try calculating it directly using Python (632382**518061), it takes a really long time.
However, when I compare 2 very large numbers:
>>> 632382**518061 > 519432**525806
True
Python does it very quickly.
I assumed that in order to compare both numbers, Python would calculate them beforehand. But since the comparison is much faster than its actual calculation, Python is doing something different.
How is Python able to perform the comparison much faster (apparently without calculating the exact values)?

What takes so long is printing the values.
If I enter
>>> x = 632382**518061
in an interactive Python session, it takes about a second.
If I then enter
>>> x
it takes at least half a minute (I aborted it before it generated any output). [1]
Evaluating and printing the result of the expression 632382**518061 > 519432**525806 does not require printing the two large numbers, therefore it takes less time.
It still takes longer than evaluating the two numbers (without printing), as expected:
>>> from timeit import timeit
>>> timeit('632382**518061', number=1)
1.312588474999984
>>> timeit('519432**525806', number=1)
1.281405287000041
>>> timeit('632382**518061 > 519432**525806', number=1)
2.685868804999984
[1] After all, the decimal representation of x has 3005262 digits, which we can calculate much more quickly than with len(str(x)) by using logarithms:
>>> from math import log10, ceil
>>> ceil(518061 * log10(632382))
3005262
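The logarithm trick can also be used to compare two huge powers without ever building them. The sketch below is only an illustration of that idea, not what Python does for the comparison above (the timings show that Python really computes both integers). The helper names decimal_digits and power_greater are made up for this example, and it assumes positive bases.

```python
from math import floor, log10

def decimal_digits(base, exponent):
    """Estimate the number of decimal digits of base**exponent (base > 0)."""
    # floor(...) + 1 also handles exact powers of ten, where ceil is off by one.
    return floor(exponent * log10(base)) + 1

def power_greater(base1, exp1, base2, exp2):
    """Estimate whether base1**exp1 > base2**exp2 by comparing logarithms."""
    # Floating-point rounding can give the wrong answer when the two
    # logarithms are nearly equal; this sketch does not guard against that.
    return exp1 * log10(base1) > exp2 * log10(base2)

print(decimal_digits(632382, 518061))
print(power_greater(632382, 518061, 519432, 525806))
```

Both calls return immediately because only two floating-point logarithms are evaluated. When the logarithms differ by very little, an exact integer comparison is still needed.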

Related

How to calculate numbers with large exponents

I was writing a program where I need to calculate insanely huge numbers.
k = int(input())
print(int((2**k)*5 % (10**9 + 7)))
Here, k is of the order of 10^9.
As expected, this was rather slow (taking up to 5 seconds to calculate), whereas my program needs to finish computing in 1 second.
After a little research online I found a function pow(), and by writing
p = 10**9 + 7
print(int(pow(2, k- 1,p)*10))
This works fine for small numbers but messes up at large numbers. I can understand why that is happening (this isn't essentially what I want to calculate, and the modulus operation with such a large number doesn't affect the calculation for small values of k).
I also found libraries like gmpy2 and numpy but I don't know how to use them since I'm just a beginner with python.
So how can I write an expression for what I want to calculate and which works fast enough and doesn't err at large numbers too?
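You can optimize your operation by passing the modulus as the third argument of the built-in pow, multiplying the result by 5, and reducing the product once more so that it matches your original expression:
def func(k):
    p = pow(10, 9) + 7
    # three-argument pow does modular exponentiation without building 2**k
    return pow(2, k, p) * 5 % p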
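One way to gain confidence in the three-argument pow approach is to check it against the direct expression for values of k small enough to compute both ways. This is a minimal sketch; the names MOD, direct and modular are chosen here for illustration only.

```python
MOD = 10**9 + 7

def direct(k):
    # The expression from the question: builds the full integer 2**k first.
    return (2**k) * 5 % MOD

def modular(k):
    # Three-argument pow keeps every intermediate value below MOD.
    return pow(2, k, MOD) * 5 % MOD

# Both versions should agree wherever the direct one is still cheap to run.
for k in (0, 1, 10, 1000, 12345):
    assert direct(k) == modular(k)

# The modular version stays fast even for k around 10**9.
print(modular(10**9))
```

The final reduction by MOD matters: without it the result can be up to five times the modulus, which would differ from what the original expression computes.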

I'm making mistakes dividing large numbers

I am trying to write a program in python 2.7 that will first see if a number divides the other evenly, and if it does get the result of the division.
However, I am getting some interesting results when I use large numbers.
Currently I am using:
from __future__ import division
import math
a=82348972389472433334783
b=2
if a/b==math.trunc(a/b):
    answer=a/b
    print 'True' #to quickly see if the if loop was invoked
When I run this I get:
True
But 82348972389472433334783 is clearly not even.
Any help would be appreciated.
That's a crazy way to do it. Just use the remainder operator.
if a % b == 0:
    # then b divides a evenly
    quotient = a // b
The true division implicitly converts the input to floats which don't provide the precision to store the value of a accurately. E.g. on my machine
>>> int(1E15+1)
1000000000000001
>>> int(1E16+1)
10000000000000000
Hence you lose precision. A similar thing happens with your big number (compare int(float(a))-a).
Now, if you check your division, you see the result "is" actually found to be an integer
>>> (a/b).is_integer()
True
which is again not really expected beforehand.
The math.trunc function does something similar (from the docs):
Return the Real value x truncated to an Integral (usually a long integer).
The duck typing nature of python allows a comparison of the long integer and float, see
Checking if float is equivalent to an integer value in python and
Comparing a float and an int in Python.
Why don't you use the modulus operator instead to check if a number can be divided evenly?
n % x == 0
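The precision loss and the integer-only fix can be seen side by side in a short script. This sketch is written for Python 3 (where / is already true division), unlike the Python 2.7 code in the question, and uses the same values of a and b.

```python
a = 82348972389472433334783
b = 2

# A double has a 53-bit significand, so converting a drops its low bits.
as_float = float(a)
print(int(as_float) == a)        # the round trip does not give a back
print((a / b).is_integer())      # the float quotient looks like a whole number

# Integer arithmetic stays exact: divmod returns quotient and remainder together.
quotient, remainder = divmod(a, b)
print(remainder == 0)
```

Because divmod never leaves integer arithmetic, the remainder is reliable no matter how large a becomes.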

For loop computing recurrence relation takes very long

Q(x)=[Q(x−1)+Q(x−2)]^2
Q(0)=0, Q(1)=1
I need to find Q(29). I wrote code in Python but it is taking too long. How can I get the output (any language would be fine)?
Here is the code I wrote:
a=0
b=1
for i in range(28):
    c=(a+b)*(a+b)
    a=b
    b=c
print(b)
I don't think this is a tractable problem with programming. The reason why your code is slow is that the numbers grow very rapidly, and Python uses arbitrary-precision integers, so it takes its time computing the result.
Try your code with double-precision floats:
a=0.0
b=1.0
for i in range(28):
    c=(a+b)*(a+b)
    a=b
    b=c
print(b)
The answer is inf. This is because the answer is much much larger than the largest representable double-precision number, which is roughly 10^308. You could try using finite-precision integers, but those will have an even smaller representable maximum. Note that using doubles will lead to loss of precision, but surely you don't want to know every single digit of your huuuge number (side note: I happen to know that you do, making your job even harder).
So here's some math background for my skepticism: Your recurrence relation goes
Q[k] = (Q[k-2] + Q[k-1])^2
You can formulate a more tractable sequence from the square root of this sequence:
P[k] = sqrt(Q[k])
P[k] = P[k-2]^2 + P[k-1]^2
If you can solve for P, you'll know Q = P^2.
Now, consider this sequence:
R[k] = R[k-1]^2
Starting from the same initial values, this will always be smaller than P[k], since
P[k] = P[k-2]^2 + P[k-1]^2 >= P[k-1]^2
(but this will be a "pretty close" lower bound as the first term will always be insignificant compared to the second). We can construct this sequence:
R[k] = R[k-1]^2 = R[k-2]^4 = R[k-3]^8 = ... = R[k-m]^(2^m) = R[0]^(2^k)
Since P[1 give or take] starts with value 2, we should consider
R[k] = 2^(2^k)
as a lower bound for P[k], give or take a few exponents of 2. For k=28 this is
P[28] > 2^(2^28) = 2^(268435456) = 10^(log10(2)*2^28) ~ 10^80807124
That's at least 80807124 digits for the final value of P, which is the square root of the number you're looking for. That makes Q[28] larger than 10^1.6e8. If you printed that number into a text file, it would take more than 150 megabytes.
If you imagine you're trying to handle these integers exactly, you'll see why it takes so long, and why you should reconsider your approach. What if you could compute that huge number? What would you do with it? How long would it take python to print that number on your screen? None of this is trivial, so I suggest that you try to solve your problem on paper, or find a way around it.
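The doubling of the digit count at every step can be observed directly for small indices, where the exact integers are still cheap. This is a rough sketch; q_sequence is a helper name invented here, and the cut-off of 16 is an arbitrary choice to keep the run instant.

```python
import math

def q_sequence(n):
    """Return [Q(0), ..., Q(n)] for Q(x) = (Q(x-1) + Q(x-2))**2, exactly."""
    values = [0, 1]
    for _ in range(2, n + 1):
        values.append((values[-1] + values[-2]) ** 2)
    return values[: n + 1]

qs = q_sequence(16)
for k in range(2, 17):
    # math.log10 accepts arbitrarily large ints, so no string conversion is needed.
    print(k, math.log10(qs[k]))
```

After the first few terms each printed logarithm is very close to twice the previous one, which is the 2^k growth in the exponent that the lower bound describes.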
Note that you can use a symbolic math package such as sympy in python to get a feeling of how hard your problem is:
import sympy as sym
a,b,c,b0 = sym.symbols('a,b,c,b0')
a = 0
b = b0
for k in range(28):
    c = (a+b)**2
    a = b
    b = c
print(c)
This will take a while, but it will fill your screen with the explicit expression for Q[k] with only b0 as parameter. You would "only" have to substitute your values into that monster to obtain the exact result. You could also try sym.simplify on the expression, but I couldn't wait for that to return anything meaningful.
During lunch time I let your loop run, and it finished. The result has
>>> import math
>>> print(math.log10(c))
49287457.71120789
So my lower bound for k=28 is a bit large, probably due to off-by-one errors in the exponent. The memory needed to store this integer is
>>> import sys
>>> sys.getsizeof(c)
21830612
that is roughly 20 MB.
This can be solved with brute force but it is still an interesting problem since it uses two different "slow" operations and there are trade-offs in choosing the correct approach.
There are two places where the native Python implementation of the algorithm is slow: the multiplication of large numbers and the conversion of large numbers to a string.
Python uses the Karatsuba algorithm for multiplication. It has a running time of O(n^1.585) where n is the length of the numbers. It does get slower as the numbers get larger but you can compute Q(29).
The algorithm for converting a Python integer to its decimal representation is much slower. It has running time of O(n^2). For large numbers, it is much slower than multiplication.
Note: the times for conversion to a string also include the actual calculation time.
On my computer, computing Q(25) requires ~2.5 seconds but conversion to a string requires ~3 minutes 9 seconds. Computing Q(26) requires ~7.5 seconds but conversion to a string requires ~12 minutes 36 seconds. As the size of the number doubles, multiplication time increases by a factor of 3 and the running time of string conversion increases by a factor of 4. The running time of the conversion to string dominates. Computing Q(29) takes about 3 minutes and 20 seconds but conversion to a string will take more than 12 hours (I didn't actually wait that long).
One option is the gmpy2 module, which provides access to the very fast GMP library. With gmpy2, Q(26) can be calculated in ~0.2 seconds and converted into a string in ~1.2 seconds. Q(29) can be calculated in ~1.7 seconds and converted into a string in ~15 seconds. Multiplication in GMP is roughly O(n*ln(n)). Conversion to decimal is faster than Python's O(n^2) algorithm but still slower than multiplication.
The fastest option is Python's decimal module. Instead of using a radix-2, or binary, internal representation, it uses a radix-10 (actually a power of 10) internal representation. Calculations are slightly slower but conversion to a string is very fast; it is just O(n). Calculating Q(29) requires ~9.2 seconds but calculating and conversion together only requires ~9.5 seconds. The time for conversion to string is only ~0.3 seconds.
Here is an example program using decimal. It also sums the individual digits of the final value.
import decimal
decimal.getcontext().prec = 200000000
decimal.getcontext().Emax = 200000000
decimal.getcontext().Emin = -200000000
def sum_of_digits(x):
    return sum(map(int, (t for t in str(x))))
a = decimal.Decimal(0)
b = decimal.Decimal(1)
for i in range(28):
    c = (a + b) * (a + b)
    a = b
    b = c
    temp = str(b)
    print(i, len(temp), sum_of_digits(temp))
I didn't include the time for converting the millions of digits into strings and adding them in the discussion above. That time should be the same for each version.
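The decimal approach only gives exact results if the context precision is at least as large as the number of digits in the answer, which is why the program above sets prec so high. The sketch below illustrates this on a small index; q_int and q_decimal are names invented for this example, and the precisions 1000 and 20 are arbitrary choices.

```python
import decimal

def q_int(n):
    """Q(n) for n >= 1, computed with Python's built-in integers."""
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, (a + b) * (a + b)
    return b

def q_decimal(n, digits):
    """Q(n) for n >= 1, computed with decimal at the given precision."""
    # localcontext keeps the precision change from leaking into other code.
    with decimal.localcontext() as ctx:
        ctx.prec = digits
        a, b = decimal.Decimal(0), decimal.Decimal(1)
        for _ in range(n - 1):
            a, b = b, (a + b) * (a + b)
        return b

exact = q_int(12)
print(int(q_decimal(12, 1000)) == exact)  # enough precision for every digit
print(int(q_decimal(12, 20)) == exact)    # too little precision: result is rounded
```

With too little precision decimal silently rounds each product, so the digits beyond the first few are no longer those of the true value.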
This WILL take too long, since the sequence grows extremely fast (each term is roughly the square of the previous one) and tends to infinity.
Example:
a=0
b=1
c=1*1 = 1
a=1
b=1
c=2*2 = 4
a=1
b=4
c=5*5 = 25
a=4
b=25
c= 29*29 = 841
a=25
b=841
.
.
.
You can check if c%10==0 and then divide it, and in the end multiply it back as many times as you divided it, but in the end it'll be the same large number. If you really need to do this calculation, try using C++; it should run faster than Python.
Here's your code written in C++
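#include <cstdlib>
#include <iostream>
using namespace std;
int main(int argc, char *argv[])
{
    // Caution: long long is a 64-bit type, so c overflows after only a few
    // iterations; the exact value needs an arbitrary-precision library such as GMP.
    long long int a=0;
    long long int b=1;
    long long int c=0;
    for(int i=0;i<28;i++){
        c=(a+b)*(a+b);
        a=b;
        b=c;
    }
    cout << c;
    return 0;
}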

In practice, why is comparing integers better than comparing strings?

I did this test
import time
def test1():
    a=100
    b=200
    start=time.time()
    if (a>b):
        c=a
    else:
        c=b
    end=time.time()
    print(end-start)
def test2():
    a="amisetertzatzaz1111reaet"
    b="avieatzfzatzr333333ts"
    start=time.time()
    if (a>b):
        c=a
    else:
        c=b
    end=time.time()
    print(end-start)
def test3():
    a="100"
    b="200"
    start=time.time()
    if (a>b):
        c=a
    else:
        c=b
    end=time.time()
    print(end-start)
And obtained these results:
1.9073486328125e-06 #test1()
9.5367431640625e-07 #test2()
1.9073486328125e-06 #test3()
Execution times are similar. It's true that using integers instead of strings reduces the storage space, but what about the execution time?
Timing a single execution of a short piece of code doesn't tell you very much at all. In particular, if you look at the timing numbers from your test1 and test3, you'll see that the numbers are identical. That ought to be a warning sign that, in fact, all that you're seeing here is the resolution of the timer:
>>> 2.0 / 2 ** 20
1.9073486328125e-06
>>> 1.0 / 2 ** 20
9.5367431640625e-07
For better results, you need to run the code many times, and measure and subtract the timing overhead. Python has a built-in module timeit for doing exactly this. Let's time 100 million executions of each kind of comparison:
>>> from timeit import timeit
>>> timeit('100 > 200', number=10**8)
5.98881983757019
>>> timeit('"100" > "200"', number=10**8)
7.528342008590698
So you can see that the difference is not really all that much (string comparison is only about 25% slower in this case). So why is string comparison slower? Well, the way to find out is to look at the implementation of the comparison operation.
In Python 2.7, comparison is implemented by the do_cmp function in object.c. (Please open this code in a new window to follow the rest of my analysis.) On line 817, you'll see that if the objects being compared are the same type and if they have a tp_compare function in their class structure, then that function is called. In the case of integer objects, this is what happens, the function being int_compare in intobject.c, which you'll see is very simple.
But strings don't have a tp_compare function, so do_cmp proceeds to call try_rich_to_3way_compare which then calls try_rich_compare_bool up to three times (trying the three comparison operators EQ, LT and GT in turn). This calls try_rich_compare which calls string_richcompare in stringobject.c.
So string comparison is slower because it has to use the complicated "rich comparison" infrastructure, whereas integer comparison is more direct. But even so, it doesn't make all that much difference.
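The timeit measurement can be repeated on any interpreter with a few lines. In this sketch the operands are passed through setup, an assumption made here so that a compiler which folds constant expressions cannot remove the comparison; the repetition count of 10**6 is an arbitrary choice to keep the run short.

```python
from timeit import timeit

# Operands come from setup so the comparison is actually executed each time
# instead of being folded into a constant at compile time.
int_time = timeit('a > b', setup='a = 100; b = 200', number=10**6)
str_time = timeit('a > b', setup='a = "100"; b = "200"', number=10**6)

print('int comparison:', int_time)
print('str comparison:', str_time)
```

The absolute numbers depend on the machine and the Python version, so only the ratio between the two timings is worth comparing.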
Huh? Since the storage space is reduced, the number of bits that need to be compared is also reduced. Comparing bits is work, doing less work means it goes faster.

Calculate poisson probability percentage

When you use the POISSON function in Excel (or in OpenOffice Calc), it takes two arguments:
an integer
an 'average' number
and returns a float.
In Python (I tried RandomArray and NumPy) it returns an array of random poisson numbers.
What I really want is the percentage chance that this event will occur (it is a constant number, whereas the array has different numbers every time, so is it an average?).
For example:
print poisson(2.6,6)
returns [1 3 3 0 1 3] (and every time I run it, it's different).
The number I get from calc/excel is 3.19 (POISSON(6,2.6,0)*100).
Am I using Python's poisson wrong (no pun!) or am I missing something?
scipy has what you want
>>> scipy.stats.distributions
<module 'scipy.stats.distributions' from '/home/coventry/lib/python2.5/site-packages/scipy/stats/distributions.pyc'>
>>> scipy.stats.distributions.poisson.pmf(6, 2.6)
array(0.031867055625524499)
It's worth noting that it's pretty easy to calculate by hand, too.
It is easy to do by hand, but you can overflow doing it that way. You can do the exponent and factorial in a loop to avoid the overflow:
import math
def poisson_probability(actual, mean):
    # naive: math.exp(-mean) * mean**actual / math.factorial(actual)
    # iterative, to keep the components from getting too large or small:
    p = math.exp(-mean)
    for i in range(actual):
        p *= mean
        p /= i+1
    return p
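For very large means even the iterative loop runs into trouble, because exp(-mean) itself underflows to zero. A common workaround, which is not taken from the answers above, is to work in log space with math.lgamma. The function names below are made up for this sketch, and it assumes mean > 0.

```python
import math

def poisson_pmf_iterative(k, mean):
    # Same idea as the loop above: interleave multiplications and divisions.
    p = math.exp(-mean)
    for i in range(k):
        p *= mean / (i + 1)
    return p

def poisson_pmf_log(k, mean):
    # Alternative technique: sum the logarithms and exponentiate once.
    # lgamma(k + 1) is log(k!), so no huge factorial is ever formed.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

print(poisson_pmf_iterative(6, 2.6) * 100)   # percentage, as in the spreadsheet
print(poisson_pmf_log(1000, 1000.0))         # still finite for a large mean
```

For small inputs such as the spreadsheet example the two functions agree to many digits, so the log-space form is only worth the extra step when the mean is large.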
This page explains why you get an array, and the meaning of the numbers in it, at least.
