Range of int and float in Python


I have these two small programs:
1.
x = 1000
while (1000 * x != x):
    x = 1000 * x
print("Done")
2.
x = 1000.0
while (1000.0 * x != x):
    x = 1000.0 * x
print("Done")
I am trying to make an informed guess about how these programs will execute. I thought that, since integers are stored in 4 bytes (32 bits), the first program would run the loop until x reaches 2^31 and then maybe raise an error. And I guessed that the second loop would go on forever, as floats can store more information than ints.
My guess couldn't have been more wrong. The first one seems to go on forever, whereas the second exits the loop and prints "Done" after x reaches approximately 10^308; this is when x takes the value inf (presumably infinity).
I can't understand how this works, any explanation would be appreciated. Thank you!

The first example with integers will loop until no memory is available (at which point the process will be killed or the machine will swap to death):
x = 1000
while (1000 * x != x):
    x = 1000 * x
because integers don't have a fixed size in Python; they grow to use as much memory as is available (within the process address space).
In the second example you're multiplying a floating-point value, which does have a limit, because it uses the processor's floating-point format: 8 bytes (a Python float is generally a C double).
After reaching the maximum value, it overflows to inf (infinity), and in that case
1000 * inf == inf
small interactive demo:
>>> f = 10.0**308
>>> f*2
inf
>>> f*2 == f*1000
True
>>>
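The same contrast can be sketched in a short script. This is only an illustration, assuming CPython 3 on a platform with IEEE 754 doubles; the variable names are made up for the example, and it uses sys.float_info.max instead of a hand-picked 10.0**308.

```python
import math
import sys

# Largest finite double, about 1.797e308 on IEEE 754 platforms.
largest = sys.float_info.max

# Multiplying past the limit yields inf instead of raising an error.
overflowed = largest * 1000.0

# inf is a fixed point of the multiplication, so the float loop stops.
float_loop_stops = (1000.0 * overflowed == overflowed)

# An int simply keeps growing: 1000**200 needs far more than 64 bits.
big_int = 1000 ** 200
int_loop_continues = (1000 * big_int != big_int)

print(overflowed, float_loop_stops, big_int.bit_length(), int_loop_continues)
```

The int condition can never become false for a nonzero x, which is why the first loop only ends when memory runs out.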

From this article:
When a variable is initialized with an integer value, that value becomes an integer object, and the variable points to it (references the object).
Python removes this confusion, there is only the integer object. Does it have any limits? Very early versions of Python had a limit that was later removed. The limits now are set by the amount of memory you have in your computer. If you want to create an astronomical integer 5,000 digits long, go ahead. Typing it or reading it will be the only problem! How does Python do all of this? It automatically manages the integer object, which is initially set to 32 bits for speed. If it exceeds 32 bits, then Python increases its size as needed up to the RAM limit.
So example 1 will run as long as your computer has the RAM.

Related

Is there any efficient way to increment the corresponding set positions of an integer in an integer array?

Any solution consuming less than O(Bit Length) time is welcome. I need to process around 100 million large integers.
answer = [0 for i in xrange(100)]
def pluginBits(val):
    global answer
    for j in xrange(len(answer)):
        if val <= 0:
            break
        answer[j] += (val & 1)
        val >>= 1

A speedier way to do this would be to use '{:b}'.format(someval) to convert from integer to a string of '1's and '0's. Python still needs to do similar work to perform this conversion, but doing it at the C layer in the interpreter internals involves significantly less overhead for larger values.
For conversion to actual list of integer 1s and 0s, you could do something like:
# Done once at top level to make translation table:
import string
bitstr_to_intval = string.maketrans(b'01', b'\x00\x01')
# Done for each value to convert:
bits = bytearray('{:b}'.format(origint).translate(bitstr_to_intval))
Since bytearray is a mutable sequence of values in range(256) that iterates the actual int values, you don't need to convert to list; it should be usable in 99% of the places the list would be used, using less memory and running faster.
This does generate the values in the reverse of the order your code produces (that is, bits[-1] here is the same as your answer[0], bits[-2] is your answer[1], etc.), and it's unpadded, but since you're summing bits, the padding isn't needed, and reversing the result is a trivial reversing slice (add [::-1] to the end). Summing the bits from each input can be made much faster by making answer a numpy array (that allows a bulk element-wise addition at the C layer), and putting it all together gets:
import string
import numpy
bitstr_to_intval = string.maketrans(b'01', b'\x00\x01')
answer = numpy.zeros(100, numpy.uint64)
def pluginBits(val):
    bits = bytearray('{:b}'.format(val).translate(bitstr_to_intval))[::-1]
    answer[:len(bits)] += bits
In local tests, this definition of pluginBits takes a little under one-seventh the time to sum the bits at each position for 10,000 random input integers of 100 bits each, and gets the same results.
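For readers on Python 3, where string.maketrans and xrange no longer exist, the same translation-table idea might look roughly like the sketch below. It uses only the standard library (so it sums in a plain list rather than a numpy array), and the names bit_counts and width are hypothetical. It assumes non-negative inputs that fit in width bits.

```python
# Translation table mapping the ASCII characters '0'/'1' to the bytes 0/1.
BITSTR_TO_INTVAL = bytes.maketrans(b'01', b'\x00\x01')

def bit_counts(values, width=100):
    """Count how often each bit position is set across all values."""
    counts = [0] * width
    for val in values:
        # format() does the bit extraction in C; reverse so index 0 is the
        # least significant bit, matching the original answer[] layout.
        bits = format(val, 'b').encode('ascii').translate(BITSTR_TO_INTVAL)[::-1]
        for pos, bit in enumerate(bits):
            counts[pos] += bit
    return counts

print(bit_counts([5, 3], width=4))
```

Iterating a bytes object in Python 3 yields ints directly, so no further conversion is needed before adding.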

I'm making mistakes dividing large numbers

I am trying to write a program in Python 2.7 that will first check whether one number divides another evenly and, if it does, get the result of the division.
However, I am getting some interesting results when I use large numbers.
Currently I am using:
from __future__ import division
import math
a=82348972389472433334783
b=2
if a/b==math.trunc(a/b):
    answer=a/b
    print 'True' #to quickly see if the if loop was invoked
When I run this I get:
True
But 82348972389472433334783 is clearly not even.
Any help would be appreciated.

That's a crazy way to do it. Just use the remainder operator.
if a % b == 0:
    # then b divides a evenly
    quotient = a // b

The true division implicitly converts the input to floats which don't provide the precision to store the value of a accurately. E.g. on my machine
>>> int(1E15+1)
1000000000000001
>>> int(1E16+1)
10000000000000000
hence you lose precision. A similar thing happens with your big number (compare int(float(a))-a).
Now, if you check your division, you see the result "is" actually found to be an integer
>>> (a/b).is_integer()
True
which is again not really expected beforehand.
The math.trunc function does something similar (from the docs):
Return the Real value x truncated to an Integral (usually a long integer).
The duck typing nature of python allows a comparison of the long integer and float, see
Checking if float is equivalent to an integer value in python and
Comparing a float and an int in Python.
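The effect described above can be reproduced with the number from the question. The following is a small sketch assuming Python 3 (where / is already true division); the variable names are illustrative only.

```python
a = 82348972389472433334783
b = 2

# Converting to float rounds a to the nearest representable double,
# so the round trip does not give back the original integer.
rounding_error = int(float(a)) - a

# True division therefore yields a float that happens to be integral.
looks_divisible = (a / b).is_integer()

# Integer arithmetic stays exact: divmod returns quotient and remainder.
quotient, remainder = divmod(a, b)

print(rounding_error != 0, looks_divisible, remainder)
```

Because divmod never leaves integer arithmetic, the remainder of 1 correctly shows that a is odd.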

Why don't you use the modulus operator instead to check if a number can be divided evenly?
n % x == 0

For loop computing recurrence relation takes very long

Q(x)=[Q(x−1)+Q(x−2)]^2
Q(0)=0, Q(1)=1
I need to find Q(29). I wrote some code in Python, but it is taking too long. How can I get the output (any language would be fine)?
Here is the code I wrote:
a=0
b=1
for i in range(28):
    c=(a+b)*(a+b)
    a=b
    b=c
print(b)

I don't think this is a tractable problem with programming. The reason your code is slow is that the numbers grow very rapidly, and Python uses arbitrary-precision integers, so it takes its time computing the result.
Try your code with double-precision floats:
a=0.0
b=1.0
for i in range(28):
    c=(a+b)*(a+b)
    a=b
    b=c
print(b)
The answer is inf. This is because the answer is much, much larger than the largest representable double-precision number, which is roughly 10^308. You could try using fixed-width integers, but those have an even smaller representable maximum. Note that using doubles will lead to loss of precision, but surely you don't want to know every single digit of your huge number (side note: I happen to know that you do, making your job even harder).
So here's some math background for my skepticism: Your recurrence relation goes
Q[k] = (Q[k-2] + Q[k-1])^2
You can formulate a more tractable sequence from the square root of this sequence:
P[k] = sqrt(Q[k])
P[k] = P[k-2]^2 + P[k-1]^2
If you can solve for P, you'll know Q = P^2.
Now, consider this sequence:
R[k] = R[k-1]^2
Starting from the same initial values, this will always be smaller than P[k], since
P[k] = P[k-2]^2 + P[k-1]^2 >= P[k-1]^2
(but this will be a "pretty close" lower bound as the first term will always be insignificant compared to the second). We can construct this sequence:
R[k] = R[k-1]^2 = R[k-2]^4 = R[k-3]^8 = R[k-m]^(2^m) = R[0]^(2^k)
Since P[1 give or take] starts with value 2, we should consider
R[k] = 2^(2^k)
as a lower bound for P[k], give or take a few exponents of 2. For k=28 this is
P[28] > 2^(2^28) = 2^(268435456) = 10^(log10(2)*2^28) ~ 10^80807124
That's at least 80807124 digits for the final value of P, which is the square root of the number you're looking for. That makes Q[28] larger than 10^1.6e8. If you printed that number into a text file, it would take more than 150 megabytes.
If you imagine you're trying to handle these integers exactly, you'll see why it takes so long, and why you should reconsider your approach. What if you could compute that huge number? What would you do with it? How long would it take python to print that number on your screen? None of this is trivial, so I suggest that you try to solve your problem on paper, or find a way around it.
Note that you can use a symbolic math package such as sympy in python to get a feeling of how hard your problem is:
import sympy as sym
a,b,c,b0 = sym.symbols('a,b,c,b0')
a = 0
b = b0
for k in range(28):
    c = (a+b)**2
    a = b
    b = c
print(c)
This will take a while, but it will fill your screen with the explicit expression for Q[k] with only b0 as parameter. You would "only" have to substitute your values into that monster to obtain the exact result. You could also try sym.simplify on the expression, but I couldn't wait for that to return anything meaningful.
During lunch time I let your loop run, and it finished. The result has
>>> import math
>>> print(math.log10(c))
49287457.71120789
So my lower bound for k=28 is a bit large, probably due to off-by-one errors in the exponent. The memory needed to store this integer is
>>> import sys
>>> sys.getsizeof(c)
21830612
that is roughly 20 MB.
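If only the size of the result matters, the recurrence can be tracked in the log domain instead of with exact integers. The sketch below is my own illustration, not part of the measurements above: the function name log10_q is hypothetical, it assumes n >= 1, and it relies on the identity log10(a + b) = log10(b) + log10(1 + a/b). For n = 29 its result can be compared against the math.log10(c) value measured above.

```python
import math

def log10_q(n):
    """Approximate log10(Q(n)) for n >= 1 without building the huge integer."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Compute the first terms exactly while they are still small.
    a, b = 0, 1  # Q(0), Q(1)
    k = 1
    while k < n and b < 10**6:
        a, b = b, (a + b) ** 2
        k += 1
    if k == n:
        return math.log10(b)
    # Switch to logarithms: only log10 of the last two terms is kept.
    la, lb = math.log10(a), math.log10(b)
    while k < n:
        # log10(a + b) = lb + log10(1 + a/b), with a/b = 10**(la - lb)
        log_sum = lb + math.log10(1.0 + 10.0 ** (la - lb))
        la, lb = lb, 2.0 * log_sum
        k += 1
    return lb

print(log10_q(29))
```

This runs instantly because every step works on two floats, at the cost of knowing only the magnitude and not the digits.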

This can be solved with brute force but it is still an interesting problem since it uses two different "slow" operations and there are trade-offs in choosing the correct approach.
There are two places where the native Python implementation of the algorithm is slow: the multiplication of large numbers and the conversion of large numbers to a string.
Python uses the Karatsuba algorithm for multiplication. It has a running time of O(n^1.585) where n is the length of the numbers. It does get slower as the numbers get larger but you can compute Q(29).
The algorithm for converting a Python integer to its decimal representation is much slower. It has running time of O(n^2). For large numbers, it is much slower than multiplication.
Note: the times for conversion to a string also include the actual calculation time.
On my computer, computing Q(25) requires ~2.5 seconds but conversion to a string requires ~3 minutes 9 seconds. Computing Q(26) requires ~7.5 seconds but conversion to a string requires ~12 minutes 36 seconds. As the size of the number doubles, multiplication time increases by a factor of 3 and the running time of string conversion increases by a factor of 4. The running time of the conversion to string dominates. Computing Q(29) takes about 3 minutes and 20 seconds but conversion to a string will take more than 12 hours (I didn't actually wait that long).
One option is the gmpy2 module, which provides access to the very fast GMP library. With gmpy2, Q(26) can be calculated in ~0.2 seconds and converted into a string in ~1.2 seconds. Q(29) can be calculated in ~1.7 seconds and converted into a string in ~15 seconds. Multiplication in GMP is approximately O(n*ln(n)). Conversion to decimal is faster than Python's O(n^2) algorithm but still slower than multiplication.
The fastest option is Python's decimal module. Instead of using a radix-2, or binary, internal representation, it uses a radix-10 (actually a power of 10) internal representation. Calculations are slightly slower but conversion to a string is very fast; it is just O(n). Calculating Q(29) requires ~9.2 seconds but calculating and conversion together only requires ~9.5 seconds. The time for conversion to string is only ~0.3 seconds.
Here is an example program using decimal. It also sums the individual digits of the final value.
import decimal
decimal.getcontext().prec = 200000000
decimal.getcontext().Emax = 200000000
decimal.getcontext().Emin = -200000000
def sum_of_digits(x):
    return sum(map(int, (t for t in str(x))))
a = decimal.Decimal(0)
b = decimal.Decimal(1)
for i in range(28):
    c = (a + b) * (a + b)
    a = b
    b = c
temp = str(b)
print(i, len(temp), sum_of_digits(temp))
I didn't include the time for converting the millions of digits into strings and adding them in the discussion above. That time should be the same for each version.
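One caveat with decimal is that results are silently rounded if the context precision is too small. A possible safeguard, sketched here under my own assumptions (the function name q_decimal and the digits parameter are hypothetical), is a private context that traps decimal.Inexact, so a too-small precision raises an exception instead of returning a rounded value.

```python
import decimal

def q_decimal(n, digits=1000):
    """Compute Q(n) for n >= 1 exactly with decimal, or raise if digits is too small."""
    # Trapping Inexact turns any rounding into an exception.
    ctx = decimal.Context(
        prec=digits,
        Emax=decimal.MAX_EMAX,
        Emin=decimal.MIN_EMIN,
        traps=[decimal.Inexact],
    )
    a, b = decimal.Decimal(0), decimal.Decimal(1)  # Q(0), Q(1)
    for _ in range(n - 1):
        s = ctx.add(a, b)
        a, b = b, ctx.multiply(s, s)
    return b

print(q_decimal(10))
```

For small n the result can be checked digit for digit against the plain int version, which is a cheap way to gain confidence before running the large case.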

This WILL take too long, since the sequence grows extremely fast (each term is roughly the square of the previous one) and tends to infinity.
Example:
a=0
b=1
c=1*1 = 1
a=1
b=1
c=2*2 = 4
a=1
b=4
c=5*5 = 25
a=4
b=25
c= 29*29 = 841
a=25
b=841
.
.
.

You can check whether c%10==0 and then divide it, and at the end multiply back by the number of times you divided, but in the end it will be the same large number. If you really need to do this calculation, try using C++; it should run faster than Python.
Here's your code written in C++
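#include <cstdlib>
#include <iostream>
using namespace std;
int main(int argc, char *argv[])
{
    // Caution: long long holds at most about 9.2e18, so c overflows after a
    // few iterations; the exact result needs an arbitrary-precision type.
    long long int a=0;
    long long int b=1;
    long long int c=0;
    for(int i=0;i<28;i++){
        c=(a+b)*(a+b);
        a=b;
        b=c;
    }
    cout << c;
    return 0;
}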
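Before porting to a fixed-width type, it is worth checking how far a 64-bit integer gets. The following Python sketch (the helper name first_overflow_step is hypothetical) runs the same loop with exact integers and reports the first iteration whose result no longer fits in a signed 64-bit value.

```python
INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit integer

def first_overflow_step(steps=28):
    """Return the first loop index i at which c exceeds INT64_MAX, or None."""
    a, b = 0, 1
    for i in range(steps):
        c = (a + b) ** 2
        if c > INT64_MAX:
            return i
        a, b = b, c
    return None

print(first_overflow_step())  # prints 6
```

The last value that still fits is 563696135209 (at i = 5), so a 64-bit version cannot produce a meaningful answer for 28 iterations.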

Python: Number ranges that are extremely large?

import time
from random import shuffle
val = long(raw_input("Please enter the maximum value of the range:")) + 1
start_time = time.time()
numbers = range(0, val)
shuffle(numbers)
I cannot find a simple way to make this work with extremely large inputs - can anyone help?
I saw a question like this - but I could not implement the range function they described in a way that works with shuffle. Thanks.

To get a random permutation of the range [0, n) in a memory-efficient manner, you could use numpy.random.permutation():
import numpy as np
numbers = np.random.permutation(n)
If you need only a small fraction of the values from the range, e.g., to get k random values from the range [0, n):
import random
from functools import partial
def sample(n, k):
    # assume n is much larger than k
    randbelow = partial(random.randrange, n)
    # from random.py
    result = [None] * k
    selected = set()
    selected_add = selected.add
    for i in range(k):
        j = randbelow()
        while j in selected:
            j = randbelow()
        selected_add(j)
        result[i] = j
    return result
print(sample(10**100, 10))

If you don't need the full list of numbers (and if you are getting billions, it's hard to imagine why you would need them all), you might be better off taking a random.sample of your number range, rather than shuffling them all. In Python 3, random.sample can work on a range object too, so your memory use can be quite modest.
For example, here's code that will sample ten thousand random numbers from a range up to whatever maximum value you specify. It should require only a relatively small amount of memory beyond the 10000 result values, even if your maximum is 100 billion (or whatever enormous number you want):
import random
def get10kRandomNumbers(maximum):
    pop = range(1, maximum+1) # this is memory efficient in Python 3
    sample = random.sample(pop, 10000)
    return sample
Alas, this doesn't work as nicely in Python 2, since xrange objects don't allow maximum values greater than the system's integer type can hold.
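A quick way to see the Python 3 behaviour is to sample from a range of 100 billion values and check that the result is small, unique, and within bounds. This is a sketch with a hypothetical helper name (sample_from_range); as far as I know, random.sample calls len() on the population, so in CPython the range length still has to fit in a machine-sized index (sys.maxsize).

```python
import random

def sample_from_range(maximum, k):
    """Pick k distinct values from 1..maximum without building a list."""
    population = range(1, maximum + 1)  # lazy: stores only start, stop, step
    return random.sample(population, k)

picked = sample_from_range(10**11, 10)
print(picked)
```

Only the k selected values are ever materialised, so memory use does not depend on maximum.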

An important point to note is that it will be impossible for a computer to have the list of numbers in memory if it is larger than a few billion elements: its memory footprint becomes larger than the typical RAM size (as it takes about 4 GB for 1 billion 32-bit numbers).
In the question, val is a long integer, which seems to indicate that you are indeed using more than a billion integers, so this cannot be done conveniently in memory (i.e., shuffling will be slow, as the operating system will swap).
That said, if the number of elements is small enough (let's say smaller than 0.5 billion), then the elements can fit in memory thanks to the compact representation offered by the standard array module, and be shuffled:
import array, random
numbers = array.array('I', xrange(10**8)) # or 'L', if the number of bytes per item (numbers.itemsize) is too small with 'I'
random.shuffle(numbers)

Overflow error Python for Modular Cubes

Attempting to solve this problem:
For a positive number n, define S(n) as the sum of the integers x, for which 1 < x < n and x^3 ≡ 1 mod n.
When n=91, there are 8 possible values for x, namely : 9, 16, 22, 29, 53, 74, 79, 81.
Thus, S(91)=9+16+22+29+53+74+79+81=363.
Find S(13082761331670030).
Of course, my code works for S(91) and when attempting to find S(13082761331670030) I get two different errors.
Here is my code:
def modcube(n):
    results = []
    for k in range(1,n):
        if k**3%n==1:
            results.append(k)
    return results
This produces "OverflowError: range has too many items". When I try using xrange instead of range, I get an error stating "Python int too large to convert to C long". I have also tried several other things without success.
Can anyone point me in the right direction, without telling me exactly how to solve it?
No spoilers please. I've been at it for two days, my next option is to try implementing this in Java since I'm new to Python.

I think you need to understand two concepts here:
1. integer representation in C and in Python
The implementation of Python you use is called CPython, because it is written in C. In C, a long integer is at least 32 bits long (and exactly 32 bits on many platforms, including Windows). A 32-bit long can hold integers between -2147483648 and 2147483647. In Python 2, when an integer exceeds the range of a C long, it is converted to an arbitrary-precision integer, whose size is limited only by the memory of your computer. However, operations on those arbitrary-precision integers (called long integers in Python) can be much slower than operations on machine-sized integers.
2. The difference between range and xrange:
range produces a list. If you have range(10), it stores the list [0, 1, ... 9] entirely in memory. This is why storing a list of 13082761331670030 items in memory is too much. Assuming each number takes 64 bits, it would need about 93 PiB (roughly 100,000 TB) of RAM to store the entire list!
xrange produces a lazy object that yields each number one by one. This makes it possible to perform operations on each number without storing the entire list in memory. But again, performing calculations on 13082761331670030 different numbers could take more time than you think... The other thing about xrange is that it doesn't work with Python long integers; it is limited (for speed reasons) to integers that fit in a C long (32 bits on your system). This is why your program doesn't work using xrange.
The bottom line: Project Euler problems are (more or less) ordered by degree of difficulty. You should begin with the lower-numbered problems.
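The range/xrange distinction largely disappears in Python 3, where range is itself lazy and accepts arbitrarily large bounds. The sketch below illustrates that and a cheaper modular cube test using the three-argument pow; the helper name modcube_small is hypothetical, and the brute-force scan is only meant for small moduli such as the n = 91 example from the question, not as an approach to the full problem.

```python
import sys

n = 13082761331670030

# In Python 3, range is lazy: it stores start, stop and step only.
lazy = range(1, n)
lazy_size = sys.getsizeof(lazy)  # a few dozen bytes, independent of n

def modcube_small(m):
    """Brute-force list of x with 1 < x < m and x**3 % m == 1 (small m only)."""
    # pow(k, 3, m) reduces modulo m during the computation, so the
    # intermediate values never grow beyond the size of m squared.
    return [k for k in range(2, m) if pow(k, 3, m) == 1]

print(lazy_size, modcube_small(91), sum(modcube_small(91)))
```

Creating the range is free, but iterating over all of it would still take far too long, which is the point made above about the number of calculations.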

You wanted hints, not a solution.
Hints:
Consider that 13082761331670030 is the product of the following primes: 2 x 3 x 5 x 7 x 11 x 13 x 17 x 19 x 23 x 29 x 31 x 37 x 41 x 43
Chinese remainder theorem
Just because x^3 ≡ 1 mod n does not mean that there are not other values other than 3 that satisfy this condition. Specifically, prime1 ** (prime2 - 2) % prime2
My Python solution is 86 milliseconds...
