finding a very big prime number in Python [duplicate] - python

I want to generate two really large prime numbers using an algorithm I found online and changed slightly.
I get this error on line 5:
Python OverflowError: cannot fit 'long' into an index-sized integer
My code:
import math
def atkin(end):
    if end < 2: return []
    lng = ((end/2)-1+end%2)
    sieve = [True]*(lng+1)  # line 5: the OverflowError is raised here
    for i in range(int(math.sqrt(end)) >> 1):
        if not sieve[i]: continue
        for j in range( (i*(i + 3) << 1) + 3, lng, (i << 1) + 3):
            sieve[j] = False
    primes = [2]
    primes.extend([(i << 1) + 3 for i in range(lng) if sieve[i]])
    return primes
How can I fix my error?
If you know a better way to generate large primes, that would be helpful also.

The following code demonstrates the problem that you are running into:
import sys
x = [True]*(sys.maxint+1)
which yields an OverflowError. If you instead do:
x = [True]*(sys.maxint)
then you should get a MemoryError.
Here is what is going on. Python can handle arbitrarily large integers with its own arbitrary-precision integer type. However, when you try to make a list as above, Python tries to convert the repeat count, which is a Python integer, to a C integer of type Py_ssize_t. Py_ssize_t is defined differently depending on your build but can be a ssize_t, long, or int. Essentially, Python checks whether the Python integer fits in the C integer type before doing the conversion and raises the OverflowError if it does not.
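A minimal Python 3 sketch of the same check, assuming CPython. sys.maxint no longer exists in Python 3, so sys.maxsize (the largest value a Py_ssize_t can hold) stands in for it. The helper name repetition_error is made up for illustration.

```python
import sys

def repetition_error(count):
    """Try to build [True] * count and report which exception, if any, is raised."""
    try:
        [True] * count
    except (OverflowError, MemoryError) as exc:
        return type(exc).__name__
    return None

# One past the largest index-sized integer: the repeat count cannot be
# converted to Py_ssize_t, so the failure happens before any allocation.
print(repetition_error(sys.maxsize + 1))
```

A count of exactly sys.maxsize passes the conversion, so any failure at that size would come from the allocation itself rather than from the integer conversion.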

Line 5 tries to allocate a very long list full of True values. Your lng is probably too large for that list to fit in memory.
I was not able to reproduce your exact error; in the worst case I ended up with a MemoryError instead.
The algorithm is probably fine (though I wouldn't bet on it); just try a smaller number.

Related

Problem in handling large number in Python

I was solving a problem on Codeforces. Here is the question.
I wrote Python code to solve it:
n=int(input())
print(0 if ((n*(n+1))/2)%2==0 else 1)
But it failed for the test case 1999999997. See Submission [TestCase-6].
Why did it fail even though Python can handle large numbers effectively? [See this thread]
Similar logic worked flawlessly when I coded it in C++ [see submission here]:
#include<bits/stdc++.h>
using namespace std;
int main(){
    int n;
    cin>>n;
    long long int sum=1ll*(n*(n+1))/2;
    if(sum%2==0) cout<<0;
    else cout<<1;
    return 0;
}
I ran a test based on the insight from @juanpa.arrivillaga, and this has been a great rabbit hole:
n = 1999999997
temp = n * (n+1)
# type(temp) is int, temp is 3999999990000000006. We can clearly see that after dividing by 2 we should get an odd number, and therefore output 1
divided = temp / 2
# type(divided) is float. Printing divided for me gives 1.999999995e+18
# divided % 2 is 0
divided_int = temp // 2
# type(divided_int) is int. Printing divided_int for me gives 1999999995000000003
The // operator forces integer (floor) division and always returns an integer when both operands are integers: 7 // 2 equals 3, not 3.5.
As per the other answer you have linked, the int type in Python can handle very large numbers.
Float can also represent very large magnitudes, but only with limited precision: a Python float is a 64-bit IEEE 754 double with a 53-bit significand, so not every integer above 2**53 can be represented exactly. In many scenarios the difference between 1.999999995e+18 and 1.999999995000000003e+18 is so minute it won't matter, but this is a scenario where it does, as you care a lot about the final digit of the number.
You can learn more about this by watching this video
As mentioned by @juanpa.arrivillaga and @DarrylG in the comments, I should have used the floor division operator // for integer division; the anomaly was caused by the float division performed by the / operator.
So, the correct code should be:
n=int(input())
print(0 if (n*(n+1)//2)%2==0 else 1)
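The difference between the two operators can be seen side by side in a short Python 3 sketch. The two function names are made up for illustration; the only change between them is / versus //.

```python
def parity_with_true_division(n):
    # / produces a float; near 2e18 adjacent floats are 256 apart,
    # so the low bits of the exact quotient are lost.
    return 0 if (n * (n + 1) / 2) % 2 == 0 else 1

def parity_with_floor_division(n):
    # // keeps everything in arbitrary-precision ints, so the last digit is exact.
    return 0 if (n * (n + 1) // 2) % 2 == 0 else 1

n = 1999999997
print(parity_with_true_division(n), parity_with_floor_division(n))
```

For small inputs both versions agree; they diverge only once the quotient no longer fits in the 53-bit significand of a float.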

Python high precision integer to array of numpy integer

I understand that numpy can't handle non-native integers, but how can I store python high precision integers as an array of native integers (in either endian)? e.g.
a = 105951305240504794066266398962584786593081186897777398483830058739006966285013
can't be stored as a native integer because it's 256 bit. But it can be stored as
A = array([18196013122530525909, 15462736877728584896,
12869567647602165677, 16879016735257358861], dtype=uint64)
using little-endian (i.e. a == A[0] + (A[1]<<64) + (A[2]<<128) + (A[3]<<192), with the parentheses needed because + binds tighter than << in Python) or A[::-1] as big-endian. How can I convert from a to A here?
I want to convert this "python-side" number to "numpy-side" so that I can run highly efficient algorithms on it (e.g. fast multiplication using Fourier transform).
I believe Python should already be using a similar structure internally. All I need to do is "expose" it to numpy, but I'm not sure about the exact structure or how to "expose" it. The most straightforward way is of course a while loop:
A = np.zeros(4, 'uint64')
i = 0
while a > 0:
    A[i] = a & (2**64-1)
    a >>= 64
    i += 1
But I wonder whether there are more "native" or "efficient" ways?
Thanks for your help!
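The little-endian limb layout described above can be sketched with the standard library alone, assuming Python 3 and a non-negative integer. The function names int_to_limbs64 and limbs64_to_int are hypothetical; int.to_bytes and struct.unpack do the actual work.

```python
import struct

def int_to_limbs64(value):
    """Split a non-negative int into little-endian unsigned 64-bit limbs."""
    count = max(1, (value.bit_length() + 63) // 64)
    raw = value.to_bytes(count * 8, "little")
    return list(struct.unpack("<%dQ" % count, raw))

def limbs64_to_int(limbs):
    """Inverse of int_to_limbs64, evaluated with Python ints."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

a = 105951305240504794066266398962584786593081186897777398483830058739006966285013
limbs = int_to_limbs64(a)
print(len(limbs), limbs64_to_int(limbs) == a)
```

If NumPy is available, the same raw bytes could presumably be viewed as an array with numpy.frombuffer(raw, dtype='<u8'), which avoids the per-limb Python loop; that step is left out here so the sketch runs without NumPy.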

Save list of numbers to (binary) file with defined bits per number

I have a list/array of numbers, which I want to save to a binary file.
The crucial part is, that each number should not be saved as a pre-defined data type.
The bits per value are constant for all values in the list but do not correspond to the typical data types (e.g. byte or int).
import numpy as np
# create 10 random numbers in range 0-63
values = np.int32(np.round(np.random.random(10)*63));
# each value requires exactly 6 bits
# how to save this to a file?
# just for debug/information: bit string representation
bitstring = "".join(map(lambda x: str(bin(x)[2:]).zfill(6), values));
print(bitstring)
In the real project, there are more than a million values I want to store with a given bit depth.
I already tried the module bitstring, but appending each value to the BitArray costs a lot of time...
There may be some NumPy-specific way that makes things easier, but here's a pure Python (2.x) way to do it. It first converts the list of values into a single integer, since Python supports int values of any length. Next it converts that int value into a string of bytes and writes it to the file.
Note: If you're sure all the values will fit within the bit-width specified, the array_to_int() function could be sped up slightly by changing the (value & mask) it's using to just value.
import random
def array_to_int(values, bitwidth):
    mask = 2**bitwidth - 1
    shift = bitwidth * (len(values)-1)
    integer = 0
    for value in values:
        integer |= (value & mask) << shift
        shift -= bitwidth
    return integer
# In Python 2.7 int and long don't have the "to_bytes" method found in Python 3.x,
# so here's one way to do the same thing.
def to_bytes(n, length):
    return ('%%0%dx' % (length << 1) % n).decode('hex')[-length:]
BITWIDTH = 6
#values = [random.randint(0, 2**BITWIDTH - 1) for _ in range(10)]
values = [0b000001 for _ in range(10)] # create fixed pattern for debugging
values[9] = 0b011111 # make last one different so it can be spotted
# just for debug/information: bit string representation
bitstring = "".join(map(lambda x: bin(x)[2:].zfill(BITWIDTH), values))
print(bitstring)
bigint = array_to_int(values, BITWIDTH)
width = BITWIDTH * len(values)
print('{:0{width}b}'.format(bigint, width=width)) # show integer's value in binary
# round up to a whole number of 8-bit bytes (no extra byte when width is a multiple of 8)
num_bytes = (width + 7) // 8
with open('data.bin', 'wb') as file:
    file.write(to_bytes(bigint, num_bytes))
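For readers on Python 3, the same idea can be sketched with the built-in int.to_bytes and int.from_bytes instead of the hex-string helper. This is an adaptation, not the answer's own code: the names pack_values and unpack_values are made up, and the layout (first value in the most significant bits, zero padding at the front) is assumed to match the Python 2 version above.

```python
def pack_values(values, bitwidth):
    """Pack fixed-width values into bytes, first value in the highest bits."""
    mask = (1 << bitwidth) - 1
    integer = 0
    for value in values:
        integer = (integer << bitwidth) | (value & mask)
    num_bytes = (bitwidth * len(values) + 7) // 8  # round up to whole bytes
    return integer.to_bytes(num_bytes, "big")

def unpack_values(data, bitwidth, count):
    """Recover `count` values of `bitwidth` bits each from pack_values output."""
    mask = (1 << bitwidth) - 1
    integer = int.from_bytes(data, "big")
    return [(integer >> (bitwidth * (count - 1 - i))) & mask for i in range(count)]

values = [0b000001] * 9 + [0b011111]
data = pack_values(values, 6)
print(len(data), unpack_values(data, 6, len(values)) == values)
```

The number of values has to be stored or known separately, because the padding bits are indistinguishable from leading zero values.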
Since you give an example with a string, I'll assume that's how you get the results. This means performance is probably never going to be great. If you can, try creating bytes directly instead of via a string.
Side note: I'm using Python 3 which might require you to make some changes for Python 2. I think this code should work directly in Python 2, but there are some changes around bytearrays and strings between 2 and 3, so make sure to check.
byt = bytearray(len(bitstring)//8 + 1)
for i, b in enumerate(bitstring):
    byt[i//8] += (b=='1') << i%8
and for getting the bits back:
bitret = ''
for b in byt:
    for i in range(8):
        bitret += str((b >> i) & 1)
For millions of bits/bytes you'll want to convert this to a streaming method instead, as you'd need a lot of memory otherwise.
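One possible shape for such a streaming method, as a sketch in Python 3: a generator that keeps only a few pending bits in memory and yields one byte value at a time. The name iter_packed_bytes is hypothetical, and the bit order is an assumption (most significant bit first, zero padding at the end), which differs from the least-significant-first order used in the loop above.

```python
def iter_packed_bytes(values, bitwidth):
    """Yield byte values (0-255) for a stream of fixed-width integers."""
    mask = (1 << bitwidth) - 1
    buffer = 0   # pending bits that do not yet fill a byte
    nbits = 0    # number of pending bits in buffer
    for value in values:
        buffer = (buffer << bitwidth) | (value & mask)
        nbits += bitwidth
        while nbits >= 8:
            nbits -= 8
            yield (buffer >> nbits) & 0xFF
            buffer &= (1 << nbits) - 1
    if nbits:
        # flush the last partial byte, padded with zero bits on the right
        yield (buffer << (8 - nbits)) & 0xFF

values = [0b000001] * 9 + [0b011111]
print(bytes(iter_packed_bytes(values, 6)).hex())
```

Because values can be any iterable, the input never has to be held in memory as one string or one big integer, and the yielded bytes can be written to the file in chunks.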

Python prime number code giving runtime error(NZEC) on spoj

I am trying to get an accepted answer for this question: http://www.spoj.com/problems/PRIME1/
It's nothing new: it just asks for the prime numbers between two given numbers. I ended up with the following code, but SPOJ gives me a runtime error (NZEC), and I have no idea how to deal with it. I hope you can help me with it. Thanks in advance.
def is_prime(m,n):
    myList= []
    mySieve= [True] * (n+1)
    for i in range(2,n+1):
        if mySieve[i]:
            myList.append(i)
            for x in range(i*i,n+1,i):
                mySieve[x]= False
    for a in [y for y in myList if y>=m]:
        print(a)
t= input()
count = 0
while count <int(t):
    m, n = input().split()
    count +=1
    is_prime(int(m),int(n))
    if count == int(t):
        break
    print("\n")
Looking at the problem definition:
In each of the next t lines there are two numbers m and n (1 <= m <= n <= 1000000000, n-m<=100000) separated by a space.
Looking at your code:
mySieve= [True] * (n+1)
So, if n is 1000000000, you're going to try to create a list of 1000000001 boolean values. That means you're asking Python to allocate storage for a billion pointers. On a 64-bit platform, that's 8GB—which is fine as far as Python's concerned, but might well throw your system into swap hell or get it killed by a limit or watchdog. On a 32-bit platform, that's 4GB—which will guarantee you a MemoryError.
The problem also explicitly has this warning:
Warning: large Input/Output data, be careful with certain languages
So, if you want to implement it this way, you're going to have to come up with a more compact storage. For example, array.array('B', [True]) * (n+1) will only take 1GB instead of 4 or 8. And you can make it even smaller (128MB) if you store it in bits instead of bytes, but that's not quite as trivial a change to code.
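A different way to shrink the storage, not the one proposed in this answer, is a segmented sieve: since the problem guarantees n-m<=100000, it is enough to sieve the base primes up to sqrt(n) and then only the window [m, n], using one byte per entry in a bytearray. The sketch below is Python 3.8+ (for math.isqrt), and the function name primes_between is made up for illustration.

```python
import math

def primes_between(m, n):
    """Return the primes p with m <= p <= n using a segmented sieve."""
    if n < 2:
        return []
    m = max(m, 2)
    limit = math.isqrt(n)

    # Plain sieve for the base primes up to sqrt(n), one byte per entry.
    base = bytearray([1]) * (limit + 1)
    base[0] = 0
    base[1] = 0  # limit >= 1 because n >= 2
    for i in range(2, math.isqrt(limit) + 1):
        if base[i]:
            base[i * i::i] = bytes(len(range(i * i, limit + 1, i)))

    # Sieve only the window [m, n]; segment[k] stands for the number m + k.
    segment = bytearray([1]) * (n - m + 1)
    for p in range(2, limit + 1):
        if base[p]:
            start = max(p * p, ((m + p - 1) // p) * p)
            for multiple in range(start, n + 1, p):
                segment[multiple - m] = 0
    return [m + k for k, flag in enumerate(segment) if flag]

print(primes_between(1, 30))
```

With n up to 1000000000 the base sieve needs about 32 KB and the window at most about 100 KB, instead of a billion entries.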
Calculating the prime numbers between two numbers in isolation is not meaningful. You can only calculate the primes up to a given number by using the primes you found before, and then show only the range you want.
Here is some Python code, along with some precomputed primes that you can build on:
bzr branch http://bzr.ceremcem.net/calc-primes
The code is fairly simple, but it works correctly and is well tested.

summing over a list of int overflow(?) python

Let's consider a list of large integers, for example one given by:
def primesfrom2to(n):
    # http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    """ Input n>=6, Returns a array of primes, 2 <= p < n """
    sieve = np.ones(n/3 + (n%6==2), dtype=np.bool)
    sieve[0] = False
    for i in xrange(int(n**0.5)/3+1):
        if sieve[i]:
            k=3*i+1|1
            sieve[ ((k*k)/3) ::2*k] = False
            sieve[(k*k+4*k-2*k*(i&1))/3::2*k] = False
    return np.r_[2,3,((3*np.nonzero(sieve)[0]+1)|1)]
primesfrom2to(2000000)
I want to calculate the sum of that, and the expected result is 142913828922.
But if I do:
sum(primesfrom2to(2000000))
I get 1179908154, which is clearly wrong. The problem is an int overflow, but I don't understand why. Let me explain. Consider this test code:
a=primesfrom2to(2000000)
b=[float(i) for i in a]
c=[long(i) for i in a]
sumI=0
sumF=0
sumL=0
m=0
for i,j,k in zip(a,b,c):
    m=m+1
    sumI=sumI+i
    sumF=sumF+j
    sumL=sumL+k
    print sumI,sumF,sumL
    if sumI<0:
        print i,m
        break
I found out that the first integer overflow is happening at a[i=20444]=225289
If I do:
>>> sum(a[:20043])+225289
-2147310677
But if I do:
>>> sum(a[:20043])
2147431330
>>> 2147431330+225289
2147656619L
What's happening? Why such a different behaviour? Why can't sum switch automatically to long type and give the correct result?
Look at the types of your results. You are summing a numpy array, which is using numpy datatypes, which can overflow. When you do sum(a[:20043]), you get a numpy object back (some sort of int32 or the like), which overflows when added to another number. When you manually type in the same number, you're creating a Python builtin int, which can auto-promote to long. Numpy arrays cannot autopromote like Python builtin types, because the array type (and its memory layout) have to be fixed when the array is created. This makes operations much faster at the expense of type flexibility.
You may be able to get around the problem by using a wider datatype (like np.int64) for the values being summed, rather than the 32-bit integer type the array ends up with. However, it depends how big your numbers are. A simple example:
# Python types ok
>>> 2**62
4611686018427387904L
>>> 2**63
9223372036854775808L
# numpy types overflow
>>> np.int64(2)**62
4611686018427387904
>>> np.int64(2)**63
-9223372036854775808
Your example works correctly for me on 64-bit Python, so I guess you're using 32-bit Python. If you can use 64-bit types you will be able to get past the limit you found, but as my example shows you will eventually overflow 64-bit ints too if your numbers get super huge.
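The wraparound from the question can be reproduced without NumPy by reducing the exact Python result to the signed 32-bit range by hand. This is only an illustration of fixed-width two's-complement arithmetic; the helper name wrap_int32 is made up.

```python
def wrap_int32(value):
    """Reduce an int to the signed 32-bit range, as two's-complement addition wraps."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

partial_sum = 2147431330  # sum(a[:20043]) from the question
next_prime = 225289

print(partial_sum + next_prime)              # Python int: 2147656619
print(wrap_int32(partial_sum + next_prime))  # 32-bit result: -2147310677
```

With NumPy itself, asking for a 64-bit accumulator, for example a.sum(dtype=np.int64), should avoid the overflow for this input, since the expected total 142913828922 fits comfortably in 64 bits.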
