I have a piece of code that I'm trying to make run faster, because it takes roughly 40 seconds to execute:
for i in range(1, len(pk), 2):
    for j in range(2, len(pk), 2):
        b = random.randrange(2 ** Const.ALPHA)
        sum += (b * pk[i] * pk[j])
I tried threads; it doesn't run any faster. I tried using sum() with the two for loops embedded in it; that doesn't run any faster either.
The pk elements are very large ints. Right now len(pk) is 162 and Const.ALPHA is 9, but in the future both might well be larger.
Thanks.
PS: By the way, you get a cookie if you can guess the purpose of the program from these variables.
I don't have an i7 :) and I don't know how big your numbers are. I tried it with 65536-bit values for pk[i], and your function took almost 10.5 seconds, so I suppose your numbers are still a lot bigger.
But I think the results should be indicative. The function below took 0.45 seconds (better than a 20x speed-up); it avoids multiplying bignums together by turning sum(sum(r(b)*pk[i]*pk[j])) into sum(pk[i]*sum(r(b)*pk[j])). Not only does this do fewer multiplications, most of the multiplications that remain are smallnum * bignum instead of bignum * bignum.
The use of generators and list comprehensions might not help. YMMV.
def xmul(pk):
    # The speedup from these locals is insignificant, but
    # it lets the inline generator fit on a line.
    rmax = 2 ** Const.ALPHA
    r = random.randrange
    return sum(
        pki * sum(r(rmax) * pkj for pkj in pk[2::2])
        for pki in pk[1::2]
    )
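The identity being exploited is plain distributivity. Since the two forms draw their random b values in a different order, the check below uses small fixed coefficients (all the values here are made up purely for illustration):

```python
# Hypothetical small inputs; the real pk holds very large ints.
pk = [3, 5, 7, 11, 13, 17]
b = [[2, 4], [6, 8], [1, 3]]   # fixed stand-ins for the random b values

odds, evens = pk[1::2], pk[2::2]

# Original shape: one big*big product per (i, j) pair.
nested = sum(b[i][j] * x * y
             for i, x in enumerate(odds)
             for j, y in enumerate(evens))

# Factored shape: the inner sum only multiplies a small b by pk[j].
factored = sum(x * sum(bj * y for bj, y in zip(b[i], evens))
               for i, x in enumerate(odds))

assert nested == factored
```

With real bignums the factored form wins because the expensive big*big products happen once per pki instead of once per (i, j) pair.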
I was writing a program where I need to calculate insanely huge numbers.
k = int(input())
print(int((2**k) * 5 % (10**9 + 7)))
Here, k is on the order of 10**9.
As expected, this was rather slow (taking up to 5 seconds to compute), whereas my program needs to finish within 1 second.
After a little research online I found the built-in function pow(), and tried writing:
p = 10**9 + 7
print(int(pow(2, k - 1, p) * 10))
This works fine for small numbers but messes up at large ones. I can understand why: this isn't exactly the quantity I want to calculate, and for small values of k the numbers never reach the modulus, so the reduction doesn't affect the result.
I also found libraries like gmpy2 and numpy, but I don't know how to use them, since I'm just a beginner with Python.
So how can I write an expression that computes what I want, runs fast enough, and doesn't err at large numbers?
You can fix and speed up the operation by passing the modulus as the third argument of the built-in pow, multiplying the result by 5, and reducing modulo p once more at the end:
def func(k):
    p = 10 ** 9 + 7
    # three-argument pow does modular exponentiation, so no huge
    # intermediate 2**k is ever built; reduce once more after the * 5
    return pow(2, k, p) * 5 % p
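As a quick sanity check (fast and naive are just illustrative names), the three-argument pow agrees with the direct computation for values of k small enough to compute naively:

```python
P = 10 ** 9 + 7

def fast(k):
    # modular exponentiation: never builds the huge 2**k
    return pow(2, k, P) * 5 % P

def naive(k):
    # direct computation: fine for small k, hopeless for k ~ 10**9
    return 2 ** k * 5 % P

for k in (1, 10, 100, 5000):
    assert fast(k) == naive(k)
```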
I am trying to get an accepted answer for this question: http://www.spoj.com/problems/PRIME1/
It's nothing new: just generating the primes between two given numbers. I eventually came up with the code below, but SPOJ gives me a runtime error (NZEC), and I have no idea how to deal with it. I hope you can help me with it. Thanks in advance.
def is_prime(m, n):
    myList = []
    mySieve = [True] * (n + 1)
    for i in range(2, n + 1):
        if mySieve[i]:
            myList.append(i)
            for x in range(i * i, n + 1, i):
                mySieve[x] = False
    for a in [y for y in myList if y >= m]:
        print(a)
t = input()
count = 0
while count < int(t):
    m, n = input().split()
    count += 1
    is_prime(int(m), int(n))
    if count == int(t):
        break
    print("\n")
Looking at the problem definition:
In each of the next t lines there are two numbers m and n (1 <= m <= n <= 1000000000, n-m<=100000) separated by a space.
Looking at your code:
mySieve= [True] * (n+1)
So, if n is 1000000000, you're going to try to create a list of 1000000001 boolean values. That means you're asking Python to allocate storage for a billion pointers. On a 64-bit platform, that's 8GB—which is fine as far as Python's concerned, but might well throw your system into swap hell or get it killed by a limit or watchdog. On a 32-bit platform, that's 4GB—which will guarantee you a MemoryError.
The problem also explicitly has this warning:
Warning: large Input/Output data, be careful with certain languages
So, if you want to implement it this way, you're going to need more compact storage. For example, array.array('B', [True]) * (n+1) will only take 1GB instead of 4 or 8. And you can make it even smaller (128MB) if you store the flags as bits instead of bytes, but that's not quite as trivial a change to the code.
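Along those lines, a bytearray gives the same one-byte-per-flag saving as array.array('B') and is nearly a one-line change. A minimal sketch (in Python 3 syntax, so range rather than xrange; primes_upto is just an illustrative name):

```python
def primes_upto(n):
    # One byte per flag instead of an 8-byte pointer: a sieve up to
    # 10**9 needs roughly 1 GB here rather than 8 GB as a list of bools.
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b'\x00\x00'          # 0 and 1 are not prime
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # slice assignment clears all multiples of i at C speed
            count = len(range(i * i, n + 1, i))
            sieve[i * i::i] = b'\x00' * count
    return [i for i in range(2, n + 1) if sieve[i]]
```

For the actual SPOJ limits you'd still want to print only the primes in [m, n] rather than return the whole list.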
Calculating prime numbers between two numbers is meaningless. You can only sieve the primes up to a given number, using the primes you found before, and then show just the range you wanted.
Here is some Python code, along with some pre-calculated primes you can continue from:
bzr branch http://bzr.ceremcem.net/calc-primes
The code is fairly simple, but it works correctly and is well tested.
I am trying to do a double integral by first interpolating the data to make a surface. I am using numba to try and speed this process up, but it's just taking too long.
Here is my code; the images needed to run it are located here and here.
Noting that your code has a quadruple-nested set of for loops, I focused on optimizing the inner pair. Here's the old code:
for i in xrange(K.shape[0]):
    for j in xrange(K.shape[1]):
        print(i, j)
        '''create an r vector'''
        r = (i*distX, j*distY, z)
        for x in xrange(img.shape[0]):
            for y in xrange(img.shape[1]):
                '''create a ksi vector, then calculate
                its norm and the dot product of r and ksi'''
                ksi = (x*distX, y*distY, z)
                ksiNorm = np.linalg.norm(ksi)
                ksiDotR = float(np.dot(ksi, r))
                '''calculate the integrand'''
                temp[x, y] = img[x, y]*np.exp(1j*k*ksiDotR/ksiNorm)
        '''interpolate so that we can do the integral and take the integral'''
        temp2 = rbs(a, b, temp.real)
        K[i, j] = temp2.integral(0, n, 0, m)
Since K and img are each about 2000x2000, the innermost statements need to be executed sixteen trillion times. This is simply not practical using Python, but we can shift the work into C and/or Fortran using NumPy to vectorize. I did this one careful step at a time to try to make sure the results will match; here's what I ended up with:
'''create all r vectors'''
R = np.empty((K.shape[0], K.shape[1], 3))
R[:,:,0] = np.repeat(np.arange(K.shape[0]), K.shape[1]).reshape(K.shape) * distX
R[:,:,1] = np.arange(K.shape[1]) * distY
R[:,:,2] = z

'''create all ksi vectors'''
KSI = np.empty((img.shape[0], img.shape[1], 3))
KSI[:,:,0] = np.repeat(np.arange(img.shape[0]), img.shape[1]).reshape(img.shape) * distX
KSI[:,:,1] = np.arange(img.shape[1]) * distY
KSI[:,:,2] = z

# vectorized 2-norm; see http://stackoverflow.com/a/7741976/4323
KSInorm = np.sum(np.abs(KSI)**2, axis=-1)**(1./2)

# loop over entire K, which is the same shape as img, rows first;
# this loop populates K one pixel at a time (so it can be parallelized)
for i in xrange(K.shape[0]):
    for j in xrange(K.shape[1]):
        print(i, j)
        KSIdotR = np.dot(KSI, R[i,j])
        temp = img * np.exp(1j * k * KSIdotR / KSInorm)
        '''interpolate so that we can do the integral and take the integral'''
        temp2 = rbs(a, b, temp.real)
        K[i,j] = temp2.integral(0, n, 0, m)
The inner pair of loops is now completely gone, replaced by vectorized operations done in advance (at a space cost linear in the size of the inputs).
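The effect of that replacement can be checked on a toy example. All the shapes and values below are made up purely for illustration (and it's Python 3 syntax), but the two computations mirror the loop and vectorized versions above:

```python
import numpy as np

# Toy stand-ins for img, KSI, and one r vector.
rng = np.random.default_rng(0)
img = rng.random((4, 5))
KSI = rng.random((4, 5, 3)) + 0.1   # +0.1 keeps the norms away from zero
r = np.array([0.2, 0.3, 0.5])
k = 2.0

# Inner pair of loops, as in the original code.
loop = np.empty((4, 5), dtype=complex)
for x in range(4):
    for y in range(5):
        ksi = KSI[x, y]
        loop[x, y] = img[x, y] * np.exp(1j * k * ksi.dot(r) / np.linalg.norm(ksi))

# Vectorized: np.dot contracts over the last axis, and the norm broadcasts.
KSInorm = np.sum(np.abs(KSI) ** 2, axis=-1) ** 0.5
vec = img * np.exp(1j * k * np.dot(KSI, r) / KSInorm)

assert np.allclose(loop, vec)
```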
This reduces the time per iteration of the outer two loops from 340 seconds to 1.3 seconds on my Macbook Air 1.6 GHz i5, without using Numba. Of the 1.3 seconds per iteration, 0.68 seconds are spent in the rbs function, which is scipy.interpolate.RectBivariateSpline. There is probably room to optimize further--here are some ideas:
Re-enable Numba. I don't have it on my system, so it's untested here. It may not make much difference at this point, but it's easy for you to test.
Do more domain-specific optimization, such as trying to simplify the fundamental calculations being done. My optimizations are intended to be lossless, and I don't know your problem domain so I can't optimize as deeply as you may be able to.
Try to vectorize the remaining loops. This may be tough unless you are willing to replace the scipy RBS function with something supporting multiple calculations per call.
Get a faster CPU. Mine is pretty slow; you can probably get a speedup of at least 2x simply by using a better computer than my tiny laptop.
Downsample your data. Your test images are 2000x2000 pixels, but contain fairly little detail. If you cut their linear dimensions by 2-10x, you'd get a huge speedup.
So that's it for me for now. Where does this leave you? Assuming a slightly better computer and no further optimization work, even the optimized code would take about a month to process your test images. If you only have to do this once, maybe that's fine. If you need to do it more often, or need to iterate on the code as you try different things, you probably need to keep optimizing--starting with that RBS function which consumes more than half the time now.
Bonus tip: your code would be a lot easier to work with if it didn't have nearly identical variable names like k and K, and didn't use j both as a variable name and as the complex-number suffix (0j).
I'm attempting to learn Python, and I thought developing my own prime sieve would be an interesting problem for the afternoon. Until now, whenever I needed primes I would just import a version of the Sieve of Eratosthenes that I found online; that's what I used as my benchmark here.
After trying several different optimizations, I thought I had written a pretty decent sieve:
def sieve3(n):
    top = n + 1
    sieved = dict.fromkeys(xrange(3, top, 2), True)
    for si in sieved:
        if si * si > top:
            break
        if sieved[si]:
            for j in xrange((si*2) + si, top, si*2):  # [****]
                sieved[j] = False
    return [2] + [pr for pr in sieved if sieved[pr]]
Using the first 1,000,000 integers as my range, this code would generate the correct number of primes and was only about 3-5x slower than my benchmark. I was about to give up and pat myself on the back when I tried it on a larger range, but it no longer worked!
n = 1,000 -- Benchmark = 168 in 0.00010 seconds
n = 1,000 -- Sieve3 = 168 in 0.00022 seconds
n = 4,194,304 -- Benchmark = 295,947 in 0.288 seconds
n = 4,194,304 -- Sieve3 = 295,947 in 1.443 seconds
n = 4,194,305 -- Benchmark = 295,947 in 3.154 seconds
n = 4,194,305 -- Sieve3 = 2,097,153 in 0.8465 seconds
I think the problem comes from the line marked [****], but I can't figure out why it's so broken. It's supposed to mark each odd multiple of si as False, and it works most of the time, but for anything above 4,194,304 the sieve is broken. (To be fair, it breaks on other numbers too, like 10,000 for instance.)
I made a change that significantly slows my code down, but it actually works for all values. This version includes all numbers (not just the odds) but is otherwise identical.
def sieve2(n):
    top = n + 1
    sieved = dict.fromkeys(xrange(2, top), True)
    for si in sieved:
        if si * si > top:
            break
        if sieved[si]:
            for j in xrange(si*2, top, si):
                sieved[j] = False
    return [pr for pr in sieved if sieved[pr]]
Can anyone help me figure out why my original function (sieve3) doesn't work consistently?
Edit: I forgot to mention that when sieve3 'breaks', it returns n/2 values.
The sieve requires the loop over candidate primes to be ordered, but the code in question enumerates the keys of a dictionary, which are not guaranteed to come out in order. Instead, use the same xrange you used to initialize the dictionary, both for the main sieve loop and for building the returned result:
def sieve3(n):
    top = n + 1
    sieved = dict.fromkeys(xrange(3, top, 2), True)
    for si in xrange(3, top, 2):
        if si * si > top:
            break
        if sieved[si]:
            for j in xrange(3*si, top, si*2):
                sieved[j] = False
    return [2] + [pr for pr in xrange(3, top, 2) if sieved[pr]]
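As a quick sanity check (shown in Python 3 syntax, where xrange is just range), the fixed function now returns the right counts for both small and large n:

```python
def sieve3(n):
    top = n + 1
    sieved = dict.fromkeys(range(3, top, 2), True)
    for si in range(3, top, 2):        # ordered iteration is the fix
        if si * si > top:
            break
        if sieved[si]:
            for j in range(3 * si, top, si * 2):
                sieved[j] = False
    return [2] + [pr for pr in range(3, top, 2) if sieved[pr]]

assert len(sieve3(1000)) == 168      # matches the benchmark count
assert len(sieve3(100000)) == 9592
```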
It's because dictionary keys are not ordered. Some of the time, by chance, for si in sieved: will happen to loop through your keys in increasing order.
With your last example, the first value si takes is big enough to break out of the loop immediately.
You can simply use:
for si in sorted(sieved):
Well, look at the runtimes: the last case you showed ran almost 4 times faster than the benchmark, while it had usually been about 5 times slower. That's a red flag; maybe you aren't performing all of the iterations? (And it's faster while reporting about 7 times as many "primes"...)
I don't have time to look into the code more right now, but I hope this helps.
Problem 48 description from Project Euler:
The series, 1^1 + 2^2 + 3^3 + ... + 10^10 = 10405071317. Find the last
ten digits of the series, 1^1 + 2^2 + 3^3 + ... + 1000^1000.
I've just solved this problem using a one-liner in Python:
print sum([i**i for i in range(1,1001)])%(10**10)
I got the answer that way almost instantly, as I remembered that modular arithmetic is very fast in Python. But I still don't understand how this works under the hood (what optimizations does Python do?) and why it is so fast.
Could you please explain this to me? Is the mod 10**10 operation optimized to be applied at every iteration of the list comprehension instead of once on the whole sum?
$ time python pe48.py
9110846700
real 0m0.070s
user 0m0.047s
sys 0m0.015s
Given that
print sum([i**i for i in range(1,1001)])%(10**10)
and
print sum([i**i for i in range(1,1001)])
run equally fast in Python, the answer to your last question is 'no'.
So Python must be able to do integer exponentiation really fast. And indeed, integer exponentiation takes only O(log n) multiplications: http://en.wikipedia.org/wiki/Exponentiation#Efficient_computation_of_integer_powers
Essentially, instead of computing 2^100 = 2*2*2*... with 100 multiplications, you note that 2^100 = 2^64 * 2^32 * 2^4, and that you can square 2 repeatedly to get 2^2, then 2^4, then 2^8, and so on. Once you have those three components, you multiply them together for the final answer. This requires far fewer multiplications. The details are a bit more involved, but Python is mature enough to be well optimized on such a core feature.
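The square-and-multiply idea can be sketched in a few lines. CPython's own implementation is in C and more sophisticated, but the principle is the same (ipow is just an illustrative name):

```python
def ipow(base, exp):
    # Exponentiation by squaring: O(log exp) multiplications.
    result = 1
    while exp:
        if exp & 1:        # this binary digit of exp is set
            result *= base
        base *= base       # square for the next binary digit
        exp >>= 1
    return result
```

For 2^100, the loop squares 2 seven times and multiplies three of those squarings (2^4, 2^32, 2^64) into the result, instead of doing 100 multiplications.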
No, it's applied once, to the whole sum. The sum itself is very fast to compute, and exponentiation isn't hard to do quickly when the arguments are integers.