python prime crunching: processing pool is slower?

So I've been messing around with Python's multiprocessing lib for the last few days and I really like the processing pool. It's easy to implement and I can visualize a lot of uses. I've done a couple of projects I'd heard about before to familiarize myself with it and recently finished a program that brute forces games of hangman.
Anywho, I was doing an execution time comparison of summing all the prime numbers between 1 million and 2 million, both single threaded and through a processing pool. Now, for the hangman cruncher, putting the games in a processing pool improved execution time by about 8 times (i7 with 8 cores), but when grinding out these primes, it actually increased processing time by almost a factor of 4.
Can anyone tell me why this is? Here is the code for anyone interested in looking at or testing it:
#!/user/bin/python.exe
import math
from multiprocessing import Pool

primes = []

def log(result):
    global primes
    if result:
        primes.append(result[1])

def isPrime(n):
    if n < 2:
        return False
    if n == 2:
        return True, n
    max = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= max:
        if n % i == 0:
            return False
        i += 1
    return True, n

def main():
    global primes
    #pool = Pool()
    for i in range(1000000, 2000000):
        #pool.apply_async(isPrime, (i,), callback=log)
        temp = isPrime(i)
        log(temp)
    #pool.close()
    #pool.join()
    print sum(primes)
    return

if __name__ == "__main__":
    main()
It'll currently run in a single thread; to run it through the processing pool, uncomment the pool statements and comment out the other lines in main's for loop.

The most efficient way to use multiprocessing is to divide the work into n equal-sized chunks, with n the size of the pool, which should be approximately the number of cores on your system. The reason for this is that the work of starting subprocesses and communicating between them is quite large. If the amount of work per chunk is small, then the overhead of IPC becomes significant.
In your case, you're asking multiprocessing to process each prime individually. A better way to deal with the problem is to pass each worker a range of values (probably just a start and end value) and have it return all of the primes it found in that range.
In the case of identifying large-ish primes, the work done grows with the starting value, so you probably don't want to divide the total range into exactly n chunks, but rather n*k equal chunks, with k some reasonable, small number, say 10 - 100. That way, when some workers finish before others, there's more work left to do and it can be balanced efficiently across all workers.
Edit: Here's an improved example to show what that solution might look like. I've changed as little as possible so you can compare apples to apples.
#!/user/bin/python.exe
import math
from multiprocessing import Pool

primes = set()

def log(result):
    global primes
    if result:
        # since the result is a batch of primes, we have to use
        # update instead of add (or for a list, extend instead of append)
        primes.update(result)

def isPrime(n):
    if n < 2:
        return False
    if n == 2:
        return True, n
    max = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= max:
        if n % i == 0:
            return False
        i += 1
    return True, n

def isPrimeWorker(start, stop):
    """
    find a batch of primes
    """
    primes = set()
    for i in xrange(start, stop):
        if isPrime(i):
            primes.add(i)
    return primes

def main():
    global primes
    pool = Pool()
    # pick an arbitrary chunk size, this will give us 100 different
    # chunks, but another value might be optimal
    step = 10000
    # use xrange instead of range, we don't actually need a list, just
    # the values in that range.
    for i in xrange(1000000, 2000000, step):
        # call the *worker* function with start and stop values.
        pool.apply_async(isPrimeWorker, (i, i+step), callback=log)
    pool.close()
    pool.join()
    print sum(primes)
    return

if __name__ == "__main__":
    main()
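As a side note, a similar batching effect can be had from Pool.map's chunksize argument, which controls how many inputs are shipped to a worker per task. A minimal sketch, assuming the isPrime function above (the chunksize value is as arbitrary as step was):

from multiprocessing import Pool

# sketch: let Pool.map batch the tasks itself via chunksize, cutting
# most of the per-number IPC overhead without a hand-written worker
if __name__ == "__main__":
    pool = Pool()
    results = pool.map(isPrime, range(1000000, 2000000), chunksize=10000)
    pool.close()
    pool.join()
    print(sum(r[1] for r in results if r))

Each result is still returned per number, so the explicit worker function remains the better fit when the per-result data is large.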

Related

getting prime number script multithreaded

I have been trying to widen my understanding of Python recently, so I decided to make a prime number calculator to work on my optimization skills. I have worked all day on this and have improved the time to go through 0-100,000 to 0.15 seconds from 25-30 ish seconds. However, I am dealing with exponential growth in difficulty as I go higher in the iterations.
My question now is: how would I implement multi-threading? I have tried following tutorials and tried to create my code in a modular fashion, with functions, but I have been banging my head against this problem for over four hours with no progress. Any help would be much appreciated.
It may be a bit chaotic, but the idea here is that the main loop calls the range_primes function, which loops over a specified range of numbers, checks whether each is prime, and returns a list of the values found to be prime in that range. My thought process was that I could break off "chunks" of the number line and feed them to different processes to efficiently manage resources.
One of the main problems I ran into was that I could not figure out how to append all of the lists returned by the processes to a master output list (a sketch of one way to do this is shown below). Just thought of this now: what about writing to a file? I/O operations are slow, but it might be easier to write to a new line of a txt file than anything else.
Maybe a class of some sort to hold a process and its outputs, which can then be queried for the values?
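For reference, a minimal sketch of what the collection step could look like: pool.map blocks until every chunk is done and returns one list per input index (map_async's result object gives the same list of lists via r.get()). It assumes the chunk_process function from the code below; per_chunk is a made-up name:

from multiprocessing import Pool

# hypothetical collection sketch, not the full program
if __name__ == "__main__":
    with Pool() as pool:
        per_chunk = pool.map(chunk_process, range(100))  # chunk defaults to 13
    primes = []
    for chunk_result in per_chunk:
        primes.extend(chunk_result)  # flatten into the master output list
    print(f"{len(primes)} primes found")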
my current (not working) code:
from time import perf_counter
import math
import multiprocessing
import subprocess

# process a chunk of numbers given start and length
def chunk_process(index: int = 0, chunk: int = 13) -> list:
    tmplst = []
    tmplst = range_primes(((index)*chunk)+1, chunk*(index+1))
    return tmplst

# check if a given number is prime
def isPrime_brute(num: int) -> bool:
    if num > 1:
        for i in range(2, int(math.sqrt(num))+1):
            if (num % i) == 0:
                return False
        else:
            return True  # only returns true if nothing found
    else:
        return False

# get which numbers are primes in a given range
def range_primes(min_num: int = 0, max_num: int = 100) -> list:
    out_list = []
    mid = 0
    #print(f'starting a test to list primes between {max_num} and {min_num}')
    #print('Notifications will be given at the halfway point')
    for i in range(min_num, max_num):
        if isPrime_brute(i):
            out_list.append(i)
    return out_list
    #print(out_list)

if __name__ == "__main__":
    # setup multiprocessing
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    cores = multiprocessing.cpu_count()
    maxsent = 0
    # setup other stuff
    time_run = 30  # S
    startPoint = 0
    loop_num = 0
    primes = []
    tmplst = []
    chunk_size = 7
    max_assigned = 0
    print(f"starting calculation for {time_run} seconds")
    start_time = perf_counter()
    # majic(intentional)
    while (perf_counter()-start_time) <= time_run:
        r = pool.map_async(chunk_process, [i for i in range(maxsent, maxsent+cores+1)])
        maxsent = maxsent+cores+1
    # outputs
    elapsed = (perf_counter()-start_time)  # seconds taken
    print(r)
    print(f"{elapsed} S were used to find {len(r)} prime numbers")
    print(f"{startPoint+chunk_size} numbers were tried")
    print(f"with {chunk_size} chunking")
I know that it is not very cleanly written, but I just pieced it together in an evening and am not very experienced.
This version will run for a certain amount of time and quit shortly after the threshold is reached. As far as the multiprocessing goes, it is a patchwork of stuff that I have tried from different tutorials and documentation. I don't really have any idea how to proceed. Any input would be much appreciated.

Why does this 'optimized' prime checker run at the same speed as the regular version?

Given this plain is_prime1 function, which checks all the divisors from 2 to sqrt(p), with some bit-playing in order to avoid even numbers, which are of course not primes:
import time

def is_prime1(p):
    if p & 1 == 0:
        return False
    # if the LSD is 5 then it is divisible by 5 (i.e. not a prime)
    elif p % 10 == 5:
        return False
    for k in range(2, int(p ** 0.5) + 1):
        if p % k == 0:
            return False
    return True
Versus this "optimized" version. The idea is to save all the primes we have found up to a certain number p, then iterate over those saved primes (using the basic arithmetic fact that every number is a product of primes), so we don't iterate through all the numbers up to sqrt(p) but only over the primes we found, which should be few compared to sqrt(p). We also iterate over only half the elements, because beyond that the largest prime would most certainly not "fit" in the number p.
import time

global mem
global lenMem
mem = [2]
lenMem = 1

def is_prime2(p):
    global mem
    global lenMem
    # if p is even then the LSD is off
    if p & 1 == 0:
        return False
    # if the LSD is 5 then it is divisible by 5 (i.e. not a prime)
    elif p % 10 == 5:
        return False
    for div in mem[0: int(p ** 0.5) + 1]:
        if p % div == 0:
            return False
    mem.append(p)
    lenMem += 1
    return True
The only idea I have in mind is that "global variables are expensive and time consuming", but I don't know if there is another way, and if there is, will it really help?
On average, when running this same program:
start = time.perf_counter()
for p in range(2, 100000):
    print(f'{p} is a prime? {is_prime2(p)}')  # change to is_prime1 or is_prime2
end = time.perf_counter()
I get that for is_prime1 the average time for checking the numbers 1-100K is ~0.99 seconds, and so it is for is_prime2 (maybe a difference of +0.01s on average; as I said, maybe the usage of global variables ruins some performance?).
The difference is a combination of three things:
1. You're just not doing that much less work. Your test case includes testing a ton of small numbers, where the distinction between testing "all numbers from 2 to the square root" and "all primes from 2 to the square root" just isn't that big. Your "average case" is roughly the midpoint of the range, 50,000, whose square root is ~223.6, which means testing 48 primes, or 222 numbers, if the number is prime. But most numbers aren't prime, and most numbers have at least one small factor (proof left as an exercise), so you short-circuit and don't actually test most of the numbers in either set (if there's a factor below 8, which applies to ~77% of all numbers, you've saved maybe two tests by limiting yourself to primes).
2. You're slicing mem every time, which is performed eagerly and completely, even if you don't use all the values (and as noted, you almost never do for the non-primes). This isn't a huge cost, but then, you weren't getting huge savings from skipping non-primes, so it likely eats what little savings you got from the other optimization.
3. (You found this one, good show.) Your slice of primes took a number of primes to test equal to the square root of the number to test, not all primes less than the square root of the number to test. So you actually performed the same number of tests, just with different numbers (many of them primes larger than the square root that definitely don't need to be tested).
A side note:
Your up-front tests aren't actually saving you much work; you redo both tests in the loop, so they're wasted effort when the number is prime (you test them both twice). And your test for divisibility by five is pointless; % 10 is no faster than % 5 (computers don't operate in base 10 anyway), and if not p % 5: is a slightly faster, more direct, and more complete way to test for divisibility (your test doesn't recognize multiples of 10, just multiples of 5 that aren't multiples of 10).
The up-front tests are also wrong, because they don't exclude the base cases (they say 2 and 5 are not prime, because they're divisible by 2 and 5 respectively).
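Putting those fixes together, a sketch of what the memoized version might look like (is_prime2_fixed is a made-up name; it keeps the original assumption that the function is called with increasing p starting from 2, bounds the lookup by prime value rather than prime count, and avoids the eager slice):

import bisect
import math

mem = []  # primes found so far, in increasing order

def is_prime2_fixed(p):
    if p < 2:
        return False
    # index of the first stored prime greater than sqrt(p)
    hi = bisect.bisect_right(mem, math.isqrt(p))
    for k in range(hi):  # indexing instead of slicing: no eager copy
        if p % mem[k] == 0:
            return False
    mem.append(p)
    return True

This is a sketch, not a guaranteed speedup; as noted above, for small inputs the savings over plain trial division are modest.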
First of all, you should remove the print call; it is very time consuming.
You should time just your function, not the print, so you could do it like this:
start = time.perf_counter()
for p in range(2, 100000):
    ## print(f'{p} is a prime? {is_prime1(p)}')
    is_prime1(p)
end = time.perf_counter()
print("prime1", end - start)

start = time.perf_counter()
for p in range(2, 100000):
    ## print(f'{p} is a prime? {is_prime2(p)}')
    is_prime2(p)
end = time.perf_counter()
print("prime2", end - start)
is_prime1 is still faster for me.
If you want to hold primes in global memory to accelerate multiple calls, you need to ensure that the primes list is properly populated even when the function is called with numbers in random order. The way is_prime2() stores and uses the primes assumes that, for example, it is called with 7 before being called with 343. If not, 343 will be treated as a prime because 7 is not yet in the primes list. So the function must compute and store all primes up to √343 ≈ 18.5 (which includes 7) before it can respond to the is_prime(343) call.
In order to quickly build a primes list, the Sieve of Eratosthenes is one of the fastest methods. But, since you don't know in advance how many primes you will need, you can't allocate the sieve's bit flags in advance. What you can do is use a rolling window of the sieve, moving forward by chunks (of, let's say, 1,000,000 bits at a time). When a number beyond your maximum sieved value is requested, you just generate more primes chunk by chunk until you have enough to respond.
Also, since you're going to build a list of primes, you might as well make it a set and respond to the function call by checking if the requested number is in it. This will require generating more primes than needed for divisions but, in the spirit of accelerating subsequent calls, that should not be an issue.
Here's an example of an isPrime() function that uses that approach:
primes = {3}
sieveMax = 3
sieveChunk = 1000000  # must be an even number

def isPrime(n):
    if not n & 1: return n == 2
    global primes, sieveMax
    # >= so that n == sieveMax also triggers sieving of the next chunk
    while n >= sieveMax:
        base, sieveMax = sieveMax, sieveMax + sieveChunk
        sieve = [True] * sieveChunk
        # mark multiples of the primes we already know
        for p in primes:
            i = (p - base % p) % p
            sieve[i::p] = [False] * len(sieve[i::p])
        # scan the remaining odd positions for new primes
        for i in range(0, sieveChunk, 2):
            if not sieve[i]: continue
            p = i + base
            primes.add(p)
            sieve[i::p] = [False] * len(sieve[i::p])
    return n in primes
On the first call with an unknown prime, it will be slower than the division approach, but as the prime list builds up, it will provide much better response times.
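For example (a sketch of usage; the first call triggers the sieving, later calls are plain set lookups):

print(isPrime(1000003))  # True:  1000003 is the smallest prime above 10**6
print(isPrime(1000001))  # False: 1000001 == 101 * 9901
print(isPrime(999983))   # True:  largest prime below 10**6, answered from the cached set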

How to stop loop running out of memory?

I've come back to programming after a long hiatus, so please forgive any stupid errors/inefficient code.
I am creating an encryption program that uses the RSA method of encryption, which involves finding the coprimes of numbers to generate a key. I am using the Euclidean algorithm to compute highest common factors and add a number to the coprime list if the HCF == 1. I generate the lists of coprimes for two different numbers, then compare them to find the coprimes present in both sets. The basic code is below:
def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

def coprimes(n):
    cp = []
    for i in range(1, n):
        if gcd(i, n) == 1:
            cp.append(i)
    return cp

def compare(n, m):
    a = coprimes(n)
    b = coprimes(m)
    c = []
    for i in a:
        if i in b:
            c.append(i)
    print(c)
This code works perfectly for small numbers and gives me what I want, but execution takes forever and is eventually killed when computing extremely large numbers in the billions range, which is necessary for even a moderate level of security.
I assume this is a memory issue, but I can't work out how to do it in a non-memory-intensive way. I tried multiprocessing, but that just made my computer unusable due to the number of processes running.
How can I calculate the coprimes of large numbers and then compare two sets of coprimes in an efficient and workable way?
If the only thing you're worried about is running out of memory, you could use generators here.
def coprimes(n):
    for i in range(1, n):
        if gcd(i, n) == 1:
            yield i
This way you can use each coprime value and then discard it once you don't need it. However, nothing is going to change the fact that your code is O(N^2) and will always perform slowly for large numbers. And this assumes Euclid's algorithm is constant time, which it is not.
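Since both generators yield values in increasing order, the comparison itself can also be done without materializing either list, by walking the two streams in lockstep like the merge step of merge sort. A sketch, assuming the coprimes generator above (common_coprimes is a made-up name):

def common_coprimes(n, m):
    a, b = coprimes(n), coprimes(m)
    x, y = next(a, None), next(b, None)
    while x is not None and y is not None:
        if x == y:
            yield x  # present in both streams: a common coprime
            x, y = next(a, None), next(b, None)
        elif x < y:
            x = next(a, None)  # advance whichever stream is behind
        else:
            y = next(b, None)

for c in common_coprimes(15, 21):
    print(c)  # 1 2 4 8 11 13

This trades the O(min(n, m)) lists for O(1) extra memory, though the gcd work per number is unchanged.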
You could change the strategy and approach this from the perspective of common prime factors. The common coprimes between n and m will be all numbers that are not divisible by any of their common prime factors.
def primeFactors(N):
    p = 2
    while p * p <= N:
        count = 0
        while N % p == 0:
            count += 1
            N //= p
        if count: yield p
        p += 1 + (p & 1)
    if N > 1: yield N

import math

def compare2(n, m):
    # skip list for multiples of common prime factors
    skip = {p: p for p in primeFactors(math.gcd(n, m))}
    for x in range(1, min(m, n)):
        if x in skip:
            p = skip[x]                   # x is a multiple of common prime p
            nxt = x + p                   # determine next multiple to skip
            while nxt in skip: nxt += p   # for that prime
            skip[nxt] = p
        else:
            yield x                       # common coprime of n and m
The performance is considerably better than matching lists of coprimes, especially on larger numbers:
from timeit import timeit
timeit(lambda:list(compare2(10**5,2*10**5)),number=1)
# 0.025 second
timeit(lambda:list(compare2(10**6,2*10**6)),number=1)
# 0.196 second
timeit(lambda:list(compare2(10**7,2*10**7)),number=1)
# 2.18 seconds
timeit(lambda:list(compare2(10**8,2*10**8)),number=1)
# 20.3 seconds
timeit(lambda:list(compare2(10**9,2*10**9)),number=1)
# 504 seconds
At some point, building lists of all the coprimes becomes a bottleneck, and you should just use/process them as they come out of the generator (for example, to count how many there are):
timeit(lambda:sum(1 for _ in compare2(10**9,2*10**9)),number=1)
# 341 seconds
Another way to approach this, somewhat slower than the prime factor approach but much simpler to code, would be to list the coprimes of the gcd of n and m:
import math

def compare3(n, m):
    d = math.gcd(n, m)
    for c in range(1, min(n, m)):
        if math.gcd(c, d) == 1:
            yield c
timeit(lambda:list(compare3(10**6,2*10**6)),number=1)
# 0.28 second
timeit(lambda:list(compare3(10**7,2*10**7)),number=1)
# 2.84 seconds
timeit(lambda:list(compare3(10**8,2*10**8)),number=1)
# 30.8 seconds
Given that it uses essentially no memory, it could be advantageous in some cases:
timeit(lambda:sum(1 for _ in compare3(10**9,2*10**9)),number=1)
# 326 seconds

How to run multiple calculations at once using python? [closed]

I have recently started using Python, and to try to learn I have set myself the task of running two chunks of code at once.
I have one chunk of code to generate prime numbers and append them to a list:
primes = []
for num in range(1, 999999999999 + 1):
    if num > 1:
        for i in range(2, num):
            if (num % i) == 0:
                break
        else:
            primes.append(num)
And another chunk of code that uses the primes generated to find perfect numbers:
limit = 25000000000000000000
for p in primes:
    pp = 2**p
    perfect = (pp - 1) * (pp // 2)
    if perfect > limit:
        break
    elif is_prime(pp - 1):
        print(perfect)
I have heard of something to do with importing threads or something along those lines, but I am very confused by it. If anyone can help by giving me clear instructions on what to do, that would be very appreciated. I have only been learning Python for about a week now.
Final note: I didn't write these calculations myself, but I have modified them for what I need them for.
You can use the multiprocessing library to accomplish this. The basic idea is to have two sets of processes. The first process can fill up a queue with primes, and then you can delegate other processes to deal with those primes and print your perfect numbers.
I did make some changes and implemented a basic is_prime function. (Note that for this implementation you only need to check up to the square root of the number.) There are better methods, but that's not what this question is about.
Anyways, our append_primes function is the same as your first loop, except instead of appending a prime to a list, it puts a prime into a queue. We need some sort of signal to say that we're done appending primes, which is why we have q.put("DONE") at the end. The "DONE" is arbitrary and can be any kind of signal you want, as long as you handle it appropriately.
Then, perfect_number is kind of like your second loop. It accepts a single prime and prints out a perfect number, if it exists. You may want to return it instead, but that depends on your requirements.
Finally, all of the logic that runs and performs the multiprocessing has to sit inside an if __name__ == "__main__" block to avoid being re-run over and over as the file is pickled and sent to the new process. We initialize our queue and create/start the process that appends primes to this queue.
While that's running, we create our own version of a multiprocessing pool. Standard mp pools don't play along with queues, so we have to get a little fancy. We initialize the maximum number of processes we want to run and set it to the current cpu count minus 1 (since one core will be running the append_primes function).
We loop over q until "DONE" is returned (remember, that's our signal from append_primes). We continuously loop over the process pool until we find an available process. Once that happens, we create and start the process, then move on to the next number.
Finally, we do some cleanup and make sure everything in processes is done by calling Process.join(), which blocks until the process is done executing. We also ensure prime_finder has finished.
import multiprocessing as mp
import os
import queue
import time

def is_prime(n):
    """ Returns True if n is prime """
    for i in range(2, int(n**0.5) + 1):  # + 1 so the square root itself is tested
        if n % i == 0:
            return False
    return True

def append_primes(max, q):
    """ Searches for primes between 2 and max and adds them to the Queue (q) """
    pid = os.getpid()
    for num in range(2, int(max) + 1):
        if is_prime(num):
            print(f"{pid} :: Put {num} in queue.")
            q.put(num)
    q.put("DONE")  # A signal to stop processing
    return

def perfect_number(prime, limit=25000000000000000000):
    """ Prints the perfect number, if it exists, given the prime """
    pp = 2**prime
    perfect = (pp - 1) * (pp // 2)
    if perfect > limit:
        return
    if is_prime(pp - 1):
        print(f"{os.getpid()} :: Perfect: {perfect}", flush=True)
    return

if __name__ == "__main__":
    q = mp.Queue()
    max = 1000  # When to stop looking for primes
    prime_finder = mp.Process(target=append_primes, args=(max, q,))
    prime_finder.start()
    n_processes = os.cpu_count() - 1  # -1 because 1 is for prime_finder
    processes = [None] * n_processes
    for prime in iter(q.get, "DONE"):
        proc_started = False
        while not proc_started:  # Check each process till we find an 'available' one.
            for m, proc in enumerate(processes):
                if proc is None or not proc.is_alive():
                    processes[m] = mp.Process(target=perfect_number, args=(prime,))
                    processes[m].start()
                    proc_started = True  # Get us out of the while loop
                    break  # and out of the for loop.
    for proc in processes:
        if proc is None:  # In case max < n_processes
            continue
        proc.join()
    prime_finder.join()
Comment out the print statement in append_primes if you only want to see the perfect numbers. The number that appears before each result is the process's ID (just so you can see that there are multiple processes working at the same time).
Why run 2 for loops at once when you can just put the logic of the second loop inside the first loop? Instead of the break in the perfects loop, use a bool to record whether you've reached the limit.
Also, you don't need to check if num > 1; just start the range at 2:
primes = []
limit = 25_000_000_000_000_000_000
reached_limit = False

def is_prime(n):
    # Fermat probable-prime test to base 2 (also true for n == 2 itself)
    return 2 in [n, 2**n % n]

for num in range(2, 1_000_000_000_000):
    for i in range(2, num):
        if (num % i) == 0:
            break
    else:
        primes.append(num)
        if not reached_limit:
            pp = 2 ** num
            perfect = (pp - 1) * (pp // 2)
            if perfect > limit:
                reached_limit = True
            elif is_prime(pp - 1):
                print(perfect)

Make sure python script is still running

I have a small script that calculates something. It uses a primitive brute force algorithm and is inherently slow. I expect it to take about 30 minutes to complete. The script only has one print statement, at the end, when it is done. I would like to have something to make sure the script is still running. I do not want to include print statements for each iteration of the loop; that seems unnecessary. How can I make sure a script that takes very long to execute is still running at a given time during execution? I do not want this to slow my script down, though. This is my script:
def triangle_numbers(num):
    numbers = []
    for item in range(1, num):
        if num % item == 0:
            numbers.append(item)
    numbers.append(num)
    return numbers

count = 1
numbers = []
while True:
    if len(numbers) == 501:
        print numbers
        print count
        break
    numbers = triangle_numbers(count)
    count += 1
You could print every 500 loops (or choose another number).
while True:
    if len(numbers) == 501:
        print numbers
        print count
        break
    numbers = triangle_numbers(count)
    count += 1
    # print every 500 loops
    if count % 500 == 0:
        print count
This will let you know not only whether it is still running (which it obviously is, unless it has finished), but how fast it is going (which I think might be more helpful to you).
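If printing every 500 loops turns out too chatty or too quiet, a time-based heartbeat is another low-overhead option (a sketch; last_report is a made-up name and the 10-second interval is arbitrary):

import time

last_report = time.time()
count = 1
numbers = []
while True:
    if len(numbers) == 501:
        print(numbers)
        print(count)
        break
    numbers = triangle_numbers(count)
    count += 1
    # cheap comparison on each loop; prints roughly every 10 seconds
    if time.time() - last_report >= 10:
        print(count)
        last_report = time.time()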
FYI:
I expect your program will take more like 30 weeks than 30 minutes to compute. Try this:
'''
1. We only need to test for factors up to the square root of num.
2. Unless we are at the end, we only care about the number of divisors,
   not storing them in a list.
3. xrange is better than range in this case.
4. Since 501 is odd, the number must be a perfect square.
'''
def divisors_count(sqrt):
    num = sqrt * sqrt
    return sum(2 for item in xrange(1, sqrt) if num % item == 0) + 1

def divisors(sqrt):
    num = sqrt * sqrt
    numbers = []
    for item in xrange(1, sqrt):
        if num % item == 0:
            numbers.append(item)
            numbers.append(num // item)  # the complementary divisor
    numbers.append(sqrt)
    return sorted(numbers)

sqrt = 1
while divisors_count(sqrt) != 501:
    if sqrt % 500 == 0:
        print sqrt * sqrt
    sqrt += 1
print divisors(sqrt)
print sqrt * sqrt
though I suspect this will still take a long time. (In fact, I'm not convinced it will terminate.)
Configure some external tool like Supervisor:
Supervisor starts its subprocesses via fork/exec and subprocesses don't daemonize. The operating system signals Supervisor immediately when a process terminates, unlike some solutions that rely on troublesome PID files and periodic polling to restart failed processes.
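A minimal program section for supervisord might look like this (a sketch; the program name and all paths are placeholders):

[program:primes]
command=python /path/to/script.py
autostart=true
autorestart=true                        ; restart the script if it dies
stdout_logfile=/var/log/primes.out.log  ; tail this to see it is alive
stderr_logfile=/var/log/primes.err.log

With that in place, supervisorctl status shows at a glance whether the script is still running.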
