I have been trying to widen my understanding of Python recently, so I decided to make a prime number calculator to work on my optimization skills. I have worked all day on this and have cut the time to check 0-100,000 from roughly 25-30 seconds down to 0.15 seconds. However, the runtime grows steeply as I push the range higher.
My question now is: how would I implement multiprocessing? I have tried following tutorials and structuring my code in a modular fashion with functions, but I have been banging my head against this problem for over four hours with no progress. Any help would be much appreciated.
It may be a bit chaotic, but the idea is that the main loop calls the range_primes function, which loops over a specified range of numbers, checks whether each is prime, and returns a list of the primes found in that range. My thought process was that I could break off "chunks" of the number line and feed them to different processes to use resources efficiently.
One of the main problems I ran into was that I could not figure out how to append all of the lists returned by the processes to a master output list. A thought I just had: what about writing to a file? I/O operations are slow, but it might be easier to write each result to a new line of a text file than anything else.
Maybe a class of some sort could hold a process and its outputs, which could then be queried for the values?
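The shape I have in mind can be sketched with Pool.map, which returns each chunk's list in submission order, so flattening into a master list needs no files or shared state. This is only an illustration with a simple trial-division is_prime, not my actual code:

```python
import math
from multiprocessing import Pool

def is_prime(n):
    if n < 2:
        return False
    return all(n % i for i in range(2, int(math.sqrt(n)) + 1))

def range_primes(bounds):
    # each worker handles one (start, stop) chunk of the number line
    start, stop = bounds
    return [n for n in range(start, stop) if is_prime(n)]

if __name__ == "__main__":
    chunk = 25_000
    chunks = [(i, i + chunk) for i in range(0, 100_000, chunk)]
    with Pool() as pool:
        per_chunk = pool.map(range_primes, chunks)  # ordered list of lists
    primes = [p for sub in per_chunk for p in sub]  # master output list
    print(len(primes))  # 9592 primes below 100,000
```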
my current (not working) code:
from time import perf_counter
import math
import multiprocessing

# process a chunk of numbers given start and length
def chunk_process(index: int = 0, chunk: int = 13) -> list:
    return range_primes((index * chunk) + 1, chunk * (index + 1))

# check if a given number is prime
def isPrime_brute(num: int) -> bool:
    if num > 1:
        for i in range(2, int(math.sqrt(num)) + 1):
            if (num % i) == 0:
                return False
        else:
            return True  # only returns True if nothing found
    else:
        return False

# get which numbers are primes in a given range
def range_primes(min_num: int = 0, max_num: int = 100) -> list:
    out_list = []
    for i in range(min_num, max_num):
        if isPrime_brute(i):
            out_list.append(i)
    return out_list

if __name__ == "__main__":
    # setup multiprocessing
    cores = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(cores)
    maxsent = 0
    # setup other stuff
    time_run = 30  # seconds
    startPoint = 0
    loop_num = 0
    primes = []
    chunk_size = 7
    print(f"starting calculation for {time_run} seconds")
    start_time = perf_counter()
    # majic (intentional)
    while (perf_counter() - start_time) <= time_run:
        r = pool.map_async(chunk_process, [i for i in range(maxsent, maxsent + cores + 1)])
        maxsent = maxsent + cores + 1
    # outputs
    elapsed = perf_counter() - start_time  # seconds taken
    print(r)
    print(f"{elapsed} S were used to find {len(r)} prime numbers")
    print(f"{startPoint + chunk_size} numbers were tried")
    print(f"with {chunk_size} chunking")
I know that it is not very cleanly written, but I just pieced it together in an evening and am not very experienced.
This version will run for a certain amount of time and quit shortly after the threshold is reached. As far as the multiprocessing goes, it is a patchwork of things I have tried from different tutorials and documentation. I don't really have any idea how to proceed; any input would be much appreciated.
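A note on the specific failure in the loop above: map_async returns an AsyncResult, not a list, so len(r) raises a TypeError; the per-chunk lists come back from r.get(), which blocks until the workers finish. A minimal, hedged sketch of that pattern (the worker here is a stand-in, not the real chunk_process):

```python
from multiprocessing import Pool

def chunk_process(index, chunk=1000):
    # stand-in worker: a real version would return the primes in its chunk
    return list(range(index * chunk, (index + 1) * chunk))

if __name__ == "__main__":
    with Pool() as pool:
        r = pool.map_async(chunk_process, range(4))
        results = r.get()  # blocks until every chunk is done; one list per chunk
    flat = [n for sub in results for n in sub]  # master list across all chunks
    print(len(flat))  # 4000
```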
I have recently started using Python, and to try to learn I have set myself the task of running two chunks of code at once.
I have one chunk of code to generate prime numbers and append them to a list:
primes = []
for num in range(1, 999999999999 + 1):
    if num > 1:
        for i in range(2, num):
            if (num % i) == 0:
                break
        else:
            primes.append(num)
And another chunk of code that uses the primes generated to find perfect numbers:
limit = 25000000000000000000
for p in primes:
    pp = 2**p
    perfect = (pp - 1) * (pp // 2)
    if perfect > limit:
        break
    elif is_prime(pp - 1):
        print(perfect)
I have heard of something to do with importing threading or something along those lines, but I am very confused by it. If anyone can help by giving me clear instructions on what to do, that would be much appreciated. I have only been learning Python for about a week now.
Final note: I didn't write these calculations myself, but I have modified them for what I need.
You can use the multiprocessing library to accomplish this. The basic idea is to have two sets of processes. The first process can fill up a queue with primes, and then you can delegate other processes to deal with those primes and print your perfect numbers.
I did make some changes and implemented a basic is_prime function. (Note that for this implementation you only need to check until the square root of the number). There are better methods but that's not what this question is about.
Anyways, our append_primes function is the same as your first loop, except instead of appending a prime to a list, it puts a prime into a queue. We need some sort of signal to say that we're done appending primes, which is why we have q.put("DONE") at the end. The "DONE" is arbitrary and can be any kind of signal you want, as long as you handle it appropriately.
Then, the perfect_number is kind of like your second loop. It accepts a single prime and prints out a perfect number, if it exists. You may want to return it instead, but that depends on your requirements.
Finally, all of the logic that runs and performs the multiprocessing has to sit inside an if __name__ == "__main__" block to avoid being re-run over and over as the file is pickled and sent to the new process. We initialize our queue and create/start the process to append primes to this queue.
While that's running, we create our own version of a multiprocessing pool. Standard mp pools don't play along with queues, so we have to get a little fancy. We initialize the maximum number of processes we want to run and set it to the current CPU count minus 1 (since one core will be running the append_primes function).
We loop over q until "DONE" is returned (remember, that's our signal from append_primes). We'll continuously loop over the process pool until we find an available process. Once that happens, we create and start the process, then move on to the next number.
Finally, we do some cleanup and make sure everything in processes is done by calling Process.join() which blocks until the process is done executing. We also ensure prime_finder has finished.
import multiprocessing as mp
import os

def is_prime(n):
    """ Returns True if n is prime """
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def append_primes(max, q):
    """ Searches for primes between 2 and max and adds them to the Queue (q) """
    pid = os.getpid()
    for num in range(2, int(max) + 1):
        if is_prime(num):
            print(f"{pid} :: Put {num} in queue.")
            q.put(num)
    q.put("DONE")  # A signal to stop processing
    return

def perfect_number(prime, limit = 25000000000000000000):
    """ Prints the perfect number, if it exists, given the prime """
    pp = 2**prime
    perfect = (pp - 1) * (pp // 2)
    if perfect > limit:
        return
    if is_prime(pp - 1):
        print(f"{os.getpid()} :: Perfect: {perfect}", flush = True)
    return

if __name__ == "__main__":
    q = mp.Queue()
    max = 1000  # When to stop looking for primes
    prime_finder = mp.Process(target = append_primes, args = (max, q,))
    prime_finder.start()
    n_processes = os.cpu_count() - 1  # -1 because 1 is for prime_finder
    processes = [None] * n_processes
    for prime in iter(q.get, "DONE"):
        proc_started = False
        while not proc_started:  # Check each process till we find an 'available' one.
            for m, proc in enumerate(processes):
                if proc is None or not proc.is_alive():
                    processes[m] = mp.Process(target = perfect_number, args = (prime,))
                    processes[m].start()
                    proc_started = True  # Get us out of the while loop
                    break  # and out of the for loop.
    for proc in processes:
        if proc is None:  # In case max < n_processes
            continue
        proc.join()
    prime_finder.join()
Comment out the print statement in append_primes if you only want to see the perfect numbers. The number that appears before each line is the process ID (just so you can see that there are multiple processes working at the same time).
Why run two loops at once when you can just put the logic of the second loop inside the first? Instead of the break in the perfect-numbers loop, use a bool to record whether you've reached the limit.
Also, you don't need to check if num > 1; just start the range at 2.
primes = []
limit = 25_000_000_000_000_000_000
reached_limit = False

def is_prime(n):
    return 2 in [n, 2**n % n]

for num in range(2, 1_000_000_000_000):
    for i in range(2, num):
        if (num % i) == 0:
            break
    else:
        primes.append(num)
        if not reached_limit:
            pp = 2 ** num
            perfect = (pp - 1) * (pp // 2)
            if perfect > limit:
                reached_limit = True
            elif is_prime(pp - 1):
                print(perfect)
def _odd_iter():
    n = 1
    while True:
        n = n + 2
        yield n

def filt(n):
    return lambda x: x % n > 0

def primes():
    yield 2
    it = _odd_iter()
    while True:
        n = next(it)
        yield n
        it = filter(filt(n), it)
For example, take the sequence [3, 5, 7, 9, 11, 13, 15, ...]. If I pull the number 7 from it, deciding whether it is prime requires dividing by 3 and 5, so that information (the earlier primes 3 and 5) must be stored somewhere. I expected that, even with lazy evaluation, more and more filter state would accumulate over time, so the calculation would get slower and slower and memory use would keep growing. But in my actual experiments, prime generation does not slow down and memory does not explode. I want to understand the internal principle behind this.
In Python 3, as your post is tagged, filter is a lazily-evaluated generator-type object. If you tried to evaluate that entire filter object with e.g. it = list(filter(filt(n),it)), you would have a bad time. You would have an equally bad time if you ran your code in Python 2, in which filter() automatically returns a list.
A filter on an infinite iterable is not inherently problematic, though, because you can use it in a perfectly acceptable way, like a for loop:
it = filter(filt(n), it)
for iteration in it:
    if input():
        print(iteration)
    else:
        break
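Because each stacked filter is evaluated lazily, pulling a bounded number of values is always safe; itertools.islice shows this with the generator above (definitions repeated here so the snippet runs on its own):

```python
from itertools import islice

def _odd_iter():
    n = 1
    while True:
        n += 2
        yield n

def filt(n):
    return lambda x: x % n > 0

def primes():
    yield 2
    it = _odd_iter()
    while True:
        n = next(it)
        yield n
        it = filter(filt(n), it)  # lazily stacks one more filter per prime

print(list(islice(primes(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Each filter only tests candidates as they are requested, so no filter ever scans the infinite tail; only the chain of small lambda objects is kept in memory.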
I currently have the following as my randprime(p, q) function. Is there any way to condense it, via something like a genexp or listcomp? Here's my function:
n = randint(p, q)
while not isPrime(n):
    n = randint(p, q)
It's better to just generate the list of primes, and then choose from that list.
As is, your code has a chance of looping forever: if there are no primes in the interval, or if randint keeps picking non-primes, the while loop will never end.
So this is probably shorter and less troublesome:
import random
primes = [i for i in range(p,q) if isPrime(i)]
n = random.choice(primes)
The other advantage of this is that there is no chance of looping forever when there are no primes in the interval (random.choice on an empty list raises an IndexError instead). As stated, this can be slow depending on the range, so it would be quicker to cache the primes ahead of time:
# initialising primes
minPrime = 0
maxPrime = 1000
cached_primes = [i for i in range(minPrime,maxPrime) if isPrime(i)]
#elsewhere in the code
import random
n = random.choice([i for i in cached_primes if p<i<q])
Again, further optimisations are possible, but are very much dependent on your actual code... and you know what they say about premature optimisation.
Here is a script written in Python to generate n random prime integers between two given integers:
import numpy as np

def getRandomPrimeInteger(bounds):
    for i in range(len(bounds) - 1):
        if bounds[i + 1] > bounds[i]:
            x = bounds[i] + np.random.randint(bounds[i + 1] - bounds[i])
            if isPrime(x):
                return x
        else:
            if isPrime(bounds[i]):
                return bounds[i]
            if isPrime(bounds[i + 1]):
                return bounds[i + 1]
    # no prime found on this pass: bisect each interval and recurse on the finer grid
    newBounds = [0 for i in range(2 * len(bounds) - 1)]
    newBounds[0] = bounds[0]
    for i in range(1, len(bounds)):
        newBounds[2 * i - 1] = int((bounds[i - 1] + bounds[i]) / 2)
        newBounds[2 * i] = bounds[i]
    return getRandomPrimeInteger(newBounds)
def isPrime(x):
    count = 0
    for i in range(int(x / 2)):
        if x % (i + 1) == 0:
            count = count + 1
    return count == 1
# ex: get 50 random prime integers between 100 and 10000:
bounds = [100, 10000]
for i in range(50):
    x = getRandomPrimeInteger(bounds)
    print(x)
So it would be great if you could use an iterator that yields the integers from p to q in random order (without replacement). I haven't been able to find a way to do that. The following will generate random integers in that range and skip anything it has already tested.
import random

fail = False
tested = set()
n = random.randint(p, q)
while not isPrime(n):
    tested.add(n)
    if len(tested) == q - p + 1:  # every candidate in [p, q] has been tried
        fail = True
        break
    while n in tested:
        n = random.randint(p, q)
if fail:
    print('I failed')
else:
    print(n, 'is prime')
The big advantage of this is that if say the range you're testing is just (14,15), your code would run forever. This code is guaranteed to produce an answer if such a prime exists, and tell you there isn't one if such a prime does not exist. You can obviously make this more compact, but I'm trying to show the logic.
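For the record, the standard library can give exactly that "random order without replacement" iteration: shuffle the candidate list once and walk it. A sketch with a throwaway trial-division is_prime standing in for yours:

```python
import random

def is_prime(n):
    # simple stand-in primality test, not the questioner's isPrime
    return n > 1 and all(n % i for i in range(2, int(n ** 0.5) + 1))

def rand_prime(p, q):
    """Return a random prime in [p, q], or None if there is none."""
    candidates = list(range(p, q + 1))
    random.shuffle(candidates)  # random order, each value visited at most once
    return next((n for n in candidates if is_prime(n)), None)

print(rand_prime(14, 15))  # None: no prime in that range, and no infinite loop
print(rand_prime(10, 20))  # one of 11, 13, 17, 19
```

Each candidate is tested at most once, so the loop is guaranteed to terminate, at the cost of materializing the range (fine for small p..q, impractical for huge ones).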
next(i for i in itertools.imap(lambda x: random.randint(p,q)|1,itertools.count()) if isPrime(i))
This starts with itertools.count() - this gives an infinite set.
Each number is mapped to a new random number in the range, by itertools.imap(). imap is like map, but returns an iterator, rather than a list - we don't want to generate a list of inifinite random numbers!
Then, the first matching number is found, and returned.
Works efficiently, even if p and q are very far apart - e.g. 1 and 10**30, which generating a full list won't do!
By the way, this is not more efficient than your code above, and is a lot more difficult to understand at a glance - please have some consideration for the next programmer to have to read your code, and just do it as you did above. That programmer might be you in six months, when you've forgotten what this code was supposed to do!
P.S. - in practice, you might want to replace count() with xrange (NOT range!), e.g. xrange(int((q - p) ** 1.5) + 20), to do no more than that number of attempts (balanced between limited tests for small ranges and large ranges, with under a 0.5% chance of failing when it could succeed); otherwise, as was suggested in another post, you might loop forever.
PPS - improvement: replaced random.randint(p,q) with random.randint(p,q)|1 - this makes the code twice as efficient, but eliminates the possibility that the result will be 2.
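For readers on Python 3, where itertools.imap and xrange are gone but map and filter are lazy by default, a roughly equivalent sketch (is_prime here is a stand-in for the questioner's isPrime):

```python
import random
from itertools import count

def is_prime(n):
    return n > 1 and all(n % i for i in range(2, int(n ** 0.5) + 1))

p, q = 100, 200
# map over count() is lazy in Python 3, so no infinite list is built;
# | 1 forces odd candidates (ruling out 2 as a result), as in the original trick
n = next(i for i in map(lambda _: random.randint(p, q) | 1, count()) if is_prime(i))
print(n)  # an odd prime between p and q | 1
```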
I have a small script that calculates something. It uses a primitive brute-force algorithm and is inherently slow; I expect it to take about 30 minutes to complete. The script only has one print statement, at the end when it is done. I would like some way to make sure the script is still running at a given time during execution. I do not want to include print statements for each iteration of the loop, which seems unnecessary, and I do not want this check to slow my script down. This is my script:
def triangle_numbers(num):
    numbers = []
    for item in range(1, num):
        if num % item == 0:
            numbers.append(item)
    numbers.append(num)
    return numbers

count = 1
numbers = []
while True:
    if len(numbers) == 501:
        print numbers
        print count
        break
    numbers = triangle_numbers(count)
    count += 1
You could print every 500 loops (or choose another number).
while True:
    if len(numbers) == 501:
        print numbers
        print count
        break
    numbers = triangle_numbers(count)
    count += 1
    # print every 500 loops
    if count % 500 == 0:
        print count
This will let you know not only if it is running (which it obviously is unless it has finished), but how fast it is going (which I think might be more helpful to you).
FYI:
I expect your program will take more like 30 weeks than 30 minutes to compute. Try this:
'''
1. We only need to test for factors up to the square root of num.
2. Unless we are at the end, we only care about the number of divisors,
   not storing them in a list.
3. xrange is better than range in this case.
4. Since 501 is odd, the number must be a perfect square.
'''
def divisors_count(sqrt):
    num = sqrt * sqrt
    return sum(2 for item in xrange(1, sqrt) if num % item == 0) + 1

def divisors(sqrt):
    num = sqrt * sqrt
    numbers = []
    for item in xrange(1, sqrt):
        if num % item == 0:
            numbers.append(item)
            numbers.append(num / item)
    numbers.append(sqrt)
    return sorted(numbers)

sqrt = 1
while divisors_count(sqrt) != 501:
    if sqrt % 500 == 0:
        print sqrt * sqrt
    sqrt += 1
print divisors(sqrt)
print sqrt * sqrt
though I suspect this will still take a long time. (In fact, I'm not convinced it will terminate.)
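One sentence of justification for point 4 above: divisors pair up as (d, num // d), and a pair collapses to a single divisor only when d equals num // d, i.e. when num is a perfect square, so a number has an odd divisor count exactly when it is a square. A quick empirical check:

```python
def divisor_count(n):
    # brute-force divisor count, only for small demonstration values
    return sum(1 for d in range(1, n + 1) if n % d == 0)

odd_count = [n for n in range(1, 50) if divisor_count(n) % 2 == 1]
print(odd_count)  # [1, 4, 9, 16, 25, 36, 49] -- exactly the perfect squares
```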
Configure some external tool like Supervisor.
Supervisor starts its subprocesses via fork/exec, and subprocesses don't daemonize. The operating system signals Supervisor immediately when a process terminates, unlike some solutions that rely on troublesome PID files and periodic polling to restart failed processes.
So I've been messing around with python's multiprocessing lib for the last few days and I really like the processing pool. It's easy to implement and I can visualize a lot of uses. I've done a couple of projects I've heard about before to familiarize myself with it and recently finished a program that brute forces games of hangman.
Anywho, I was doing an execution-time comparison of summing all the prime numbers between 1 million and 2 million, both single-threaded and through a processing pool. Now, for the hangman cruncher, putting the games in a processing pool improved execution time by about 8 times (i7 with 8 cores), but when grinding out these primes, it actually increased processing time by almost a factor of 4.
Can anyone tell me why this is? Here is the code for anyone interested in looking at or testing it:
#!/user/bin/python.exe
import math
from multiprocessing import Pool

primes = []

def log(result):
    global primes
    if result:
        primes.append(result[1])

def isPrime(n):
    if n < 2:
        return False
    if n == 2:
        return True, n
    max = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= max:
        if n % i == 0:
            return False
        i += 1
    return True, n

def main():
    global primes
    #pool = Pool()
    for i in range(1000000, 2000000):
        #pool.apply_async(isPrime, (i,), callback = log)
        temp = isPrime(i)
        log(temp)
    #pool.close()
    #pool.join()
    print sum(primes)
    return

if __name__ == "__main__":
    main()
It'll currently run in a single thread, to run through the processing pool, uncomment the pool statements and comment out the other lines in the main for loop.
The most efficient way to use multiprocessing is to divide the work into n equal-sized chunks, with n the size of the pool, which should be approximately the number of cores on your system. The reason for this is that the overhead of starting subprocesses and communicating between them is quite large. If the size of each piece of work is small compared to the number of pieces, then the overhead of IPC becomes significant.
In your case, you're asking multiprocessing to process each prime individually. A better way to deal with the problem is to pass each worker a range of values (probably just a start and end value) and have it return all of the primes it found in that range.
In the case of identifying large-ish primes, the work done grows with the starting value, so you probably don't want to divide the total range into exactly n chunks, but rather n*k equal chunks, with k some reasonable small number, say 10-100. That way, when some workers finish before others, there's still work left to do and it can be balanced efficiently across all workers.
Edit: Here's an improved example to show what that solution might look like. I've changed as little as possible so you can compare apples to apples.
#!/user/bin/python.exe
import math
from multiprocessing import Pool

primes = set()

def log(result):
    global primes
    if result:
        # since the result is a batch of primes, we have to use
        # update instead of add (or for a list, extend instead of append)
        primes.update(result)

def isPrime(n):
    if n < 2:
        return False
    if n == 2:
        return True, n
    max = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= max:
        if n % i == 0:
            return False
        i += 1
    return True, n

def isPrimeWorker(start, stop):
    """
    find a batch of primes
    """
    primes = set()
    for i in xrange(start, stop):
        if isPrime(i):
            primes.add(i)
    return primes

def main():
    global primes
    pool = Pool()
    # pick an arbitrary chunk size, this will give us 100 different
    # chunks, but another value might be optimal
    step = 10000
    # use xrange instead of range, we don't actually need a list, just
    # the values in that range.
    for i in xrange(1000000, 2000000, step):
        # call the *worker* function with start and stop values.
        pool.apply_async(isPrimeWorker, (i, i + step,), callback = log)
    pool.close()
    pool.join()
    print sum(primes)
    return

if __name__ == "__main__":
    main()