How to run multiple calculations at once using Python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
I have recently started using Python, and to learn I have set myself the task of running two chunks of code at once.
I have one chunk of code that generates prime numbers and appends them to a list:
primes = []
for num in range(1, 999999999999 + 1):
    if num > 1:
        for i in range(2, num):
            if (num % i) == 0:
                break
        else:
            primes.append(num)
And another chunk of code that uses the generated primes to find perfect numbers:
limit = 25000000000000000000
for p in primes:  # primes is a list, so no call parentheses
    pp = 2**p
    perfect = (pp - 1) * (pp // 2)
    if perfect > limit:
        break
    elif is_prime(pp - 1):
        print(perfect)
I have heard of something to do with importing threads, or something along those lines, but I am very confused by it. If anyone can give me clear instructions on what to do, that would be much appreciated. I have only been learning Python for about a week.
As a final note, I didn't write these calculations myself, but I have modified them for what I need.

You can use the multiprocessing library to accomplish this. The basic idea is to have two sets of processes. The first process can fill up a queue with primes, and then you can delegate other processes to deal with those primes and print your perfect numbers.
I did make some changes and implemented a basic is_prime function. (Note that for this implementation you only need to check up to the square root of the number.) There are better methods, but that's not what this question is about.
Anyways, our append_primes function is the same as your first loop, except instead of appending a prime to a list, it puts a prime into a queue. We need some sort of signal to say that we're done appending primes, which is why we have q.put("DONE") at the end. The "DONE" is arbitrary and can be any kind of signal you want, as long as you handle it appropriately.
Then, the perfect_number function is kind of like your second loop. It accepts a single prime and prints out a perfect number, if it exists. You may want to return it instead, but that depends on your requirements.
Finally, all of the logic that runs and performs the multiprocessing has to sit inside an if __name__ == "__main__" block to avoid being re-run over and over when the module is imported into the new processes. We initialize our queue and create/start the process that appends primes to this queue.
While that's running, we create our own version of a multiprocessing pool. Standard mp pools don't play along with queues, so we have to get a little fancy. We initialize the maximum number of processes we want to run and set it to the current CPU count minus 1 (since one CPU will be running the append_primes function).
We loop over q until "DONE" is returned (remember, that's our signal from append_primes). We'll continuously loop over the process pool until we find an available process. Once that happens, we create and start the process, then move on to the next number.
Finally, we do some cleanup and make sure everything in processes is done by calling Process.join() which blocks until the process is done executing. We also ensure prime_finder has finished.
import multiprocessing as mp
import os

def is_prime(n):
    """ Returns True if n is prime """
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):  # +1 so the square root itself is tested
        if n % i == 0:
            return False
    return True

def append_primes(max, q):
    """ Searches for primes between 2 and max and adds them to the Queue (q) """
    pid = os.getpid()
    for num in range(2, int(max) + 1):
        if is_prime(num):
            print(f"{pid} :: Put {num} in queue.")
            q.put(num)
    q.put("DONE")  # A signal to stop processing
    return

def perfect_number(prime, limit=25000000000000000000):
    """ Prints the perfect number, if it exists, given the prime """
    pp = 2**prime
    perfect = (pp - 1) * (pp // 2)
    if perfect > limit:
        return
    if is_prime(pp - 1):
        print(f"{os.getpid()} :: Perfect: {perfect}", flush=True)
    return

if __name__ == "__main__":
    q = mp.Queue()
    max = 1000  # When to stop looking for primes
    prime_finder = mp.Process(target=append_primes, args=(max, q,))
    prime_finder.start()
    n_processes = os.cpu_count() - 1  # -1 because 1 is for prime_finder
    processes = [None] * n_processes
    for prime in iter(q.get, "DONE"):
        proc_started = False
        while not proc_started:  # Check each process till we find an 'available' one.
            for m, proc in enumerate(processes):
                if proc is None or not proc.is_alive():
                    processes[m] = mp.Process(target=perfect_number, args=(prime,))
                    processes[m].start()
                    proc_started = True  # Get us out of the while loop
                    break  # and out of the for loop.
    for proc in processes:
        if proc is None:  # In case max < n_processes
            continue
        proc.join()
    prime_finder.join()
Comment out the print statement in append_primes if you only want to see the perfect numbers. The number that appears before each line is the process's ID (just so you can see there are multiple processes working at the same time).
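As a side note, if strict queue-based streaming isn't required, the same producer/consumer split can be sketched with a standard multiprocessing.Pool, which handles the worker bookkeeping for you. This is only a sketch under the same limit as above, not a drop-in replacement for the queue version:

```python
import multiprocessing as mp

def is_prime(n):
    # simple trial division up to sqrt(n)
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def perfect_number(prime, limit=25000000000000000000):
    # return the perfect number built from this prime exponent, or None
    pp = 2 ** prime
    perfect = (pp - 1) * (pp // 2)
    if perfect > limit or not is_prime(pp - 1):
        return None
    return perfect

if __name__ == "__main__":
    exponents = [n for n in range(2, 100) if is_prime(n)]
    with mp.Pool() as pool:
        # imap_unordered yields results as workers finish them
        for result in pool.imap_unordered(perfect_number, exponents):
            if result is not None:
                print(result)
```

Because perfect numbers here are computed independently per exponent, Pool can farm them out without any shared state; the queue version above is only needed if the primes must stream in while they are still being generated.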

Why run two loops at once when you can just put the logic of the second loop inside the first? Instead of the break in the perfect-numbers loop, use a bool to track whether you've reached the limit.
Also, you don't need to check if num > 1; just start the range at 2:
primes = []
limit = 25_000_000_000_000_000_000
reached_limit = False

def is_prime(n):
    return 2 in [n, 2**n % n]

for num in range(2, 1_000_000_000_000):
    for i in range(2, num):
        if (num % i) == 0:
            break
    else:
        primes.append(num)
        if not reached_limit:
            pp = 2 ** num
            perfect = (pp - 1) * (pp // 2)
            if perfect > limit:
                reached_limit = True
            elif is_prime(pp - 1):
                print(perfect)
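One caveat worth noting: the one-line is_prime above is a base-2 Fermat check (it returns True when n == 2 or 2**n % n == 2), and that test can be fooled by pseudoprimes. Worse, for this particular use it always passes: if n = 2**p - 1 with p prime, then 2**p ≡ 1 (mod n) and p divides n - 1, so 2**n ≡ 2 (mod n) whether n is prime or not. A quick sketch of the failure mode:

```python
def is_prime(n):
    # base-2 Fermat check, same one-liner as in the answer above
    return 2 in [n, 2**n % n]

# both of these are composite, yet the check accepts them
print(is_prime(341))   # True, but 341 = 11 * 31
print(is_prime(2047))  # True, but 2047 = 2**11 - 1 = 23 * 89
```

So this version would print "perfect" numbers for composite Mersenne exponents such as p = 11; a real primality test is needed for the pp - 1 check.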


getting prime number script multithreaded

I have been trying to widen my understanding of Python recently, so I decided to make a prime number calculator to work on my optimization skills. I have worked all day on this and have improved the time to go through 0-100,000 from roughly 25-30 seconds down to 0.15 seconds. However, I am dealing with exponential growth in difficulty as I go higher in the iterations.
My question now is: how would I implement multi-threading? I have tried following tutorials and creating my code in a modular fashion with functions, but I have been banging my head against this problem for over four hours with no progress. Any help would be much appreciated.
It may be a bit chaotic, but the idea here is that the main loop calls the range_primes function, which loops over a specified range of numbers, checks whether each is prime, and returns a list of the values found to be prime in that range. My thought process was that I could break off "chunks" of the number line and feed them to different processes to efficiently manage resources.
One of the main problems I ran into was that I could not figure out how to append all of the lists returned by the processes to a master output list. I just thought of this now: what about writing to a file? I/O operations are slow, but it might be easier to write to a new line of a txt file than anything else.
Maybe a class of some sort could hold a process and its outputs, which could then be queried for the values?
My current (not working) code:
from time import perf_counter
import math
import multiprocessing
import subprocess

# process a chunk of numbers given start and length
def chunk_process(index: int = 0, chunk: int = 13) -> list:
    tmplst = []
    tmplst = range_primes(((index) * chunk) + 1, chunk * (index + 1))
    return tmplst

# check if a given number is prime
def isPrime_brute(num: int) -> bool:
    if num > 1:
        for i in range(2, int(math.sqrt(num)) + 1):
            if (num % i) == 0:
                return False
        else:
            return True  # only returns True if nothing found
    else:
        return False

# get which numbers are primes in a given range
def range_primes(min_num: int = 0, max_num: int = 100) -> list:
    out_list = []
    mid = 0
    # print(f'starting a test to list primes between {max_num} and {min_num}')
    # print('Notifications will be given at the halfway point')
    for i in range(min_num, max_num):
        if isPrime_brute(i):
            out_list.append(i)
    return out_list
    # print(out_list)

if __name__ == "__main__":
    # setup multiprocessing
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    cores = multiprocessing.cpu_count()
    maxsent = 0
    # setup other stuff
    time_run = 30  # seconds
    startPoint = 0
    loop_num = 0
    primes = []
    tmplst = []
    chunk_size = 7
    max_assigned = 0
    print(f"starting calculation for {time_run} seconds")
    start_time = perf_counter()
    # majic (intentional)
    while (perf_counter() - start_time) <= time_run:
        r = pool.map_async(chunk_process, [i for i in range(maxsent, maxsent + cores + 1)])
        maxsent = maxsent + cores + 1
    # outputs
    elapsed = perf_counter() - start_time  # seconds taken
    print(r)
    print(f"{elapsed} S were used to find {len(r)} prime numbers")
    print(f"{startPoint + chunk_size} numbers were tried")
    print(f"with {chunk_size} chunking")
I know that it is not very cleanly written, but I just pieced it together in an evening and am not very experienced.
This version will run for a certain amount of time and quit operations shortly after a threshold is reached. As far as the multiprocessing goes, it is a patchwork of stuff that I have tried from different tutorials and documentation. I don't really have any idea how to proceed. Any input would be much appreciated.
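On the collection problem described in this question (gathering the per-chunk lists into one master list), one approach is simply to let Pool.map return the chunk results and flatten them afterwards; map returns one result per chunk, in submission order. A minimal sketch of that idea (the helper names are illustrative, not the original code):

```python
from multiprocessing import Pool

def range_primes(bounds):
    # return the primes in [start, stop) using trial division
    start, stop = bounds
    out = []
    for n in range(max(2, start), stop):
        if all(n % i for i in range(2, int(n ** 0.5) + 1)):
            out.append(n)
    return out

if __name__ == "__main__":
    chunk = 5000
    chunks = [(i * chunk, (i + 1) * chunk) for i in range(8)]
    with Pool() as pool:
        per_chunk = pool.map(range_primes, chunks)  # one list per chunk
    primes = [p for sub in per_chunk for p in sub]  # flatten into a master list
    print(len(primes), primes[:5])
```

No file writing or result-holding class is needed: the return values travel back to the parent process through the pool automatically.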

Why doesn't Python's filter overflow when it processes an infinite sequence?

def _odd_iter():
    n = 1
    while True:
        n = n + 2
        yield n

def filt(n):
    return lambda x: x % n > 0

def primes():
    yield 2
    it = _odd_iter()
    while True:
        n = next(it)
        yield n
        it = filter(filt(n), it)
For example, take the sequence 3, 5, 7, 9, 11, 13, 15, ...
If I want to take the number 7 from this sequence, then to judge whether it is prime it must be tested against 3 and 5, so 3 and 5 have to be stored. Even with lazy evaluation, more and more of this information accumulates as the sequence advances, so I expected the calculation to get slower and slower and memory use to grow. But in the actual experiment the prime-generation speed does not drop and memory does not explode, and I want to know the internal principle.
In Python 3, as your post is tagged, filter is a lazily-evaluated generator-type object. If you tried to evaluate that entire filter object with e.g. it = list(filter(filt(n),it)), you would have a bad time. You would have an equally bad time if you ran your code in Python 2, in which filter() automatically returns a list.
A filter on an infinite iterable is not inherently problematic, though, because you can use it in a perfectly acceptable way, like a for loop:
it = filter(filt(n), it)
for iteration in it:
    if input():
        print(iteration)
    else:
        break
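To see the laziness concretely, you can pull a finite prefix from the infinite pipeline with itertools.islice; each filter wraps the previous iterator and only evaluates its predicate on demand:

```python
from itertools import islice

def _odd_iter():
    n = 1
    while True:
        n = n + 2
        yield n

def filt(n):
    return lambda x: x % n > 0

def primes():
    yield 2
    it = _odd_iter()
    while True:
        n = next(it)
        yield n
        it = filter(filt(n), it)  # stack another lazy filter; nothing is evaluated yet

print(list(islice(primes(), 6)))  # [2, 3, 5, 7, 11, 13]
```

What accumulates is a chain of small filter objects (one per prime yielded so far), not the sequence itself, which is why memory grows only with the number of primes produced, never with the length of the underlying infinite sequence.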

Python Displaying Progress

I have this python code to generate prime numbers. I added a little piece of code (between # Start progress code and # End progress code) to display the progress of the operation but it slowed down the operation.
#!/usr/bin/python
a = input("Enter a number: ")
f = open('data.log', 'w')
for x in range(2, a):
    p = 1
    # Start progress code
    s = (float(x) / float(a)) * 100
    print '\rProcessing ' + str(s) + '%',
    # End progress code
    for i in range(2, x - 1):
        c = x % i
        if c == 0:
            p = 0
            break
    if p != 0:
        f.write(str(x) + ", ")
print '\rData written to \'data.log\'. Press Enter to exit...'
raw_input()
My question is how to show the progress of the operation without slowing down the actual code/loop. Thanks in advance ;-)
To answer your question: I/O is very expensive, and so printing out your progress will have a huge impact on performance. I would avoid printing if possible.
If you are concerned about speed, there is a very nice optimization you can use to greatly speed up your code.
For your inner for loop, instead of

for i in range(2, x - 1):
    c = x % i
    if c == 0:
        p = 0
        break

use

# note: x-1**(1.0/2) would evaluate to x - 1, since ** binds tighter
# than -, so the square-root bound needs int() and parentheses
for i in range(2, int(x ** 0.5) + 1):
    c = x % i
    if c == 0:
        p = 0
        break
You only need to iterate from 2 up to the square root of the number you are primality testing.
You can use this optimization to offset any performance loss from printing your progress.
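If you want to convince yourself the square-root bound is safe, here is a small Python 3 check that the truncated trial division agrees with full trial division; the key fact is that any composite x has a divisor no larger than sqrt(x):

```python
import math

def is_prime_sqrt(x):
    # test divisors only up to floor(sqrt(x))
    if x < 2:
        return False
    return all(x % i for i in range(2, math.isqrt(x) + 1))

def is_prime_full(x):
    # test every possible divisor below x
    if x < 2:
        return False
    return all(x % i for i in range(2, x))

# both agree on every input, but the sqrt version does far less work
print(all(is_prime_sqrt(n) == is_prime_full(n) for n in range(1000)))  # True
```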
Your inner loop is O(n) time, so if you're experiencing lag toward huge numbers, that's pretty normal. Also, you're converting x and a to float while performing the division; as they get bigger, that could slow down your process.
First, I hope this is a toy problem, because (on quick glance) it looks like the whole operation is O(n^2).
You probably want to put this at the top:
from __future__ import division # Make floating point division the default and enable the "//" integer division operator.
Typically for huge loops where each iteration is inexpensive, progress isn't output every iteration because it would take too long (as you say you are experiencing). Try outputting progress either a fixed number of times or with a fixed duration between outputs:
N_OUTPUTS = 100
OUTPUT_EVERY = max(1, (a - 2) // N_OUTPUTS)
...
# Start progress code
if x % OUTPUT_EVERY == 0:
    print '\rProcessing {}%'.format(x / a * 100),
# End progress code
Or if you want to go by time instead:
UPDATE_DT = 0.5
import time
t = time.time()
...
# Start progress code
if time.time() - t > UPDATE_DT:
    print '\rProcessing {}%'.format(x / a * 100),
    t = time.time()
# End progress code
That's going to be a little more expensive, but will guarantee that even as the inner loop slows down, you won't be left in the dark for more than one iteration or 0.5 seconds, whichever takes longer.

Make sure python script is still running

I have a small script that calculates something. It uses a primitive brute-force algorithm and is inherently slow; I expect it to take about 30 minutes to complete. The script only has one print statement, at the end when it is done. I would like to have something to make sure the script is still running. I do not want to include print statements for each iteration of the loop; that seems unnecessary. How can I make sure a script that takes very long to execute is still running at a given time during execution, without causing my script to slow down? This is my script:
def triangle_numbers(num):
    numbers = []
    for item in range(1, num):
        if num % item == 0:
            numbers.append(item)
    numbers.append(num)
    return numbers

count = 1
numbers = []
while True:
    if len(numbers) == 501:
        print numbers
        print count
        break
    numbers = triangle_numbers(count)
    count += 1
You could print every 500 loops (or choose another number):

while True:
    if len(numbers) == 501:
        print numbers
        print count
        break
    numbers = triangle_numbers(count)
    count += 1
    # print every 500 loops
    if count % 500 == 0:
        print count
This will let you know not only if it is running (which it obviously is unless it has finished), but how fast it is going (which I think might be more helpful to you).
FYI:
I expect your program will take more like 30 weeks than 30 minutes to compute. Try this:

'''
1. We only need to test for factors up to the square root of num.
2. Unless we are at the end, we only care about the number of divisors,
   not storing them in a list.
3. xrange is better than range in this case.
4. Since 501 is odd, the number must be a perfect square.
'''
def divisors_count(sqrt):
    num = sqrt * sqrt
    return sum(2 for item in xrange(1, sqrt) if num % item == 0) + 1

def divisors(sqrt):
    num = sqrt * sqrt
    numbers = []
    for item in xrange(1, sqrt):
        if num % item == 0:
            numbers.append(item)
            numbers.append(num / item)
    numbers.append(sqrt)
    return sorted(numbers)

sqrt = 1
while divisors_count(sqrt) != 501:
    if sqrt % 500 == 0:
        print sqrt * sqrt
    sqrt += 1
print divisors(sqrt)
print sqrt * sqrt

though I suspect this will still take a long time. (In fact, I'm not convinced it will terminate.)
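The perfect-square claim in point 4 follows because divisors pair up as (d, num/d), so the count is odd only when some d equals num/d, i.e. when num is a square. A quick Python 3 check of that parity rule (the helper name here is illustrative):

```python
def divisor_count(n):
    # brute-force count of all divisors of n
    return sum(1 for i in range(1, n + 1) if n % i == 0)

# numbers with an odd divisor count are exactly the perfect squares
odd = [n for n in range(1, 50) if divisor_count(n) % 2 == 1]
print(odd)  # [1, 4, 9, 16, 25, 36, 49]
```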
Configure some external tool like Supervisor.
Supervisor starts its subprocesses via fork/exec, and subprocesses don't daemonize. The operating system signals Supervisor immediately when a process terminates, unlike some solutions that rely on troublesome PID files and periodic polling to restart failed processes.

python prime crunching: processing pool is slower?

So I've been messing around with python's multiprocessing lib for the last few days and I really like the processing pool. It's easy to implement and I can visualize a lot of uses. I've done a couple of projects I've heard about before to familiarize myself with it and recently finished a program that brute forces games of hangman.
Anywho, I was doing an execution-time comparison of summing all the prime numbers between 1 million and 2 million, both single-threaded and through a processing pool. Now, for the hangman cruncher, putting the games in a processing pool improved execution time by about 8 times (i7 with 8 cores), but when grinding out these primes, it actually increased processing time by almost a factor of 4.
Can anyone tell me why this is? Here is the code for anyone interested in looking at or testing it:
#!/user/bin/python.exe
import math
from multiprocessing import Pool

global primes
primes = []

def log(result):
    global primes
    if result:
        primes.append(result[1])

def isPrime(n):
    if n < 2:
        return False
    if n == 2:
        return True, n
    max = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= max:
        if n % i == 0:
            return False
        i += 1
    return True, n

def main():
    global primes
    #pool = Pool()
    for i in range(1000000, 2000000):
        #pool.apply_async(isPrime, (i,), callback=log)
        temp = isPrime(i)
        log(temp)
    #pool.close()
    #pool.join()
    print sum(primes)
    return

if __name__ == "__main__":
    main()
It'll currently run in a single thread; to run it through the processing pool, uncomment the pool statements and comment out the other lines in the main for loop.
The most efficient way to use multiprocessing is to divide the work into n equal-sized chunks, with n the size of the pool, which should be approximately the number of cores on your system. The reason is that the overhead of starting subprocesses and communicating between them is quite large. If the size of each piece of work is small compared to the number of work chunks, then the overhead of IPC becomes significant.
In your case, you're asking multiprocessing to process each candidate number individually. A better way to deal with the problem is to pass each worker a range of values (probably just a start and end value) and have it return all of the primes it found in that range.
In the case of identifying large-ish primes, the work done grows with the starting value, so you probably don't want to divide the total range into exactly n chunks, but rather n*k equal chunks, with k some reasonable small number, say 10-100. That way, when some workers finish before others, there's more work left to do, and it can be balanced efficiently across all workers.
Edit: Here's an improved example to show what that solution might look like. I've changed as little as possible so you can compare apples to apples.
#!/user/bin/python.exe
import math
from multiprocessing import Pool

global primes
primes = set()

def log(result):
    global primes
    if result:
        # since the result is a batch of primes, we have to use
        # update instead of add (or for a list, extend instead of append)
        primes.update(result)

def isPrime(n):
    if n < 2:
        return False
    if n == 2:
        return True, n
    max = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= max:
        if n % i == 0:
            return False
        i += 1
    return True, n

def isPrimeWorker(start, stop):
    """
    find a batch of primes
    """
    primes = set()
    for i in xrange(start, stop):
        if isPrime(i):
            primes.add(i)
    return primes

def main():
    global primes
    pool = Pool()
    # pick an arbitrary chunk size; this will give us 100 different
    # chunks, but another value might be optimal
    step = 10000
    # use xrange instead of range, we don't actually need a list, just
    # the values in that range
    for i in xrange(1000000, 2000000, step):
        # call the *worker* function with start and stop values
        pool.apply_async(isPrimeWorker, (i, i + step,), callback=log)
    pool.close()
    pool.join()
    print sum(primes)
    return

if __name__ == "__main__":
    main()
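The same batching idea is also available directly: Pool.map takes a chunksize argument that groups tasks per IPC round-trip. A Python 3 sketch of the sum-of-primes job using it (the helper names are mine, not from the answer above):

```python
from multiprocessing import Pool

def is_prime(n):
    # trial division up to sqrt(n)
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def sum_primes(start, stop, chunksize=1000):
    with Pool() as pool:
        # each worker receives `chunksize` numbers at a time,
        # amortizing the IPC overhead across the batch
        flags = pool.map(is_prime, range(start, stop), chunksize=chunksize)
    return sum(n for n, f in zip(range(start, stop), flags) if f)

if __name__ == "__main__":
    print(sum_primes(10, 50))  # 311
```

Tuning chunksize trades load balancing against IPC overhead, which is the same trade-off as the hand-rolled n*k chunking described above.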
