Python Multiprocessing starmap

I am trying to run a function in parallel with multiprocessing's starmap.
data = [(i, board) for i in range(board.width)]
if __name__ == '__main__':
    p = mp.Pool(processes=mp.cpu_count())
    ratings = p.starmap(self.rate, data)
    print("Ratings: " + ratings)
My problem is that the print is never executed; the function just returns None.
self.rate() should return a number.
Github: https://github.com/Builder20/Connect4/tree/develop
Any ideas?

Obviously the function loops forever. You can add logging to see which part loops; I see only one candidate:
while canSet == -1:
    opponentColumn = random.randint(0, board.width)
    canSet = board.setStone(self.other, opponentColumn)
add
logging.info(board)
to see how it progresses
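As a minimal, self-contained sketch of what that logging could show (FakeBoard and its setStone behaviour here are stand-ins invented for illustration; in the real project the board and self.other come from the question's code):

import logging
import random

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

# Stand-ins so the sketch runs on its own; in the real project these come
# from the Connect 4 board and player objects in the question.
class FakeBoard:
    width = 7
    def setStone(self, player, column):
        # pretend only columns 0-6 are accepted
        return column if column < self.width else -1
    def __str__(self):
        return "<board state>"

board = FakeBoard()
other = "opponent"

canSet = -1
while canSet == -1:
    opponentColumn = random.randint(0, board.width)
    canSet = board.setStone(other, opponentColumn)
    logging.info("tried column %s -> canSet=%s", opponentColumn, canSet)
    logging.info(board)

If the loop is stuck, the log will show immediately which column keeps being rejected.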

Related

Check if a function has been called from a loop

Is it possible to check dynamically whether a function has been called from within a loop?
Example:
def some_function():
    if called_from_loop:
        print("Hey, I'm called from a loop!")

for i in range(0, 1):
    some_function()

some_function()
Expected output:
> Hey, I'm called from a loop!
Not sure if this is going to be 100% reliable. It's also rather cumbersome and would have to be written inside the function of interest rather than in its own discrete function for fairly obvious reasons.
We can get the source code for the currently executing .py file, and we can also get a stack trace.
Therefore we can determine the line number from which our function was called. From there we can walk the source code backwards, keeping track of indentation: each time the indentation decreases, we check whether that line starts with 'for' or 'while'; if it does, we were called from within a loop.
So here's the fun:
import traceback
import inspect
import sys

def getws(s):
    return len(s) - len(s.lstrip())

def func():
    source = inspect.getsource(sys.modules[__name__]).splitlines()
    stack = traceback.extract_stack()
    lineno = stack[-2].lineno - 1
    isLooped = False
    if (ws := getws(source[lineno])) > 0:
        for i in range(lineno - 1, -1, -1):
            if (_ws := getws(line := source[i])) < ws:
                if (tokens := line.split()) and tokens[0] in {'for', 'while'}:
                    isLooped = True
                    break
                if (ws := _ws) == 0:
                    break
    print('Called from loop' if isLooped else 'Not from loop')

def looper():
    for _ in range(1):
        # banana
        func()

def looper2():  # this code is nonsense
    i = 5
    while i > 0:
        if i < 10:
            func()
        break

looper()
looper2()
func()

while True:
    func()
    break

if __name__ == '__main__':
    func()
Output:
Called from loop
Called from loop
Not from loop
Called from loop
Not from loop

Returning the first non-zero result from Pool.async_map

I am using the python multiprocessing library in order to run a number of tests on a large array of numbers.
I have the following syntax:
import multiprocessing as mp
pool = mp.Pool(processes = 6)
res = pool.async_map(testFunction, arrayOfNumbers)
However I want to return the first number that passes the test, and then exit. I am not interested in storing the array of results.
Currently testFunction will return 0 for any numbers that fail, so if doing this without any optimisation, I would wait for it to finish and use:
return filter(lambda x: x != 0, res)[0]
assuming there is a result. However since it is running asynchronously, I want to get the non-zero value as soon as possible.
What is the best approach to this?
I am not sure if this is the best approach, but it is a working one. Adding tasks to the pool is non-blocking, so the program keeps running, and by storing all of the possible return values I can iterate over them myself.
The return values are close to promise objects: checking their ready() method tells me whether a result is available to read, and get() retrieves the value. If the value is 0, I can terminate the pool early and return the final result.
A minimal working example demonstrating this is the following:
import time
import multiprocessing as mp

def worker(value):
    print('working')
    time.sleep(3)
    return value

def main():
    pool = mp.Pool(2)  # Only two workers
    results = []
    for n in range(0, 8):
        value = 0 if n == 0 else 1
        results.append(pool.apply_async(worker, (value,)))

    running = True
    while running:
        for result in results:
            if result.ready() and result.get() == 0:
                print("There was a zero returned")
                pool.terminate()
                running = False
        if all(result.ready() for result in results):
            running = False

    pool.close()
    pool.join()

if __name__ == '__main__':
    main()
The expected output would be:
working
working
working
There was a zero returned
Process finished with exit code 0
I created a small pool of 2 processes that call a function which sleeps for 3 seconds and then returns either 1 or 0. Here the first task returns a 0, so the program terminates early once that result is available.
If no task returns a zero, the lines:
if all(result.ready() for result in results):
    running = False
will end the loop once all tasks are done.
If you would like to know all the results, you can use:
print([result.get() for result in results if result.ready()])

Python - using returns in conditional statements

I want to use return instead of print statements but when I replace the print statements with return I don't get anything back. I know I'm missing something obvious:
def consecCheck(A):
    x = sorted(A)
    for i in enumerate(x):
        if i[1] == x[0]:
            continue
        print x[i[0]], x[i[0]-1]
        p = x[i[0]] - x[i[0]-1]
        print p
        if p > 1:
            print "non-consecutive"
            break
        elif x[i[0]] == len(x):
            print "consecutive"

if __name__ == "__main__":
    consecCheck([1,2,3,5])
-----UPDATE----- Here is the corrected code after taking HEATH3N's answer:
def consecCheck(A):
    x = sorted(A)
    for i in enumerate(x):
        if i[1] == x[0]:
            continue
        print x[i[0]], x[i[0]-1]
        p = x[i[0]] - x[i[0]-1]
        print p
        if p > 1:
            a = "non-consecutive"
            break
        elif x[i[0]] == len(x):
            a = "consecutive"
    return a

if __name__ == "__main__":
    print consecCheck([4,3,7,1,5])
I don't think you understand what a return statement does:
A return statement causes execution to leave the current subroutine and resume at the point in the code immediately after where the subroutine was called, known as its return address.
You need to wrap consecCheck([1,2,3,5]) in a print statement. Otherwise, all it does is call the function (which no longer prints anything) and go back to what it was doing.
print takes a Python object and writes a printed representation of it to the console/output window.
When a return statement is used in a function, execution goes back to the calling location, and once execution reaches a return statement no further lines of the function are executed. Read up on the difference between print and return in detail.
So in your case, if you want to show the result in the output console, you can do it as in the following example:
def my_function():
    # your code
    return <calculated-value>

val = my_function()
print(val)  # store the return value of the function in `val` and then print it,
            # or just directly write print(my_function())
In your code you are printing values and continuing execution; in that case you might consider using the yield keyword suggested by @COLDSPEED, or just use print for every statement except the last one.
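As a minimal sketch of the yield approach (the generator below is illustrative and only loosely follows the original function, not a drop-in replacement):

def consec_messages(A):
    # yields each message instead of printing it
    x = sorted(A)
    for idx in range(1, len(x)):
        gap = x[idx] - x[idx - 1]
        yield "gap between %s and %s is %s" % (x[idx - 1], x[idx], gap)
        if gap > 1:
            yield "non-consecutive"
            return
    yield "consecutive"

if __name__ == "__main__":
    for message in consec_messages([1, 2, 3, 5]):
        print(message)

The caller decides what to do with each yielded message, so the function itself never prints.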

How to get all pool.apply_async processes to stop once any one process has found a match in python

I have the following code that is leveraging multiprocessing to iterate through a large list and find a match. How can I get all processes to stop once a match is found in any one process? I have seen examples, but none of them seem to fit what I am doing here.
#!/usr/bin/env python3.5
import sys, itertools, multiprocessing, functools

alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ12234567890!##$%^&*?,()-=+[]/;"
num_parts = 4
part_size = len(alphabet) // num_parts

def do_job(first_bits):
    for x in itertools.product(first_bits, *itertools.repeat(alphabet, num_parts-1)):
        # CHECK FOR MATCH HERE
        print(''.join(x))
        # EXIT ALL PROCESSES IF MATCH FOUND

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    results = []
    for i in range(num_parts):
        if i == num_parts - 1:
            first_bit = alphabet[part_size * i :]
        else:
            first_bit = alphabet[part_size * i : part_size * (i+1)]
        pool.apply_async(do_job, (first_bit,))
    pool.close()
    pool.join()
Thanks for your time.
UPDATE 1:
I have implemented the changes suggested in the great approach by @ShadowRanger and it is nearly working the way I want it to. So I have added some logging to give an indication of progress and put a 'test' key in there to match.
I want to be able to increase/decrease iNumberOfProcessors independently of num_parts. At this stage, with both set to 4, everything works as expected: 4 processes spin up (plus one extra for the console). When I change iNumberOfProcessors to 6, 6 processes spin up but only four of them show any CPU usage, so it appears 2 are idle. With my previous solution above, I was able to set the number of cores higher without increasing num_parts, and all of the processes would get used.
I am not sure how to refactor this new approach to give me the same functionality. Can you have a look and give me some direction on the refactoring needed so that I can set iNumberOfProcessors and num_parts independently of each other and still have all processes used?
Here is the updated code:
#!/usr/bin/env python3.5
import sys, itertools, multiprocessing, functools

alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ12234567890!##$%^&*?,()-=+[]/;"
num_parts = 4
part_size = len(alphabet) // num_parts
iProgressInterval = 10000
iNumberOfProcessors = 6

def do_job(first_bits):
    iAttemptNumber = 0
    iLastProgressUpdate = 0
    for x in itertools.product(first_bits, *itertools.repeat(alphabet, num_parts-1)):
        sKey = ''.join(x)
        iAttemptNumber = iAttemptNumber + 1
        if iLastProgressUpdate + iProgressInterval <= iAttemptNumber:
            iLastProgressUpdate = iLastProgressUpdate + iProgressInterval
            print("Attempt#:", iAttemptNumber, "Key:", sKey)
        if sKey == 'test':
            print("KEY FOUND!! Attempt#:", iAttemptNumber, "Key:", sKey)
            return True

def get_part(i):
    if i == num_parts - 1:
        first_bit = alphabet[part_size * i :]
    else:
        first_bit = alphabet[part_size * i : part_size * (i+1)]
    return first_bit

if __name__ == '__main__':
    # with statement with Py3 multiprocessing.Pool terminates when block exits
    with multiprocessing.Pool(processes = iNumberOfProcessors) as pool:
        # Don't need special case for final block; slices can
        for gotmatch in pool.imap_unordered(do_job, map(get_part, range(num_parts))):
            if gotmatch:
                break
        else:
            print("No matches found")
UPDATE 2:
OK, here is my attempt at @noxdafox's suggestion. I have put together the following based on the link he provided. Unfortunately, when I run it I get this error:
... line 322, in apply_async
raise ValueError("Pool not running")
ValueError: Pool not running
Can anyone give me some direction on how to get this working?
Basically, the issue is that my first attempt did multiprocessing but did not support cancelling all processes once a match was found.
My second attempt (based on @ShadowRanger's suggestion) solved that problem, but broke the ability to scale the number of processes and the num_parts size independently, which my first attempt could do.
My third attempt (based on @noxdafox's suggestion) throws the error outlined above.
If anyone can give me some direction on how to keep the functionality of my first attempt (scaling the number of processes and num_parts independently) while adding the ability to cancel all processes once a match is found, it would be much appreciated.
Thank you for your time.
Here is the code from my third attempt, based on @noxdafox's suggestion:
#!/usr/bin/env python3.5
import sys, itertools, multiprocessing, functools

alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ12234567890!##$%^&*?,()-=+[]/;"
num_parts = 4
part_size = len(alphabet) // num_parts
iProgressInterval = 10000
iNumberOfProcessors = 4

def find_match(first_bits):
    iAttemptNumber = 0
    iLastProgressUpdate = 0
    for x in itertools.product(first_bits, *itertools.repeat(alphabet, num_parts-1)):
        sKey = ''.join(x)
        iAttemptNumber = iAttemptNumber + 1
        if iLastProgressUpdate + iProgressInterval <= iAttemptNumber:
            iLastProgressUpdate = iLastProgressUpdate + iProgressInterval
            print("Attempt#:", iAttemptNumber, "Key:", sKey)
        if sKey == 'test':
            print("KEY FOUND!! Attempt#:", iAttemptNumber, "Key:", sKey)
            return True

def get_part(i):
    if i == num_parts - 1:
        first_bit = alphabet[part_size * i :]
    else:
        first_bit = alphabet[part_size * i : part_size * (i+1)]
    return first_bit

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

class Worker():
    def __init__(self, workers):
        self.workers = workers

    def callback(self, result):
        if result:
            self.pool.terminate()

    def do_job(self):
        print(self.workers)
        pool = multiprocessing.Pool(processes=self.workers)
        for part in grouper(alphabet, part_size):
            pool.apply_async(do_job, (part,), callback=self.callback)
        pool.close()
        pool.join()
        print("All Jobs Queued")

if __name__ == '__main__':
    w = Worker(4)
    w.do_job()
You can check this question to see an implementation example solving your problem.
This also works with a concurrent.futures pool.
Just replace the map method with apply_async and iterate over your list from the caller.
Something like this.
for part in grouper(alphabet, part_size):
    pool.apply_async(do_job, part, callback=self.callback)
grouper recipe
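As a rough sketch of how that callback pattern hangs together (this is not the linked answer verbatim; find_match here is a stand-in worker, and the try/except is an extra guard against the pool being terminated while tasks are still being submitted, which is one way to hit the "Pool not running" error shown above):

import itertools
import multiprocessing
import time

alphabet = "abcdefghijklmnopqrstuvwxyz"
part_size = 4

def find_match(first_bits):
    # Placeholder worker: pretend to search this chunk and report a hit on 't'
    time.sleep(0.5)
    return 't' in first_bits

def grouper(iterable, n, fillvalue=None):
    # standard itertools grouper recipe
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

class Worker:
    def __init__(self, workers):
        self.workers = workers
        self.pool = None

    def callback(self, result):
        # Runs in the parent process when a task finishes
        if result:
            self.pool.terminate()   # stop all remaining work on the first hit

    def do_job(self):
        self.pool = multiprocessing.Pool(processes=self.workers)
        try:
            for part in grouper(alphabet, part_size):
                self.pool.apply_async(find_match, (part,), callback=self.callback)
        except ValueError:
            # The callback may already have terminated the pool mid-submission
            pass
        self.pool.close()
        self.pool.join()

if __name__ == '__main__':
    Worker(4).do_job()

The key detail is that the pool is stored on the object, so the callback, which runs back in the parent process, can call terminate() on it.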
multiprocessing isn't really designed to cancel tasks, but you can simulate it for your particular case by using pool.imap_unordered and terminating the pool when you get a hit:
def do_job(first_bits):
    for x in itertools.product(first_bits, *itertools.repeat(alphabet, num_parts-1)):
        # CHECK FOR MATCH HERE
        print(''.join(x))
        if match:
            return True
    # If we exit loop without a match, function implicitly returns falsy None for us

# Factor out part getting to simplify imap_unordered use
def get_part(i):
    if i == num_parts - 1:
        first_bit = alphabet[part_size * i :]
    else:
        first_bit = alphabet[part_size * i : part_size * (i+1)]
    return first_bit

if __name__ == '__main__':
    # with statement with Py3 multiprocessing.Pool terminates when block exits
    with multiprocessing.Pool(processes=4) as pool:
        # Don't need special case for final block; slices can
        for gotmatch in pool.imap_unordered(do_job, map(get_part, range(num_parts))):
            if gotmatch:
                break
        else:
            print("No matches found")
This will run do_job for each part, returning results as fast as it can get them. When a worker returns True, the loop breaks, and the with statement for the Pool is exited, terminate-ing the Pool (dropping all work in progress).
Note that while this works, it's kind of abusing multiprocessing; it won't handle canceling individual tasks without terminating the whole Pool. If you need more fine-grained task cancellation, you'll want to look at concurrent.futures, but even there it can only cancel undispatched tasks; once they're running, they can't be cancelled without terminating the Executor or using a side-band means of termination (having the task poll some interprocess object intermittently to determine whether it should continue running).
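As a rough illustration of that side-band idea (not part of the original answer; the chunk data and the match test below are made up), workers can poll a shared Event from a Manager and stop early once any of them sets it:

import multiprocessing

def search_chunk(args):
    chunk, stop_event = args
    for candidate in chunk:
        if stop_event.is_set():
            return None              # another worker already found a match
        if candidate == 'test':      # placeholder match test
            stop_event.set()         # tell the other workers to stop early
            return candidate
    return None

if __name__ == '__main__':
    chunks = [['a', 'b'], ['c', 'test'], ['d', 'e'], ['f', 'g']]
    with multiprocessing.Manager() as manager:
        stop_event = manager.Event()   # proxy object, picklable across processes
        with multiprocessing.Pool(processes=4) as pool:
            for found in pool.imap_unordered(
                    search_chunk, [(chunk, stop_event) for chunk in chunks]):
                if found is not None:
                    print("Found:", found)
                    break

Because it is a Manager proxy, the Event can be passed to pool workers as an ordinary argument, and tasks that are already running notice the flag at their next check rather than being killed.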

Python: Question about multiprocessing / multithreading and shared resources

Here's the simplest multithreading example I found so far:
import multiprocessing
import subprocess

def calculate(value):
    return value * 10

if __name__ == '__main__':
    pool = multiprocessing.Pool(None)
    tasks = range(10000)
    results = []
    r = pool.map_async(calculate, tasks, callback=results.append)
    r.wait() # Wait on the results
    print results
I have two lists and one index to access the elements in each list. The ith position on the first list is related to the ith position on the second. I didn't use a dict because the lists are ordered.
What I was doing was something like:
for i in xrange(len(first_list)):
    # do something with first_list[i] and second_list[i]
So, using that example, I think I can make a function sort of like this:
# global variables first_list, second_list, i
first_list, second_list, i = None, None, 0

# initialize the lists
...

# have a function to do what the loop did, and increment i inside it
def function():
    # do stuff
    i += 1
But, that makes i a shared resource and I'm not sure if that'd be safe. It also seems to me my design is not lending itself well to this multithreaded approach, but I'm not sure how to fix it.
Here's a working example of what I wanted (edit in an image URL you want to use):
import multiprocessing
import subprocess, shlex

links = ['http://www.example.com/image.jpg']*10 # don't use this URL
names = [str(i) + '.jpg' for i in range(10)]

def download(i):
    command = 'wget -O ' + names[i] + ' ' + links[i]
    print command
    args = shlex.split(command)
    return subprocess.call(args, shell=False)

if __name__ == '__main__':
    pool = multiprocessing.Pool(None)
    tasks = range(10)
    r = pool.map_async(download, tasks)
    r.wait() # Wait on the results
First off, it might be beneficial to make one list of tuples, for example
new_list[i] = (first_list[i], second_list[i])
That way, as you change i, you ensure that you are always operating on the same items from first_list and second_list.
Secondly, assuming there are no relations between the i and i-1 entries in your lists, you can use your function to operate on one given i value, and spawn a thread to handle each i value. Consider
indices = range(len(new_list))
results = []
r = pool.map_async(your_function, indices, callback=results.append)
r.wait() # Wait on the results
This should give you what you want.
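As a small self-contained sketch of that idea (the data and the per-pair function below are made up for illustration):

import multiprocessing

first_list = [1, 2, 3, 4]            # made-up data for illustration
second_list = [10, 20, 30, 40]

def process_pair(pair):
    # each worker gets one (first, second) tuple; no shared index is needed
    first, second = pair
    return first * second

if __name__ == '__main__':
    new_list = list(zip(first_list, second_list))   # pair up the ith elements
    with multiprocessing.Pool() as pool:
        results = pool.map(process_pair, new_list)
    print(results)   # [10, 40, 90, 160]

Each worker receives the paired items directly, so there is no shared index i to protect.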
