I have a simple (I hope) question: my problems started when I wrote a GUI.
I cannot refresh the user interface while executing heavy computations.
- If I use threads there is the GIL (not too slow, but the GUI freezes).
I tried so many things that my last hope is starting a new process (and here is the problem).
First of all:
- I have never used processes before (it could be a semantic error).
- I don't know the limitations (and exceptions) of processes.
- I am running CPython 3.1.2 on Mac OS X 10.6.8.
Here is an example (not the real code, but the result is the same) of what I need to solve:
from multiprocessing import *

def bob(q):
    print(q)

A = Process(target=bob, args=("something"))
A.start()
A.is_alive()
A.join()
and the output is:
True
It doesn't print "something", so I guess the process doesn't run, but A.is_alive() says it is running, and when the interpreter reaches A.join() it waits more or less forever.
Can someone explain this to me?
You need to add a comma: args=("something",).
The comma creates a tuple; otherwise it is just a string in parentheses.
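A quick way to see the difference at the interpreter prompt (this is just the tuple point, independent of multiprocessing):

>>> type(("something"))   # parentheses alone do not make a tuple
<class 'str'>
>>> type(("something",))  # the trailing comma does
<class 'tuple'>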
You should pass a sequence of arguments, not just the bare argument. This does the job for me:
from multiprocessing import *

def bob(q):
    print(q)

A = Process(target=bob, args=["something"])
A.start()
A.is_alive()
A.join()
The following uses sleep sort (http://stackoverflow.com/questions/6474318/what-is-the-time-complexity-of-the-sleep-sort) to sort the uppercase characters A-Z:
somestring = "DGAECBF"

from multiprocessing import *

def bob(t):
    import time
    time.sleep(ord(t) - ord("A"))
    print(t)

p = []
for c in somestring:
    p.append(Process(target=bob, args=([c])))
    p[-1].start()
for pp in p:
    pp.join()
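Each child sleeps for ord(letter) - ord("A") seconds before printing, so the letters come out in sorted order (A through G for this input), roughly one per second.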
Related
multiprocessing pool.map works nicely on my old PC but does not work on the new PC.
It hangs in the call to

def wait(self, timeout=None):
    self._event.wait(timeout)

at which point the CPU utilization drops to zero percent with no further response, as if it has gone to sleep.
I wrote a simple test.py as follows:
import multiprocessing as mp

letters = ['A', 'B', 'C']

def doit(letter):
    for i in range(1000):
        print(str(letter) + ' ' + str(i))

if __name__ == '__main__':
    pool = mp.Pool()
    pool.map(doit, letters)
This works on the old PC with an i7-7700K (4 cores, 8 logical), Python 3.6.5 64-bit, Win10 Pro, PyCharm 2018.1, where stdout displays the letters and numbers in non-sequential order, as expected.
However, the same code does not work on the new build: i9-7960 (16 cores, 32 logical), Python 3.7 64-bit, Win10 Pro, PyCharm 2018.3.
The new PC's BIOS has not been updated since 2017/11 (4 months older).
pool.py appears to be the same on both machines (2006-2008, R Oudkerk).
The line where it hangs in the wait function is:
self._event.wait(timeout)
Any help on where I might look next to find the cause would be appreciated.
Thanks in advance.
....
EDIT:
My further interpretation:
1. The GIL (Global Interpreter Lock) is not relevant here, as it relates to multithreading only, not multiprocessing.
2. multiprocessing.Manager is unnecessary here, as the code consumes static input and produces independent output. So pool.close() and pool.join() are not required either, as I am not joining results after processing (the fuller pattern is sketched after this list for reference).
3. This link is a good introduction to multiprocessing, though I don't see a solution in it:
https://docs.python.org/2/library/multiprocessing.html#windows
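For comparison, here is a minimal variant of the test with the explicit close/join pattern under the __main__ guard; this is just a sketch of the pattern point 2 refers to, not a confirmed fix for the hang:

import multiprocessing as mp

letters = ['A', 'B', 'C']

def doit(letter):
    for i in range(1000):
        print(str(letter) + ' ' + str(i))

if __name__ == '__main__':
    pool = mp.Pool()
    pool.map(doit, letters)
    pool.close()   # no more tasks will be submitted
    pool.join()    # wait for the worker processes to exit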
We are writing a Python program that attempts to synthesize a (simple) Haskell function given input-output pairs. Throughout the run of the program we generate Haskell code and check its correctness against the user-supplied examples.
Suppose we get "1 2" as input and "3" as the expected output. We would (eventually) come up with the plus function. We would then run
(\x y -> x + y) 1 2 in Haskell and check whether it evaluates to 3.
The way we currently do this is by running the following Python code:
from subprocess import Popen, PIPE, STDOUT

process = Popen(f'ghc -e "{haskell_code}"', shell=True, stdout=PIPE, stderr=STDOUT)
haskell_output = process.stdout.read().decode('utf-8').strip('\n')
As neither of us is familiar with GHC, Haskell, processes, or really anything to do with any of this, we were hoping someone could help us perform this task in a (much) more efficient manner, as it is currently very slow.
Additionally, we would like to be able to execute more than a single statement. For example, we would like to import Data.Char so that our function can use toUpper. However, we are currently sending a single lambda function with the inputs appended to it, and we aren't sure how to add an import statement above that (adding "\n" did not seem to work).
To summarize, we would like the fastest (runtime) solution that lets us test Haskell functions from Python (where we don't have all of the Haskell code in advance or at one point in time, but rather test it as we generate it), while allowing us to use more than a single statement (for example, an import).
Apologies if any of this is trivial or stupid; any help would be highly appreciated.
This seems like an odd thing to be doing, but interesting nonetheless.
Two things come immediately to mind here. The first is to use the ghci REPL instead of spawning a new process for every eval attempt. The idea is to stream your I/O into a single ghci process instead of spawning a new ghc process for each attempt; the overhead of starting a new process for every eval is quite a performance killer. I'd usually go for expect, but since you want Python, I'll call on pexpect:
import pexpect
import sys
from subprocess import Popen, PIPE, STDOUT
import time

REPL_PS = unicode('Prelude> ')
LOOPS = 100

def time_function(func):
    def decorator(*args, **kwargs):
        ts = time.time()
        func(*args, **kwargs)
        te = time.time()
        print "total time", (te - ts)
    return decorator

@time_function
def repl_loop():
    repl = pexpect.spawnu('ghci')
    repl.expect(REPL_PS)
    for i in range(LOOPS):
        repl.sendline('''(\\x y -> x + y) 1 2''')
        _, haskell_output = repl.readline(), repl.readline()
        repl.expect(REPL_PS)

@time_function
def subproc_loop():
    for i in range(LOOPS):
        proc = Popen('''ghc -e "(\\x y -> x + y) 1 2"''', shell=True, stdout=PIPE, stderr=STDOUT)
        haskell_output = proc.stdout.read().decode('utf-8').strip('\n')
        # print haskell_output

repl_loop()
subproc_loop()
This gave me a very consistent >2x speed boost.
See the pexpect docs for more info: https://github.com/pexpect/pexpect/
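For the import part of the question: once you are driving a single ghci session, you can send an import as its own line before sending the expression, exactly as you would type it interactively. A minimal Python 3 sketch of that idea (the permissive '> ' prompt pattern and the two readline() calls mirror the loop above and are assumptions about how ghci echoes input; you may need to adjust them):

import pexpect

PROMPT = '> '                                 # matches "Prelude> ", "Prelude Data.Char> ", etc.

repl = pexpect.spawnu('ghci')
repl.expect(PROMPT)

repl.sendline('import Data.Char')             # one statement per line, as in an interactive session
repl.expect(PROMPT)

repl.sendline('(\\x -> map toUpper x) "abc"')
_, haskell_output = repl.readline(), repl.readline()   # first line is the echo, second is the result
repl.expect(PROMPT)
print(haskell_output.strip())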
The second immediate idea would be to use some distributed computing. I don't have the time to build a full-blown demo here, but there are many great examples already living in the land of the internet and SO. The idea is to have multiple "python + ghci" processes reading eval attempts from a common queue and pushing the results to a common eval-attempt checker (a minimal sketch of that shape follows the links below). I don't know much about ghc(i), but a quick check shows that ghci is a multithreaded process, so this may require multiple machines to pull off, each machine attempting a different subset of the attempts in parallel.
Some links that may be of interest here:
How to use multiprocessing queue in Python?
https://docs.python.org/2/library/multiprocessing.html
https://eli.thegreenplace.net/2012/01/24/distributed-computing-in-python-with-multiprocessing
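Here is a minimal sketch of that queue/worker shape using multiprocessing. The evaluate() helper is a placeholder for whatever drives each worker's own ghci session; its name and the None shutdown sentinel are my assumptions, not anything from the question:

from multiprocessing import Process, Queue

def evaluate(expr):
    # Placeholder: in the real setup each worker would keep its own ghci
    # session open (e.g. via pexpect) and return the evaluated result.
    return '<result of %s>' % expr

def worker(tasks, results):
    for expr in iter(tasks.get, None):      # None is the shutdown sentinel
        results.put((expr, evaluate(expr)))

if __name__ == '__main__':
    tasks, results = Queue(), Queue()
    workers = [Process(target=worker, args=(tasks, results)) for _ in range(4)]
    for w in workers:
        w.start()

    attempts = ['(\\x y -> x + y) 1 2', '(\\x y -> x * y) 1 2']
    for expr in attempts:
        tasks.put(expr)
    for _ in workers:
        tasks.put(None)                     # one sentinel per worker

    for _ in attempts:
        expr, output = results.get()
        print(expr, '=>', output)           # the checker would compare output here

    for w in workers:
        w.join()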
I have a Python script that makes requests to a server and checks the responses. About 10% of the responses are special, so it prints a message when it encounters one.
It does 90000 iterations, and I managed to print the current progress in the command prompt like this:
print('{0}/{1}{2}{3}{4}{5}'.format(str(current_iteration_number),"90000 ",speed, " requests/s, ready in: ", timeleft, " minutes."),end="\r")
It manages about 2.5 requests per second, but I know this could be at least 5 times faster. I tested that by executing the same script 5 times simultaneously (any more would result in the server blocking me for doing what could look like a DDoS attack). Although this works, having to run 5 command prompts at once and manually join the results is not a nice way of doing things.
How can I let Python execute the 5 loops simultaneously by itself, and print the joined progress and results?
You can use the threading or multiprocessing library. Also, if your operation is I/O bound you can use the asyncio library, which gives you the advantages of concurrency in a single thread. A multiprocessing example is below, followed by a thread-pool sketch that fits I/O-bound work like yours.
import multiprocessing
import numpy as np

def my_loop(x):
    # some stuff (this is a silly example, since it could be vectorized)
    return x**2

if __name__ == '__main__':               # required on Windows for multiprocessing
    pool = multiprocessing.Pool()
    pool_list = [float(i) for i in range(90000)]
    results = pool.map(my_loop, pool_list)
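Since the workload here is I/O bound (waiting on the server), a thread pool is often the simpler fit. A minimal sketch with concurrent.futures, where do_request() is a stand-in for your existing request-and-check code (the function name and the fake 10% condition are illustrative assumptions):

from concurrent.futures import ThreadPoolExecutor, as_completed

def do_request(i):
    # Stand-in for the real request + response check; return whatever you
    # need to join later (here: the iteration number and a "special" flag).
    special = (i % 10 == 0)
    return i, special

if __name__ == '__main__':
    total = 90000
    done = 0
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(do_request, i) for i in range(total)]
        for future in as_completed(futures):
            i, special = future.result()
            done += 1
            if special:
                print('\niteration {0} was special'.format(i))
            print('{0}/{1} requests done'.format(done, total), end='\r')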
I'm trying to write a program that executes a piece of code in such a way that the user can stop its execution at any time without stopping the main program. I thought I could do this using threading.Thread, but then I ran the following code in IDLE (Python 3.3):
from threading import *
import math

def f():
    eval("math.factorial(1000000000)")

t = Thread(target=f)
t.start()
The last line doesn't return; I eventually had to restart the shell. Is this a consequence of the Global Interpreter Lock, or am I doing something wrong? I didn't see anything specific to this problem in the threading documentation (http://docs.python.org/3/library/threading.html).
I tried to do the same thing using a process:
from multiprocessing import *
import math

def f():
    eval("math.factorial(1000000000)")

p = Process(target=f)
p.start()
p.is_alive()
The last line returns False, even though I ran it only a few seconds after starting the process! Based on my processor usage, I am forced to conclude that the process never started in the first place. Can somebody please explain what I am doing wrong here?
Thread.start() never returns! Could this have something to do with the C implementation of the math library?
As @eryksun pointed out in the comments: math.factorial() is implemented as a C function that doesn't release the GIL, so no other Python code may run until it returns.
Note: the multiprocessing version should work as is: each Python process has its own GIL.
factorial(1000000000) has hundreds of millions of digits. Try import time; time.sleep(10) as the dummy calculation instead.
If you have issues with multithreaded code in IDLE, try the same code from the command line to make sure the error persists.
If p.is_alive() returns False after p.start() has already been called, it might mean there is an error in the f() function, e.g., a MemoryError.
On my machine, p.is_alive() returns True and one of the CPUs is at 100% if I paste your code from the question into a Python shell.
Unrelated: remove wildcard imports such as from multiprocessing import *. They may shadow other names in your code so that you can't be sure what a given name means; e.g., threading could define an eval function (it doesn't, but it could) with similar but different semantics that might break your code silently.
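A minimal sketch of the sleep-based test suggested above; time.sleep() releases the GIL, so the main thread stays responsive while the worker runs (unlike the factorial call):

import threading
import time

def f():
    time.sleep(10)          # dummy calculation that releases the GIL

t = threading.Thread(target=f)
t.start()
print('still responsive while the worker sleeps')   # prints immediately
t.join()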
I want my program to be able to handle ridiculous inputs from the user gracefully
If you pass user input directly to eval() then the user can do anything.
Is there any way to get a process to print, say, an error message without constructing a pipe or other similar structure?
It is ordinary Python code:
print(message) # works
The difference is that if several processes call print() at the same time, the output might be garbled. You could use a lock to synchronize the print() calls.
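A minimal sketch of that lock idea, passing a shared multiprocessing.Lock to each worker (the message text is just illustrative):

from multiprocessing import Process, Lock

def worker(lock, name):
    with lock:                        # only one process prints at a time
        print('error message from', name)

if __name__ == '__main__':
    lock = Lock()
    procs = [Process(target=worker, args=(lock, 'worker-%d' % i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()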
The name kind of says it all. I'm writing this program in Python 2.7, and I'm trying to take advantage of threaded queues to make a whole bunch of web requests. Here's the problem: I would like to have two different queues, one to handle the threaded requests and a separate one to handle the responses. If I have a queue in my program that isn't named "queue", for example if I want the initial queue to be named "input_q", then the program crashes and just refuses to work. This makes absolutely no sense to me. In the code below, all of the imported custom modules work just fine (at least they did independently, they passed all unit tests, and I don't see any reason they could be the source of the problem).
Also, via diagnostic statements, I have determined that it crashes just before it spawns the thread pool.
Thanks in advance.
EDIT: Crash may be the wrong term here. It actually just stops. Even after waiting half an hour (when the original program ran in under thirty seconds), the program wouldn't finish. When I told it to print out toCheck, it would only make it partway through the list, stop in the middle of an entry, and do nothing.
EDIT2: Sorry for wasting everyone's time; I forgot about this post. Someone had changed one of my custom modules (threadCheck). It looks like the program was initializing the module and then running along its merry way with the rest of the code. threadCheck was crashing after initialization, while the program was in the middle of its computations, and that crash was taking the whole thing down with it.
code:
from binMod import binExtract
from grabZip import grabZip
import random
import Queue
import time
import threading
import urllib2
from threadCheck import threadUrl
import datetime

queue = Queue.Queue()
#output_q = Queue.Queue()
#input_q = Queue.Queue()
#output = queue
p = 90
qb = 22130167533
url = grabZip(qb)
logFile = "log.txt"
metaC = url.grabMetacell()
toCheck = []

print metaC[0]['images']
print "beginning random selection"
for i in range(4):
    if (len(metaC[i]['images']) > 0):
        print metaC[i]['images'][0]
        for j in range(len(metaC[i]['images'])):
            chance = random.randint(0, 100)
            if chance <= p:
                toCheck.append(metaC[i]['images'][j]['resolution 7 url'])

print "Spawning threads..."
for i in range(20):
    t = threadUrl(queue)
    t.setDaemon(True)
    t.start()

print "initializing queue..."
for i in range(len(toCheck)):
    queue.put(toCheck[i])

queue.join()
#input_q.join()

output = open(logFile, 'a')
done = datetime.datetime.now()
results = "\n %s \t %s \t %s \t %s" % (done, qb, good, bad)
output.write(results)
What the names are is irrelevant to Python -- Python doesn't care, and the objects themselves (for the most part) don't even know the names they have been assigned to. So the problem has to be somewhere else.
As has been suggested in the comments, carefully check your renames of queue.
Also, try it without daemon mode.
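To illustrate the first point, here is a minimal, self-contained Python 2.7 sketch of the same queue/worker shape with the queues deliberately named input_q and output_q (no custom modules, and the web request replaced by a trivial stand-in); renaming the queues by itself breaks nothing:

import Queue
import threading

input_q = Queue.Queue()               # deliberately not named "queue"
output_q = Queue.Queue()

def worker():
    while True:
        item = input_q.get()
        output_q.put(item.upper())    # stand-in for the real web request
        input_q.task_done()

for _ in range(4):
    t = threading.Thread(target=worker)
    t.setDaemon(True)
    t.start()

for word in ['a', 'b', 'c']:
    input_q.put(word)
input_q.join()

while not output_q.empty():
    print(output_q.get())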