How do I run some Python code in another process?

I want to start, from Python, some other Python code, preferably a function, but in another process.
It is mandatory to run this in another process, because I want to run some concurrency tests, like opening a file that was opened exclusively by the parent process (this has to fail).
Requirements:
multiplatform: linux, osx, windows
compatible with Python 2.6-3.x

I would seriously take a look at the documentation for multiprocessing library of Python. From the first sentence of the package's description:
multiprocessing is a package that supports spawning processes using an API similar to the threading module.
It then goes on to say that it side-steps the GIL, which is what it sounds like you're trying to avoid. See their example of a trivial set up:
from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
That's a function call being done in another process separate from the process you're inside. Again, all this from the documentation.
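For the concurrency test described in the question, one useful detail is that multiprocessing.Process reports a non-zero exitcode when the target function raises, so the parent can assert that the child's attempt failed. A minimal sketch, with the actual exclusive-open operation replaced by a placeholder that simply raises:

from multiprocessing import Process

def child_task(path):
    # Placeholder for the real test: attempt the operation that should fail
    # while the parent holds the resource. Raising makes the child exit
    # with a non-zero exitcode.
    raise RuntimeError('could not open %s exclusively' % path)

if __name__ == '__main__':
    p = Process(target=child_task, args=('some_file.txt',))
    p.start()
    p.join()
    print(p.exitcode)   # non-zero, because the target raised
    assert p.exitcode != 0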

Related

Python windows multiprocessing with global imports

I've been tasked with porting some Python code that works perfectly on Linux, to Windows. Unfortunately the code fails with the error:
An attempt has been made to start a new process before current
process has finished its bootstrapping phase.
From what I've been reading this is all due to Windows not using fork, and instead using spawn to create new threads.
This is where I'd post code, but there's another problem: there is so much of it. One file imports another, which imports another, which starts threads, which has another class that also spawns a new thread.
So from what I've been reading, it's just a matter of using the if __name__ == '__main__': guard to prevent an infinite loop of threads from spawning, but I don't know where to put it.
The python file that I run already has this in place, but threads are spawned from other methods in other classes, so do I also need to put it into those classes?
It also seems that whoever wrote this has used a global file, which gets imported into each of the separate Python files, and then in turn also imports other Python files. Sorry for not giving any code, but I really don't even know where to start with this one.
Any input or advice would be greatly appreciated.
Just to clear up any potential confusion, it appears this code is using processes, not threads. So there is code like
import multiprocessing as mp
....
manager = mp.Manager()
...
process = mp.Process(blah)
process.start()
process.join()
....
process.terminate()
To add some more information, after making some minor changes, I've got the program to run. It's using Flask to provide a REST API that runs functions when it receives various HTTP requests.
It's currently throwing this exception when calling
manager = mp.Manager()
This line is part of the global.py file that is used in each of the other python files.
OK, after some modifications, I've now got past that problem, and onto another one! The processes are now starting, but the variables aren't available when imported from another class, for example
global.py:
if __name__ == '__main__':
    thingy = manager.dict()

handler.py:
from global import *
if thingy['status'] == 'working':

NameError: name 'thingy' is not defined
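No answer is recorded above, but the usual fix for this pattern (a sketch only; main.py and the work() function are hypothetical stand-ins for the asker's actual layout) is to create the Manager inside the __main__ guard of the script you actually run, and to pass the managed objects to worker processes explicitly instead of exposing them as module-level globals. Code guarded by if __name__ == '__main__': in an imported module never runs, which is why thingy is undefined here, and under Windows' spawn start method every child re-imports the modules and would not see it either.

# main.py (hypothetical)
import multiprocessing as mp
import handler

def main():
    manager = mp.Manager()
    thingy = manager.dict()
    thingy['status'] = 'working'
    # Pass the managed dict explicitly rather than importing it as a global.
    p = mp.Process(target=handler.work, args=(thingy,))
    p.start()
    p.join()

if __name__ == '__main__':
    main()

# handler.py (hypothetical)
def work(thingy):
    if thingy['status'] == 'working':
        print('working')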

On what CPU cores are my Python processes running?

The setup
I have written a pretty complex piece of software in Python (on a Windows PC). My software basically starts two Python interpreter shells. The first shell starts up (I suppose) when you double-click the main.py file. Within that shell, other threads are started in the following way:
# Start TCP_thread
TCP_thread = threading.Thread(name = 'TCP_loop', target = TCP_loop, args = (TCPsock,))
TCP_thread.start()
# Start UDP_thread
UDP_thread = threading.Thread(name = 'UDP_loop', target = UDP_loop, args = (UDPsock,))
UDP_thread.start()
The Main_thread starts a TCP_thread and a UDP_thread. Although these are separate threads, they all run within one single Python shell.
The Main_thread also starts a subprocess. This is done in the following way:
p = subprocess.Popen(['python', mySubprocessPath], shell=True)
From the Python documentation, I understand that this subprocess is running simultaneously (!) in a separate Python interpreter session/shell. The Main_thread in this subprocess is completely dedicated to my GUI. The GUI starts a TCP_thread for all its communications.
I know that things get a bit complicated, so I have summarized the whole setup in a figure (not reproduced here).
I have several questions concerning this setup. I will list them down here:
Question 1 [Solved]
Is it true that a Python interpreter uses only one CPU core at a time to run all the threads? In other words, will the Python interpreter session 1 (from the figure) run all 3 threads (Main_thread, TCP_thread and UDP_thread) on one CPU core?
Answer: yes, this is true. The GIL (Global Interpreter Lock) ensures that all threads run on one CPU core at a time.
Question 2 [Not yet solved]
Do I have a way to track which CPU core it is?
Question 3 [Partly solved]
For this question we forget about threads, but we focus on the subprocess mechanism in Python. Starting a new subprocess implies starting up a new Python interpreter instance. Is this correct?
Answer: Yes this is correct. At first there was some confusion about whether the following code would create a new Python interpreter instance:
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
The issue has been clarified. This code indeed starts a new Python interpreter instance.
Will Python be smart enough to make that separate Python interpreter instance run on a different CPU core? Is there a way to track which one, perhaps with some sporadic print statements as well?
Question 4 [New question]
The community discussion raised a new question. There are apparently two approaches when spawning a new process (within a new Python interpreter instance):
# Approach 1(a)
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
# Approach 1(b) (J.F. Sebastian)
p = subprocess.Popen([sys.executable, mySubprocessPath])
# Approach 2
p = multiprocessing.Process(target=foo, args=(q,))
The second approach has the obvious downside that it targets just a function - whereas I need to open up a new Python script. Anyway, are both approaches similar in what they achieve?
Q: Is it true that a Python interpreter uses only one CPU core at a time to run all the threads?
No. The GIL and CPU affinity are unrelated concepts. The GIL can be released during blocking I/O operations and during long CPU-intensive computations inside a C extension anyway.
If a thread is blocked waiting for the GIL, it is probably not running on any CPU core, and therefore it is fair to say that pure-Python multithreaded code may use only one CPU core at a time on the CPython implementation.
Q: In other words, will the Python interpreter session 1 (from the figure) run all 3 threads (Main_thread, TCP_thread and UDP_thread) on one CPU core?
I don't think CPython manages CPU affinity implicitly. It likely relies on the OS scheduler to choose where to run a thread. Python threads are implemented on top of real OS threads.
Q: Or is the Python interpreter able to spread them over multiple cores?
To find out the number of usable CPUs (os.sched_getaffinity() is available on Linux):
>>> import os
>>> len(os.sched_getaffinity(0))
16
Again, whether or not threads are scheduled on different CPUs does not depend on the Python interpreter.
Q: Suppose that the answer to Question 1 is 'multiple cores', do I have a way to track on which core each thread is running, perhaps with some sporadic print statements? If the answer to Question 1 is 'only one core', do I have a way to track which one it is?
I imagine a specific CPU may change from one time slot to another. You could look at something like /proc/<pid>/task/<tid>/status on old Linux kernels. On my machine, task_cpu can be read from /proc/<pid>/stat or /proc/<pid>/task/<tid>/stat:
>>> open("/proc/{pid}/stat".format(pid=os.getpid()), 'rb').read().split()[-14]
'4'
For a current portable solution, see whether psutil exposes such info.
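If psutil is installed, it does expose this kind of information on the platforms that support it; a small sketch (psutil is an assumption here, not something the question requires):

import psutil

p = psutil.Process()      # the current process
print(p.cpu_affinity())   # CPUs the process may run on (Linux, Windows, FreeBSD)
print(p.cpu_num())        # CPU the process is currently running on (Linux, FreeBSD, SunOS)
p.cpu_affinity([0])       # restrict the process to CPU 0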
You could restrict the current process to a set of CPUs:
os.sched_setaffinity(0, {0}) # current process on 0-th core
Q: For this question we forget about threads, but we focus on the subprocess mechanism in Python. Starting a new subprocess implies starting up a new Python interpreter session/shell. Is this correct?
Yes. The subprocess module creates new OS processes. If you run the python executable then it starts a new Python interpreter. If you run a bash script then no new Python interpreter is created, i.e., running the bash executable does not start a new Python interpreter/session/etc.
Q: Supposing that it is correct, will Python be smart enough to make that separate interpreter session run on a different CPU core? Is there a way to track this, perhaps with some sporadic print statements as well?
See above (i.e., OS decides where to run your thread and there could be OS API that exposes where the thread is run).
multiprocessing.Process(target=foo, args=(q,)).start()
multiprocessing.Process also creates a new OS process (that runs a new Python interpreter).
In reality, my subprocess is another file. So this example won't work for me.
Python uses modules to organize the code. If your code is in another_file.py then import another_file in your main module and pass another_file.foo to multiprocessing.Process.
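For example (a minimal sketch; another_file.py and foo are placeholders for your actual module and function):

# another_file.py
def foo(q):
    q.put('hello from the child process')

# main module
from multiprocessing import Process, Queue
import another_file

if __name__ == '__main__':
    q = Queue()
    p = Process(target=another_file.foo, args=(q,))
    p.start()
    print(q.get())
    p.join()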
Nevertheless, how would you compare it to p = subprocess.Popen(..)? Does it matter if I start the new process (or should I say 'python interpreter instance') with subprocess.Popen(..) versus multiprocessing.Process(..)?
multiprocessing.Process() also creates a new OS process, but it does so itself (via fork on POSIX, or by launching a fresh interpreter on Windows) rather than through subprocess.Popen(). multiprocessing provides an API that is similar to the threading API, and it abstracts away the details of communication between Python processes (how Python objects are serialized to be sent between processes).
If there are no CPU-intensive tasks then you could run your GUI and I/O threads in a single process. If you have a series of CPU-intensive tasks then, to utilize multiple CPUs at once, either use multiple threads with C extensions such as lxml, regex, numpy (or your own one created using Cython) that can release the GIL during long computations, or offload them into separate processes (a simple way is to use a process pool such as the one provided by concurrent.futures); see the sketch below.
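A minimal sketch of the process-pool option (assuming Python 3.2+ or the futures backport; cpu_heavy is a made-up placeholder for real work):

from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    # placeholder for a CPU-bound task
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:   # one worker per CPU by default
        results = list(pool.map(cpu_heavy, [10**6] * 8))
    print(results[:2])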
Q: The community discussion raised a new question. There are apparently two approaches when spawning a new process (within a new Python interpreter instance):
# Approach 1(a)
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
# Approach 1(b) (J.F. Sebastian)
p = subprocess.Popen([sys.executable, mySubprocessPath])
# Approach 2
p = multiprocessing.Process(target=foo, args=(q,))
"Approach 1(a)" is wrong on POSIX (though it may work on Windows). For portability, use "Approach 1(b)" unless you know you need cmd.exe (pass a string in this case, to make sure that the correct command-line escaping is used).
The second approach has the obvious downside that it targets just a function - whereas I need to open up a new Python script. Anyway, are both approaches similar in what they achieve?
subprocess creates new processes, any processes, e.g., you could run a bash script. multiprocessing is used to run Python code in another process. It is more flexible to import a Python module and run its functions than to run it as a script. See Call python script with input with in a python script using subprocess.
You are using the threading module, which is built on top of thread. As the documentation suggests, it uses the POSIX thread implementation (pthread) of your OS.
The threads are managed by the OS instead of the Python interpreter, so the answer will depend on the pthread library in your system. However, CPython uses the GIL to prevent multiple threads from executing Python bytecode simultaneously, so they will be serialized. But they can still be scheduled onto different cores, which depends on your pthread libs.
Simply use a debugger and attach it to your python.exe, for example with GDB's thread command.
Similar to question 1, the new process is managed by your OS and probably runs on a different core. Use a debugger or any process monitor to see it. For more details, see the CreateProcess() documentation page.
1, 2: You have three real threads, but in CPython they're limited by the GIL, so, assuming they're running pure Python code, you'll see CPU usage as if only one core were used.
3: As gdlmx said, it's up to the OS to choose a core to run a thread on, but if you really need control, you can set process or thread affinity using the native API via ctypes. Since you are on Windows, it would be like this:
import ctypes
import subprocess

# This will run your subprocess on core #0 only
p = subprocess.Popen(['python', mySubprocessPath], shell=True)
cpu_mask = 1  # bit 0 set, i.e. CPU 0
ctypes.windll.kernel32.SetProcessAffinityMask(p._handle, cpu_mask)
I use the private Popen._handle here for simplicity. The clean way would be OpenProcess(p.pid) etc.
And yes, subprocess runs python, like anything else it runs, in another new process.

is twisted incompatible with multiprocessing events and queues?

I am trying to simulate a network of applications that run using twisted. As part of my simulation I would like to synchronize certain events and be able to feed each process large amounts of data. I decided to use multiprocessing Events and Queues. However, my processes are hanging.
I wrote the example code below to illustrate the problem. Specifically, (about 95% of the time on my Sandy Bridge machine) the 'run_in_thread' function finishes; however, the 'print_done' callback is not called until after I press Ctrl-C.
Additionally, I can change several things in the example code to make this work more reliably such as: reducing the number of spawned processes, calling self.ready.set from reactor_ready, or changing the delay of deferLater.
I am guessing there is a race condition somewhere between the twisted reactor and blocking multiprocessing calls such as Queue.get() or Event.wait()?
What exactly is the problem I am running into? Is there a bug in my code that I am missing? Can I fix this or is twisted incompatible with multiprocessing events/queues?
Secondly, would something like spawnProcess or Ampoule be the recommended alternative? (as suggested in Mix Python Twisted with multiprocessing?)
Edits (as requested):
I've run into problems with all the reactors I've tried: glib2reactor, selectreactor, pollreactor, and epollreactor. The epollreactor seems to give the best results and seems to work fine for the example given below, but still gives me the same (or a similar) problem in my application. I will continue investigating.
I'm running Gentoo Linux kernel 3.3 and 3.4, python 2.7, and I've tried Twisted 10.2.0, 11.0.0, 11.1.0, 12.0.0, and 12.1.0.
In addition to my Sandy Bridge machine, I see the same issue on my dual-core AMD machine.
#!/usr/bin/python
# -*- coding: utf-8 *-*
from twisted.internet import reactor
from twisted.internet import threads
from twisted.internet import task
from multiprocessing import Process
from multiprocessing import Event

class TestA(Process):
    def __init__(self):
        super(TestA, self).__init__()
        self.ready = Event()
        self.ready.clear()
        self.start()

    def run(self):
        reactor.callWhenRunning(self.reactor_ready)
        reactor.run()

    def reactor_ready(self, *args):
        task.deferLater(reactor, 1, self.node_ready)
        return args

    def node_ready(self, *args):
        print 'node_ready'
        self.ready.set()
        return args

def reactor_running():
    print 'reactor_running'
    df = threads.deferToThread(run_in_thread)
    df.addCallback(print_done)

def run_in_thread():
    print 'run_in_thread'
    for n in processes:
        n.ready.wait()

def print_done(dfResult=None):
    print 'print_done'
    reactor.stop()

if __name__ == '__main__':
    processes = [TestA() for i in range(8)]
    reactor.callWhenRunning(reactor_running)
    reactor.run()
The short answer is yes, Twisted and multiprocessing are not compatible with each other, and you cannot reliably use them as you are attempting to.
On all POSIX platforms, child process management is closely tied to SIGCHLD handling. POSIX signal handlers are process-global, and there can be only one per signal type.
Twisted and stdlib multiprocessing cannot both have a SIGCHLD handler installed. Only one of them can. That means only one of them can reliably manage child processes. Your example application doesn't control which of them will win that ability, so I would expect there to be some non-determinism in its behavior arising from that fact.
However, the more immediate problem with your example is that you load Twisted in the parent process and then use multiprocessing to fork and not exec all of the child processes. Twisted does not support being used like this. If you fork and then exec, there's no problem. However, the lack of an exec of a new process (perhaps a Python process using Twisted) leads to all kinds of extra shared state which Twisted does not account for. In your particular case, the shared state that causes this problem is the internal "waker fd" which is used to implement deferToThread. With the fd shared between the parent and all the children, when the parent tries to wake up the main thread to deliver the result of the deferToThread call, it most likely wakes up one of the child processes instead. The child process has nothing useful to do, so that's just a waste of time. Meanwhile the main thread in the parent never wakes up and never notices your threaded task is done.
It's possible you can avoid this issue by not loading any of Twisted until you've already created the child processes. This would turn your usage into a single-process use case as far as Twisted is concerned (in each process, it would be initially loaded, and then that process would not go on to fork at all, so there's no question of how fork and Twisted interact anymore). This means not even importing Twisted until after you've created the child processes.
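A sketch of that suggestion (an untested restructuring of the question's example, under the assumption that delaying the import is enough; the SIGCHLD caveat above still applies): fork the children before the parent ever imports Twisted, and have each child import Twisted only inside its own target function.

from multiprocessing import Process, Event

def child(ready):
    # Twisted is imported only here, inside the already-forked child.
    from twisted.internet import reactor, task

    def node_ready():
        print('node_ready')
        ready.set()
        reactor.stop()

    reactor.callWhenRunning(task.deferLater, reactor, 1, node_ready)
    reactor.run()

if __name__ == '__main__':
    events = [Event() for _ in range(8)]
    procs = [Process(target=child, args=(e,)) for e in events]
    for p in procs:
        p.start()

    # Only now does the parent import Twisted, so no reactor state (such as
    # the waker fd) is shared with the children.
    from twisted.internet import reactor, threads

    def wait_for_children():
        for e in events:
            e.wait()

    def done(_):
        print('print_done')
        reactor.stop()

    reactor.callWhenRunning(
        lambda: threads.deferToThread(wait_for_children).addCallback(done))
    reactor.run()
    for p in procs:
        p.join()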
Of course, this only helps you out as far as Twisted goes. Any other libraries you use could run into similar trouble (you mentioned glib2, that's a great example of another library that will totally choke if you try to use it like this).
I highly recommend not using the multiprocessing module at all. Instead, use any multi-process approach that involves fork and exec, not fork alone. Ampoule falls into that category.
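For completeness, a minimal sketch of the fork-and-exec route using Twisted's own reactor.spawnProcess (child.py is a placeholder for whatever worker script you want to run):

import sys
from twisted.internet import reactor, protocol

class WorkerProtocol(protocol.ProcessProtocol):
    def outReceived(self, data):
        print('child said: %r' % (data,))

    def processEnded(self, reason):
        print('child exited: %s' % (reason.value,))
        reactor.stop()

if __name__ == '__main__':
    # fork+exec a fresh Python interpreter running child.py
    reactor.spawnProcess(WorkerProtocol(), sys.executable,
                         [sys.executable, 'child.py'])
    reactor.run()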

Starting a process as a subprocess in Python

I am writing a program that uses multiple worker processes (a pre-forking model) with the following code.
from multiprocessing import Process
for i in range(0,3):
    Process(target=worker, args=(i,)).start()
I use Windows. I notice that they are run as separate processes when I wanted them to start as subprocesses instead. How do I make them subprocesses of the main process?
I am hesitant to use the subprocess module as it seems suited to run external processes (as far as I have used it).
An update: It seems Windows does not launch new processes as sub-processes. Python doesn't support getppid() (get parent's PID) in Windows.
What do you call a subprocess? To me they are subprocesses of your main process. Here is my example and the returned output.
import time, os
from multiprocessing import Process

def worker():
    print "I'm process %s, my father is %s" % (os.getpid(), os.getppid())

print "I'm the main process %s" % os.getpid()
for i in range(0,3):
    Process(target=worker).start()

The output is:
I'm the main process 5897
I'm process 5898, my father is 5897
I'm process 5899, my father is 5897
I'm process 5900, my father is 5897
You have 3 subprocesses attached to a main process...
You seem to be confusing terminology here. A subprocess is a separate process. The processes that are created will be children of the main process of your program, and in that sense are subprocesses. If you want threads, then use multithreading instead of multiprocessing, but note that Python won't use multiple cores/CPUs for multiple threads.
I am hesitant to use the subprocess module as it seems suited to run external processes
I'm sorry, I don't understand this remark.
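To make the terminology concrete, here is a small counterpart sketch using threads: every thread reports the same os.getpid(), because threads all live inside one process, whereas the multiprocessing example above prints a different PID for each child.

import os
import threading

def worker():
    print("thread %s in process %s" % (threading.current_thread().name, os.getpid()))

for i in range(3):
    threading.Thread(target=worker).start()
# Every line shows the same PID: threads share a single process.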
Short answer: http://docs.python.org/library/threading.html
Longer: I don't understand the question, aitchnyu. In the typical Unix model, the only processes a process can start are subprocesses. I have a strong feeling that there's a vocabulary conflict between the two of us that I don't know how to unravel. You seem to have something like an "internal process" in mind; what's an example of that, in any language or operating system?
I can attest that Python's subprocess module is widely used.
You write "... multiple working threads ..." Have you read the documentation to which I refer in the first line at the top of this response?

Multi-threading different scripts

I have a few scripts written in Python.
I am trying to multi thread them.
When Script A starts. I would like scripts B, C, and D to start.
After A runs, I would like A2 to run.
After B runs, I would like B2 to run, then B3.
C and D have no follow up scripts.
I have checked that the scripts are independent of each other.
I am planning on using "exec" to launch them, and would like to use this "launcher" on Linux and Windows.
I have other multi-threaded scripts that mainly do a procedure A with five threads. This is throwing me because all these procedures are different but could start and run at the same time.
Ok I'm still not sure where exactly your problem is, but that's the way I'd solve the problem:
# Main.py
from multiprocessing import Process
import ScriptA
# import all other scripts as well

def handle_script_a(*args):
    print("Call one or several functions from Script A or calculate some stuff beforehand")
    ScriptA.foo(*args)

if __name__ == '__main__':
    p = Process(target=handle_script_a, args=("Either so", ))
    p1 = Process(target=ScriptA.foo, args=("or so", ))
    p.start()
    p1.start()
    p.join()
    p1.join()

# ScriptA.py:
def foo(*args):
    print("Function foo called with args:")
    for arg in args:
        print(arg)
You can either call a function directly or, if you want to call several functions in one process, use a small wrapper for it (see the chaining sketch just below). No platform-dependent code, no ugly execs, and you can create/join processes easily in whatever way you fancy.
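For the chains described in the question (A then A2; B then B2 then B3; C and D standalone), one way to apply that wrapper idea is to give each chain its own process that runs its steps in order. This is only a sketch: the ScriptA/ScriptA2/ScriptB/... module names are hypothetical, and each is assumed to expose a main() function.

from multiprocessing import Process
# Hypothetical module names; each is assumed to expose a main() function.
import ScriptA, ScriptA2, ScriptB, ScriptB2, ScriptB3, ScriptC, ScriptD

def run_chain(steps):
    # Run the given callables one after another inside a single process.
    for step in steps:
        step()

if __name__ == '__main__':
    chains = [
        [ScriptA.main, ScriptA2.main],
        [ScriptB.main, ScriptB2.main, ScriptB3.main],
        [ScriptC.main],
        [ScriptD.main],
    ]
    procs = [Process(target=run_chain, args=(chain,)) for chain in chains]
    for p in procs:
        p.start()
    for p in procs:
        p.join()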
And a small example of a queue for interprocess communication - pretty much taken straight from the Python docs, but well ;)
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()
Create the queue and give it to one or more processes. Note that get() blocks; if you want non-blocking behaviour you can use get_nowait() or specify a timeout as the second argument. If you want shared objects there are multiprocessing.Array and multiprocessing.Value; just read the multiprocessing documentation for the specifics.
If you've got more questions related to IPC, create a new question - it's an extremely large topic in itself.
So it doesn't have to be a Python launcher? Back when I was doing heavy sysadmin work, I wrote a Perl script using the POE framework to run scripts or whatever with limited concurrency. It worked great, for example when we had to run a script over a thousand user accounts or a couple of hundred databases. Limit it to just 4 jobs at a time on a 4-CPU box, 16 on a 16-way server, or any arbitrary number. POE does use fork() to create child procs, but on Windows boxes that works fine under Cygwin, FWIW.
A while back I was looking for an equivalent event framework for Python. Looking again today I see Twisted (and some posts indicating that it runs even faster than POE), but maybe Twisted is mostly for network client/server work? POE is incredibly flexible. It's tricky at first if you're not used to event-driven scripting, and even if you are, but events are a lot easier to grok than threads. (Maybe overkill for your needs? Years later, I'm still surprised there's not a simple utility to control throughput on multi-CPU machines.)
