Opening a QApplication from a Python process forked from a QThread - python

I have a server-client architecture. The server is a QApplication and contains a QThread.
I am trying to open a new client process from that QThread using Python's built-in multiprocessing, and from that new process open a new QApplication, so that the server and the client are both running QApplications.
The problem is that I am getting the following error:
WARNING: QApplication was not created in the main() thread.
The QApplication is being created in the main thread of the new process, so I am not sure why this error is occurring.

A quote from Kovid Goyal:
Don't use multiprocessing. multiprocessing is not thread safe; on Unix it uses fork() without exec(), which means that it inherits everything from the parent process, including locks (which are in an invalid state in the child process), file handles, global objects like QApplication, and so on. Just as an illustration of the problems multiprocessing can cause: if you use it with the standard library logging module you can have your worker processes crash, since the logging module uses locks.
After all, what do you expect to happen to QApplication on fork()? There's no way for fork() to have the QApplication object magically re-initialize itself.
Using multiprocessing will bite you in the rear on any project of moderate complexity. Heck, it bit me on the rear even while implementing a multi-core replacement for grep. Instead use subprocess to launch a worker process, feed it a module name, functions and arguments on stdin using cPickle or json, and then have it run the task.
See: http://python.6.x6.nabble.com/multiprocessing-with-QApplication-td4977972.html
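Following that advice, here is a minimal sketch of launching the client with subprocess instead of forking; 'client.py' is a hypothetical entry-point script, not something from the question. A freshly exec'd interpreter starts with clean process state, so it can create its own QApplication in its own main thread.

import subprocess
import sys

# Launch the client as a brand-new interpreter (fork + exec), so it starts
# with clean process state and can create its own QApplication in its main
# thread. 'client.py' is a placeholder for the client's entry-point script.
client = subprocess.Popen([sys.executable, 'client.py'])

# ... later, when the server shuts down:
client.terminate()
client.wait()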


Why is Python's subprocess module called that and not just 'process'?

I am new to threading and processes. I have been trying to understand asyncio. While researching asyncio in the Concurrency section of Doug Hellmann's Python Module of the Week, I ran into the multiprocessing, threading, signal and subprocess modules.
I have been wondering why the subprocess module was named thus. Why is the module not called process? And what is 'sub' [meaning below] about it?
Edit: Forgotten addition
There's a Popen class and I assume the 'P' stands for process.
The GitHub code comment says:
Popen(...): A class for flexibly executing a command in a new process
Doesn't the existence of the Popen class give more reason to call the module process instead of subprocess?
Processes in most operating systems form a parent-child relationship. Processes created by another process are called child processes or subprocesses of that process:
A child process in computing is a process created by another
process (the parent process). This technique pertains to multitasking
operating systems, and is sometimes called a subprocess or
traditionally a subtask.
The Python subprocess module provides facilities to create new child processes (i.e. every process created with this module will be a subprocess of your Python program):
The subprocess module allows you to spawn new processes, connect to
their input/output/error pipes, and obtain their return codes.
It does not deal with arbitrary processes, so it makes sense to name it subprocess instead of just process.
subprocess provides an API for creating and communicating with secondary processes.
The "sub" in the module name refers to the fact that all processes you are going to start here will be child processes of your running Python process. They exist to support your Python code.
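For illustration, here is a small sketch (assuming Python 3.5+ for subprocess.run) that makes the parent/child relationship visible:

import os
import subprocess
import sys

print('parent pid:', os.getpid())
# The interpreter started here is a child (sub)process of this program:
# its parent pid is our pid.
result = subprocess.run([
    sys.executable, '-c',
    'import os; print("child pid:", os.getpid(), "parent pid:", os.getppid())',
])
print('child exited with', result.returncode)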

Interprocess Communication between two python scripts without STDOUT

I am trying to create a Monitor script that monitors all the threads of a huge Python script which has several loggers and several threads running.
From Monitor.py I could run a subprocess and forward the STDOUT, which might contain the status of the threads, but since several loggers are running I see their logging mixed in with it.
Question: How can I run the main script as a separate process and get custom messages and thread status without interfering with logging? (Passing a PIPE as an argument?)
Main_Script.py: runs several threads; each thread has a separate logger.
Monitor.py: spins up Main_Script.py and monitors each of the threads in Main_Script.py (and may obtain other messages from Main_Script.py in the future).
So far, I have tried subprocess and Process from multiprocessing.
subprocess lets me start Main_Script.py and forward the stdout back to the monitor, but I see the logging of the threads coming in through the same STDOUT. I am using the logging library to log the data from each thread to separate files.
I also tried Process from multiprocessing. I had to call the main function of Main_Script.py as a process and send a PIPE argument to it from Monitor.py. Now I can't see Main_Script.py as a separate process when I run the top command.
Normally, you want to change the child process to work like a typical Unix userland tool: the logging and other side-band information goes to stderr (or to a file, or syslog, etc.), and only the actual output goes to stdout.
Then, the problem is easy: just capture stdout to a PIPE that you process, and either capture stderr to a different PIPE, or pass it through to real stderr.
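As a rough sketch of that setup (handle_status is a hypothetical callback standing in for whatever the monitor does with a status line):

import subprocess
import sys

def handle_status(line):
    # Hypothetical: whatever Monitor.py does with one status message.
    print('status:', line)

# Capture only stdout (the "real" output); leave stderr alone so the
# child's logging still goes to the terminal or its own files.
proc = subprocess.Popen(
    [sys.executable, 'Main_Script.py'],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
for line in proc.stdout:
    handle_status(line.rstrip())
proc.wait()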
If that's not appropriate for some reason, you need to come up with some other mechanism for IPC: Unix or Windows named pipes, anonymous pipes that you pass by leaking the file descriptor across the fork/exec and then pass the fd as an argument, Unix-domain sockets, TCP or UDP localhost sockets, a higher-level protocol like a web service on top of TCP sockets, mmapped files, anonymous mmaps or pipes that you pass between processes via a Unix-domain socket or Windows API calls, …
As you can see, there are a huge number of options. Without knowing anything about your problem other than that you want "custom messages", it's impossible to tell you which one you want.
While we're at it: If you can rewrite your code around multiprocessing rather than subprocess, there are nice high-level abstractions built in to that module. For example, you can use a Queue that automatically manages synchronization and blocking, and also manages pickling/unpickling so you can just pass any (picklable) object rather than having to worry about serializing to text and parsing the text. Or you can create shared memory holding arrays of int32 objects, or NumPy arrays, or arbitrary structures that you define with ctypes. And so on. Of course you could build the same abstractions yourself, without needing to use multiprocessing, but it's a lot easier when they're there out of the box.
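As a minimal sketch of the Queue option (the worker function and the status dict are only illustrative):

import multiprocessing

def worker(q):
    # Any picklable object can go on the queue; no manual
    # serialization or text parsing is needed.
    q.put({'name': 'worker-1', 'status': 'ok', 'items_done': 42})

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    print(q.get())   # blocks until the child has put a result
    p.join()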
Finally, while your question is tagged ipc and pipe, and titled "Interprocess Communication", your description refers to threads, not processes. If you actually are using a bunch of threads in a single process, you don't need any of this.
You can just stick your results on a queue.Queue, or store them in a list or deque with a Lock around it, or pass in a callback to be called with each new result, or use a higher-level abstraction like concurrent.futures.ThreadPoolExecutor and return a Future object or an iterator of Futures, etc.
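A short sketch of that last option, using ThreadPoolExecutor (monitored_task is a stand-in for whatever your threads actually do):

from concurrent.futures import ThreadPoolExecutor, as_completed

def monitored_task(name):
    # Stand-in for the real work one of your threads would do.
    return '%s finished' % name

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(monitored_task, 'thread-%d' % i) for i in range(4)]
    for future in as_completed(futures):
        print(future.result())   # each result arrives as its thread completes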

Safe to call multiprocessing from a thread in Python?

According to
https://github.com/joblib/joblib/issues/180, and Is there a safe way to create a subprocess from a thread in python?
the Python multiprocessing module does not allow use from within threads. Is this true?
My understanding is that it's fine to fork from threads, as long as you
aren't holding a threading.Lock when you do so (in the current thread? anywhere in the process?). However, Python's documentation is silent on whether threading.Lock objects are safely shared after a fork.
There's also this: locks shared from the logging module cause issues with fork. https://bugs.python.org/issue6721
I'm not sure how this issue arises. It sounds like the state of any locks in the process is copied into the child process when the current thread forks (which seems like a design error and certain to deadlock). If so, does using multiprocessing really provide any protection against this (since I'm free to create my multiprocessing.Pool after threading.Lock is created and entered by other threads, and after threads have started that use the not-fork-safe logging module)? The multiprocessing module docs are also silent about whether multiprocessing.Pools should be allocated before Locks.
Does replacing threading.Lock with multiprocessing.Lock everywhere avoid this issue and allow us to safely combine threads and forks?
It sounds like the state of any locks in the process is copied into the child process when the current thread forks (which seems like a design error and certain to deadlock).
It is not a design error; rather, fork() predates single-process multithreading. The state of all locks is copied into the child process because they're just objects in memory; the entire address space of the process is copied as-is on fork. The only alternatives are bad ones: either copy all threads over fork, or deny forking in multithreaded applications.
Therefore, fork()ing in a multithreaded program was never a safe thing to do, unless it was then followed by execve() or exit() in the child process.
Does replacing threading.Lock with multiprocessing.Lock everywhere avoid this issue and allow us to safely combine threads and forks?
No. Nothing makes it safe to combine threads and forks; it cannot be done.
The problem is that when you have multiple threads in a process, after the fork() system call you cannot safely continue running the program on POSIX systems.
For example, the Linux manual page fork(2) says:
After a fork(2) in a multithreaded program, the child can safely call
only async-signal-safe functions (see signal(7)) until such time as it
calls execve(2).
I.e. it is OK to fork() in a multithreaded program and then only call async-signal-safe C functions (which is a rather limited subset of C functions), until the child process has been replaced with another executable!
Unsafe C function calls in the child process then include, for example:
malloc for dynamic memory allocation
any <stdio.h> functions for formatted input
most of the pthread_* functions required for thread state handling, including creation of new threads...
Thus there is very little that the child process can actually do safely. Unfortunately, CPython core developers have been downplaying the problems caused by this. Even now the documentation says:
Note that safely forking a multithreaded process is
problematic.
Quite a euphemism for "impossible".
It is safe to use multiprocessing from a Python process that has multiple threads of control provided that you're not using the fork start method; in Python 3.4+ it is now possible to change the start method. In previous Python versions including all of Python 2, the POSIX systems always behaved as if fork was specified as the start method; this would result in undefined behaviour.
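For example, a minimal sketch of selecting the spawn start method (available since Python 3.4):

import multiprocessing

def work():
    print('running in a freshly spawned interpreter')

if __name__ == '__main__':
    # 'spawn' starts a brand-new interpreter instead of forking a copy of
    # the parent, so no locks, threads, or C-library state are inherited.
    multiprocessing.set_start_method('spawn')
    p = multiprocessing.Process(target=work)
    p.start()
    p.join()

multiprocessing.get_context('spawn') can be used instead if you only want to change the start method for one part of the program.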
The problems are not limited to threading.Lock objects; they include all locks held by the C standard library, by C extensions, etc. What is worse is that most of the time people will say "it works for me"... until it stops working.
There have even been cases where a seemingly single-threaded Python program was actually multithreaded on Mac OS X, causing failures and deadlocks when using multiprocessing.
Another problem is that open file handles and shared sockets might behave oddly in programs that fork, but that would be the case even in single-threaded programs.
TL;DR: using multiprocessing in multithreaded programs, with C extensions, with open sockets, etc.:
fine on POSIX in 3.4+ if you explicitly specify a start method that is not fork;
fine on Windows because it doesn't support forking;
in Python 2 - 3.3 on POSIX: you'll mostly shoot yourself in the foot.

How to create a socket object using multiprocessing and pass it back to the parent process

I wish to create a socket.socket object from a main program using multiprocessing, to avoid blocking the main program if the socket is not available during the creation process.
The socket gets created but can not be returned to the main program.
After some reading, it seems that objects cannot be shared between Python processes. Is there a possible design pattern to circumvent this? Is multiprocessing suitable for this application, or should I be considering another approach?
You should keep it really simple and have a socket handler thread which:
Opens the socket,
Retains ownership of it,
Handles it for your main program, i.e. when data arrives, passes it to the parent via an event, pubsub, or a callback,
If the connection is lost, either reopens it or lets the parent know and ends cleanly,
On shutdown, closes the socket for you.
Then everything becomes non-blocking. This is a much cleaner design pattern especially if you use pubsub.
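A rough sketch of that pattern using a thread and a queue for hand-off (the host, port, and queue-based hand-off are assumptions for illustration, not part of the question):

import socket
import threading
import queue

def socket_worker(host, port, incoming):
    # This thread owns the socket for its entire lifetime; the main
    # program never touches the socket object itself.
    sock = socket.create_connection((host, port))  # may block, but only in this thread
    try:
        while True:
            data = sock.recv(4096)
            if not data:
                incoming.put(None)   # connection closed: tell the parent
                return
            incoming.put(data)       # hand received data to the parent
    finally:
        sock.close()

incoming = queue.Queue()
worker = threading.Thread(target=socket_worker,
                          args=('example.com', 12345, incoming))
worker.daemon = True
worker.start()
# The main program now blocks (or polls) on the queue, never on the socket.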

is twisted incompatible with multiprocessing events and queues?

I am trying to simulate a network of applications that run using Twisted. As part of my simulation I would like to synchronize certain events and be able to feed each process large amounts of data. I decided to use multiprocessing Events and Queues. However, my processes are hanging.
I wrote the example code below to illustrate the problem. Specifically, about 95% of the time on my Sandy Bridge machine, the 'run_in_thread' function finishes; however, the 'print_done' callback is not called until after I press Ctrl-C.
Additionally, I can change several things in the example code to make this work more reliably such as: reducing the number of spawned processes, calling self.ready.set from reactor_ready, or changing the delay of deferLater.
I am guessing there is a race condition somewhere between the twisted reactor and blocking multiprocessing calls such as Queue.get() or Event.wait()?
What exactly is the problem I am running into? Is there a bug in my code that I am missing? Can I fix this or is twisted incompatible with multiprocessing events/queues?
Secondly, would something like spawnProcess or Ampoule be the recommended alternative? (as suggested in Mix Python Twisted with multiprocessing?)
Edits (as requested):
I've run into problems with all the reactors I've tried: glib2reactor, selectreactor, pollreactor, and epollreactor. The epollreactor seems to give the best results and seems to work fine for the example given below, but it still gives me the same (or a similar) problem in my application. I will continue investigating.
I'm running Gentoo Linux kernel 3.3 and 3.4, python 2.7, and I've tried Twisted 10.2.0, 11.0.0, 11.1.0, 12.0.0, and 12.1.0.
In addition to my Sandy Bridge machine, I see the same issue on my dual-core AMD machine.
#!/usr/bin/python
# -*- coding: utf-8 -*-
from twisted.internet import reactor
from twisted.internet import threads
from twisted.internet import task
from multiprocessing import Process
from multiprocessing import Event


class TestA(Process):
    def __init__(self):
        super(TestA, self).__init__()
        self.ready = Event()
        self.ready.clear()
        self.start()

    def run(self):
        # Each child process runs its own reactor.
        reactor.callWhenRunning(self.reactor_ready)
        reactor.run()

    def reactor_ready(self, *args):
        task.deferLater(reactor, 1, self.node_ready)
        return args

    def node_ready(self, *args):
        print 'node_ready'
        self.ready.set()
        return args


def reactor_running():
    print 'reactor_running'
    df = threads.deferToThread(run_in_thread)
    df.addCallback(print_done)


def run_in_thread():
    print 'run_in_thread'
    # Block in a worker thread until every child signals readiness.
    for n in processes:
        n.ready.wait()


def print_done(dfResult=None):
    print 'print_done'
    reactor.stop()


if __name__ == '__main__':
    processes = [TestA() for i in range(8)]
    reactor.callWhenRunning(reactor_running)
    reactor.run()
The short answer is yes, Twisted and multiprocessing are not compatible with each other, and you cannot reliably use them as you are attempting to.
On all POSIX platforms, child process management is closely tied to SIGCHLD handling. POSIX signal handlers are process-global, and there can be only one per signal type.
Twisted and stdlib multiprocessing cannot both have a SIGCHLD handler installed. Only one of them can. That means only one of them can reliably manage child processes. Your example application doesn't control which of them will win that ability, so I would expect there to be some non-determinism in its behavior arising from that fact.
However, the more immediate problem with your example is that you load Twisted in the parent process and then use multiprocessing to fork and not exec all of the child processes. Twisted does not support being used like this. If you fork and then exec, there's no problem. However, the lack of an exec of a new process (perhaps a Python process using Twisted) leads to all kinds of extra shared state which Twisted does not account for. In your particular case, the shared state that causes this problem is the internal "waker fd" which is used to implement deferToThread. With the fd shared between the parent and all the children, when the parent tries to wake up the main thread to deliver the result of the deferToThread call, it most likely wakes up one of the child processes instead. The child process has nothing useful to do, so that's just a waste of time. Meanwhile the main thread in the parent never wakes up and never notices your threaded task is done.
It's possible you can avoid this issue by not loading any of Twisted until you've already created the child processes. This would turn your usage into a single-process use case as far as Twisted is concerned (in each process, it would be initially loaded, and then that process would not go on to fork at all, so there's no question of how fork and Twisted interact anymore). This means not even importing Twisted until after you've created the child processes.
Of course, this only helps you out as far as Twisted goes. Any other libraries you use could run into similar trouble (you mentioned glib2, that's a great example of another library that will totally choke if you try to use it like this).
I highly recommend not using the multiprocessing module at all. Instead, use any multi-process approach that involves fork and exec, not fork alone. Ampoule falls into that category.
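For instance, here is a minimal sketch using Twisted's own spawnProcess (a fork+exec approach); 'worker.py' is a placeholder for a child-side script, not something from the question:

import os
import sys
from twisted.internet import reactor, protocol

class WorkerProtocol(protocol.ProcessProtocol):
    def outReceived(self, data):
        print('worker said: %r' % (data,))

    def processEnded(self, reason):
        reactor.stop()

# spawnProcess performs a real fork + exec, so the child starts from a
# clean interpreter rather than a copy of the parent's reactor state.
reactor.spawnProcess(WorkerProtocol(), sys.executable,
                     [sys.executable, 'worker.py'], env=os.environ)
reactor.run()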
