Interprocess Communication between two python scripts without STDOUT - python

I am trying to create a Monitor script that monitors all the threads of a huge python script which has several loggers and several threads running.
From Monitor.py I could run a subprocess and forward the STDOUT, which might contain the status of the threads, but since several loggers are running I see other logging mixed into it.
Question: How can I run the main script as a separate process and get custom messages and thread status without interfering with the logging? (Passing a PIPE as an argument?)
Main_Script.py
* Runs several threads
* Each thread has a separate logger
Monitor.py
* Spins up Main_Script.py
* Monitors each of the threads in Main_Script.py (and may obtain other messages from Main_Script.py in the future)
So far, I have tried subprocess and Process from multiprocessing.
subprocess lets me start Main_Script.py and forward the stdout back to the monitor, but I see the logging from the threads coming through the same STDOUT. I am using the logging library to log the data from each thread to separate files.
I tried Process from multiprocessing. I had to call the main function of Main_Script.py as a process and pass a PIPE argument to it from Monitor.py, but now I can't see Main_Script.py as a separate process when I run the top command.

Normally, you want to change the child process to work like a typical Unix userland tool: the logging and other side-band information goes to stderr (or to a file, or syslog, etc.), and only the actual output goes to stdout.
Then, the problem is easy: just capture stdout to a PIPE that you process, and either capture stderr to a different PIPE, or pass it through to real stderr.
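For example, a minimal sketch of that split, assuming the child is Main_Script.py from the question and that it prints only its status lines to stdout while its loggers write to stderr or to their own files:

import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, 'Main_Script.py'],
    stdout=subprocess.PIPE,   # only the status messages meant for the monitor
    stderr=None,              # pass the child's logging straight through to the real stderr
    universal_newlines=True)

for line in proc.stdout:      # read status messages as the child emits them
    print('status:', line.rstrip())

proc.wait()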
If that's not appropriate for some reason, you need to come up with some other mechanism for IPC: Unix or Windows named pipes, anonymous pipes that you pass by leaking the file descriptor across the fork/exec and then pass the fd as an argument, Unix-domain sockets, TCP or UDP localhost sockets, a higher-level protocol like a web service on top of TCP sockets, mmapped files, anonymous mmaps or pipes that you pass between processes via a Unix-domain socket or Windows API calls, …
As you can see, there are a huge number of options. Without knowing anything about your problem other than that you want "custom messages", it's impossible to tell you which one you want.
While we're at it: If you can rewrite your code around multiprocessing rather than subprocess, there are nice high-level abstractions built in to that module. For example, you can use a Queue that automatically manages synchronization and blocking, and also manages pickling/unpickling so you can just pass any (picklable) object rather than having to worry about serializing to text and parsing the text. Or you can create shared memory holding arrays of int32 objects, or NumPy arrays, or arbitrary structures that you define with ctypes. And so on. Of course you could build the same abstractions yourself, without needing to use multiprocessing, but it's a lot easier when they're there out of the box.
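A minimal sketch of the Queue idea, assuming the worker can be started with multiprocessing instead of subprocess (the message contents are made up):

import multiprocessing

def worker(status_queue):
    # ... the real threads would do their work here ...
    status_queue.put({'thread': 'downloader', 'state': 'running'})
    status_queue.put(None)                      # sentinel: no more messages

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    for msg in iter(q.get, None):               # any picklable object comes through
        print('status from child:', msg)
    p.join()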
Finally, while your question is tagged ipc and pipe, and titled "Interprocess Communication", your description refers to threads, not processes. If you actually are using a bunch of threads in a single process, you don't need any of this.
You can just stick your results on a queue.Queue, or store them in a list or deque with a Lock around it, or pass in a callback to be called with each new result, or use a higher-level abstraction like concurrent.futures.ThreadPoolExecutor and return a Future object or an iterator of Futures, etc.
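For completeness, a minimal in-process sketch of the queue.Queue variant, with made-up worker names; no IPC is involved at all:

import queue
import threading

results = queue.Queue()

def worker(name):
    # ... do the actual work, then report back ...
    results.put((name, 'finished'))

threads = [threading.Thread(target=worker, args=('worker-%d' % i,)) for i in range(3)]
for t in threads:
    t.start()
for _ in threads:
    name, state = results.get()                 # blocks until a result arrives
    print(name, state)
for t in threads:
    t.join()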

Related

Route messages from many processes to a single process without passing python object queue

High level goal:
I'm writing a python code base that will have many processes and threads running, some of which are child processes of each other and others which are not (e.g. independently started in another terminal). Each of these processes needs to write "log" messages eventually to a database. However, I'd prefer for these writes to be non-blocking for the time-sensitive processes, so I want to pass log messages to a log "server" from each "client" process, and then the server can do the blocking writes to the database. I want there to be exactly one database writing server process active at any time, which for now I assume I will initialize manually.
I can envision a few ways to pass information on to the server process.
I could create a python multiprocessing shared queue or pipe and pass this object every time a new process is initialized. This is not preferable because it means that every arbitrary process function must be written with an additional log queue argument, and furthermore the processes would have to be structured such that they all descend from a single ancestor process.
I could use a static address and port, stored as operating system environment variables, on which the server process listens. Each client process would send log messages to this address. To be non-blocking or low-latency, these message sends would likely have to be UDP, meaning delivery would not be guaranteed.
Question:
Is there a middle ground that allows for the creation of a C-style queue or pipe that can be referenced by any python process (e.g. at some static file location) without needing to explicitly pass it as a python object input to that process?
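For what it's worth, here is a minimal sketch of the second option described above: each client reads a static address from environment variables and fires log messages at it over UDP (best effort, no delivery guarantee). The variable names and message format are assumptions, not an established convention.

import json
import os
import socket

# Hypothetical environment variables holding the log server's static address.
LOG_ADDR = (os.environ.get('LOG_SERVER_HOST', '127.0.0.1'),
            int(os.environ.get('LOG_SERVER_PORT', '9020')))

_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_log(message, **fields):
    """Fire-and-forget send; UDP gives no delivery guarantee."""
    payload = json.dumps(dict(fields, msg=message)).encode('utf-8')
    _sock.sendto(payload, LOG_ADDR)

send_log('task finished', pid=os.getpid())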

Sharing multiple pipes with subprocess for Windows and Unix

I currently have a worker subprocess that does a lot of processing for my main application. I already have stdin and stdout connected between the two, but now I need more than just these two channels between the main application and the subprocess worker, since the worker is heavily threaded and should be able to run multiple different workloads at the same time.
So for each workload, I want to dynamically create a separate pipe between the main and subprocess. I don't want to run more than 1 subprocess worker in the background, and want everything to run on a single subprocess.
The problem I ran into is creating named pipes between the main process and the subprocess that will work on both Unix and Windows. Unix has os.mkfifo(), which can be used with tempfiles, but that does not work on Windows. os.pipe() will not work, because the memory block being allocated is only for the main application, and I have no way to link it with the subprocess.
So basically,
tmp_read, tmp_write = os.pipe()
Both tmp_read and tmp_write are represented as integers, or memory blocks on the main app's stack. I can't send these integers to the subprocess and have it connect, as the subprocess has no idea what they mean. Am I missing something, or is it not possible to share an undefined number of pipes between the processes using IPC? I can't use sockets for IPC either, since the computers this has to run on are heavily restricted and I don't want to deal with blocked ports.
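For reference, on Unix the integers returned by os.pipe() can be kept open across the exec with subprocess's pass_fds and handed to the child as an argument. A minimal Unix-only sketch (it does not solve the Windows half of the question; worker.py is a hypothetical child script):

import os
import subprocess
import sys

r, w = os.pipe()
os.set_inheritable(r, True)        # fds are non-inheritable by default since Python 3.4
child = subprocess.Popen([sys.executable, 'worker.py', str(r)],
                         pass_fds=(r,))   # keep the read end open in the child
os.close(r)                        # the parent only writes
os.write(w, b'hello worker\n')
os.close(w)
child.wait()

# worker.py would do roughly:
#   import os, sys
#   fd = int(sys.argv[1])
#   print(os.read(fd, 1024))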

Interleaving stdout from Popen with messages from ZMQ recv

Is there a best-practices approach to poll for the stdout/stderr from a subprocess.Popen as well as a zmq socket?
In my case, I have my main program spawning a Popen subprocess. The subprocess publishes messages via zmq which I then want to subscribe to in my main program.
Waiting on multiple zmq sockets is not complicated with the zmq.Poller but when I want to interleave this with the output from my subprocess itself, I am unsure how to do it in the best way without risking waits or having needless loops.
In the end, I would like to use it like so:
process = Popen([prog, '--publish-to', 'tcp://127.0.0.1:89890'],
                stdout=subprocess.PIPE, stderr=subprocess.PIPE, ...)

for (origin, data) in interleave(process, 'tcp://127.0.0.1:89890'):
    if origin == 'STDOUT': pass
    if origin == 'STDERR': pass
    if origin == 'ZMQ': pass
prog --publish-to tcp://127.0.0.1:89890 will then open a zmq.PUB socket and publish data, whereas the interleave function will subscribe to this and also poll for stdout and stderr, yielding whatever data reaches it first.
I know how to define interleave with multiple daemon threads and queues, but I don't know whether this approach has caveats with regard to lazy reading (i.e. stdout might not be processed until the end of the program?) or other things I have not yet thought about (it also seems to be quite a bit of overhead for such a task).
I will be thankful for all ideas or insights.
I aim for at least Python 3.3/3.4 but if this turns out to be much easier with the new async/await tools, I could also use Python 3.5 for the code.
Use zmq.Poller: http://pyzmq.readthedocs.io/en/latest/api/zmq.html#polling. You can register zmq sockets and native file descriptors (e.g. process.stdout.fileno() and process.stderr.fileno()) there, and it will wait until input is available on at least one of the registered sources.
I don't know whether it works on Windows; you should try.
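A minimal sketch of that approach (the program path is a placeholder, and registering raw file descriptors like this is POSIX-only):

import subprocess
import zmq

prog = './prog'                                 # placeholder for the real publisher
address = 'tcp://127.0.0.1:89890'

ctx = zmq.Context.instance()
sub = ctx.socket(zmq.SUB)
sub.connect(address)
sub.setsockopt(zmq.SUBSCRIBE, b'')              # subscribe to everything

process = subprocess.Popen([prog, '--publish-to', address],
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)

poller = zmq.Poller()
poller.register(sub, zmq.POLLIN)
poller.register(process.stdout, zmq.POLLIN)     # registered by file descriptor
poller.register(process.stderr, zmq.POLLIN)

while process.poll() is None:
    events = dict(poller.poll(timeout=1000))    # milliseconds
    if sub in events:
        print('ZMQ', sub.recv())
    if process.stdout.fileno() in events:
        print('STDOUT', process.stdout.readline())   # assumes line-oriented output
    if process.stderr.fileno() in events:
        print('STDERR', process.stderr.readline())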

Prevent a second process from listening to the same pipe in Python

I have a process that connects to a pipe with Python 2.7's multiprocessing.Listener() and waits for a message with recv(). I run it variously on Windows 7 and Ubuntu 11.
On Windows, the pipe is called \\.\pipe\some_unique_id. On Ubuntu, the pipe is called /temp/some_unique_id. Other than that, the code is the same.
All works well, until, in an unrelated bug, monit starts a SECOND copy of the same program. It tries to listen to the exact same pipe.
I had naively* expected that the second connection attempt would fail, leaving the first connection unscathed.
Instead, I find that the behaviour is officially undefined:
Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time.
On Ubuntu, the earlier copies seem to be ignored, and are left without any messages, while the latest version wins.
On Windows, there is some more complex behaviour. Sometimes the original pipe raises an EOFError exception on the recv() call. Sometimes, both listeners are allowed to co-exist and each message is distributed arbitrarily.
Is there a way to open a pipe exclusively, so the second process cannot open the pipe while the first process hasn't closed it or exited?
* I could have sworn I manually tested this exact scenario, but clearly I did not.
Other SO questions I looked at:
several TCP-servers on the same port - I don't (knowingly) set SO_REUSEADDR
Can two applications listen to the same port?
accept() with sockets shared between multiple processes (based on Apache preforking) - there's no forking involved.
Named pipes have the same access semantics as regular files. Any process with read or write permission can open the pipe for reading or writing.
If you had a way to guarantee that the two instances of the Python script were invoked by processes with differing UID's or GID's, then you can implement unique access control using file permissions.
If both instances of the script have the same UID and GID, you can try file locking implemented in Skip Montanaro's FileLock hosted on github. YMMV.
A simpler way to implement this might be to create a lock file in /var/lock that contains the PID of the process creating the lock file and then check for the existence of the lock file before opening the pipe. This scheme is used by most long-running daemons but has problems when the processes that create the lock files terminate in situations that prevent them from removing the lock file.
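A rough sketch of that lock-file idea (the path and error handling are illustrative, and it inherits the stale-lock problem just mentioned):

import errno
import os
import sys

LOCKFILE = '/var/lock/my_listener.lock'         # illustrative path

def acquire_lock():
    try:
        # O_EXCL makes the create fail if the file already exists
        fd = os.open(LOCKFILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as e:
        if e.errno == errno.EEXIST:
            sys.exit('another instance appears to be running')
        raise
    with os.fdopen(fd, 'w') as f:
        f.write(str(os.getpid()))

def release_lock():
    os.unlink(LOCKFILE)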
You could also try a Python System V semaphore to prevent simultaneous access.

Python: when to use pty.fork() versus os.fork()

I'm uncertain whether to use pty.fork() or os.fork() when spawning external background processes from my app. (Such as chess engines)
I want the spawned processes to die if the parent is killed, as with spawning apps in a terminal.
What are the ups and downs between the two forks?
The child process created with os.fork() inherits stdin/stdout/stderr from the parent process, while the child created with pty.fork() is connected to a new pseudo-terminal. You need the latter when you write a program like xterm: pty.fork() in the parent process returns a descriptor for the controlling terminal of the child process, so you can visually represent data from it and translate user actions into terminal input sequences.
Update:
From pty(7) man page:
A process that expects to be connected to a terminal, can open the slave end of a pseudo-terminal and then be driven by a program that has opened the master end. Anything that is written on the master end is provided to the process on the slave end as though it was input typed on a terminal. For example, writing the interrupt character (usually control-C) to the master device would cause an interrupt signal (SIGINT) to be generated for the foreground process group that is connected to the slave. Conversely, anything that is written to the slave end of the pseudo-terminal can be read by the process that is connected to the master end.
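A small Unix-only sketch of what that looks like with pty.fork(); the child command is just an illustration:

import os
import pty

pid, master_fd = pty.fork()
if pid == 0:
    # Child: stdin/stdout/stderr are now the slave side of a new pty
    os.execvp('cat', ['cat'])
else:
    # Parent: bytes written to the master look like typed terminal input
    os.write(master_fd, b'hello\r')
    print(os.read(master_fd, 1024))             # the echo plus cat's output
    os.close(master_fd)                         # hanging up ends the child
    os.waitpid(pid, 0)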
In the past I've always used the subprocess module for this. It provides a good api for communicating with subprocesses.
You can use call(*popenargs, **kwargs) for blocking execution of them, and I believe using the Popen class can handle async execution.
Check out the docs for more info.
As far as os.fork vs pty.fork goes, both are highly platform dependent, and neither works (or at least is tested) on Windows. Judging by the docs, the pty module seems to be the more constrained of the two, the main difference being the pseudo-terminal aspect. So if you aren't willing to architect your code in such a way as to be able to use the subprocess module, I'd probably go with os.fork instead of pty.fork.
Pseudoterminals are necessary for some applications that really expect a terminal. An interactive shell is one example, but there are many others. The pty.fork option is not there as an alternative to os.fork but as a specific API for using a pseudoterminal.
