Interprocess communication with a modified python interpreter

Interprocess communication with a modified python interpreter - python

TL;DR: How can I spawn a different python interpreter (from within python) and create a communication channel between the parent and child when stdin/stdout are unavailable?
I would like my python script to execute a modified python interpreter and through some kind of IPC such as multiprocessing.Pipe communicate with the script that interpreter runs.
Lets say I've got something similar to the following:
subprocess.Popen(args=["/my_modified_python_interpreter.exe",
"--my_additional_flag",
"my_python_script.py"])
Which works fine and well, executes my python script and all.
I would now like to set up some kind of interprocess communication with that modified python interpreter.
Ideally, I would like to share something similar to one of the returned values from multiprocessing.Pipe(), however I will need to share that object with the modified python process (and I suspect multiprocessing.Pipe won't handle that well even if I do that).
Although sending text and binary will be sufficient (I don't need to share python objects or anything), I do need this to be functional on all major OSes (windows, Linux, Mac).
Some more use-case/business explanation
More specifically, the modified interpreter is the IDAPython interpreter that is shipped with IDA to allow scripting within the IDA tool.
Unfortunately, since stdio is already heavily used for the existing user interface functionalities (provided by IDA), I cannot use stdin/stdout for the communication.
I'm searching for possibilities that are better than the one's I thought of:
Use two (rx and tx channels) hard-disk files and pass paths to both as the arguments.
Use a local socket and pass a path as an argument.
Use a memory mapped file and the tagname on windows and some other sync method on other OSes.

After some tinkering with the multiprocessing.Pipe function and the multiprocesing.Connection objects it returns, I realized that serialization of Connection objects is far simpler that I originally thought.
A Connection object has three descripting properties:
fileno - A handle. An arbitrary file descriptor on Unix and a socket on windows.
readable - A boolean controlling whether Connection object can be read.
writable - A boolean controlling whether Connection object can be written.
All three properties are accessible as object attributes and are controllable through the Connection class constructor.
It appears that if:
The process calling Pipe spawns a child process and shares the connection.fileno() number.
The child process creates a Connection object using that file descriptor as the handle.
Both interpreters implement the Connection object roughly the same (And this is the risky part, I guess).
It is possible to Connection.send and Connection.recv between those two processes although they do not share the same interpreter build and the multiprocessing module was not actually used to instantiate the child process.
EDIT:
Please note the Connection class is available as multiprocessing.connection.Connection in python3 and as _multiprocessing.Connection in python2 (which might suggest it's usage is discouraged. YMMV)

Going with the other answer of mine turned out to be a mistake. Because of how handles are inherited in python2 on Windows I couldn't get the same solution to work on Windows machines. I ended up using the far superior Listener and Client interfaces also found in the multiprocessing module.
This question of mine discusses that mistake.

Related

python inter-process mutex for arbitrary processes

I need to mutex several processes running python on a linux host.
They processes are not spawned in a way I control (to be clear, they are my code), so i cannot use multithreading.Lock, at least as I understand it. The resource being synchronized is a series of reads/writes to two separate internal services, which are old, stateful, not designed for concurrent/transactional access, and out of scope to modify.
a couple approaches I'm familiar with but rejected so far:
In native code using shmget / pthread_mutex_lock (eg create a pthread mutex by well-known string name, in shared memory provided by the OS). Im hoping to not have to use/add a ctypes wrapper for this (or ideally have any low-level constructs visible at all here for this high-level app).
Using one of the lock file libraries such as fasteners would work - but requiring any particular file system access is awkward (the library/approach could use it robustly under the hood, but ideally my client code is abstracted from that).
Is there a preferred way to accomplish this in python (under linux; bonus points for cross-platform)?

Options for synchronizing non-child processes:
Use a remote manager. I'm not super familiar with this process, but the docs has at least a simple example.
create a simple server with your own protocol (rather than a manager): something like a socket server on the loopback address for bouncing simple messages around.
use the filesystem: https://pypi.org/project/filelock/
On posix compliant systems, there's a rather straightforward wrapper for IPC constructs posix-ipc. I also found a wrapper for windows semaphores, but it's not quite as simple (though also not difficult per-say). In both cases your program would use a well known string "name" to access / create the mutex. In both cases, care / error checking is needed to handle creation of the mutex properly (see things like O_CREX flag...)

How to catch runtime errors from native code in python?

I have the following problem, Lets have this python function
def func():
run some code here which calls some native code
Inside func() I am calling some functions which in turn calls some native C code.
If any crash happens the whole python process crashes alltoghether.
How is possible to catch and recover from such errors?
One way that came to my mind is run this function in a separate process, but not just starting another process because there is a lot of memory and objects used by the function, will be very hard to split that. Is there something like fork() in C available in python, to create a copy of the same exact process with same memory structures and etc?
Or maybe other ideas?
Update:
It seems that there is no real way of catching the C runtime errors in python, those are at a lower level and crashes the whole Python virtual machine.
As solutions you currently have two options:
Use os.fork() but work only in unix like OS env.
Use multiprocessing and a shared memory model to share big objects between processes. Usual serialization will just not work with objects that have multi-gigabytes in memory (you will just run out of memory). However there is a very good python library called Ray (https://docs.ray.io/en/master/) that performs in-memory big objects serialization using shared memory model and it's ideal for BigData/ML workloads - highly recommended.

As long as you are running on an operating system that supports fork that's already how the multiprocessing module creates subprocesses. You could os.fork, multiprocessing.Process or multiprocessing.Pool to get what you want. You can also use the os.fork() call on these systems.

Why is Python's subprocess' popen so different between unix and windows?

I am trying to write cross-platform code in Python. The code should be spawning new shells and run code.
This lead me to look at Python's subprocess tool and in particular its Popen part. So I read through the documentation for this class Popen doc and find too many "if on Unix/if on Windows" statements. Not very cross-platform, unless I have misunderstood the doc.
What is going on? I understand that the two operating systems are different, but really, there is no way to write a common interface? I mean, the same arguments "windows is different than unix" can be applied to os, system, etc., and they all seem 100 % cross-platform.

The problem is that process management is something deeply engrained in the operating system and differs greatly not only in the implementation but often even in the basic functionality.
It's actually often rather easy to abstract code in for example the os class. Both C libraries, be it *nix or Windows, implement reading files as an I/O stream, so you can even write rather low level file operation functions which work the same in Windows and *nix.
But processes differ greatly. In *nix for example processes are all hierarchical, every process has a parent and all processes go back to the init system running under PID 1. A new process gets created by forking itself, checking if it's the parent or the child and then continuing accordingly.
In Windows processes are strictly non-hierarchical and get created by the CreateProcess () system call, for which you need special privileges.
There a good deal more differences, these were just two examples, but I hope it shows that implementing a platform independent process library is a very daunting task.

Python os.pipe vs multiprocessing.Pipe

Recently I'm studying parallel programming tools in Python. And here are two major differences between os.pipe and multiprocessing.Pipe.(despite the occasion they are used)
os.pipe is unidirectional, multiprocessing.Pipe is bidirectional;
When putting things into pipe/receive things from pipe, os.pipe uses encode/decode, while multiprocessing.Pipe uses pickle/unpickle
I want to know if my understanding is correct, and is there other difference? Thank you.

I believe everything you've stated is correct.
On Linux, os.pipe is just a Python interface for accessing traditional POSIX pipes. On Windows, it's implemented using CreatePipe. When you call it, you get two ordinary file descriptors back. It's unidirectional, and you just write bytes to it on one end that get buffered by the kernel until someone reads from the other side. It's fairly low-level, at least by Python standards.
multiprocessing.Pipe objects are much more high level interface, implemented using multiprocessing.Connection objects. On Linux, these are actually built on top of POSIX sockets, rather than POSIX pipes. On Windows, they're built using the CreateNamedPipe API. As you noted, multiprocessing.Connection objects can send/receive any picklable object, and will automatically handle the pickling/unpickling process, rather than just dealing with bytes. They're capable of being both bidirectional and unidirectional.

How to write a system agnostic Python daemon/service? [duplicate]

I would like to have my Python program run in the background as a daemon, on either Windows or Unix. I see that the python-daemon package is for Unix only; is there an alternative for cross platform? If possible, I would like to keep the code as simple as I can.

In Windows it's called a "service" and you could implement it pretty easily e.g. with the win32serviceutil module, part of pywin32. Unfortunately the two "mental models" -- service vs daemon -- are very different in detail, even though they serve similar purposes, and I know of no Python facade that tries to unify them into a single framework.

This question is 6 years old, but I had the same problem, and the existing answers weren't cross-platform enough for my use case. Though Windows services are often used in similar ways as Unix daemons, at the end of the day they differ substantially, and "the devil's in the details". Long story short, I set out to try and find something that allows me to run the exact same application code on both Unix and Windows, while fulfilling the expectations for a well-behaved Unix daemon (which is better explained elsewhere) as best as possible on both platforms:
Close open file descriptors (typically all of them, but some applications may need to protect some descriptors from closure)
Change the working directory for the process to a suitable location to prevent "Directory Busy" errors
Change the file access creation mask (os.umask in the Python world)
Move the application into the background and make it dissociate itself from the initiating process
Completely divorce from the terminal, including redirecting STDIN, STDOUT, and STDERR to different streams (often DEVNULL), and prevent reacquisition of a controlling terminal
Handle signals, in particular, SIGTERM.
The fundamental problem with cross-platform daemonization is that Windows, as an operating system, really doesn't support the notion of a daemon: applications that start from a terminal (or in any other interactive context, including launching from Explorer, etc) will continue to run with a visible window, unless the controlling application (in this example, Python) has included a windowless GUI. Furthermore, Windows signal handling is woefully inadequate, and attempts to send signals to an independent Python process (as opposed to a subprocess, which would not survive terminal closure) will almost always result in the immediate exit of that Python process without any cleanup (no finally:, no atexit, no __del__, etc).
Windows services (though a viable alternative in many cases) were basically out of the question for me: they aren't cross-platform, and they're going to require code modification. pythonw.exe (a windowless version of Python that ships with all recent Windows Python binaries) is closer, but it still doesn't quite make the cut: in particular, it fails to improve the situation for signal handling, and you still cannot easily launch a pythonw.exe application from the terminal and interact with it during startup (for example, to deliver dynamic startup arguments to your script, say, perhaps, a password, file path, etc), before "daemonizing".
In the end, I settled on using subprocess.Popen with the creationflags=subprocess.CREATE_NEW_PROCESS_GROUP keyword to create an independent, windowless process:
import subprocess
independent_process = subprocess.Popen(
'/path/to/pythonw.exe /path/to/file.py',
creationflags=subprocess.CREATE_NEW_PROCESS_GROUP
)
However, that still left me with the added challenge of startup communications and signal handling. Without going into a ton of detail, for the former, my strategy was:
pickle the important parts of the launching process' namespace
Store that in a tempfile
Add the path to that file in the daughter process' environment before launching
Extract and return the namespace from the "daemonization" function
For signal handling I had to get a bit more creative. Within the "daemonized" process:
Ignore signals in the daemon process, since, as mentioned, they all terminate the process immediately and without cleanup
Create a new thread to manage signal handling
That thread launches daughter signal-handling processes and waits for them to complete
External applications send signals to the daughter signal-handling process, causing it to terminate and complete
Those processes then use the signal number as their return code
The signal handling thread reads the return code, and then calls either a user-defined signal handler, or uses a cytpes API to raise an appropriate exception within the Python main thread
Rinse and repeat for new signals
That all being said, for anyone encountering this problem in the future, I've rolled a library called daemoniker that wraps both proper Unix daemonization and the above Windows strategy into a unified facade. The cross-platform API looks like this:
from daemoniker import Daemonizer
with Daemonizer() as (is_setup, daemonizer):
if is_setup:
# This code is run before daemonization.
do_things_here()
# We need to explicitly pass resources to the daemon; other variables
# may not be correct
is_parent, my_arg1, my_arg2 = daemonizer(
path_to_pid_file,
my_arg1,
my_arg2
)
if is_parent:
# Run code in the parent after daemonization
parent_only_code()
# We are now daemonized, and the parent just exited.
code_continues_here()

Two options come to mind:
Port your program into a windows service. You can probably share much of your code between the two implementations.
Does your program really use any daemon functionality? If not, you rewrite it as a simple server that runs in the background, manages communications through sockets, and perform its tasks. It will probably consume more system resources than a daemon would, but it would be quote platform independent.

In general the concept of a daemon is Unix specific, in particular expected behaviour with respect to file creation masks, process hierarchy, and signal handling.
You may find PEP 3143 useful wherein a proposed continuation of python-daemon is considered for Python 3.2, and many related daemonizing modules and implementations are discussed.

The reason it's unix only is that daemons are a Unix specific concept i.e a background process initiated by the os and usually running as a child of the root PID .
Windows has no direct equivalent of a unix daemon, the closest I can think of is a Windows Service.
There's a program called pythonservice.exe for windows . Not sure if it's supported on all versions of python though

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.