Python: when to use pty.fork() versus os.fork() - python

I'm uncertain whether to use pty.fork() or os.fork() when spawning external background processes from my app. (Such as chess engines)
I want the spawned processes to die if the parent is killed, as with spawning apps in a terminal.
What are the ups and downs between the two forks?

The child process created with os.fork() inherits stdin/stdout/stderr from parent process, while the child created with pty.fork() is connected to new pseudo terminal. You need the later when you write a program like xterm: pty.fork() in parent process returns a descriptor to control terminal of child process, so you can visually represent data from it and translate user actions into terminal input sequences.
Update:
From pty(7) man page:
A process that expects to be connected
to a terminal, can open the slave end
of a pseudo-terminal and then be
driven by a program that has
opened the master end. Anything that
is written on the master end is
provided to the process on the slave
end as though it was input typed on
a terminal. For example, writing the
interrupt character (usually
control-C) to the master device
would cause an interrupt signal
(SIGINT) to be generated for the
foreground process group that is
connected to the slave. Conversely,
anything that is written to the
slave end of the pseudo-terminal can
be read by the process that is
connected to the master end.

In the past I've always used the subprocess module for this. It provides a good api for communicating with subprocesses.
You can use call(*popenargs, **kwargs) for blocking execution of them, and I believe using the Popen class can handle async execution.
Check out the docs for more info.
As far as using os.fork vs pty.fork, both are highly platform dependent, and neither will work (or at least is tested) with windows. The pty module seems to be the more constrained of the two by reading the docs. The main difference being the pseudo terminal aspect. So if you aren't willing to architect your code in such a way as to be able to use the subprocess module, I'd probably go with os.fork instead of pty.fork.

Pseudotermials are necessary for some applications that really expect a terminal. An interactive shell is one of these examples but there are many other. The pty.fork option is not there as another os.fork but as a specific API to use a pseudoterminal.

Related

What is the difference between subprocess.Popen() and os.fork()?

It seems like subprocess.Popen() and os.fork() both are able to create a child process. I would however like to know what the difference is between both. When would you use which one? I tried looking at their source code but I couldn't find fork()'s source code on my machine and it wasn't totally clear how Popen works on Unix machines.
Could someobody please elaborate?
Thanks
subprocess.Popen let's you execute an arbitrary program/command/executable/whatever in its own process.
os.fork only allows you to create a child process that will execute the same script from the exact line in which you called it. As its name suggests, it "simply" forks the current process into 2.
os.fork is only available on Unix, and subprocess.Popen is cross-platfrom.
So I read the documentation for you. Results:
os.fork only exists on Unix. It creates a child process (by cloning the existing process), but that's all it does. When it returns, you have two (mostly) identical processes, both running the same code, both returning from os.fork (but the new process gets 0 from os.fork while the parent process gets the PID of the child process).
subprocess.Popen is more portable (in particular, it works on Windows). It creates a child process, but you must specify another program that the child process should execute. On Unix, it is implemented by calling os.fork (to clone the parent process), then os.execvp (to load the program into the new child process). Because Popen is all about executing a program, it lets you customize the initial environment of the program. You can redirect its standard handles, specify command line arguments, override environment variables, set its working directory, etc. None of this applies to os.fork.
In general, subprocess.Popen is more convenient to use. If you use os.fork, there's a lot you need to handle manually, and it'll only work on Unix systems. On the other hand, if you actually want to clone a process and not execute a new program, os.fork is the way to go.
Subprocess.popen() spawns a new OS level process.
os.fork() creates another process which will resume at exactly the same place as this one. So within the first loop run, you get a fork after which you have two processes, the "original one" (which gets a pid value of the PID of the child process) and the forked one (which gets a pid value of 0).

Spawn a subprocess but kill it if main process gets killed

I am creating a program in Python that listens to varios user interactions and logs them. I have these requirements/restrictions:
I need a separate process that sends those logs to a remote database every hour
I can't do it in the current process because it blocks the UI.
If the main process stops, the background process should also stop.
I've been reading about subprocess but I can't seem to find anything on how to stop both simultaneously. I need the equivalent of spawn_link if anybody know some Erlang/Elixir.
Thanks!
To answer the question in the title (for visitors from google): there are robust solutions on Linux, Windows using OS-specific APIs and less robust but more portable psutil-based solutions.
To fix your specific problem (it is XY problem): use a daemon thread instead of a process.
A thread would allow to perform I/O without blocking GUI, code example (even if GUI you've chosen doesn't provide async. I/O API such as tkinter's createfilehandler() or gtk's io_add_watch()).

Correct daemon behaviour (from PEP 3143) explained

I have some tasks [for my RPi] in Python that involve a lot of sleeping: do something that takes a second or two or three, then go wait for several minutes or hours.
I want to pass control back to the OS (Linux) in that sleep time. For this, I should daemonise those tasks. One way is by using Python's Standard daemon process library.
But daemons aren't so easy to understand. As per the Rationale paragraph of PEP 3143, a well behaved daemon should do the following.
Close all open file descriptors.
Change current working directory.
Reset the file access creation mask.
Run in the background.
Disassociate from process group.
Ignore terminal I/O signals.
Disassociate from control terminal.
Don't reacquire a control terminal.
Correctly handle the following circumstances:
Started by System V init process.
Daemon termination by SIGTERM signal.
Children generate SIGCLD signal.
For a Linux/Unix novice like me, some of this is hardly an explanation. But I want to know why I do what I do. So what is the rationale behind this rationale?
PEP 3142 took these requirements from Unix Network Programming ('UNP') by the late W. Richard Stevens. The explanation below is quoted or summarised from that book. It's not so easily found online, and it may be illegal to download. So I borrowed it from the library. Pages referred to are in the second edition, Volume 1 (1998). (The PEP refers to the first edition, 1990.)
Close all open file descriptors.
"We close any open descriptors inherited from the process that executed the daemon (ie the shell). [..] Some daemons open /dev/null for reading and writing and duplicate the descriptor to standard input, standard output and standard error."
(This 'Howdy World' Python daemon demonstrates this.)
"This guarantees that the common descriptors are open, and a read from any of these descriptors returns 0 (End Of File) and the kernel just discards anything written to any of these three descriptors. The reason for opening these descriptors is so that any library function called by the daemon that assumes it can read from standard input or write to standard output or standard error, will not fail. Alternately, some daemons open a log file that they will write to while running and duplicate its descriptor to standard output and standard error". (UNP p. 337)
Change current working directory
"A printer daemon might change to the printer's spool directory, where it does all its work. [...] The daemon could have been started anywhere in the filesystem, and if it remains there, that filesystem cannot be unmounted." (UNP p 337)
Why would you want to unmount a filesystem? Two reasons:
1. You want to separate (and be able mount and unmount) directories that can fill up with user data from directories dedicated to the OS.
2. If you start a daemon from, say, a USB-stick, you want to be able to unmount that stick without interfering with the daemon.
Reset the file access creation mask.
"So that if the daemon creates its own files, permission bits in the inherited file mode creation mask do not affect the permission bits of the new files." (UNP, p 337)
Run in the background.
By definition,
"a daemon is a process that runs in the background and is independent of control from all terminals". (UNP p 331)
Disassociate from process group.
In order to understand this, you need to understand what a process group is, and that means you need to know what fork does.
What fork does
fork is the only way (in Unix) to create a new process. (in Linux, there is also clone). Key in understanding fork is that it returns twice when called (once): once in the calling process (= parent) with the process ID of the newly created process (= child), and once in the child. "All descriptors known by the parent when forking, are shared with the child when fork returns." (UNP p 102).
When a process wants to execute another program, it creates a new process by calling fork, which creates a copy of itself. Then one of them (usually the child) calls the new program. (UNP, p 102)
Why disassociate from process group
The point is that a session leader may acquire a controlling terminal. A daemon should never do this, it must stay in the background. This is achieved by calling fork twice: the parent forks to create a child, the child forks to create a grandchild. Parent and child are terminated, but grandchild remains. But because it's a grandchild, it's not a session leader, and therefor can't acquire a controlling terminal. (Summarised from UNP par 12.4 p 335)
The double fork is discussed in more detail here, and in the comments below.
Ignore terminal I/O signals.
"Signals generated from terminal keys must not affect any daemons started from that terminal earlier". (UNP p. 331)
Disassociate from control terminal and don't reacquire a control terminal.
By now, the reasons are obvious:
"If the daemon is started from a terminal, we want to be able to use that terminal for other tasks at a later time. For example, if we start the daemon from a terminal, log off the terminal, and someone else logs in on that terminal, we do not want any daemon error messages appearing during the next user's terminal session." (UNP p 331)
Correctly handle the following circumstances:
Started by System V init process
A daemon should be launchable at boot time, obviously.
Daemon termination by SIGTERM signal
SIGTERM means Signal Terminate. At shutdown, the init process normally sends SIGTERM to all processess, waits usually 5 to 20 seconds to give them time to clean up and terminate. (UNP, p 135) Also, a child can send SIGTERM to its parent, when its parent should stop doing what it's doing. (UNP p 408)
Children generate SIGCLD signal
Stevens discusses SIGCHLD, not SIGCLD. The difference between them isn't important for understanding daemon behaviour. If a child terminates, it sends SIGCHLD to it's parent. If a parent doesn't catch it, the child becomes a zombie (UNP p 118). Oh what fun.
On a final note, when I started to find answers to my question in UNP, it soon struck me I really should read more of it. It's 900+ (!) pages, from 1998 (!) but I believe the concepts and the explanations in UNP stand the test of time, gloriously. Stevens not only knew very well what he was talking about, he also understood what was difficult about it, and made it easier to understand. That's really rare.

How to write a system agnostic Python daemon/service? [duplicate]

I would like to have my Python program run in the background as a daemon, on either Windows or Unix. I see that the python-daemon package is for Unix only; is there an alternative for cross platform? If possible, I would like to keep the code as simple as I can.
In Windows it's called a "service" and you could implement it pretty easily e.g. with the win32serviceutil module, part of pywin32. Unfortunately the two "mental models" -- service vs daemon -- are very different in detail, even though they serve similar purposes, and I know of no Python facade that tries to unify them into a single framework.
This question is 6 years old, but I had the same problem, and the existing answers weren't cross-platform enough for my use case. Though Windows services are often used in similar ways as Unix daemons, at the end of the day they differ substantially, and "the devil's in the details". Long story short, I set out to try and find something that allows me to run the exact same application code on both Unix and Windows, while fulfilling the expectations for a well-behaved Unix daemon (which is better explained elsewhere) as best as possible on both platforms:
Close open file descriptors (typically all of them, but some applications may need to protect some descriptors from closure)
Change the working directory for the process to a suitable location to prevent "Directory Busy" errors
Change the file access creation mask (os.umask in the Python world)
Move the application into the background and make it dissociate itself from the initiating process
Completely divorce from the terminal, including redirecting STDIN, STDOUT, and STDERR to different streams (often DEVNULL), and prevent reacquisition of a controlling terminal
Handle signals, in particular, SIGTERM.
The fundamental problem with cross-platform daemonization is that Windows, as an operating system, really doesn't support the notion of a daemon: applications that start from a terminal (or in any other interactive context, including launching from Explorer, etc) will continue to run with a visible window, unless the controlling application (in this example, Python) has included a windowless GUI. Furthermore, Windows signal handling is woefully inadequate, and attempts to send signals to an independent Python process (as opposed to a subprocess, which would not survive terminal closure) will almost always result in the immediate exit of that Python process without any cleanup (no finally:, no atexit, no __del__, etc).
Windows services (though a viable alternative in many cases) were basically out of the question for me: they aren't cross-platform, and they're going to require code modification. pythonw.exe (a windowless version of Python that ships with all recent Windows Python binaries) is closer, but it still doesn't quite make the cut: in particular, it fails to improve the situation for signal handling, and you still cannot easily launch a pythonw.exe application from the terminal and interact with it during startup (for example, to deliver dynamic startup arguments to your script, say, perhaps, a password, file path, etc), before "daemonizing".
In the end, I settled on using subprocess.Popen with the creationflags=subprocess.CREATE_NEW_PROCESS_GROUP keyword to create an independent, windowless process:
import subprocess
independent_process = subprocess.Popen(
'/path/to/pythonw.exe /path/to/file.py',
creationflags=subprocess.CREATE_NEW_PROCESS_GROUP
)
However, that still left me with the added challenge of startup communications and signal handling. Without going into a ton of detail, for the former, my strategy was:
pickle the important parts of the launching process' namespace
Store that in a tempfile
Add the path to that file in the daughter process' environment before launching
Extract and return the namespace from the "daemonization" function
For signal handling I had to get a bit more creative. Within the "daemonized" process:
Ignore signals in the daemon process, since, as mentioned, they all terminate the process immediately and without cleanup
Create a new thread to manage signal handling
That thread launches daughter signal-handling processes and waits for them to complete
External applications send signals to the daughter signal-handling process, causing it to terminate and complete
Those processes then use the signal number as their return code
The signal handling thread reads the return code, and then calls either a user-defined signal handler, or uses a cytpes API to raise an appropriate exception within the Python main thread
Rinse and repeat for new signals
That all being said, for anyone encountering this problem in the future, I've rolled a library called daemoniker that wraps both proper Unix daemonization and the above Windows strategy into a unified facade. The cross-platform API looks like this:
from daemoniker import Daemonizer
with Daemonizer() as (is_setup, daemonizer):
if is_setup:
# This code is run before daemonization.
do_things_here()
# We need to explicitly pass resources to the daemon; other variables
# may not be correct
is_parent, my_arg1, my_arg2 = daemonizer(
path_to_pid_file,
my_arg1,
my_arg2
)
if is_parent:
# Run code in the parent after daemonization
parent_only_code()
# We are now daemonized, and the parent just exited.
code_continues_here()
Two options come to mind:
Port your program into a windows service. You can probably share much of your code between the two implementations.
Does your program really use any daemon functionality? If not, you rewrite it as a simple server that runs in the background, manages communications through sockets, and perform its tasks. It will probably consume more system resources than a daemon would, but it would be quote platform independent.
In general the concept of a daemon is Unix specific, in particular expected behaviour with respect to file creation masks, process hierarchy, and signal handling.
You may find PEP 3143 useful wherein a proposed continuation of python-daemon is considered for Python 3.2, and many related daemonizing modules and implementations are discussed.
The reason it's unix only is that daemons are a Unix specific concept i.e a background process initiated by the os and usually running as a child of the root PID .
Windows has no direct equivalent of a unix daemon, the closest I can think of is a Windows Service.
There's a program called pythonservice.exe for windows . Not sure if it's supported on all versions of python though

How do I execute two programs from python at the same time?

This post explains how to launch a single external program from Python
How shall I launch multipal programs(or threads) at the same time ?
My intended application is a video slide show. I want to launch a image sequence player and a music player at the same time
Thanks in advance
subprocess.Popen doesn't block unless you explicitly ask it to by calling communicate on the returned object, so you can call it more than once to start more than one process.
If you do need to communicate with both sub-processes simultaneously (read their STDOUT, for instance), then invoke subprocess.Popen in separate threads. Each thread can manage a sub-process and communicate with it. Naturally, this leaves you to do all the synchronization but that highly depends on your specific application.

Categories