Process getting stuck after being launched from another process - python

I was working on a Dash application in which the action triggered by pressing a button had to be performed in a separate process. That process, in turn, was parallelizable and in some cases spawned child processes of its own for efficient computation. With this configuration the child processes get stuck. The code below reproduces the situation:
import multiprocessing
import time

import dash
from dash import html
from dash.dependencies import Input, Output

app = dash.Dash(__name__)

app.layout = html.Div([
    html.Button(id='refresh-button', children='Button'),
    html.Div(id='dynamic-container1')
])


def run_function(i):
    print('hello')
    time.sleep(15)
    print(f'hello world {i}')


def run_process():
    num = 1
    print('hello world 00000')
    process = multiprocessing.Process(target=run_function, args=(num,))
    process.start()
    process.join()
    print('hello world')


@app.callback(Output('dynamic-container1', 'children'),
              Input('refresh-button', 'n_clicks'))
def refresh_state(click):
    if click == 0 or click is None:
        return None
    p = multiprocessing.Process(target=run_process)
    p.start()
    p.join()
    return None


if __name__ == '__main__':
    app.run_server(debug=True)
The output of this application when pressing the button is always the following:
Connected to pydev debugger (build 172.3968.37)
Dash is running on http://127.0.0.1:8050/
* Serving Flask app 'main' (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: on
pydev debugger: process 6884 is connecting
hello world 00000
which means that the first process running run_process() was launched, but the child process running run_function(i) never even started. I tried to find an explanation in popular books on Python multiprocessing, and any guidance on this kind of process "chaining", but to no avail. From my understanding, the new child process running run_function(i) should occupy a separate core (if one is free) and not depend on the resources consumed by the parent run_process(). Could you please explain the mechanics of this to me? I suspect that in this code run_function(i) might be forced to share the same resources as run_process(), so the system is simply preventing the new process from starting on those resources, but I would like to confirm this with more experienced Python users.
I used Python 3.7 and PyCharm Community 2017.2.3 on Windows 7 to reproduce this example.

I cannot reproduce your error, and I think that is because you are using Windows (I am on Linux with Python 3.9), so I cannot find the error for you, but maybe I can give you some hints:
First: to find the error, try to reduce the code to the core of the problem (you can remove the whole Dash part to check whether that is the culprit). In my tests the results were the same with and without Dash.
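Something like this is what I mean by the reduced version: no Dash, just the two nested processes, with the names copied from your code (a minimal sketch, not your exact program):

import multiprocessing
import time


def run_function(i):
    print('hello')
    time.sleep(15)
    print(f'hello world {i}')


def run_process():
    print('hello world 00000')
    # The child of a child: this is the process that gets stuck in your setup.
    process = multiprocessing.Process(target=run_function, args=(1,))
    process.start()
    process.join()
    print('hello world')


if __name__ == '__main__':
    # The guard matters on Windows, where the "spawn" start method
    # re-imports this module in every new process.
    p = multiprocessing.Process(target=run_process)
    p.start()
    p.join()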
Second: Windows and Linux handle multiprocessing a bit differently.
Windows spawns the process:
The parent process starts a fresh python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.
Unix systems fork the process:
The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.
On Unix systems you can also spawn (multiprocessing.set_start_method("spawn")); even with that I can't reproduce your error, but with this fork/spawn comparison I wanted to make clear that things sometimes behave a bit differently between Windows and Linux, even when the same packages are used. I think your understanding of multiprocessing is correct. (Maybe this site helps too.)
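If you want to mimic the Windows behaviour on Linux, a minimal sketch looks like this (set_start_method must be called exactly once, under the main guard):

import multiprocessing


def work():
    print('hello from the spawned child')


if __name__ == '__main__':
    # Force Windows-like "spawn" behaviour on Linux; call this only once per program.
    multiprocessing.set_start_method("spawn")
    p = multiprocessing.Process(target=work)
    p.start()
    p.join()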
Third: the docs contain some programming guidelines you should be aware of; maybe they will help too. In general, the multiprocessing package does not work well with many interactive Python shells (like IDLE or PyCharm); this may be better in newer versions. Maybe you should try running it from a terminal to check whether that changes anything.
I hope this helps a little bit.

Related

Launching a python script as a background job on Jupyter

I am trying to run a *.py file as a background service in a Jupyter notebook.
from IPython.lib import backgroundjobs as bg
jobs = bg.BackgroundJobManager()
jobs.new(%run -i "script.py") # Not working
jobs.new("script.py") # Not working
IPython/Jupyter background jobs are designed to run either plain code to evaluate (a string) or a function. Files and IPython magic commands are not supported.
One thing you can do is simply read the file content and pass it to exec:
from IPython.lib.backgroundjobs import BackgroundJobFunc

with open('script.py') as code:
    job = BackgroundJobFunc(exec, code.read())

result = job.run()
BackgroundJobManager is pretty much the same, but a little bit "smarter".
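For example, a small sketch of the manager-based variant (assuming jobs.new accepts a callable plus its arguments, the same way BackgroundJobFunc does):

from IPython.lib import backgroundjobs as bg

jobs = bg.BackgroundJobManager()

with open('script.py') as code:
    source = code.read()

# Hand exec the script's source; the job runs in a background thread
# of the current interpreter process.
job = jobs.new(exec, source)

jobs.status()   # list running/completed jobs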
Side note: all the background machinery behind these interfaces runs in threads of the same process and shares interpreter state and output. So, just keep in mind:
this is not suited for computation-heavy scripts
never run untrusted code this way; this applies to eval in general, but here you can also end up in a situation where you never get the GIL back in your "frontend" thread
avoid scripts that use stdout; most probably they will clash with your main thread

tornado multi process crash with nltk

I'm getting some very strange behaviour in a tornado application.
I'm running a lot of processes, each of which has its own HTTP server running on a different port.
I added a new process to the system which is another TCPServer class that listens on an entirely different port and doesn't have any interaction with the other processes.
I bring the new server up as follows:
def runSimService(port):
    sim = SimService()
    sim.listen(port)
    currentIOLoop = tornado.ioloop.IOLoop.current()
    currentIOLoop.start()


class SimService(TCPServer):
    def __init__(self, host='localhost', motorport=27017):
        TCPServer.__init__(self)
        self.log = logging.getLogger("tornado.access")
        # Needs to contain a User class log.
        self.con = motor.MotorClient(host, motorport)
        self.db = self.con.pDB
        self.col = self.db.pCol
Basically that's the only code I left in while debugging. The crash I get isn't a normal Python exception traceback, which is what worries me.
I'm developing on a Mac at the moment. Can someone please explain whether this crash is caused by something wrong with my code, or whether something else is happening here?
UPDATE:
OK, this is really bizarre: it only seems to happen when I import the following:
from nltk.stem.snowball import SnowballStemmer
OR
from nltk import word_tokenize, pos_tag
or nltk in general...
Could there be some weird interaction between the libraries? I'm stuck.
Code to create processes
if __name__ == '__main__':
    AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient", max_clients=2000)
    processes = []
    processes.append(Process(target=runSimServer, args=(...,)))
    processes.append(Process(target=runServer, args=(...)))
    processes.append(Process(target=runServer, args=(...)))
    processes.append(Process(target=runServer, args=(...)))
    processes.append(Process(target=runServer, args=(...)))
    processes.append(Process(target=runServer, args=(...)))
    # write pids to pid/ directory for use with the shutdown program
    with open("pid/" + __file__.split(".")[0] + ".pid", "w") as f:
        for p in processes:
            p.start()
            f.write(str(p.pid) + "\n")
Thanks
The key part of that crash message is "multi-threaded process forked". If you are going to use both threads and processes, you must fork all your processes before creating any threads. It looks like nltk is creating some threads when you import it. If you are also using multiple processes (it doesn't look like you are from the code you quoted, but that is obviously incomplete), you must not import nltk until after all processes have been started.
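One sketch of that ordering fix is to defer the nltk import into the worker, so the parent holds no threads when it forks (run_server and the port numbers here are placeholders standing in for your own worker code):

from multiprocessing import Process


def run_server(port):
    # Import nltk only inside the child, after the fork/spawn has happened,
    # so the parent process never holds nltk's helper threads while forking.
    from nltk import word_tokenize, pos_tag
    ...  # start the tornado server as before


if __name__ == '__main__':
    processes = [Process(target=run_server, args=(8000 + i,)) for i in range(5)]
    for p in processes:
        p.start()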

Are IPython engines independent processes?

From the IPython Architecture Overview documentation we know that ...
The IPython engine is a Python instance that takes Python commands over a network connection.
Given that it is a Python instance, does that imply that these engines are standalone processes? I can manually load a set of engines via a command like ipcluster start -n 4. Is creating engines this way considered creating child processes of some parent process, or just a means of kicking off a set of independent processes that rely on IPC to get their work done? I can also invoke an engine via the ipengine command, which is surely standalone, as it is entered directly at the OS command line with no relation to anything else.
As background I'm trying to drill into how the many IPython engines manipulated through a Client from a python script will interact with another process kicked off in that script.
Here's a simple way to find out which processes are involved: print the list of current processes before I fire off the controller and engines, and then print the list after they're fired off. There's a wmic command to get the job done...
C:\>wmic process get description,executablepath
Interestingly enough, the controller gets 5 python processes going, and each engine creates one additional python process. So from this investigation I also learned that an engine is its own process, as is the controller...
C:\>wmic process get description,executablepath | findstr ipengine
ipengine.exe C:\Python34\Scripts\ipengine.exe
ipengine.exe C:\Python34\Scripts\ipengine.exe
C:\>wmic process get description,executablepath | findstr ipcontroller
ipcontroller.exe C:\Python34\Scripts\ipcontroller.exe
From the looks of it they all seem standalone, though I don't think the OS's running process list carries any information about how the processes are related in terms of parent/child relationships. That may be a developer-only formalism with no representation tracked by the OS, but I don't know enough about these internals to say either way.
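For what it's worth, the OS does record each process's parent PID, and you can surface it programmatically; here is a small sketch using psutil (a third-party package, so this assumes it is installed via pip install psutil):

import psutil

for proc in psutil.process_iter(['pid', 'ppid', 'name']):
    name = (proc.info['name'] or '').lower()
    if 'ipengine' in name or 'ipcontroller' in name:
        parent = psutil.Process(proc.info['ppid'])
        print(proc.info['pid'], proc.info['name'], '<- parent:', parent.name())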
Here's a definitive quote from MinRK that addresses this question directly:
"Every engine is its own isolated process...Each kernel is a separate
process and can be on any machine... It's like you started a terminal IPython session, and every engine is a separate IPython session. If you do a=5 in this one, a=10 in that one, this guy has 10 this guy has 5."
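That separation is easy to see from a client; here is a rough sketch using the ipyparallel package (older IPython versions expose the same Client under IPython.parallel), assuming a cluster is already running:

from ipyparallel import Client

rc = Client()         # connect to the running controller and engines
rc[0]['a'] = 5        # push a=5 into engine 0's namespace
rc[1]['a'] = 10       # push a=10 into engine 1's namespace
print(rc[0]['a'], rc[1]['a'])   # 5 10 -- each engine keeps its own state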
Here's further definitive validation, inspired by a great SE Hot Network Question on ServerFault that mentioned the use of Process Explorer, which actually tracks parent/child processes...
Process Explorer is a Sysinternals tool maintained by Microsoft. It
can display the command line of the process in the process's
properties dialog as well as the parent that launched it, though the
name of that process may no longer be available.
--Corrodias
If I fire off more engines in another command window, that section of Process Explorer just duplicates exactly as you see in the screenshot.
And just for the sake of completeness, here's what the command ipcluster start --n=5 looks like...

How to start daemon process from python on windows?

Can my Python script spawn a process that will run indefinitely?
I'm not too familiar with Python, nor with spawning daemons, so I came up with this:
si = subprocess.STARTUPINFO()
si.dwFlags = subprocess.CREATE_NEW_PROCESS_GROUP | subprocess.CREATE_NEW_CONSOLE
subprocess.Popen(executable, close_fds = True, startupinfo = si)
The process continues to run past python.exe, but is closed as soon as I close the cmd window.
Using the answer Janne Karila pointed out, this is how you can run a process that doesn't die when its parent dies; there is no need to use the win32process module.
DETACHED_PROCESS = 8
subprocess.Popen(executable, creationflags=DETACHED_PROCESS, close_fds=True)
DETACHED_PROCESS is a Process Creation Flag that is passed to the underlying CreateProcess function.
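As a side note, recent Python versions expose the flag directly as subprocess.DETACHED_PROCESS, so a slightly more self-documenting sketch (worker.py is a placeholder script name) would be:

import subprocess
import sys

# Fall back to the raw CreateProcess value on older Python versions.
DETACHED_PROCESS = getattr(subprocess, 'DETACHED_PROCESS', 0x00000008)

proc = subprocess.Popen(
    [sys.executable, 'worker.py'],
    creationflags=DETACHED_PROCESS,
    close_fds=True,
)
print('detached child pid:', proc.pid)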
This question was asked 3 years ago, and though the fundamental details of the answer haven't changed, given its prevalence in "Windows Python daemon" searches, I thought it might be helpful to add some discussion for the benefit of future Google arrivees.
There are really two parts to the question:
Can a Python script spawn an independent process that will run indefinitely?
Can a Python script act like a Unix daemon on a Windows system?
The answer to the first is an unambiguous yes: as already pointed out, using subprocess.Popen with the creationflags=subprocess.CREATE_NEW_PROCESS_GROUP keyword will suffice:
import subprocess

independent_process = subprocess.Popen(
    'python /path/to/file.py',
    creationflags=subprocess.CREATE_NEW_PROCESS_GROUP
)
Note that, at least in my experience, CREATE_NEW_CONSOLE is not necessary here.
That being said, the behavior of this strategy isn't quite the same as what you'd expect from a Unix daemon. What constitutes a well-behaved Unix daemon is better explained elsewhere, but to summarize:
Close open file descriptors (typically all of them, but some applications may need to protect some descriptors from closure)
Change the working directory for the process to a suitable location to prevent "Directory Busy" errors
Change the file access creation mask (os.umask in the Python world)
Move the application into the background and make it dissociate itself from the initiating process
Completely divorce from the terminal, including redirecting STDIN, STDOUT, and STDERR to different streams (often DEVNULL), and prevent reacquisition of a controlling terminal
Handle signals, in particular, SIGTERM.
The reality of the situation is that Windows, as an operating system, really doesn't support the notion of a daemon: applications that start from a terminal (or in any other interactive context, including launching from Explorer, etc) will continue to run with a visible window, unless the controlling application (in this example, Python) has included a windowless GUI. Furthermore, Windows signal handling is woefully inadequate, and attempts to send signals to an independent Python process (as opposed to a subprocess, which would not survive terminal closure) will almost always result in the immediate exit of that Python process without any cleanup (no finally:, no atexit, no __del__, etc).
Rolling your application into a Windows service, though a viable alternative in many cases, also doesn't quite fit. The same is true of using pythonw.exe (a windowless version of Python that ships with all recent Windows Python binaries). In particular, they fail to improve the situation for signal handling, and they cannot easily launch an application from a terminal and interact with it during startup (for example, to deliver dynamic startup arguments to your script, say, perhaps, a password, file path, etc), before "daemonizing". Additionally, Windows services require installation, which -- though perfectly possible to do quickly at runtime when you first call up your "daemon" -- modifies the user's system (registry, etc), which would be highly unexpected if you're coming from a Unix world.
In light of that, I would argue that launching a pythonw.exe subprocess using subprocess.CREATE_NEW_PROCESS_GROUP is probably the closest Windows equivalent for a Python process to emulate a traditional Unix daemon. However, that still leaves you with the added challenge of signal handling and startup communications (not to mention making your code platform-dependent, which is always frustrating).
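A rough sketch of that "closest equivalent": pythonw plus a new process group, with the console streams cut off (my_daemon.py and the pythonw launcher name are placeholders; this is an illustration, not a complete daemonization):

import subprocess

proc = subprocess.Popen(
    ['pythonw', 'my_daemon.py'],
    creationflags=subprocess.CREATE_NEW_PROCESS_GROUP,
    stdin=subprocess.DEVNULL,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
    close_fds=True,
)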
That all being said, for anyone encountering this problem in the future, I've rolled a library called daemoniker that wraps both proper Unix daemonization and the above strategy. It also implements signal handling (for both Unix and Windows systems), and allows you to pass objects to the "daemon" process using pickle. Best of all, it has a cross-platform API:
from daemoniker import Daemonizer

with Daemonizer() as (is_setup, daemonizer):
    if is_setup:
        # This code is run before daemonization.
        do_things_here()

    # We need to explicitly pass resources to the daemon; other variables
    # may not be correct
    is_parent, my_arg1, my_arg2 = daemonizer(
        path_to_pid_file,
        my_arg1,
        my_arg2
    )

    if is_parent:
        # Run code in the parent after daemonization
        parent_only_code()

# We are now daemonized, and the parent just exited.
code_continues_here()
For that purpose you could daemonize your Python process, or, since you are in a Windows environment, you might prefer to run it as a Windows service.
You know I hate posting only web links, but for more information relevant to your requirement:
A simple way to implement a Windows service (read all the comments; they will resolve any doubts).
If you really want to learn more, first read this:
what is a daemon process, or creating-a-daemon-the-python-way
Update:
subprocess is not the right way to achieve this kind of thing.
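If you go the Windows service route, the usual skeleton with pywin32 looks roughly like this (a sketch, not a drop-in solution; the service names are placeholders and the pywin32 package must be installed):

import win32event
import win32service
import win32serviceutil


class MyService(win32serviceutil.ServiceFramework):
    _svc_name_ = 'MyPythonService'            # placeholder service name
    _svc_display_name_ = 'My Python Service'  # placeholder display name

    def __init__(self, args):
        win32serviceutil.ServiceFramework.__init__(self, args)
        self.stop_event = win32event.CreateEvent(None, 0, 0, None)

    def SvcStop(self):
        self.ReportServiceStatus(win32service.SERVICE_STOP_PENDING)
        win32event.SetEvent(self.stop_event)

    def SvcDoRun(self):
        # Replace this with the real work loop; it runs until SvcStop fires.
        win32event.WaitForSingleObject(self.stop_event, win32event.INFINITE)


if __name__ == '__main__':
    # Supports 'install', 'start', 'stop', 'remove', 'debug' on the command line.
    win32serviceutil.HandleCommandLine(MyService)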

Multiprocessing launching too many instances of Python VM

I am writing some multiprocessing code (Python 2.6.4, WinXP) that spawns processes to run background tasks. In playing around with some trivial examples, I am running into an issue where my code just continuously spawns new processes, even though I only tell it to spawn a fixed number.
The program itself runs fine, but if I look in Windows TaskManager, I keep seeing new 'python.exe' processes appear. They just keep spawning more and more as the program runs (eventually starving my machine).
For example, I would expect the code below to launch 2 python.exe processes: the first being the program itself, and the second being the child process it spawns. Any idea what I am doing wrong?
import time
import multiprocessing


class Agent(multiprocessing.Process):
    def __init__(self, i):
        multiprocessing.Process.__init__(self)
        self.i = i

    def run(self):
        while True:
            print 'hello from %i' % self.i
            time.sleep(1)


agent = Agent(1)
agent.start()
It looks like you didn't carefully follow the guidelines in the documentation, specifically this section where it talks about "Safe importing of main module".
You need to protect your launch code with an if __name__ == '__main__': block or you'll get what you're getting, I believe.
I believe it comes down to the multiprocessing module not being able to use os.fork() as it does on Linux, where an already-running process is basically cloned in memory. On Windows (which has no such fork()) it must run a new Python interpreter and tell it to import your main module and then execute the start/run method once that's done. If you have code at "module level", unprotected by the name check, then during that import it starts the whole sequence over again, ad infinitum.
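Concretely, the quoted example only needs the launch code moved under the guard (a sketch keeping the question's Python 2.6 syntax):

import time
import multiprocessing


class Agent(multiprocessing.Process):
    def __init__(self, i):
        multiprocessing.Process.__init__(self)
        self.i = i

    def run(self):
        while True:
            print 'hello from %i' % self.i
            time.sleep(1)


if __name__ == '__main__':
    # Only the original process runs this block; the copies that Windows
    # re-imports skip it, so no extra python.exe processes appear.
    agent = Agent(1)
    agent.start()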
When I run this on Linux with Python 2.6, I see a maximum of 4 python2.6 processes, and I can't guarantee that they're all from this program. They're definitely not filling up the machine.
Need new python version? Linux/Windows difference?
I don't see anything wrong with that. Works fine on Ubuntu 9.10 (Python 2.6.4).
Are you sure you don't have cron or something starting multiple copies of your script? Or that the spawned script is not calling anything that would start a new instance, for example as a side effect of import if your code runs directly on import?
