Can functions know if they are already multiprocessed in Python (joblib) - python

I have a function that uses multiprocessing (specifically joblib) to speed up a slow routine using multiple cores. It works great; no questions there.
I have a test suite that uses multiprocessing (currently just the multiprocessing.Pool() system, but can change it to joblib) to run each module's test functions independently. It works great; no questions there.
The problem is that I've now integrated the multiprocessing function into the module's test suite, so that the pool process runs the multiprocessing function. I would like to make it so that the inner function knows that it is already being multiprocessed and not spin up more forks of itself. Currently the inner process sometimes hangs, but even if it doesn't, obviously there are no gains to multiprocessing within an already parallel routine.
I can think of several ways (lock files, setting some sort of global variable, etc.) to determine which state we're in, but I'm wondering if there is some standard way of figuring this out (either in Python's multiprocessing or in joblib). If it only works in Python 3, that'd be fine, though obviously solutions that also work on 2.7 or lower would be better. Thanks!

Parallel in joblib should be able to sort these things out:
http://pydoc.net/Python/joblib/0.8.3-r1/joblib.parallel/
Two pieces from 0.8.3-r1:
# Set an environment variable to avoid infinite loops
os.environ[JOBLIB_SPAWNED_PROCESS] = '1'
I don't know why they index the environment with a named constant in one place and with the literal string in the other, but as you can see, the feature is already implemented in joblib.
# We can now allow subprocesses again
os.environ.pop('__JOBLIB_SPAWNED_PARALLEL__', 0)
Here you can select other versions, if that's more relevant:
http://pydoc.net/Python/joblib/0.8.3-r1/
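If you would rather make the check explicit instead of relying on joblib falling back silently, a function can look for that same variable itself. This is only a minimal sketch relying on the internal name quoted above (from joblib 0.8.x, so it may differ in other versions), it only helps when the outer layer is joblib itself, and heavy_routine/slow_step are made-up names:

import os

from joblib import Parallel, delayed

def slow_step(x):
    # Stand-in for the real per-item computation.
    return x * x

def heavy_routine(data, n_jobs=4):
    # joblib 0.8.x sets this variable around the workers it spawns; it is an
    # internal detail and may change in other versions.
    already_spawned = '__JOBLIB_SPAWNED_PARALLEL__' in os.environ
    if already_spawned:
        # Already inside a joblib-driven run: stay serial.
        return [slow_step(x) for x in data]
    # Top-level call: fan out as usual.
    return Parallel(n_jobs=n_jobs)(delayed(slow_step)(x) for x in data)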

The answer to the specific question is: I don't know of a ready-made utility.
A minimal(*) core refactoring would be to add a named parameter to the function that currently creates child processes. The default value would give your current behaviour, and another value would switch to a behaviour compatible with how you are running tests(**); a sketch follows below.
(*: there might be other, maybe better, design alternatives to consider, but we do not have enough information)
(**: one may say that introducing conditional behaviour would require testing that as well, and we are back to square one...)
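Something along these lines, where process_items, work_on and use_subprocesses are all made-up names standing in for your own code:

from joblib import Parallel, delayed

def work_on(item):
    # Stand-in for the real per-item work.
    return item ** 2

def process_items(items, use_subprocesses=True):
    # The default keeps the current behaviour; the test suite, which already
    # runs inside a worker process, passes use_subprocesses=False.
    if use_subprocesses:
        return Parallel(n_jobs=-1)(delayed(work_on)(item) for item in items)
    return [work_on(item) for item in items]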

Look at the current state of
import multiprocessing
am_already_spawned = multiprocessing.current_process().daemon
am_already_spawned will be True if the current_process is a spawned process (and thus won't benefit from more multiprocessing) and False otherwise.
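For instance, the flag could pick the number of jobs handed to joblib. This is only a sketch, compute/square are made-up names, and it relies on the pool workers being daemonic, which is the default for multiprocessing.Pool but not for every backend:

import multiprocessing

from joblib import Parallel, delayed

def square(v):
    return v * v

def compute(values):
    # Workers created by multiprocessing.Pool are daemonic, so this flag tells
    # us we are already inside a pool worker.
    am_already_spawned = multiprocessing.current_process().daemon
    n_jobs = 1 if am_already_spawned else -1
    # With n_jobs=1 joblib runs in-process and spawns nothing.
    return Parallel(n_jobs=n_jobs)(delayed(square)(v) for v in values)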

Related

Multiprocessing slower than serial processing in Windows (but not in Linux)

I'm trying to parallelize a for loop to speed-up my code, since the loop processing operations are all independent. Following online tutorials, it seems the standard multiprocessing library in Python is a good start, and I've got this working for basic examples.
However, for my actual use case, I find that parallel processing (using a dual core machine) is actually a little (<5%) slower, when run on Windows. Running the same code on Linux, however, results in a parallel processing speed-up of ~25%, compared to serial execution.
From the docs, I believe this may relate to Windows' lack of a fork() function, which means each process needs to be initialised fresh. However, I don't fully understand this and wonder if anyone can confirm it?
Particularly,
--> Does this mean that all code in the calling python file gets run for each parallel process on Windows, even initialising classes and importing packages?
--> If so, can this be avoided by somehow passing a copy (e.g. using deepcopy) of the class into the new processes?
--> Are there any tips / other strategies for efficient parallelisation of code design for both Unix and Windows?
My exact code is long and uses many files, so I have created a pseudocode-style example structure which hopefully shows the issue.
# Imports
import multiprocessing

import numpy as np

from my_package import MyClass
# ... many other packages / functions

# Initialization (instantiate class and call slow functions that get it ready for processing)
my_class = MyClass()
my_class.set_up(input1=1, input2=2)

# Define main processing function to be used in loop
def calculation(_input_data):
    # Perform some functions on _input_data
    # ......
    # Call method of instantiated class to act on data
    return my_class.class_func(_input_data)

input_data = np.linspace(0, 1, 50)
output_data = np.zeros_like(input_data)

# For Loop (SERIAL implementation)
for i, x in enumerate(input_data):
    output_data[i] = calculation(x)

# PARALLEL implementation (this doesn't work well!)
with multiprocessing.Pool(processes=4) as pool:
    results = pool.map_async(calculation, input_data)
    results.wait()
    output_data = results.get()
EDIT: I do not believe the question is a duplicate of the one suggested, since this relates to a difference between Windows and Linux, which is not mentioned at all in the suggested duplicate question.
NT operating systems lack the UNIX fork primitive. When a new process is created, it starts as a blank process, and it is the responsibility of the parent to instruct the new process on how to bootstrap.
Python's multiprocessing API abstracts process creation, trying to give the same feel across the fork, forkserver and spawn start methods.
When you use the spawn starting method, this is what happens under the hood.
A blank process is created
The blank process starts a brand new Python interpreter
The Python interpreter is given the MFA (Module Function Arguments) you specified via the Process class initializer
The Python interpreter loads the given module resolving all the imports
The target function is looked up within the module and called with the given args and kwargs
The above flow brings a few implications.
As you noticed yourself, it is a much more taxing operation compared to fork. That's why you notice such a difference in performance.
As the module gets imported from scratch in the child process, all import side effects are executed anew. This means that constants, global variables, decorators and first level instructions will be executed again.
On the other side, initializations made during the parent process execution will not be propagated to the child. See this example.
This is why the multiprocessing documentation adds a specific paragraph for Windows in the Programming Guidelines. I highly recommend reading the Programming Guidelines, as they already include all the required information to write portable multiprocessing code.
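One practical consequence: if the parent does expensive setup, push it into a per-worker initializer rather than module level, so each spawned worker does it once instead of re-running it for every task. This is only a sketch (not the questioner's code) and init_worker/calculation are made-up names:

import multiprocessing

# Per-worker state filled in by the Pool initializer instead of at import time.
_worker_state = None

def init_worker():
    global _worker_state
    # Expensive setup (loading a model, opening files, ...) done once per
    # worker process rather than once per task.
    _worker_state = {"offset": 10}

def calculation(x):
    return x + _worker_state["offset"]

if __name__ == "__main__":
    # Under the 'spawn' start method (the only one on Windows), everything
    # outside this guard is re-executed in every child process.
    multiprocessing.set_start_method("spawn", force=True)
    with multiprocessing.Pool(processes=4, initializer=init_worker) as pool:
        print(pool.map(calculation, range(5)))   # [10, 11, 12, 13, 14]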

Time a python 2 function takes to run WITHOUT ANY imports; is it possible?

So, I have recently been tasked with writing a function in python 2 that can time the execution of another function. This is simple enough, but the catch is I have to do it WITHOUT importing any modules; this naturally includes time, timeit, etc.
Using only built-in functions and statements (e.g. sum(), or, yield), is this even possible?
I don't want to see a solution, I need to work that out for myself, but I would greatly appreciate knowing if this is even possible. If not, then I'd rather not waste the time bashing my head against the proverbial brick wall.
If you're on a UNIX (or maybe just Linux) system, yes. Read from /proc/uptime.
It's not super efficient, but hey, builtin functions only.
I'm not sure of a way to do this on Windows.
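For what it's worth, a minimal Linux-only sketch of that approach, using nothing but builtins (time_call is a made-up name; the resolution of /proc/uptime is roughly a hundredth of a second):

def uptime_seconds():
    # /proc/uptime holds "<seconds since boot> <idle seconds>" on Linux.
    with open("/proc/uptime") as f:
        return float(f.read().split()[0])

def time_call(func, *args, **kwargs):
    start = uptime_seconds()
    result = func(*args, **kwargs)
    return result, uptime_seconds() - start

# result, seconds = time_call(sum, range(10 ** 6))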
Simple answer: No, it is not possible.
Here you have a link to the Python 2.7 built-in functions docs. None of them allow you to measure time; you are forced to use a module.
Python is meant to be used together with its modules, and it ships with a great many of them. I would recommend you use time for this one.
Sorry for breaking your dreams <3
Depending on the OS you're running and on how messy a solution you can accept, you can do this without imports.
Ordered by increasing insanity:
Some systems provide virtual files which contain various timers. You can get a sub-second resolution at least on a Linux system by reading a counter from that kind of file before and after execution. Not sure about others.
Can you reuse existing imports? If the file already contains any of threading, multiprocessing, signal, you can construct a timer out of them.
If you have some kind of scheduler running on your system (like cron) you can inject a job into it (by creating a file), which will print out timestamps every time it's run.
You can follow a log file on a busy system and assume the last message was close to the time you read it.
Depending on what accuracy you want, you could measure the amount of time each python bytecode operation takes, then write an interpreter for the code available via function.__code__.co_code. While you run the code, you can sum up all the expected execution times. This is the only pure-python solution which doesn't require a specific OS / environment.
If you're running on a system which allows process memory introspection, you can open it and inject any functionality without technically importing anything.
Two "cheating" methods.
If you're avoiding the import keyword, you can use __import__ to import time, which is actually a module built into the python2 executable.
If you know the location of the Python installation, you can use execfile on os.py and use the times function.
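A sketch of the first cheat, in case it helps (time_call is a made-up name; whether __import__ counts as "no imports" is up to whoever set the task):

def time_call(func, *args, **kwargs):
    # __import__ is a builtin, so no import statement appears anywhere.
    time = __import__("time")
    start = time.time()
    result = func(*args, **kwargs)
    return result, time.time() - start

# result, seconds = time_call(sum, range(10 ** 6))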

Why is it important to protect the main loop when using joblib.Parallel?

The joblib docs contain the following warning:
Under Windows, it is important to protect the main loop of code to
avoid recursive spawning of subprocesses when using joblib.Parallel.
In other words, you should be writing code like this:
import ....

def function1(...):
    ...

def function2(...):
    ...

if __name__ == '__main__':
    # do stuff with imports and functions defined above
    ...

No code should run outside of the "if __name__ == '__main__'" blocks, only imports and definitions.
Initially, I assumed this was just to protect against the occasional odd case where a function passed to joblib.Parallel called the module recursively, which would mean it was generally good practice but often unnecessary. However, it doesn't make sense to me why this would only be a risk on Windows. Additionally, this answer seems to indicate that failure to protect the main loop resulted in the code running several times slower than it otherwise would have for a very simple non-recursive problem.
Out of curiosity, I ran the super-simple example of an embarrassingly parallel loop from the joblib docs without protecting the main loop on a Windows box. My terminal was spammed with the following error until I closed it:
ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect you main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information
My question is, what about the windows implementation of joblib requires the main loop to be protected in every case?
Apologies if this is a super basic question. I am new to the world of parallelization, so I might just be missing some basic concepts, but I couldn't find this issue discussed explicitly anywhere.
Finally, I want to note that this is purely academic; I understand why it is generally good practice to write one's code in this way, and will continue to do so regardless of joblib.
This is necessary because Windows doesn't have fork(). Because of this limitation, Windows needs to re-import your __main__ module in all the child processes it spawns, in order to re-create the parent's state in the child. This means that if you have the code that spawns the new process at the module-level, it's going to be recursively executed in all the child processes. The if __name__ == "__main__" guard is used to prevent code at the module scope from being re-executed in the child processes.
This isn't necessary on Linux because it does have fork(), which allows it to fork a child process that maintains the same state of the parent, without re-importing the __main__ module.
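For concreteness, the guarded form of the docs' embarrassingly parallel loop looks roughly like this (a minimal sketch, not taken verbatim from the docs):

from math import sqrt

from joblib import Parallel, delayed

if __name__ == "__main__":
    # On Windows each child process re-imports this module; keeping the
    # Parallel call inside the guard stops the children from spawning
    # children of their own.
    results = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
    print(results)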
In case someone stumbles across this in 2021:
Due to the new "loky" backend used by joblib > 0.12, protecting the main loop is no longer required. See https://joblib.readthedocs.io/en/latest/parallel.html

Escaping arbitrary blocks of code

My script accepts arbitrary-length and -content strings of Python code, then runs them inside exec() statements. If the time to run the arbitrary code passes over some predetermined limit, then the exec() statement needs to exit and a boolean flag needs to be set to indicate that a premature exit has occurred.
How can this be accomplished?
Additional information
These pieces of code will be running in parallel in numerous threads (or at least as parallel as you can get with the GIL).
If there is an alternative method in another language, I am willing to try it out.
I plan on cleaning the code to prevent access to anything that might accidentally damage my system (file and system access, import statements, nested calls to exec() or eval(), etc.).
Options I've considered
Since the exec() statements are running in threads, use a poison pill to kill the thread. Unfortunately, I've read that poison pills do not work for all cases.
Running the exec() statements inside processes, then using process.terminate() to kill everything. But I'm running on Windows and I've read that process creation can be expensive. It also complicates communication with the code that's managing all of this.
Allowing only pre-written functions inside the exec() statements and having those functions periodically check for an exit flag then perform clean-up as necessary. This is complicated, time-consuming, and there are too many corner-cases to consider; I am looking for a simpler solution.
I know this is a bit of an oddball question that deserves a "Why would you ever want to allow arbitrary code to run in an exec() statement?" type of response. I'm trying my hand at a bit of self-evolving code. This is my major stumbling block at the moment: if you allow your code to do almost anything, then it can potentially hang forever. How do you regain control and stop it when it does?
This isn't a very detailed answer, but it's more than I wanted to put into a comment.
You may want to consider something like this other question for creating functions with timeouts, using multiprocessing as a start.
The problem with threads is that you probably can't use your poison-pill approach: the threads aren't workers pulling many small tasks off a queue, they would be sitting there blocked on a single statement and would never see the value telling them to exit.
You mentioned that your concern about using processes on Windows is that they are expensive. So what you might do is create your own kind of process pool (a list of processes). They all pull from a queue, and you submit new tasks to the queue. If any process exceeds the timeout, you kill it and replace it in the pool with a new one. That way you only pay the cost of creating a new process when one times out, instead of creating a new one for every task; a rough sketch follows.
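A minimal sketch of one slot of such a pool (ReplaceableWorker and _worker are made-up names, and real code would want more care around cleanup and per-task scoping):

import multiprocessing
from queue import Empty

def _worker(task_queue, result_queue):
    # Long-lived worker: exec()s code strings until it receives None.
    for code in iter(task_queue.get, None):
        scope = {}
        exec(code, scope)
        result_queue.put(scope.get("result"))

class ReplaceableWorker:
    # One slot of a hand-rolled pool: reused until it times out, then replaced.

    def __init__(self):
        self._spawn()

    def _spawn(self):
        self.tasks = multiprocessing.Queue()
        self.results = multiprocessing.Queue()
        self.proc = multiprocessing.Process(
            target=_worker, args=(self.tasks, self.results), daemon=True)
        self.proc.start()

    def run(self, code, timeout=2.0):
        self.tasks.put(code)
        try:
            return self.results.get(timeout=timeout), False   # (result, timed out?)
        except Empty:
            self.proc.terminate()      # kill the runaway exec()
            self._spawn()              # replace it so the slot stays warm
            return None, True

if __name__ == "__main__":
    slot = ReplaceableWorker()
    print(slot.run("result = sum(range(10))"))     # (45, False)
    print(slot.run("while True:\n    pass", 0.5))  # (None, True)
    print(slot.run("result = 2 ** 10"))            # (1024, False)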
There are a few different options here.
First, start with jdi's suggestion of using multiprocessing. It may be that Windows process creation isn't actually expensive enough to break your use case.
If it actually is a problem, what I'd personally do is use Virtual PC, or even User Mode Linux, to just run the same code in another OS, where process creation is cheap. You get a free sandbox out of that, as well.
If you don't want to do that, jdi's suggestion of processes pools is a bit more work, but should work well as long as you don't have to kill processes very often.
If you really do want everything to be threads, you can do so, as long as you can restrict the way the jobs are written. If the jobs can always be cleanly unwound, you can kill them just by raising an exception. Of course they also have to not catch the specific exception you choose to raise. Obviously neither of these conditions is realistic as a general-purpose solution, but for your use case it may be fine. The key is to make sure your code evolver never inserts any manual resource-management statements (like opening and closing a file), only with statements. (Alternatively, insert the open and close, but inside a try/finally.) That's probably a good idea even if you're not doing things this way, because spinning off hundreds of processes that each leak as many file handles as they can until they either time out or hit the file limit would slow your machine to a crawl.
If you can restrict the code generator/evolver even further, you could use some form of cooperative threading (e.g., greenlets), which makes things even nicer.
Finally, you could switch from CPython to a different Python implementation that can run multiple interpreter instances in a single process. I don't know whether Jython or IronPython can do so. PyPy can, and it also has a restricted-environment sandbox, but unfortunately I think both of those (and Python 3.x support) are not-ready-for-prime-time features, which means you either have to get a special build of PyPy (probably without the JIT optimizer) or build it yourself. This might be the best long-term solution, but it's probably not what you want today.

Running Multiple Methods At Once In Python

I am trying to run a method which has an infinite loop in it to create a video display. This method is called within another loop that handles hardware input and as such cannot loop as fast as the video, causing lag if I use the outer loop to run the video. Is there a way to start the video loop and then start the hardware loop and run them separately? Currently if I call the video loop it just sits at that loop until it returns.
Yes, you can use Python's own threading module, or a cooperative microthreading module like gevent.
Note that Python's threading mechanism carries this disclaimer for CPython (the default Python implementation on most boxes):
Due to the Global Interpreter Lock, in CPython only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
Depending on how the underlying modules you are calling operate, you may find that while using threading, one thread won't give up control very often, if at all. In that case, using cooperative microthreading might be your only option.
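As a rough sketch of that shape (video_loop and hardware_loop are placeholder names for your own loops):

import threading
import time

def video_loop():
    while True:
        # Update the display as often as needed.
        time.sleep(0.03)

def hardware_loop():
    while True:
        # Poll the (slower) hardware at its own pace.
        time.sleep(0.5)

# Run the video loop in a background daemon thread so the hardware loop is
# free to run in the main thread; the daemon thread exits with the program.
threading.Thread(target=video_loop, daemon=True).start()
hardware_loop()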
Yes, you can use Python's own multiprocessing module.
Note that multiprocessing does not have to fight the GIL and can work simultaneously on everything you give it to do.
On the other hand, there is a caveat with the multiprocessing module: when you spawn a process, it is a completely separate Python interpreter, not just an OS-controlled thread. It is in itself an entirely different process. This can add overhead to programs, but the advantage of completely dodging the GIL makes this only a mild issue.
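The process-based version of the same sketch (again with placeholder loop names) would look like this; note the __main__ guard, which matters on Windows where the child re-imports the module:

import multiprocessing
import time

def video_loop():
    while True:
        # Runs in its own interpreter/process, unaffected by the GIL.
        time.sleep(0.03)

def hardware_loop():
    while True:
        time.sleep(0.5)

if __name__ == "__main__":
    multiprocessing.Process(target=video_loop, daemon=True).start()
    hardware_loop()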
