Printed output not displayed when using joblib in jupyter notebook - python

So I am using joblib to parallelize some code and I noticed that I couldn't print things when using it inside a jupyter notebook.
I tried doing the same example in IPython and it worked perfectly.
Here is a minimal (not) working example to write in a jupyter notebook cell:
from joblib import Parallel, delayed
Parallel(n_jobs=8)(delayed(print)(i) for i in range(10))
So I am getting the output as [None, None, None, None, None, None, None, None, None, None] but nothing is printed.
What I expect to see (print order could be random in reality):
0
1
2
3
4
5
6
7
8
9
[None, None, None, None, None, None, None, None, None, None]
Note:
You can see the prints in the logs of the notebook process. But I would like the prints to happen in the notebook, not the logs of the notebook process.
EDIT
I have opened a GitHub issue, but it has received minimal attention so far.

I think this is caused in part by the way Parallel spawns its child workers, and by how Jupyter Notebook handles IO for those workers. When started without specifying a value for backend, Parallel defaults to loky, which uses a pooling strategy that creates the subprocesses via a fork-exec model.
If you start Notebook from a terminal using
$ jupyter-notebook
the regular stderr and stdout streams appear to remain attached to that terminal, while the notebook session will start in a new browser window. Running the posted code snippet in the notebook does produce the expected output, but it seems to go to stdout and ends up in the terminal (as hinted in the Note in the question). This further supports the suspicion that this behavior is caused by the interaction between loky and notebook, and the way the standard IO streams are handled by notebook for child processes.
This led me to this discussion on GitHub (active within the past 2 weeks as of this posting) where the authors of notebook appear to be aware of this, but it would seem that there is no obvious and quick fix for the issue at the moment.
If you don't mind switching the backend that Parallel uses to spawn children, you can do so like this:
from joblib import Parallel, delayed
Parallel(n_jobs=8, backend='multiprocessing')(delayed(print)(i) for i in range(10))
With the multiprocessing backend, things work as expected. The threading backend appears to work fine too. This may not be the solution you were hoping for, but hopefully it is sufficient while the notebook authors work on a proper fix.
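If you prefer not to pass backend= at every call site, joblib also provides a parallel_backend context manager that achieves the same thing. This is just a sketch of the workaround expressed that way (parallel_backend is part of joblib's public API, but this particular snippet is mine, not from the original answer):
from joblib import Parallel, delayed, parallel_backend

# Every Parallel call inside this block uses the multiprocessing backend,
# so the prints land in the notebook just like the explicit backend= example.
with parallel_backend('multiprocessing', n_jobs=8):
    Parallel()(delayed(print)(i) for i in range(10))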
I'll cross-post this to GitHub in case anyone there cares to add to this answer (I don't want to misstate anyone's intent or put words in people's mouths!).
Test Environment:
MacOS - Mojave (10.14)
Python - 3.7.3
pip3 - 19.3.1
Tested in 2 configurations. Confirmed to produce the expected output when using either multiprocessing or threading for the backend parameter. Packages installed using pip3.
Setup 1:
ipykernel 5.1.1
ipython 7.5.0
jupyter 1.0.0
jupyter-client 5.2.4
jupyter-console 6.0.0
jupyter-core 4.4.0
notebook 5.7.8
Setup 2:
ipykernel 5.1.4
ipython 7.12.0
jupyter 1.0.0
jupyter-client 5.3.4
jupyter-console 6.1.0
jupyter-core 4.6.2
notebook 6.0.3
I was also successful using the same versions as Setup 2, but with the notebook package downgraded to 6.0.2.
Note:
This approach works inconsistently on Windows. Different combinations of software versions yield different results. Doing the most intuitive thing (upgrading everything to the latest version) does not guarantee it will work.

In Z4-tier's GitHub link, scottgigante's method works on Windows, but with results opposite to those mentioned: in a Jupyter notebook, the "multiprocessing" backend hangs forever, while the default loky works well (Python 3.8.5 and notebook 6.1.1):
from joblib import Parallel, delayed
import sys
def g(x):
    stream = getattr(sys, "stdout")
    print("{}".format(x), file=stream)
    stream.flush()
    return x

Parallel(n_jobs=2)(delayed(g)(x**2) for x in range(5))
executed in 91ms, finished 11:17:25 2021-05-13
[0, 1, 4, 9, 16]
A simpler method is to pass an identity function to delayed:
import numpy as np
Parallel(n_jobs=2)(delayed(lambda y: y)([np.log(x), np.sin(x)]) for x in range(5))
executed in 151ms, finished 09:34:18 2021-05-17
[[-inf, 0.0],
[0.0, 0.8414709848078965],
[0.6931471805599453, 0.9092974268256817],
[1.0986122886681098, 0.1411200080598672],
[1.3862943611198906, -0.7568024953079282]]
Or like this:
Parallel(n_jobs=2)(delayed(lambda y:[np.log(y),np.sin(y)])(x) for x in range(5))
executed in 589ms, finished 09:44:57 2021-05-17
[[-inf, 0.0],
[0.0, 0.8414709848078965],
[0.6931471805599453, 0.9092974268256817],
[1.0986122886681098, 0.1411200080598672],
[1.3862943611198906, -0.7568024953079282]]

This problem is fixed in the latest version of ipykernel.
To solve the issue, just run pip install --upgrade ipykernel.
From what I understand, any version above 6 will do, as mentioned in this GitHub comment by one of the maintainers.
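A quick way to confirm which ipykernel version the running kernel is actually using (a trivial check, not part of the linked comment):
import ipykernel
print(ipykernel.__version__)  # should report 6.x or later for the fix described above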

Related

multiprocessing hangs in Jupyter notebook [duplicate]

I am trying to implement multiprocessing in my code, so I thought I would start my learning with some examples. I used the first example found in this documentation.
from multiprocessing import Pool
def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
When I run the above code I get an AttributeError: can't get attribute 'f' on <module '__main__' (built-in)>. I do not know why I am getting this error. I am also using Python 3.5 if that helps.
This problem seems to be a design feature of multiprocessing.Pool. See https://bugs.python.org/issue25053. For some reason Pool does not always work with objects not defined in an imported module. So you have to write your function into a different file and import the module.
File: defs.py
def f(x):
    return x*x

File: run.py
from multiprocessing import Pool
import defs

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(defs.f, [1, 2, 3]))
If you use print or another built-in function, the example should work. If this is not a bug (according to the link), the given example is poorly chosen.
The multiprocessing module has a major limitation when it comes to IPython use:
Functionality within this package requires that the __main__ module be
importable by the children. [...] This means that some examples, such
as the multiprocessing.pool.Pool examples will not work in the
interactive interpreter. [from the documentation]
Fortunately, there is a fork of the multiprocessing module called multiprocess, which uses dill instead of pickle for serialization and overcomes this issue conveniently.
Just install multiprocess and replace multiprocessing with multiprocess in your imports:
import multiprocess as mp
def f(x):
    return x*x

with mp.Pool(5) as pool:
    print(pool.map(f, [1, 2, 3, 4, 5]))
Of course, externalizing the code as suggested in this answer works as well, but I find it very inconvenient: That is not why (and how) I use IPython environments.
tl;dr: multiprocessing does not work in IPython environments out of the box; use its fork multiprocess instead.
This answer is for those who get this error on Windows 10 in 2021.
I've researched this error a bit since I got it myself. I get this error when running any examples from the official Python 3 documentation on multiprocessing.
Test environment:
x86 Windows 10.0.19043.1165 + Python 3.9.2 - there is an error
x86 Windows 10.0.19043.1165 + Python 3.9.6 - there is an error
x86 Windows 10.0.19043.1110 + Python 3.9.6 - there is an error
ARM Windows 10.0.21354.1 + Python 3.9.6 - no error (version from DEV branch)
ARM macOS 11.5.2 + Python 3.9.6 - no errors
I have no way to test this situation in other conditions, but my guess is that the problem is in Windows, since there is no such bug in the developer build 10.0.21354.1 (although that ARM version probably uses x86 emulation).
Also note that there was no such bug at the time Python 3.9.2 was released (February). Since I was working on the same computer the whole time, I was surprised when previously working code stopped working while only the Windows version had changed.
I was unable to find a bug report describing a similar situation in the Python bug tracker (I probably did a poor search), and the message marked "Correct answer" refers to a different situation. The problem is easy to reproduce: try any example from the multiprocessing documentation on a freshly installed Windows 10 + Python 3.
Later, I will have the opportunity to check out Python 3.10 and the latest version of Windows 10.
I am also interested in this situation in the context of Windows 11.
If you have information about this error (link to the bug tracker or something similar), be sure to share it.
At the moment I switched to Linux to continue working.
Why not use joblib? Your code is equivalent to:
# pip install joblib
from joblib import Parallel, delayed
def f(x):
    return x*x

res = Parallel(
    n_jobs=5
)(
    delayed(f)(x) for x in [1, 2, 3]
)
If you're using Jupyter notebook (like the OP), then defining the function in a separate cell and executing that cell first fixes the problem. The accepted answer works too, but it's more work. Defining the function before, i.e. above the pool, isn't adequate. It has to be in a completely different notebook cell which is executed first.
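A sketch of that cell layout, reusing the f from the question (nothing here beyond the standard library):
# Cell 1: define the function in its own cell and run this cell first
def f(x):
    return x*x

# Cell 2: the pool can now look up f in __main__
from multiprocessing import Pool

with Pool(5) as p:
    print(p.map(f, [1, 2, 3]))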

Python multiprocessing uses all cores even when limited

I am using the multiprocessing module from Python in order to speed up my code, but for some reason the program ends up using more than the number of cores I specify. Indeed, when looking at htop to check the core usage, I see that all 60 available cores are used at times instead of the 10 I requested.
As an example, this code already produces the behavior:
import numpy as np
import multiprocessing
def square(i):
    size = 10000
    return np.dot(np.random.normal(size=size),
                  np.random.normal(size=(size, size)) @ np.random.normal(size=size))

with multiprocessing.Pool(10) as pool:
    t = pool.map(square, range(100))
I am running this in a Jupyter Notebook inside a JupyterHub. Up until a few hours ago everything was running smoothly (meaning the number of cores used never went beyond 10). The only big change that happened is that I updated all my packages via conda update --all and specifically updated scipy with conda install -c conda-forge scipy. However, I have no idea what could have caused the issue.
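One suspicion (an assumption on my part, not something confirmed in this thread): the BLAS library behind numpy (OpenBLAS or MKL, which a conda update of scipy can swap or reconfigure) may be spawning its own threads inside every worker. A minimal sketch of capping those threads with environment variables, set before numpy is imported:
import os
# Cap the BLAS/OpenMP thread pools before numpy is imported in this process;
# forked workers inherit both the environment and the already-configured numpy.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np
import multiprocessing

def square(i):
    size = 10000
    return np.dot(np.random.normal(size=size),
                  np.random.normal(size=(size, size)) @ np.random.normal(size=size))

with multiprocessing.Pool(10) as pool:
    t = pool.map(square, range(100))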

Console output using rpy2 does not work in Jupyter Notebook

Using Jupyter Notebook, when I execute a function from an R package (in my case Climatol) from a notebook that uses the R kernel, messages are displayed as output reporting the procedures being carried out. Nothing new.
The code block used is this:
library(maps)
library(mapdata)
library(climatol)
# Apply function (from R kernel)
homogen('Vel',2011,2012,tinc='6 hour',expl=TRUE)
Now, using the Python kernel from another notebook, when I call the same function through rpy2 applying the same parameters, I don't get the same messages that appear in the previous image. Instead I get this:
This time, the code block used is this:
from rpy2.robjects import r
from rpy2.robjects.packages import importr
importr('maps')
importr('mapdata')
importr('climatol')
# Apply function ( from Python kernel)
r["homogen"]("Vel",2011,2012,tinc="6 hour",expl=r['as.logical']("T"))
I ran the mentioned Python code from Sublime Text and in this case the messages are displayed:
The messages are also displayed when running the code from the Windows console, which leads me to think that the problem lies with Jupyter. That being said, how can I get those messages using Jupyter?
I'm using Python 3.7 and the version of rpy2 is 2.9.4
Thanks for the help.
rpy2 is not fully supported on Windows. The callbacks used to set how R output is handled are likely not working, which results in the issue you are observing.
If this improves, it will be in more recent rpy2 versions; the latest rpy2 release is 3.3.6, for example.
Otherwise, for better compatibility on Windows, consider running rpy2/jupyter in Docker or, if it works for you, in WSL (Windows Subsystem for Linux).
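If you do upgrade, here is a minimal sketch of redirecting R's console output into Python so it shows up in the notebook cell. It assumes an rpy2 3.x release where the console-write hooks live in rpy2.rinterface_lib.callbacks; that module path and the hook names are an assumption about recent rpy2 versions, not something stated above:
from rpy2.rinterface_lib import callbacks

def write_to_python(text):
    # Forward whatever R writes to its console to Python's stdout.
    print(text, end="")

# Assumed hook names for regular console output and for warnings/errors.
callbacks.consolewrite_print = write_to_python
callbacks.consolewrite_warnerror = write_to_python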

ipyparallel's LoadBalancedView bloats memory, how can I avoid that?

This issue may be related to https://github.com/ipython/ipyparallel/issues/207 which is also not marked as solved, yet.
I also opened this issue here https://github.com/ipython/ipyparallel/issues/286
I want to execute multiple tasks in parallel using Python and ipyparallel in a jupyter notebook, using 4 local engines started by executing ipcluster start in a local console.
Although one could also use DirectView, I use LoadBalancedView to map a set of tasks. Each task takes around 0.2 seconds (it can vary, though), and each task does a MySQL query where it loads some data and then processes it.
Working with ~45000 tasks is fine; however, memory usage grows really high. This is bad because I want to run another experiment with over 660000 tasks, which I can't run anymore because it exceeds my memory limit of 16 GB and memory swapping to my local drive starts. However, when using DirectView, memory usage stays relatively small and never fills up. But I actually need LoadBalancedView.
Even when running a minimal working example without a database query this happens (see below).
I am not perfectly familiar with the ipyparallel library, but I've read something about logs and caches kept by the ipcontroller which may cause this. I am still not sure whether it is a bug or whether I can change some settings to avoid my problem.
Running a MWE
For my Python 3.5.3 environment running on Windows 10 I use the following (recent) packages:
ipython 6.1.0
ipython_genutils 6.1.0
ipyparallel 6.0.2
jupyter 1.0.0
jupyter_client 4.4.0
jupyter_console 5.0.0
jupyter_core 4.2.0
I would like the following example to work for LoadBalancedView without the immense memory growth (if possible at all):
Run ipcluster start in a console
Run a jupyter notebook with the following three cells:
<1st cell>
import ipyparallel as ipp
rc = ipp.Client()
lview = rc.load_balanced_view()
<2nd cell>
%%px --local
import time
<3rd cell>
def sleep_here(i):
    time.sleep(0.2)
    return 42

amr = lview.map_async(sleep_here, range(660000))
amr.wait_interactive()
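One thing I may try (an assumption on my part, not something established here): mapping the work in chunks and purging cached results between chunks with Client.purge_results, so the client and hub caches do not hold all 660000 results at once:
results = []
chunk_size = 10000
total = 660000
for start in range(0, total, chunk_size):
    amr = lview.map_async(sleep_here, range(start, min(start + chunk_size, total)))
    results.extend(amr.get())
    rc.purge_results('all')  # assumed to drop results cached by the client and hub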

Mayavi example fails, crashes python on Ubuntu

I'm trying to run a standard Mayavi example. I just installed mayavi2 on Ubuntu (Kubuntu 12.04) and this is my first step with Mayavi. Unfortunately, this step is failing.
The examples I wish to run come from here:
http://docs.enthought.com/mayavi/mayavi/auto/examples.html
For example, this one.
The behavior I am seeing is that the plot canvas area is blank (mostly). The popup window is shown and its controls are present and working.
The only errors I am seeing are:
libGL error: failed to load driver: swrast
libGL error: Try again with LIBGL_DEBUG=verbose for more details.
Where would I add LIBGL_DEBUG=verbose?
I'm on Kubuntu 12.04 with:
Python 2.7.3
IPython 1.1.0
wxPython 2.8
vtk 5.8.0-5
setuptools, numpy, scipy - latest versions (just updated)
I am running the examples in IPython (which seems to be the recommended way). I am using this command to start the shell:
ipython --gui=wx --pylab=wx
I also tried running the examples from within an IPython notebook as so:
%run example.py
In all cases the examples fail to display the animation. The window itself is displayed, as are the controls. But the animation canvas is mostly blank, although a flash of the images will sometimes appear.
At least once previously I saw my attempts crash Python. The message was:
The crashed program seems to use third-party or local libraries:
/usr/local/lib/python2.7/dist-packages/traits/ctraits.so
/usr/local/lib/python2.7/dist-packages/tvtk/array_ext.so
However, I am not seeing that crash now.
I found some important clues here:
https://askubuntu.com/questions/283640/libgl-error-failed-to-load-driver-i965
Like that person, I ended up reinstalling my graphics driver and that solved my problem. (The problem wasn't related to mayavi or python after all.)
