BrokenProcessPool message when using Parallel from Joblib 0.16.0

I've run into a problem when using the Parallel function of joblib (v0.16.0).
I have these lines in my code:
with parallel_backend('loky', n_jobs=8):
    lineas = Parallel(verbose=10)(delayed(apply_prior_ind_def)(g) for g in df1_merge.groupby(['S1EMP','CONTRA1']))
The problem here is that sometimes the execution fails with the following message:
BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
There is no apparent traceback for this issue, and it is intermittent: without any changes to the code, just restarting the terminal is sometimes enough for the execution to run and finish correctly.
I hope someone has had a similar issue when using Parallel and can shed some light on it.
Many thanks in advance.
Using Spyder3 and Python 3.6.5

What has worked for me is to have 2 terminals open at the same time. The first one is the one that fails to launch, but the second one is able to run the whole parallel code.
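The error message above asks that all function arguments be picklable. A minimal sketch of how one might check that by hand is shown below, using only the standard pickle module. The helper name find_unpicklable is hypothetical, and joblib's loky backend may use a different serializer (such as cloudpickle), so treat this as a rough first check rather than a reproduction of what joblib does internally.

```python
import pickle


def find_unpicklable(items):
    """Return (index, error) pairs for items that fail a pickle round trip."""
    problems = []
    for index, item in enumerate(items):
        try:
            # Serialize and immediately un-serialize, since the reported
            # failure happens on the un-serialize side.
            pickle.loads(pickle.dumps(item))
        except Exception as exc:  # pickling errors come in several types
            problems.append((index, repr(exc)))
    return problems


if __name__ == "__main__":
    # A lambda cannot be pickled by the standard pickle module.
    sample_arguments = [1, "text", (1, 2), lambda x: x]
    for index, error in find_unpicklable(sample_arguments):
        print(f"argument {index} is not picklable: {error}")
```

In the question's setting, the same helper could presumably be run over the groups, e.g. find_unpicklable(df1_merge.groupby(['S1EMP','CONTRA1'])), before handing them to Parallel.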

Related

Can't import custom modules in python3.9 when running in wsl2

So I am trying to write some Python code that will do two things that seem to be mutually exclusive on my machine. My PC's host operating system is Windows and I run Kali Linux in WSL2 when I need to test my code on Linux. My code's main function creates two separate multiprocessing.Process objects, assigning a different thread, starting them both one after the other and then calling for them both to be joined. The plan is to allow each to run a simple server application simultaneously on different ports. This does not work when running python3 in PowerShell, as it seems to require access to os.fork(), which doesn't work in said environment.
When I found this out I pivoted to running in WSL2, which worked fantastically, for a time. After a while of experimenting with some ideas I decided to take some of my code and spin it off into its own file, which I placed in its own 'Libs' folder. WSL2, however, was unable to import this new file, instead giving me the exception ModuleNotFoundError: No module named 'NetStuff'. I originally had added:
sys.path.append('./Libs')
as has worked for me in the past, however when I found that WSL2 was unable to find my module, I printed out sys.path and it revealed that rather than appending my $current_working_directory/Libs like I intended, I was just appending the literal string, which wasn't useful. I then decided to try:
sys.path.append(str(pathlib.Path().resolve()) + '/Libs')
which at the bare minimum shows up as I would expect in sys.path. This, however, still didn't work: Python was unable to find my module and would unceremoniously crash every time. This led me to try something else. I ran my code in python3 under PowerShell again, which had no issue importing my module; it did still crash due to lacking os.fork(), but the import gave no issues. Confused and annoyed, I opened my code in IDLE 3.9 which, for some inexplicable reason, was able to import the file and seemingly use os.fork(). The only major issue with running in IDLE is that it is seemingly incapable of understanding ASCII colour escape characters. Given that the goal is to run my code in bash, and ideally also PowerShell, I am not satisfied with this as a solution. I returned to trying to fix the issue in WSL2 by adding my module to /home/Noah/bin and appending this directory to sys.path, but this has still not so much as given me a new symptom.
I am utterly at a loss at this point. None of the fixes I know offhand are working, and neither are the new ones I've found online. I can't tell if I'm just missing something fundamental about Python or if I'm running into a bug; if it's the latter, I can't seem to find other people with the same issue. As a result of my confusion and frustration I am appealing to you, kind users of Stack Overflow.
The following is the snippet that is causing me problems in WSL2:
path0 = ('/home/Noah/bin')
path1 = (str(pathlib.Path().resolve()) + '/Libs')
sys.path.append(path0)
sys.path.append(path1)
print(sys.path)
import NetStuff
The following is output of print(sys.path) in WSL2:
['/mnt/c/Users/Noah/VSCodeRepos/Python/BlackPack', '/usr/lib/python39.zip', '/usr/lib/python3.9', '/usr/lib/python3.9/lib-dynload', '/home/noah/.local/lib/python3.9/site-packages', '/usr/local/lib/python3.9/dist-packages', '/usr/lib/python3/dist-packages', '/home/Noah/bin', '/mnt/c/Users/Noah/VSCodeRepos/Python/BlackPack/Libs']
The following is the error being thrown by WSL2:
Traceback (most recent call last):
  File "/mnt/c/Users/Noah/VSCodeRepos/Python/BlackPack/BlackPackServer.py", line 21, in <module>
    import NetStuff
ModuleNotFoundError: No module named 'NetStuff'
I am specifically hoping to fix the issue with WSL2 at the moment as I am fairly certain that getting the code to run on PowerShell is merely going to require rewriting my code so that it doesn't rely on os.fork(). Thank you for reading my problem, and if I left out any information that you would like to see just tell me and I'll add it in an edit!
Edit: I instantly realized that I should specify that my host machine is running Windows 10.
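One thing the question's attempts have in common is that the 'Libs' path is built from the current working directory. A minimal sketch of an alternative is shown below: it builds the path from the script's own location and then asks the import system whether it can find the module. The helper names add_sibling_dir_to_path and can_import are hypothetical, and this is only a diagnostic aid under the assumption that the module file sits in a 'Libs' folder next to the script; it is not a confirmed fix for the WSL2 problem described above.

```python
import importlib.util
import sys
from pathlib import Path


def add_sibling_dir_to_path(script_file, folder_name="Libs"):
    """Put <directory containing script_file>/<folder_name> on sys.path."""
    libs_dir = Path(script_file).resolve().parent / folder_name
    if not libs_dir.is_dir():
        # Fail early with the exact path that was tried.
        raise FileNotFoundError(f"{libs_dir} does not exist")
    libs_str = str(libs_dir)
    if libs_str not in sys.path:
        sys.path.insert(0, libs_str)
    return libs_dir


def can_import(module_name):
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None
```

In the script from the question this might be used as add_sibling_dir_to_path(__file__) followed by print(can_import("NetStuff")) just before the import statement.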

ModuleNotFoundError when using multiprocessing

module path lost in multiprocessing spawn (ModuleNotFoundError)
The so-called solution of inserting sys-path above the importing of the module does not work for me.
Here is my main.py
import multiprocessing
from testing import customfunction

customfunction(1,2,3)
if __name__ == "__main__":
    process = multiprocessing.Process(target=customfunction)
    process.start()
    process.join()
    print("DONE")
The main.py works fine up to process.start()
This means customfunction has been imported properly
Here is my testing.py
import random

def customfunction(size, test, hello):
    random.seed(size)
    print(random.random())
    return random.random()
Both main.py and testing.py are in the same folder. A separate folder with an __init__.py file did not work either.
I get this error:
from testing import customfunction
ModuleNotFoundError: No module named 'testing'
I cannot wrap my head around why the created process does not retain the system path needed to import the file. If I place the multiprocessing creation in customfunction it doesn't work either; the same error occurs.
The link I shared at the top does not work for me either.
Thank you for taking the time to read. If you believe this is a duplicate of another question, please link it and explain, I am new to Python.
EDIT:
I am using Windows 10 as my OS
I have installed Spyder 4.1.4 using Anaconda Navigator, Python 3.7.7.
I installed it using an executable package.
I have tested this code in VS Code as well.
I am running this via the two IDEs mentioned (e.g. the VS Code PowerShell console and Spyder's Python console, by clicking Run).
I currently believe it is an issue specific to my computer, and I'd like to know whether it is reproducible on other Windows systems and whether or not the linked "solution" in the first line works. With that I may be able to pinpoint my errors.
This should work if you invoke multiprocessing.Process(target=customfunction, args=(1,2,3)) instead. I can't think of a reason why this would not work on Linux.
Can you update your question and provide the following information?
What OS are you using?
What version of Python are you running, and how was it installed?
How are you running main.py (e.g. from the command line, from an IDE, etc.)?
Any other details about your system configuration that might help others answer your question?
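The suggestion above, passing args to multiprocessing.Process, can be sketched as follows. To keep the example self-contained, customfunction is defined in the same file here instead of being imported from testing.py as in the question, and it is simplified to draw a single random number; both are assumptions made for illustration only.

```python
import multiprocessing
import random


def customfunction(size, test, hello):
    # Seed with the first argument so the result is reproducible.
    random.seed(size)
    value = random.random()
    print(value)
    return value


if __name__ == "__main__":
    # The target takes three positional arguments, so they are passed via args.
    process = multiprocessing.Process(target=customfunction, args=(1, 2, 3))
    process.start()
    process.join()
    # A non-zero exit code means the child raised an exception or crashed.
    print("child exit code:", process.exitcode)
```

The return value of the target is not sent back to the parent by Process; only the exit code is available this way.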
No multiprocessing print outputs (Spyder)
The solution mentioned here seems to solve my problem. I never expected either VS Code or Spyder's console to have an issue with multiprocessing, but running the code in an external system terminal works.
Thank you to Melih Elibol for helping me think more clearly about the problem, I am new to python.

python dies on windows without stacktrace

I'm loading a large csv file into pandas and when I load too many lines at once I get a dialog box telling me "Python has stopped working" without any error messages in the terminal (screenshot). I suspect it's a memory limitation but it'd be nice to confirm with the python stacktrace directly. Anyone have a similar experience and know how to get at what's happening?
Update: It turns out not to have been a memory limitation after all. I think the root cause was a pandas issue; upgrading from 0.20.3 to 0.22.0 seems to have fixed it. I suspect it was related to this: https://github.com/pandas-dev/pandas/issues/16798
Use the trace module to try and force it.
python -u -m trace -t program.py
or
python -m pdb program.py
The Python debugger might also provide insight. If neither of these works, it's most likely a memory issue, based on the context you provided.
Hope this helps.
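As a complementary sketch that is not part of the answer above: the standard library's faulthandler module can write the Python traceback when the interpreter receives a fatal signal such as SIGSEGV. Whether it would capture the specific "Python has stopped working" crash from the question is an assumption. The helper name enable_crash_traceback and the log file name are hypothetical.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Ask the interpreter to write a traceback to log_path on a fatal error.

    The file must stay open for as long as faulthandler is enabled,
    so the open file object is returned to the caller.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # "crash_traceback.log" is a hypothetical file name.
    crash_log = enable_crash_traceback("crash_traceback.log")
    # ... load the large CSV here, e.g. with pandas.read_csv(...) ...
```

The same mechanism can be switched on without editing the script by running python -X faulthandler program.py.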

Exit code 139 Python [duplicate]

I'm trying to execute a Python script, but I am getting the following error:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
I'm using python 3.5.2 on a Linux Mint 18.1 Serena OS
Can someone tell me why this happens, and how I can solve it?
The SIGSEGV signal indicates a "segmentation violation" or a "segfault". More or less, this equates to a read or write of a memory address that's not mapped in the process.
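To illustrate where the number 139 comes from, here is a small sketch assuming POSIX semantics (Linux or macOS): a process killed by signal N is conventionally reported by shells and IDEs as exit code 128 + N, and SIGSEGV is signal 11 on these systems. The function name run_and_describe is hypothetical, and the child process deliberately sends SIGSEGV to itself purely for demonstration.

```python
import signal
import subprocess
import sys


def run_and_describe(argv):
    """Run a command and describe how it ended (POSIX semantics assumed)."""
    completed = subprocess.run(argv)
    code = completed.returncode
    if code < 0:
        # subprocess reports "killed by signal N" as a return code of -N.
        sig = signal.Signals(-code)
        return f"killed by {sig.name}; a shell would report exit code {128 - code}"
    return f"exited normally with code {code}"


if __name__ == "__main__":
    # The child sends SIGSEGV to itself to imitate a segfault.
    crasher = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    print(run_and_describe([sys.executable, "-c", crasher]))
```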
This indicates a bug in your program. In a Python program, this is either a bug in the interpreter or in an extension module being used (and the latter is the most common cause).
To fix the problem, you have several options. One option is to produce a minimal, self-contained, complete example which replicates the problem and then submit it as a bug report to the maintainers of the extension module it uses.
Another option is to try to track down the cause yourself. gdb is a valuable tool in such an endeavor, as is a debug build of Python and all of the extension modules in use.
After you have gdb installed, you can use it to run your Python program:
gdb --args python <more args if you want>
And then use gdb commands to track down the problem. If you use run then your program will run until it would have crashed and you will have a chance to inspect the state using other gdb commands.
Another possible cause (which I encountered today) is that you're trying to read/write a file which is open. In this case, simply closing the file and rerunning the script solved the issue.
After some time I discovered that I was running a newer TensorFlow version that fails on older computers. I solved the problem by downgrading TensorFlow to version 1.4.
When I encountered this problem, I realized there were some memory issues. I rebooted the PC and that solved it.
This can also happen if your C-level code (e.g. when using Cython) tries to access a variable out of bounds:
ctypedef struct ReturnRows:
    double[10] your_value
cdef ReturnRows s_ReturnRows # Allocate memory for the struct
s_ReturnRows.your_value = [0] * 12
will fail with
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
For me, I was using the OpenCV library to apply SIFT.
In my code, I replaced cv2.SIFT() with cv2.SIFT_create() and the problem was gone.
Deleting the Python interpreter and the 'venv' folder solved my error.
I got this error in PHP, while running PHPUnit. The reason was a circular dependency.
I received the same error when trying to connect to an Oracle DB using the pyodbc module:
connection = pyodbc.connect()
The error occurred on the following occasions:
The DB connection had been opened multiple times in the same Python file
A breakpoint had been reached in debug mode while the connection to the DB was open
The error could be avoided with the following approaches:
Open the DB connection only once and reuse it in all the places where it is needed
Properly close the DB connection after using it
I hope this helps someone!
Signal 11 (SIGSEGV) is raised when a memory segment is accessed illegally.
Python has a module named signal through which you can handle this kind of OS signal.
If you want to ignore the SIGSEGV signal, you can do this:
import signal
signal.signal(signal.SIGSEGV, signal.SIG_IGN)
However, ignoring the signal can cause unexpected behaviour in your code, so it is better to handle SIGSEGV with your own handler, like this:
def SIGSEGV_signal_arises(signalNum, stack):
    print(f"{signalNum} : SIGSEGV arises")
    # Your code
signal.signal(signal.SIGSEGV, SIGSEGV_signal_arises)
Note that the Python documentation for the signal module cautions that it makes little sense to catch synchronous errors such as SIGSEGV caused by an invalid operation in C code, because Python returns from the handler to the C code, which is likely to raise the same signal again.
I encountered this problem when I was trying to run my code on an external GPU which was disconnected. I had set os.environ['PYOPENCL_CTX'] = '2', where GPU 2 was not connected, so I just needed to change the code to os.environ['PYOPENCL_CTX'] = '1'. (Values in os.environ must be strings.)
For me these few lines of code already reproduced the error, no matter how much free memory was available:
import numpy as np
from sklearn.cluster import KMeans
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=1, random_state=0).fit(X)
I could solve the issue by removing and reinstalling the scikit-learn package. A very similar solution to this.
This can also occur if trying to compound threads using concurrent.futures. For example, calling .map inside another .map call.
This can be solved by removing one of the .map calls.
I had the same issue working with kmeans from scikit-learn.
Upgrading from scikit-learn 1.0 to 1.0.2 solved it for me.
This issue is often caused by incompatible libraries in your environment. In my case, it was the pyspark library.
In my case, reverting my most recent conda installs fixed the situation.
I got this error when importing monai. It was solved after I created a new conda environment. Possible reasons I could imagine were either that there were some conflict between different packages, or maybe that my environment name was the same as the package name I wanted to import (monai).
I found this on another page.
Interpreter: Python 3.8
cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
This solved the issue for me.
I was getting SIGSEGV with Python 2.7, upgraded to Python 3.8, and then got a different error with OpenCV. I found the answer under "OpenCV 4.0.0 SystemError: <class 'cv2.CascadeClassifier'> returned a result with an error set".
In the end, this one line of code fixed it.

Python ImportError for strptime in spyder for windows 7

I can't for the life of me figure out what is causing this very odd error.
I am running a script in python 2.7 in the spyder IDE for windows 7. It uses datetime.datetime.strptime at one point. I can run the code once and it seems fine (although I haven't finished debugging, so exceptions have been raised and it hasn't completed normally yet), then if I try running it again I get the following (end of traceback only is shown):
  File "C:\path\to\test.py", line 220, in std_imp
    self.data[key].append(dt.datetime.strptime(string_var, string_format_var))
ImportError: Failed to import _strptime because the import lockis held by another thread.
I am not running multiple threads with Threading etc. The only way to get the code to make it past this point is to completely restart the computer. Restarting spyder won't work. Web searches haven't seemed to yield any clues or indications of others who have had this happen.
Does anyone understand what is going on? Is this some sort of GIL problem? What is the import lock, and why does it seem to be preventing me from importing this method of the datetime module once I have already tried running the code once?
The solution, as noted by mfitzp, was to include a dummy call to datetime.datetime.strptime at the beginning of the script.
e.g.
# This is a throwaway variable to deal with a python bug
throwaway = datetime.datetime.strptime('20110101','%Y%m%d')
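A slightly fuller sketch of the workaround above, assuming the import lock contention arises when strptime is first called from a thread other than the main one (for example one started by the IDE): the dummy call is made once in the main thread before any worker threads exist. The function name parse_all and the sample dates are hypothetical.

```python
import datetime
import threading

# Warm-up call in the main thread, before any worker threads are started,
# so the lazy import behind strptime has already happened.
datetime.datetime.strptime("20110101", "%Y%m%d")


def parse_all(date_strings, fmt="%Y%m%d"):
    """Parse each string in its own thread and return the results in order."""
    results = [None] * len(date_strings)

    def worker(index, text):
        results[index] = datetime.datetime.strptime(text, fmt)

    threads = [
        threading.Thread(target=worker, args=(i, s))
        for i, s in enumerate(date_strings)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(parse_all(["20110101", "20200229"]))
```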
