Pythons multiprocess module (with dill) gives an unhelpful AssertionError - python

I have installed dill/pathos and its dependencies (with some difficulty) and I'm trying to perform a function over several processes. The class/attribute Model(self.xml,self.exp_data,i).SSR is custom made and depends on loads of other custom functions so I apologize in advance for not being able to provide 'runnable' code. In brief however, it takes some experimental data, integrates a system of ODE's with python's pysces module and calculates the sum of squares (SSR). The purpose for parallelizing this code is to speed up this calculation with multiple parameter sets.
The code:
import multiprocess
def evaluate_chisq(pop):
p = multiprocess.Pool(8)
res= p.map(lambda i:Model(self.xml,self.exp_data,i).SSR , pop)#calcualteSSR with this parameter set
return res
The error message I get is:
File "C:\Anaconda1\lib\site-packages\multiprocess\pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "C:\Anaconda1\lib\site-packages\multiprocess\pool.py", line 567, in get
raise self._value
AssertionError
Then I have tried using map_async :
def evaluate_chisq(pop):
p = multiprocess.Pool(8)
res= p.map_async(lambda i:Model(self.xml,self.exp_data,i).SSR , pop)#calcualteSSR with this parameter set
return res
which returns a <multiprocess.pool.MapResult object at 0x0000000014AF8C18> object which gives me the same error when I attempts to use the MapResult's `get' method
File "C:\Anaconda1\lib\site-packages\multiprocess\pool.py", line 567, in get
raise self._value
AssertionError
Does anybody know what I'm doing wrong?

On Windows you need to use freeze_support from __main__.
See https://docs.python.org/2/library/multiprocessing.html#multiprocessing.freeze_support.

Related

Multiprocessing array .get_lock works on one computer but not another

I am working a somewhat extensive python program that uses multiprocessing. Because I wanted the user to see some progress on the console when running the program, I read about using a shared counter on stackoverflow and after a while of playing around with my code, I got it to work. As I said it's too much code to post here, but the gist is that I instantiate a multiprocessing array after the name==main line,
if __name__ == "__main__":
total_progress_counter = Array('i',[0,0])
and then during the main portion of code I pass this array to a function in other module:
some_name.plot(<other variables>,
total_progress_counter=total_progress_counter)
Then within that other function, I used the .get_lock method that I found described here on stackoverflow:
with total_progress_counter.get_lock():
total_progress_counter[0] += self.total_panels_to_plot
I also update the other component, total_progress_counter[1], in the same function. This works fine for me on my work machine, where I wrote the code, and that machine has a Centos operating system.
But, when I run it on my personal MacBook it gives the following traceback:
Traceback (most recent call last):
File "./program.py", line 775, in <module>
program.run()
File "./program.py", line 177, in run
cases_plotted = pool.map(self.__plot__, all_cases)
File "/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
AttributeError: 'list' object has no attribute 'get_lock'
I have python3 version 3.8.3 on my personal machine and python3 version 3.7.4 on my work machine. Can anyone help me understand why I'm getting different behavior on these two environments? I'd be grateful, as this is meant to be software others might use on different machines.

Python multiprocessing question on iteration of a function multiple arguments

I need some help on this, I have played around with multiple options on stackoverflow and internet. But I need some help on this as I'm confused. I'm on Python 2.7.
This is my manager for the multi-processing. I just need to iterate function1 based on the n_iterations and collect the result per each iteration.
I have imported the two libraries,
from functools import partial
import multiprocessing
Function1 is;
def function1(v1,v2,v3,v4,v5):
calculate_function = v1+v2+v3+v4+v5
return calculate_function
And the function to handle the multi-processing is,
def multi_process(n_iterations,a1,a2,a3,a4,a5):
sampling_process = partial(function1, v1=a1,v2=a2,v3=a3,v4=a4,v5=a5)
pool = multiprocessing.Pool()
results_set = pool.map(sampling_process, xrange(n_iterations))
pool.close()
pool.join()
return results_set
But I keep getting an error message,
File "model_selection_pooling_ray.py", line 246, in multi_process
results_set = pool.map(sampling_process, xrange(n_iterations))
File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
NameError: global name 'valuofv1' is not defined
(valuofv1 is the actual value on a1)
Can someone please help me out on figuring out what I am doing wrong? Thank you.
the sampling_process does not need any args, you can define the Function1:
def function1(n, v1,v2,v3,v4,v5):
I found the answer, the problem was related to the actual code calling for the function valueof1 which did not exist. After fixing this, I adjusted function1 to include the iterating value as well, def function1(v1,v2,v3,v4,v5,n): to fix it.

matplotlib error when running plotting in multiprocess

I am using python's Multiprocess.Pool to plot some data using multiple processes as follows:
class plotDriver:
def plot(self, parameterList):
numberOfWorkers = len(parameterList)
pool = Pool(numberOfWorkers)
pool.map(plotWorkerFunction, parameterList)
pool.close()
pool.join()
this is a simplified version of my class, the driver also contains other stuffs I choose to omit. The plotWorkderFunction is a single threaded function, which imports matplotlib and does all the plotting and setting figure styles and save the plots to one pdf file, and each worker is not interacting with the other.
I need to call this plot function multiple times since I have many parameterList, like following:
parameters = [parameterList0, parameterList1, ... parameterListn]
for param in parameters:
driver = PlotDriver()
driver.plot(param)
If parameters only contains one parameterList (the for loop only runs once), the code seems working fine. But it consistently fails whenever parameters contains more than one element, with the following error message happening on the second time in the loop.
Traceback (most recent call last):
File "plot.py", line 59, in <module>
plottingDriver.plot(outputFile_handle)
File "/home/yingryic/PlotDriver.py", line 69, in plot
pool.map(plotWrapper, workerParamList)
File "/home/yingryic/.conda/envs/pp/lib/python2.7/multiprocessing/pool.py", line 251, in map
return self.map_async(func.iterable, chunksize).get()
File "/home/yingryic/.conda/envs/pp/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
RuntimeError: In set_text: could not load glyph
X Error: BadIDChoice (invalid resouce ID chosen for this connection) 14
Extension: 138 (RENDER)
Minor opcode: 17 (RenderCreateGlyphSet)
Resouce id: 0xe00002
: Fatal IO error: client killed
any idea what is going wrong and how should I fix?
You can try placing import matplotlib into plotWorkerFunction() so that child processes will have their own copy of the module.

Serialize iterator object to be passed between processes in Python

I have a python script that calculates the eigenvalues of matrices from a list, and I would like to insert these eigenvalues into another collection in the same order as the original matrix and I would like to do this by spawning up multiple processes.
Here is my code:
import time
import collections
import numpy as NP
from scipy import linalg as LA
from joblib import Parallel, delayed
def computeEigenV(unit_of_work):
current_index = unit_of_work[0]
current_matrix = unit_of_work[1]
e_vals, e_vecs = LA.eig(current_matrix)
finished_unit = (current_index, lowEV[::-1])
return finished_unit
def run(work_list):
pool = Parallel( n_jobs = -1, verbose = 1, pre_dispatch = 'all')
results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
return results
if __name__ == '__main__':
# create original array of matrices
original_matrix_list = []
work_list = []
#basic set up so we can run this test
for i in range(0, 100):
# generate the matrix & unit or work
matrix = NP.random.random_integers(0, 100, (500, 500))
#insert into respective resources
original_matrix_list.append(matrix)
for i, matrix in enumerate(original_matrix_list):
unit_of_work = [i, matrix]
work_list.append(unit_of_work)
work_result = run(work_list)
so work_result should hold all the eigenvalues from each matrix after all processes finish. And the iterator I am using is unit_of_work which is a list containing the index of the matrix (from the original_matrix_list) and the matrix itself.
The weird thing is, if I were to run this code by doing python matrix.py everything works perfectly. But when I use auto (a program that does calculations for differential equations?) to run my script, typing auto matrix.py gives me the following error:
Traceback (most recent call last):
File "matrix.py", line 50, in <module>
work_result = run(work_list)
File "matrix.py", line 27, in run
results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 805, in __call__
while self.dispatch_one_batch(iterator):
File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 658, in dispatch_one_batch
tasks = BatchedCalls(itertools.islice(iterator, batch_size))
File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 69, in __init__
self.items = list(iterator_slice)
File "matrix.py", line 27, in <genexpr>
results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 162, in delayed
pickle.dumps(function)
TypeError: expected string or Unicode object, NoneType found
Note: when I ran this with auto I had to change if __name__ == '__main__': to if __name__ == '__builtin__':
I looked up this error and it seems like I am not serializing the iterator unit_of_work correctly when passing it around to different processes. I have then tried to use serialized_unit_of_work = pickle.dumps(unit_of_work), pass that around, and do pickle.loads when I need to use the iterator, but I still get the same error.
Can someone please help point me in the right direction as to how I can fix this? I hesitate to use pickle.dump(obj, file[, protocol]) because eventually I will be running this to calculate eigenvalues of thousands of matrices and I don't really want to create that many files to store the serialized iterator if possible.
Thanks!! :)
You can't pickle an iterator in python2.7 (but you can from 3.4 onward).
Also, pickling works differently in __main__ is different than when not in __main__, and it would seem that auto is doing something odd with __main__. What you often will observe when pickling fails on a particular object is that if instead of running the script with the object in it directly, you run a script as main which imports the portion of the script with the "difficult-to-serialize" object, then pickling will succeed. This is because the object will pickle by reference at a namespace level above where the "difficult" object lives… thus it's never directly pickled.
So, you can probably get away with pickling what you want, by adding a reference layer… a file import or a class. But, if you want to pickle an iterator, you are out of luck unless you move to at least python3.4.

Create SAFEARRAY of Strings in Python

I'm trying to call a COM method that requires a SafeArray of Strings to be passed as reference, which is then filled up with the method results. This is the code in VBA, which works flawlessly:
dimr RC as New RAS41.HECRASController
RC.Project_Open "c:\myProj.prj"
dim numMessages as Long
dim messages() as String
RC.Compute_CurrentPlan( numMessages, messages())
Now, I'm trying to do the same from with Python 3.4, using the win32com module. However, I'm stuck at trying to create the second parameter with the correct type, which according to combrowse.py should be "Pointer SafeArray String".
This was my first attempt:
import win32com
RC = win32com.client.Dispatch("RAS41.HECRASController")
RC.Project_Open("c:\\myProj.prj")
numMessages = 0
messages = []
RC.Compute_CurrentPlan(numMessages, messages)
I also tried constructing that variable as
messages = win32com.client.VARIANT(pythoncom.VT_ARRAY | pythoncom.VT_BSTR, [])
but it didn't work either. Error messages look like this:
Traceback (most recent call last):
File "<pyshell#101>", line 1, in <module>
print(o.Compute_CurrentPlan(1,b))
File "<COMObject RAS41.HECRASController>", line 3, in Compute_CurrentPlan
File "C:\Python34\lib\site-packages\win32com\client\dynamic.py", line 282, in _ApplyTypes_
result = self._oleobj_.InvokeTypes(*(dispid, LCID, wFlags, retType, argTypes) + args)
TypeError: Objects for SAFEARRAYS must be sequences (of sequences), or a buffer object.
Make sure that you python variables are in the right format (Long and String). Try to use something like the following to get the variable types in shape:
messages = ['']
RC.Compute_CurrentPlan(long(numMessages), messages)
To be more flexible with your program you should check the variable types prior to the win32 call.
I realize this is an old question, but I ran into this issue and wanted to share the resolution. I was having issues defining the type of data for the first two arguments, but simply setting them to None works and your number of messages and compute messages are reported (I checked by assigning text = hec.Compute_CurrentPlan(None, None, True) and then print(test)). The third argument is Blocking Mode, set to True, meaning that the RAS computation will complete before moving to the next line of code. I am using Python 3.10.4 and HEC-RAS version 6.3.
import win32com.client
hec = win32com.client.Dispatch('RAS630.HECRASController')
hec.Project_Open(r"C:\myproj.prj")
hec.ShowRAS()
hec.Compute_CurrentPlan(None, None, True)
hec.QuitRAS()

Categories