Python multiprocessing question on iteration of a function multiple arguments

I have tried several approaches from Stack Overflow and elsewhere, but I'm still confused and need some help. I'm on Python 2.7.
This is my manager for the multiprocessing. I just need to call function1 n_iterations times and collect the result of each iteration.
I have imported these two libraries:
from functools import partial
import multiprocessing
function1 is:
def function1(v1,v2,v3,v4,v5):
    calculate_function = v1+v2+v3+v4+v5
    return calculate_function
And the function that handles the multiprocessing is:
def multi_process(n_iterations,a1,a2,a3,a4,a5):
    sampling_process = partial(function1, v1=a1,v2=a2,v3=a3,v4=a4,v5=a5)
    pool = multiprocessing.Pool()
    results_set = pool.map(sampling_process, xrange(n_iterations))
    pool.close()
    pool.join()
    return results_set
But I keep getting an error message:
  File "model_selection_pooling_ray.py", line 246, in multi_process
    results_set = pool.map(sampling_process, xrange(n_iterations))
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
NameError: global name 'valuofv1' is not defined
(valuofv1 is the actual value of a1)
Can someone please help me out on figuring out what I am doing wrong? Thank you.

pool.map passes each element of the iterable to sampling_process as its first positional argument, so function1 needs a leading parameter to receive it:
def function1(n, v1,v2,v3,v4,v5):
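A minimal sketch of that suggestion, assuming the five values are bound by keyword and the iteration index is simply ignored. The helper name build_worker is hypothetical and not part of the original code; range is used so the snippet runs on both Python 2.7 and Python 3.

```python
from functools import partial
import multiprocessing


def function1(n, v1, v2, v3, v4, v5):
    # n receives the item that map() hands over (the iteration index); unused here.
    return v1 + v2 + v3 + v4 + v5


def build_worker(a1, a2, a3, a4, a5):
    # Hypothetical helper: bind v1..v5 by keyword so only n is left unbound.
    return partial(function1, v1=a1, v2=a2, v3=a3, v4=a4, v5=a5)


def multi_process(n_iterations, a1, a2, a3, a4, a5):
    worker = build_worker(a1, a2, a3, a4, a5)
    pool = multiprocessing.Pool()
    try:
        # Each index in range(n_iterations) becomes the first positional argument, n.
        return pool.map(worker, range(n_iterations))
    finally:
        pool.close()
        pool.join()
```

The built-in map uses the same calling convention as pool.map, so list(map(build_worker(1, 2, 3, 4, 5), range(3))) is a quick way to check the binding without starting any processes. On platforms that spawn worker processes, such as Windows, multi_process should be called from under an if __name__ == '__main__': guard.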

I found the answer: the problem was that my actual code called a function, valueof1, which did not exist. After fixing this, I also adjusted function1 to accept the iterating value, def function1(v1,v2,v3,v4,v5,n):, to fix it.

Related

ERROR : When trying deployment of model in ML I get-'dict' object is not callable

I have used the below code for my python streamlit deployment of ML Model.
import streamlit as st
import pickle
import numpy as np
import pandas as pd
similarity=pickle.load(open(r'C:\Users\nikso\OneDrive\Desktop\mlproject\similarity.pkl','rb'),buffers=None)
list=pickle.load(open(r'C:\Users\nikso\OneDrive\Desktop\mlproject\movies_dict.pkl','rb'),buffers=None)
movies=pd.DataFrame.from_dict(list)
def recomm(movie):
    mov_index=movies[movies['title']==movie].index[0]
    sim=similarity[mov_index]
    movlist=sorted(list(enumerate(sim)),reverse=True,key=lambda x:x[1])[1:6]
    rec_movie=[]
    for i in movlist:
        # print(i[0])
        rec_movie.append(movies.iloc[i[0]]['title'])
    return rec_movie
st.title('Movie Recommender System')
selected_movie_name = st.selectbox(
    'How would you like to be contacted?',
    movies['title'].values)
if st.button('Recommend'):
    recom=recomm(selected_movie_name)
    # recom=np.array(recom)
    for i in recom:
        st.write(i)
On Colab the code works fine, but in VS Code it showed this error:
  File "C:\Users\anaconda3\envs\Streamlit\lib\site-packages\streamlit\scriptrunner\script_runner.py", line 554, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\OneDrive\Desktop\mlproject\app.py", line 30, in <module>
    recom=recomm(selected_movie_name)
  File "C:\Users\OneDrive\Desktop\mlproject\app.py", line 15, in recomm
    movlist=sorted(list(enumerate(sim)),reverse=True,key=lambda x:x[1])[1:6]
I had to use different IDEs for deployment. But when I removed the keyword 'list' in line 15, it worked fine. What can be the reason behind it? I am a beginner and really curious about it. Thank you.

But when I removed the keyword 'list' in the given line 15 it worked fine. What can be the reason behind it?
TL;DR: sorted accepts iterables, and enumerate is already an iterable
Long answer:
When you define list as
list=pickle.load(open(r'C:\Users\nikso\OneDrive\Desktop\mlproject\movies_dict.pkl','rb'),buffers=None)
you're overriding Python's built-in list type. Python lets you do this without issuing any warnings, but the result is that, in your script, list now represents a dictionary object. The result of this is that when you call list(enumerate(sim)) later on, you're treating your dictionary object as a callable, which it is not.
The solution? Avoid overriding Python built-ins whenever you can.
import streamlit as st
import pickle
import numpy as np
import pandas as pd
similarity=pickle.load(open(r'C:\Users\nikso\OneDrive\Desktop\mlproject\similarity.pkl','rb'),buffers=None)
movies_dict=pickle.load(open(r'C:\Users\nikso\OneDrive\Desktop\mlproject\movies_dict.pkl','rb'),buffers=None)
movies=pd.DataFrame.from_dict(movies_dict)
def recomm(movie):
    mov_index=movies[movies['title']==movie].index[0]
    sim=similarity[mov_index]
    movlist=sorted(list(enumerate(sim)),reverse=True,key=lambda x:x[1])[1:6]
    rec_movie=[]
    for i in movlist:
        # print(i[0])
        rec_movie.append(movies.iloc[i[0]]['title'])
    return rec_movie
st.title('Movie Recommender System')
selected_movie_name = st.selectbox(
    'How would you like to be contacted?',
    movies['title'].values)
if st.button('Recommend'):
    recom=recomm(selected_movie_name)
    # recom=np.array(recom)
    for i in recom:
        st.write(i)
To answer specifically why removing "list" on line 15 seemed to fix the issue, though: sorted accepts iterables, and enumerate is already an iterable. All list is doing on line 15 is gathering the results of enumerate before passing them into sorted. But the fundamental reason why removing list fixed things is because you're overriding Python's built-in, which you probably want to avoid doing.
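The effect can be reproduced without Streamlit or pickle. The sketch below uses made-up similarity scores and two hypothetical helpers (top_indices, shadowed_call) that are not part of the original app; it only illustrates the name shadowing described above.

```python
sim = [0.1, 0.9, 0.5]  # made-up similarity scores for illustration


def top_indices(scores, k):
    # sorted() consumes the enumerate object directly; no list() call is needed.
    ranked = sorted(enumerate(scores), reverse=True, key=lambda x: x[1])
    return [index for index, _ in ranked[:k]]


def shadowed_call():
    # Rebinding the name "list" hides the built-in type inside this function.
    list = {"title": ["a", "b"]}
    try:
        list(enumerate(sim))
    except TypeError as exc:
        return str(exc)
    return None


print(top_indices(sim, 2))
print(shadowed_call())
```

Renaming the variable, as in movies_dict above, keeps the built-in available everywhere else in the script.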

Factorializing Medium Numpy Ints Creates Runtime Warning

I am checking run times on factorials (have to use the user-defined function), but I receive an odd error. Code I'm working with is as follows:
import numpy as np
import time
np.random.seed(14)
nums = list(np.random.randint(low=100, high=500, size=10))
# nums returns as [207, 444, 368, 427, 349, 458, 334, 256, 238, 308]
def fact(x):
    if x == 1:
        return 1
    else:
        return x * fact(x-1)
recursion_times = []
recursion_factorials = []
for i in nums:
    t1 = time.perf_counter()
    factorial = fact(i)
    t2 = time.perf_counter()
    execution = t2-t1
    recursion_factorials.append(factorial)
    recursion_times.append(execution)
    print(execution)
When I run the above, I get the following:
RuntimeWarning: overflow encountered in long_scalars
But when I run it as below, I get no warnings.
recursion_times = []
recursion_factorials = []
for i in [207, 444, 368, 427, 349, 458, 334, 256, 238, 308]:
    t1 = time.perf_counter()
    factorial = fact(i)
    t2 = time.perf_counter()
    execution = t2-t1
    recursion_factorials.append(factorial)
    recursion_times.append(execution)
    print(execution)
I know it's a bit of extra overhead to call the list nums, but why would it trigger a runtime warning? I've tried digging around but I only get dynamically-named variable threads and warning suppression libraries - I'm looking for why this might happen.
For what it's worth, I'm running Python3 in a jupyter notebook. Glad to answer any other questions if it will help.
Thanks in advance for the help!

If (as in the current version of your post) you created nums by calling list on a NumPy array, but wrote an explicit list literal with no NumPy for the second test, then the second test gives no warning because it's not using NumPy. nums is a list of NumPy fixed-width integers, while the other list is a list of ordinary Python ints. Ordinary Python ints don't overflow.
(If you want to create a list of ordinary Python scalars from a NumPy array, the way to do that is with array.tolist(). This is usually undesirable due to performance implications, but it is occasionally necessary to interoperate with code that chokes on NumPy types.)
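If only individual values need converting, calling int() on each one has the same effect as tolist() for a single scalar. The sketch below assumes that approach; fact_exact is a hypothetical wrapper name, and the base case is written as x <= 1 so that 0 is also handled. It uses only the standard library, with math.factorial as a cross-check.

```python
import math


def fact(x):
    # Recursive factorial on whatever integer type it is given.
    if x <= 1:
        return 1
    return x * fact(x - 1)


def fact_exact(x):
    # int() turns a fixed-width integer scalar into an arbitrary-precision Python int,
    # so every multiplication below is done without overflow.
    return fact(int(x))


# Cross-check against the standard library for one of the values in the question.
matches_stdlib = fact_exact(207) == math.factorial(207)
print(matches_stdlib)
```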
There would usually be an additional effect due to the default Python warning handling. By default, Python only emits a warning once per code location per Python process. In the original version of your question, it looked like this was causing the difference.
Using a variable or not using a variable has no effect on this warning.
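The once-per-location behaviour mentioned above can be observed with the standard warnings module alone. This is a small sketch with hypothetical helper names (noisy, count_shown); it does not involve NumPy and only shows how the "default" filter action differs from "always".

```python
import warnings


def noisy():
    # The warning is always raised from this same source line.
    warnings.warn("overflow encountered", RuntimeWarning)


def count_shown(action, calls=3):
    """Call noisy() several times under one filter action and count recorded warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        for _ in range(calls):
            noisy()
        return len(caught)


print(count_shown("default"))  # repeats from the same location are suppressed
print(count_shown("always"))   # every occurrence is reported
```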

Python's multiprocess module (with dill) gives an unhelpful AssertionError

I have installed dill/pathos and its dependencies (with some difficulty) and I'm trying to run a function across several processes. The class/attribute Model(self.xml,self.exp_data,i).SSR is custom made and depends on loads of other custom functions, so I apologize in advance for not being able to provide 'runnable' code. In brief, however, it takes some experimental data, integrates a system of ODEs with Python's pysces module and calculates the sum of squares (SSR). The purpose of parallelizing this code is to speed up this calculation across multiple parameter sets.
The code:
import multiprocess
def evaluate_chisq(pop):
    p = multiprocess.Pool(8)
    res = p.map(lambda i: Model(self.xml,self.exp_data,i).SSR, pop)  # calculate SSR with this parameter set
    return res
The error message I get is:
  File "C:\Anaconda1\lib\site-packages\multiprocess\pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "C:\Anaconda1\lib\site-packages\multiprocess\pool.py", line 567, in get
    raise self._value
AssertionError
Then I tried using map_async:
def evaluate_chisq(pop):
    p = multiprocess.Pool(8)
    res = p.map_async(lambda i: Model(self.xml,self.exp_data,i).SSR, pop)  # calculate SSR with this parameter set
    return res
which returns a <multiprocess.pool.MapResult object at 0x0000000014AF8C18> object, which gives me the same error when I attempt to use the MapResult's get method:
  File "C:\Anaconda1\lib\site-packages\multiprocess\pool.py", line 567, in get
    raise self._value
AssertionError
Does anybody know what I'm doing wrong?

On Windows you need to use freeze_support from __main__.
See https://docs.python.org/2/library/multiprocessing.html#multiprocessing.freeze_support.

Serialize iterator object to be passed between processes in Python

I have a python script that calculates the eigenvalues of matrices from a list, and I would like to insert these eigenvalues into another collection in the same order as the original matrix and I would like to do this by spawning up multiple processes.
Here is my code:
import time
import collections
import numpy as NP
from scipy import linalg as LA
from joblib import Parallel, delayed
def computeEigenV(unit_of_work):
    current_index = unit_of_work[0]
    current_matrix = unit_of_work[1]
    e_vals, e_vecs = LA.eig(current_matrix)
    # the posted snippet referenced an undefined name (lowEV) here; e_vals is assumed
    finished_unit = (current_index, e_vals[::-1])
    return finished_unit
def run(work_list):
    pool = Parallel( n_jobs = -1, verbose = 1, pre_dispatch = 'all')
    results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
    return results
if __name__ == '__main__':
    # create original array of matrices
    original_matrix_list = []
    work_list = []
    # basic set up so we can run this test
    for i in range(0, 100):
        # generate the matrix & unit of work
        matrix = NP.random.random_integers(0, 100, (500, 500))
        # insert into respective resources
        original_matrix_list.append(matrix)
    for i, matrix in enumerate(original_matrix_list):
        unit_of_work = [i, matrix]
        work_list.append(unit_of_work)
    work_result = run(work_list)
So work_result should hold all the eigenvalues from each matrix after all processes finish. The iterator I am using is unit_of_work, which is a list containing the index of the matrix (from the original_matrix_list) and the matrix itself.
The weird thing is, if I were to run this code by doing python matrix.py everything works perfectly. But when I use auto (a program that does calculations for differential equations?) to run my script, typing auto matrix.py gives me the following error:
Traceback (most recent call last):
  File "matrix.py", line 50, in <module>
    work_result = run(work_list)
  File "matrix.py", line 27, in run
    results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
  File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 805, in __call__
    while self.dispatch_one_batch(iterator):
  File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 658, in dispatch_one_batch
    tasks = BatchedCalls(itertools.islice(iterator, batch_size))
  File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 69, in __init__
    self.items = list(iterator_slice)
  File "matrix.py", line 27, in <genexpr>
    results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
  File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 162, in delayed
    pickle.dumps(function)
TypeError: expected string or Unicode object, NoneType found
Note: when I ran this with auto I had to change if __name__ == '__main__': to if __name__ == '__builtin__':
I looked up this error and it seems like I am not serializing the iterator unit_of_work correctly when passing it around to different processes. I have then tried to use serialized_unit_of_work = pickle.dumps(unit_of_work), pass that around, and do pickle.loads when I need to use the iterator, but I still get the same error.
Can someone please help point me in the right direction as to how I can fix this? I hesitate to use pickle.dump(obj, file[, protocol]) because eventually I will be running this to calculate eigenvalues of thousands of matrices and I don't really want to create that many files to store the serialized iterator if possible.
Thanks!! :)

You can't pickle an iterator in Python 2.7 (but you can from 3.4 onward).
Also, pickling works differently in __main__ than it does outside __main__, and it would seem that auto is doing something odd with __main__. What you will often observe when pickling fails on a particular object is that if, instead of directly running the script that contains the object, you run a script as main which imports the portion of the script with the "difficult-to-serialize" object, then pickling will succeed. This is because the object will pickle by reference at a namespace level above where the "difficult" object lives… thus it's never directly pickled.
So, you can probably get away with pickling what you want, by adding a reference layer… a file import or a class. But, if you want to pickle an iterator, you are out of luck unless you move to at least python3.4.
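Both points can be checked with the standard library on a recent Python 3. The sketch below is an illustration under that assumption, not a fix for the auto setup; roundtrip and generator_is_picklable are hypothetical helper names, and json.dumps merely stands in for "a function that lives in an importable module".

```python
import json
import pickle


def roundtrip(obj):
    # Serialize to bytes in memory and load it back; no files are created.
    return pickle.loads(pickle.dumps(obj))


# A function defined in an importable module is pickled by reference
# (module name plus function name), so loading it returns the very same object.
same_function = roundtrip(json.dumps) is json.dumps

# A plain list iterator survives a round trip on Python 3.
restored_items = list(roundtrip(iter([1, 2, 3])))


def generator_is_picklable():
    # Generator objects are a different case: pickle refuses them.
    try:
        pickle.dumps(x for x in range(3))
    except TypeError:
        return False
    return True


print(same_function, restored_items, generator_is_picklable())
```

pickle.dumps and pickle.loads work entirely in memory, so serializing thousands of work units this way does not require one file per unit.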

python kivy async storage

I use Synchronous JsonStorage for my app and want to switch to Asynchronous.
My sync call was:
store.exists(store_index)
My non-working async call is:
def callback_function(store,key,result):
    print "exists:",result
store.exists(store_index, callback=callback_function)
This async call raises the following exception:
store.exists(key=store_index,callback=callback_function)
TypeError: exists() got an unexpected keyword argument 'callback'
I've also tried this:
store.async_exists(store_index, callback=callback_function)
But this raised:
  File "main.py", line 199, in __init__
    store.async_exists(key=store_index,callback=colorButtonCallback)
  File "/home/mike/Dokumente/py-spielwiese/venv/local/lib/python2.7/sitepackages/kivy/storage/__init__.py", line 152, in async_exists
    key=key, callback=callback)
TypeError: _schedule() got multiple values for keyword argument 'callback'
What am I doing wrong?

This is a bug in Kivy. Your last attempt was pretty much correct (equivalent to the code in @Anzel's answer, though @Anzel's code is a better way to write the same thing). But in the end you will still get the error thrown from _schedule. I've just submitted a PR to fix this in kivy-dev.
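For readers unfamiliar with this kind of TypeError, here is one way it can arise in general. The class below is a toy stand-in written for illustration only; it is not Kivy's source, and the names ToyStore, _exists_worker and try_async_exists are hypothetical. It assumes a scheduling helper whose first positional parameter happens to be called callback.

```python
class ToyStore(object):
    """Toy stand-in for illustration; NOT Kivy's actual implementation."""

    def _schedule(self, callback, **kwargs):
        # The first positional parameter is itself named "callback".
        return callback(**kwargs)

    def _exists_worker(self, key, callback):
        callback(self, key, key in ("a", "b"))

    def async_exists(self, callback, key):
        # The worker is passed positionally (filling "callback"), and the user's
        # callback is then forwarded under the same keyword name: a collision.
        return self._schedule(self._exists_worker, key=key, callback=callback)


def try_async_exists():
    try:
        ToyStore().async_exists(lambda store, key, result: None, "a")
    except TypeError as exc:
        return str(exc)
    return None


print(try_async_exists())
```

In a structure like this, no calling style on the user's side avoids the error, because the collision happens inside the helper call.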

async_exists takes the callback as its first argument, then the key, so try changing the call to:
store.async_exists(callback_function, store_index)
You can read async_exists to see the details.
Hope this helps.
