I have a method in my Python (2.7.6) code that I am looking to run through a multiprocessing pool, following the advice given in another SO question.
This is how the code is currently:
return self.capi(roi_rgb,"",False)
This is how I converted it:
pool = multiprocessing.Pool(None)
result = ""
r = pool.map_async(self.capi(roi_rgb,"",False), callback=result)
r.wait()
return result
but I'm getting errors with the above on the call to pool.map_async
TypeError: map_async() takes at least 3 arguments (3 given)
According to https://docs.python.org/2/library/multiprocessing.html, map_async() needs at least two positional arguments: a function and an iterable. You gave it one positional and one keyword argument; the third argument counted in the message is the implicit self. Note also that self.capi(roi_rgb, "", False) calls the method immediately and passes its return value rather than the method itself, and callback=result passes a string where a callable is expected. So you need to pass map_async a function and an iterable, with an optional callable as the callback.
P.S. that is a pretty useless error message, isn't it?
I wanted to know how to work with an array as a function argument in Python. Here is a short example:

def polynom(x, coeff_arr):
    return coeff_arr[0] + coeff_arr[1]*x + coeff_arr[2]*x**2

When I call it with the coefficients as separate values, I get the error that 2 positional arguments are needed but 4 were given. Can anybody tell me how to do this, other than just passing each coeff_arr[i] as a separate argument of the function?
Cheers
Your question is missing the code you use to call the function, but from the error I infer that you are calling it as polynom(x, coefficient1, coefficient2, coefficient3). Instead you need to either pass the coefficients as a list:
polynom(x, [coefficient1, coefficient2, coefficient3])
Or use the * operator in the function definition as follows, which will take all positional arguments after x and pack them into coeff_arr as a tuple:
def polynom(x, *coeff_arr):
(The unpacking operator can also be used in a function call, where it does the opposite: it takes a list and passes its elements as positional arguments:
polynom(x, *[coefficient1, coefficient2, coefficient3])
is equivalent to
polynom(x, coefficient1, coefficient2, coefficient3)
)
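To make the two options concrete, here is a small runnable sketch of both forms (the coefficient values are made up for illustration):

```python
def polynom(x, coeff_arr):
    # coefficients passed as one list argument
    return coeff_arr[0] + coeff_arr[1] * x + coeff_arr[2] * x ** 2

def polynom_star(x, *coeff_arr):
    # extra positional arguments are packed into the tuple coeff_arr
    return coeff_arr[0] + coeff_arr[1] * x + coeff_arr[2] * x ** 2

print(polynom(2, [1, 2, 3]))        # 1 + 2*2 + 3*4 = 17
print(polynom_star(2, 1, 2, 3))     # same result
print(polynom_star(2, *[1, 2, 3]))  # unpacking a list at the call site
```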
I have a function that processes a relatively large dataframe, and it takes quite a while to run. I was looking at ways of improving run time and came across multiprocessing pools. If I understood correctly, a pool runs the function on equal chunks of the dataframe in parallel, which means it could potentially run quicker and save time.
My function takes 4 arguments; the last three are mainly lookups, while the first is the dataframe of interest. It looks something like this:
def functionExample(dataOfInterest, lookup1, lookup2, lookup3):
    # do stuff with the data and lookups
    return output1, output2
So based on what I've read, I came up with the below, which I thought should work:
import numpy as np
import pandas as pd
from multiprocessing import Pool

num_partitions = 4
num_cores = 4

def parallelize_dataframe(df, func):
    df_split = np.array_split(df, num_partitions)
    pool = Pool(num_cores)
    df = pd.concat(pool.map(func, df_split))
    pool.close()
    pool.join()
    return df
Then to call the process (where mainly I couldn't figure it out), I've tried the below:
output1, output2 = parallelize_dataframe(dataOfInterest, functionExample)
This returns the error:
functionExample() missing 3 required positional arguments: 'lookup1', 'lookup2', and 'lookup3'
Then I try adding the three arguments by doing the below:
output1, output2 = parallelize_dataframe(dataOfInterest, functionExample(lookup1, lookup2, lookup3))
This returns the error below, suggesting that the three values were taken as the first three arguments of the function, leaving the fourth missing, instead of filling the last three as the previous error implied:
functionExample() missing 1 required positional argument: 'lookup1'
and then if I try feeding it the four arguments by doing the below:
output1, output2 = parallelize_dataframe(dataOfInterest, functionExample(dataOfInterest, lookup1, lookup2, lookup3))
It returns the error below:
'tuple' object is not callable
I'm not quite sure which of the above is the way to do it, if any at all. Should it be taking all of the function's arguments, including the desired dataframe? If so, why is it complaining about tuples?
Any help would be appreciated!
Thanks.
You can perform a partial binding of some arguments to create a new callable via functools.partial:
from functools import partial
output1, output2 = parallelize_dataframe(dataOfInterest,
partial(functionExample, lookup1=lookup1, lookup2=lookup2, lookup3=lookup3))
Note that in the multiprocessing world, partial can be slow, so you may want to find a way to avoid the need to pass the arguments if they're large/expensive to pickle, assuming that's possible in your use case.
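As a quick illustration of what partial does, here is a toy stand-in for functionExample (the arithmetic body and lookup values are invented purely for demonstration):

```python
from functools import partial

def functionExample(dataOfInterest, lookup1, lookup2, lookup3):
    # hypothetical stand-in for the real processing function
    return dataOfInterest + lookup1 + lookup2 + lookup3

# bind the three lookups; the result is a callable of one argument,
# which is exactly what pool.map expects
bound = partial(functionExample, lookup1=10, lookup2=20, lookup3=30)
print(bound(1))  # 61
```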
In each case, you are trying to call the function, rather than pass it along with the arguments for when it is called. What you need is a new callable that calls your original with the correct arguments.
from functools import partial
output1, output2 = parallelize_dataframe(
dataOfInterest,
partial(functionExample, lookup1=x, lookup2=y, lookup3=z)
)
You could simply modify your function definition to take predefined default arguments, or make a wrapper function that calls your original function with those parameters.
def functionExample(dataOfInterest, lookup1=x, lookup2=y, lookup3=z):
    # do stuff with the data and lookups
    return output1, output2
or
def f(dataOfInterest):
    return functionExample(dataOfInterest, lookup1=x, lookup2=y, lookup3=z)
In this way, map() would work as you expect.
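A minimal end-to-end sketch of the wrapper approach, using a plain list in place of the dataframe so it stays self-contained (the function body and lookup values are hypothetical):

```python
from multiprocessing import Pool

def functionExample(data, lookup1, lookup2, lookup3):
    # hypothetical stand-in: transform each element using the lookups
    return [x * lookup1 + lookup2 - lookup3 for x in data]

def f(data):
    # wrapper that fixes the lookup arguments
    return functionExample(data, lookup1=2, lookup2=1, lookup3=0)

def parallelize(data, func, num_partitions=4):
    # split the data into chunks, process each chunk in a worker,
    # then stitch the partial results back together
    size = max(1, len(data) // num_partitions)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    pool = Pool(num_partitions)
    parts = pool.map(func, chunks)
    pool.close()
    pool.join()
    return [x for part in parts for x in part]

if __name__ == '__main__':
    print(parallelize(list(range(8)), f))  # [1, 3, 5, 7, 9, 11, 13, 15]
```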
When forgetting to pass certain arguments to a function, Python gives the only-somewhat-helpful message "myfunction() takes X arguments (Y given)". Is there a way to figure out the names of the missing arguments, and tell the user? Something like:
try:
    # begin blackbox
    def f(x, y):
        return x * y
    f(x=1)
    # end blackbox
except Exception as e:
    # figure out that the missing keyword argument is called "y" and tell the user so
    pass
Assuming that the code between begin blackbox and end blackbox is unknown to the exception handler.
Edit: As it's been pointed out to me below, Python 3 already has this functionality built in. Let me extend the question then: is there a (probably ugly and hacky) way to do this in Python 2.x?
A much cleaner way to do this would be to wrap the function in another function, pass through the *args, **kwargs, and then use those values when you need them, instead of trying to reconstruct them after the fact. But if you don't want to do that…
In Python 3.x (except very early versions), this is easy, as poke's answer explains. Even easier with 3.3+, with inspect.signature, inspect.getargvalues, and inspect.Signature.bind_partial and friends.
In Python 2.x, there is no way to do this. The exception only has the string 'f() takes exactly 2 arguments (1 given)' in its args.
Except… in CPython 2.x specifically, it's possible with enough ugly and brittle hackery.
You've got a traceback, so you've got its tb_frame and tb_lineno… which is everything you need. So as long as the source is available, the inspect module makes it easy to get the actual function call expression. Then you just need to parse it (via ast) to get the arguments passed, and compare to the function's signature (which, unfortunately, isn't nearly as easy to get in 2.x as in 3.3+, but between f.func_defaults, f.func_code.co_argcount, etc., you can reconstruct it).
But what if the source isn't available? Well, between tb_frame.f_code and tb_lasti, you can find out where the function call was in the bytecode. And the dis module makes that relatively easy to parse. In particular, right before the call, the positional arguments and the name-value pairs for keyword arguments were all pushed on the stack, so you can easily see which names got pushed, and how many positional values, and reconstruct the function call that way. Which you compare to the signature in the same way.
Of course that relies on the some assumptions about how CPython's compiler builds bytecode. It would be perfectly legal to do things in all kinds of different orders as long as the stack ended up with the right values. So, it's pretty brittle. But I think there are already better reasons not to do it.
I would argue that doing this doesn’t really make that much sense. Such an exception is thrown because the programmer missed specifying the argument. So if you knowingly catch the exception, then you could just as well just fix it in the first place.
That being said, in current Python 3 versions, the TypeError that is being thrown does mention which arguments are missing from the call:
"f() missing 1 required positional argument: 'y'"
Unfortunately, the argument name is not mentioned separately, so you would have to extract it from the string:
try:
    f(x=1)
except TypeError as e:
    if 'required positional argument' in e.args[0]:
        argumentNames = e.args[0].split("'")[1::2]
        print('Missing arguments are ' + ', '.join(argumentNames))
    else:
        raise  # re-raise other TypeErrors
As Joran Beasley pointed out in the comments, Python 2 does not tell you which arguments are missing but just how many are missing. So there is no way to tell from the exception which arguments were missing in the call.
import re
import inspect

try:
    f(x=1)
except TypeError as e:
    got_args = int(re.search(r"\d+.*(\d+)", str(e)).groups()[0])
    print "missing args:", inspect.getargspec(f).args[got_args:]
a better method would be a decorator
import inspect

def arg_decorator(fn):
    def func(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except TypeError:
            arg_spec = inspect.getargspec(fn)
            missing_named = [a for a in arg_spec.args if a not in kwargs]
            if arg_spec.defaults:
                missing_args = missing_named[len(args):-len(arg_spec.defaults)]
            else:
                missing_args = missing_named[len(args):]
            print "Missing:", missing_args
    return func
@arg_decorator
def fn1(x, y, z):
    pass

def fn2(x, y):
    pass

arged_fn2 = arg_decorator(fn2)

fn1(5, y=2)
arged_fn2(x=1)
With only the exception to work from, it is not possible to do what you want while also handling keyword arguments. This is, of course, with respect to Python 2.7.
The code that generates this message in Python is:
PyErr_Format(PyExc_TypeError,
"%.200s() takes %s %d "
"argument%s (%d given)",
PyString_AsString(co->co_name),
defcount ? "at most" : "exactly",
co->co_argcount,
co->co_argcount == 1 ? "" : "s",
argcount + kwcount);
Taken from lines 3056-3063 from http://hg.python.org/cpython/file/0e5df5b62488/Python/ceval.c
As you can see, there is just not enough information given to the exception as to what arguments are missing. co in this context is the PyCodeObject being called. The only thing given is a string (which you could parse if you like) with the function name, whether or not there is a vararg, how many arguments are expected, and how many arguments were given. As has been pointed out, this does not give you sufficient information as to what argument(s) were not given (in the case of keyword arguments).
Something like inspect or the other debugging modules might be able to give you enough information as to what function was called and how it was called, from which you could figure out what arguments were not given.
I should also mention however that almost certainly, whatever solution you come up with will not be able to handle at least some extension module methods (those written in C) because they don't provide argument information as part of their object.
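For completeness, on Python 3.3+ the inspect.signature route mentioned in the answers above can name the missing argument directly, without any string parsing of the exception message:

```python
import inspect

def f(x, y):
    return x * y

msg = None
try:
    # bind() checks the arguments against the signature without calling f,
    # and raises a TypeError that names the missing parameter
    inspect.signature(f).bind(x=1)
except TypeError as e:
    msg = str(e)

print(msg)  # mentions the missing argument 'y'
```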
I'm getting stuck with this
I have a Python file which is imported from elsewhere as a module, in order to use some functions provided by it. I'm trying to find a way to call it from the CLI, giving it 0 or 5 arguments.
def simulate(index, sourcefile, temperature_file, save=0, outfile='fig.png'):
    (...)
    # do calculations and spit out a nice graph file

if __name__ == '__main__':
    if len(sys.argv) == 6:
        # ugly code alert
        simulate(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4], sys.argv[5])
    else:
        (...)
        # do some other things and don't bother me
I was wondering if there's a clean way to pass all but first argument to a function.
I tried simulate(sys.argv[1:]), but that passes a single object (a list), and since the simulate function expects separate positional arguments, it doesn't work: TypeError: 'simulate() takes at least 3 arguments (1 given)'
Tried also with simulate(itertools.chain(sys.argv[1:])) with same result.
Since this file is imported elsewhere as a module and this function is called many times, it seems a bad idea to change the function's signature to receive a single argument.
simulate(*sys.argv[1:])
See "Unpacking Argument Lists" in the tutorial
What you want to use is called "Packing/Unpacking" in Python:
foo(*sys.argv)
See: http://en.wikibooks.org/wiki/Python_Programming/Tuples#Packing_and_Unpacking
If you want "all but first argument":
foo(*sys.argv[1:])
This is called "slicing". See: http://docs.python.org/2.3/whatsnew/section-slices.html
I have the following code, which results in this error:
TypeError('smallTask() takes exactly 1 argument (2 given)',)
@task
def master():
    count = 0
    obj = {'var1': 'val1', 'var2': 'val2'}
    while count < 10:
        subtask('smallTask', obj).apply_async()
        count += 1

@task(name='smallTask')
def smallTask(obj):
    print obj
To pass a dictionary to a function, I imagine I need to use **kwargs, but if I do that, I get the error that the function takes no arguments yet 2 have been supplied.
I assume the issue here is with either the decorator (have a basic understanding of this but not enough to solve the problem) or the subtask function in Celery.
I don't have enough python knowledge to really proceed..could anyone give me an idea of what's happening and how I can pass the smallTask function a dictionary?
You need to pass arguments for a subtask in the args keyword argument, which must be a tuple according to the celery.subtask() documentation:
subtask('smallTask', args=(obj,)).apply_async()
or use the Task.subtask() method on your smallTask task, but again pass the arguments as a tuple:
smallTask.subtask((obj,)).apply_async()
Alternatively, use star arguments with the Task.s() method:
smallTask.s(obj).apply_async()
The subtasks documentation you yourself linked to uses a tuple in the examples; arguments and keyword arguments are two pieces of data that Celery has to store for you until it can run the task, at which point it applies those arguments and keyword arguments for you.
But the celery.subtask() function takes more than just the arguments and keyword arguments for your task; it also takes additional options. In order to work with arbitrary arguments (positional or keyword) to your task, and support other arguments that are not passed to your task, the function signature has no choice but to accept positional arguments as an explicit tuple, and keyword arguments as an explicit dictionary.
The Task.s() method does not accept any arguments other than what the task itself would accept, so it does support passing arguments as if you called the task directly. Internally, this uses catch-all arguments: Task.s(*args, **kwarg), and just passes the captured arguments as a tuple and dictionary on to Task.subtask().
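The store-now-apply-later idea can be sketched in plain Python, without Celery (make_subtask is an invented helper for illustration, not part of the Celery API):

```python
def make_subtask(fn, args=(), kwargs=None):
    # store the call's arguments as an explicit tuple and dict,
    # the way a Celery signature does, and apply them only when asked
    kwargs = kwargs or {}
    def apply():
        return fn(*args, **kwargs)
    return apply

def smallTask(obj):
    return obj['var1']

# note the one-element tuple (obj,) -- the same shape Celery expects
deferred = make_subtask(smallTask, args=({'var1': 'val1', 'var2': 'val2'},))
print(deferred())  # 'val1'
```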